CHANGELOG

what's changed.

every shipped update, in reverse chronological order. bug fixes listed next to features because that's how the work actually went.

full disclosure: some of this codebase was co-written with an llm, which you can tell because nearly every "ship feature" entry below is followed within hours by a "patch n bugs the review missed" entry. apparently anthropic ships the same way.

    • onboarding got a full redesign. six steps instead of three: welcome, permissions, mic check, hotkey check, practice, done. the mic check has a live level meter so you can verify your input actually works before your first real dictation. the hotkey check shows a keycap that depresses when you hold your hotkey, so you can see the global listener is hearing you. the practice step asks you to dictate a sentence into textedit and confirm the text actually landed - catches the "everything looks fine but nothing transcribes" path that the old flow walked right past.
    • "update ready" now appears inline in the menu bar when a new version is available. one click to restart and install, no separate dialog popping in front of your work.
    • microphone submenu in the menu bar lets you swap input device without opening preferences. the built-in or wired-input mic gets tagged as recommended-for-speech, the bluetooth ones get tagged as may-reduce-accuracy.
    • "paste last transcript" menu item with a one-line preview of what's on deck. handy when you dictated something useful, switched windows, and want to drop it again somewhere else.
    • "reset & restart" diagnostic in preferences. three scopes: clear preferences only, clear preferences and license, or wipe everything. for when something gets into a stuck state and you want a clean slate without reinstalling.
    • the app now refuses to launch from anywhere other than /Applications. used to be possible to run it from downloads or the desktop where various macos security paths behave oddly; if it's not in the right place we now offer to move it and relaunch automatically.
    • sensitive on-disk data uses the data-protection keychain api now. fewer login-keychain prompts on first run, and the encrypted blobs stay encrypted across user-switch boundaries on shared macs.
    • a couple of polish items: license tab placeholder text actually shows grayed example text now (used to render the literal underscores), and the trigger words list collapses into per-word disclosure groups for easier editing of long lists - with inline-editable headers so you can rename without delete-and-readd.
    • new model recommendation engine picks the best whisperkit variant for your mac and your primary dictation language. the models tab shows a recommended badge next to the pick that hits the right accuracy / speed / ram tradeoff for your chip - and downgrades that recommendation gracefully on older silicon or when you're dictating in a language the smaller english-tuned models would butcher.
    • lots of new whisperkit variants in the picker. distil-whisper line, quantized large flavors, and smaller quantized small models. mostly aimed at people who want to trade accuracy for download size or runtime ram.
    • end of dictation no longer clips the last syllable. used to drop the trailing beat on words like "remember" or "later" if you released the hotkey right as you finished speaking; we now capture a beat past the release so trailing speech still lands.
    • audio mute now follows the system default output across mid-recording switches. unplug headphones halfway through a dictation and the mute tracks the new output device, instead of leaving the old one wrongly muted (or worse, the new speakers loud while you're mid-sentence).
    • apple speech engine surfaces a real "dictation isn't enabled in system settings" error instead of silently returning nothing. the silent-fail path was confusing - why does the mic indicator light up but nothing appears? now you know.
    • app-not-running case shows a notch toast: "textedit isn't running" or whatever the slot's target is. used to be that the dictation would happen, the text would go nowhere, and you'd be left wondering. now you know up front instead of after the fact.
    • whisperkit picker now lists ten models instead of five. tiny / base / small / medium in english-only and multilingual flavors, plus large v3 and large v3 turbo. turbo is the new recommended default: near-large accuracy at roughly four times the speed on apple silicon.
    • new "silence detection" slider under general settings → audio. controls how long a pause inside a dictation gets preserved before it's treated as dead air. default 1.0 second, range 0.78 to 2.0. lower it if transcripts feel like they have unnecessarily long gaps; raise it if sentences run together.
    • mic test widget in general settings → microphone. pick your input device, watch the live level meter, tap record-and-playback to hear exactly what the recognizer is receiving. cuts down on the "why did it think i said that" troubleshooting.
    • optional engine prewarm toggle in transcription settings. loads the model into memory at app launch instead of on your first hotkey press. costs a bit of ram, eliminates the warm-up pause on the first dictation of the session.
    • new app icon, refreshed for the macos 26 liquid glass aesthetic.
    • no more "heyclanker wants to use your keychain" password prompt at first launch.
    • preferences survive app updates more reliably. adding new settings in a future release won't quietly reset your existing tweaks to defaults.
    • last word of long dictations doesn't get clipped anymore. we're measuring how often the audio callback actually fires and waiting just long enough on release for the trailing samples to arrive before tearing the recorder down. clamps between 20-80ms so it never feels sluggish.
    • screen no longer dims during long dictations. used to interrupt multi-paragraph notes if your display sleep timer was short; we hold a polite system assertion while you're recording. display only - the mac itself can still sleep when idle.
    • bluetooth headsets and some usb mics occasionally wake up with their input muted at the device level (separate from system input volume - yes it's a different flag). we now check + clear that before each recording so you don't talk into the void.
    • first recording after waking your mac is snappier. used to potentially stall while tearing down a stale audio unit pointing at a now-removed device; we drop the stale state on wake so the rebuild is clean.
    • grant accessibility in system settings, tab back to heyclanker, and the hotkey works immediately. used to require an app restart because the cgevent tap had given up retrying after its 2-second window.
    • live preview no longer wipes itself when you pause to think. the system speech recognizer occasionally re-anchors its hypothesis on long silences; we now detect that and keep your accumulated text visible across the gap.
    • voice escape hatch: say "force quit now" and the app exits cleanly. useful when accessibility gets revoked mid-session, the hotkey path wedges, or you otherwise need the app gone without opening activity monitor. (the trigger phrase has to actually be transcribable by the speech engine - which is harder than it sounds.)
    • paste reliability into panel apps (alfred, raycast, intellibar) - the kind that read the clipboard asynchronously after cmd+v. we now wait for the system to actually commit the write before pasting, and give the target app a longer window to read before we restore your original clipboard.
    • live preview on the high-accuracy engines uses streaming speech recognition under the hood now, instead of re-running the heavyweight engine on a growing snapshot every few hundred ms. way smoother, much less wobble. final transcript still goes through your chosen engine on the complete audio.
    • long whisperkit recordings (over ~30 seconds) used to silently drop earlier text. turns out the internal segmenting was returning only the last segment when fed long buffers; we now chunk at 25 seconds and stitch ourselves. a 90-second dictation actually transcribes the whole 90 seconds now.
    • apple speech engine also chunks long recordings (50s windows) so you can dictate as long as you want without hitting the per-task cap. previous limit was effectively a minute of speech.
    • whisperkit kv-cache overflow on long recordings: fixed by disabling its prefill-prompt carry-over. the cache was growing across segments and silently truncating earlier audio.
    • qwen3-asr max output bumped to 4096 tokens (was 512). multi-paragraph dictations stopped getting cut off mid-sentence.
    • live preview ticker rendering: words now flow in a single horizontal line, fade off the left as new ones come in. used to be a multi-line block that jumped around as the model revised its hypothesis.
    • heyclanker now learns from your inline corrections. paste a transcript, fix one or two words, and the next time we hear the same wrong→right pair, we apply it automatically. captures 1-2 word swaps only (no entire sentences), works across windows + after you keep typing in the same field.
    • two new transcription engines: whisper "tiny.en" (english-only, ~75mb, fast) and qwen3-asr (50+ languages, on-device). the engine picker in preferences now has more options to suit different size/speed/accuracy tradeoffs.
    • three review-pass fixes that snuck in with correction learning: ranking of learned corrections (by occurrence count not just recency), the toggle in preferences actually disables learning when off, and the AX read for the post-paste baseline waits the right amount of time before reading.
    • cross-monitor capture for correction learning. used to lose track of the field if you alt-tabbed between the paste and the fix; now follows.
    • sidebar selection in the slots panel persists across tab switches.
    • license tab: clearer wording around what counts as "this mac" vs "another mac" for the 2-mac limit.
    • press your hotkey, fumble a modifier mid-hold, and we no longer treat that as a deliberate recording start. the hold timer now re-checks modifiers when it fires; a brief slip while you're still aiming for the right combo doesn't commit you to a recording.
    • trigger-word "execute" (or whatever you set as your submit trigger) actually presses Return now in TUI apps - claude code, REPLs, vim command line, etc. used to inject the literal newline character which TUIs interpret as Shift+Enter (a literal newline, not a submit). now sends the actual return keypress event.
    • rapid tap-tap of your hotkey no longer accidentally triggers the hold-to-record path. used to be possible to land both events inside the hold window if your tapping was unlucky. each fresh keydown now bumps a generation counter that the timer checks before firing.
    • system audio gets muted automatically while you're recording, so the music you forgot was playing or that one slack notification doesn't leak into the transcript. comes back at the original level when you release. on by default.
    • skips the mute when you're wearing headphones (built-in jack, airpods, any bluetooth) - no point muting if the audio isn't reaching your mic anyway. there's a setting to override if you want it muted regardless.
    • crash-safe: if the app dies mid-record, next launch puts your volume back. but only if it's still at zero - if you already manually fixed it in the meantime, we leave your fix alone.
    • esc while recording stops leaking through to the active app. vim users were popping out of insert mode every time they cancelled a dictation. now it's actively intercepted by the hotkey tap, only during a recording window.
    • live transcript on the high-accuracy engine appears in ~2 seconds instead of 13. was waiting for the first big sliding window to fill before showing anything - now it streams as soon as it has something to say. not a new feature, just unbreaking the one we already shipped.
    • engine settings show a progress bar under the model picker when something is downloading or loading. before this, switching models just sat there silently while half a gigabyte came down in the background.
    • bumped the helper text in settings from "squinting required" to "actually readable". apologies to anyone with good eyesight who's been wondering why we wrote captions in 11pt.
    • killed a system-level freeze where typing into a slow target app could pin your whole mac for ~24 seconds. now capped at half a second per attempt. yeah that one was rough.
    • removing hotkeys or trigger words used to crash. it doesn't anymore. you can also clear the lists down to zero entries now.
    • Cmd+Q actually cleans up after itself. no zombie state hanging around for the next launch.
    • menu bar preview is one line now, not a wall of your last dictation. nice for privacy too - no shoulder surfing.
    • fresh installs come with a default hotkey slot ready to use. and if you delete it on purpose, it stays deleted. no more "helpful" resurrection.
    • Esc actually stops the in-flight cleanup work now, instead of just hiding the result. was burning api calls in the background, oops.
    • renamed some settings keys under the hood. three startup warnings are gone with them.
    • Esc cancels recording mid-flight. works during the warm-up gap before recording starts, mid-record, and during the cleanup step right after release. no stray text gets injected.
    • recording indicator finally shows up on the screen you're actually looking at. before this it just slapped onto the primary display every time. (the whole point of this app is dictating from your gaming monitor into a terminal on the other - so yeah, definitely needed.)
    • new installs use your mac's built-in engine by default. zero download, you can dictate your first sentence in the time it takes to read this. onboarding tells you how to upgrade later if you want.
    • live preview and final transcription share the model in memory now. used to load it twice on cold start, so this halves the ram.
    • tightened the streaming confirmation window from 10s to 3s. side effect: short dictations (the common case) actually trigger the vocabulary boost now. it was kinda dead code for default users before, embarrassing.
    • press your hotkey while the engine is still loading? it tells you. used to just silently do nothing.
    • fat-fingering your hotkey for 200ms doesn't pop a scary error toast anymore. just nothing happens, like it should.
    • switching engines mid-recording used to park the app in a "transcription in progress" state forever. fixed.
    • onboarding asks for the speech recognition permission now (the new default engine needs it).
    • if you picked the high-accuracy engine before this update, it stays selected on update. no surprise re-downloads on launch.
    • added a new high-accuracy on-device speech engine. three flavors: multilingual (25 european languages), english-only, and a smaller fast one for english dictation.
    • live transcript shows up as you speak now. and importantly: the live preview matches what you actually get on release. used to use a different engine for live than for final, which was confusing.
    • models tab in settings: see what's on disk, total storage, download or delete per model. delete is blocked while a model is in use - the underlying ml runtime does not handle the file disappearing under it gracefully.
    • custom vocabulary lets you teach the recognizer your jargon, names, commands. there's a status indicator that tells you whether boosting is active or still waiting on the boost models to download.
    • recording indicator lives in the notch on m3/m4 macbooks. on everything else it's a small floating pill near the menu bar. way less obtrusive than the centered window we used to show.
    • auto-updates with proper signature verification (yes the kind that doesn't trust whatever bytes show up).
    • trial / license system. 14 days free, $2.99 once. revalidates every 7 days online with a 30-day offline grace period for when your wifi is being weird. clock rollback protection so changing your mac's date can't resurrect an expired trial.
    • trial state stored across two separate slots so wiping one doesn't reset the timer.
    • optional launch at login.
    • dedicated about window with credits to the open-source libraries that ship inside the app.
    • real keychain integration for sensitive data. no more constant "allow access" dialogs in normal use.
    • cleaner key recorder ui - hover state, focus ring, accessibility labels.
    • recording starts faster on bluetooth and slow usb mics now (audio setup moved off the main thread).
    • paste fallback re-checks the focused app right before posting Cmd+V, so text never lands in the wrong window if focus changes mid-paste.
    • swapped em dashes for hyphens across the codebase. minor but the inconsistency was bugging me.
    • three-step onboarding (welcome → permissions → all set) runs on launch when something's missing. auto-advances when you grant.
    • revoke a permission later and onboarding pops back open instead of just silently failing your next dictation.
    • microphone access requests are clearer and survive denials without leaving the app stuck.
    • audio recovery when you switch input devices mid-session. unplug a headset, swap to bluetooth, change mic input - no relaunch needed.
    • recording start reports actual success / failure to the ui instead of pretending everything's fine while capturing silence.
    • sensitive local data is encrypted at rest now, key derived from your machine.
    • fixed a race where two settings saves close together could lose one of them.
    • empty trigger words are ignored. they used to match every single transcription and inject random characters. fun bug.
    • icon-only buttons got accessibility labels so voiceover users can actually navigate the settings.
    • hotkey sidebar selection persists across tab switches now.
    • killed a class of bugs that could freeze the whole system for ~30 seconds during heavy dictation. yeah, the whole system, not just the app.
    • the optional cleanup step has a 5s safety timeout now. slow networks don't hang your transcription anymore.
    • text injection is more reliable in terminals like kitty and any other app that handles keyboard input weirdly.
    • stale results from a prior recording can't collide with a new recording's results anymore.
    • cleanup step gracefully recovers from a rare double-callback case in the speech apis. one of those bugs you only see in production.
    • debug logs no longer contain the actual transcribed text, just lengths. safe to share for support tickets.
    • multi-slot hotkeys: bind different keys to different target apps. one for your terminal, one for your editor, one for game chat overlay. hold whichever.
    • pre-speech and post-speech automation keys. open the chat box, dictate, hit enter - all in one hold. up to two key combos before, two after.
    • trigger words: say "execute" or "send" at the end and it fires the action you defined. multiple trigger words supported with key-press, key-combo, or insert-text actions.
    • engine picker. built-in macos speech (27 languages, on-device toggle), a high-accuracy on-device option, or cloud. pick what fits your accuracy / privacy / latency tradeoff.
    • optional cleanup step that strips fillers ("uh", "like", "you know"), converts spoken punctuation, and fixes grammar. each toggleable independently in case you only want some of it.
    • tabbed settings: hotkeys, triggers, engine, refine, general. each gets its own panel.
    • modifier-combo hotkeys. Cmd+`, Ctrl+Shift+F1, anything you can press.
    • live transcript overlay while you talk.
    • friendly warning when your target app isn't running, instead of dictating into the void for 10 seconds and wondering why nothing showed up.
    • trigger words processed before the cleanup step. used to be the other way around and the cleanup would sometimes strip "enter" out before the trigger system saw it.
    • saying just "execute" (a trigger word with no other text) used to make the cleanup step hallucinate a phantom transcript like "this is the cleaned version of the text". fixed.
    • custom vocabulary list gives the cleanup step context for proper nouns and jargon.
    • hold-to-record gesture. press your hotkey, speak, release. quick taps pass through to the active app so a tap-bound hotkey still works for normal use.
    • text lands in the focused app without stealing focus from the app you're actually in.
    • live preview of what's being transcribed appears every 200ms while you talk.
    • pasteboard fallback for apps that won't accept direct typed input. old clipboard contents are restored after.
    • configurable execute trigger word. say it at the end of dictation to fire Enter.
    • audio capture stops fully on release - the macos mic indicator turns off the moment you let go. no mic-on light when you're not actually dictating.
    • [BLANK_AUDIO] artifacts no longer leak into your transcripts.
    • trailing space appended after each dictation so you can keep dictating without manually adding the space.
    • optional copy-to-clipboard alongside injection.
    • the global hotkey listener auto-retries if the os silently disables it.
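
for the curious: two of the hotkey fixes above (the generation counter for rapid tap-tap, and the modifier re-check when the hold timer fires) boil down to one small gate. below is a minimal python sketch of that idea, not the app's actual code. every name in it (HoldDetector, key_down, timer_fired, the modifier strings) is made up for illustration, and the real thing presumably lives in the event tap and runs on a real timer.

```python
from dataclasses import dataclass


@dataclass
class HoldDetector:
    """toy hold-to-record gate. hypothetical names, not the app's real types."""
    required_mods: frozenset
    generation: int = 0
    key_is_down: bool = False

    def key_down(self) -> int:
        # every fresh keydown bumps the counter, which invalidates any
        # timer that an earlier keydown scheduled
        self.generation += 1
        self.key_is_down = True
        return self.generation  # the scheduled hold timer carries this token

    def key_up(self) -> None:
        self.key_is_down = False

    def timer_fired(self, token: int, mods_now: frozenset) -> bool:
        """return True only if this timer should start a recording."""
        if token != self.generation:
            return False  # stale timer left over from an older keydown
        if not self.key_is_down:
            return False  # released before the hold window elapsed: a tap
        # re-check modifiers at fire time instead of trusting keydown time
        return mods_now == self.required_mods


combo = frozenset({"ctrl", "shift"})
detector = HoldDetector(required_mods=combo)

# tap-tap: the first timer is stale by the time it fires
first = detector.key_down()
detector.key_up()
second = detector.key_down()
tap_tap_result = detector.timer_fired(first, combo)
held_result = detector.timer_fired(second, combo)

# modifier slip: shift is no longer down when the timer fires
slip = detector.key_down()
slip_result = detector.timer_fired(slip, frozenset({"ctrl"}))
```

the point of handing the timer a token instead of cancelling it is that a late-firing timer becomes harmless: it compares its token against the current generation and quietly does nothing if a newer keydown has happened since.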