help
engine picks, hotkey shapes, what every toggle does. read what you need, skip the rest.
four engines, sixteen total options. local engines run entirely on your Mac with no network calls. the cloud API engine is the only one that uploads your audio, and you opt in by entering your own OpenAI key.
zero download. built into macOS. ready to use the moment the app is installed.
already on your Mac. good enough for chat-style sentences, weak on technical jargon. pick this if you want to try the app before downloading anything.
first-run, short dictation, casual use
OpenAI Whisper running on the Neural Engine. 10 models: English-only and multilingual variants at the four smaller sizes, plus Turbo and Large.
smallest and fastest. quality is limited; useful as a "is the pipeline working" sanity check.
low-RAM Macs, lots of small dictations
same speed, broader language coverage at the cost of some English accuracy.
low-RAM Macs, mixed-language dictation
noticeably better than Tiny. reasonable default for English-only users on any Mac.
comfortable everyday English
same size, all languages. fresh installs land here.
comfortable everyday multilingual
big jump in accuracy on jargon, code, and proper nouns. sweet spot if you mostly dictate English.
English with technical vocabulary
same accuracy class as Small English but across languages.
higher-accuracy multilingual
great for paragraphs of dense English. noticeably slower than Small.
long-form English
high accuracy in any language; slower than Small.
long-form multilingual
near-Large accuracy, roughly four times faster than full Large. the best speed-to-quality ratio in the WhisperKit lineup on Apple Silicon. the one to pick if you do not know which model to use.
most people who want the best balance
highest accuracy WhisperKit offers. slower than Turbo. pick this only if Turbo is leaving accuracy on the table for your specific use case.
maximum accuracy regardless of speed
NVIDIA Parakeet TDT and Qwen3-ASR via FluidAudio. optimised for the Neural Engine. includes a vocabulary-boost option.
strong English plus French, German, Spanish, Italian, Polish, and most other EU languages. auto-detects language; no language picker needed.
high-accuracy European-language dictation
predecessor of v3, English-only. slightly different acoustic model. try if v3 has trouble with your voice.
English-only Parakeet workflows
smallest, fastest Parakeet. pairs with the vocabulary boost: type your terminology into the vocabulary list and the recognizer prefers it during decoding.
snappy English dictation on any Mac
widest language coverage in the local lineup. particularly strong on East Asian languages.
Mandarin, Cantonese, Japanese, Korean, plus most EU/SEA languages
bring your own OpenAI key. audio uploads per transcription. no model lives on your Mac.
nothing to download. you pay per minute via your own OpenAI account. disabled by default. if you turn it on, your audio is sent to OpenAI for that one transcription. everything else stays local.
old Macs, very long dictations, fallback when local models do not fit
after transcription, the raw text can be cleaned up: punctuation, capitalisation, filler words, and obvious grammar errors get patched. the refine step is fully optional. you pick one of three backends in settings → refine: off, Apple Intelligence on-device, or your own cloud API key.
four rule toggles on the refine tab let you turn each cleanup category on or off individually: punctuation, spoken-punctuation conversion ("comma" → ","), filler removal ("um", "uh", "like"), and grammar fixes.
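to make two of those toggles concrete, here is a minimal Python sketch of spoken-punctuation conversion and filler removal as plain regex passes. this is an illustration only, not how HeyClanker's refine backends work: the word lists, the `apply_rules` name, and both parameter names are assumptions. "like" is deliberately left out of the filler list, because telling filler "like" from the verb needs context a regex does not have.

```python
import re

# hypothetical spoken-word-to-symbol table; the app's real list is not documented here
SPOKEN_PUNCTUATION = {"comma": ",", "period": ".", "question mark": "?"}
# hypothetical filler list; "like" is omitted on purpose (see the note above)
FILLERS = ("um", "uh")


def apply_rules(text: str, spoken_punctuation: bool = True, fillers: bool = True) -> str:
    """Apply each enabled cleanup rule in turn; disabled rules leave the text alone."""
    if fillers:
        # remove whole-word fillers plus any comma and spaces right after them
        pattern = r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*"
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    if spoken_punctuation:
        for word, symbol in SPOKEN_PUNCTUATION.items():
            # swallow the space before the spoken word so "report comma" becomes "report,"
            text = re.sub(r"\s*\b" + re.escape(word) + r"\b", symbol, text, flags=re.IGNORECASE)
    # collapse any doubled spaces left behind by the removals
    return re.sub(r"\s{2,}", " ", text).strip()


print(apply_rules("um send the report comma then uh call me period"))
```

each rule sits behind its own boolean, which mirrors the idea of switching cleanup categories on and off individually.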
each hotkey slot is one pair: a key combination and a target app. hold the key, dictate, release. the text types into the target. your foreground app stays foreground.
set up multiple slots if you dictate into different apps. one for Discord while gaming, one for Notes while in a meeting, one for your terminal while reading docs. each slot has its own target app and an optional pair of automation key combos that fire around the dictation (see below).
your hotkey does two different things depending on how long you hold it. a quick tap (under ~350ms) passes through to whatever app you're in as a normal keystroke, so binding backtick or an F-key doesn't lock you out of ever typing that key for its normal purpose. holding past the threshold starts recording. release ends it.
this is what makes binding a bare key (backtick, caps lock, an F-key) usable instead of forcing you into a two-handed modifier combo like Cmd+Shift+X. a press released before the threshold never starts a recording; it just passes through as a normal keystroke.
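the tap-versus-hold rule can be sketched in a few lines of Python. this is a simplified illustration, not the app's listener: `classify_press` and `HOLD_THRESHOLD_S` are hypothetical names, the 0.350 value is taken from the "~350ms" figure above, and a real listener would start recording the moment the threshold passes rather than deciding after release.

```python
# assumed threshold, from the "~350ms" figure in the text; the exact value is not documented
HOLD_THRESHOLD_S = 0.350


def classify_press(down_at: float, up_at: float, threshold: float = HOLD_THRESHOLD_S) -> str:
    """Classify one key press from its key-down and key-up timestamps (in seconds)."""
    held = up_at - down_at
    if held < 0:
        raise ValueError("key-up timestamp is earlier than key-down")
    # a short tap goes to the foreground app as a normal keystroke;
    # anything held to the threshold or beyond counts as a dictation
    return "record" if held >= threshold else "pass_through"


print(classify_press(0.0, 0.12))  # quick tap
print(classify_press(0.0, 1.50))  # held press
```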
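two optional key combos per slot, fired around your dictation: pre-speech keys, which fire when you start holding the hotkey, and post-speech keys, which fire after the text is pasted.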
the full sequence on a recording: hold the hotkey → pre-speech keys fire → you talk → release → transcription runs → refine cleans up (if on) → text gets pasted into the target → post-speech keys fire. if a trigger word matches at the end of what you said, its action replaces the slot's post-speech keys for that recording only.
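the ordering above, including the trigger-word override, can be written down as a small Python sketch. everything here is illustrative: the `Slot` fields, the `plan_steps` function, and the step labels are assumptions made for the example, not the app's internals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    """Hypothetical model of one hotkey slot."""
    target_app: str
    pre_speech_keys: Optional[str] = None
    post_speech_keys: Optional[str] = None


def plan_steps(slot: Slot, refine_on: bool, trigger_action: Optional[str] = None) -> List[str]:
    """Return the ordered steps for one recording."""
    steps: List[str] = []
    if slot.pre_speech_keys:
        steps.append(f"keys:{slot.pre_speech_keys}")
    steps += ["record", "transcribe"]
    if refine_on:
        steps.append("refine")
    steps.append(f"paste:{slot.target_app}")
    # a matched trigger word replaces the slot's post-speech keys for this recording only
    closing = trigger_action if trigger_action is not None else slot.post_speech_keys
    if closing:
        steps.append(f"keys:{closing}")
    return steps


slot = Slot("Slack", pre_speech_keys="cmd+k", post_speech_keys="tab")
print(plan_steps(slot, refine_on=True))
print(plan_steps(slot, refine_on=False, trigger_action="return"))
```

the second call shows the override: the slot's own post-speech keys are skipped and the trigger's action fires instead.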
a trigger word is a phrase that ends your dictation with a specific keyboard action. three action types:
example. the default trigger word is "execute", set to press Return. you dictate "ship it tomorrow morning execute" into Slack. HeyClanker pastes "ship it tomorrow morning" and presses Return. the trigger phrase itself is stripped from what gets typed.
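a minimal Python sketch of that stripping step, assuming the trigger only counts when it is the last word of the transcript. the `TRIGGERS` table and `split_trigger` function are hypothetical names, and the handling of trailing punctuation is a guess at what a recognizer might append, not documented behaviour.

```python
import re
from typing import Dict, Optional, Tuple

# the documented default: "execute" presses Return; the dict shape is an assumption
TRIGGERS: Dict[str, str] = {"execute": "return"}


def split_trigger(transcript: str, triggers: Dict[str, str] = TRIGGERS) -> Tuple[str, Optional[str]]:
    """Return (text to paste, action to fire), or (transcript, None) when nothing matches."""
    # ignore punctuation the recognizer may have added after the final word
    cleaned = transcript.rstrip(" .!?,")
    for phrase, action in triggers.items():
        # match the phrase only as a whole word at the very end
        match = re.search(r"(?:^|\s)" + re.escape(phrase) + r"$", cleaned, flags=re.IGNORECASE)
        if match:
            return cleaned[: match.start()].rstrip(" ,"), action
    return transcript, None


print(split_trigger("ship it tomorrow morning execute"))
print(split_trigger("please execute the plan"))
```

the second call leaves the text untouched, since "execute" in the middle of a sentence is ordinary dictation rather than a trigger.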
vocabulary. a list of words the recognizer often gets wrong. type them in the spelling you want to see: kubectl, psql, your colleague's name, your product name, internal acronyms. the recognizer biases toward your list during decoding. list lives on your Mac. nothing uploads. nothing trains.
learned corrections. optional, off by default. with the toggle on, HeyClanker watches the focused field for a few seconds after each paste. if you fix a word or two, it records the swap and applies it next time. one- and two-word substitutions only. never whole sentences. the pairs live on your Mac, ranked by how often each appears. the refiner uses them as hints. to delete or audit them, open settings → refine → learned corrections.
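one way to picture the "one- and two-word substitutions only, ranked by frequency" rule is a word-level diff between what was pasted and what the field contains afterwards. the Python sketch below uses the standard library's `difflib` for that; it is an assumption about the general technique, not a description of HeyClanker's code, and `extract_swaps` is a hypothetical name.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Tuple


def extract_swaps(pasted: str, edited: str, max_words: int = 2) -> List[Tuple[str, str]]:
    """Return (wrong, right) pairs for short replacements; ignore larger rewrites."""
    before, after = pasted.split(), edited.split()
    swaps: List[Tuple[str, str]] = []
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # keep only replacements where both sides are one or two words long
        if tag == "replace" and 1 <= i2 - i1 <= max_words and 1 <= j2 - j1 <= max_words:
            swaps.append((" ".join(before[i1:i2]), " ".join(after[j1:j2])))
    return swaps


# count each pair so the most frequent corrections rank first
counts: Counter = Counter()
counts.update(extract_swaps("restart the cube control pod", "restart the kubectl pod"))
counts.update(extract_swaps("open cube control logs", "open kubectl logs"))
print(counts.most_common(1))
```

a full rewrite of the sentence produces one large replacement on both sides, so it fails the size check and nothing is recorded.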
nothing to sign up for, no email to verify, no magic link that arrives in spam fourteen minutes late. the app launches, the hotkey works, that is the entire flow.
there is no HeyClanker backend in the middle. the app talks to your Mac and, if you opt in, your chosen LLM provider. no analytics, no usage events, no "anonymous diagnostics" feeding a dashboard somewhere. your dictation is not, under any circumstances, going to wind up in next quarter's "voice intelligence dataset" the company will swear is fully anonymised.
audio stays on your Mac with every engine except cloud API, which uploads only the audio for the one transcription you asked it to. API keys live in the macOS Keychain, not a plaintext config file you accidentally commit to a public repo. your vocabulary list and any learned corrections are local. none of it trains anything.
set a target app per hotkey slot in settings → hotkeys. when you hold the hotkey and speak, HeyClanker types into that target without bringing it forward. your current window stays focused. great for dictating into Discord while you stay in your game, or into a terminal on a second monitor while a meeting runs on the first.
if you want to try the app immediately, Apple Speech needs no download and works on any Mac. if you want better accuracy, download Whisper Large v3 Turbo. it is the recommended balance of accuracy and speed on Apple Silicon. East Asian languages, Cantonese included? pick Qwen3-ASR.
optional cleanup that runs after transcription: punctuation, capitalisation, filler words, grammar. three backends (off, Apple Intelligence on-device, or your own cloud API key) and four toggles. default is off. full details in the refine section above.
the audio recorder waits a short, calibrated grace window after you release the hotkey to catch trailing samples. if you are consistently losing the last word, try speaking the last syllable a touch more slowly. Bluetooth headsets have higher input latency than wired mics; the recorder accounts for this automatically.
yes. the hotkey listener runs system-wide via macOS accessibility, so it fires inside fullscreen apps, Spaces, and Mission Control. the recording overlay floats above fullscreen apps without stealing focus. you stay in the game, the text lands in the target app for that hotkey.
yes. pick your input device in settings → general → microphone and HeyClanker reads from whatever you point it at. Bluetooth headsets that occasionally wake muted at the device level get unmuted automatically before each recording. a built-in level meter and a 3-second record-and-playback test in the same settings panel let you confirm the right device is being captured before you commit.
yes. HeyClanker does not claim exclusive access to the microphone, so OBS, Loom, QuickTime, or any screen recorder can capture from the same input at the same time. the recording overlay is a small floating element above fullscreen apps; if you do not want it in your stream output, set the recording indicator to live in the notch (on notched MacBooks) or capture only the game window in your scene.
press ESC while recording. the audio is discarded, nothing gets transcribed, nothing gets pasted. you can also say "force quit now" if the app ever gets stuck and you need it gone without opening Activity Monitor.
the default trigger is "execute": say it at the end of your dictation and HeyClanker presses Return after pasting. the trigger phrase is stripped from the transcript. add or change triggers in settings → triggers; full details in the trigger words section above.
words the recognizer keeps getting wrong: terminal commands, internal product names, jargon, proper nouns. add them in settings → engine → vocabulary and the recognizer biases toward them during decoding. lives on your Mac, never uploaded.
only if you turn it on. with the toggle on, when you fix a word or two right after a paste, HeyClanker records the swap and applies it next time. one- and two-word corrections only, stored locally, off by default.
not by default. the local engines run entirely on your Mac. the cloud API engine is the one exception, and you opt into it explicitly by entering an OpenAI key; even then only the audio for that one transcription goes to OpenAI. there is no HeyClanker backend in the middle, no telemetry, no "anonymous usage data" feeding a dashboard somewhere.
controls how long a pause inside a dictation gets preserved before HeyClanker treats it as dead air to compress. the default of 1.0 second is calibrated for natural end-of-sentence pauses. lower it if transcriptions feel like they have unnecessarily long gaps; raise it if the text reads as one run-on with no breath between sentences.
a toggle in settings → transcription that loads the chosen model into memory at app launch instead of on your first hotkey press. trades a small amount of RAM for no warm-up pause on the first dictation of the session. off by default; flip it on if you dictate frequently and the first-of-the-session lag is annoying.
the app itself is roughly 15 MB. Apple Speech adds nothing on top, since it is built into macOS. the smallest local model is 75 MB; the largest is 2.9 GB. you only download the ones you actually pick. switching to a model that is not yet downloaded shows a one-click download prompt in settings → models, with a progress indicator.
grant it again in system settings → privacy & security → accessibility. the hotkey listener will start working immediately without a relaunch. if it still does not respond after a minute, quit and reopen the app.
quit the app and drag it to the trash. the 14-day trial expires locally if you do nothing. there is no account to close, no exit survey, no "sorry to see you go" email with a discount code attached.