help
engine picks, hotkey shapes, what every toggle does. read what you need, skip the rest.
four engines, fifteen total options. local engines run entirely on your Mac with no network calls. the cloud API engine is the only one that uploads your audio, and you opt in by entering your own OpenAI key.
zero download. built into macOS. picks up instantly after install.
already on your Mac. good enough for chat-style sentences, weak on technical jargon. pick this if you want to try the app before downloading anything.
first-run, short dictation, casual use
OpenAI Whisper running on the Neural Engine. 10 sizes, English-only and multilingual variants at every tier.
smallest and fastest. quality is limited; useful as a "is the pipeline working" sanity check.
low-RAM Macs, lots of small dictations
same speed, broader language coverage at the cost of English-only accuracy.
low-RAM Macs, mixed-language dictation
noticeably better than Tiny. reasonable default for English-only users on any Mac.
comfortable everyday English
same size, all languages. fresh installs land here.
comfortable everyday multilingual
big jump in accuracy on jargon, code, and proper nouns. sweet spot if you mostly dictate English.
English with technical vocabulary
same accuracy class as Small English but across languages.
higher-accuracy multilingual
great for paragraphs of dense English. noticeably slower than Small.
long-form English
high accuracy in any language; slower than Small.
long-form multilingual
near-Large accuracy at roughly four times the speed of full Large. the best speed-to-quality ratio in the WhisperKit lineup on Apple Silicon. default pick if you do not know which model to use.
most people who want the best balance
highest accuracy WhisperKit offers. slower than Turbo. pick this only if Turbo is leaving accuracy on the table for your specific use case.
maximum accuracy regardless of speed
NVIDIA Parakeet TDT and Qwen3-ASR via FluidAudio. optimised for the Neural Engine. includes a vocabulary-boost option.
strong English plus French, German, Spanish, Italian, Polish, and most other EU languages. auto-detects language; no language picker needed.
high-accuracy European-language dictation
predecessor of v3, English-only. slightly different acoustic model. try if v3 has trouble with your voice.
English-only Parakeet workflows
smallest, fastest Parakeet. pairs with the vocabulary boost: type your terminology into the vocabulary list and the recognizer prefers it during decoding.
snappy English dictation on any Mac
widest language coverage in the local lineup. particularly strong on East Asian languages.
Mandarin, Cantonese, Japanese, Korean, plus most EU/SEA languages
bring your own OpenAI key. audio uploads, no model lives on your Mac.
nothing to download. you pay per minute via your own OpenAI account. disabled by default. if you enter a key and pick this engine, only the audio for each transcription you request is sent to OpenAI. everything else stays local.
old Macs, very long dictations, fallback when local models do not fit
after transcription, the raw text can be cleaned up: punctuation, capitalisation, filler words, and obvious grammar errors get patched. the refine step is fully optional. you pick one of three backends in settings → refine.
four rule toggles on the refine tab let you turn each cleanup category on or off individually: punctuation, spoken-punctuation conversion ("comma" → ","), filler removal ("um", "uh", "like"), and grammar fixes.
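a minimal sketch of how independent rule toggles could gate a cleanup pass. everything here is an assumption for illustration: the `RefineRules` and `refine` names, the regex approach, and the word lists are hypothetical, and the real refine step runs through a model backend rather than regexes. "like" is left out of the filler list because removing it safely needs context, and grammar fixes are not modelled at all.

```python
import re
from dataclasses import dataclass

@dataclass
class RefineRules:
    """Hypothetical mirror of the four toggles on the refine tab."""
    punctuation: bool = True
    spoken_punctuation: bool = True
    filler_removal: bool = True
    grammar_fixes: bool = True  # not modelled in this sketch

# Assumed word lists; only "comma", "um" and "uh" appear in the help text.
SPOKEN = {"comma": ",", "period": ".", "question mark": "?"}
FILLERS = ("um", "uh")

def refine(text: str, rules: RefineRules) -> str:
    """Apply only the cleanup categories whose toggle is on."""
    if rules.filler_removal:
        fillers = "|".join(re.escape(f) for f in FILLERS)
        text = re.sub(r"\b(?:" + fillers + r")\b,?\s*", "", text, flags=re.IGNORECASE)
    if rules.spoken_punctuation:
        for word, mark in SPOKEN.items():
            # Swallow the space before the spoken word so "hello comma" -> "hello,"
            text = re.sub(r"\s*\b" + re.escape(word) + r"\b", mark, text, flags=re.IGNORECASE)
    if rules.punctuation:
        text = text.strip()
        if text:
            text = text[0].upper() + text[1:]
            if text[-1] not in ".!?":
                text += "."
    return text

cleaned = refine("um hello comma how are you question mark", RefineRules())
```

a naive pass like this would also rewrite "period" in "this period of time", which is one reason a model-based backend is the more plausible fit for the real feature.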
each hotkey slot is one pair: a key combination and a target app. hold the key, dictate, release. the text types into the target. your foreground app stays foreground.
set up multiple slots if you dictate into different apps. one for Discord while gaming, one for Notes while in a meeting, one for your terminal while reading docs. each slot has its own target app and an optional pair of pre- and post-speech key combos (Cmd+L to focus a chat box before you start, Return to send after you stop).
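one way to picture a slot is as a small record. the sketch below is illustrative only: `HotkeySlot`, its field names, and the F13/F14 keys are hypothetical and do not describe how HeyClanker stores its settings.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class HotkeySlot:
    """One slot: a key combination paired with a target app (hypothetical shape)."""
    hotkey: str                              # hold this to dictate
    target_app: str                          # app that receives the typed text
    pre_speech_combo: Optional[str] = None   # sent before you start speaking
    post_speech_combo: Optional[str] = None  # sent after the text is typed

def actions_for(slot: HotkeySlot, transcript: str) -> List[str]:
    """List the steps one dictation would perform for this slot, in order."""
    steps = []
    if slot.pre_speech_combo:
        steps.append(f"press {slot.pre_speech_combo} in {slot.target_app}")
    steps.append(f"type {transcript!r} into {slot.target_app}")
    if slot.post_speech_combo:
        steps.append(f"press {slot.post_speech_combo} in {slot.target_app}")
    return steps

# Two slots: one that focuses a chat box and sends, one that just types.
discord = HotkeySlot("F13", "Discord", pre_speech_combo="Cmd+L", post_speech_combo="Return")
notes = HotkeySlot("F14", "Notes")
```

`actions_for(discord, "gg")` yields three steps (focus, type, send), while the Notes slot only types.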
the hotkey listener distinguishes "hold to dictate" from "tap the key normally". tapping the hotkey types the key as usual. holding it for the configured threshold starts recording. the threshold is short and tuned to feel snappy.
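the tap-versus-hold decision can be sketched as a comparison against a threshold. the 0.25 second value and the `classify_press` name are assumptions for illustration; the app's actual threshold is not stated here.

```python
HOLD_THRESHOLD_S = 0.25  # assumed value; the real threshold is not documented

def classify_press(pressed_at: float, released_at: float,
                   threshold_s: float = HOLD_THRESHOLD_S) -> str:
    """Return "tap" for a short press and "dictate" for a hold.

    Timestamps are in seconds. A real listener would start recording as soon
    as the threshold passes while the key is still down; this sketch only
    classifies a press after the fact.
    """
    held_for = released_at - pressed_at
    if held_for < 0:
        raise ValueError("release cannot come before press")
    return "dictate" if held_for >= threshold_s else "tap"
```

a 50 ms press classifies as a tap and passes through as a normal keystroke; a two-second press classifies as dictation.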
a trigger word is a phrase that ends your dictation with a specific keyboard action. three action types: a single key, a key combo (like Cmd+Return), or a literal text insert.
example. the default trigger word is "execute", set to press Return. you dictate "ship it tomorrow morning execute" into Slack. HeyClanker pastes "ship it tomorrow morning" and presses Return. the trigger phrase itself is stripped from what gets typed.
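the example above can be sketched as a suffix match on the transcript. this is a hedged illustration, not the app's implementation: `split_trigger` and the `TRIGGERS` table are hypothetical names, and only the "execute" → Return pair comes from the text.

```python
import re
from typing import Dict, Optional, Tuple

# phrase -> action; only "execute" -> Return is taken from the help text
TRIGGERS: Dict[str, str] = {"execute": "Return"}

def split_trigger(transcript: str,
                  triggers: Dict[str, str] = TRIGGERS) -> Tuple[str, Optional[str]]:
    """Return (text_to_type, action). action is None when no trigger ends the text."""
    text = transcript.strip()
    # Ignore trailing punctuation a recognizer may append after the phrase.
    bare = text.rstrip(" .,!?")
    for phrase, action in triggers.items():
        # The phrase must be a whole word at the very end of the dictation.
        pattern = r"(?:^|\s)" + re.escape(phrase) + r"$"
        match = re.search(pattern, bare, flags=re.IGNORECASE)
        if match:
            return bare[:match.start()].rstrip(" ,"), action
    return text, None

typed, action = split_trigger("ship it tomorrow morning execute")
# typed == "ship it tomorrow morning", action == "Return"
```

matching only at the end of the dictation means "we should execute the plan" is typed unchanged.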
vocabulary. a list of words the recognizer often gets wrong. type them in the spelling you want to see: kubectl, psql, your colleague's name, your product name, internal acronyms. the recognizer biases toward your list during decoding. list lives on your Mac. nothing uploads. nothing trains.
learned corrections. optional, off by default. with the toggle on, HeyClanker watches the focused field for a few seconds after each paste. if you fix a word or two, it records the swap and applies it next time. one- and two-word substitutions only. never whole sentences. the pairs live on your Mac, ranked by how often each appears. the refiner uses them as hints. to delete or audit them, open settings → refine → learned corrections.
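a rough sketch of how one- and two-word swaps could be pulled out of an edit and ranked by frequency. it assumes a word-level diff with Python's standard `difflib`; the function name, the sample phrases, and the counting scheme are illustrative assumptions, not HeyClanker's actual logic.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Tuple

MAX_WORDS = 2  # one- and two-word substitutions only

def extract_swaps(pasted: str, edited: str) -> List[Tuple[str, str]]:
    """Return (heard, corrected) pairs for short word-level replacements."""
    before, after = pasted.split(), edited.split()
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    swaps = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue  # ignore pure insertions, deletions, and unchanged runs
        old, new = before[i1:i2], after[j1:j2]
        # Skip anything longer than two words on either side.
        if len(old) <= MAX_WORDS and len(new) <= MAX_WORDS:
            swaps.append((" ".join(old), " ".join(new)))
    return swaps

# Rank pairs by how often each appears across pastes.
counts: Counter = Counter()
counts.update(extract_swaps("run cube control get pods", "run kubectl get pods"))
counts.update(extract_swaps("cube control apply", "kubectl apply"))
counts.update(extract_swaps("open sequel shell", "open psql shell"))
ranked = counts.most_common()
```

here `ranked` puts the "cube control" → "kubectl" pair first because it was seen twice; a full-sentence rewrite produces no pairs at all.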
nothing to sign up for, no email to verify, no magic link that arrives in spam fourteen minutes late. the app launches, the hotkey works, that is the entire flow.
there is no HeyClanker backend in the middle. the app talks to your Mac and, if you opt in, your chosen LLM provider. no analytics, no usage events, no "anonymous diagnostics" feeding a dashboard somewhere. your dictation is not, under any circumstances, going to wind up in next quarter's "voice intelligence dataset" the company will swear is fully anonymised.
audio stays on your Mac with every engine except cloud API, which uploads only the audio for the one transcription you asked for. API keys live in the macOS Keychain, not a plaintext config file you accidentally commit to a public repo. your vocabulary list and any learned corrections are local. none of it trains anything.
set a target app per hotkey slot in settings → hotkeys. when you hold the hotkey and speak, HeyClanker types into that target without bringing it forward. your current window stays focused. great for dictating into Discord while you stay in your game, or into a terminal on a second monitor while a meeting runs on the first.
if you want to try the app immediately, Apple Speech needs no download and works on any Mac. if you want better accuracy, download Whisper Large v3 Turbo. it is the recommended balance of accuracy and speed on Apple Silicon. for East Asian languages, Cantonese included, pick Qwen3-ASR.
optional. after the transcription is done, HeyClanker can pass it through a cleanup model to fix punctuation, capitalisation, filler words, and grammar mistakes. three backends: off (no cleanup), Apple Intelligence (on-device, free, requires macOS 26 and Apple Intelligence enabled), and cloud API (your own OpenAI or Anthropic key, charges your account per call). default is off.
the audio recorder waits a short, calibrated grace window after you release the hotkey to catch trailing samples. if you are consistently losing the last word, try speaking the last syllable a touch more slowly. Bluetooth headsets have higher input latency than wired mics; the recorder accounts for this automatically.
yes. the hotkey listener runs system-wide via macOS accessibility, so it fires inside fullscreen apps, Spaces, and Mission Control. the recording overlay floats above fullscreen apps without stealing focus. you stay in the game, the text lands in the target app for that hotkey.
press ESC while recording. the audio is discarded, nothing gets transcribed, nothing gets pasted. you can also say "force quit now" if the app ever gets stuck and you need it gone without opening Activity Monitor.
you set up phrases that end your dictation with a specific action. the default trigger is "execute": say it at the end of your dictation and HeyClanker presses Return after pasting, ending the line or starting a new paragraph depending on the app. the trigger phrase itself is stripped from the transcript. configure them in settings → triggers. three action types: a single key, a key combo (like Cmd+Return), or a literal text insert.
words the recognizer often gets wrong: terminal commands, internal product names, jargon, proper nouns. type them into settings → engine → vocabulary one per line, in the spelling you want to see. the recognizer biases toward your terms during decoding. the list lives on your Mac and is never uploaded.
only if you turn it on. with the toggle on, when you paste a transcript and then fix a word or two in the focused field, HeyClanker quietly records that swap and applies it next time. one-to-two-word corrections only; never whole sentences. stored locally. off by default to match the "100% local, nothing trains on you" stance.
because the default engines run entirely on your Mac. the cloud API engine is the one exception. you opt into it explicitly by entering an OpenAI key, and even then only the audio for that one transcription goes to OpenAI. there is no HeyClanker backend in the middle, no telemetry, no "anonymous usage data" feeding a dashboard somewhere.
controls how long a pause inside a dictation gets preserved before HeyClanker treats it as dead air to compress. default 1.0 second is calibrated for natural end-of-sentence pauses. lower it if transcriptions feel like they have unnecessary long gaps; raise it if punctuation feels too "running on" with no breath between sentences.
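one plausible reading of this setting, sketched under assumptions: pauses up to the threshold are kept as they are, and anything longer is clipped down to it. the segment representation and the `compress_pauses` name are hypothetical; the app may compress dead air differently.

```python
from typing import List, Tuple

Segment = Tuple[str, float]  # ("speech" or "silence", duration in seconds)

def compress_pauses(segments: List[Segment], max_pause_s: float = 1.0) -> List[Segment]:
    """Clip every silence longer than max_pause_s; leave speech untouched."""
    out = []
    for kind, duration in segments:
        if kind == "silence" and duration > max_pause_s:
            duration = max_pause_s  # treat the excess as dead air
        out.append((kind, duration))
    return out

take = [("speech", 2.0), ("silence", 3.5), ("speech", 1.0), ("silence", 0.4)]
trimmed = compress_pauses(take)  # the 3.5 s pause becomes 1.0 s; the 0.4 s pause is kept
```

under this reading, lowering the value shortens the gaps the recognizer sees, and raising it preserves more of each pause.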
the app itself is roughly 15 MB. Apple Speech is free. the smallest local model is 75 MB; the largest is 2.9 GB. you only download the ones you actually pick. switching to a model that is not yet downloaded shows a one-click download prompt in settings → models with progress.
grant it again in system settings → privacy & security → accessibility. the hotkey listener will start working immediately without a relaunch. if it still does not respond after a minute, quit and reopen the app.
quit the app and drag it to the trash. the 14-day trial expires locally if you do nothing. there is no account to close, no exit survey, no "sorry to see you go" email with a discount code attached.