help

how does this thing work?

engine picks, hotkey shapes, what every toggle does. read what you need, skip the rest.

getting started

download and open the app. the icon shows up in your menu bar.
grant accessibility in system settings → privacy & security → accessibility. required for the global hotkey to work in any app.
grant microphone access when prompted. without it, there is nothing to dictate.
pick a hotkey and a target app in settings → hotkeys. the default is the backtick (`) key targeting TextEdit; change to whatever pairs you actually use.
pick a transcription engine in settings → engine. Apple Speech needs no download; everything else needs a one-click model fetch in settings → models.
hold the hotkey, speak, release. the text appears in your target app while you stay in whatever you were doing.

engines & models

four engines, sixteen total options. local engines run entirely on your Mac with no network calls. the cloud API engine is the only one that uploads your audio, and you opt in by entering your own OpenAI key.

Apple Speech

zero download. built into macOS. picks up instantly after install.

Apple Speech

0 MB · 27 locales

already on your Mac. good enough for chat-style sentences, weak on technical jargon. pick this if you want to try the app before downloading anything.

first-run, short dictation, casual use

WhisperKit

OpenAI Whisper running on the Neural Engine. 10 models: English-only and multilingual variants at every size from Tiny to Medium, plus the multilingual-only Large v3 Turbo and Large v3.

Tiny English

75 MB · English

smallest and fastest. quality is limited; useful as a "is the pipeline working" sanity check.

low-RAM Macs, lots of small dictations

Tiny Multilingual

75 MB · ~99 langs

same speed, broader language coverage, at the cost of some English accuracy compared with Tiny English.

low-RAM Macs, mixed-language dictation

Base English

145 MB · English

noticeably better than Tiny. reasonable default for English-only users on any Mac.

comfortable everyday English

Base Multilingual

145 MB · ~99 langs

same size, ~99 languages. fresh installs land here.

comfortable everyday multilingual

Small English

465 MB · English

big jump in accuracy on jargon, code, and proper nouns. sweet spot if you mostly dictate English.

English with technical vocabulary

Small Multilingual

465 MB · ~99 langs

same accuracy class as Small English but across languages.

higher-accuracy multilingual

Medium English

1.4 GB · English

great for paragraphs of dense English. noticeably slower than Small.

long-form English

Medium Multilingual

1.4 GB · ~99 langs

high accuracy in any language; slower than Small.

long-form multilingual

Large v3 Turbo (Recommended)

1.6 GB · ~99 langs

near-Large accuracy at roughly four times the speed of full Large v3. the best speed-to-quality ratio in the WhisperKit lineup on Apple Silicon, and the default pick if you do not know which model to use.

most people who want the best balance

Large v3

2.9 GB · ~99 langs

highest accuracy WhisperKit offers. slower than Turbo. pick this only if Turbo is leaving accuracy on the table for your specific use case.

maximum accuracy regardless of speed

Parakeet (FluidAudio)

NVIDIA Parakeet TDT and Qwen3-ASR via FluidAudio. optimised for the Neural Engine. includes a vocabulary-boost option.

Parakeet TDT v3

~500 MB · 25 European langs

strong English plus French, German, Spanish, Italian, Polish, and most other EU languages. auto-detects the language; no language picker needed.

high-accuracy European-language dictation

Parakeet TDT v2

~500 MB · English

the predecessor of v3, English-only, with a slightly different acoustic model. try it if v3 has trouble with your voice.

English-only Parakeet workflows

Parakeet TDT-CTC 110m

~110 MB · English

smallest, fastest Parakeet. pairs with the vocabulary boost: type your terminology into the vocabulary list and the recognizer prefers it during decoding.

snappy English dictation on any Mac

Qwen3-ASR 0.6B

~900 MB · 50+ langs

widest language coverage in the local lineup. particularly strong on East Asian languages.

Mandarin, Cantonese, Japanese, Korean, plus most EU/SEA languages

Cloud API

bring your own OpenAI key. audio uploads per transcription. no model lives on your Mac.

OpenAI Whisper API

nothing local · ~99 langs

nothing to download. you pay per minute through your own OpenAI account. disabled by default. once you select it, the audio for each transcription is sent to OpenAI; everything else stays local.

old Macs, very long dictations, fallback when local models do not fit

not sure which to pick?

just installed, want to try it? stay on Apple Speech. zero download, works right now.
want to feel the difference? download Whisper Large v3 Turbo. best speed/accuracy ratio on Apple Silicon.
European languages? Parakeet TDT v3.
Mandarin / Cantonese / Japanese / Korean? Qwen3-ASR.
old Mac with little RAM? Parakeet TDT-CTC 110m. the smallest, fastest Parakeet.
have an OpenAI key and don't want to download anything? cloud API. pay per minute via your account.

the refine step

after transcription, the raw text can be cleaned up: punctuation, capitalisation, filler words, and obvious grammar errors get patched. the refine step is fully optional. you pick one of three backends in settings → refine.

off (default). the raw transcript goes straight into your target app. fastest. no cleanup.
Apple Intelligence. free, on-device. requires macOS 26 with Apple Intelligence turned on. no network calls.
cloud API. your own OpenAI or Anthropic key. charges your account per refine. the transcript text is sent to the provider; your audio is not.

four rule toggles on the refine tab let you turn each cleanup category on or off individually: punctuation, spoken-punctuation conversion ("comma" → ","), filler removal ("um", "uh", "like"), and grammar fixes.
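the two mechanical rules, spoken-punctuation conversion and filler removal, are easy to picture as a word-by-word pass. the sketch below is a hypothetical, deterministic stand-in written for illustration: the `apply_rules` function and both word tables are assumptions, and the real cleanup is done by whichever refine backend you picked, not by a lookup table. its naive filler list also shows why a model does this better: it drops every "like", including the ones you meant.

```python
# Hypothetical word tables for illustration; a real refine backend uses a model.
SPOKEN_PUNCTUATION = {"comma": ",", "period": ".", "question mark": "?"}
FILLERS = {"um", "uh", "like"}


def apply_rules(text: str, spoken_punctuation: bool = True, filler_removal: bool = True) -> str:
    """Apply two of the four refine rules using plain word matching."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        pair = " ".join(words[i : i + 2]).lower()
        word = words[i].lower()
        if spoken_punctuation and out and i + 1 < len(words) and pair in SPOKEN_PUNCTUATION:
            # Two-word spoken mark ("question mark"): attach it to the previous word.
            out[-1] += SPOKEN_PUNCTUATION[pair]
            i += 2
        elif spoken_punctuation and out and word in SPOKEN_PUNCTUATION:
            # One-word spoken mark ("comma", "period").
            out[-1] += SPOKEN_PUNCTUATION[word]
            i += 1
        elif filler_removal and word.strip(",.") in FILLERS:
            i += 1  # drop the filler entirely
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)
```

each flag maps to one toggle, so turning a rule off simply skips its branch.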

hotkeys & target apps

each hotkey slot is one pair: a key combination and a target app. hold the key, dictate, release. the text types into the target. your foreground app stays foreground.

set up multiple slots if you dictate into different apps. one for Discord while gaming, one for Notes while in a meeting, one for your terminal while reading docs. each slot has its own target app and an optional pair of automation key combos that fire around the dictation (see below).

tap vs hold

your hotkey does two different things depending on how long you hold it. a quick tap (under ~350ms) passes through to whatever app you're in as a normal keystroke, so binding backtick or an F-key doesn't lock you out of ever typing that key for its normal purpose. holding past the threshold starts recording. release ends it.

this is what makes binding a bare key (backtick, caps lock, an F-key) practical instead of forcing you into a two-handed modifier combo like Cmd+Shift+X. accidental brief presses below the threshold never start a recording; they just type the key.
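the tap-vs-hold decision boils down to comparing how long the key was held against the threshold. a minimal sketch, assuming the ~350 ms figure above and assuming recording begins the moment the threshold is crossed; `classify_press` is a hypothetical name. a real listener has to decide live (start a timer on key-down and hold back the key event until it knows which case applies), whereas this version classifies after release to keep the logic visible.

```python
TAP_THRESHOLD_MS = 350  # the "~350ms" from above; the app's exact value may differ


def classify_press(down_ms: int, up_ms: int, threshold_ms: int = TAP_THRESHOLD_MS):
    """Classify one hotkey press from its key-down and key-up timestamps (ms)."""
    if up_ms - down_ms < threshold_ms:
        # Quick tap: replay it to the frontmost app as an ordinary keystroke.
        return ("pass_through", None)
    # Held past the threshold: record from the threshold crossing (assumed) until release.
    return ("record", (down_ms + threshold_ms, up_ms))
```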

automation keys (pre & post-speech)

two optional key combos per slot, fired around your dictation:

pre-speech fires the moment recording starts, before you talk. use it to prep the target app: Cmd+L to focus a chat input box, Cmd+T to open a new tab, Cmd+F to open search. whatever your target needs in order to land your dictation in the right field.
post-speech fires after the transcript has been pasted into the target. use it to commit the action: Return to send a message in Slack or Discord, Cmd+S to save a note, Cmd+Return for apps that want modifier+enter to submit.

the full sequence on a recording: hold the hotkey → pre-speech keys fire → you talk → release → transcription runs → refine cleans up (if on) → text gets pasted into the target → post-speech keys fire. if a trigger word matches at the end of what you said, its action replaces the slot's post-speech keys for that recording only.
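the same sequence, written out as an ordered list of steps. this is a sketch under assumptions: `HotkeySlot`, `run_recording`, and the step strings are hypothetical, transcription and refine are passed in as plain functions, and `trigger_action` stands for the result of matching a trigger word against the transcript.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class HotkeySlot:
    target_app: str
    pre_keys: Optional[str] = None   # e.g. "Cmd+L" to focus a chat box
    post_keys: Optional[str] = None  # e.g. "Return" to send


def run_recording(
    slot: HotkeySlot,
    transcribe: Callable[[], str],
    refine: Optional[Callable[[str], str]] = None,
    trigger_action: Optional[str] = None,
) -> List[str]:
    """Return the ordered steps for one hold-speak-release cycle."""
    steps = []
    if slot.pre_keys:
        steps.append(f"pre-speech: {slot.pre_keys}")
    steps.append("record until release")
    text = transcribe()
    steps.append("transcribe")
    if refine is not None:  # refine is optional and off by default
        text = refine(text)
        steps.append("refine")
    steps.append(f"paste into {slot.target_app}: {text!r}")
    # A matched trigger replaces the slot's post-speech keys for this recording only;
    # the slot itself is never modified.
    post = trigger_action if trigger_action is not None else slot.post_keys
    if post:
        steps.append(f"post-speech: {post}")
    return steps
```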

trigger words

a trigger word is a phrase that ends your dictation with a specific keyboard action. three action types:

key. a single key press after paste. common: Return.
key combo. modifiers + key. common: Cmd+Return to send a chat message, Cmd+S to save a note.
insert text. a literal string appended after the transcript. common: a signature line, a newline, an emoji.

example. the default trigger word is "execute", set to press Return. you dictate "ship it tomorrow morning execute" into Slack. HeyClanker pastes "ship it tomorrow morning" and presses Return. the trigger phrase itself is stripped from what gets typed.
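matching a trigger comes down to checking whether the last words of the transcript equal a configured phrase, ignoring case and punctuation, then cutting them off. a minimal sketch, assuming a plain suffix match; `TriggerAction`, `match_trigger`, and the "save note" phrase in the usage are hypothetical, and the app's real matcher may be more forgiving.

```python
import string
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TriggerAction:
    kind: str   # "key", "key_combo", or "insert_text"
    value: str  # e.g. "Return", "Cmd+S", or the text to append


def match_trigger(
    transcript: str, triggers: Dict[str, TriggerAction]
) -> Tuple[str, Optional[TriggerAction]]:
    """Strip a trailing trigger phrase and return (text_to_paste, action)."""
    words = transcript.split()
    # Compare without case or punctuation, so "Execute." still matches "execute".
    normalized = [w.strip(string.punctuation).lower() for w in words]
    # Try longer phrases first so a multi-word trigger beats a one-word one.
    for phrase in sorted(triggers, key=lambda p: len(p.split()), reverse=True):
        tail = phrase.lower().split()
        if tail and normalized[-len(tail):] == tail:
            kept = " ".join(words[: len(words) - len(tail)])
            return kept.rstrip(", "), triggers[phrase]
    return transcript, None
```

usage: with `{"execute": TriggerAction("key", "Return")}`, the transcript "ship it tomorrow morning execute" yields "ship it tomorrow morning" plus a Return press, matching the example above.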

vocabulary & learned corrections

vocabulary. a list of words the recognizer often gets wrong. type them in the spelling you want to see: kubectl, psql, your colleague's name, your product name, internal acronyms. the recognizer biases toward your list during decoding. list lives on your Mac. nothing uploads. nothing trains.
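one simple way to picture "biases toward your list" is n-best rescoring: give every candidate transcript a bonus for each vocabulary word it contains, then pick the best total. this is a swapped-in simplification; the engines bias during decoding itself rather than re-ranking finished candidates. `pick_hypothesis`, the scores, and the `boost` weight are hypothetical.

```python
from typing import Iterable, List, Tuple


def pick_hypothesis(
    candidates: List[Tuple[str, float]], vocabulary: Iterable[str], boost: float = 0.5
) -> str:
    """Pick the candidate whose score, plus a bonus per vocabulary hit, is highest.

    candidates: (text, score) pairs where a higher score means more likely.
    """
    vocab = {term.lower() for term in vocabulary}

    def boosted_score(candidate: Tuple[str, float]) -> float:
        text, score = candidate
        hits = sum(1 for word in text.split() if word.lower() in vocab)
        return score + boost * hits

    return max(candidates, key=boosted_score)[0]
```

with "kubectl" in the list, "kubectl get pods" (score -1.3, boosted to -0.8) beats "cube cuddle get pods" (score -1.0); with an empty list, the raw scores decide.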

learned corrections. optional, off by default. with the toggle on, HeyClanker watches the focused field for a few seconds after each paste. if you fix a word or two, it records the swap and applies it next time. one- and two-word substitutions only. never whole sentences. the pairs live on your Mac, ranked by how often each appears. the refiner uses them as hints. to delete or audit them, open settings → refine → learned corrections.
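spotting a one- or two-word swap is a word-level diff between what was pasted and what the field holds afterwards. a sketch using Python's standard `difflib.SequenceMatcher`, assuming the field text has already been read back; `extract_swaps` and `CorrectionStore` are hypothetical names, and watching the focused field is out of scope here. substitutions longer than two words on either side are ignored, which is how a whole-sentence rewrite gets skipped.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Tuple


def extract_swaps(pasted: str, edited: str, max_words: int = 2) -> List[Tuple[str, str]]:
    """Return (old, new) word swaps between the pasted text and the edited field."""
    a, b = pasted.split(), edited.split()
    swaps = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        # Only short substitutions count; insertions, deletions and long rewrites are ignored.
        if tag == "replace" and i2 - i1 <= max_words and j2 - j1 <= max_words:
            swaps.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return swaps


class CorrectionStore:
    """Local tally of swaps, ranked by how often each one appears."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, pasted: str, edited: str) -> None:
        self.counts.update(extract_swaps(pasted, edited))

    def ranked(self):
        return self.counts.most_common()
```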

privacy posture

nothing to sign up for, no email to verify, no magic link that arrives in spam fourteen minutes late. the app launches, the hotkey works, that is the entire flow.

there is no HeyClanker backend in the middle. the app talks to your Mac and, if you opt in, the cloud provider you chose: OpenAI for cloud transcription, OpenAI or Anthropic for refine. no analytics, no usage events, no "anonymous diagnostics" feeding a dashboard somewhere. your dictation is not, under any circumstances, going to wind up in next quarter's "voice intelligence dataset" the company will swear is fully anonymised.

audio stays on your Mac with every engine except cloud API, which uploads only the audio for the one transcription you asked it to. API keys live in the macOS Keychain, not a plaintext config file you accidentally commit to a public repo. your vocabulary list and any learned corrections are local. none of it trains anything.

FAQ

how do I dictate into another app without leaving the one I am in?

set a target app per hotkey slot in settings → hotkeys. when you hold the hotkey and speak, HeyClanker types into that target without bringing it forward. your current window stays focused. great for dictating into Discord while you stay in your game, or into a terminal on a second monitor while a meeting runs on the first.

which model should I download first?

if you want to try the app immediately, Apple Speech needs no download and works on any Mac. if you want better accuracy, download Whisper Large v3 Turbo; it is the recommended balance of accuracy and speed on Apple Silicon. East Asian languages, including Cantonese? pick Qwen3-ASR.

what is the refine step?

optional cleanup that runs after transcription: punctuation, capitalisation, filler words, grammar. three backends (off, Apple Intelligence on-device, or your own cloud API key) and four toggles. default is off. full details in the refine section above.

why does my dictation not pick up the trailing word?

the audio recorder waits a short, calibrated grace window after you release the hotkey to catch trailing samples. if you are consistently losing the last word, try speaking the last syllable a touch more slowly. Bluetooth headsets have higher input latency than wired mics; the recorder accounts for this automatically.
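the grace window is plain arithmetic: keep capturing for the grace period plus the input device's latency after release, so samples still in flight from a slower device are not cut off. a sketch with hypothetical function names; the 150 ms grace and 100 ms latency used in the check are illustrative numbers, not HeyClanker's calibrated values.

```python
def capture_end_s(release_s: float, grace_ms: float, input_latency_ms: float = 0.0) -> float:
    """When to stop capturing, given the hotkey release time in seconds."""
    return release_s + (grace_ms + input_latency_ms) / 1000.0


def trailing_samples(sample_rate_hz: int, grace_ms: float, input_latency_ms: float = 0.0) -> int:
    """How many extra samples that window adds after release."""
    return round(sample_rate_hz * (grace_ms + input_latency_ms) / 1000.0)
```

a wired mic with negligible latency only needs the grace period; adding a Bluetooth headset's latency on top is one plausible way to make the automatic adjustment described above.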

does the app work in fullscreen games?

yes. the hotkey listener runs system-wide via macOS accessibility, so it fires inside fullscreen apps, Spaces, and Mission Control. the recording overlay floats above fullscreen apps without stealing focus. you stay in the game, the text lands in the target app for that hotkey.

does it work with USB mics, Bluetooth headsets, or studio audio interfaces?

yes. pick your input device in settings → general → microphone and HeyClanker reads from whatever you point it at. Bluetooth headsets that occasionally wake muted at the device level get unmuted automatically before each recording. a built-in level meter and a 3-second record-and-playback test in the same settings panel let you confirm the right device is being captured before you commit.
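level meters commonly show the RMS level of recent samples in dBFS (decibels relative to full scale), where 0 dBFS is the loudest a digital signal can get. a sketch of that standard conversion; `level_dbfs` is a hypothetical name, and whether HeyClanker's meter uses RMS, peak, or extra smoothing is an assumption here.

```python
import math
from typing import Sequence


def level_dbfs(samples: Sequence[float]) -> float:
    """RMS level of float samples in [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Pure digital silence has no finite level.
    return 20 * math.log10(rms) if rms > 0 else float("-inf")
```

a signal at half of full scale reads about -6 dBFS; a muted or wrong device sits far lower.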

does it work alongside OBS, screen recorders, or streaming apps?

yes. HeyClanker does not claim exclusive access to the microphone, so OBS, Loom, QuickTime, or any screen recorder can capture from the same input at the same time. the recording overlay is a small floating element above fullscreen apps; if you do not want it in your stream output, set the recording indicator to live in the notch (on notched MacBooks) or capture only the game window in your scene.

what if I want to cancel a recording mid-thought?

press ESC while recording. the audio is discarded, nothing gets transcribed, nothing gets pasted. you can also say "force quit now" if the app ever gets stuck and you need it gone without opening Activity Monitor.

how do trigger words work?

the default trigger is "execute": say it at the end of your dictation and HeyClanker presses Return after pasting. the trigger phrase is stripped from the transcript. configure them in settings → triggers; full details in the trigger words section above.

what is the vocabulary list for?

words the recognizer keeps getting wrong: terminal commands, internal product names, jargon, proper nouns. add them in settings → engine → vocabulary and the recognizer biases toward them during decoding. lives on your Mac, never uploaded.

does HeyClanker learn from my corrections?

only if you turn it on. with the toggle on, when you fix a word or two right after a paste, HeyClanker records the swap and applies it next time. one- and two-word corrections only, stored locally, off by default.

is my voice ever sent to a cloud server?

not by default. the default engines run entirely on your Mac. the cloud API engine is the one exception, and you opt into it explicitly by entering an OpenAI key; even then only the audio for that one transcription goes to OpenAI. there is no HeyClanker backend in the middle, no telemetry, no "anonymous usage data" feeding a dashboard somewhere.

what does the advanced silence detection slider do?

controls how long a pause inside a dictation is preserved before HeyClanker treats it as dead air and compresses it. the default of 1.0 second is calibrated for natural end-of-sentence pauses. lower it if transcriptions have unnecessarily long gaps; raise it if punctuation feels run-on, with no breath between sentences.
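a sketch of what the slider controls, under stated assumptions: the audio is split into 10 ms frames with a precomputed loudness value each, any frame below a fixed `floor` counts as silence, and a silent run longer than the threshold is cut back to the threshold length. `compress_silence`, the frame size, and the floor are hypothetical; the app's detector may be smarter about what counts as silence.

```python
from typing import List, Sequence


def compress_silence(
    levels: Sequence[float], frame_ms: int = 10, threshold_s: float = 1.0, floor: float = 0.01
) -> List[int]:
    """Return the indices of frames to keep after shortening long pauses.

    levels: one loudness value (e.g. RMS) per frame of audio.
    """
    max_silent_frames = round(threshold_s * 1000 / frame_ms)
    kept = []
    silent_run = 0
    for index, level in enumerate(levels):
        if level < floor:
            silent_run += 1
            if silent_run > max_silent_frames:
                continue  # dead air beyond the threshold: compress it away
        else:
            silent_run = 0  # speech resets the pause counter
        kept.append(index)
    return kept
```

with 1.5 s of silence between two bursts of speech, the default 1.0 s keeps 1.0 s of the pause, while 2.0 s keeps all of it. a higher setting preserves more breath between sentences.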

what does the engine prewarm toggle do?

a toggle in settings → transcription that loads the chosen model into memory at app launch instead of on your first hotkey press. trades a small amount of RAM for no warm-up pause on the first dictation of the session. off by default; flip it on if you dictate frequently and the first-of-the-session lag is annoying.
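the trade-off is eager versus lazy loading: pay the model's load time once at launch, or on the first dictation. a minimal sketch; `EngineHolder` and the `load_model` callable are hypothetical stand-ins for the real engine.

```python
from typing import Callable, Optional

Model = Callable[[bytes], str]


class EngineHolder:
    """Holds a transcription model, loading it eagerly or on first use."""

    def __init__(self, load_model: Callable[[], Model], prewarm: bool = False) -> None:
        self._load_model = load_model
        self._model: Optional[Model] = None
        if prewarm:
            # Prewarm on: pay the load cost (time and RAM) at app launch.
            self._model = load_model()

    def transcribe(self, audio: bytes) -> str:
        if self._model is None:
            # Prewarm off: the first dictation of the session waits for this load.
            self._model = self._load_model()
        return self._model(audio)
```

either way the model loads once per session; the toggle only moves when that cost is paid.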

how big are the downloads, really?

the app itself is roughly 15 MB. Apple Speech needs no download at all. the smallest local model is 75 MB; the largest is 2.9 GB. you only download the models you actually pick. switching to a model that is not downloaded yet shows a one-click download prompt in settings → models, with download progress.

I revoked accessibility permission. now the hotkey does nothing.

grant it again in system settings → privacy & security → accessibility. the hotkey listener should start working again without a relaunch. if it still does not respond after a minute, quit and reopen the app.

how do I cancel my trial?

quit the app and drag it to the trash. the 14-day trial expires locally if you do nothing. there is no account to close, no exit survey, no "sorry to see you go" email with a discount code attached.

didn't find what you needed?

open an issue on GitHub.