HEY! CLANKER! OVER THERE!
voice dictation for macOS, one hotkey per app. each key points at its own target: backtick into Discord, equal into Claude Code, whichever combo you've set for Notes. hold, speak, release. the text lands in the right app while you stay in whatever you were just doing. your game doesn't pause, your cursor doesn't jump, your standup doesn't notice. other dictation apps spent this year shipping subscription tiers, AI agents that pitched themselves as your replacement on a billboard, vibe-coded settings panels that didn't survive their own launch tweet, and "thought partners" that need to journal about the meeting before transcribing it.
we added the app routing everyone was missing.
type into the app behind the app.
every other voice tool yanks focus to type. HeyClanker doesn't. pair-program with Claude Code while answering in Discord. write a Slack reply without leaving your editor. dictate into Notes on a second monitor while Logic keeps playing on the first. no alt-tab, no AI agent kindly switching windows for you and then volunteering to also rewrite the message in a more impactful tone. your active window doesn't move. apparently not interrupting you is now a feature.
your voice never leaves your Mac.
transcription runs locally via WhisperKit. Apple Intelligence can clean it up, also on-device. no backend, no dashboard, no privacy policy with "we retain transcripts for quality and training purposes" buried in it. want a cloud engine instead? bring an API key. your keys, your data, your choice - nothing rides on ours.
lives in your menu bar. the rest of the screen stays yours.
SwiftUI. AVFoundation. WhisperKit. the dependency tree is your operating system - not 47 npm packages and a Chromium runtime booted to render a mic settings panel. ~15 MB, not 240. no splash screen, no "how can I help today?" modal that pops up before you've even said anything, no standalone window quietly suggesting you upgrade to a teams plan. one icon in the menu bar. you hold a key, it listens. you release it, it shuts up.
every hotkey. every target. every trigger.
multiple hotkeys, each pointing at a different target app. trigger words that fire key combos or drop in canned text. automation before you speak and after. workflows you wired up on purpose - not ones a clanker vibe-coded into a critical security vulnerability at 2am while you were asleep.
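
a minimal sketch of that wiring, in Python for readability. this is not HeyClanker's code: `HotkeySlot`, `plan_actions`, and every field name are hypothetical, and the sketch only plans the steps, it doesn't press anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HotkeySlot:
    """One hotkey bound to one target app (all names are hypothetical)."""
    hotkey: str              # the key you hold, e.g. "`"
    target_app: str          # the app that receives the text
    pre_keys: tuple = ()     # key combos sent before you speak
    post_keys: tuple = ()    # key combos sent after the text lands


def plan_actions(slot: HotkeySlot, transcript: str) -> list:
    """Return the ordered steps for one dictation without performing them."""
    steps = [("press", slot.target_app, combo) for combo in slot.pre_keys]
    steps.append(("type", slot.target_app, transcript))
    steps += [("press", slot.target_app, combo) for combo in slot.post_keys]
    return steps


# Example slots, assumed for illustration only.
SLOTS = {
    "`": HotkeySlot("`", "Discord", post_keys=("Return",)),
    "=": HotkeySlot("=", "Claude Code"),
}
```

`plan_actions(SLOTS["`"], "brb, five minutes")` yields a type step aimed at Discord followed by a Return press. every step names its target app, so the routing never depends on which window is in front.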
speaks 100+ languages. none of them call home.
three on-device engines between them cover everything from Mandarin to Norwegian to that very specific dialect of English that engineers use. none call a server. none charge per minute. none require a "multilingual pro" tier that magically appeared in last quarter's pricing update. the cloud option exists too, if you'd rather pay OpenAI per syllable - most people won't need it.
knows that "kubectl" is not "cube control".
type your jargon into a list - terminal commands, internal product names, the proper nouns every other dictation tool turns into something embarrassing. the recognizer biases toward your terms while you talk. the list lives on your Mac. nothing trains. nothing uploads. nothing winds up in next quarter's "voice intelligence dataset" the company will swear is fully anonymized.
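
one common way to bias a recognizer toward a term list is to rescore its candidate transcripts. the sketch below shows that idea in Python. it is an assumption about the general technique, not a description of what HeyClanker's engines do internally; `rescore`, the scores, and the bonus value are all made up for illustration.

```python
def rescore(hypotheses, vocabulary, bonus=0.5):
    """Pick the best hypothesis after rewarding vocabulary hits.

    hypotheses: list of (text, score) pairs, higher score = more likely.
    vocabulary: user terms; each matching word in a hypothesis adds `bonus`.
    """
    terms = {term.lower() for term in vocabulary}

    def biased(item):
        text, score = item
        hits = sum(1 for word in text.lower().split() if word in terms)
        return score + bonus * hits

    # max() keeps the first candidate on ties, so order still matters.
    return max(hypotheses, key=biased)[0]


# Hypothetical candidates for the same audio.
candidates = [
    ("cube control get pods", -1.0),  # scored higher by the model, but wrong
    ("kubectl get pods", -1.4),       # scored lower, but matches the list
]
best = rescore(candidates, ["kubectl"])
```

with `kubectl` on the list the second candidate wins; with an empty list the first one does. the sketch only matches single words, so a multi-word term would need phrase matching on top.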
no login. no magic link. no idea who you are.
there's no account because there's nothing to put in one. no "verify your email" that arrived fourteen minutes late and went to spam. no onboarding wizard with seven slides explaining what a microphone is. no Mixpanel quietly noting which preference panes you've opened this week. no Intercom popup asking how your first session went. the app launches. the hotkey works. that's the entire flow.
$2.99. that's all the tiers.
14 days free. then $2.99. once. forever. one tier, because nobody should have to read a feature comparison chart for a $3 Mac app. no "we've reluctantly added AI and unfortunately need to move to a subscription model" email next quarter. no exit survey if you delete the app, no "sorry to see you go" email with a 25% discount attached to win you back, no AI agent reaching out to "understand the friction". you pay once and you keep using it. forever isn't marketing language here - it's the actual deal.
hold. speak. ship!
14 days free. then $2.99, once. if it doesn't earn a hotkey on your keyboard, drag it to the trash. no exit survey, no LinkedIn DM from a founder you've never heard of asking to "jump on a quick call about your dictation journey".
you bought an app. you have an app. that's the entire relationship.
everything.
engines, models, refine modes, hotkey shapes, what's local, what's optional. no marketing fluff.
- Hotkeys
- multiple slots, each wired to its own target app. backtick into Discord, equal into Claude Code, slash into Slack, backslash into Notes, whatever your fingers already know. pre-speech keys can pop open a chat bar before you talk; post-speech keys can hit Return after you're done. ESC during a recording bails out. tap the hotkey instead of holding it and it just types like a normal key. the clanker isn't fighting you for the keyboard.
- Transcription
- four engines. Apple Speech (built into macOS). WhisperKit (Whisper on the Neural Engine, 10 sizes). FluidAudio (Parakeet and Qwen3-ASR, 4 variants). one cloud API path for your own OpenAI key. in the default setup, only the cloud path ever uploads audio. the other three run on your Mac, the way speech-to-text used to work before everything had to be a 'platform'.
- Whisper models
- ten sizes from Tiny English (75 MB) to Large v3 (2.9 GB). pick Large v3 Turbo (1.6 GB) if you want the one we actually recommend: near-Large accuracy at roughly four times the speed on Apple Silicon. English-only variants from Tiny through Medium, and multilingual variants at every size, so you're not spending disk space on 99 languages when all you do is English.
- Parakeet & Qwen3
- NVIDIA Parakeet, served through FluidAudio. TDT v3 for 25 European languages. TDT v2 for English only. TDT-CTC 110m if you want the smallest, fastest English option. plus Qwen3-ASR 0.6B for 50+ languages including Mandarin, Cantonese, Japanese, Korean. all four run on your Mac's Neural Engine, not on someone else's GPU cluster billed per minute.
- Languages
- 27 Apple Speech locales. 99 WhisperKit languages. 50+ via Qwen3-ASR. 25 European via Parakeet v3. Cantonese works via Qwen3, which is the only one in the lineup that actually handles it. pick the engine that's strong on the language you actually speak, not the one some leaderboard rated highest on AISHELL.
- Refinement
- optional cleanup, off by default. three modes: off (raw transcript), Apple Intelligence (on-device, free, macOS 26+), or cloud API (your own OpenAI or Anthropic key, charged to your account, not ours). four toggles for punctuation, spoken punctuation, filler removal, and grammar. the refiner gets a small hint about which app the text is headed into, so it treats your terminal output differently than your Slack message. it cleans up what you said. it does not, however, decide what you should have said instead.
- Trigger words
- phrases at the end of your dictation that fire an action. three kinds: a single key (e.g. Return), a key combo (Cmd+Return to send a message), or a literal text insert. say 'send this please' and HeyClanker strips the trigger phrase, pastes the rest of the transcript, and presses Return. the agent didn't do that. you did, with three syllables.
- Learning
- off by default. switch it on and HeyClanker watches the focused text field for a few seconds after each paste. fix a word or two and it records the swap, then applies the correction next time you say the same thing. lives on your Mac, ranked by how often each correction reappears. nothing trains. nothing about your typing habits is ending up in someone's quarterly productivity-insights deck.
- Vocabulary
- one list of terms the recognizer keeps getting wrong: kubectl, your coworker named Sahra, the internal product codename the model is sure means something else. the refiner gets the list as a cleanup hint on every engine. Parakeet also biases its decoder toward your terms during recognition when they're phonetically similar to what it heard. lives on your Mac. doesn't go off to make some future model smarter.
- Audio
- recording that won't pause your music or video. microphone picker with a built-in test (live level meter and a 3-second record-and-playback) so you can verify the right device before you commit. auto-unmutes Bluetooth headsets that wake muted at the device level, which they do more often than they should. optional system audio mute during recording, skipped automatically when headphones are detected. live 32-band frequency meter. recovers if your audio device gets unplugged mid-sentence.
- Overlay
- lives in the notch on notched MacBooks. floats above fullscreen apps on every other display. shows the live transcript while you speak and the cleaned text after. pins to whichever monitor has keyboard focus, which is useful if you're dictating into Discord on the left screen while your game runs full-screen on the right.
- Injection
- accessibility-first: the target app never steals focus from the one you're actually in. paste fallback for apps that don't expose a writable text field through accessibility. preserves and restores your existing clipboard. auto-paste can be switched off if you only want the transcript on the clipboard, not typed in.
- Live preview
- streaming transcript shown in the overlay while you speak. confirmed-stable words at full opacity, volatile candidates dimmed so you can see what's about to settle. works across all four engines. when the chosen engine doesn't stream natively, Apple Speech provides the preview behind the scenes while the main engine handles the final pass.
- Privacy
- nothing to sign up for. no email to verify, no telemetry feeding a dashboard somewhere counting how often you said 'uh'. the app talks to your Mac, and if you opt in, to your chosen LLM provider. audio stays local on every engine except cloud API. API keys live in the macOS Keychain. vocabulary and any learned corrections never leave. there is no HeyClanker backend the data has to pass through, because there is no HeyClanker backend.
- Licensing
- 14-day free trial. then $2.99 once, and you keep it. no subscription, no upgrade tier, no email next quarter asking how your first year together has been or whether you'd like to renew at a 'limited-time' 20% discount. per-machine activation, transferable via deactivation if you switch to a new Mac.
- Reliability
- your screen won't dim or sleep in the middle of a recording. the hotkey works the same before and after you close the lid. updates download quietly in the background and install when you next quit, never mid-recording, never with a 'restart now to apply' nag in the middle of your standup.
- Requirements
- macOS 26.2. Apple Silicon recommended. about 15 MB for the app itself, plus whichever model you choose to download (75 MB to 2.9 GB). microphone access and accessibility permission. the accessibility one is what lets the global hotkey fire inside fullscreen games, other Spaces, and any app that thinks it owns the input system.
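
the trigger-word behaviour in the list above (a phrase at the end of the dictation gets stripped and turned into an action) can be sketched in a few lines of Python. this is an illustration under assumptions, not the app's implementation: `Trigger`, `split_trigger`, and the matching rules (case-insensitive, trailing punctuation ignored) are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    phrase: str  # spoken phrase, matched only at the end of the dictation
    kind: str    # "key", "combo", or "text"
    value: str   # e.g. "Return", "Cmd+Return", or the literal text to insert


def split_trigger(transcript: str, triggers: list):
    """Return (text_without_trigger, matched_trigger_or_None)."""
    # Ignore trailing punctuation so "send this please." still matches.
    stripped = transcript.rstrip(" .!?,")
    # Try longer phrases first so a long trigger beats a shorter suffix of it.
    for trig in sorted(triggers, key=lambda t: len(t.phrase), reverse=True):
        pattern = r"(?:^|\s)" + re.escape(trig.phrase) + r"$"
        match = re.search(pattern, stripped, flags=re.IGNORECASE)
        if match:
            # Keep everything before the phrase, minus a dangling comma.
            return stripped[: match.start()].rstrip(" ,"), trig
    return transcript, None
```

given a trigger `Trigger("send this please", "key", "Return")`, the transcript "on my way, send this please." splits into the text "on my way" plus that trigger, and the caller would then paste the text and press Return. a transcript with no trigger comes back unchanged with `None`.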
about.
I made this for myself.
the setup that broke me was being deep in a game on the main monitor with Claude in a split-pane terminal next to me, and wanting to dictate something to Claude to keep the vibe going, without leaving the game and switching apps. or the reverse: head down in terminal with Discord open across the desk and wanting to drop a quick line into a channel without losing my place in whatever I was writing. either direction, the workflow was alt-tab to the target, click into the text field, type, alt-tab back, and by then my character was dead or my half-written command had evaporated.
nothing on macOS did the boring version of this. dictate into a specific app without leaving the one you're in. every dictation app wanted to be a personality, every voice assistant wanted to be a coworker, and none of them just sent my voice somewhere quietly without first wanting to ideate on my vibe.
so I built it. hotkeys, one per app. hold the key, talk, let go, the text shows up where it's supposed to and my active window doesn't move.
the name comes from what I caught myself saying out loud the first time I used it, which was "hey clanker, deploy", straight into kitty.
it doesn't try to be an assistant. it doesn't remember conversations, have opinions about my writing, or suggest we journal about the meeting. it sends my voice to the app the key is bound to. I use it every day.