The funeral nobody noticed
I build a Mac dictation app. I have been watching this whole arc with an unhealthy level of professional interest, because the entire reason my product exists is the input layer that nobody seems to want to put on the cover of Wired. So when I saw the clips from Karpathy's Sequoia AI Ascent 2026 talk last month, I sat up.
The headline framing was that he had killed "vibe coding." That's not quite right. What he actually did was demote it. In his own words, paraphrased from the fireside chat with Sequoia partner Stephanie Zhan: vibe coding raises the floor, agentic engineering extrapolates the ceiling. One is the entry-level case where you describe what you want and accept what the model produces. The other is "the professional discipline of coordinating fallible agents while preserving correctness, security, taste, and maintainability." If you're shipping production software, you do the second thing. If you're hacking on a weekend project, the first thing is fine.
Fair enough. The TERM got buried. But here's what nobody is writing about: the input layer Karpathy used to coin the original phrase has not changed at all. It has only gotten more dominant. The model is no longer the news. The throat is.
Reread the original tweet
Go back to February 2, 2025. The tweet that started it all. Karpathy is half-joking, in that classic shower-of-thoughts register he gets when he's letting the engineer brain leak through the tweet limit. Here it is verbatim:
"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard."
Read the third sentence again. Slowly. Also I just talk to Composer with SuperWhisper.
The voice tool is mentioned almost as an afterthought — a parenthetical, a "by the way" — but it is doing all the work in that sentence. Without it, "vibe coding" is just "I prompt instead of typing code." With it, "vibe coding" is "I describe software with my mouth and watch it appear." Those are very different propositions. The first one is autocomplete with a marketing budget. The second is a new interface to computers.
That tweet got 4.5 million views. The phrase became Collins Dictionary's 2025 word of the year. And in essentially every retrospective I read, the voice part was either skimmed past or omitted entirely. Everyone wanted to talk about the model. Almost nobody wanted to talk about the microphone.
What Karpathy actually said in May
By the time he sat down with Zhan at Sequoia, the picture had changed. Karpathy described December 2025 as the inflection point — the moment when, in his words, "the chunks just came out fine. I couldn't remember the last time I corrected it." In November he was writing about 80% of his own code. By December, that ratio had inverted. The agents had taken over.
The honest version of his current workflow, as he described it across that talk and a couple of follow-up posts, is that he hasn't written a line of code by hand in months. He spends his day directing AI agents in natural language. He has so many spare cycles from not typing that his "side-projects folder" is, in his words, "extremely full." He built MenuGen, an app that lets you photograph a restaurant menu and generate images of every dish, apparently for fun, in the kind of timeframe that used to mean a long weekend.
This is the part nobody wants to sit with: if the agent can implement faster than you can describe what to implement, then the constraint is no longer the model. It's the meat. It's how fast the human brain can compress an idea into a sequence of words and push them out into the world. The model is staring at the cursor waiting for you to finish the sentence.
And typing tops out somewhere around 60-80 words per minute for most professional engineers, and that's after years of practice. Speaking is 150-200 wpm, also after years of practice — practice everyone already has, since you've been talking since you were two.
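To put rough numbers on that gap, here is a minimal Python sketch. It uses the wpm ranges quoted above and assumes a hypothetical 300-word brief to an agent; the function names are purely illustrative and the figures are back-of-the-envelope, not measurements.

```python
# Back-of-the-envelope: how long does it take to get a brief out of your head?
# The wpm ranges come from the paragraph above. The 300-word brief is a
# hypothetical example, not a measured figure.

TYPING_WPM = (60, 80)      # rough typing range for professional engineers
SPEAKING_WPM = (150, 200)  # rough speaking range


def minutes_to_deliver(words: int, wpm: int) -> float:
    """Minutes needed to produce `words` at a steady `wpm`."""
    return words / wpm


def speedup_range(typing=TYPING_WPM, speaking=SPEAKING_WPM) -> tuple[float, float]:
    """Smallest and largest speaking-over-typing speed ratios."""
    smallest = speaking[0] / typing[1]  # slow talker vs. fast typist
    largest = speaking[1] / typing[0]   # fast talker vs. slow typist
    return smallest, largest


if __name__ == "__main__":
    brief_words = 300  # hypothetical agent brief
    for label, (slow, fast) in (("typing", TYPING_WPM), ("speaking", SPEAKING_WPM)):
        best = minutes_to_deliver(brief_words, fast)
        worst = minutes_to_deliver(brief_words, slow)
        print(f"{label}: {best:.1f} to {worst:.1f} min")
    smallest, largest = speedup_range()
    print(f"speaking is roughly {smallest:.1f}x to {largest:.1f}x faster")
```

Under those assumptions, the same brief takes about four to five minutes to type and one and a half to two minutes to say. That is somewhere between a little under 2x and a little over 3x, before counting any time spent fixing transcription errors.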
Voice quietly took over while we were watching the model
Look at what happened in the months leading up to Karpathy's talk:
On February 26, 2026, OpenAI shipped voice mode in Codex. A week later, on March 3, 2026, Anthropic rolled out voice mode in Claude Code: push-to-talk via the /voice slash command, hold spacebar to speak, release to send. The initial rollout reached about 5% of users and was free on Pro, Max, Team, and Enterprise plans. By April, it supported 20 languages.
Anthropic shipped voice mode for Claude Code on March 3, 2026 — push-to-talk via spacebar. Image via TechCrunch.
Sam Abraham, an engineer who tried it that week, posted a long writeup that ended with the line: "I don't think I'm going back." The Anthropic engineer who announced it on X, Thariq Shihipar, framed it almost like an afterthought — voice mode is rolling out, here's the slash command, have fun. No press release. Just a tweet.
That same window: Wispr Flow, the cloud-based dictation app that has been eating everyone else's lunch in the consumer voice space, closed a $25M Series A extension at a $700M valuation from Notable Capital. They had signed 270 of the Fortune 500. Hans Tung, the partner who backed Affirm, Airbnb, Slack, Coinbase, Anthropic, and TikTok, joined their board as an observer. Wispr's pitch is that the average user is producing more than 50% of their characters through the app after three months. Half their typing, gone, replaced by speech.
Cursor 2.0 dropped in the same window. Roughly a thousand "I tried voice in [tool] for a week" Substack posts dropped with it. None of this was about smarter models. The models were the same. What changed was the mouth-to-machine pipeline.
The contrarians showed up — and proved the point
Of course, where there's hype, there are takedowns. The most sharply written one in this cycle is "Vibe coding killed Cursor" by Anton Morgunov. His argument, briefly: by chasing the vibe coding crowd, Cursor optimized for cost and added "tunnel vision" to its agent, and that tunnel vision broke the tool for full-time engineers doing serious work on big codebases. He recommends OpenCode instead, where every change shows up as a git diff you can actually inspect.
Cursor's own CEO, Michael Truell, was quoted in December 2025 saying that vibe coding builds "shaky foundations" that eventually crumble. The CEO of the company most associated with vibe coding spent his Christmas warning people off vibe coding. That's the arc.
Notice what nobody in the contrarian camp is saying: nobody is saying voice was the problem. The complaints are all about the model layer — token efficiency, context windows, agent autonomy, code review. The mouth-to-machine pipeline is invisible in the critique. It's never the bug. It's just there.
That is the tell. When everybody — proponents and detractors — silently agrees that voice is fine and the argument is about everything else, voice has become infrastructure. It's stopped being a feature and started being plumbing.
The actual editorial: voice is the new typing speed
Here's the part where I tell you what I actually think, since I have been building for this exact moment for a couple of years now and I have opinions.
Typing speed used to be a developer skill. Not a glamorous one — nobody put "100 wpm" on a resume the way they put "Rust" — but it was real. The fastest engineers I knew in the 2010s could type at the speed of thought. Their fingers were a roughly transparent layer between what they wanted and what the editor showed. Slow typists were just slower. Same brain, same talent, but every idea took longer to get from inside their head to the file. It compounded across years.
That skill is now in the process of being retired the same way mental arithmetic got retired by calculators. Not because typing is gone, but because the ceiling moved. The new ceiling is articulation. Voice is the new typing speed. The engineers who can describe what they want in clean, structured natural language — the way you'd brief a sharp junior engineer — will produce more software than the engineers who can't. This is true even with the same model and the same tools.
And you know who is best at this skill right now? Not the AI Twitter mafia. Not the YouTube tutorial people. It's the senior engineers who have spent 15 years writing tickets, design docs, and Slack messages explaining systems to other humans. The skill they were already paid for has just become the bottleneck skill for the new workflow.
This also flips the old relationship between dictation and developer tooling. For most of the last decade, voice tools were assistive technology — for people with RSI, for people who couldn't type for medical reasons, for the visually impaired. They were considered a workaround for a deficit. Now they're the input method for the people pushing the frontier of what software gets built. Developers are dictating not because they have to, but because typing is now the slow part of their day.
Where Karpathy and I disagree
Here's the one thing I'll push back on, gently, in the great man's direction.
Karpathy uses SuperWhisper. He's been open about this since the original tweet. SuperWhisper is a great product — I have no beef with the team, they ship a polished app and they were early to the local-Whisper-on-Mac party. He probably also uses Wispr Flow now and then, given it's the consumer momentum leader. Both are fine choices for what he does, because what he does is mostly research, mostly personal projects, mostly things he tweets about anyway.
But for the people I actually talk to — engineers writing proprietary code at companies, people working on financial systems or medical data or unannounced products or anything covered by an NDA — the calculus is different. Sending audio of yourself describing your codebase to a third-party cloud transcription service is a non-starter. It doesn't matter how good the privacy policy is. It doesn't matter that the company swears they don't train on your data. The moment that audio leaves your machine, you've created an audit trail that didn't exist before, in a place you don't control, that your security team has not approved, that you cannot prove the deletion of.
This is why I built EmberType. Same input layer, no audit trail. OpenAI's Whisper, running entirely on your Mac, no audio leaving the device. Same dictation experience as the cloud apps — you push a hotkey, you talk, the text appears wherever your cursor is. Cursor, Claude Code, Terminal, Slack, anywhere. The difference is that nothing about your voice or what you said ever exists outside your laptop. And it's a one-time $49, not a subscription.
I'm not arguing Karpathy is wrong to use SuperWhisper. For his work, it's the right tool. I'm arguing that the product that wins long-term will be the one you can actually use at work. The voice layer is becoming critical infrastructure for software development. Critical infrastructure that runs in someone else's cloud, with someone else's privacy policy, at someone else's mercy, will eventually get banned by the same enterprise IT departments that banned ChatGPT in 2023 and then quietly unbanned it once an enterprise tier existed.
The local version is the one that survives.
SuperWhisper, the dictation app Karpathy used in the original tweet. Great product. Also, audio leaves your device. Image via superwhisper.com.
What this actually means if you write code for a living
Strip out the editorial and there are three concrete things worth doing in the next 30 days, regardless of which voice tool you end up on.
One: spend an honest week dictating. Not "try it for an afternoon." A week. The reason is that the first three days are awful — your brain has not learned how to compose code-shaped prose out loud, and you'll feel like an idiot. By day four, the muscle starts to form. By day seven, you are noticeably faster than typing for some classes of work (long PR descriptions, ticket writing, briefing the agent, refactor explanations) and roughly tied for others (actual code-character-typing in places where AI completion isn't strong). The skill is real and it transfers, but it has a learning curve, just like touch typing did.
Two: learn the slash commands. Whether you're in Claude Code with /voice, in Cursor, or in your editor of choice with a system-wide Mac dictation app like EmberType or SuperWhisper or MacWhisper, the workflow is the same: push to talk, release to send. The friction is in the first 20 minutes of getting the hotkeys right and the last bit of resistance to actually talking out loud at your desk. Both are gone within a day.
Three: decide where your audio goes. This is the consequential one. If you work on personal projects or open-source code or research, use whatever feels best: Wispr Flow is the polished consumer experience, SuperWhisper is the OG, both are perfectly fine. If you write proprietary code at any kind of company with even a passing interest in security, find a tool that runs the model locally. EmberType is mine. There are others. The selection criterion is "does the audio leave my Mac?" If yes, your security team will eventually care.
The real story of the last six months
The story everyone is telling about the last six months is "vibe coding rose, then got demoted; agentic engineering is the future; Cursor is in trouble; OpenCode is the new hotness; the models keep getting better." All of that is true and all of that is the surface.
The story underneath it, the one I have been watching from the seat of building a dictation app, is simpler: voice quietly became the default way serious developers brief their machines, and almost nobody put that on the cover of anything. Karpathy himself has been saying it since day one. He just buried it in the third sentence of a tweet that everyone misread.
The term "vibe coding" died because it was always a joke about the model. The voice layer survived because it was never about the model — it was about the meat. And the meat, it turns out, is the bit that doesn't get refactored every six weeks by a new release from Anthropic or OpenAI.
The word changes. The throat doesn't.
The Voice Layer That Stays on Your Mac
EmberType runs Whisper AI locally — no audio leaves your laptop. Works system-wide in Cursor, Claude Code, terminal, IDE, anywhere. The dictation app you can actually ship at work.
Download EmberType Free. 7-day free trial. $49 one-time after. No subscription, no account, no audio uploaded. macOS 14+, Apple Silicon.
Sources I leaned on for this
I want to be honest about what I read to write this — both because the editorial above only works if the facts under it are right, and because half the fun of an essay like this is following the thread back yourself.
- Karpathy's own writeup of his Sequoia AI Ascent 2026 talk on his bearblog — the canonical source for the "agentic engineering" reframe and the December 2025 inflection point.
- The original "vibe coding" tweet from February 2, 2025, with the SuperWhisper line in the third sentence.
- "From Vibe Coding to Agentic Engineering" — the YouTube recording of the Sequoia fireside chat.
- TechCrunch on Claude Code voice mode, March 3, 2026 — the launch announcement and rollout details.
- Anthropic's official Claude Code voice docs — the slash command, the push-to-talk model, the language list.
- TechCrunch on Wispr Flow's $25M raise at a $700M valuation, led by Notable Capital.
- Anton Morgunov's "Vibe coding killed Cursor" — the contrarian essay that Hacker News went feral over.
- Collins Dictionary's word of the year 2025 announcement — for the cultural reach claim.
