Karpathy Killed "Vibe Coding" Last Month. The Voice Layer Survived.

A little over a year ago, Andrej Karpathy fired off a half-baked tweet whose coinage ended up as Collins Dictionary's word of the year. Last month, on the Sequoia AI Ascent stage, he quietly buried it. The model news got the headlines. The actual interface, the part where a human still has to get an idea out of their head, survived the whole arc and is now the bottleneck nobody expected to be talking about.

Karpathy at Sequoia AI Ascent 2026, where the term he coined got demoted. Image via Analytics Drift.

The funeral nobody noticed

I build a Mac dictation app. I have been watching this whole arc with an unhealthy level of professional interest, because the entire reason my product exists is the input layer that nobody seems to want to put on the cover of Wired. So when I saw the clips from Karpathy's Sequoia AI Ascent 2026 talk last month, I sat up.

The headline framing was that he had killed "vibe coding." That's not quite right. What he actually did was demote it. Paraphrasing his fireside chat with Sequoia partner Stephanie Zhan: vibe coding raises the floor, agentic engineering extrapolates the ceiling. One is the entry-level case where you describe what you want and accept what the model produces. The other is "the professional discipline of coordinating fallible agents while preserving correctness, security, taste, and maintainability." If you're shipping production software, you do the second thing. If you're hacking on a weekend project, the first thing is fine.

Fair enough. The TERM got buried. But here's what nobody is writing about: the input layer Karpathy used to coin the original phrase has not changed at all. It has only gotten more dominant. The model is no longer the news. The throat is.

Reread the original tweet

Go back to February 2, 2025. The tweet that started it all. Karpathy is half-joking, in that classic shower-thoughts register he gets when he's letting the engineer brain leak through the tweet limit. Here is how it opens, verbatim:

"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard."

Read that last sentence again. Slowly. Also I just talk to Composer with SuperWhisper.

The voice tool is mentioned almost as an afterthought — a parenthetical, a "by the way" — but it is doing all the work in that sentence. Without it, "vibe coding" is just "I prompt instead of typing code." With it, "vibe coding" is "I describe software with my mouth and watch it appear." Those are very different propositions. The first one is autocomplete with a marketing budget. The second is a new interface to computers.

That tweet got 4.5 million views. It became Collins Dictionary's 2025 word of the year. And in essentially every retrospective I read, the voice part was either skimmed past or omitted entirely. Everyone wanted to talk about the model. Almost nobody wanted to talk about the microphone.

What Karpathy actually said in May

By the time he sat down with Zhan at Sequoia, the picture had changed. Karpathy described December 2025 as the inflection point — the moment when, in his words, "the chunks just came out fine. I couldn't remember the last time I corrected it." In November he was writing about 80% of his own code. By December, that ratio had inverted. The agents had taken over.

The honest version of his current workflow, as he described it across that talk and a couple of follow-up posts, is that he hasn't written a line of code by hand in months. He spends his day directing AI agents in natural language. He has so many spare cycles from not typing that his "side-projects folder" is, in his words, "extremely full." He built MenuGen, an app that lets you photograph a restaurant menu and generate images of every dish, apparently for fun, in the kind of timeframe that used to mean a long weekend.

This is the part nobody wants to sit with: if the agent can implement faster than you can describe what to implement, then the constraint is no longer the model. It's the meat. It's how fast the human brain can compress an idea into a sequence of words and push them out into the world. The model is staring at the cursor waiting for you to finish the sentence.

And typing tops out somewhere around 60-80 words per minute for most professional engineers, and that's after years of practice. Speaking is 150-200 wpm, also after years of practice — practice everyone already has, since you've been talking since you were two.
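
To put rough numbers on that gap, here is a minimal sketch of the arithmetic. The 300-word brief is a hypothetical example length, not a figure from Karpathy's talk, and the rates are simply the midpoints of the ranges quoted above.

```python
# Back-of-the-envelope comparison of typing vs. speaking one prompt.
# Assumptions (illustrative only): a 300-word brief, and the midpoints
# of the ranges quoted above (60-80 wpm typing, 150-200 wpm speaking).

def minutes_to_deliver(words: int, words_per_minute: float) -> float:
    """Time in minutes to get `words` out at a steady rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return words / words_per_minute

BRIEF_WORDS = 300                # hypothetical length of one agent brief
TYPING_WPM = (60 + 80) / 2       # 70 wpm
SPEAKING_WPM = (150 + 200) / 2   # 175 wpm

typing_minutes = minutes_to_deliver(BRIEF_WORDS, TYPING_WPM)
speaking_minutes = minutes_to_deliver(BRIEF_WORDS, SPEAKING_WPM)
speedup = typing_minutes / speaking_minutes

print(f"typing:   {typing_minutes:.1f} min")
print(f"speaking: {speaking_minutes:.1f} min")
print(f"speedup:  {speedup:.1f}x")
```

Under those assumptions the same brief takes a bit over four minutes to type and under two to say, a 2.5x gap, before counting corrections on either side.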

Voice quietly took over while we were watching the model

Look at what happened in the months leading up to Karpathy's talk:

On February 26, 2026, OpenAI shipped voice mode in Codex. A week later, on March 3, 2026, Anthropic rolled out voice mode in Claude Code: push-to-talk via the /voice slash command, hold spacebar to speak, release to send. The initial rollout went to about 5% of users. It was free on Pro, Max, Team, and Enterprise plans. By April, it supported 20 languages.

Anthropic shipped voice mode for Claude Code on March 3, 2026 — push-to-talk via spacebar. Image via TechCrunch.

Sam Abraham, an engineer who tried it that week, posted a long writeup that ended with the line: "I don't think I'm going back." The Anthropic engineer who announced it on X, Thariq Shihipar, framed it almost like an afterthought — voice mode is rolling out, here's the slash command, have fun. No press release. Just a tweet.

That same window: Wispr Flow, the cloud-based dictation app that has been eating everyone else's lunch in the consumer voice space, closed a $25M Series A extension from Notable Capital at a $700M valuation. The company had signed 270 of the Fortune 500. Hans Tung, the partner who backed Affirm, Airbnb, Slack, Coinbase, Anthropic, and TikTok, joined the board as an observer. Wispr's pitch is that after three months the average user produces more than 50% of their characters through the app. Half their typing, gone, replaced by speech.

Cursor 2.0 dropped in the same window. Roughly a thousand "I tried voice in [tool] for a week" Substack posts dropped with it. None of this was about smarter models. The models were the same. What changed was the mouth-to-machine pipeline.

The contrarians showed up — and proved the point

Of course, where there's hype, there are takedowns. The most sharply written one in this cycle is "Vibe coding killed Cursor" by Anton Morgunov. His argument, briefly: by chasing the vibe coding crowd, Cursor optimized for cost and added "tunnel vision" to its agent, and that tunnel vision broke the tool for full-time engineers doing serious work on big codebases. He recommends OpenCode instead, where every change shows up as a git diff you can actually inspect.

Cursor's own CEO, Michael Truell, was quoted in December 2025 saying that vibe coding builds "shaky foundations" that eventually crumble. The CEO of the company most associated with vibe coding spent his Christmas warning people off vibe coding. That's the arc.

Notice what nobody in the contrarian camp is saying: nobody is saying voice was the problem. The complaints are all about the model layer — token efficiency, context windows, agent autonomy, code review. The mouth-to-machine pipeline is invisible in the critique. It's never the bug. It's just there.

That is the tell. When everybody — proponents and detractors — silently agrees that voice is fine and the argument is about everything else, voice has become infrastructure. It's stopped being a feature and started being plumbing.

The actual editorial: voice is the new typing speed

Here's the part where I tell you what I actually think, since I have been building for this exact moment for a couple of years now and I have opinions.

Typing speed used to be a developer skill. Not a glamorous one — nobody put "100 wpm" on a resume the way they put "Rust" — but it was real. The fastest engineers I knew in the 2010s could type at the speed of thought. Their fingers were a roughly transparent layer between what they wanted and what the editor showed. Slow typists were just slower. Same brain, same talent, but every idea took longer to get from inside their head to the file. It compounded across years.

That skill is now in the process of being retired the same way mental arithmetic got retired by calculators. Not because typing is gone, but because the ceiling moved. The new ceiling is articulation. Voice is the new typing speed. The engineers who can describe what they want in clean, structured natural language — the way you'd brief a sharp junior engineer — will produce more software than the engineers who can't. This is true even with the same model and the same tools.

And you know who is best at this skill right now? Not the AI Twitter mafia. Not the YouTube tutorial people. It's the senior engineers who have spent 15 years writing tickets, design docs, and Slack messages explaining systems to other humans. The skill they were already paid for has just become the bottleneck skill for the new workflow.

This also flips the old relationship between dictation and developer tooling. For most of the last decade, voice tools were assistive technology — for people with RSI, for people who couldn't type for medical reasons, for the visually impaired. They were considered a workaround for a deficit. Now they're the input method for the people pushing the frontier of what software gets built. Developers are dictating not because they have to, but because typing is now the slow part of their day.

Where Karpathy and I disagree

Here's the one thing I'll push back on, gently, in the great man's direction.

Karpathy uses SuperWhisper. He's been open about this since the original tweet. SuperWhisper is a great product — I have no beef with the team, they ship a polished app and they were early to the local-Whisper-on-Mac party. He probably also uses Wispr Flow now and then, given it's the consumer momentum leader. Both are fine choices for what he does, because what he does is mostly research, mostly personal projects, mostly things he tweets about anyway.

But for the people I actually talk to, the calculus is different. These are engineers writing proprietary code at companies, people working on financial systems or medical data or unannounced products or anything covered by an NDA. For them, sending audio of yourself describing your codebase to a third-party cloud transcription service is a non-starter. It doesn't matter how good the privacy policy is. It doesn't matter that the company swears they don't train on your data. The moment that audio leaves your machine, you've created an audit trail that didn't exist before, in a place you don't control, that your security team has not approved, and whose deletion you cannot prove.

This is why I built EmberType. Same input layer, no audit trail. OpenAI's Whisper, running entirely on your Mac, no audio leaving the device. Same dictation experience as the cloud apps — you push a hotkey, you talk, the text appears wherever your cursor is. Cursor, Claude Code, Terminal, Slack, anywhere. The difference is that nothing about your voice or what you said ever exists outside your laptop. And it's a one-time $49, not a subscription.

I'm not arguing Karpathy is wrong to use SuperWhisper. For his work, it's the right tool. I'm arguing that the product that wins long-term has to be one you can actually use at work. The voice layer is becoming critical infrastructure for software development. Critical infrastructure that runs in someone else's cloud, with someone else's privacy policy, at someone else's mercy, will eventually get banned by the same enterprise IT departments that banned ChatGPT in 2023 and then quietly unbanned it once ChatGPT Enterprise existed.

The local version is the one that survives.

SuperWhisper, the dictation app Karpathy used in the original tweet. Great product. Also, audio leaves your device. Image via superwhisper.com.

What this actually means if you write code for a living

Strip out the editorial and there are three concrete things worth doing in the next 30 days, regardless of which voice tool you end up on.

One: spend an honest week dictating. Not "try it for an afternoon." A week. The reason is that the first three days are awful — your brain has not learned how to compose code-shaped prose out loud, and you'll feel like an idiot. By day four, the muscle starts to form. By day seven, you are noticeably faster than typing for some classes of work (long PR descriptions, ticket writing, briefing the agent, refactor explanations) and roughly tied for others (actual code-character-typing in places where AI completion isn't strong). The skill is real and it transfers, but it has a learning curve, just like touch typing did.

Two: learn the slash commands. Whether you're in Claude Code with /voice, in Cursor, or in your editor of choice with a system-wide Mac dictation app like EmberType or SuperWhisper or MacWhisper, the workflow is the same: push to talk, release to send. The friction is in the first 20 minutes of getting the hotkeys right and the last bit of resistance to actually talking out loud at your desk. Both are gone within a day.

Three: decide where your audio goes. This is the consequential one. If you work on personal projects or open-source code or research, use whatever feels best: Wispr Flow is the polished consumer experience, SuperWhisper is the OG, both are perfectly fine. If you write proprietary code at any kind of company with even a passing interest in security, find a tool that runs the model locally. EmberType is mine. There are others. The selection criterion is "does the audio leave my Mac?" If yes, your security team will eventually care.
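
The push-to-talk loop from step two is simple enough to sketch as a tiny state machine. This illustrates the interaction pattern only; it is not how Claude Code, EmberType, or any other tool is actually implemented, and the class name, method names, and `transcribe` callback are all hypothetical.

```python
from typing import Callable, List, Optional


class PushToTalk:
    """Hypothetical model of hold-to-talk, release-to-send."""

    def __init__(self, transcribe: Callable[[List[bytes]], str]) -> None:
        # `transcribe` stands in for whatever speech-to-text engine you use
        # (local or cloud); it is an assumption of this sketch.
        self._transcribe = transcribe
        self._recording = False
        self._chunks: List[bytes] = []

    def key_down(self) -> None:
        """Hotkey pressed: start a fresh recording (key repeats are ignored)."""
        if not self._recording:
            self._recording = True
            self._chunks = []

    def audio_chunk(self, chunk: bytes) -> None:
        """Microphone data arrives; keep it only while the key is held."""
        if self._recording:
            self._chunks.append(chunk)

    def key_up(self) -> Optional[str]:
        """Hotkey released: stop recording and return the text to send."""
        if not self._recording:
            return None
        self._recording = False
        return self._transcribe(self._chunks)


# Usage with a fake transcriber that only reports how much audio it received.
def fake_transcribe(chunks: List[bytes]) -> str:
    return f"{len(chunks)} chunks transcribed"


ptt = PushToTalk(fake_transcribe)
ptt.audio_chunk(b"ignored")  # key not held, so this chunk is dropped
ptt.key_down()
ptt.audio_chunk(b"hello")
ptt.audio_chunk(b"world")
sent = ptt.key_up()
print(sent)
```

Passing the transcriber in as a callback keeps the step-three decision (local model or cloud service) separate from the hotkey handling, which is the same either way.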

The real story of the last six months

The story everyone is telling about the last six months is "vibe coding rose, then got demoted; agentic engineering is the future; Cursor is in trouble; OpenCode is the new hotness; the models keep getting better." All of that is true and all of that is the surface.

The story underneath it, the one I have been watching from the seat of building a dictation app, is simpler: voice quietly became the default way serious developers brief their machines, and almost nobody put that on the cover of anything. Karpathy himself has been saying it since day one. He just buried it in the third sentence of a tweet that everyone misread.

The term "vibe coding" died because it was always a joke about the model. The voice layer survived because it was never about the model — it was about the meat. And the meat, it turns out, is the bit that doesn't get refactored every six weeks by a new release from Anthropic or OpenAI.

The word changes. The throat doesn't.

The Voice Layer That Stays on Your Mac

EmberType runs Whisper AI locally — no audio leaves your laptop. Works system-wide in Cursor, Claude Code, terminal, IDE, anywhere. The dictation app you can actually ship at work.

Download EmberType Free

7-day free trial. $49 one-time after. No subscription, no account, no audio uploaded. macOS 14+, Apple Silicon.

Frequently Asked Questions

Did Karpathy actually retire the term "vibe coding"?
Sort of. At Sequoia Capital's AI Ascent 2026 fireside chat with Stephanie Zhan, Karpathy reframed "vibe coding" as the entry-level case and introduced "agentic engineering" as the professional discipline of coordinating fallible agents while preserving correctness, security, and maintainability. His own framing: "Vibe coding raises the floor. Agentic engineering is about extrapolating the ceiling." The viral term didn't get a funeral, but it got demoted from "the future" to "the prototype phase."

Did Karpathy really use voice when he coined "vibe coding"?
Yes. The original viral tweet from February 2, 2025 said it explicitly: "It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper." Voice was always the input layer. The model was the news. The throat was the unglamorous bit nobody wrote headlines about, but it's the bit that survived the term going out of fashion.

Does Claude Code really have a voice mode now?
Yes. Anthropic rolled out push-to-talk voice mode in Claude Code starting March 3, 2026, initially to about 5% of users on Pro, Max, Team, and Enterprise plans. You activate it with the /voice slash command, hold the spacebar to talk, release to send. OpenAI's Codex shipped its own voice mode about a week earlier, on February 26. Voice has moved from differentiating feature to baseline expectation across developer tooling in roughly 60 days.

Why is voice the bottleneck and not the model?
Because the models are now good enough that the human is the slow part. Karpathy says he hasn't written a line of code by hand since December 2025; he spends his day directing agents in natural language. The agents can implement faster than he can describe. The constraint moves to how fast you can articulate intent. Typing tops out around 60-80 words per minute. Speaking is 150-200. When the model isn't the limit, your throat is.

Why not just use SuperWhisper or Wispr Flow like Karpathy?
Use them if your code is personal projects, research, or open-source. They're great products. But for engineers writing proprietary code at companies (finance, healthcare, defense, anything covered by an NDA or compliance regime), sending audio of yourself describing your codebase to a third-party cloud transcription service is a non-starter. The product that wins long-term in the enterprise has to be one you can actually use at work.

Is there a Mac dictation app that runs locally?
Yes. EmberType runs OpenAI's Whisper model entirely on your Mac: no audio leaves your device, no cloud transcription, no audit trail. It works system-wide in Cursor, Claude Code, terminal, IDE, anywhere you can type. $49 one-time, no subscription. It exists specifically to be the voice layer that's safe to use at work.

Steve Mount

Builder of EmberType

I make EmberType, the offline dictation app for Mac, and I write everything on this blog myself, usually by dictating the first draft. Every comparison and recommendation here comes from running the tools on my own Macs, not from reading other people's reviews.
