The Problem with Speaking to an AI

Mar 2026

Both Claude Code and Cursor rolled out voice modes. It makes perfect sense to have voice control, right? It decreases friction for users and utilizes the latest technology. Everyone wins.

However, I think this interface introduces a deeper issue worth exploring. These "voice modes" are neither transcriptions nor true speaking. They aren't direct equivalents to writing or thinking, either.

So, what are they? We have to get our definitions straight to tackle this problem.

Transcription is the process of converting prerecorded audio or video into written text.
Speaking includes much more than just recording words & audio (more on that later).
Writing is the process of using symbols to communicate in a visible & permanent form.
Dictation is the act of speaking aloud to record words or convert them into text in real time, often used for drafting notes.

These new voice modes are dictation. To draw clear boundaries between these definitions, I thought I could use some help from the best article I could find.

"Speaking and Writing - A Study of Differences" describes:

Woolbert (1922) - Speaking and Writing Processes

While both speaking and writing carry thought through language, they are fundamentally different in their utility.

You speak when you are expected to, or when you have something immediate to say. You explain how your day went or comment on obstacles in your way. In a sense, you speak from memory as you tell a story. Your thinking remains "just-in-time." And you don't control things like whether you are being actively listened to, or whether your tonality perfectly matches your intent.

Writing before you act, first and foremost, decreases the probability of you doing something catastrophic. That benefit alone goes far beyond the scope of this problem. Writing also helps you discover new land. It is the cognitive equivalent of setting sail for an adventure. Its utility lies in its ability to be distributed and to build "invisible bridges" that you might not even see at the time of writing.

When you use dictation (the voice mode in our case), it's dangerously easy to be ambiguous.

Description, the art of communicating effectively with AI systems, is one of the core competencies in AI Fluency. Without the ability to describe precisely, you cannot make these systems "work." By "work," I mean being effective (getting the result you want) and efficient (utilizing your time and attention properly).

Dictation is not speaking, because with a human audience, a charismatic speaker can compensate for thin ideas. The audience fills in the gaps; they laugh and get swept up in the energy. With dictation, you lose prosodic cues, the rhythmic features of speech. These include pitch, loudness, tempo and pauses, which convey meaning beyond the literal words.

Dictation is obviously not writing either. Writing forces you to reach for the right word. In a prompt, the "right" word is often the difference between a mediocre output and a useful one. A model only processes what you give it. If your input is vague, the output reflects that. There is no "crowd effect" to save you, only the model’s pre-training assumptions filling the gaps in your logic.

The stakes of the "writing vs. dictation" distinction are higher with AI than with humans.

I learned this the hard way.

A pull request that began as a casual "we want this, can you just implement it", described to a model the way you’d speak to a colleague, backfired. The model did roughly what we expected, but it introduced so much complexity that we had to start from scratch, reading documentation, GitHub issues, discussion pages, etc.

If we hadn't used AI at all, it might have been faster in that instance. However, if we had used it properly, through rigorous writing and clear description, we could have arrived at the result an order of magnitude faster. The tool wasn’t the problem; the description was.

If you want to skip these kinds of stories and move straight toward the successes, what should you do? How can you work with AI in ways that are truly effective and efficient?

You have to learn how to write well, by writing.

A quote from Paul Graham's "Writing and Speaking":

Having good ideas is most of writing well. If you know what you're talking about, you can say it in the plainest words and you'll be perceived as having a good style.

In order to write well, you have to write, find your good ideas and arrive at good writing. There is no good writing in first drafts.

You have to avoid the shortsighted view.

Someone new to these tools might try them to build something. The voice modes would seem like the obvious choice. But with that as the initial step, the product will be mediocre, or "slop", even though a world-class one is really available to anyone.

The only prerequisite is to become AI Fluent, which involves writing.

You write to clearly define the goal, the process and the measurement. The quality of your results is directly tied to your ability to articulate these needs, and writing is your vehicle to get there, not dictation or speaking.

You have to remember the "why" of your work, constantly.

If you only measure your value by productivity rather than who you are becoming, you are on a dangerous path. Because you are the output of your work, not the work itself.

One of my favorite quotes from Skyrim explains it perfectly:

You put time into your blades, they will serve you when you need them.

Put time into your blades.

AI can bring speed, scale and pattern recognition. But you bring judgment, critical thinking and the accountability that comes from actually caring about the outcome.

So, while working with these systems, your mind is your blade. By writing well, you sharpen it.

I'm not writing this to criticize the tools. I think what's happening with AI is genuinely new, a different way of working, thinking, building. But I keep watching people have their first experience of it through the wrong interface, get a mediocre result, and conclude the technology isn't useful. That conclusion is wrong. And it's expensive, for them and for what they could have built.

Speed and convenience win. It's human nature. So they will win again.

We are all going to be invited to use these systems without writing, just as every social platform now offers short-form content.

I'm just trying to make sure you are aware of what you are doing.