AI Doesn't Hallucinate. It Makes Mistakes.

Calling AI errors “hallucinations” humanizes machines and inflates expectations. Language is the UI for trust; misuse becomes a shipped bug with churn, support cost, and legal risk. Treat wording like code: define terms, show process, and label errors precisely.

Why the Difference Is a Multi-Million-Dollar Problem.


A Thoughtful Question

A thoughtful comment on a recent post of mine posed a brilliant question:

When we say an AI "hallucinates," is it a helpful metaphor or a dangerous distortion?

I love this question because it’s subtle, and the danger is completely unintentional.

This isn't about AI claiming to be human – it's about us giving it human qualities. And because we're the ones doing it, it feels safer.

It’s not.


The Real Stakes for Product Leaders

This debate opens up a profound challenge for anyone building or leading AI products.

The language we use isn't just fluff — it’s the user interface for expectation.

And when expectations don't match reality, you haven’t just created confusion. You’ve shipped a bug.


The Seductive Case for Metaphor

Let’s be fair. I use metaphors constantly. They are cognitive shortcuts.

  • We talk about the "flow" of electricity
  • Or a computer "virus"

Electricity doesn’t literally flow like a river. Your laptop isn’t catching a cold.

These are rhetorical tools to make the complex understandable.

The argument is that "hallucination" does the same for AI: it gives people a hook to grasp the idea of a Large Language Model confidently generating plausible but entirely fabricated information.

The goal? Enlightenment. The hope? That people will recognize the limitations.

I hope so, too. But hope is not a strategy.

The Expert’s Curse and the User’s Reality

The flaw in that hope is a classic bias: The Curse of Knowledge.

  • Engineers, PMs, and UX folks understand that “hallucination” is shorthand for "probabilistically coherent but factually incorrect synthesis."
  • End-users do not.

Just like they don’t know:

  • The physics of the cloud
  • The mechanics of their smart speaker
  • Or the biochemistry of medication

Nor should they have to. They rely on language to build a mental model. And those words, especially when they anthropomorphize the system, carry more influence than we admit.


Anthropomorphic Design = Inflated Expectations

A 2025 study in Frontiers in Computer Science found that chatbot users rated systems as more empathetic and capable simply because they looked and sounded more human, not because they actually performed better (Ma et al., 2025).

Anthropomorphic design inflated expectations even when performance didn’t improve.


We’ve Seen This Movie Before: Facebook’s Redefinition of “Friend”

Facebook took "friend," a high-trust word loaded with meaning, and turned it into a one-click, low-friction action. The result? A subtle devaluation of the original word.

Now we’re doing the same with "hallucination," taking a serious term tied to mental health and applying it to a machine's error state.

The Smart Paradox

We call a device smart, then it says:

“Sorry, I didn’t understand that.” Ten. Times. In. A. Row.

The word sets an expectation. The product breaks it. The result? Frustration, not trust.


From Word Choice to Business Risk

This isn't a philosophical exercise. This is business risk, disguised as semantics.

When we use words like:

  • “think”
  • “understand”
  • “hallucinate”

We’re writing a contract with the user about what the product can do.

And when it fails?

The user doesn’t say:

"The autoregressive model failed to probabilistically align to ground truth."

They say:

“Your product is broken.”
“This thing is stupid.”
“It’s unreliable.”

You've introduced a bug, not in code, but in the user’s expectations.


This bug causes:

  • Customer Dissatisfaction & Churn: a user who expects a thinking partner and gets a glorified autocomplete will feel misled.
  • Increased Support Costs: misinterpretation turns into support tickets. Expensive support tickets.
  • Reputational Damage: every viral "hallucination" means lost trust, mockery, and a brand hit.
  • Catastrophic Legal and Compliance Failures: not hypothetical. Lawyers have already been sanctioned for filing a brief that cited cases ChatGPT fabricated (Merken, 2023).

Adjacent Lessons About Expectation-Setting


Volkswagen’s “Clean Diesel” Lie

  • The language wasn’t optimistic. It was deceptive.
  • The result? Billions in fines and long-term brand damage.

Microsoft’s “Unlimited” Storage Trap

  • Microsoft promised "unlimited" cloud storage.
  • People believed it.
  • Microsoft walked it back.
  • The backlash was swift.

"Unlimited" became a trust-breaking promise.

These aren’t metaphor fails; they’re expectation bombs.


High Stakes in AI

Imagine your AI assistant gives medical advice.

It sounds confident. It’s dangerously wrong. You add a disclaimer.

Too late.

Disclaimers protect you from lawsuits. They do not protect you from user loss, internal audits, or headlines.

Why Product and UX Leaders Should Care

This is not just a marketing issue. This is a core product issue.

Product and UX leaders are responsible for how users interpret and trust the systems they build.

When you write:

  • “The AI is thinking”
  • “It hallucinated”

You’re not being clever. You’re shaping user behavior.


Poor language choices:

  • Create hidden UX debt
  • Trigger support tickets and churn
  • Increase legal and compliance risk

Your words are part of your product’s functionality. If you don’t define the language, the user will – and you’ll lose control of the narrative.

The Solution: A QA for Language

The fix isn't to eliminate metaphors. The fix is to be ruthlessly intentional.

Treat language like code.

QA your lexicon the way you’d lint code. (A rough sketch of what that could look like follows the substitutions below.)


Instead of saying the AI "thinks" → show:

  • The processing status
  • The plan
  • The steps it’s taking

Instead of saying the AI "hallucinates" → say:

  • "It generated an inconsistent output"
  • "It cited fabricated information"
  • "This is a synthesis error"
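
To make "QA your lexicon" concrete, here is a minimal sketch of what linting product copy could look like: a small script that scans UI strings for anthropomorphic terms and suggests plainer alternatives. The flagged-term list, the suggested replacements, and the UI_STRINGS sample are illustrative assumptions, not an existing tool or standard.

```python
# A minimal sketch of "QA your lexicon": lint UI copy for anthropomorphic terms.
# The flagged terms, suggested replacements, and sample strings are illustrative.
import re

FLAGGED_TERMS = {
    "thinking": "processing / planning / showing steps",
    "thinks": "processes",
    "hallucinated": "generated an inconsistent output",
    "hallucination": "synthesis error / fabricated citation",
    "understands": "parses / matches",
    "knows": "has access to",
}

# Hypothetical UI copy, as it might live in a product's string resources.
UI_STRINGS = {
    "loading_label": "The assistant is thinking...",
    "error_banner": "The model hallucinated a source. Please retry.",
    "empty_state": "Ask anything. The assistant understands you.",
}

def lint_copy(strings):
    """Return a report of UI strings that use anthropomorphic language."""
    findings = []
    for key, text in strings.items():
        for term, suggestion in FLAGGED_TERMS.items():
            if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
                findings.append(f"{key}: uses '{term}' -> consider: {suggestion}")
    return findings

if __name__ == "__main__":
    for finding in lint_copy(UI_STRINGS):
        print(finding)
    # loading_label: uses 'thinking' -> consider: processing / planning / showing steps
    # error_banner: uses 'hallucinated' -> consider: generated an inconsistent output
    # empty_state: uses 'understands' -> consider: parses / matches
```

A check like this could run in CI next to your other content reviews, so anthropomorphic copy gets flagged before it ships, the same way a linter flags a code smell.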

Product Language = Expectation Management

This is about:

  • Managing expectations
  • Mitigating reputational and legal risk
  • Protecting user trust

It may feel easier to use shiny, human-like words.

But it’s a lot more work to fix what happens when users believe them.


The LLM is powerful enough as a processor of information. We don’t need to pretend it’s a thinker.

💬
Your turn
What other "harmless" words are we using that might be introducing invisible bugs into our user experience?

References:

  • Ma, N., Khynevych, R., Hao, Y., & Wang, Y. (2025). Effect of anthropomorphism and perceived intelligence in chatbot avatars of visual design on user experience: Accounting for perceived empathy and trust. Frontiers in Computer Science.
  • Merken, S. (2023, June 22). New York lawyers sanctioned for using fake ChatGPT cases in legal brief. Reuters.