When AI Sounds Right

The question seems straightforward. ChatGPT’s answer is confident and well-structured. Asked to check its reasoning, the model changes its mind. Pushed back, it changes again, sometimes circling to where it started. Each version sounds equally right. There is no convergence, just oscillation: the model generating plausible alternatives with nothing to stop the cycle and no way to tell which was correct. The loop is familiar to anyone who has used AI for something that matters, and it is not a glitch but a structural feature of how these systems produce text.

The concepts needed to understand why already exist. They come from epistemology, the study of how knowledge works and what makes belief trustworthy, developed over centuries by Hume, Peirce, James, Davidson, and McDowell. Simon Blackburn’s Truth (2017) pulls this tradition together. He was writing about human knowledge, not AI. But the framework fits, in ways that should bother us.

Why self-critique fails

When the model is asked to critique its own answer, no new information enters the conversation. The model generates a new coherent narrative built on the premise that the previous answer had problems. Question that, and it generates another. Each revision is internally consistent. None bring in anything new. McDowell called this “frictionless spinning in a void”: a system maintaining internal consistency without touching the world. Blackburn picks up the image in Truth.
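A minimal sketch makes the closed loop visible. Here call_model is a hypothetical stand-in for any chat-style completion API; nothing about the point depends on the vendor.

```python
def call_model(messages: list[dict]) -> str:
    """Placeholder: send the transcript to a model and return its reply."""
    raise NotImplementedError  # wire up a provider client here


def self_critique(question: str, rounds: int = 3) -> list[str]:
    # Nothing outside the transcript ever enters this loop.
    messages = [{"role": "user", "content": question}]
    revisions = []
    for _ in range(rounds):
        answer = call_model(messages)  # generated from the transcript alone
        revisions.append(answer)
        messages.append({"role": "assistant", "content": answer})
        messages.append({"role": "user",
                         "content": "Check your reasoning. What is wrong here?"})
    return revisions  # each revision coherent; none tested against the world
```

Every input to round n is output from round n−1. The loop can run forever without touching anything that could prove a revision wrong.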

The mechanism behind this is coherence: internal consistency. AI output is coherent. Claims support each other, the argument flows, everything holds together. That’s what makes it convincing, and it may be the only thing AI output reliably has. Coherence is cheap. A conspiracy theory can be remarkably coherent. So can a well-told fiction. The model can produce equally coherent arguments for opposite positions, because coherence is a property of the text, not of its relationship to reality.

What the output has never survived

AI output, by default, has been tested against nothing. It was generated, not checked. Blackburn’s central idea is controlled coherence, coherence disciplined by contact with something outside the system. The “something outside” provides friction: resistance the system cannot generate for itself. “It is when we get nasty surprises,” Blackburn writes, “that the world bares its teeth.”

Coherent and correct look the same as coherent and wrong. The only way to tell the difference is to introduce friction from outside. The question is not “is this output correct?” but: has it been exposed to anything that could have revealed it to be wrong?

This is not a call for more caution but a vocabulary for being careful in the right way: for understanding what kind of checking matters, especially when acting on output or presenting it to others.

Where friction comes from

Most AI failures are quiet. The model didn’t err spectacularly. Nobody exposed the output to anything at all. The answer looked right, so it was accepted. Nobody ran the code or checked the citation, and the person who would know was never asked.

AI is asked to help prepare a presentation. It provides a statistic attributed to a specific study. The study doesn’t exist. That gap between claim and reality is friction in its strongest form: the world answering back, indifferent to what was claimed about it. The reference exists or it doesn’t, and that kind of friction resolves questions.
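When the claimed source carries a DOI, that friction is one request away. A hedged sketch using the public Crossref REST API; doi is whatever identifier the model supplied.

```python
import requests

def reference_exists(doi: str) -> bool:
    """Return True if Crossref has a record for this DOI."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return resp.status_code == 200  # the reference exists or it doesn't
```

Existence is the weakest possible check: a record in Crossref says nothing about whether the study supports the statistic. But it is friction of exactly the strong kind, because the answer comes from outside the conversation.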

Changing the prompt strategy (asking “what could go wrong?” instead of “is this good?”, requesting three options instead of one recommendation) activates different reasoning paths and partially counteracts the model’s tendency toward agreement and uniform confidence. This can help. Asking a model for a business plan and then for its three biggest risks often surfaces problems the original response glossed over: a competitor it ignored, a dependency it assumed away.
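As a sketch, reusing the hypothetical call_model from above: the same plan goes back to the same model, but the question now asks for failure modes instead of approval.

```python
def surface_risks(plan: str) -> str:
    # Same model, same plan; only the question changes.
    prompt = (f"Here is a business plan:\n\n{plan}\n\n"
              "List its three biggest risks and the assumption each one rests on.")
    return call_model([{"role": "user", "content": prompt}])
```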

But this kind of structural friction is weaker in principle. Blackburn, drawing on Peirce, distinguishes genuinely independent inquiry (different people with different knowledge) from structural independence, where different frameworks are applied by the same mind. Asking the model to argue both sides is structural independence: different paths, same underlying limitations. It’s not the same as a second opinion. A belief earns its place, in Peirce’s terms, through “proper pedigree — it should be the result of some processes of enquiry and interpretation that have earned their keep.” Better structure improves the pedigree. It doesn’t substitute for contact with the world.

The friction only a person can provide

AI drafts a reply to a difficult client. The draft is polished, professional, and completely wrong for the relationship. It concedes a point that should never be conceded. It strikes a tone that would alarm a client the writer has known for years. The model had no access to any of this context. But the person reading the draft does.

The most important friction is not a test or a technique. It’s the judgment of the person reading the output: context, history, constraints that aren’t written down. Hume described the good evaluator: “Strong sense, united to delicate sentiment, improved by practice, perfected by comparison, and cleared of all prejudice.” Blackburn returns to this portrait throughout Truth. Applied to AI, this means clear thinking about the output, sensitivity to when something is off before articulating why, experience built from checking and discovering where AI errs, and willingness to reject a confident answer that contradicts established knowledge.

That’s the qualified critic, shaped by doing the work of checking, not by knowing everything. The model generates candidates; the human provides the selection pressure. This capacity improves only through practice. Passive acceptance builds nothing.

Judge the method, not the result

“Is this output true?” is often unanswerable. The expertise, time, or means to verify every claim may not be available. The pragmatist tradition (Peirce, James) poses a different question: was it produced by a process that has earned its keep?

Peirce: “The opinion which is fated to be ultimately agreed to by all who investigate, is what we mean by the truth.” The emphasis is on “all who investigate.” Truth emerges from the convergence of independent inquirers applying sound methods. A single person checking a single model falls short of this ideal, which is why external friction matters more than any prompting technique.

If the model was prompted and the first response accepted, no friction entered the process. If the claims were checked, pushed back on, and the output survived contact with independent knowledge, the process has more to recommend it. Nothing has been proven true, but the output has been subjected to methods that tend to catch errors. The pragmatist conclusion: these are methods of inquiry, not truth-finding machines. The measure is the quality of the inquiry, not whether it delivers certainty.

This essay is no exception. It’s coherent text making claims about epistemology. The friction is in Blackburn’s book.

When to stop

The loop at the beginning of this article had no stopping rule. Nothing indicated when to stop because nothing indicated whether the process was getting closer. The opposite failure is real too: checking every claim, verifying every citation, ending up slower than working without AI at all. If friction is the medicine, overdose is paralysis.

The pragmatist framework provides a stopping rule for both. Peirce:

Enquiry is not standing upon the bedrock of fact. It is walking upon a bog, and can only say, this ground seems to hold for the present. Here I will stay until it begins to give way.

There is never bedrock. But there is a point where additional checking wouldn’t change what happens with the output. The statistic was checked, verified against the source, and a colleague confirmed the framing. Would one more check change the presentation? The ground holds.

The ground gives way when something changes. New information contradicts what was accepted. An expert disagrees. The output fails in use. When that happens, the work resumes from where it is, with what is now known.
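The rule can be put in almost mechanical terms, though it is a judgment, not a procedure to automate. A toy sketch:

```python
from typing import Callable

# Each check pairs a function that runs it with a flag: would failing this
# check change what we do with the output? Purely illustrative.
Check = tuple[Callable[[], bool], bool]

def ground_holds(checks: list[Check]) -> bool:
    for run_check, could_change_action in checks:
        if not could_change_action:
            continue      # one more check would not change the presentation
        if not run_check():
            return False  # the ground gave way: resume from what is now known
    return True           # this ground seems to hold, for the present
```

The flag is doing the philosophical work: a check earns its run only if failing it would change what happens next.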

There is no need to reach truth with a capital T. “This output is reliable enough to act on” and “this output is true” may, for practical purposes, be the same claim. That small doubt about a confident AI answer, the one that leads to asking the model to check itself, now has a name: what is missing is not more coherence but friction, and friction is found not in the conversation but outside it.