Graham King

Solvitas perambulum

AI Hallucinations Are Often Reasonable Suggestions

Summary
In 2022, an Air Canada chatbot inaccurately described the airline's bereavement fare policy, demonstrating how AI can generate plausible yet incorrect information, known as "hallucinations." I propose that we should not merely dismiss these inaccuracies but view them as opportunities to align AI outputs with our human values and policies. While human hallucinations lack reality, AI "hallucinations" stem from data-driven responses that may inadvertently suggest improvements. By re-evaluating these outputs, we can uncover insights that enrich our understanding and practices, turning potential mistakes into constructive feedback.

In 2022 a passenger whose grandmother had recently died asked an Air Canada representative about its bereavement fares policy. The agent replied with a very appropriate Canadian policy: kind, compassionate, and, well, nice. Unfortunately, the agent was not a human but an AI chatbot, and the policy was not Air Canada’s real policy but what is commonly called an AI “hallucination”.

I would like to argue that we should take these AI “hallucinations” seriously, not in order to prevent them, but as suggestions for how we should change. The chatbot corrected Air Canada’s policy to make it more consistent with the brand and with our human values, and a tribunal upheld that decision.

Hallucination is the wrong word

A human hallucination is usually sensory [1], and has no basis in reality. Wikipedia says:

A hallucination is a perception in the absence of an external stimulus that has the compelling sense of reality

The Air Canada chatbot didn’t mention the faces it sees in the trees or the secrets the stream whispers to it. It simply said yes, you can apply for the bereavement fare after you book your travel, and sorry for your loss. That’s not in the DSM entry for Schizophrenia.

AI hallucinations [2] are responses based on patterns and data. They are usually plausible, even insightful suggestions that come from an attempt to make sense of incomplete or ambiguous input. Being abducted by aliens is a hallucination. Filling gaps with logical extrapolation is not.

The dumb ones are simply human

If you ask primary school children an Age of the Captain problem, they will usually attempt to solve it. Here’s a short one:

A captain owns 26 sheep and 10 goats. How old is the captain?

Children are intelligent, and humans are great pattern matchers. Given that school mathematics problems always have an answer, and that the answer is always the result of applying mathematical operations to numbers in the question, 36 would be a completely reasonable answer. And that’s the answer a majority of children will give (research). Asked afterwards, they will tell you your question makes no sense.

Many of the funnier AI “hallucinations” take that form: an honest attempt to answer a trick question.

How LLMs work in three paragraphs

Imagine we are training a model to tell us how similar some 3D rectangles [3] are. We feed it a ton of training data consisting of rectangles in a 3D space. The model figures out that it can represent each one with the position of one of its corners (x, y, z) and its dimensions (width, height, depth). That gives us a six-dimensional vector (x, y, z, w, h, d). Our model can now draw any rectangle anywhere, not just the ones it saw during training.

The space of these six-dimensional vectors is the Latent Space of the model. The model can now generate an animation of a rectangle moving from one point to another by moving it through its latent space, adjusting the x, y, z values. It could morph one rectangle into another by also adjusting the size at the same time. The important point is that we can move around in this latent space and discover things that were not in the training data, but could have been. They make sense in the 3D rectangle world.
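To make that concrete, here is a minimal sketch in Python. The two boxes and the interpolate helper are invented for illustration, not anything a real model exposes; the point is that walking in a straight line between two points in the six-dimensional latent space yields boxes that were never in the training data but are perfectly valid.

```python
import numpy as np

# A box is a point in a 6-dimensional latent space:
# corner position (x, y, z) and dimensions (width, height, depth).
box_a = np.array([0.0, 0.0, 0.0, 2.0, 1.0, 1.0])  # a box at the origin
box_b = np.array([5.0, 3.0, 1.0, 1.0, 4.0, 2.0])  # a different box elsewhere

def interpolate(a, b, steps):
    """Walk in a straight line through latent space from box a to box b."""
    return [a + (b - a) * t for t in np.linspace(0.0, 1.0, steps)]

# Every intermediate vector is a perfectly valid box that was never
# "seen" during training -- the model can draw it anyway.
for frame in interpolate(box_a, box_b, 5):
    x, y, z, w, h, d = frame
    print(f"corner=({x:.1f}, {y:.1f}, {z:.1f})  size=({w:.1f} x {h:.1f} x {d:.1f})")
```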

Now imagine doing that for all the data on the Internet. Instead of six dimensions we have thousands, but each dimension still represents something: a specific city (San Francisco), an abstract concept (love), a noun (bridge). Anthropic has some fascinating research on this. All bridges are “near” each other in the latent space, but the Golden Gate Bridge is also “near” San Francisco.
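A toy way to see what “near” means: closeness in latent space is commonly measured with cosine similarity between vectors. The four-dimensional “embeddings” below are made up for the example (real models use thousands of dimensions), but the geometry works the same way.

```python
import numpy as np

def cosine_similarity(a, b):
    """Angle-based closeness: 1.0 means pointing the same way."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Invented 4-dimensional "embeddings" -- imagine the axes roughly mean
# bridge-ness, San-Francisco-ness, New-York-ness, and so on.
golden_gate     = np.array([0.9, 0.8, 0.1, 0.2])
brooklyn_bridge = np.array([0.9, 0.1, 0.8, 0.2])
san_francisco   = np.array([0.2, 0.9, 0.1, 0.1])

print(cosine_similarity(golden_gate, brooklyn_bridge))   # fairly high: both are bridges
print(cosine_similarity(golden_gate, san_francisco))     # high: a bridge in San Francisco
print(cosine_similarity(brooklyn_bridge, san_francisco)) # low: wrong city, wrong kind of thing
```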

That is, to my limited understanding, how Large Language Models work. Our prompts are exploring this latent space.

A better world through hallucination

Both hallucinations and the creativity we value are simply points in the latent space that weren’t in the training data, a view into a world of possibilities. What’s there fits the data. It is, assuming you didn’t train on The Onion, sensible. If it diverges from the real world, that may be because it is more sensible than the real world. Or at least more statistically plausible.

For example, I asked ChatGPT what I was famous for (try it, it’s very flattering!). It said it was because I write articles “…on software development, programming techniques, and cybersecurity.” The first two are accurate but the third is not. Still, maybe I should write about cybersecurity.

Rather than dismissing “hallucinations” as errors, we should consider them as suggestions for improvement. It’s as if an alien species looked afresh at what we humans do, and in its misunderstandings we found some aces we could keep.


[1] In the context of AI, “hallucination” seems to come from Computer Vision, where it makes a lot more sense.

[2] Confabulation is probably a more accurate word than “hallucination”. Also here

[3] That’s “rectangular cuboid” for the math pedants, a “box” for the rest of us.