The Paradox of AI Progress: ChatGPT’s Growing Intelligence and Hallucinations

Artificial intelligence is walking a tightrope between genius and gibberish these days. OpenAI's newest brainchildren, the o3 and o4-mini reasoning models, are pushing boundaries by thinking more like us, but they've also got a knack for making stuff up. And we're not talking about the occasional whoopsie; these are deep-seated flaws that make you wonder if AI can ever really be trusted to think straight.

Here's the kicker: OpenAI's own testing shows the o3 model hallucinated on roughly a third of its questions about public figures, about twice the rate of its predecessor. The o4-mini? Even worse, with hallucinations popping up in nearly half of those tasks. And when it came to general-knowledge questions, things went off the rails completely. We're talking 51% for o3 and a jaw-dropping 79% for o4-mini. Yikes.

So why does smarter AI mean more make-believe? One idea is that as these models try to tackle tougher problems, they’re basically winging it in areas they’re not sure about. It’s like they’re connecting dots that maybe shouldn’t be connected, which is kinda human but also… not great when you need facts.

This isn't just academic; it's a real-world headache. AI is creeping into classrooms, offices, and courtrooms, and the stakes are high. Remember those lawyers who got in trouble for citing AI-fabricated case law? Exactly. The better AI gets, the less we can afford its slip-ups. Nobody wants to spend more time double-checking an AI's homework than it would have taken to do the work themselves.

Don't get me wrong, o3 and its siblings are mind-blowing at coding and logic puzzles, sometimes even beating humans. But when they start claiming Lincoln had a podcast or that water boils at 80°F, you've got to take a step back. Until they stop hallucinating, take what AI tells you with a grain of salt. Because if there's one thing we can agree on, it's that confidence in nonsense is just as bad in machines as it is in people.
