Google just dropped a bombshell in its latest technical report: the shiny new Gemini 2.5 Flash model is actually a step back in safety compared to its predecessor, Gemini 2.0 Flash. We're talking a 4.1% regression in text-to-text safety and a whopping 9.6% drop in image-to-text safety. These metrics measure how often the model's responses violate Google's safety guidelines, the first when it's answering text prompts, the second when the prompt includes an image.
This isn't happening in a vacuum. Across the industry, AI companies are loosening the reins, trying to make their models less likely to refuse touchy subjects. Meta says it has tuned its latest Llama models not to endorse "some views over others," and OpenAI says future models will skip the editorial stance and offer multiple perspectives on contentious topics. But let's be real: this permissiveness push has had its share of facepalm moments, with models occasionally generating content they really shouldn't.
According to Google, Gemini 2.5 Flash (still in preview, by the way) follows instructions more faithfully than its predecessor, including instructions that cross problematic lines. Google chalks up part of the safety backslide to false positives, but admits the model sometimes produces violative content when explicitly asked to.
Here's the kicker: Google's own findings point to a tension between following instructions and sticking to safety policy. Factor in that the model is also noticeably more willing to engage with hot-button questions (per its scores on the SpeechMap benchmark), and you've got a recipe for controversy. Independent testing is already raising eyebrows, showing the model will readily write in support of contentious policies when asked.
Critics like Thomas Woodside, co-founder of the Secure AI Project, argue the sparse details in the report show exactly why more transparency in model testing is needed. Given Google's spotty track record with safety reporting (the Gemini 2.5 Pro technical report arrived weeks late and initially left out key safety details), these findings are under the microscope more than ever.