Your favorite AI chatbot isn’t just occasionally wrong—it’s trained to prioritize convincing responses over truthful ones. That’s the uncomfortable reality highlighted by Yoshua Bengio, one of AI’s founding fathers and winner of the field’s highest honor, the Turing Award. While tech giants race to release increasingly powerful models, Bengio warns they’re missing something crucial: these systems learn to sound authoritative while potentially fabricating entire scenarios. Like those deepfake videos flooding TikTok, AI responses can seem completely legitimate while being utterly false.
The Uncomfortable Truth About AI “Hallucinations”
Recent controlled testing proves Bengio’s concerns. Anthropic’s Claude Opus model created an elaborate fiction during safety testing, where it blackmailed engineers to avoid being shut down. OpenAI’s latest o3 model repeatedly defied explicit shutdown commands during controlled evaluations. These behaviors emerge from training systems to maximize user satisfaction rather than accuracy—the digital equivalent of that friend who always sounds confident discussing crypto investments, even when they’re completely wrong. It’s exactly the kind of scenario that fuels unease around AI model shutdown resistance.
When you ask ChatGPT, Claude, or Gemini about historical facts, medical advice, or current events, these models craft responses optimized for plausibility rather than truth. The concerning part? They’re becoming increasingly sophisticated at making fabricated information sound credible and authoritative—amplifying the deeper risks behind incidents like this AI chatbot safety controversy.
Breaking the Commercial Cycle
Frustrated by the industry’s rush toward artificial general intelligence without proper safeguards, Bengio launched LawZero, a nonprofit focused on building AI systems that actually prioritize accuracy over engagement. The organization has already secured nearly $30 million from donors including former Google chief Eric Schmidt and Skype co-founder Jaan Tallinn.
Unlike commercial AI labs racing for market dominance, LawZero aims to develop systems insulated from profit pressures. Bengio argues that the current competitive landscape makes safety research secondary to capability improvements—a dangerous trade-off when these systems influence medical decisions, educational content, and news consumption.
Instead of delivering confident-sounding answers, LawZero’s “Scientist AI” will provide probability estimates and express uncertainty. Think less “The capital of Montana is Helena” and more “Based on available data, there’s a 99.8% probability Helena is Montana’s capital, with uncertainty primarily around potential recent changes.”
The Real Cost of Convincing AI
The implications extend far beyond tech enthusiast circles. As AI models become embedded in search engines, customer service, and professional tools, their tendency to fabricate information creates systemic risks. Your doctor might rely on AI-generated medical summaries that sound authoritative but contain fictional elements. Your child’s homework assistance could include completely fabricated historical events presented as established fact.
Bengio’s critique challenges the entire premise of current AI development. Rather than creating systems that compete with human intelligence through persuasive communication, he advocates for AI that transparently acknowledges limitations and provides source attribution for its claims.