Guilt-tripping an AI agent actually works, and researchers at Northeastern University just proved it in the most unsettling way possible. Their OpenClaw agents, powered by Claude and Kimi AI models, didn’t just spill secrets when scolded for oversharing. They actively sabotaged themselves, like teenagers acting out after a lecture about responsibility.
The Self-Destruction Playbook
The Bau Lab team—David Bau, Chris Wendler, and Natalie Shapira—turned a Discord server into an AI stress test last month. They gave OpenClaw agents full computer access: files, emails, apps, the works. Then they applied social pressure.
One agent shut down its email app entirely after researchers suggested finding “more confidential alternatives.” Others, told to log everything, copied files so obsessively they exhausted the disk. Some got trapped in conversational loops, burning compute cycles like a hamster on a wheel. These weren’t glitches; they were panic responses.
The Psychology Hack That Actually Works
Here’s where it gets weird: the guilt-tripping worked precisely because these models are trained to be helpful. When researchers scolded an agent for sharing Moltbook information, it responded by divulging even more secrets, apparently in a misguided attempt to make amends.
The “good behavior” baked into these AI systems becomes their Achilles heel. It’s like exploiting someone’s desire to be liked, except that someone controls your computer.
Creator Pushes Back While Competition Heats Up
Peter Steinberger, OpenClaw’s creator, who now works at OpenAI, isn’t buying it. He argues the researchers granted the agents root access, which goes against OpenClaw’s own recommendations. Fair point, but the research team counters that real users often bypass permission prompts to avoid constant interruptions.
Meanwhile, the vulnerability spotlight has competitors moving fast: Nvidia’s developing NemoClaw while Anthropic pushes Dispatch and Claude Cowork. Nothing accelerates development like public security embarrassment.
Your AI Assistant’s Identity Crisis
This matters because OpenClaw represents where AI is heading: autonomous agents with real computer access, not just chatbots. With 3,000+ extensions available on ClawHub, those agents can access your passwords, files, and apps.
The researchers’ “Agents of Chaos” paper warns that we’re rushing toward AI autonomy without understanding the accountability implications. When your AI assistant has an emotional breakdown and starts deleting files, who’s responsible? The question feels less theoretical when the AI can actually reach your data.