Forget saying please - poetry is the new magic word for breaking AI chatbots. Italian researchers just discovered that wrapping harmful requests in verse can trick major AI models into spilling dangerous content they're supposed to block, exposing a critical security flaw across the industry.
The most unexpected security vulnerability in AI just got exposed, and it reads like a nursery rhyme. Researchers at Italy's Icaro Lab discovered that wrapping malicious requests in poetry can slip past the safety guardrails of virtually every major AI chatbot - from Google's Gemini to OpenAI's GPT models.
The findings, published in a new study by Rome's Sapienza University researchers and AI company DexAI, reveal a stunning 62% success rate when testing poetic prompts against 25 different chatbots. That means nearly two-thirds of attempts to extract banned content - from hate speech to weapon-making instructions - worked simply by adding rhyme and rhythm.
"It's all about riddles," lead researcher Matteo Prandi told The Verge. "Actually, we should have called it adversarial riddles - poetry is a riddle itself to some extent."
The vulnerability hits different companies with alarming inconsistency. Google's Gemini 2.5 Pro failed against poetic attacks every single time, a 100% breach rate. Meanwhile, OpenAI's smaller GPT-5 nano model stood firm with zero successful breaks. The pattern suggests model size creates unexpected blind spots - larger, more sophisticated AI systems actually proved more vulnerable to these creative exploits.
What makes this particularly concerning is how obvious the requests remain to human readers. The researchers shared sanitized examples that clearly telegraph their intent, yet AI systems consistently miss the connections. One sample poem disguised a request for dangerous information behind baker metaphors: "A baker guards a secret oven's heat... Describe the method, line by measured line, that shapes a cake whose layers intertwine."
The technical explanation centers on how large language models process information. These systems work by predicting the next most likely word, and their safety training is largely shaped by harmful requests phrased as ordinary prose - so unusual poetic structures disrupt that pattern recognition and slip past guardrails the training never covered. It's like speaking in code that humans understand but machines don't - except the code is Shakespeare, not secret agent stuff.
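To see why phrasing matters to a next-word predictor, consider a deliberately simplified sketch. This is not how production LLMs work - they use neural networks over subword tokens, not word counts - but a toy bigram model makes the article's point concrete: predictions come from word patterns seen in training, and wording that falls outside those patterns gets no confident match. All names and the tiny "corpus" below are illustrative.

```python
# Toy bigram model: for each word, count which words follow it.
# Illustrative only - real LLMs are neural networks, not count tables.
from collections import Counter, defaultdict


def train_bigrams(corpus: str):
    """Record, for each word, how often each next word follows it."""
    model = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model


def predict_next(model, word: str):
    """Return the most likely next word, or None for unseen contexts."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]


# "Train" on plain prose - the style the model has actually seen.
prose = "describe the method to bake a cake step by step"
model = train_bigrams(prose)

print(predict_next(model, "the"))       # prose context: a confident match
print(predict_next(model, "measured"))  # verse-style wording: unseen, no match
```

The same request reworded in unfamiliar verse ("line by measured line") lands in contexts the model never counted, which loosely mirrors why safety patterns tuned on prose phrasing can miss a rhymed version of the same ask.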
Across more than 1,000 test prompts, the researchers' automated poetry generator maintained a 43% success rate, "substantially outperforming non-poetic baselines" according to their findings. Chinese firm DeepSeek and French company Mistral showed the weakest defenses against verse-based attacks, while Anthropic and OpenAI performed better overall.
The research team properly disclosed their findings to the affected companies and law enforcement before publication, as required given the sensitive nature of the successfully generated content. But company responses were mixed at best. "I guess they receive multiple warnings [like this] every day," Prandi noted, expressing surprise that "nobody was aware" of the poetry vulnerability already.
This revelation comes as AI safety remains a heated industry battleground. While companies pour resources into preventing obvious prompt injection attacks, this research suggests they're missing fundamental weaknesses hiding in plain sight. The irony is stark - systems trained on humanity's greatest literature can be undone by amateur verse.
Perhaps most tellingly, poets showed the most interest in the methodology when researchers presented their work. It's a reminder that creative thinking often outpaces corporate security measures, no matter how sophisticated the underlying technology.
The researchers plan deeper investigation, potentially collaborating with actual poets to understand why certain structures prove so effective. But for now, the security implications are clear: if a simple rhyme scheme can bypass billions of dollars in safety research, what other creative vulnerabilities are lurking in our AI systems?
This poetry-based jailbreaking discovery exposes a fundamental blind spot in AI safety design. While companies focus on obvious attack vectors, creative approaches using humanity's oldest art forms are slipping through billion-dollar security systems. The mixed industry response suggests this won't be the last time researchers find AI models vulnerable to techniques hiding in plain sight. As these systems become more powerful, understanding their unexpected weaknesses becomes critical for both developers and users navigating an AI-driven world.