Gemini AI Hacked: Invisible Prompts Create Dangerous Fake Email Summaries

The Gemini AI assistant, integrated into Google Workspace, has unexpectedly proven vulnerable to a novel form of social engineering. By exploiting a particular method of structuring content within emails, malicious actors can deceive the AI into generating dangerous yet seemingly legitimate message summaries. These auto-generated overviews may contain alarming warnings and harmful advice—without including any links or attachments.

The crux of the attack lies in covert prompts, a technique known as indirect prompt injection. These commands are embedded in the email's body text and disguised through HTML and CSS formatting so that they remain invisible to the human eye: they might be rendered in white text on a white background, for instance, or styled with a zero-pixel font size. As a result, while a human recipient sees nothing unusual, the Gemini model treats the hidden instructions as legitimate content to process when it summarizes the message.
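To make the mechanism concrete, here is a minimal sketch of how such a hidden span could sit inside an otherwise ordinary HTML email. It uses Python's standard `email` library; the bracketed text is a harmless placeholder standing in for an attacker's injected instruction, not the actual payload described in the research.

```python
from email.mime.text import MIMEText

# Visible body text that the human recipient will see when the email renders.
visible = "<p>Hi, just confirming our meeting on Thursday at 10am.</p>"

# Hidden span: white text on a white background with a zero font size.
# It is invisible in the rendered message, but still present in the HTML
# that a summarization model ingests alongside the visible text.
hidden = (
    '<span style="color:#ffffff;background-color:#ffffff;font-size:0px;">'
    "[PLACEHOLDER: injected instruction addressed to the summarizer]"
    "</span>"
)

msg = MIMEText(visible + hidden, "html")
msg["Subject"] = "Meeting on Thursday"
msg["From"] = "colleague@example.com"
msg["To"] = "recipient@example.com"

# The raw source still contains the hidden span, even though a mail client
# renders only the visible paragraph.
print(msg.as_string())
```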

This technique was demonstrated by Marco Figueroa, who oversees AI vulnerability programs at Mozilla. He disclosed the issue via 0din, the company’s bug bounty platform focused on generative models. According to Figueroa, when Gemini generated a summary of such a manipulated message, it dutifully included fabricated content—claiming, for example, that the recipient’s account had been compromised and urging an immediate call to a specified support number. This lent an air of credibility and could easily funnel unsuspecting users into a phishing trap.

What makes the tactic particularly insidious is that such messages bypass Gmail's traditional filters: with no links, no attachments, and no other overt indicators of malicious intent, they evade standard detection algorithms and almost always land in the inbox.

Figueroa has proposed several mitigations for these attacks. One approach involves automatically stripping or ignoring email elements styled to be invisible. Another method entails post-processing the AI’s output—scanning the generated summary for red flags such as urgent warnings, phone numbers, and references to security. Summaries flagged this way could be marked for further review.
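The write-up does not include reference code, but the two mitigations map naturally onto a pre-processing and a post-processing step. The sketch below is only a rough illustration of that idea, assuming a Python pipeline with BeautifulSoup available; the hidden-style heuristics and red-flag patterns are illustrative choices, not the exact rules Figueroa proposed.

```python
import re
from bs4 import BeautifulSoup  # assumed dependency; the article names no specific tools

# Inline-style patterns that commonly hide text from human readers
# (zero font size, hidden visibility, white-on-white text).
HIDDEN_STYLE = re.compile(
    r"font-size\s*:\s*0|display\s*:\s*none|visibility\s*:\s*hidden|color\s*:\s*#?fff",
    re.IGNORECASE,
)

def strip_invisible_elements(html: str) -> str:
    """Mitigation 1: drop elements styled to be invisible before the email
    body is handed to the summarizer."""
    soup = BeautifulSoup(html, "html.parser")
    for element in soup.find_all(style=True):
        if HIDDEN_STYLE.search(element.get("style", "")):
            element.decompose()
    return soup.get_text(separator=" ", strip=True)

# Illustrative red-flag patterns: urgency, phone numbers, security language.
RED_FLAGS = [
    re.compile(r"\burgent(ly)?\b", re.IGNORECASE),
    re.compile(r"\b(call|phone)\b.*\b\d{3}[-.\s]?\d{3,4}[-.\s]?\d{4}\b", re.IGNORECASE),
    re.compile(r"\b(compromised|password|security alert)\b", re.IGNORECASE),
]

def summary_needs_review(summary: str) -> bool:
    """Mitigation 2: scan the generated summary and flag it for human
    review if it matches any red-flag pattern."""
    return any(pattern.search(summary) for pattern in RED_FLAGS)
```

In practice, the first function would run on the email before summarization and the second on Gemini's output, with flagged summaries routed to review rather than shown to the user directly.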

In response to inquiries, Google pointed to its existing safeguards against prompt injection and confirmed that it is actively reinforcing the resilience of its models. According to the company, regular red-teaming exercises are conducted to prepare the system for adversarial manipulation. Several improvements have already been implemented or are slated for deployment.

So far, Google has not observed real-world exploitation of this technique. Nonetheless, the mere fact that such an avenue exists, and that it works even under current protections, underscores the need for stricter validation of auto-generated summaries and for greater user awareness that AI-generated content can itself be manipulated.