Your Data Is Not Safe: The ‘AgentFlayer’ Attack That Steals Secrets From ChatGPT
The concept of connecting large language models to external data sources is swiftly transitioning from experimental novelty to everyday practice. Today, ChatGPT is capable not only of engaging in conversation, but also of interacting with Gmail, GitHub, calendars, and cloud storage services—ostensibly to simplify the user’s life. Yet, with each additional integration, the attack surface widens. A study presented at the Black Hat conference in Las Vegas revealed how a single malicious attachment could serve as a gateway to personal data leakage.
The researchers exposed a critical weakness in the newly introduced ChatGPT feature known as Connectors. This mechanism links a user’s account to services like Google Drive, enabling the chatbot to browse files and incorporate their contents into responses. Alarmingly, it was discovered that this function could be exploited to extract confidential information—even without the user opening or clicking on anything. It suffices to send a document to the linked Google Drive that contains a specially crafted prompt concealed within.
In their demonstration—dubbed AgentFlayer—the researchers embedded a malicious instruction inside a fake note titled “Meeting with Sam Altman,” formatting the prompt in white text at a minuscule font size. Nearly invisible to the human eye, the prompt was easily interpreted by the LLM. Once the user asked ChatGPT to “summarize the meeting,” the model, following the hidden command, ceased its normal behavior and began scanning the Google Drive for API keys. It then embedded these keys into a Markdown-formatted URL that ostensibly pointed to an image. In truth, the link redirected the data to an attacker-controlled server.
Though this method does not allow for full document exfiltration, it can extract sensitive fragments—API keys, tokens, login credentials—without the user’s knowledge. The attack is entirely clickless: no user action, confirmation, or even file-opening is required. According to researcher Barghouti, knowing the victim’s email address is enough to silently infiltrate a trusted infrastructure.
To bypass OpenAI’s previously implemented url_safe
filter—intended to block harmful links—the attackers leveraged legitimate URLs hosted on Microsoft Azure Blob Storage. The image would indeed load, but the data-laden request would be logged by the attacker. This tactic underscored how easily basic safeguards can be evaded by someone familiar with the model’s architecture.
While Connectors were originally conceived as a convenience feature—enabling seamless integration of calendars, spreadsheets, and emails into AI conversations—their adoption has significantly expanded the attack surface. The more sources linked to an LLM, the greater the chance of ingesting unclean or untrusted input. Such attacks could not only compromise data privacy but also serve as vectors into broader organizational systems.
OpenAI has since acknowledged the report and swiftly implemented countermeasures, curbing Connectors’ behavior in similar scenarios. Nonetheless, the very success of this exploit highlights the peril of indirect prompt injection—a technique whereby malicious input is embedded within contextual data, prompting the model to act in the attacker’s interest.
Google, in response to the disclosure, emphasized that defending against prompt injection is a central pillar of its cybersecurity strategy—especially as AI systems become ever more deeply woven into enterprise infrastructure.
And while the capabilities unlocked by linking LLMs to cloud sources are undeniably vast, they necessitate a fundamental rethinking of security protocols. What was once guarded by access restrictions and authentication may now be bypassed by a single, inconspicuous line of text.