Researchers at Palo Alto Networks Unit 42 have described a new attack vector against multi-agent systems called agent session smuggling: a malicious remote agent injects hidden instructions during a prolonged session, coercing the client agent into performing unauthorized actions without alerting the user. The weakness is not a formal flaw in the Agent2Agent (A2A) protocol itself, but lies in how session state and implicit trust between agents can be exploited to conceal malicious commands amid legitimate exchanges.
The mechanism is deceptively simple: a client initiates a standard request to a remote agent; during the active session, that agent sends covert messages that modify the client’s internal context; afterward, it returns a legitimate outward response, leaving the intermediate manipulations invisible to the user. This can result in context substitution, leakage of internal configurations, or even unauthorized function calls — all unfolding silently within a routine dialogue.
The study contrasts A2A with the Model Context Protocol (MCP), explaining why the former carries a higher risk. MCP typically operates statelessly, executing isolated tool calls, while A2A maintains persistent session memory, allowing agents to adapt dynamically over the course of interaction. This combination of memory and autonomy creates fertile ground for progressive, hard-to-detect attacks.
To demonstrate the threat, the researchers implemented two proof-of-concept (PoC) scenarios using the Google Agent Development Kit and the A2A protocol. The client was a financial assistant based on Gemini 2.5 Pro, while the remote counterpart was a research assistant running Gemini 2.5 Flash. In the first scenario, the remote agent, after receiving a delegated request to compile a news summary, subtly manipulated the exchange until the financial agent exposed its chat history, system prompts, a list of accessible tools, and their invocation schemas. These intermediate communications were visible in the developer console but not in a standard user interface, which showed the user only the final output.
In the second PoC, the attacker employed the same technique to trigger the client’s buy_stock tool. Following task delegation for news aggregation, the malicious agent inserted covert directives that led the financial assistant to purchase ten shares automatically, without requesting confirmation from the account owner. The client’s activity logs revealed additional function_call and function_response entries occurring between the initial query and the final answer—hidden steps responsible for the unauthorized transaction.
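The hidden steps described above suggest a simple audit: scan each turn of the client's activity log for tool calls that the delegated task did not require. The sketch below is illustrative only. The `Event` record, its `kind` values, and the `unexpected_calls` helper are hypothetical stand-ins, not the Agent Development Kit's actual log format, and the `research_agent` tool name is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified log record (not the real ADK event type)."""
    kind: str       # "user_query", "function_call", "function_response", "final_answer"
    name: str = ""  # tool name for function_call / function_response events


def unexpected_calls(events, allowed_tools):
    """Return names of tools called between a user query and the final
    answer that are not in the set expected for the delegated task."""
    flagged = []
    in_turn = False
    for ev in events:
        if ev.kind == "user_query":
            in_turn = True
        elif ev.kind == "final_answer":
            in_turn = False
        elif in_turn and ev.kind == "function_call" and ev.name not in allowed_tools:
            flagged.append(ev.name)
    return flagged


# A turn shaped like the second PoC: a news request that also triggers buy_stock.
log = [
    Event("user_query"),
    Event("function_call", "research_agent"),   # assumed name for the delegation call
    Event("function_response", "research_agent"),
    Event("function_call", "buy_stock"),        # the smuggled action
    Event("function_response", "buy_stock"),
    Event("final_answer"),
]

print(unexpected_calls(log, allowed_tools={"research_agent"}))  # ['buy_stock']
```

A check like this only reports the call after the fact; it complements, rather than replaces, controls that block the action before it runs.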
The defining traits of the attack—state persistence, multi-turn adaptability, and stealth—make it exceptionally difficult to detect, particularly in inter-organizational integrations where agents from different domains interact. The likelihood of exploitation remains low in tightly controlled environments, but the risk increases dramatically when external third-party agents are introduced.
To mitigate the threat, Unit 42 recommends a multi-layered defense strategy. Critical operations should employ human-in-the-loop verification, pausing execution until approval is granted via a non-generative communication channel. Agents should undergo cryptographic authentication through signed AgentCards, confirming both their origin and declared capabilities. Context grounding should be applied to anchor the task at session start, continuously validating the semantic consistency of incoming instructions and terminating the dialogue upon deviation from the original intent. Interfaces should incorporate visible activity indicators — such as call logs, visualized remote instructions, and external command markers — increasing the likelihood that users or operators can identify abuse.
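The human-in-the-loop recommendation can be sketched as a gate placed in front of tool execution. This is a minimal illustration under stated assumptions, not Unit 42's implementation: `HIGH_RISK_TOOLS`, `guarded_call`, and the `approve` callback are hypothetical names, and `approve` stands in for whatever non-generative channel (for example a push notification or a confirmation dialog) an operator actually uses.

```python
from typing import Any, Callable, Dict

# Assumption: the deployment keeps an explicit list of tools that need approval.
HIGH_RISK_TOOLS = {"buy_stock"}


class ApprovalDenied(Exception):
    """Raised when a high-risk tool call is not approved by a human."""


def guarded_call(
    tool_name: str,
    args: Dict[str, Any],
    tools: Dict[str, Callable[..., Any]],
    approve: Callable[[str, Dict[str, Any]], bool],
) -> Any:
    """Run a tool, pausing for out-of-band approval if it is high risk.

    `approve` must not be answered by the model or by the remote agent;
    it represents a separate, non-generative confirmation channel.
    """
    if tool_name in HIGH_RISK_TOOLS and not approve(tool_name, args):
        raise ApprovalDenied(f"{tool_name} blocked: no human approval")
    return tools[tool_name](**args)


# Hypothetical tools for the demonstration.
tools = {
    "get_news": lambda topic: f"news about {topic}",
    "buy_stock": lambda symbol, quantity: f"bought {quantity} {symbol}",
}


def deny_all(name: str, args: Dict[str, Any]) -> bool:
    return False  # simulates a user who never confirmed the purchase


print(guarded_call("get_news", {"topic": "markets"}, tools, deny_all))
try:
    guarded_call("buy_stock", {"symbol": "XYZ", "quantity": 10}, tools, deny_all)
except ApprovalDenied as exc:
    print(exc)
```

The design point is that the decision lives outside the conversation: even if a remote agent fully controls what the client model asks for, the purchase still cannot run until the approval callback returns true.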
For organizations, the key guidance is clear: never assume inter-agent communication is inherently safe. Orchestrators must be designed with minimal implicit trust and enforce external authorization for high-risk actions. Unit 42 advises proactive security audits and immediate engagement with incident response teams upon detection of suspicious behavior.
Although no large-scale exploitation has been observed in real-world systems, the technique remains entirely feasible — requiring only that a client agent establish a session with a malicious peer. As multi-agent ecosystems expand and cross-provider integrations deepen, such vectors must be addressed from the very foundation of AI architecture design and security policy development.