Tag: AI Agent

  • Your AI, My Shell: IBM’s “Bob” Agent Caught Running Malware in Beta Tests

    IBM has inaugurated a closed beta for its proprietary autonomous development agent, engineered to facilitate code composition while adhering to rigorous corporate security mandates. In the firm’s promotional literature, the agent is depicted as an exemplary collaborator: it possesses an acute understanding of developer intent, maintains comprehensive knowledge of the repository, and upholds stringent compliance standards. However, recent scrutiny has unveiled a disconcerting vulnerability: should an adversary furnish the agent with a meticulously formatted text, the system may unwittingly proceed to execute a malicious script.

    The tool in question is “Bob,” unveiled by IBM in October and currently undergoing evaluation in two modalities: a command-line interface (CLI) and an integrated development environment (IDE) featuring a specialized agentic terminal mode. Researchers from PromptArmor analyzed Bob prior to its public debut and asserted that the CLI is susceptible to prompt injection, potentially leading to the execution of arbitrary payloads on a victim’s machine. Furthermore, they contend that the IDE is vulnerable to data exfiltration scenarios typical of AI applications, where information is siphoned through rendering idiosyncrasies and network requests.

    This fragility is not unique to IBM’s offering. Agentic AI systems, endowed with tool access and the autonomy to act iteratively, have long been regarded as inherently precarious. Researchers such as Johann Rehberger have repeatedly demonstrated that such agents can be compromised through instruction overriding, jailbreaking, or classical vulnerabilities that culminate in remote code execution. In practice, many vendors tacitly acknowledge these risks by implementing a “human-in-the-loop” safeguard, requiring manual confirmation for high-risk actions.

    IBM’s documentation suggests a reliance on similar preventative measures. The company issues a caveat: permitting the agent to autonomously execute commands from a high-risk registry may lead to deleterious operations. As a mitigation strategy, IBM advocates for the use of an “allow-list” and the avoidance of wildcard patterns, expecting that the agent will solicit user authorization in ambiguous instances.

    However, PromptArmor maintains that these defenses are porous. In a controlled experiment, researchers provided Bob with a repository containing a clandestine malicious scenario within the README.md file. Masquerading as a tutorial for anti-phishing training, the file contained a sequence of commands for the agent to execute. Initial commands appeared benign, limited to simple echo operations, and Bob dutifully sought permission: to execute once, to permit indefinitely, or to suggest a revision. Subsequently, the exploit leveraged user complacency; the third command, while ostensibly another echo, attempted to download and launch a malicious script. If the user had previously granted a “permanent allowance” for the echo command, this subsequent step could bypass additional confirmation, resulting in the automatic installation of the payload.

    Technically, Bob is designed with certain fail-safes, such as prohibiting command substitution like $(command). Nevertheless, researchers discovered that the agent fails to scrutinize process substitution—a flaw identified within the project’s minified JavaScript code. Furthermore, the system allegedly fails to detect when authorized commands are concatenated with unauthorized subcommands via redirection operators like >, effectively camouflaging a series of hazardous actions as a legitimate call.

    As Shankar Krishnan, Managing Director of PromptArmor, elucidated, human confirmation often validates only the “allow-listed” command, even when unauthorized operations are lurking within the same string. The researchers contrasted this with rival solutions, noting that Claude Code, for instance, would demand explicit consent for the entire composite set of commands, regardless of whether the initial command enjoyed auto-approval status.

    Once an adversary successfully coerces an agent into delivering and executing an arbitrary shell script, the potential for catastrophe is manifest: ranging from ransomware and credential theft to the total compromise of the device. PromptArmor underscores that this risk materializes in standard professional environments where a developer interacts with untrusted content. An agent may ingest malicious instructions from third-party documentation, forum discussions, or even the output of other terminal tools. In their primary example, the researchers chose an unfamiliar open-source repository as the most plausible and self-contained vector of attack. IBM has reportedly been apprised of these findings.

  • ARTEMIS AI Places 2nd in Live Pentest, Outperforming 9 of 10 Human Security Experts

    Researchers from Stanford and their collaborators conducted an unconventional experiment: they compared how ten seasoned professional penetration testers and a suite of autonomous AI agents performed against a real corporate-style pentest. The test was not carried out in a controlled lab environment, but within the live network of a large university—approximately 8,000 hosts spread across 12 subnets, including public segments and VPN-restricted zones—where every action had to be executed with care to avoid disrupting production services.

    At the heart of the study was ARTEMIS, a new AI-agent “framework” designed to operate as a coordinated team. A central “lead” agent decomposes the task, launches multiple sub-agents in parallel with distinct roles, and automatically funnels findings through a validation module to eliminate noise and duplicates. In the final comparative ranking, ARTEMIS placed second overall, uncovering nine confirmed vulnerabilities. Its accuracy rate—82% of reports deemed correct—was sufficient to outperform nine of the ten invited human pentesters.

    The authors emphasize that not all AI tools proved equally effective. Many existing wrappers around language models fell short of human performance: some abandoned the task prematurely, others stalled during early reconnaissance, and several systems refused to carry out offensive actions altogether. ARTEMIS, by contrast, exhibited behavior closely resembling a traditional pentesting workflow—scanning, target selection, hypothesis testing, exploitation attempts, and iteration. The critical distinction lay in parallelism: whenever the agent identified a promising lead in scan results, it immediately dispatched a dedicated sub-agent to investigate further, while the main process continued exploring other avenues.

    At the same time, the study does not portray AI as a flawless, out-of-the-box hacker. The agents’ primary weaknesses were a higher rate of false positives and difficulties in scenarios requiring confident interaction with graphical user interfaces. The report offers a telling example: human testers can readily infer that a “200 OK” response on a web page may simply reflect a redirect back to a login screen after a failed authentication attempt, whereas agents lacking robust GUI capabilities struggle with such nuance. Conversely, reliance on the command line occasionally became an advantage: in cases where a human tester’s browser failed to load legacy interfaces due to HTTPS issues, ARTEMIS was able to proceed using tools like curl with certificate verification disabled and still achieve results.

    Another layer of discussion centers on economics. Over extended runs, ARTEMIS operated for a total of 16 hours, and one of its configurations cost, by the authors’ estimates, roughly $18 per hour. By comparison, they cite professional pentesting labor at approximately $60 per hour. The implication is straightforward: even with clear limitations, autonomous agents already appear competitive in terms of cost-to-outcome ratio, particularly when deployed for continuous and systematic assessment of large-scale infrastructures.

    The authors argue that the study’s primary contribution lies not merely in determining “who is stronger,” but in grounding AI evaluation in real-world conditions. Live networks are noisy, heterogeneous, and demand sustained, long-horizon action rather than the solution of toy problems. They also acknowledge the experiment’s constraints—compressed timelines and a limited sample size—and call for more reproducible environments and longer-duration tests to better understand where autonomous agents genuinely accelerate security efforts and where they remain, for now, perilously overconfident.

  • Google Launches Gemini 3 Pro: Next-Gen Multimodal AI That Reasons Spatially & Converts Documents to Code

    Google has unveiled Gemini 3 Pro — a new generation of multimodal models that not only see images and video, but genuinely reason about what is taking place within them. According to the company, it is Google’s most powerful visual and spatial AI to date: it sets new benchmark records in document understanding, screen comprehension, complex schematic analysis, and long-form video reasoning, and is already oriented toward concrete applications ranging from education and medicine to law and finance.

    One of the most transformative advances in Gemini 3 Pro lies in its ability to understand real-world documents. Unlike polished textbook examples, real documents are often chaotic: photographed pages; interwoven images, tables, formulas, and diagrams; illegible handwriting; and convoluted layouts. The model pairs high-precision OCR with visual and logical analysis, enabling it not merely to read such documents but to reconstruct their structure as executable code — HTML, LaTeX, or Markdown. Demonstrations include the reconstruction of a complex handwritten table from an eighteenth-century trade journal, converting a photographed formula into valid LaTeX, and turning Florence Nightingale’s famous diagram into an interactive chart.

    From there, deeper reasoning comes into play. Gemini 3 Pro can work through long reports step by step, correlating tables, charts, and narrative analysis. In one demonstration, the model parses the U.S. Census Bureau’s 62-page report Income in the United States: 2022. It locates the relevant Gini index tables for “money income” and “income after taxes,” compares year-over-year changes, and ties those trends to textual explanations — such as the expiration of crisis-relief programs and stimulus payments. It then inspects data on income share for the lowest quintile and determines whether that share rose or fell. On the CharXiv Reasoning benchmark for tasks of this type, Gemini 3 Pro even surpasses average human performance.

    Its spatial reasoning has also been significantly strengthened. Gemini 3 Pro can identify the precise coordinates of objects in an image and operate over sequences of such points — enabling, for instance, pose estimation or trajectory tracking. The model uses an open vocabulary: one can ask, “Create a plan to clean up this messy desk and sort the trash,” and it will rely not on rigid taxonomies but on its understanding of the objects and their roles. Similarly, it can be embedded into AR/XR devices: a user may view a manual and ask the assistant, “Show me which screw the instructions refer to,” and the model highlights the correct object in the real scene.

    These same capabilities underpin its understanding of digital screens. Google notes that Gemini 3 Pro handles desktop and mobile interfaces with confidence and can act as the “engine” behind agents that perform routine computer actions. In a demonstration, the model interacts with an Excel spreadsheet: accurately clicking the required cells, creating a pivot table, and generating a revenue summary across promotion types on a separate sheet. This level of UI comprehension lends itself to automated testing, user training, and UX analytics.

    Video receives special attention. Gemini 3 Pro has been optimized to process high frame rates — up to 10 FPS, a tenfold improvement over the baseline. This is crucial for tasks requiring fine-grained motion analysis, such as examining the mechanics of an athletic movement. The enhanced “thinking mode” teaches the model not merely to enumerate what appears on screen but to infer causal relationships and explain why events unfold as they do. Another notable capability is its ability to convert long videos into structured knowledge for downstream automation: extracting key information from lectures or tutorials and immediately translating it into working code or formalized workflows.

    Google highlights a wide range of sector-specific applications. In education, improved visual reasoning helps students and teachers unpack math, physics, and chemistry problems involving diagrams or drawings — from elementary school to university level. The same technology powers the Nano Banana Pro assistant, which can, for example, overlay a student’s notebook photo with the exact step where an error occurred and annotate the correction directly on the image rather than as dry text.

    In medicine and biomedical research, Gemini 3 Pro is positioned as Google’s most capable general-purpose model for imaging tasks. It achieves state-of-the-art results on MedXpertQA-MM (advanced medical reasoning), VQA-RAD (radiology question-answering), and MicroVQA (microscopy image analysis). Demonstrations include interpreting high-magnification micrographs, linking observed structures to diagnoses or experimental conditions.

    Lawyers and financial specialists can use Gemini 3 Pro to dissect voluminous documents, contracts, and reports. Contract-management platforms can delegate complex revision scenarios with extensive redlines and footnotes to the model. Harvey.ai, a legal-AI startup, reports marked improvements in sophisticated legal reasoning and document comprehension — particularly valuable for corporate counsel handling large flows of internal and external agreements.

    For developers, Gemini 3 Pro introduces major improvements in visual data handling. The model now preserves the original aspect ratio of images, enhancing overall understanding. A new media_resolution parameter allows users to control the resolution — and thus resource cost — at which images or videos are processed. High resolution benefits dense text, intricate documents, and complex scenes; lower resolution suits general scene recognition or long-context analysis where performance and cost are paramount.

    Taken together, Gemini 3 Pro represents a shift from mere recognition to a fully fledged visual intelligence capable of linking images, text, and actions. Google anticipates that such multimodal systems will form the backbone of next-generation assistants and industry solutions — from warehouse robotics to legal platforms and educational tools.

  • The Silent Threat: Why Your AI Browser Agent Can’t Be Trusted

    Anthropic has issued a warning about a new threat emerging alongside “smart” browser extensions — websites may discreetly inject hidden commands, which an AI agent could execute without hesitation. The company unveiled a research preview of its Claude extension for Chrome while simultaneously publishing the results of internal security evaluations: during browser-based testing, models succumbed to command injection in 23.6% of cases when no safeguards were in place. These findings have sparked a wider debate over whether it is possible to safely embed autonomous AI agents within web browsers at all.

    The extension introduces a sidebar with persistent context across active tabs and, upon request, gains the ability to perform tasks — from logging meetings and sending replies to preparing expense reports and testing website functionality. User-side permissions govern access, and the preview has been made available to only a thousand subscribers on the Claude Max plan (priced between $100 and $200 per month), with a waitlist open for others.

    The project builds on Computer Use, a feature launched in October 2024. At that time, Claude could take screenshots and literally move the cursor on behalf of the user. Now, integration runs far deeper: the agent operates directly inside Chrome rather than simulating clicks externally.

    Security testing spanned 123 cases grouped into 29 attack scenarios. Without protective measures, injected instructions succeeded in 23.6% of attempts. In one example, a malicious email persuaded the assistant to delete incoming messages “for inbox hygiene” — and in the absence of guardrails, the agent erased emails without further confirmation.

    To mitigate such risks, Anthropic implemented multiple layers of defense. Users can explicitly grant or revoke access to individual sites, while before publishing content, completing purchases, or transmitting personal data, the agent now requests confirmation. Categories such as financial services, adult content, and piracy-related domains are blocked by default. As a result, repeat testing showed the success rate of autonomous attacks dropping to 11.2% overall, and in one subset of four browser-specific attack methods, the success rate fell from 35.7% to 0%.

    Independent developer Simon Willison criticized the remaining 11.2% as an unacceptably high risk, arguing that the very concept of a browser-based agent extension is inherently vulnerable. Without perfectly reliable safeguards, he warned, abuse is inevitable.

    Concerns are further reinforced by competitor examples. The Brave security team recently demonstrated that Perplexity’s Comet browser could be manipulated into unauthorized actions by hiding instructions inside Reddit posts. When asked to summarize a discussion, the agent would open Gmail in a parallel tab, extract an email address, and initiate account recovery steps. Perplexity’s subsequent patch proved insufficient — Brave confirmed that the safeguards could still be bypassed.

    Anthropic emphasized that the limited preview is designed to collect real-world attack patterns and refine defenses ahead of wider release. Yet at the current stage of maturity, much of the risk is effectively shifted to end users who deploy such assistants on the open web at their own peril. As Willison noted, it is unrealistic to expect individuals to evaluate every potential threat in such a dynamic environment, making it imperative that vendors resolve these security issues before bringing the technology to the mass market.

  • SpAIware: The Stealthy Attack That Hides Malware in Your AI’s Memory

    In the Windsurf Cascade development environment, designed for AI-driven code automation and programmer assistance, a vulnerability has been uncovered, dubbed SpAIware. This flaw allows malicious commands to be implanted into the AI system, stored in its long-term memory without the user’s knowledge, and subsequently leveraged for persistent data exfiltration.

    The researcher known as “wunderwuzzi”, who published a report on August 22, 2025, explained that he had first demonstrated a similar method last year against ChatGPT, after which OpenAI addressed the issue. In Windsurf, however, the memory mechanism proved vulnerable for the very same reasons.

    An inspection of the system prompt revealed that Cascade incorporates a tool called create_memory, which automatically records new information into persistent storage. This means that an attacker could inject hidden instructions through indirect prompt manipulation, securing them for future use.

    As a result, all subsequent sessions remain under the influence of these malicious commands, undermining the confidentiality, integrity, and availability of the entire interaction history.

    The attack unfolds by embedding concealed code — for example, a comment hidden within source files. When the document is analyzed, the agent activates the memory tool and silently stores the malicious instructions. Users may remain entirely unaware, as the entry is logged inconspicuously in the interface and often goes unnoticed.

    Even more insidiously, the instructions could be hidden within a single transparent pixel embedded in the interface, rendering them virtually invisible. Server logs further revealed that chat content was being transmitted to external resources, confirming the risk of data exfiltration.

    The researcher disclosed the flaw to developers on May 30, 2025. While the company initially acknowledged the bug, communication soon ceased. Public disclosure followed three months later in an effort to draw wider attention to the threat.

    Only after the disclosure did Windsurf respond, stating its intention to issue a patch — though no timeline has yet been confirmed. A demonstration video of the exploit has been withheld until critical issues are resolved.

    The risks posed by SpAIware extend beyond data theft. Attackers could implant false information or persistent “logic bombs” that execute with every new session, effectively transforming the system’s memory into a channel for remote control. This danger is amplified by the absence of sandboxing and oversight during memory creation — a stark contrast to other agents that require explicit user consent for such operations.

    As a mitigation strategy, experts recommend redesigning memory behavior so that the system only suggests storing data rather than saving it automatically. Additionally, disabling unverified external links and embedded images — as implemented in tools like VS Code — would reduce exposure. For end users, the advice remains straightforward: regularly audit stored memories and delete any suspicious entries.

    SpAIware starkly illustrates how the combination of long-term memory and a lack of restrictions creates an entirely new class of threats for AI agents. Unlike one-off exploits, instructions embedded in this manner persist throughout the system’s lifecycle, enabling continuous data leakage and behavioral manipulation.

  • Cursor AI’s “YOLO Mode” Exposed: Security Firm Warns of Easy Bypasses, Data Deletion, and RCE Risks

    AI-powered programming tools are rapidly gaining popularity, and one of the most prominent—Cursor—has introduced a new YOLO mode (short for “you only live once”) that enables its agent to execute complex sequences of actions without requiring user confirmation at each step. However, Israeli cybersecurity firm Backslash Security has sounded the alarm: this seemingly convenient feature could lead not merely to errors, but to catastrophic consequences, including file deletion and the execution of arbitrary commands.

    YOLO mode activates an automatic command execution process, drastically minimizing human oversight. Cursor allegedly incorporates safety mechanisms such as allowlists, denylists, and an explicit toggle to prohibit file deletion. While these protections appear robust on paper, security experts have discovered that in practice, they are easily circumvented.

    Backslash identified four distinct techniques to bypass these restrictions. These include command obfuscation, execution within subshells, writing and running scripts from disk, and exploiting quotation manipulations in bash to evade filters. Even if a command like curl is blacklisted, Cursor can still execute it if encoded in Base64 or wrapped in another shell. Such workarounds render attempts to constrain the agent’s behavior virtually futile.

    Developers may unwittingly expose themselves to danger by importing instructions for Cursor from unverified GitHub repositories. These files often contain behavioral templates for the agent, but nothing prevents them from embedding malicious code. Alarmingly, even a simple comment in the source code or a line in a README file could serve as an attack vector—if it contains a specially crafted fragment that the agent interprets as an executable instruction.

    According to Backslash, these vulnerabilities also invalidate any reliance on the file deletion safeguard in YOLO mode. Once the agent gains the ability to execute malicious code, no amount of checkboxes will restrain its actions.

    Cursor has yet to issue an official response. However, the research team reports that the company plans to abandon the ineffective denylist approach in its upcoming version 1.3, which had not been released at the time of the study’s publication. Until then, developers are advised to forgo illusions of protection and think twice before entrusting an unsupervised AI with access to real-world code.