The Illusion of Sapience: Unmasking the “Performative” AI and the Rise of Agentic Malware

by Nam Phong · March 23, 2026

Malware authors are already trying to weave artificial intelligence into malicious software, yet the results so far are profoundly uneven. In some cases the model is little more than an ostentatious facade that leaves behind nothing but loud, empty logs. In others it is entrusted with a concrete task: deciding whether to detonate the payload on a given machine or to prudently avoid the risk. A recent analysis by Unit 42 illustrates both extremes at once.

The researchers searched both public sources and their own telemetry for traces showing that malware authors are putting large language models to work. Their inquiry rested on three scenarios. The first: artificial intelligence as the author, drafting the malicious code itself. The second: the model as an assistant in remote command and control, suggesting the next tactical steps. The third, and most alarming: the malware making decisions on its own, directly on the infected machine, with no human operator behind it. According to Unit 42, the first two scenarios have already appeared in real samples; fully autonomous, local agentic logic has not yet been observed in the wild.

Their overarching conclusion is nonetheless stark. Artificial intelligence can already help attackers build working samples, sharply lowering the barrier to entry for those previously held back by a lack of technical skill. Yet genuinely autonomous malware remains a distant horizon. Authors still struggle with embedding a model directly in the sample, deploying it on the target machine, and making it actually steer the program’s behavior rather than merely conjuring the illusion of sapience.

The first case study is a .NET infostealer, written in C# for .NET Framework 4.0 and obfuscated with the ConfuserEx 2 packer. Such obfuscation is meant to hinder analysis and detection, but the real discovery lies elsewhere: the sample calls OpenAI's GPT-3.5-Turbo model via HTTP API requests. In theory, this looks like the next evolutionary leap for malicious software. One might imagine the program reading its environment, devising evasion tactics on the fly, and holding sophisticated exchanges with its command server. In reality, the execution proved far more pedestrian.

The infostealer itself, it must be noted, works. It collects system information, steals cookies from browsers, compiles file listings, and packages everything for exfiltration to the command server. Sitting next to this functionality in the code is a set of functions meant to pass as AI-driven logic: generating evasion techniques, analyzing the environment, obfuscating data in transit, and composing plausible messages for command server traffic. These are the components the researchers examined in detail.

The set consists of four functions: GenerateEvasionTechnique(), AnalyzeTargetEnvironment(), GenerateObfuscatedCommunication(), and SendToC2ServerWithLLM(). According to Unit 42, none of them makes the sample demonstrably more dangerous or more complex. On the contrary, the superfluous queries to the model merely generate noise that could attract the attention of defenders. The calls are made, the responses arrive, yet the practical value is close to nil.
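For analysts, the four reported function names can serve as a quick triage indicator. The following is a minimal sketch, not Unit 42's tooling: it assumes you already have text extracted from a deobfuscated sample (decompiler output or a strings dump), and the helper name `find_llm_function_names` is hypothetical. Whether the names survive ConfuserEx 2 obfuscation in any given sample is also an assumption.

```python
import re

# The four function names reported in the analysis. Their presence in a
# particular sample after deobfuscation is an assumption, not a guarantee.
LLM_FUNCTION_NAMES = (
    "GenerateEvasionTechnique",
    "AnalyzeTargetEnvironment",
    "GenerateObfuscatedCommunication",
    "SendToC2ServerWithLLM",
)


def find_llm_function_names(text: str) -> list[str]:
    """Return the reported names that occur in `text` as whole identifiers."""
    found = []
    for name in LLM_FUNCTION_NAMES:
        # \b keeps us from matching the name inside a longer identifier.
        if re.search(rf"\b{re.escape(name)}\b", text):
            found.append(name)
    return found
```

A match is only a hint that the sample deserves a closer look; as the analysis shows, the presence of these functions says nothing about whether the model's output actually changes behavior.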

The first function, GenerateEvasionTechnique(), asks GPT-3.5 to name a simple evasion technique for an infostealer in no more than three words. Example responses include “Random Delay,” “Process Spoofing,” and “Memory Obfuscation.” If the API fails to respond, the sample defaults to “Random Delay.” Then comes the most glaringly performative part: the result influences absolutely nothing. The returned string is simply written, with a timestamp, to a victim_logs.txt file on the victim’s desktop. No executable implementation stands behind the claim.

It is here that the superficiality of the whole design is laid bare. For such a response to be useful, the author would have to write handlers in advance for every possible technique, or else request output from the model that could be turned into executable logic on the fly. Both paths are viable in theory. Yet in the samples found, neither was taken. The function merely creates the illusion that the malware is choosing its own method of evading detection.

The second function, AnalyzeTargetEnvironment(), looks slightly more practical, yet it changes just as little. The prompt sent to the model includes the operating system version, the architecture, and the username; GPT-3.5 is then asked to return an integer between 1,000 and 5,000, a delay in milliseconds. This response is actually used: the program sleeps for one to five seconds. If the model does not answer, a default delay of two seconds applies. Technically this is real behavior, but its value is doubtful. The researchers state plainly that such an integration yields no meaningful benefit and, if anything, exposes the author’s lack of practical experience in building stealthy tools.
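To see how little this function amounts to, it helps to reduce it to its arithmetic. The sketch below is an analyst's reconstruction in Python, not code from the sample (which is C#): it turns a model reply into a delay within the stated bounds and falls back to the stated default. The name `parse_delay_ms` is hypothetical, and the handling of out-of-range or non-numeric replies is an assumption, since the analysis only describes the fallback for a missing response.

```python
import re

MIN_DELAY_MS = 1000      # lower bound stated in the analysis
MAX_DELAY_MS = 5000      # upper bound stated in the analysis
DEFAULT_DELAY_MS = 2000  # fallback used when the model does not answer


def parse_delay_ms(reply):
    """Turn a model reply into a delay in milliseconds, or the default."""
    if reply is None:
        return DEFAULT_DELAY_MS
    match = re.search(r"\d+", reply)
    if match is None:
        return DEFAULT_DELAY_MS
    value = int(match.group())
    # Assumption: values outside the requested range fall back to the
    # default. The source does not say how the sample treats them.
    if MIN_DELAY_MS <= value <= MAX_DELAY_MS:
        return value
    return DEFAULT_DELAY_MS
```

Whatever the model says, the outcome is a pause of one to five seconds, which a fixed or random delay would achieve without any network call.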

The third function, GenerateObfuscatedCommunication(), asks the model to name a simple technique for obfuscating data in transit. The examples provided include “Base64 Encode,” “XOR Cipher,” and “JSON Minify.” After the model responds, the sample builds a simple data structure containing the name of the technique, an obfuscated timestamp, and a flag declaring that the communication has been “enhanced” by a large language model. At first glance, one might assume the suggested method is actually applied to the network traffic. On closer inspection of the code, however, the researchers found no implementation to back this up. The name of the technique is merely printed to the console and written to a JSON log, while the actual data transmission carries none of the promised obfuscation.

The generated logs are particularly revealing in their performative nature. The console fills with messages announcing a successful connection to the OpenAI API, the initialization of GPT-3.5-turbo, and the successful execution of evasion generation, environmental analysis, communication obfuscation, social engineering, dynamic adaptation, intelligent obfuscation, machine learning, and a long list of other supposedly active mechanisms. The final entry declares that the integration of the large language model is “fully operational.” Yet the overwhelming majority of these capabilities exist only in the text of the logs themselves. No executable code stands behind them.

The fourth function, SendToC2ServerWithLLM(), handles exfiltration of the stolen data to the command server. Here the sample asks the model to compose a short, professional-sounding message to make the network traffic look more legitimate. GPT-3.5 returns a bland, bureaucratic phrase, which the program places in the X-Message HTTP header, while also marking the request with the header X-LLM-Enhanced: true. The payload itself is sent as JSON to hxxp[:]//localhost:3002/crypto-data. The researchers note that even this command server address is a default placeholder, strongly suggesting that the sample was either used only in local testing or is an early build never prepared for real operations.
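The two custom headers make this traffic easy to spot rather than harder. A minimal detection sketch, assuming request headers are available as a plain dictionary from a proxy or capture tool; the function name `has_llm_marker_headers` is hypothetical, and requiring both headers together is a choice made here to reduce false positives, not something the analysis prescribes.

```python
def has_llm_marker_headers(headers):
    """Return True if a request carries both marker headers from the analysis.

    `headers` is assumed to be a dict of header name -> value for one request.
    """
    # HTTP header names are case-insensitive, so normalize before comparing.
    normalized = {name.strip().lower(): value.strip() for name, value in headers.items()}
    flagged = normalized.get("x-llm-enhanced", "").lower() == "true"
    has_message = "x-message" in normalized
    return flagged and has_message
```

For example, `has_llm_marker_headers({"X-LLM-Enhanced": "true", "X-Message": "Status update attached."})` evaluates to `True`, while a request carrying only ordinary headers does not match.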

This function comes closest to a real use of the model, because it results in an actual action rather than a mere log entry. If a working server address were substituted, the sample could successfully exfiltrate data. Even then, the involvement of artificial intelligence would add virtually nothing. The extra HTTP headers do not make the communication channel more effective, nor do they confer any tactical advantage. They merely advertise the fact that a model was used. The researchers therefore characterize this approach as theater staged around artificial intelligence rather than a genuinely useful engineering advance.

The purpose of this .NET infostealer is entirely conventional: to steal sensitive data from browsers and the host system and exfiltrate it to a command server. The layer tied to the large language model was apparently meant to handle dynamic adaptation, detection evasion, and more sophisticated communication with the command server. The analysis shows the opposite: nearly the entire AI-driven component is a decorative veneer that conjures an illusion of complexity while contributing no real functionality. Unit 42 allows that such samples were either written with the help of artificial intelligence or cobbled together by individuals without deep expertise, who simply wanted to graft a fashionable component onto a familiar design.

The second sample is far more interesting, because the model is used not for show but to decide something that matters: whether to run the payload. It is a Go-based loader acting as a dropper for Sliver, an open-source framework used for adversary emulation and red team operations. Before unpacking the payload, the program collects system information, including its own process name and that of its parent process. The dropper then decrypts the Donut shellcode, but deliberately pauses before executing it. Its first task is to determine whether the current environment is too risky.

To perform this check, the malware compiles a detailed system profile. It collects the hostname, a list of running processes, network information, details of connected USB storage devices, and the machine’s uptime. This data is then embedded in a prompt and sent to GPT-4 via HTTP API calls. The model is, in essence, entrusted with the decision: it must evaluate the collected data and return a verdict on whether it is safe to drop and run Sliver on this specific host.

Ordinarily, such environment checks in malware are far simpler and more rigid. The author hardcodes heuristics, compiles lists of forbidden and permitted indicators, and embeds the logic needed to identify sandboxes, virtual machines, analysis tools, and other signs of a hostile environment. This approach is familiar from ransomware and many other malware families. In this dropper, the logic is externalized: instead of relying on a fixed set of rules, the verdict is delegated to a language model. For defenders this is a troubling development, because it becomes much harder to work out what exactly the malware is trying to avoid and which combination of processes or settings will make it abort.
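Because the decision logic now lives on a remote model, the request itself becomes the observable artifact. One hedged way to hunt for it is to flag processes that contact LLM API hosts when they have no business doing so. The sketch below assumes connection telemetry is available as (process name, destination host) records; the `Connection` type, the function name, and the allowlist are hypothetical, and `api.openai.com` is OpenAI's public API host, which the analysis does not name explicitly.

```python
from dataclasses import dataclass

# OpenAI's public API host; other providers could be added to this set.
LLM_API_HOSTS = {"api.openai.com"}

# Hypothetical allowlist of processes expected to reach LLM APIs in this
# environment. A real deployment would maintain its own list.
EXPECTED_PROCESSES = {"chrome.exe", "msedge.exe", "firefox.exe"}


@dataclass(frozen=True)
class Connection:
    process_name: str
    dest_host: str


def unexpected_llm_callers(connections):
    """Return connections to LLM API hosts made by non-allowlisted processes."""
    hits = []
    for conn in connections:
        # Normalize case and strip a trailing dot from fully qualified names.
        host = conn.dest_host.lower().rstrip(".")
        if host in LLM_API_HOSTS and conn.process_name.lower() not in EXPECTED_PROCESSES:
            hits.append(conn)
    return hits
```

This does not reveal what the malware is checking for, but it narrows the search to processes worth examining, which is the part of the problem defenders can still control.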

It is this approach that the researchers find genuinely interesting. The model is hosted remotely, so claims of full autonomy remain premature. Yet the underlying idea is plain: the malware authors want not merely to receive text from a large language model, but to hand it the analysis of the environment and the final authorization to proceed. In the conventional approach, the same goal is achieved through hardcoded allow and deny lists. A model can weigh a far larger number of system indicators and may therefore reach a more precise verdict on whether the environment is suitable for infection. A plausible next step in this direction is the local deployment of a small language model or a compact machine-learning classifier, trained to tell benign environments from hostile ones based on a combination of host characteristics.

Against this backdrop, the current landscape of AI-augmented malware looks like a chaotic mix of experiments, early builds, and the first coherent attempts. The .NET infostealer exemplifies a superficial and ultimately futile use of a large language model. The Sliver dropper, by contrast, shows a far more substantive approach, in which the model takes part in the decision of whether to launch the attack at all. The researchers stress that, for now, it is impossible to prove whether generative models were used in writing these particular samples. The possibility alone is alarming, however, since artificial intelligence can sharply lower the barrier to entry for far less sophisticated attackers.

From here, according to Unit 42, the trajectory points toward growing complexity. As running small models locally becomes easier, the researchers expect malware with embedded AI capabilities to emerge. This applies above all to code generation and to more agile, dynamic adaptation to specific environments. Such malware could change its behavior faster, evade detection more effectively, and choose the most advantageous method for achieving its objectives in real time.

The growing integration of artificial intelligence into malware may show itself not only in more complex logic, but also in the faster development of new features and in greater operational resilience of the samples themselves. Put differently, the danger goes far beyond a fashionable buzzword attached to the description of an attack. The real threat is that attackers' toolkits may soon iterate faster, run more reliably, and adapt more precisely to the specific defenses of their targets. That is why these early experiments demand unblinking scrutiny: even the failed attempts are harbingers of the direction the next wave of digital threats will take.
