MalTerminal: The First Malware to Use GPT-4 for On-the-Fly Code Generation
SentinelLABS researchers have uncovered what they describe as the earliest known sample of malware embedding LLM capabilities — a specimen dubbed MalTerminal. Presented at LABScon 2025, their report catalogs a collection of artifacts: a Windows binary, several Python scripts, and auxiliary tools that together demonstrate how a GPT-4 model was harnessed at runtime to dynamically generate code for a ransomware module or a reverse shell.
The sample contains an OpenAI Chat Completions API endpoint that was deprecated in early November 2023, which suggests MalTerminal was written before that date and makes it a plausible candidate for the earliest known LLM-enabled malware. Unlike conventional malware, part of MalTerminal’s logic is produced on the fly via queries to GPT-4: the operator selects a mode, “ransomware” or “reverse shell”, and the model emits the corresponding code. Accompanying artifacts include scripts that replicate the binary’s behavior and a defensive scanner that uses the model to assess suspicious Python files and generate reports, an illustration of how LLMs can serve both offensive and defensive purposes.
The authors also propose a hunting methodology for LLM-powered threats that exploits the unavoidable fingerprints of model integration: embedded API keys and hardcoded prompts. By cataloging key patterns and provider-specific prefixes (for example, sk-ant-api03 in Anthropic keys, or distinctive fragments in OpenAI keys), they derived rules for large-scale retrohunting. A year-long VirusTotal retrohunt surfaced thousands of files containing keys, a mix of innocuous developer leaks and samples with clear signs of abuse. In parallel, prompt hunting (extracting textual prompts from binaries and scripts and screening them for malicious intent with lightweight LLM classifiers) proved effective at exposing previously unseen tools.
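The key-hunting idea can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' actual rules: the sk-ant-api03 prefix comes from the report as summarized above, while the use of the substring T3BlbkFJ (the Base64 encoding of "OpenAI") as the OpenAI fragment, the length thresholds, and the function names find_embedded_keys and scan_file are assumptions made for this example.

```python
import re
from pathlib import Path

# Illustrative patterns only; production hunting rules would be tuned per provider.
KEY_PATTERNS = {
    # "sk-ant-api03" is the provider prefix named in the article.
    "anthropic": re.compile(rb"sk-ant-api03-[A-Za-z0-9_\-]{20,}"),
    # Assumption: "T3BlbkFJ" (Base64 of "OpenAI") stands in for the
    # "distinctive OpenAI fragments" the article mentions.
    "openai": re.compile(rb"sk-[A-Za-z0-9_\-]{10,}T3BlbkFJ[A-Za-z0-9_\-]{10,}"),
}


def find_embedded_keys(data: bytes) -> list[tuple[str, str]]:
    """Return (provider, redacted_key) pairs for key-like strings in raw bytes."""
    hits = []
    for provider, pattern in KEY_PATTERNS.items():
        for match in pattern.finditer(data):
            token = match.group().decode("ascii")
            # Keep only a short prefix so the report never stores a usable key.
            hits.append((provider, token[:16] + "..."))
    return hits


def scan_file(path: str) -> list[tuple[str, str]]:
    """Scan one file (binary or script) for embedded API keys."""
    return find_embedded_keys(Path(path).read_bytes())
```

A hit on its own says little, since the report notes that many files with keys are ordinary developer leaks; a match is a reason to triage the file, not a verdict.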
The research highlights a paradox: outsourcing logic to an external model grants attackers flexibility and adaptability, yet it also introduces brittle dependencies — without a valid API key or preserved prompts, functionality collapses. This fragility makes “prompts-as-code” and embedded keys promising detection vectors, especially in the early stages of this threat’s evolution. To date, there is no incontrovertible evidence of widespread MalTerminal deployment in the wild; it may instead represent a proof of concept or a red-team toolkit — but the tactic itself reframes how defenders should think about signatures, network telemetry, and attribution.
SentinelLABS urges closer scrutiny during application analysis and retrospective repository triage: beyond bytecode and string signatures, defenders should now hunt for textual prompts, message structures, and artifacts of cloud model access, since these are the places where LLM-enabled malware is likely to leave traces.
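As a rough sketch of what hunting for prompts and message structures could look like, the Python below pulls printable strings out of a file's bytes and flags those that resemble chat-message JSON, prompt phrasing, or model API hostnames. The report describes screening extracted prompts with lightweight LLM classifiers; the keyword heuristics here are a simplified stand-in for that step, and every indicator pattern, threshold, and function name is an assumption chosen for illustration.

```python
import re

# Runs of at least 12 printable ASCII bytes, similar to a `strings`-style pass.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{12,}")

# Hypothetical indicators; a real pipeline would hand candidates to a classifier.
INDICATORS = {
    "chat_message_structure": re.compile(r'"role"\s*:\s*"(system|user|assistant)"'),
    "prompt_phrasing": re.compile(r"\b(you are an?|act as an?|respond only with)\b", re.I),
    "model_api_host": re.compile(r"api\.(openai|anthropic)\.com", re.I),
}


def extract_strings(data: bytes) -> list[str]:
    """Return printable ASCII strings found in raw bytes."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def llm_artifacts(data: bytes) -> dict[str, list[str]]:
    """Group extracted strings by the LLM-integration indicator they match."""
    findings: dict[str, list[str]] = {}
    for text in extract_strings(data):
        for label, pattern in INDICATORS.items():
            if pattern.search(text):
                findings.setdefault(label, []).append(text)
    return findings
```

Strings flagged this way are only candidates: legitimate applications embed prompts and API hostnames too, so the output is best treated as input to further review rather than as a detection in itself.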