The Ghost in the Kernel: Inside KittyLoader’s Elite Anti-Analysis Arsenal
KittyLoader is a highly evasive loader written in C / Assembly.
Features
- Hijacks early execution by replacing the C runtime entrypoint (__scrt_common_main_seh) with custom assembly.
- Hides its own module by walking the PEB->Ldr lists and unlinking its module entry (LDR_DATA_TABLE_ENTRY) from:
  - InLoadOrderModuleList
  - InInitializationOrderModuleList
  - InMemoryOrderModuleList
- Deploys a wide variety of anti-analysis techniques, including:
  - Multilayer scoring (debugger, sandbox/resources, API integrity/hook checks, human-input entropy, contextual cues such as domain and time of day), combined into a weighted overall confidence score that is continuously re-evaluated.
  - Picks an operational state (ranging from full to halted) and throttles or pauses with jittered, CPU-cycle-based delays in a loop that keeps reassessing the environment.
  - API integrity and inline-hook heuristics plus light tamper probes; human-interaction entropy sampling; randomized yet precise timing jitter to throw off debuggers.
  - Adds controlled noise (junk calculations and jittered delays) and spreads logic across multiple signals, reducing single-indicator detection.
- The embedded payload is encrypted at rest, with the key and nonce derived at runtime from entropy sources: PID, TID, QPC, memory load, CPU info (CPUID), tick count.
- The preferred algorithm is ChaCha20, with a fallback to RC4 if it fails. Decryption occurs in place after the encrypted blob is copied into memory.
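- APIs are first resolved via tprtdll.dll, a relatively recent technique. The loader calls GetModuleHandleW(L"tprtdll.dll") and uses DONT_RESOLVE_DLL_REFERENCES to minimize its operational footprint. Note that DONT_RESOLVE_DLL_REFERENCES is a LoadLibraryExW flag; GetModuleHandleW itself takes no flags, so the flag can only apply to a load of the module.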
- Uses runtime randomness (PID, TID, GetTickCount, __rdtsc and more) to vary scan starts, delays, and sizes, which reduces deterministic patterns and signature matches and hinders static analysis.
- Searches for candidate memory regions (RX/RWX/RW, non-guarded), gates them behind an additional is_region_safe() heuristic, and does the following:
  - Resolves sensitive APIs via stealthy, hash-based lookups instead of plain export walking, which shrinks the observable footprint and evades basic hooks.
  - Loads libraries with quiet flags (DONT_RESOLVE_DLL_REFERENCES, LOAD_LIBRARY_SEARCH_SYSTEM32), which minimizes I/O usage and loader footprint.
- Writes and decrypts the payload in scattered, variably sized chunks with micro-jitter, which disrupts linear memory-write/decrypt heuristics and timing correlations.
  - Performs staged protection flips (RW → RWX → RX) and later restores RW before wipe/free, which mimics legitimate behavior and reduces post-execution forensic traces.
- Executes the payload via LdrCallEnclave. This function is normally intended for SGX/VBS enclaves; instead of jumping into a secure enclave, the loader jumps to an arbitrary function pointer in normal (VTL0) user memory. The latest version adds timing camouflage and a plausible execution context.
- Cleans up carefully (I-cache flush, SecureZeroMemory, free) with randomized post-execution timing, which limits residue and timeline clustering.
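The list above names ChaCha20 as the preferred cipher. For readers who want to recognize the cipher in a disassembly, or check an implementation against the standard, here is a minimal sketch of the ChaCha quarter-round as specified in RFC 8439, section 2.1. It is the standard construction, not KittyLoader's code, and the function names (rotl32, chacha_quarter_round, chacha_quarter_round_selftest) are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* ChaCha quarter-round on four state words (RFC 8439, section 2.1).
 * The rotation constants 16, 12, 8, 7 are a recognizable fingerprint
 * of the cipher in compiled code. */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
                                 uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* Returns 1 if the quarter-round reproduces the RFC 8439
 * section 2.1.1 test vector, 0 otherwise. */
static int chacha_quarter_round_selftest(void)
{
    uint32_t a = 0x11111111u, b = 0x01020304u;
    uint32_t c = 0x9b8d6f43u, d = 0x01234567u;

    chacha_quarter_round(&a, &b, &c, &d);

    return a == 0xea2a92f4u && b == 0xcb1cf8ceu &&
           c == 0x4581472eu && d == 0x5881c4bbu;
}
```

A quarter-round that fails the RFC vector usually points to a wrong rotation amount or a swapped operand. That is a cheap first check before looking at the block function or counter handling.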
To fix in the future: the ChaCha20 implementation bug, prologue/epilogue issues in the entry-hijack assembly, control-flow and compile hazards, and quieter anti-analysis checks. Also add fallbacks for when LdrCallEnclave fails, and implement more stable key derivation. Or just fix it yourself!