MEGR-APT: A Memory-Efficient APT Hunting System
MEGR-APT is a scalable APT hunting system that discovers suspicious subgraphs matching attack scenarios (query graphs) published in Cyber Threat Intelligence (CTI) reports. MEGR-APT hunts APTs in a two-stage process: (i) memory-efficient suspicious subgraph extraction, and (ii) fast subgraph matching based on graph neural networks (GNNs) and attack representation learning.
MEGR-APT System Architecture
MEGR-APT RDF Provenance Graph Construction
The first step in MEGR-APT is to construct provenance graphs in an RDF graph engine.
- Use `construct_pg_cadets.py` to query kernel audit logs from a structured database (Postgres) and construct a provenance graph in NetworkX format.
- Use `construct_rdf_graph_cadets.py` to construct RDF-based provenance graphs and store them in an RDF graph engine (Stardog).
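The two construction steps above can be sketched as a small shell script. Script names are taken from this README; the dry-run wrapper and the absence of arguments are assumptions, so consult each script's own documentation for its actual parameters:

```shell
#!/usr/bin/env bash
# Dry-run sketch of the CADETS provenance-graph construction stage.
# Script names come from this README; invocation details (arguments,
# environment) are assumptions. Commands are echoed, not executed,
# so the sketch is safe to run as-is.
set -euo pipefail

run() { echo "+ $*"; }   # swap 'echo' for real execution when ready

# 1. Query kernel audit logs from Postgres and build a NetworkX provenance graph
run python construct_pg_cadets.py

# 2. Convert the graph to RDF and load it into the Stardog graph engine
run python construct_rdf_graph_cadets.py
```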
MEGR-APT Hunting Pipeline
The MEGR-APT hunting pipeline consists of two steps:
- Use `extract_rdf_subgraphs_cadets.py` to extract suspicious subgraphs based on the IOCs of the given attack query graphs.
- Run `main.py` to find matches between suspicious subgraphs and attack query graphs using pre-trained GNN models. (The script has to be run with the same parameters as the trained model; check the GNN matching documentation for more details.)
The full hunting pipeline can be run with the `run-megrapt-on-a-query-graph.sh` bash script, which searches a provenance graph for a specific query graph. For evaluation, `run-megrapt-per-host-for-evaluation.sh` can be used. Use the `Investigation_Reports.ipynb` Jupyter notebook to investigate detected subgraphs and produce a report for the human analyst.
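As a rough sketch, the hunting steps above might be chained as follows. Script names come from this README; the dry-run wrapper and any implied arguments are assumptions, so check the GNN matching documentation for the real parameters:

```shell
#!/usr/bin/env bash
# Dry-run sketch of the MEGR-APT hunting pipeline on the CADETS dataset.
# Script names come from this README; invocation details are assumptions.
# Commands are echoed, not executed, so the sketch is safe to run as-is.
set -euo pipefail

run() { echo "+ $*"; }   # swap 'echo' for real execution when ready

# 1. Extract suspicious subgraphs around the attack query graph's IOCs
run python extract_rdf_subgraphs_cadets.py

# 2. Match suspicious subgraphs against the query graph with a pre-trained GNN
#    (must be invoked with the same parameters the model was trained with)
run python main.py

# Equivalent one-shot entry points provided by the repository:
run bash run-megrapt-on-a-query-graph.sh          # hunt one query graph
run bash run-megrapt-per-host-for-evaluation.sh   # per-host evaluation
```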
MEGR-APT Training Pipeline
To train a GNN graph matching model for MEGR-APT, configure the training/testing details in the `get_training_testing_sets()` function in the `dataset_config.py` file, then take the following training steps:
- Use `extract_rdf_subgraphs_[dataset].py` with the `--training` argument to extract a training/testing set of random benign subgraphs.
- Use `compute_ged_for_training.py` to compute GED for the training set. (This step is computationally expensive and takes a long time; however, it runs in parallel using multiple cores.)
- Run `main.py` with the selected model training parameters as arguments (see the GNN matching documentation for more details). The training pipeline can be run using the `train_megrapt_model.sh` bash script.
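The training steps above can be sketched as a shell script, using the CADETS variant of the extraction script as an example. Script names and the `--training` flag come from this README; everything else (ordering assumptions, absence of other flags) is illustrative:

```shell
#!/usr/bin/env bash
# Dry-run sketch of the MEGR-APT training pipeline.
# Script names and --training come from this README; other invocation
# details are assumptions. Commands are echoed, not executed.
set -euo pipefail

run() { echo "+ $*"; }   # swap 'echo' for real execution when ready

# 0. Edit get_training_testing_sets() in dataset_config.py first.

# 1. Extract random benign subgraphs as the training/testing set
run python extract_rdf_subgraphs_cadets.py --training

# 2. Compute graph edit distance (GED) labels for the training set
#    (computationally expensive, but parallelized across cores)
run python compute_ged_for_training.py

# 3. Train the GNN matching model with the chosen training parameters
run python main.py

# Or use the packaged training script:
run bash train_megrapt_model.sh
```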