gtirb: Intermediate Representation for Binary analysis and transformation
GTIRB
The GrammaTech Intermediate Representation for Binaries (GTIRB) is a data structure for machine-code analysis and rewriting. It is intended to facilitate the communication of binary IR between programs performing binary disassembly, analysis, transformation, and pretty-printing. It is modeled on LLVM-IR and seeks to serve a similar role: encouraging communication and interoperability between tools.
Structure
GTIRB has the following structure:
IR
An instance of GTIRB may include multiple modules (Module) which represent loadable objects such as executables or libraries. Each module holds information such as symbols (Symbol), data (DataObject), and an inter-procedural control flow graph (CFG). The CFG consists of basic blocks (Block) and control flow edges between these blocks. Each datum and each block holds a range referring to the bytes in the ImageByteMap. Each symbol holds a pointer to the block or datum it references.
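The containment relationships described above can be sketched with plain Python dataclasses. This is only an illustrative model, not GTIRB's actual API: the class names mirror the terms used in this section, but every field name, the `read` helper, and the example addresses are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class ImageByteMap:
    """Hypothetical stand-in for the bytes of a loaded image."""
    base: int    # address of the first byte
    data: bytes

    def read(self, address: int, size: int) -> bytes:
        start = address - self.base
        return self.data[start:start + size]


@dataclass
class Block:
    address: int  # blocks hold only a byte range, not instructions
    size: int


@dataclass
class DataObject:
    address: int
    size: int


@dataclass
class Symbol:
    name: str
    referent: Optional[Union[Block, DataObject]] = None


@dataclass
class CFG:
    blocks: List[Block] = field(default_factory=list)
    edges: List[Tuple[Block, Block]] = field(default_factory=list)


@dataclass
class Module:
    name: str
    image: ImageByteMap
    symbols: List[Symbol] = field(default_factory=list)
    data: List[DataObject] = field(default_factory=list)
    cfg: CFG = field(default_factory=CFG)


@dataclass
class IR:
    modules: List[Module] = field(default_factory=list)


# Assumed example image: five bytes of x86-64 code followed by a 3-byte datum.
image = ImageByteMap(base=0x1000, data=bytes.fromhex("554889e5c3") + b"hi\x00")
entry = Block(address=0x1000, size=4)
exit_block = Block(address=0x1004, size=1)
greeting = DataObject(address=0x1005, size=3)

module = Module(
    name="example",
    image=image,
    symbols=[Symbol("main", entry), Symbol("greeting", greeting)],
    data=[greeting],
    cfg=CFG(blocks=[entry, exit_block], edges=[(entry, exit_block)]),
)
ir = IR(modules=[module])
```

In this sketch a block or datum never copies bytes; it is resolved against the image on demand, for example `image.read(entry.address, entry.size)`.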
Instructions
GTIRB explicitly does NOT represent instructions or instruction semantics but does provide symbolic operand information and access to the bytes. There are many intermediate languages (ILs) for the representation of instruction semantics (e.g., BAP’s BIL, Angr’s Vex, or Ghidra’s P-code). GTIRB works with these or any other IL by storing instructions generally and efficiently as raw machine-code bytes and separately storing the symbolic and control flow information. The popular Capstone/Keystone decoder/encoder provides an excellent option to read and write instructions from/to GTIRB’s machine-code byte representation without committing to any particular semantic IL. By supporting multiple ILs and storing analysis results separately in auxiliary data tables, GTIRB enables collaboration between independent binary analysis and rewriting teams and tools.
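As a rough illustration of decoding on demand, the sketch below keeps a block as raw bytes and only invokes Capstone's Python bindings when instructions are needed. `ByteBlock` and `decode` are hypothetical names invented for this example and are not part of GTIRB. The example assumes the `capstone` Python package is installed and that the bytes are x86-64 code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ByteBlock:
    """Hypothetical block: an address plus raw machine-code bytes."""
    address: int
    raw: bytes


def decode(block: ByteBlock) -> List[Tuple[int, str, str]]:
    """Decode a block into (address, mnemonic, operands) tuples.

    Capstone is imported here rather than at module level so that the
    storage model itself carries no dependency on any decoder.
    """
    from capstone import Cs, CS_ARCH_X86, CS_MODE_64

    md = Cs(CS_ARCH_X86, CS_MODE_64)
    return [
        (insn.address, insn.mnemonic, insn.op_str)
        for insn in md.disasm(block.raw, block.address)
    ]


# Bytes for: push rbp; mov rbp, rsp; ret
block = ByteBlock(address=0x1000, raw=bytes.fromhex("554889e5c3"))
```

Calling `decode(block)` yields one tuple per instruction. A different consumer could hand the same `raw` bytes to another lifter without touching the stored representation.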
Auxiliary Data
GTIRB provides for the sharing of additional information, e.g. analysis results, in the form of AuxData objects. These can store maps and vectors of basic GTIRB types in a portable way. This repository will describe the anticipated structure for very common types of auxiliary data such as function boundary information, type information, or results of common analyses.
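To make the idea of a portable table concrete, here is a minimal sketch of an auxiliary table that maps a function's UUID to the UUIDs of its blocks and round-trips it through text. The table layout and the `dump_table` and `load_table` helpers are assumptions for illustration. JSON is used purely as a stand-in for a portable encoding and is not claimed to be GTIRB's serialization format.

```python
import json
import uuid
from typing import Dict, List

Table = Dict[uuid.UUID, List[uuid.UUID]]


def dump_table(table: Table) -> str:
    """Encode a UUID -> [UUID] map as text; UUIDs become strings."""
    return json.dumps(
        {str(key): [str(v) for v in values] for key, values in table.items()},
        sort_keys=True,
    )


def load_table(text: str) -> Table:
    """Decode text produced by dump_table back into UUID objects."""
    return {
        uuid.UUID(key): [uuid.UUID(v) for v in values]
        for key, values in json.loads(text).items()
    }


# Hypothetical analysis result: one function made up of two blocks.
function_id = uuid.uuid4()
block_ids = [uuid.uuid4(), uuid.uuid4()]
function_blocks: Table = {function_id: block_ids}

encoded = dump_table(function_blocks)
restored = load_table(encoded)
```

Because the table contains only maps, vectors, and identifiers, a tool that did not produce it can still read it back without knowing anything about the producing analysis.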
UUIDs
Every element of GTIRB (namely: modules (Module), symbols (Symbol), blocks (Block), and instructions (InstructionRef)) has a universally unique identifier (UUID). UUIDs allow both first-class IR components and AuxData tables to reference elements of the IR.
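One way to picture UUID-based referencing is a single index from UUID to element, which both IR components and side tables resolve against. The sketch below is an assumption-laden toy: the `index` dictionary, the `register` helper, and the field names are invented for illustration and do not reflect GTIRB's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict
from uuid import UUID, uuid4


@dataclass
class Block:
    address: int
    uuid: UUID = field(default_factory=uuid4)  # assigned at creation


@dataclass
class Symbol:
    name: str
    referent: UUID  # refers to another element by UUID
    uuid: UUID = field(default_factory=uuid4)


# Hypothetical global index from UUID to IR element.
index: Dict[UUID, object] = {}


def register(node):
    """Record a node in the index and return it for convenience."""
    index[node.uuid] = node
    return node


entry = register(Block(address=0x1000))
main = register(Symbol(name="main", referent=entry.uuid))

# A side table in the style of auxiliary data, keyed by UUID.
comments: Dict[UUID, str] = {entry.uuid: "function entry"}

# Both kinds of reference resolve through the same index.
resolved = index[main.referent]
```

Keying everything by UUID means a side table stays valid even if it is stored or transmitted separately from the IR it annotates.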
Install && Use
Copyright (c) 2018 GrammaTech, Inc.