BinPool: Unlock Deeper Vulnerability Discovery in Binaries with This New Dataset
BinPool is a dataset consisting of vulnerable and patched binaries derived from historical Debian packages, compiled using four different optimization levels. It can be used for vulnerability discovery tasks through various methods, including machine learning and static analysis.
Features
BinPool provides the following features:
- Covers 603 unique CVEs spanning 89 CWEs.
- Includes the fix version of the corresponding Debian package for each CVE.
- Covers various programming languages (C, C++, Java, Python, PHP).
- Provides the names of the functions and modules that appear in both the patch and the corresponding binaries.
Measurement | Value |
---|---|
Number of Unique CVEs | 603 |
Number of CWEs | 89 |
Number of Debian Files | 824 |
Total Number of Binaries | 6144 |
Number of Debian Packages | 162 |
Number of Source Modules | 768 |
Number of Source Functions | 910 |
Number of Binary Functions | 7280 |
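The binary counts in the table line up exactly with the source counts multiplied by 8. One plausible reading is that each source module and function is built at the four optimization levels in two versions, vulnerable and patched. The two-versions factor is our assumption and is not stated explicitly above. The short check below uses only the figures from the table:

```python
# Figures copied from the BinPool summary table above.
stats = {
    "source_modules": 768,
    "source_functions": 910,
    "binaries": 6144,
    "binary_functions": 7280,
}

OPT_LEVELS = 4  # four optimization levels, per the dataset description
VERSIONS = 2    # assumption: one vulnerable and one patched build per item

builds_per_item = OPT_LEVELS * VERSIONS  # 8 builds per source item

# Each product matches the corresponding binary-level count in the table.
print(stats["source_modules"] * builds_per_item)    # 6144
print(stats["source_functions"] * builds_per_item)  # 7280
```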
Below is a list of the most frequent CWEs in BinPool:
CWE | CWE Name | Count |
---|---|---|
CWE-787 | Out-of-bounds Write | 71 |
CWE-476 | NULL Pointer Dereference | 61 |
CWE-125 | Out-of-bounds Read | 54 |
CWE-190 | Integer Overflow or Wraparound | 34 |
CWE-20 | Improper Input Validation | 28 |
CWE-416 | Use After Free | 27 |
CWE-400 | Uncontrolled Resource Consumption | 20 |
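To put these counts in context, the sketch below estimates how much of the dataset the listed CWEs cover. It assumes that the Count column gives the number of CVEs per CWE. If some CVEs are tagged with more than one CWE, the resulting share is an upper bound rather than an exact figure.

```python
# CWE counts copied from the table above.
# Assumption: each count is the number of CVEs tagged with that CWE.
top_cwes = {
    "CWE-787": 71,  # Out-of-bounds Write
    "CWE-476": 61,  # NULL Pointer Dereference
    "CWE-125": 54,  # Out-of-bounds Read
    "CWE-190": 34,  # Integer Overflow or Wraparound
    "CWE-20": 28,   # Improper Input Validation
    "CWE-416": 27,  # Use After Free
    "CWE-400": 20,  # Uncontrolled Resource Consumption
}
TOTAL_CVES = 603  # number of unique CVEs in BinPool

covered = sum(top_cwes.values())
share = covered / TOTAL_CVES  # upper bound if CVEs can carry several CWEs
print(covered, f"{share:.1%}")  # 295 48.9%
```

Under these assumptions, the seven most frequent CWEs account for roughly half of the CVEs in BinPool.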
You can download the dataset from Zenodo.
After downloading the data, the structure will be as follows: