GuardDog is a CLI tool for identifying malicious PyPI and npm packages, Go modules, GitHub Actions, and VSCode extensions. It runs a set of heuristics on the package source code (through Semgrep rules) and on the package metadata.
GuardDog can scan local or remote PyPI and npm packages, Go modules, GitHub Actions, or VSCode extensions using any of the available heuristics.
It downloads and scans code from:
- npm: Packages hosted on npmjs.org
- PyPI: Source (tar.gz) packages hosted on PyPI.org
- Go: Go source files of repositories hosted on GitHub.com
- GitHub Actions: JavaScript source files of repositories hosted on GitHub.com
- VSCode Extensions: Extension (.vsix) packages hosted on marketplace.visualstudio.com
Heuristics
GuardDog comes with 2 types of heuristics:
- Source code heuristics: Semgrep rules running against the package source code.
- Package metadata heuristics: Python or JavaScript heuristics running against the package metadata on PyPI or npm.
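To make the second category concrete, a metadata heuristic can be pictured as a small predicate over a package's registry metadata. The following is a minimal sketch, not GuardDog's implementation: the `metadata` dictionary shape and the function names are assumptions made for illustration, and the two checks mirror the `empty_information` and `release_zero` descriptions listed in the tables that follow.

```python
def empty_information(metadata: dict) -> bool:
    """Flag packages whose description field is missing or blank."""
    description = metadata.get("description") or ""
    return description.strip() == ""


def release_zero(metadata: dict) -> bool:
    """Flag packages whose release version is exactly 0.0 or 0.0.0."""
    return metadata.get("version") in {"0.0", "0.0.0"}


def run_metadata_heuristics(metadata: dict) -> list[str]:
    """Return the names of all illustrative heuristics that fire."""
    # Hypothetical registry of checks; insertion order is preserved.
    checks = {
        "empty_information": empty_information,
        "release_zero": release_zero,
    }
    return [name for name, check in checks.items() if check(metadata)]
```

Calling `run_metadata_heuristics` on a metadata dictionary returns the list of heuristic names that matched, which keeps each check independent and easy to enable or disable.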
PyPI
Source code heuristics:
| Heuristic | Description |
|---|---|
| shady-links | Identify when a package contains a URL to a domain with a suspicious extension |
| obfuscation | Identify when a package uses a common obfuscation method often used by malware |
| clipboard-access | Identify when a package reads from or writes to the clipboard |
| exfiltrate-sensitive-data | Identify when a package reads and exfiltrates sensitive data from the local system |
| download-executable | Identify when a package downloads and makes executable a remote binary |
| exec-base64 | Identify when a package dynamically executes base64-encoded code |
| silent-process-execution | Identify when a package silently executes an executable |
| dll-hijacking | Identify when a malicious package manipulates a trusted application into loading a malicious DLL |
| steganography | Identify when a package retrieves hidden data from an image and executes it |
| code-execution | Identify when an OS command is executed in the setup.py file |
| cmd-overwrite | Identify when the ‘install’ command is overwritten in setup.py, indicating a piece of code automatically running when the package is installed |
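As an illustration of what a source code heuristic looks for, the idea behind `shady-links` can be approximated in a few lines of Python. This is only a sketch under stated assumptions: GuardDog expresses its source code heuristics as Semgrep rules, whereas the code below uses a plain regular expression, and the `SUSPICIOUS_TLDS` set is a hypothetical list chosen for the example rather than the one the tool uses.

```python
import re
from urllib.parse import urlparse

# Hypothetical set of "suspicious" domain extensions, for illustration only.
SUSPICIOUS_TLDS = {"xyz", "top", "tk", "ml"}

# Match http(s) URLs up to the next whitespace, quote, bracket, or parenthesis.
URL_PATTERN = re.compile(r"""https?://[^\s'"<>)]+""")


def find_shady_links(source: str) -> list[str]:
    """Return URLs in `source` whose host ends in a suspicious extension."""
    hits = []
    for url in URL_PATTERN.findall(source):
        host = urlparse(url).hostname or ""
        tld = host.rsplit(".", 1)[-1].lower()
        if tld in SUSPICIOUS_TLDS:
            hits.append(url)
    return hits
```

A regular expression has no notion of code structure, so it will also match URLs in comments and docstrings; that is one reason a rule engine that understands syntax is a better fit for the real heuristics.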
Metadata heuristics:
| Heuristic | Description |
|---|---|
| empty_information | Identify packages with an empty description field |
| release_zero | Identify packages with a release version that’s 0.0 or 0.0.0 |
| typosquatting | Identify packages whose names closely resemble that of a highly popular package |
| potentially_compromised_email_domain | Identify when a package maintainer e-mail domain (and therefore package manager account) might have been compromised |
| unclaimed_maintainer_email_domain | Identify when a package maintainer e-mail domain (and therefore package manager account) is unclaimed and can be registered by an attacker |
| repository_integrity_mismatch | Identify packages with a linked GitHub repository where the package has extra unexpected files |
| single_python_file | Identify packages that have only a single Python file |
| bundled_binary | Identify packages bundling binaries |
| deceptive_author | This heuristic detects when an author is using a disposable email |
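One common way to approximate a typosquatting check is to compare a package name against a list of popular names using edit distance. The sketch below assumes that approach for illustration; it is not taken from GuardDog, and `POPULAR_PACKAGES`, `possible_typosquat_targets`, and the `max_distance` threshold are all hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


# Hypothetical short list; a real check would use a much larger one.
POPULAR_PACKAGES = ["requests", "numpy", "urllib3"]


def possible_typosquat_targets(name, popular=POPULAR_PACKAGES, max_distance=1):
    """Return popular packages that `name` is suspiciously close to."""
    name = name.lower()
    return [
        candidate
        for candidate in popular
        if candidate != name and edit_distance(name, candidate) <= max_distance
    ]
```

An exact match is excluded on purpose, so the popular package itself is never reported as a typosquat of its own name.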
npm
Source code heuristics:
| Heuristic | Description |
|---|---|
| npm-serialize-environment | Identify when a package serializes ‘process.env’ to exfiltrate environment variables |
| npm-obfuscation | Identify when a package uses a common obfuscation method often used by malware |
| npm-silent-process-execution | Identify when a package silently executes an executable |
| shady-links | Identify when a package contains a URL to a domain with a suspicious extension |
| npm-exec-base64 | Identify when a package dynamically executes code through ‘eval’ |
| npm-install-script | Identify when a package has a pre-install or post-install script automatically running commands |
| npm-steganography | Identify when a package retrieves hidden data from an image and executes it |
| npm-dll-hijacking | Identify when a malicious package manipulates a trusted application into loading a malicious DLL |
| npm-exfiltrate-sensitive-data | Identify when a package reads and exfiltrates sensitive data from the local system |
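The `npm-install-script` idea is simple enough to sketch directly: npm runs the `preinstall`, `install`, and `postinstall` entries of a manifest's `scripts` field automatically during installation, so their presence deserves a look. The following is an illustrative approximation in Python, not GuardDog's rule; the function name `find_install_scripts` is hypothetical.

```python
import json

# npm lifecycle hooks that run automatically when a package is installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_scripts(package_json_text: str) -> dict[str, str]:
    """Return the install-time hooks declared in a package.json document."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}
```

An install hook is not malicious by itself, since many legitimate packages compile native code this way, so a match is a prompt for review rather than a verdict.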
Metadata heuristics:
| Heuristic | Description |
|---|---|
| empty_information | Identify packages with an empty description field |
| release_zero | Identify packages with a release version that’s 0.0 or 0.0.0 |
| potentially_compromised_email_domain | Identify when a package maintainer e-mail domain (and therefore package manager account) might have been compromised. Note that npm’s API may not provide accurate information regarding the maintainer’s email, so this detector may cause false positives for npm packages. See https://www.theregister.com/2022/05/10/security_npm_email/ |
| unclaimed_maintainer_email_domain | Identify when a package maintainer e-mail domain (and therefore npm account) is unclaimed and can be registered by an attacker. Note that npm’s API may not provide accurate information regarding the maintainer’s email, so this detector may cause false positives for npm packages. See https://www.theregister.com/2022/05/10/security_npm_email/ |
| typosquatting | Identify packages whose names closely resemble that of a highly popular package |
| direct_url_dependency | Identify packages with direct URL dependencies. Dependencies fetched this way are not immutable and can be used to inject untrusted code or reduce the likelihood of a reproducible install. |
| npm_metadata_mismatch | Identify packages which have mismatches between the npm package manifest and the package info for some critical fields |
| bundled_binary | Identify packages bundling binaries |
| deceptive_author | This heuristic detects when an author is using a disposable email |
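The `direct_url_dependency` check can be pictured as a scan of the dependency maps in `package.json` for version specifiers that are URLs rather than registry version ranges. The sketch below is an assumption-laden illustration, not GuardDog's code: the `URL_PREFIXES` tuple, the set of fields inspected, and the function name are all chosen for the example.

```python
import json

# Hypothetical prefixes treated as "direct URL" specifiers in this sketch.
URL_PREFIXES = ("http://", "https://", "git://", "git+", "github:")
DEPENDENCY_FIELDS = ("dependencies", "devDependencies", "optionalDependencies")


def find_direct_url_dependencies(package_json_text: str) -> dict[str, str]:
    """Return dependencies whose specifier points at a URL, not a version."""
    manifest = json.loads(package_json_text)
    flagged = {}
    for field in DEPENDENCY_FIELDS:
        for name, spec in (manifest.get(field) or {}).items():
            if isinstance(spec, str) and spec.startswith(URL_PREFIXES):
                flagged[name] = spec
    return flagged
```

Because the content behind a URL can change after a package is published, every entry this returns is a dependency that may resolve to different code on each install.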
Go
Source code heuristics:
| Heuristic | Description |
|---|---|
| shady-links | Identify when a package contains a URL to a domain with a suspicious extension |
| go-exec-base64 | Identify Base64-decoded content being passed to execution functions in Go |
| go-exfiltrate-sensitive-data | Identify when a package reads and exfiltrates sensitive data from the local system |
| go-exec-download | Identify when a package downloads a remote binary, sets executable permissions on it, and executes it |
Metadata heuristics:
| Heuristic | Description |
|---|---|
| typosquatting | Identify packages whose names closely resemble that of a highly popular package |
GitHub Action
Source code heuristics:
| Heuristic | Description |
|---|---|
| npm-serialize-environment | Identify when a package serializes ‘process.env’ to exfiltrate environment variables |
| npm-obfuscation | Identify when a package uses a common obfuscation method often used by malware |
| npm-silent-process-execution | Identify when a package silently executes an executable |
| shady-links | Identify when a package contains a URL to a domain with a suspicious extension |
| npm-exec-base64 | Identify when a package dynamically executes code through ‘eval’ |
| npm-install-script | Identify when a package has a pre-install or post-install script automatically running commands |
| npm-steganography | Identify when a package retrieves hidden data from an image and executes it |
| npm-dll-hijacking | Identify when a malicious package manipulates a trusted application into loading a malicious DLL |
| npm-exfiltrate-sensitive-data | Identify when a package reads and exfiltrates sensitive data from the local system |
Install & Use
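For readers who want to drive GuardDog from a script, the sketch below wraps the CLI with Python's `subprocess` module. It rests on assumptions that should be checked against `guarddog --help` on your installed version: that the tool has been installed (for example from PyPI with `pip install guarddog`), that scans are invoked as `guarddog <ecosystem> scan <target>`, and that the `--rules` and `--output-format` options behave as described in the project's documentation. The helper names `build_scan_command` and `scan` are hypothetical.

```python
import subprocess


def build_scan_command(ecosystem, target, rules=(), output_format=None):
    """Assemble an argument list for a GuardDog scan (assumed CLI shape)."""
    command = ["guarddog", ecosystem, "scan", target]
    for rule in rules:
        # Assumption: --rules may be repeated to select several heuristics.
        command += ["--rules", rule]
    if output_format:
        # Assumption: --output-format accepts a value such as "json".
        command += ["--output-format", output_format]
    return command


def scan(ecosystem, target, rules=(), output_format=None):
    """Run the scan and return the completed process without raising."""
    command = build_scan_command(ecosystem, target, rules, output_format)
    return subprocess.run(command, capture_output=True, text=True, check=False)
```

For example, `scan("pypi", "requests", rules=["exec-base64"])` would run only that heuristic against the `requests` package, and the result's `stdout` attribute holds whatever the CLI printed.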