GuardDog: The Open-Source CLI Tool for Hunting Malicious Packages in npm, PyPI, and More

GuardDog is a CLI tool that helps identify malicious PyPI and npm packages, Go modules, GitHub Actions, and VSCode extensions. It runs a set of heuristics on the package source code (through Semgrep rules) and on the package metadata.

GuardDog can scan local or remote PyPI and npm packages, Go modules, GitHub Actions, or VSCode extensions using any of the available heuristics.

It downloads and scans code from:

Heuristics

GuardDog comes with two types of heuristics:

  • Source code heuristics: Semgrep rules that run against the package source code.

  • Package metadata heuristics: Python or JavaScript heuristics that run against the package metadata on PyPI or npm.

PyPI

Source code heuristics:

Heuristic | Description
shady-links | Identify when a package contains a URL to a domain with a suspicious extension
obfuscation | Identify when a package uses a common obfuscation method often used by malware
clipboard-access | Identify when a package reads data from or writes data to the clipboard
exfiltrate-sensitive-data | Identify when a package reads and exfiltrates sensitive data from the local system
download-executable | Identify when a package downloads a remote binary and makes it executable
exec-base64 | Identify when a package dynamically executes base64-encoded code
silent-process-execution | Identify when a package silently executes an executable
dll-hijacking | Identify when a malicious package manipulates a trusted application into loading a malicious DLL
steganography | Identify when a package retrieves hidden data from an image and executes it
code-execution | Identify when an OS command is executed in the setup.py file
cmd-overwrite | Identify when the ‘install’ command is overwritten in setup.py, which indicates code that runs automatically when the package is installed
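
GuardDog implements these source code checks as Semgrep rules. The sketch below is not one of those rules; it is a rough illustration of the idea behind exec-base64, written against Python's standard `ast` module. The function name `find_exec_base64` and the two name sets are hypothetical and chosen only for this example.

```python
import ast
from typing import List, Optional

# Assumption for this sketch: these names are enough to illustrate the pattern.
EXEC_FUNCS = {"exec", "eval"}
DECODE_FUNCS = {"b64decode"}


def _call_name(node: ast.AST) -> Optional[str]:
    """Return the trailing name of a call target, e.g. 'b64decode' for base64.b64decode."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return node.attr
    return None


def find_exec_base64(source: str) -> List[int]:
    """Return line numbers where exec/eval receives an argument containing a base64 decode call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and _call_name(node.func) in EXEC_FUNCS:
            decodes = any(
                isinstance(inner, ast.Call) and _call_name(inner.func) in DECODE_FUNCS
                for arg in node.args
                for inner in ast.walk(arg)
            )
            if decodes:
                hits.append(node.lineno)
    return sorted(hits)
```

A check like this only catches the direct pattern. Code that stores the decoded payload in a variable before executing it would slip past, which is one reason a real scanner relies on a broader rule set.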

Metadata heuristics:

Heuristic | Description
empty_information | Identify packages with an empty description field
release_zero | Identify packages with a release version that’s 0.0 or 0.0.0
typosquatting | Identify packages whose names closely resemble that of a highly popular package
potentially_compromised_email_domain | Identify when a package maintainer e-mail domain (and therefore package manager account) might have been compromised
unclaimed_maintainer_email_domain | Identify when a package maintainer e-mail domain (and therefore package manager account) is unclaimed and can be registered by an attacker
repository_integrity_mismatch | Identify packages with a linked GitHub repository where the package has extra unexpected files
single_python_file | Identify packages that have only a single Python file
bundled_binary | Identify packages bundling binaries
deceptive_author | Identify when an author is using a disposable email
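
To make the typosquatting heuristic more concrete, here is a minimal sketch that flags names within a small edit distance of a popular package. It is an illustration only and does not claim to match GuardDog's actual logic. The `POPULAR_PACKAGES` list, the `max_distance` threshold, and both function names are assumptions made for the example.

```python
from typing import List, Sequence

# Hypothetical shortlist; a real check would use a much larger popularity list.
POPULAR_PACKAGES = ["requests", "numpy", "pandas", "urllib3"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def possible_typosquat_targets(name: str,
                               popular: Sequence[str] = POPULAR_PACKAGES,
                               max_distance: int = 1) -> List[str]:
    """Return popular packages that `name` closely resembles without matching exactly."""
    name = name.lower()
    if name in popular:
        return []
    return [p for p in popular if edit_distance(name, p) <= max_distance]
```

With this shortlist, `possible_typosquat_targets("requets")` returns `["requests"]`, while the legitimate name `requests` returns an empty list.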

npm

Source code heuristics:

Heuristic | Description
npm-serialize-environment | Identify when a package serializes ‘process.env’ to exfiltrate environment variables
npm-obfuscation | Identify when a package uses a common obfuscation method often used by malware
npm-silent-process-execution | Identify when a package silently executes an executable
shady-links | Identify when a package contains a URL to a domain with a suspicious extension
npm-exec-base64 | Identify when a package dynamically executes code through ‘eval’
npm-install-script | Identify when a package has a pre- or post-install script that automatically runs commands
npm-steganography | Identify when a package retrieves hidden data from an image and executes it
npm-dll-hijacking | Identify when a malicious package manipulates a trusted application into loading a malicious DLL
npm-exfiltrate-sensitive-data | Identify when a package reads and exfiltrates sensitive data from the local system
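
As an illustration of what npm-install-script looks for, the sketch below reads a `package.json` manifest and reports any pre- or post-install hooks it declares. It is a simplified stand-in written for this article, not GuardDog's rule. The function name `find_install_scripts` and the `INSTALL_HOOKS` tuple are hypothetical.

```python
import json
from typing import Dict

# npm lifecycle hooks that run automatically around installation.
INSTALL_HOOKS = ("preinstall", "postinstall")


def find_install_scripts(package_json_text: str) -> Dict[str, str]:
    """Return the install-time hooks declared in a package.json document."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}
```

An install hook is not malicious by itself, since many legitimate packages use one to compile native code. A finding here is a prompt to read the script, not a verdict.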

Metadata heuristics:

Heuristic | Description
empty_information | Identify packages with an empty description field
release_zero | Identify packages with a release version that’s 0.0 or 0.0.0
potentially_compromised_email_domain | Identify when a package maintainer e-mail domain (and therefore package manager account) might have been compromised. Note that npm’s API may not provide accurate information about the maintainer’s email, so this detector may cause false positives for npm packages. See https://www.theregister.com/2022/05/10/security_npm_email/
unclaimed_maintainer_email_domain | Identify when a package maintainer e-mail domain (and therefore npm account) is unclaimed and can be registered by an attacker. Note that npm’s API may not provide accurate information about the maintainer’s email, so this detector may cause false positives for npm packages. See https://www.theregister.com/2022/05/10/security_npm_email/
typosquatting | Identify packages whose names closely resemble that of a highly popular package
direct_url_dependency | Identify packages with direct URL dependencies. Dependencies fetched this way are not immutable and can be used to inject untrusted code or reduce the likelihood of a reproducible install.
npm_metadata_mismatch | Identify packages with mismatches between the npm package manifest and the package info for some critical fields
bundled_binary | Identify packages bundling binaries
deceptive_author | Identify when an author is using a disposable email
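
The direct_url_dependency heuristic can be pictured with a short sketch that walks the dependency sections of a `package.json` and collects entries whose version spec is a URL instead of a registry version range. This is an assumption-laden illustration and not GuardDog's implementation. The prefix list is deliberately incomplete (for example, it ignores the `user/repo` GitHub shorthand), and the function name is hypothetical.

```python
import json
from typing import Dict

# Assumed, non-exhaustive set of prefixes that indicate a URL-style dependency spec.
URL_PREFIXES = ("http://", "https://", "git://", "git+", "github:")
DEPENDENCY_SECTIONS = ("dependencies", "devDependencies", "optionalDependencies")


def find_direct_url_dependencies(package_json_text: str) -> Dict[str, str]:
    """Map dependency names to specs that point at a URL instead of a registry version."""
    manifest = json.loads(package_json_text)
    findings = {}
    for section in DEPENDENCY_SECTIONS:
        for name, spec in (manifest.get(section) or {}).items():
            if isinstance(spec, str) and spec.startswith(URL_PREFIXES):
                findings[name] = spec
    return findings
```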

Go

Source code heuristics:

Heuristic | Description
shady-links | Identify when a package contains a URL to a domain with a suspicious extension
go-exec-base64 | Identify Base64-decoded content being passed to execution functions in Go
go-exfiltrate-sensitive-data | Identify when a package reads and exfiltrates sensitive data from the local system
go-exec-download | Identify when a package downloads a remote binary, sets executable permissions on it, and executes it

Metadata heuristics:

Heuristic | Description
typosquatting | Identify packages whose names closely resemble that of a highly popular package

GitHub Action

Source code heuristics:

Heuristic | Description
npm-serialize-environment | Identify when a package serializes ‘process.env’ to exfiltrate environment variables
npm-obfuscation | Identify when a package uses a common obfuscation method often used by malware
npm-silent-process-execution | Identify when a package silently executes an executable
shady-links | Identify when a package contains a URL to a domain with a suspicious extension
npm-exec-base64 | Identify when a package dynamically executes code through ‘eval’
npm-install-script | Identify when a package has a pre- or post-install script that automatically runs commands
npm-steganography | Identify when a package retrieves hidden data from an image and executes it
npm-dll-hijacking | Identify when a malicious package manipulates a trusted application into loading a malicious DLL
npm-exfiltrate-sensitive-data | Identify when a package reads and exfiltrates sensitive data from the local system

Install & Use
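
GuardDog is published on PyPI, so `pip install guarddog` is the usual way to get the CLI. Scans are invoked as `guarddog <ecosystem> scan <target>`, for example `guarddog pypi scan requests`. The sketch below wraps that invocation from Python. The helper names `build_scan_command` and `run_scan` are hypothetical, and the `--rules`, `--exclude-rules`, and `--output-format` flags reflect the project README as best understood here, so confirm them with `guarddog --help` for your installed version.

```python
import subprocess
from typing import List, Optional, Sequence


def build_scan_command(ecosystem: str,
                       target: str,
                       rules: Sequence[str] = (),
                       exclude_rules: Sequence[str] = (),
                       output_format: Optional[str] = None) -> List[str]:
    """Build an argument list for `guarddog <ecosystem> scan <target>`."""
    cmd = ["guarddog", ecosystem, "scan", target]
    for rule in rules:
        cmd += ["--rules", rule]              # restrict the scan to specific heuristics
    for rule in exclude_rules:
        cmd += ["--exclude-rules", rule]      # skip specific heuristics
    if output_format:
        cmd += ["--output-format", output_format]
    return cmd


def run_scan(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a prepared command; requires guarddog to be installed and on PATH."""
    return subprocess.run(list(cmd), capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without guarddog installed.
    print(" ".join(build_scan_command("npm", "express", rules=["npm-install-script"])))
```

Passing the command as a list instead of a shell string avoids quoting problems when the target is a local path that contains spaces.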
