noseyparker: finds secrets and sensitive information in textual data and Git history

Nosey Parker: Find secrets in textual data

Nosey Parker is a command-line tool that finds secrets and sensitive information in textual data. It is useful both for offensive and defensive security testing.

Key features:

  • It supports scanning files, directories, and the entire history of Git repositories
  • It uses regular expression matching with a set of 60 patterns chosen for high signal-to-noise based on experience and feedback from offensive security engagements
  • It groups matches together that share the same secret, further emphasizing signal over noise
  • It is fast: it can scan at hundreds of megabytes per second on a single core and is able to scan 100GB of Linux kernel source history in less than 5 minutes on an older MacBook Pro

This open-source version of Nosey Parker is a reimplementation of part of the internal version in use at Praetorian, which has additional machine-learning capabilities. Read more in blog posts here and here.

Usage quick start

 

sensitive information Git history

The datastore

Most Nosey Parker commands use a datastore. This is a special directory that Nosey Parker uses to record its findings and maintain its internal state. A datastore will be implicitly created by the scan command if needed. You can also create a datastore explicitly using the datastore init -d PATH command.

Scanning filesystem content for secrets

Nosey Parker has built-in support for scanning files, recursively scanning directories, and scanning the entire history of Git repositories.

For example, if you have a Git clone of CPython locally at cpython.git, you can scan its entire history with the scan command. Nosey Parker will create a new datastore at np.cpython and saves its findings there.

$ noseyparker scan --datastore np.cpython cpython.git
Found 28.30 GiB from 18 plain files and 427,712 blobs from 1 Git repos [00:00:04]
Scanning content ████████████████████ 100% 28.30 GiB/28.30 GiB [00:00:53]
Scanned 28.30 GiB from 427,730 blobs in 54 seconds (538.46 MiB/s); 4,904/4,904 new matches

Rule Distinct Groups Total Matches
───────────────────────────────────────────────────────────
PEM-Encoded Private Key 1,076 1,192
Generic Secret 331 478
netrc Credentials 42 3,201
Generic API Key 2 31
md5crypt Hash 1 2

Run the `report` command next to show finding details.

 

You can specify multiple inputs to scan at once in any combination of the supported input types (files, directories, and Git repos).

Summarizing findings

Nosey Parker prints out a summary of its findings when it finishes scanning. You can also run this step separately:

$ noseyparker summarize --datastore np.cpython

Rule Distinct Groups Total Matches
───────────────────────────────────────────────────────────
PEM-Encoded Private Key 1,076 1,192
Generic Secret 331 478
netrc Credentials 42 3,201
Generic API Key 2 31
md5crypt Hash 1 2

 

Reporting detailed findings

To see details of Nosey Parker’s findings, use the report command. This prints out a text-based report designed for human consumption:

$ noseyparker report --datastore np.cpython
Finding 1/1452: Generic API Key
Match: QTP4LAknlFml0NuPAbCdtvH4KQaokiQE
Showing 3/29 occurrences:

Occurrence 1:
Git repo: clones/cpython.git
Blob: 04144ceb957f550327637878dd99bb4734282d07
Lines: 70:61-70:100

e buildbottest

notifications:
email: false
webhooks:
urls:
- https://python.zulipchat.com/api/v1/external/travis?api_key=QTP4LAknlFml0NuPAbCdtvH4KQaokiQE&stream=core%2Ftest+runs
on_success: change
on_failure: always
irc:
channels:
# This is set to a secure vari

Occurrence 2:
Git repo: clones/cpython.git
Blob: 0e24bae141ae2b48b23ef479a5398089847200b3
Lines: 174:61-174:100

j4 -uall,-cpu"

notifications:
email: false
webhooks:
urls:
- https://python.zulipchat.com/api/v1/external/travis?api_key=QTP4LAknlFml0NuPAbCdtvH4KQaokiQE&stream=core%2Ftest+runs
on_success: change
on_failure: always
irc:
channels:
# This is set to a secure vari
...

Install

Copyright (C) 2022 praetorian-inc