katana: next-generation crawling and spidering framework

Features

  • Fast and fully configurable web crawling
  • Standard and Headless mode support
  • JavaScript parsing / crawling
  • Customizable automatic form filling
  • Scope control – Preconfigured field / Regex
  • Customizable output – Preconfigured fields
  • INPUT – STDIN, URL and LIST
  • OUTPUT – STDOUT, FILE and JSON

Crawling Mode

Standard Mode

Standard crawling mode uses the standard Go HTTP library under the hood to handle HTTP requests and responses. This mode is much faster because it has no browser overhead. However, it analyzes HTTP response bodies as-is, without any JavaScript or DOM rendering, so it can miss endpoints that only appear after DOM rendering, as well as asynchronous endpoint calls that complex web applications make in response to, for example, browser-specific events.
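To illustrate what analyzing the raw response body means, here is a minimal sketch in Go. It is not katana's implementation: the names extractLinks and fetchLinks are hypothetical, and a regular expression stands in for a real HTML parser. It only sees links present in the HTML as served, which is exactly the limitation described above.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"regexp"
)

// linkAttr matches href="..." and src="..." attributes in raw HTML.
// Assumption for this sketch: a regex is good enough; a real crawler
// would use a proper HTML parser.
var linkAttr = regexp.MustCompile(`(?i)(?:href|src)\s*=\s*["']([^"']+)["']`)

// extractLinks (hypothetical helper) resolves every href/src value in body
// against baseURL and returns the absolute URLs, without fragments, in
// order of first appearance. It returns nil if baseURL cannot be parsed.
func extractLinks(baseURL, body string) []string {
	base, err := url.Parse(baseURL)
	if err != nil {
		return nil
	}
	seen := make(map[string]bool)
	var links []string
	for _, m := range linkAttr.FindAllStringSubmatch(body, -1) {
		ref, err := url.Parse(m[1])
		if err != nil {
			continue // skip malformed references
		}
		abs := base.ResolveReference(ref)
		abs.Fragment = "" // "#section" does not identify a new endpoint
		abs.RawFragment = ""
		if s := abs.String(); !seen[s] {
			seen[s] = true
			links = append(links, s)
		}
	}
	return links
}

// fetchLinks (hypothetical helper) downloads target with net/http and
// extracts links from the raw body. No JavaScript runs, so links added
// to the DOM at runtime are never seen.
func fetchLinks(target string) ([]string, error) {
	resp, err := http.Get(target)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return extractLinks(target, string(body)), nil
}

func main() {
	// Offline demonstration on a static snippet instead of a live request.
	page := `<a href="/login">Login</a> <script src="app.js"></script>`
	for _, l := range extractLinks("https://example.com/docs/", page) {
		fmt.Println(l)
	}
}
```

Any endpoint that app.js would request at runtime is invisible to this approach, because the script is listed as a link but never executed.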

Headless Mode

Headless mode hooks internal headless calls to handle HTTP requests/responses directly within the browser context. This offers two advantages:

  • The HTTP fingerprint (TLS and user agent) fully identifies the client as a legitimate browser
  • Better coverage, since endpoints are discovered by analyzing both the standard raw response, as in the previous mode, and the browser-rendered one with JavaScript enabled.

Headless crawling is optional and can be enabled using the -headless option.

Scope Control

Crawling can be endless if not scoped, so katana provides several options for defining the crawl scope.
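As a rough illustration of why scoping keeps a crawl bounded, the Go sketch below filters discovered URLs by hostname and by optional regular expressions. It is an assumption-laden example rather than katana's actual scope logic: the scope type, newScope, and allows are hypothetical names, and the rule "same host or a subdomain of it" is only one possible preconfigured scope.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// scope is a hypothetical type for this sketch, not part of katana's API.
type scope struct {
	host    string           // hostname the crawl is restricted to
	include []*regexp.Regexp // optional patterns; if set, one must match
}

// newScope compiles the given patterns and returns a scope for host.
func newScope(host string, patterns ...string) (scope, error) {
	s := scope{host: strings.ToLower(host)}
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return scope{}, err
		}
		s.include = append(s.include, re)
	}
	return s, nil
}

// allows reports whether raw is on the scoped host (or a subdomain of it)
// and, when include patterns exist, matches at least one of them.
func (s scope) allows(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil || u.Hostname() == "" {
		return false
	}
	h := strings.ToLower(u.Hostname())
	if h != s.host && !strings.HasSuffix(h, "."+s.host) {
		return false
	}
	if len(s.include) == 0 {
		return true
	}
	for _, re := range s.include {
		if re.MatchString(raw) {
			return true
		}
	}
	return false
}

func main() {
	s, err := newScope("example.com", `/api/`)
	if err != nil {
		panic(err)
	}
	for _, u := range []string{
		"https://api.example.com/api/v1/users",
		"https://example.com/blog",
		"https://other.test/api/v1",
	} {
		fmt.Println(u, s.allows(u))
	}
}
```

A crawler would call a check like allows before queueing each discovered URL, so out-of-scope links are dropped instead of followed.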

Install & Use

Copyright (c) 2022 ProjectDiscovery