webpalm: powerful command-line tool for website mapping and web scraping

by ddos · March 11, 2025


WebPalm is a command-line tool that traverses a website and generates a tree of all its web pages and their links. It recursively follows each link found on a page until the requested number of levels (set with -l) has been explored. In addition to generating a site map, WebPalm can extract data from the body of each page using regular expressions and save the results in a file, which is useful for web scraping or pulling out specific information.
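The recursive traversal described above can be sketched in a few lines of Python. This is an illustration only, not WebPalm's actual implementation: the SITE dictionary stands in for real HTTP fetches, the names extract_links and build_tree are hypothetical, and the visited set is an assumption added so that the sketch terminates on cyclic links.

```python
import re

# Hypothetical in-memory "website": URL -> HTML body (stands in for HTTP fetches).
SITE = {
    "https://example.com/": '<a href="https://example.com/a">A</a> '
                            '<a href="https://example.com/b">B</a>',
    "https://example.com/a": '<a href="https://example.com/b">B</a>',
    "https://example.com/b": '<a href="https://example.com/">home</a>',
}

HREF = re.compile(r'href="([^"]+)"')


def extract_links(body):
    """Return every href value found in the page body, in document order."""
    return HREF.findall(body)


def build_tree(url, level, seen=None):
    """Recursively follow links up to `level` hops, skipping visited URLs."""
    seen = set() if seen is None else seen
    seen.add(url)
    node = {"url": url, "children": []}
    if level <= 0:
        return node  # depth limit reached: record the page, do not descend
    for link in extract_links(SITE.get(url, "")):
        if link not in seen:
            node["children"].append(build_tree(link, level - 1, seen))
    return node


tree = build_tree("https://example.com/", 2)
```

Here the level argument plays the role of -l: with a level of 0 only the root page is recorded, and each extra level follows links one hop further.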

Features

Generate a palm-tree structure of web URLs
Dump data from page bodies using regular expressions
Live output mode
Export the web tree to JSON, XML, or TXT
Fast and easy to use
Colorized output and error handling

Installation

From source

git clone https://github.com/Malwarize/webpalm.git
cd webpalm
go build -o webpalm && ./webpalm

From binary

You can download a prebuilt binary from the project's Releases page on GitHub.

Via go

go install github.com/Malwarize/webpalm/v2@latest

Usage

webpalm -h
Flags:
  -x, --exclude-code ints        status codes to exclude / ex : -x 404,500
  -h, --help                     help for webpalm
  -i, --include strings          include only domains / ex : -i google.com,facebook.com
  -l, --level int                level of palming / ex: -l2
      --live                     live output mode (slow but live streaming) use only 1 thread / ex: --live
  -m, --max-concurrency int      max concurrent tasks / ex: -m 10 (default 10)
  -o, --output string            file to export the result (f.json, f.xml, f.txt) / ex: -o result.json
      --regexes stringToString   regexes to match in each page / ex: --regexes comments="\<\!--.*?-->" (default [])
  -u, --url string               target url / ex: -u https://google.com

Example

get the palm tree of a website:

webpalm -u https://google.com -l1 --live

get palm tree of a website and exclude some status codes:

webpalm -u https://google.com -l1 -x 404,500

get the palm tree of a website and dump data from the body of the pages:

webpalm -u https://google.com -l1 --regexes comments="\<\!--.*?-->" -o result.json

this will dump the HTML comments found in the body of each page

webpalm -u https://google.com -l1 --regexes comments="\<\!--.*?-->",emails="([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)"

this will dump the HTML comments and email addresses found in the body of each page
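To see what named patterns like these match, here is a small Python sketch that applies them to a sample page body. It is not WebPalm's code; the helper name dump_matches and the sample body are hypothetical, and the patterns are lightly adapted for Python's re module. WebPalm is a Go program, so its regex engine may differ in details.

```python
import re

# Named patterns, adapted from the command above (name -> regex).
REGEXES = {
    "comments": r"<!--.*?-->",
    "emails": r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+",
}


def dump_matches(body, regexes):
    """Return {name: [all non-overlapping matches]} for each named pattern."""
    return {name: re.findall(pattern, body) for name, pattern in regexes.items()}


# Hypothetical page body used only for this illustration.
body = "<p>Contact: admin@example.com</p><!-- TODO: remove --><!-- v2 -->"
result = dump_matches(body, REGEXES)
```

The non-greedy .*? in the comments pattern matters: with a greedy .* the two comments in the sample body would be captured as a single match.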

get the palm tree of a website and export it to xml or txt:

webpalm -u https://google.com -l3 -o result.xml

webpalm -u https://google.com -l2 -o result.txt

get the palm tree of a website and include only some domains:

webpalm -u https://google.com -l2 -i google.com,facebook.com

this will crawl only the urls that contain google.com or facebook.com
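The "contains" behaviour described above amounts to a substring check on each URL. A minimal Python sketch follows. The function name is_included is hypothetical, and treating an empty list as "no filtering" is an assumption, not something the tool's help text confirms.

```python
def is_included(url, include):
    """True when no filter is set, or when the URL contains any listed domain."""
    return not include or any(domain in url for domain in include)


include = ["google.com", "facebook.com"]  # plays the role of -i google.com,facebook.com
urls = [
    "https://google.com/maps",
    "https://www.facebook.com/login",
    "https://example.org/",
]
kept = [u for u in urls if is_included(u, include)]
```

Because this is a plain substring test, a URL such as https://example.org/?ref=google.com would also pass. A stricter filter would compare the parsed hostname instead.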

Threading and concurrency

get the palm tree of a website and use only 5 concurrent tasks:

webpalm -u https://google.com -l2 -m 5

Note that live mode runs with only 1 thread, so -m cannot be combined with --live.
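The idea of capping concurrent tasks can be illustrated with a bounded worker pool. The Python sketch below is an assumption-laden analogy rather than WebPalm's implementation: fake_fetch is a hypothetical stand-in for an HTTP request, and the counters exist only to show that no more than MAX_CONCURRENCY calls overlap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 5  # plays the role of -m 5

active = 0  # number of fetches currently in flight
peak = 0    # highest value `active` ever reached
lock = threading.Lock()


def fake_fetch(url):
    """Stand-in for an HTTP request; records how many calls overlap."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # simulate network latency
    with lock:
        active -= 1
    return url, 200


urls = [f"https://example.com/page{i}" for i in range(20)]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
    # pool.map returns results in input order, regardless of completion order.
    results = list(pool.map(fake_fetch, urls))
```

Setting MAX_CONCURRENCY to 1 makes the fetches run one after another, which corresponds to the single-threaded behaviour of live mode.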

Source: https://github.com/Malwarize/
