webpalm: powerful command-line tool for website mapping and web scraping

by Nam Phong · March 11, 2025

webpalm

WebPalm is a command-line tool that enables users to traverse a website and generate a tree of all its web pages and their links. It uses a recursive approach to enter each link found on a webpage and continues to do so until all levels have been explored. In addition to generating a site map, WebPalm can extract data from the body of each page using regular expressions and save the results in a file. This feature can be useful for web scraping or extracting specific information.
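The recursive traversal described above can be sketched in a few lines. This is only an illustration, not WebPalm's actual implementation (WebPalm is written in Go): the function name build_tree, the fetch_links callback, the visited-set handling and the toy SITE link graph are all assumptions made for the example, and no network access is involved.

```python
from typing import Callable, Dict, List, Optional, Set


def build_tree(
    url: str,
    fetch_links: Callable[[str], List[str]],
    max_level: int,
    level: int = 0,
    seen: Optional[Set[str]] = None,
) -> Dict:
    """Follow links recursively and return a nested {url, children} tree."""
    if seen is None:
        seen = set()
    seen.add(url)
    node = {"url": url, "children": []}
    if level >= max_level:
        # Depth limit reached: keep the page as a leaf, do not follow its links.
        return node
    for link in fetch_links(url):
        if link in seen:
            # Assumption: skip pages already visited so cycles terminate.
            continue
        node["children"].append(
            build_tree(link, fetch_links, max_level, level + 1, seen)
        )
    return node


# Hypothetical in-memory link graph standing in for real HTTP requests.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/", "https://example.com/c"],
    "https://example.com/b": [],
    "https://example.com/c": [],
}


def fake_fetch(url: str) -> List[str]:
    return SITE.get(url, [])


tree = build_tree("https://example.com/", fake_fetch, max_level=2)
```

With max_level=2 the sketch visits the start page, its links, and the links of those pages, which is roughly what a depth option such as -l2 suggests; the exact semantics of WebPalm's level counting are not confirmed here.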

Features

Generate a palm-tree structure of web URLs
Dump data from page bodies using regular expressions
Live output mode
Export the web tree to JSON, XML, or TXT
Fast and easy to use
Colorized output and error handling

Installation

From source

git clone https://github.com/Malwarize/webpalm.git
cd webpalm
go build -o webpalm && ./webpalm

From binary

You can download a prebuilt binary from the project's Releases page.

Via Go

go install github.com/Malwarize/webpalm/v2@latest

Usage

webpalm -h
Flags:
  -x, --exclude-code ints        status codes to exclude / ex : -x 404,500
  -h, --help                     help for webpalm
  -i, --include strings          include only domains / ex : -i google.com,facebook.com
  -l, --level int                level of palming / ex: -l2
      --live                     live output mode (slow but live streaming) use only 1 thread / ex: --live
  -m, --max-concurrency int      max concurrent tasks / ex: -m 10 (default 10)
  -o, --output string            file to export the result (f.json, f.xml, f.txt) / ex: -o result.json
      --regexes stringToString   regexes to match in each page / ex: --regexes comments="\<\!--.*?-->" (default [])
  -u, --url string               target url / ex: -u https://google.com

Example

Get the palm tree of a website:

webpalm -u https://google.com -l1 --live

Get the palm tree of a website and exclude some status codes:

webpalm -u https://google.com -l1 -x 404,500

Get the palm tree of a website and dump data from the body of the pages:

webpalm -u https://google.com -l1 --regexes comments="\<\!--.*?-->" -o result.json

This dumps the HTML comments found in the body of each page.

webpalm -u https://google.com -l1 --regexes comments="\<\!--.*?-->",emails="([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)"

This dumps both the HTML comments and the email addresses found in the body of each page.
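To make the named-regex idea concrete, here is a minimal sketch of applying a name-to-pattern map to a page body. It uses Python's re module rather than Go's regexp package, and the function name dump_matches and the sample body are assumptions for illustration only; WebPalm's real output layout may differ.

```python
import re
from typing import Dict, List


def dump_matches(body: str, regexes: Dict[str, str]) -> Dict[str, List[str]]:
    """Return every match of each named pattern found in the page body."""
    return {name: re.findall(pattern, body) for name, pattern in regexes.items()}


# Patterns mirror the command-line examples; the dot before the TLD is escaped.
regexes = {
    # Note: without a "dot matches newline" flag, .*? stays on a single line.
    "comments": r"<!--.*?-->",
    "emails": r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+",
}

# Hypothetical page body used only for this example.
body = "<html><!-- todo: remove --><p>Contact admin@example.com</p></html>"

result = dump_matches(body, regexes)
```

Each key in the result corresponds to one name=pattern pair, which is presumably why the flag takes a map of names to expressions rather than a bare list.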

Get the palm tree of a website and export it to XML or TXT:

webpalm -u https://google.com -l3 -o result.xml

webpalm -u https://google.com -l2 -o result.txt
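Choosing the export format from the file extension can be sketched as follows. Everything here is hypothetical: the nested {url, children} tree shape, the "page" XML element name and the two-space text indentation are assumptions for the example, and WebPalm's actual JSON, XML and TXT layouts may look different.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List


def to_txt(node: Dict, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, one URL per line."""
    lines = ["  " * depth + node["url"]]
    for child in node["children"]:
        lines.extend(to_txt(child, depth + 1))
    return lines


def to_xml(node: Dict) -> ET.Element:
    """Render the tree as nested <page url="..."> elements (assumed schema)."""
    elem = ET.Element("page", url=node["url"])
    for child in node["children"]:
        elem.append(to_xml(child))
    return elem


def render(tree: Dict, filename: str) -> str:
    """Pick a serializer based on the output file's extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".json":
        return json.dumps(tree, indent=2)
    if suffix == ".xml":
        return ET.tostring(to_xml(tree), encoding="unicode")
    if suffix == ".txt":
        return "\n".join(to_txt(tree))
    raise ValueError(f"unsupported output format: {suffix!r}")


# Hypothetical two-page tree used only for this example.
tree = {
    "url": "https://example.com/",
    "children": [{"url": "https://example.com/a", "children": []}],
}

txt_output = render(tree, "result.txt")
```

Dispatching on the extension keeps a single -o style option sufficient for all three formats, at the cost of rejecting any filename with an unknown suffix.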

Get the palm tree of a website and include only some domains:

webpalm -u https://google.com -l2 -i google.com,facebook.com

This crawls only the URLs that contain google.com or facebook.com.
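Taking "contain" literally, the include filter amounts to a substring check on each discovered URL. The sketch below assumes exactly that; the function name filter_included and the sample links are made up for illustration, and WebPalm may match domains more strictly than this.

```python
from typing import Iterable, List


def filter_included(urls: Iterable[str], include: List[str]) -> List[str]:
    """Keep only URLs that contain at least one of the included domains."""
    if not include:
        # Assumption: an empty include list means "no filtering".
        return list(urls)
    return [u for u in urls if any(domain in u for domain in include)]


# Hypothetical links discovered on a page.
links = [
    "https://www.google.com/maps",
    "https://facebook.com/login",
    "https://example.org/about",
]

kept = filter_included(links, ["google.com", "facebook.com"])
```

A plain substring test also keeps a URL such as https://example.org/?ref=google.com, so parsing the hostname first would be the stricter alternative if that matters.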

Threading and concurrency

Get the palm tree of a website and use only 5 concurrent tasks:

webpalm -u https://google.com -l2 -m 5

Note that live mode runs on a single thread, so the -m option cannot be combined with --live.
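The effect of a concurrency cap can be illustrated with a bounded worker pool. This sketch uses a Python thread pool in place of whatever Go concurrency primitives WebPalm relies on; crawl_level, ConcurrencyProbe and the fake fetch are assumptions for the example, and no real requests are made.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


class ConcurrencyProbe:
    """Fake fetcher that records how many calls run at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def fetch(self, url: str) -> int:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # stand-in for network latency
        with self._lock:
            self.active -= 1
        return len(url)  # stand-in for a response size


def crawl_level(urls: List[str], fetch: Callable[[str], int],
                max_concurrency: int = 10) -> List[int]:
    """Fetch all URLs with at most max_concurrency tasks in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(fetch, urls))


probe = ConcurrencyProbe()
urls = [f"https://example.com/page{i}" for i in range(20)]
sizes = crawl_level(urls, probe.fetch, max_concurrency=5)
```

After the run, probe.peak never exceeds the cap of 5; setting max_concurrency to 1 would process pages strictly one at a time, similar in spirit to the single-threaded live mode.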

Source: https://github.com/Malwarize/
