Katana: a new crawling and spidering tool

ProjectDiscovery.io has released Katana, a new web crawler and spidering tool.

Install

katana requires Go 1.18 to install successfully.

go install github.com/projectdiscovery/katana/cmd/katana@latest
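
go install drops the binary into Go's bin directory (typically $HOME/go/bin), so make sure that directory is on your PATH. As a quick sanity check, you can print the version:

$ katana -version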

Usage

Input

$ katana -u https://web.com
$ katana -u https://web.com,https://otherweb.com

$ katana -list url_list.txt

$ echo https://www.web.com | katana
$ cat domains | httpx | katana
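
Since katana writes discovered URLs to stdout, its output chains into other tools in the same way. An illustrative pipeline, probing each discovered URL with httpx:

$ katana -u https://web.com -silent | httpx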

Crawling Mode

According to Katana’s documentation:

Standard Mode

Standard crawling modality uses the standard Go HTTP library under the hood to handle HTTP requests/responses. This modality is much faster as it doesn't have the browser overhead. Still, it analyzes the HTTP response body as is, without any JavaScript or DOM rendering, potentially missing post-DOM-rendered endpoints or asynchronous endpoint calls that might happen in complex web applications depending, for example, on browser-specific events.

Headless Mode

Headless mode hooks internal headless calls to handle HTTP requests/responses directly within the browser context. This offers two advantages:

  • The HTTP fingerprint (TLS and user agent) fully identifies the client as a legitimate browser
  • Better coverage, since endpoints are discovered by analyzing both the standard raw response, as in the previous modality, and the browser-rendered one with JavaScript enabled.

Headless crawling is optional and can be enabled using the -headless option.

JavaScript crawling can be enabled using the -jc option.

$ katana -u https://web.com  
$ katana -u https://web.com -headless
$ katana -list url_list.txt -jc
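
The mode flags can also be combined, e.g. headless crawling with JavaScript parsing enabled (an illustrative combination, not taken from the docs):

$ katana -u https://web.com -headless -jc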

Scope Control

-fs: Pre-defined scope field (dn, rdn, fqdn) (default "rdn"); an example follows the list below

  • rdn: crawling scoped to root domain name and all subdomains (default)
  • fqdn: crawling scoped to given sub(domain)
  • dn: crawling scoped to domain name keyword
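
For example, to restrict crawling to exactly the host given on the command line:

$ katana -u https://web.com -fs fqdn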

-cs: In scope url regex to be followed by crawler

-cos: Out of scope url regex to be excluded by crawler
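
Both options take a regex. A sketch with illustrative patterns, keeping anything matching login in scope and dropping logout URLs:

$ katana -u https://web.com -cs login
$ katana -u https://web.com -cos logout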

Filters

The -f option can be used to specify any of the available fields: url, path, fqdn, rdn, rurl, qurl, qpath, file, key, value, kv, dir, udir.

Here is a table with examples of each field and expected output when used:

Field   Description                    Example
url     URL Endpoint                   https://admin.projectdiscovery.io/admin/login?user=admin&password=admin
qurl    URL including query param      https://admin.projectdiscovery.io/admin/login.php?user=admin&password=admin
qpath   Path including query param     /login?user=admin&password=admin
path    URL Path                       https://admin.projectdiscovery.io/admin/login
fqdn    Fully Qualified Domain name    admin.projectdiscovery.io
rdn     Root Domain name               projectdiscovery.io
rurl    Root URL                       https://admin.projectdiscovery.io
file    Filename in URL                login.php
key     Parameter keys in URL          user,password
value   Parameter values in URL        admin,admin
kv      Keys=Values in URL             user=admin&password=admin
dir     URL Directory name             /admin/
udir    URL with Directory             https://admin.projectdiscovery.io/admin/

$ katana -u https://web.com -f qurl -silent
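
Against the example endpoint from the table above, this would print only the query-bearing URLs, one per line:

https://admin.projectdiscovery.io/admin/login.php?user=admin&password=admin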

-em: Display only URLs matching the given extensions.
-ef: Exclude URLs matching the given extensions.

$ katana -u https://web.com -silent -em js,jsp,json
$ katana -u https://web.com -silent -ef css,txt,md

Rate-limits

The -delay option sets the delay, in seconds, between each new request katana makes while crawling.

$ katana -u https://web.com -delay 20

The -c (concurrency) option sets the number of URLs per target to fetch at the same time.

$ katana -u https://web.com -c 20

The -p (parallelism) option sets the number of targets to process at the same time from list input.

$ katana -list url_list.txt -p 20
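
These options can be combined; the values below are illustrative:

$ katana -list url_list.txt -delay 1 -c 20 -p 5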

Output

$ katana -u https://web.com -o results
$ katana -u https://web.com -j -o results.json
$ katana -u https://web.com -silent

Configuration

Depth

$ katana -u https://web.com -depth 10

Timeout

$ katana -u https://web.com -timeout 20

Proxy

$ katana -u https://web.com -proxy http://127.0.0.1:8080

Known files

The -kf option enables crawling of known files such as robots.txt and sitemap.xml.

$ katana -u https://web.com -kf all
