Atrax - a simple web spider with wget

Atrax (http://en.wikipedia.org/wiki/Atrax_robustus) is a simple web spider for use during penetration tests.
It spiders a web site, downloads its contents and searches them for strings. A few filters are available at the moment, and you can add new ones:

  - generic
  - XSS stuff
  - compromised stuff
  - email address
  - IP address
  - phone numbers
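
Adding a filter could look roughly like the sketch below. It is not taken from atrax.sh: it assumes a filter is an extended regular expression matched with grep against the downloaded files, and the names FILTER_CREDENTIALS and search_filter are hypothetical.

```shell
# Hypothetical extra filter: words that hint at credentials.
FILTER_CREDENTIALS='password|passwd|secret'

# search_filter DIR REGEX
# Prints one "file  match" line per distinct match, in the same layout
# as the results shown in the Demo section.
# Assumes file paths do not contain a colon.
search_filter() {
    local dir="$1" regex="$2"
    grep -rEoiH -- "$regex" "$dir" 2>/dev/null |
        while IFS=: read -r path match; do
            printf '%-30s %s\n' "$(basename "$path")" "$match"
        done | sort -u
}
```

For example, search_filter files "$FILTER_CREDENTIALS" would scan the "files" directory.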

You can spider a random URL thanks to the http://www.mangle.ca service.
To use a web proxy, edit these parameters in /etc/wgetrc:

 http_proxy = http://proxy:1234
 https_proxy = http://proxy:1234
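
If you would rather not edit /etc/wgetrc, wget also reads the proxy from the http_proxy and https_proxy environment variables. The wrapper below is only a sketch; with_proxy is a hypothetical helper and proxy:1234 is the same placeholder used above.

```shell
# with_proxy PROXY_URL COMMAND [ARGS...]
# Runs COMMAND with both proxy variables set for that single run only.
with_proxy() {
    local proxy="$1"
    shift
    http_proxy="$proxy" https_proxy="$proxy" "$@"
}

# Example (not executed here):
# with_proxy http://proxy:1234 ./atrax.sh -t http://www.nothink.org
```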

The script produces three logs (log.wget, log.urls and log.results) and saves the downloaded contents into the "files" directory.

Note: the spider option uses the HTTP HEAD method. Some servers may respond repeatedly with a 301 or 302 code.
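
The spider step and the URL log can be pictured as in the sketch below. This is not the code of atrax.sh: spider_site and extract_urls are hypothetical names, and the parsing assumes wget's default log format, where each request line starts with a "--date time--" stamp followed by the URL. --max-redirect is a standard wget option that caps how many redirects are followed, which may help with servers that keep answering 301 or 302.

```shell
# spider_site URL LOGFILE
# Crawls without saving pages; wget writes its log to LOGFILE.
# The level, tries and timeout values here are illustrative.
spider_site() {
    "${WGET:-wget}" --spider --recursive --level=5 --tries=1 --timeout=3 \
        --max-redirect=5 --output-file="$2" "$1"
}

# extract_urls LOGFILE
# Prints the distinct URLs requested in a wget log.
# Assumes no retries, since retry lines put "(try: N)" before the URL.
extract_urls() {
    grep -E '^--[0-9]{4}-[0-9]{2}-[0-9]{2} ' "$1" | awk '{ print $3 }' | sort -u
}
```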

Settings

WGET="/usr/bin/wget"
TREE="/usr/bin/tree"
#USER_AGENT="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
USER_AGENT="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
EXCLUSION_FILE="css|ico|png|jpg|gif"
EXCLUSION_URL="bootstrap|jquery|highcharts"
EXCLUSION_HEADERS="Date:|Last-Modified:|Keep-Alive:|ETag:|Content-Length:|Expires:"
PREFIX="atrax"
LOG_WGET="log.wget"
LOG_URL="log.urls"
LOG_RES="log.results"
TRIES="1"
SLEEP_RND="3"
DEPTH="5"
DOWNLOAD_LIMIT="1000"
TIMEOUT="3"
WAIT_TIME="3"
SITE_RND="http://www.mangle.ca/randomhomepage.php"
VERSION="1.0"
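
How EXCLUSION_FILE and EXCLUSION_URL are applied is not shown here. One plausible reading, sketched below, treats both as grep -E alternations: EXCLUSION_FILE as file extensions at the end of a URL and EXCLUSION_URL as substrings anywhere in it. The function name filter_urls is hypothetical.

```shell
EXCLUSION_FILE="css|ico|png|jpg|gif"
EXCLUSION_URL="bootstrap|jquery|highcharts"

# filter_urls
# Reads URLs on stdin, drops the excluded ones and prints the rest.
# An optional query string after the extension is tolerated.
filter_urls() {
    grep -Ev "\.(${EXCLUSION_FILE})(\?.*)?$" | grep -Ev "(${EXCLUSION_URL})"
}
```

Under that assumption, filter_urls < log.urls would leave only the URLs worth downloading.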

Requirements

- bash v3+ (https://www.gnu.org/software/bash/)
- wget (https://www.gnu.org/software/wget/)
- tree (http://mama.indstate.edu/users/ice/tree/)
- tested on GNU Wget 1.13.4 - Ubuntu 12.04.4 LTS

Demo

./atrax.sh <-t url> <-r> <-h>
 -t target url
 -r random url
 -h help

./atrax.sh -t http://www.nothink.org 

Target         : http://www.nothink.org
Log wget       : atrax-www.nothink.org/log.wget
Log url        : atrax-www.nothink.org/log.urls
Log results    : atrax-www.nothink.org/log.results
Exclusion file : css|ico|png|jpg|gif
Exclusion url  : bootstrap|jquery|highcharts
Working...

HTTP headers (distinct):
-------------------------------------------------------------------------
Accept-Ranges: bytes
Connection: close
Connection: Keep-Alive
Content-Type: application/application/x-rar-compressed
Content-Type: application/javascript
Content-Type: application/xml
Content-Type: application/x-sh
Content-Type: image/jpeg
Content-Type: image/png
Content-Type: image/x-icon
Content-Type: text/css
Content-Type: text/html
Content-Type: text/plain
Server: Apache/2.4.9 (Unix) mod_fcgid/2.3.9
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
x-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block

Downloading 78 url...
Searching...

Generic:
-------------------------------------------------------------------------
codes.php                      <script
honeypot_dns_attacks.php       <script
honeypot_dns.php               <script
honeypots.php                  <script
index.php                      <script
malware_dns_request.php        <script
malware_http_request.php       <script
malware_irc_request.php        <script
wext-0.1.pl                    <object

XSS stuff:
-------------------------------------------------------------------------
exporting.js                   .innerHTML

Compromised:
-------------------------------------------------------------------------
honeypot_dns_attacks.php       shell
honeypot_ssh_download.php      hacked
honeypot_ssh_download.php      shell
utilities.php                  hacked
utilities.php                  shell

Email address:
-------------------------------------------------------------------------
CVE-2011-3192.pl               matteo.cantoni@nothink.org
honeypot_dns.php               matteo.cantoni@nothink.org
honeypots.php                  matteo.cantoni@nothink.org
honeypot_ssh_download.php      matteo.cantoni@nothink.org
honeypot_ssh.php               matteo.cantoni@nothink.org
index.php                      matteo.cantoni@nothink.org
snmpcheck-1.8.pl               matteo.cantoni@nothink.org
viruswatch.php                 matteo.cantoni@nothink.org
wext-0.1.pl                    matteo.cantoni@nothink.org

IP address:
-------------------------------------------------------------------------
appar.pl                       127.0.0.1
blacklist_ssh_all.txt          101.227.170.42
blacklist_ssh_all.txt          103.18.4.13
blacklist_ssh_all.txt          103.18.4.168
malware_network_activity.xml   218.93.205.23
....
....
malware_network_activity.xml   218.93.205.30
wext-0.1.pl                    127.0.0.1

Phone numbers:
-------------------------------------------------------------------------

Tree:
-------------------------------------------------------------------------
.
├── appar.pl
├── blacklist_ssh_all.txt
├── blacklist_ssh_day.txt
├── blacklist_ssh_week.txt
├── chargen.nse
├── check_routing_loop.py
├── cntlm.ini
├── codes.php
├── CVE-2011-3192.pl
├── dns-open-resolver.nse
├── exporting.js
├── honeypot_dns_attacks.php
....
....
├── honeypot_dns.php
├── honeypots.php
├── honeypot_ssh_download.php
├── honeypot_ssh.php
├── honeypot_web.php
├── http-status.nse
├── index.php
├── malware_dns_request.php
├── malware_http_request.php
├── malware_irc_request.php
├── malware_md5_list.txt
├── viruswatch.php
├── web_statistics_2013.txt
└── wext-0.1.pl

0 directories, 74 files

Done!
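
The "HTTP headers (distinct)" list in the output above could be produced along these lines. This is a sketch under assumptions, not the script's own code: it supposes the wget log was written with --server-response, which prints each response header indented by two spaces, and distinct_headers is a hypothetical name. EXCLUSION_HEADERS is the value from Settings.

```shell
EXCLUSION_HEADERS="Date:|Last-Modified:|Keep-Alive:|ETag:|Content-Length:|Expires:"

# distinct_headers LOGFILE
# Prints each distinct response header once, skipping the excluded ones.
# Status lines such as "HTTP/1.1 200 OK" do not match the header pattern.
distinct_headers() {
    grep -E '^  [A-Za-z-]+: ' "$1" |
        sed 's/^  //' |
        grep -Ev "^(${EXCLUSION_HEADERS})" |
        sort -u
}
```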

Download

1.0 - atrax.sh

See also

Other wget options you could use:

-D domain-list
-l depth
-w seconds
--waitretry=seconds
--no-http-keep-alive
--no-cache
--no-cookies
--load-cookies file
--save-cookies file
--header=header-line
--referer=url
--save-headers
--post-data=string
--post-file=file
--certificate
--server-response
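
As an illustration only, a few of these options can be combined in a small wrapper. The function name crawl_with_extras and the chosen values are hypothetical. WGET is the variable from Settings and falls back to plain wget when unset.

```shell
# crawl_with_extras URL
# Spiders URL two levels deep with a one second wait between requests,
# no caching or cookies, a fixed Referer and an extra request header,
# and prints the server response headers into the wget output.
crawl_with_extras() {
    url="$1"
    set -- --no-cache --no-cookies --waitretry=5 \
        --referer="http://www.nothink.org/" \
        --header="Accept-Language: en" \
        --server-response
    "${WGET:-wget}" --spider --recursive -l 2 -w 1 "$@" "$url"
}

# Example (not executed here):
# crawl_with_extras http://www.nothink.org
```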

Contact

Please send your feedback to Matteo Cantoni matteo.cantoni@nothink.org.