CeWL is a Ruby app which spiders a given URL to a specified depth, optionally following external links, and returns a list of words which can then be used with password crackers such as John the Ripper.
CeWL also has an associated command line app, FAB (Files Already Bagged), which uses the same meta data extraction techniques to create author/creator lists from already downloaded files.
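To make the idea of turning a page into a wordlist concrete, here is a minimal Python sketch. It is not CeWL's actual Ruby code: the class and function names are hypothetical, the word pattern (runs of ASCII letters and digits) and the case-sensitive counting are assumptions, and it works on one HTML string rather than spidering a site. It only illustrates the minimum word length filter and the per-word count described under Usage.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects text chunks, skipping <script>/<style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is outside script and style elements.
        if self._skip_depth == 0:
            self.chunks.append(data)


def build_wordlist(html, min_word_length=3):
    """Return a Counter of words that are at least min_word_length long.

    The default of 3 mirrors the documented default of --min_word_length.
    """
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    # Assumption: a "word" is a run of ASCII letters and digits.
    words = re.findall(r"[A-Za-z0-9]+", " ".join(parser.chunks))
    return Counter(w for w in words if len(w) >= min_word_length)


if __name__ == "__main__":
    page = "<html><body><h1>Acme Widgets</h1><p>We at Acme build it.</p></body></html>"
    # most_common() orders by count, similar in spirit to the --count output.
    for word, count in build_wordlist(page).most_common():
        print(word, count)
```

Words shorter than the minimum length ("We", "at", "it") are dropped before counting, which is the behaviour the minimum word length option describes.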
Usage
cewl [OPTION] ... URL   
- --help, -h
- Show help
- --depth x, -d x
- The depth to spider to, default 2
- --min_word_length, -m
- The minimum word length; all words shorter than this are stripped out, default 3
- --offsite, -o
- By default, the spider will only visit the site specified. With this option it will also visit external sites
- --write, -w file
- Write the output to the file rather than to stdout
- --ua, -u user-agent
- Change the user agent
- -v
- Verbose, show debug and extra output
- --no-words, -n
- Don't output the wordlist
- --meta, -a file
- Include meta data, optional output file
- --email, -e file
- Include email addresses, optional output file
- --meta_file file
- Filename for metadata output
- --email_file file
- Filename for email output
- --meta-temp-dir directory
- The directory used by exiftool when parsing files, the default is /tmp
- --count, -c
- Show the count for each of the words found
- --auth_type
- Digest or basic
- --auth_user
- Authentication username
- --auth_pass
- Authentication password
- --proxy_host
- Proxy host
- --proxy_port
- Proxy port, default 8080
- --proxy_username
- Username for proxy, if required
- --proxy_password
- Password for proxy, if required
- --verbose, -v
- Verbose
- URL
- The site to spider.
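As a rough illustration of how the options above combine on a command line, the following Python sketch assembles an argument list in the `cewl [OPTION] ... URL` order. The helper name `build_cewl_command`, the output file name and the example.com URL are hypothetical; only flags listed above are used. The sketch prints the command rather than running it, since running it would need CeWL installed.

```python
import shlex


def build_cewl_command(url, depth=2, min_word_length=3,
                       outfile=None, count=False, offsite=False):
    """Hypothetical helper: build an argv list for cewl from documented flags."""
    argv = ["cewl", "-d", str(depth), "-m", str(min_word_length)]
    if offsite:
        argv.append("-o")            # also visit external sites
    if count:
        argv.append("-c")            # show the count for each word
    if outfile:
        argv += ["-w", outfile]      # write to a file instead of stdout
    argv.append(url)                 # the URL comes last, after the options
    return argv


if __name__ == "__main__":
    argv = build_cewl_command("http://example.com/", depth=1,
                              min_word_length=5, outfile="words.txt",
                              count=True)
    # shlex.join (Python 3.8+) quotes the arguments for display in a shell.
    print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run` would start the spider; keeping the arguments as a list avoids shell quoting problems with unusual URLs or file names.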
Keeping track of history.   
- Version 4.3 - Various spider bug fixes and the introduction of sorting the results by count
- Version 4.2 - Fixed the Spider gem by overriding the function, and now handles #name links correctly
- Version 4.1 - Small bug fixes and added new parameter to set filenames for email and metadata output
- Version 4 - Runs with Ruby 1.9.x and grabs text out of alt and title tags
- Version 3 - Now spiders pages referenced in JavaScript location commands
- Version 2.2 - Data from email addresses and meta data can be written to their own files
- Version 2.1 - Fixed a bug some people were having while using the email option
- Version 2 - Added meta data support
- Version 1 - released