@jesstess
Created November 27, 2010 20:02
Use wget to spider a site as a logged-in user.
http://addictivecode.org/FrequentlyAskedQuestions
To spider a site as a logged-in user:
1. post the form data (_every_ input with a name in the form, even if it doesn't have a value) required to log in (--post-data).
2. save the cookies that get generated (--save-cookies), including session cookies (--keep-session-cookies), which are not saved when --save-cookies alone is specified.
3. load the cookies, continue saving the session cookies, and recursively (-r) spider (--spider) the site, ignoring (-R) /logout.
# log in and save the cookies
wget --post-data='username=my_username&password=my_password&next=' --save-cookies=cookies.txt --keep-session-cookies https://foobar.com/login
# spider the site from this user's perspective, skipping /logout since hitting that page will log you out
wget -R logout -r --spider --load-cookies=cookies.txt --save-cookies=cookies.txt --keep-session-cookies https://foobar.com/home
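One thing the login command above glosses over: wget transmits the --post-data string exactly as given, so a username or password containing characters such as `&`, `=`, or a space has to be percent-encoded first. Below is a minimal sketch of one way to do that in POSIX sh. The `urlencode` helper, the sample credentials, and the `post_data` variable are all hypothetical names for illustration, and the helper assumes ASCII input.

```shell
# Hypothetical helper: percent-encode one form value (ASCII input assumed).
# The body runs in a subshell so the LC_ALL change does not leak to the caller.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;                   # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;           # anything else: %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Placeholder credentials; the field names match the login example above.
username='my_username'
password='p&ss w=rd'

# Every named input is still included, even the empty "next" field.
post_data="username=$(urlencode "$username")&password=$(urlencode "$password")&next="
printf '%s\n' "$post_data"

# The result would then replace the literal string in the login step, e.g.:
# wget --post-data="$post_data" --save-cookies=cookies.txt --keep-session-cookies https://foobar.com/login
```

Spaces come out as `%20` rather than `+` here; servers that accept form-encoded bodies normally decode both.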