jadedgnome / createAccount.sh
Created April 4, 2015 13:32
Shell script to create a user account on Linux
#!/bin/bash
clear
trap "" SIGHUP SIGINT SIGTERM SIGTSTP
# get username, check whether it's taken and of a sensible length
while true
do
    echo -n "Create username: "
    read -r username
    # the original gist continues past this point; a minimal check so the loop can exit
    [ -n "$username" ] && ! id "$username" &>/dev/null && break
done

After installing Arch on my Raspberry Pi, internet worked out of the box: I could plug it into the router, turn it on, ssh in and start downloading things. But the router is in my housemate's bedroom, which isn't ideal. If I want the Pi to be connected to the internet in my room, I need it to be connected to my laptop. (Another option would be a USB wifi dongle, of course.) This is how I did it. Much credit goes to the Ubuntu wiki's Connection sharing page.

I should disclaim that I don't fully understand networking stuff, and some of what I say might be wrong. I also didn't write this as I was going; so while I've consulted my browser and shell histories, it's possible I've forgotten some steps.

My laptop is running Gentoo, and this is where most of the work has to be done. It connects to the internet through wifi, on interface wlan0. The ethernet port is eth0, and eth0 is also the name of the ethernet port on the Pi.

Step zero: plug everything in.
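
The Ubuntu page's recipe boils down to three things on the laptop: give eth0 a static address, turn on IP forwarding, and NAT the Pi's traffic out through wlan0. A minimal sketch (the 192.168.2.x addresses are arbitrary examples; anything that doesn't clash with the router's subnet works):

# give the laptop's ethernet port a static address on the Pi-facing network
sudo ip addr add 192.168.2.1/24 dev eth0
# let the kernel forward packets between interfaces
sudo sysctl -w net.ipv4.ip_forward=1
# NAT the Pi's traffic out through the wifi interface
sudo iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
sudo iptables -A FORWARD -i eth0 -o wlan0 -j ACCEPT
sudo iptables -A FORWARD -i wlan0 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT

On the Pi's side, eth0 then needs an address in the same subnet (say 192.168.2.2/24), the laptop's address as its default gateway, and a nameserver in /etc/resolv.conf.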

# build node from source into ~/local so no sudo is needed
echo 'export PATH=$HOME/local/bin:$PATH' >> ~/.bashrc
. ~/.bashrc
mkdir -p ~/local ~/node-latest-install
cd ~/node-latest-install
# -L follows the redirect to the actual tarball
curl -L http://nodejs.org/dist/node-latest.tar.gz | tar xz --strip-components=1
./configure --prefix="$HOME/local"
make install # ok, fine, this step probably takes more than 30 seconds...
# then install npm
curl https://www.npmjs.org/install.sh | sh
#!/bin/bash
set -e
# Usage:
#   rsync_parallel.sh [--parallel=N] [rsync args...]
#
# Options:
#   --parallel=N    Use N parallel processes for transfer. Defaults to 10.
#
# Notes:
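
The script body stops at the header above. For a rough idea of how a wrapper matching that usage line can work, here is an illustrative sketch only, not the gist's actual implementation; it assumes plain SRC and DEST directory arguments rather than arbitrary rsync args, and GNU split:

PARALLEL=10
SRC=$1
DEST=$2
TMP=$(mktemp -d)
# list the files a normal run would transfer, one per line
rsync -a --dry-run --out-format='%n' "$SRC/" "$DEST/" > "$TMP/files"
# split the list into roughly equal chunks, one per worker
split -n "l/$PARALLEL" "$TMP/files" "$TMP/chunk."
# run one rsync per chunk in the background, then wait for them all
for chunk in "$TMP"/chunk.*; do
    rsync -a --files-from="$chunk" "$SRC/" "$DEST/" &
done
wait
rm -rf "$TMP"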
# mirror domain.com for offline browsing (page requisites included, links converted)
wget \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains domain.com \
    --no-parent \
    domain.com

Extract links from a BBC responsive site

DOMAIN="m.bbc.co.uk"
SERVICE="hindi"
HTTP_USER_AGENT="Mozilla/5.0 (iPhone; Mobile; AppleWebKit; Safari)"
EXCLUDE_EXTENSIONS="\.\(txt\|css\|js\|png\|gif\|jpg\)$"
MAX_DEPTH="3"

# the user-agent, start URL, and exclude filter are assembled from the variables above
wget --spider --no-directories --no-parent --force-html --recursive \
    --level="$MAX_DEPTH" --no-clobber \
    --user-agent="$HTTP_USER_AGENT" \
    "http://$DOMAIN/$SERVICE" 2>&1 \
    | grep '^--' | awk '{ print $3 }' | grep -v "$EXCLUDE_EXTENSIONS" | sort -u

To spider a site as a logged-in user (instructions from http://addictivecode.org/FrequentlyAskedQuestions):
1. post the form data (_every_ input with a name in the form, even if it doesn't have a value) required to log in (--post-data).
2. save the cookies that get generated (--save-cookies), including session cookies (--keep-session-cookies), which are not saved when --save-cookies alone is specified.
3. load the cookies, continue saving the session cookies, and recursively (-r) spider (--spider) the site, ignoring (-R) /logout (a sketch of this step follows the login command below).
# log in and save the cookies
wget --post-data='username=my_username&password=my_password&next=' --save-cookies=cookies.txt --keep-session-cookies https://foobar.com/login
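The third step isn't shown above; a minimal sketch of it, reusing the same foobar.com example and cookie file (the 'logout*' reject pattern is an assumption about what the site's logout link is called):

# load the saved cookies, keep the session cookies up to date, and spider
# the whole site without following the logout link
wget --load-cookies=cookies.txt --keep-session-cookies --save-cookies=cookies.txt \
    --spider -r -R 'logout*' https://foobar.com/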
# spider a site recursively, waiting 3 seconds between requests and ignoring robots.txt; log to wget.log
wget --spider -o wget.log -e robots=off --wait 3 -r -p -S http://
# pull the crawled URLs out of the log, dropping static assets, into a sorted site map
grep -i 'http://' wget.log | grep -E -v '(files/|\.jpg|\.jpeg|\.gif|\.css|\.js|\.pdf|\.png|\.xls)' | awk '{print $3}' | sort -u > site_map.txt
# count unique client IPs in an access log ($1) whose lines contain 'yandsearch' (i.e. came via Yandex search),
# skipping static assets, known bots, and 301/302/404 responses
cat "$1" | grep -i -E -v '(\.jpg|\.jpeg|\.gif|\.css|\.js|\.pdf|\.png|\.xls|\.ico|\.txt|\.doc|yandexbot|googlebot|YandexDirect|\/upload\/|" 404 |" 301 |" 302 )' | perl -MURI::Escape -lne 'print uri_unescape($_)' | grep yandsearch | awk '{print $1}' | sort -u | wc -l
jadedgnome / spider.sh
Last active August 29, 2015 14:13 — forked from azhawkes/spider.sh
#!/bin/bash
HOME="http://www.yourdomain.com/some/page"
DOMAINS="yourdomain.com"
DEPTH=2
OUTPUT="./urls.csv"
wget -r --spider --delete-after --force-html -D "$DOMAINS" -l "$DEPTH" "$HOME" 2>&1 \
    | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\)$' | sort -u > "$OUTPUT"