@ronan-mch
Created October 24, 2011 20:11
This script uses bash commands to retrieve the Jobs.ie front page and convert its listings table into a spreadsheet-friendly, tab-separated format.
#!/bin/bash
# Jobs.ie retrieval script
# Invokes wget and saves the downloaded page to source-file.txt
wget www.jobs.ie -O source-file.txt
# Creates the output directory if it does not exist yet, then appends the current date to the results file
mkdir -p scraper-dir
date >> scraper-dir/results.txt
# Selects the lines containing the cphMain table, turns <td> and </td> into tabs and <tr> into newlines,
# strips any remaining HTML tags, and appends the result to the results file.
# Note: \t and \n in the replacement text rely on GNU sed.
grep cphMain source-file.txt | sed -e 's/<td>/\t/g' -e 's/<tr>/\n/g' -e 's/<\/td>/\t/g' -e 's/<[^>]*>//g' >> scraper-dir/results.txt
# Removes the original file
rm source-file.txt
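
To see what the sed chain produces without hitting the site, the same substitutions can be run on a single made-up table row. This is only a sketch: the `sample` line and the `html_to_tabs` helper are hypothetical (the real Jobs.ie markup is not shown in the script), and it assumes GNU sed, which interprets `\t` and `\n` in the replacement text.

```shell
#!/bin/bash
# Hypothetical sample input: one line of HTML containing the marker "cphMain".
# The real page markup may differ; this row is invented for illustration.
sample='<table id="cphMain"><tr><td>Developer</td><td>Dublin</td></tr></table>'

# Hypothetical helper wrapping the same substitutions as the script, in the same order.
# Assumes GNU sed for the \t and \n escapes in the replacement text.
html_to_tabs() {
    grep cphMain | sed -e 's/<td>/\t/g' -e 's/<tr>/\n/g' -e 's/<\/td>/\t/g' -e 's/<[^>]*>//g'
}

# Run the sample through the filter and print the tab-separated result.
result=$(printf '%s\n' "$sample" | html_to_tabs)
printf '%s\n' "$result"
```

Because both `<td>` and `</td>` become tabs, every cell ends up with a tab on each side, so adjacent cells are separated by two tabs. A spreadsheet import will likely show this as an empty column between fields.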