@likejazz
Last active December 11, 2015 07:08
PCRE example: extract the title for each docid.
#!/bin/bash
trim() {
    # Determine if 'extglob' is currently on.
    local extglobWasOff=1
    shopt extglob >/dev/null && extglobWasOff=0
    (( extglobWasOff )) && shopt -s extglob # Turn 'extglob' on, if currently turned off.
    # Trim leading and trailing whitespace.
    local var=$1
    var=${var##+([[:space:]])}
    var=${var%%+([[:space:]])}
    (( extglobWasOff )) && shopt -u extglob # If 'extglob' was off before, turn it back off.
    printf '%s' "$var" # Output trimmed string; printf is safe even if it looks like an echo option.
}
# ------------------------------------------------------------------------------------------------
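DOCID_FILE="docids.dat"
RESULTS_FILE="results.csv"
: > "$RESULTS_FILE" # Create or truncate the results file.
# Read one docid per line; -r keeps backslashes literal.
while IFS= read -r docid; do
    docid=$(trim "$docid")
    # Fetch the page and capture the cell that follows <td>title</td>.
    # Printing only on a match avoids emitting a stale $1 for non-matching lines.
    title=$(curl --silent "http://XX.XX.XX.XX?query2=$docid" | perl -ne 'print $1 if m{<td>title</td><td>(.*?)</td>}')
    printf '%s\t%s\n' "$docid" "$title" >> "$RESULTS_FILE"
    echo "$docid Done."
done < "$DOCID_FILE"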