@muru
Last active March 30, 2019 07:53
A script that takes links to Papers with Code pages and saves, for each paper, the link to the abstract, the title, and the year.
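#! /bin/bash
# Read Papers with Code URLs from stdin, one per line, and append one
# tab-separated row per paper to ./output: link, title, year, type, tags.
# Requires curl and pup (a command-line HTML parser).
TYPE=PAPER
TAG1=PAPER
TAG2=NLP
TAG3=LANGUAGE_MODELING
# Remove the temporary copy of the page when the script exits.
trap 'rm -f foo.html' EXIT
while read -r url
do
    # Keep a record of every URL that was attempted.
    echo "$url" >> log
    curl -sL "$url" > foo.html
    # Line 1: first Abstract/PDF link. Line 2: title. Line 3: first .authors span.
    # grep drops blank lines; sed reduces line 3 to its trailing four-digit year
    # (titles on line 2 are left alone) and discards everything past line 3.
    # printf appends the fixed columns, and paste joins all seven lines with tabs.
    (pup 'a:contains("Abstract"), a:contains("PDF") attr{href}' < foo.html | head -n1; pup '.paper-title h1, .authors span:nth-child(1) text{}' < foo.html) |
        grep '[[:alnum:]]' |
        (sed '3s/.* \([0-9]\{4\}\)/\1/;4,$d'; printf "%s\n" "$TYPE" "$TAG1" "$TAG2" "$TAG3") |
        paste -d '\t' - - - - - - - |
        tee -a output
    echo
done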
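
The formatting half of the pipeline can be tried on its own, without network access or pup. The sketch below is only an illustration: `format_row` is a hypothetical helper name, and the sample link, title, and date are made up. It assumes the input arrives as exactly three lines (link, title, date), and it limits the year substitution to the third line so that a title containing a four-digit number is left alone.

```shell
#!/bin/bash
# Hypothetical helper (not part of the script above): read three lines
# (link, title, date) on stdin and print one tab-separated row, followed
# by the fixed columns passed as arguments.
format_row() {
    # Reduce line 3 to its trailing four-digit year, drop anything past
    # line 3, append the arguments one per line, then join with tabs.
    { sed '3s/.* \([0-9]\{4\}\)/\1/;4,$d'; printf '%s\n' "$@"; } |
        paste -d '\t' - - - - - - -
}

# Made-up sample input standing in for what pup would extract.
printf '%s\n' 'https://example.org/abs/0000.00000' 'An Example Paper' '27 Mar 2019' |
    format_row PAPER PAPER NLP LANGUAGE_MODELING
```

With four arguments the helper always emits seven fields, which matches the seven `-` operands given to `paste`; a different number of tags would need a matching number of operands.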