@danielecook
Last active December 21, 2015 09:39
Simple bash script to download the publication tables from the UCSC Genome Browser and merge them.
# Download the pubs tables. Requires wget, which can be installed with Homebrew on macOS.
# More information is available here: http://brew.sh/
# -p avoids an error when ../data/ already exists (e.g. on a re-run).
mkdir -p ../data/
# Download Files
wget --timestamping --directory-prefix='../data/' 'http://hgdownload.cse.ucsc.edu/goldenPath/hgFixed/database/pubsArticle.txt.gz'
wget --timestamping --directory-prefix='../data/' 'http://hgdownload.cse.ucsc.edu/goldenPath/hgFixed/database/pubsMarkerAnnot.txt.gz'
## wget --timestamping --directory-prefix='../data/' 'http://hgdownload.cse.ucsc.edu/goldenPath/hgFixed/database/pubsSequenceAnnot.txt.gz'
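# Unzip Files
# gunzip -c writes to stdout and keeps the .gz, so wget --timestamping can still skip unchanged files on a re-run.
gunzip -c ../data/pubsArticle.txt.gz > ../data/pubsArticle.txt
gunzip -c ../data/pubsMarkerAnnot.txt.gz > ../data/pubsMarkerAnnot.txt
## gunzip -c ../data/pubsSequenceAnnot.txt.gz > ../data/pubsSequenceAnnot.txt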
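# Cut out the needed columns, sort on the join field (column 1), and drop duplicate lines.
# join expects its inputs in lexical order on the join field, so sort -n is not used here.
# Plain uniq keeps one copy of each line; uniq -u would discard every line that occurs more than once.
cut -f 1,5,6,8 ../data/pubsMarkerAnnot.txt | sort -t $'\t' -k 1,1 | uniq > pubsMarkerAnnot_cut.txt
cut -f 1,2,3,8 ../data/pubsArticle.txt | sort -t $'\t' -k 1,1 | uniq > pubsArticle_cut.txt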
# Join the cut files on the article index, rearrange the columns, and strip the <B> and </B> tags.
# The article index itself is dropped from the output.
# join and awk need a literal tab as the delimiter; $'\t' is bash's ANSI-C quoting for one.
# The g flag on sed removes every tag on a line, not just the first.
join -t $'\t' pubsMarkerAnnot_cut.txt pubsArticle_cut.txt | uniq | awk -F $'\t' '{print $6"\t"$5"\t"$2"\t"$3"\t"$7"\t"$4}' | sed "s/<B>//g;s/<\/B>//g" > pubs_join.txt
# Columns of pubs_join.txt (tab-separated, in order):
# pmid
# pmc id
# marker type (gene, snp, band)
# marker name (e.g. BRCA1)
# publication title
# snippet
# Create the 'unique titles' and 'unique snippets' files.
# sort -u is used because uniq alone only collapses adjacent duplicate lines.
cut -f 5 pubs_join.txt | sort -u > titles_unique.txt
cut -f 6 pubs_join.txt | sort -u > snippets_unique.txt
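
The sort, dedupe, join and tag-stripping steps above can be tried on a tiny example before running them on the full UCSC tables. The sketch below is only an illustration: the two input files, their contents and every file name under the temporary directory are made up, and the column layout is assumed to match the cut files described in the script. It builds the tab with printf rather than $'\t', so it should also run under a plain POSIX sh.

```shell
#!/bin/sh
# Toy run of the pipeline on invented data (not real UCSC rows).
set -eu
LC_ALL=C; export LC_ALL      # keep sort and join in agreement on ordering
TAB=$(printf '\t')           # portable way to get a literal tab
tmp=$(mktemp -d)

# Hypothetical marker file: article index, marker type, marker name, snippet.
# The first and third rows are deliberate duplicates.
{
  printf '10\tgene\tBRCA1\tthe <B>BRCA1</B> gene\n'
  printf '9\tsnp\trs123\tvariant <B>rs123</B>\n'
  printf '10\tgene\tBRCA1\tthe <B>BRCA1</B> gene\n'
} > "$tmp/marker.txt"

# Hypothetical article file: article index, pmc id, pmid, title.
{
  printf '9\tPMC2\t222\tSecond title\n'
  printf '10\tPMC1\t111\tFirst title\n'
} > "$tmp/article.txt"

# Sort lexically on the join field; identical lines end up adjacent.
sort -t "$TAB" -k 1,1 "$tmp/marker.txt" > "$tmp/marker_sorted.txt"
sort -t "$TAB" -k 1,1 "$tmp/article.txt" | uniq > "$tmp/article_dedup.txt"

# uniq keeps one copy of each line; uniq -u keeps only lines that never repeat.
uniq    "$tmp/marker_sorted.txt" > "$tmp/marker_dedup.txt"
uniq -u "$tmp/marker_sorted.txt" > "$tmp/marker_singletons.txt"

# Join on the article index, reorder the columns, and strip every <B>/</B> tag.
join -t "$TAB" "$tmp/marker_dedup.txt" "$tmp/article_dedup.txt" \
  | awk -F "$TAB" '{print $6"\t"$5"\t"$2"\t"$3"\t"$7"\t"$4}' \
  | sed 's/<B>//g;s/<\/B>//g' > "$tmp/joined.txt"

cat "$tmp/joined.txt"
```

With this toy input, marker_singletons.txt ends up with only the rs123 row, because uniq -u throws away both copies of the duplicated BRCA1 line, while marker_dedup.txt keeps one copy of each. The temporary directory in "$tmp" can be removed once the output has been inspected.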
@wakibbe

wakibbe commented Aug 22, 2013

Thanks Dan!
