@arq5x
Created July 18, 2012 20:19
For Gemini: Create a master ChromHMM track from the 9 distinct cell types.
echo "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/wgEncodeBroadHmmGm12878HMM.bed.gz
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/wgEncodeBroadHmmH1hescHMM.bed.gz
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/wgEncodeBroadHmmHepg2HMM.bed.gz
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/wgEncodeBroadHmmHmecHMM.bed.gz
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/wgEncodeBroadHmmHsmmHMM.bed.gz
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/wgEncodeBroadHmmHuvecHMM.bed.gz
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/wgEncodeBroadHmmK562HMM.bed.gz
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/wgEncodeBroadHmmNhekHMM.bed.gz
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/wgEncodeBroadHmmNhlfHMM.bed.gz" \
> chromhmm-files.txt
# download
for remote in `cat chromhmm-files.txt`
do
wget $remote
done
# uncompress
for zip in *.gz
do
gunzip -f "$zip"
done
# bed+ -> ~bedgraph: keep only chrom, start, end, and the state label
for bed in *.bed
do
cut -f 1-4 "$bed" > "$bed.bedg"
done
# union of all intervals across all 9 cell types
# match *.bed.bedg rather than *.bedg so a rerun does not feed the old master file back in
bedtools unionbedg -i *.bed.bedg > master.chromhmm.bedg
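
To make the "bed+ -> ~bedgraph" step concrete, here is a minimal sketch of what `cut -f 1-4` keeps from one record. The sample line is made up for illustration and only assumes the ChromHMM files are tab-separated BED with extra columns after the state label; it was not copied from the downloaded data.

```shell
# A made-up, tab-separated line in the shape of a BED9 record (illustrative only).
sample=$(printf 'chr1\t10000\t10600\t15_Repetitive/CNV\t0\t.\t10000\t10600\t245,245,245')

# Keep chrom, start, end, and the state label; drop score, strand, thick*, and color.
bedg=$(printf '%s\n' "$sample" | cut -f 1-4)

printf '%s\n' "$bedg"
```

Note that the fourth column is a text label rather than a number, so the result is only bedgraph-like, which is presumably why the script marks it "~bedgraph".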

arq5x commented Jul 18, 2012

Remember to tweak the labels for the master bedg. The values may also need some additional massaging before this goes into Gemini.

brentp commented Jun 11, 2013

set -ex
cell_types=(Gm12878 H1hesc Hepg2 Hmec Hsmm Huvec K562 Nhek Nhlf)
dir=chromHMM

mkdir -p $dir; cd $dir;

for ct in "${cell_types[@]}"; do
    echo $ct;
    remote=http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/wgEncodeBroadHmm${ct}HMM.bed.gz
    F=$(basename $remote .gz)
    wget --quiet -O - $remote | zcat - | cut -f 1-4 | perl -pe 's/\d+_(.+)/$1/' > $F
done

bedtools unionbedg -header -names "${cell_types[@]}" -i *.bed > master.chromhmm.bedg
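
One caveat about the `perl -pe 's/\d+_(.+)/$1/'` step above: the pattern is not anchored to the fourth column, so it rewrites the first place on the line where digits are followed by an underscore. That is fine as long as the chromosome names contain no such run, but a name like hg19's `chr6_apd_hap1` would be matched before the state label. Whether these files contain such names has not been checked here. A possible column-restricted variant, sketched with awk (the function name `strip_state_prefix` is hypothetical):

```shell
# Hypothetical helper: strip the leading "<number>_" from column 4 only,
# leaving chrom, start, and end untouched.
strip_state_prefix() {
    awk 'BEGIN { FS = OFS = "\t" } { sub(/^[0-9]+_/, "", $4); print }'
}

# Made-up records for illustration: one canonical chromosome, one haplotype name.
printf 'chr1\t10000\t10600\t15_Repetitive/CNV\n' | strip_state_prefix
printf 'chr6_apd_hap1\t100\t200\t1_Active_Promoter\n' | strip_state_prefix
```

In the loop above this would replace the `perl` stage, i.e. `... | cut -f 1-4 | strip_state_prefix > $F`.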
