@wflynny
Last active February 5, 2019 20:01
Building 10X reference genomes from Ensembl
# Visit the Ensembl ftp site.
# ftp://ftp.ensembl.org/pub/release-95/
#
# You want to find data under the following two URLs:
# 1. ftp://ftp.ensembl.org/pub/release-95/fasta/[YOUR_SPECIES_HERE]/dna/
# 2. ftp://ftp.ensembl.org/pub/release-95/gtf/[YOUR_SPECIES_HERE]/
#
# The first file of interest is under the fasta URL:
# [YOUR_SPECIES_HERE].[ASSEMBLY].dna.primary_assembly.fa.gz
# or, if that doesn't exist,
# [YOUR_SPECIES_HERE].[ASSEMBLY].dna.toplevel.fa.gz
#
# The second file of interest is under the gtf URL:
# [YOUR_SPECIES_HERE].[ASSEMBLY].[ENSEMBL_RELEASE].gtf.gz
# (the trailing number is the Ensembl release, e.g. 95)
#
# With those two URLs in hand, define these 4 things:
reference_name="species-assembly"
reference_version="3.0.0" # or whatever you want!
fasta_url="ftp://.../fasta/...fa.gz"
gtf_url="ftp://.../gtf/...gtf.gz"
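# names of the compressed downloads
fasta_gz=$(basename "${fasta_url}")
gtf_gz=$(basename "${gtf_url}")
# names after decompression (gunzip strips the trailing .gz)
fasta_file="${fasta_gz%.gz}"
gtf_file="${gtf_gz%.gz}"
gtf_file_filt="${gtf_file%.*}.filtered.${gtf_file##*.}"
wget "${fasta_url}"
gunzip "${fasta_gz}"
wget "${gtf_url}"
gunzip "${gtf_gz}"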
# Check which biotypes are present in the decompressed GTF:
# e.g. cut -f9 ${gtf_file} | grep -oE 'gene_biotype "(\w+)"' | sort | uniq
# Load cellranger (if your cluster uses environment modules), e.g.
# module load cellranger/3.0.2
# Make a STAR-compatible GTF.
# Add whichever gene_biotype values you are interested in;
# here's the bare minimum:
cellranger mkgtf \
${gtf_file} \
${gtf_file_filt} \
--attribute=gene_biotype:protein_coding \
--attribute=gene_biotype:lincRNA \
--attribute=gene_biotype:antisense
# Make a STAR-compatible reference
cellranger mkref \
--genome=${reference_name} \
--fasta=${fasta_file} \
--genes=${gtf_file_filt} \
--ref-version=${reference_version} \
--nthreads=2
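
A minimal sketch of how the filtered GTF name can be derived from a download URL once the `.gz` suffix is dropped. The URL below is a made-up placeholder (the host, species, and assembly name are assumptions for illustration, not a real Ensembl path).

```shell
# Hypothetical URL, for illustration only
gtf_url="ftp://example.org/gtf/Species_name.ASM1.95.gtf.gz"

# basename keeps only the file name part of the URL
gtf_gz=$(basename "${gtf_url}")

# %.gz removes the compression suffix, matching what gunzip leaves on disk
gtf_file="${gtf_gz%.gz}"

# %.* drops the last extension, ##*. keeps only the last extension,
# so ".filtered" is slotted in just before ".gtf"
gtf_file_filt="${gtf_file%.*}.filtered.${gtf_file##*.}"

echo "${gtf_file_filt}"
```

Deriving the name from the decompressed file keeps the `.gtf` extension at the end, which makes the filtered file easy to recognize next to the original.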
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
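
Before choosing `--attribute=gene_biotype:...` filters, it can help to see how many genes carry each biotype rather than only which biotypes exist. The sketch below runs on a tiny made-up GTF written to a temporary file (the gene IDs and coordinates are invented); for real data, point the pipeline at your decompressed GTF instead.

```shell
# Toy GTF: tab-separated, 9 columns; all records are made up
tmp_gtf=$(mktemp)
printf '1\ttoy\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n' > "${tmp_gtf}"
printf '1\ttoy\tgene\t200\t300\t.\t-\t.\tgene_id "G2"; gene_biotype "lincRNA";\n' >> "${tmp_gtf}"
printf '2\ttoy\tgene\t1\t50\t.\t+\t.\tgene_id "G3"; gene_biotype "protein_coding";\n' >> "${tmp_gtf}"

# Keep only gene rows (column 3), extract the biotype attribute,
# then count each distinct value, most frequent first
biotype_counts=$(awk -F'\t' '$3 == "gene"' "${tmp_gtf}" \
  | grep -oE 'gene_biotype "[^"]+"' \
  | sort | uniq -c | sort -rn)

echo "${biotype_counts}"
rm -f "${tmp_gtf}"
```

Restricting the tally to `gene` rows avoids counting the same gene once per transcript and exon line.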