@FloWuenne
Last active February 25, 2021 23:31
Building a custom reference for kb-python in a virtual environment
## This step only applies on Compute Canada clusters, where software is made available through modules!
module load python/3.7.4
## Create virtual environment using python
python3 -m venv ./kb_python_env
## activate environment
source ./kb_python_env/bin/activate
## Install kb-python inside environment
pip3 install kb-python
## Define paths for index
ref_file_dir="."
index_dir="./kb_custom_index"
## Download the reference files from Gencode
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M25/GRCm38.primary_assembly.genome.fa.gz ## genome fasta
gunzip GRCm38.primary_assembly.genome.fa.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M25/gencode.vM25.annotation.gtf.gz ## gtf file (Comprehensive gene annotation)
gunzip gencode.vM25.annotation.gtf.gz
## Add your custom sequence to reference
## It is important that the FASTA header and GTF entries are formatted exactly like those of the other genes and transcripts in the reference files,
## otherwise building the reference won't work correctly!
cat GRCm38.primary_assembly.genome.fa custom_sequence.fasta > GRCm38.primary_assembly.genome.with_custom_seq.fa
cat gencode.vM25.annotation.gtf custom_sequence.gtf > gencode.vM25.annotation.with_custom_seq.gtf
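## Optional sanity check before building the reference, as a minimal sketch.
## Assumption: the custom sequence is appended as its own record in the genome FASTA, so every sequence name
## in column 1 of custom_sequence.gtf should match a header in custom_sequence.fasta.
## check_custom_contigs is a hypothetical helper written for this note; it is not part of kb-python.

```shell
# Hypothetical helper: succeeds only if every sequence name used in
# column 1 of the GTF also appears as a FASTA header.
# Usage: check_custom_contigs custom_sequence.fasta custom_sequence.gtf
check_custom_contigs() {
    awk '
        # First file (FASTA): remember each header name, i.e. the text
        # between ">" and the first whitespace.
        NR == FNR {
            if (/^>/) {
                sub(/^>/, "", $1)
                seen[$1] = 1
            }
            next
        }
        # Second file (GTF): skip comment lines and empty lines.
        /^#/ || NF == 0 { next }
        # Report each unknown sequence name once.
        !($1 in seen) && !($1 in reported) {
            print "GTF sequence name not found in FASTA: " $1 > "/dev/stderr"
            reported[$1] = 1
            bad = 1
        }
        END { exit bad }
    ' "$1" "$2"
}
```

## A non-zero exit status means at least one GTF entry points at a sequence name that the FASTA does not contain.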
## Build the reference
mkdir -p "$index_dir" ## make sure the output directory for the index exists
kb ref --workflow standard \
  -i "$index_dir/kb_ref.GRCm38.with_custom_seq.idx" \
  -g "$ref_file_dir/t2g_kb_ref.GRCm38.with_custom_seq" \
  -f1 "$ref_file_dir/cdna.GRCm38.with_custom_seq" \
  "$ref_file_dir/GRCm38.primary_assembly.genome.with_custom_seq.fa" \
  "$ref_file_dir/gencode.vM25.annotation.with_custom_seq.gtf"
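## Optional check after the build, as a minimal sketch: confirm that the custom gene made it into the transcript-to-gene (t2g) file.
## custom_in_t2g is a hypothetical helper written for this note and MyCustomGene is a placeholder;
## substitute the gene or transcript ID used in custom_sequence.gtf.

```shell
# Hypothetical helper: print the t2g rows that mention the given ID
# (fixed-string match) and fail with a message if there are none.
# Usage: custom_in_t2g MyCustomGene "$ref_file_dir/t2g_kb_ref.GRCm38.with_custom_seq"
custom_in_t2g() {
    if ! grep -F -- "$1" "$2"; then
        echo "No t2g entry mentions '$1'; check the custom FASTA/GTF" >&2
        return 1
    fi
}
```

## If nothing is printed, the custom entry was most likely dropped during reference building, which points back at the formatting of the custom FASTA header or GTF entries.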