@antonkulaga
Last active August 29, 2015 14:18
Transcriptome assembly with Scythe-Sickle-HISAT-StringTie-Ballgown
### IMPROVING FASTQ (note: apply the same steps to all your FASTQ files) ###
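#removing Illumina adapters with Scythe https://github.com/vsbuffalo/scythe
#(illumina_adapters.fa is a placeholder: point -a at a FASTA file containing your adapter sequences)
scythe -a illumina_adapters.fa -q sanger -o /home/uploader/flies/assembly4/3_cleaned.fastq /home/uploader/flies/assembly4/3.fastq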
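#trimming low-quality read ends with Sickle https://github.com/najoshi/sickle (-q 25 is the quality threshold; note that -o overwrites the raw 3.fastq with the trimmed reads)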
sickle se -f /home/uploader/flies/assembly4/3_cleaned.fastq -t sanger -o /home/uploader/flies/assembly4/3.fastq -q 25
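The note at the top of this section says to repeat these two steps for every FASTQ file. A minimal bash sketch of that loop follows. It assumes the raw reads sit in one folder as `<sample>.fastq`, that `ADAPTERS` points to an adapter FASTA of your choice (`illumina_adapters.fa` is only a placeholder), and it writes `<sample>_trimmed.fastq` instead of overwriting the raw file. `DRY_RUN`, `run` and `clean_all_fastq` are hypothetical helpers added here, not part of either tool.

```bash
#!/usr/bin/env bash
# Sketch: run Scythe (adapter removal) then Sickle (quality trimming) on every
# raw FASTQ in a folder. ADAPTERS, the *_trimmed naming and DRY_RUN are
# assumptions for illustration, not part of the original gist.
ADAPTERS=${ADAPTERS:-illumina_adapters.fa}   # placeholder adapter FASTA
DRY_RUN=${DRY_RUN:-0}                        # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

clean_all_fastq() {
  local dir=$1 fq sample
  for fq in "$dir"/*.fastq; do
    [ -e "$fq" ] || continue                 # no match: glob stays literal
    case $fq in
      *_cleaned.fastq|*_trimmed.fastq) continue ;;   # skip our own outputs
    esac
    sample=${fq%.fastq}
    # 1) remove 3' adapter contamination
    run scythe -a "$ADAPTERS" -q sanger -o "${sample}_cleaned.fastq" "$fq"
    # 2) trim low-quality ends (threshold 25, as in the gist)
    run sickle se -f "${sample}_cleaned.fastq" -t sanger \
      -o "${sample}_trimmed.fastq" -q 25
  done
}
```

Usage: `DRY_RUN=1 clean_all_fastq /home/uploader/flies/assembly4` previews the commands; dropping `DRY_RUN=1` runs them.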
### Hisat http://ccb.jhu.edu/software/hisat/manual.shtml ###
#building hisat index
hisat-build dmel.fasta dmel.hisat
#extracting known splice sites from the annotation to help HISAT align spliced reads
python extract_splice_sites.py dmel.gtf > splicesites.txt
#aligning reads (HISAT prints an alignment summary like the one below to stderr)
hisat -x dmel.hisat -U 3.fastq -S 3_hisat.sam --known-splicesite-infile splicesites.txt
#12299676 reads; of these:
# 12299676 (100.00%) were unpaired; of these:
# 1665253 (13.54%) aligned 0 times
# 8570171 (69.68%) aligned exactly 1 time
# 2064252 (16.78%) aligned >1 times
#86.46% overall alignment rate
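The overall rate in this summary is the share of reads that aligned at least once: (8570171 + 2064252) / 12299676 ≈ 86.46%. A small bash/awk sketch that recomputes it from a saved summary follows; it assumes the summary was captured with something like `hisat ... 2> 3_hisat.log` (the log name is hypothetical) and that the lines keep the layout shown above.

```bash
#!/usr/bin/env bash
# Sketch: recompute the overall alignment rate from a saved HISAT summary.
# overall rate = (aligned exactly 1 time + aligned >1 times) / total reads
alignment_rate() {
  awk '
    /reads; of these:/        { total = $1 }
    /aligned exactly 1 time/  { uniq  = $1 }
    /aligned >1 times/        { multi = $1 }
    END {
      # print nothing if the total line was not found
      if (total > 0) printf "%.2f\n", 100 * (uniq + multi) / total
    }' "$1"
}
```

Usage: `alignment_rate 3_hisat.log` prints the percentage with two decimals, which makes it easy to compare samples side by side.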
### StringTie ###
#conversion to BAM with samtools http://www.htslib.org
samtools view -S -b 3_hisat.sam > 3.bam
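#sorting by coordinate (legacy samtools syntax that writes 3_sorted.bam; samtools 1.3 and later need: samtools sort -o 3_sorted.bam 3.bam)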
samtools sort 3.bam 3_sorted
#GTF creation
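#generating GTFs plus the Ballgown tables (-B) required for later differential-expression analysis; -C also writes the reference transcripts fully covered by reads, and -m 100 sets the minimum assembled transcript length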
stringtie 3_sorted.bam -G dmel.gtf -o 3.gtf -v -m 100 -C 3_cov.gtf -B
#if we only need the GTFs, this is enough: stringtie 3_sorted.bam -G dmel.gtf -o 3.gtf -v -m 100
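The alignment, conversion, sorting and assembly commands above are written for sample 3 only. StringTie writes the `-B` Ballgown tables into the directory of the `-o` GTF, so samples assembled into the same folder would overwrite each other's `.ctab` files. A bash sketch of a per-sample driver that gives each sample its own `transcripts/<sample>/` folder follows; the folder layout, the `DRY_RUN`/`run` helpers and `assemble_sample` are assumptions, and it uses the samtools 1.3+ form of `sort`.

```bash
#!/usr/bin/env bash
# Sketch: align, convert, sort and assemble one sample into transcripts/<sample>/
# so that its GTF and Ballgown tables (-B) do not clash with other samples.
# Folder layout and DRY_RUN are assumptions, not part of the original gist.
DRY_RUN=${DRY_RUN:-0}   # 1 = only print the commands

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

assemble_sample() {
  local sample=$1
  local out="transcripts/$sample"
  run mkdir -p "$out"
  # spliced alignment with the index and splice sites built earlier
  run hisat -x dmel.hisat -U "$sample.fastq" -S "$out/$sample.sam" \
    --known-splicesite-infile splicesites.txt
  # SAM -> BAM (-o instead of '>' so dry runs never write into the BAM)
  run samtools view -S -b -o "$out/$sample.bam" "$out/$sample.sam"
  # coordinate sort, samtools >= 1.3 syntax
  run samtools sort -o "$out/${sample}_sorted.bam" "$out/$sample.bam"
  # -B writes the Ballgown *.ctab tables next to the -o GTF
  run stringtie "$out/${sample}_sorted.bam" -G dmel.gtf -o "$out/$sample.gtf" \
    -v -m 100 -C "$out/${sample}_cov.gtf" -B
}
```

Usage: `for s in 3 4 5; do assemble_sample "$s"; done`, with `DRY_RUN=1` in front to preview first.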
#creating a list of files for cuffmerge (in /home/uploader/flies/assembly4/transcripts, where the cuffmerge command below expects it)
touch merge.txt
nano merge.txt
#then add the paths of the GTFs to merge, one per line
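Instead of typing the list in nano, it can be generated. A bash sketch follows, assuming one folder per sample laid out as `transcripts/<name>/<name>.gtf` (the layout described in the merge step); `write_merge_list` is a hypothetical helper, and files such as `<name>_cov.gtf` or a reference GTF sitting directly in `transcripts/` are deliberately skipped.

```bash
#!/usr/bin/env bash
# Sketch: write the cuffmerge assembly list, one GTF path per line.
# Assumes each sample has its own folder <root>/<name>/ holding <name>.gtf.
write_merge_list() {
  local root=$1 list=$2 dir name
  : > "$list"                       # start from an empty list file
  for dir in "$root"/*/; do         # only directories match the trailing /
    name=$(basename "$dir")
    if [ -f "$dir$name.gtf" ]; then
      printf '%s\n' "$dir$name.gtf" >> "$list"
    fi
  done
}
```

Usage: `write_merge_list /home/uploader/flies/assembly4/transcripts /home/uploader/flies/assembly4/transcripts/merge.txt`; passing an absolute root keeps the listed paths absolute.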
#merging all created GTF files (for the sake of simplicity, assume they are in /home/uploader/flies/assembly4/transcripts/<name> folders, while the FASTA files are in /home/uploader/flies/assembly
cuffmerge -o /home/uploader/flies/assembly4/transcripts -g /home/uploader/flies/assembly4/transcripts/dmel.gtf -s /home/uploader/flies/assembly4/dmel.fasta -p 4 /home/uploader/flies/assembly4/transcripts/merge.txt
### Ballgown analysis ###
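Ballgown reads one folder per sample containing the tables that StringTie writes with `-B`. Before switching to R, a quick bash check that each sample folder is complete can save a failed load. This is a sketch: `check_ballgown_dirs` is a hypothetical helper, and the sample folders are passed explicitly so that other directories (for example cuffmerge output) are not inspected.

```bash
#!/usr/bin/env bash
# Sketch: verify that every given sample folder holds the five Ballgown
# tables produced by `stringtie ... -B`. Returns non-zero if any are missing.
BALLGOWN_TABLES="e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab"

check_ballgown_dirs() {
  local dir table missing=0
  for dir in "$@"; do
    for table in $BALLGOWN_TABLES; do   # unquoted on purpose: split on spaces
      if [ ! -f "$dir/$table" ]; then
        echo "missing: $dir/$table" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}
```

Usage: `check_ballgown_dirs transcripts/3 transcripts/4 && echo "ready for Ballgown"`.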