@jasongallant
jasongallant / extractsequences.py
Created August 29, 2012 21:36
Extract Sequences
"""
%prog some.fasta wanted-list.txt
"""
from Bio import SeqIO
import sys
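# Read the wanted IDs into a set for fast membership tests, skipping blank lines.
with open(sys.argv[2]) as handle:
    wanted = {line.strip() for line in handle if line.strip()}
# SeqIO.parse accepts a filename directly and closes the file when done.
seqiter = SeqIO.parse(sys.argv[1], 'fasta')
SeqIO.write((seq for seq in seqiter if seq.id in wanted), sys.stdout, "fasta")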
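The same filtering idea can be sketched without Biopython, which may help when the library is not installed. This is only an illustration, not the gist's method: `read_fasta` and `filter_fasta` are hypothetical helper names, and the record ID is assumed to be the first whitespace-delimited token of the header line, which is how `seq.id` behaves for FASTA input.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(handle, wanted_ids):
    """Yield only the records whose ID appears in wanted_ids."""
    wanted = set(wanted_ids)
    for header, sequence in read_fasta(handle):
        tokens = header.split()
        # Assumption: the ID is the first token of the header line.
        record_id = tokens[0] if tokens else ""
        if record_id in wanted:
            yield header, sequence
```

Calling `filter_fasta(open("some.fasta"), ["id1", "id2"])` yields the matching records in file order; the file name and IDs here are placeholders.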
@jasongallant
jasongallant / countchars.sh
Created August 29, 2012 22:58
Count Characters Per Line
awk '
BEGIN{print"in.var\tin.ancest\tout.var\tout.ancest\tninmissing\tnoutmissing"}
NR <= 1 { next }
{ for(i=1;i<=n=split($4,a,"");i++) if((a[i]=="1")) b++;
for(i=1;i<=n=split($4,a,"");i++) if((a[i]=="0")) c++;
for(i=1;i<=n=split($4,a,"");i++) if((a[i]==".")) x++;
for(i=1;i<=n=split($3,a,"");i++) if((a[i]=="1")) d++;
for(i=1;i<=n=split($3,a,"");i++) if((a[i]=="0")) e++;
for(i=1;i<=n=split($3,a,"");i++) if((a[i]==".")) y++;
printf "%s\n",$2"\t"FS"\t"b"\t"c"\t"d"\t"e"\t"x"\t"y;
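For comparison, the per-line character counting can be sketched in Python. This is a rough equivalent under stated assumptions, not a port of the full awk script, whose remainder is not shown here: `count_states` and `count_columns` are hypothetical names, columns are assumed to be whitespace-separated, and counts are computed fresh for each line.

```python
def count_states(field):
    """Count '1', '0' and '.' characters in one column value."""
    return {"1": field.count("1"), "0": field.count("0"), ".": field.count(".")}


def count_columns(line):
    """Count states in the 3rd and 4th whitespace-separated columns of a line.

    Assumption: the line has at least four columns, matching the awk
    script's use of $3 and $4.
    """
    fields = line.split()
    return {"col3": count_states(fields[2]), "col4": count_states(fields[3])}
```

Using `str.count` avoids splitting each column into single characters once per state.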
@jasongallant
jasongallant / resequenceandsnp.sh
Created August 30, 2012 13:37
Workflow for Genome Resequencing and SNP Calling
#WORKFLOW FOR GENOME RESEQUENCING:
#CONCATENATE CASAVA OUTPUT
# Annoying first step: Casava 1.8 splits each FASTQ file into chunks of 4,000,000 reads, so the chunks have to be stitched back together.
cat *P1_GA1_*R1* > Limenitis_Pool_P1_GA1_R1.fastq.gz
cat *P1_GA1_*R2* > Limenitis_Pool_P1_GA1_R2.fastq.gz
# Repeat for each group of files, for each individual pool/sample.
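The concatenation step can also be sketched in Python. It relies on the fact that gzip files joined end to end form a valid multi-member gzip stream, which is why plain `cat` works here. The function name `concatenate_chunks` is hypothetical. One detail this sketch handles explicitly: the output name `Limenitis_Pool_P1_GA1_R1.fastq.gz` itself matches the pattern `*P1_GA1_*R1*`, so on a rerun the previous output would otherwise be picked up as an input.

```python
import glob
import os
import shutil


def concatenate_chunks(pattern, destination):
    """Append every file matching pattern, in sorted order, to destination.

    Returns the list of input files used. The destination is excluded from
    the inputs even if it matches the pattern.
    """
    dest_abs = os.path.abspath(destination)
    # Resolve the inputs before the destination is opened for writing.
    chunks = sorted(
        path for path in glob.glob(pattern)
        if os.path.abspath(path) != dest_abs
    )
    with open(destination, "wb") as out:
        for chunk in chunks:
            with open(chunk, "rb") as src:
                # Byte-level copy; the gzip members are not decompressed.
                shutil.copyfileobj(src, out)
    return chunks
```

Sorting the matches keeps the R1 and R2 chunks in the same order, provided the chunk file names differ only in the R1/R2 part.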
@jasongallant
jasongallant / illuminator.py
Created September 13, 2012 18:31
Process Illumina Files (RNASeq)
import os
import sys
import glob
import shutil
import argparse
import subprocess
import multiprocessing
import itertools
class FullPaths(argparse.Action):
@jasongallant
jasongallant / calcdistances.py
Created May 7, 2013 01:17
Python: Calculate Distances
#!/usr/bin/env python
import sys
inputfile = sys.argv[1]
with open(inputfile) as fd:
    next(fd)
    for line in fd:
        columns=line.split()
        #print columns
@jasongallant
jasongallant / cortex.sh
Created May 7, 2013 16:22
Shell::Align::Cortex Submit Script
perl /projectnb/mullenl/programs/CORTEX_release_v1.0.5.15/scripts/calling/run_calls.pl \
--first_kmer 31 \
--last_kmer 61 \
--kmer_step 30 \
--fastaq_index index_file \
--auto_cleaning yes \
--bc yes \
--pd no \
--outdir ./cortexresults \
--outvcf cortextrial \
@jasongallant
jasongallant / findempty.sh
Created May 8, 2013 23:50
Shell::Find Empty Files
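# find all empty directories under /path/to/dest (use -type f instead of -type d to match empty files)
find /path/to/dest -type d -empty
# find all empty directories under /tmp
find /tmp -type d -empty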
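A rough Python counterpart, for cases where the results need further processing in a script. This is a sketch, not a drop-in replacement for `find`: `find_empty` is a hypothetical name, an empty file is taken to be a regular file of size zero, and an empty directory is one with no entries at all.

```python
import os


def find_empty(root):
    """Return (empty_dirs, empty_files) found under root, root included."""
    empty_dirs, empty_files = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # A directory is empty when it holds neither subdirectories nor files.
        if not dirnames and not filenames:
            empty_dirs.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so only regular files are reported.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getsize(path) == 0:
                empty_files.append(path)
    return sorted(empty_dirs), sorted(empty_files)
```

For example, `find_empty("/tmp")` returns two sorted lists, directories first.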
@jasongallant
jasongallant / git-import-repository.md
Created February 28, 2019 16:59 — forked from martinbuberl/git-import-repository.md
Import existing Git repository into another

Folder structure before (2 separate repositories):

XXX
 |- .git
 |- (project files)
YYY
 |- .git
@jasongallant
jasongallant / README.md
Created March 26, 2021 01:24 — forked from iracooke/README.md
NCBI TSA Submission Guide

Steps to submit to TSA

If you have a transcriptome that has been assembled from shotgun reads, the TSA (Transcriptome Shotgun Assembly) database is a good place to deposit it so that it can be widely accessed.

This guide assumes that you simply want to submit the assembled sequences from your transcriptome without annotations. NCBI sets a high bar for the inclusion of annotations, so annotations for most non-model organisms are unlikely to meet the criteria.

To create a TSA submission, take a look at the NCBI guidelines. This gist is based on those guidelines.

Register BioProject