Alon Shaiber ShaiberAlon

  • University of Chicago
  • Chicago
@ShaiberAlon
ShaiberAlon / wait_for_cluster.py
Created March 30, 2016 22:15
Waits for a given batch of cluster jobs to finish before executing the next command (copied from Meren)
import os
import sys
import time
import xml.dom.minidom
import string
import getpass
def still_running(job_name, job_owner):
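    # Ask the scheduler for all users' jobs as XML; the raw string keeps the backslash so the shell sees a literal *
    f = os.popen(r'qstat -u \* -xml -r')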
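The preview above stops before the XML is parsed. As a rough sketch of how the check might continue, the snippet below looks for a job by name and owner in qstat-style XML. It assumes each job appears as a `job_list` element with `JB_name` and `JB_owner` children; the function name `job_in_qstat_xml`, the exact-match rule, and the sample XML are illustrative and not taken from the gist.

```python
import xml.dom.minidom


def _text(element):
    """Concatenate the text nodes directly under an element."""
    return "".join(
        node.data for node in element.childNodes if node.nodeType == node.TEXT_NODE
    ).strip()


def job_in_qstat_xml(xml_text, job_name, job_owner):
    """Return True if a job with this exact name and owner is listed in the XML.

    Assumption: jobs are <job_list> elements with <JB_name> and <JB_owner> children.
    """
    dom = xml.dom.minidom.parseString(xml_text)
    for job in dom.getElementsByTagName("job_list"):
        names = job.getElementsByTagName("JB_name")
        owners = job.getElementsByTagName("JB_owner")
        if not names or not owners:
            continue  # skip entries that lack either field
        if _text(names[0]) == job_name and _text(owners[0]) == job_owner:
            return True
    return False


# Hypothetical sample in the assumed layout, used instead of calling qstat.
SAMPLE = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_name>Metaphlan_Primates</JB_name>
      <JB_owner>alon</JB_owner>
    </job_list>
  </queue_info>
</job_info>"""

print(job_in_qstat_xml(SAMPLE, "Metaphlan_Primates", "alon"))
```

A waiting loop could call such a function on fresh `qstat` output and `time.sleep` between polls until it returns False.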
ShaiberAlon / remove_duplicate_sequence.shx
Created May 19, 2016 14:58
Short bash script for removing duplicate sequences from a fasta file (keeping one copy of each unique sequence)
#!/bin/bash
# remove_duplicate_sequence.shx is a short bash script that removes duplicate copies of sequences from an input fasta file and saves the result in an output fasta file.
# the bash script was based on Pierre Lindenbaum's script: https://www.biostars.org/p/3003/#3008
# input:
# -f | --file : fasta file
# -o | --output : output file after removing all the duplicate sequences
#
while [ "$1" != "" ]; do
    case $1 in
        -f | --file ) shift
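The preview cuts off before the deduplication step itself. The idea of keeping one copy of each unique sequence can be sketched in Python as follows; this is not the gist's bash pipeline, and the helper names `parse_fasta` and `unique_sequences`, as well as the choice to keep the first header seen and to compare sequences exactly (case-sensitively), are assumptions made for illustration.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def unique_sequences(records):
    """Keep only the first record seen for each distinct sequence."""
    seen = set()
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            yield header, seq


example = [">a", "ACGT", ">b", "ACGT", ">c", "AC", "GG"]
for header, seq in unique_sequences(parse_fasta(example)):
    print(">" + header)
    print(seq)
```

In this example record `b` is dropped because its sequence duplicates that of `a`.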
ShaiberAlon / convert_table_to_fasta.shx
Created June 2, 2016 18:51
Converts a tab-delimited file to fasta format
#!/bin/bash
#
# expects tab-delimited table with only two columns and no header
# column 1 - name of each sequence
# column 2 - sequence
# performs two actions:
# adds > at the beginning of each row
# converts all the tabs to new lines
#
while [ "$1" != "" ]; do
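The two actions listed in the header comments (prefix each row with `>` and turn the tab into a newline) can be sketched in Python as below. This is an illustrative equivalent, not the gist's own code; the function name `table_to_fasta` is hypothetical, and skipping blank lines is an added assumption.

```python
def table_to_fasta(lines):
    """Convert two-column, tab-delimited rows (name, sequence) to fasta text."""
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are ignored
        # Exactly two columns are expected; anything else raises ValueError.
        name, sequence = line.split("\t")
        records.append(">" + name + "\n" + sequence + "\n")
    return "".join(records)


rows = ["seq1\tACGT\n", "seq2\tGGCC\n"]
print(table_to_fasta(rows), end="")
```

Failing loudly on rows with more or fewer than two columns is a deliberate choice here, since the script documents that it expects exactly two columns and no header.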
ShaiberAlon / ceonvert_xmlTab_to_normalTab.shx
Created June 2, 2016 18:53
Bash command to convert Excel-created tab-delimited files to normal tab-delimited files (carriage returns become newlines)
tr '\r' '\n' < file.txt > file_fix.txt
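For comparison, the same byte-level substitution can be sketched in Python. The function name `cr_to_lf` is hypothetical. Like the `tr` command, it replaces every carriage return, so a file with Windows-style `\r\n` endings would end up with blank lines between rows.

```python
def cr_to_lf(data: bytes) -> bytes:
    """Replace every carriage return with a newline, mirroring tr '\\r' '\\n'."""
    return data.replace(b"\r", b"\n")


print(cr_to_lf(b"name\tvalue\rA\t1\r"))
```

To convert a file, the bytes could be read with `open(path, "rb")` and the result written to a new file opened with `"wb"`.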
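#!/usr/bin/env python
# Print the sample names listed in the first file but not in the second.
# Compare whole lines: calling set() on the raw file text would compare single characters.
with open('p207_profiled_samples.txt') as list1, open('p214_profield_samples.txt') as list2:
    print(list(set(list1.read().splitlines()) - set(list2.read().splitlines())))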
#!/bin/bash
set -e
DIR=Metaphlan_output
job=Metaphlan_Primates
email=alon.shaiber@gmail.com
WAIT () {
    python /workspace/meren/wait_for_cluster.py "$1"
}
import csv
import numpy as np
import argparse
parser = argparse.ArgumentParser(description='Adding the 2MA column to anvio AA table')
parser.add_argument('-i','--input',metavar='FILE',dest='input_file',help='Input file')
parser.add_argument('-o','--out',metavar='FILE',dest='output_file',help='Name of file for output')
parser.add_argument('-r','--ratio',metavar='NUMBER',dest='ratio',type=float,help='Minimal ratio between the consensus and the second most covered amino acid. If the ratio is lower than the provided threshold, the 2MA value will be in the form consensus_consensus')
args = parser.parse_args()
ShaiberAlon / gen-tree-with-real-gene-order
Created November 8, 2016 03:27
Generate a newick formatted tree with numerical order
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = "Alon Shaiber"
__copyright__ = ""
__credits__ = []
__license__ = ""
__version__ = 1
__maintainer__ = "Alon Shaiber"
ShaiberAlon / export_nuc_from_fasta.py
Last active March 3, 2017 19:55
Script to get the nucleotides between two positions within a specific contig from a fasta file
#!/usr/bin/env python
import anvio.utils as u
import argparse
import sys
parser = argparse.ArgumentParser(description='Get nucleotides from a fasta file between user-defined nucleotide positions inside a specified contig')
parser.add_argument('-1','--N1',metavar='INT',dest='n1',type=int,help='Nucleotide sequence start position')
parser.add_argument('-2','--N2',metavar='INT',dest='n2',type=int,help='Nucleotide sequence end position')
parser.add_argument('-c','--contig',metavar='STRING',dest='c',help='Contig name')
parser.add_argument('-o','--out',metavar='FILE',dest='output',help='Output file')
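The preview ends before the extraction logic. A plain-Python sketch of pulling a region out of one contig is shown below. It does not use `anvio.utils`, and the helper names `read_contig` and `extract_region` are hypothetical. The coordinate convention (1-based, inclusive on both ends) and the matching of the contig name against the first word of the header line are assumptions, since the gist preview does not state them.

```python
def read_contig(lines, contig):
    """Return the full sequence of the named contig, or None if it is absent."""
    chunks, inside, found = [], False, False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if inside:
                break  # reached the next record, so the contig is complete
            # Assumption: the contig name is the first word of the header.
            inside = line[1:].split()[0:1] == [contig]
            found = found or inside
        elif inside:
            chunks.append(line)
    return "".join(chunks) if found else None


def extract_region(lines, contig, start, end):
    """Return nucleotides start..end (assumed 1-based, inclusive) of a contig."""
    sequence = read_contig(lines, contig)
    if sequence is None:
        raise KeyError("contig not found: " + contig)
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("positions fall outside the contig")
    return sequence[start - 1:end]


fasta = [">c1 example contig", "ACGT", "ACGT", ">c2", "TTTT"]
print(extract_region(fasta, "c1", 3, 6))
```

If the real script uses 0-based or half-open coordinates, only the slice in `extract_region` would need to change.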
ShaiberAlon / MAP.shx
Last active March 27, 2017 21:54
Bash script to map multiple metagenomes to multiple references
#!/bin/bash
### DEFAULTS (FEEL FREE TO EDIT THESE) ##################
NUM_THREADS_FOR_MAPPING=10
NUM_THREADS_FOR_HMMSCAN=4
NUM_THREADS_FOR_ANVI_GEN_CONTIG=4
NUM_THREADS_FOR_ANVI_PROFILE=4
NUM_THREADS_FOR_ANVI_MERGE=4
# configure whether SNV analysis will be included or not (if you want it included then leave this empty