pythseq pythseq

# Get a list of all organisms
curl -s "http://rest.kegg.jp/list/organism" > organisms-all.txt
# Get just a few of interest
awk '$2~/^(hsa|mmu|rno|cfa|bta|gga|xla|xtr|dre|dme|cel|ath|ehi|tgo|eco|sau|mtu|mav|cje|ccol)$/' organisms-all.txt > organisms-of-interest.txt
# Get the accession codes for each
cut -f1 organisms-of-interest.txt > organisms-of-interest-codes.txt
# Make a directory for the downloaded KGML files
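The awk filter above keeps rows whose second column is one of the chosen organism codes. A minimal Python sketch of the same selection follows. It assumes tab-separated rows with the code in the second column; the sample rows and the `CODES_OF_INTEREST` subset are illustrative, not taken from the real listing.

```python
# Sketch: filter an organism listing by organism code.
# Assumption: columns are tab-separated and the code is the second column,
# which is what the awk filter relies on ($2).
CODES_OF_INTEREST = {"hsa", "mmu", "dme"}  # hypothetical subset of the codes


def filter_organisms(lines, codes):
    """Return (first_column, code) pairs for rows whose code is in `codes`."""
    selected = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1] in codes:
            selected.append((fields[0], fields[1]))
    return selected


# Illustrative rows only; the real file comes from the curl command.
sample = [
    "T01001\thsa\tHomo sapiens (human)\tEukaryotes;Animals",
    "T00005\tsce\tSaccharomyces cerevisiae\tEukaryotes;Fungi",
    "T01002\tmmu\tMus musculus (mouse)\tEukaryotes;Animals",
]
print(filter_organisms(sample, CODES_OF_INTEREST))
```

Using a set for the codes keeps the membership test exact, like the anchored `^(...)$` pattern in the awk version.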
pythseq / job_array_demo.sh
Created February 28, 2018 10:01 — forked from andersgs/job_array_demo.sh
Slurm job arrays
#!/bin/bash
# Illustrating the use of job arrays in SLURM
# use the command: sbatch --array=1-100 job_array_demo.sh
# Partition for the job:
#SBATCH -p main
# Multithreaded (SMP) job: must run on one node
#SBATCH --nodes=1
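Each task of a job array launched with `sbatch --array=1-100` sees its own index in the `SLURM_ARRAY_TASK_ID` environment variable. The sketch below shows one way a task could use that index to pick its input. The `pick_input` helper and the file names are hypothetical and are not part of the gist.

```python
import os


def pick_input(inputs, task_id=None):
    """Select the input for this array task.

    Task IDs are assumed to start at 1, matching --array=1-100.
    """
    if task_id is None:
        # Slurm sets this variable for every task in a job array.
        task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
    return inputs[task_id - 1]


# Hypothetical input files, one per array task.
inputs = [f"sample_{n:03d}.fastq" for n in range(1, 101)]
print(pick_input(inputs, task_id=1))
```

Passing `task_id` explicitly makes the helper easy to try outside a Slurm allocation.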
pythseq / parse_uniref_xml.py
Created March 16, 2018 10:38 — forked from sminot/parse_uniref_xml.py
Parse UniRef XML -> CSV
#!/usr/bin/python
import os
import sys
import xml
import gzip
import json
import time
from collections import defaultdict
import pandas as pd
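The preview stops at the imports, so the gist's actual parsing logic is not shown here. As a rough illustration of turning XML entries into CSV rows, the sketch below streams a small in-memory document with the standard library. The element names (`entry`, `name`), the `id` attribute, and the sample data are assumptions for illustration and may not match the real UniRef schema or the gist's approach.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}entry'."""
    return tag.rsplit("}", 1)[-1]


def entries_to_rows(xml_stream):
    """Stream <entry> elements and collect one dict per entry."""
    rows = []
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local_name(elem.tag) == "entry":
            name = ""
            for child in elem:
                if local_name(child.tag) == "name":
                    name = child.text or ""
            rows.append({"id": elem.get("id", ""), "name": name})
            elem.clear()  # release the entry's children to limit memory use
    return rows


# Made-up sample; element and attribute names are assumptions.
SAMPLE = b"""<UniRef xmlns="http://uniprot.org/uniref">
  <entry id="UniRef_A"><name>Cluster: example A</name></entry>
  <entry id="UniRef_B"><name>Cluster: example B</name></entry>
</UniRef>"""

rows = entries_to_rows(io.BytesIO(SAMPLE))
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "name"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

For a compressed file, a handle from `gzip.open(path, "rb")` could be passed in place of the `BytesIO` object.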
pythseq / gimp.md
Created May 14, 2018 09:30 — forked from lindenb/gimp.md
gimp 2.6 procedures. xslt gimp xml procedures gimp scheme

Gimp Procedures

(script-fu-round-corners run-mode image drawable value toggle value value value toggle toggle)

.

Parameter(s)

pythseq / to_jupyter.md
Created October 3, 2018 16:56 — forked from lexnederbragt/to_jupyter.md
Tools to generate Jupyter Notebooks from plain (markup) text files
pythseq / SRA_Runs_to_BioSample.R
Created October 23, 2018 08:14 — forked from philippbayer/SRA_Runs_to_BioSample.R
For a file of SRA run IDs (ERR457868 etc.), ask the Sequence Read Archive for the associated BioSample names (SAMEA2399445 etc.)
library(rentrez)
library(assertthat)
library(readr)
search_ind <- function(term){
  # get the IDs for a run ID
  # ERR457868 searched, returns 1011219
  results <- entrez_search(db="sra", term=term)$ids
  assert_that(length(results) == 1)
  results
}
pythseq / genbank_to_tbl.py
Created October 30, 2018 14:44 — forked from nickloman/genbank_to_tbl.py
genbank_to_tbl.py
# requires biopython
# run like:
# genbank_to_tbl.py "my organism name" "my strain ID" "ncbi project id" < my_sequence.gbk
# writes seq.fsa, seq.tbl as output
import sys
from copy import copy
from Bio import SeqIO
def find_gene_entry(features, locus_tag):
pythseq / dust_python.py
Created November 26, 2018 08:47 — forked from hiraksarkar/dust_python.py
Native dust in python (N.B. This is not accelerated sdust)
from collections import deque
import itertools
# Make dictionary of triplets
triplet_index = {}
inverse_triplet = {}
# enumerate() gives each of the 64 triplets a distinct index (0-63)
for i, x in enumerate(itertools.product(['A','T','G','C'], repeat=3)):
    triplet_index[''.join(x)] = i
    inverse_triplet[i] = ''.join(x)
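The triplet dictionary above is the starting point for scoring low-complexity sequence. The sketch below shows how triplet counts could feed a DUST-style score. The formula used (sum of c(c-1)/2 over triplet counts, divided by the number of triplets minus one) is the commonly cited definition and is an assumption here; the rest of the gist is not shown and may differ. `triplet_counts` and `dust_score` are hypothetical names.

```python
import itertools
from collections import Counter

# Same triplet ordering as the dictionary built above.
TRIPLET_INDEX = {
    "".join(t): i
    for i, t in enumerate(itertools.product("ATGC", repeat=3))
}


def triplet_counts(seq):
    """Count overlapping triplets, skipping any with non-ATGC characters."""
    seq = seq.upper()
    return Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if seq[i:i + 3] in TRIPLET_INDEX
    )


def dust_score(seq):
    """DUST-style score (assumed formula): higher means lower complexity."""
    counts = triplet_counts(seq)
    n = sum(counts.values())
    if n < 2:
        return 0.0
    return sum(c * (c - 1) / 2 for c in counts.values()) / (n - 1)


print(dust_score("AAAAAAAA"))  # homopolymer: one triplet repeated
print(dust_score("ATGCATGC"))  # more varied sequence
```

This scores a whole sequence at once; a windowed scan, which the `deque` import suggests, is not attempted here.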
pythseq / simple_for_loop_for_mapping.sh
Created December 7, 2018 08:59 — forked from meren/simple_for_loop_for_mapping.sh
A simple loop to serially map all samples.
#!/bin/bash
# A simple loop to serially map all samples.
# referenced from within http://merenlab.org/tutorials/assembly_and_mapping/
# how many threads should each mapping task use?
NUM_THREADS=4
for sample in $(awk '{print $1}' samples.txt)
do