Lee Bergstrand LeeBergstrand

LeeBergstrand / FastaMLtoSL.py
Last active December 10, 2022 17:33
Python script that converts FASTA files with multiline (wrapped) sequences to FASTA files with single-line (unwrapped) sequences.
#!/usr/bin/env python
# Created by: Lee Bergstrand
# Descript: Converts multiline FASTAs to single line FASTAs
#
# Usage: FastaMLtoSL.py <sequences.faa>
# Example: FastaMLtoSL.py mySeqs.faa
#----------------------------------------------------------------------------------------
#===========================================================================================================
#Imports:
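The preview above stops at the imports, so the script's actual implementation is not shown. As a minimal sketch of the unwrapping idea it describes, assuming headers start with `>` and every other non-blank line is sequence (the function name `unwrap_fasta` is hypothetical and not taken from the gist):

```python
import io


def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto a single line.

    Hypothetical helper for illustration; not the gist's own code.
    """
    sequence_parts = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if sequence_parts:
                yield "".join(sequence_parts)
                sequence_parts = []
            yield line
        elif line:
            sequence_parts.append(line)
    # Emit the sequence of the final record.
    if sequence_parts:
        yield "".join(sequence_parts)


wrapped = io.StringIO(">seq1 example\nMKV\nLAA\n>seq2\nGG\nTT\n")
unwrapped = list(unwrap_fasta(wrapped))
```

Because the function only needs an iterable of lines, an open file handle could be passed in place of the `StringIO` object.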
LeeBergstrand / DbCAN_HMMSCAN_Parser_Problem.md
Last active March 1, 2022 09:32
dbCAN_HMMSCAN_Parser_Problem

dbCAN Hmmscan Parser Issue

I was recently reverse engineering dbCAN's shell/Perl script (hmmscan-parser.sh) for parsing HMMER's human-readable hmmscan text results. Unfortunately, while figuring out how this script works, I found an error.

##### Original Script:

	#!/usr/bin/env sh
	# Yanbin Yin
	# 08/18/2011
LeeBergstrand / Linuxbrew_Config.yml
Last active September 29, 2021 09:20
CircleCI config for Homebrew
version: 2
jobs:
  build:
    working_directory: ~/repo
    machine: true
    steps:
      - checkout
#!/usr/bin/env python3
"""
Created by: Lee Bergstrand
Description: Makes newer InterProScan TSV file compatible with Pygenprop.
Requirements: None
"""
LeeBergstrand / CSVMod.py
Created February 17, 2014 01:02
A simple Python script to modify any element or group of elements in a CSV file using a regex.
#!/usr/bin/env python
# Created by: Lee Bergstrand
# Descript: A simple script that modifies the elements inside a column of a CSV by
# using a regular expression to find and replace characters in those elements.
#
# Usage: CSVMod.py <input.csv> <output.csv> <columnNumber> <regex> <replace>
# Example: CSVMod.py myInput.csv myOutput.csv 6 '^[\t]+|[\t]$' replacement
#----------------------------------------------------------------------------------------
import csv
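The preview ends at the first import, so the body of the script is not visible here. A rough sketch of the column-wise find and replace that the header comment describes might look like the following. The function name `modify_column` and the 1-based column numbering are assumptions, not details confirmed by the gist:

```python
import csv
import io
import re


def modify_column(rows, column_number, pattern, replacement):
    """Yield rows with a regex substitution applied to one column.

    Hypothetical helper; column_number is assumed to be 1-based.
    """
    regex = re.compile(pattern)
    index = column_number - 1
    for row in rows:
        row = list(row)
        # Rows that are too short to contain the column pass through unchanged.
        if 0 <= index < len(row):
            row[index] = regex.sub(replacement, row[index])
        yield row


source = io.StringIO("id,name\n1,\tAlice\t\n2,Bob\n")
cleaned = list(modify_column(csv.reader(source), 2, r"^[\t]+|[\t]+$", ""))
```

Here the pattern strips leading and trailing tabs from the second column. The resulting rows could then be written out with `csv.writer`.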
LeeBergstrand / C_chlorochromatii_CaD3.tsv
Created January 9, 2019 01:57
Genome Properties Assignment Bug (Issue 30)
NC_007514.1_1800 2da1d06b2f7f511868a9c2917777ab82 498 Pfam PF12183 Restriction endonuclease NotI 231 461 7.8E-98 T 14-05-2018 IPR022009 Restriction endonuclease, type II, NotI
NC_007514.1_1661 c1968ae7cab9a5d53947002f3518f2c2 247 Pfam PF02585 GlcNAc-PI de-N-acetylase 11 127 4.3E-21 T 14-05-2018 IPR003737 N-acetylglucosaminyl phosphatidylinositol deacetylase-related Reactome: R-HSA-162710
NC_007514.1_1661 c1968ae7cab9a5d53947002f3518f2c2 247 TIGRFAM TIGR04001 thiol_BshB1: bacillithiol biosynthesis deacetylase BshB1 8 235 1.9E-96 T 14-05-2018 IPR023842 Bacillithiol biosynthesis deacetylase, BshB1 GO:0019213|GO:0071793
NC_007514.1_1661 c1968ae7cab9a5d53947002f3518f2c2 247 PANTHER PTHR12993:SF21 11 232 2.8E-49 T 14-05-2018
NC_007514.1_1661 c1968ae7cab9a5d53947002f3518f2c2 247 PANTHER PTHR12993 11 232 2.8E-49 T 14-05-2018 IPR003737 N-acetylglucosaminyl phosphatidylinositol deacetylase-related Reactome: R-HSA-162710
NC_007514.1_1652 87cd3c15562a2ce4954e59e9d87f2ae8 404 Pfam PF01747 ATP-sulfurylase 175 385 1.
Take this part of the config for example:
      - restore_cache:
          keys:
          - v1-dependencies-{{ checksum "requirements.txt" }}
          - v1-dependencies-
          
      - save_cache:
          key: v1-dependencies-{{ checksum "requirements.txt" }}
          paths:
LeeBergstrand / remove_low_seq_files.sh
Created August 21, 2016 20:43
Shell script for selecting FASTQ files by number of sequences.
#!/usr/bin/env bash
if [ $# -eq 0 ]
then
    echo "No arguments supplied..."
    echo "Please provide a minimum number of seqs per file."
    exit 1
fi
MIN_LENGTH=$1
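The rest of the shell script is cut off in this preview, so how it counts sequences is not shown. The selection idea in the description could be sketched in Python as below, assuming the common four-lines-per-record FASTQ layout and an inclusive minimum. The names `count_fastq_records` and `select_files` are hypothetical:

```python
import io


def count_fastq_records(lines):
    """Count records, assuming each FASTQ record spans exactly four lines."""
    non_empty = sum(1 for line in lines if line.strip())
    return non_empty // 4


def select_files(record_counts, min_records):
    """Return the names of files holding at least min_records records (assumed inclusive)."""
    return sorted(name for name, count in record_counts.items() if count >= min_records)


sample = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
counts = {"a.fastq": count_fastq_records(sample), "b.fastq": 0}
selected = select_files(counts, 2)
```

Counting lines and dividing by four is only valid for unwrapped FASTQ; files with multi-line sequences would need a real parser.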
==> Downloading https://downloads.sourceforge.net/project/boost/boost/1.58.0/boost_1_58_0.tar.bz2
Already downloaded: /home/lee2/.cache/Homebrew/boost-1.58.0.tar.bz2
==> Verifying boost-1.58.0.tar.bz2 checksum
tar xf /home/lee2/.cache/Homebrew/boost-1.58.0.tar.bz2
==> ./bootstrap.sh --prefix=/home/lee2/.linuxbrew/Cellar/boost/1.58.0 --libdir=/home/lee2/.linuxbrew/Cellar/boost/1.58.0/lib --without-icu --without-libraries=python,mpi
Building Boost.Build engine with toolset gcc... tools/build/src/engine/bin.linuxx86_64/b2
Unicode/ICU support for Boost.Regex?... disabled.
Generating Boost.Build configuration in project-config.jam...
Bootstrapping is done. To build, run:
LeeBergstrand / Trim_SPAdes_FASTA
Last active August 29, 2015 14:22
Trim_SPAdes_FASTA.py
#!/usr/bin/env python
# Created by: Lee Bergstrand
# Modified by: Matt McInnes
# License: MIT
# Descript: Trims FASTAs from the assembler SPAdes by coverage to remove low coverage contigs.
# ----------------------------------------------------------------------------------------
# ===========================================================================================================
# Imports:
import argparse
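Only the header of this script is visible, so the following is an illustrative sketch and not the authors' code. It assumes contig headers follow the `NODE_<n>_length_<len>_cov_<coverage>` naming that SPAdes typically uses, and that records are already parsed into (header, sequence) pairs. The name `filter_by_coverage` and the inclusive threshold are assumptions:

```python
import re

# Captures the coverage value that follows "_cov_" in a SPAdes-style header.
COVERAGE_PATTERN = re.compile(r"_cov_(\d+(?:\.\d+)?)")


def filter_by_coverage(records, min_coverage):
    """Keep (header, sequence) pairs whose header coverage is >= min_coverage.

    Hypothetical helper; headers without a parsable coverage are dropped.
    """
    kept = []
    for header, sequence in records:
        match = COVERAGE_PATTERN.search(header)
        if match and float(match.group(1)) >= min_coverage:
            kept.append((header, sequence))
    return kept


contigs = [
    ("NODE_1_length_5000_cov_35.2", "ACGT"),
    ("NODE_2_length_800_cov_1.4", "GGCC"),
]
kept = filter_by_coverage(contigs, 5.0)
```

Dropping headers that lack a coverage field is a design choice made for this sketch; a stricter version could raise an error instead.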