darrin t schultz conchoecia

@conchoecia
conchoecia / contig_coords_from_DNA_sequence
Created August 27, 2022 11:26
These functions take a DNA sequence and return the coordinates (Python slice syntax) of the contigs
#!/usr/bin/env python
myseq = "ACNNNNNNNNNNNNNCATGTACTTGGATCTATCGGTGATCGGATCGGTATGCGTACGAGTGTCAGTCNNNNNNNNNNNNNNNACGTGGTATGCGGCATGCGTAGCGTCAGCTAGCTGATATTGCGTAGCNNNNNNNNNNGGTATGCGTG"
def contig_ranges(seq, sub):
    indices = list(find_all(seq, sub))
    gap_starts = []
    gap_stops = []
    gap_ranges = []
    prev = -1
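The preview above is cut off before the function body finishes, so the author's actual approach is not visible here. As a rough sketch of the same idea (contig coordinates in Python slice syntax), the following uses a regular expression instead of the `find_all` helper referenced above. The name `contig_coords` and the `gap_char` parameter are hypothetical, and the sketch assumes gaps are runs of uppercase `N`.

```python
import re


def contig_coords(seq, gap_char="N"):
    """Return (start, stop) pairs for every run of non-gap characters.

    Coordinates are 0-based and end-exclusive, so seq[start:stop]
    yields the contig. Hypothetical helper, not the gist's own code.
    """
    # Match one or more characters that are not the gap character.
    pattern = re.compile("[^{}]+".format(re.escape(gap_char)))
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


example = "ACNNNNCATGNNGG"
for start, stop in contig_coords(example):
    # Slicing with the returned pair recovers each contig.
    print(start, stop, example[start:stop])
```

Because the pairs are end-exclusive, they can be passed straight to a slice without any off-by-one adjustment.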
@conchoecia
conchoecia / Snakefile
Created June 6, 2019 06:51
transcriptome_assembly_pipeline
"""
Author: Darrin Schultz @conchoecia
File: Transcriptome assembly, annotation, and db creation
Instructions:
- To run this script and all of its analyses: make sure that Python 3,
Snakemake, and Biopython are installed on your Unix computer.
- Execute the following command: `snakemake --cores 45`, replacing `45`
with the number of threads available on your machine.
"""
@conchoecia
conchoecia / plot_chromatogram.py
Last active January 29, 2019 00:25
Plots_chromatograms
"""
fraction  lum    NaCl_Conc  type
1         4213   0.06       step
2         5123   0.06       step
3         7813   0.06       step
4         12988  0.06       step
5         24373  0.06       step
6         24843  0.1        slope
7         7513   0.1        slope
8         5358   0.2        step
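The table above looks like the input format the plotting script expects, although the preview ends before any parsing code appears. A minimal sketch of reading such a table, assuming whitespace-delimited columns with the four headers shown (the function name `parse_fractions` is hypothetical):

```python
def parse_fractions(text):
    """Parse a whitespace-delimited fraction table into a list of dicts.

    Assumes the first non-empty line is the header row:
    fraction, lum, NaCl_Conc, type.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    records = []
    for row in rows:
        rec = dict(zip(header, row))
        records.append({
            "fraction": int(rec["fraction"]),
            "lum": float(rec["lum"]),
            "NaCl_Conc": float(rec["NaCl_Conc"]),
            "type": rec["type"],
        })
    return records


table = """
fraction lum NaCl_Conc type
1 4213 0.06 step
6 24843 0.1 slope
"""
for record in parse_fractions(table):
    print(record)
```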
@conchoecia
conchoecia / analyze_positions.sh
Last active October 10, 2018 00:27
get info on linker content in HiC libraries
#!/bin/bash
# this script makes plots of positions of linker sequences in HiC/Chicago libraries
function processthis {
    OUT1="${LIB}fpos.txt"
    IN1="${LIB}_f.fastq.gz"
    OUT2="${LIB}rpos.txt"
    IN2="${LIB}_r.fastq.gz"
@conchoecia
conchoecia / Snakefile_locus_map_and_reassemble
Last active April 28, 2018 00:24
map reads to a locus and reassemble
# The goal of this script is to map reads to a region and to assemble it de novo.
# This is useful for scenarios in which there are gaps between known homologous sequence,
# like what one might encounter when doing a de novo mitochondrial genome assembly
# or assembling a gene locus from a transcript.
#
# The steps for this assembly process are:
# 1) Map all of the reads to the scaffold
# 2) Extract all of the read pairs from the original fastq files and make new fastqs.
# 3) Use the new FASTQ files in a de novo SPAdes assembly.
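Step 2 above (pulling the mapped read pairs back out of the original FASTQ files) can be sketched in plain Python. This is only an illustration under stated assumptions: the Snakefile's real rules are not shown in this preview, the function names are hypothetical, records are assumed to span exactly four lines, and mate suffixes are assumed to be `/1` and `/2`.

```python
import io


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def read_name(header):
    """Strip the leading '@', any comment, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def extract_pairs(mapped_names, in_handle, out_handle):
    """Copy records whose read name is in mapped_names; return the count kept."""
    kept = 0
    for header, seq, plus, qual in read_fastq(in_handle):
        if read_name(header) in mapped_names:
            out_handle.write("\n".join((header, seq, plus, qual)) + "\n")
            kept += 1
    return kept


# In-memory demo; real files would be opened with open() or gzip.open(..., "rt").
source = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n")
target = io.StringIO()
print(extract_pairs({"r1"}, source, target))
print(target.getvalue(), end="")
```

Running the same filter over both the forward and reverse files with one shared set of names keeps the mates together, including pairs in which only one read mapped.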
@conchoecia
conchoecia / MiSeqLibraryQC(Snakefile)
Last active April 13, 2018 13:24
This gist takes a list of directories that contain fastq files and trims them, runs fastqc, and outputs a QC report with info on trimming efficiency and Hi-C linker sequence content.
import os
from pathlib import Path
import yaml
configfile: "directories.yaml"
# The yaml file should look like this below. Just an object called "directories"
# and a list of directories in which the pipeline should look for things.
#
# directories:
@conchoecia
conchoecia / Snakefile
Last active June 26, 2018 18:34
Snakemake pipeline to trim adapters from and quality-trim 10X Chromium data from HiSeq.
import glob
import os
"""
Snakemake interprets some output files as ambiguous because the processing pipeline is not a DAG.
As a result, you must use --allow-ambiguity.
example usage:
snakemake --cores 90
"""
@conchoecia
conchoecia / make_notebooks.py
Created October 4, 2016 22:58
Parses out a lab notebook in md format into subproject files for easy tracking.
#!/usr/bin/env python3
# script: make_notebooks.py
# author: darrin t schultz
# date : 20161004
# make_notebooks.py is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Ansi 0 Color</key>
<dict>
<key>Blue Component</key>
<real>0.1098039299249649</real>
<key>Green Component</key>
<real>0.1098039299249649</real>