Scott Warchal Swarchal

@Swarchal
Swarchal / 3k_plates.sh
Created February 19, 2017 17:13
find files ending with 06 to 15
ls | grep -E '.*-(0[6-9]|1[0-5])$' > plates_06_to_15.txt
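As a quick sanity check on the suffix pattern, the sketch below applies the same match in Python. The directory names are made up for illustration, and the pattern is written with `1[0-5]` on the assumption that "06 to 15" is meant to include plate 10.

```python
import re

# Suffix pattern for plate numbers 06 through 15 (assumed to include 10).
PLATE_SUFFIX = re.compile(r"-(0[6-9]|1[0-5])$")


def select_plates(names):
    """Return the names whose trailing plate number lies in 06..15."""
    return [name for name in names if PLATE_SUFFIX.search(name)]


# Hypothetical directory names, for illustration only.
names = ["plate-05", "plate-06", "plate-10", "plate-15", "plate-16"]
print(select_plates(names))
```

Only the names ending in 06, 10, and 15 should survive the filter; 05 and 16 fall outside the range.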
Swarchal / find_correlation.py
Last active January 30, 2022 04:34
remove redundant columns in pandas dataframe
import pandas as pd
import numpy as np
def find_correlation(data, threshold=0.9, remove_negative=False):
    """
    Given a numeric pd.DataFrame, this will find highly correlated features,
    and return a list of features to remove.
    Parameters
    ----------
    data : pandas DataFrame
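The preview stops before the body of the function, so the following is only a rough, dependency-free sketch of the general idea, not the gist's own algorithm. It works on a plain dict of columns rather than a DataFrame, uses a simple greedy pass, and assumes that `remove_negative=True` means strongly negative correlations also count as redundant. The names `pearson` and `redundant_columns` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def redundant_columns(columns, threshold=0.9, remove_negative=False):
    """Greedy filter: flag a column if it is highly correlated with any
    column that has already been kept (assumed behaviour, see lead-in)."""
    kept, to_remove = [], []
    for name, values in columns.items():
        for kept_name in kept:
            r = pearson(values, columns[kept_name])
            strength = abs(r) if remove_negative else r
            if strength > threshold:
                to_remove.append(name)
                break
        else:
            kept.append(name)
    return to_remove


# Made-up data: "b" duplicates "a", "d" mirrors "a".
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
    "d": [5, 4, 3, 2, 1],
}
print(redundant_columns(columns))
print(redundant_columns(columns, remove_negative=True))
```

A greedy pass depends on column order; other strategies, such as dropping the column with the highest mean correlation first, would give different results on some data.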
Swarchal / start_jupyter_server.sh
Last active February 16, 2017 12:12
start remote jupyter notebook server over ssh
#!/bin/bash
# start a jupyter server running on a remote host
# and port-forward to local machine
remote_host=$1
terminal=x-terminal-emulator
# start remote jupyter session on port 8889
$terminal -e ssh "$remote_host" "ipython notebook --no-browser --port=8889"
Swarchal / kmer.py
Last active February 2, 2017 18:04
rosalind kmer solutions
#!/usr/bin/env python3.6
# sensible solution
import re
import sys
import itertools
def get_seq(path):
    """return fasta sequence from file"""
    lines = open(path).readlines()
    seq = ""
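The rest of the solution is cut off in the preview. Assuming the gist targets the Rosalind KMER problem (the 4-mer composition of a DNA string, listed in lexicographic order), a minimal sketch of the counting step might look like this. The function name `kmer_composition` and its parameters are illustrative, not taken from the gist.

```python
import itertools


def kmer_composition(seq, k=4, alphabet="ACGT"):
    """Count overlapping occurrences of every k-mer, in lexicographic order."""
    # itertools.product over a sorted alphabet yields k-mers in sorted order,
    # and dicts preserve insertion order, so the values come out sorted too.
    counts = {"".join(p): 0 for p in itertools.product(alphabet, repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing other symbols, e.g. N
            counts[kmer] += 1
    return list(counts.values())


# Tiny made-up sequence; with k=2 the first entry is the count of "AA".
print(kmer_composition("AAAAA", k=2)[:4])
```

Sliding a window over the sequence counts overlapping matches, which a plain `str.count` would miss.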
Swarchal / copy_dir_tree.sh
Created January 21, 2017 21:11
copy directory structure without files
rsync -a -f"+ */" -f"- *" source/ destination/
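For cases where rsync is not available, roughly the same effect can be sketched in Python with `os.walk`. This is an approximation only: unlike `rsync -a`, it does not preserve permissions, ownership, or timestamps, and it does not descend into symlinked directories. The helper names and the demo paths are hypothetical.

```python
import os
import tempfile


def copy_dir_tree(source, destination):
    """Recreate the directory hierarchy of `source` under `destination`,
    without copying any files."""
    for dirpath, _dirnames, _filenames in os.walk(source):
        relative = os.path.relpath(dirpath, source)
        target = os.path.normpath(os.path.join(destination, relative))
        os.makedirs(target, exist_ok=True)


def list_tree(root):
    """Return sorted relative paths of everything under `root`."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


# Demo on a throwaway directory tree (all paths here are made up).
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source")
    destination = os.path.join(tmp, "destination")
    os.makedirs(os.path.join(source, "a", "b"))
    os.makedirs(os.path.join(source, "c"))
    with open(os.path.join(source, "a", "data.txt"), "w") as handle:
        handle.write("not copied")
    copy_dir_tree(source, destination)
    copied = list_tree(destination)

print(copied)
```

The destination should contain the directories `a`, `a/b`, and `c`, but not `data.txt`.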
Swarchal / find_failed.py
Created January 16, 2017 18:34
find failed eddie jobs
#!/usr/bin/env python
"""re-run failed jobs
>>> ./find_failed.py $results_dir $path_to_batchlist
"""
import os
from sys import argv
def has_failed(directory, expected):
Swarchal / train_test_split.py
Created December 21, 2016 11:19
train test split
import random
def train_test_split(data, labels, test_prop=0.3):
    """roll your own train test split"""
    assert len(data) == len(labels)
    n_test = round(test_prop * len(data))
    n_train = len(data) - n_test
    combined = list(zip(data, labels))
    random.shuffle(combined)
    x_train, y_train = zip(*combined[:n_train])
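The preview ends before the test half is built and returned. One way the function could be completed is sketched below, under a few assumptions: the return order follows the scikit-learn convention (`x_train, x_test, y_train, y_test`), the `seed` parameter is a hypothetical addition for reproducibility, and list comprehensions replace `zip(*...)` so that an empty train or test half does not raise on unpacking.

```python
import random


def train_test_split(data, labels, test_prop=0.3, seed=None):
    """Shuffle paired data and labels, then split them into train and test."""
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    n_test = round(test_prop * len(data))
    n_train = len(data) - n_test
    combined = list(zip(data, labels))
    # Local RNG so the global random state is untouched; `seed` is hypothetical.
    random.Random(seed).shuffle(combined)
    train, test = combined[:n_train], combined[n_train:]
    x_train = [x for x, _ in train]
    y_train = [y for _, y in train]
    x_test = [x for x, _ in test]
    y_test = [y for _, y in test]
    return x_train, x_test, y_train, y_test


# Made-up data: each label is twice its sample, so pairing is easy to verify.
data = list(range(10))
labels = [2 * x for x in data]
x_train, x_test, y_train, y_test = train_test_split(data, labels, seed=0)
print(len(x_train), len(x_test))
```

Shuffling the zipped pairs, rather than the two lists separately, is what keeps each sample attached to its label.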
Swarchal / lexv_explanation.ipynb
Created December 15, 2016 16:32
lexv_explanation
Swarchal / otsu.jl
Last active November 12, 2016 01:03
otsu threshold
using StatsBase
function otsu_threshold(img, bit_depth = 256)
    counts = fit(Histogram, img[:], nbins = bit_depth).weights
    # `const` is not supported on local variables in current Julia
    total = prod(size(img))
    current_max, threshold = 0, 0
    weightB, sumB = 0, 0
    sumT = sum([i * counts[i] for i in 1:bit_depth])
    for (i, count) in enumerate(counts)
        weightB += count
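The loop body is truncated in the preview. For reference, here is a pure-Python sketch of the standard Otsu procedure (choose the level that maximises the between-class variance); it is not a transcription of the Julia gist. It assumes a flat sequence of integer intensities in `[0, levels)` and uses 0-based levels, whereas the Julia code indexes histogram bins from 1.

```python
def otsu_threshold(pixels, levels=256):
    """Return the intensity level that maximises between-class variance.

    Pixels with intensity <= the returned level form the background class.
    """
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1

    total = sum(counts)
    sum_total = sum(level * count for level, count in enumerate(counts))

    weight_b = 0  # number of pixels in the background class so far
    sum_b = 0     # intensity sum of the background class so far
    best_variance, threshold = 0.0, 0

    for level, count in enumerate(counts):
        weight_b += count
        if weight_b == 0:
            continue  # no background pixels yet
        weight_f = total - weight_b
        if weight_f == 0:
            break  # every pixel is background; nothing left to split
        sum_b += level * count
        mean_b = sum_b / weight_b
        mean_f = (sum_total - sum_b) / weight_f
        between = weight_b * weight_f * (mean_b - mean_f) ** 2
        if between > best_variance:
            best_variance, threshold = between, level
    return threshold


# Made-up bimodal "image": half dark pixels, half bright pixels.
pixels = [10] * 50 + [200] * 50
print(otsu_threshold(pixels))
```

Because ties keep the first maximum, a perfectly bimodal input returns the lower mode as the threshold; any level between the two modes would separate the classes equally well.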
Swarchal / qstatus.sh
Last active November 2, 2016 19:47
SGE job summary (add to .bashrc)
# define some colours for pretty output
NC='\033[0m'
RED='\033[0;31m'
YELLOW='\033[0;33m'
GREEN='\033[0;32m'
alias qstatus="echo -e '${NC}----------' ;\
echo -e '${GREEN}running:' ;\
qstat | grep ' r ' | wc -l ;\
echo -e '${NC}----------' ;\