Miles sirselim

@sirselim
sirselim / minknow_hirsute_setup.sh
Created August 31, 2021 08:37
Small bash script to automate MinKNOW and GPU Guppy setup on Ubuntu 21.04-based systems
#!/bin/bash
# author: Miles Benton
# created: 31st Aug 2021
# modified: 31st Aug 2021
#
# Notes:
# small bash script that automates installing and setting up ONT minknow and GPU
# guppy for live basecalling and GPU processing of nanopore data on Ubuntu 21.04
# based releases.
#
@sirselim
sirselim / guppy_basecalling.md
Last active August 30, 2023 03:46
My notes on setting up basecalling on Google Colab

Nanopore basecalling on Google Colab


NOTE: this whole idea is the brainchild of Jürgen Hench. He got it up and running and posted about it here. I am merely wrapping the idea in a hopefully easy-to-follow set of instructions for people to test themselves.


This notebook describes processing of Nanopore sequencing data (fast5 files) in a Google Colab interactive notebook environment. This is made possible by utilising the GPU-enabled runtime that is available via Colab.

@sirselim
sirselim / basecalling_notes.md
Last active August 1, 2023 01:27
a collection of my notes while working on nanopore basecalling on the Jetson Xavier

Jetson Xavier basecalling notes

initial basecalling runs

'fast' flip-flop calling on the Jetson Xavier

guppy_basecaller --disable_pings --compress_fastq -c dna_r9.4.1_450bps_fast.cfg -i flongle_fast5_pass/ -s flongle_test2 -x 'auto' --recursive 
@sirselim
sirselim / bioinformatics-scripts
Last active December 10, 2019 03:33
a collection of handy scripts
# handy scripts for bioinformatics
# A collection of scripts that I find useful.
## convert bam to cram format
# define reference genome (required for cram format)
# GENOME="/data/publicData/genomes/human/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna" # hg38
GENOME="/data/publicData/genomes/human/GRCh37/hs37d5.fa" # hg19
# find all bam files in current dir and convert to cram
find . -name "*.bam" | sed "s/\.bam$//" | xargs -I {} -P 36 samtools view -@ 4 -T "$GENOME" -C -o {}.cram {}.bam
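Before launching 36 parallel conversions, it can help to preview what the one-liner above would run. The following is a minimal sketch, not part of the original gist: the helper names `cram_path_for` and `dry_run_bam_to_cram` are hypothetical, and the `reference.fa` fallback is a placeholder assumption used only when `GENOME` is unset. It prints the `samtools view` commands rather than executing them.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): map a BAM path to the
# CRAM path that the find/sed/xargs one-liner would write.
cram_path_for() {
  local bam="$1"
  # Strip a trailing ".bam" and append ".cram".
  printf '%s.cram\n' "${bam%.bam}"
}

# Hypothetical dry run: print, but do not execute, one samtools command
# per BAM file found under the given directory (default: current dir).
# GENOME is assumed to be set as in the script above; "reference.fa" is
# only a placeholder fallback.
dry_run_bam_to_cram() {
  local dir="${1:-.}"
  local genome="${GENOME:-reference.fa}"
  find "$dir" -name "*.bam" -print0 |
    while IFS= read -r -d '' bam; do
      printf 'samtools view -@ 4 -T %s -C -o %s %s\n' \
        "$genome" "$(cram_path_for "$bam")" "$bam"
    done
}
```

Calling `dry_run_bam_to_cram .` after setting `GENOME` lists the planned commands, which makes it easy to check the output names and the reference path before committing to the full parallel run.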
@sirselim
sirselim / data_explorer.py
Last active November 27, 2019 07:54
playing with streamlit
import streamlit as st
import pandas as pd
import numpy as np
st.title('Uber pickups in NYC')
DATE_COLUMN = 'date/time'
DATA_URL = ('https://s3-us-west-2.amazonaws.com/'
'streamlit-demo-data/uber-raw-data-sep14.csv.gz')
@sirselim
sirselim / dbNSFP_pipeline_build.sh
Last active November 1, 2023 17:08
small bash script to download dbNSFP 'database' and wrangle into format for pipeline annotation process
#!/bin/bash
## small bash script to download and reformat dbNSFP for pipeline
## Miles Benton
## created: 2018-01-13
## modified: 2019-08-21
# Set to dbNSFP version to download and build
version="4.0a"
#TODO: add an option to 'scrape' this from the url to always return latest version
# define thread number for parallel processing where able
@sirselim
sirselim / CD19_MS_EWAS.csv
Created October 7, 2018 21:48
Ugly R script to generate differential methylation circos plots
IlmnID Infinium_Design_Type Genome_Build CHR MAPINFO Probe_SNPs Probe_SNPs_10 UCSC_RefGene_Name ks.score ks.pval logP median_MS median_HC median_diff abs_median_diff adjP UCSC_RefGene_Group
cg19159092 II 37 1 856059 rs28534711 FLJ39609 0.54 0.001401357 2.853451255 0.369697583 0.235456657 0.134 0.134 1 TSS1500
cg01394461 II 37 1 887576 NOC2L 0.58 0.000405656 3.391842167 0.605999215 0.765577211 -0.16 0.16 1 Body
cg24004483 I 37 1 944783 0.42 0.029913567 1.524131795 0.421027063 0.567238531 -0.146 0.146 1
cg22627753 II 37 1 988623 AGRN 0.5 0.00432085 2.364430777 0.791366927 0.61083939 0.181 0.181 1 Body
cg09864227 I 37 1 1008207 0.58 0.000405656 3.391842167 0.826697106 0.724967037 0.102 0.102 1
cg00300303 II 37 1 1067223 0.5 0.00432085 2.364430777 0.623355286 0.774244123 -0.151 0.151 1
cg15822328 II 37 1 1072197 0.5 0.00432085 2.364430777 0.431884651 0.270809266 0.161 0.161 1
cg08474826 II 37 1 1099630 0.42 0.029913567 1.524131795 0.587936074 0.446153689 0.142 0.142 1
cg06967105 II 37 1 1