Wayne's Bioinformatics Code Portal fomightez

@fomightez
fomightez / README.md
Last active June 23, 2025 15:17
JupyterLab current on Binder with awscli and zstd
@fomightez
fomightez / Centered plots based on Discourse Post.ipynb
Created June 20, 2025 17:48
Centered Matplotlib plots based on Discourse Post
@fomightez
fomightez / useful_python_snippets.py
Last active June 19, 2025 17:40
Useful Python snippets
# These are meant to work in both Python 2 and 3, except where noted.
# See my useful_pandas_snippets.py for those related to dataframes (such as pickling/`df.to_pickle(save_as)`)
# https://gist.github.com/fomightez/ef57387b5d23106fabd4e02dab6819b4
# also see https://gist.github.com/fomightez/324b7446dc08e56c83fa2d7af2b89a33 for examples of my
# frequently used Python functions and slight variations for more expanded, modular structures.
#argparse
# good snippet collection at https://mkaz.tech/code/python-argparse-cookbook/
@fomightez
fomightez / useful_notebook_snippets
Last active June 19, 2025 16:08
Useful snippets for Jupyter notebooks
# Use `%%capture` to hush 'noisy' stdout and stderr streams while still getting the `%%time` report afterwards
%%capture out_stream
%%time
---rest of a cell that does something with LOTS of output--
# In the next cell, put the following to print the time it took to run the cell above:
for x in out_stream.stdout.split("\n")[-3:]:
    print(x)
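The `[-3:]` slice in the snippet above relies on the timing report being the last thing written to the captured stdout. The sketch below imitates that outside a notebook: `contextlib.redirect_stdout` stands in for `%%capture`, and the two timing lines are invented sample text shaped like a `%%time` report, not real measurements. It is meant only to show why three trailing items are taken.

```python
import contextlib
import io

# Stand-in for a noisy notebook cell. The last two printed lines imitate
# the shape of a `%%time` report; the values are made up for illustration.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    for i in range(1000):
        print("noisy line", i)
    print("CPU times: user 1.2 s, sys: 0.1 s, total: 1.3 s")
    print("Wall time: 1.4 s")

captured = buffer.getvalue()

# The captured text ends with a newline, so splitting on "\n" leaves an
# empty string as the final item. Taking the last three items therefore
# yields the two timing lines plus that empty string.
tail = captured.split("\n")[-3:]
for x in tail:
    print(x)
```

If the cell prints anything after the timing report, the slice would need adjusting, so checking the tail of the captured text once by eye is a reasonable precaution.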
@fomightez
fomightez / useful_pandas_snippets.py
Last active June 18, 2025 19:02 — forked from bsweger/useful_pandas_snippets.md
Useful Pandas Snippets
# List unique values in a DataFrame column
df['Column Name'].unique() # Note, `NaN` is included as a unique value. If you just want the number, use `nunique()` which stands
# for 'number of unique values'; By default, it excludes `NaN`. `.nunique(dropna=False)` will include `NaN` in the count of unique values.
# To extract a specific column (subset the dataframe), you can use [ ] (brackets) or attribute notation.
df.height
df['height']
# are same thing!!! (from http://www.stephaniehicks.com/learnPython/pages/pandas.html
# -or-
# http://www.datacarpentry.org/python-ecology-lesson/02-index-slice-subset/)
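A small sketch of the points in the comments above, using a made-up one-column dataframe (the `height` values are illustrative only). It compares `unique()`, `nunique()`, and `nunique(dropna=False)`, and checks that attribute and bracket access return the same column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two distinct heights plus one missing value.
df = pd.DataFrame({"height": [1.5, 1.5, np.nan, 2.0]})

# `unique()` returns the distinct values and keeps NaN as one of them.
unique_values = df["height"].unique()

# `nunique()` counts distinct values and skips NaN by default.
n_without_nan = df["height"].nunique()

# `dropna=False` makes NaN count as a distinct value too.
n_with_nan = df["height"].nunique(dropna=False)

# Attribute access and bracket access give the same Series.
same = df.height.equals(df["height"])

print(unique_values, n_without_nan, n_with_nan, same)
```

Attribute notation only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method, so a column such as `'Column Name'` has to be accessed with brackets.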
@fomightez
fomightez / Converting_Bytes_to_MBytes.ipynb
Created June 17, 2025 18:47
Converting Bytes to MBytes in the sense used by Sequence Read Archive tables and metadata and Logan Search Results. Allows inter-relating the various numbers given in exported data.
@fomightez
fomightez / demo_ipylab_using_documentsearch.ipynb
Created May 16, 2025 16:20
Demonstrate using documentsearch in a Jupyter Notebook via ipylab for programmatic text search, in reply to a Jupyter Discourse thread https://discourse.jupyter.org/t/targeted-documentsearch-command/34957?u=fomightez
@fomightez
fomightez / p4339 natMX plasmid from Boone Lab.fa
Created February 17, 2016 20:11
sequence of p4339 natMX plasmid from Boone Lab as described at http://sites.utoronto.ca/boonelab/sga_technology/index.shtml 339-1460 are natMX amplicon insert made from primers 5'-ACATGGAGGCCCAGAATACCC-3' and 5'-CAGTATAGCGACCAGCATTCAC-3' according to Boone Lab
> p4339 natMX plasmid from Boone Lab as described at http://sites.utoronto.ca/boonelab/sga_technology/index.shtml 339-1460 are natMX amplicon insert made from primers 5'-ACATGGAGGCCCAGAATACCC-3' and 5'-CAGTATAGCGACCAGCATTCAC-3' according to Boone Lab
AGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGG
AAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCC
GGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGACCATGATTACGCCAAGCTAT
TTAGGTGACACTATAGAATACTCAAGCTATGCATCAAGCTTGGTACCGAGCTCGGATCCACTAGTAACGGCCGCCAGTGT
GCTGGAATTCGCCCttaaACATGGAGGCCCAGAATACCCtccttgacagtcttgacgtgcgcagctcaggggcatgatgt
gactgtcgcccgtacatttagcccatacatccccatgtataatcatttgcatccatacattttgatggccgcacggcgcg
aagcaaaaattacggctcctcgctgcagacctgcgagcagggaaacgctcccctcacagacgcgttgaattgtccccacg
ccgcgcccctgtagagaaatataaaaggttaggatttgccactgaggttcttctttcatatacttccttttaaaatcttg
ctaggatacagttctcacatcacatccgaacataaacaaccatgggtaccactcttgacgacacggcttaccggtaccgc
@fomightez
fomightez / mre_for_highlight_gene_names.py
Last active March 31, 2025 20:05
For Stack Overflow question https://stackoverflow.com/q/79546920/8508004 - MRE only. At this time, the way the color is applied doesn't transfer to Excel. Coloring the whole cell using `props` would transfer.
import pandas as pd
import numpy as np
import matplotlib as mpl
import re
df = pd.DataFrame({
    "Gene_name": ["sdsR", "arrS", "gadF"],
    "Genes_in_same_transcription_unit": ['pphA, sdsR', 'arrS', 'mdtF, mdtE, gadF, gadE'],
})
# Convert Gene_name column to a set for quick lookup