Wayne's Bioinformatics Code Portal fomightez

@conormm
conormm / r-to-python-data-wrangling-basics.md
Last active April 24, 2024 18:22
R to Python: Data wrangling with dplyr and pandas

R to Python data wrangling snippets

The dplyr package in R makes data wrangling significantly easier. The beauty of dplyr is that, by design, the options available are limited. Specifically, a set of key verbs forms the core of the package. Using these verbs, you can solve a wide range of data problems effectively and in less time. Whilst transitioning to Python, I have greatly missed the ease with which I can think through and solve problems using dplyr in R. The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data in Python (with the pandas package).
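As a minimal sketch of the idea, here is how a handful of core dplyr verbs (`filter`, `select`, `arrange`, `mutate`, and `group_by` plus `summarise`) might map onto pandas methods. The toy DataFrame and its column names are made up for illustration, and the named aggregation in `.agg()` assumes pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical toy data for illustration only
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "mass": [1.0, 2.0, 3.0, 5.0],
    "length": [10, 20, 30, 40],
})

# dplyr: filter(df, mass > 1.5)  (df.query("mass > 1.5") also works)
filtered = df[df["mass"] > 1.5]

# dplyr: select(df, species, mass)
selected = df[["species", "mass"]]

# dplyr: arrange(df, desc(mass))
arranged = df.sort_values("mass", ascending=False)

# dplyr: mutate(df, ratio = mass / length)
mutated = df.assign(ratio=lambda d: d["mass"] / d["length"])

# dplyr: df %>% group_by(species) %>% summarise(mean_mass = mean(mass))
summarised = df.groupby("species", as_index=False).agg(mean_mass=("mass", "mean"))
print(summarised)
```

`assign` with a lambda mirrors `mutate` in that the new column is computed from the frame being built, which keeps method chains readable.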

dplyr is organised around six key verbs:

@jdblischak
jdblischak / README.md
Last active March 29, 2019 02:40
kallisto vs. Subread for yeast RNA-seq analysis

Comparing speed for yeast RNA-seq analysis - kallisto vs. Subread

Introduction

kallisto is a new method for processing RNA-seq data. By pseudoaligning reads to a transcriptome instead of aligning them to a genome, it makes the quantification step much faster. While the computational speedup will be huge for projects with many samples and/or organisms with large genomes, I was curious how much time kallisto would save on a small RNA-seq project for an organism with a smaller genome. To perform this comparison, I downloaded 6 fastq files from a recent yeast RNA-seq study on GEO. I chose Subread as the comparison method because it performs read alignment but is optimized for quickly obtaining gene counts (it soft clips reads instead of trying to map exact exon-exon boundaries).
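To make the pseudoalignment idea concrete, here is a toy sketch: index every k-mer of each transcript, then report the set of transcripts compatible with all of a read's k-mers. This only illustrates the concept; kallisto itself builds a transcriptome de Bruijn graph, handles both strands and uses a much larger k (31 by default). Skipping k-mers that are absent from the index is an assumption of this sketch, and the function names are hypothetical.

```python
def build_kmer_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def pseudoalign(read, index, k):
    """Return the transcripts compatible with every indexed k-mer in the read.

    No base-by-base alignment is performed: compatibility is just the
    intersection of the transcript sets of the read's k-mers.
    """
    hits = [index[read[i:i + k]]
            for i in range(len(read) - k + 1)
            if read[i:i + k] in index]  # k-mers missing from the index are skipped
    return set.intersection(*hits) if hits else set()


transcripts = {"t1": "ACGTACGGA", "t2": "ACGTACTTT"}  # made-up sequences
idx = build_kmer_index(transcripts, k=4)
print(pseudoalign("GTACGG", idx, k=4))
```

A read whose k-mers are all shared by both transcripts stays ambiguous, while a single k-mer unique to one transcript narrows the answer to it; avoiding full alignment is where the speedup comes from.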

@kantale
kantale / karyoplot.py
Created March 2, 2015 00:17
Plot chromosome ideograms (karyotypes) with matplotlib
import os
import matplotlib
from matplotlib.patches import Circle, Wedge, Polygon, Rectangle
from matplotlib.collections import PatchCollection
import matplotlib.pyplot as plt
def karyoplot(karyo_filename, metadata={}, part=1):
    '''
    To create a karyo_filename go to: http://genome.ucsc.edu/cgi-bin/hgTables
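As a rough, independent sketch of the general technique (not the gist's own code), each cytogenetic band can be drawn as a matplotlib `Rectangle`, with all rectangles added to the axes in one `PatchCollection`. The `(chrom, start, end, name, stain)` tuple layout mimics UCSC's cytoBand table and is an assumption, as are the hypothetical `plot_ideogram` name and the stain-to-colour mapping.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from matplotlib.patches import Rectangle

# Assumed colours for UCSC-style Giemsa stain labels (illustrative only)
STAIN_COLORS = {
    "gneg": "white", "gpos25": "lightgray", "gpos50": "gray",
    "gpos75": "dimgray", "gpos100": "black", "acen": "red",
    "gvar": "lightblue", "stalk": "lightblue",
}


def plot_ideogram(bands, ax=None, height=0.6):
    """Draw one horizontal row of bands per chromosome (hypothetical helper).

    bands: iterable of (chrom, start, end, name, stain) tuples, an assumed
    cytoBand-like layout. Returns the axes and the PatchCollection.
    """
    bands = list(bands)
    if ax is None:
        _, ax = plt.subplots()
    chroms, rects, colors = [], [], []
    for chrom, start, end, _name, stain in bands:
        if chrom not in chroms:
            chroms.append(chrom)  # first appearance fixes the row order
        row = chroms.index(chrom)
        # Rectangle((x, y), width, height): x spans the band, y centres the row
        rects.append(Rectangle((start, row - height / 2), end - start, height))
        colors.append(STAIN_COLORS.get(stain, "white"))
    bars = PatchCollection(rects, facecolor=colors, edgecolor="black", linewidth=0.5)
    ax.add_collection(bars)
    ax.set_xlim(0, max(end for _, _, end, _, _ in bands))
    ax.set_ylim(-1, len(chroms))
    ax.set_yticks(range(len(chroms)))
    ax.set_yticklabels(chroms)
    ax.set_xlabel("position (bp)")
    return ax, bars
```

Adding the patches as one collection rather than one `add_patch` call per band keeps drawing fast when a genome has hundreds of bands.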
@bsweger
bsweger / useful_pandas_snippets.md
Last active April 19, 2024 18:04
Useful Pandas Snippets

Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
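The conversion described above maps onto `pandas.to_numeric`. A minimal sketch: by default it raises `ValueError` on any value it cannot parse, while `errors="coerce"` turns such values into `NaN` instead. The helper names below are hypothetical.

```python
import pandas as pd


def to_numeric_strict(series):
    """Default behaviour: raises ValueError on any unparseable value."""
    return pd.to_numeric(series)


def to_numeric_lenient(series):
    """errors="coerce": unparseable values become NaN instead of raising."""
    return pd.to_numeric(series, errors="coerce")


values = pd.Series(["1", "2.5", "three"])
print(to_numeric_lenient(values).tolist())  # [1.0, 2.5, nan]
```

Coercing is convenient, but it silently hides bad input; counting the new `NaN`s afterwards is a cheap sanity check.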

@slowkow
slowkow / GTF.py
Last active March 6, 2024 02:05
GTF.py is a simple module for reading GTF and GFF files
#!/usr/bin/env python
"""
GTF.py
Kamil Slowikowski
December 24, 2013
Read GFF/GTF files. Works with gzip compressed files and pandas.
http://useast.ensembl.org/info/website/upload/gff.html
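A minimal sketch of reading a GTF file into pandas, assuming tab-separated lines with the nine standard columns and GTF-style `key "value";` attributes (GFF3's `key=value` attributes are not handled). Files ending in `.gz` are opened with `gzip`. The function names are hypothetical, this is not the module's own implementation, and repeated attribute keys keep only their last value.

```python
import gzip
import re

import pandas as pd

GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame"]

# Matches key "value" pairs in the 9th column, e.g. gene_id "ENSG0001";
ATTR_RE = re.compile(r'([^\s;]+) "([^"]*)"')


def parse_attributes(text):
    """Turn 'gene_id "G1"; gene_name "ABC";' into a dict."""
    return dict(ATTR_RE.findall(text))


def gtf_lines_to_dataframe(lines):
    """Parse GTF lines (comments and blanks skipped) into a DataFrame."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        record = dict(zip(GTF_COLUMNS, fields[:8]))
        record.update(parse_attributes(fields[8]))  # one column per attribute key
        rows.append(record)
    df = pd.DataFrame(rows)
    if not df.empty:
        df[["start", "end"]] = df[["start", "end"]].astype(int)
    return df


def read_gtf(path):
    """Read a plain or gzip-compressed GTF file."""
    if str(path).endswith(".gz"):
        handle = gzip.open(path, "rt")
    else:
        handle = open(path, encoding="utf-8")
    with handle:
        return gtf_lines_to_dataframe(handle)
```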
@langner
langner / pubmed_search.py
Last active January 29, 2022 14:52
A class that searches Pubmed for a list of PMIDs via the BioPython Entrez module and returns the results in a simpler dictionary format.
"""Tools for searching Pubmed for a list of PMIDs.
The goal here is to search for many PMIDs at once, since searching
sequentially can take a long time. Using the BioPython Entrez module
is super convenient to this end.
The results are returned in a simple dictionary format.
"""
@sloria
sloria / bobp-python.md
Last active May 12, 2024 06:54
A "Best of the Best Practices" (BOBP) guide to developing in Python.

The Best of the Best Practices (BOBP) Guide for Python

In General

Values

  • "Build tools for others that you want to be built for you." - Kenneth Reitz
  • "Simplicity is always better than functionality." - Pieter Hintjens
@minrk
minrk / nbstripout
Last active June 6, 2023 06:23
git pre-commit hook for stripping output from IPython notebooks
#!/usr/bin/env python
"""strip outputs from an IPython Notebook
Opens a notebook, strips its output, and writes the outputless version to the original file.
Useful mainly as a git filter or pre-commit hook for users who don't want to track output in VCS.
This does mostly the same thing as the `Clear All Output` command in the notebook UI.
LICENSE: Public Domain
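The same idea can be sketched with only the standard library, assuming an nbformat 4 notebook (a top-level `"cells"` list). This hypothetical stand-alone version is not minrk's script: it empties `outputs` and resets `execution_count` on every code cell, then writes the JSON back with the one-space indentation Jupyter uses.

```python
import json


def strip_outputs(nb):
    """Clear outputs and execution counts from code cells, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def strip_file(path):
    """Rewrite a notebook file without its outputs."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    strip_outputs(nb)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Markdown and raw cells are left untouched because they carry no outputs.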
@fomightez
fomightez / compositioncalc2.ipynb
Last active December 20, 2015 08:39
compositioncalc2.py from Practical Computing for Biologists by Steven H. D. Haddock and Casey W. Dunn AS A STATIC IPYTHON Notebook. Posted as a Gist by Wayne Decatur (fomightez) with full credit and reference to the original authors, noting where they freely share the code online. You can see an interactive IPython gist of this at https://www.py…
@fomightez
fomightez / compositioncalc2.py
Last active December 20, 2015 07:59
compositioncalc2.py from Practical Computing for Biologists by Steven H. D. Haddock and Casey W. Dunn. Posted as a Gist by Wayne Decatur (fomightez) with full credit and reference to the original authors and specifying where they freely share the code online. You can see a static IPython Notebook version at http://nbviewer.ipython.org/6102154
# code by Steven H. D. Haddock and Casey W. Dunn as described in:
# Practical Computing for Biologists
# Steven H. D. Haddock and Casey W. Dunn
# Published in 2011 by Sinauer Associates.
# ISBN 978-0-87893-391-4
# http://www.sinauer.com/practical-computing-for-biologists.html
# see practicalcomputing.org
#
# Scripts made freely available by the original authors at practicalcomputing.org
# DIRECT LINK: http://practicalcomputing.org/files/pcfb_examples.zip