Skip to content

Instantly share code, notes, and snippets.

View fomightez's full-sized avatar

Wayne's Bioinformatics Code Portal fomightez

View GitHub Profile
@sloria
sloria / bobp-python.md
Last active May 1, 2024 08:37
A "Best of the Best Practices" (BOBP) guide to developing in Python.

The Best of the Best Practices (BOBP) Guide for Python

A "Best of the Best Practices" (BOBP) guide to developing in Python.

In General

Values

  • "Build tools for others that you want to be built for you." - Kenneth Reitz
  • "Simplicity is alway better than functionality." - Pieter Hintjens
#
sudo su -
cd /usr/local/bin
mkdir ffmpeg
cd ffmpeg
wget https://www.johnvansickle.com/ffmpeg/old-releases/ffmpeg-4.2.1-amd64-static.tar.xz
tar xvf ffmpeg-4.2.1-amd64-static.tar.xz
@conormm
conormm / r-to-python-data-wrangling-basics.md
Last active April 24, 2024 18:22
R to Python: Data wrangling with dplyr and pandas

R to python data wrangling snippets

The dplyr package in R makes data wrangling significantly easier. The beauty of dplyr is that, by design, the options available are limited. Specifically, a set of key verbs form the core of the package. Using these verbs you can solve a wide range of data problems effectively in a shorter timeframe. Whilse transitioning to Python I have greatly missed the ease with which I can think through and solve problems using dplyr in R. The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).

dplyr is organised around six key verbs:

@fomightez
fomightez / useful_pandas_snippets.py
Last active April 19, 2024 18:25 — forked from bsweger/useful_pandas_snippets.md
Useful Pandas Snippets
# List unique values in a DataFrame column
df['Column Name'].unique()
# To extract a specific column (subset the dataframe), you can use [ ] (brackets) or attribute notation.
df.height
df['height']
# are same thing!!! (from http://www.stephaniehicks.com/learnPython/pages/pandas.html
# -or-
# http://www.datacarpentry.org/python-ecology-lesson/02-index-slice-subset/)
@bsweger
bsweger / useful_pandas_snippets.md
Last active April 19, 2024 18:04
Useful Pandas Snippets

Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)

@jakevdp
jakevdp / PythonLogo.ipynb
Last active April 7, 2024 18:40
Creating the Python Logo in Matplotlib
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@willmcgugan
willmcgugan / last_lines.py
Created March 2, 2024 16:10
Get the last lines from a file
import mmap
def get_last_lines(path: str, count: int) -> list[str]:
"""Get count last lines from a file."""
with open(path, "r+b") as text_file:
text_mmap = mmap.mmap(text_file.fileno(), 0, mmap.ACCESS_READ)
position = len(text_mmap)
while count and (position := text_mmap.rfind(b"\n", 0, position)) != -1:
count -= 1
@slowkow
slowkow / GTF.py
Last active March 6, 2024 02:05
GTF.py is a simple module for reading GTF and GFF files
#!/usr/bin/env python
"""
GTF.py
Kamil Slowikowski
December 24, 2013
Read GFF/GTF files. Works with gzip compressed files and pandas.
http://useast.ensembl.org/info/website/upload/gff.html
@iamatypeofwalrus
iamatypeofwalrus / roll_ipython_in_aws.md
Last active January 22, 2024 11:18
Create an iPython HTML Notebook on Amazon's AWS Free Tier from scratch.

What

Roll your own iPython Notebook server with Amazon Web Services (EC2) using their Free Tier.

What are we using? What do you need?

  • An active AWS account. First time sign-ups are eligible for the free tier for a year
  • One Micro Tier EC2 Instance
  • With AWS we will use the stock Ubuntu Server AMI and customize it.
  • Anaconda for Python.
  • Coffee/Beer/Time
@sminot
sminot / ncbi_taxonomy.py
Last active January 7, 2024 09:37
Class for using the NCBI taxonomy, reading from taxdump files
import os
from functools import lru_cache
from collections import defaultdict
# Read in the taxonomy
class NCBITaxonomy():
def __init__(self, folder):
self.tax = defaultdict(dict)
# Read in the file of taxid information
names_fp = os.path.join(folder, 'names.dmp')