A "Best of the Best Practices" (BOBP) guide to developing in Python.
- "Build tools for others that you want to be built for you." - Kenneth Reitz
- "Simplicity is always better than functionality." - Pieter Hintjens
sudo su -
cd /usr/local/bin
mkdir ffmpeg
cd ffmpeg
wget https://www.johnvansickle.com/ffmpeg/old-releases/ffmpeg-4.2.1-amd64-static.tar.xz
tar xvf ffmpeg-4.2.1-amd64-static.tar.xz
The dplyr package in R makes data wrangling significantly easier.
The beauty of dplyr is that, by design, the options available are limited.
Specifically, a set of key verbs form the core of the package.
Using these verbs you can solve a wide range of data problems effectively in a shorter timeframe.
While transitioning to Python, I have greatly missed the ease with which I can think through and solve problems using dplyr in R.
The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).
dplyr is organised around six key verbs:
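A minimal sketch of how the verbs most commonly associated with dplyr (filter, select, arrange, mutate, summarise, group_by) might map to pandas. The DataFrame and its column names are hypothetical, and this mapping is one common convention rather than the only option.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "group": ["x", "x", "y", "y"],
    "height": [150, 180, 165, 172],
})

# filter(): keep rows matching a condition
tall = df[df["height"] > 160]

# select(): keep a subset of columns
subset = df[["name", "height"]]

# arrange(): sort rows
ordered = df.sort_values("height", ascending=False)

# mutate(): add a derived column
with_m = df.assign(height_m=df["height"] / 100)

# group_by() + summarise(): aggregate per group
summary = df.groupby("group", as_index=False).agg(mean_height=("height", "mean"))
```

Each step returns a new DataFrame and leaves `df` unchanged, so the calls can also be chained in a single expression, much like a dplyr pipeline.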
# List unique values in a DataFrame column
df['Column Name'].unique()
# To extract a specific column (subset the DataFrame), use [ ] (brackets) or attribute notation.
df.height
df['height']
# Both return the same Series. Attribute notation only works when the column name is a valid
# Python identifier that does not clash with an existing DataFrame attribute.
# (from http://www.stephaniehicks.com/learnPython/pages/pandas.html
# -or-
# http://www.datacarpentry.org/python-ecology-lesson/02-index-slice-subset/)
A personal diary of DataFrame munging over the years.
Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
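A minimal sketch of that conversion, assuming `pd.to_numeric` is the call being described (the snippet itself is not shown here). The Series values are hypothetical.

```python
import pandas as pd

# Strings that all parse cleanly as numbers.
s = pd.Series(["1", "2", "3.5"])
numeric = pd.to_numeric(s)

# A non-numeric value makes the default (errors="raise") conversion fail.
bad = pd.Series(["1", "two", "3"])
try:
    pd.to_numeric(bad)
    raised = False
except (ValueError, TypeError):
    raised = True

# errors="coerce" replaces unparseable values with NaN instead of raising.
coerced = pd.to_numeric(bad, errors="coerce")
```

Using `errors="coerce"` is a convenient way to locate the offending rows afterwards with `coerced.isna()`.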
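import mmap

def get_last_lines(path: str, count: int) -> list[str]:
    """Get count last lines from a file."""
    # Read-only access is enough, so open with "rb" and pass access= by keyword
    # (the third positional argument of mmap.mmap is not the access mode).
    with open(path, "rb") as text_file:
        text_mmap = mmap.mmap(text_file.fileno(), 0, access=mmap.ACCESS_READ)
        position = len(text_mmap)
        # Walk backwards over newlines until count lines have been skipped.
        while count and (position := text_mmap.rfind(b"\n", 0, position)) != -1:
            count -= 1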
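The preview above stops inside the loop. A self-contained sketch of the same backwards-search idea could look like the following. The function name `tail_lines`, the UTF-8 decoding, and the handling of a trailing newline and of empty files are assumptions for illustration, not the original author's code.

```python
import mmap
import os


def tail_lines(path: str, count: int) -> list[str]:
    """Return up to `count` last lines of a file, decoded as UTF-8 (an assumption)."""
    if count <= 0 or os.path.getsize(path) == 0:
        return []  # an empty file cannot be memory-mapped
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        end = len(mm)
        # Ignore a single trailing newline so it does not count as a line.
        if mm[end - 1:end] == b"\n":
            end -= 1
        position = end
        while count:
            position = mm.rfind(b"\n", 0, position)
            if position == -1:
                break  # fewer lines than requested: return the whole file
            count -= 1
        start = position + 1  # becomes 0 when no further newline was found
        return mm[start:end].decode("utf-8").split("\n")
```

Because only the tail of the file is sliced and decoded, the cost depends on the size of the requested lines rather than on the size of the whole file.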
#!/usr/bin/env python
"""
GTF.py
Kamil Slowikowski
December 24, 2013
Read GFF/GTF files. Works with gzip compressed files and pandas.
http://useast.ensembl.org/info/website/upload/gff.html
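As a rough illustration of reading a GTF file into pandas, here is a minimal sketch. It is not the GTF.py implementation: the function name `read_gtf` is hypothetical, and the sketch relies only on the nine tab-separated columns described in the Ensembl format page and on pandas inferring gzip compression from a `.gz` suffix.

```python
import csv

import pandas as pd

# The nine standard GTF columns.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attribute"]


def read_gtf(path: str) -> pd.DataFrame:
    """Load a GTF file (plain or .gz) into a DataFrame, one row per record."""
    return pd.read_csv(
        path,
        sep="\t",
        comment="#",             # skip header lines such as "#!genome-build ..."
        header=None,
        names=GTF_COLUMNS,
        dtype={"seqname": str},  # chromosome names mix "1" and "X"
        quoting=csv.QUOTE_NONE,  # keep the quotes inside the attribute column
        compression="infer",     # gzip is detected from the file suffix
    )
```

The attribute column is left as a raw string here. Note that `comment="#"` also truncates any record that contains a `#` inside a field, so files with such attributes would need a different way of skipping header lines.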
Roll your own IPython Notebook server with Amazon Web Services (EC2) using their Free Tier.
import os
from functools import lru_cache
from collections import defaultdict
# Read in the taxonomy
class NCBITaxonomy():
    def __init__(self, folder):
        self.tax = defaultdict(dict)
        # Read in the file of taxid information
        names_fp = os.path.join(folder, 'names.dmp')
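The preview ends before the file is parsed. As a hedged sketch of what reading `names.dmp` could involve, the function below assumes the NCBI taxdump layout (fields separated by a tab, pipe, tab sequence, with tax_id, name, unique name, and name class as the first four fields). The function name `read_names` and the `"name"` key are hypothetical and not taken from the original class.

```python
import os
from collections import defaultdict


def read_names(folder: str) -> dict:
    """Map each taxid (as a string) to its scientific name from names.dmp."""
    tax = defaultdict(dict)
    names_fp = os.path.join(folder, "names.dmp")
    with open(names_fp, encoding="utf-8") as handle:
        for line in handle:
            # Fields are separated by "\t|\t"; each row ends with "\t|".
            fields = line.rstrip("\n").removesuffix("\t|").split("\t|\t")
            if len(fields) < 4:
                continue  # skip malformed rows
            taxid, name, _unique_name, name_class = fields[:4]
            # Keep only the canonical name, ignoring synonyms and common names.
            if name_class == "scientific name":
                tax[taxid]["name"] = name  # "name" is a hypothetical key
    return tax
```

Keeping a dict per taxid leaves room to attach further fields later, such as a parent or rank read from a separate file.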