Wayne's Bioinformatics Code Portal fomightez

## bobp-python.md

      
sloria / bobp-python.md
1 file, 569 forks, 132 comments, 3227 stars; last active May 1, 2024 08:37

The Best of the Best Practices (BOBP) Guide for Python

A "Best of the Best Practices" (BOBP) guide to developing in Python.
### In General

#### Values

- "Build tools for others that you want to be built for you." - Kenneth Reitz
- "Simplicity is always better than functionality." - Pieter Hintjens


## Install ffmpeg on AWS Linux AMI
#

sudo su -

cd /usr/local/bin
mkdir ffmpeg

cd ffmpeg
wget https://www.johnvansickle.com/ffmpeg/old-releases/ffmpeg-4.2.1-amd64-static.tar.xz
tar xvf ffmpeg-4.2.1-amd64-static.tar.xz

## r-to-python-data-wrangling-basics.md

      
conormm / r-to-python-data-wrangling-basics.md
1 file, 101 forks, 38 comments, 402 stars; last active April 24, 2024 18:22
R to Python: Data wrangling with dplyr and pandas

R to Python data wrangling snippets

The dplyr package in R makes data wrangling significantly easier.
The beauty of dplyr is that, by design, the options available are limited.
Specifically, a set of key verbs form the core of the package.
Using these verbs you can solve a wide range of data problems effectively in a shorter timeframe.
While transitioning to Python, I have greatly missed the ease with which I can think through and solve problems using dplyr in R.
The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).
dplyr is organised around six key verbs:
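
The verb list is cut off in this excerpt. As a rough sketch, assuming the six verbs meant are dplyr's usual `filter`, `select`, `arrange`, `mutate`, `summarise`, and `group_by`, the pandas counterparts might look like the following. The DataFrame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "height": [1.0, 3.0, 2.0, 6.0],
    "weight": [10, 30, 20, 60],
})

# filter(): keep rows matching a condition
filtered = df[df["height"] > 1.5]

# select(): keep a subset of columns
selected = df[["species", "height"]]

# arrange(): sort rows by a column
arranged = df.sort_values("height", ascending=False)

# mutate(): add a derived column without modifying df
mutated = df.assign(ratio=df["weight"] / df["height"])

# group_by() + summarise(): one aggregated row per group
summary = df.groupby("species", as_index=False).agg(mean_height=("height", "mean"))
```

Each step returns a new object and leaves `df` unchanged, which keeps the steps easy to chain in the way dplyr pipelines are.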

  
## useful_pandas_snippets.py
# List unique values in a DataFrame column
df['Column Name'].unique()

# To extract a specific column (subset the DataFrame), use brackets [ ] or attribute notation.
df.height
df['height']
# These return the same Series, provided the column name is a valid Python
# identifier that does not clash with an existing DataFrame attribute or method.
# (from http://www.stephaniehicks.com/learnPython/pages/pandas.html
# -or-
# http://www.datacarpentry.org/python-ecology-lesson/02-index-slice-subset/)
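
A small self-contained check of the two snippets above, as a sketch. The DataFrame here is invented for illustration, and the `count` column is included only to show a case where attribute notation does not reach the column.

```python
import pandas as pd

# Hypothetical data; "count" deliberately collides with DataFrame.count.
df = pd.DataFrame({"height": [150, 160, 150], "count": [1, 2, 3]})

# Unique values of a column, in order of first appearance
unique_heights = df["height"].unique()

# Attribute and bracket notation give the same Series for "height"
same = df.height.equals(df["height"])

# For "count", attribute access yields the DataFrame.count method instead,
# so bracket notation is the only way to reach the column.
is_method = callable(df.count)
count_column = df["count"]
```

Bracket notation is the safer default when column names come from external data.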

## useful_pandas_snippets.md

      
bsweger / useful_pandas_snippets.md
1 file, 637 forks, 63 comments, 1441 stars; last active April 19, 2024 18:04

Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

### Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)

(h/t @makmanalp)
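
The snippet for this entry is missing from the excerpt. One way to do the conversion, and not necessarily the one the original used, is `pd.to_numeric`. The Series below are made-up examples.

```python
import pandas as pd

# Hypothetical example data
clean = pd.Series(["1", "2", "3"])
dirty = pd.Series(["1", "two", "3"])

# Converts string digits to a numeric dtype
numbers = pd.to_numeric(clean)

# With the default errors="raise", a non-numeric value raises ValueError
try:
    pd.to_numeric(dirty)
    raised = False
except ValueError:
    raised = True

# errors="coerce" replaces unparseable values with NaN instead of raising
coerced = pd.to_numeric(dirty, errors="coerce")
```

Coercing is convenient for messy input, but it silently hides bad values, so it is worth counting the resulting NaNs afterwards.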

  
## PythonLogo.ipynb

      
jakevdp / PythonLogo.ipynb
1 file, 3 forks, 0 comments, 16 stars; last active April 7, 2024 18:40
Creating the Python Logo in Matplotlib
    
## last_lines.py
import mmap


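def get_last_lines(path: str, count: int) -> list[str]:
    """Get count last lines from a file."""
    # Read-only binary mode is enough for a read-only mapping.
    with open(path, "rb") as text_file:
        # Pass access by keyword: the third positional argument of mmap.mmap
        # is flags on Unix and tagname on Windows, not the access mode.
        text_mmap = mmap.mmap(text_file.fileno(), 0, access=mmap.ACCESS_READ)
        position = len(text_mmap)
        # Step back one newline at a time until count newlines have been passed.
        while count and (position := text_mmap.rfind(b"\n", 0, position)) != -1:
            count -= 1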
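
The excerpt stops before the function returns anything. Below is one possible way to finish it, offered as a sketch and not as the original author's ending. It assumes UTF-8 text, treats a single trailing newline as a line terminator instead of an extra empty line, and returns an empty list for an empty file, since `mmap` cannot map zero bytes.

```python
import mmap
import os


def get_last_lines(path: str, count: int) -> list[str]:
    """Return up to `count` trailing lines of a text file (UTF-8 assumed)."""
    if count <= 0 or os.path.getsize(path) == 0:
        return []  # nothing requested, or nothing to map
    with open(path, "rb") as text_file:
        with mmap.mmap(text_file.fileno(), 0, access=mmap.ACCESS_READ) as text_mmap:
            end = len(text_mmap)
            # Ignore one trailing newline so it is not counted as a line.
            if text_mmap[end - 1:end] == b"\n":
                end -= 1
            position = end
            # Walk backwards over `count` newline characters.
            while count and (position := text_mmap.rfind(b"\n", 0, position)) != -1:
                count -= 1
            # position is -1 if the file has fewer lines than requested,
            # in which case the slice starts at byte 0.
            return text_mmap[position + 1:end].decode("utf-8").splitlines()
```

Because only the tail of the mapping is decoded, the cost depends on the size of the requested lines, not on the size of the file.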

## GTF.py
#!/usr/bin/env python
"""
GTF.py
Kamil Slowikowski
December 24, 2013

Read GFF/GTF files. Works with gzip compressed files and pandas.

    http://useast.ensembl.org/info/website/upload/gff.html
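
The rest of GTF.py is not shown here. As a minimal sketch of what reading such a file involves, and not the module's actual code, the following uses only the standard library. It assumes the nine tab-separated columns described on the Ensembl page above, with the ninth column holding `key "value";` pairs. The function names `parse_gtf_line` and `read_gtf` are hypothetical.

```python
import gzip

# The first eight fixed columns of a GFF/GTF record
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame"]


def parse_gtf_line(line: str) -> dict:
    """Split one GTF record into a flat dict of strings."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(GTF_COLUMNS, fields[:8]))
    # Attribute column: semicolon-separated key "value" pairs
    for item in fields[8].split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        record[key] = value.strip('"')
    return record


def read_gtf(path: str) -> list[dict]:
    """Read a plain or gzip-compressed GTF file, skipping comment lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return [parse_gtf_line(line) for line in handle
                if line.strip() and not line.startswith("#")]
```

The resulting list of dicts can be passed to `pandas.DataFrame(...)` if a table is wanted; all values stay as strings in this sketch.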

## roll_ipython_in_aws.md

      
iamatypeofwalrus / roll_ipython_in_aws.md
1 file, 76 forks, 43 comments, 236 stars; last active January 22, 2024 11:18
Create an iPython HTML Notebook on Amazon's AWS Free Tier from scratch.

### What

Roll your own iPython Notebook server with Amazon Web Services (EC2) using their Free Tier.

### What are we using? What do you need?

- An active AWS account. First-time sign-ups are eligible for the free tier for a year.
- One Micro Tier EC2 instance.
- With AWS we will use the stock Ubuntu Server AMI and customize it.
- Anaconda for Python.
- Coffee/Beer/Time


## ncbi_taxonomy.py
import os
from functools import lru_cache
from collections import defaultdict

# Read in the taxonomy
class NCBITaxonomy():
    def __init__(self, folder):
        self.tax = defaultdict(dict)
        # Read in the file of taxid information
        names_fp = os.path.join(folder, 'names.dmp')
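
The class body is cut off after the path is built. As a sketch of what the next step could look like, and not the author's actual implementation, the function below reads `names.dmp` assuming the NCBI taxdump layout, where fields are separated by `\t|\t` and each line ends with `\t|`. The function name `read_names` and the `"name"` key are hypothetical.

```python
import os
from collections import defaultdict


def read_names(folder: str) -> defaultdict:
    """Map each taxid (as a string) to its scientific name."""
    tax = defaultdict(dict)
    names_fp = os.path.join(folder, "names.dmp")
    with open(names_fp, "rt") as handle:
        for line in handle:
            # Drop the line terminator, then split on the field separator.
            fields = line.rstrip("\n").removesuffix("\t|").split("\t|\t")
            tax_id, name, _unique_name, name_class = fields[:4]
            # A taxid has several name rows; keep only the scientific name.
            if name_class == "scientific name":
                tax[tax_id]["name"] = name
    return tax
```

`str.removesuffix` requires Python 3.9 or later.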