To work with LOGAN Search data, see slide 35 of the talk "Rayan Chikhi | Project Logan: Assembling all public sequencing data | CGSI 2024" (around the 20:32 mark). You need `awscli` and `zstd` installed.
# These are meant to work in both Python 2 and 3, except where noted.
# See my useful_pandas_snippets.py for those related to dataframes (such as pickling/`df.to_pickle(save_as)`)
# https://gist.github.com/fomightez/ef57387b5d23106fabd4e02dab6819b4
# Also see https://gist.github.com/fomightez/324b7446dc08e56c83fa2d7af2b89a33 for examples of my
# frequently used Python functions and slight variations for more expanded, modular structures.
# argparse
# Good snippet collection at https://mkaz.tech/code/python-argparse-cookbook/
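For quick reference, here is a minimal argparse sketch of the usual pattern of one positional argument, an option with a default, an integer option and a boolean flag. The argument names (`infile`, `--output`, `--num`, `--verbose`) and the helper `build_parser` are hypothetical. It uses `.format()` instead of f-strings to stay in line with the Python 2/3 note above.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical script with a few common argument types."""
    parser = argparse.ArgumentParser(description="Example script (hypothetical).")
    # Positional argument: required, taken in order.
    parser.add_argument("infile", help="Path to an input file.")
    # Optional argument with a default value shown in --help.
    parser.add_argument("-o", "--output", default="out.txt",
                        help="Output file name (default: %(default)s).")
    # Optional argument converted to int.
    parser.add_argument("-n", "--num", type=int, default=10,
                        help="How many items to process.")
    # Boolean flag: False unless given on the command line.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Print extra progress messages.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.verbose:
        print("Reading {0}, writing {1}, n={2}".format(args.infile, args.output, args.num))
```

Passing an explicit list, as in `build_parser().parse_args(["data.txt", "-v"])`, is a handy way to test the parser inside a notebook without a real command line.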
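# Use `%%capture` to hush 'noisy' stdout and stderr streams, but still combine it with `%%time` to get the timing afterward.
# (`%%capture` must be the very first line of the cell, so keep this comment out of the cell itself.)
%%capture out_stream
%%time
# ... rest of a cell that does something with LOTS of output ...

# In the next cell, put the following to get the time it took to run the cell above.
# The `%%time` report ("CPU times: ..." and "Wall time: ...") ends up in the last lines of the captured stdout.
for x in out_stream.stdout.split("\n")[-3:]:
    print(x)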
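`%%capture` and `%%time` are IPython/Jupyter magics. In a plain Python 3 script, you can get a similar effect with `contextlib.redirect_stdout`/`redirect_stderr` and `time.perf_counter`. This is a swapped-in technique, not how the magics work internally, and `run_quietly_and_time` and `noisy_task` are hypothetical names. It only captures Python-level writes to `sys.stdout`/`sys.stderr`, not output from subprocesses or C extensions writing directly to the file descriptors.

```python
import contextlib
import io
import time


def run_quietly_and_time(func, *args, **kwargs):
    """Run func with stdout/stderr captured; return (result, elapsed_seconds, captured_stdout)."""
    out_buf, err_buf = io.StringIO(), io.StringIO()
    start = time.perf_counter()
    # Anything func prints goes into the buffers instead of the console.
    with contextlib.redirect_stdout(out_buf), contextlib.redirect_stderr(err_buf):
        result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, out_buf.getvalue()


def noisy_task(n):
    """Hypothetical stand-in for a cell that prints LOTS of output."""
    for i in range(n):
        print("step", i)
    return n * 2


if __name__ == "__main__":
    result, elapsed, captured = run_quietly_and_time(noisy_task, 1000)
    print("Wall time: {0:.3f} s".format(elapsed))
    print("Last captured line:", captured.splitlines()[-1])
```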
# List unique values in a DataFrame column
df['Column Name'].unique()  # Note: `NaN` is included as a unique value. If you just want the count, use `nunique()`, which stands
# for 'number of unique values'; by default it excludes `NaN`. `.nunique(dropna=False)` includes `NaN` in the count of unique values.
# To extract a specific column (subset the dataframe), you can use [ ] (brackets) or attribute notation.
df.height
df['height']
# These are the same thing (from http://www.stephaniehicks.com/learnPython/pages/pandas.html
# -or-
# http://www.datacarpentry.org/python-ecology-lesson/02-index-slice-subset/).
# Attribute notation only works when the column name is a valid Python identifier that doesn't clash with an
# existing DataFrame attribute or method; e.g., 'Column Name' (contains a space) can only be reached with brackets.
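The snippet below checks the points above on a small, made-up DataFrame. The column names and values are illustrative only. It shows `unique()` keeping `NaN`, `nunique()` dropping it unless `dropna=False`, and the limit on attribute access when a name has a space or matches a DataFrame method such as `count`.

```python
import numpy as np
import pandas as pd

# Made-up data with a repeated value and a missing value.
df = pd.DataFrame({"height": [1.5, 1.7, np.nan, 1.5]})

uniques = df["height"].unique()                   # array([1.5, 1.7, nan]), NaN kept
n_without_nan = df["height"].nunique()            # 2, NaN excluded by default
n_with_nan = df["height"].nunique(dropna=False)   # 3, NaN counted

# Bracket and attribute notation return equal Series here.
same = df.height.equals(df["height"])             # True

# Names that are not valid identifiers, or that shadow methods, need brackets.
df2 = pd.DataFrame({"Column Name": [1, 2], "count": [3, 4]})
col = df2["Column Name"]   # no attribute form because of the space
counts = df2["count"]      # df2.count is the DataFrame.count method, not this column
```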
> p4339 natMX plasmid from Boone Lab as described at http://sites.utoronto.ca/boonelab/sga_technology/index.shtml 339-1460 are natMX amplicon insert made from primers 5'-ACATGGAGGCCCAGAATACCC-3' and 5'-CAGTATAGCGACCAGCATTCAC-3' according to Boone Lab
AGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGG
AAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCC
GGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGACCATGATTACGCCAAGCTAT
TTAGGTGACACTATAGAATACTCAAGCTATGCATCAAGCTTGGTACCGAGCTCGGATCCACTAGTAACGGCCGCCAGTGT
GCTGGAATTCGCCCttaaACATGGAGGCCCAGAATACCCtccttgacagtcttgacgtgcgcagctcaggggcatgatgt
gactgtcgcccgtacatttagcccatacatccccatgtataatcatttgcatccatacattttgatggccgcacggcgcg
aagcaaaaattacggctcctcgctgcagacctgcgagcagggaaacgctcccctcacagacgcgttgaattgtccccacg
ccgcgcccctgtagagaaatataaaaggttaggatttgccactgaggttcttctttcatatacttccttttaaaatcttg
ctaggatacagttctcacatcacatccgaacataaacaaccatgggtaccactcttgacgacacggcttaccggtaccgc
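The header says the natMX amplicon starts at position 339 and is made with the two listed primers. Below is a stdlib-only sketch for checking such primer sites. It is not Biopython, and `parse_fasta`, `reverse_complement` and `primer_sites` are hypothetical helpers. It searches case-insensitively, because the record mixes upper- and lowercase bases. It reports 1-based starts on the `+` strand for the primer as written, and on the `-` strand when the primer's reverse complement is found.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (case preserved per base)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def parse_fasta(text):
    """Minimal FASTA parser: returns a dict of {header: sequence}."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def primer_sites(seq, primer):
    """1-based (strand, start) hits of primer or its reverse complement in seq, ignoring case."""
    seq_u = seq.upper()
    hits = []
    for strand, probe in (("+", primer.upper()), ("-", reverse_complement(primer).upper())):
        start = seq_u.find(probe)
        while start != -1:
            hits.append((strand, start + 1))
            start = seq_u.find(probe, start + 1)
    return hits
```

With the first five sequence lines of the record above, `primer_sites(seq, "ACATGGAGGCCCAGAATACCC")` includes `("+", 339)`, which matches the start position given in the header.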
import pandas as pd
import numpy as np
import matplotlib as mpl
import re
df = pd.DataFrame({
    "Gene_name": ["sdsR", "arrS","gadF"],
    "Genes_in_same_transcription_unit": ['pphA, sdsR', 'arrS','mdtF, mdtE, gadF, gadE'],
})
# Convert Gene_name column to a set for quick lookup
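The preview ends at the comment about turning `Gene_name` into a set. The sketch below is one plausible continuation, not the gist's actual code. It splits each comma-separated transcription-unit string and keeps only the genes that also appear in the `Gene_name` set. The helper `listed_genes_in_unit` and the new column name `Listed_genes_in_unit` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Gene_name": ["sdsR", "arrS", "gadF"],
    "Genes_in_same_transcription_unit": ["pphA, sdsR", "arrS", "mdtF, mdtE, gadF, gadE"],
})

# A set gives O(1) average-time membership tests.
gene_set = set(df["Gene_name"])


def listed_genes_in_unit(unit_string, lookup):
    """Split a comma-separated gene list and keep only names found in lookup."""
    genes = [g.strip() for g in unit_string.split(",") if g.strip()]
    return [g for g in genes if g in lookup]


df["Listed_genes_in_unit"] = df["Genes_in_same_transcription_unit"].apply(
    lambda s: listed_genes_in_unit(s, gene_set)
)
```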