# Useful examples for when converting command line commands from Jupyter/IPython back to pure Python
# This is partly for when you need to speed up a `.ipy` script. It will run much faster as `.py` than as `.ipy` if there
# are a lot of calls to command line / shell commands, because that saves the time of spawning a new shell instance for
# each one. (The `.ipy` version is great for quicker development and prototyping, but the `.py` version is MUCH FASTER for running.)
# The Python versions also have the advantage that you can use them inside functions, because they don't have the problem
# seen with `!cp fn unsanitized_{fn}` or `%store`, which actually run in the global namespace and so cannot see a Python
# variable `fn` that is local to the function.
# RELATED NOTE: You can use the IPython history (via the `%history` magic; note the `-n` flag ADDS line numbers, so omit
# it for clean output) to help convert `.ipy` code or Jupyter code with exclamation marks and shell commands BACK TO PYTHON, see
# https://stackoverflow.com/a/1040640/8508004 (especially also see the comment by Michael Scott Cuthbert).
# Note these can guide the way to still use IPython in scripts -- `get_ipython()` doesn't work in pure Python, it seems.
# Convert cell magic using shell bash code and passed-in Python variables:
# https://stackoverflow.com/a/74824151/8508004 (`get_ipython()` use) and https://stackoverflow.com/a/15898875/8508004 (`run_line_magic()` & `run_cell_magic()` use)
# IPython/Jupyter cell version of using a command to copy a file
!cp fn unsanitized_{fn}
# Python version of `cp`
unsanitized_fn_name = f"unsanitized_input_{fn}.txt"
from shutil import copyfile
copyfile(fn, unsanitized_fn_name)
# IPython/Jupyter cell version of calling a script to run it in the current notebook namespace
%run -i <script_name>.py
# Python version of calling a script to run it in the current namespace
# Note GREAT FOR HELPING SPEEDY DEVELOPMENT / PROTOTYPING IN A JUPYTER ENVIRONMENT BECAUSE YOU DON'T NEED TO PLACE
# ALL CODE IN A FUNCTION AND HAVE `main`, etc. (In the long run it is best to do like https://stackoverflow.com/a/1186847/8508004 ; but sometimes refactoring is a drag after you make lots of progress.)
# This can be used to chain calls to Python scripts and still be able to access a variable in the original namespace;
# for example, in Jupyter you can use `%run -i <script_name>.py` to call a Python script, and then in that Python script use
# `exec(open("<yet_another_script_name>.py").read())` to call a script from within it and have access to assigned variables.
# Better yet, you can use the function `suppress_stdout_stderr()` (see my Python tips) as a context for placing
# `exec(open("<yet_another_script_name>.py").read())`, so you can run the code without the final script's stderr showing, since
# it may not be the normal way you'd call that script and so the messages are 'off', or you just don't want to
# make it noisy for a user who doesn't care where you are calling from. Also nice if you already have much of the backbone code
# needed in one script, so you can add Python to the new script to get the old one and use the suggestions in this paragraph
# to run it. CAN SAVE YOU FROM EDITING CODE IN TWO PLACES! (If you didn't structure it easily for an import.)
exec(open("<script_name>.py").read()) # based on https://stackoverflow.com/a/1186818/8508004
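For reference, a minimal sketch of what such a `suppress_stdout_stderr()` context manager could look like (this is my own illustrative version, not necessarily the exact one from the Python tips):

```python
import os
import sys
from contextlib import contextmanager

@contextmanager
def suppress_stdout_stderr():
    """Temporarily redirect stdout and stderr to os.devnull."""
    with open(os.devnull, "w") as devnull:
        old_out, old_err = sys.stdout, sys.stderr
        sys.stdout, sys.stderr = devnull, devnull
        try:
            yield
        finally:
            # always restore, even if the wrapped code raises
            sys.stdout, sys.stderr = old_out, old_err

# usage: any prints (or stderr writes) inside the block are swallowed
with suppress_stdout_stderr():
    print("this never shows")
print("this shows")
```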
# IPython/Jupyter version of unzip WHERE FILE NAME HAS SPACES
!unzip "{data_file_name}"
# Python version of unzip WHERE FILE NAME HAS SPACES
import os
cmd = f'unzip "{data_file_name}"' # quoting based on a comment below
# https://stackoverflow.com/a/30212621/8508004 ; because the file from the user
# had spaces in its name
os.system(cmd)
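A shell-free alternative worth keeping in mind: the standard library's `zipfile` module avoids quoting issues with spaces entirely. The archive and member names below are hypothetical stand-ins just so the example is runnable:

```python
import zipfile

# make a small zip with a space in its name, purely so the example is self-contained
data_file_name = "data file.zip"  # hypothetical name containing a space
with zipfile.ZipFile(data_file_name, "w") as zf:
    zf.writestr("inner.txt", "hello")

# the unzip step itself -- no shell involved, so no quoting worries
with zipfile.ZipFile(data_file_name) as zf:
    zf.extractall()  # unpacks into the current working directory
```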
# IPython/Jupyter cell version of using a command to move / rename a file
!mv unpack "R7 for Wayne.zip"
# Python version of `mv` / rename
from shutil import move
move("unpack", "R7 for Wayne.zip")
# IPython/Jupyter version of `tar`
!tar czf {archive_file_name} {" ".join(files_produced)}
# Python version of `tar`
os.system(f'tar czf {archive_file_name} {" ".join(files_produced)}')
# also see [All The Ways to Compress and Archive Files in Python](https://towardsdatascience.com/all-the-ways-to-compress-and-archive-files-in-python-e8076ccedb4b)
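As a pure-standard-library alternative to shelling out for `tar`, the `tarfile` module can build the same gzipped archive; the names below are hypothetical stand-ins for `archive_file_name` and `files_produced`:

```python
import tarfile

# hypothetical stand-ins, created here just so the example runs
files_produced = ["result_a.txt", "result_b.txt"]
for name in files_produced:
    with open(name, "w") as f:
        f.write("data\n")

archive_file_name = "results.tar.gz"
# "w:gz" matches `tar czf` (create, gzip-compressed)
with tarfile.open(archive_file_name, "w:gz") as tar:
    for name in files_produced:
        tar.add(name)
```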
# IPython/Jupyter cell version of using a command to drop the first line of a file (`tail -n +2` outputs from line 2 on)
!tail -n +2 {fn} >{name_for_f_without_first_line}
# Python version of the `tail` command
os.system(f"tail -n +2 {fn} >{name_for_f_without_first_line}")
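If portability matters (e.g., no `tail` on the system), the same drop-the-first-line step can be done in pure Python; the file names here are hypothetical stand-ins for `fn` and `name_for_f_without_first_line`:

```python
# hypothetical file names, created here so the example is self-contained
fn = "with_header.txt"
name_for_f_without_first_line = "without_header.txt"
with open(fn, "w") as f:
    f.write("header\nline1\nline2\n")

with open(fn) as src, open(name_for_f_without_first_line, "w") as dst:
    next(src, None)       # skip the first line, like `tail -n +2`
    dst.writelines(src)   # write the rest through unchanged
```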
# IPython/Jupyter cell version of using command to see first two lines of file and assigning each line to a list
first_two_lines_list = !head -2 {x}
# Python version of using a command to get the first two lines of a file into a list
# (the IPython way above yields strings, but the subprocess way produces bytes, so a
# conversion to string is added so no other code needs to change later; note
# `splitlines()` is used rather than `split()` so lines containing spaces stay intact)
import subprocess
first_two_lines_list = subprocess.check_output(
    f"head -2 {x}", shell=True).splitlines()
first_two_lines_list = (
    [ft.decode("utf-8") for ft in first_two_lines_list])
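A pure-Python way to get the first two lines, with no subprocess at all; `itertools.islice` reads only the lines needed, so even a huge file isn't fully read. The file name is a hypothetical stand-in for `x`:

```python
from itertools import islice

# hypothetical input file, created here so the example runs
x = "some_input.txt"
with open(x, "w") as f:
    f.write("first\nsecond\nthird\n")

with open(x) as fh:
    # islice stops after two lines; rstrip drops the trailing newlines
    first_two_lines_list = [line.rstrip("\n") for line in islice(fh, 2)]
```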
# IPython/Jupyter cell version of using command to delete a file / erase a file
!rm {temp_file_name}
# Python version `rm`
os.remove(temp_file_name)
# IPython/Jupyter cell version of using command to delete a directory even if not empty
!rm -rf {directory_name}
# Python version `rm -rf`
import shutil
shutil.rmtree(directory_name)
# note if the directory is empty you can just use `os.rmdir(directory_name)`, see https://linuxize.com/post/python-delete-files-and-directories/
# and https://twitter.com/driscollis/status/1442947656095965186
# IPython/Jupyter cell version of `curl`
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/sequencework/master/bendit_standalone-utilities/{file_needed}
# Universal Python version of `curl` using Requests package (keep in mind for data tables, Pandas' read methods can mostly take a URL)
file_needed = "similarities_in_proteinprotein_interactions.py"
import requests
url = ("https://raw.githubusercontent.com/fomightez/structurework/master"
       "/pdbsum-utilities/"+file_needed)
r = requests.get(url, allow_redirects=True)
with open(file_needed, 'wb') as streamhandler:
    streamhandler.write(r.content)
# Universal Python version of `curl` using the urllib3 library, which offers more advanced retry and streaming capabilities, to get an
# example script from GitHub
import os
import urllib3
import certifi

def get_content_atURL_with_URLLIB3(url, chunk_size=64):
    '''
    Get content with the urllib3 library, which offers more advanced retry and
    streaming capabilities than Requests.
    And works in JupyterLite.
    '''
    http = urllib3.PoolManager(
        cert_reqs='CERT_REQUIRED',
        ca_certs=certifi.where(),
        retries=urllib3.Retry(
            total=3,
            backoff_factor=0.1,
            status_forcelist=[500, 502, 503, 504]
        )
    )
    collected = ''
    chunk_count = 0
    try:
        response = http.request('GET', url, preload_content=False)
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            chunk_count += 1
            collected += chunk.decode(errors='ignore')
        response.release_conn()
        return collected, chunk_count
    except urllib3.exceptions.ProtocolError as ex:  # urllib3's analog of Requests' ChunkedEncodingError
        print(f"Specific ProtocolError: {ex}")
        return collected, chunk_count
    except Exception as ex:
        print(f"General error: {ex}")
        return None, 0

def get_script_using_URLLIB3(script_needed):
    '''
    Get script using the urllib3 library, which offers more advanced retry and
    streaming capabilities than Requests.
    And works in JupyterLite.
    '''
    if not os.path.isfile(script_needed):
        url = ("https://raw.githubusercontent.com/fomightez/structurework/master"
               "/PDBmodelComparator-utilities/"+script_needed)
        r_text, _ = get_content_atURL_with_URLLIB3(url)
        with open(script_needed, 'w') as filehandler:
            filehandler.write(r_text)

script_needed = "missing_residue_detailer.py"
get_script_using_URLLIB3(script_needed)
# Universal Python version of `curl` using the urllib3 library, which offers more advanced retry and streaming capabilities,
# along with the Protein Data Bank's direct file access server with CORS headers. Unlike the Requests-based one below, this one works
# with headers from large PDB records. And it still works in JupyterLite, too.
import urllib3
import certifi

def fetch_pdb_headerURLLIB3(pdb_id, chunk_size=64):
    url = f'https://files.rcsb.org/header/{pdb_id.upper()}.pdb'
    http = urllib3.PoolManager(
        cert_reqs='CERT_REQUIRED',
        ca_certs=certifi.where(),
        retries=urllib3.Retry(
            total=3,
            backoff_factor=0.1,
            status_forcelist=[500, 502, 503, 504]
        )
    )
    collected = ''
    chunk_count = 0
    try:
        response = http.request('GET', url, preload_content=False)
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            chunk_count += 1
            collected += chunk.decode(errors='ignore')
        response.release_conn()
        return collected, chunk_count
    except urllib3.exceptions.ProtocolError as ex:  # urllib3's analog of Requests' ChunkedEncodingError
        print(f"Specific ProtocolError: {ex}")
        return collected, chunk_count
    except Exception as ex:
        print(f"General error: {ex}")
        return None, 0

# Example usage
pdb_id = '4DQO'
header, chunks = fetch_pdb_headerURLLIB3(pdb_id)
print(f"Retrieved header in {chunks} chunks")
print(header[:1000]) # Print first 1000 characters
# Universal Python version of `curl` using the Requests package and the Protein Data Bank's direct file access server with CORS headers,
# so that it works with browser-based computing environments powered by WebAssembly (WASM), like Pyodide / JupyterLite!
# But fails in MyBinder-served sessions with large PDB file records data, like 4dqo. See https://github.com/fomightez/structurework/blob/master/PDBmodelComparator-utilities/missing_residue_detailer.py and https://github.com/fomightez/sequencework/blob/master/LookUpTaxon/LookUpTaxonFA.py, where built into a script, the former of which combines multiple ways to ensure it gets the header.
def fetch_pdbheader_using_requests(pdb_id):
    """
    Take a PDB accession code and return the PDB file header using RCSB's direct file server that happens to have CORS headers enabled
    See https://www.wwpdb.org/ftp/pdb-ftp-sites
    from https://github.com/fomightez/structurework/blob/master/PDBmodelComparator-utilities/missing_residue_detailer.py
    """
    url = f'https://files.rcsb.org/header/{pdb_id.upper()}.pdb'
    import requests
    response = requests.get(url, allow_redirects=True)
    response.raise_for_status() # Raise an exception for non-200 status codes
    return response.text
header = fetch_pdbheader_using_requests("1d66")
# Python way to call `curl` if sure on unix-based machine
if not os.path.isfile(file_needed):
    os.system("curl -OL https://raw.githubusercontent.com/"
              "fomightez/sequencework/master/"
              f"bendit_standalone-utilities/{file_needed}")
# Alternative Python version of `curl` using `sh` module; based on code in https://github.com/fomightez/sequencework/blob/master/omega-presence/find_mito_fungal_lsu_rRNA_and_check_for_omega_intron.py
file_needed = "check_for_omega_intron.py"
if not os.path.isfile(file_needed):
    sys.stderr.write("\nObtaining script containing function to use "
                     "...\n")
    # based on http://amoffat.github.io/sh/
    from sh import curl
    curl("-OL",
         "https://raw.githubusercontent.com/fomightez/sequencework/"
         "master/omega-presence/"+file_needed)
    # verify that worked & ask for it to be done manually if it fails
    if not os.path.isfile(file_needed):
        github_link = ("https://github.com/fomightez/sequencework/tree"
                       "/master/omega-presence")
        sys.stderr.write("\n'{}' not found. "
                         "Please add it to your current working\ndirectory from {}"
                         ".\n**EXITING !!**.\n".format(file_needed, github_link))
        sys.exit(1)
# R kernel in Jupyter versions of `curl`
system("curl -OL http://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r6.52_FB2023_03/gtf/dmel-all-r6.52.gtf.gz", intern=TRUE) # based on https://github.com/aws/amazon-sagemaker-examples/issues/912
# IPython/Jupyter cell version of `grep`
!grep -r -i "Q6NW40" *
# Python version of `grep` with recursive directory traversal and wildcard use
from sh import grep
from sh import glob
l = grep("-r","-i","Q6NW40",glob('*')) # asterisk wildcard use based on https://stackoverflow.com/a/32923739/8508004 ;
# also see http://amoffat.github.io/sh/sections/faq.html?highlight=glob#why-doesn-t-work-as-a-command-argument
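If you'd rather avoid the `sh` module, a pure-Python sketch of `grep -r -i` using `pathlib` is possible; `grep_recursive_i` is my own hypothetical helper name, and it returns `(path, line_number, line)` tuples rather than grep's text output:

```python
from pathlib import Path

def grep_recursive_i(pattern, root="."):
    """Case-insensitive recursive search; returns (path, line_no, line) tuples."""
    needle = pattern.lower()
    hits = []
    for p in sorted(Path(root).rglob("*")):  # rglob("*") walks all subdirectories
        if not p.is_file():
            continue
        try:
            text = p.read_text(errors="ignore")  # skip undecodable bytes
        except OSError:
            continue  # unreadable file; skip it like grep would warn about
        for n, line in enumerate(text.splitlines(), 1):
            if needle in line.lower():
                hits.append((str(p), n, line))
    return hits
```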
# IPython/Jupyter cell version of using command to run a script and make it a notebook
!jupytext --to notebook --execute {plots4review_fn[:-6]+".py"}
# Python version of using command to run a script and make it a notebook
os.system(f'jupytext --to notebook --execute {plots4review_fn[:-6]+".py"}')
# IPython/Jupyter cell version of running a shell command (capturing output)
with io.capture_output() as captured:
    !bendIt -s {fasta_file_name_for_merged} -o {output_file_suffix} -c \
    {curvature_window_size} -b {other_metric_reporting_window_size} \
    -g {report_with_curvature_settings_corrspndnce[report_with_curvature]} \
    --xmin {curvature_window_size} --xmax {end_range}
# Python version of running a shell command (capturing output)
with io.capture_output() as captured:
    os.system(f"bendIt -s {fasta_file_name_for_merged} -o "
              f"{output_file_suffix} -c {curvature_window_size} -b "
              f"{other_metric_reporting_window_size} -g "
              f"{report_with_curvature_settings_corrspndnce[report_with_curvature]} "
              f"--xmin {curvature_window_size} --xmax {end_range}")
# Related: Python version of checking software to run in shell is installed in environment; based on code in https://github.com/fomightez/sequencework/blob/master/omega-presence/find_mito_fungal_lsu_rRNA_and_check_for_omega_intron.py
import sys
import subprocess
sys.stderr.write("Checking blastn (BLAST+) installed"
                 "...\n")
try:
    cmd = "blastn -version"
    result = subprocess.check_output(cmd, shell=True)  # based on
    # https://stackoverflow.com/a/18739828/8508004
    result = result.decode("utf-8")  # so it isn't bytes
    if "blastn:" in result:
        sys.stderr.write("Detected '{}'...\n".format(
            result.replace("\n", "")))
except subprocess.CalledProcessError:
    sys.stderr.write("\nblastn not detected. Please install BLAST+ or "
                     "run in an\nenvironment launched from "
                     "https://github.com/fomightez/blast-binder.\n**EXITING !!**.\n")
    sys.exit(1)
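A lighter-weight way to check that a program is on the PATH, without actually running it, is `shutil.which()` (Python 3.3+); it returns the executable's full path, or `None` if the program isn't found:

```python
from shutil import which

# which() just searches PATH; it never launches the program
if which("blastn") is None:
    print("blastn not found on PATH; install BLAST+ first")
```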
# Related to replacing use of wget and getting data
# https://twitter.com/fperez_org/status/1417616732210941954 July 2021
# >'I'd mentioned wget as in "that's what I'd use at the terminal, but now I want to build a more reusable workflow off Python tools." So your requests-cache suggestion fits perfectly.'
# That was in reply to "Py/Data tweeps - what's your favorite, preferably decorator-enabled tool/lib to fetch-and-cache data these days? Use case is small-to-medium-sized, http-based, csv/hdf/etc formats that I'd wget manually and can store locally."
#Related to use of SCP:
#https://twitter.com/driscollis/status/1430942020424675329 August 2021
#>"In less than 20 lines of #Python code, you can SFTP / SSH a file from your local server to a remote server using Paramiko:" <--- see code in image
#>"I have a better example on my blog, @mousevspython , where I also implement the ability to download a file from the remote server:"
# Related note on paths for when you don't know 100% if users will be on a unix-y system:
# "subprocess is the wrong tool for this job. `import os.path` & `mypath = os.path.abspath(os.path.dirname(os.getcwd()))`
# ...is both faster and portable to non-UNIXy operating systems." - [SOURCE](https://stackoverflow.com/a/74350790/8508004)
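The same parent-of-current-directory idea can also be written with `pathlib` (Python 3.4+), which is equally portable across operating systems:

```python
from pathlib import Path

# parent of the current working directory, via pathlib; equivalent to
# os.path.abspath(os.path.dirname(os.getcwd()))
mypath = Path.cwd().parent
```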
# mkdir
# `mkdir` in shell is `os.mkdir()` in Python
# However, for anything but simple directory making where you know it won't overwrite anything,
# use Python 3's 'create parent directories if needed' ability in `os.makedirs()`, like so:
os.makedirs("/tmp/path/to/desired/directory", exist_ok=True) # SEE https://stackoverflow.com/a/600612/8508004
# It will make the parents, or even just the ONE directory needed, as necessary, and not overwrite.
# So use it for making directories from a list:
[os.makedirs(x, exist_ok=True) for x in pool_dirs_to_make]