Last active August 12, 2025
Useful snippets and examples for when converting command line commands from Jupyter/IPython back to Pure Python
#Useful examples for when converting command line commands from Jupyter/IPython back to Pure Python
# This is partly for when you need to speed up a `.ipy` script. It will run much faster as `.py` than as `.ipy` if there
# are a lot of calls to command line / shell commands, because it saves time by not spawning a new shell instance for
# each. (The `.ipy` version is great for quicker development and prototyping, but `.py` is MUCH FASTER for running.)
# The Python versions also have the advantage that you can use them inside functions, because they don't have the
# problem that `!cp {fn} unsanitized_{fn}` or `%store` do, which actually run in the global namespace and so cannot
# see a Python variable `fn` local to the function.
# RELATED NOTE: You can use the IPython `history` (via the `hist` command, with `-n` to remove line numbers) to
# help convert `.ipy` code or Jupyter code with exclamation marks and shell commands BACK TO PYTHON, see
# https://stackoverflow.com/a/1040640/8508004 (especially also see the comment by Michael Scott Cuthbert).
# Note these can guide the way to still use IPython in scripts -- `get_ipython()` doesn't work in pure Python, it seems.
# To convert cell magic using shell bash code and passed-in python variables, see:
# https://stackoverflow.com/a/74824151/8508004 (`get_ipython()` use) and https://stackoverflow.com/a/15898875/8508004 (`run_line_magic()` & `run_cell_magic()` use)

# IPython/Jupyter cell version of using command to copy a file
!cp {fn} unsanitized_{fn}
# Python version of `cp`
from shutil import copyfile
unsanitized_fn_name = f"unsanitized_{fn}"
copyfile(fn, unsanitized_fn_name)
# IPython/Jupyter cell version of calling a script to run it in the current notebook namespace
%run -i <script_name>.py
# Python version of calling a script to run it in the current namespace
# Note GREAT FOR HELPING SPEEDY DEVELOPMENT / PROTOTYPING IN A JUPYTER ENVIRONMENT BECAUSE YOU DON'T NEED TO PLACE
# ALL CODE IN A FUNCTION AND HAVE `main`, etc. (In the long run it is best you do like https://stackoverflow.com/a/1186847/8508004 ; but sometimes refactoring is a drag after you make lots of progress.)
# This can be used to chain calling python scripts and be able to access a variable in the original namespace;
# for example, in Jupyter you can use `%run -i <script_name>.py` to call a python script, and then in that python script use
# `exec(open("<yet_another_script_name>.py").read())` to call a script from within that and have access to assigned variables.
# Better yet, you can use the function `suppress_stdout_stderr()` (see my Python tips) as a context manager around
# `exec(open("<yet_another_script_name>.py").read())` so you can run the code without the final script's stderr showing, since
# it may not be the normal way you'd call that script, so the messages are 'off', or you just don't want to
# make it noisy for a user who doesn't care where you are calling from. Also nice if you already have much of the backbone code
# needed in one script: you can add python to the new script to get the old one, and use the suggestions in this paragraph
# to run it. CAN SAVE YOU FROM EDITING CODE IN TWO PLACES! (If you didn't structure it easily for an import.)
exec(open("<script_name>.py").read()) # based on https://stackoverflow.com/a/1186818/8508004
# IPython/Jupyter version of unzip WHERE FILE NAME HAS SPACES
!unzip "{data_file_name}"
# Python version of unzip WHERE FILE NAME HAS SPACES
import os
cmd = f'unzip "{data_file_name}"' # quoting based on a comment below
# https://stackoverflow.com/a/30212621/8508004 ; because the file from the user had
# spaces in it
os.system(cmd)
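A pure-Python alternative is the stdlib `zipfile` module, which sidesteps shell quoting entirely, so spaces in the name never matter. This sketch builds a tiny demo archive first so it runs end to end; "R7 for Wayne.zip" and "demo file.txt" are just hypothetical example names.

```python
import os
import zipfile

# Build a small demo archive whose name contains spaces
# ("R7 for Wayne.zip" is a hypothetical example name)
with open("demo file.txt", "w") as f:
    f.write("hello")
with zipfile.ZipFile("R7 for Wayne.zip", "w") as zf:
    zf.write("demo file.txt")
os.remove("demo file.txt")

# Extraction needs no shell quoting at all, unlike `unzip "{name}"`
with zipfile.ZipFile("R7 for Wayne.zip") as zf:
    zf.extractall()
print(os.path.isfile("demo file.txt"))  # True
```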
# IPython/Jupyter cell version of using command to move / rename a file
!mv unpack "R7 for Wayne.zip"
# Python version of `mv` / rename
from shutil import move
move("unpack", "R7 for Wayne.zip")
# IPython/Jupyter version of `tar`
!tar czf {archive_file_name} {" ".join(files_produced)}
# Python version of `tar`
os.system(f'tar czf {archive_file_name} {" ".join(files_produced)}')
# also see [All The Ways to Compress and Archive Files in Python](https://towardsdatascience.com/all-the-ways-to-compress-and-archive-files-in-python-e8076ccedb4b)
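Along the lines of that article, the stdlib `tarfile` module can replace shelling out to `tar czf` entirely. A minimal sketch, using hypothetical file names standing in for `archive_file_name` and `files_produced` above (the demo files are created first so it runs as-is):

```python
import tarfile

archive_file_name = "results.tar.gz"       # hypothetical stand-ins for the
files_produced = ["out1.txt", "out2.txt"]  # variables in the snippet above

# Create demo files so the sketch runs end to end
for fn in files_produced:
    with open(fn, "w") as f:
        f.write("data\n")

# Equivalent of `tar czf {archive_file_name} {files}` with no shell spawned
with tarfile.open(archive_file_name, "w:gz") as tar:
    for fn in files_produced:
        tar.add(fn)

with tarfile.open(archive_file_name) as tar:
    print(tar.getnames())  # ['out1.txt', 'out2.txt']
```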
# IPython/Jupyter cell version of using command to drop the first line of a file
!tail -n +2 {fn} >{name_for_f_without_first_line}
# Python version of the `tail` command
os.system(f"tail -n +2 {fn} >{name_for_f_without_first_line}")
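The same thing can be done with no shell at all using plain file iteration; this sketch uses hypothetical file names standing in for `fn` and `name_for_f_without_first_line` above, and creates the input first so it runs as-is:

```python
# Hypothetical stand-ins for `fn` and `name_for_f_without_first_line` above
fn = "with_header.txt"
name_for_f_without_first_line = "no_header.txt"
with open(fn, "w") as f:
    f.write("header\nrow1\nrow2\n")

# Pure-Python equivalent of `tail -n +2 {fn} > {out}`: stream every line
# after the first into the output file, no shell involved
with open(fn) as src, open(name_for_f_without_first_line, "w") as dst:
    next(src)            # skip the first line
    dst.writelines(src)  # write the remaining lines as-is
```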
# IPython/Jupyter cell version of using command to see first two lines of file and assigning each line to a list
first_two_lines_list = !head -2 {x}
# Python version of using command to see first two lines of file and assigning each line to a list
import subprocess
first_two_lines_list = subprocess.check_output(
    f"head -2 {x}", shell=True).splitlines()
first_two_lines_list = (
    [ft.decode("utf-8") for ft in first_two_lines_list]) # when you use the
# IPython `!head` form, each line comes back as a string, but the subprocess
# way produces bytes, so conversion to string was added so no other code
# needed to change later
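If the file is local, the subprocess (and the whole bytes-vs-str wrinkle) can be skipped by reading the first lines directly; a sketch with a hypothetical file standing in for `x` above:

```python
from itertools import islice

x = "example_input.txt"  # hypothetical stand-in for `x` above
with open(x, "w") as f:
    f.write("line1\nline2\nline3\n")

# No-shell equivalent of `head -2 {x}`; islice stops after two lines so
# the rest of a large file is never read
with open(x) as f:
    first_two_lines_list = [line.rstrip("\n") for line in islice(f, 2)]
print(first_two_lines_list)  # ['line1', 'line2']
```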
# IPython/Jupyter cell version of using command to delete a file / erase a file
!rm {temp_file_name}
# Python version of `rm`
os.remove(temp_file_name)
# IPython/Jupyter cell version of using command to delete a directory even if not empty
!rm -rf {directory_name}
# Python version of `rm -rf`
import shutil
shutil.rmtree(directory_name)
# note if the directory is empty you can just use `os.rmdir(directory_name)`, see https://linuxize.com/post/python-delete-files-and-directories/
# and https://twitter.com/driscollis/status/1442947656095965186
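A small runnable sketch of that distinction, using throwaway directory names made up for the demo:

```python
import os
import shutil

# `shutil.rmtree` handles a populated tree; `os.rmdir` only an empty one
os.makedirs("outer/inner", exist_ok=True)
with open("outer/inner/file.txt", "w") as f:
    f.write("x")
shutil.rmtree("outer")                 # works even though not empty
os.makedirs("empty_dir", exist_ok=True)
os.rmdir("empty_dir")                  # works only because it is empty
print(os.path.exists("outer"), os.path.exists("empty_dir"))  # False False
```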
# IPython/Jupyter cell version of `curl`
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/sequencework/master/bendit_standalone-utilities/{file_needed}
# Universal Python version of `curl` using the Requests package (keep in mind for data tables, Pandas' read methods can mostly take a URL)
file_needed = "similarities_in_proteinprotein_interactions.py"
import requests
url = ("https://raw.githubusercontent.com/fomightez/structurework/master"
       "/pdbsum-utilities/"+file_needed)
r = requests.get(url, allow_redirects=True)
with open(file_needed, 'wb') as streamhandler:
    streamhandler.write(r.content)
# Universal Python version of `curl` using the urllib3 library, which offers more advanced retry and streaming
# capabilities, to get an example script from GitHub
import os
import urllib3
import certifi

def get_content_atURL_with_URLLIB3(url, chunk_size=64):
    '''
    Get content with the urllib3 library, which offers more advanced retry and
    streaming capabilities than Requests.
    And works in JupyterLite.
    '''
    http = urllib3.PoolManager(
        cert_reqs='CERT_REQUIRED',
        ca_certs=certifi.where(),
        retries=urllib3.Retry(
            total=3,
            backoff_factor=0.1,
            status_forcelist=[500, 502, 503, 504]
        )
    )
    try:
        response = http.request('GET', url, preload_content=False)
        collected = ''
        chunk_count = 0
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            chunk_count += 1
            collected += chunk.decode(errors='ignore')
        response.release_conn()
        return collected, chunk_count
    except urllib3.exceptions.ProtocolError as ex:
        print(f"Specific ProtocolError: {ex}")
        return collected, chunk_count
    except Exception as ex:
        print(f"General error: {ex}")
        return None, 0

def get_script_using_URLLIB3(script_needed):
    '''
    Get script using the urllib3 library, which offers more advanced retry and
    streaming capabilities than Requests.
    And works in JupyterLite.
    '''
    if not os.path.isfile(script_needed):
        url = ("https://raw.githubusercontent.com/fomightez/structurework/master"
               "/PDBmodelComparator-utilities/"+script_needed)
        r_text, _ = get_content_atURL_with_URLLIB3(url)
        with open(script_needed, 'w') as filehandler:
            filehandler.write(r_text)

script_needed = "missing_residue_detailer.py"
get_script_using_URLLIB3(script_needed)
# Universal Python version of `curl` using the urllib3 library, which offers more advanced retry and streaming capabilities,
# along with the Protein Data Bank's direct file access server that has CORS headers. Unlike the Requests-based one below,
# this one works with the header from large PDB records. And still works in JupyterLite, too.
import urllib3
import certifi

def fetch_pdb_headerURLLIB3(pdb_id, chunk_size=64):
    url = f'https://files.rcsb.org/header/{pdb_id.upper()}.pdb'
    http = urllib3.PoolManager(
        cert_reqs='CERT_REQUIRED',
        ca_certs=certifi.where(),
        retries=urllib3.Retry(
            total=3,
            backoff_factor=0.1,
            status_forcelist=[500, 502, 503, 504]
        )
    )
    try:
        response = http.request('GET', url, preload_content=False)
        collected = ''
        chunk_count = 0
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            chunk_count += 1
            collected += chunk.decode(errors='ignore')
        response.release_conn()
        return collected, chunk_count
    except urllib3.exceptions.ProtocolError as ex:
        print(f"Specific ProtocolError: {ex}")
        return collected, chunk_count
    except Exception as ex:
        print(f"General error: {ex}")
        return None, 0

# Example usage
pdb_id = '4DQO'
header, chunks = fetch_pdb_headerURLLIB3(pdb_id)
print(f"Retrieved header in {chunks} chunks")
print(header[:1000]) # Print first 1000 characters
# Universal Python version of `curl` using the Requests package and the Protein Data Bank's direct file access server with CORS headers,
# so that it works with browser-based computing environments powered by WebAssembly (WASM), like Pyodide / JupyterLite!
# But it fails in MyBinder-served sessions with large PDB file records data, like 4dqo. See https://github.com/fomightez/structurework/blob/master/PDBmodelComparator-utilities/missing_residue_detailer.py and https://github.com/fomightez/sequencework/blob/master/LookUpTaxon/LookUpTaxonFA.py, where this is built into a script; the former combines multiple ways to ensure it gets the header.
def fetch_pdbheader_using_requests(pdb_id):
    """
    Take a PDB accession code and return the PDB file header using RCSB's direct file server that happens to have CORS headers enabled
    See https://www.wwpdb.org/ftp/pdb-ftp-sites
    from https://github.com/fomightez/structurework/blob/master/PDBmodelComparator-utilities/missing_residue_detailer.py
    """
    url = f'https://files.rcsb.org/header/{pdb_id.upper()}.pdb'
    import requests
    response = requests.get(url, allow_redirects=True)
    response.raise_for_status() # Raise an exception for non-200 status codes
    return response.text

header = fetch_pdbheader_using_requests("1d66")
# Python way to call `curl` if sure on a unix-based machine
if not os.path.isfile(file_needed):
    os.system("curl -OL https://raw.githubusercontent.com/"
              "fomightez/sequencework/master/"
              f"bendit_standalone-utilities/{file_needed}")
# Alternative Python version of `curl` using the `sh` module; based on code in https://github.com/fomightez/sequencework/blob/master/omega-presence/find_mito_fungal_lsu_rRNA_and_check_for_omega_intron.py
file_needed = "check_for_omega_intron.py"
if not os.path.isfile(file_needed):
    sys.stderr.write("\nObtaining script containing function to use "
                     "...\n")
    # based on http://amoffat.github.io/sh/
    from sh import curl
    curl("-OL",
         "https://raw.githubusercontent.com/fomightez/sequencework/"
         "master/omega-presence/"+file_needed)
    # verify that worked & ask for it to be done manually if it fails
    if not os.path.isfile(file_needed):
        github_link = ("https://github.com/fomightez/sequencework/tree"
                       "/master/omega-presence")
        sys.stderr.write("\n{} not found. "
                         "Please add it to your current working\ndirectory from {}"
                         ".\n**EXITING !!**.\n".format(file_needed, github_link))
        sys.exit(1)
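If you'd rather avoid both the third-party `sh` module and shelling out, the stdlib `urllib.request` can stand in for `curl -OL` (it follows redirects by default). A sketch assuming the same file and URL as above:

```python
import os
import urllib.request

file_needed = "check_for_omega_intron.py"
url = ("https://raw.githubusercontent.com/fomightez/sequencework/"
       "master/omega-presence/" + file_needed)
if not os.path.isfile(file_needed):
    # urlretrieve saves the response body straight to a local file
    urllib.request.urlretrieve(url, file_needed)
```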
# R kernel in Jupyter version of `curl`
system("curl -OL http://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r6.52_FB2023_03/gtf/dmel-all-r6.52.gtf.gz", intern=TRUE) # based on https://github.com/aws/amazon-sagemaker-examples/issues/912
# IPython/Jupyter cell version of `grep`
!grep -r -i "Q6NW40" *
# Python version of `grep` with recursive directory traversal and wildcard use
from sh import grep
from sh import glob
l = grep("-r","-i","Q6NW40",glob('*')) # asterisk wildcard use based on https://stackoverflow.com/a/32923739/8508004 ;
# also see http://amoffat.github.io/sh/sections/faq.html?highlight=glob#why-doesn-t-work-as-a-command-argument
# IPython/Jupyter cell version of using command to run a script and make it a notebook
!jupytext --to notebook --execute {plots4review_fn[:-6]+".py"}
# Python version of using command to run a script and make it a notebook
os.system(f'jupytext --to notebook --execute {plots4review_fn[:-6]+".py"}')
# IPython/Jupyter cell version of running a shell command with output captured
# (`io` here is `from IPython.utils import io`)
with io.capture_output() as captured:
    !bendIt -s {fasta_file_name_for_merged} -o {output_file_suffix} -c \
    {curvature_window_size} -b {other_metric_reporting_window_size} \
    -g {report_with_curvature_settings_corrspndnce[report_with_curvature]} \
    --xmin {curvature_window_size} --xmax {end_range}
# Python version of running a shell command with output captured
with io.capture_output() as captured:
    os.system(f"bendIt -s {fasta_file_name_for_merged} -o "
              f"{output_file_suffix} -c {curvature_window_size} -b "
              f"{other_metric_reporting_window_size} -g "
              f"{report_with_curvature_settings_corrspndnce[report_with_curvature]} "
              f"--xmin {curvature_window_size} --xmax {end_range}")
# Related: Python version of checking that software to run in the shell is installed in the environment; based on code in https://github.com/fomightez/sequencework/blob/master/omega-presence/find_mito_fungal_lsu_rRNA_and_check_for_omega_intron.py
sys.stderr.write("Checking blastn (BLAST+) installed"
                 "...\n")
try:
    cmd="blastn -version"
    result = subprocess.check_output(cmd, shell=True) # based on
    # https://stackoverflow.com/a/18739828/8508004
    result = result.decode("utf-8") # so it isn't bytes
    if "blastn:" in result:
        sys.stderr.write("Detected '{}'...\n".format(
            result.replace("\n","")))
except subprocess.CalledProcessError:
    sys.stderr.write("\nblastn not detected. Please install BLAST+ or "
                     "run in an\nenvironment launched from "
                     "https://github.com/fomightez/blast-binder.\n**EXITING !!**.\n")
    sys.exit(1)
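For a plain existence check (when you don't need the version string), the stdlib `shutil.which()` is a lighter-weight option that avoids spawning a shell entirely; it returns the executable's full path, or None if it is not on PATH:

```python
import shutil

# No shell spawned: shutil.which() just searches PATH, and it also
# works on Windows, unlike shelling out to `command -v`
blastn_path = shutil.which("blastn")
if blastn_path is None:
    print("blastn not detected; please install BLAST+")
else:
    print(f"Detected blastn at {blastn_path}")
```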
# Related to replacing use of wget and getting data
# https://twitter.com/fperez_org/status/1417616732210941954 July 2021
# >'I'd mentioned wget as in "that's what I'd use at the terminal, but now I want to build a more reusable workflow off Python tools." So your requests-cache suggestion fits perfectly.'
# That was in reply to "Py/Data tweeps - what's your favorite, preferably decorator-enabled tool/lib to fetch-and-cache data these days? Use case is small-to-medium-sized, http-based, csv/hdf/etc formats that I'd wget manually and can store locally."

# Related to use of SCP:
# https://twitter.com/driscollis/status/1430942020424675329 August 2021
# >"In less than 20 lines of #Python code, you can SFTP / SSH a file from your local server to a remote server using Paramiko:" <--- see code in image
# >"I have a better example on my blog, @mousevspython , where I also implement the ability to download a file from the remote server:"
# Related note on paths for when you don't know 100% if users will be on a unix-y system:
# "subprocess is the wrong tool for this job. `import os.path` & `mypath = os.path.abspath(os.path.dirname(os.getcwd()))`
# ...is both faster and portable to non-UNIXy operating systems." - [SOURCE](https://stackoverflow.com/a/74350790/8508004)
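A runnable sketch of that quoted one-liner, alongside the equivalent `pathlib` spelling (a style choice, not from the quoted answer):

```python
import os.path
from pathlib import Path

# Portable parent-of-current-directory, per the quoted answer;
# no subprocess involved
mypath = os.path.abspath(os.path.dirname(os.getcwd()))
# The same idea spelled with pathlib
mypath_too = str(Path.cwd().parent)
print(mypath == mypath_too)  # True
```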
# mkdir
# `mkdir` in shell is `os.mkdir()` in Python
# However, for anything but simple directory-making where you know it won't overwrite,
# use Python 3's 'Create parent directories if needed' ability in `os.makedirs()`, like so:
os.makedirs("/tmp/path/to/desired/directory", exist_ok=True) # SEE https://stackoverflow.com/a/600612/8508004
# It will make parents, or even just the ONE directory needed if need be, and not overwrite.
# So use for making from a list:
[os.makedirs(x, exist_ok=True) for x in pool_dirs_to_make]