Nicholas Broad nbroad1881

  • Hugging Face
  • San Francisco, California
@nbroad1881
nbroad1881 / os_and_pathlib.py
Created May 23, 2020 13:05
Useful snippets for working with files and paths in python
import os
from pathlib import Path
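# Absolute path of the directory containing this file
absolute_path = os.path.dirname(os.path.abspath(__file__))
# OR, with pathlib (drop .parent to get the path of the file itself)
absolute_path = Path(__file__).resolve().parent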
# List contents of a directory (omit the argument for the current directory)
os.listdir('dirname')
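The same two tasks can also be done with pathlib alone. This is a minimal sketch, not part of the original gist; the helper names `list_dir` and `parent_dir` are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def list_dir(directory="."):
    """Return the sorted entry names in `directory` (hypothetical helper)."""
    # Path.iterdir() yields Path objects; .name keeps only the final component,
    # which matches what os.listdir() returns.
    return sorted(entry.name for entry in Path(directory).iterdir())


def parent_dir(file_path):
    """Return the absolute directory that contains `file_path` (hypothetical helper)."""
    # resolve() makes the path absolute and follows symlinks; .parent drops the filename.
    return Path(file_path).resolve().parent
```

Calling `list_dir()` with no argument lists the current directory, and `parent_dir(__file__)` gives the directory of the running script.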
nbroad1881 / jupyter_dockerfile
Last active May 1, 2020 19:56
dev environment using jupyter lab and docker
FROM ubuntu:18.04
# Set character encoding environment variables
ENV LC_ALL=C.UTF-8 LANG=C.UTF-8
# Allow apt-get install without interaction from console
ENV DEBIAN_FRONTEND=noninteractive
# Set the working dir to the root user home folder
WORKDIR /root
nbroad1881 / jupyter_dockerfile
Last active May 1, 2020 19:57
dev environment using jupyter lab and docker
# **************************************************
# Commands to run this dockerfile
# $docker build -t name_of_image directory
#
# $docker run -v ~/path/to/local/dir:/root/work -it --name my_container -p 8888:8888 --rm name_of_image
# (-v stands for volume. This mounts a local dir to a dir in the container,
# so any changes to the local dir are seen in the connected dir in the container)
# -v ~/path/to/local/dir:/root/work -it \
# (-it combines -i, interactive, with -t, which allocates a terminal)
# --name my_container \
nbroad1881 / pynb-magic.py
Created April 30, 2020 10:58
IPython magic snippets
# help for a function
%timeit?
# run code block multiple times to get average time
%%timeit
L = [n ** 2 for n in range(1000)]
# paste multi-line code from the clipboard (terminal IPython)
%paste
>>> def donothing(x):
nbroad1881 / corpus_split.py
Created April 24, 2020 21:28
Breaks a massive single-file corpus into smaller files by number of lines, then collects the list of resulting filenames
!split -l 250000 text_file.txt smaller_
### split [options] filename prefix
### -l number of lines per output file
### -b number of bytes per output file
import glob
file_list = glob.glob("smaller_*")
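Where the `split` command is not available (for example, outside a notebook or on Windows), the same idea can be sketched in pure Python. This is an illustrative version, not the gist's method: the function name `split_by_lines` is hypothetical, and the numeric suffixes (`smaller_0000`, `smaller_0001`, ...) are an assumption that differs from the `aa`, `ab`, ... suffixes `split` produces by default.

```python
from itertools import islice
from pathlib import Path


def split_by_lines(source, lines_per_file, prefix="smaller_"):
    """Write `source` out in chunks of `lines_per_file` lines; return the new paths."""
    source = Path(source)
    written = []
    with source.open("r", encoding="utf-8") as handle:
        index = 0
        while True:
            # islice pulls at most `lines_per_file` lines without reading the whole file.
            chunk = list(islice(handle, lines_per_file))
            if not chunk:
                break
            # Assumption: zero-padded numeric suffixes instead of split's aa, ab, ...
            target = source.parent / f"{prefix}{index:04d}"
            target.write_text("".join(chunk), encoding="utf-8")
            written.append(target)
            index += 1
    return written
```

For example, `split_by_lines("text_file.txt", 250000)` would mirror the command above and return the list of output paths directly, so the `glob` step is not needed.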

Keybase proof

I hereby claim:

  • I am nbroad1881 on github.
  • I am nicholasbroad (https://keybase.io/nicholasbroad) on keybase.
  • I have a public key ASB10K5suwte9WvhBvNox4bXW95vszH1jaJXZ54ejZAeUAo

To claim this, I am signing this object: