Josh Levy-Kramer joshlk

  • London, UK
@joshlk
joshlk / Remote_debugging_with_pycharm.md
Created March 9, 2021 13:23
Remote debugging with PyCharm

Steps to debug a program on a remote machine without using remote deployment.

  1. Start the debug server in PyCharm and specify a port such as 21000
  2. Remote-forward that port over SSH, e.g. `ssh host -R 21000:localhost:21000`
  3. Insert the following line into the Python program and start it on the remote machine (first run `pip install pydevd-pycharm`):
import pydevd_pycharm; pydevd_pycharm.settrace('localhost', port=21000, stdoutToServer=True, stderrToServer=True)
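A minimal sketch of how the `settrace` line above could be wrapped so the program still runs when no debug session is wanted. The function name `maybe_attach_debugger` and the environment variable `PYCHARM_REMOTE_DEBUG` are hypothetical and not part of the original gist; only the `settrace` call is taken from it.

```python
import os


def maybe_attach_debugger(host="localhost", port=21000):
    """Attach to the PyCharm debug server only when explicitly requested.

    PYCHARM_REMOTE_DEBUG is a hypothetical opt-in switch chosen for this sketch.
    Returns True if settrace was called, False otherwise.
    """
    if os.environ.get("PYCHARM_REMOTE_DEBUG") != "1":
        return False

    # Imported lazily so the program does not need pydevd-pycharm installed
    # unless debugging is actually requested.
    import pydevd_pycharm

    # Same call as in the gist: connect to the port forwarded over SSH.
    pydevd_pycharm.settrace(host, port=port, stdoutToServer=True, stderrToServer=True)
    return True
```

With this in place, running the program as `PYCHARM_REMOTE_DEBUG=1 python script.py` on the remote machine would attach to the debugger, while a plain `python script.py` would skip it.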
@joshlk
joshlk / where.sh
Last active February 4, 2021 13:54
`where`: find where in the $PATH an executable is located. A Bash/shell/Unix script that extends `which` to also print the $PATH index (GNU General Public License)
#! /bin/sh
set -ef

if test -n "$KSH_VERSION"; then
    puts() {
        print -r -- "$*"
    }
else
    puts() {
        printf '%s\n' "$*"
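The shell preview above is cut off, so the following is only a rough Python sketch of the idea in the description: look up an executable in every `$PATH` entry and report the index of each entry where it is found. The function name `where`, its `path` parameter, and the zero-based index are assumptions made for illustration, not the script's actual behaviour.

```python
import os


def where(name, path=None):
    """Return (index, full_path) for every $PATH entry that contains executable `name`.

    The index is zero-based here; that is an assumption of this sketch.
    """
    if path is None:
        path = os.environ.get("PATH", "")

    matches = []
    for index, directory in enumerate(path.split(os.pathsep)):
        # An empty $PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append((index, candidate))
    return matches
```

Unlike a lookup that stops at the first hit, this returns every match, which makes it easy to see when an earlier `$PATH` entry shadows a later one.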
@joshlk
joshlk / sentence-segmentation-benchmark.ipynb
Last active October 23, 2023 18:24
A comparison of different sentence segmentation models for the English language. The Brown corpus is used to benchmark the models.
@joshlk
joshlk / priority_queue.py
Created May 22, 2020 09:55
A PriorityQueue or MinHeap implementation. Items with the smallest cost are popped first. Object-oriented interface for `heapq` (Python standard library).
import heapq

class PriorityQueue:
    """
    A PriorityQueue or MinHeap implementation.
    Items with the smallest cost are popped first.
    """
    def __init__(self):
        self.h = []
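The preview stops after `__init__`. A minimal sketch of how such a wrapper around `heapq` might continue is shown here; the `push`, `pop`, and `__len__` methods and the tie-breaking counter are assumptions for illustration and are not taken from the original file.

```python
import heapq
from itertools import count


class PriorityQueue:
    """Min-heap wrapper: items with the smallest cost are popped first."""

    def __init__(self):
        self.h = []
        # Monotonic counter breaks ties so items themselves are never compared.
        self._counter = count()

    def push(self, item, cost):
        """Add `item` with the given `cost`."""
        heapq.heappush(self.h, (cost, next(self._counter), item))

    def pop(self):
        """Remove and return (item, cost) for the lowest-cost item."""
        cost, _, item = heapq.heappop(self.h)
        return item, cost

    def __len__(self):
        return len(self.h)
```

The counter means items with equal cost come out in insertion order, and it lets the queue hold items that do not support `<`, such as dictionaries.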
@joshlk
joshlk / directed_agglomerative_clustering.py
Last active October 1, 2019 14:23
Directed Agglomerative Clustering: similar to `sklearn.cluster.AgglomerativeClustering` but the label is the root node. The root node is the root of a connected DAG (Directed Acyclic Graph). The algorithm can also be framed as determining the weakly connected components and identifying the root of each.
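The weakly-connected-components framing in the description can be sketched as follows. This is not the gist's algorithm (which is described as naive and O(n^3)); it swaps in a union-find structure to illustrate the idea only. The function name `root_labels`, the `(parent, child)` edge format, and the assumption that every component has exactly one node without incoming edges are all choices made for this sketch.

```python
def root_labels(n_nodes, edges):
    """Label each node with the root of its weakly connected component.

    `edges` is an iterable of (parent, child) index pairs. Assumes every
    component is a DAG with exactly one node that has no incoming edge.
    """
    parent = list(range(n_nodes))  # union-find forest, ignores edge direction

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    has_incoming = [False] * n_nodes
    for src, dst in edges:
        has_incoming[dst] = True
        parent[find(src)] = find(dst)  # merge the two components

    # The root of a component is its only node without an incoming edge.
    root_of = {}
    for node in range(n_nodes):
        if not has_incoming[node]:
            component = find(node)
            if component in root_of:
                raise ValueError("component has more than one root")
            root_of[component] = node

    return [root_of[find(node)] for node in range(n_nodes)]
```

For example, with edges `0 -> 1 -> 2` and `3 -> 4` over six nodes, nodes 0 to 2 are labelled 0, nodes 3 and 4 are labelled 3, and the isolated node 5 is its own root.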
import numpy as np
from itertools import count, product

class DirectedAgglomerativeClustering:
    """
    Similar to `sklearn.cluster.AgglomerativeClustering` but the label is the root node. The root node is the root of a
    connected DAG (Directed Acyclic Graph).
    The algorithm can also be framed as determining the weakly connected components and identifying the root.
    The algorithm is a naive implementation and is O(n^3).
    """
@joshlk
joshlk / random_numbers.pyx
Last active July 11, 2019 10:22
Random numbers in Cython. Random integers and floating-point numbers from the C standard library
from libc.stdlib cimport rand, srand, RAND_MAX
from libc.time cimport time

def get_RAND_MAX():
    return RAND_MAX

"""
long is at least 32 bits and so covers at least the range [−2,147,483,647, +2,147,483,647]
(an unsigned long covers at least [0, 4,294,967,295])
"""
@joshlk
joshlk / trio_progress_bar.py
Created May 31, 2019 13:58
Progress bar for Python Trio tasks using tqdm
import trio
import tqdm

class TrioProgress(trio.abc.Instrument):
    def __init__(self, total, notebook_mode=False, **kwargs):
        if notebook_mode:
            from tqdm import tqdm_notebook as tqdm
        else:
            from tqdm import tqdm
@joshlk
joshlk / leak_toy_example.py
Created November 20, 2018 17:45
Minimal working example of a PySpark memory leak
from pyspark import SparkContext
from pyspark.sql import SQLContext
import numpy as np
sc = SparkContext()
sqlContext = SQLContext(sc)
# Create dummy pySpark DataFrame with 1e5 rows and 16 partitions
df = sqlContext.range(0, int(1e5), numPartitions=16)
@joshlk
joshlk / uber_h3_geoindex_plot.ipynb
Last active November 2, 2020 22:09
Uber's H3 Geoindex plotted on the UK and Bristol at different levels
@joshlk
joshlk / bash_cheatsheet.bash
Created September 23, 2018 12:21
Bash cheatsheet
# SSH port forwarding/tunnel
# Access port 6006 on a remote server from the local machine
ssh -L 6006:localhost:6006 user@remote