@mangecoeur
mangecoeur / jhub-the-hard-way.md
Created November 5, 2019 16:19
Install Jupyterhub and Jupyterlab The Hard Way

The combination of Jupyterhub and Jupyterlab is a great way to make shared computing resources available to a group.

These instructions are a guide for a manual, 'bare metal' install of Jupyterhub and Jupyterlab. This is ideal for running on a single server: build a beast of a machine and share it within your lab, or use a virtual machine from any VPS or cloud provider.

This guide has similar goals to those of The Littlest JupyterHub setup.
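For a sense of where such an install ends up, the hub is driven by a jupyterhub_config.py file; a minimal sketch (the settings shown are illustrative assumptions, not taken from the guide):

# jupyterhub_config.py - minimal sketch; JupyterHub injects the `c` config
# object when it loads this file. All values here are illustrative assumptions.
c.JupyterHub.bind_url = 'http://:8000'
# serve JupyterLab rather than the classic notebook UI
c.Spawner.default_url = '/lab'
# run single-user servers as local system users (the default spawner)
c.JupyterHub.spawner_class = 'jupyterhub.spawner.LocalProcessSpawner'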

@mangecoeur
mangecoeur / config.py
Created February 27, 2018 13:37
Dummy config module for importing a master shared config file from a parent folder.
"""
Background
----------
One of the simplest configuration approaches in python is to just use python files,
giving you the full power of python - the least hassle approach in a trusted environment.
However, importing config modules can be problematic in interactive environments.
For example, when using jupyter notebooks organised into sub-folders,
we want to access a common config file in the overall project root.
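The preview stops at the docstring; a minimal sketch of how such a shim might work (the mechanism below is an assumption for illustration, not the gist's actual code):

# config.py - load the master config.py from the parent folder
# and re-export its names (sketch; assumes the master file is one level up)
import importlib.util
from pathlib import Path

_master = Path(__file__).resolve().parent.parent / 'config.py'
_spec = importlib.util.spec_from_file_location('master_config', _master)
_module = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(_module)

# expose the master config's public names as if they were defined here
globals().update({k: v for k, v in vars(_module).items() if not k.startswith('_')})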
@mangecoeur
mangecoeur / slim_silhouette.py
Created February 16, 2018 13:14
Low memory, good performance implementation of k-means silhouette samples using numba
from sklearn.utils import check_X_y
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics.cluster.unsupervised import check_number_of_labels
from numba import jit

@jit(nogil=True, parallel=True)
def euclidean_distances_numba(X, Y=None, Y_norm_squared=None):
    # disable checks
    XX_ = (X * X).sum(axis=1)
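The preview stops mid-function; a hedged sketch of the underlying idea - a numba-compiled pairwise squared-distance loop - rather than the gist's actual body:

import numpy as np
from numba import jit

@jit(nopython=True, nogil=True)
def pairwise_sq_dists(X, Y):
    # explicit loops compile to fast machine code under numba and avoid
    # the large intermediate arrays a vectorised numpy version allocates
    out = np.empty((X.shape[0], Y.shape[0]))
    for i in range(X.shape[0]):
        for j in range(Y.shape[0]):
            acc = 0.0
            for k in range(X.shape[1]):
                d = X[i, k] - Y[j, k]
                acc += d * d
            out[i, j] = acc
    return out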
# # we need a reference to the snippets package
# snippetsPackage = require(atom.packages.getLoadedPackage('autocomplete-snippets').path)
#
# # we need a reference to the original method we'll monkey patch
# __oldGetSnippets = snippetsPackage.getSnippets
#
# snippetsPackage.getSnippets = (editor) ->
# snippets = __oldGetSnippets.call(this, editor)
#
# # we're only concerned by ruby files
@mangecoeur
mangecoeur / archive_to_nc.py
Created March 24, 2017 10:55
Script to archive .pp files to netCDF
import datetime
import shutil
import tempfile
import tarfile
from collections import namedtuple
from pathlib import Path
from enum import IntEnum
import numpy as np
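The preview shows only the imports. For context, .pp (Met Office) files are typically converted to netCDF with the iris library; a minimal sketch of that step (iris is an assumption here - it does not appear in the imports shown):

# hypothetical conversion step using iris (not shown in the gist's preview)
import iris

cubes = iris.load('archive/input.pp')   # illustrative paths
iris.save(cubes, 'archive/output.nc')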
@mangecoeur
mangecoeur / description.md
Last active March 30, 2021 21:34
Pandas PostgreSQL support for loading to DB using fast COPY FROM method

This small subclass of the Pandas sqlalchemy-based SQL support for reading/storing tables uses the Postgres-specific "COPY FROM" method to insert large amounts of data into the database. It is much faster than using INSERT. To achieve this, the table is created in the normal way using sqlalchemy but no data is inserted. Instead, the data is saved to a temporary CSV file (using Pandas' mature CSV support) and then read into Postgres using psycopg2's support for COPY FROM STDIN.
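A minimal sketch of the approach (names are illustrative, and an in-memory buffer stands in for the temporary CSV file described above; this is not the gist's actual code):

import io
import pandas as pd
from sqlalchemy import create_engine

def to_sql_copy(df, table_name, engine):
    # create the empty table via pandas/sqlalchemy, then bulk-load with COPY
    df.head(0).to_sql(table_name, engine, if_exists='replace', index=False)
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)
    buf.seek(0)
    raw = engine.raw_connection()  # underlying psycopg2 connection
    try:
        with raw.cursor() as cur:
            cur.copy_expert('COPY {} FROM STDIN WITH CSV'.format(table_name), buf)
        raw.commit()
    finally:
        raw.close()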

@mangecoeur
mangecoeur / concurrent.futures-intro.md
Last active January 9, 2024 16:04
Easy parallel python with concurrent.futures

As of version 3.3, python includes the very promising concurrent.futures module, with elegant context managers for running tasks concurrently. Thanks to the simple and consistent interface you can use both threads and processes with minimal effort.

For most CPU-bound tasks - anything that does heavy number crunching - you want your program to use all the CPUs in your PC. The simplest way to get a CPU-bound task to run in parallel is to use the ProcessPoolExecutor, which will create enough sub-processes to keep all your CPUs busy.

We use the context manager thusly:

with concurrent.futures.ProcessPoolExecutor() as executor:
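The preview cuts off there; a self-contained sketch of the pattern (the task function and inputs are illustrative):

import concurrent.futures

def crunch(n):
    # stand-in for a CPU-bound task
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    # one worker process per CPU by default; map distributes the inputs
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(crunch, [10**6] * 8))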
@mangecoeur
mangecoeur / pandas-sqlalchemy-read.py
Created April 14, 2013 19:56
A slightly modified version of Pandas SQL DataFrame read which accepts an SQLAlchemy Engine object instead of a DBAPI connection object.
"""
Collection of query wrappers / abstractions to both facilitate data
retrieval and to reduce dependency on DB-specific API.
"""
from pandas.core.api import DataFrame

def _safe_fetch(cur):
    try:
        result = cur.fetchall()
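For comparison, this capability was later absorbed into pandas itself: read_sql accepts an SQLAlchemy engine directly (the connection URL below is illustrative):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://user:password@localhost/mydb')  # illustrative URL
df = pd.read_sql('SELECT * FROM my_table', engine)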
@mangecoeur
mangecoeur / a-conda-workon-tool.md
Last active February 9, 2021 14:53
A "virtualenv activate" for Anaconda environments

A "virtualenv activate" for Anaconda environments

I've been using the Anaconda python package from continuum.io recently and found it to be a good way to get all the complex compiled libs you need for a scientific python environment. Even better, their conda tool lets you create environments much like virtualenv, but without having to re-compile stuff like numpy, which gets old very very quickly with virtualenv and can be a nightmare to get correctly set up on OSX.

The only thing missing was an easy way to switch environments - their docs suggest running python executables from the install folder, which I find a bit of a pain. Coincidentally I came across this article - Virtualenv's bin/activate is Doing It Wrong - which describes a simple way to launch a sub-shell with certain environment variables set. Now simple was the key word for me since my bash-fu isn't very strong, but I managed to come up with the script below. Put this in a text file called conda-work