Malcolm Greaves malcolmgreaves

@malcolmgreaves
malcolmgreaves / bug_pandas_map_mangles_column_datatype.py
Created February 13, 2024 19:35
Demonstration showing a bug in Pandas: it automatically converts datetime columns into a different pandas-specific type, even when the original column has `dtype=object`.
from datetime import datetime, timedelta
import pandas as pd
now = datetime.now()
df = pd.DataFrame.from_dict(
{
"created_at": pd.Series([now, now - timedelta(seconds=100), now + timedelta(seconds=10)], dtype='object'),
}
@malcolmgreaves
malcolmgreaves / pandas_required_columns.py
Last active February 10, 2024 02:18
Conceptual framework for writing Pandas DataFrame code where required columns are not only documented, but parameterized. This establishes an interface between the name of a column in code vs. its name in the data.
from abc import ABC
from dataclasses import dataclass
from typing import List, NamedTuple, Sequence, Type, TypeVar
import pandas as pd
__all__: Sequence[str] = (
# main abstraction & utilities for columns required in a dataframe
"Columns",
@malcolmgreaves
malcolmgreaves / env_var_secret.Dockerfile
Created January 11, 2024 21:35
Example passing a secret value via an env var to a docker build.
# Run this example:
#
# mysecret=SECRET_VALUE docker build --secret id=mysecret,env=mysecret -f Dockerfile -t deleteme .
#
FROM debian:trixie-slim
RUN <<EOF cat >> file
#!/bin/bash
if [[ -z "\${MYSECRET}" ]]; then
echo "No MYSECRET env var!!!"
@malcolmgreaves
malcolmgreaves / requirements.txt--pyproject.toml
Created January 9, 2024 22:08
A pyproject.toml that uses setuptools & gets `dependencies` dynamically from a requirements.txt file.
[build-system]
requires = ["setuptools", "wheel", "setuptools_scm"]
build-backend = "setuptools.build_meta"
[project]
name = "mypackage"
requires-python = ">=3.10"
dynamic = ["dependencies"]
[tool.setuptools.dynamic]
@malcolmgreaves
malcolmgreaves / Dockerfile--cuda_117-torch_113-geometric_204
Last active January 5, 2024 20:58
Dockerfile based on Ubuntu 22.04 that has CUDA 11.7 dev libraries & drivers installed alongside PyTorch 1.13 and Torch-Geometric 2.0.4 libraries.
FROM nvidia/cuda:11.7.1-devel-ubuntu22.04
RUN DEBIAN_FRONTEND=noninteractive apt-get update && \
apt-get install -y software-properties-common && \
add-apt-repository -y ppa:deadsnakes/ppa && \
apt-get install -y \
python3-setuptools python3-dev swig \
wget git unzip tmux vim tree xterm \
build-essential gcc \
@malcolmgreaves
malcolmgreaves / testing_args_easier_debug_messages.py
Last active November 28, 2023 20:02
Exploring patterns for validating function arguments in Python.
"""
$ python testing_args_easier_debug_messages.py
Hello world, I can't believe you've have 42 birthdays! I hope you find time for crafting soon!
Hello universe, I can't believe you've have 117.0 birthdays! I hope you find time for crafting soon!
ValueError: Need positive numbers, not: age=whoops
ValueError: Need positive numbers, not: age=-1
ValueError: Need non-empty strings, not: name=
ValueError: Need non-empty strings, not: hobby=None
"""
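The sample output above suggests validators that name the offending argument in the error message. A minimal sketch of that pattern follows. The helper names `require_positive` and `require_non_empty` and the `greet` function are hypothetical and are not taken from the gist; only the wording of the error messages mirrors the output shown.

```python
from numbers import Real


def require_positive(**kwargs: object) -> None:
    """Raise ValueError naming the first keyword argument that is not a positive number."""
    for name, value in kwargs.items():
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, Real) or value <= 0:
            raise ValueError(f"Need positive numbers, not: {name}={value}")


def require_non_empty(**kwargs: object) -> None:
    """Raise ValueError naming the first keyword argument that is not a non-empty string."""
    for name, value in kwargs.items():
        if not isinstance(value, str) or len(value) == 0:
            raise ValueError(f"Need non-empty strings, not: {name}={value}")


def greet(name: str, age: float, hobby: str) -> str:
    # Hypothetical function: validate first, then build the message.
    require_non_empty(name=name, hobby=hobby)
    require_positive(age=age)
    return f"Hello {name}, you are {age} and enjoy {hobby}."
```

Passing each value as a keyword argument lets the validator report the parameter name without the caller repeating it as a string, so `greet("world", -1, "crafting")` fails with `age=-1` in the message.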
@malcolmgreaves
malcolmgreaves / temporary_fork_behavior_setter.py
Created November 20, 2023 22:12
Context manager for temporarily overriding Python forking behavior.
from contextlib import contextmanager
from multiprocessing import get_start_method, set_start_method
from typing import Literal
@contextmanager
def ForkingBehavior(
*,
start_method: Literal['spawn', 'fork', 'forkserver'],
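The preview above is cut off after the signature. As an illustration of the idea only, here is a minimal sketch of a context manager that temporarily overrides the multiprocessing start method. The name `temporary_start_method` is hypothetical, and the body is an assumption about how such a helper could work, not the gist's actual `ForkingBehavior` implementation.

```python
from contextlib import contextmanager
from multiprocessing import get_start_method, set_start_method
from typing import Iterator, Literal, Optional


@contextmanager
def temporary_start_method(
    start_method: Literal["spawn", "fork", "forkserver"],
) -> Iterator[None]:
    """Hypothetical helper: set the start method for the duration of a with-block."""
    # allow_none=True reads the current setting without fixing a default as a side effect.
    previous: Optional[str] = get_start_method(allow_none=True)
    # force=True is required because a start method may already have been chosen.
    set_start_method(start_method, force=True)
    try:
        yield
    finally:
        # Restore the earlier setting; None with force=True clears it again.
        set_start_method(previous, force=True)
```

The start method is process-global state, so a helper like this is only safe when no other thread is creating processes while the block runs.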
Hugging Face Optimized Inference License 1.0 (HFOILv1.0)
This License Agreement governs the use of the Software and its Modifications. It is a
binding agreement between the Licensor and You.
This License Agreement shall be referred to as Hugging Face Optimized Inference License
1.0 or HFOILv1.0. We may publish revised versions of this License Agreement from time to
time. Each version will be given a distinguished number.
@malcolmgreaves
malcolmgreaves / unix_program_shortcuts.md
Created July 27, 2023 17:34
Helpful unix shortcuts

Unix Program Shortcuts

Files

Need to transfer data locally or remotely? Need efficient reconnects and deduplication? Want summary progress information?

rsync -a --info=progress2 --recursive [SOURCE] ... [DEST]
from dataclasses import dataclass
from typing import Collection, Dict, Iterator, Optional, Set
import torch
def torchscript(model: torch.nn.Module) -> torch.ScriptModule:
"""Runs TorchScript's scripting mode on the input model.
A torch scripted model is able to run in a Python-free execution environment,
ideal for production inference.