Fabrice Normandin (lebrice)

@lebrice
lebrice / arxiv_id_to_name.py
Last active September 13, 2023 06:55
A simple tool to add the name of each downloaded paper PDF in front of its arXiv ID. Also removes duplicate downloads of the same arXiv paper.
"""A simple tool to add the name of each downloaded paper PDF in front of its arXiv ID.
(Written by fabrice.normandin@gmail.com)
If there are multiple downloads of the same paper, the original is replaced with the
latest download. This can be useful in a downloads folder filled with copies.
For instance:
"""
import glob
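A minimal sketch of the rename-and-deduplicate logic described in the docstring above, not the gist's actual code. It assumes browser-style duplicate names such as `2106.01234 (1).pdf`, a `Title - <id>.pdf` output format, and a caller-supplied `lookup_title` function (hypothetical; how the gist fetches titles is not shown here). The most recently modified copy of each paper is the one that is kept.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Callable

# Assumed naming scheme: "<arxiv id>[vN][ (k)].pdf", e.g. "2106.01234v2 (1).pdf".
ARXIV_PDF = re.compile(r"^(?P<id>\d{4}\.\d{4,5})(v\d+)?( \(\d+\))?\.pdf$")


def dedupe_and_rename(folder: Path, lookup_title: Callable[[str], str]) -> list[Path]:
    """Keep only the latest download of each paper and prefix it with its title."""
    groups: dict[str, list[Path]] = {}
    for path in folder.glob("*.pdf"):
        match = ARXIV_PDF.match(path.name)
        if match:
            groups.setdefault(match["id"], []).append(path)

    renamed = []
    for arxiv_id, copies in groups.items():
        # The newest copy (by modification time) wins; older copies are deleted.
        copies.sort(key=lambda p: p.stat().st_mtime)
        latest = copies[-1]
        for old in copies[:-1]:
            old.unlink()
        # Drop characters that are awkward in file names (":" , "/", "?", ...).
        title = re.sub(r"[^\w\- ]", "", lookup_title(arxiv_id)).strip()
        target = folder / f"{title} - {arxiv_id}.pdf"
        latest.rename(target)
        renamed.append(target)
    return renamed
```

Passing the title lookup in as a function keeps the file handling testable without network access; already-renamed files no longer match the pattern, so running the tool twice is harmless.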
@lebrice
lebrice / conditional_fields.py
Last active November 23, 2023 18:12
Conditional dataclass fields. The default_factory can now take the values of other fields of the dataclass as arguments.
from __future__ import annotations
import inspect
from dataclasses import dataclass, field, Field, fields
from typing import Any, Callable, TypeVar, overload
from logging import getLogger as get_logger
logger = get_logger("conditional_fields")
T = TypeVar("T")
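A minimal sketch of the idea, using hypothetical helpers `conditional_field` and `resolve_conditional_fields` rather than the gist's own API (which is not shown here). The factory's parameter names are matched against field names with `inspect.signature`, and the value is filled in from `__post_init__`. This sketch assumes a non-frozen dataclass and that a conditional field only depends on fields declared before it.

```python
from __future__ import annotations

import inspect
from dataclasses import dataclass, field, fields
from typing import Any, Callable

# Sentinel meaning "the caller did not pass a value for this field".
_UNSET: Any = object()


def conditional_field(factory: Callable[..., Any]) -> Any:
    """Declare a field whose default is computed from other fields of the instance."""
    return field(default=_UNSET, metadata={"conditional_factory": factory})


def resolve_conditional_fields(obj: Any) -> None:
    """Call from __post_init__: fill conditional fields still left at the sentinel."""
    for f in fields(obj):
        factory = f.metadata.get("conditional_factory")
        if factory is None or getattr(obj, f.name) is not _UNSET:
            continue
        # Pass the values of the fields whose names match the factory's parameters.
        names = inspect.signature(factory).parameters
        setattr(obj, f.name, factory(**{name: getattr(obj, name) for name in names}))


@dataclass
class Config:
    dataset: str = "mnist"
    n_classes: int = conditional_field(lambda dataset: 100 if dataset == "cifar100" else 10)

    def __post_init__(self) -> None:
        resolve_conditional_fields(self)
```

Here `Config("cifar100")` gets `n_classes=100`, while an explicitly passed value such as `Config("cifar100", n_classes=5)` is left alone.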
@lebrice
lebrice / flower_multiprocessing.py
Created May 26, 2022 14:36
Example of how to run many clients in parallel using the Flower framework, on a SLURM cluster (Mila)
## Flower follow-up: Multiprocessing error, can't pickle _thread.lock objects
""" Follow-up to [last week's question](https://hackmd.io/OsKKGG3QSTawhaWMMMRaeA#Using-FLWR-How-do-I-run-LOTS-of-clients).
The solution from last week (using multiprocessing.Pool) doesn't quite work.
Solution: Use the --array option of sbatch to run multiple jobs, and then use multiprocessing.Process (instead of a Pool) to run multiple processes within each job:
"""
import multiprocessing as mp
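A sketch of the Process-based approach, assuming each job is submitted with something like `sbatch --array=0-3 job.sh`. `run_client` is a hypothetical placeholder for starting one Flower client. The point illustrated is that each child receives only a plain integer ID and builds its client itself, so no object holding a `_thread.lock` has to be pickled in the parent.

```python
from __future__ import annotations

import multiprocessing as mp
import os
from collections.abc import Mapping


def client_ids_for_this_task(num_clients: int, env: Mapping[str, str] | None = None) -> list[int]:
    """Split client IDs 0..num_clients-1 round-robin across the tasks of a job array."""
    env = os.environ if env is None else env
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    task_count = int(env.get("SLURM_ARRAY_TASK_COUNT", "1"))
    return list(range(task_id, num_clients, task_count))


def run_client(client_id: int) -> None:
    # Hypothetical placeholder: create the Flower client *here*, inside the child,
    # so nothing holding a lock is pickled and sent over from the parent process.
    print(f"client {client_id} running in process {os.getpid()}")


def main(num_clients: int = 8) -> None:
    # One Process per client assigned to this array task.
    processes = [
        mp.Process(target=run_client, args=(client_id,))
        for client_id in client_ids_for_this_task(num_clients)
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()


if __name__ == "__main__":
    main()
```

With 4 array tasks and 10 clients, task 1 would run clients 1, 5 and 9.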
@lebrice
lebrice / job_array_example.py
Last active June 9, 2022 19:18
Job array example
from dataclasses import dataclass
import os
from simple_parsing import ArgumentParser
from itertools import product
@dataclass
class ProblemConfig:
dataset: int = 0 # Which dataset ID to use.
rank: int = 0 # The rank of some matrix
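A minimal sketch of how a job array index can select one point of a parameter grid built with `itertools.product`. The sweep values in `DATASETS` and `RANKS` are hypothetical, and the command-line parsing that the gist does with `simple_parsing` is left out.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from itertools import product


@dataclass
class ProblemConfig:
    dataset: int = 0  # Which dataset ID to use.
    rank: int = 0  # The rank of some matrix


# Hypothetical sweep values: one array task per combination (3 * 4 = 12 tasks).
DATASETS = [0, 1, 2]
RANKS = [1, 2, 4, 8]


def config_for_task(task_id: int) -> ProblemConfig:
    """Map a SLURM array index to one point of the grid."""
    grid = list(product(DATASETS, RANKS))
    dataset, rank = grid[task_id]
    return ProblemConfig(dataset=dataset, rank=rank)


if __name__ == "__main__":
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    print(config_for_task(task_id))
```

With 12 combinations, the job would be submitted with `sbatch --array=0-11 job.sh`, and each task runs exactly one configuration.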
from __future__ import annotations
# Context: Dataset is on GPU memory.
from typing import Iterable
import torch
from torch import Tensor
from torchvision.datasets import MNIST
from torch.utils.data import TensorDataset, DataLoader, Dataset, ConcatDataset
@lebrice
lebrice / setup_cache.py
Last active October 4, 2022 17:32
Consolidating the cache on the Mila cluster
"""Sets up a user cache directory for commonly used libraries, while reusing shared cache entries.
Use this to avoid having to download files to the $HOME directory, as well as to remove
duplicated downloads and free up space in your $HOME and $SCRATCH directories.
The user cache directory should be writable, and doesn't need to be empty.
This command creates symlinks in this user cache directory that point to (some of) the files
contained in the *shared* cache directory.
The shared cache directory should be readable (e.g. a directory containing frequently-downloaded
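A minimal sketch of the symlinking step, assuming every file in the shared cache should be linked (the gist only links "some of" them, and its selection rule is not shown here) and that files already present in the user cache are left untouched. `link_shared_cache` is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path


def link_shared_cache(shared_dir: Path, user_dir: Path) -> list[Path]:
    """Symlink the files of `shared_dir` into `user_dir` without overwriting anything.

    Returns the list of symlinks that were created.
    """
    created = []
    for src in shared_dir.rglob("*"):
        if not src.is_file():
            continue  # Directories are recreated as needed below.
        dest = user_dir / src.relative_to(shared_dir)
        # is_symlink() also catches broken links, which exists() reports as missing.
        if dest.exists() or dest.is_symlink():
            continue  # Keep whatever the user already has.
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.symlink_to(src)
        created.append(dest)
    return created
```

Because only symlinks are written, the shared directory just needs to be readable, and the disk space is spent once for everyone.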
@lebrice
lebrice / imagenet.py
Last active August 17, 2022 21:38
Imagenet DataModule adapted for the Mila Cluster
""" ImageNet datamodule adapted to the Mila cluster.
Can be used either with a PyTorch-Lightning Trainer, or by itself to easily get efficient
dataloaders for the ImageNet dataset.
Requirements (these are the versions I'm using; the pins can probably be loosened a bit):
- pytorch-lightning==1.6.0
- lightning-bolts==0.5
"""
@lebrice
lebrice / pep_idea.py
Created August 4, 2022 21:11
Idea for a new use for typing.Unpack: Annotate the signature of a callable.
from __future__ import annotations
from pytorch_lightning import Trainer
from typing_extensions import Unpack, ParamSpec
from typing import Callable, TypedDict
# Option A: TypedDict
# --> lots of code duplication!
class TrainerConfig(TypedDict, total=False):
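For comparison with "Option A", a minimal sketch of annotating `**kwargs` with a `TypedDict` through `Unpack` (PEP 692; `typing.Unpack` on Python 3.12+, or `typing_extensions` on older versions). `make_trainer_config` is hypothetical, and `TrainerConfig` re-declares only a few `Trainer` arguments by hand, which shows the duplication the comment complains about.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, TypedDict

if TYPE_CHECKING:
    # Only the type checker needs this import (typing.Unpack on 3.12+ also works).
    from typing_extensions import Unpack


class TrainerConfig(TypedDict, total=False):
    # A few Trainer arguments, re-declared by hand: this is the duplication.
    max_epochs: int
    accelerator: str
    devices: int


def make_trainer_config(**kwargs: Unpack[TrainerConfig]) -> TrainerConfig:
    """A type checker rejects unknown keywords or wrongly-typed values here."""
    config: TrainerConfig = {"max_epochs": 10, "accelerator": "cpu"}
    config.update(kwargs)
    return config
```

At runtime this is an ordinary `**kwargs` function; the benefit is purely static, e.g. a checker would flag `make_trainer_config(max_epoch=3)` as an unexpected keyword.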
@lebrice
lebrice / ffcv_test.py
Created August 9, 2022 20:33
ffcv_test.py
from __future__ import annotations
import itertools
from pytorch_lightning import Trainer
import torch
import numpy as np
from pytorch_lightning import LightningModule
from torch import nn, Tensor
import pytest
from .imagenet_ffcv import ImagenetFfcvDataModule
@lebrice
lebrice / imagenet_ffcv.py
Last active August 15, 2022 15:52
Imagenet FFCV Datamodule. Meant for a SLURM cluster (e.g. the Mila cluster). Builds on top of https://gist.github.com/lebrice/4a67df47d9fca3e199d3e7686396240c
""" ImageNet datamodule that uses FFCV. """
from __future__ import annotations
import typing
from collections.abc import Iterable, Sequence
from pathlib import Path
from typing import Any, Callable, TypeVar
import cv2 # noqa