Julien Jerphanion jjerphan

@jjerphan
jjerphan / pyarrow_to_c_abi.ipynb
Last active June 28, 2023 09:37
pyarrow to Arrow C Data API
jjerphan / open_mp_joblib_threading_comparison.ipynb
Created February 16, 2023 10:07
OpenMP vs joblib Threading backend
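The notebook does not render here. Below is a hedged sketch of the joblib side of such a comparison. The helper `threaded_matvec` is hypothetical and splits a matrix-vector product across joblib's `threading` backend. The OpenMP side would typically be a Cython `prange` loop compiled with OpenMP flags, and it is not sketched.

```python
import numpy as np
from joblib import Parallel, delayed


def threaded_matvec(X, w, n_jobs=4, n_chunks=8):
    """Hypothetical helper: compute X @ w by splitting the rows of X across threads.

    joblib's "threading" backend runs tasks in threads of the current process,
    so the row slices of X (NumPy views) are shared without pickling or copying.
    Threads only run concurrently when the work releases the GIL, as BLAS-backed
    `np.dot` typically does.
    """
    # Chunk boundaries over the rows; empty chunks are harmless.
    bounds = np.linspace(0, X.shape[0], n_chunks + 1).astype(int)
    parts = Parallel(n_jobs=n_jobs, backend="threading")(
        delayed(np.dot)(X[start:stop], w)
        for start, stop in zip(bounds[:-1], bounds[1:])
    )
    return np.concatenate(parts)
```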
jjerphan / vtable.md
Last active September 29, 2023 07:22
Cython extension types' method dispatch using a vtable (extracted from https://github.com/scikit-learn/scikit-learn/pull/20254#discussion_r716904109)

Does `self.distance_metric.rdist` use a vtable lookup? (I am curious; this may not be actionable.)

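Yes, it does. `rdist` is a `cdef` method of a `cdef class` that subclasses can override. Cython therefore calls it through the vtable pointer stored on the object rather than with a direct C function call.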


V-table implementation details
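Below is a pure-Python model of the mechanism; it is not Cython's generated code. Each class owns one table of function slots, every instance points to its class's table, and a call goes through that table. `DistanceMetricVTable`, `call_rdist` and the two metric classes are hypothetical stand-ins, loosely modelled on scikit-learn's `DistanceMetric` hierarchy.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class DistanceMetricVTable:
    # One slot per overridable cdef method, like the C struct of function
    # pointers Cython generates for a cdef class.
    rdist: Callable[..., float]


def _euclidean_rdist(self, x: Sequence[float], y: Sequence[float]) -> float:
    # Reduced Euclidean distance: the squared distance (no square root).
    return sum((a - b) ** 2 for a, b in zip(x, y))


def _manhattan_rdist(self, x: Sequence[float], y: Sequence[float]) -> float:
    # For Manhattan, the reduced distance is the distance itself.
    return sum(abs(a - b) for a, b in zip(x, y))


class DistanceMetric:
    # Each concrete class points to its own table and all instances share it,
    # like the `__pyx_vtab` pointer stored on every Cython extension object.
    vtab: DistanceMetricVTable

    def call_rdist(self, x, y):
        # Roughly what Cython emits for `metric.rdist(x, y)` on a typed object:
        #   metric->__pyx_vtab->rdist(metric, x, y)
        # i.e. one load to find the table, one to find the function.
        return self.vtab.rdist(self, x, y)


class EuclideanDistance(DistanceMetric):
    vtab = DistanceMetricVTable(rdist=_euclidean_rdist)


class ManhattanDistance(DistanceMetric):
    vtab = DistanceMetricVTable(rdist=_manhattan_rdist)
```

The indirection is what lets the same call site dispatch to different subclasses. Cython can skip the table and call a method directly when no override is possible, for example when the class is declared `@cython.final`.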
jjerphan / benchmark.py
Created October 14, 2021 13:44
cuML vs. scikit-learn vs. sklearnex: NearestNeighbors benchmarks
import sys
import cudf
import joblib
import numpy as np
impo
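The code preview is truncated. Below is a hedged sketch of the kind of timing loop such a benchmark might use. It times `fit` plus `kneighbors` for any estimator class with the scikit-learn `NearestNeighbors` interface. `time_kneighbors` is a hypothetical helper. The sketch assumes, without verifying, that cuML's and sklearnex's `NearestNeighbors` accept the same `n_neighbors`/`fit`/`kneighbors` calls; cuML results may also come back as GPU arrays that need converting before comparison.

```python
import time

import numpy as np
from sklearn.neighbors import NearestNeighbors


def time_kneighbors(estimator_cls, X_train, X_query, n_neighbors=10, n_repeat=3):
    """Hypothetical helper: best wall time (s) of fit + kneighbors over n_repeat runs.

    Returns (best_time, distances, indices) from the last run.
    """
    best = float("inf")
    for _ in range(n_repeat):
        start = time.perf_counter()
        nn = estimator_cls(n_neighbors=n_neighbors).fit(X_train)
        distances, indices = nn.kneighbors(X_query)
        best = min(best, time.perf_counter() - start)
    return best, distances, indices
```

Keeping the estimator class as a parameter lets the same loop time each implementation on identical inputs.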
jjerphan / mahalanobis.pyx
Last active June 29, 2021 12:09
Python interaction when using memoryviews as attributes of Cython classes
%%cython --annotate
#cython: boundscheck=False
#cython: wraparound=False
#cython: cdivision=True
# Adapted from sklearn.neighbors._dist_metrics.MahalanobisDistance
# https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/neighbors/_dist_metrics.pyx#L669
import numpy as np
cimport numpy as np
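The adapted class computes d(x, y) = sqrt((x - y)^T VI (x - y)), where VI is the inverse covariance matrix. Below is a minimal NumPy reference for that quantity, using hypothetical helpers `mahalanobis_rdist` and `mahalanobis_dist`. It assumes, as in scikit-learn's distance metrics, that the "reduced" distance is the squared form that skips the square root.

```python
import numpy as np


def mahalanobis_rdist(x, y, VI):
    """Hypothetical helper: reduced distance (x - y)^T VI (x - y), the squared distance."""
    diff = np.asarray(x, dtype=np.float64) - np.asarray(y, dtype=np.float64)
    return float(diff @ VI @ diff)


def mahalanobis_dist(x, y, VI):
    """Hypothetical helper: Mahalanobis distance, the square root of the reduced distance."""
    return float(np.sqrt(mahalanobis_rdist(x, y, VI)))
```

With VI set to the identity, this reduces to the Euclidean distance. That makes a quick sanity check for a compiled version.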
jjerphan / setup.py
Last active December 2, 2021 10:58
std::vector to numpy array coercion via Cython
# Cython compile instructions
import numpy
from setuptools import setup, Extension
from Cython.Build import build_ext
# To compile, use
# python setup.py build_ext --inplace
extensions = [
Extension("stdvect_to_ndarray",
import numpy as np
from sklearn.neighbors import DistanceMetric
from .common import Benchmark
class DistanceMetricBenchmark(Benchmark):
param_names = ["n", "d"]
params = ([100, 1000, 10_000], [5, 10, 100])
def setup(self, n, d):
from sklearn.feature_selection import mutual_info_regression, mutual_info_classif
from sklearn.neighbors import KernelDensity, NearestNeighbors
from .common import Benchmark
from sklearn.datasets import make_classification, make_regression
class RemovedCheckBenchmarks(Benchmark):
param_names = ['n', 'd']
params = (
jjerphan / mismatch.py
Created April 23, 2021 07:33
Exploration for scikit-learn#19952
import numpy as np
from scipy.linalg import cho_solve, cholesky
from sklearn.gaussian_process.kernels import RBF
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
kernel = RBF(length_scale=1.0)
X, y = make_regression()
X_train, X_test, y_train, y_test = train_test_split(X, y)
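The imports point to the textbook Cholesky route for a GP posterior mean: K(X*, X) (K(X, X) + αI)^{-1} y. Below is a hedged sketch of that computation, cross-checked against `GaussianProcessRegressor` with `optimizer=None` so that both use the same fixed kernel. The helper `gp_posterior_mean`, the value of `alpha`, and the smaller seeded `make_regression` call are illustrative assumptions, not the gist's code. The sketch does not reproduce whatever mismatch scikit-learn#19952 reports.

```python
import numpy as np
from scipy.linalg import cho_solve, cholesky
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split


def gp_posterior_mean(kernel, X_train, y_train, X_test, alpha=1e-2):
    """Hypothetical helper: posterior mean of a zero-mean GP via a Cholesky solve."""
    K = kernel(X_train)
    K[np.diag_indices_from(K)] += alpha  # noise / jitter on the diagonal
    L = cholesky(K, lower=True)
    weights = cho_solve((L, True), y_train)  # (K + alpha I)^{-1} y
    return kernel(X_test, X_train) @ weights


kernel = RBF(length_scale=1.0)
X, y = make_regression(n_samples=60, n_features=2, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

manual = gp_posterior_mean(kernel, X_train, y_train, X_test)
# optimizer=None keeps the kernel hyperparameters fixed, matching the manual version.
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-2, optimizer=None)
gpr.fit(X_train, y_train)
```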
jjerphan / binary_tree_bench.py
Created April 15, 2021 08:45
Simple benchmark for `sklearn.neighbors.BinaryTree`
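Below is a hedged, asv-free sketch of timing tree construction and queries for `KDTree` and `BallTree`, which share the `BinaryTree` implementation. `time_tree_query` is a hypothetical helper. It does not touch the tree statistics mentioned in the benchmark's docstring.

```python
import time

import numpy as np
from sklearn.neighbors import BallTree, KDTree


def time_tree_query(tree_cls, X, k=5, leaf_size=40):
    """Hypothetical helper: build a tree on X, query k neighbors of every row, time both."""
    start = time.perf_counter()
    tree = tree_cls(X, leaf_size=leaf_size)
    build_time = time.perf_counter() - start
    start = time.perf_counter()
    dist, ind = tree.query(X, k=k)  # sorted distances and indices, shape (n, k)
    query_time = time.perf_counter() - start
    return build_time, query_time, dist, ind
```

Both trees are exact, so their neighbor distances should agree. Querying X against itself also makes each row its own nearest neighbor at distance 0.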
import numpy as np
from sklearn.neighbors import KDTree, BallTree
from .common import Benchmark
class BinaryTreeStatsBenchmark(Benchmark):
"""
Base class for BinaryTree benchmarks measuring the effect of removing tree statistics.
"""