Christopher Akiki cakiki
cakiki / doc_embeddings_with_vectorizers.ipynb
Created June 19, 2021 10:41 — forked from lmcinnes/doc_embeddings_with_vectorizers.ipynb
Document Embeddings with the Vectorizers Library
cakiki / document-embeddings-big_models.ipynb
Created July 1, 2021 10:37 — forked from lmcinnes/document-embeddings-big_models.ipynb
Document Embeddings with Vectorizers and Large USE and BERT models
cakiki / tpu_topology_env_vars.py
Created July 4, 2021 11:26 — forked from skye/tpu_topology_env_vars.py
You can use these environment variables to run a Python process on a subset of the TPU cores on a Cloud TPU VM. This allows running multiple TPU processes at the same time, since only one process can access a given TPU core at a time. Note that in JAX, 1 TPU core = 1 TpuDevice as reported by `jax.devices()`.
import os

# 4x 1 chip (2 cores) per process:
os.environ["TPU_CHIPS_PER_HOST_BOUNDS"] = "1,1,1"
os.environ["TPU_HOST_BOUNDS"] = "1,1,1"
# Different per process:
os.environ["TPU_VISIBLE_DEVICES"] = "0" # "1", "2", "3"
# Pick a unique port per process
os.environ["TPU_MESH_CONTROLLER_ADDRESS"] = "localhost:8476"
os.environ["TPU_MESH_CONTROLLER_PORT"] = "8476"
# 1-liner for bash: TPU_CHIPS_PER_HOST_BOUNDS=1,1,1 TPU_HOST_BOUNDS=1,1,1 TPU_VISIBLE_DEVICES=0 TPU_MESH_CONTROLLER_ADDRESS=localhost:8476 TPU_MESH_CONTROLLER_PORT=8476
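As a minimal sketch of the four-process case described above, the per-process values could be generated rather than typed by hand. The helper names `tpu_env_for_process` and `child_env`, and the choice of consecutive ports starting at 8476, are assumptions made for illustration; they are not part of the original gist.

```python
import os

# Port taken from the snippet above; using consecutive ports for the
# other processes is an assumption, any unique free port should do.
BASE_PORT = 8476


def tpu_env_for_process(index, base_port=BASE_PORT):
    """Return the TPU environment variables for one single-chip process.

    Hypothetical helper: `index` selects the chip ("0" to "3" above).
    """
    port = base_port + index  # unique port per process
    return {
        "TPU_CHIPS_PER_HOST_BOUNDS": "1,1,1",
        "TPU_HOST_BOUNDS": "1,1,1",
        "TPU_VISIBLE_DEVICES": str(index),
        "TPU_MESH_CONTROLLER_ADDRESS": f"localhost:{port}",
        "TPU_MESH_CONTROLLER_PORT": str(port),
    }


def child_env(index):
    """Copy the current environment and overlay the TPU settings."""
    env = dict(os.environ)
    env.update(tpu_env_for_process(index))
    return env
```

Each dictionary returned by `child_env(i)` can be passed as the `env` argument of `subprocess.Popen` when launching the i-th worker, so that every worker sees its own chip and its own controller port.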
cakiki / AlignedUMAP Demo.ipynb
Created February 5, 2022 21:44 — forked from lmcinnes/AlignedUMAP Demo.ipynb
Demonstration of experimental Aligned UMAP in 0.5dev
cakiki / stopword-bn.txt
Created June 4, 2022 21:32
Bengali Stopwords BigScience
অবশ্য
অনেক
অনেকে
অনেকেই
অন্তত
অথবা
অথচ
অর্থাত
অন্য
cakiki / bm25.py
Created February 25, 2023 14:43 — forked from koreyou/bm25.py
Implementation of Okapi BM25 with sklearn's TfidfVectorizer
""" Implementation of Okapi BM25 with sklearn's TfidfVectorizer
Distributed as CC-0 (https://creativecommons.org/publicdomain/zero/1.0/)
"""
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy import sparse
class BM25(object):