Shubhanshu Mishra napsternxg

napsternxg / 07-March-2024.tsv
Last active March 7, 2024 14:55
Wikidata names gender and ethnic group
human gender genderLabel ethnic_group ethnic_groupLabel given_name date_of_birth date_of_birthLabel given_nameLabel family_name family_nameLabel
http://www.wikidata.org/entity/Q946 http://www.wikidata.org/entity/Q6581097 male http://www.wikidata.org/entity/Q1026 Poles http://www.wikidata.org/entity/Q13422248 1957-04-22T00:00:00Z 1957-04-22T00:00:00Z Donald http://www.wikidata.org/entity/Q62102784 Tusk
http://www.wikidata.org/entity/Q946 http://www.wikidata.org/entity/Q6581097 male http://www.wikidata.org/entity/Q1026 Poles http://www.wikidata.org/entity/Q15207702 1957-04-22T00:00:00Z 1957-04-22T00:00:00Z Franciszek http://www.wikidata.org/entity/Q62102784 Tusk
http://www.wikidata.org/entity/Q989 http://www.wikidata.org/entity/Q6581097 male http://www.wikidata.org/entity/Q1026 Poles http://www.wikidata.org/entity/Q69242302 1920-05-18T00:00:00Z 1920-05-18T00:00:00Z Iohannes Paulus http://www.wikidata.org/entity/Q56541347 Wojtyła
http://www.wikidata.org/entity/Q18978 http://www.wikidata.org/entity/Q6581072 femal
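A minimal sketch of how a file with this header could be loaded, assuming the columns are tab-separated (as the `.tsv` extension suggests) and that one person can appear on several rows, once per given name. The function name `given_names_by_person` and the inlined sample are illustrative and not part of the gist; the three sample rows are the complete rows from the preview above.

```python
import csv
import io
from collections import defaultdict

E = "http://www.wikidata.org/entity/"

HEADER = [
    "human", "gender", "genderLabel", "ethnic_group", "ethnic_groupLabel",
    "given_name", "date_of_birth", "date_of_birthLabel", "given_nameLabel",
    "family_name", "family_nameLabel",
]

# Complete rows copied from the preview; the tab delimiter is an assumption.
ROWS = [
    [E + "Q946", E + "Q6581097", "male", E + "Q1026", "Poles", E + "Q13422248",
     "1957-04-22T00:00:00Z", "1957-04-22T00:00:00Z", "Donald", E + "Q62102784", "Tusk"],
    [E + "Q946", E + "Q6581097", "male", E + "Q1026", "Poles", E + "Q15207702",
     "1957-04-22T00:00:00Z", "1957-04-22T00:00:00Z", "Franciszek", E + "Q62102784", "Tusk"],
    [E + "Q989", E + "Q6581097", "male", E + "Q1026", "Poles", E + "Q69242302",
     "1920-05-18T00:00:00Z", "1920-05-18T00:00:00Z", "Iohannes Paulus", E + "Q56541347", "Wojtyła"],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in [HEADER] + ROWS)


def given_names_by_person(tsv_text):
    """Map each person URI to the distinct given-name labels found for it."""
    # QUOTE_NONE keeps any literal quote characters in labels untouched.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    names = defaultdict(list)
    for row in reader:
        label = row["given_nameLabel"]
        if label not in names[row["human"]]:
            names[row["human"]].append(label)
    return dict(names)


names = given_names_by_person(SAMPLE_TSV)
```

Splitting on tabs rather than whitespace matters here because labels such as `Iohannes Paulus` contain spaces.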
napsternxg / Colbertv2_Torch_Scratch.ipynb
Created January 24, 2024 18:04
Colbertv2_Torch_Scratch
napsternxg / onnx_edit.py
Last active January 24, 2024 15:58
Edit Onnx Model Ops
import onnx
model_path = "./model.onnx"
fixed_model_path = model_path.replace(".onnx", ".fixed.onnx")
# Load the ONNX model, whose last layer should be a Sigmoid.
# LGBM models may sometimes omit the Sigmoid op during export when using a regression loss.
onnx_model = onnx.load(model_path)
print(onnx_model)
onnx.checker.check_model(onnx_model)
napsternxg / accelerated_sentence_transformer.diff
Last active November 7, 2023 16:20
accelerate support for sentence_transformer
diff --git a/sentence_transformers/SentenceTransformer.py b/sentence_transformers/SentenceTransformer.py
index e44e573..ae4dea4 100644
--- a/sentence_transformers/SentenceTransformer.py
+++ b/sentence_transformers/SentenceTransformer.py
@@ -16,6 +16,7 @@ from torch.optim import Optimizer
 from torch.utils.data import DataLoader
 import torch.multiprocessing as mp
 from tqdm.autonotebook import trange
+from tqdm.autonotebook import tqdm
 import math
napsternxg / TasteAtlas.ipynb
Last active October 24, 2023 21:06
TasteAtlas
napsternxg / display_ner.py
Last active October 17, 2023 17:35
NER utilities
from IPython.display import display, HTML


class DisplayEntities:
    @classmethod
    def display(cls, texts, grouped_entities):
        html = []
        html.append(cls.get_style())
        for text, entities in zip(texts, grouped_entities):
            html.append(cls.show_entities(text, entities))
        display(HTML("".join(html)))
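The preview calls `cls.show_entities(text, entities)` without showing its body. The following is a hypothetical stand-in and not the gist's implementation: it assumes each entity is a dict with `start`, `end`, and `entity_group` keys (the shape produced by grouped NER output in some libraries) and that entities do not overlap. The `<mark>`/`<sub>` markup and the `entity` class name are illustrative choices.

```python
from html import escape


def show_entities(text, entities):
    """Render text as an HTML paragraph with entity spans highlighted.

    Assumes non-overlapping entities with character offsets into `text`.
    """
    parts = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        # Plain text between the previous entity and this one.
        parts.append(escape(text[cursor:ent["start"]]))
        span = escape(text[ent["start"]:ent["end"]])
        label = escape(ent["entity_group"])
        parts.append(f'<mark class="entity">{span} <sub>{label}</sub></mark>')
        cursor = ent["end"]
    # Trailing text after the last entity.
    parts.append(escape(text[cursor:]))
    return "<p>" + "".join(parts) + "</p>"


html_out = show_entities(
    "Donald Tusk & co.",
    [{"start": 0, "end": 11, "entity_group": "PER"}],
)
```

Escaping every fragment before joining keeps characters such as `&` or `<` in the input from being interpreted as markup when the string is passed to `HTML(...)`.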
"""Faster Implementation of Unsupervised Query Segmentation.
Uses vectorized operations
- author: @napsternxg
Unsupervised Query Segmentation Using only Query Logs [Mishra et al. 2011]
https://www.microsoft.com/en-us/research/wp-content/uploads/2011/01/pp0295-mishra.pdf
napsternxg / wikidata_subclass.sparql
Created July 14, 2020 05:08
Wikidata get all subclasses of a given class
SELECT ?subClass ?subClassLabel ?desc WHERE {
  ?subClass wdt:P279* wd:Q5. # Matches human (Q5) itself and all of its direct and transitive subclasses
  OPTIONAL {
    ?subClass rdfs:label ?desc.
    FILTER((LANG(?desc)) = "en")
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
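One way such a query could be sent from Python is sketched below, assuming the public Wikidata Query Service endpoint at `https://query.wikidata.org/sparql`, which accepts the SPARQL text in a `query` parameter. The helper `build_query_url` and the simplified query template are illustrative and not part of the gist; the sketch only builds and decodes the URL and does not perform a network request.

```python
import re
from urllib.parse import parse_qs, urlencode, urlparse

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Simplified version of the query above, parameterised on the class id.
QUERY_TEMPLATE = """
SELECT ?subClass ?subClassLabel WHERE {{
  ?subClass wdt:P279* wd:{qid}.
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}
"""


def build_query_url(qid):
    """Return a GET URL asking for JSON results for all subclasses of `qid`."""
    # Only accept plain item ids so nothing else can be spliced into the query.
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    query = QUERY_TEMPLATE.format(qid=qid)
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


url = build_query_url("Q5")
# Round-trip check: decode the URL back into its parameters.
decoded = parse_qs(urlparse(url).query)
```

Fetching `url` with any HTTP client (ideally with a descriptive `User-Agent` header) would return the bindings; that step is left out so the sketch stays offline.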
napsternxg / setfit_sentence_transformer_fixed.py
Last active September 7, 2023 16:51
Sentence Transformer + Setfit classification head for inference without installing setfit
from datasets import load_dataset, Dataset, DatasetDict
from sentence_transformers.losses import CosineSimilarityLoss
from sentence_transformers import SentenceTransformer
from setfit import SetFitModel, SetFitTrainer, sample_dataset
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import json
napsternxg / async_queue_runner.py
Last active August 28, 2023 21:26
asyncio_queue_event
import asyncio
import logging
import random
import time
from dataclasses import dataclass
from typing import Any
from tqdm.auto import tqdm
logger = logging.getLogger(__name__)
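The preview ends after the imports, so the runner itself is not shown. As a generic illustration of the queue-and-workers pattern the gist title points to (not the gist's actual code), a minimal sketch could look like this; the names `worker` and `run`, the doubling step, and the worker count are all hypothetical.

```python
import asyncio


async def worker(name, queue, results):
    """Pull items off the queue forever; cancelled by the runner when done."""
    while True:
        item = await queue.get()
        try:
            await asyncio.sleep(0)  # stand-in for real asynchronous work
            results.append((name, item * 2))
        finally:
            # Always mark the item done so queue.join() can return.
            queue.task_done()


async def run(items, num_workers=3):
    queue = asyncio.Queue()
    results = []
    workers = [
        asyncio.create_task(worker(f"w{i}", queue, results))
        for i in range(num_workers)
    ]
    for item in items:
        queue.put_nowait(item)
    # Wait until every queued item has been processed.
    await queue.join()
    # Workers are idle in queue.get() now; cancel them and wait for cleanup.
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


results = asyncio.run(run(range(5)))
```

The order of `results` depends on scheduling, so callers that need a stable order should sort or key the output by input item.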