


Arvindra Sehmi (asehmi)

asehmi / compute_embeddings_e5.py
Created January 23, 2024 07:19 — forked from pszemraj/compute_embeddings_e5.py
helper script using just transformers/torch to compute text embeddings (for e5 models: https://huggingface.co/intfloat/e5-base-v2)
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel
from pandas import DataFrame
from typing import List, Union
from tqdm.auto import tqdm, trange
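The e5 model cards describe pooling the encoder's last hidden states by averaging over the non-padding tokens and then L2-normalizing, with each input text prefixed by `"query: "` or `"passage: "`. The sketch below shows only that pooling and similarity arithmetic in plain Python so it runs without torch; it is an illustration, not the gist's code, and the function names are hypothetical. In the script itself the same step would operate on torch tensors, as the docstring notes.

```python
import math
from typing import List

Vector = List[float]


def average_pool(hidden_states: List[List[Vector]],
                 attention_mask: List[List[int]]) -> List[Vector]:
    """Mean of the token vectors of each sequence, skipping padding (mask == 0).

    torch equivalent (as in the e5 model card):
        h = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
        h.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
    Assumes every sequence has at least one real (mask == 1) token.
    """
    pooled = []
    for tokens, mask in zip(hidden_states, attention_mask):
        total = [0.0] * len(tokens[0])
        count = 0
        for vec, keep in zip(tokens, mask):
            if keep:
                total = [t + v for t, v in zip(total, vec)]
                count += 1
        pooled.append([t / count for t in total])
    return pooled


def l2_normalize(vec: Vector) -> Vector:
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def cosine_similarity(a: Vector, b: Vector) -> float:
    # After L2 normalization the dot product equals the cosine similarity
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
```

Masking before summing matters because padded positions still carry hidden states; including them would drag short sequences toward the padding representation.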
asehmi / grammar_synthesis.py
Created January 23, 2024 07:17 — forked from pszemraj/grammar_synthesis.py
basic implementation of a custom wrapper class for using the grammar synthesis text2text models
"""
Class for correcting text using a pretrained grammar synthesis model.
- models are available here: https://hf.co/models?other=grammar%20synthesis
requirements for this snippet:
pip install -U transformers accelerate
NOTE: if you want to use 8-bit to fit the model on a smaller GPU, you need bitsandbytes:
pip install -U transformers accelerate bitsandbytes
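A wrapper class of this kind can be sketched around the transformers `text2text-generation` pipeline. This is a minimal sketch, not the gist's class: the class name, the `generate_fn` injection hook, and the default checkpoint `pszemraj/flan-t5-large-grammar-synthesis` are assumptions, and 8-bit loading is expressed through `BitsAndBytesConfig`, which needs `bitsandbytes` as the note above says.

```python
from typing import Callable, List, Optional, Union

# Assumed example checkpoint; other models tagged "grammar synthesis" on the Hub should also fit.
DEFAULT_MODEL = "pszemraj/flan-t5-large-grammar-synthesis"

GenerateFn = Callable[[List[str]], List[str]]


class GrammarCorrector:
    """Hypothetical minimal wrapper around a text2text grammar synthesis model."""

    def __init__(self, model_name: str = DEFAULT_MODEL, load_in_8bit: bool = False,
                 generate_fn: Optional[GenerateFn] = None):
        self.model_name = model_name
        self.load_in_8bit = load_in_8bit
        # Optional injected generator (e.g. a stub in tests); otherwise loaded lazily
        self._generate_fn = generate_fn

    def _load(self) -> GenerateFn:
        # Lazy imports: transformers (and bitsandbytes for 8-bit) are only needed here
        from transformers import pipeline

        model_kwargs = {}
        if self.load_in_8bit:
            from transformers import BitsAndBytesConfig
            model_kwargs["quantization_config"] = BitsAndBytesConfig(load_in_8bit=True)
        # device_map="auto" relies on accelerate being installed
        pipe = pipeline("text2text-generation", model=self.model_name,
                        device_map="auto", model_kwargs=model_kwargs)
        return lambda texts: [out["generated_text"] for out in pipe(texts)]

    def correct(self, text: Union[str, List[str]]) -> Union[str, List[str]]:
        if self._generate_fn is None:
            self._generate_fn = self._load()
        texts = [text] if isinstance(text, str) else list(text)
        corrected = self._generate_fn(texts)
        # Return the same shape that was passed in: a string or a list of strings
        return corrected[0] if isinstance(text, str) else corrected
```

Usage would look like `GrammarCorrector(load_in_8bit=True).correct("i has a apple")`; loading the model lazily keeps construction cheap and lets tests substitute a stub generator.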
asehmi / hf_repo_download.py
Created January 23, 2024 07:15 — forked from pszemraj/hf_repo_download.py
huggingface hub - download a full snapshot of a repository without using git
"""
hf_repo_download.py
This script allows you to download a snapshot of a repository from the Hugging Face Hub to a local directory without needing Git or loading the model.
Usage:
python hf_repo_download.py <repo_id> [options]
Arguments:
<repo_id> Repository ID in the format "organization/repository".
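The core of such a script is `huggingface_hub.snapshot_download`, which fetches every file of a repository revision over HTTP. The sketch below is an assumption-laden stand-in for the elided `[options]`: the `--local-dir`, `--revision` and `--repo-type` flags and the repo-ID check are hypothetical, not the gist's actual interface.

```python
import argparse
import re
from typing import List, Optional

# "organization/repository": exactly one slash between two non-empty name segments
REPO_ID_RE = re.compile(r"^[\w.-]+/[\w.-]+$")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Download a snapshot of a Hugging Face Hub repository without git.")
    parser.add_argument("repo_id", help='Repository ID in the format "organization/repository".')
    # The options below are hypothetical; the original script's [options] are not shown.
    parser.add_argument("--local-dir", default=None,
                        help="Target directory (default: the repository name).")
    parser.add_argument("--revision", default=None, help="Branch, tag or commit hash.")
    parser.add_argument("--repo-type", default="model", choices=["model", "dataset", "space"])
    args = parser.parse_args(argv)
    if not REPO_ID_RE.match(args.repo_id):
        parser.error(f"invalid repo_id {args.repo_id!r}; expected organization/repository")
    if args.local_dir is None:
        args.local_dir = args.repo_id.split("/", 1)[1]
    return args


def download(args: argparse.Namespace) -> str:
    # Imported lazily so argument parsing works without huggingface_hub installed
    from huggingface_hub import snapshot_download

    # Downloads all files of the revision; no git clone and no model loading involved
    return snapshot_download(repo_id=args.repo_id, repo_type=args.repo_type,
                             revision=args.revision, local_dir=args.local_dir)
```

A command-line entry point would simply call `download(parse_args())`, which returns the local path of the snapshot.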
asehmi / nougat_em.sh
Created January 23, 2024 07:12 — forked from pszemraj/nougat_em.sh
bash script to apply facebookresearch/nougat on a directory of PDFs
#!/bin/bash
# pip install nougat-ocr
# see https://github.com/facebookresearch/nougat for details and license
DEFAULT_BATCHSIZE=4
usage() {
    echo "Usage: $0 <path_to_directory> [--batchsize BATCHSIZE]"
    exit 1
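For readers who prefer Python to bash, the same idea (run nougat over every PDF in a directory with a configurable batch size) can be sketched as below. It assumes the nougat CLI's `-o` output option and `--batchsize` flag as described in the nougat README; the function names and the `nougat_output` default directory are hypothetical, not taken from the script.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

DEFAULT_BATCHSIZE = 4


def build_nougat_commands(directory: str, batchsize: int = DEFAULT_BATCHSIZE,
                          out_dir: Optional[str] = None) -> List[List[str]]:
    """One `nougat` invocation per PDF in `directory` (non-recursive, sorted by name)."""
    src = Path(directory)
    if not src.is_dir():
        raise NotADirectoryError(directory)
    # Hypothetical default output location next to the PDFs
    out = Path(out_dir) if out_dir else src / "nougat_output"
    return [
        ["nougat", str(pdf), "-o", str(out), "--batchsize", str(batchsize)]
        for pdf in sorted(src.glob("*.pdf"))
    ]


def run_all(commands: List[List[str]]) -> None:
    for cmd in commands:
        # check=True stops on the first PDF that nougat fails to process
        subprocess.run(cmd, check=True)
```

Separating command construction from execution makes it easy to print the commands for a dry run before calling `run_all`.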
asehmi / download_URLs_in_file.py
Created January 23, 2024 07:10 — forked from pszemraj/download_URLs_in_file.py
pdf downloading utils
import os
import argparse
import requests
from urllib.parse import urlparse
from tqdm import tqdm
from joblib import Parallel, delayed
from tenacity import retry, stop_after_attempt, wait_fixed
@retry(stop=stop_after_attempt(5), wait=wait_fixed(2))
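The imports point to a pattern of retrying each download (tenacity's `stop_after_attempt(5)` with `wait_fixed(2)`), naming files from the URL path, and fanning out with joblib. The stdlib-only sketch below mirrors that pattern under stated assumptions: it swaps `requests` for `urllib.request` and tenacity for a hand-written `with_retries` loop, and all function names are hypothetical.

```python
import os
import time
import urllib.request
from typing import Callable, List, TypeVar
from urllib.parse import urlparse

T = TypeVar("T")


def filename_from_url(url: str, default: str = "download.pdf") -> str:
    # Last path segment, ignoring any query string; fall back for URLs ending in "/"
    return os.path.basename(urlparse(url).path) or default


def with_retries(fn: Callable[[], T], attempts: int = 5, wait_seconds: float = 2.0) -> T:
    """Call fn, retrying on any exception (like stop_after_attempt + wait_fixed)."""
    for _ in range(attempts - 1):
        try:
            return fn()
        except Exception:
            time.sleep(wait_seconds)
    return fn()  # final attempt: let the exception propagate


def download_url(url: str, out_dir: str) -> str:
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename_from_url(url))

    def fetch() -> str:
        with urllib.request.urlopen(url, timeout=30) as resp, open(path, "wb") as fh:
            fh.write(resp.read())
        return path

    return with_retries(fetch)


def read_urls(path: str) -> List[str]:
    # One URL per line; blank lines are skipped
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]
```

With joblib the downloads could then run concurrently, for example `Parallel(n_jobs=8, prefer="threads")(delayed(download_url)(u, "pdfs") for u in read_urls("urls.txt"))`, where the file and directory names are placeholders. Threads suit this I/O-bound work better than processes.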
asehmi / multithreaded_processing_of_queued_data.py
Created November 27, 2023 19:14
Streamlit multi-threaded task execution, with queues
import time
import random
from queue import Queue
import threading
import streamlit as st
from streamlit.runtime.scriptrunner import add_script_run_ctx
pre_msgs = []
result_msgs = []
post_msgs = []
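Streamlit keeps its script-run context per thread, which is why the gist imports `add_script_run_ctx`: a worker thread that calls `st.*` directly needs that context attached. A common alternative is to keep all rendering on the script thread and let workers talk only through queues. The sketch below shows that queue-fed worker pattern with the stdlib alone; it is not the gist's code, the function name is hypothetical, and the Streamlit step appears only as a comment.

```python
import queue
import threading
from typing import Any, Callable, List, Tuple


def process_in_threads(items: List[Any], work: Callable[[Any], Any],
                       num_workers: int = 4) -> List[Any]:
    """Run work(item) on worker threads fed from a Queue; return results in input order."""
    tasks: "queue.Queue[Tuple[int, Any]]" = queue.Queue()
    results: "queue.Queue[Tuple[int, Any]]" = queue.Queue()
    for index, item in enumerate(items):
        tasks.put((index, item))

    def worker() -> None:
        while True:
            try:
                index, item = tasks.get_nowait()
            except queue.Empty:
                return  # no more work
            # If work() raises, this worker stops and that item's result stays None
            results.put((index, work(item)))

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        # In a Streamlit app, call add_script_run_ctx(t) here if work() itself uses st.*
        t.start()
    for t in threads:
        t.join()

    ordered: List[Any] = [None] * len(items)
    while not results.empty():
        index, value = results.get()
        ordered[index] = value
    return ordered
```

After the join, the script thread can render the collected results with `st.write`, so no worker ever touches Streamlit state.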
asehmi / embedded_st_app.html
Last active May 18, 2023 17:09
How to cleanly embed a Streamlit app in a web page
<!--
This is a sample HTML file that you can use to embed your Streamlit app in an iframe.
The Streamlit app is embedded cleanly and is almost indistinguishable from a native app.
Use it as a template and customize it to your needs.
NOTE: It's convenient to start your Streamlit app in headless mode, for example:
$ streamlit run --server.port=8005 --server.headless=true app.py
-->
<!DOCTYPE html>
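A page like this can also be generated from Python. The sketch below writes a borderless, full-viewport iframe pointing at the headless app from the command above (port 8005). It assumes a Streamlit version that honours the `embed=true` query parameter for hiding the app's own chrome; the helper name and styling are hypothetical and not the gist's template.

```python
from html import escape


def make_embed_page(app_url: str, title: str = "Streamlit app") -> str:
    """Return a minimal HTML page that shows app_url in a borderless, full-viewport iframe."""
    src = escape(app_url, quote=True)  # "&" in query strings becomes "&amp;" inside the attribute
    safe_title = escape(title, quote=True)
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>{safe_title}</title>
  <style>
    html, body {{ margin: 0; padding: 0; height: 100%; }}
    iframe {{ display: block; border: none; width: 100%; height: 100vh; }}
  </style>
</head>
<body>
  <iframe src="{src}" title="{safe_title}"></iframe>
</body>
</html>
"""
```

For example, `Path("embed.html").write_text(make_embed_page("http://localhost:8005/?embed=true"))` (with `from pathlib import Path`) produces a file you can open or serve; the file name is a placeholder.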
asehmi / button_open_web_page.py
Created April 13, 2023 11:51
Streamlit URL button
import streamlit as st
from streamlit.components.v1 import html
def open_page(url):
    open_script = """
        <script type="text/javascript">
            window.open('%s', '_blank').focus();
        </script>
    """ % (url)
    html(open_script)
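`open_page` pastes the URL into a single-quoted JavaScript string with `%s`, so a URL containing a quote breaks the script, and `window.open` returns `null` when a popup blocker intervenes, which makes `.focus()` throw. A slightly hardened variant can build the string literal with `json.dumps`. This is a sketch, and `build_open_script` is a hypothetical helper name.

```python
import json


def build_open_script(url: str) -> str:
    # json.dumps produces a quoted, escaped JS string literal, so quotes in the URL
    # cannot break the script; "</" is escaped so it cannot close the <script> tag
    js_url = json.dumps(url).replace("</", "<\\/")
    return f"""<script type="text/javascript">
var win = window.open({js_url}, '_blank');
if (win) {{ win.focus(); }}  // window.open returns null when a popup blocker intervenes
</script>"""
```

In the app this would be used as `if st.button("Open docs"): html(build_open_script("https://docs.streamlit.io"))`. Because `components.html` renders inside an iframe, some browsers may still block the new tab.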
asehmi / st_button_colour.py
Created March 5, 2023 23:25
Streamlit change button background colour
# Ref: https://discuss.streamlit.io/t/issues-with-background-colour-for-buttons/38723/2?u=asehmi
import streamlit as st
import streamlit.components.v1 as components
def ChangeButtonColour(widget_label, font_color, background_color='transparent'):
    htmlstr = f"""
        <script>
            var elements = window.parent.document.querySelectorAll('button');
            for (var i = 0; i < elements.length; ++i) {{
                if (elements[i].innerText == '{widget_label}') {{
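The approach above injects a script that walks the parent page's buttons and restyles the one whose text matches the label. Because the label and colours are pasted into single-quoted JavaScript, a label containing a quote would break it. The sketch below builds the same kind of script with `json.dumps` escaping. The helper name is hypothetical, and like the original it depends on Streamlit's page DOM being reachable through `window.parent`, so it may stop working if Streamlit changes its markup.

```python
import json


def _js_string(value: str) -> str:
    # JSON string literals are valid JS; also stop "</script>" from closing the tag
    return json.dumps(value).replace("</", "<\\/")


def build_button_colour_script(widget_label: str, font_color: str,
                               background_color: str = "transparent") -> str:
    label, fg, bg = (_js_string(v) for v in (widget_label, font_color, background_color))
    return f"""<script>
var buttons = window.parent.document.querySelectorAll('button');
for (var i = 0; i < buttons.length; ++i) {{
  if (buttons[i].innerText === {label}) {{
    buttons[i].style.color = {fg};
    buttons[i].style.background = {bg};
  }}
}}
</script>"""
```

After creating the button, the script would be injected with something like `components.html(build_button_colour_script("Submit", "white", "#ff4b4b"), height=0, width=0)` so the helper iframe takes up no space.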
asehmi / printarr
Created February 26, 2023 00:48 — forked from nmwsharp/printarr
Pretty print tables summarizing properties of tensor arrays in numpy, pytorch, jax, etc.
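The idea behind such a table can be sketched with duck typing: numpy, torch and jax arrays all expose `.dtype`, `.shape`, `.min()`, `.max()` and `.mean()`, so one function can summarize any of them. This is a simplified, hypothetical re-implementation, not printarr itself. In particular, it takes an explicit name-to-array dict rather than inferring variable names.

```python
from typing import Any, Dict, List


def _fmt(value: Any) -> str:
    return f"{value:.4g}" if isinstance(value, float) else str(value)


def summarize_arrays(named: Dict[str, Any]) -> str:
    """Table of name/type/dtype/shape/min/max/mean for array-like values.

    Works by duck typing on .dtype, .shape, .min(), .max(), .mean(), which
    numpy, torch and jax arrays all provide; anything missing is shown as "-".
    """
    headers = ["name", "type", "dtype", "shape", "min", "max", "mean"]
    rows: List[List[str]] = []
    for name, arr in named.items():
        row = [name, type(arr).__name__, str(getattr(arr, "dtype", "-")),
               str(tuple(getattr(arr, "shape", ())))]
        for stat in ("min", "max", "mean"):
            try:
                row.append(_fmt(float(getattr(arr, stat)())))
            except Exception:  # e.g. no such method, or mean() of an integer torch tensor
                row.append("-")
        rows.append(row)

    # Column width = widest of the header and every cell in that column
    widths = [max([len(h)] + [len(r[i]) for r in rows]) for i, h in enumerate(headers)]
    lines = [" | ".join(h.ljust(w) for h, w in zip(headers, widths)),
             "-+-".join("-" * w for w in widths)]
    lines += [" | ".join(c.ljust(w) for c, w in zip(r, widths)) for r in rows]
    return "\n".join(lines)
```

For example, `print(summarize_arrays({"verts": verts, "faces": faces}))` would print one row per array, where `verts` and `faces` stand for any arrays you have in scope.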