import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel
from pandas import DataFrame
from typing import List, Union
from tqdm.auto import tqdm, trange
"""
Class for correcting text using a pretrained grammar synthesis model.
- models are available here: https://hf.co/models?other=grammar%20synthesis
requirements for this snippet:
    pip install -U transformers accelerate
NOTE: if you want to use 8-bit to fit the model on a smaller GPU, you need bitsandbytes:
    pip install -U transformers accelerate bitsandbytes
"""
hf_hub_download.py
This script lets you download a snapshot of a repository from the Hugging Face Hub to a local directory without needing Git or loading the model.
Usage:
    python hf_hub_download.py <repo_id> [options]
Arguments:
    <repo_id> Repository ID in the format "organization/repository".
#!/bin/bash
# pip install nougat-ocr
# see https://github.com/facebookresearch/nougat for details and license
DEFAULT_BATCHSIZE=4
usage() {
    echo "Usage: $0 <path_to_directory> [--batchsize BATCHSIZE]"
    exit 1
import os
import argparse
import requests
from urllib.parse import urlparse
from tqdm import tqdm
from joblib import Parallel, delayed
from tenacity import retry, stop_after_attempt, wait_fixed
@retry(stop=stop_after_attempt(5), wait=wait_fixed(2))
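The decorator above asks tenacity to retry the wrapped function up to 5 times with a fixed 2-second wait between attempts. The function it decorates is cut off in this preview, so the following is only a minimal standard-library sketch of the same retry policy, not the author's code. The names `retry_fixed`, `flaky_download`, and `calls` are hypothetical, and the wait is set to zero so the example finishes instantly.

```python
import functools
import time


def retry_fixed(max_attempts: int = 5, wait_seconds: float = 2.0):
    """Hypothetical stdlib stand-in for a fixed-wait, bounded-attempt retry policy."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Out of attempts: re-raise the last error to the caller.
                    if attempt == max_attempts:
                        raise
                    # Otherwise wait a fixed interval before the next try.
                    time.sleep(wait_seconds)

        return wrapper

    return decorator


# Counts how often the simulated download was attempted.
calls = {"count": 0}


@retry_fixed(max_attempts=5, wait_seconds=0.0)
def flaky_download() -> str:
    """Hypothetical network call that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = flaky_download()
print(result, calls["count"])
```

One difference worth knowing: this sketch re-raises the original exception once attempts run out, whereas tenacity by default raises its own `RetryError` unless `reraise=True` is passed.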
import time
import random
from queue import Queue
import threading
import streamlit as st
from streamlit.runtime.scriptrunner import add_script_run_ctx
pre_msgs = []
result_msgs = []
post_msgs = []
<!--
This is a sample HTML file that you can use to embed your Streamlit app in an iframe.
The Streamlit app is embedded cleanly and is almost indistinguishable from a native app.
Use it as a template and customize it to your needs.
NOTE: It's convenient to start your Streamlit app in headless mode, for example
$ streamlit run --server.port=8005 --server.headless=true app.py
-->
<!DOCTYPE html>
Pretty print tables summarizing properties of tensor arrays in numpy, pytorch, jax, etc.
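The snippet itself is not shown in this listing, so the following is only a rough sketch of what such a summary table might look like, not the author's implementation. It relies on the fact that numpy, PyTorch, and JAX arrays all expose `.shape` and `.dtype`; the function name `summarize` and the chosen columns are assumptions, and `SimpleNamespace` objects stand in for real arrays so the example runs without any array library installed.

```python
from math import prod
from types import SimpleNamespace


def summarize(named_arrays: dict) -> str:
    """Return an aligned text table of name, shape, dtype, and element count.

    Hypothetical helper: works with any object exposing `.shape` and `.dtype`.
    """
    header = ("name", "shape", "dtype", "size")
    rows = [header]
    for name, arr in named_arrays.items():
        shape = tuple(arr.shape)
        # prod(()) is 1, so zero-dimensional (scalar) arrays report size 1.
        rows.append((name, str(shape), str(arr.dtype), str(prod(shape))))
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    ]
    # Separator rule between the header and the data rows.
    lines.insert(1, "  ".join("-" * w for w in widths))
    return "\n".join(lines)


# Stand-ins for real arrays; np.zeros((2, 3)) or torch.zeros(3) would work the same way.
weights = SimpleNamespace(shape=(2, 3), dtype="float32")
bias = SimpleNamespace(shape=(3,), dtype="float32")

table = summarize({"weights": weights, "bias": bias})
print(table)
```

Duck typing on `.shape` and `.dtype` keeps the helper free of framework imports; per-framework extras such as device or gradient flags would need attribute checks specific to each library.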