Peter pszemraj

## tiktoken-to-hf.ipynb

      
              1 file
            
          
              2 forks
            
          
              17 comments
            
          
              14 stars
            
          
                xenova
                / tiktoken-to-hf.ipynb
            
            
              Last active
              May 10, 2024 00:59
            
              
                Convert tiktoken tokenizers to the Hugging Face tokenizers format
              
          
        Loading

      Sorry, something went wrong. Reload?
      Sorry, we cannot display this file.
      Sorry, this file is invalid so it cannot be displayed.
      
          Viewer requires iframe.
      
    
## lfs_checkpoint_uploader.sh
#!/bin/bash

# install: sudo apt-get install inotify-tools
# Usage: ./scriptname.sh /path/to/monitor/directory /path/to/repo/directory
# If no monitor directory is passed, monitor directory = repo directory
# put & at the end of the command to run in background

# Define your monitor directory
MONITOR_DIR="${1:-$2}"
if [ -z "$MONITOR_DIR" ]; then

## finetune_llama_v2.py
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software

## rwkv_embeddings.py
import logging
import warnings
from typing import List, Optional, Union

import numpy as np
import torch
from torch.nn import functional as F
from tqdm.auto import trange
from transformers import AutoTokenizer, PreTrainedModel, PreTrainedTokenizer, RwkvModel

## run_summarization_referencecard.md

      
              2 files
            
          
              0 forks
            
          
              0 comments
            
          
              1 star
            
          
                pszemraj
                / run_summarization_referencecard.md
            
            
              Last active
              June 5, 2023 06:38
            
          
    reference for run_summarization


reference for transformers 4.30.0.dev0

about

The below options are additional configuration parameters that can be used when training a model with Hugging Face Transformers. These options control various aspects of the training process, such as the optimizer to use, data loading settings, memory management, model evaluation, checkpointing, and integration with the Hugging Face Model Hub.
Here is a summary of the high-level functionalities provided by some of the options:

  
## hf_repo_download.py
"""
hf_hub_download.py

This script allows you to download a snapshot repository from the Hugging Face Hub to a local directory without needing Git or loading the model.

Usage:
    python hf_hub_download.py <repo_id> [options]

Arguments:
    <repo_id>               Repository ID in the format "organization/repository".

## bot_readme.md

      
              3 files
            
          
              0 forks
            
          
              0 comments
            
          
              1 star
            
          
                pszemraj
                / bot_readme.md
            
            
              Last active
              May 22, 2023 17:45
            
              
                for Gio's slugbot project
              
          
    Image Classification Telegram Bot

This script runs a Telegram bot that classifies images using a pre-trained model. The bot handles /start and /help commands, as well as photo messages. When a photo message is received, the bot downloads the photo, classifies it, and sends a message with the prediction.
The original intended use case is to classify if an image contains a slug or not:


credit to this demo https://huggingface.co/spaces/MasleK/Snails_Snakes_Slugs


## inference_openai.py
"""
inference_openai.py - text generation with OpenAI API

    See https://platform.openai.com/docs/quickstart for more details.

Usage:
python inference_openai.py --prompt "The quick brown fox jumps over the lazy dog." --model "gpt-3.5-turbo" --temperature 0.5 --max_tokens 256 --n 1 --stop "."

Detailed usage:
python inference_openai.py --help

## eval_summaries.py
"""
eval_summaries.py - evaluate summary/document pairs via a variety of metrics,

    Metrics include max salient similarity, topic similarity, compression factor,
    readability scores, and spelling error fraction

details:
python eval_summaries.py --help

this script was developed while evaluating summaries generated with the textsum package

## dl_gauntlet.sh
URL=https://www.dropbox.com/sh/zu1p7rhg5238a5y/AABsJN_pCYf9plSDZY8ziKATa?dl=1
wget -O docs.zip $URL
unzip -B -j docs.zip -d gauntlet && rm -rf docs.zip
	#!/bin/bash

	# install: sudo apt-get install inotify-tools
	# Usage: ./scriptname.sh /path/to/monitor/directory /path/to/repo/directory
	# If no monitor directory is passed, monitor directory = repo directory
	# put & at the end of the command to run in background

	# Define your monitor directory
	MONITOR_DIR="${1:-$2}"
	if [ -z "$MONITOR_DIR" ]; then
	# coding=utf-8
	# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
	#
	# Licensed under the Apache License, Version 2.0 (the "License");
	# you may not use this file except in compliance with the License.
	# You may obtain a copy of the License at
	#
	# http://www.apache.org/licenses/LICENSE-2.0
	#
	# Unless required by applicable law or agreed to in writing, software
	import logging
	import warnings
	from typing import List, Optional, Union

	import numpy as np
	import torch
	from torch.nn import functional as F
	from tqdm.auto import trange
	from transformers import AutoTokenizer, PreTrainedModel, PreTrainedTokenizer, RwkvModel
	"""
	hf_hub_download.py

	This script allows you to download a snapshot repository from the Hugging Face Hub to a local directory without needing Git or loading the model.

	Usage:
	python hf_hub_download.py <repo_id> [options]

	Arguments:
	<repo_id> Repository ID in the format "organization/repository".
	"""
	inference_openai.py - text generation with OpenAI API

	See https://platform.openai.com/docs/quickstart for more details.

	Usage:
	python inference_openai.py --prompt "The quick brown fox jumps over the lazy dog." --model "gpt-3.5-turbo" --temperature 0.5 --max_tokens 256 --n 1 --stop "."

	Detailed usage:
	python inference_openai.py --help
	"""
	eval_summaries.py - evaluate summary/document pairs via a variety of metrics,

	Metrics include max salient similarity, topic similarity, compression factor,
	readability scores, and spelling error fraction

	details:
	python eval_summaries.py --help

	this script was developed while evaluating summaries generated with the textsum package
	URL=https://www.dropbox.com/sh/zu1p7rhg5238a5y/AABsJN_pCYf9plSDZY8ziKATa?dl=1
	wget -O docs.zip $URL
	unzip -B -j docs.zip -d gauntlet && rm -rf docs.zip