simonmesmith

## simple_rag.py
"""
SIMPLE RAG!

This file provides a class, Collection, that makes it easy to add retrieval
augmented generation (RAG) to an application.

There are so many overly complex RAG tools out there, such as LlamaIndex and
LangChain. Even Chroma can be overly complex for some use cases, and I've
run into issues (in Streamlit Share) where Chroma's dependency on SQLite
caused a conflict that I couldn't resolve. Argh!

## openai_assistants_api_with_streamlit.py
"""
    This file demonstrates how to use the OpenAI Assistants API with Streamlit.
    Users upload a CSV and the assistant writes an article about the data. For
    this to work, you'll need to:

    1. Install the streamlit and openai packages
    2. Get an OpenAI API key
    3. Set an environment variable called OPENAI_API_KEY with your API key
    4. Create an assistant with Code Interpreter on, and this system message:
    "You are an experienced data journalist. You receive a CSV of data from a

## book_modernizer.md

      
              3 files
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                simonmesmith
                / book_modernizer.md
            
            
              Last active
              October 10, 2023 13:47
            
              
                Book modernizer
              
          
    Book modernization with GPT-3.5

This is a proof of concept that uses OpenAI's GPT-3.5 to modernize books.
For example, given this passage from Mary Shelley's The Last Man:

I visited Naples in the year 1818. On the 8th of December of that year,
my companion and I crossed the Bay, to visit the antiquities which are
scattered on the shores of Baiæ. The translucent and shining waters of
the calm sea covered fragments of old Roman villas, which were


## openai_streaming_with_functions.py
"""
    This example shows how to use the streaming feature with functions.
"""

import json
import os
import sys
from collections.abc import Generator
from typing import Any, Dict, List

## smartgpt.py
import openai
from tqdm import tqdm
import os


def gpt(prompt: str) -> str:
    """
    This function uses the OpenAI API to generate a response to a given prompt.

    Args:

## fine_tune_with_t5.md

      
              2 files
            
          
              1 fork
            
          
              0 comments
            
          
              0 stars
            
          
                simonmesmith
                / fine_tune_with_t5.md
            
            
              Last active
              May 14, 2024 18:57
            
              
                Fine-tuning T5 with Hugging Face
              
          
    Fine-tuning T5 with Hugging Face

Recently, I had to fine-tune a T5 model using Hugging Face's libraries.
Unfortunately, there was a lot of outdated information and many conflicting examples online.
If you just want to get going quickly, you can:

Copy this code
Swap in your own dataframe (ensure it has "source_text" and "target_text" columns, or modify parts of the code that depend on them accordingly)


## colorgram.md

      
              2 files
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                simonmesmith
                / colorgram.md
            
            
              Last active
              October 10, 2023 13:45
            
              
                Colorgram
              
          
    Colorgram

Extract colors from images and attach them as swatches. See an example here.
Takes one argument: number_of_color_clusters (int), which is pretty self-explanatory. No? Okay, it means how many color clusters (e.g. reddish colors, bluish colors, etc.) to return and add to the swatches.
Usage:
input_path = "input_image.jpg"

  
## pytesseract_tablereader.md

      
              2 files
            
          
              0 forks
            
          
              2 comments
            
          
              0 stars
            
          
                simonmesmith
                / pytesseract_tablereader.md
            
            
              Last active
              October 10, 2023 13:45
            
              
                Pytesseract table reader
              
          
    Pytesseract tablereader

Here's some code that tries to solve a problem that I think I've identified with regard to converting tables in images into dataframes that we can work with programmatically.
Right now, I see two main solutions in market. One is to use third-party APIs like Amazon's and Google's products in this area, which can get expensive at scale. Another is to use or build upon complex code that uses image processing libraries like OpenCV to find gridlines and use these to determine table rows and columns.
My hypothesis is that we can use only Pytesseract to read tables, since it provides coordinates of text in images, and tables follow a standard structure (rows and columns). I've been working on the code here accordingly.
Usage is very simple:
	"""
	SIMPLE RAG!

	This file provides a class, Collection, that makes it easy to add retrieval
	augmented generation (RAG) to an application.

	There are so many overly complex RAG tools out there, such as LlamaIndex and
	LangChain. Even Chroma can be overly complex for some use cases, and I've
	run into issues (in Streamlit Share) where Chroma's dependency on SQLite
	caused a conflict that I couldn't resolve. Argh!
	"""
	This file demonstrates how to use the OpenAI Assistants API with Streamlit.
	Users upload a CSV and the assistant writes an article about the data. For
	this to work, you'll need to:

	1. Install the streamlit and openai packages
	2. Get an OpenAI API key
	3. Set an environment variable called OPENAI_API_KEY with your API key
	4. Create an assistant with Code Interpreter on, and this system message:
	"You are an experienced data journalist. You receive a CSV of data from a
	"""
	This example shows how to use the streaming feature with functions.
	"""

	import json
	import os
	import sys
	from collections.abc import Generator
	from typing import Any, Dict, List
	import openai
	from tqdm import tqdm
	import os


	def gpt(prompt: str) -> str:
	"""
	This function uses the OpenAI API to generate a response to a given prompt.

	Args: