Getting Claude to build me a CLI app for working with the OpenAI file search API

human (Aug 29, 2024, 08:07 PM)

I hit Ctrl+A in Firefox on https://platform.openai.com/docs/assistants/tools/file-search and copied and pasted the raw text into Claude - you can see the text I pasted here: https://gist.github.com/simonw/f9775727dcde2edc0f9f15bbda0b4d42

human (Aug 29, 2024, 08:09 PM)

Based on this documentation, write me a Python CLI app (using the Click CLI library) with the following features:

openai-file-chat add-files name-of-vector-store *.pdf *.txt

This creates a new vector store called name-of-vector-store and adds all the files passed to the command to that store.

openai-file-chat name-of-vector-store1 name-of-vector-store2 ...

This starts an interactive chat with the user, where any time they hit enter the question is answered by a chat assistant using the specified vector stores.

assistant (Aug 29, 2024, 08:09 PM)

Certainly! I'll create a Python CLI app using the Click library that implements the features you've described. Here's the implementation:

OpenAI File Chat CLI

import click
import os
import time
from openai import OpenAI
from typing import List

client = OpenAI()

@click.group()
def cli():
    """OpenAI File Chat CLI"""
    pass

@cli.command()
@click.argument('vector_store_name')
@click.argument('files', nargs=-1, type=click.Path(exists=True))
def add_files(vector_store_name: str, files: List[str]):
    """Create a new vector store and add files to it."""
    try:
        # Create a new vector store
        vector_store = client.beta.vector_stores.create(name=vector_store_name)
        click.echo(f"Created vector store: {vector_store_name}")

        # Upload files and add them to the vector store
        file_ids = []
        for file_path in files:
            with open(file_path, "rb") as file:
                uploaded_file = client.files.create(file=file, purpose="assistants")
                file_ids.append(uploaded_file.id)
        
        # Add files to the vector store
        file_batch = client.beta.vector_stores.file_batches.create_and_poll(
            vector_store_id=vector_store.id,
            file_ids=file_ids
        )

        click.echo(f"Added {len(file_ids)} files to the vector store.")
        click.echo(f"File batch status: {file_batch.status}")
        click.echo(f"File counts: {file_batch.file_counts}")

    except Exception as e:
        click.echo(f"An error occurred: {str(e)}", err=True)

@cli.command()
@click.argument('vector_store_names', nargs=-1)
def chat(vector_store_names: List[str]):
    """Start an interactive chat using specified vector stores."""
    try:
        # Create an assistant with file search enabled
        assistant = client.beta.assistants.create(
            name="File Chat Assistant",
            instructions="You are a helpful assistant. Use the provided vector stores to answer user questions.",
            model="gpt-4o",
            tools=[{"type": "file_search"}],
            tool_resources={
                "file_search": {
                    "vector_store_ids": vector_store_names
                }
            }
        )

        # Create a thread
        thread = client.beta.threads.create()

        click.echo("Chat started. Type your questions and press Enter. Type 'exit' to quit.")

        while True:
            user_input = click.prompt("You")
            if user_input.lower() == 'exit':
                break

            # Add the user's message to the thread
            client.beta.threads.messages.create(
                thread_id=thread.id,
                role="user",
                content=user_input
            )

            # Create a run
            run = client.beta.threads.runs.create(
                thread_id=thread.id,
                assistant_id=assistant.id
            )

            # Poll once per second until the run reaches a terminal state
            while run.status in ("queued", "in_progress", "cancelling"):
                time.sleep(1)
                run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
            if run.status != "completed":
                click.echo(f"Run ended with status: {run.status}", err=True)
                continue

            # Retrieve and display the assistant's response
            messages = client.beta.threads.messages.list(thread_id=thread.id)
            for message in messages.data:
                if message.role == "assistant":
                    click.echo(f"Assistant: {message.content[0].text.value}")
                    break

    except Exception as e:
        click.echo(f"An error occurred: {str(e)}", err=True)

if __name__ == '__main__':
    cli()

This Python CLI app implements the features you requested using the Click library and the OpenAI API. Here's a breakdown of the main components:

  1. The add_files command creates a new vector store with the given name and adds the specified files to it.
  2. The chat command starts an interactive chat session using the specified vector stores.

To use this CLI app, you would save it as a Python file (e.g., openai_file_chat.py) and run it from the command line. Here are some example commands:

  1. To add files to a new vector store:

    python openai_file_chat.py add-files my-vector-store *.pdf *.txt
    
  2. To start a chat session using one or more vector stores:

    python openai_file_chat.py chat vector-store1 vector-store2
    

Note that you'll need to have the OpenAI Python library installed (pip install openai) and set up your OpenAI API key as an environment variable (OPENAI_API_KEY) for this script to work.
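
A minimal startup guard along these lines (not part of the script above; the SDK also raises its own error if the key is missing) can make the script fail fast with a clearer message:

import os
import sys

# Hedged addition: exit early with a readable message if the key is unset
if not os.environ.get("OPENAI_API_KEY"):
    sys.exit("Please set the OPENAI_API_KEY environment variable.")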

This implementation assumes that the vector store IDs are the same as the names you provide when creating them. In the actual API they are not: stores are assigned generated IDs of the form vs_..., so the chat command needs to be given IDs, or modified to look up IDs by name.
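
One way to bridge that gap is a small lookup helper, sketched below (resolve_vector_store_id is hypothetical, not part of the script above), built on the same client.beta.vector_stores.list() call used later in this conversation:

def resolve_vector_store_id(name_or_id: str) -> str:
    """Hypothetical helper: return the ID of the first vector store whose
    name or ID matches, falling back to the input unchanged. Note that
    list() is paginated, so accounts with many stores may need to page."""
    for store in client.beta.vector_stores.list().data:
        if name_or_id in (store.name, store.id):
            return store.id
    return name_or_id

The chat command could then map its arguments with [resolve_vector_store_id(n) for n in vector_store_names] before creating the assistant.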

Also, error handling is basic in this implementation. You might want to add more robust error handling and user feedback for a production-ready application.
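
As one direction, the broad except Exception handlers could be narrowed to the SDK's exception classes. Here is a sketch in the context of the add_files command, assuming the 1.x SDK where openai.RateLimitError and openai.APIError exist:

import openai

try:
    vector_store = client.beta.vector_stores.create(name=vector_store_name)
except openai.RateLimitError:
    # Subclass of APIError, so it must be caught first
    click.echo("Rate limited by the API; please retry shortly.", err=True)
    raise SystemExit(1)
except openai.APIError as e:
    click.echo(f"OpenAI API error: {e}", err=True)
    raise SystemExit(1)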

Would you like me to explain any part of the code in more detail or make any modifications?

human (Aug 29, 2024, 08:12 PM)

output just the Python code to add a list-stores command

assistant (Aug 29, 2024, 08:12 PM)

Certainly! Here's the Python code to add a list-stores command to the existing CLI app:

OpenAI File Chat CLI

import click
import os
from openai import OpenAI
from typing import List

client = OpenAI()

@click.group()
def cli():
    """OpenAI File Chat CLI"""
    pass

# ... (previous code remains the same)

@cli.command()
def list_stores():
    """List all vector stores."""
    try:
        vector_stores = client.beta.vector_stores.list()
        if vector_stores.data:
            click.echo("Vector Stores:")
            for store in vector_stores.data:
                click.echo(f"- ID: {store.id}, Name: {store.name}")
        else:
            click.echo("No vector stores found.")
    except Exception as e:
        click.echo(f"An error occurred: {str(e)}", err=True)

if __name__ == '__main__':
    cli()

This code adds a new list_stores command to the CLI app. Here's what it does:

  1. It defines a new function list_stores() decorated with @cli.command().
  2. Inside the function, it uses the OpenAI client to list all vector stores.
  3. If vector stores are found, it prints each store's ID and name.
  4. If no vector stores are found, it prints a message saying so.
  5. If an error occurs, it prints an error message.

To use this new command, you would run:

python openai_file_chat.py list-stores

This will list all the vector stores associated with your OpenAI account.

Remember to place this new function alongside the other command functions in your CLI app, before the if __name__ == '__main__': line.

human (Aug 29, 2024, 08:16 PM)

Modify the chat command such that after every response it does this thing:

run_step = client.beta.threads.runs.steps.retrieve(
    thread_id="thread_abc123",
    run_id="run_abc123",
    step_id="step_abc123",
    include=["step_details.tool_calls[*].file_search.results[*].content"]
)

And prints out the details

assistant (Aug 29, 2024, 08:16 PM)

Certainly! I'll modify the chat command to retrieve and print out the details of the run step after every response. Here's the updated version of the chat function:

OpenAI File Chat CLI

import click
import os
import time
from openai import OpenAI
from typing import List
import json

client = OpenAI()

# ... (previous code remains the same)

@cli.command()
@click.argument('vector_store_names', nargs=-1)
def chat(vector_store_names: List[str]):
    """Start an interactive chat using specified vector stores."""
    try:
        # Create an assistant with file search enabled
        assistant = client.beta.assistants.create(
            name="File Chat Assistant",
            instructions="You are a helpful assistant. Use the provided vector stores to answer user questions.",
            model="gpt-4o",
            tools=[{"type": "file_search"}],
            tool_resources={
                "file_search": {
                    "vector_store_ids": vector_store_names
                }
            }
        )

        # Create a thread
        thread = client.beta.threads.create()

        click.echo("Chat started. Type your questions and press Enter. Type 'exit' to quit.")

        while True:
            user_input = click.prompt("You")
            if user_input.lower() == 'exit':
                break

            # Add the user's message to the thread
            client.beta.threads.messages.create(
                thread_id=thread.id,
                role="user",
                content=user_input
            )

            # Create a run
            run = client.beta.threads.runs.create(
                thread_id=thread.id,
                assistant_id=assistant.id
            )

            # Poll once per second until the run reaches a terminal state
            while run.status in ("queued", "in_progress", "cancelling"):
                time.sleep(1)
                run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
            if run.status != "completed":
                click.echo(f"Run ended with status: {run.status}", err=True)
                continue

            # Retrieve and display the assistant's response
            messages = client.beta.threads.messages.list(thread_id=thread.id)
            for message in messages.data:
                if message.role == "assistant":
                    click.echo(f"Assistant: {message.content[0].text.value}")
                    break

            # Retrieve and display the run step details
            run_steps = client.beta.threads.runs.steps.list(thread_id=thread.id, run_id=run.id)
            for step in run_steps.data:
                if step.type == "tool_calls":
                    for tool_call in step.step_details.tool_calls:
                        if tool_call.type == "file_search":
                            run_step = client.beta.threads.runs.steps.retrieve(
                                thread_id=thread.id,
                                run_id=run.id,
                                step_id=step.id,
                                include=["step_details.tool_calls[*].file_search.results[*].content"]
                            )
                            click.echo("\nFile Search Results:")
                            click.echo(json.dumps(run_step.step_details.tool_calls[0].file_search.results, indent=2))

    except Exception as e:
        click.echo(f"An error occurred: {str(e)}", err=True)

# ... (rest of the code remains the same)

Here's a breakdown of the changes made to the chat function:

  1. We've added import json at the top of the file to help with pretty-printing the results.

  2. After retrieving and displaying the assistant's response, we now iterate through the run steps:

    run_steps = client.beta.threads.runs.steps.list(thread_id=thread.id, run_id=run.id)
  3. For each step that involves tool calls (specifically file searches), we retrieve the detailed step information:

    run_step = client.beta.threads.runs.steps.retrieve(
        thread_id=thread.id,
        run_id=run.id,
        step_id=step.id,
        include=["step_details.tool_calls[*].file_search.results[*].content"]
    )
  4. We then print out the file search results:

    click.echo("\nFile Search Results:")
    click.echo(json.dumps(run_step.step_details.tool_calls[0].file_search.results, indent=2))

This modification will now display the file search results after each assistant response, giving you insight into which parts of the vector stores were used to generate the response.

Note that this will increase the verbosity of the output significantly. If you find it too verbose, you might want to add an option to toggle this detailed output on and off.
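
A sketch of such a toggle (the --show-sources flag name is an invention, not part of the code above), using Click's standard is_flag option:

import click

@click.command()
@click.argument('vector_store_names', nargs=-1)
@click.option('--show-sources', is_flag=True, default=False,
              help="Also print file search results after each response.")
def chat(vector_store_names, show_sources):
    """Demo of gating the verbose output behind a boolean flag."""
    click.echo(f"Stores: {', '.join(vector_store_names) or '(none)'}")
    if show_sources:
        # In the real command, the run-step retrieval and printing
        # shown above would live inside this branch.
        click.echo("(file search results would be printed here)")

if __name__ == '__main__':
    chat()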

Also, keep in mind that not all runs may involve file searches, so the code checks for the presence of tool calls and file searches before attempting to retrieve and display this information.

human (Aug 29, 2024, 08:19 PM)

click.echo(json.dumps(run_step.step_details.tool_calls[0].file_search.results, indent=2, default=repr))

Those things are Pydantic objects, how can I make them print as pretty as possible?

assistant (Aug 29, 2024, 08:19 PM)

You're right, and I apologize for the oversight. Pydantic objects aren't directly JSON serializable, so we need a different approach to print them prettily. Here's an improved version that should work better with Pydantic objects:

OpenAI File Chat CLI

import click
import os
import time
from openai import OpenAI
from typing import List
from pydantic import BaseModel
from pprint import pformat

client = OpenAI()

def pretty_print_pydantic(obj):
    """Recursively convert Pydantic models (and lists/dicts of them) to
    plain Python data, then pretty-print the whole structure in one pass.
    Formatting each level separately would wrap the already-formatted
    strings in quotes, so the conversion happens first and pformat runs once."""
    def unwrap(o):
        if isinstance(o, BaseModel):
            return o.dict()  # use o.model_dump() on Pydantic v2
        elif isinstance(o, list):
            return [unwrap(item) for item in o]
        elif isinstance(o, dict):
            return {k: unwrap(v) for k, v in o.items()}
        else:
            return o
    return pformat(unwrap(obj), indent=2, width=120)

# ... (previous code remains the same)

@cli.command()
@click.argument('vector_store_names', nargs=-1)
def chat(vector_store_names: List[str]):
    """Start an interactive chat using specified vector stores."""
    try:
        # Create an assistant with file search enabled
        assistant = client.beta.assistants.create(
            name="File Chat Assistant",
            instructions="You are a helpful assistant. Use the provided vector stores to answer user questions.",
            model="gpt-4o",
            tools=[{"type": "file_search"}],
            tool_resources={
                "file_search": {
                    "vector_store_ids": vector_store_names
                }
            }
        )

        # Create a thread
        thread = client.beta.threads.create()

        click.echo("Chat started. Type your questions and press Enter. Type 'exit' to quit.")

        while True:
            user_input = click.prompt("You")
            if user_input.lower() == 'exit':
                break

            # Add the user's message to the thread
            client.beta.threads.messages.create(
                thread_id=thread.id,
                role="user",
                content=user_input
            )

            # Create a run
            run = client.beta.threads.runs.create(
                thread_id=thread.id,
                assistant_id=assistant.id
            )

            # Poll once per second until the run reaches a terminal state
            while run.status in ("queued", "in_progress", "cancelling"):
                time.sleep(1)
                run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
            if run.status != "completed":
                click.echo(f"Run ended with status: {run.status}", err=True)
                continue

            # Retrieve and display the assistant's response
            messages = client.beta.threads.messages.list(thread_id=thread.id)
            for message in messages.data:
                if message.role == "assistant":
                    click.echo(f"Assistant: {message.content[0].text.value}")
                    break

            # Retrieve and display the run step details
            run_steps = client.beta.threads.runs.steps.list(thread_id=thread.id, run_id=run.id)
            for step in run_steps.data:
                if step.type == "tool_calls":
                    for tool_call in step.step_details.tool_calls:
                        if tool_call.type == "file_search":
                            run_step = client.beta.threads.runs.steps.retrieve(
                                thread_id=thread.id,
                                run_id=run.id,
                                step_id=step.id,
                                include=["step_details.tool_calls[*].file_search.results[*].content"]
                            )
                            click.echo("\nFile Search Results:")
                            click.echo(pretty_print_pydantic(run_step.step_details.tool_calls[0].file_search.results))

    except Exception as e:
        click.echo(f"An error occurred: {str(e)}", err=True)

# ... (rest of the code remains the same)

Here are the key changes:

  1. We've added imports for BaseModel from Pydantic and pformat from the pprint module:

    from pydantic import BaseModel
    from pprint import pformat
  2. We've defined a new pretty_print_pydantic function that recursively converts Pydantic objects, lists, and dictionaries to plain Python data, then pretty-prints the result with a single pformat call:

    def pretty_print_pydantic(obj):
        def unwrap(o):
            if isinstance(o, BaseModel):
                return o.dict()  # use o.model_dump() on Pydantic v2
            elif isinstance(o, list):
                return [unwrap(item) for item in o]
            elif isinstance(o, dict):
                return {k: unwrap(v) for k, v in o.items()}
            else:
                return o
        return pformat(unwrap(obj), indent=2, width=120)
  3. In the chat function, we've replaced the json.dumps call with our new pretty_print_pydantic function:

    click.echo(pretty_print_pydantic(run_step.step_details.tool_calls[0].file_search.results))

This approach should provide a much prettier and more readable output for the Pydantic objects returned by the OpenAI API. It will handle nested Pydantic models, lists, and dictionaries, and leave any other values for pformat to render.

The pformat function from the pprint module is used to create a formatted string representation of the data, which should be more readable than a simple str or repr output.

If you find that certain fields are still not displaying as you'd like, you might need to adjust the pretty_print_pydantic function to handle those specific cases. But this should provide a good starting point for prettily printing the Pydantic objects from the OpenAI API.
