@simonw
Created October 18, 2024 05:36

Interacting with OpenAI's Chat API

human (Oct 17, 2024, 11:57 AM)

curl "https://api.openai.com/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
      "model": "gpt-4o-audio-preview",
      "modalities": ["text", "audio"],
      "audio": { "voice": "alloy", "format": "wav" },
      "messages": [
        {
          "role": "user",
          "content": [
            { "type": "text", "text": "What is in this recording?" },
            { 
              "type": "input_audio", 
              "input_audio": { 
                "data": "<base64 bytes here>", 
                "format": "wav" 
              }
            }
          ]
        }
      ]
    }'

assistant (Oct 17, 2024, 11:57 AM)

This code snippet shows a cURL command to make an API request to OpenAI's chat completions endpoint. Let's break down the key components:

  1. The request is sent to "https://api.openai.com/v1/chat/completions"

  2. Headers:

    • Content-Type is set to application/json
    • Authorization uses a bearer token stored in the OPENAI_API_KEY environment variable
  3. The request body is a JSON object with the following structure:

    • "model": "gpt-4o-audio-preview" (specifies the model to use)
    • "modalities": ["text", "audio"] (the output types the model should generate: text plus spoken audio)
    • "audio": Specifies audio output settings
      • "voice": "alloy" (the voice to use)
      • "format": "wav" (the audio format)
    • "messages": An array containing the conversation history
      • A single message with "role": "user"
      • The "content" of this message is an array with two items:
        1. A text item asking "What is in this recording?"
        2. An input_audio item with base64-encoded WAV audio data

This API call sends both text and audio to the gpt-4o-audio-preview model and asks it to analyze the contents of an audio recording. Because "modalities" includes "audio", the response should contain a spoken answer (in the "alloy" voice, as WAV) in addition to text.
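Since the reply can contain audio, it helps to know where that audio lands in the response. A minimal sketch, assuming the response shape OpenAI documented for gpt-4o-audio-preview (base64 audio at choices[0].message.audio.data and a text version at .transcript; worth confirming against the current API reference). The helper name save_audio_reply is hypothetical, and jq is assumed to be installed:

```shell
#!/bin/bash
# Hypothetical helper: read a chat completions response (JSON) on stdin,
# write the decoded audio reply to the file named in $1, and print its transcript.
# Assumes the reply lives at .choices[0].message.audio with "data" and "transcript".
save_audio_reply() {
    local out_file="$1"
    local response
    response=$(cat)

    # Bail out if there is no audio object (e.g. an {"error": ...} response)
    if ! printf '%s' "$response" | jq -e '.choices[0].message.audio' > /dev/null; then
        echo "Error: no audio in response" >&2
        return 1
    fi

    # Base64 text -> binary audio file; --decode works with GNU and macOS base64
    printf '%s' "$response" \
        | jq -r '.choices[0].message.audio.data' \
        | base64 --decode > "$out_file"

    # Print the text transcript of the spoken reply
    printf '%s' "$response" | jq -r '.choices[0].message.audio.transcript'
}

# Usage, e.g. after saving the curl output to response.json:
#   save_audio_reply reply.wav < response.json
```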

Would you like me to explain any specific part of this code in more detail?

human (Oct 17, 2024, 11:58 AM)

Turn this into a bash script I can call like this:

./audio-prompt "what is this?" /tmp/blah.wav

The script should base encode that audio in the right place and make the call

assistant (Oct 17, 2024, 11:58 AM)

Certainly! I'll create a bash script that takes a text prompt and an audio file path as arguments, encodes the audio file, and makes the API call to OpenAI. Here's the script:

OpenAI Audio Prompt Bash Script

#!/bin/bash

# Check if jq is installed
if ! command -v jq &> /dev/null; then
    echo "Error: jq is not installed. Please install it to run this script."
    exit 1
fi

# Check if both arguments are provided
if [ $# -ne 2 ]; then
    echo "Usage: $0 <text_prompt> <audio_file_path>"
    exit 1
fi

TEXT_PROMPT="$1"
AUDIO_FILE="$2"

# Check if the audio file exists
if [ ! -f "$AUDIO_FILE" ]; then
    echo "Error: Audio file does not exist: $AUDIO_FILE"
    exit 1
fi

# Base64 encode the audio file
AUDIO_BASE64=$(base64 -w 0 "$AUDIO_FILE")

# Construct the JSON payload
JSON_PAYLOAD=$(jq -n \
    --arg model "gpt-4o-audio-preview" \
    --arg text "$TEXT_PROMPT" \
    --arg audio "$AUDIO_BASE64" \
    '{
        model: $model,
        modalities: ["text", "audio"],
        audio: {voice: "alloy", format: "wav"},
        messages: [
            {
                role: "user",
                content: [
                    {type: "text", text: $text},
                    {
                        type: "input_audio",
                        input_audio: {
                            data: $audio,
                            format: "wav"
                        }
                    }
                ]
            }
        ]
    }')

# Make the API call
curl -s "https://api.openai.com/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d "$JSON_PAYLOAD" | jq .

This script does the following:

  1. It checks if jq is installed (we use it for JSON processing).
  2. It verifies that two arguments are provided: the text prompt and the audio file path.
  3. It checks if the audio file exists.
  4. It base64 encodes the audio file.
  5. It constructs the JSON payload using jq, incorporating the text prompt and base64-encoded audio.
  6. It makes the API call to OpenAI using curl and pipes the result through jq for pretty-printing.

To use this script:

  1. Save it to a file named audio-prompt (no extension, so it can be called exactly as you asked)
  2. Make it executable: chmod +x audio-prompt
  3. Set your OpenAI API key as an environment variable: export OPENAI_API_KEY=your_api_key_here
  4. Run it: ./audio-prompt "what is this?" /tmp/blah.wav

Note that this script assumes:

  • The audio file is in WAV format.
  • You have jq installed (for JSON processing).
  • You have set the OPENAI_API_KEY environment variable.
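Those assumptions can be checked before any request is sent, instead of surfacing as a confusing API error. A minimal sketch in bash; the function names are hypothetical, and the extension mapping assumes input_audio accepts "mp3" as well as "wav", which is worth confirming in OpenAI's API reference before relying on it:

```shell
#!/bin/bash
# Hypothetical preflight helpers for the audio-prompt script.

# Fail early if the API key is missing or empty.
require_api_key() {
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "Error: OPENAI_API_KEY is not set." >&2
        return 1
    fi
}

# Map a file extension to an input_audio "format" value.
# Assumption: the API accepts "wav" and "mp3" for input audio.
audio_format_for() {
    local ext="${1##*.}"
    # Lowercase via tr, which also works with the bash 3.2 that ships with macOS
    ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        wav) echo "wav" ;;
        mp3) echo "mp3" ;;
        *)
            echo "Error: unsupported audio format: $1" >&2
            return 1
            ;;
    esac
}
```

In the script, AUDIO_FORMAT=$(audio_format_for "$AUDIO_FILE") || exit 1 could replace the hardcoded input format (passed to jq with --arg format "$AUDIO_FORMAT"); the separate "audio" output settings can stay as WAV.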

If you need any modifications or have any questions about the script, feel free to ask!

human (Oct 17, 2024, 11:59 AM)

error:

Usage:  base64 [-Ddh] [-b num] [-i in_file] [-o out_file]
  -b, --break       break encoded output up into lines of length num
  -D, -d, --decode  decode input
  -h, --help        display this message
  -i, --input       input file (default: "-" for stdin)
  -o, --output      output file (default: "-" for stdout)

I'm on macos

assistant (Oct 17, 2024, 11:59 AM)

I apologize for the confusion, and thank you for pointing that out. The base64 command that ships with macOS doesn't support the -w option that GNU base64 on Linux uses to control line wrapping; as the usage message shows, it uses -b for that instead. Let's modify the script to work correctly on macOS.

OpenAI Audio Prompt Bash Script (macOS compatible)

#!/bin/bash

# Check if jq is installed
if ! command -v jq &> /dev/null; then
    echo "Error: jq is not installed. Please install it to run this script."
    exit 1
fi

# Check if both arguments are provided
if [ $# -ne 2 ]; then
    echo "Usage: $0 <text_prompt> <audio_file_path>"
    exit 1
fi

TEXT_PROMPT="$1"
AUDIO_FILE="$2"

# Check if the audio file exists
if [ ! -f "$AUDIO_FILE" ]; then
    echo "Error: Audio file does not exist: $AUDIO_FILE"
    exit 1
fi

# Base64 encode the audio file (macOS compatible)
AUDIO_BASE64=$(base64 < "$AUDIO_FILE" | tr -d '\n')

# Construct the JSON payload
JSON_PAYLOAD=$(jq -n \
    --arg model "gpt-4o-audio-preview" \
    --arg text "$TEXT_PROMPT" \
    --arg audio "$AUDIO_BASE64" \
    '{
        model: $model,
        modalities: ["text", "audio"],
        audio: {voice: "alloy", format: "wav"},
        messages: [
            {
                role: "user",
                content: [
                    {type: "text", text: $text},
                    {
                        type: "input_audio",
                        input_audio: {
                            data: $audio,
                            format: "wav"
                        }
                    }
                ]
            }
        ]
    }')

# Make the API call
curl -s "https://api.openai.com/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d "$JSON_PAYLOAD" | jq .

I've updated the script to be compatible with macOS. The main change is in the base64 encoding line:

AUDIO_BASE64=$(base64 < "$AUDIO_FILE" | tr -d '\n')

This modification does the following:

  1. Uses base64 < "$AUDIO_FILE" to encode the file content.
  2. Pipes the result through tr -d '\n' to remove newlines, ensuring the base64 output is on a single line.

The rest of the script remains the same. This version should now work correctly on macOS.
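The piped form relies only on behaviour both implementations share (reading the file from stdin and writing encoded text to stdout), so it should also work with GNU coreutils base64 on Linux, which wraps output at 76 columns by default; tr removes those line breaks. A minimal sketch wrapping that pipeline in a hypothetical b64_oneline function:

```shell
#!/bin/bash
# Hypothetical helper: base64-encode a file onto a single line.
# Reading from stdin works with both macOS base64 and GNU base64;
# tr strips any line breaks (GNU wraps at 76 columns by default).
b64_oneline() {
    base64 < "$1" | tr -d '\n'
}

# Example:
#   AUDIO_BASE64=$(b64_oneline /tmp/blah.wav)
```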

To use the script:

  1. Save it to a file named audio-prompt (no extension, so it can be called exactly as you asked)
  2. Make it executable: chmod +x audio-prompt
  3. Set your OpenAI API key as an environment variable: export OPENAI_API_KEY=your_api_key_here
  4. Run it: ./audio-prompt "what is this?" /tmp/blah.wav
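One limit remains for longer recordings: the script passes the whole base64 string to jq with --arg and then to curl with -d, both as command-line arguments, so a large file can exceed the operating system's argument-size limit and fail with "Argument list too long". A sketch of a streaming variant, assuming jq's -R/-s flags and curl's --data-binary @- (read the body from stdin); build_payload is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical streaming variant: the audio never appears on a command line.
# Reads base64 text on stdin and prints the request JSON to stdout.
build_payload() {
    local text_prompt="$1"
    # -R: read raw text, -s: slurp it all into one string, available as "."
    jq -Rs --arg text "$text_prompt" '{
        model: "gpt-4o-audio-preview",
        modalities: ["text", "audio"],
        audio: {voice: "alloy", format: "wav"},
        messages: [
            {
                role: "user",
                content: [
                    {type: "text", text: $text},
                    {type: "input_audio", input_audio: {data: ., format: "wav"}}
                ]
            }
        ]
    }'
}

# Usage sketch: encode, build the JSON, and let curl read the body from stdin (@-).
#   base64 < "$AUDIO_FILE" | tr -d '\n' \
#       | build_payload "$TEXT_PROMPT" \
#       | curl -s "https://api.openai.com/v1/chat/completions" \
#           -H "Content-Type: application/json" \
#           -H "Authorization: Bearer $OPENAI_API_KEY" \
#           --data-binary @- | jq .
```

--data-binary sends the body exactly as read, whereas -d with @file or @- strips carriage returns and newlines.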

This script should now work on macOS. If you encounter any other problems or need further modifications, please let me know!
