human (Oct 17, 2024, 11:57 AM)
curl "https://api.openai.com/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o-audio-preview",
"modalities": ["text", "audio"],
"audio": { "voice": "alloy", "format": "wav" },
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "What is in this recording?" },
{
"type": "input_audio",
"input_audio": {
"data": "<base64 bytes here>",
"format": "wav"
}
}
]
}
]
}'
assistant (Oct 17, 2024, 11:57 AM)
This code snippet shows a cURL command to make an API request to OpenAI's chat completions endpoint. Let's break down the key components:
- The request is sent to "https://api.openai.com/v1/chat/completions"
- Headers:
  - Content-Type is set to application/json
  - Authorization uses a bearer token stored in the OPENAI_API_KEY environment variable
- The request body is a JSON object with the following structure:
  - "model": "gpt-4o-audio-preview" (specifies the model to use)
  - "modalities": ["text", "audio"] (indicates the types of input/output)
  - "audio": specifies audio output settings
    - "voice": "alloy" (the voice to use)
    - "format": "wav" (the audio format)
  - "messages": an array containing the conversation history
    - A single message with "role": "user"
    - The "content" of this message is an array with two items:
      - A text item asking "What is in this recording?"
      - An input_audio item with base64-encoded WAV audio data
This API call appears to be sending both text and audio to the GPT-4o model, asking it to analyze the contents of an audio recording, and potentially generate a response that includes both text and audio.
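For completeness: when the response does include audio, it typically arrives base64-encoded inside the assistant message. The field path used below (.choices[0].message.audio.data) is my reading of the audio-preview response format, so treat it as an assumption. If you saved the request body above as request.json, you could decode the reply roughly like this:
# Send the request from request.json, keep the raw response, and decode the reply audio.
# The .choices[0].message.audio.data path is an assumption about the response shape.
curl -s "https://api.openai.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d @request.json | tee response.json |
  jq -r '.choices[0].message.audio.data' | base64 --decode > reply.wav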
Would you like me to explain any specific part of this code in more detail?
human (Oct 17, 2024, 11:58 AM)
Turn this into a bash script I can call like this:
./audio-prompt "what is this?" /tmp/blah.wav
The script should base encode that audio in the right place and make the call
assistant (Oct 17, 2024, 11:58 AM)
Certainly! I'll create a bash script that takes a text prompt and an audio file path as arguments, encodes the audio file, and makes the API call to OpenAI. Here's the script:
#!/bin/bash
# Check if jq is installed
if ! command -v jq &> /dev/null; then
echo "Error: jq is not installed. Please install it to run this script."
exit 1
fi
# Check if both arguments are provided
if [ $# -ne 2 ]; then
echo "Usage: $0 <text_prompt> <audio_file_path>"
exit 1
fi
TEXT_PROMPT="$1"
AUDIO_FILE="$2"
# Check if the audio file exists
if [ ! -f "$AUDIO_FILE" ]; then
echo "Error: Audio file does not exist: $AUDIO_FILE"
exit 1
fi
# Base64 encode the audio file
AUDIO_BASE64=$(base64 -w 0 "$AUDIO_FILE")
# Construct the JSON payload
JSON_PAYLOAD=$(jq -n \
--arg model "gpt-4o-audio-preview" \
--arg text "$TEXT_PROMPT" \
--arg audio "$AUDIO_BASE64" \
'{
model: $model,
modalities: ["text", "audio"],
audio: {voice: "alloy", format: "wav"},
messages: [
{
role: "user",
content: [
{type: "text", text: $text},
{
type: "input_audio",
input_audio: {
data: $audio,
format: "wav"
}
}
]
}
]
}')
# Make the API call
curl -s "https://api.openai.com/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d "$JSON_PAYLOAD" | jq .
This script does the following:
- It checks if jq is installed (we use it for JSON processing).
- It verifies that two arguments are provided: the text prompt and the audio file path.
- It checks if the audio file exists.
- It base64 encodes the audio file.
- It constructs the JSON payload using jq, incorporating the text prompt and base64-encoded audio (a quick way to preview this payload is shown just below).
- It makes the API call to OpenAI using curl and pipes the result through jq for pretty-printing.
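If you want to sanity-check the payload before sending it (the large base64 blob makes it hard to read), one optional debugging line is to print it with the audio field truncated. This isn't part of the script above; the jq path simply mirrors the payload built there:
# Optional: preview the payload with the audio data shortened, to verify the structure
echo "$JSON_PAYLOAD" | jq '.messages[0].content[1].input_audio.data |= .[0:40] + "..."'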
To use this script:
- Save it to a file named audio-prompt (so it matches the invocation you asked for)
- Make it executable: chmod +x audio-prompt
- Set your OpenAI API key as an environment variable: export OPENAI_API_KEY=your_api_key_here
- Run it as you specified: ./audio-prompt "what is this?" /tmp/blah.wav
Note that this script assumes:
- The audio file is in WAV format.
- You have jq installed (for JSON processing).
- You have set the OPENAI_API_KEY environment variable (a small guard for this is sketched below).
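Two small notes on those assumptions: jq is available from the usual package managers (for example, brew install jq if you use Homebrew, or sudo apt install jq on Debian/Ubuntu), and if you'd like the script to fail fast when the key is missing, you could add a guard near the top. This is an optional addition, not part of the script above:
# Fail early with a clear message if the API key is not set
if [ -z "$OPENAI_API_KEY" ]; then
    echo "Error: OPENAI_API_KEY is not set."
    exit 1
fi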
If you need any modifications or have any questions about the script, feel free to ask!
human (Oct 17, 2024, 11:59 AM)
error:
Usage:  base64 [-Ddh] [-b num] [-i in_file] [-o out_file]
  -b, --break         break encoded output up into lines of length num
  -D, -d, --decode    decode input
  -h, --help          display this message
  -i, --input         input file (default: "-" for stdin)
  -o, --output        output file (default: "-" for stdout)
I'm on macos
assistant (Oct 17, 2024, 11:59 AM)
Thanks for pointing that out. The base64 command that ships with macOS doesn't support the -w option that GNU coreutils provides on Linux. Let's modify the script to work correctly on macOS.
#!/bin/bash
# Check if jq is installed
if ! command -v jq &> /dev/null; then
echo "Error: jq is not installed. Please install it to run this script."
exit 1
fi
# Check if both arguments are provided
if [ $# -ne 2 ]; then
echo "Usage: $0 <text_prompt> <audio_file_path>"
exit 1
fi
TEXT_PROMPT="$1"
AUDIO_FILE="$2"
# Check if the audio file exists
if [ ! -f "$AUDIO_FILE" ]; then
echo "Error: Audio file does not exist: $AUDIO_FILE"
exit 1
fi
# Base64 encode the audio file (macOS compatible)
AUDIO_BASE64=$(base64 < "$AUDIO_FILE" | tr -d '\n')
# Construct the JSON payload
JSON_PAYLOAD=$(jq -n \
--arg model "gpt-4o-audio-preview" \
--arg text "$TEXT_PROMPT" \
--arg audio "$AUDIO_BASE64" \
'{
model: $model,
modalities: ["text", "audio"],
audio: {voice: "alloy", format: "wav"},
messages: [
{
role: "user",
content: [
{type: "text", text: $text},
{
type: "input_audio",
input_audio: {
data: $audio,
format: "wav"
}
}
]
}
]
}')
# Make the API call
curl -s "https://api.openai.com/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d "$JSON_PAYLOAD" | jq .
I've updated the script to be compatible with macOS. The main change is in the base64 encoding line:
AUDIO_BASE64=$(base64 < "$AUDIO_FILE" | tr -d '\n')
This modification does the following:
- Uses base64 < "$AUDIO_FILE" to encode the file content.
- Pipes the result through tr -d '\n' to remove newlines, ensuring the base64 output is on a single line.
The rest of the script remains the same. This version should now work correctly on macOS.
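If you would rather keep a single script that works on both macOS and Linux, one possible approach (a sketch I have not tested on every platform) is to probe for GNU-style --wrap support and fall back to the tr pipeline otherwise:
# Portable base64 encoding: prefer GNU's -w 0 when available, otherwise strip newlines manually
if base64 --help 2>&1 | grep -q -- '--wrap'; then
    AUDIO_BASE64=$(base64 -w 0 "$AUDIO_FILE")
else
    AUDIO_BASE64=$(base64 < "$AUDIO_FILE" | tr -d '\n')
fi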
To use the script:
- Save it to a file named audio-prompt (matching the invocation you want)
- Make it executable: chmod +x audio-prompt
- Set your OpenAI API key as an environment variable: export OPENAI_API_KEY=your_api_key_here
- Run it as you specified: ./audio-prompt "what is this?" /tmp/blah.wav
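If you later want to experiment with voices other than "alloy", one small tweak (just a sketch) is to accept an optional third argument:
# Sketch: an optional third argument selects the voice, defaulting to "alloy"
VOICE="${3:-alloy}"
You would then pass it into the jq call with --arg voice "$VOICE", change the payload line to audio: {voice: $voice, format: "wav"}, and relax the argument check so it accepts two or three arguments (for example, [ $# -lt 2 ] || [ $# -gt 3 ]).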
This script should now work on macOS without any issues. If you encounter any other problems or need further modifications, please let me know!