human (Oct 23, 2024, 10:01 AM)
cat <<EOF > input.json
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Extract text from this image"
        },
        {
          "inlineData": {
            "data": "$(base64 -i output_0.png)",
            "mimeType": "image/png"
          }
        }
      ]
    }
  ]
}
EOF
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=$GOOGLE_API_KEY" -H 'Content-Type: application/json' -X POST -d -
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-8b-latest:generateContent?key=$GOOGLE_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d @input.json | jq
Turn this into a Bash script that runs like this:
prompt-gemini "this is the prompt"
prompt-gemini "This is the prompt" blah.png
prompt-gemini "This is the prompt" blah.pdf
prompt-gemini "this is the prompt" -m pro
It should exit with an error if GOOGLE_API_KEY is not set
It should use a temporary file for input.json which is deleted on completion
If no file was provided it should skip the inlineData bit
It should use the correct mimeType for PNG or PDF or JPG or JPEG or GIF depending on the file extension
The -m option should follow these rules: it defaults to 8b, or it can be:
8b => gemini-1.5-flash-8b-latest (the default)
flash-8b => gemini-1.5-flash-8b-latest
flash => gemini-1.5-flash-latest
pro => gemini-1.5-pro-latest
Any other value should be used directly in the gemini-1.5-flash:generateContent portion of the URL
assistant (Oct 23, 2024, 10:01 AM)
I'll help you create a Bash script that implements these requirements.
#!/bin/bash
# Check if GOOGLE_API_KEY is set
if [ -z "$GOOGLE_API_KEY" ]; then
    echo "Error: GOOGLE_API_KEY environment variable is not set" >&2
    exit 1
fi
# Default model
model="8b"
prompt=""
image_file=""
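# Parse arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        -m)
            # Guard against -m with no value; "shift 2" would otherwise fail and loop forever
            if [[ $# -lt 2 ]]; then
                echo "Error: -m requires a model name" >&2
                exit 1
            fi
            model="$2"
            shift 2
            ;;
        *)
            if [ -z "$prompt" ]; then
                prompt="$1"
            elif [ -z "$image_file" ]; then
                image_file="$1"
            fi
            shift
            ;;
    esac
done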
# Validate prompt
if [ -z "$prompt" ]; then
    echo "Error: No prompt provided" >&2
    echo "Usage: prompt-gemini \"prompt\" [image_file] [-m model]" >&2
    exit 1
fi
# Map model names to full model strings
case $model in
    "8b"|"flash-8b")
        model_string="gemini-1.5-flash-8b-latest"
        ;;
    "flash")
        model_string="gemini-1.5-flash-latest"
        ;;
    "pro")
        model_string="gemini-1.5-pro-latest"
        ;;
    *)
        model_string="gemini-1.5-$model"
        ;;
esac
# Create temporary file
temp_file=$(mktemp)
trap 'rm -f "$temp_file"' EXIT
# Determine mime type if image file is provided
if [ -n "$image_file" ]; then
    if [ ! -f "$image_file" ]; then
        echo "Error: File '$image_file' not found" >&2
        exit 1
    fi
    # Get file extension and convert to lowercase
    ext=$(echo "${image_file##*.}" | tr '[:upper:]' '[:lower:]')
    case $ext in
        png)
            mime_type="image/png"
            ;;
        jpg|jpeg)
            mime_type="image/jpeg"
            ;;
        gif)
            mime_type="image/gif"
            ;;
        pdf)
            mime_type="application/pdf"
            ;;
        *)
            echo "Error: Unsupported file type .$ext" >&2
            exit 1
            ;;
    esac
    # Create JSON with image data
    cat <<EOF > "$temp_file"
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "$prompt"
        },
        {
          "inlineData": {
            "data": "$(base64 "$image_file")",
            "mimeType": "$mime_type"
          }
        }
      ]
    }
  ]
}
EOF
else
    # Create JSON without image data
    cat <<EOF > "$temp_file"
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "$prompt"
        }
      ]
    }
  ]
}
EOF
fi
# Make API request
curl "https://generativelanguage.googleapis.com/v1beta/models/$model_string:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d @"$temp_file" | jq
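One caveat with the heredocs above: $prompt is interpolated into the JSON as-is, so a prompt containing a double quote, a backslash, or a newline would produce invalid JSON. A minimal sketch of one way to handle that, assuming a hypothetical helper named json_escape (not part of the script above) that covers only the most common escapes:

```bash
#!/bin/bash
# Hypothetical helper (not in the original script): escape a string for use
# inside a JSON string literal. Handles backslash, double quote, newline,
# carriage return and tab only; other control characters are left untouched.
json_escape() {
    local s=$1
    s=${s//\\/\\\\}       # backslash must be escaped first
    s=${s//\"/\\\"}       # double quote
    s=${s//$'\n'/\\n}     # newline
    s=${s//$'\r'/\\r}     # carriage return
    s=${s//$'\t'/\\t}     # tab
    printf '%s' "$s"
}

# Example: the escaped prompt can be dropped into a heredoc like the ones above
prompt='Say "hello" and stop'
cat <<EOF
{"text": "$(json_escape "$prompt")"}
EOF
```

In the script this would mean writing "text": "$(json_escape "$prompt")" in both heredocs. Since the script already depends on jq, letting jq build the string is likely the more robust option; the pure-Bash version is shown only because it needs no extra tooling.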
I've created a script that meets all your requirements. Here's how to use it:
- Save it as prompt-gemini and make it executable with chmod +x prompt-gemini
- Set your GOOGLE_API_KEY environment variable
- Use it with any of the formats you specified:
prompt-gemini "Your prompt here"
prompt-gemini "Your prompt" image.png
prompt-gemini "Your prompt" doc.pdf
prompt-gemini "Your prompt" -m pro
The script includes:
- Error checking for the API key
- Automatic mime type detection based on file extension
- Model name mapping as specified
- Temporary file handling with proper cleanup
- Support for all the required file types (PNG, PDF, JPG, JPEG, GIF)
- Pretty-printed JSON output using jq
Would you like me to explain any part of the script or make any modifications?
I manually edited this to use base64 -i image.png instead (needed on macOS).
I made one modification in a separate session: https://gist.github.com/simonw/b1bffe54ebdf3583ec4e3639fb535567
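The base64 difference noted above can also be sidestepped without a platform-specific flag. The sketch below assumes a hypothetical helper named encode_file (not part of the script or the gist) that feeds the file on standard input, which both the GNU and macOS versions of base64 accept, and strips the line breaks that GNU base64 inserts by default, since raw newlines are not valid inside a JSON string:

```bash
#!/bin/bash
# Hypothetical helper: base64-encode a file as a single line.
# Reading from stdin avoids the -i flag difference between platforms;
# tr removes the line wrapping that GNU base64 adds every 76 characters.
encode_file() {
    base64 < "$1" | tr -d '\n'
}

# Example usage with a throwaway file
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'hello' > "$sample"
encode_file "$sample"
echo
```

In the script, "data": "$(encode_file "$image_file")" would then take the place of the direct base64 call.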