The documentation provided by Parasol’s MaaS was not very clear or explicit, especially for a first-time user. Here’s a breakdown of what the documentation offered versus what I had to discover through experimentation:
**What the Documentation Said**

- **Model List:** The docs listed available models (like `Granite-3.1-8B-Instruct`) and their endpoints.
- **API Endpoints:** It showed endpoint URLs for each model, e.g. `https://granite-3-8b-instruct-maas-apicast-production.apps.prod.rhoai.rh-aiservices-bu.com:443`
- **Paths:** For each model, it listed available HTTP paths, such as:
  - `/v1/chat/completions`
  - `/v1/completions`
  - `/v1/embeddings`
  - `/v1/models` (to list available models)
- **No explicit sample cURL or Python code:** There was no clear “here’s how to call the chat endpoint” example for your specific model or API key.
**What I Had to Discover**

The model names in the docs (like `Granite-3.1-8B-Instruct`) did not match the actual model ID required by the API (`granite-3-8b-instruct`). The docs did not clarify the exact string to use in the `model` parameter.
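To illustrate the mismatch: a request built directly from the docs fails, because the server does not recognize the documented name (a sketch; the exact error body will vary by server, and the key is assumed to be exported as `GRANITE_API_KEY`):

```bash
# The model name as written in the docs: the API rejects this ID
curl -s https://granite-3-8b-instruct-maas-apicast-production.apps.prod.rhoai.rh-aiservices-bu.com:443/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRANITE_API_KEY" \
  -d '{"model": "Granite-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```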
**Correct Model Name:** The API required the model name to be exactly `granite-3-8b-instruct` (lowercase, hyphens, no version sub-point), not `Granite-3.1-8B-Instruct` as indicated in the documentation.
**How I Found the Model Name:** I had to call `/v1/models` to list the actual model IDs accepted by the API.
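For example, a request like this lists the accepted IDs (a sketch; it assumes the same base host as the chat endpoint and a valid key exported as `GRANITE_API_KEY`):

```bash
# List the model IDs the API actually accepts
curl -s \
  -H "Authorization: Bearer $GRANITE_API_KEY" \
  https://granite-3-8b-instruct-maas-apicast-production.apps.prod.rhoai.rh-aiservices-bu.com:443/v1/models
```

In an OpenAI-compatible response, the `id` field of each entry under `data` is the exact string to pass as the `model` parameter.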
**Request Format:** The API is OpenAI-compatible, so the request format is the same as OpenAI’s `chat/completions` endpoint.
**Working Example:** Through trial and error, I arrived at a working cURL and Python example using the correct model name and endpoint.
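The cURL version looked roughly like this (same endpoint, model ID, and payload as the Python script below; the key is assumed to be exported as `GRANITE_API_KEY`):

```bash
# Chat completion request with the corrected model ID
curl -s https://granite-3-8b-instruct-maas-apicast-production.apps.prod.rhoai.rh-aiservices-bu.com:443/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRANITE_API_KEY" \
  -d '{
    "model": "granite-3-8b-instruct",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'
```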
Here’s how I got a working Python script that securely handles API keys using a `.env` file:
- **Create a `.env` file** (in your project directory):

  ```
  GRANITE_API_KEY=your_api_key_here
  GRANITE_API_URL=https://granite-3-8b-instruct-maas-apicast-production.apps.prod.rhoai.rh-aiservices-bu.com:443/v1/chat/completions
  ```
- **Install dependencies** (in your virtual environment):

  ```bash
  pip install requests python-dotenv
  ```
- **Create a Python script** (e.g., `inference.py`):

  ```python
  import os

  import requests
  from dotenv import load_dotenv

  # Pull GRANITE_API_KEY and GRANITE_API_URL from the .env file
  load_dotenv()

  API_KEY = os.getenv("GRANITE_API_KEY")
  API_URL = os.getenv("GRANITE_API_URL")

  headers = {
      "Content-Type": "application/json",
      "Authorization": f"Bearer {API_KEY}",
  }

  # OpenAI-compatible chat payload; the model ID must match an entry
  # returned by /v1/models, not the name shown in the docs
  data = {
      "model": "granite-3-8b-instruct",
      "messages": [{"role": "user", "content": "Hello, world!"}],
  }

  response = requests.post(API_URL, headers=headers, json=data)
  print("Status Code:", response.status_code)
  print("Response:", response.text)
  ```
- **Run your script:**

  ```bash
  python inference.py
  ```
This approach keeps your API key out of your code and lets you easily swap endpoints or keys as needed.
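Because `load_dotenv()` does not override variables that are already set in the environment, you can also swap a key or endpoint for a single run without editing the `.env` file (the URL below is just a placeholder):

```bash
# Real environment variables take precedence over .env values,
# so a one-off endpoint can be supplied inline
GRANITE_API_URL=https://some-other-endpoint.example.com/v1/chat/completions python inference.py
```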
| Step | Documentation Provided | What Actually Worked / Discovered |
|---|---|---|
| Endpoint URL | Yes | Yes |
| Model Name | Ambiguous | Had to call `/v1/models` to get the exact ID |
| Example Request | No | Had to construct and test |
| Auth/API Key Usage | Implied | Standard Bearer token worked |
| OpenAI Compatibility | Implied | Confirmed by request/response format |
The documentation gave me the pieces, but not a clear, working example. The most critical gap was the mismatch between the documented model name and the actual `model` parameter required by the API. I had to experiment and inspect the `/v1/models` response to discover the correct value.