The documentation provided by Parasol’s MaaS was not very clear or explicit, especially for a first-time user. Here’s a breakdown of what the documentation offered versus what I had to discover through experimentation:
**What the Documentation Said**

- **Model List:** The docs listed available models (like `Granite-3.1-8B-Instruct`) and their endpoints.
- **API Endpoints:** It showed endpoint URLs for each model, e.g. `https://granite-3-8b-instruct-maas-apicast-production.apps.prod.rhoai.rh-aiservices-bu.com:443`
- **Paths:** For each model, it listed available HTTP paths, such as:
  - `/v1/chat/completions`
  - `/v1/completions`
  - `/v1/embeddings`
  - `/v1/models` (to list available models)
- **No explicit sample cURL or Python code:** There was no clear “here’s how to call the chat endpoint” example for your specific model or API key.
**What I Had to Discover**

The model names in the docs (like `Granite-3.1-8B-Instruct`) did not match the actual model ID required by the API (`granite-3-8b-instruct`). The docs did not clarify the exact string to use in the `model` parameter.
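To illustrate the mismatch: a request built directly from the docs fails, because the server does not recognize the documented name (a sketch; the exact error body will vary by server, and the key is assumed to be exported as `GRANITE_API_KEY`):

```bash
# The model name as written in the docs: the API rejects this ID
curl -s https://granite-3-8b-instruct-maas-apicast-production.apps.prod.rhoai.rh-aiservices-bu.com:443/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRANITE_API_KEY" \
  -d '{"model": "Granite-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```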
**Correct Model Name:** The API required the model name to be exactly `granite-3-8b-instruct` (lowercase, hyphens, no version sub-point), not `Granite-3.1-8B-Instruct` as indicated in the documentation.
**How I Found the Model Name:** I had to call `/v1/models` to list the actual model IDs accepted by the API.
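For example, a request like this lists the accepted IDs (a sketch; it assumes the same base host as the chat endpoint and a valid key exported as `GRANITE_API_KEY`):

```bash
# List the model IDs the API actually accepts
curl -s \
  -H "Authorization: Bearer $GRANITE_API_KEY" \
  https://granite-3-8b-instruct-maas-apicast-production.apps.prod.rhoai.rh-aiservices-bu.com:443/v1/models
```

In an OpenAI-compatible response, the `id` field of each entry under `data` is the exact string to pass as the `model` parameter.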
**Request Format:** The API is OpenAI-compatible, so the request format is the same as OpenAI’s `chat/completions` endpoint.
**Working Example:** Through trial and error, I arrived at a working cURL and Python example using the correct model name and endpoint.
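The cURL version looked roughly like this (same endpoint, model ID, and payload as the Python script below; the key is assumed to be exported as `GRANITE_API_KEY`):

```bash
# Chat completion request with the corrected model ID
curl -s https://granite-3-8b-instruct-maas-apicast-production.apps.prod.rhoai.rh-aiservices-bu.com:443/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRANITE_API_KEY" \
  -d '{
    "model": "granite-3-8b-instruct",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'
```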
Here’s how I got a working Python script that securely handles API keys using a `.env` file:
- **Create a `.env` file** (in your project directory):

  ```
  GRANITE_API_KEY=your_api_key_here
  GRANITE_API_URL=https://granite-3-8b-instruct-maas-apicast-production.apps.prod.rhoai.rh-aiservices-bu.com:443/v1/chat/completions
  ```
- **Install dependencies** (in your virtual environment):

  ```bash
  pip install requests python-dotenv
  ```
- **Create a Python script** (e.g., `inference.py`):

  ```python
  import os

  import requests
  from dotenv import load_dotenv

  # Pull GRANITE_API_KEY and GRANITE_API_URL from the .env file
  load_dotenv()

  API_KEY = os.getenv("GRANITE_API_KEY")
  API_URL = os.getenv("GRANITE_API_URL")

  headers = {
      "Content-Type": "application/json",
      "Authorization": f"Bearer {API_KEY}",
  }

  # OpenAI-compatible chat payload; the model ID must match an entry
  # returned by /v1/models, not the name shown in the docs
  data = {
      "model": "granite-3-8b-instruct",
      "messages": [{"role": "user", "content": "Hello, world!"}],
  }

  response = requests.post(API_URL, headers=headers, json=data)
  print("Status Code:", response.status_code)
  print("Response:", response.text)
  ```
- **Run your script:**

  ```bash
  python inference.py
  ```
This approach keeps your API key out of your code and lets you easily swap endpoints or keys as needed.
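Because `load_dotenv()` does not override variables that are already set in the environment, you can also swap a key or endpoint for a single run without editing the `.env` file (the URL below is just a placeholder):

```bash
# Real environment variables take precedence over .env values,
# so a one-off endpoint can be supplied inline
GRANITE_API_URL=https://some-other-endpoint.example.com/v1/chat/completions python inference.py
```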
| Step | Documentation Provided | What Actually Worked / Discovered |
|---|---|---|
| Endpoint URL | Yes | Yes |
| Model Name | Ambiguous | Had to call `/v1/models` to get the exact ID |
| Example Request | No | Had to construct and test |
| Auth/API Key Usage | Implied | Standard Bearer token worked |
| OpenAI Compatibility | Implied | Confirmed by request/response format |
The documentation gave me the pieces, but not a clear, working example. The most critical gap was the mismatch between the documented model name and the actual `model` parameter required by the API. I had to experiment and inspect the `/v1/models` response to discover the correct value.