

@Straafe
Created November 14, 2023 21:15
Claude 2's attempt to create basic documentation for ooba's OpenAI-based API scripts

Here is some documentation for the OpenAI-compatible API endpoints:

Completions

/v1/completions

Generates text completions for the provided prompt.

Parameters:

  • prompt (required): The prompt to generate completions for, as a string or list of strings.

  • model: Unused parameter. To change the model, use the /v1/internal/model/load endpoint.

  • stream: If true, will stream back partial responses as text is generated.

  • max_tokens: The maximum number of tokens to generate.

  • temperature: Sampling temperature, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

  • top_p: Nucleus sampling, an alternative to temperature sampling: only the tokens making up the top top_p probability mass are considered.

  • echo: If true, the prompt will be included in the completion.

  • stop: Up to 4 sequences where generation will stop if any are matched.

See GenerationOptions in typing.py for other generation parameters.

Returns:

  • id: ID of the completion.

  • choices: List containing the generated completions.

  • usage: Number of prompt tokens, completion tokens, and total tokens used.
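
As a minimal sketch of calling this endpoint with Python's requests library, assuming the API server is listening at http://127.0.0.1:5000 (adjust the address to your setup):

```python
import requests

BASE_URL = "http://127.0.0.1:5000"  # assumed local address of the API server

response = requests.post(
    f"{BASE_URL}/v1/completions",
    json={
        "prompt": "Once upon a time",
        "max_tokens": 64,
        "temperature": 0.7,
        "stop": ["\n\n"],
    },
)
result = response.json()
print(result["choices"][0]["text"])  # the generated continuation
print(result["usage"])               # prompt/completion/total token counts
```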

/v1/chat/completions

Generates chat message completions based on a provided chat history.

Parameters:

  • messages (required): Chat history as a list of messages with role (user, assistant) and content.

  • model: Unused parameter. To change the model, use the /v1/internal/model/load endpoint.

  • stream: If true, will stream back partial responses as text is generated.

  • mode: One of instruct, chat, or chat-instruct. Controls whether the reply follows the instruction template, stays in character, or both.

  • instruction_template: Name of the instruction template file to use.

  • character: Name of the character file to use for the assistant.

See ChatCompletionRequest in typing.py for other parameters.

Returns:

Same structure as /v1/completions (id, choices, usage), except each choice contains a message with role and content rather than plain text.
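
A rough sketch of a chat request, again assuming a local server at http://127.0.0.1:5000:

```python
import requests

BASE_URL = "http://127.0.0.1:5000"  # assumed

history = [
    {"role": "user", "content": "Explain nucleus sampling in one sentence."},
]

response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={"messages": history, "mode": "instruct", "max_tokens": 128},
)
reply = response.json()["choices"][0]["message"]["content"]
print(reply)
```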

Models

/v1/models

Lists the currently available models.

/v1/models/{model_id}

Gets information about the specified model.
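
A quick sketch of both model endpoints (the model ID shown is a placeholder):

```python
import requests

BASE_URL = "http://127.0.0.1:5000"  # assumed

# List all available models
print(requests.get(f"{BASE_URL}/v1/models").json())

# Get information about one model by ID (ID below is hypothetical)
print(requests.get(f"{BASE_URL}/v1/models/my-model").json())
```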

Billing

/v1/billing/usage

Gets usage statistics for billing purposes.

Parameters:

  • start_date: Start date for usage stats, in YYYY-MM-DD format.

  • end_date: End date for usage stats, in YYYY-MM-DD format.

Returns:

  • total_usage: Total token usage during the specified period.
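
A minimal sketch, assuming the dates are passed as query parameters:

```python
import requests

BASE_URL = "http://127.0.0.1:5000"  # assumed

usage = requests.get(
    f"{BASE_URL}/v1/billing/usage",
    params={"start_date": "2023-11-01", "end_date": "2023-11-14"},
).json()
print(usage["total_usage"])
```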

Audio

/v1/audio/transcriptions

Transcribes an audio file using Whisper.

Parameters:

  • file (required): The audio file to transcribe.

  • language: Language spoken in the audio.

  • model: Whisper model to use, tiny or base.

Returns:

  • text: Transcription text.
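
A sketch of a transcription request, assuming the audio is uploaded as multipart form data (the file name is hypothetical):

```python
import requests

BASE_URL = "http://127.0.0.1:5000"  # assumed

with open("sample.wav", "rb") as audio:  # placeholder file
    response = requests.post(
        f"{BASE_URL}/v1/audio/transcriptions",
        files={"file": audio},
        data={"language": "en", "model": "base"},
    )
print(response.json()["text"])
```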

Images

/v1/images/generations

Generates images using Stable Diffusion.

Parameters:

  • prompt (required): The text prompt to generate images for.

  • size: Size of images to generate, like 512x512.

  • n: Number of images to generate.

Returns:

  • data: List of generated images.
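
A sketch of an image generation request; the exact contents of each entry in data (URL vs. base64 payload) depend on the image backend, so this only checks how many images came back:

```python
import requests

BASE_URL = "http://127.0.0.1:5000"  # assumed

response = requests.post(
    f"{BASE_URL}/v1/images/generations",
    json={"prompt": "a watercolor lighthouse at dusk", "size": "512x512", "n": 1},
)
images = response.json()["data"]
print(f"received {len(images)} image(s)")
```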

Embeddings

/v1/embeddings

Gets sentence embeddings for the provided input text.

Parameters:

  • input (required): Input text to get embeddings for, as string or list of strings.

  • encoding_format: float or base64.

Returns:

  • object: list

  • data: List of embeddings, one for each input.
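
A minimal sketch, assuming each entry in data carries its vector under an embedding key (as in the OpenAI response format):

```python
import requests

BASE_URL = "http://127.0.0.1:5000"  # assumed

response = requests.post(
    f"{BASE_URL}/v1/embeddings",
    json={"input": ["first sentence", "second sentence"]},
)
vectors = [item["embedding"] for item in response.json()["data"]]
print(len(vectors), "embeddings of dimension", len(vectors[0]))
```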

Moderations

/v1/moderations

Checks input text for harmful content.

Parameters:

  • input (required): Input text to moderate.

Returns:

  • results: List of moderation results, one for each input text.
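
A quick sketch of a moderation request:

```python
import requests

BASE_URL = "http://127.0.0.1:5000"  # assumed

result = requests.post(
    f"{BASE_URL}/v1/moderations",
    json={"input": "Some text to check for harmful content."},
).json()
print(result["results"])
```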

Encoding/Decoding

/v1/internal/encode

Encodes text into tokens.

Parameters:

  • text (required): Text to encode.

/v1/internal/decode

Decodes tokens into text.

Parameters:

  • tokens (required): Tokens to decode.

/v1/internal/token-count

Gets the number of tokens for text.

Parameters:

  • text (required): Text to get token count for.
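
A sketch covering all three token endpoints; the response field names are not documented here and are assumed to mirror the request schema, so check typing.py for the exact shapes:

```python
import requests

BASE_URL = "http://127.0.0.1:5000"  # assumed

text = "Hello world"

encoded = requests.post(f"{BASE_URL}/v1/internal/encode", json={"text": text}).json()
print(encoded)  # expected to contain the token IDs

decoded = requests.post(
    f"{BASE_URL}/v1/internal/decode", json={"tokens": encoded.get("tokens", [])}
).json()
print(decoded)  # expected to contain the reconstructed text

count = requests.post(f"{BASE_URL}/v1/internal/token-count", json={"text": text}).json()
print(count)    # expected to contain the token count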

Internal

/v1/internal/model/info

Gets information about the currently loaded model.

/v1/internal/model/load

Loads a new model. Can be used to switch models on the fly.

Parameters:

  • model_name (required): Name of model to load.

  • args: Dict of args to pass to model loading.

  • settings: Dict of settings to update after loading.

/v1/internal/stop-generation

Stops any ongoing text generation.
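
A sketch of switching models and then checking what is loaded; the model name, args, and settings are placeholders:

```python
import requests

BASE_URL = "http://127.0.0.1:5000"  # assumed

# Load a different model on the fly (name and options are illustrative only)
requests.post(
    f"{BASE_URL}/v1/internal/model/load",
    json={
        "model_name": "my-model",
        "args": {},      # loader arguments, if any
        "settings": {},  # settings to update after loading, if any
    },
)

# Confirm which model is now loaded
print(requests.get(f"{BASE_URL}/v1/internal/model/info").json())

# Interrupt an in-progress generation if needed
requests.post(f"{BASE_URL}/v1/internal/stop-generation")
```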

In summary, the main endpoints are /v1/completions and /v1/chat/completions for text and chat generation, and /v1/models for listing models. The /v1/internal endpoints can be used to switch models, encode/decode tokens, and more.

Straafe commented Nov 14, 2023

Tried again, this time feeding it the entire repo, and it came up with these:

Here are the API endpoints I identified in the code along with their associated parameters and python file/function references:

POST /v1/completions

Parameters:

  • prompt (str|List[str]) - Prompt text for text completion. See completions_common() in extensions/openai/completions.py.

  • stream (bool) - Whether to stream back partial responses. See completions_common() in extensions/openai/completions.py.

POST /v1/chat/completions

Parameters:

  • messages (List[dict]) - Chat history. See chat_completions_common() in extensions/openai/completions.py.

  • stream (bool) - Whether to stream back partial responses. See chat_completions_common() in extensions/openai/completions.py.
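
Since both completion endpoints accept stream, here is a sketch of consuming the streamed response, assuming it follows the OpenAI-style server-sent events format (data: lines with JSON chunks and a [DONE] sentinel):

```python
import json
import requests

BASE_URL = "http://127.0.0.1:5000"  # assumed

with requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={"messages": [{"role": "user", "content": "Hello!"}], "stream": True},
    stream=True,
) as response:
    for line in response.iter_lines():
        if not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload.strip() == b"[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)
print()
```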

POST /v1/images/generations

Parameters:

  • prompt (str) - Prompt text for image generation. See generations() in extensions/openai/images.py.

POST /v1/embeddings

Parameters:

  • input (List[str]) - Text to encode. See embeddings() in extensions/openai/embeddings.py.

POST /v1/moderations

Parameters:

  • input (str|List[str]) - Text to moderate. See moderations() in extensions/openai/moderations.py.

GET /v1/models

Lists models. See list_models() in extensions/openai/models.py.

GET /v1/models/{model}

Gets model info. See model_info_dict() in extensions/openai/models.py.

POST /v1/internal/model/load

Loads a model. See _load_model() in extensions/openai/models.py.

POST /v1/internal/encode

Encodes text. See token_encode() in extensions/openai/tokens.py.

POST /v1/internal/decode

Decodes tokens. See token_decode() in extensions/openai/tokens.py.

POST /v1/internal/token-count

Gets token count for text. See token_count() in extensions/openai/tokens.py.

GET /v1/internal/model/info

Gets current model info. See get_current_model_info() in extensions/openai/models.py.

Let me know if you need any clarification or have additional questions!
