Vultr Inference API (1.2.0)

Download OpenAPI specification:Download

Chat

Create Chat Completion

Create Chat Completion on a specified text generation model.

Authorizations:
API Key
Request Body schema: application/json
model
required
string

The model that will be inferred for chat completion.

required
Array of objects (Chat Completion Message)

The message context to use for the chat completion request, separated by system, user, and assistant roles.

max_tokens
integer
Default: 512

The maximum number of tokens to generate for the chat completion.

seed
integer
Default: -1

If you would like a different response from the same message, changing the seed will change the response. -1 generates a random seed.

temperature
number <float> [ 0 .. 2 ]
Default: 0.8

A value between 0.0 and 2.0 that controls the randomness of the model's output. When set closer to 1, such as 0.8, the outcome is more unpredictable and creative. Values nearing 0, like 0.2, produce more predictable and less creative results. Setting temperature to zero is equivalent to setting a seed, enabling deterministic testing.

top_k
number <float> [ 0 .. 100 ]
Default: 40

A value between 0 and 100 that controls how many tokens are considered. A higher value will result in more diverse outputs, while a lower value will result in more repetitive outputs.

top_p
number <float> [ 0 .. 1 ]
Default: 0.9

A value between 0.0 and 1.0 that controls the probability of the model generating a particular token. A higher value will result in more diverse outputs, while a lower value will result in more repetitive outputs.

stream
boolean

Indicates whether the response should be streamed.

Responses

Request samples

Content type
application/json
{
  • "model": "string",
  • "messages": [
    ],
  • "max_tokens": 512,
  • "seed": -1,
  • "temperature": 0.8,
  • "top_k": 40,
  • "top_p": 0.9,
  • "stream": true
}

Response samples

Content type
application/json
Example
{
  • "id": "string",
  • "created": 0,
  • "model": "string",
  • "choices": [
    ],
  • "usage": {
    }
}

RAG Chat Completion

Create Chat Completion on a specified text generation model with retrieval-augmented generation, utilizing the context of relevant search results from items or files in a vectore store collection.

Authorizations:
API Key
Request Body schema: application/json
collection
required
string

The vector store collection to search for relevant context.

model
required
string

The model that will be inferred for chat completion.

required
Array of objects (Chat Completion Message)

The message context to use for the chat completion request, separated by system, user, and assistant roles.

max_tokens
integer
Default: 512

The maximum number of tokens to generate for the chat completion. Does not include token usage for the vector search embeddings operation.

seed
integer
Default: -1

If you would like a different response from the same message, changing the seed will change the response. -1 generates a random seed.

temperature
number <float> [ 0 .. 2 ]
Default: 0.8

A value between 0.0 and 2.0 that controls the randomness of the model's output. When set closer to 1, such as 0.8, the outcome is more unpredictable and creative. Values nearing 0, like 0.2, produce more predictable and less creative results. Setting temperature to zero is equivalent to setting a seed, enabling deterministic testing.

top_k
number <float> [ 0 .. 100 ]
Default: 40

A value between 0 and 100 that controls how many tokens are considered. A higher value will result in more diverse outputs, while a lower value will result in more repetitive outputs.

top_p
number <float> [ 0 .. 1 ]
Default: 0.9

A value between 0.0 and 1.0 that controls the probability of the model generating a particular token. A higher value will result in more diverse outputs, while a lower value will result in more repetitive outputs.

stream
boolean

Indicates whether the response should be streamed.

Responses

Request samples

Content type
application/json
{
  • "collection": "string",
  • "model": "string",
  • "messages": [
    ],
  • "max_tokens": 512,
  • "seed": -1,
  • "temperature": 0.8,
  • "top_k": 40,
  • "top_p": 0.9,
  • "stream": true
}

Response samples

Content type
application/json
Example
{
  • "id": "string",
  • "created": 0,
  • "model": "string",
  • "choices": [
    ],
  • "usage": {
    }
}

Get Chat Completion

Retrieve a Chat Completion job by the ID.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the chat completion job.

Responses

Response samples

Content type
application/json
{
  • "id": "string",
  • "created": 0,
  • "model": "string",
  • "choices": [
    ],
  • "usage": {
    }
}

Rate Chat Completion

Rate a Chat Completion job by the ID.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the chat completion job.

Request Body schema: application/json
vote
required
string

Your positive or negative rating for this chat completion as up or down.

Responses

Request samples

Content type
application/json
{
  • "vote": "string"
}

Audio

Create Speech

Generates speech audio from input text.

Authorizations:
API Key
Request Body schema: application/json
model
required
string

The model that will be used to generate text-to-speech audio.

input
required
string

The text to generate audio for, up to a maximum of 2,000 characters.

voice
required
string

The voice that will be used in the generated audio.

Responses

Request samples

Content type
application/json
{
  • "model": "string",
  • "input": "string",
  • "voice": "string"
}

List Audio Voices

Get a list of voices for speech generation.

Authorizations:
API Key

Responses

Response samples

Content type
application/json
{
  • "voices": [
    ]
}

Embeddings

Create Embeddings

Creates an embedding vector representing the given input text.

Authorizations:
API Key
Request Body schema: application/json
One of
input
required
string

The input text to embed.

model
required
string

The model that will be used to generate embeddings.

encoding_format
string
Default: "float"
Enum: "float" "base64"

The format to return embeddings in.

Responses

Request samples

Content type
application/json
Example
{
  • "input": "string",
  • "model": "string",
  • "encoding_format": "float"
}

Response samples

Content type
application/json
{
  • "object": "string",
  • "data": [
    ],
  • "model": "string",
  • "usage": {
    }
}

Vector Store

List Collections

Retrieve a list of vector store collections.

Authorizations:
API Key

Responses

Response samples

Content type
application/json
{
  • "collections": [
    ]
}

Create Collection

Creates a vector store collection for searchable embeddings.

Authorizations:
API Key
Request Body schema: application/json
name
required
string

The name of the vector store collection. This is also used to auto-generate a unique ID for the record.

Responses

Request samples

Content type
application/json
{
  • "name": "string"
}

Response samples

Content type
application/json
{
  • "collection": {
    }
}

Get Collection

Retrieve a vector store collection by the ID.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

Responses

Response samples

Content type
application/json
{
  • "collection": {
    }
}

Update Collection

Updates a vector store collection record.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

Request Body schema: application/json
name
required
string

The name of the vector store collection. Note: the previously generated unique ID will remain the same.

Responses

Request samples

Content type
application/json
{
  • "name": "string"
}

Response samples

Content type
application/json
{
  • "collection": {
    }
}

Delete Collection

Deletes a vector store collection record. This will also remove all items in the collection.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

Responses

Search Collection

Searches items in a vector store collection for the closest embeddings matches.

Authorizations:
API Key
Request Body schema: application/json
input
required
string

The text query to search against the embeddings items in the vector store collection.

Responses

Request samples

Content type
application/json
{
  • "input": "string"
}

Response samples

Content type
application/json
{
  • "results": [
    ],
  • "usage": {
    }
}

List Collection Items

Retrieve a list of items within a vector store collections.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

Responses

Response samples

Content type
application/json
{
  • "items": [
    ]
}

Add Collection Item

Adds an item to a vector store collection.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

Request Body schema: application/json
content
required
string

The text to be converted into embeddings and stored in the vector store collection.

description
string

A description of the contents in this collection item record. If omitted, this value will default to a shortened version of the text stored in the collection.

Responses

Request samples

Content type
application/json
{
  • "content": "string",
  • "description": "string"
}

Response samples

Content type
application/json
{
  • "item": {
    },
  • "usage": {
    }
}

Get Collection Item

Retrieve a vector store collection item by the ID.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

itemid
required
string

The ID of the vector store collection item.

Responses

Response samples

Content type
application/json
{
  • "item": {
    }
}

Update Collection Item

Updates a vector store collection item record.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

itemid
required
string

The ID of the vector store collection item.

Request Body schema: application/json
description
string

A description of the contents in this collection item record.

Responses

Request samples

Content type
application/json
{
  • "description": "string"
}

Response samples

Content type
application/json
{
  • "item": {
    }
}

Delete Collection Item

Deletes a vector store collection item record.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

itemid
required
string

The ID of the vector store collection item.

Responses

List Collection Files

Retrieve a list of files within a vector store collections.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

Responses

Response samples

Content type
application/json
{
  • "files": [
    ]
}

Add Collection File

Adds a file to a vector store collection.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

Request Body schema: multipart/form-data
file
string <binary>

The file object to be uploaded to the vector store collection.

Responses

Response samples

Content type
application/json
{
  • "file": {
    }
}

Get Collection File

Retrieve a vector store collection file by the ID.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

fileid
required
string

The ID of the vector store collection file.

Responses

Response samples

Content type
application/json
{
  • "file": {
    }
}

Delete Collection File

Deletes a vector store collection file record.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

fileid
required
string

The ID of the vector store collection file.

Responses

Models

List All Models

Retrieve a list of all inference models.

Authorizations:
API Key

Responses

Response samples

Content type
application/json
{
  • "object": "string",
  • "data": [
    ]
}

Get Model

Retrieve an inference model by the ID.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the inference model.

Responses

Response samples

Content type
application/json
{
  • "id": "string",
  • "object": "string",
  • "created": "string",
  • "owned_by": "string"
}

List Chat Models

Retrieve a list of models for chat completion inference.

Authorizations:
API Key

Responses

Response samples

Content type
application/json
{
  • "models": [
    ],
  • "private_models": [
    ]
}

List Audio Models

Retrieve a list of models for speech and audio inference.

Authorizations:
API Key

Responses

Response samples

Content type
application/json
{
  • "speech": [
    ]
}

Usage

Get Usage

View usage information for the current and previous months.

Authorizations:
API Key

Responses

Response samples

Content type
application/json
{
  • "current_month": {
    },
  • "previous_month": {
    }
}

Request Logs

Query Requests

Look up requests sent to the Vultr Inference API along with their response details.

Authorizations:
API Key
Request Body schema: application/json
period
required
integer
Enum: 15 30 45 60

The number of minutes to search back from the chosen timestamp, up to the previous hour.

timestamp
string

The UTC timestamp to search request logs from in ISO 8601 format, e.g. 2024-01-01T12:00:00Z. Omit to default to the current UTC time.

endpoint
string

The name of the endpoint to narrow your request search.

Responses

Request samples

Content type
application/json
{
  • "period": 15,
  • "timestamp": "string",
  • "endpoint": "string"
}

Response samples

Content type
application/json
{
  • "requests": [
    ]
}

Health Check

Get Cluster Status

View the current status of the inference cluster.

Authorizations:
API Key

Responses

Response samples

Content type
application/json
{
  • "status": "string"
}