Vultr Inference API (1.3.0)

Download OpenAPI specification:Download

Chat

Create Chat Completion

Create Chat Completion on a specified text generation model.

Authorizations:
API Key
Request Body schema: application/json
model
required
string

The model that will be inferred for chat completion.

required
Array of objects (Chat Completion Request Message)

The message context to use for the chat completion request, separated by system, user, and assistant roles.

stream
boolean

Indicates whether the response should be streamed.

max_tokens
integer
Default: 512

The maximum number of tokens to generate for the chat completion.

n
integer
Default: 1

The number of chat completion choices to generate for each input message.

seed
integer

If you would like a different response from the same message, changing the seed will change the response. A null value generates a random seed.

temperature
number <float> [ 0 .. 2 ]
Default: 1

A value between 0.0 and 2.0 that controls the randomness of the model's output. When set closer to 1, such as 0.8, the outcome is more unpredictable and creative. Values nearing 0, like 0.2, produce more predictable and less creative results. Setting temperature to zero is equivalent to setting a seed, enabling deterministic testing.

top_p
number <float> [ 0 .. 1 ]
Default: 1

A value between 0.0 and 1.0 that controls the probability of the model generating a particular token. A higher value will result in more diverse outputs, while a lower value will result in more repetitive outputs.

frequency_penalty
number <float> [ -2 .. 2 ]
Default: 0

A value between -2.0 and 2.0 that controls how much the model penalizes generating repetitive responses.

presence_penalty
number <float> [ -2 .. 2 ]
Default: 0

A value between -2.0 and 2.0 that controls how much the model penalizes generating responses that contain certain words or phrases.

stop
Array of strings

A list of strings that the model will stop generating text if it encounters any of them.

logprobs
boolean

Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.

top_logprobs
integer [ 0 .. 20 ]

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

Responses

Request samples

Content type
application/json
{
  • "model": "string",
  • "messages": [
    ],
  • "stream": true,
  • "max_tokens": 512,
  • "n": 1,
  • "seed": 0,
  • "temperature": 1,
  • "top_p": 1,
  • "frequency_penalty": 0,
  • "presence_penalty": 0,
  • "stop": [
    ],
  • "logprobs": true,
  • "top_logprobs": 20
}

Response samples

Content type
application/json
Example
{
  • "id": "string",
  • "created": 0,
  • "model": "string",
  • "choices": [
    ],
  • "usage": {
    }
}

RAG Chat Completion

Create Chat Completion on a specified text generation model with retrieval-augmented generation, utilizing the context of relevant search results from items or files in a vectore store collection.

Authorizations:
API Key
Request Body schema: application/json
collection
required
string

The vector store collection to search for relevant context.

model
required
string

The model that will be inferred for chat completion.

required
Array of objects (Chat Completion Request Message)

The message context to use for the chat completion request, separated by system, user, and assistant roles.

max_tokens
integer
Default: 512

The maximum number of tokens to generate for the chat completion.

n
integer
Default: 1

The number of chat completion choices to generate for each input message.

seed
integer

If you would like a different response from the same message, changing the seed will change the response. A null value generates a random seed.

temperature
number <float> [ 0 .. 2 ]
Default: 1

A value between 0.0 and 2.0 that controls the randomness of the model's output. When set closer to 1, such as 0.8, the outcome is more unpredictable and creative. Values nearing 0, like 0.2, produce more predictable and less creative results. Setting temperature to zero is equivalent to setting a seed, enabling deterministic testing.

top_p
number <float> [ 0 .. 1 ]
Default: 1

A value between 0.0 and 1.0 that controls the probability of the model generating a particular token. A higher value will result in more diverse outputs, while a lower value will result in more repetitive outputs.

stop
Array of strings

A list of strings that the model will stop generating text if it encounters any of them.

frequency_penalty
number <float> [ -2 .. 2 ]
Default: 0

A value between -2.0 and 2.0 that controls how much the model penalizes generating repetitive responses.

presence_penalty
number <float> [ -2 .. 2 ]
Default: 0

A value between -2.0 and 2.0 that controls how much the model penalizes generating responses that contain certain words or phrases.

stream
boolean

Indicates whether the response should be streamed.

logprobs
boolean

Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.

top_logprobs
integer [ 0 .. 20 ]

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

Responses

Request samples

Content type
application/json
{
  • "collection": "string",
  • "model": "string",
  • "messages": [
    ],
  • "max_tokens": 512,
  • "n": 1,
  • "seed": 0,
  • "temperature": 1,
  • "top_p": 1,
  • "stop": [
    ],
  • "frequency_penalty": 0,
  • "presence_penalty": 0,
  • "stream": true,
  • "logprobs": true,
  • "top_logprobs": 20
}

Response samples

Content type
application/json
Example
{
  • "id": "string",
  • "created": 0,
  • "model": "string",
  • "choices": [
    ],
  • "usage": {
    }
}

Audio

Create Speech

Generates speech audio from input text.

Authorizations:
API Key
Request Body schema: application/json
model
required
string

The model that will be used to generate text-to-speech audio.

input
required
string

The text to generate audio for, up to a maximum of 2,000 characters.

voice
required
string

The voice that will be used in the generated audio.

Responses

Request samples

Content type
application/json
{
  • "model": "string",
  • "input": "string",
  • "voice": "string"
}

List Audio Voices

Get a list of voices for speech generation.

Authorizations:
API Key

Responses

Response samples

Content type
application/json
{
  • "voices": [
    ]
}

Vector Store

List Collections

Retrieve a list of vector store collections.

Authorizations:
API Key

Responses

Response samples

Content type
application/json
{
  • "collections": [
    ]
}

Create Collection

Creates a vector store collection for searchable embeddings.

Authorizations:
API Key
Request Body schema: application/json
name
required
string

The name of the vector store collection. This is also used to auto-generate a unique ID for the record.

Responses

Request samples

Content type
application/json
{
  • "name": "string"
}

Response samples

Content type
application/json
{
  • "collection": {
    }
}

Get Collection

Retrieve a vector store collection by the ID.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

Responses

Response samples

Content type
application/json
{
  • "collection": {
    }
}

Update Collection

Updates a vector store collection record.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

Request Body schema: application/json
name
required
string

The name of the vector store collection. Note: the previously generated unique ID will remain the same.

Responses

Request samples

Content type
application/json
{
  • "name": "string"
}

Response samples

Content type
application/json
{
  • "collection": {
    }
}

Delete Collection

Deletes a vector store collection record. This will also remove all items in the collection.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

Responses

Search Collection

Searches items in a vector store collection for the closest embeddings matches.

Authorizations:
API Key
Request Body schema: application/json
input
required
string

The text query to search against the embeddings items in the vector store collection.

Responses

Request samples

Content type
application/json
{
  • "input": "string"
}

Response samples

Content type
application/json
{
  • "results": [
    ],
  • "usage": {
    }
}

List Collection Items

Retrieve a list of items within a vector store collections.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

Responses

Response samples

Content type
application/json
{
  • "items": [
    ]
}

Add Collection Item

Adds an item to a vector store collection.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

Request Body schema: application/json
content
required
string

The text to be converted into embeddings and stored in the vector store collection.

description
string

A description of the contents in this collection item record. If omitted, this value will default to a shortened version of the text stored in the collection.

Responses

Request samples

Content type
application/json
{
  • "content": "string",
  • "description": "string"
}

Response samples

Content type
application/json
{
  • "item": {
    },
  • "usage": {
    }
}

Get Collection Item

Retrieve a vector store collection item by the ID.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

itemid
required
string

The ID of the vector store collection item.

Responses

Response samples

Content type
application/json
{
  • "item": {
    }
}

Update Collection Item

Updates a vector store collection item record.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

itemid
required
string

The ID of the vector store collection item.

Request Body schema: application/json
description
string

A description of the contents in this collection item record.

Responses

Request samples

Content type
application/json
{
  • "description": "string"
}

Response samples

Content type
application/json
{
  • "item": {
    }
}

Delete Collection Item

Deletes a vector store collection item record.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

itemid
required
string

The ID of the vector store collection item.

Responses

List Collection Files

Retrieve a list of files within a vector store collections.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

Responses

Response samples

Content type
application/json
{
  • "files": [
    ]
}

Add Collection File

Adds a file to a vector store collection.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

Request Body schema: multipart/form-data
file
string <binary>

The file object to be uploaded to the vector store collection.

Responses

Response samples

Content type
application/json
{
  • "file": {
    }
}

Get Collection File

Retrieve a vector store collection file by the ID.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

fileid
required
string

The ID of the vector store collection file.

Responses

Response samples

Content type
application/json
{
  • "file": {
    }
}

Delete Collection File

Deletes a vector store collection file record.

Authorizations:
API Key
path Parameters
id
required
string

The ID of the vector store collection.

fileid
required
string

The ID of the vector store collection file.

Responses

Models

List Models

Retrieve a list of inference models.

Authorizations:
API Key

Responses

Response samples

Content type
application/json
{
  • "object": "string",
  • "data": [
    ]
}

Get Model

Retrieves a specific inference model.

Authorizations:
API Key

Responses

Response samples

Content type
application/json
{
  • "id": "string",
  • "created": "string",
  • "object": "string",
  • "owned_by": "string",
  • "features": [
    ]
}

List Audio Models

Retrieve a list of models for speech and audio inference.

Authorizations:
API Key

Responses

Response samples

Content type
application/json
{
  • "speech": [
    ]
}

Usage

Get Usage

View usage information for the current and previous months.

Authorizations:
API Key

Responses

Response samples

Content type
application/json
{
  • "current_month": {
    },
  • "previous_month": {
    }
}

Request Logs

Query Requests

Look up requests sent to the Vultr Inference API along with their response details.

Authorizations:
API Key
Request Body schema: application/json
period
required
integer
Enum: 15 30 45 60

The number of minutes to search back from the chosen timestamp, up to the previous hour.

timestamp
string

The UTC timestamp to search request logs from in ISO 8601 format, e.g. 2024-01-01T12:00:00Z. Omit to default to the current UTC time.

endpoint
string

The name of the endpoint to narrow your request search.

Responses

Request samples

Content type
application/json
{
  • "period": 15,
  • "timestamp": "string",
  • "endpoint": "string"
}

Response samples

Content type
application/json
{
  • "requests": [
    ]
}

Health Check

Get Cluster Status

View the current status of the inference cluster.

Authorizations:
API Key

Responses

Response samples

Content type
application/json
{
  • "status": "string"
}