Download OpenAPI specification
Create Chat Completion on a specified text generation model.

model (string, required): The model that will be inferred for chat completion.
messages (Array of objects (Chat Completion Request Message), required): The message context to use for the chat completion request, separated by system, user, and assistant roles.
stream (boolean): Indicates whether the response should be streamed.
max_tokens (integer, default 512): The maximum number of tokens to generate for the chat completion.
n (integer, default 1): The number of chat completion choices to generate for each input message.
seed (integer): Changing the seed for the same message produces a different response. A null value generates a random seed.
temperature (float, 0 to 2, default 1): Controls the randomness of the model's output. Values closer to 1, such as 0.8, produce more unpredictable and creative results, while values nearing 0, like 0.2, produce more predictable and less creative results. Setting temperature to 0 makes the output effectively deterministic, which is useful for testing.
top_p (float, 0 to 1, default 1): Controls nucleus sampling: the model considers only the most likely tokens whose cumulative probability reaches top_p. A higher value produces more diverse outputs, while a lower value produces more repetitive outputs.
frequency_penalty (float, -2 to 2, default 0): Controls how strongly the model penalizes tokens for how often they have already appeared, discouraging repetition.
presence_penalty (float, -2 to 2, default 0): Controls how strongly the model penalizes tokens that have already appeared at all, encouraging the model to introduce new words and topics.
stop (Array of strings): The model stops generating text if it encounters any of these strings.
logprobs (boolean): Whether to return log probabilities of the output tokens. If true, the log probability of each output token is returned in the content of message.
top_logprobs (integer, 0 to 20): The number of most likely tokens to return at each token position, each with an associated log probability.
Request sample:

{
  "model": "string",
  "messages": [
    {
      "role": "system",
      "content": "string"
    }
  ],
  "stream": true,
  "max_tokens": 512,
  "n": 1,
  "seed": 0,
  "temperature": 1,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "stop": [
    "string"
  ],
  "logprobs": true,
  "top_logprobs": 20
}
Response sample:

{
  "id": "string",
  "created": 0,
  "model": "string",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "system",
        "content": "string",
        "tool_calls": [
          {
            "id": "string",
            "type": "string",
            "function": {
              "name": "string",
              "arguments": "string"
            }
          }
        ]
      },
      "logprobs": {
        "content": [
          {
            "token": "string",
            "logprob": 0.1,
            "bytes": [0],
            "top_logprobs": [
              {
                "token": "string",
                "logprob": 0.1,
                "bytes": [0]
              }
            ]
          }
        ]
      },
      "finish_reason": "string"
    }
  ],
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
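As a rough illustration, the request body above can be assembled and sent with Python's standard library. The base URL and endpoint path below are assumptions, not taken from this reference, and the model name is hypothetical; check the OpenAPI specification for the exact values.

```python
import json
import urllib.request

API_BASE = "https://api.vultrinference.com/v1"  # assumed base URL

def build_chat_request(model, messages, **options):
    """Assemble a chat completion request body from the documented fields."""
    payload = {"model": model, "messages": messages}
    # Send only the optional fields the caller set, so the server-side
    # defaults (max_tokens=512, n=1, temperature=1, ...) still apply.
    allowed = {"stream", "max_tokens", "n", "seed", "temperature", "top_p",
               "frequency_penalty", "presence_penalty", "stop",
               "logprobs", "top_logprobs"}
    payload.update({k: v for k, v in options.items() if k in allowed})
    return payload

def create_chat_completion(api_key, payload):
    """POST the payload to the (assumed) chat completions endpoint."""
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request(
    "example-chat-model",  # hypothetical model name
    [{"role": "user", "content": "Hello"}],
    max_tokens=256,
    temperature=0.2,
)
```

Omitting an optional field keeps the server default, which is usually what you want for n and seed.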
Create Chat Completion on a specified text generation model with retrieval-augmented generation, utilizing the context of relevant search results from items or files in a vector store collection.

collection (string, required): The vector store collection to search for relevant context.
model (string, required): The model that will be inferred for chat completion.
messages (Array of objects (Chat Completion Request Message), required): The message context to use for the chat completion request, separated by system, user, and assistant roles.
max_tokens (integer, default 512): The maximum number of tokens to generate for the chat completion.
n (integer, default 1): The number of chat completion choices to generate for each input message.
seed (integer): Changing the seed for the same message produces a different response. A null value generates a random seed.
temperature (float, 0 to 2, default 1): Controls the randomness of the model's output. Values closer to 1, such as 0.8, produce more unpredictable and creative results, while values nearing 0, like 0.2, produce more predictable and less creative results. Setting temperature to 0 makes the output effectively deterministic, which is useful for testing.
top_p (float, 0 to 1, default 1): Controls nucleus sampling: the model considers only the most likely tokens whose cumulative probability reaches top_p. A higher value produces more diverse outputs, while a lower value produces more repetitive outputs.
stop (Array of strings): The model stops generating text if it encounters any of these strings.
frequency_penalty (float, -2 to 2, default 0): Controls how strongly the model penalizes tokens for how often they have already appeared, discouraging repetition.
presence_penalty (float, -2 to 2, default 0): Controls how strongly the model penalizes tokens that have already appeared at all, encouraging the model to introduce new words and topics.
stream (boolean): Indicates whether the response should be streamed.
logprobs (boolean): Whether to return log probabilities of the output tokens. If true, the log probability of each output token is returned in the content of message.
top_logprobs (integer, 0 to 20): The number of most likely tokens to return at each token position, each with an associated log probability.
Request sample:

{
  "collection": "string",
  "model": "string",
  "messages": [
    {
      "role": "system",
      "content": "string"
    }
  ],
  "max_tokens": 512,
  "n": 1,
  "seed": 0,
  "temperature": 1,
  "top_p": 1,
  "stop": [
    "string"
  ],
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "stream": true,
  "logprobs": true,
  "top_logprobs": 20
}
Response sample:

{
  "id": "string",
  "created": 0,
  "model": "string",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "system",
        "content": "string",
        "tool_calls": [
          {
            "id": "string",
            "type": "string",
            "function": {
              "name": "string",
              "arguments": "string"
            }
          }
        ]
      },
      "logprobs": {
        "content": [
          {
            "token": "string",
            "logprob": 0.1,
            "bytes": [0],
            "top_logprobs": [
              {
                "token": "string",
                "logprob": 0.1,
                "bytes": [0]
              }
            ]
          }
        ]
      },
      "finish_reason": "string"
    }
  ],
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
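The RAG variant takes the same body as a plain chat completion plus the collection field. A minimal sketch; the collection and model names here are hypothetical placeholders:

```python
rag_payload = {
    "collection": "product-docs",      # hypothetical collection name
    "model": "example-chat-model",     # hypothetical model name
    "messages": [
        {"role": "user", "content": "What does the refund policy say?"}
    ],
    "max_tokens": 512,
}
# POST this with the same headers as a plain chat completion; the server
# searches the named collection and folds the closest matches into the
# model's context before generating a reply.
```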
Generates speech audio from input text.
model (string, required): The model that will be used to generate text-to-speech audio.
input (string, required): The text to generate audio for, up to a maximum of 2,000 characters.
voice (string, required): The voice that will be used in the generated audio.
Request sample:

{
  "model": "string",
  "input": "string",
  "voice": "string"
}
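A sketch of building this request body, enforcing the documented 2,000-character limit client-side; the model and voice names are placeholders, since this reference does not enumerate the available values:

```python
def build_tts_request(model, text, voice):
    """Assemble a text-to-speech request, enforcing the documented limit."""
    if len(text) > 2000:
        raise ValueError("input is limited to 2,000 characters")
    return {"model": model, "input": text, "voice": voice}

tts_payload = build_tts_request(
    "example-tts-model",               # hypothetical model name
    "Hello from the inference API.",
    "example-voice",                   # hypothetical voice name
)
```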
Creates a vector store collection for searchable embeddings.
name (string, required): The name of the vector store collection. This is also used to auto-generate a unique ID for the record.
Request sample:

{
  "name": "string"
}

Response sample:

{
  "collection": {
    "id": "string",
    "name": "string"
  }
}
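The auto-generated id in the response is what later item and file calls key on, so client code typically captures it right away. A small sketch (the sample response values are illustrative, not from a real call):

```python
def collection_id(response):
    """Extract the auto-generated collection ID from a create response."""
    return response["collection"]["id"]

create_payload = {"name": "product-docs"}  # hypothetical collection name
sample_response = {
    "collection": {
        "id": "product-docs-x7k2",  # illustrative auto-generated ID
        "name": "product-docs",
    }
}
```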
Updates a vector store collection record.
id (string, required): The ID of the vector store collection.
name (string, required): The name of the vector store collection. Note: the previously generated unique ID will remain the same.
Request sample:

{
  "name": "string"
}

Response sample:

{
  "collection": {
    "id": "string",
    "name": "string"
  }
}
Searches items in a vector store collection for the closest embeddings matches.
input (string, required): The text query to search against the embeddings items in the vector store collection.
Request sample:

{
  "input": "string"
}

Response sample:

{
  "results": [
    {
      "id": "string",
      "created": "string",
      "content": "string"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
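Since the response is a ranked list of matches, client code usually just peels off the closest few. A sketch against an illustrative response (the sample values are not from a real call):

```python
def top_matches(response, limit=3):
    """Return the contents of the closest-matching items, best first."""
    return [item["content"] for item in response.get("results", [])[:limit]]

search_payload = {"input": "refund policy"}  # free-text query
sample_response = {
    "results": [
        {"id": "a1", "created": "2024-01-01T00:00:00Z",
         "content": "Refunds are available within 30 days."},
        {"id": "b2", "created": "2024-01-02T00:00:00Z",
         "content": "Shipping takes five business days."},
    ],
    # usage reports the tokens spent embedding the query text
    "usage": {"prompt_tokens": 7, "total_tokens": 7},
}
```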
Retrieve a list of items within a vector store collection.

id (string, required): The ID of the vector store collection.
Response sample:

{
  "items": [
    {
      "id": "string",
      "created": "string",
      "description": "string"
    }
  ]
}
Adds an item to a vector store collection.
id (string, required): The ID of the vector store collection.
content (string, required): The text to be converted into embeddings and stored in the vector store collection.
description (string): A description of the contents in this collection item record. If omitted, this value will default to a shortened version of the text stored in the collection.
Request sample:

{
  "content": "string",
  "description": "string"
}

Response sample:

{
  "item": {
    "id": "string",
    "created": "string",
    "description": "string",
    "content": "string"
  },
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
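Because description defaults server-side to a shortened copy of the content, a client only needs to send it when overriding that behavior. A minimal builder sketch:

```python
def build_item(content, description=None):
    """Assemble an add-item body; omit description to let the API derive one."""
    payload = {"content": content}
    if description is not None:
        payload["description"] = description
    return payload

item_payload = build_item("Refunds are available within 30 days of purchase.")
```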
Retrieve a vector store collection item by ID.

id (string, required): The ID of the vector store collection.
itemid (string, required): The ID of the vector store collection item.
Response sample:

{
  "item": {
    "id": "string",
    "created": "string",
    "description": "string",
    "content": "string"
  }
}
Updates a vector store collection item record.
id (string, required): The ID of the vector store collection.
itemid (string, required): The ID of the vector store collection item.
description (string): A description of the contents in this collection item record.
Request sample:

{
  "description": "string"
}

Response sample:

{
  "item": {
    "id": "string",
    "created": "string",
    "description": "string",
    "content": "string"
  }
}
Retrieve a list of files within a vector store collection.

id (string, required): The ID of the vector store collection.
Response sample:

{
  "files": [
    {
      "id": "string",
      "filename": "string",
      "status": "enqueued",
      "items": 0,
      "tokens": 0
    }
  ]
}
Adds a file to a vector store collection.
id (string, required): The ID of the vector store collection.
file (string <binary>): The file object to be uploaded to the vector store collection.
Response sample:

{
  "file": {
    "id": "string",
    "filename": "string",
    "status": "enqueued",
    "items": 0,
    "tokens": 0
  }
}
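Because the file parameter is binary, this endpoint presumably expects a multipart/form-data upload rather than a JSON body. A hand-rolled multipart sketch using only the standard library; the field name "file" follows the parameter table above, and the boundary string is arbitrary:

```python
def build_multipart(field, filename, data, boundary="x-upload-boundary-7f2a"):
    """Assemble a minimal multipart/form-data body for a single file."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    content_type = f"multipart/form-data; boundary={boundary}"
    return head + data + tail, content_type

body, content_type = build_multipart("file", "notes.txt", b"some document text")
# Send `body` as the POST payload with the Content-Type header set to
# `content_type`; the response reports the file's ingestion status
# ("enqueued") plus item and token counts once processed.
```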
Retrieve a vector store collection file by ID.

id (string, required): The ID of the vector store collection.
fileid (string, required): The ID of the vector store collection file.
Response sample:

{
  "file": {
    "id": "string",
    "filename": "string",
    "status": "enqueued",
    "items": 0,
    "tokens": 0
  }
}
Look up requests sent to the Vultr Inference API along with their response details.
period (integer, required, one of 15, 30, 45, 60): The number of minutes to search back from the chosen timestamp, up to the previous hour.
timestamp (string): The UTC timestamp to search request logs from, in ISO 8601 format (e.g. 2024-05-01T12:00:00Z).
endpoint (string): The name of the endpoint to narrow your request search.
Request sample:

{
  "period": 15,
  "timestamp": "string",
  "endpoint": "string"
}

Response sample:

{
  "requests": [
    {
      "timestamp": "string",
      "method": "string",
      "endpoint": "string",
      "request_headers": "string",
      "request_body": "string",
      "response_body": "string",
      "response_code": 0
    }
  ]
}
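Since period counts back from timestamp, a client can derive both from the current time. A sketch using the standard library; the enum check mirrors the parameter table above:

```python
from datetime import datetime, timezone

def build_log_query(minutes_back=15, endpoint=None):
    """Assemble a request-log query covering the last `minutes_back` minutes."""
    if minutes_back not in (15, 30, 45, 60):
        raise ValueError("period must be one of 15, 30, 45, 60")
    query = {
        "period": minutes_back,
        # ISO 8601 UTC timestamp, as the timestamp field expects
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    if endpoint:
        query["endpoint"] = endpoint  # optional endpoint-name filter
    return query

log_query = build_log_query(30, endpoint="chat/completions")
```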