
DeepInfra

https://deepinfra.com/

tip

We support ALL DeepInfra models. Prefix the model name with deepinfra/, i.e. model=deepinfra/<any-model-on-deepinfra>, when sending LiteLLM requests.


API Key

# env variable
os.environ['DEEPINFRA_API_KEY']

Sample Usage

from litellm import completion
import os

os.environ['DEEPINFRA_API_KEY'] = ""
response = completion(
    model="deepinfra/meta-llama/Llama-2-70b-chat-hf",
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
)

Sample Usage - Streaming

from litellm import completion
import os

os.environ['DEEPINFRA_API_KEY'] = ""
response = completion(
    model="deepinfra/meta-llama/Llama-2-70b-chat-hf",
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}],
    stream=True
)

for chunk in response:
    print(chunk)
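Each streamed chunk follows the OpenAI-style delta format, so the full completion text can be assembled by concatenating the `delta.content` fields. A minimal sketch, using stand-in chunk dicts in place of live API responses (a real stream yields response objects with the same shape):

```python
# Stand-in chunks mimicking the OpenAI-style streaming delta format;
# a live DeepInfra stream yields objects with this same structure.
chunks = [
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": " from LiteLLM"}}]},
    {"choices": [{"delta": {}}]},  # the final chunk may carry no content
]

# Concatenate the content fields, skipping chunks without content
full_text = "".join(
    c["choices"][0]["delta"].get("content") or ""
    for c in chunks
)
print(full_text)  # → Hello from LiteLLM
```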

Chat Models

| Model Name | Function Call |
| --- | --- |
| meta-llama/Meta-Llama-3-8B-Instruct | `completion(model="deepinfra/meta-llama/Meta-Llama-3-8B-Instruct", messages)` |
| meta-llama/Meta-Llama-3-70B-Instruct | `completion(model="deepinfra/meta-llama/Meta-Llama-3-70B-Instruct", messages)` |
| meta-llama/Llama-2-70b-chat-hf | `completion(model="deepinfra/meta-llama/Llama-2-70b-chat-hf", messages)` |
| meta-llama/Llama-2-7b-chat-hf | `completion(model="deepinfra/meta-llama/Llama-2-7b-chat-hf", messages)` |
| meta-llama/Llama-2-13b-chat-hf | `completion(model="deepinfra/meta-llama/Llama-2-13b-chat-hf", messages)` |
| codellama/CodeLlama-34b-Instruct-hf | `completion(model="deepinfra/codellama/CodeLlama-34b-Instruct-hf", messages)` |
| mistralai/Mistral-7B-Instruct-v0.1 | `completion(model="deepinfra/mistralai/Mistral-7B-Instruct-v0.1", messages)` |
| jondurbin/airoboros-l2-70b-gpt4-1.4.1 | `completion(model="deepinfra/jondurbin/airoboros-l2-70b-gpt4-1.4.1", messages)` |

Rerank Endpoint

LiteLLM provides a Cohere-API-compatible /rerank endpoint for DeepInfra rerank models.

Supported Rerank Models

| Model Name | Description |
| --- | --- |
| deepinfra/Qwen/Qwen3-Reranker-0.6B | Lightweight rerank model (0.6B parameters) |
| deepinfra/Qwen/Qwen3-Reranker-4B | Medium rerank model (4B parameters) |
| deepinfra/Qwen/Qwen3-Reranker-8B | Large rerank model (8B parameters) |

Usage - LiteLLM Python SDK

from litellm import rerank
import os

os.environ["DEEPINFRA_API_KEY"] = "your-api-key"

response = rerank(
    model="deepinfra/Qwen/Qwen3-Reranker-0.6B",
    query="What is the capital of France?",
    documents=[
        "Paris is the capital of France.",
        "London is the capital of the United Kingdom.",
        "Berlin is the capital of Germany.",
        "Madrid is the capital of Spain.",
        "Rome is the capital of Italy."
    ]
)
print(response)

Supported Cohere Rerank API Params

| Param | Type | Description |
| --- | --- | --- |
| query | str | The query to rerank the documents against |
| documents | list[str] | The documents to rerank |

Provider-specific parameters

Pass any DeepInfra-specific parameters as keyword arguments to the rerank function, e.g.

response = rerank(
    model="deepinfra/Qwen/Qwen3-Reranker-0.6B",
    query="What is the capital of France?",
    documents=[
        "Paris is the capital of France.",
        "London is the capital of the United Kingdom.",
        "Berlin is the capital of Germany.",
        "Madrid is the capital of Spain.",
        "Rome is the capital of Italy."
    ],
    my_custom_param="my_custom_value",  # any other DeepInfra-specific parameters
)

Response Format

{
  "id": "request-id",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.9975274205207825
    },
    {
      "index": 1,
      "relevance_score": 0.011687257327139378
    }
  ],
  "meta": {
    "billed_units": {
      "total_tokens": 427
    },
    "tokens": {
      "input_tokens": 427,
      "output_tokens": 0
    }
  }
}
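Each result carries the index of a document in the original documents list, so the reranked order can be recovered by sorting on relevance_score and mapping each index back. A minimal sketch using a plain dict in the shape shown above (scores illustrative; a live `litellm.rerank` call returns a response object rather than a dict):

```python
# Example response in the shape shown above (scores illustrative)
response = {
    "id": "request-id",
    "results": [
        {"index": 1, "relevance_score": 0.0117},
        {"index": 0, "relevance_score": 0.9975},
    ],
    "meta": {"billed_units": {"total_tokens": 427}},
}

documents = [
    "Paris is the capital of France.",
    "London is the capital of the United Kingdom.",
]

# Sort results by relevance and map each index back to its document
ranked = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
for r in ranked:
    print(f'{r["relevance_score"]:.4f}  {documents[r["index"]]}')
# → 0.9975  Paris is the capital of France.
# → 0.0117  London is the capital of the United Kingdom.
```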