Infuzu Documentation Help

Request Parameters

This page provides a detailed description of the various parameters that can be included in your Infuzu API request. Understanding these parameters will help you customize your queries and make the most out of the Infuzu API's capabilities.

Basic Parameters

messages (Required)

The core of any Infuzu API request is the messages array. This parameter holds the conversation context, allowing the model to generate appropriate responses based on the provided history.

Each message object in the array should have the following structure:

  • role: Specifies the sender's role. Must be one of "system", "user", or "assistant".

  • content (optional): The text content of the message.

  • content_parts (optional): Used for more advanced configurations like image URLs or input audio, specified as a list.

Message Object Example:

{ "role": "user", "content": "Hello, how are you?" }
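
For multimodal messages, content_parts carries a list of typed parts instead of plain text. The exact part schema is not documented on this page; the example below is a hypothetical sketch assuming an OpenAI-style parts structure, with placeholder type names and URL:

```json
{
  "role": "user",
  "content_parts": [
    { "type": "text", "text": "What is in this image?" },
    { "type": "image_url", "image_url": { "url": "https://example.com/photo.png" } }
  ]
}
```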

model (Optional)

Specifies the model to be used for generating the completion. It can be:

  • A simple string indicating the model name (e.g., "deepinfra.deepseek-r1").

  • An object containing detailed settings and constraints.

Simple Model Example:

{ "model": "openai.o3-mini-2025-01-31" }

Model Configuration

Advanced configurations allow you to fine-tune which model(s) to use and include additional weighting factors. This can be done using the InfuzuModelParams object for the model parameter.

InfuzuModelParams

  • llms (list of strings): Array of model names to specifically include.

  • exclude_llms (list of strings): Array of model names to explicitly exclude.

  • weights (ModelWeights): Object containing custom weights for price, error, start latency, and end latency.

  • imsn (int): Number of model suggestions to consider.

  • max_input_cost (float): Maximum allowed cost per million characters for input.

  • max_output_cost (float): Maximum allowed cost per million characters for output.

ModelWeights

Used to set custom weights for various parameters affecting model selection.

  • price (float, 0-1000): Custom weight for price.

  • error (float, 0-1000): Custom weight for error rates.

  • start_latency (float, 0-1000): Custom weight for start latency.

  • end_latency (float, 0-1000): Custom weight for end latency.

Example of InfuzuModelParams:

{
  "llms": ["openai.o1-2024-12-17", "openai.o1-preview-2024-09-12"],
  "exclude_llms": ["openai.o1-mini-2024-09-12"],
  "weights": { "price": 50, "error": 30, "start_latency": 10, "end_latency": 10 },
  "imsn": 2,
  "max_input_cost": 5,
  "max_output_cost": 10
}

Advanced Configuration using InfuzuModelParams

Using the InfuzuModelParams object allows for in-depth customization of how the Infuzu API selects the model for your requests. The descriptions below detail each field and its usage.

llms

Specify a list of models to prefer in the selection process.

{ "llms": ["anthropic.claude-3-7-sonnet-20250219", "google.gemini-2.0-flash-001"] }

exclude_llms

Specify models that should be excluded from consideration.

{ "exclude_llms": ["xai.grok-2-1212"] }

weights

Customize the importance of factors such as price, error rates, and latency. Quality, as determined by Intelligent Model Selection (IMS), carries a fixed default weight of 100; set the weights of all other factors relative to that baseline.

{ "weights": { "price": 70, "error": 20, "start_latency": 5, "end_latency": 5 } }
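
To build intuition for how these weights interact, here is a hypothetical sketch of a weighted scoring function. The actual IMS scoring algorithm is not documented on this page; this assumes a simple linear model in which quality is scaled by its fixed default weight of 100 and each normalized cost factor (0-1) is penalized in proportion to its custom weight:

```python
def score_model(quality, price, error_rate, start_latency, end_latency, weights):
    """Hypothetical linear score: higher is better.

    quality carries the fixed default weight of 100; each normalized
    factor in [0, 1] is subtracted according to its custom weight.
    """
    return (
        100 * quality
        - weights["price"] * price
        - weights["error"] * error_rate
        - weights["start_latency"] * start_latency
        - weights["end_latency"] * end_latency
    )

weights = {"price": 70, "error": 20, "start_latency": 5, "end_latency": 5}

# A cheap model with slightly lower quality vs. a pricier, higher-quality one.
cheap = score_model(quality=0.80, price=0.10, error_rate=0.05,
                    start_latency=0.2, end_latency=0.3, weights=weights)
premium = score_model(quality=0.90, price=0.90, error_rate=0.02,
                      start_latency=0.1, end_latency=0.2, weights=weights)
print(cheap > premium)  # with price weighted at 70, the cheap model wins
```

Under this sketch, raising the price weight shifts selection toward cheaper models even when their quality score is somewhat lower.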

imsn

The number of model suggestions to consider from the Intelligent Model Selection (IMS).

{ "imsn": 3 }

max_input_cost & max_output_cost

Set cost thresholds per million input and output characters.

{ "max_input_cost": 2, "max_output_cost": 5 }
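
As an illustration of how these caps constrain selection, the filtering step could be sketched as follows. The model catalog, field names, and costs here are hypothetical; the real pricing data comes from Infuzu's IMS:

```python
# Hypothetical catalog: cost per million characters for input and output.
catalog = {
    "modelA": {"input_cost": 1.5, "output_cost": 4.0},
    "modelB": {"input_cost": 3.0, "output_cost": 4.5},
    "modelC": {"input_cost": 1.0, "output_cost": 8.0},
}

def within_budget(catalog, max_input_cost, max_output_cost):
    """Return models whose per-million-character costs fall under both caps."""
    return [
        name for name, c in sorted(catalog.items())
        if c["input_cost"] <= max_input_cost and c["output_cost"] <= max_output_cost
    ]

print(within_budget(catalog, max_input_cost=2, max_output_cost=5))  # ['modelA']
```

Both thresholds must be satisfied: a model that is cheap on input but exceeds the output cap is still excluded.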

Examples

Text Completion

Here is a basic example of a text completion request using cURL and the Infuzu Python library.

Using cURL:

curl -X POST https://chat.infuzu.com/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Infuzu-API-Key: your_api_key_here" \
  -d '{
    "messages": [
      { "role": "user", "content": "Hello, how are you?" }
    ]
  }'

Using Python:

import os

from infuzu import create_chat_completion, ChatCompletionsHandlerRequestMessage

api_key = os.getenv("INFUZU_API_KEY")
if not api_key:
    raise ValueError("API key required.")

messages = [
    ChatCompletionsHandlerRequestMessage(
        role="user",
        content="Hello, how are you?"
    )
]

response = create_chat_completion(
    messages=messages,
    api_key=api_key,
)

print("Response:", response.choices[0].message.content)

Custom Model Selection

Using InfuzuModelParams to customize model selection based on preference and cost constraints.

Using cURL:

curl -X POST https://chat.infuzu.com/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Infuzu-API-Key: your_api_key_here" \
  -d '{
    "messages": [
      { "role": "user", "content": "Can you give me a summary of the recent news?" }
    ],
    "model": {
      "llms": ["modelA", "modelB"],
      "exclude_llms": ["modelC"],
      "weights": { "price": 0.5, "error": 0.3, "start_latency": 0.1, "end_latency": 0.1 },
      "imsn": 2,
      "max_input_cost": 0.05,
      "max_output_cost": 0.10
    }
  }'

Using Python:

import os

from infuzu import (
    create_chat_completion,
    ChatCompletionsHandlerRequestMessage,
    InfuzuModelParams,
    ModelWeights,
)

api_key = os.getenv("INFUZU_API_KEY")
if not api_key:
    raise ValueError("API key required.")

messages = [
    ChatCompletionsHandlerRequestMessage(
        role="user",
        content="Can you give me a summary of the recent news?"
    )
]

model_params = InfuzuModelParams(
    llms=["modelA", "modelB"],
    exclude_llms=["modelC"],
    weights=ModelWeights(
        price=0.5,
        error=0.3,
        start_latency=0.1,
        end_latency=0.1
    ),
    imsn=2,
    max_input_cost=0.05,
    max_output_cost=0.10
)

response = create_chat_completion(
    messages=messages,
    api_key=api_key,
    model=model_params
)

print("Response:", response.choices[0].message.content)

Advanced Weighting

Custom weighting factors for price, error rate, and latency.

Using cURL:

curl -X POST https://chat.infuzu.com/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Infuzu-API-Key: your_api_key_here" \
  -d '{
    "messages": [
      { "role": "user", "content": "Explain the implications of quantum computing." }
    ],
    "model": {
      "weights": { "price": 0.4, "error": 0.4, "start_latency": 0.1, "end_latency": 0.1 },
      "imsn": 3
    }
  }'

Using Python:

import os

from infuzu import (
    create_chat_completion,
    ChatCompletionsHandlerRequestMessage,
    InfuzuModelParams,
    ModelWeights,
)

api_key = os.getenv("INFUZU_API_KEY")
if not api_key:
    raise ValueError("API key required.")

messages = [
    ChatCompletionsHandlerRequestMessage(
        role="user",
        content="Explain the implications of quantum computing."
    )
]

model_params = InfuzuModelParams(
    weights=ModelWeights(
        price=0.4,
        error=0.4,
        start_latency=0.1,
        end_latency=0.1
    ),
    imsn=3,
)

response = create_chat_completion(
    messages=messages,
    api_key=api_key,
    model=model_params
)

print("Response:", response.choices[0].message.content)

With these parameters and examples, you can tailor your API requests to your specific needs, optimizing for cost, performance, and model preferences. By understanding and using these parameters effectively, you'll be able to fully leverage the capabilities of the Infuzu API.

Last modified: 25 February 2025