Infron AI routes requests to the best available providers for your model.
You can customize how your requests are routed using the provider object in the request body for LLM and generative models.
The provider object can contain the following fields:
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| allow_fallbacks | boolean | true | Whether to allow backup providers when the primary is unavailable. |
| sort | string | - | Sort providers by price or throughput (e.g. "price" or "throughput"). |
Uptime-Based Load Balancing (Default Strategy)
By default, requests are load balanced across the top providers to maximize uptime.
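    from openai import OpenAI

    client = OpenAI(
        base_url="https://llm.onerouter.pro/v1",
        api_key="<API_KEY>",
    )

    completion = client.chat.completions.create(
        model="google-ai-studio/gemini-2.5-flash-preview-09-2025",
        messages=[{"role": "user", "content": "What is the meaning of life?"}],
        # The provider object is not a standard OpenAI SDK argument, so it is
        # passed through extra_body, which merges it into the request body.
        # Omit "sort" to keep the default load balancing.
        extra_body={"provider": {"sort": "price"}},
    )

    print(completion.choices[0].message.content)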
Price-Based Load Balancing
For each model in your request, Infron AI can load balance requests across providers, prioritizing price.
If you are more sensitive to throughput than price, you can use the sort field to explicitly prioritize throughput.
Here is Infron AI's default load balancing strategy:
1. Prioritize providers that have not seen significant outages in the last 30 seconds.
2. For the stable providers, look at the lowest-cost candidates and select one weighted by inverse square of the price (example below).
3. Use the remaining providers as fallbacks.
A Load Balancing Example
Suppose Provider A costs $1 per million tokens, Provider B costs $2, and Provider C costs $3, and Provider B recently saw a few outages.
Your request is most likely routed to Provider A first. Provider A is 9x more likely than Provider C to be tried first, because the weights are 1/1^2 = 1 and 1/3^2 = 1/9 (inverse square of the price).
If Provider A fails, then Provider C will be tried next.
If Provider C also fails, Provider B will be tried last.
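The arithmetic in this example can be checked with a short sketch. It assumes the selection weight is exactly 1 / price^2 over the stable providers, and that providers with recent outages are simply appended as fallbacks, cheapest first. The function names are hypothetical and this is not Infron AI's actual implementation.

```python
import random

def inverse_square_weights(prices):
    """Return a weight of 1 / price**2 for each provider."""
    return {name: 1.0 / price ** 2 for name, price in prices.items()}

def routing_order(prices, unstable, rng=random):
    """Sample an order in which providers would be tried (illustrative only)."""
    stable = [name for name in prices if name not in unstable]
    order = []
    while stable:
        # Weighted draw among the stable providers that are still untried.
        weights = [1.0 / prices[name] ** 2 for name in stable]
        choice = rng.choices(stable, weights=weights, k=1)[0]
        order.append(choice)
        stable.remove(choice)
    # Assumption: providers with recent outages are tried last, cheapest first.
    order.extend(sorted((n for n in prices if n in unstable), key=prices.get))
    return order

prices = {"A": 1.0, "B": 2.0, "C": 3.0}  # USD per million tokens
weights = inverse_square_weights(prices)
ratio_a_to_c = weights["A"] / weights["C"]
order = routing_order(prices, unstable={"B"})
print(f"A is about {ratio_a_to_c:.0f}x as likely as C to be tried first")
print("One possible order:", order)
```

Because the draw is random, the order varies between runs: Provider A usually comes first, Provider C occasionally does, and Provider B is always last in this scenario.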
If you have sort set in your provider preferences, load balancing will be disabled.
To always prioritize low prices, and not apply any load balancing, set sort to "price".
Throughput-Based Load Balancing
To always prioritize high throughput, and not apply any load balancing, set sort to "throughput".
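As a sketch of how the sort preference might be assembled, the helper below builds the provider object and rejects values other than the two documented ones. The helper name and the validation step are assumptions for illustration. Only the sort field and its "price" and "throughput" values come from the field table.

```python
import json

ALLOWED_SORTS = ("price", "throughput")  # values documented for the sort field

def provider_with_sort(sort):
    """Build a provider object that sets an explicit sort preference."""
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {ALLOWED_SORTS}, got {sort!r}")
    return {"sort": sort}

request_body = {
    "model": "google-ai-studio/gemini-2.5-flash-preview-09-2025",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}],
    "provider": provider_with_sort("throughput"),
}
print(json.dumps(request_body, indent=2))
```

With the OpenAI Python SDK, the same dictionary can be passed as extra_body={"provider": provider_with_sort("throughput")}.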
Disabling Fallbacks
To guarantee that your request is only served by a specific provider, you can disable fallbacks by setting allow_fallbacks to false.
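A minimal sketch of a request body with fallbacks turned off is shown here. The build_request_body helper is hypothetical. The only documented part is the allow_fallbacks field inside the provider object.

```python
import json

def build_request_body(model, prompt, allow_fallbacks=True):
    """Assemble a chat request body with a provider object (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "provider": {"allow_fallbacks": allow_fallbacks},
    }

body = build_request_body(
    "google-ai-studio/gemini-2.5-flash-preview-09-2025",
    "What is the meaning of life?",
    allow_fallbacks=False,  # do not route to backup providers
)
payload = json.dumps(body)
print(payload)
```

With fallbacks disabled, a request is expected to fail rather than be rerouted when the primary provider is unavailable, so callers may want to handle that error explicitly.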
Provider Prioritization & Auto Fallback
You can also force-designate a specific model provider, requiring OneRouter to route all requests to your chosen provider. But don't worry - OneRouter will still automatically fall back to alternative providers if it detects any issues with your primary source.
When provider fallback kicks in, some of your prompt caches might become invalid.