Provider Routing & Selection

Route requests to the best provider

Infron AI routes requests to the best available providers for your model.

You can customize how requests to LLMs and other generative models are routed by including a provider object in the request body.

The provider object can contain the following fields:

Field             Type      Default   Description
allow_fallbacks   boolean   true      Whether to allow backup providers when the primary provider is unavailable.
sort              string    (unset)   Sort providers by price or throughput: "price" or "throughput".
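Outside the SDKs, the provider object is simply another top-level field of the JSON request body, next to model and messages. The sketch below builds (but does not send) such a request with the Python standard library. It assumes the service accepts OpenAI-style POST requests at {base_url}/chat/completions with a Bearer token, which is how the OpenAI SDKs used on this page address the base URL; build_chat_request is a hypothetical helper, not part of any SDK.

```python
import json
import urllib.request

# Assumption: OpenAI-compatible endpoint derived from the SDK base_url used on this page.
BASE_URL = "https://llm.onerouter.pro/v1"
VALID_SORTS = ("price", "throughput")


def build_chat_request(api_key, model, messages, sort=None, allow_fallbacks=None):
    """Hypothetical helper: build a chat completion request with an optional provider object."""
    provider = {}
    if sort is not None:
        if sort not in VALID_SORTS:
            raise ValueError(f"sort must be one of {VALID_SORTS}, got {sort!r}")
        provider["sort"] = sort
    if allow_fallbacks is not None:
        provider["allow_fallbacks"] = bool(allow_fallbacks)

    body = {"model": model, "messages": messages}
    if provider:  # leave the field out entirely to get the default routing
        body["provider"] = provider

    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request(
        "<API_KEY>",
        "google-ai-studio/gemini-2.5-flash",
        [{"role": "user", "content": "What is the meaning of life?"}],
        sort="price",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Actually sending it would be: urllib.request.urlopen(req)
```

Omitting both arguments produces a body with no provider field at all, which keeps the default routing behavior.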

Uptime-Based Load Balancing (Default Strategy)

By default, when no sort is set, requests are load balanced across the top providers to maximize uptime.

from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

completion = client.chat.completions.create(
  model="google-ai-studio/gemini-2.5-flash-preview-09-2025",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  # No provider object: the default uptime-based load balancing applies.
)

print(completion.choices[0].message.content)

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'claude-3-5-sonnet@20240620',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
    // No provider object: the default uptime-based load balancing applies.
  });

  console.log(completion.choices[0].message);
}

main();

Price-Based Load Balancing

For each model in your request, Infron AI can load balance requests across providers, prioritizing price.

If you are more sensitive to throughput than price, you can use the sort field to explicitly prioritize throughput.

Here is Infron AI's default load balancing strategy:

Prioritize providers that have not seen significant outages in the last 30 seconds.
For the stable providers, look at the lowest-cost candidates and select one weighted by inverse square of the price (example below).
Use the remaining providers as fallbacks.
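The three steps above can be sketched in Python as follows. This is an illustration, not Infron AI's implementation: it assumes every stable provider counts as a "lowest-cost candidate", that prices are positive, and that the remaining stable providers are tried cheapest first, followed by the recently failing ones (which matches the example in the next section). Provider and route_order are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Optional

OUTAGE_WINDOW_S = 30.0  # "significant outages in the last 30 seconds"


@dataclass
class Provider:
    name: str
    price: float                         # cost per million tokens; must be > 0
    last_outage: Optional[float] = None  # time of the last significant outage, in seconds


def route_order(providers, now, rng=None):
    """Return the order in which providers would be tried (illustrative sketch)."""
    rng = rng or random.Random()

    # Step 1: split providers by whether they had an outage in the last 30 s.
    stable, unstable = [], []
    for p in providers:
        recently_down = p.last_outage is not None and now - p.last_outage <= OUTAGE_WINDOW_S
        (unstable if recently_down else stable).append(p)

    order = []
    if stable:
        # Step 2: pick the first provider with probability proportional to 1 / price**2.
        weights = [1.0 / p.price ** 2 for p in stable]
        first = rng.choices(stable, weights=weights, k=1)[0]
        order.append(first)
        # Step 3a: the other stable providers become fallbacks, cheapest first.
        order += sorted((p for p in stable if p is not first), key=lambda p: p.price)
    # Step 3b: recently failing providers are tried last.
    order += sorted(unstable, key=lambda p: p.price)
    return order


if __name__ == "__main__":
    now = 1000.0
    providers = [
        Provider("A", 1.0),
        Provider("B", 2.0, last_outage=now - 10),  # outage 10 s ago
        Provider("C", 3.0),
    ]
    print([p.name for p in route_order(providers, now, random.Random(0))])
```

Weighting by the inverse square rather than the plain inverse of the price pushes traffic harder toward the cheapest stable provider while still sending some requests elsewhere.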

A Load Balancing Example

Suppose Provider A costs $1 per million tokens, Provider B costs $2, and Provider C costs $3, and that Provider B has recently seen a few outages.

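Your request is most likely to be routed to Provider A first: with inverse-square price weighting, Provider A is 9x more likely than Provider C to be tried first, while Provider B is held back because of its recent outages.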
If Provider A fails, then Provider C will be tried next.
If Provider C also fails, Provider B will be tried last.
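The 9x figure comes from weighting the two stable providers, A and C, by the inverse square of their prices (Provider B is left out of the first pick because of its recent outages):

```latex
w_A = \frac{1}{1^2} = 1, \qquad w_C = \frac{1}{3^2} = \frac{1}{9}

P(\text{A first}) = \frac{w_A}{w_A + w_C} = \frac{1}{1 + 1/9} = \frac{9}{10}, \qquad
P(\text{C first}) = \frac{w_C}{w_A + w_C} = \frac{1/9}{1 + 1/9} = \frac{1}{10}

\frac{P(\text{A first})}{P(\text{C first})} = \frac{9/10}{1/10} = 9
```

In other words, roughly 9 requests in 10 start at Provider A and 1 in 10 at Provider C.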

If you have sort set in your provider preferences, load balancing will be disabled.

To always prioritize the lowest price without any load balancing, set sort to "price".

from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

completion = client.chat.completions.create(
  model="google-ai-studio/gemini-2.5-flash-preview-09-2025",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
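  # The OpenAI Python SDK does not accept `provider` as a keyword argument;
  # extra_body merges extra fields into the JSON request body.
  extra_body={
      "provider": {"sort": "price"}
  },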
)

print(completion.choices[0].message.content)

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'claude-3-5-sonnet@20240620',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
    'provider': {
        'sort': 'price'
    }
  });

  console.log(completion.choices[0].message);
}

main();

Throughput-Based Load Balancing

To always prioritize the highest throughput without any load balancing, set sort to "throughput".

from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

completion = client.chat.completions.create(
  model="google-ai-studio/gemini-2.5-flash-preview-09-2025",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
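  # The OpenAI Python SDK does not accept `provider` as a keyword argument;
  # extra_body merges extra fields into the JSON request body.
  extra_body={
      "provider": {"sort": "throughput"}
  },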
)

print(completion.choices[0].message.content)

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'claude-3-5-sonnet@20240620',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
    'provider': {
        'sort': 'throughput'
    }
  });

  console.log(completion.choices[0].message);
}

main();
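Both sort modes described above replace the weighted random choice with a fixed ordering. A minimal sketch, assuming price is compared per million tokens (lowest first) and throughput in tokens per second (highest first), and that fallbacks follow the same order; ProviderStats and sorted_order are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProviderStats:
    name: str
    price: float       # cost per million tokens (lower is better)
    throughput: float  # tokens per second (higher is better)


def sorted_order(providers, sort):
    """Deterministic provider order for a given `sort` value (illustrative sketch)."""
    if sort == "price":
        return sorted(providers, key=lambda p: p.price)
    if sort == "throughput":
        return sorted(providers, key=lambda p: p.throughput, reverse=True)
    raise ValueError(f'sort must be "price" or "throughput", got {sort!r}')


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    providers = [
        ProviderStats("A", 1.0, 40.0),
        ProviderStats("B", 2.0, 120.0),
        ProviderStats("C", 3.0, 80.0),
    ]
    print([p.name for p in sorted_order(providers, "price")])       # ['A', 'B', 'C']
    print([p.name for p in sorted_order(providers, "throughput")])  # ['B', 'C', 'A']
```

Because nothing here is random, the same inputs always yield the same order. Whether recent outages still demote a provider when sort is set is not stated, so the sketch ignores them.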

Disabling Fallbacks

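To guarantee that your request is served only by the primary provider, disable fallbacks by setting allow_fallbacks to false. If that provider is unavailable, the request is not rerouted to a backup.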

from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

completion = client.chat.completions.create(
  model="google-ai-studio/gemini-2.5-flash",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
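  # The OpenAI Python SDK does not accept `provider` as a keyword argument;
  # extra_body merges extra fields into the JSON request body.
  extra_body={
      "provider": {"allow_fallbacks": False}  # Python boolean, not `false`
  },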
)

print(completion.choices[0].message.content)

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'claude-3-5-sonnet@20240620',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
    'provider': {
        'allow_fallbacks': false
    }
  });

  console.log(completion.choices[0].message);
}

main();
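Conceptually, allow_fallbacks decides whether the router walks past the first provider in its order. The sketch below models that with a hypothetical call function and ProviderError exception; it is not the router's code. The practical takeaway is the same either way: with fallbacks disabled, an outage at the primary provider is likely to reach your client as a failed request, so handle errors accordingly.

```python
class ProviderError(Exception):
    """Hypothetical stand-in for a provider-side failure."""


def serve(request, ordered_providers, call, allow_fallbacks=True):
    """Try providers in order; with allow_fallbacks=False only the first is tried."""
    candidates = ordered_providers if allow_fallbacks else ordered_providers[:1]
    last_error = None
    for provider in candidates:
        try:
            return provider, call(provider, request)
        except ProviderError as exc:
            last_error = exc  # remember the failure and move on to the next candidate
    raise RuntimeError("no provider could serve the request") from last_error


if __name__ == "__main__":
    def flaky_call(provider, request):
        # Pretend provider "A" is down and every other provider answers.
        if provider == "A":
            raise ProviderError(f"{provider} is unavailable")
        return f"{provider} answered: {request}"

    print(serve("hello", ["A", "C", "B"], flaky_call))  # ('C', 'C answered: hello')
    try:
        serve("hello", ["A", "C", "B"], flaky_call, allow_fallbacks=False)
    except RuntimeError as err:
        print("failed:", err.__cause__)  # failed: A is unavailable
```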

Provider Prioritization & Auto Fallback

You can also designate a specific provider, so that Infron AI routes all requests to your chosen provider. Even so, as long as allow_fallbacks is true, Infron AI automatically falls back to alternative providers if it detects problems with that provider.

from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

completion = client.chat.completions.create(
  model="google-ai-studio/gemini-2.5-flash",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
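  # The OpenAI Python SDK does not accept `provider` as a keyword argument;
  # extra_body merges extra fields into the JSON request body.
  extra_body={
      "provider": {"allow_fallbacks": True}
  },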
)

print(completion.choices[0].message.content)

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'google-ai-studio/gemini-2.5-flash',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
    'provider': {
        'allow_fallbacks': true
    }
  });

  console.log(completion.choices[0].message);
}

main();
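One way to picture this combination: the designated provider goes first, and the other providers serving the same model stay queued behind it while allow_fallbacks is true. The sketch assumes, as the model IDs in these examples suggest, that the designated provider is the prefix before the first "/" (google-ai-studio in google-ai-studio/gemini-2.5-flash); split_model_id, prioritized_order and the "other-provider-*" names are hypothetical.

```python
def split_model_id(model_id):
    """Split "provider/model" at the first slash (assumed ID format)."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model


def prioritized_order(model_id, providers_for_model, allow_fallbacks=True):
    """Designated provider first; the others only as fallbacks (illustrative sketch)."""
    designated, _ = split_model_id(model_id)
    if not allow_fallbacks:
        return [designated]
    return [designated] + [p for p in providers_for_model if p != designated]


if __name__ == "__main__":
    others = ["other-provider-1", "google-ai-studio", "other-provider-2"]
    print(prioritized_order("google-ai-studio/gemini-2.5-flash", others))
    # ['google-ai-studio', 'other-provider-1', 'other-provider-2']
```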

When provider fallback kicks in, some of your prompt caches might become invalid.

Last updated 23 days ago
