Reasoning & Thinking

For models that support it, the Infron API can return reasoning tokens, also known as thinking tokens. Infron normalizes the different ways providers let you control the reasoning-token budget, providing a unified reasoning & thinking interface across providers.

Reasoning tokens provide a transparent look into the reasoning steps taken by a model. Reasoning tokens are considered output tokens and charged accordingly.

Infron provides a unified parameter wrapper for Reasoning & Thinking.

  • If the "reasoning" field is not included in the request, Infron leaves the parameter unset and the provider's own default behavior applies.

  • If the "reasoning" field is included in the request, Infron converts it to the Reasoning & Thinking parameter format of each provider.

Reasoning tokens are included in the response by default if the model decides to output them. Reasoning tokens will appear in the reasoning field of each message.

Controlling Reasoning Tokens in OpenAI Chat Completions

You can control reasoning tokens in your requests using the reasoning parameter:

{
  "model": "your-model",
  "messages": [],
  "reasoning": {
    // One of the following (not both):
    "effort": "high", // Can be "xhigh", "high", "medium", "low", "minimal" or "none"
    "max_tokens": 2000 // Specific token limit
  }
}

The reasoning config object consolidates settings for controlling reasoning strength across different models.

The effort can be one of the following:

  • "effort": "xhigh" - Allocates the largest portion of tokens for reasoning (approximately 95% of max_tokens)

  • "effort": "high" - Allocates a large portion of tokens for reasoning (approximately 80% of max_tokens)

  • "effort": "medium" - Allocates a moderate portion of tokens (approximately 50% of max_tokens)

  • "effort": "low" - Allocates a smaller portion of tokens (approximately 20% of max_tokens)

  • "effort": "minimal" - Allocates an even smaller portion of tokens (approximately 10% of max_tokens)

  • "effort": "none" - Disables reasoning entirely

For models that only support reasoning.max_tokens, the effort level will be converted to a token limit based on the percentages above.
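The effort-to-token-limit conversion can be sketched as follows. The ratio table mirrors the percentages listed above; the function name and the use of the request's max_tokens as the base are illustrative assumptions, not part of the Infron API.

```python
# Ratios from the effort levels documented above (assumed mapping).
EFFORT_RATIOS = {
    "xhigh": 0.95,
    "high": 0.80,
    "medium": 0.50,
    "low": 0.20,
    "minimal": 0.10,
    "none": 0.0,
}

def effort_to_reasoning_budget(effort: str, max_tokens: int) -> int:
    """Convert a unified effort level into a provider-specific
    reasoning.max_tokens value (hypothetical helper)."""
    return int(max_tokens * EFFORT_RATIOS[effort])
```

For example, with a max_tokens of 1000, "high" would yield a reasoning budget of roughly 800 tokens.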

Examples

Basic Usage with Reasoning Tokens
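A minimal request sketch using the effort level. The model name is a placeholder, and the commented-out POST call assumes a base URL and API key you would supply yourself.

```python
import json

# Request body with the unified "reasoning" parameter set to a high effort.
payload = {
    "model": "your-model",  # placeholder; use any reasoning-capable model
    "messages": [
        {"role": "user", "content": "Which is bigger: 9.11 or 9.9?"}
    ],
    "reasoning": {"effort": "high"},
}

# Hypothetical call; substitute your own BASE_URL and API_KEY:
# requests.post(f"{BASE_URL}/chat/completions", json=payload,
#               headers={"Authorization": f"Bearer {API_KEY}"})
print(json.dumps(payload, indent=2))
```

The reasoning tokens, if the model emits them, come back in the reasoning field of the response message.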

Using Max Tokens for Reasoning

You can specify the exact number of tokens to use for reasoning:
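A sketch of the same request using an explicit token budget instead of an effort level (remember: set one or the other, not both). The model name is a placeholder.

```python
# Request body capping reasoning at exactly 2000 tokens.
payload = {
    "model": "your-model",  # placeholder
    "messages": [
        {"role": "user", "content": "Plan a 3-day trip to Kyoto."}
    ],
    "reasoning": {"max_tokens": 2000},  # exact limit instead of "effort"
}
```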

Disabling Reasoning Entirely
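Setting the effort to "none" turns reasoning off for models that allow it. A minimal payload sketch (model name is a placeholder):

```python
# Request body that disables reasoning tokens.
payload = {
    "model": "your-model",  # placeholder
    "messages": [
        {"role": "user", "content": "Just answer: what is 2 + 2?"}
    ],
    "reasoning": {"effort": "none"},  # no reasoning tokens are generated
}
```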

Streaming Mode with Reasoning Tokens
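When streaming, reasoning deltas arrive in the reasoning field of each chunk's delta, alongside the usual content deltas. The sketch below accumulates both from hand-written stand-in chunks; real chunks arrive as server-sent events, and the exact delta shape may vary by provider.

```python
# Hand-written stand-ins for streamed chunks (assumed shape).
chunks = [
    {"choices": [{"delta": {"reasoning": "9.9 is the same as 9.90. "}}]},
    {"choices": [{"delta": {"reasoning": "90 > 11, so 9.90 > 9.11. "}}]},
    {"choices": [{"delta": {"content": "9.9 is bigger than 9.11."}}]},
]

reasoning, content = "", ""
for chunk in chunks:
    delta = chunk["choices"][0]["delta"]
    reasoning += delta.get("reasoning", "")  # thinking stream
    content += delta.get("content", "")      # answer stream

print(content)
```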

Responses API Shape

When reasoning models generate responses, the reasoning information is structured in a standardized format through the reasoning_content item.
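A hedged sketch of reading that shape: a reasoning_content item carries the reasoning, separate from the assistant message. The field names other than reasoning_content follow common Responses API conventions and are assumptions; check your provider's actual output.

```python
# Stand-in for a Responses API output list (assumed shape).
response_output = [
    {"type": "reasoning_content",
     "content": "Compare 9.11 and 9.90 digit by digit..."},
    {"type": "message", "role": "assistant",
     "content": "9.9 is bigger than 9.11."},
]

# Separate the reasoning items from the final message.
reasoning_items = [i for i in response_output
                   if i["type"] == "reasoning_content"]
answer = next(i["content"] for i in response_output
              if i["type"] == "message")
```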
