Usage Accounting
Track AI Model Token Usage and Cost Breakdowns
Infron provides built-in Usage Accounting that allows you to track AI model usage and cost breakdowns. This feature provides detailed information about token counts, costs, and caching status directly in your API responses.
Benefits
Efficiency: Get usage information without making separate API calls
Accuracy: Token counts are calculated using the model's native tokenizer
Transparency: Track costs and cached token usage in real-time
Detailed Breakdown: Separate counts for prompt, completion, reasoning, and cached tokens
Usage Information
When enabled, the API will return detailed usage information including:
Prompt and completion token counts using the model's native tokenizer
Cost in credits
Reasoning token counts (if applicable)
Cached token counts (if available)
This information is included in the last SSE message for streaming responses, or in the complete response for non-streaming requests.
Enabling Usage Accounting
You can enable usage accounting in your requests by including the usage parameter:
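A minimal sketch of the request body, assuming Infron accepts an OpenAI-compatible chat completions payload; the exact opt-in shape of the usage field is an assumption modeled on similar APIs:

```python
# Request body with usage accounting enabled. The "usage": {"include": True}
# opt-in shape is an assumption -- check it against your API version.
payload = {
    "model": "google-ai-studio/gemini-2.5-flash-preview-09-2025",
    "messages": [{"role": "user", "content": "Hello!"}],
    "usage": {"include": True},
}
```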
Response Format
When usage accounting is enabled, the response includes a usage object with detailed token information, plus a cost item and a cost_details object with detailed costs:
cost is the total amount charged to your account balance.
cost_details is the breakdown of the total cost.
usage is the breakdown of token counts.
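The shape below is a sketch of what the returned usage block might contain; the nested field names (prompt_tokens_details, cost_details, and so on) are assumptions following OpenAI-style conventions, not the authoritative schema:

```python
# Hypothetical usage block from a completed response; values and
# nested field names are illustrative only.
usage = {
    "prompt_tokens": 14,
    "completion_tokens": 163,
    "total_tokens": 177,
    "prompt_tokens_details": {"cached_tokens": 0},       # cached tokens, if available
    "completion_tokens_details": {"reasoning_tokens": 42},  # reasoning tokens, if applicable
    "cost": 0.00012,                                     # total credits charged
    "cost_details": {"upstream_inference_cost": 0.00012},
}

# total_tokens is the sum of prompt and completion tokens.
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```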
Enabling usage accounting adds roughly 100–200 ms to the final response while the API calculates token counts and costs. This affects only the final message and does not impact overall streaming performance.
Examples
Basic Usage with Token Tracking
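A minimal non-streaming sketch using only the Python standard library. The endpoint URL and the usage opt-in field are assumptions, so substitute the values for your actual deployment:

```python
import json
import urllib.request

API_URL = "https://api.infron.example/v1/chat/completions"  # hypothetical endpoint


def build_payload(prompt: str,
                  model: str = "google-ai-studio/gemini-2.5-flash-preview-09-2025") -> dict:
    """Build a chat request body with usage accounting enabled."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "usage": {"include": True},  # assumed opt-in shape
    }


def chat_with_usage(prompt: str, api_key: str) -> tuple[str, dict]:
    """Send the request and return (completion text, usage block)."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"], data.get("usage", {})
```

Calling chat_with_usage("Hello!", api_key) would return the model's reply together with the usage breakdown described above.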
Streaming with Token Tracking
According to the OpenAI specification, to request token usage information in a streaming response, you would include the following parameters in your request:
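Concretely, the request body might look like this (shown as a Python dict; the model name matches the example used throughout this section):

```python
# Streaming request with usage reporting, per the OpenAI spec:
# stream_options.include_usage asks for usage data in the final chunk.
payload = {
    "model": "google-ai-studio/gemini-2.5-flash-preview-09-2025",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
    "stream_options": {"include_usage": True},
}
```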
This configuration tells the API to:
Use the google-ai-studio/gemini-2.5-flash-preview-09-2025 model
Stream the response incrementally
Include token usage statistics in the stream response
The stream_options.include_usage parameter specifically requests that token usage information be returned as part of the streaming response.
The response example is below:
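A sketch of what the parsed chunks might look like and how a client could pick out the usage block; the values are illustrative, not real API output:

```python
# Simulated, already-parsed chat.completion.chunk objects; in a real
# stream these arrive one SSE message at a time.
chunks = [
    {"object": "chat.completion.chunk",
     "choices": [{"delta": {"content": "Hel"}}], "usage": None},
    {"object": "chat.completion.chunk",
     "choices": [{"delta": {"content": "lo!"}}], "usage": None},
    # Final chunk: no delta, but it carries the usage and cost data.
    {"object": "chat.completion.chunk", "choices": [],
     "usage": {"prompt_tokens": 14, "completion_tokens": 5,
               "total_tokens": 19, "cost": 0.00001}},
]

# Reassemble the streamed text from the deltas.
text = "".join(c["choices"][0]["delta"].get("content", "")
               for c in chunks if c["choices"])
# The usage block is only present on the last chunk.
usage = next(c["usage"] for c in reversed(chunks) if c.get("usage"))
print(text)                   # -> Hello!
print(usage["total_tokens"])  # -> 19
```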
The cost and usage fields appear only in the last chat.completion.chunk.
Best Practices
Enable usage tracking when you need to monitor token consumption or costs
Account for the slight delay in the final response when usage accounting is enabled
Consider implementing usage tracking in development to optimize token usage before production
Use the cached token information to optimize your application's performance
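For the last point, a cache hit rate can be derived directly from the usage block; the nested field name used here (prompt_tokens_details.cached_tokens) is an assumption following OpenAI-style conventions:

```python
# Illustrative usage block; the cached_tokens field name is assumed.
usage = {
    "prompt_tokens": 2048,
    "completion_tokens": 256,
    "prompt_tokens_details": {"cached_tokens": 1536},
}

cached = usage["prompt_tokens_details"]["cached_tokens"]
hit_rate = cached / usage["prompt_tokens"]
print(f"prompt cache hit rate: {hit_rate:.0%}")  # -> prompt cache hit rate: 75%
```

A low hit rate suggests restructuring prompts so that their stable prefix (system prompt, few-shot examples) stays byte-identical across requests.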