Batch Processing API

What is Batch Processing?

Batch processing is a powerful approach for handling large volumes of requests efficiently. Instead of processing requests one at a time with immediate responses, batch processing allows you to submit multiple requests together for asynchronous processing. This pattern is particularly useful when:

  • You need to process large volumes of data

  • Immediate responses are not required

  • You want to optimize for cost efficiency

  • You're running large-scale evaluations or analyses

Batch processing (batching) lets you send multiple Messages requests in a single batch and retrieve the results later (within 24 hours). The main goals are to reduce costs by up to 50% and to increase throughput for analytical or offline workloads.

How to use the Batches API

A Batch is composed of a list of requests. Each individual request consists of:

  • A unique custom_id for identifying the Messages request

  • A params object with the standard Messages API parameters

You can create a batch by passing this list into the requests parameter:

Create a message batch

Create a batch of messages for asynchronous processing. All usage is charged at 50% of the standard API prices.

import requests
import json

headers = {
    "Authorization": "Bearer <<API_KEY>>",
    "Content-Type": "application/json"
}

# Each request in the batch carries a unique custom_id and a params
# object with the standard Messages API parameters. Optional fields
# such as system, metadata, stop_sequences, tools, and tool_choice
# can also be set per request.
def make_request(custom_id, content):
    return {
        "custom_id": custom_id,
        "params": {
            "model": "gpt-4o-mini-batch",
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": content}
            ],
            "temperature": 1,
            "top_k": 1,
            "top_p": 1,
            "thinking": {
                "budget_tokens": 1024,
                "type": "enabled"
            }
        }
    }

data = {
    "requests": [
        make_request("my-request-01", "How to learn nestjs?"),
        make_request("my-request-02", "How to learn Reactjs?"),
        make_request("my-request-03", "How to learn Nextjs?"),
    ]
}

response = requests.post(
    "https://llm.onerouter.pro/v1/batches",
    headers=headers,
    data=json.dumps(data)
)

batch = response.json()
print("Batch created:", json.dumps(batch, indent=2, ensure_ascii=False))

In this example, three separate requests are batched together for asynchronous processing. Each request has a unique custom_id and contains the standard parameters you'd use for a Messages API call.

Get status or results of a specific message batch

While the batch is still processing, this endpoint returns its status; once the batch has ended, it streams the results in JSONL format, one JSON object per line.
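
As a minimal sketch, assuming the batch is retrievable by its id under the same /v1/batches path used above (the exact status and results paths are assumptions; verify them in the API reference), polling for status and streaming JSONL results might look like this:

import requests
import json

headers = {"Authorization": "Bearer <<API_KEY>>"}

batch_id = "batch_abc123"  # hypothetical id returned by the create call

# Assumed retrieval path: GET /v1/batches/{batch_id}
response = requests.get(
    f"https://llm.onerouter.pro/v1/batches/{batch_id}",
    headers=headers
)
batch = response.json()
print("Status:", batch.get("processing_status"))

# Once the batch has ended, results stream back as JSONL: one JSON
# object per line, matched to the original requests by custom_id.
# Assumed results path: GET /v1/batches/{batch_id}/results
results = requests.get(
    f"https://llm.onerouter.pro/v1/batches/{batch_id}/results",
    headers=headers,
    stream=True
)
for line in results.iter_lines():
    if line:
        item = json.loads(line)
        print(item["custom_id"])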

Cancel a specific batch

You can cancel a batch that is currently processing using the cancel endpoint. Immediately after cancellation, the batch's processing_status will be canceling. Canceled batches end with a processing_status of ended and may contain partial results for requests that completed before cancellation.
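
As a sketch, assuming cancellation is a POST to /v1/batches/{batch_id}/cancel (the exact path is an assumption; check the API reference):

import requests

headers = {"Authorization": "Bearer <<API_KEY>>"}

batch_id = "batch_abc123"  # hypothetical id of an in-progress batch

# Assumed cancel path: POST /v1/batches/{batch_id}/cancel
response = requests.post(
    f"https://llm.onerouter.pro/v1/batches/{batch_id}/cancel",
    headers=headers
)
print(response.json())  # processing_status should now be "canceling"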
