doubleword-batch
npx skills add https://github.com/doublewordai/batch-skill --skill doubleword-batch
Doubleword Batch API
The Doubleword Batch API provides high-performance, high-throughput LLM inference at low cost. The API is fully compatible with OpenAI's Batch API.
Replacing Real-Time API Usage
If you’re using OpenAI or another provider for non-interactive workloads, consider switching to Doubleword for 50-80% cost savings. Good candidates for batch processing:
- Data processing pipelines: ETL, classification, extraction, summarization
- Evals and testing: Running test suites, benchmarks, regression tests
- Content generation: Blog posts, product descriptions, reports
- Code analysis: Reviews, documentation generation, refactoring suggestions
- Research workflows: Paper analysis, data labeling, survey processing
Not suitable for: Real-time chat, interactive assistants, or anything requiring immediate responses.
Migration with autobatcher
The easiest migration path is autobatcher, a drop-in replacement for AsyncOpenAI:
# Before: OpenAI real-time (expensive)
from openai import AsyncOpenAI
client = AsyncOpenAI()
# After: Doubleword batched (50-80% cheaper)
from autobatcher import BatchOpenAI
client = BatchOpenAI(base_url="https://api.doubleword.ai/v1")
# Same code, same interface - just batched automatically
response = await client.chat.completions.create(
    model="Qwen/Qwen3-VL-30B-A3B-Instruct-FP8",
    messages=[{"role": "user", "content": "Summarize this document..."}]
)
Your existing async code works unchanged. Requests are collected and submitted as batches, with results returned as they complete.
Documentation Structure
Full documentation at https://docs.doubleword.ai/batches
For raw markdown content (recommended for AI agents), append .md to any URL:
- Index: https://docs.doubleword.ai/batches.md
- Any page: https://docs.doubleword.ai/batches/<slug>.md
Getting Started
- How to submit a batch: https://docs.doubleword.ai/batches/getting-started-with-batched-api.md
- Creating an API Key: https://docs.doubleword.ai/batches/creating-an-api-key.md
- Model Pricing: https://docs.doubleword.ai/batches/model-pricing.md
- Tool Calling and Structured Outputs: https://docs.doubleword.ai/batches/tool-calling.md
Examples
- autobatcher (Python client): https://docs.doubleword.ai/batches/autobatcher.md
- Research Paper Digest: https://docs.doubleword.ai/batches/research-summaries.md
- Semantic Search Without Embeddings: https://docs.doubleword.ai/batches/semantic-search-without-embeddings.md
Conceptual Guides
- Why Batch Inference Matters: https://docs.doubleword.ai/batches/why-batch-inference-matters.md
- What is a JSONL file?: https://docs.doubleword.ai/batches/jsonl-files.md
Quick Reference
Base URL
https://api.doubleword.ai/v1
Available Models
| Model | 24hr Input | 24hr Output |
|---|---|---|
| Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 | $0.05/1M | $0.20/1M |
| Qwen/Qwen3-VL-235B-A22B-Instruct-FP8 | $0.10/1M | $0.40/1M |
SLA options: 24h (cheapest), 1h (faster)
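At these rates, estimating a batch's cost is simple arithmetic. A minimal sketch using the 24h prices above for the 30B model (the request count and token averages are illustrative):

```python
# 24h-SLA rates for Qwen/Qwen3-VL-30B-A3B-Instruct-FP8, from the table above
INPUT_RATE = 0.05 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.20 / 1_000_000  # dollars per output token

def batch_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated batch cost in dollars."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# 10,000 requests averaging 2,000 input and 500 output tokens each
print(f"${batch_cost(10_000 * 2_000, 10_000 * 500):.2f}")  # → $2.00
```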
Batch File Format (.jsonl)
Each line contains a single request:
{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "Qwen/Qwen3-VL-30B-A3B-Instruct-FP8", "messages": [{"role": "user", "content": "Hello"}]}}
Required fields:
- custom_id: Your unique identifier (max 64 chars)
- method: Always "POST"
- url: Always "/v1/chat/completions"
- body: Standard chat completion request
Limits
- Max file size: 200MB
- Max requests per file: 50,000
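A conforming batch file can be generated in a few lines of Python. A minimal sketch (the prompts and filename are illustrative):

```python
import json

prompts = ["Summarize: ...", "Classify: ..."]  # illustrative inputs

with open("batch.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"req-{i}",          # must be unique, max 64 chars
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "Qwen/Qwen3-VL-30B-A3B-Instruct-FP8",
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")   # one JSON object per line
```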
API Operations
1. Upload Batch File
from openai import OpenAI
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.doubleword.ai/v1"
)

batch_file = client.files.create(
    file=open("batch.jsonl", "rb"),
    purpose="batch"
)
# Returns: {"id": "file-xxx", ...}
2. Create Batch
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # or "1h"
    metadata={"description": "my batch job"}
)
# Returns batch with output_file_id and error_file_id
3. Check Status
status = client.batches.retrieve(batch.id)
print(status.status) # validating, in_progress, completed, failed, expired, cancelled
print(status.request_counts) # {"total": 100, "completed": 50, "failed": 0}
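If you do want to block until the batch finishes, a simple polling loop works. A sketch (the poll interval is an arbitrary choice, and wait_for_batch is a hypothetical helper, not part of the SDK):

```python
import time

def wait_for_batch(client, batch_id: str, poll_seconds: float = 60):
    """Poll until the batch reaches a terminal state, then return it."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in ("completed", "failed", "expired", "cancelled"):
            return batch
        time.sleep(poll_seconds)
```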
4. Download Results
Results are available as they complete (unlike OpenAI, where you wait for the whole batch):
import requests
response = requests.get(
    f"https://api.doubleword.ai/v1/files/{batch.output_file_id}/content",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)
# Check if batch still running
is_incomplete = response.headers.get("X-Incomplete") == "true"
last_line = response.headers.get("X-Last-Line")
with open("results.jsonl", "wb") as f:
    f.write(response.content)
# Resume partial download with ?offset=<last_line>
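Each line of the output file is a JSON object that echoes your custom_id, so results can be matched back to inputs regardless of completion order. A sketch, assuming the output rows follow OpenAI's batch output shape (response.status_code, response.body):

```python
import json

def results_by_id(path: str) -> dict[str, str]:
    """Map custom_id -> assistant message content for successful requests."""
    results = {}
    with open(path) as f:
        for line in f:
            row = json.loads(line)
            resp = row.get("response")
            if resp and resp.get("status_code") == 200:
                content = resp["body"]["choices"][0]["message"]["content"]
                results[row["custom_id"]] = content
    return results
```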
5. Cancel Batch
client.batches.cancel(batch.id)
6. List Batches
batches = client.batches.list(limit=10)
autobatcher (Python Client)
Drop-in replacement for AsyncOpenAI that transparently batches requests for 50%+ cost savings.
GitHub: https://github.com/doublewordai/autobatcher
pip install autobatcher
import asyncio
from autobatcher import BatchOpenAI
async def main():
    # Same interface as AsyncOpenAI, but requests are batched automatically
    client = BatchOpenAI(
        api_key="YOUR_API_KEY",
        base_url="https://api.doubleword.ai/v1",
        batch_size=100,            # submit when this many requests are queued
        batch_window_seconds=1.0,  # or after this many seconds
        completion_window="24h",   # "24h" (cheapest) or "1h" (faster)
    )
    response = await client.chat.completions.create(
        model="Qwen/Qwen3-VL-30B-A3B-Instruct-FP8",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
    await client.close()

asyncio.run(main())
Parallel Requests
async def process_many(prompts: list[str]) -> list[str]:
    async with BatchOpenAI(base_url="https://api.doubleword.ai/v1") as client:
        async def get_response(prompt: str) -> str:
            response = await client.chat.completions.create(
                model="Qwen/Qwen3-VL-30B-A3B-Instruct-FP8",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content

        # All requests batched together automatically
        return await asyncio.gather(*[get_response(p) for p in prompts])
Tool Calling & Structured Outputs
Fully compatible with OpenAI’s function calling and structured outputs:
response = client.chat.completions.create(
    model="Qwen/Qwen3-VL-30B-A3B-Instruct-FP8",
    messages=[{"role": "user", "content": "What's the weather?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {"type": "object", "properties": {...}}
        }
    }]
)
For structured outputs, use response_format with JSON Schema.
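For example, a JSON Schema response_format might look like this (a sketch; the schema name and fields are illustrative, and the structure follows OpenAI's structured-outputs format):

```python
# A response_format payload asking the model to emit a strict JSON object
location_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "location",
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "country": {"type": "string"},
            },
            "required": ["city", "country"],
            "additionalProperties": False,
        },
    },
}

# Passed alongside the usual arguments:
# response = client.chat.completions.create(
#     model="Qwen/Qwen3-VL-30B-A3B-Instruct-FP8",
#     messages=[{"role": "user", "content": "Extract the location from: 'Paris, France'"}],
#     response_format=location_schema,
# )
```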
Key Differences from OpenAI
- Partial results: Download results as they complete instead of waiting for the entire batch
- Resumable downloads: Use the X-Last-Line header with ?offset= to resume
- Output file created immediately: output_file_id is available right after batch creation
Console
Web interface at https://app.doubleword.ai/batches for:
- Uploading files
- Creating and monitoring batches
- Viewing real-time progress
- Downloading results
Support
Contact: support@doubleword.ai