cost-aware-llm-pipeline
3
总安装量
3
周安装量
#56151
全站排名
安装命令
npx skills add https://github.com/shimo4228/claude-code-learned-skills --skill cost-aware-llm-pipeline
Agent 安装分布
replit
3
openclaw
3
mcpjam
2
claude-code
2
windsurf
2
zencoder
2
Skill 文档
Cost-Aware LLM Pipeline
ã³ã¹ãæé©åLLMãã¤ãã©ã¤ã³
Extracted / æ½åºæ¥: 2026-02-08 Context / ã³ã³ããã¹ã: LLMã使ãã¢ããªã§ãã³ã¹ãå¶å¾¡ããªããå質ãç¶æãããã¿ã¼ã³
Problem / 課é¡
LLM APIã¯é«ã³ã¹ããå ¨ãªã¯ã¨ã¹ãã«æé«æ§è½ã¢ãã«ã使ãã¨äºç®è¶ éããã ãªãã©ã¤ããã£ãã·ã¥ã®ä»çµã¿ããªãã¨ç¡é§ãªã³ã¹ããçºçããã
- åç´ãªã¿ã¹ã¯ã«ãé«ä¾¡ãªã¢ãã«ã使ã£ã¦ãã¾ã
- 䏿çãªã¨ã©ã¼ã§ãªãã©ã¤ãã失æãã
- åãã·ã¹ãã ããã³ãããæ¯åéä¿¡ããã¼ã¯ã³ã浪費ãã
- äºç®è¶ éã«æ°ã¥ããªã
Solution / 解決ç
4ã¤ã®è¦ç´ ãçµã¿åãããï¼
1. Model Routingï¼ã¢ãã«èªå鏿ï¼
ã¿ã¹ã¯ã®è¤é度ã«åºã¥ãã¦ã¢ãã«ãèªå鏿ããã
MODEL_SONNET = "claude-sonnet-4-5-20250929"
MODEL_HAIKU = "claude-haiku-4-5-20251001"
_SONNET_TEXT_THRESHOLD = 10_000 # chars
_SONNET_CARD_THRESHOLD = 30 # items
def select_model(
text_length: int,
item_count: int,
force_model: str | None = None,
) -> str:
"""Automatically select model based on task complexity."""
if force_model is not None:
return force_model
if text_length >= _SONNET_TEXT_THRESHOLD or item_count >= _SONNET_CARD_THRESHOLD:
return MODEL_SONNET # Complex task
return MODEL_HAIKU # Simple task (3-4x cheaper)
2. Immutable Cost Trackingï¼ä¸å¤ã³ã¹ã追跡ï¼
from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class CostRecord:
model: str
input_tokens: int
output_tokens: int
cost_usd: float
@dataclass(frozen=True, slots=True)
class CostTracker:
budget_limit: float = 1.00
records: tuple[CostRecord, ...] = ()
def add(self, record: CostRecord) -> "CostTracker":
"""Return new tracker with added record (never mutates self)."""
return CostTracker(
budget_limit=self.budget_limit,
records=(*self.records, record),
)
@property
def total_cost(self) -> float:
return sum(r.cost_usd for r in self.records)
@property
def over_budget(self) -> bool:
return self.total_cost > self.budget_limit
3. Narrow Retry Logicï¼éå®çãªãã©ã¤ï¼
from anthropic import (
APIConnectionError,
InternalServerError,
RateLimitError,
)
_RETRYABLE_ERRORS = (APIConnectionError, RateLimitError, InternalServerError)
_MAX_RETRIES = 3
def _call_with_retry(func, *, max_retries: int = _MAX_RETRIES):
"""Retry only on transient errors, fail fast on others."""
for attempt in range(max_retries):
try:
return func()
except _RETRYABLE_ERRORS:
if attempt == max_retries - 1:
raise
time.sleep(2 ** attempt) # Exponential backoff
# AuthenticationError, BadRequestError etc. â raise immediately
4. Prompt Cachingï¼ããã³ãããã£ãã·ã¥ï¼
messages = [
{
"role": "user",
"content": [
{
"type": "text",
"text": system_prompt,
"cache_control": {"type": "ephemeral"}, # Cache this
},
{
"type": "text",
"text": user_input, # Variable part
},
],
}
]
Composition / çµã¿åããæ¹
def process(text: str, config: Config, tracker: CostTracker) -> tuple[Result, CostTracker]:
# 1. Route model
model = select_model(len(text), estimated_items, config.force_model)
# 2. Check budget
if tracker.over_budget:
raise BudgetExceededError(tracker.total_cost, tracker.budget_limit)
# 3. Call with retry + caching
response = _call_with_retry(lambda: client.messages.create(
model=model,
messages=build_cached_messages(system_prompt, text),
))
# 4. Track cost (immutable)
record = CostRecord(model=model, input_tokens=..., output_tokens=..., cost_usd=...)
tracker = tracker.add(record)
return parse_result(response), tracker
Pricing Reference (2025-2026) / ä¾¡æ ¼åè
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Haiku 4.5 | $0.80 | $4.00 |
| Sonnet 4.5 | $3.00 | $15.00 |
| Opus 4.5 | $15.00 | $75.00 |
When to Use / 使ç¨ãã¹ãå ´é¢
- Claude/OpenAI APIã使ãã¢ããªã±ã¼ã·ã§ã³å ¨è¬
- ãããå¦çã§ã³ã¹ã管çãå¿ è¦ãªå ´å
- è¤æ°ã¢ãã«ã使ãåãããå ´å
- é·ãã·ã¹ãã ããã³ãããç¹°ãè¿ãéä¿¡ããå ´å
Related Patterns / é¢é£ãã¿ã¼ã³
python-immutable-accumulator.mdâ CostTrackerã®ä¸å¤èç©ãã¿ã¼ã³immutable-model-updates.mdâ Swiftçã®ä¸å¤æ´æ°ãã¿ã¼ã³