workers-ai-specialist

Total installs: 2 · Weekly installs: 2 · Site-wide rank: #73478

Install command:
npx skills add https://github.com/steveleve/chatbot-demo-cloudflare --skill workers-ai-specialist

Agent install distribution: opencode 2 · claude-code 2 · github-copilot 2 · codex 2 · kimi-cli 2 · gemini-cli 2

Skill documentation
Workers AI Specialist
Use for AI/model questions (LLaMA, BGE embeddings), RAG design, AI Gateway, and cache/latency optimization.
Project defaults
- Models: `@cf/meta/llama-3.1-8b-instruct` (QA), `@cf/baai/bge-base-en-v1.5` (embeddings). Remote-only.
- Vector store: Cloudflare Vectorize (binding `VECTOR_INDEX`).
- Embedding cache: KV namespace `EMBEDDINGS_CACHE` (issue #12 to implement).
- AI Gateway: configure via `ai_gateway` in `wrangler.jsonc` (issue #16). Start disabled; enable with a real gateway ID.
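As a sketch, the bindings above might be declared in `wrangler.jsonc` roughly as follows. The index name and namespace ID are placeholders, and the `ai_gateway` var follows this project's convention (issue #16) rather than a built-in Wrangler key:

```jsonc
{
  "ai": { "binding": "AI" },
  "vectorize": [
    // Placeholder index name; use the project's actual Vectorize index.
    { "binding": "VECTOR_INDEX", "index_name": "<index-name>" }
  ],
  "kv_namespaces": [
    // Embedding cache (issue #12).
    { "binding": "EMBEDDINGS_CACHE", "id": "<kv-namespace-id>" }
  ],
  "vars": {
    // Start disabled; set a real gateway ID to enable (issue #16).
    "ai_gateway": ""
  }
}
```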
Workflow
- Clarify task: generation, embedding, rerank, or retrieval.
- Pick model: small = speed (`bge-small`), base = balance (`bge-base`), large = quality (`bge-large`). For text generation, default to LLaMA 3.1 8B, temperature 0–0.2 for determinism.
- Retrieval:
  - Enforce `topK` validation; cap query length using `MAX_QUERY_LENGTH`.
  - Chunking defaults: size 500, overlap 100 (env vars).
  - Prefer cached embeddings before AI calls; 7-day TTL, SHA-256 keys.
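A minimal TypeScript sketch of the chunking and cache-key defaults above (the function names are illustrative; the real worker would read size/overlap from env vars):

```typescript
import { createHash } from "node:crypto";

const CHUNK_SIZE = 500;    // characters per chunk (env var in the real worker)
const CHUNK_OVERLAP = 100; // characters shared between adjacent chunks

// Split text into overlapping chunks: each chunk starts
// CHUNK_SIZE - CHUNK_OVERLAP characters after the previous one.
function chunkText(text: string, size = CHUNK_SIZE, overlap = CHUNK_OVERLAP): string[] {
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
  }
  return chunks;
}

// SHA-256 of the chunk text makes a stable KV key for the embedding cache.
function cacheKey(chunk: string): string {
  return "emb:" + createHash("sha256").update(chunk).digest("hex");
}
```

Hashing the chunk text (rather than an ID) means re-ingesting unchanged content hits the cache for free.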
- Generation:
- Provide a system prompt with context; keep `max_tokens` modest (<=1024) for latency.
- Stream if latency-sensitive; if not streaming, log latency and token counts.
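The non-streaming logging step could be sketched as a small wrapper (hypothetical helper; token counts depend on the model's response shape, so only latency is shown):

```typescript
// Hypothetical wrapper: measure and log latency of a non-streaming AI call,
// then return the call's result unchanged.
async function runWithLatencyLog<T>(label: string, call: () => Promise<T>): Promise<T> {
  const start = Date.now();
  const result = await call();
  console.log(`${label}: ${Date.now() - start}ms`);
  return result;
}
```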
- AI Gateway:
- Enable caching (1h TTL) when a gateway ID is present; note the remote-only requirement.
- Respect rate limits and retry guidance; Gateway handles cache + observability.
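The retry guidance can be sketched as exponential backoff around the AI call (a minimal sketch; attempt count and delays are illustrative, not project settings):

```typescript
// Retry transient failures (e.g. rate limits surfaced by the Gateway) with
// exponential backoff: delay doubles after each failed attempt.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 250): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}
```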
- Testing: mock `env.AI.run` in Vitest; seed predictable responses.
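One way to shape that mock (in Vitest you would typically wrap this in `vi.fn()`; a plain stub shows the idea, and the response values are whatever your test seeds):

```typescript
// Stub for env.AI.run: returns a seeded response per model name and records
// which models were called, so tests stay deterministic.
type AiRun = (model: string, input: unknown) => Promise<unknown>;

function makeMockAi(responses: Record<string, unknown>): { run: AiRun; calls: string[] } {
  const calls: string[] = [];
  return {
    calls,
    run: async (model: string, _input: unknown) => {
      calls.push(model);
      if (!(model in responses)) throw new Error(`no mock response for ${model}`);
      return responses[model];
    },
  };
}
```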
Snippets
- Embedding call (batched): `env.AI.run('@cf/baai/bge-base-en-v1.5', { text: batch })`
- Generation call: `env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages, temperature: 0.0, max_tokens: 1024 })`
Pitfalls
- Local dev: use `wrangler dev --remote` for AI/Vectorize.
- Keep prompts short; avoid sending redundant context; trim `topK` results.
- Log cache hit/miss; don't fail the request on cache errors.
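The last pitfall can be sketched as a guard around the KV read (a hypothetical helper; it takes a getter function so any KV-like client fits):

```typescript
// Wrap a cache read so a cache failure degrades to a miss instead of
// failing the whole request; log hit/miss/error either way.
async function safeCacheGet(
  get: (key: string) => Promise<string | null>,
  key: string,
): Promise<string | null> {
  try {
    const hit = await get(key);
    console.log(hit !== null ? "cache hit" : "cache miss", key);
    return hit;
  } catch (err) {
    console.log("cache error (treated as miss)", key, err);
    return null;
  }
}
```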