workers-ai-specialist

Total installs: 2 · Weekly installs: 2 · Site-wide rank: #73478

Install command:
npx skills add https://github.com/steveleve/chatbot-demo-cloudflare --skill workers-ai-specialist

Agent install distribution: opencode 2 · claude-code 2 · github-copilot 2 · codex 2 · kimi-cli 2 · gemini-cli 2

Skill documentation
Workers AI Specialist
Use for AI/model questions (LLaMA, BGE embeddings), RAG design, AI Gateway, and cache/latency optimization.
Project defaults
- Models: `@cf/meta/llama-3.1-8b-instruct` (QA), `@cf/baai/bge-base-en-v1.5` (embeddings). Remote-only.
- Vector store: Cloudflare Vectorize (binding `VECTOR_INDEX`).
- Embedding cache: KV namespace `EMBEDDINGS_CACHE` (issue #12 to implement).
- AI Gateway: configure via `ai_gateway` in `wrangler.jsonc` (issue #16). Start disabled; enable with a real gateway ID.
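As a sketch, the bindings above might be declared in `wrangler.jsonc` roughly as follows. The index name and namespace ID are placeholders, and the `ai_gateway` var follows this project's convention (issue #16) rather than a built-in Wrangler key:

```jsonc
{
  "ai": { "binding": "AI" },
  "vectorize": [
    // Placeholder index name; use the project's actual Vectorize index.
    { "binding": "VECTOR_INDEX", "index_name": "<index-name>" }
  ],
  "kv_namespaces": [
    // Embedding cache (issue #12).
    { "binding": "EMBEDDINGS_CACHE", "id": "<kv-namespace-id>" }
  ],
  "vars": {
    // Start disabled; set a real gateway ID to enable (issue #16).
    "ai_gateway": ""
  }
}
```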
Workflow
- Clarify task: generation, embedding, rerank, or retrieval.
- Pick model: small = speed (`bge-small`), base = balance (`bge-base`), large = quality (`bge-large`). For text generation, default to LLaMA 3.1 8B, temperature 0–0.2 for determinism.
- Retrieval:
  - Enforce `topK` validation; cap query length using `MAX_QUERY_LENGTH`.
  - Chunking defaults: size 500, overlap 100 (env vars).
  - Prefer cached embeddings before AI calls; 7-day TTL, SHA-256 keys.
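A minimal TypeScript sketch of the chunking and cache-key defaults above (the function names are illustrative; the real worker would read size/overlap from env vars):

```typescript
import { createHash } from "node:crypto";

const CHUNK_SIZE = 500;    // characters per chunk (env var in the real worker)
const CHUNK_OVERLAP = 100; // characters shared between adjacent chunks

// Split text into overlapping chunks: each chunk starts
// CHUNK_SIZE - CHUNK_OVERLAP characters after the previous one.
function chunkText(text: string, size = CHUNK_SIZE, overlap = CHUNK_OVERLAP): string[] {
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
  }
  return chunks;
}

// SHA-256 of the chunk text makes a stable KV key for the embedding cache.
function cacheKey(chunk: string): string {
  return "emb:" + createHash("sha256").update(chunk).digest("hex");
}
```

Hashing the chunk text (rather than an ID) means re-ingesting unchanged content hits the cache for free.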
- Generation:
- Provide a system prompt with context; keep `max_tokens` modest (<=1024) for latency.
- Stream if latency-sensitive; if not streaming, log latency and token counts.
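The non-streaming logging step could be sketched as a small wrapper (hypothetical helper; token counts depend on the model's response shape, so only latency is shown):

```typescript
// Hypothetical wrapper: measure and log latency of a non-streaming AI call,
// then return the call's result unchanged.
async function runWithLatencyLog<T>(label: string, call: () => Promise<T>): Promise<T> {
  const start = Date.now();
  const result = await call();
  console.log(`${label}: ${Date.now() - start}ms`);
  return result;
}
```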
- AI Gateway:
- Enable caching (1h TTL) when a gateway ID is present; note the remote-only requirement.
- Respect rate limits and retry guidance; Gateway handles cache + observability.
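The retry guidance can be sketched as exponential backoff around the AI call (a minimal sketch; attempt count and delays are illustrative, not project settings):

```typescript
// Retry transient failures (e.g. rate limits surfaced by the Gateway) with
// exponential backoff: delay doubles after each failed attempt.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 250): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}
```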
- Testing: mock `env.AI.run` in Vitest; seed predictable responses.
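One way to shape that mock (in Vitest you would typically wrap this in `vi.fn()`; a plain stub shows the idea, and the response values are whatever your test seeds):

```typescript
// Stub for env.AI.run: returns a seeded response per model name and records
// which models were called, so tests stay deterministic.
type AiRun = (model: string, input: unknown) => Promise<unknown>;

function makeMockAi(responses: Record<string, unknown>): { run: AiRun; calls: string[] } {
  const calls: string[] = [];
  return {
    calls,
    run: async (model: string, _input: unknown) => {
      calls.push(model);
      if (!(model in responses)) throw new Error(`no mock response for ${model}`);
      return responses[model];
    },
  };
}
```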
Snippets
- Embedding call (batched): `env.AI.run('@cf/baai/bge-base-en-v1.5', { text: batch })`
- Generation call: `env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages, temperature: 0.0, max_tokens: 1024 })`
Pitfalls
- Local dev: use `wrangler dev --remote` for AI/Vectorize.
- Keep prompts short; avoid sending redundant context; trim `topK` results.
- Log cache hit/miss; don't fail the request on cache errors.
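The last pitfall can be sketched as a guard around the KV read (a hypothetical helper; it takes a getter function so any KV-like client fits):

```typescript
// Wrap a cache read so a cache failure degrades to a miss instead of
// failing the whole request; log hit/miss/error either way.
async function safeCacheGet(
  get: (key: string) => Promise<string | null>,
  key: string,
): Promise<string | null> {
  try {
    const hit = await get(key);
    console.log(hit !== null ? "cache hit" : "cache miss", key);
    return hit;
  } catch (err) {
    console.log("cache error (treated as miss)", key, err);
    return null;
  }
}
```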