agent-model-selection

Repository: oocx/tfplan2md (updated 1 day ago)

Install command

npx skills add https://github.com/oocx/tfplan2md --skill agent-model-selection

Provides data-driven guidance for selecting the most appropriate language model when creating or modifying agent definitions.

The model reference covers:
- Current performance benchmarks by category (Coding, Reasoning, Language, Instruction Following, etc.)
- Model availability in GitHub Copilot Pro
- Premium request multipliers (cost)
- Recommended model assignments by agent type
- Task-based guidance and tutorials (via external links with descriptions)

This reference is updated periodically with the latest benchmark data.

Use task-specific benchmarks, not overall scores
- Different models excel at different tasks
- Example: GPT-5.2-Codex excels at Coding, while Claude Sonnet 4.5 is stronger at Language (76.00)
Claude Sonnet 4.5 has poor Instruction Following (score: 23.52)
- Unsuitable for agents that follow templates (Task Planner, Quality Engineer)
- Use Gemini models instead for structured output (Instruction Following scores: 65-75)
Gemini 3 Flash offers the best value for many tasks
- 0.33x premium multiplier (cost-effective)
- Strong Instruction Following (74.86)
- Good Language performance (84.56)
- Ideal for: Task Planner, Release Manager, high-frequency agents
GPT-5.2-Codex is the latest coding model
- Latest-generation Codex model (an improvement over 5.1 Codex Max)
- Specialized for agentic coding tasks
- Primary choice for the Developer agent
- Also a solid choice for the Code Reviewer agent
Always verify model availability
- Check against the official GitHub Copilot documentation
- Model names must match exactly (case-sensitive)
- Include “(Preview)” suffix for preview models (e.g., “Gemini 3 Pro (Preview)”)
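
The exact-match rule can be checked mechanically before an agent definition is saved. The sketch below is a minimal Python illustration, assuming a hand-maintained list copied from the official documentation; `AVAILABLE_MODELS` and `check_model_name` are hypothetical names, and the list contents are illustrative rather than an authoritative copy of the Copilot model list. It accepts only a case-sensitive exact match and, on failure, suggests a near miss that differs only in case or in a missing "(Preview)" suffix.

```python
from __future__ import annotations

# Hypothetical stand-in for the "Available Models" section of the official
# GitHub Copilot documentation; the entries are illustrative only.
AVAILABLE_MODELS = [
    "GPT-5.2-Codex",
    "Claude Sonnet 4.5",
    "Gemini 3 Pro (Preview)",
    "Gemini 3 Flash",
]


def check_model_name(name: str, available: list[str]) -> tuple[bool, str | None]:
    """Return (is_exact_match, suggestion).

    Only a case-sensitive exact match is accepted. Otherwise, look for a near
    miss (different case, or missing " (Preview)" suffix) to suggest instead.
    """
    if name in available:
        return True, None
    for candidate in available:
        # Strip the preview suffix so "Gemini 3 Pro" can point to "Gemini 3 Pro (Preview)".
        base = candidate.removesuffix(" (Preview)")
        if name.casefold() in (candidate.casefold(), base.casefold()):
            return False, candidate
    return False, None
```

For example, `check_model_name("Gemini 3 Pro", AVAILABLE_MODELS)` returns `(False, "Gemini 3 Pro (Preview)")`, flagging the missing suffix instead of silently accepting the name.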

When selecting or changing a model:

1. Identify the agent’s primary task categories (from ai-model-reference.md)
   - Coding, Reasoning, Language, Instruction Following, etc.
2. Check category-specific performance
   - Look up the relevant benchmarks in ai-model-reference.md
   - Compare the top 3-5 performers in that category
3. Weigh cost against frequency
   - High-frequency agents → favor lower multipliers (0.33x, 0x)
   - Critical-accuracy agents → favor the best performer regardless of cost
4. Verify availability
   - Confirm the model is listed in the “Available Models” section
   - Check that it is available in VS Code (required)
5. Document your reasoning
   - Include benchmark scores in the proposal
   - Explain the trade-offs made
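
The checklist above can be sketched as a small ranking routine. This is a hypothetical Python illustration, not part of the skill: `ModelProfile`, `shortlist`, `pick`, and `premium_requests` are invented names, the VS Code availability flags are assumed, and the `tolerance` rule (take the cheapest model within a few points of the leader) is only one possible reading of "favor lower multipliers" for high-frequency agents. The scores and multipliers in the example are the Gemini figures quoted in this document.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModelProfile:
    """Per-model data as it might be transcribed from ai-model-reference.md."""
    name: str
    multiplier: float  # premium request multiplier, e.g. 0.33 or 1.0
    available_in_vscode: bool  # assumed flag; verify against the official docs
    scores: dict[str, float] = field(default_factory=dict)  # category -> benchmark score


def shortlist(models: list[ModelProfile], category: str, top_n: int = 5) -> list[ModelProfile]:
    """Check performance and availability: keep VS Code-available models that
    have a score for `category`, sorted best first, truncated to `top_n`."""
    eligible = [m for m in models if m.available_in_vscode and category in m.scores]
    eligible.sort(key=lambda m: m.scores[category], reverse=True)
    return eligible[:top_n]


def pick(models: list[ModelProfile], category: str, high_frequency: bool,
         tolerance: float = 10.0) -> ModelProfile:
    """Weigh cost against frequency. Critical-accuracy agents get the top scorer;
    high-frequency agents get the cheapest model within `tolerance` points of it.
    The tolerance rule is a hypothetical heuristic, not part of the guidance."""
    candidates = shortlist(models, category)
    if not candidates:
        raise ValueError(f"no available model has a {category!r} score")
    best = candidates[0]
    if not high_frequency:
        return best
    top_score = best.scores[category]
    near_best = [m for m in candidates if top_score - m.scores[category] <= tolerance]
    # min() keeps the first of equal multipliers, i.e. the higher-scoring model.
    return min(near_best, key=lambda m: m.multiplier)


def premium_requests(invocations: int, multiplier: float) -> float:
    """Premium requests consumed when each invocation costs `multiplier` requests."""
    return invocations * multiplier


models = [
    ModelProfile("Gemini 3 Flash", 0.33, True, {"Instruction Following": 74.86}),
    ModelProfile("Gemini 3 Pro", 1.0, True, {"Instruction Following": 65.85}),
]
choice = pick(models, "Instruction Following", high_frequency=True)
print(choice.name, premium_requests(100, choice.multiplier))
```

The routine only surfaces candidates; the benchmark scores and trade-offs it exposes still belong in the written proposal.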

Scenario: Selecting a model for the Quality Engineer agent

Primary tasks: Define test plans that follow a specific template format
Key categories: Instruction Following (critical), Reasoning (important)
Benchmark lookup (from ai-model-reference.md):
- Gemini 3 Flash: Instruction Following 74.86, 0.33x cost
- Gemini 3 Pro: Instruction Following 65.85, 1x cost
- Claude Sonnet 4.5: Instruction Following 23.52 (disqualified)
Decision: Gemini 3 Pro (balance of performance and cost)
Rationale: Strong instruction following (65.85), reasonable cost (1x), good for template-based work
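
The scenario's disqualification step and cost comparison can be reproduced in a few lines of Python. The 50-point Instruction Following cut-off (`IF_THRESHOLD`) is a hypothetical value chosen for illustration, since the reference does not define one, and Claude Sonnet 4.5's multiplier is left as `None` because the scenario does not state it.

```python
# Figures copied from the Quality Engineer scenario. IF_THRESHOLD is a
# hypothetical cut-off for illustration; the reference does not define one.
IF_THRESHOLD = 50.0

# name -> (Instruction Following score, premium multiplier or None if not stated)
candidates = {
    "Gemini 3 Flash": (74.86, 0.33),
    "Gemini 3 Pro": (65.85, 1.0),
    "Claude Sonnet 4.5": (23.52, None),
}


def screen(candidates, threshold):
    """Split candidates into those meeting the threshold and those disqualified."""
    qualified, disqualified = {}, []
    for name, (score, multiplier) in candidates.items():
        if score >= threshold:
            qualified[name] = (score, multiplier)
        else:
            disqualified.append(name)
    return qualified, disqualified


qualified, disqualified = screen(candidates, IF_THRESHOLD)
for name, (score, multiplier) in qualified.items():
    if multiplier is None:
        print(f"{name}: IF {score:.2f}, multiplier not stated")
    else:
        # Each invocation consumes `multiplier` premium requests.
        print(f"{name}: IF {score:.2f}, {100 * multiplier:.0f} premium requests per 100 invocations")
print("Disqualified:", ", ".join(disqualified))
```

Running it lists Gemini 3 Flash (33 premium requests per 100 invocations) and Gemini 3 Pro (100 per 100 invocations) as qualified and Claude Sonnet 4.5 as disqualified. The final choice between the two Gemini models is the judgment recorded in the Decision line; the sketch does not encode it.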

Reassess models when: