agent-model-selection

📁 oocx/tfplan2md 📅 1 day ago
Install command
npx skills add https://github.com/oocx/tfplan2md --skill agent-model-selection

Skill documentation

Agent Model Selection Skill

Purpose

Provides data-driven guidance for selecting the most appropriate language model when creating or modifying agent definitions.

When to Use

  • When creating a new agent that needs a model assigned
  • When modifying an existing agent’s model assignment
  • When troubleshooting agent performance issues related to model capabilities
  • When optimizing costs across the agent ecosystem

Reference Data

Always consult docs/ai-model-reference.md for:

  • Current performance benchmarks by category (Coding, Reasoning, Language, Instruction Following, etc.)
  • Model availability in GitHub Copilot Pro
  • Premium request multipliers (cost)
  • Recommended model assignments by agent type
  • Task-based guidance and tutorials (via external links with descriptions)

This reference is updated periodically with the latest benchmark data.

Critical Learnings

  1. Use task-specific benchmarks, not overall scores

    • Different models excel at different tasks
    • Example: GPT-5.2-Codex excels in Coding while Claude Sonnet 4.5 is better for Language (76.00)
  2. Claude Sonnet 4.5 has poor Instruction Following (score: 23.52)

    • Unsuitable for agents that follow templates (Task Planner, Quality Engineer)
    • Use Gemini models instead for structured output (scores: 65-75)
  3. Gemini 3 Flash offers best value for many tasks

    • 0.33x premium multiplier (cost-effective)
    • Strong Instruction Following (74.86)
    • Good Language performance (84.56)
    • Ideal for: Task Planner, Release Manager, high-frequency agents
  4. GPT-5.2-Codex is the latest coding model

    • Improves on the previous 5.1 Codex Max
    • Specialized for agentic coding tasks
    • Primary choice for Developer agent
    • Also solid for Code Reviewer
  5. Always verify model availability

    • Check against official GitHub Copilot documentation
    • Model names must match exactly (case-sensitive)
    • Include “(Preview)” suffix for preview models (e.g., “Gemini 3 Pro (Preview)”)
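
The exact-match rule in point 5 can be checked mechanically. The following is a minimal sketch, not part of the skill itself: `AVAILABLE_MODELS` and `check_model_name` are hypothetical names, and the set holds placeholder entries that would need to be replaced with the list from the official GitHub Copilot documentation.

```python
from __future__ import annotations

# Placeholder entries for illustration only (assumption); the authoritative
# list is the one in the official GitHub Copilot documentation.
AVAILABLE_MODELS = {
    "Gemini 3 Pro (Preview)",
    "Claude Sonnet 4.5",
}


def check_model_name(name: str, available: set[str]) -> str | None:
    """Return None when `name` matches exactly, otherwise a short hint."""
    if name in available:
        return None

    # Names are case-sensitive, so a case-insensitive hit is still an error.
    by_folded = {m.casefold(): m for m in available}
    if name.casefold() in by_folded:
        return f"case mismatch: did you mean '{by_folded[name.casefold()]}'?"

    # Preview models need the "(Preview)" suffix.
    with_suffix = f"{name} (Preview)"
    if with_suffix in available:
        return f"missing suffix: did you mean '{with_suffix}'?"

    return "not found in the available-model list"


print(check_model_name("Gemini 3 Pro", AVAILABLE_MODELS))
```

Returning a hint rather than raising keeps the check usable both in a review script and as a quick manual lookup.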

Model Selection Process

When selecting or changing a model:

  1. Identify the agent’s primary task categories (from ai-model-reference.md)

    • Coding, Reasoning, Language, Instruction Following, etc.
  2. Check category-specific performance

    • Look up relevant benchmarks in ai-model-reference.md
    • Compare top 3-5 performers in that category
  3. Consider cost vs frequency

    • High-frequency agents → favor lower multipliers (0.33x, 0x)
    • Accuracy-critical agents → favor best performer regardless of cost
  4. Verify availability

    • Confirm model is listed in “Available Models” section
    • Check it’s available for VS Code (required)
  5. Document your reasoning

    • Include benchmark scores in proposal
    • Explain trade-offs made
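
Steps 2 to 4 of this process could be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's own tooling: `Candidate`, `shortlist`, and `choose` are hypothetical names, and the sample models and numbers are invented placeholders rather than figures from ai-model-reference.md. The high-frequency rule is deliberately simplified to "cheapest model within the shortlist".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    scores: dict       # category name -> benchmark score
    multiplier: float  # premium request multiplier (cost)
    available: bool    # listed under "Available Models" and usable in VS Code


def shortlist(candidates, category, top_n=5):
    """Top performers in one category, restricted to available models."""
    usable = [c for c in candidates if c.available and category in c.scores]
    return sorted(usable, key=lambda c: c.scores[category], reverse=True)[:top_n]


def choose(candidates, category, high_frequency):
    """High-frequency agents favor the lowest multiplier; others the best score."""
    top = shortlist(candidates, category)
    if not top:
        return None
    if high_frequency:
        # Cheapest first; ties broken by the higher category score.
        return min(top, key=lambda c: (c.multiplier, -c.scores[category]))
    return top[0]


# Invented placeholder data (assumption), for illustration only.
PLACEHOLDERS = [
    Candidate("model-a", {"Instruction Following": 80.0}, 1.0, True),
    Candidate("model-b", {"Instruction Following": 70.0}, 0.33, True),
    Candidate("model-c", {"Instruction Following": 90.0}, 1.0, False),
]

print(choose(PLACEHOLDERS, "Instruction Following", high_frequency=True).name)
print(choose(PLACEHOLDERS, "Instruction Following", high_frequency=False).name)
```

The sketch covers a single category; an agent with several key categories would need the scores combined or compared category by category, and step 5 (documenting the reasoning) remains a manual task.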

Example Model Selection

Scenario: Selecting a model for the Quality Engineer agent

  1. Primary tasks: Define test plans following specific template format
  2. Key categories: Instruction Following (critical), Reasoning (important)
  3. Benchmark lookup (from ai-model-reference.md):
    • Gemini 3 Flash: Instruction Following 74.86, 0.33x cost ✅
    • Gemini 3 Pro: Instruction Following 65.85, 1x cost ✅
    • Claude Sonnet 4.5: Instruction Following 23.52 ❌ (disqualified)
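  4. Decision: Gemini 3 Pro (balance of performance and cost)
  5. Rationale: Strong instruction following (65.85), reasonable cost (1x), good for template-based work. Gemini 3 Flash scores higher on Instruction Following at a lower cost, so this choice holds only if Gemini 3 Pro is clearly stronger in Reasoning, the other key category; confirm that score in ai-model-reference.md before adopting it.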
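
One way to put the benchmark and cost figures from this example side by side is to divide each score by its premium multiplier. The sketch below does that under two assumptions that are not in the source: the cutoff `MIN_SCORE` is a hypothetical threshold for "disqualified", and `score_per_multiplier` is an illustrative helper name. It covers Instruction Following and cost only; Reasoning scores are not listed in the example, so they are left out.

```python
from typing import Dict, Optional, Tuple

# (Instruction Following score, premium multiplier), copied from the example.
CANDIDATES: Dict[str, Tuple[float, Optional[float]]] = {
    "Gemini 3 Flash": (74.86, 0.33),
    "Gemini 3 Pro": (65.85, 1.0),
    "Claude Sonnet 4.5": (23.52, None),  # multiplier not stated in the example
}

MIN_SCORE = 60.0  # hypothetical disqualification cutoff (assumption)


def score_per_multiplier(candidates, min_score):
    """Score divided by cost multiplier, for models at or above the cutoff."""
    result = {}
    for name, (score, multiplier) in candidates.items():
        if score < min_score:
            continue  # disqualified on Instruction Following
        if not multiplier:
            continue  # unknown or 0x multiplier: the ratio is undefined
        result[name] = score / multiplier
    return result


ratios = score_per_multiplier(CANDIDATES, MIN_SCORE)
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.1f} points per 1x of premium cost")
```

A ratio like this is only one input to the decision; it says nothing about the other key category, so it should be read alongside the category scores in ai-model-reference.md.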

When to Update Model Assignments

Reassess models when:

  • New benchmark data shows significant performance changes
  • An agent consistently underperforms on its tasks
  • New models are released with better performance
  • Cost optimization is needed
  • ai-model-reference.md is updated with new data

Key Principles

  • Task-specific benchmarks matter more than overall scores
  • Balance cost with performance based on agent frequency and criticality
  • Always verify availability against official documentation
  • Document your rationale for model selection decisions