evals-context
Install command
npx skills add https://github.com/roocodeinc/roo-code --skill evals-context
Skill documentation
Evals Codebase Context
When to Use This Skill
Use this skill when the task involves:
- Modifying or debugging the evals execution infrastructure
- Adding new eval exercises or languages
- Working with the evals web interface (apps/web-evals)
- Modifying the public evals display page on roocode.com
- Understanding where evals code lives in this monorepo
When NOT to Use This Skill
Do NOT use this skill when:
- Working on unrelated parts of the codebase (extension, webview-ui, etc.)
- The task is purely about the VS Code extension’s core functionality
- Working on the main website pages that don’t involve evals
Key Disambiguation: Multiple “Evals” Locations
This monorepo has three distinct evals-related locations that are easy to confuse, plus an external repository that holds the exercises:
| Component | Path | Purpose |
|---|---|---|
| Evals Execution System | packages/evals/ | Core eval infrastructure: CLI, DB schema, Docker configs |
| Evals Management UI | apps/web-evals/ | Next.js app for creating/monitoring eval runs (localhost:3446) |
| Website Evals Page | apps/web-roo-code/src/app/evals/ | Public roocode.com page displaying eval results |
| External Exercises Repo | Roo-Code-Evals | Actual coding exercises (NOT in this monorepo) |
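As a quick illustration of the table above, the following TypeScript sketch maps a repo-relative path to the evals location it belongs to. Only the path prefixes come from the table; the function name `classifyEvalsPath` and the label strings are hypothetical and do not exist in the repo.

```typescript
// Hypothetical helper: classify a repo-relative path by evals location.
// Only the path prefixes are taken from the table; names and labels are illustrative.
type EvalsLocation = "execution-system" | "management-ui" | "website-page" | "not-evals";

const EVALS_PREFIXES: ReadonlyArray<[string, EvalsLocation]> = [
  ["packages/evals/", "execution-system"],
  ["apps/web-evals/", "management-ui"],
  ["apps/web-roo-code/src/app/evals/", "website-page"],
];

function classifyEvalsPath(repoRelativePath: string): EvalsLocation {
  // Normalize Windows separators and a leading "./" before matching.
  const normalized = repoRelativePath.replace(/\\/g, "/").replace(/^\.\//, "");
  for (const [prefix, location] of EVALS_PREFIXES) {
    if (normalized.startsWith(prefix)) return location;
  }
  return "not-evals";
}

console.log(classifyEvalsPath("packages/evals/src/cli/runTask.ts")); // execution-system
console.log(classifyEvalsPath("apps/web-roo-code/src/app/page.tsx")); // not-evals
```

The external Roo-Code-Evals repo is deliberately absent from the prefix list, since its files never appear under a path in this monorepo.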
Directory Structure Reference
packages/evals/ – Core Evals Package
packages/evals/
├── ARCHITECTURE.md           # Detailed architecture documentation
├── ADDING-EVALS.md           # Guide for adding new exercises/languages
├── README.md                 # Setup and running instructions
├── docker-compose.yml        # Container orchestration
├── Dockerfile.runner         # Runner container definition
├── Dockerfile.web            # Web app container
├── drizzle.config.ts         # Database ORM config
├── src/
│   ├── index.ts              # Package exports
│   ├── cli/                  # CLI commands for running evals
│   │   ├── runEvals.ts       # Orchestrates complete eval runs
│   │   ├── runTask.ts        # Executes individual tasks in containers
│   │   ├── runUnitTest.ts    # Validates task completion via tests
│   │   └── redis.ts          # Redis pub/sub integration
│   ├── db/
│   │   ├── schema.ts         # Database schema (runs, tasks)
│   │   ├── queries/          # Database query functions
│   │   └── migrations/       # SQL migrations
│   └── exercises/
│       └── index.ts          # Exercise loading utilities
└── scripts/
    └── setup.sh              # Local macOS setup script
apps/web-evals/ – Evals Management Web App
apps/web-evals/
├── src/
│   ├── app/
│   │   ├── page.tsx          # Home page (runs list)
│   │   ├── runs/
│   │   │   ├── new/          # Create new eval run
│   │   │   └── [id]/         # View specific run status
│   │   └── api/runs/         # SSE streaming endpoint
│   ├── actions/              # Server actions
│   │   ├── runs.ts           # Run CRUD operations
│   │   ├── tasks.ts          # Task queries
│   │   ├── exercises.ts      # Exercise listing
│   │   └── heartbeat.ts      # Controller health checks
│   ├── hooks/                # React hooks (SSE, models, etc.)
│   └── lib/                  # Utilities and schemas
apps/web-roo-code/src/app/evals/ – Public Website Evals Page
apps/web-roo-code/src/app/evals/
├── page.tsx      # Fetches and displays public eval results
├── evals.tsx     # Main evals display component
├── plot.tsx      # Visualization component
└── types.ts      # EvalRun type (extends packages/evals types)
This page displays eval results on the public roocode.com website. It imports types from @roo-code/evals but does NOT run evals.
Architecture Overview
The evals system is a distributed evaluation platform that runs AI coding tasks in isolated VS Code environments:
┌─────────────────────────────────────────────────────────────┐
│  Web App (apps/web-evals) ────────────────────────────────  │
│         │                                                   │
│         ▼                                                   │
│  PostgreSQL ◄────► Controller Container                     │
│         │                    │                              │
│         ▼                    ▼                              │
│  Redis ◄───► Runner Containers (1-25 parallel)              │
└─────────────────────────────────────────────────────────────┘
Key components:
- Controller: Orchestrates eval runs, spawns runners, manages task queue (p-queue)
- Runner: Isolated Docker container with VS Code + Roo Code extension + language runtimes
- Redis: Pub/sub for real-time events (NOT task queuing)
- PostgreSQL: Stores runs, tasks, metrics
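To make the Controller's role more concrete, here is a minimal sketch of bounded-concurrency task execution, the job that p-queue does in the controller. It is a hand-written stand-in, not p-queue's API and not the actual controller code; `clampConcurrency`, `runWithConcurrency`, and `fakeRunTask` are hypothetical names, and only the 1-25 range comes from the diagram.

```typescript
// Minimal stand-in for a bounded-concurrency task queue (the role p-queue plays
// in the controller). This is NOT p-queue's API or the real controller code.
const MIN_RUNNERS = 1;
const MAX_RUNNERS = 25; // upper bound taken from the diagram (1-25 parallel)

function clampConcurrency(requested: number): number {
  return Math.min(MAX_RUNNERS, Math.max(MIN_RUNNERS, Math.floor(requested)));
}

async function runWithConcurrency<T, R>(
  items: T[],
  concurrency: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each "lane" claims the next unprocessed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T);
    }
  }
  const laneCount = Math.min(clampConcurrency(concurrency), items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Hypothetical usage: simulate 10 tasks with at most 3 running at once.
let active = 0;
let peak = 0;
async function fakeRunTask(id: number): Promise<string> {
  active++;
  peak = Math.max(peak, active);
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  active--;
  return `task-${id}-done`;
}

const demo = runWithConcurrency([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3, fakeRunTask);
demo.then((out) => console.log(out.length, "tasks finished; peak concurrency:", peak));
```

The queue here lives entirely in process memory, which mirrors the point in the list above: Redis carries real-time events, while task scheduling stays with the controller.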
Common Tasks Quick Reference
Adding a New Eval Exercise
- Add exercise to Roo-Code-Evals repo (external)
- See packages/evals/ADDING-EVALS.md for structure
Modifying Eval CLI Behavior
Edit files in packages/evals/src/cli/:
- runEvals.ts – Run orchestration
- runTask.ts – Task execution
- runUnitTest.ts – Test validation
Modifying the Evals Web Interface
Edit files in apps/web-evals/src/:
- app/runs/new/new-run.tsx – New run form
- actions/runs.ts – Run server actions
Modifying the Public Evals Display Page
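Edit files in apps/web-roo-code/src/app/evals/:
- page.tsx – Fetches and displays public eval results
- evals.tsx – Main evals display component
- plot.tsx – Visualization component
- types.ts – EvalRun type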
Database Schema Changes
- Edit packages/evals/src/db/schema.ts
- Generate migration: cd packages/evals && pnpm drizzle-kit generate
- Apply migration: pnpm drizzle-kit migrate
Running Evals Locally
# From repo root
pnpm evals
# Opens web UI at http://localhost:3446
Ports (defaults):
- PostgreSQL: 5433
- Redis: 6380
- Web: 3446
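The default ports above can be captured in a small lookup, sketched below. The port numbers come from the list; the override mechanism, the function names `resolvePort` and `localUrls`, and the bare `localhost` URLs (no user, password, or database name) are assumptions for illustration, not the repo's actual configuration.

```typescript
// Default ports from the list above. The override mechanism and function
// names are hypothetical and not taken from the repo's configuration.
const DEFAULT_PORTS = { postgres: 5433, redis: 6380, web: 3446 } as const;
type Service = keyof typeof DEFAULT_PORTS;

function resolvePort(service: Service, override?: string): number {
  // Fall back to the documented default when no override is given.
  if (override === undefined || override.trim() === "") return DEFAULT_PORTS[service];
  const port = Number(override);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port for ${service}: ${override}`);
  }
  return port;
}

function localUrls(overrides: Partial<Record<Service, string>> = {}) {
  return {
    // Host and port only; credentials and database name are deliberately omitted.
    postgres: `postgres://localhost:${resolvePort("postgres", overrides.postgres)}`,
    redis: `redis://localhost:${resolvePort("redis", overrides.redis)}`,
    web: `http://localhost:${resolvePort("web", overrides.web)}`,
  };
}

console.log(localUrls().web); // http://localhost:3446
```

The non-standard defaults (5433 and 6380 rather than 5432 and 6379) presumably avoid clashing with a locally installed PostgreSQL or Redis, though the document does not state the reason.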
Testing
# packages/evals tests
cd packages/evals && npx vitest run
# apps/web-evals tests
cd apps/web-evals && npx vitest run
Key Types/Exports from @roo-code/evals
The package exports are defined in packages/evals/src/index.ts:
- Database queries: getRuns, getTasks, getTaskMetrics, etc.
- Schema types: Run, Task, TaskMetrics
- Used by both apps/web-evals and apps/web-roo-code
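A consumer such as the public evals page might aggregate task rows into per-language pass rates. The sketch below shows the idea with local stand-in types: the real `Run`, `Task`, and `TaskMetrics` shapes are defined in packages/evals/src/db/schema.ts and are not shown in this document, so every field name here (`runId`, `language`, `passed`) and the function `summarizeByLanguage` are assumptions.

```typescript
// Stand-in types: the real Run/Task/TaskMetrics come from @roo-code/evals and
// their fields are not shown in this document. Every field below is hypothetical.
interface TaskLike {
  runId: number;
  language: string;
  passed: boolean | null; // null = not finished yet (assumption)
}

interface LanguageSummary {
  total: number;
  finished: number;
  passed: number;
  passRate: number; // passed / finished, or 0 when nothing has finished
}

function summarizeByLanguage(tasks: TaskLike[]): Record<string, LanguageSummary> {
  const summary: Record<string, LanguageSummary> = {};
  for (const task of tasks) {
    let entry = summary[task.language];
    if (!entry) {
      entry = { total: 0, finished: 0, passed: 0, passRate: 0 };
      summary[task.language] = entry;
    }
    entry.total += 1;
    if (task.passed !== null) entry.finished += 1;
    if (task.passed === true) entry.passed += 1;
    entry.passRate = entry.finished === 0 ? 0 : entry.passed / entry.finished;
  }
  return summary;
}

// Hypothetical sample rows; in real code these would come from a query such as getTasks.
const sample: TaskLike[] = [
  { runId: 1, language: "python", passed: true },
  { runId: 1, language: "python", passed: false },
  { runId: 1, language: "python", passed: null },
  { runId: 1, language: "go", passed: true },
];
console.log(summarizeByLanguage(sample));
```

In the monorepo itself, the stand-in interface would be replaced by the exported schema types so that the web apps and the database layer stay in sync.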