agent-evals
3
总安装量
3
周安装量
#56420
全站排名
安装命令
npx skills add https://github.com/bagelhole/devops-security-agent-skills --skill agent-evals
Agent 安装分布
mcpjam
3
claude-code
3
replit
3
junie
3
windsurf
3
zencoder
3
Skill 文档
Agent Evals
Create repeatable checks so agent behavior improves safely over time.
Evaluation Layers
- Unit evals: prompt-level correctness
- Tool evals: API/tool call decision quality
- End-to-end evals: realistic multi-step tasks
- Safety evals: prompt injection and data leak resistance
CI/CD Integration
# Example eval pipeline steps
make evals-smoke
make evals-regression
make evals-safety
Best Practices
- Version datasets with expected outputs.
- Track pass rates and score drift over time.
- Block deploys on critical safety regressions.
Related Skills
- github-actions – Eval automation in CI
- ai-agent-security – Security-focused eval cases