agent-evals

📁 bagelhole/devops-security-agent-skills 📅 6 days ago
3
总安装量
3
周安装量
#56420
全站排名
安装命令
npx skills add https://github.com/bagelhole/devops-security-agent-skills --skill agent-evals

Agent 安装分布

mcpjam 3
claude-code 3
replit 3
junie 3
windsurf 3
zencoder 3

Skill 文档

Agent Evals

Create repeatable checks so agent behavior improves safely over time.

Evaluation Layers

  • Unit evals: prompt-level correctness
  • Tool evals: API/tool call decision quality
  • End-to-end evals: realistic multi-step tasks
  • Safety evals: prompt injection and data leak resistance

CI/CD Integration

# Example eval pipeline steps
make evals-smoke
make evals-regression
make evals-safety

Best Practices

  • Version datasets with expected outputs.
  • Track pass rates and score drift over time.
  • Block deploys on critical safety regressions.

Related Skills