eval-accuracy
4
总安装量
3
周安装量
#51407
全站排名
安装命令
npx skills add https://github.com/whitespectre/ai-assistant-evals --skill eval-accuracy
Installs by agent
cursor: 3
opencode: 3
openclaw: 2
claude-code: 2
github-copilot: 2
codex: 2
Skill documentation
Eval Accuracy
Use this skill to evaluate how factually accurate an assistant response is.
Inputs
Require:
- The assistant response text to evaluate.
Internal Rubric (1–5)
5 = Factually correct, no misleading claims, no hallucinations, claims are well-supported or appropriately qualified
4 = Mostly correct, minor imprecision that does not materially affect meaning
3 = Partially correct, contains one significant inaccuracy or unsupported claim
2 = Multiple inaccuracies or misleading statements
1 = Fundamentally incorrect, fabricated, or contradicts known facts
Workflow
- Evaluate factual claims in the response.
- Compare them against widely accepted knowledge.
- Score accuracy on a 1-5 integer scale using the rubric only.
- Write concise rationale tied directly to rubric criteria.
- Produce actionable suggestions that improve factual correctness.
Output Contract
Return JSON only. Do not include markdown, backticks, prose, or extra keys.
Use exactly this schema:
{ "dimension": "accuracy", "score": 1, "rationale": "…", "improvement_suggestions": [ "…" ] }
Hard Rules
- dimension must always equal "accuracy".
- score must be an integer from 1 to 5.
- rationale must be concise (max 3 sentences).
- Do not include step-by-step reasoning.
- improvement_suggestions must be a non-empty array of concrete edits.
- Never output text outside the JSON object.
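The output contract and hard rules above can be checked mechanically before a score is accepted. The following is a minimal Python sketch, not part of the skill itself: the function name `validate_accuracy_eval` and the error messages are hypothetical, and treating the rationale and each suggestion as non-empty strings is an assumption based on the schema shown above.

```python
import json

# Keys taken from the schema in the Output Contract; no extra keys are allowed.
REQUIRED_KEYS = {"dimension", "score", "rationale", "improvement_suggestions"}


def validate_accuracy_eval(raw: str) -> list[str]:
    """Return a list of contract violations; an empty list means the output conforms."""
    try:
        # Fails on markdown fences, backticks, or prose around the object.
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    errors = []
    if set(data) != REQUIRED_KEYS:
        errors.append("keys must be exactly: " + ", ".join(sorted(REQUIRED_KEYS)))
    if data.get("dimension") != "accuracy":
        errors.append('dimension must equal "accuracy"')

    score = data.get("score")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(score, int) or isinstance(score, bool) or not 1 <= score <= 5:
        errors.append("score must be an integer from 1 to 5")

    rationale = data.get("rationale")
    # Assumption: the rationale is a non-empty string.
    if not isinstance(rationale, str) or not rationale.strip():
        errors.append("rationale must be a non-empty string")

    suggestions = data.get("improvement_suggestions")
    # Assumption: each suggestion is a non-empty string.
    if (not isinstance(suggestions, list) or not suggestions
            or not all(isinstance(s, str) and s.strip() for s in suggestions)):
        errors.append("improvement_suggestions must be a non-empty array of strings")
    return errors
```

This sketch does not enforce the three-sentence limit on the rationale or the ban on step-by-step reasoning, since both need a judgment call rather than a structural check.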