eval-conversation-flow

📁 whitespectre/ai-assistant-evals 📅 7 days ago
Install command
npx skills add https://github.com/whitespectre/ai-assistant-evals --skill eval-conversation-flow

Skill Documentation

Eval Conversation Flow

Use this skill to evaluate how well an assistant response fits into the conversation: continuity, coherence, turn-taking, and whether it advances the interaction appropriately.

Inputs

Provide:

  • The assistant response text to evaluate.
  • (Optional) The user’s prior message(s) for context.
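One way to represent these inputs in code is sketched below, assuming a Python evaluation harness. The names `FlowEvalInput` and `render_transcript` are hypothetical and not part of the skill, and the transcript layout is only one reasonable choice.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEvalInput:
    """Hypothetical container for the skill's inputs (not part of the skill itself)."""

    response: str  # required: the assistant response text to evaluate
    prior_user_messages: list[str] = field(default_factory=list)  # optional context

    def __post_init__(self) -> None:
        # The response is the only required input, so reject empty text early.
        if not self.response.strip():
            raise ValueError("response text is required")


def render_transcript(inputs: FlowEvalInput) -> str:
    """Lay out prior user messages (if any), then the response under evaluation."""
    lines = [f"User: {msg}" for msg in inputs.prior_user_messages]
    lines.append(f"Assistant (to evaluate): {inputs.response}")
    return "\n".join(lines)
```

When no prior messages are supplied, the transcript contains only the response line, so the evaluation can still run without context.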

Internal Rubric (1–5)

5 = Seamlessly continues the thread; correctly uses context; answers the user’s current ask; transitions naturally; asks clarifying questions only when truly needed
4 = Generally coherent and responsive; minor awkwardness (slight repetition, small context miss) but flow remains smooth
3 = Some coherence, but noticeable issues (repeats prior content, weak transitions, minor context loss, or slightly mismatched pacing)
2 = Poor flow: ignores or misuses context; abrupt topic shifts; repetitive or stilted; does not move the conversation forward
1 = Broken flow: contradicts prior turns, derails the conversation, or responds as if to a different thread entirely

Workflow

  1. Check context continuity (does it reflect the user’s latest message and prior constraints?).
  2. Check coherence and pacing (logical order, no abrupt shifts, minimal unnecessary repetition).
  3. Check interaction quality (does it advance the conversation appropriately?).
  4. Score on a 1–5 integer scale using the rubric only.
  5. Write a concise rationale tied directly to the rubric criteria.
  6. Produce actionable suggestions that improve flow.

Output Contract

Return JSON only. Do not include markdown, backticks, prose, or extra keys.

Use exactly this schema:

{ "dimension": "conversation_flow", "score": 1, "rationale": "…", "improvement_suggestions": [ "…" ] }
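A minimal sketch of producing an object in this shape from Python, assuming the score, rationale, and suggestions have already been decided by the evaluator. `build_flow_result` is a hypothetical helper name; only the four keys and the fixed `dimension` value come from the schema.

```python
import json


def build_flow_result(score: int, rationale: str, suggestions: list[str]) -> str:
    """Serialize an evaluation into the four-key JSON object (hypothetical helper)."""
    result = {
        "dimension": "conversation_flow",  # fixed value required by the contract
        "score": score,
        "rationale": rationale,
        "improvement_suggestions": suggestions,
    }
    # ensure_ascii=False keeps non-ASCII text readable; nothing is wrapped around the JSON.
    return json.dumps(result, ensure_ascii=False)
```

Calling `build_flow_result(4, "Responsive, with slight repetition.", ["Drop the repeated summary."])` returns a single JSON object string with the keys in schema order. The helper does not check the values it is given.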

Hard Rules

  • dimension must always equal "conversation_flow".
  • score must be an integer from 1 to 5.
  • rationale must be concise (max 3 sentences).
  • Do not include step-by-step reasoning.
  • improvement_suggestions must be a non-empty array of concrete edits.
  • Never output text outside the JSON object.
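Most of these rules can be checked mechanically before a result is accepted. The sketch below is one possible validator, not part of the skill: `check_flow_output` is a hypothetical name, the sentence count is a rough punctuation-based heuristic, and requiring a non-empty rationale is an added assumption. The rule against step-by-step reasoning is not checked, since it needs human or model judgment.

```python
import json
import re

EXPECTED_KEYS = {"dimension", "score", "rationale", "improvement_suggestions"}


def check_flow_output(raw: str) -> list[str]:
    """Return a list of rule violations; an empty list means the output passes."""
    try:
        data = json.loads(raw)  # fails on backticks or prose around the JSON
    except json.JSONDecodeError:
        return ["output is not a single JSON value"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    problems = []
    if set(data) != EXPECTED_KEYS:
        problems.append("keys must be exactly: " + ", ".join(sorted(EXPECTED_KEYS)))
    if data.get("dimension") != "conversation_flow":
        problems.append('dimension must equal "conversation_flow"')

    score = data.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(score, int) or isinstance(score, bool) or not 1 <= score <= 5:
        problems.append("score must be an integer from 1 to 5")

    rationale = data.get("rationale")
    if not isinstance(rationale, str) or not rationale.strip():
        problems.append("rationale must be a non-empty string")
    else:
        # Heuristic (assumption): runs of . ! ? followed by whitespace or end mark a sentence.
        sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", rationale.strip()) if s]
        if len(sentences) > 3:
            problems.append("rationale must be at most 3 sentences")

    suggestions = data.get("improvement_suggestions")
    if (
        not isinstance(suggestions, list)
        or not suggestions
        or not all(isinstance(s, str) and s.strip() for s in suggestions)
    ):
        problems.append("improvement_suggestions must be a non-empty array of strings")
    return problems
```

Returning a list of violations rather than raising on the first one lets a caller report every broken rule at once, for example when asking the evaluator to retry.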