ai-checking-outputs
npx skills add https://github.com/lebsral/dspy-programming-not-prompting-lms-skills --skill ai-checking-outputs
Skill Documentation
Check AI Output Before It Ships
Guide the user through adding verification and guardrails so bad AI outputs never reach users. The pattern: generate, check, fix or reject.
Step 1: Understand what to check
Ask the user:
- What could go wrong? (hallucinations, wrong format, offensive content, missing info, factual errors?)
- How strict does it need to be? (reject bad outputs vs. try to fix them?)
- What’s the cost of a bad output reaching users? (annoyance vs. legal/safety risk)
Step 2: Quick wins with DSPy assertions
Assertions are the simplest way to add checks. dspy.Assert is a hard stop (DSPy retries if it is violated); dspy.Suggest is a soft nudge:
```python
import dspy

# GenerateResponse is a signature you define elsewhere (e.g. question -> answer).
class CheckedResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.respond = dspy.ChainOfThought(GenerateResponse)

    def forward(self, question):
        result = self.respond(question=question)

        # Hard checks - will retry if these fail
        dspy.Assert(
            len(result.answer) > 0,
            "Must produce an answer"
        )
        dspy.Assert(
            len(result.answer.split()) <= 200,
            "Answer must be under 200 words"
        )

        # Soft checks - hints for improvement
        dspy.Suggest(
            "I don't know" not in result.answer.lower(),
            "Try to provide a substantive answer"
        )
        dspy.Suggest(
            not any(word in result.answer.lower() for word in ["definitely", "absolutely", "100%"]),
            "Avoid overconfident language"
        )

        return result
```
DSPy will automatically retry the LM call (with the assertion feedback) when an Assert fails, up to a configurable number of times.
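To turn that retry behavior on, older DSPy releases (the 2.4/2.5-era assertions API) require activating assertions on the module. A minimal sketch, assuming the CheckedResponder and GenerateResponse above, an already-configured LM, and a DSPy version that still ships dspy.Assert / activate_assertions:

```python
import dspy

# Assumes an LM is already configured, for example:
# dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # the model name is just an example

# Activating assertions wraps the module so a failed dspy.Assert triggers a retry
# with the assertion message fed back to the LM.
checked = CheckedResponder().activate_assertions()

result = checked(question="Summarize our refund policy for a customer.")
print(result.answer)  # satisfied the checks, or raised after retries were exhausted
```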
Step 3: Format validation
Type-based validation (automatic)
DSPy validates typed outputs automatically:
```python
import dspy
from pydantic import BaseModel, Field

class Response(BaseModel):
    answer: str = Field(min_length=1, max_length=500)
    confidence: float = Field(ge=0.0, le=1.0)
    category: str

class MySignature(dspy.Signature):
    question: str = dspy.InputField()
    response: Response = dspy.OutputField()
```
Pydantic catches malformed JSON, out-of-range values, and wrong types before your code ever sees them.
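Calling a typed predictor then hands you a validated Response object rather than raw text. A minimal usage sketch, assuming the MySignature above and an example model name:

```python
import dspy

# The model name is just an example; use whatever LM you normally configure.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

predict = dspy.Predict(MySignature)
out = predict(question="How do I reset my password?")

# out.response is a parsed, validated Response instance
print(out.response.answer)      # non-empty, at most 500 characters
print(out.response.confidence)  # a float that passed the 0.0-1.0 bounds check
```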
Custom validation in the module
```python
import re

import dspy

# ExtractContact is a signature you define elsewhere (e.g. text -> email, phone).
class ValidatedExtractor(dspy.Module):
    def __init__(self):
        super().__init__()
        self.extract = dspy.ChainOfThought(ExtractContact)

    def forward(self, text):
        result = self.extract(text=text)

        # Validate email format
        dspy.Assert(
            bool(re.match(r"[^@]+@[^@]+\.[^@]+", result.email or "")),
            "Email must be a valid email address"
        )

        # Validate phone format
        dspy.Assert(
            len(re.sub(r"\D", "", result.phone or "")) >= 10,
            "Phone must have at least 10 digits"
        )

        return result
```
Step 4: Factual verification
Self-check: ask the AI to verify its own output
```python
class VerifyFacts(dspy.Signature):
    """Check if the answer is supported by the given context."""
    context: list[str] = dspy.InputField(desc="Source documents")
    answer: str = dspy.InputField(desc="Generated answer to verify")
    is_supported: bool = dspy.OutputField(desc="Is the answer fully supported by the context?")
    unsupported_claims: list[str] = dspy.OutputField(desc="Claims not found in context")

# AnswerFromDocs is a signature you define elsewhere (e.g. context, question -> answer).
class GroundedResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=5)
        self.answer = dspy.ChainOfThought(AnswerFromDocs)
        self.verify = dspy.Predict(VerifyFacts)

    def forward(self, question):
        context = self.retrieve(question).passages
        response = self.answer(context=context, question=question)

        # Verify the answer is grounded in sources
        check = self.verify(context=context, answer=response.answer)
        dspy.Assert(
            check.is_supported,
            f"Answer contains unsupported claims: {check.unsupported_claims}. "
            "Rewrite using only information from the context."
        )
        return response
```
Cross-check: generate two ways, compare
```python
class CrossCheckedAnswer(dspy.Module):
    def __init__(self):
        super().__init__()
        # AnswerQuestion is a signature you define elsewhere (e.g. question -> answer).
        self.answer_a = dspy.ChainOfThought(AnswerQuestion)
        self.answer_b = dspy.ChainOfThought(AnswerQuestion)
        self.compare = dspy.ChainOfThought(CompareAnswers)

    def forward(self, question):
        a = self.answer_a(question=question)
        b = self.answer_b(question=question)

        comparison = self.compare(
            question=question,
            answer_a=a.answer,
            answer_b=b.answer,
        )
        dspy.Assert(
            comparison.agree,
            "Two independent generations disagree; the answer may be unreliable"
        )
        return a

class CompareAnswers(dspy.Signature):
    """Check if two independently generated answers agree."""
    question: str = dspy.InputField()
    answer_a: str = dspy.InputField()
    answer_b: str = dspy.InputField()
    agree: bool = dspy.OutputField(desc="Do the answers substantially agree?")
    discrepancy: str = dspy.OutputField(desc="What they disagree on, if anything")
```
Step 5: Safety and content filtering
Block harmful outputs
```python
import re

import dspy

BLOCKED_PATTERNS = [
    r"\b(password|secret|api.?key)\b",
    r"\b\d{3}-\d{2}-\d{4}\b",  # SSN pattern
]

class SafeResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.respond = dspy.ChainOfThought(GenerateResponse)

    def forward(self, question):
        result = self.respond(question=question)

        # Check for leaked sensitive data
        for pattern in BLOCKED_PATTERNS:
            dspy.Assert(
                not re.search(pattern, result.answer, re.IGNORECASE),
                f"Response may contain sensitive data (pattern: {pattern})"
            )
        return result
```
AI-as-safety-judge
```python
class SafetyCheck(dspy.Signature):
    """Check if the response is safe and appropriate."""
    question: str = dspy.InputField()
    response: str = dspy.InputField()
    is_safe: bool = dspy.OutputField()
    concern: str = dspy.OutputField(desc="Safety concern if not safe, empty if safe")

class SafetyCheckedResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.respond = dspy.ChainOfThought(GenerateResponse)
        self.check = dspy.Predict(SafetyCheck)

    def forward(self, question):
        result = self.respond(question=question)

        safety = self.check(question=question, response=result.answer)
        dspy.Assert(
            safety.is_safe,
            f"Response flagged as unsafe: {safety.concern}. Regenerate."
        )
        return result
```
Step 6: Generate → Filter → Pick best (ensemble pattern)
For high-stakes outputs, generate multiple candidates and filter:
```python
class FilteredEnsemble(dspy.Module):
    def __init__(self, num_candidates=5):
        super().__init__()
        # GenerateAnswer is a signature you define elsewhere (e.g. question -> answer).
        self.generators = [dspy.ChainOfThought(GenerateAnswer) for _ in range(num_candidates)]
        self.judge = dspy.ChainOfThought(RankAnswers)

    def forward(self, question):
        candidates = []
        for gen in self.generators:
            try:
                result = gen(question=question)
                # Only keep candidates that pass basic checks
                if len(result.answer) > 0 and len(result.answer.split()) < 200:
                    candidates.append(result.answer)
            except Exception:
                continue

        dspy.Assert(len(candidates) > 0, "No valid candidates generated")
        return self.judge(question=question, candidates=candidates)

class RankAnswers(dspy.Signature):
    """Pick the best answer from the candidates."""
    question: str = dspy.InputField()
    candidates: list[str] = dspy.InputField()
    best_answer: str = dspy.OutputField()
```
How backtracking works
When dspy.Assert fails, DSPy doesn’t just retry blindly:
- The assertion failure is caught
- The error message is fed back to the LM as additional context
- The LM retries with this feedback (e.g., “your answer was 350 words, must be under 280”)
- This repeats up to max_backtrack_attempts times (default: 2)
- If all retries fail, the assertion raises an error
This is why specific error messages matter: they are the model's self-correction instructions. "Response is 350 words, must be under 280" is much more useful than "too long."
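If the default retry budget is too small, the 2.4/2.5-era assertions API lets you pass a larger max_backtracks when wrapping the module. A minimal sketch, reusing the CheckedResponder from Step 2 (the import path below reflects that API generation and may differ in other DSPy versions):

```python
import functools

from dspy.primitives.assertions import assert_transform_module, backtrack_handler

# Allow up to 4 retries per failed assertion instead of the default 2.
checked = assert_transform_module(
    CheckedResponder(),
    functools.partial(backtrack_handler, max_backtracks=4),
)
```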
When combined with optimization (/ai-improving-accuracy), the model learns to satisfy constraints on the first try, reducing retries in production.
Key patterns
- Assert for hard requirements: format, length, safety. DSPy retries automatically.
- Suggest for soft preferences: style, tone, detail level. Won't block but nudges.
- Pydantic for structure: catches malformed output automatically.
- Self-verification for facts: ask the AI "is this grounded in the sources?"
- Cross-checking for reliability: generate twice independently, compare.
- Regex for sensitive data: block SSNs, API keys, passwords in output.
- Ensemble for high stakes: generate many, filter, pick the best.
Checklist: what to check
| Check | When to use | How |
|---|---|---|
| Non-empty output | Always | dspy.Assert(len(answer) > 0, ...) |
| Length limits | User-facing text | dspy.Assert(len(answer.split()) < N, ...) |
| Valid format | Structured output | Pydantic model + dspy.Assert |
| Grounded in sources | RAG / doc search | Verification signature |
| No sensitive data | Any user-facing output | Regex patterns |
| Safe content | Public-facing apps | AI safety judge |
| Consistent | Critical decisions | Cross-check with two generations |
| High quality | High-stakes outputs | Ensemble + ranking |
Additional resources
- Use /ai-stopping-hallucinations for citation enforcement, faithfulness verification, and grounding AI in facts
- Use /ai-following-rules for defining and enforcing content policies, format rules, and business constraints
- Use /ai-building-pipelines to wire checks into multi-step systems
- Use /ai-making-consistent for output consistency (not correctness)
- Use /ai-testing-safety to stress-test your guardrails with adversarial attacks
- Need to evaluate human work against criteria? Use /ai-scoring
- Next: /ai-improving-accuracy to measure and improve quality