ai-making-consistent
npx skills add https://github.com/lebsral/dspy-programming-not-prompting-lms-skills --skill ai-making-consistent
Make Your AI Consistent
Guide the user through making their AI give reliable, predictable outputs. This is different from “wrong answers”: the AI might be right 80% of the time, yet produce unpredictably different output on each run.
Step 1: Diagnose the inconsistency
Ask the user:
- What’s varying? (the answer itself, the format, the length, the level of detail?)
- How bad is it? (slightly different wording vs. completely different answers)
- Does it matter for your use case? (sometimes variation is fine, sometimes it breaks downstream code)
Quick test: run the same input 5 times

```python
import dspy

lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# `my_program` stands in for your own DSPy program,
# e.g. my_program = dspy.Predict("question -> answer")
for i in range(5):
    result = my_program(question="What is the capital of France?")
    print(f"Run {i+1}: {result.answer}")
```
If outputs vary, apply the fixes below in order.
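To make the quick test quantitative, a small helper (hypothetical, not part of DSPy) can normalize the collected outputs and count how many distinct answers appeared:

```python
from collections import Counter

def summarize_variation(outputs: list[str]) -> dict:
    """Summarize how much a list of run outputs varies.

    Normalizes whitespace and case so trivial differences
    ("Paris" vs. "paris") don't count as distinct answers.
    """
    normalized = [" ".join(o.lower().split()) for o in outputs]
    counts = Counter(normalized)
    return {
        "runs": len(outputs),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }

runs = ["Paris", "paris", "Paris", "The capital is Paris.", "Paris"]
print(summarize_variation(runs))
# → {'runs': 5, 'distinct': 2, 'most_common': 'paris'}
```

A `distinct` count above 1 after normalization means the fixes below are worth applying.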
Step 2: Set temperature to 0
The single biggest consistency fix. Temperature controls randomness; set it to 0 for deterministic outputs:
```python
lm = dspy.LM("openai/gpt-4o-mini", temperature=0)
dspy.configure(lm=lm)
```
This alone fixes most consistency issues. Some providers may still have slight variation even at temperature=0 due to floating point non-determinism, but it’s minimal.
Step 3: Constrain output types
Loose output types = more room for variation. Lock them down.
Use Literal for fixed categories
```python
from typing import Literal

class Classify(dspy.Signature):
    """Classify the text."""

    text: str = dspy.InputField()
    # BAD:  label: str -- the AI can say "positive", "Positive", "pos", "POSITIVE", etc.
    # GOOD: locked to exact values
    label: Literal["positive", "negative", "neutral"] = dspy.OutputField()
```
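A side benefit of `Literal` is that the allowed values are introspectable with `typing.get_args`, so downstream code can validate against the same source of truth (plain Python, no LM call needed):

```python
from typing import Literal, get_args

Label = Literal["positive", "negative", "neutral"]
ALLOWED_LABELS = set(get_args(Label))  # {"positive", "negative", "neutral"}

def is_valid_label(value: str) -> bool:
    """Check a value against the exact set of labels the signature allows."""
    return value in ALLOWED_LABELS

print(is_valid_label("positive"))  # → True
print(is_valid_label("POSITIVE"))  # → False
```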
Use Pydantic models for structured output
```python
from pydantic import BaseModel, Field

class StructuredOutput(BaseModel):
    category: str
    confidence: float = Field(ge=0.0, le=1.0)
    tags: list[str]

class MySignature(dspy.Signature):
    """Process the input."""

    text: str = dspy.InputField()
    result: StructuredOutput = dspy.OutputField()
```
Pydantic validates the output structure, catching format inconsistencies.
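To see what that validation buys you, here is the same model exercised with plain Pydantic (no LM call; the field values are illustrative):

```python
from pydantic import BaseModel, Field, ValidationError

class StructuredOutput(BaseModel):
    category: str
    confidence: float = Field(ge=0.0, le=1.0)
    tags: list[str]

# A well-formed output passes validation
ok = StructuredOutput(category="billing", confidence=0.9, tags=["invoice"])
print(ok.confidence)  # → 0.9

# An out-of-range confidence is rejected instead of flowing downstream
try:
    StructuredOutput(category="billing", confidence=1.7, tags=[])
except ValidationError as e:
    print("rejected with", len(e.errors()), "validation error")
```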
Use bool and int for simple outputs
```python
class CheckFact(dspy.Signature):
    """Is this statement true?"""

    statement: str = dspy.InputField()
    is_true: bool = dspy.OutputField()  # Only True or False -- no variation
```
Step 4: Add output constraints with assertions
Use `dspy.Assert` for hard requirements and `dspy.Suggest` for soft preferences (note: these ship with DSPy 2.5 and earlier; newer releases replace them with `dspy.Refine`):
```python
class ConsistentResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        # Assumes the signature has an `answer` output field
        self.respond = dspy.ChainOfThought(MySignature)

    def forward(self, text):
        result = self.respond(text=text)
        # Hard constraint: retry if violated
        dspy.Assert(
            len(result.answer) < 200,
            "Answer must be under 200 characters",
        )
        # Soft constraint: hint to improve
        dspy.Suggest(
            result.answer.endswith("."),
            "Answer should end with a period",
        )
        return result
```
Common consistency assertions
```python
# Length constraints
dspy.Assert(len(result.answer.split()) <= 50, "Keep answer under 50 words")

# Format constraints
dspy.Assert(result.answer[0].isupper(), "Answer should start with a capital letter")

# Content constraints
dspy.Assert(
    not any(word in result.answer.lower() for word in ["maybe", "perhaps", "i think"]),
    "Answer should be definitive, not hedging",
)
```
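The same checks can be factored into plain predicates (a hypothetical refactor; the function names are ours) so they can be unit-tested without an LM and then passed to `dspy.Assert`:

```python
def under_n_words(text: str, n: int = 50) -> bool:
    """Length constraint: at most n words."""
    return len(text.split()) <= n

def starts_capitalized(text: str) -> bool:
    """Format constraint: first character is uppercase."""
    return bool(text) and text[0].isupper()

def is_definitive(text: str) -> bool:
    """Content constraint: no hedging phrases."""
    hedges = ["maybe", "perhaps", "i think"]
    return not any(h in text.lower() for h in hedges)

answer = "Paris is the capital of France."
print(under_n_words(answer), starts_capitalized(answer), is_definitive(answer))
# → True True True
```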
Step 5: Optimize to lock in patterns
Optimization teaches the AI consistent patterns through examples. Even a simple BootstrapFewShot run can substantially improve consistency:
```python
# The few-shot examples teach the AI what "good" looks like,
# including format, length, and style
optimizer = dspy.BootstrapFewShot(
    metric=metric,
    max_bootstrapped_demos=4,
)
optimized = optimizer.compile(my_program, trainset=trainset)
```
For best consistency, make your metric penalize inconsistency:
```python
def consistency_metric(example, prediction, trace=None):
    correct = prediction.answer.lower() == example.answer.lower()
    # Penalize answers that are too long or too short
    right_length = 5 <= len(prediction.answer.split()) <= 30
    # Penalize hedging language
    no_hedging = not any(w in prediction.answer.lower() for w in ["maybe", "perhaps"])
    return correct and right_length and no_hedging
```
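Because the metric is plain Python, it can be smoke-tested without an LM using lightweight stand-ins for `example` and `prediction` (here via `types.SimpleNamespace`):

```python
from types import SimpleNamespace

def consistency_metric(example, prediction, trace=None):
    correct = prediction.answer.lower() == example.answer.lower()
    right_length = 5 <= len(prediction.answer.split()) <= 30
    no_hedging = not any(w in prediction.answer.lower() for w in ["maybe", "perhaps"])
    return correct and right_length and no_hedging

example = SimpleNamespace(answer="Paris is the capital of France.")
good = SimpleNamespace(answer="Paris is the capital of France.")
hedged = SimpleNamespace(answer="Maybe Paris is the capital of France.")

print(consistency_metric(example, good))    # → True
print(consistency_metric(example, hedged))  # → False
```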
Step 6: Use caching for identical inputs
DSPy caches LM calls by default, so repeating an identical call (same input, same LM settings) returns the same output:

```python
# First call: hits the API
result1 = my_program(question="What is Python?")

# Second call with the same input: returns the cached result (instant, identical)
result2 = my_program(question="What is Python?")

# result1 and result2 are identical
```
Consistency checklist
- Set `temperature=0`
- Use `Literal` types for categorical outputs
- Use Pydantic models for structured outputs
- Use `bool`/`int` for simple yes/no or numeric outputs
- Add `dspy.Assert` for format constraints
- Optimize with `BootstrapFewShot` to lock in patterns
- Rely on caching for repeated identical inputs
Additional resources
- If the AI is consistent but wrong, use `/ai-improving-accuracy`
- If the AI is throwing errors, use `/ai-fixing-errors`