confidence-calibration
3
总安装量
3
周安装量
#59601
全站排名
安装命令
npx skills add https://github.com/jamelna-apps/claude-dash --skill confidence-calibration
Agent 安装分布
gemini-cli
3
claude-code
3
codex
3
opencode
3
qoder
2
replit
2
Skill 文档
Confidence Calibration Framework
When This Activates
This skill activates when:
- Expressing uncertainty about a suggestion
- Working in a domain with past errors
- User asks “how confident are you?”
- Making predictions or recommendations
Domain Tracking
The system tracks prediction accuracy across domains:
| Domain Category | Examples |
|---|---|
| Infrastructure | docker, kubernetes, nginx, ci/cd |
| Frontend | react, react-native, nextjs, expo |
| Languages | typescript, javascript, python |
| Backend | firebase, firestore, authentication |
| Operations | testing, git, database, api |
| Optimization | performance, security, caching |
Calibration Data Structure
{
"domain_stats": {
"docker": {
"correct": 12,
"incorrect": 3,
"partial": 2,
"accuracy": 0.71
}
},
"overall": {
"correct": 145,
"incorrect": 23,
"partial": 18
}
}
How to Express Calibrated Confidence
High Confidence (>85% domain accuracy)
"This approach should work well - it follows established patterns."
Medium Confidence (60-85% accuracy)
"This is my best assessment, though you may want to verify [specific aspect]."
Low Confidence (<60% accuracy, or past errors in domain)
"I've had some misses in [domain] before. Let me double-check this..."
"I'm less certain here - consider testing thoroughly before proceeding."
Unknown Domain
"I don't have much track record in [area]. Proceed with appropriate caution."
Self-Awareness Triggers
When working in a domain with past errors:
- Check track record before making recommendations
- Acknowledge past mistakes if relevant: “I’ve gotten Docker networking wrong before…”
- Suggest verification for uncertain areas
- Ask clarifying questions rather than guessing
Recording Outcomes
When the user indicates an outcome:
Success signals:
- “That worked!”
- “Perfect”
- “Thanks, it’s fixed”
Failure signals:
- “That didn’t work”
- “Still broken”
- “Wrong”
Partial signals:
- “Almost”
- “Partly fixed”
- “One issue remaining”
Domain Detection Keywords
DOMAIN_KEYWORDS = {
"docker": ["docker", "container", "dockerfile", "compose"],
"react": ["react", "component", "jsx", "hooks", "useState"],
"react-native": ["react native", "expo", "metro"],
"nextjs": ["next.js", "nextjs", "getServerSideProps"],
"typescript": ["typescript", "type", "interface"],
"firebase": ["firebase", "firestore"],
"authentication": ["auth", "login", "token", "jwt"],
"testing": ["test", "jest", "mock", "coverage"],
"git": ["git", "commit", "branch", "merge"],
"performance": ["slow", "optimize", "cache", "memory"]
}
Integration with Learning System
Confidence data feeds into:
<semantic-memory>context injection- ReasoningBank for pattern matching
- Preference learner for style calibration
Example Workflow
User: "Set up Docker networking between containers"
1. Detect domain: docker
2. Check calibration: docker accuracy = 71%
3. Check past corrections: "Docker can't use Metal GPU on Mac"
4. Respond with calibrated confidence:
"For container networking, you'll want a bridge network.
Note: I've had some edge cases with Docker networking before,
so if this doesn't work immediately, the issue is usually
DNS resolution between containers."