sherlock-review
Install command
npx skills add https://github.com/proffesor-for-testing/sentinel-api-testing --skill sherlock-review
Skill documentation
Sherlock Review
<default_to_action> When investigating code claims:
- OBSERVE: Gather all evidence (code, tests, history, behavior)
- DEDUCE: What does evidence actually show vs. what was claimed?
- ELIMINATE: Rule out what cannot be true
- CONCLUDE: Does evidence support the claim?
- DOCUMENT: Findings with proof, not assumptions
The 3-Step Investigation:
# 1. OBSERVE: Gather evidence
git diff <commit>
npm test -- --coverage
# 2. DEDUCE: Compare claim vs reality
# Does code match description?
# Do tests prove the fix/feature?
# 3. CONCLUDE: Verdict with evidence
# TRUE / PARTIALLY TRUE / FALSE / NONSENSICAL
Holmesian Principles:
- “Data! Data! Data!” – Collect before concluding
- “Eliminate the impossible” – What cannot be true?
- “You see, but do not observe” – Run code, don’t just read
- Trust only reproducible evidence </default_to_action>
Quick Reference Card
Evidence Collection Checklist
| Category | What to Check | How |
|---|---|---|
| Claim | PR description, commit messages | Read thoroughly |
| Code | Actual file changes | git diff |
| Tests | Coverage, assertions | Run independently |
| Behavior | Runtime output | Execute locally |
| Timeline | When things happened | git log, git blame |
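Most rows of the checklist above correspond to a read-only command. The following is a minimal sketch, assuming a Node.js environment and a git checkout with an npm test script. `evidenceCommands` and its return shape are hypothetical names introduced for illustration, not part of the skill. The "Behavior" row is left out because how to run the code is project specific.

```javascript
// Hypothetical helper: maps checklist categories to read-only commands.
// Nothing is executed here; the caller decides how and where to run them.
function evidenceCommands(commit, file) {
  const commands = [
    // Claim: the full commit message
    { category: "Claim", argv: ["git", "log", "-1", "--format=%B", commit] },
    // Code: changes relative to the parent (fails for a root commit)
    { category: "Code", argv: ["git", "diff", `${commit}~1`, commit] },
    // Tests: run the suite independently, with coverage
    { category: "Tests", argv: ["npm", "test", "--", "--coverage"] },
    // Timeline: recent history leading up to the commit
    { category: "Timeline", argv: ["git", "log", "--oneline", "-n", "20", commit] },
  ];
  if (file) {
    // Timeline: who last touched each line of one file at that commit
    commands.push({ category: "Timeline", argv: ["git", "blame", commit, "--", file] });
  }
  return commands;
}

// Example: print the commands for a placeholder commit id.
for (const { category, argv } of evidenceCommands("abc123", "src/handlers/async-handler.js")) {
  console.log(`${category}: ${argv.join(" ")}`);
}
```

Returning argument arrays rather than shell strings keeps the sketch free of quoting problems if the commands are later passed to a process runner.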
Verdict Levels
| Verdict | Meaning |
|---|---|
| ✓ TRUE | Evidence fully supports claim |
| ⚠ PARTIALLY TRUE | Claim accurate but incomplete |
| ✗ FALSE | Evidence contradicts claim |
| ? NONSENSICAL | Claim doesn’t apply to context |
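One way to keep verdicts consistent across reviewers is to derive them from counted evidence. The sketch below is an assumption about how the four levels could be ordered, not a rule stated by the skill. `verdictFor` and its fields (`applicable`, `supporting`, `contradicting`, `gaps`) are hypothetical.

```javascript
// Hypothetical decision rule for the four verdict levels.
// `applicable` is false when the claim has no meaning in this context.
// `supporting` and `contradicting` count reproducible pieces of evidence;
// `gaps` counts parts of the claim that nothing was found to cover.
function verdictFor({ applicable, supporting, contradicting, gaps }) {
  if (!applicable) return "NONSENSICAL";
  // Nothing reproducible backs the claim, so it is treated as false.
  if (supporting === 0) return "FALSE";
  // Some support, but also contradictions or uncovered parts.
  if (contradicting > 0 || gaps > 0) return "PARTIALLY TRUE";
  return "TRUE";
}

console.log(verdictFor({ applicable: true, supporting: 2, contradicting: 0, gaps: 1 }));
```

Treating "no supporting evidence" as FALSE is a deliberately strict choice; a team could instead add an "unverified" outcome.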
Investigation Template
## Sherlock Investigation: [Claim]
### The Claim
"[What PR/commit claims to do]"
### Evidence Examined
- Code changes: [files, lines]
- Tests added: [count, coverage]
- Behavior observed: [what actually happens]
### Deductive Analysis
**Claim**: [specific assertion]
**Evidence**: [what you found]
**Deduction**: [logical conclusion]
**Verdict**: ✓/⚠/✗
### Findings
- What works: [with evidence]
- What doesn't: [with evidence]
- What's missing: [gaps in implementation/testing]
### Recommendations
1. [Action based on findings]
Investigation Scenarios
Scenario 1: “This Fixed the Bug”
Steps:
- Reproduce bug on commit before fix
- Verify bug is gone on commit with fix
- Check if fix addresses root cause or symptom
- Test edge cases not in original report
Red Flags:
- Fix that just removes error logging
- Works only for specific test case
- Workarounds instead of root cause fix
- No regression test added
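The first two steps and the last red flag of this scenario can be checked mechanically. Here is a minimal sketch, assuming the caller supplies a `reproduce(commit)` function that resolves to `true` when the bug is observed at that commit. `verifyFix`, `reproduce`, and the result fields are hypothetical names, and the commit ids are placeholders.

```javascript
// Hypothetical sketch: compares bug reproduction before and after a fix.
async function verifyFix({ beforeCommit, fixCommit, reproduce, regressionTestAdded }) {
  const bugBefore = await reproduce(beforeCommit);
  const bugAfter = await reproduce(fixCommit);
  const flags = [];
  if (!bugBefore) flags.push("bug not reproducible before the fix; the claim cannot be tested");
  if (bugAfter) flags.push("bug still reproduces on the fix commit");
  if (!regressionTestAdded) flags.push("no regression test added");
  return { bugBefore, bugAfter, fixed: bugBefore && !bugAfter, flags };
}

// Usage with a stubbed reproduction. A real one would check out each commit
// (for example in a separate worktree) and run the reproduction steps.
const stubReproduce = async (commit) => commit === "abc123";
verifyFix({
  beforeCommit: "abc123",
  fixCommit: "def456",
  reproduce: stubReproduce,
  regressionTestAdded: false,
}).then((result) => console.log(result));
```

Whether the fix addresses the root cause or only a symptom still needs a human reading of the diff; this sketch only confirms the observable before and after behavior.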
Scenario 2: “Improved Performance by 50%”
Steps:
- Run benchmark on baseline commit
- Run same benchmark on optimized commit
- Compare in identical conditions
- Verify measurement methodology
Red Flags:
- Tested only on toy data
- Different comparison conditions
- Trade-offs not mentioned
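Once both benchmarks have been run under identical conditions, the comparison itself is simple arithmetic. The sketch below assumes "improved by N%" means N% less run time, which the claim should state explicitly. `median` and `timeReductionPercent` are hypothetical helpers, and the sample numbers are illustrative, not measurements.

```javascript
// Median is less sensitive to outliers than the mean for timing samples.
function median(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Percentage reduction in median run time, baseline vs. optimized.
function timeReductionPercent(baselineMs, optimizedMs) {
  const base = median(baselineMs);
  return ((base - median(optimizedMs)) / base) * 100;
}

// Illustrative timings in milliseconds.
const claimedPercent = 50;
const reduction = timeReductionPercent([100, 110, 90], [70, 75, 65]);
console.log(`${reduction.toFixed(1)}% less time; claim of ${claimedPercent}% supported: ${reduction >= claimedPercent}`);
```

A handful of samples says little about variance; a real check would use many runs and report the spread alongside the median.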
Scenario 3: “Handles All Edge Cases”
Steps:
- List all edge cases in code path
- Check each has test coverage
- Test boundary conditions
- Verify error handling paths
Red Flags:
- `catch {}` swallowing errors
- Generic error messages
- No logging of critical errors
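The first red flag above can be screened for automatically before reading the code by hand. This is a heuristic sketch only: a regular expression cannot parse JavaScript, so it misses catch blocks that contain just a comment and may match text inside strings. `findEmptyCatches` is a hypothetical helper.

```javascript
// Matches `catch {}` and `catch (e) {}` with only whitespace in the body.
const EMPTY_CATCH = /catch\s*(\([^)]*\))?\s*\{\s*\}/g;

function findEmptyCatches(source) {
  const hits = [];
  for (const match of source.matchAll(EMPTY_CATCH)) {
    // Convert the match offset into a 1-based line number.
    const line = source.slice(0, match.index).split("\n").length;
    hits.push({ line, text: match[0] });
  }
  return hits;
}

const sample = [
  "try { risky(); } catch (e) {}",
  "try { risky(); } catch (e) { log(e); }",
  "try { risky(); } catch {}",
].join("\n");
console.log(findEmptyCatches(sample));
```

For anything beyond a quick scan, a linter rule such as ESLint's `no-empty` is the more reliable tool.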
Example Investigation
## Case: PR #123 "Fix race condition in async handler"
### Claims Examined:
1. "Eliminates race condition"
2. "Adds mutex locking"
3. "100% thread safe"
### Evidence:
- File: src/handlers/async-handler.js
- Changes: Added `async/await`, removed callbacks
- Tests: 2 new tests for async flow
- Coverage: 85% (was 75%)
### Analysis:
**Claim 1: "Eliminates race condition"**
Evidence: Added `await` to sequential operations. No actual mutex.
Deduction: Race avoided by removing concurrency, not synchronization.
Verdict: ⚠ PARTIALLY TRUE (solved differently than claimed)
**Claim 2: "Adds mutex locking"**
Evidence: No mutex library, no lock variables, no sync primitives.
Verdict: ✗ FALSE
**Claim 3: "100% thread safe"**
Evidence: JavaScript is single-threaded. No worker threads used.
Verdict: ? NONSENSICAL (meaningless in this context)
### Conclusion:
Fix works but not for reasons claimed. Race condition avoided by
making operations sequential, not by adding synchronization.
### Recommendations:
1. Update PR description to accurately reflect solution
2. Add test for concurrent request handling
3. Remove incorrect technical claims
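Recommendation 2 matters because single-threaded JavaScript can still lose updates when asynchronous operations interleave. The sketch below is not the code from the PR; it is a hypothetical in-memory store (`makeStore`, `increment`) showing why sequential awaits avoid the race while concurrent calls do not.

```javascript
// Hypothetical in-memory store; read and write are async to mimic I/O.
function makeStore() {
  let value = 0;
  return {
    read: async () => value,
    write: async (next) => { value = next; },
    peek: () => value,
  };
}

// Read-modify-write with an await in between: other callers can interleave.
async function increment(store) {
  const current = await store.read();
  await store.write(current + 1);
}

async function concurrentRun() {
  const store = makeStore();
  // Both calls read 0 before either one writes, so one update is lost.
  await Promise.all([increment(store), increment(store)]);
  return store.peek();
}

async function sequentialRun() {
  const store = makeStore();
  // Each call finishes before the next one starts.
  await increment(store);
  await increment(store);
  return store.peek();
}

concurrentRun().then((v) => console.log("concurrent:", v)); // 1
sequentialRun().then((v) => console.log("sequential:", v)); // 2
```

A test for concurrent request handling would assert on the concurrent case, which is exactly what the sequential rewrite leaves unprotected if callers invoke the handler in parallel.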
Agent Integration
// Evidence-based code review
await Task("Sherlock Review", {
  prNumber: 123,
  claims: [
    "Fixes memory leak",
    "Improves performance 30%"
  ],
  verifyReproduction: true,
  testEdgeCases: true
}, "qe-code-reviewer");

// Bug fix verification
await Task("Verify Fix", {
  bugCommit: "abc123",
  fixCommit: "def456",
  reproductionSteps: steps, // `steps` must be defined by the caller
  testBoundaryConditions: true
}, "qe-code-reviewer");
Agent Coordination Hints
Memory Namespace
aqe/sherlock/
├── investigations/* - Investigation reports
├── evidence/* - Collected evidence
├── verdicts/* - Claim verdicts
└── patterns/* - Common deception patterns
Fleet Coordination
const investigationFleet = await FleetManager.coordinate({
  strategy: 'evidence-investigation',
  agents: [
    'qe-code-reviewer',          // Code analysis
    'qe-security-auditor',       // Security claim verification
    'qe-performance-validator'   // Performance claim verification
  ],
  topology: 'parallel'
});
Related Skills
- brutal-honesty-review – Direct technical criticism
- context-driven-testing – Adapt to context
- bug-reporting-excellence – Document findings
Remember
“It is a capital mistake to theorize before one has data.” Trust only reproducible evidence. Don’t trust commit messages, documentation, or “works on my machine.”
The Sherlock Standard: Every claim must be verified empirically. What does the evidence actually show?