pr-review
`npx skills add https://github.com/klohto/agent-skills --skill pr-review`
# PR Review Skill

Systematic, thorough pull request review process using parallel analysis and expert validation.
Announce each phase as you enter it, including what you MUST do:
- “## Phase 1: Gather Context” → checkout, get metadata, get diff, read all changed files
- “## Phase 2: Note Concerns” → write inline observations, fill execution-context table (including test code)
- “## Phase 3: Parallel Deep Analysis” → spawn 3+ Task subagents, document findings, verify factual claims
- “## Phase 4: Expert Validation” → Oracle broad sweep, then focused follow-ups on subagent findings
- “## Phase 5: Verify Every Claim” → verify each claim before including it
- “## Phase 6: Programmatic Verification” → run scripts to produce reproducible evidence for final findings
## Phase 1: Gather Context

1. **Checkout the PR branch**: `git fetch origin pull/<PR_NUMBER>/head:pr-<PR_NUMBER> && git checkout pr-<PR_NUMBER>`. All files are now available locally. (This works even when `gh pr checkout` fails due to deleted remote branches.)
2. **Get PR metadata** (title, body, author – not the diff): `gh pr view <PR_NUMBER> --json title,body,author`
3. **Get the diff**: `gh pr diff <PR_NUMBER>`
4. **Work locally** – read files, grep patterns. Everything is local after checkout.
⚠️ DO NOT RUN TESTS. No `pytest`, `pnpm test`, `cargo test`, `go test`, etc. CI handles tests. Running them wastes context and tokens.
Important: Read EVERY file in the diff thoroughly, not just skim:
- Production code: trace data flow, check error handling
- Test code: examine fixtures, check how test data is created/modified, look for shared state
- Config files: check for downstream impacts
This deep reading generates the observations for Phase 2. Rushed reading = missed bugs.
## Phase 2: Note Concerns as You Go
As you read through changes, write observations directly in your response for anything needing deeper investigation:
“I notice this uses a union type with Temporal – need to verify serialization works correctly.” “This changes the URL format – need to check downstream consumers.”
This creates a visible trail of concerns to address in Phase 3, preventing lost context in large diffs.
### Red Flags

- **Config file changes** (eslint, tsconfig, vite, package.json, etc.) – Check `.github/workflows` and other dependents that may be affected.
- **New patterns** – Before reviewing new code, grep for how similar things are done elsewhere. Review new code against existing conventions.
- **Deleted code** – Check nothing else depends on it. Grep for usages.
- **Renamed/moved files** – Check all imports are updated.
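For the deleted-code red flag, a quick scan can confirm whether anything still references the removed symbol. A minimal read-only sketch of such a check – the symbol name `parse_legacy_config` and the demo directory tree are hypothetical:

```python
import os
import re
import tempfile

def find_references(root: str, symbol: str) -> list[str]:
    """Return 'path:lineno: line' hits for a symbol, skipping `def` lines."""
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as fh:
                for lineno, line in enumerate(fh, 1):
                    if pattern.search(line) and not line.lstrip().startswith("def "):
                        hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits

# Demo on a throwaway tree: one file still imports the deleted function.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "caller.py"), "w") as fh:
        fh.write("from legacy import parse_legacy_config\n")
    leftover = find_references(root, "parse_legacy_config")
    print(len(leftover))  # any surviving reference means the deletion is unsafe
```

In practice `rg -w <symbol>` gives the same answer faster; the script form matters only when the result needs to feed into Phase 6 evidence.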
## Phase 3: Parallel Deep Analysis (REQUIRED)
For each observation from Phase 2, spawn a Task subagent to investigate in parallel. Do NOT proceed to Phase 4 until subagents complete.
Why subagents: They avoid polluting main context, run in parallel, and go deep on specific concerns.
Minimum expectations:
- 3+ subagents for any non-trivial PR
- Each subagent investigates ONE specific concern
- Run them in parallel (single assistant message with multiple Task calls)
### Spawn a Task For
- Tracing data flow / ownership (where does X come from, who mutates it)
- Checking if pattern is consistent with rest of codebase
- Verifying test fixtures match production behavior
- Checking exception handling paths
- Any “I wonder if…” observation from Phase 2
- If test files changed: One subagent MUST analyze test data isolation – grep for ALL copy/fixture patterns, not just spot-check one file
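The test-data-isolation check above means enumerating every copy/fixture style, not spot-checking one. A sketch of what such a subagent might count – the pattern set is hypothetical and should be extended per codebase:

```python
import re

# Hypothetical copy patterns a test-isolation check might enumerate.
COPY_PATTERNS = {
    "deepcopy": re.compile(r"copy\.deepcopy\("),
    "shallow_copy": re.compile(r"\.copy\(\)"),
    "dict_ctor": re.compile(r"\bdict\("),
}

def classify_copies(source: str) -> dict[str, int]:
    """Count each copy style so no pattern is checked in isolation."""
    return {name: len(pat.findall(source)) for name, pat in COPY_PATTERNS.items()}

sample = "a = copy.deepcopy(base)\nb = base.copy()\nc = dict(base)\n"
print(classify_copies(sample))  # every style counted, not just the first one seen
```

A fixture that shows up only under `shallow_copy` while production code deep-copies is exactly the kind of mismatch this surfaces.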
### Task Prompt Template
Investigate: [specific concern from Phase 2]
Questions:
1. [Specific question]
2. [Specific question]
Files: [relevant files]
Return: [concrete answer, not "might be an issue"]
After subagents return:
- Document each finding with reasoning
- Verify factual claims – If subagent says “X provides Y format” or “DB stores Z”, confirm by reading actual code/fixtures/schemas. Don’t accept unverified assertions about system behavior.
## Phase 4: Expert Validation (REQUIRED)

### Oracle – Two-Pass Minimum
You MUST call Oracle at least twice:
Pass 1: Broad sweep (before subagents complete is fine)
Review PR for bugs, edge cases, architectural issues.
Files: [all significant changed files]
Pass 2+: Focused follow-ups (REQUIRED after subagents return). For each subagent finding that isn’t trivially resolved, call Oracle:
Subagent found: [summary of finding]
Is this correct? What edge cases could break this?
Files: [relevant files]
Do NOT skip focused follow-ups. Subagent findings need expert validation before being accepted or dismissed.
### Librarian – For External APIs
Use when the PR uses external library APIs or integrates with third-party services.
## Phase 5: Verify Every Claim (MANDATORY)
STOP. Before writing ANY claim – issue OR “what’s good” – you MUST verify it first.
For EACH claim you want to include (bugs, risks, AND positive statements):
- Identify the claim type
- Run the required verification (not optional)
- Only include the claim if verification confirms it
| Claim Type | Required Verification | If You Skip This |
|---|---|---|
| “X is missing” | Grep for X, check defaults, confirm actually absent | FALSE POSITIVE |
| “X is wrong/buggy” | Oracle review OR trace the logic yourself step-by-step | FALSE POSITIVE |
| “X is inconsistent” | Grep for similar patterns, confirm actual difference | FALSE POSITIVE |
| “X should be Y” | Librarian or official docs to confirm Y is correct | FALSE POSITIVE |
| “Default might be wrong” | Read the default value, check if it’s actually inadequate | FALSE POSITIVE |
| “May fail under repeat/parallel/reorder” | Trace what state crosses invocation boundaries, confirm it’s actually shared/mutable | FALSE POSITIVE |
| “System X provides/stores Y format” | Read actual webhook fixtures, DB schemas, or trace the data flow to confirm | UNVERIFIED ASSUMPTION |
| “This change is correct/good” | Verify the before/after behavior matches expectations by reading actual data | UNVERIFIED PRAISE |
| “Pattern X is used consistently” | Grep for ALL instances, not just one example | SAMPLING BIAS |
DO NOT:
- Include findings from Oracle/Librarian without verifying them yourself
- Assume something is missing without grepping for defaults
- Flag “missing X” when X has a sensible default
- Include hunches or “might be an issue” without verification
If you cannot verify a claim, DROP IT. Do not include it.
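The “X is missing” row is the most common false positive, so as an illustration: before flagging a missing setting, confirm it is neither set explicitly nor covered by a default. A minimal sketch – the config keys and the default table are hypothetical stand-ins for the real library docs:

```python
# Assumed defaults for a hypothetical client library; confirm against real docs.
LIBRARY_DEFAULTS = {"timeout": 30, "retries": 0}

def verify_missing(config: dict, key: str) -> str:
    """A 'missing X' claim survives only if X is absent AND has no default."""
    if key in config:
        return f"FALSE POSITIVE: {key} is set explicitly ({config[key]!r})"
    if key in LIBRARY_DEFAULTS:
        return f"FALSE POSITIVE: {key} falls back to default {LIBRARY_DEFAULTS[key]!r}"
    return f"VERIFIED: {key} is absent and has no default"

print(verify_missing({"retries": 3}, "timeout"))   # default covers it -> drop claim
print(verify_missing({"retries": 3}, "tls_cert"))  # genuinely missing -> report it
```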
## Phase 6: Programmatic Verification (REQUIRED)
Before including any finding in the FINAL review, verify it with deterministic, reproducible evidence.
Why: LLM analysis can hallucinate. Scripts execute against actual code and produce objective, reproducible evidence that cannot be faked.
Allowed verification methods:
- Fast local commands (`rg`, `grep`, `jq`, etc.)
- Small read-only Python scripts via `uv run python` (no network, no writes, bounded output)

Process (for Python scripts):
1. Create a temp script file, iterate via `edit_file` (avoids re-reading, saves tokens)
2. Run with `uv run python <script>`
3. Keep output minimal (counts + first N examples). Avoid tracebacks and large dumps.
4. Output is the source of truth → if it contradicts the claim, DROP or correct it
Each final finding must include evidence:
**Evidence:**
- Command: `uv run python /tmp/verify_x.py`
- Output: (≤10 lines)
- Interpretation: [what this proves]
This is NOT running tests. These are small, fast, read-only verification scripts – not pytest, pnpm test, etc.
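The bounded-output convention for these scripts can be sketched as a small reporting helper: a count plus the first few examples, never a full dump. The finding name and match strings are hypothetical:

```python
def report(finding: str, matches: list[str], max_examples: int = 3) -> str:
    """Render evidence as a count plus the first N examples; output stays bounded."""
    lines = [f"{finding}: {len(matches)} match(es)"]
    lines += [f"  {m}" for m in matches[:max_examples]]
    if len(matches) > max_examples:
        lines.append(f"  ... {len(matches) - max_examples} more omitted")
    return "\n".join(lines)

# Hypothetical finding: five test files mutate a shared fixture in place.
matches = [f"tests/test_{i}.py:12: fixture mutated in place" for i in range(5)]
print(report("shared-fixture-mutation", matches))
```

The truncated tail keeps the Evidence block within the ≤10-line budget even when a pattern matches hundreds of times.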
Organize findings by severity:
| Priority | Category | Criteria |
|---|---|---|
| 🔴 P0 | Critical | Runtime failures, data loss, security |
| 🟡 P1 | Medium | Semantic changes, inconsistencies, tech debt |
| 🟢 P2 | Minor | Style, naming, documentation |
## Output Format
# PR #<NUMBER> Review: `<title>`
## Summary
Brief description of what the PR does.
## 🔴 Critical Issues
[Issues that will cause runtime failures]
## 🟡 Medium Issues
[Semantic changes, inconsistencies]
## 🟢 Minor Issues / Suggestions
[Style, improvements]
## ✅ What's Good
[Acknowledge well-implemented parts]
## Action Items Summary
| Priority | Issue | Action |
|----------|-------|--------|
| 🔴 P0 | Issue name | Required fix |
| 🟡 P1 | Issue name | Recommended change |
## Key Principles

- **Checkout first** – `git checkout` before anything, then work locally
- **No local tests** – DO NOT run test suites; CI handles that
- **Note concerns inline** – Write observations in your response as you review (Phase 2)
- **Subagents for depth** – Spawn 3+ Task subagents in parallel for non-trivial PRs (Phase 3)
- **Oracle twice minimum** – Broad sweep + focused follow-ups for each unresolved concern (Phase 4)
- **VERIFY BEFORE REPORTING** – Every claim must be verified. No exceptions.
- **Programmatic evidence** – Final findings require reproducible script output (Phase 6)
- **Execution-context check** – Fill the table for repeat/parallel/reorder analysis