solo-retro
npx skills add https://github.com/fortunto2/solo-factory --skill solo-retro
Agent 安装分布
Skill 文档
/retro
This skill is self-contained â follow the phases below instead of delegating to other skills (/review, /audit, /build) or spawning Task subagents. Run all analysis directly.
Post-pipeline retrospective. Parses pipeline logs, counts productive vs wasted iterations, identifies recurring failure patterns, scores the pipeline run, and suggests concrete patches to skills/scripts to prevent the same failures next time.
When to use
After a pipeline completes (or gets cancelled). This is the process quality check â /review checks code quality, /retro checks pipeline process quality.
Can also be used standalone on any project â with or without pipeline logs.
MCP Tools (use if available)
session_search(query)â find past pipeline runs and known issuescodegraph_explain(project)â understand project architecture contextcodegraph_query(query)â query code graph for project metadata
If MCP tools are not available, fall back to Glob + Grep + Read.
Phase 1: Locate Artifacts
-
Detect project from
$ARGUMENTSor CWD:- If argument provided: use it as project name
- Otherwise: extract from CWD basename (e.g.,
~/projects/my-app->my-app)
-
Find pipeline state file:
.solo/pipelines/solo-pipeline-{project}.local.md(project-local) or~/.solo/pipelines/solo-pipeline-{project}.local.md(global fallback)- If it exists: pipeline is still running or wasn’t cleaned up â read YAML frontmatter for
project_root: - If not: pipeline completed â use CWD as project root
- If it exists: pipeline is still running or wasn’t cleaned up â read YAML frontmatter for
-
Verify artifacts exist (parallel reads):
- Pipeline log:
{project_root}/.solo/pipelines/pipeline.log - Iter logs:
{project_root}/.solo/pipelines/iter-*.log - Progress file:
{project_root}/.solo/pipelines/progress.md - Plan-done directory:
{project_root}/docs/plan-done/ - Active plan:
{project_root}/docs/plan/
- Pipeline log:
-
Determine analysis mode:
- If pipeline log exists: proceed with full log-based analysis (Phases 2-4)
- If NO pipeline log: switch to fallback mode (see Fallback Analysis below)
-
Count iter logs (if they exist):
ls {project_root}/.solo/pipelines/iter-*.log | wc -l- Report: “Found {N} iteration logs”
Fallback Analysis (No Pipeline Logs)
If no pipeline logs exist, the retro can still provide value by analyzing:
- Git history:
git log --oneline --since="1 week ago"â commit frequency, patterns, conventional format - Test results: run test suite if configured in CLAUDE.md or package.json
- Build status: run build if configured
- CLAUDE.md changes:
git log --oneline -- CLAUDE.mdâ how docs evolved - Code quality metrics: file counts, TODO/FIXME density, dead code indicators
- Project structure: completeness of docs/, tests/, CI config
Skip Phases 2-4 and proceed directly to Phase 5 (Plan Fidelity) and Phase 6 (Git & Code Quality). Adjust Phase 7 scoring to weight available data more heavily.
Phase 2: Parse Pipeline Log (quantitative)
Read pipeline.log in full. Parse line-by-line, extracting structured data from log tags:
Log format: [HH:MM:SS] TAG | message
Extract by tag:
| Tag | What to extract |
|---|---|
START |
Pipeline run boundary â count restarts (multiple START lines = restarts) |
STAGE |
iter N/M | stage S/T: {stage_id} â iteration count per stage |
SIGNAL |
<solo:done/> or <solo:redo/> â which stages got completion signals |
INVOKE |
Skill invoked â extract skill name, check for wrong names |
ITER |
commit: {sha} | result: {stage complete|continuing} â per-iteration outcome |
CHECK |
{stage} | {path} -> FOUND|NOT FOUND â marker file checks |
FINISH |
Duration: {N}m â total duration per run |
MAXITER |
Reached max iterations ({N}) â hit iteration ceiling |
QUEUE |
Plan cycling events (activating, archiving) |
CIRCUIT |
Circuit breaker triggered (if present) |
CWD |
Working directory changes |
CTRL |
Control signals (pause/stop/skip) |
Compute metrics:
total_runs = count of START lines
total_iterations = count of ITER lines
productive_iters = count of ITER lines with "stage complete"
wasted_iters = total_iterations - productive_iters
waste_pct = wasted_iters / total_iterations * 100
maxiter_hits = count of MAXITER lines
plan_cycles = count of QUEUE lines with "Cycling"
per_stage = {
stage_id: {
attempts: count of STAGE lines for this stage,
successes: count of ITER lines with "stage complete" for this stage,
waste_ratio: (attempts - successes) / attempts * 100,
}
}
Phase 3: Parse Progress.md (qualitative)
Read progress.md and scan for error patterns:
- Unknown skill errors: grep for
Unknown skill:â extract which skill name was wrong - Empty iterations: iterations where “Last 5 lines” show only errors or session header (no actual work done)
- Repeated errors: same error appearing in consecutive iterations -> spin-loop indicator
- Doubled signals:
<solo:done/><solo:done/>in same iteration -> minor noise (note but don’t penalize) - Redo loops: count how many times build->review->redo->build cycles occurred
For each error pattern found, record:
- Pattern name
- First occurrence (iteration number)
- Total occurrences
- Consecutive streak (max)
Phase 4: Analyze Iter Logs (sample-based)
Do NOT read all iter logs â could be 60+. Use smart sampling:
-
First failed iter per pattern: For each failure pattern found in Phase 3, read the first iter log that shows it
- Strip ANSI codes when reading:
sed 's/\x1b\[[0-9;]*m//g' < iter-NNN-stage.log | head -100
- Strip ANSI codes when reading:
-
First successful iter per stage: For each stage that eventually succeeded, read the first successful iter log
- Look for
<solo:done/>in the output
- Look for
-
Final review iter: Read the last
iter-*-review.log(the verdict) -
Extract from each sampled log:
- Tools called (count of tool_use blocks)
- Errors encountered (grep for
Error,error,Unknown,failed) - Signal output (
<solo:done/>or<solo:redo/>present?) - First 5 and last 10 meaningful lines (skip blank lines)
Phase 5: Plan Fidelity Check
For each track directory in docs/plan-done/ and docs/plan/:
-
Read spec.md (if exists):
- Count acceptance criteria: total
- [ ]and- [x]checkboxes - Calculate:
criteria_met = checked / total * 100
- Count acceptance criteria: total
-
Read plan.md (if exists):
- Count tasks: total
- [ ]and- [x]checkboxes - Count phases (## headers)
- Check for SHA annotations (
<!-- sha:... -->) - Calculate:
tasks_done = checked / total * 100
- Count tasks: total
-
Compile per-track summary:
- Track ID, criteria met %, tasks done %, has SHAs
Phase 6: Git & Code Quality (lightweight)
Quick checks only â NOT a full /review:
-
Commit count and format:
git -C {project_root} log --oneline | wc -l git -C {project_root} log --oneline | head -30- Count commits with conventional format (
feat:,fix:,chore:,test:,docs:,refactor:,build:,ci:,perf:) - Calculate:
conventional_pct = conventional / total * 100
- Count commits with conventional format (
-
Committer breakdown:
git -C {project_root} shortlog -sn --no-merges | head -10 -
Test status (if test command exists in CLAUDE.md or package.json):
- Run test suite, capture pass/fail count
- If no test command found, skip and note “no tests configured”
-
Build status (if build command exists):
- Run build, capture success/fail
- If no build command found, skip and note “no build configured”
Phase 7: Score & Report
Load scoring rubric from ${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md.
If plugin root not available, use the embedded weights:
Scoring weights:
- Efficiency (waste %): 25%
- Stability (restarts): 20%
- Fidelity (criteria met): 20%
- Quality (test pass rate): 15%
- Commits (conventional %): 5%
- Docs (plan staleness): 5%
- Signals (clean signals): 5%
- Speed (total duration): 5%
Note: In fallback mode (no pipeline logs), redistribute Efficiency and Stability weights to Fidelity, Quality, and Commits.
Generate report at {project_root}/docs/retro/{date}-retro.md:
# Pipeline Retro: {project} ({date})
## Overall Score: {N}/10
## Pipeline Efficiency
| Metric | Value | Rating |
|--------|-------|--------|
| Total iterations | {N} | |
| Productive iterations | {N} ({pct}%) | {emoji} |
| Wasted iterations | {N} ({pct}%) | {emoji} |
| Pipeline restarts | {N} | {emoji} |
| Max-iter hits | {N} | {emoji} |
| Total duration | {time} | {emoji} |
## Per-Stage Breakdown
| Stage | Attempts | Successes | Waste % | Notes |
|-------|----------|-----------|---------|-------|
| scaffold | | | | |
| setup | | | | |
| plan | | | | |
| build | | | | |
| deploy | | | | |
| review | | | | |
## Failure Patterns
### Pattern 1: {name}
- **Occurrences:** {N} iterations
- **Root cause:** {analysis}
- **Wasted:** {N} iterations
- **Fix:** {concrete suggestion with file reference}
### Pattern 2: ...
## Plan Fidelity
| Track | Criteria Met | Tasks Done | SHAs | Rating |
|-------|-------------|------------|------|--------|
| {track-id} | {N}% | {N}% | {yes/no} | {emoji} |
## Code Quality (Quick)
- **Tests:** {N} pass, {N} fail (or "not configured")
- **Build:** PASS / FAIL (or "not configured")
- **Commits:** {N} total, {pct}% conventional format
## Three-Axis Growth
| Axis | Score | Evidence |
|------|-------|----------|
| **Technical** (code, tools, architecture) | {0-10} | {what changed} |
| **Cognitive** (understanding, strategy, decisions) | {0-10} | {what improved} |
| **Process** (harness, skills, pipeline, docs) | {0-10} | {what evolved} |
If only one axis is served â note what's missing.
## Recommendations
1. **[CRITICAL]** {patch suggestion with file:line reference}
2. **[HIGH]** {improvement}
3. **[MEDIUM]** {optimization}
4. **[LOW]** {nice-to-have}
## Suggested Patches
### Patch 1: {file} â {description}
**What:** {one-line description}
**Why:** {root cause reference from Failure Patterns}
\```diff
- old line
+ new line
\```
Rating guide (use these emojis):
- GREEN = excellent
- YELLOW = acceptable
- RED = needs attention
Phase 8: Interactive Patching
After generating the report:
-
Show summary to user: overall score, top 3 failure patterns, top 3 recommendations
-
For each suggested patch (if any), use
AskUserQuestion:- Question: “Apply patch to {file}? {one-line description}”
- Options: “Apply” / “Skip” / “Show diff first”
-
If “Show diff first”: display the full diff, then ask again (Apply / Skip)
-
If “Apply”: use Edit tool to apply the change directly
-
After all patches processed:
- If any patches were applied: suggest committing with
fix(retro): {description} - Do NOT auto-commit â just suggest the command
- If any patches were applied: suggest committing with
Phase 9: CLAUDE.md Revision
After patching, revise the project’s CLAUDE.md to keep it lean and useful for future agents.
Steps:
- Read CLAUDE.md and check size:
wc -c CLAUDE.md - Add learnings from this retro:
- Pipeline failure patterns worth remembering (avoid next time)
- New workflow rules or process improvements
- Updated commands or tooling changes
- Architecture decisions that emerged during the pipeline run
- If over 40,000 characters â trim ruthlessly:
- Collapse completed phase/milestone histories into one line each
- Remove verbose explanations â keep terse, actionable notes
- Remove duplicate info (same thing explained in multiple sections)
- Remove historical migration notes, old debugging context
- Remove examples that are obvious from code or covered by skill/doc files
- Remove outdated troubleshooting for resolved issues
- Verify result <= 40,000 characters â if still over, cut least actionable content
- Write updated CLAUDE.md, update “Last updated” date
Priority (keep -> cut):
- ALWAYS KEEP: Tech stack, directory structure, Do/Don’t rules, common commands, architecture decisions
- KEEP: Workflow instructions, troubleshooting for active issues, key file references
- CONDENSE: Phase histories (one line each), detailed examples, tool/MCP listings
- CUT FIRST: Historical notes, verbose explanations, duplicated content, resolved issues
Rules:
- Never remove Do/Don’t sections â critical guardrails
- Preserve overall section structure and ordering
- Every line must earn its place: “would a future agent need this to do their job?”
- Commit the update:
git add CLAUDE.md && git commit -m "docs: revise CLAUDE.md (post-retro)"
Phase 10: Factory Critic (optional)
Run this phase only if ${CLAUDE_PLUGIN_ROOT} is available (i.e., solo-factory is installed). Skip if running as a standalone skill without the factory context.
After evaluating the project pipeline, step back and evaluate the factory itself â the skills, scripts, and pipeline logic that produced this result. Be a harsh critic.
What to evaluate:
-
Read the skills that were invoked in this pipeline run (from INVOKE lines in pipeline.log):
- For each skill:
${CLAUDE_PLUGIN_ROOT}/skills/{stage}/SKILL.md - Did the skill have the right instructions for this project’s needs?
- Did it miss context it should have had?
- For each skill:
-
Read pipeline script signal handling and stage logic:
${CLAUDE_PLUGIN_ROOT}/scripts/solo-dev.sh- Were there structural issues (wrong stage order, missing re-exec, broken redo)?
-
Cross-reference with failure patterns from Phase 3:
- For each failure: was the root cause in the skill, the script, or the project?
- Skills that caused waste = factory defects
Score the factory (not the project):
Factory Score: {N}/10
Skill quality:
- {skill}: {score}/10 â {why}
- {skill}: {score}/10 â {why}
Pipeline reliability: {N}/10 â {why}
Missing capabilities:
- {what the factory couldn't do that it should have}
Top factory defects:
1. {defect} â {which file to fix} â {concrete fix}
2. {defect} â {which file to fix} â {concrete fix}
Harness Evolution â think about the bigger picture
After scoring the factory, step back further and think about the harness â the entire system that guides agents (CLAUDE.md, docs/, linters, skills, templates). Ask:
-
Context engineering: Did the agent have everything it needed in-repo? Or did it struggle because knowledge was missing / scattered / stale?
- Missing docs -> add to
docs/or CLAUDE.md - Stale docs -> flag for doc-gardening
- Knowledge only in your head -> encode it
- Missing docs -> add to
-
Architectural constraints: Did the agent break module boundaries, produce inconsistent patterns, or ignore conventions?
- Repeated boundary violations -> need a linter or structural test
- Inconsistent patterns -> need golden principle in CLAUDE.md
- Data shape errors -> need parse-at-boundary enforcement
-
Decision traces: What worked well that future agents should reuse? What failed that they should avoid?
- Good patterns -> capture as precedent in docs or CLAUDE.md
- Bad patterns -> encode as anti-pattern or lint rule
- Think: “if another agent hits this same problem tomorrow, what should it find?”
-
Skill gaps: Which skills need better instructions? Which new skills should exist?
- Skill that caused waste -> concrete SKILL.md patch
- Missing capability -> new skill idea for evolution log
Write to evolution log:
Append findings to {project_root}/docs/evolution.md (create if not exists). If ~/.solo/evolution.md exists, append there as well for cross-project tracking.
## {YYYY-MM-DD} | {project} | Factory Score: {N}/10
Pipeline: {stages run} | Iters: {total} | Waste: {pct}%
### Defects
- **{severity}** | {skill/script}: {description}
- Fix: {concrete file:change}
### Harness Gaps
- **Context:** {what knowledge was missing or stale for the agent}
- **Constraints:** {what boundary violations or inconsistencies occurred}
- **Precedents:** {patterns worth capturing for future agents â good or bad}
### Missing
- {capability the factory lacked}
### What worked well
- {skill/pattern that performed efficiently}
Rules:
- Be brutally honest â if a skill is broken, say so
- Every defect must have a concrete fix (file + what to change)
- Track what works well too â don’t regress good patterns
- Keep entries compact â this file accumulates over time
Signal Output
Output signal: <solo:done/>
Important: /retro always outputs <solo:done/> â it never needs redo. Even if pipeline was terrible, the retro itself always completes.
Edge Cases
- No pipeline.log and no git history: abort with clear message â “No pipeline log or git history found. Nothing to analyze.”
- No pipeline.log but git history exists: switch to fallback mode (see Fallback Analysis)
- Empty pipeline.log: report “Pipeline log is empty â was the pipeline cancelled before any iteration?”
- No iter logs: skip Phase 4 sampling, note in report
- No plan-done: skip Phase 5, note “No completed plans found”
- No test/build commands: skip those checks in Phase 6, note in report
- Pipeline still running: warn user â “State file exists, pipeline may still be running. Retro on partial data.”
Reference Files
${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.mdâ scoring rubric (8 axes, weights)${CLAUDE_PLUGIN_ROOT}/skills/retro/references/failure-catalog.mdâ known failure patterns and fixes