subagent-testing

📁 athola/claude-night-market 📅 Jan 22, 2026

总安装量

周安装量

#18613

全站排名

安装命令

npx skills add https://github.com/athola/claude-night-market --skill subagent-testing

Agent 安装分布

claude-code 11

opencode 6

codex 6

gemini-cli 5

cursor 5

antigravity 4

Skill 文档

Subagent Testing – TDD for Skills

Test skills with fresh subagent instances to prevent priming bias and validate effectiveness.

Overview

Fresh instances prevent priming: Each test uses a new Claude conversation to verify the skill’s impact is measured, not conversation history effects.

Why Fresh Instances Matter

The Priming Problem

Running tests in the same conversation creates bias:

Prior context influences responses
Skill effects get mixed with conversation history
Can’t isolate skill’s true impact

Fresh Instance Benefits

Isolation: Each test starts clean
Reproducibility: Consistent baseline state
Measurement: Clear before/after comparison
Validation: Proves skill effectiveness, not priming

Testing Methodology

Three-phase TDD-style approach:

Phase 1: Baseline Testing (RED)

Test without skill to establish baseline behavior.

Phase 2: With-Skill Testing (GREEN)

Test with skill loaded to measure improvements.

Phase 3: Rationalization Testing (REFACTOR)

Test skill’s anti-rationalization guardrails.

Quick Start

# 1. Create baseline tests (without skill)
# Use 5 diverse scenarios
# Document full responses

# 2. Create with-skill tests (fresh instances)
# Load skill explicitly
# Use identical prompts
# Compare to baseline

# 3. Create rationalization tests
# Test anti-rationalization patterns
# Verify guardrails work

Detailed Testing Guide

For complete testing patterns, examples, and templates:

Testing Patterns – Full TDD methodology
Test Examples – Baseline, with-skill, rationalization tests
Analysis Templates – Scoring and comparison frameworks

Success Criteria

Baseline: Document 5+ diverse baseline scenarios
Improvement: â¥50% improvement in skill-related metrics
Consistency: Results reproducible across fresh instances
Rationalization Defense: Guardrails prevent â¥80% of rationalization attempts