run-simulations
```shell
npx skills add https://github.com/ratacat/claude-skills --skill run-simulations
```
Run Simulations Skill
You are helping set up and run trading engine simulations, then performing comprehensive diagnostic analysis.
Current Phase: DUAL-FOCUS (Validation + Optimization)
System confidence: ~85% accurate. The simulation system is largely trustworthy, but discrepancies still exist and are critical to find.
Priority #1: ACCURACY DISCREPANCIES (Most Important)
Any gap between expected vs actual behavior is a potential bug or misunderstanding. These must be surfaced prominently:
- Does behavior match config parameters?
- Are there logical inconsistencies in the data?
- Do results align with what the strategy SHOULD produce?
When you spot a discrepancy, flag it prominently. Even small accuracy issues compound over thousands of trades.
Priority #2: PROFIT OPTIMIZATION (Active Research)
With a trustworthy system, we’re now actively researching:
- Which configs perform best?
- What market conditions favor which strategies?
- How do parameters affect PnL?
Surface interesting profit patterns AND accuracy concerns. Both matter now.
IMPORTANT: This skill should be invoked BEFORE running simulations to help with setup decisions (config selection, segment selection via collections, batch sizing).
Recommended Batch Size
Current recommendation: 2,000-3,000 simulations per batch.
- Runs quickly (~30-60 seconds with tick caching)
- Provides statistically meaningful results
- More than 3,000 doesn't add much value at the current stage
- Formula: configs × segments = total runs (e.g., 20 configs × 100 segments = 2,000 runs)
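As a quick pre-flight sanity check, the formula can be run in the shell before launching a batch (the config and segment counts here are illustrative, not project defaults):

```shell
# Total batch size = number of configs x number of segments.
configs=20
segments=100
total=$((configs * segments))
echo "planned batch: $total runs"
# Warn when the plan falls outside the recommended 2,000-3,000 window.
if [ "$total" -lt 2000 ] || [ "$total" -gt 3000 ]; then
  echo "WARN: outside recommended 2,000-3,000 range"
fi
```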
Phase 1: Setup & Run
⚠️ USER INPUT TAKES PRIORITY
If the user provides ANY arguments, instructions, or context when invoking this skill, those take absolute priority over all defaults below.
- User says “run config X” → run config X, ignore default config selection
- User says “test these 5 segments” → test those 5 segments, ignore default segment query
- User asks a specific question → answer that question, don’t run the full default workflow
- User provides partial instructions → fill in gaps with defaults, but honor what they specified
The defaults below are ONLY for when the skill is invoked with no arguments at all.
Default Behavior (no args only)
Pick a diverse spread of configs and segments using segment collections (legacy filters are deprecated):

- Query available configs:

  ```sql
  SELECT name, config FROM strategy_configs ORDER BY name;
  ```

- Select 10-20 configs covering different indicator types (EMA, RSI, Bollinger, combined)
- Choose or create a segment collection (AND-only annotations in v1):

  ```shell
  bun src/systems/trading-engine/bd-segments.ts list
  bun src/systems/trading-engine/bd-segments.ts create --name=high-activity --annotations=high_activity --limit=100
  bun src/systems/trading-engine/bd-segments.ts preview --name=high-activity
  ```

- Run batch simulation:

  ```shell
  bun src/systems/trading-engine/batch-runner.ts --config-pattern="<pattern>" --collection=high-activity --save
  ```
Batch Runner CLI Reference
Location: `src/systems/trading-engine/batch-runner.ts`

Usage:

```shell
bun src/systems/trading-engine/batch-runner.ts [options]
```

Options:

| Option | Short | Description |
|---|---|---|
| `--config=ID` | `-c` | Strategy config ID (can specify multiple) |
| `--all-configs` | `-a` | Run all available strategy configs |
| `--config-pattern=PAT` | `-p` | Run configs matching name pattern (SQL LIKE) |
| `--strategy-type=TYPE` | | Filter configs by strategy type (volatility, arb, grid) |
| `--collection=NAME\|ID` | `-C` | Segment collection selector (REQUIRED) |
| `--show-selection` | | Preview collection selection and exit |
| `--capital=CENTS` | | Initial capital in cents (default: 10000 = $100) |
| `--workers=N` | `-w` | Number of parallel workers (default: 8) |
| `--save` | `-S` | Save results to database |
| `--dry-run` | | Show what would run without executing |
| `--warmup-candles=N` | | Pre-load N candles before segment for indicator warmup (default: 50) |
| `--db-pool-max=N` | | DB pool size for main batch process (default: 10) |
| `--db-pool-max-worker=N` | | DB pool size for worker processes (default: 3) |
Examples:

```shell
# Run one config on a segment collection
bun src/systems/trading-engine/bd-segments.ts create --name=swings100 --annotations=high_activity
bun src/systems/trading-engine/batch-runner.ts --config=abc123 --collection=swings100 --save

# Run all configs on a collection
bun src/systems/trading-engine/batch-runner.ts --all-configs --collection=swings50 --save

# Run all configs on specific segment IDs (snapshot collection)
bun src/systems/trading-engine/bd-segments.ts create --name=segment-set --segment-ids=949,950,951,952 --mode=snapshot
bun src/systems/trading-engine/batch-runner.ts --all-configs --collection=segment-set --save

# Run configs matching "RSI-*" on high_activity segments
bun src/systems/trading-engine/bd-segments.ts create --name=rsi-activity --annotations=high_activity
bun src/systems/trading-engine/batch-runner.ts --config-pattern="RSI-%" --collection=rsi-activity --save

# Dry run to see what would execute
bun src/systems/trading-engine/batch-runner.ts --all-configs --collection=swings100 --dry-run
```
Key CLI Flags (Quick Reference)
- `--collection=NAME|ID` or `-C`: REQUIRED segment collection selector
- `--show-selection`: Preview resolved segment list and exit
- `--config-pattern="TEST-%"` or `-p`: Filter configs by name pattern
- `--strategy-type=grid`: Filter configs by strategy type
Phase 2: Comprehensive Analysis (ALWAYS RUN)
After simulations complete, run ALL of the following diagnostic queries. Present results in tables.
2.1 Core Stats
```sql
SELECT
  sc.name,
  COUNT(*) as runs,
  ROUND(AVG(trade_count)::numeric, 1) as avg_trades,
  ROUND(STDDEV(trade_count)::numeric, 1) as stddev_trades,
  MIN(trade_count) as min_trades,
  MAX(trade_count) as max_trades
FROM (
  SELECT r.id, sc.name, COUNT(t.id) as trade_count
  FROM sim_runs r
  JOIN strategy_configs sc ON r.config_id = sc.id
  LEFT JOIN sim_trades t ON t.run_id = r.id
  WHERE r.created_at > NOW() - INTERVAL '1 hour'
  GROUP BY r.id, sc.name
) sub
JOIN strategy_configs sc ON sc.name = sub.name
GROUP BY sc.name
ORDER BY avg_trades;
```
2.2 Entry Signal Frequency (RED FLAG: median < 60s is suspicious)
```sql
WITH entries AS (
  SELECT
    t.run_id, sc.name as config, t.entry_at,
    LAG(t.entry_at) OVER (PARTITION BY t.run_id ORDER BY t.entry_at) as prev_entry
  FROM sim_trades t
  JOIN sim_runs r ON t.run_id = r.id
  JOIN strategy_configs sc ON r.config_id = sc.id
  WHERE r.created_at > NOW() - INTERVAL '1 hour'
)
SELECT
  config,
  COUNT(*) as entry_gaps,
  ROUND(AVG((entry_at - prev_entry)/1000.0)::numeric, 1) as avg_gap_sec,
  ROUND((PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY (entry_at - prev_entry)/1000.0))::numeric, 1) as median_gap_sec,
  COUNT(*) FILTER (WHERE entry_at - prev_entry < 1000) as gaps_under_1s,
  COUNT(*) FILTER (WHERE entry_at - prev_entry < 5000) as gaps_under_5s,
  COUNT(*) FILTER (WHERE entry_at - prev_entry < 60000) as gaps_under_1min
FROM entries
WHERE prev_entry IS NOT NULL
GROUP BY config
ORDER BY median_gap_sec;
```
Interpretation: Median gaps < 60s suggest overtrading. Healthy configs should show gaps measured in minutes (20-30 min is typical). Gaps < 1s are a MAJOR RED FLAG – the indicator is firing constantly.
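If you only have raw entry timestamps rather than database rows, the same red-flag check can be approximated in the shell. The `entries.txt` file below is a tiny demo stand-in (one epoch-millisecond timestamp per line); replace it with a real export:

```shell
# entries.txt: one entry timestamp (epoch ms) per line -- demo data, not a real artifact.
printf '0\n30000\n60000\n120000\n' > entries.txt
# Compute gaps between consecutive entries, then take the median gap;
# flag medians under 60 seconds as possible overtrading.
sort -n entries.txt \
  | awk 'NR > 1 { print $1 - prev } { prev = $1 }' \
  | sort -n \
  | awk '{ g[NR] = $1 }
         END {
           if (NR == 0) exit
           m = g[int((NR + 1) / 2)] / 1000.0
           printf "median gap: %.1fs%s\n", m, (m < 60 ? "  << RED FLAG: possible overtrading" : "")
         }'
```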
2.3 Position Overlap
```sql
-- How many concurrent positions on average?
WITH trade_events AS (
  SELECT t.run_id, sc.name as config, t.entry_at as ts, 1 as delta
  FROM sim_trades t
  JOIN sim_runs r ON t.run_id = r.id
  JOIN strategy_configs sc ON r.config_id = sc.id
  WHERE r.created_at > NOW() - INTERVAL '1 hour'
  UNION ALL
  SELECT t.run_id, sc.name as config, t.exit_at as ts, -1 as delta
  FROM sim_trades t
  JOIN sim_runs r ON t.run_id = r.id
  JOIN strategy_configs sc ON r.config_id = sc.id
  WHERE r.created_at > NOW() - INTERVAL '1 hour' AND t.exit_at IS NOT NULL
),
running_pos AS (
  SELECT run_id, config, ts,
    SUM(delta) OVER (PARTITION BY run_id ORDER BY ts, delta DESC) as open_positions
  FROM trade_events
)
SELECT
  config,
  MAX(open_positions) as max_concurrent,
  ROUND(AVG(open_positions)::numeric, 2) as avg_concurrent,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY open_positions) as median_concurrent
FROM running_pos
GROUP BY config
ORDER BY avg_concurrent DESC;
```
2.4 Holding Duration Distribution
```sql
SELECT
  sc.name as config,
  CASE
    WHEN t.holding_duration_ms < 30000 THEN '<30s'
    WHEN t.holding_duration_ms < 60000 THEN '30-60s'
    WHEN t.holding_duration_ms < 120000 THEN '1-2min'
    WHEN t.holding_duration_ms < 180000 THEN '2-3min'
    WHEN t.holding_duration_ms < 240000 THEN '3-4min'
    WHEN t.holding_duration_ms < 300000 THEN '4-5min'
    ELSE '5min+'
  END as hold_bucket,
  COUNT(*) as trades,
  ROUND(AVG(t.net_pnl_cents)::numeric, 1) as avg_pnl,
  ROUND(COUNT(*)::numeric / SUM(COUNT(*)) OVER (PARTITION BY sc.name) * 100, 1) as pct
FROM sim_trades t
JOIN sim_runs r ON t.run_id = r.id
JOIN strategy_configs sc ON r.config_id = sc.id
WHERE r.created_at > NOW() - INTERVAL '1 hour' AND t.holding_duration_ms IS NOT NULL
GROUP BY sc.name, hold_bucket
ORDER BY sc.name, MIN(t.holding_duration_ms);
```
2.5 Exit Reason Analysis
```sql
SELECT
  sc.name as config,
  t.exit_reason,
  COUNT(*) as trades,
  ROUND(AVG(t.net_pnl_cents)::numeric, 1) as avg_pnl,
  ROUND(COUNT(*)::numeric / SUM(COUNT(*)) OVER (PARTITION BY sc.name) * 100, 1) as pct
FROM sim_trades t
JOIN sim_runs r ON t.run_id = r.id
JOIN strategy_configs sc ON r.config_id = sc.id
WHERE r.created_at > NOW() - INTERVAL '1 hour'
GROUP BY sc.name, t.exit_reason
ORDER BY sc.name, trades DESC;
```
2.6 YES vs NO Side Analysis (IMPORTANT – check for asymmetries)
```sql
SELECT
  sc.name as config,
  t.side,
  COUNT(*) as trades,
  ROUND(AVG(t.net_pnl_cents)::numeric, 1) as avg_pnl,
  ROUND(AVG(CASE WHEN t.net_pnl_cents > 0 THEN 1 ELSE 0 END)::numeric * 100, 1) as win_pct,
  ROUND(AVG(t.entry_price)::numeric, 1) as avg_entry_price,
  ROUND(AVG(t.exit_price)::numeric, 1) as avg_exit_price,
  ROUND(AVG(t.quantity)::numeric, 1) as avg_qty
FROM sim_trades t
JOIN sim_runs r ON t.run_id = r.id
JOIN strategy_configs sc ON r.config_id = sc.id
WHERE r.created_at > NOW() - INTERVAL '1 hour'
GROUP BY sc.name, t.side
ORDER BY sc.name, t.side;
```
Interpretation: Significant asymmetry between YES/NO performance suggests bugs in price conversion or side selection logic. Both sides should have similar characteristics unless the strategy explicitly favors one.
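One cheap cross-check on price handling can be scripted against an exported `side,entry_price` dump. It rests on an assumption not stated in this doc: binary-market prices in cents, where the YES and NO sides are complements summing to roughly 100. The `sides.csv` file and the 90-110 tolerance below are illustrative:

```shell
# sides.csv: demo stand-in for a real export of side,entry_price pairs (cents).
printf 'YES,40\nYES,38\nNO,60\nNO,62\n' > sides.csv
# If sides are handled symmetrically, avg(YES) + avg(NO) should sit near 100.
awk -F, '
  $1 == "YES" { ys += $2; yn++ }
  $1 == "NO"  { ns += $2; nn++ }
  END {
    if (yn == 0 || nn == 0) { print "need both sides"; exit }
    total = ys / yn + ns / nn
    printf "avg YES + avg NO = %.1f cents\n", total
    if (total < 90 || total > 110) print "ASYMMETRY: check price conversion / side selection"
  }' sides.csv
```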
2.7 Position Sizing Verification
```sql
-- Check if fills match expected sizes and align with order book
SELECT
  sc.name as config,
  t.quantity as fill_qty,
  COUNT(*) as occurrences,
  ROUND(AVG(t.entry_price)::numeric, 1) as avg_entry_price,
  ROUND(AVG(t.net_pnl_cents)::numeric, 1) as avg_pnl
FROM sim_trades t
JOIN sim_runs r ON t.run_id = r.id
JOIN strategy_configs sc ON r.config_id = sc.id
WHERE r.created_at > NOW() - INTERVAL '1 hour'
GROUP BY sc.name, t.quantity
ORDER BY sc.name, t.quantity;
```
Check: Does quantity match config’s maxTradeSize? Are partial fills happening? Compare with maxPositionPerMarket.
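The same verification can be scripted once fill quantities are exported. Both `qty.txt` and the `maxTradeSize` value here are illustrative stand-ins; the real value comes from the strategy config under test:

```shell
# qty.txt: one fill quantity per line (demo stand-in for a real export).
printf '10\n10\n7\n10\n' > qty.txt
max_trade_size=10   # illustrative; read this from the config being tested
# Count fills that deviate from the configured size (candidate partial fills / bugs).
awk -v m="$max_trade_size" '
  $1 != m { bad++ }
  END { printf "%d of %d fills differ from maxTradeSize=%d\n", bad + 0, NR, m }' qty.txt
```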
2.8 Trade Sequence Performance
```sql
WITH numbered_trades AS (
  SELECT t.*, sc.name as config,
    ROW_NUMBER() OVER (PARTITION BY t.run_id ORDER BY t.entry_at) as trade_num
  FROM sim_trades t
  JOIN sim_runs r ON t.run_id = r.id
  JOIN strategy_configs sc ON r.config_id = sc.id
  WHERE r.created_at > NOW() - INTERVAL '1 hour'
)
SELECT
  config,
  CASE
    WHEN trade_num = 1 THEN '1st'
    WHEN trade_num BETWEEN 2 AND 5 THEN '2-5th'
    WHEN trade_num BETWEEN 6 AND 10 THEN '6-10th'
    WHEN trade_num BETWEEN 11 AND 20 THEN '11-20th'
    ELSE '21st+'
  END as position,
  COUNT(*) as trades,
  ROUND(AVG(net_pnl_cents)::numeric, 1) as avg_pnl,
  ROUND(AVG(CASE WHEN net_pnl_cents > 0 THEN 1 ELSE 0 END)::numeric * 100, 1) as win_pct
FROM numbered_trades
GROUP BY config, position
ORDER BY config, MIN(trade_num);
```
2.9 Market Segment Analysis
```sql
SELECT
  ms.definition_name,
  COUNT(DISTINCT r.id) as runs,
  COUNT(t.id) as trades,
  ROUND(AVG(r.total_pnl_cents)::numeric, 1) as avg_run_pnl,
  ROUND(AVG((ms.metrics->>'swing_count')::int)::numeric, 1) as avg_swings,
  ROUND(AVG((ms.metrics->>'reversion_rate')::float)::numeric, 2) as avg_reversion_rate
FROM sim_runs r
JOIN market_segments ms ON ms.source_meta->>'legacy_test_case_id' = r.test_case_id::text
LEFT JOIN sim_trades t ON t.run_id = r.id
JOIN strategy_configs sc ON r.config_id = sc.id
WHERE r.created_at > NOW() - INTERVAL '1 hour'
GROUP BY ms.definition_name
ORDER BY runs DESC;
```
2.10 Run Health Check
```sql
SELECT status, COUNT(*) as count
FROM sim_runs
WHERE created_at > NOW() - INTERVAL '1 hour'
GROUP BY status;
```
RED FLAG: Any “running” status after batch completes = crashed simulations.
Phase 3: Log Analysis (ALWAYS RUN)
Check recent simulation logs for anomalies:
```shell
# Find most recent sim log
ls -lt logs/sim/ | head -5

# Sample entries from most recent log with content
LOG=$(ls -ltS logs/sim/*.log | head -1 | awk '{print $NF}')
echo "Analyzing: $LOG ($(du -h "$LOG" | cut -f1))"

# Check log volume by category (PERFORMANCE RED FLAG: >1M indicator logs)
echo "Log volume by category:"
rg '"category"' "$LOG" | jq -r '.category' 2>/dev/null | sort | uniq -c | sort -rn

# Check run duration from timestamps
echo "Run duration:"
FIRST_TS=$(head -1 "$LOG" | jq -r '.time' 2>/dev/null)
LAST_TS=$(tail -1 "$LOG" | jq -r '.time' 2>/dev/null)
if [ "$FIRST_TS" != "null" ] && [ "$LAST_TS" != "null" ]; then
  echo "Duration: $(( (LAST_TS - FIRST_TS) / 1000 )) seconds"
fi

# Count exit reasons
echo "Exit reason distribution:"
rg '"reason"' "$LOG" | jq -r '.reason' 2>/dev/null | sort | uniq -c | sort -rn | head -20

# Check for errors
echo "Errors in log:"
rg -i '"level":50' "$LOG" | head -10

# Check for warnings
echo "Warnings in log:"
rg -i '"level":40' "$LOG" | head -10
```
Look for:
- Unusual exit reasons (not profit_target, time_limit, collapse, end_of_data)
- Error level (50) entries
- Patterns suggesting bugs (repeated identical entries, missing fields)
- PERFORMANCE: Log file > 500MB = excessive logging
- PERFORMANCE: >1M indicator category logs = debug logging bottleneck
Phase 3.5: Performance Analysis
Check simulation throughput:
```sql
-- Runs per second over recent batch
SELECT
  DATE_TRUNC('minute', created_at) as minute,
  COUNT(*) as runs,
  ROUND(COUNT(*)::numeric / 60, 1) as runs_per_sec
FROM sim_runs
WHERE created_at > NOW() - INTERVAL '1 hour'
GROUP BY minute
ORDER BY minute DESC
LIMIT 10;
```
Expected throughput: 20-40 runs/sec with tick caching. If < 10 runs/sec, investigate:
- Is tick cache being used? (check for “Pre-loaded” in batch script output)
- Is indicator debug logging enabled? (check log size)
- Are workers I/O bound? (check worker count vs CPU cores)
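For a quick back-of-envelope version of the throughput check, divide completed runs by wall-clock seconds (both values below are made-up examples, not measurements):

```shell
# throughput = runs completed / wall-clock seconds for the batch
runs=2000
seconds=75
rps=$(awk -v r="$runs" -v s="$seconds" 'BEGIN { printf "%.1f", r / s }')
echo "throughput: ${rps} runs/sec"
# Under ~10 runs/sec, start checking tick cache, debug logging, and worker count.
if awk -v x="$rps" 'BEGIN { exit !(x < 10) }'; then
  echo "SLOW: investigate tick cache / logging / workers"
fi
```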
Phase 4: Analysis Summary
DUAL FOCUS: Surface both accuracy concerns AND profit insights.
First: Accuracy Check (Priority #1)
Flag any discrepancies between expected vs actual behavior:
- Does config behavior match parameters?
- Are there unexplained patterns or anomalies?
- Do metrics make logical sense?
Accuracy issues take precedence. A 1% accuracy bug can invalidate profit analysis.
Then: Profit Insights (Priority #2)
With accuracy validated, analyze:
- Which configs are most profitable?
- What patterns emerge across market conditions?
- Are there optimization opportunities?
Validation Questions to Answer
For each finding, ask: “Does this match what we’d expect? If not, why?”
| Observation | Expected Behavior | Actual | Match? | If No, Investigate |
|---|---|---|---|---|
| Entry gaps | Based on indicator params | ? | ? | Indicator calculation? Candle aggregation? |
| Exit reasons | Balanced mix | ? | ? | Exit logic? Threshold bugs? |
| YES vs NO performance | Similar (market is symmetric) | ? | ? | Price conversion? Side selection? |
| Position sizes | Match config maxTradeSize | ? | ? | Fill logic? Depth parsing? |
| Holding durations | Spread across buckets | ? | ? | Exit conditions firing? |
| Trade sequence | Later trades ≈ earlier | ? | ? | State accumulation bugs? |
| Bollinger/other indicators | Should trigger sometimes | ? | ? | Indicator implementation? |
Key Validation Checks
1. Indicator Firing Frequency
   - Does the gap between entries match what the indicator period implies?
   - EMA(20) with 60s candles = 20 min of data needed. Are we seeing entries faster than that makes sense?
2. Price/Side Consistency
   - YES and NO should be mirror images. Major asymmetry = likely bug in price handling.
   - Check: Are we converting bid/ask correctly for each side?
3. Config Parameters Actually Applied
   - Is stopLossCents actually triggering at that price?
   - Is profitTargetCents actually triggering?
   - Is maxSpreadToEnter being checked?
4. Data Flow Integrity
   - Are ticks reaching the strategy in order?
   - Is depth data being parsed correctly?
   - Are fills being recorded accurately?
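The warmup arithmetic behind the indicator-frequency check is simple enough to spell out; the period and candle interval below are the illustrative values from that check, not project defaults:

```shell
# Minimum history an EMA needs before its first meaningful value:
# warmup = period x candle interval. EMA(20) on 60-second candles -> 1200s = 20 min.
period=20
candle_sec=60
warmup_sec=$((period * candle_sec))
echo "warmup: ${warmup_sec}s (~$((warmup_sec / 60)) min)"
# Entries appearing sooner than this after a segment starts (absent a
# --warmup-candles pre-load) are worth a closer look.
```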
Phase 5: Findings & Next Steps
Based on analysis, organize findings into:
Accuracy Concerns (Surface First)
- Likely Bugs – Discrepancies that suggest code errors
  - “X should be Y but we see Z” → investigate code path
- Unclear Behavior – Things we don’t understand yet
  - “We expected X, got Y, not sure why” → needs deeper investigation
- Confirmed Working – Things that match expectations
  - “This behaves as expected” → increased confidence
Profit Insights (Surface Second)
- Top Performers – Which configs show best results?
- Market Patterns – What conditions favor which strategies?
- Optimization Opportunities – Parameter tuning suggestions
Next Steps
- Investigation priorities for accuracy concerns
- Experiments to validate profit hypotheses
- Config variations to test
Remember: Accuracy comes first. Surface discrepancies prominently, then provide profit optimization insights. Both matter, but a small accuracy bug can invalidate all profit analysis.