prove-it

📁 tkersey/dotfiles 📅 Today
Total installs: 3
Weekly installs: 1
Site-wide rank: #55620
Install command
npx skills add https://github.com/tkersey/dotfiles --skill prove-it

Agent install distribution

amp 1
cline 1
opencode 1
cursor 1
continue 1
kimi-cli 1

Skill documentation

Prove It

When to use

  • The user asserts certainty: “always”, “never”, “guaranteed”, “optimal”, “cannot fail”, “no downside”, “100%”.
  • The user asks for a devil’s advocate or proof.
  • The claim feels too clean for the domain.

Round cadence (mandatory)

  • Definition: one “turn” means one assistant reply.
  • Default: autoloop (no approvals). Run exactly one gauntlet round per assistant turn, publish results, then continue on the next turn until Oracle synthesis.
  • In default mode, after each round, publish:
    • Round Ledger
    • Knowledge Delta
  • If confidence remains low after Oracle synthesis, continue with additional rounds (11+) and publish an updated Oracle synthesis.
  • Do not ask for permission to continue. In default mode, do not wait for “next” between rounds. Pause only when you must ask the user a question or the user says “stop”.
  • Step mode (explicit): if the user asks to “pause” / “step” / “one round at a time”, run one round then wait for “next”.
  • Full auto mode (explicit): if the user asks for “full auto” / “fast mode”, run rounds 1-10 + Oracle synthesis in one assistant turn while still reporting each round in order.

Mode invocation

| Mode | Default? | How to invoke | Cadence |
|------|----------|---------------|---------|
| Autoloop | yes | (none; default) | 1 round/turn; auto-continue until Oracle |
| Step mode | no | “step mode” / “pause each round” / “pause” / “step” / “one round at a time” | 1 round/turn; wait for “next” |
| Full auto | no | “full auto” / “fast mode” | rounds 1-10 + Oracle in one turn; publish Round Ledger + Knowledge Delta after each round |
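The three cadences above can be sketched as a small driver loop. This is an illustrative model only; `run_round` and `rounds_this_turn` are hypothetical names, not part of the skill:

```python
# Minimal sketch of the three cadence modes (illustrative only).
# run_round(n) stands in for executing one gauntlet round and
# publishing its Round Ledger + Knowledge Delta.

def run_round(n, log):
    log.append(n)  # stand-in for publishing round n's artifacts

def rounds_this_turn(mode, next_round, log):
    """Run the rounds that belong in a single assistant turn."""
    if mode == "full_auto":
        # Rounds 1-10 plus Oracle synthesis, all in one turn.
        for n in range(next_round, 11):
            run_round(n, log)
        return 11  # past Oracle synthesis: done
    # Autoloop and step mode both run exactly one round per turn;
    # they differ only in whether the next turn waits for "next".
    run_round(next_round, log)
    return next_round + 1

log = []
r = 1
while r <= 10:
    r = rounds_this_turn("autoloop", r, log)
assert log == list(range(1, 11))  # one round per turn, no approvals
```

The same loop with `mode="full_auto"` finishes in a single call, which is the only behavioral difference between the modes.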

Quick start

  1. Restate the claim and its scope.
  2. Default to autoloop. If the user explicitly requests “step mode” or “full auto”, use that instead.
  3. Run round 1 and publish the Round Ledger + Knowledge Delta.
  4. Continue automatically with one round per turn until round 10 (Oracle synthesis).
  5. If confidence remains low, run additional rounds (11+) and publish an updated Oracle synthesis.

Ten-round gauntlet

  1. Counterexamples: smallest concrete break.
  2. Logic traps: missing quantifiers/premises.
  3. Boundary cases: zero/one/max/empty/extreme scale.
  4. Adversarial inputs: worst-case distributions/abuse.
  5. Alternative paradigms: different model flips the conclusion.
  6. Operational constraints: latency/cost/compliance/availability.
  7. Probabilistic uncertainty: variance, tail risk, sampling bias.
  8. Comparative baselines: “better than what?”, which metric?
  9. Meta-test: fastest disproof experiment.
  10. Oracle synthesis: tightest surviving claim with boundaries. If confidence is still low, repeat rounds 1-9 as needed, then re-run Oracle synthesis.
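The round order, including the repeat rule for rounds 11+, can be captured as a simple lookup. The mapping below mirrors the list above; the function name is illustrative:

```python
# The ten gauntlet foci, indexed by round number (illustrative sketch).
GAUNTLET = {
    1: "Counterexamples",
    2: "Logic traps",
    3: "Boundary cases",
    4: "Adversarial inputs",
    5: "Alternative paradigms",
    6: "Operational constraints",
    7: "Probabilistic uncertainty",
    8: "Comparative baselines",
    9: "Meta-test",
    10: "Oracle synthesis",
}

def round_focus(n):
    """Rounds 11+ repeat rounds 1-9 in order before re-running Oracle."""
    if n <= 10:
        return GAUNTLET[n]
    return GAUNTLET[(n - 11) % 9 + 1]

assert round_focus(11) == "Counterexamples"
assert round_focus(19) == "Meta-test"
```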

Round self-prompt bank (pick exactly 1)

Internal self-prompts for selecting round focus. Do not ask the user unless blocked.

  • Counterexamples: What is the smallest input that breaks this?
  • Logic traps: What unstated assumption must hold?
  • Boundary cases: Which boundary is most likely in real use?
  • Adversarial: What does worst-case input look like?
  • Alternative paradigm: What objective makes the opposite true?
  • Operational: Which dependency/policy is a hard stop?
  • Uncertainty: What distribution shift flips the result?
  • Baseline: Better than what, on which metric?
  • Meta-test: What experiment would change your mind fastest?
  • Oracle: What explicit boundaries keep this honest?

Core artifacts

Argument map

Claim:
Premises:
- P1:
- P2:
Hidden assumptions:
- A1:
Weak links:
- W1:
Disproof tests:
- T1:
Refined claim:
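A filled-in argument map might look like the sketch below. The claim and every entry are hypothetical, chosen only to show how the template tightens an “always” claim:

```python
# Illustrative argument map for the hypothetical claim
# "binary search is always faster than linear scan".
argument_map = {
    "claim": "binary search is always faster than linear scan",
    "premises": [
        "P1: the data is sorted",
        "P2: comparisons dominate cost",
    ],
    "hidden_assumptions": [
        "A1: random access is cheap (not true for linked lists)",
    ],
    "weak_links": [
        "W1: 'always' ignores tiny inputs where linear scan wins on constants",
    ],
    "disproof_tests": [
        "T1: benchmark both on arrays of length <= 16",
    ],
    "refined_claim": "binary search beats linear scan on sorted, "
                     "random-access data above a small size threshold",
}

# The refined claim drops the absolute quantifier.
assert "always" not in argument_map["refined_claim"]
```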

Round Ledger (update every round)

Round: <1-10 (or 11+)>
Focus:
Claim scope:
New evidence:
New counterexample:
Remaining gaps:
Next round:

Knowledge Delta (publish every round)

- New:
- Updated:
- Invalidated:

Claim boundary table

| Boundary type | Valid when | Invalid when | Assumptions | Stressors |
|---------------|-----------|--------------|-------------|-----------|
| Scale         |           |              |             |           |
| Data quality  |           |              |             |           |
| Environment   |           |              |             |           |
| Adversary     |           |              |             |           |

Next-tests plan

| Test | Data needed | Success threshold | Stop condition |
|------|-------------|-------------------|----------------|
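One filled row of the next-tests plan, continuing the same hypothetical binary-search example (all values illustrative):

```python
# Illustrative next-tests row (hypothetical claim and thresholds).
next_test = {
    "test": "microbenchmark binary search vs linear scan",
    "data_needed": "sorted arrays of lengths 4, 16, 64, 1024",
    "success_threshold": "binary search faster at length >= 64",
    "stop_condition": "three consecutive runs agree within 5%",
}
assert len(next_test) == 4  # one value per column of the plan
```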

Domain packs

Performance

Use when the claim is about speed, latency, throughput, or resources.

  • Clarify: median vs tail latency vs throughput.
  • Identify workload shape (spiky vs steady) and bottleneck resource.

Product

Use when the claim is about user impact, adoption, or behavior.

  • Clarify user segment and success metric.
  • State the baseline/counterfactual.
  • Name the likely unintended behavior/tradeoff.

Oracle synthesis template (round 10 / as needed)

Original claim:
Refined claim:
Boundaries:
- Valid when:
- Invalid when:
Confidence trail:
- Evidence:
- Gaps:
Next tests:
- ...

Deliverable format (per turn)

  • Round number + focus.
  • Round Ledger + Knowledge Delta.
  • At most one question for the user (only when blocked).
  • In default autoloop, run one round in that turn and continue to the next round in the next turn.
  • In step mode, run one round and wait for “next”.
  • In full auto (or “fast mode”), run rounds 1-10 + Oracle synthesis in one turn (repeat the above per round).

Activation cues

  • “always” / “never” / “guaranteed” / “optimal” / “cannot fail” / “no downside” / “100%”
  • “prove it” / “devil’s advocate” / “stress test” / “rigor”