prove-it
Total installs: 3
Weekly installs: 1
Site rank: #55620
Install command
npx skills add https://github.com/tkersey/dotfiles --skill prove-it
Agent install distribution
- amp: 1
- cline: 1
- opencode: 1
- cursor: 1
- continue: 1
- kimi-cli: 1
Skill documentation
Prove It
When to use
- The user asserts certainty: “always”, “never”, “guaranteed”, “optimal”, “cannot fail”, “no downside”, “100%”.
- The user asks for a devil’s advocate or proof.
- The claim feels too clean for the domain.
Round cadence (mandatory)
- Definition: one “turn” means one assistant reply.
- Default: autoloop (no approvals). Run exactly one gauntlet round per assistant turn, publish results, then continue on the next turn until Oracle synthesis.
- In default mode, after each round, publish:
  - Round Ledger
  - Knowledge Delta
- If confidence remains low after Oracle synthesis, continue with additional rounds (11+) and publish an updated Oracle synthesis.
- Do not ask for permission to continue. In default mode, do not wait for “next” between rounds. Pause only when you must ask the user a question or the user says “stop”.
- Step mode (explicit): if the user asks to “pause” / “step” / “one round at a time”, run one round then wait for “next”.
- Full auto mode (explicit): if the user asks for “full auto” / “fast mode”, run rounds 1-10 (round 10 is Oracle synthesis) in one assistant turn while still reporting each round in order.
Mode invocation
| Mode | Default? | How to invoke | Cadence |
|---|---|---|---|
| Autoloop | yes | (no phrase) | 1 round/turn; auto-continue until Oracle |
| Step mode | no | “step mode” / “pause each round” / “pause” / “step” / “one round at a time” | 1 round/turn; wait for “next” |
| Full auto | no | “full auto” / “fast mode” | rounds 1-10 (round 10 is Oracle synthesis) in one turn; publish Round Ledger + Knowledge Delta after each round |
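The mode table can be read as a small phrase-matching rule. The sketch below is one possible reading, not part of the skill itself: the `Mode` enum, the `detect_mode` helper, and the choice to check full-auto phrases before step phrases are all assumptions made for illustration. Only the trigger phrases come from the table.

```python
from enum import Enum


class Mode(Enum):
    AUTOLOOP = "autoloop"
    STEP = "step"
    FULL_AUTO = "full auto"


# Trigger phrases copied from the mode table above.
STEP_PHRASES = ("step mode", "pause each round", "pause", "step", "one round at a time")
FULL_AUTO_PHRASES = ("full auto", "fast mode")


def detect_mode(user_message: str) -> Mode:
    """Return the cadence mode a message asks for, defaulting to autoloop."""
    text = user_message.lower()
    # Assumption: full auto wins if both kinds of phrase appear in one message.
    if any(phrase in text for phrase in FULL_AUTO_PHRASES):
        return Mode.FULL_AUTO
    if any(phrase in text for phrase in STEP_PHRASES):
        return Mode.STEP
    # No phrase at all means the default mode.
    return Mode.AUTOLOOP
```

Plain substring matching is deliberately naive: a message such as "what are the next steps?" would also trigger step mode, so a real agent would likely judge intent rather than match strings.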
Quick start
- Restate the claim and its scope.
- Default to autoloop. If the user explicitly requests “step mode” or “full auto”, use that instead.
- Run round 1 and publish the Round Ledger + Knowledge Delta.
- Continue automatically with one round per turn until round 10 (Oracle synthesis).
- If confidence remains low, run additional rounds (11+) and publish an updated Oracle synthesis.
Ten-round gauntlet
- Counterexamples: smallest concrete break.
- Logic traps: missing quantifiers/premises.
- Boundary cases: zero/one/max/empty/extreme scale.
- Adversarial inputs: worst-case distributions/abuse.
- Alternative paradigms: different model flips the conclusion.
- Operational constraints: latency/cost/compliance/availability.
- Probabilistic uncertainty: variance, tail risk, sampling bias.
- Comparative baselines: “better than what?”, which metric?
- Meta-test: fastest disproof experiment.
- Oracle synthesis: tightest surviving claim with boundaries. If confidence is still low, repeat rounds 1-9 as needed, then re-run Oracle synthesis.
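As a rough illustration, the gauntlet can be treated as an ordered lookup from round number to focus. The `GAUNTLET` tuple and `focus_for_round` helper below are hypothetical names. The rule for rounds 11+ (cycle through the same ten focuses in order) is an assumption of this sketch, since the skill only says to repeat rounds 1-9 "as needed" before re-running Oracle synthesis.

```python
# The ten focuses, in the order listed above.
GAUNTLET = (
    "Counterexamples",
    "Logic traps",
    "Boundary cases",
    "Adversarial inputs",
    "Alternative paradigms",
    "Operational constraints",
    "Probabilistic uncertainty",
    "Comparative baselines",
    "Meta-test",
    "Oracle synthesis",
)


def focus_for_round(round_number: int) -> str:
    """Map a 1-based round number to its gauntlet focus."""
    if round_number < 1:
        raise ValueError("round numbers start at 1")
    # Assumption: rounds 11+ wrap around, so round 11 revisits counterexamples
    # and round 20 is the next Oracle synthesis.
    return GAUNTLET[(round_number - 1) % len(GAUNTLET)]
```

For example, `focus_for_round(3)` gives the boundary-cases round, and `focus_for_round(11)` starts a second pass.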
Round self-prompt bank (pick exactly 1)
Internal self-prompts for selecting round focus. Do not ask the user unless blocked.
- Counterexamples: What is the smallest input that breaks this?
- Logic traps: What unstated assumption must hold?
- Boundary cases: Which boundary is most likely in real use?
- Adversarial: What does worst-case input look like?
- Alternative paradigm: What objective makes the opposite true?
- Operational: Which dependency/policy is a hard stop?
- Uncertainty: What distribution shift flips the result?
- Baseline: Better than what, on which metric?
- Meta-test: What experiment would change your mind fastest?
- Oracle: What explicit boundaries keep this honest?
Core artifacts
Argument map
Claim:
Premises:
- P1:
- P2:
Hidden assumptions:
- A1:
Weak links:
- W1:
Disproof tests:
- T1:
Refined claim:
Round Ledger (update every round)
Round: <1-10 (or 11+)>
Focus:
Claim scope:
New evidence:
New counterexample:
Remaining gaps:
Next round:
Knowledge Delta (publish every round)
- New:
- Updated:
- Invalidated:
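If an agent wanted to track these two per-round artifacts as structured data before publishing them, one option is a pair of small records that render back into the templates above. This is only a sketch: the class names, field names, and the choice to join multiple delta items with "; " are assumptions, and the skill itself prescribes only the text templates.

```python
from dataclasses import dataclass, field


@dataclass
class RoundLedger:
    round_number: int
    focus: str
    claim_scope: str
    new_evidence: str = ""
    new_counterexample: str = ""
    remaining_gaps: str = ""
    next_round: str = ""

    def render(self) -> str:
        """Render the ledger using the labels from the Round Ledger template."""
        rows = [
            ("Round", str(self.round_number)),
            ("Focus", self.focus),
            ("Claim scope", self.claim_scope),
            ("New evidence", self.new_evidence),
            ("New counterexample", self.new_counterexample),
            ("Remaining gaps", self.remaining_gaps),
            ("Next round", self.next_round),
        ]
        # rstrip() keeps empty fields as a bare "Label:" line.
        return "\n".join(f"{label}: {value}".rstrip() for label, value in rows)


@dataclass
class KnowledgeDelta:
    new: list[str] = field(default_factory=list)
    updated: list[str] = field(default_factory=list)
    invalidated: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the delta as the three-bullet Knowledge Delta template."""
        rows = [("New", self.new), ("Updated", self.updated), ("Invalidated", self.invalidated)]
        return "\n".join(f"- {label}: {'; '.join(items)}".rstrip() for label, items in rows)
```

Publishing a round would then amount to printing `ledger.render()` followed by `delta.render()`.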
Claim boundary table
| Boundary type | Valid when | Invalid when | Assumptions | Stressors |
|---------------|-----------|--------------|-------------|-----------|
| Scale | | | | |
| Data quality | | | | |
| Environment | | | | |
| Adversary | | | | |
Next-tests plan
| Test | Data needed | Success threshold | Stop condition |
|------|-------------|-------------------|----------------|
Domain packs
Performance
Use when the claim is about speed, latency, throughput, or resources.
- Clarify: median vs tail latency vs throughput.
- Identify workload shape (spiky vs steady) and bottleneck resource.
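To see why the first clarification matters, it can help to compute all three numbers from the same sample. The sketch below is illustrative only: `latency_summary` is a hypothetical helper, the nearest-rank definition of p99 is one of several common conventions, and the sample data is made up.

```python
import statistics


def latency_summary(latencies_ms: list[float], window_s: float) -> dict[str, float]:
    """Summarize one sample as median latency, p99 latency, and throughput."""
    if not latencies_ms or window_s <= 0:
        raise ValueError("need at least one sample and a positive window")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # Nearest-rank p99: ceil(0.99 * n), computed in integers to avoid float rounding.
    rank = (99 * n + 99) // 100
    return {
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[rank - 1],
        "throughput_rps": n / window_s,
    }


# Made-up workload: 98 fast requests and 2 slow ones over a 10-second window.
summary = latency_summary([10.0] * 98 + [1000.0] * 2, window_s=10.0)
print(summary)
```

With this sample the median is 10 ms while the p99 is 1000 ms, so a claim that the system "is fast" holds for the median and fails for the tail.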
Product
Use when the claim is about user impact, adoption, or behavior.
- Clarify user segment and success metric.
- State the baseline/counterfactual.
- Name the likely unintended behavior/tradeoff.
Oracle synthesis template (round 10 / as needed)
Original claim:
Refined claim:
Boundaries:
- Valid when:
- Invalid when:
Confidence trail:
- Evidence:
- Gaps:
Next tests:
- ...
Deliverable format (per turn)
- Round number + focus.
- Round Ledger + Knowledge Delta.
- At most one question for the user (only when blocked).
- In default autoloop, run one round in that turn and continue to the next round in the next turn.
- In step mode, run one round and wait for “next”.
- In full auto (or “fast mode”), run rounds 1-10 (round 10 is Oracle synthesis) in one turn (repeat the above per round).
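The per-turn cadence above can be summarized as a function from mode and next round number to the rounds run in that turn. This is a minimal sketch under stated assumptions: `rounds_for_turn` and the plain string mode names are hypothetical, and the handling of rounds past 10 in full auto (one round per turn) is a guess, since the skill does not spell it out.

```python
def rounds_for_turn(mode: str, next_round: int) -> tuple[list[int], bool]:
    """Return (rounds to run in this turn, whether to wait for "next" afterwards)."""
    if next_round < 1:
        raise ValueError("round numbers start at 1")
    if mode == "full auto":
        # Run everything up to round 10 in one turn.
        # Assumption: past round 10, fall back to a single round.
        last = max(next_round, 10)
        return list(range(next_round, last + 1)), False
    if mode == "step":
        # One round, then pause until the user says "next".
        return [next_round], True
    if mode == "autoloop":
        # One round, then continue on the following turn without asking.
        return [next_round], False
    raise ValueError(f"unknown mode: {mode!r}")
```

In every mode, each round in the returned list still gets its own Round Ledger and Knowledge Delta.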
Activation cues
- “always” / “never” / “guaranteed” / “optimal” / “cannot fail” / “no downside” / “100%”
- “prove it” / “devil’s advocate” / “stress test” / “rigor”