test-property-based
```bash
npx skills add https://github.com/dawiddutoit/custom-claude --skill test-property-based
```
Property-Based Testing with Hypothesis
Quick Start
Property-based testing automatically generates hundreds of test cases to validate invariants:
```python
from hypothesis import given, strategies as st

# Instead of writing many example tests...
# def test_sort_1(): assert sorted([3,1,2]) == [1,2,3]
# def test_sort_2(): assert sorted([]) == []
# ... (20 more examples)

# Write ONE property test that covers ALL cases
@given(st.lists(st.integers()))
def test_sort_idempotent(lst):
    """Property: Sorting twice gives same result as once."""
    once_sorted = sorted(lst)
    twice_sorted = sorted(once_sorted)
    assert once_sorted == twice_sorted
```
Hypothesis automatically generates 100+ test cases including edge cases you’d never think of: empty lists, single elements, duplicates, large lists, negative numbers, etc.
Table of Contents
- When to Use This Skill
- What This Skill Does
- Core Concepts
- Step-by-Step Workflow
- Common Property Patterns
- Integration with pytest
- Async Property Testing
- Pydantic Model Testing
- Configuration
- Supporting Files
- Expected Outcomes
- Requirements
- Red Flags to Avoid
When to Use This Skill
Explicit Triggers
Use this skill when users mention:
- “property test”
- “hypothesis test”
- “generate test cases”
- “fuzz testing”
- “invariant testing”
- “roundtrip test”
- “stateful testing”
- “edge case testing”
- “test with random data”
Implicit Triggers
Use when you observe:
- Manual writing of many similar example tests
- Testing parsing/serialization (perfect for roundtrip properties)
- Validating configuration classes (especially Pydantic models)
- Testing algorithms with mathematical properties
- Protocol message handling (IPC, API requests/responses)
- State machine behavior
Debugging Triggers
Use when:
- Edge case bugs slip through example-based tests
- Need more comprehensive input coverage
- Test suite misses corner cases
- Validating refactored code behavior
What This Skill Does
This skill guides you through:
- Installing Hypothesis – Add to project dependencies
- Writing property tests – Transform example tests into property-based tests
- Choosing strategies – Select appropriate data generators
- Creating custom strategies – Build domain-specific generators
- Async integration – Combine with pytest-asyncio
- Pydantic integration – Test Pydantic models automatically
- Configuration – Set up profiles for dev/CI/thorough testing
- Stateful testing – Test state machines and complex workflows
Philosophy: Instead of “here are 5 examples that should work”, write “here’s a property that should ALWAYS hold” and let Hypothesis find edge cases.
Core Concepts
Strategies
Strategies describe the type of data Hypothesis should generate:
```python
from hypothesis import strategies as st

# Basic types
st.integers()                              # All integers
st.integers(min_value=0, max_value=100)    # Constrained range
st.floats(allow_nan=False)                 # Floats without NaN
st.text()                                  # Unicode strings
st.text(alphabet="abc", min_size=1)        # Limited alphabet
st.binary()                                # Bytes

# Collections
st.lists(st.integers())                    # Lists of integers
st.dictionaries(st.text(), st.integers())  # Dict[str, int]
st.sets(st.text(), min_size=1)             # Non-empty sets
st.tuples(st.text(), st.integers())        # (str, int) tuples

# Special
st.one_of(st.integers(), st.text())        # Union types
st.none()                                  # None values
st.uuids()                                 # UUID objects
st.datetimes()                             # datetime objects
```
See references/strategies-reference.md for complete strategy catalog.
The @given Decorator
The @given decorator runs your test function with generated data:
```python
from hypothesis import given, strategies as st

@given(st.integers(), st.integers())
def test_addition_commutative(a, b):
    """Addition should be commutative."""
    assert a + b == b + a

@given(st.lists(st.integers()))
def test_sort_preserves_length(lst):
    """Sorting preserves list length."""
    assert len(sorted(lst)) == len(lst)
```
Default behavior: Runs 100 examples (configurable via settings).
Shrinking
When Hypothesis finds a failing test, it automatically minimizes the input:
```python
@given(st.lists(st.integers()))
def test_sum_positive(lst):
    assert sum(lst) >= 0  # Fails for negative numbers

# Hypothesis reports: lst=[-1]
# NOT lst=[-9999, -42, -1, -8888] (the random case it found)
```
This is invaluable for debugging – you get the minimal failing case, not a complex random one.
Custom Strategies
For complex domain objects, build custom strategies with @composite:
```python
import string

from hypothesis import given, strategies as st
from hypothesis.strategies import composite

@composite
def valid_emails(draw):
    """Generate valid email addresses."""
    username = draw(st.text(
        alphabet=string.ascii_letters + string.digits,
        min_size=1, max_size=20,
    ))
    domain = draw(st.text(
        alphabet=string.ascii_lowercase,
        min_size=1, max_size=15,
    ))
    tld = draw(st.sampled_from(['com', 'org', 'net', 'io']))
    return f"{username}@{domain}.{tld}"

@given(valid_emails())
def test_email_parsing(email):
    """Test parsing of valid email addresses."""
    assert '@' in email
    assert '.' in email.split('@')[1]
```
See references/patterns-catalog.md for more custom strategy patterns.
Step-by-Step Workflow
Step 1: Install Hypothesis
```bash
# Add to project dependencies
uv add --dev hypothesis

# Verify installation
python -c "import hypothesis; print(hypothesis.__version__)"
```
Step 2: Identify Properties to Test
Look for:
- Invariants – Things that should always be true
- Roundtrips – Serialize → deserialize → should equal the original
- Idempotency – Operation twice = operation once
- Commutativity – Order doesn’t matter
- Consistency – Related operations agree
Example: Testing a JSON serializer:
- Property: `parse(serialize(obj)) == obj` (roundtrip)
- Property: `serialize(obj)` returns a valid JSON string
- Property: All serialized objects are parseable
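These properties can be written directly against the standard library's `json` module. The recursive strategy below is one way to describe "any JSON-representable object"; floats are deliberately left out to sidestep NaN and infinity, which JSON handles specially:

```python
import json

from hypothesis import given, strategies as st

# JSON-representable values: scalars, plus lists/dicts of them
# (floats omitted to avoid NaN/Infinity edge cases)
json_values = st.recursive(
    st.none() | st.booleans() | st.integers() | st.text(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
    max_leaves=10,
)

@given(json_values)
def test_json_roundtrip(obj):
    """Property: parse(serialize(obj)) == obj."""
    assert json.loads(json.dumps(obj)) == obj

@given(json_values)
def test_serialize_returns_valid_json(obj):
    """Property: serialize(obj) returns a parseable JSON string."""
    s = json.dumps(obj)
    assert isinstance(s, str)
    json.loads(s)  # parses without error
```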
Step 3: Choose Strategies
Map your data types to Hypothesis strategies:
```
# Simple types
int            → st.integers()
str            → st.text()
bool           → st.booleans()

# Collections
List[int]      → st.lists(st.integers())
Dict[str, int] → st.dictionaries(st.text(), st.integers())
Optional[str]  → st.one_of(st.text(), st.none())

# Domain models (Pydantic)
MyModel        → builds(MyModel)
```
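When the code under test already carries type annotations, `st.from_type` can derive a strategy from the annotation directly, which shortens this mapping step:

```python
from typing import Optional

from hypothesis import given, strategies as st

# st.from_type resolves a strategy from a type annotation,
# e.g. Optional[int] behaves like st.none() | st.integers()
@given(st.from_type(Optional[int]))
def test_accepts_optional_int(value):
    assert value is None or isinstance(value, int)
```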
Step 4: Write Property Test
```python
import json

from hypothesis import given, strategies as st

@given(st.dictionaries(st.text(), st.text()))
def test_json_roundtrip(data):
    """Property: All dicts should roundtrip through JSON."""
    serialized = json.dumps(data)
    parsed = json.loads(serialized)
    assert parsed == data
```
Step 5: Run and Observe
```bash
# Run property test
pytest tests/test_properties.py -v

# Show statistics
pytest --hypothesis-show-statistics

# Reproduce specific failure
pytest --hypothesis-seed=12345
```
Step 6: Refine if Needed
If the test generates invalid inputs:
- Add constraints to the strategy
- Use `assume()` to filter (sparingly)
- Create a custom strategy with `@composite`

```python
from hypothesis import assume, given, strategies as st

@given(st.lists(st.integers()))
def test_with_filtering(lst):
    # AVOID: Too much filtering (slow)
    assume(len(lst) > 0)              # Better: st.lists(st.integers(), min_size=1)
    assume(all(x >= 0 for x in lst))  # Better: st.lists(st.integers(min_value=0))
    ...
```
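The assume-based version above can usually be rewritten with the constraints pushed into the strategy itself, so no generated examples are discarded:

```python
from hypothesis import given, strategies as st

# Constraints live in the strategy, so every generated example is valid
@given(st.lists(st.integers(min_value=0), min_size=1))
def test_without_filtering(lst):
    assert len(lst) > 0
    assert all(x >= 0 for x in lst)
```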
Common Property Patterns
1. Roundtrip Testing
Pattern: Serialize → deserialize → should equal the original
```python
@given(builds(MyModel))
def test_model_json_roundtrip(model):
    """Property: Models roundtrip through JSON."""
    json_str = model.model_dump_json()
    restored = MyModel.model_validate_json(json_str)
    assert restored == model
```
2. Invariant Testing
Pattern: Some property should always hold
```python
@given(st.lists(st.integers()))
def test_sort_ordered(lst):
    """Property: Sorted list should be in ascending order."""
    sorted_lst = sorted(lst)
    for i in range(len(sorted_lst) - 1):
        assert sorted_lst[i] <= sorted_lst[i + 1]
```
3. Idempotency Testing
Pattern: Operation twice = operation once
```python
@given(st.text())
def test_normalize_idempotent(text):
    """Property: Normalizing twice gives same result."""
    once = normalize(text)
    twice = normalize(once)
    assert once == twice
```
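As a concrete, runnable instance of this pattern, Unicode NFC normalization from the standard library is idempotent by definition:

```python
import unicodedata

from hypothesis import given, strategies as st

def normalize(text: str) -> str:
    """Normalize a string to Unicode NFC form."""
    return unicodedata.normalize("NFC", text)

@given(st.text())
def test_nfc_idempotent(text):
    once = normalize(text)
    twice = normalize(once)
    assert once == twice
```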
4. Commutativity Testing
Pattern: Order doesn’t matter
```python
@given(st.integers(), st.integers())
def test_addition_commutative(a, b):
    """Property: a + b == b + a."""
    assert a + b == b + a
```
5. Consistency Testing
Pattern: Different paths to same result should agree
```python
@given(st.lists(st.integers()))
def test_sum_consistency(lst):
    """Property: Manual sum equals built-in sum."""
    manual_sum = 0
    for x in lst:
        manual_sum += x
    assert manual_sum == sum(lst)
```
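The stateful testing mentioned earlier generalizes these patterns to whole sequences of operations: Hypothesis generates a random series of rule calls and checks invariants after each step. A minimal sketch using `RuleBasedStateMachine`, testing a toy FIFO queue against a reference model:

```python
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule

class QueueMachine(RuleBasedStateMachine):
    """Exercise a FIFO queue (a plain list here) against a reference model."""

    def __init__(self):
        super().__init__()
        self.queue = []  # system under test (stand-in)
        self.model = []  # reference model

    @rule(item=st.integers())
    def enqueue(self, item):
        self.queue.append(item)
        self.model.append(item)

    @rule()
    def dequeue(self):
        if self.model:
            assert self.queue.pop(0) == self.model.pop(0)

    @invariant()
    def lengths_agree(self):
        assert len(self.queue) == len(self.model)

# pytest collects this attribute as a regular test class
TestQueueMachine = QueueMachine.TestCase
```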
See references/patterns-catalog.md for 15+ common patterns.
Integration with pytest
Hypothesis works seamlessly with pytest:
```python
import pytest
from hypothesis import given, strategies as st

# Combine with fixtures
@pytest.fixture
def temp_config(tmp_path):
    """Fixture providing temp configuration."""
    return Config(data_dir=tmp_path)

@given(st.text())
def test_with_fixture(temp_config, text):
    """Hypothesis + fixture: temp_config from fixture, text from Hypothesis."""
    result = process_with_config(temp_config, text)
    assert result is not None
```
Important: Fixtures are called once per test function, not once per Hypothesis example (100 runs).
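Because of this mismatch, recent Hypothesis versions raise a `function_scoped_fixture` health check when a function-scoped fixture is combined with `@given`. If sharing the fixture's value across all examples is acceptable, the check can be suppressed explicitly (the fixture and test body here are illustrative stand-ins):

```python
import pytest
from hypothesis import HealthCheck, given, settings, strategies as st

@pytest.fixture
def temp_config(tmp_path):
    return {"data_dir": tmp_path}  # hypothetical stand-in for a real config object

# Acknowledge that temp_config is reused across all generated examples
@settings(suppress_health_check=[HealthCheck.function_scoped_fixture])
@given(st.text())
def test_with_fixture(temp_config, text):
    assert temp_config["data_dir"] is not None
```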
pytest Command-Line Options
```bash
# Show statistics about data generation
pytest --hypothesis-show-statistics

# Use a specific Hypothesis profile
pytest --hypothesis-profile=ci

# Set verbosity level
pytest --hypothesis-verbosity=debug

# Reproduce a specific failure
pytest --hypothesis-seed=12345
```
Async Property Testing
Basic Async Pattern
Hypothesis works with pytest-asyncio:
```python
import pytest
from hypothesis import given, strategies as st

@pytest.mark.asyncio
@given(st.text())
async def test_async_property(text):
    """Property test for async function."""
    result = await async_process(text)
    assert isinstance(result, str)
```
Critical: Decorator Order
The decorators MUST be stacked in this order:

```python
@pytest.mark.asyncio   # First (outermost)
@given(st.text())      # Second (closest to the function)
async def test_async_property(text):
    pass
```
If you get “Hypothesis doesn’t know how to run async test functions”, check decorator order.
Example: Testing Async IPC
```python
import json

import pytest
from hypothesis import given, strategies as st

@pytest.mark.asyncio
@given(
    command=st.sampled_from(["execute", "status", "cancel"]),
    prompt=st.text(),
    correlation_id=st.uuids().map(str),
)
async def test_ipc_command_roundtrip(command, prompt, correlation_id):
    """Property: All IPC commands should roundtrip through serialization."""
    request = create_command_request(
        command=command,
        prompt=prompt,
        correlation_id=correlation_id,
    )
    serialized = json.dumps(request)
    deserialized = json.loads(serialized)
    assert deserialized == request
    assert deserialized["command"] == command
```
Pydantic Model Testing
Hypothesis automatically supports Pydantic models:
```python
from hypothesis import given
from hypothesis.strategies import builds
from pydantic import BaseModel, EmailStr, PositiveFloat

class PaymentModel(BaseModel):
    amount: PositiveFloat
    email: EmailStr
    description: str

# Hypothesis automatically respects Pydantic constraints!
@given(builds(PaymentModel))
def test_payment_validation(payment):
    """Hypothesis generates valid PaymentModel instances."""
    assert payment.amount > 0
    assert '@' in payment.email
    assert isinstance(payment.description, str)
```
Overriding Specific Fields
```python
@given(builds(
    PaymentModel,
    amount=st.floats(min_value=100, max_value=1000),
    description=st.text(min_size=10, max_size=100),
))
def test_large_payments(payment):
    """Test with payments between $100-$1000."""
    assert 100 <= payment.amount <= 1000
    assert 10 <= len(payment.description) <= 100
```
Testing Configuration Models
```python
from hypothesis import given, strategies as st
from hypothesis.strategies import builds
from my_project.config import AgentConfig

@given(builds(AgentConfig))
def test_agent_config_invariants(config):
    """Any valid AgentConfig should satisfy these invariants."""
    assert config.agent_id is not None
    assert config.system_prompt is not None
    assert len(config.agent_id) > 0
```
Configuration
Profile Setup (conftest.py)
Create profiles for different environments:
```python
# tests/conftest.py
import os

from hypothesis import HealthCheck, settings

# Configure Hypothesis profiles
settings.register_profile(
    "ci",
    max_examples=200,
    deadline=1000,  # milliseconds
)
settings.register_profile(
    "dev",
    max_examples=50,
    deadline=None,
)
settings.register_profile(
    "thorough",
    max_examples=1000,
    deadline=None,
    suppress_health_check=[HealthCheck.too_slow],
)

# Activate based on environment
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))
```
Per-Test Settings
```python
from hypothesis import given, settings, strategies as st

@settings(max_examples=1000, deadline=None)
@given(st.integers())
def test_expensive_operation(n):
    """Run 1000 examples with no time limit."""
    result = very_slow_computation(n)
    assert result >= 0
```
Configuration Options
| Option | Default | Description |
|---|---|---|
| `max_examples` | `100` | Number of test cases to generate |
| `deadline` | `200ms` | Time limit per test case |
| `suppress_health_check` | `[]` | Disable specific health-check warnings |
| `verbosity` | `normal` | Output verbosity (quiet, normal, verbose, debug) |
| `derandomize` | `False` | Use deterministic randomness |
Supporting Files
references/strategies-reference.md
Complete catalog of built-in Hypothesis strategies with examples:
- Basic types (integers, floats, text, binary)
- Collections (lists, sets, dicts, tuples)
- Special types (UUIDs, datetimes, emails)
- Combinators (one_of, builds, recursive)
- Advanced patterns (composite, shared, data)
references/patterns-catalog.md
Common property test patterns with examples:
- Roundtrip testing (serialization, encoding)
- Invariant testing (order, size, consistency)
- Idempotency testing (normalization, deduplication)
- Commutativity testing (operations, transformations)
- State machine testing (lifecycle, protocols)
templates/property-test-templates.md
Copy-paste ready templates for:
- Basic property test
- Async property test
- Pydantic model property test
- Custom strategy
- State machine test
- conftest.py Hypothesis configuration
Expected Outcomes
Successful Property Test Creation
```
✅ Property Tests Added

Module: tests/unit/test_ipc_protocol_properties.py
Properties tested:
  - JSON roundtrip for command requests
  - Correlation ID preservation
  - Valid command types

Generated examples: 100 per property
Edge cases found: 0 (all tests passed)

Test results:
  ✓ All properties hold
  ✓ 300 examples generated (3 properties × 100 each)
  ✓ No shrinking needed (no failures)

Configuration:
  Profile: dev (50 examples/property)
  Deadline: None (development)
  Time: 2.3 seconds

Confidence: High (comprehensive input coverage)
```
Property Test Finding Bug
```
⚠️ Property Violation Found

Test: test_json_roundtrip
Property: All dicts should roundtrip through JSON

Falsifying example: data={'key': float('inf')}
Error: JSON cannot serialize infinity
Shrinking: Reduced from complex dict to minimal case

Root cause: Missing validation for special float values
Fix required: Add constraint to strategy or handle inf/nan

Next steps:
1. Decide: Should code handle inf/nan or reject them?
2. Update strategy: st.floats(allow_nan=False, allow_infinity=False)
3. OR: Add validation in serializer
4. Re-run property tests to verify fix
```
Requirements
Tools needed:
- Bash (for running tests)
- Read (for examining test files)
- Grep (for finding test patterns)
- Glob (for file discovery)
- Edit/Write (for creating/modifying tests)
Dependencies:
- Python 3.8+
- pytest
- hypothesis (install with `uv add --dev hypothesis`)
- pytest-asyncio (for async tests)
Test Framework:
- pytest with Hypothesis integration
- pytest-asyncio for async property tests
Knowledge:
- Basic understanding of property-based testing concepts
- Familiarity with pytest
- Understanding of type annotations (helpful for strategies)
Red Flags to Avoid
❌ WRONG: Over-Constraining Strategies

```python
# BAD: Too specific, loses property testing benefits
@given(st.integers(min_value=42, max_value=42))
def test_specific_value(n):
    assert n == 42  # This is just an example test!
```
✅ RIGHT: Test properties that hold for all inputs

```python
@given(st.integers())
def test_absolute_value_non_negative(n):
    assert abs(n) >= 0
```
❌ WRONG: Filtering Too Much

```python
# BAD: Rejecting most generated examples
@given(st.integers())
def test_primes(n):
    assume(is_prime(n))  # Rejects 99% of inputs!
    # ... test code
```
✅ RIGHT: Use a strategy that generates valid inputs

```python
@composite
def primes(draw):
    return draw(st.sampled_from([2, 3, 5, 7, 11, 13, 17, 19, 23, 29]))

@given(primes())
def test_primes(n):
    ...  # All inputs are primes
```
❌ WRONG: Testing Implementation, Not Properties

```python
# BAD: Duplicating implementation in test
@given(st.lists(st.integers()))
def test_sum_implementation(lst):
    result = sum(lst)
    # Bad: Reimplementing sum() in test
    expected = 0
    for item in lst:
        expected += item
    assert result == expected
```
✅ RIGHT: Test properties

```python
@given(st.lists(st.integers()))
def test_sum_commutative(lst):
    assert sum(lst) == sum(reversed(lst))

@given(st.lists(st.integers()))
def test_sum_with_zero(lst):
    assert sum(lst + [0]) == sum(lst)
```
❌ WRONG: Wrong Decorator Order for Async

```python
# BAD: Will fail with "Hypothesis doesn't know how to run async"
@given(st.text())
@pytest.mark.asyncio
async def test_async_property(text):
    pass
```
✅ RIGHT: @pytest.mark.asyncio on top, @given closest to the function

```python
@pytest.mark.asyncio
@given(st.text())
async def test_async_property(text):
    pass
```
❌ WRONG: Not Using Pydantic Integration

```python
# BAD: Manually constructing Pydantic models
@given(
    amount=st.floats(min_value=0.01),
    email=st.text(),  # Not valid emails!
)
def test_payment(amount, email):
    payment = PaymentModel(amount=amount, email=email)  # Will fail validation
```
✅ RIGHT: Use builds() for Pydantic models

```python
@given(builds(PaymentModel))
def test_payment(payment):
    # Hypothesis automatically generates valid instances
    assert payment.amount > 0
```
❌ WRONG: Mixing Hypothesis with pytest.mark.parametrize

```python
# BAD: Redundant - Hypothesis already does this
@pytest.mark.parametrize("n", [1, 2, 3, 4, 5])
@given(st.integers())
def test_redundant(n, generated_int):
    # Why both? Pick one approach!
    pass
```
✅ RIGHT: Use Hypothesis for data generation

```python
@given(st.integers(min_value=1, max_value=5))
def test_small_integers(n):
    assert 1 <= n <= 5
```
Notes
Start Small:
- Pick one simple function to test
- Write one property test
- Run it, observe results
- Expand to more properties
Think in Properties, Not Examples:
- Instead of: “sort([3,1,2]) == [1,2,3]”
- Think: “sorted list should be ordered” (invariant)
- Or: “sorting twice == sorting once” (idempotency)
- Or: “sort preserves all elements” (conservation)
Hypothesis Finds Edge Cases You Miss:
- Empty collections
- Single elements
- Duplicates
- Very large/small numbers
- Unicode edge cases
- Boundary conditions
When Property Tests Fail:
- Read the minimal failing example (shrinking gives you this)
- Understand why the property doesn’t hold
- Decide: Is code wrong or property too strict?
- Fix and re-run
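After a fix, the minimal failing input can be pinned with Hypothesis's `@example` decorator so it is replayed on every run as a permanent regression case (the test and input below are illustrative):

```python
from hypothesis import example, given, strategies as st

@given(st.lists(st.integers()))
@example([-1])  # minimal failing case found by shrinking, now replayed every run
def test_sum_of_absolutes_non_negative(lst):
    assert sum(abs(x) for x in lst) >= 0
```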
Further Reading:
- Hypothesis documentation: https://hypothesis.readthedocs.io/
- Strategies reference: references/strategies-reference.md
- Pattern catalog: references/patterns-catalog.md