data-validation

📁 majesticlabs-dev/majestic-marketplace 📅 Feb 5, 2026
12
总安装量
12
周安装量
#25749
全站排名
安装命令
npx skills add https://github.com/majesticlabs-dev/majestic-marketplace --skill data-validation

Agent 安装分布

opencode 12
claude-code 11
github-copilot 11
codex 11
gemini-cli 11
cursor 11

Skill 文档

Data Validation

Audience: Data engineers building validation pipelines.

Goal: Provide validation patterns for custom business rules.

Framework-specific skills:

  • pydantic-validation – Record-level validation with Pydantic
  • pandera-validation – DataFrame schema validation
  • great-expectations – Pipeline expectations and monitoring

Scripts

Execute validation functions from scripts/validators.py:

from scripts.validators import (
    ValidationResult,
    DataValidator,
    validate_no_duplicates,
    validate_referential_integrity,
    validate_date_range,
    validate_value_in_set,
    run_validation_pipeline,
    validate_with_schema_version,
    assert_schema_match,
    assert_no_nulls,
    assert_unique,
    assert_values_in_set
)

Framework Selection

Use Case Framework
API request/response Pydantic
Record-by-record ETL Pydantic
DataFrame validation Pandera
Type hints for DataFrames Pandera
Pipeline monitoring Great Expectations
Data warehouse checks Great Expectations
Custom business rules Custom functions (this skill)

Usage Examples

Basic Validation

from scripts.validators import validate_no_duplicates, validate_referential_integrity

# Check duplicates
result = validate_no_duplicates(df, cols=['id'])
if not result.passed:
    print(f"Error: {result.message}")
    print(result.failed_rows)

# Check referential integrity
result = validate_referential_integrity(df, 'user_id', users_df, 'id')

Validation Pipeline

from scripts.validators import DataValidator, validate_no_duplicates, validate_date_range

validator = DataValidator()
validator.add_check(lambda df: validate_no_duplicates(df, ['id']))
validator.add_check(lambda df: validate_date_range(df, 'created_at', '2020-01-01', '2025-12-31'))

results = validator.validate(df)
if not results['passed']:
    for check in results['checks']:
        if not check['passed']:
            print(f"Failed: {check['message']}")

Config-Driven Pipeline

from scripts.validators import run_validation_pipeline

config = {
    'unique_columns': ['id'],
    'date_ranges': {
        'created_at': ('2020-01-01', '2025-12-31'),
        'updated_at': ('2020-01-01', '2025-12-31')
    }
}

clean_df, results = run_validation_pipeline(df, config)

Test Assertions

from scripts.validators import assert_schema_match, assert_no_nulls, assert_unique

# In pytest
def test_data_quality():
    assert_schema_match(df, {'id': 'int64', 'email': 'object'})
    assert_no_nulls(df, ['id', 'email'])
    assert_unique(df, ['id'])

Dependencies

pandas