great-expectations

📁 majesticlabs-dev/majestic-marketplace 📅 8 days ago
Install command
npx skills add https://github.com/majesticlabs-dev/majestic-marketplace --skill great-expectations

Skill documentation

Great Expectations

Audience: Data engineers building validated data pipelines.

Goal: Provide GX patterns for expectation-based validation and monitoring.

Scripts

Execute GX functions from scripts/expectations.py:

from scripts.expectations import (
    get_pandas_context,
    add_dataframe_asset,
    create_basic_suite,
    run_validation
)

Usage Examples

Quick Setup

from scripts.expectations import get_pandas_context, add_dataframe_asset

# df is an existing pandas DataFrame holding the rows to validate
context, datasource = get_pandas_context("my_datasource")
batch_request = add_dataframe_asset(datasource, "users", df)

Create Expectation Suite

from scripts.expectations import create_basic_suite

columns_config = {
    'user_id': {'not_null': True, 'unique': True, 'type': 'int'},
    'age': {'min': 0, 'max': 150},
    'status': {'values': ['active', 'inactive', 'pending']},
    'email': {'regex': r'^[\w\.-]+@[\w\.-]+\.\w+$'}
}

suite = create_basic_suite(context, "user_suite", columns_config)
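
The source of create_basic_suite is not shown here, so the following is only a sketch of how a config like the one above could translate into the expectation names from the reference table. The function name config_to_expectations is hypothetical, and the mapping of 'type' to ExpectColumnValuesToBeOfType is an assumption. The keyword names (min_value, max_value, value_set, regex, type_) are the ones GX uses for these expectations.

```python
# Hypothetical helper: an assumption about what create_basic_suite might do,
# not its actual implementation.
def config_to_expectations(columns_config):
    """Return a list of (expectation_name, kwargs) pairs for a column config."""
    planned = []
    for column, rules in columns_config.items():
        if rules.get("not_null"):
            planned.append(("ExpectColumnValuesToNotBeNull", {"column": column}))
        if rules.get("unique"):
            planned.append(("ExpectColumnValuesToBeUnique", {"column": column}))
        if "type" in rules:
            planned.append(
                ("ExpectColumnValuesToBeOfType",
                 {"column": column, "type_": rules["type"]})
            )
        if "min" in rules or "max" in rules:
            # A missing bound stays None, which leaves that side open.
            planned.append(
                ("ExpectColumnValuesToBeBetween",
                 {"column": column,
                  "min_value": rules.get("min"),
                  "max_value": rules.get("max")})
            )
        if "values" in rules:
            planned.append(
                ("ExpectColumnValuesToBeInSet",
                 {"column": column, "value_set": rules["values"]})
            )
        if "regex" in rules:
            planned.append(
                ("ExpectColumnValuesToMatchRegex",
                 {"column": column, "regex": rules["regex"]})
            )
    return planned


example_config = {
    "user_id": {"not_null": True, "unique": True, "type": "int"},
    "age": {"min": 0, "max": 150},
}
for name, kwargs in config_to_expectations(example_config):
    print(name, kwargs)
```

Printing the plan before building a suite is a cheap way to review which expectations a config implies.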

Run Validation

from scripts.expectations import run_validation

results = run_validation(
    context,
    checkpoint_name="user_checkpoint",
    batch_request=batch_request,
    suite_name="user_suite"
)

if results['success']:
    print("All expectations passed!")
else:
    for failure in results['failures']:
        print(f"Failed: {failure['expectation']} on {failure['column']}")
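
In a pipeline, printing failures is often not enough and the run should stop. The sketch below assumes only the result shape used above (a 'success' flag and a 'failures' list whose items have 'expectation' and 'column' keys). ValidationFailed, summarize_failures, and require_success are hypothetical names, not part of scripts/expectations.py or GX.

```python
class ValidationFailed(Exception):
    """Raised when a validation result reports failed expectations."""


def summarize_failures(results):
    """Return one readable line per failed expectation."""
    return [
        f"{failure['expectation']} on {failure['column']}"
        for failure in results.get("failures", [])
    ]


def require_success(results):
    """Raise ValidationFailed unless results['success'] is truthy."""
    if results.get("success"):
        return
    lines = summarize_failures(results)
    raise ValidationFailed("; ".join(lines) or "validation failed")


# Hand-built result in the assumed shape, for illustration only.
sample_results = {
    "success": False,
    "failures": [
        {"expectation": "ExpectColumnValuesToBeBetween", "column": "age"},
    ],
}

try:
    require_success(sample_results)
except ValidationFailed as exc:
    print(f"Stopping pipeline: {exc}")
```

Raising an exception lets an orchestrator mark the task as failed instead of relying on log output.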

Common Expectations Reference

| Category | Expectation | Description |
|----------|-------------|-------------|
| Table | ExpectTableRowCountToBeBetween | Row count range |
| Existence | ExpectColumnToExist | Column must exist |
| Nulls | ExpectColumnValuesToNotBeNull | No null values |
| Range | ExpectColumnValuesToBeBetween | Value bounds |
| Set | ExpectColumnValuesToBeInSet | Allowed values |
| Pattern | ExpectColumnValuesToMatchRegex | Regex match |
| Unique | ExpectColumnValuesToBeUnique | No duplicates |
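
To make the column-level rows of the reference concrete, here is a plain-Python approximation of what each check asserts. It does not call GX; the function names are hypothetical and features such as the `mostly` threshold are left out. Apart from the null check, each function skips missing values, which is an assumption about how the real expectations treat nulls.

```python
import re


def _present(values):
    """Drop missing values so the remaining checks only see real data."""
    return [v for v in values if v is not None]


def not_null(values):
    return all(v is not None for v in values)


def between(values, min_value, max_value):
    return all(min_value <= v <= max_value for v in _present(values))


def in_set(values, value_set):
    return all(v in value_set for v in _present(values))


def matches_regex(values, regex):
    return all(re.search(regex, v) is not None for v in _present(values))


def unique(values):
    present = _present(values)
    return len(set(present)) == len(present)


# Illustrative rows; the last one violates several rules on purpose.
rows = [
    {"user_id": 1, "age": 34, "status": "active", "email": "ann@example.com"},
    {"user_id": 2, "age": 27, "status": "pending", "email": "bob@example.org"},
    {"user_id": 2, "age": 212, "status": "archived", "email": "not-an-email"},
]


def column(name):
    return [row[name] for row in rows]


report = {
    "user_id not null": not_null(column("user_id")),
    "user_id unique": unique(column("user_id")),
    "age between 0 and 150": between(column("age"), 0, 150),
    "status in set": in_set(column("status"), {"active", "inactive", "pending"}),
    "email matches regex": matches_regex(column("email"), r"^[\w\.-]+@[\w\.-]+\.\w+$"),
}
for check, passed in report.items():
    print(f"{check}: {'pass' if passed else 'fail'}")
```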

Data Docs

# Build and open HTML reports
context.build_data_docs()
context.open_data_docs()

Directory Structure

great_expectations/
├── great_expectations.yml     # Config
├── expectations/              # Expectation suites (JSON)
├── checkpoints/               # Checkpoint definitions
├── plugins/                   # Custom expectations
└── uncommitted/
    ├── data_docs/            # Generated HTML docs
    └── validations/          # Validation results
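
Since suites are stored as JSON under expectations/, a quick inventory needs nothing beyond the standard library. This is a minimal sketch that assumes the layout shown above; list_suite_files is a hypothetical helper, and GX itself is not imported.

```python
from pathlib import Path


def list_suite_files(project_root):
    """Return suite JSON paths relative to <project_root>/expectations, sorted."""
    expectations_dir = Path(project_root) / "expectations"
    if not expectations_dir.is_dir():
        return []
    return sorted(
        path.relative_to(expectations_dir).as_posix()
        for path in expectations_dir.rglob("*.json")
    )


if __name__ == "__main__":
    # Assumes the script runs next to a great_expectations/ directory.
    for suite_file in list_suite_files("great_expectations"):
        print(suite_file)
```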

When to Use Great Expectations

| Use Case | GX | Alternative |
|----------|----|-------------|
| Pipeline monitoring | ✓ | |
| Data warehouse validation | ✓ | |
| Automated data docs | ✓ | |
| Simple DataFrame checks | | Pandera |
| Record-level API validation | | Pydantic |

Dependencies

great_expectations>=0.18
pandas