data-lake-platform

📁 vasilyu1983/ai-agents-public 📅 Jan 23, 2026

Site-wide rank: #9706

Install command

npx skills add https://github.com/vasilyu1983/ai-agents-public --skill data-lake-platform

Installs by agent

claude-code 18

cursor 17

gemini-cli 13

opencode 13

antigravity 12

Build and operate production data lakes and lakehouses: ingest, transform, store in open formats, and serve analytics reliably.

Batch, streaming, or hybrid? What is the freshness SLO?
Append-only vs upserts/deletes (CDC)? Is time travel required?
Primary query pattern: BI dashboards (high concurrency), ad-hoc joins, embedded analytics?
PII/compliance: row/column-level access, retention, audit logging?
Platform constraints: self-hosted vs cloud, preferred engines, team strengths?
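The intake questions above can be recorded as structured answers so that later design choices trace back to them. The following is a minimal sketch, not part of the skill itself: `IntakeAnswers`, its field names, and `derive_requirements` are hypothetical names chosen for illustration, and the derived checklist is only a starting point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntakeAnswers:
    """Hypothetical container for the intake answers (field names are illustrative)."""
    mode: str                   # "batch", "streaming", or "hybrid"
    freshness_slo_minutes: int  # maximum acceptable staleness of served data
    needs_upserts: bool         # CDC-style updates/deletes rather than append-only
    needs_time_travel: bool
    has_pii: bool


def derive_requirements(a: IntakeAnswers) -> list[str]:
    """Turn intake answers into a checklist of capabilities to look for."""
    reqs = [f"end-to-end latency budget: {a.freshness_slo_minutes} min"]
    if a.needs_upserts:
        reqs.append("table format with row-level updates and deletes")
    if a.needs_time_travel:
        reqs.append("snapshot retention long enough for time-travel queries")
    if a.mode in ("streaming", "hybrid"):
        reqs.append("streaming ingestion path")
    if a.has_pii:
        reqs.append("row/column-level access control, retention, and audit logging")
    return reqs
```

Calling `derive_requirements(IntakeAnswers("hybrid", 15, True, False, True))` would yield a four-item checklist that can be carried into the format, ingestion, and governance steps.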

Pick table format + catalog: references/storage-formats.md (use assets/cross-platform/template-schema-evolution.md and assets/cross-platform/template-partitioning-strategy.md)
Design ingestion (batch/incremental/CDC): references/ingestion-patterns.md (use assets/cross-platform/template-ingestion-governance-checklist.md and assets/cross-platform/template-incremental-loading.md)
Design transformations (bronze/silver/gold or data products): references/transformation-patterns.md (use assets/cross-platform/template-data-pipeline.md)
Choose lake query vs serving engines: references/query-engine-patterns.md
Add governance, lineage, and quality gates: references/governance-catalog.md (use assets/cross-platform/template-data-quality-governance.md and assets/cross-platform/template-data-quality.md)
Plan operations + cost controls: references/operational-playbook.md and references/cost-optimization.md (use assets/cross-platform/template-data-quality-backfill-runbook.md and assets/cross-platform/template-cost-optimization.md)
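The incremental/CDC ingestion step in the workflow above can be illustrated with a high-watermark merge. This is a simplified sketch under stated assumptions, not the content of references/ingestion-patterns.md: the row shape (`id`, `updated_at`, optional `deleted`), the in-memory `target` dict standing in for a table, and the function name `incremental_load` are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def incremental_load(source_rows, target, watermark):
    """Merge rows changed after `watermark` into `target`, keyed by "id".

    Returns the new watermark. Replaying the same changes leaves `target`
    in the same state, which is what makes retries and backfills safe.
    """
    new_watermark = watermark
    # Apply changes in timestamp order so the latest version of a row wins.
    for row in sorted(source_rows, key=lambda r: r["updated_at"]):
        if row["updated_at"] <= watermark:
            continue  # already loaded in an earlier run
        if row.get("deleted"):
            target.pop(row["id"], None)  # CDC delete
        else:
            target[row["id"]] = row      # insert or overwrite (upsert)
        new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark


# Illustrative change feed: two inserts, one update, one delete.
t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
changes = [
    {"id": 1, "name": "a", "updated_at": t0 + timedelta(minutes=1)},
    {"id": 2, "name": "b", "updated_at": t0 + timedelta(minutes=2)},
    {"id": 1, "name": "a2", "updated_at": t0 + timedelta(minutes=3)},
    {"id": 2, "deleted": True, "updated_at": t0 + timedelta(minutes=4)},
]
table = {}
wm = incremental_load(changes, table, t0)
```

The strict comparison against the watermark can skip late-arriving rows that share the watermark's timestamp; a real pipeline would typically add an overlap window or use a monotonic log position instead.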

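pip install "dlt[clickhouse]"
dlt init rest_api clickhouse    # scaffolds rest_api_pipeline.py and a .dlt/ config directory
python rest_api_pipeline.py     # add ClickHouse credentials to .dlt/secrets.toml first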

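pip install sqlmesh
sqlmesh init duckdb            # scaffold an example project using the DuckDB dialect
sqlmesh plan && sqlmesh run    # review and apply changes, then run any model intervals that are due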

Resource	Purpose
references/overview.md	Diagrams and decision flows
references/architecture-patterns.md	Medallion, data mesh
references/ingestion-patterns.md	dlt vs Airbyte, CDC
references/transformation-patterns.md	SQLMesh vs dbt
references/storage-formats.md	Iceberg vs Delta
references/query-engine-patterns.md	ClickHouse, DuckDB
references/streaming-patterns.md	Kafka, Flink
references/orchestration-patterns.md	Dagster, Airflow
references/bi-visualization-patterns.md	Metabase, Superset
references/cost-optimization.md	Cost levers and maintenance
references/operational-playbook.md	Monitoring and incident response
references/governance-catalog.md	Catalog, lineage, access control

Template	Purpose
assets/cross-platform/template-medallion-architecture.md	Baseline bronze/silver/gold plan
assets/cross-platform/template-data-pipeline.md	End-to-end pipeline skeleton
assets/cross-platform/template-ingestion-governance-checklist.md	Source onboarding checklist
assets/cross-platform/template-incremental-loading.md	Incremental + backfill plan
assets/cross-platform/template-schema-evolution.md	Schema change rules
assets/cross-platform/template-cost-optimization.md	Cost control checklist
assets/cross-platform/template-data-quality-governance.md	Quality contracts + SLOs
assets/cross-platform/template-data-quality-backfill-runbook.md	Backfill incident/runbook
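Schema change rules such as those tracked in template-schema-evolution.md are often enforced with an automated check before a deploy. The sketch below is one possible shape for such a check and does not reproduce the template's actual rules: the "additive changes only" policy, the `{column: type}` schema representation, and both function names are assumptions for illustration.

```python
def classify_schema_change(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {column: type} mappings and bucket the differences."""
    added = sorted(c for c in new if c not in old)
    dropped = sorted(c for c in old if c not in new)
    retyped = sorted(c for c in old if c in new and old[c] != new[c])
    return {"added": added, "dropped": dropped, "retyped": retyped}


def is_backward_compatible(old: dict[str, str], new: dict[str, str]) -> bool:
    """Assumed policy: only added columns pass without manual review."""
    diff = classify_schema_change(old, new)
    return not diff["dropped"] and not diff["retyped"]
```

This policy is deliberately conservative. Some table formats permit certain type widenings safely, so a production check might allow specific retyped pairs rather than rejecting them all.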