clickhouse
npx skills add https://github.com/duyet/claude-plugins --skill clickhouse
Agent 安装分布
Skill 文档
ClickHouse Database Expert
Comprehensive ClickHouse knowledge base for working with high-performance columnar OLAP databases.
When to Invoke This Skill
Use this skill when:
- Designing ClickHouse schemas (tables, partitions, ORDER BY)
- Choosing table engines (MergeTree family decision tree)
- Writing and optimizing ClickHouse SQL queries
- Managing ClickHouse clusters (replication, sharding)
- Debugging query performance or merge issues
- Setting up backups and monitoring
- Integrating ClickHouse with Kafka, S3, or other systems
- Operating ClickHouse on Kubernetes
What is ClickHouse?
ClickHouse is a columnar OLAP database designed for real-time analytics on large datasets.
Key Characteristics:
- Columnar storage: Read only needed columns (10-100x faster than row stores for analytical queries)
- MergeTree engine family: Automatic background merges for data organization
- SQL dialect with extensions: Arrays, tuples, lambdas, specialized functions
- Append-first design: Optimized for high-volume inserts, not point updates
Golden Rules
- Always use MergeTree (except tiny dimensions â Memory engine)
- Sort key = query filter: ORDER BY defines data layout on disk
- Partition by time: For TTL and efficient DROP PARTITION operations
- Avoid mutations: Use INSERT + new data instead of UPDATE/DELETE
- Monitor merges: Background merges impact performance significantly
Quick Start Examples
Minimal Working Schema
-- Basic events table with best practices
CREATE TABLE events (
timestamp DateTime,
user_id UInt32,
event_type LowCardinality(String),
session_id UUID,
metadata String,
revenue Decimal(18, 2) DEFAULT 0
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp) -- Monthly partitions
ORDER BY (user_id, timestamp) -- Data layout matches query pattern
SETTINGS index_granularity = 8192;
Common Query Patterns
-- Effective time range filter
SELECT * FROM events
WHERE timestamp >= today() AND timestamp < tomorrow();
-- Optimize JOIN (smaller table on RIGHT)
SELECT * FROM large_table RIGHT JOIN small_table ON large_table.id = small_table.id;
-- Check query plan
EXPLAIN SELECT * FROM events WHERE user_id = 123;
Monitoring Queries
-- Running queries
SELECT query_id, user, query, elapsed FROM system.processes ORDER BY elapsed DESC;
-- Table sizes
SELECT database, table, formatReadableSize(sum(bytes)) as size
FROM system.parts WHERE active = 1
GROUP BY database, table ORDER BY sum(bytes) DESC;
-- Active merges
SELECT table, elapsed, bytes_read_uncompressed FROM system.merges ORDER BY elapsed DESC;
Common Pitfalls
| Pitfall | Why It’s Bad | Better Approach |
|---|---|---|
| Modifying columns (MODIFY/DROP) | Triggers expensive mutation | Use ADD COLUMN only |
| Updating/deleting rows | Mutations rewrite all data | Use TTL or new tables |
| Bad ORDER BY | Can’t leverage index | Match query WHERE patterns |
| Too many partitions | Slow queries, high overhead | Aim for 100-1000 parts total |
| SELECT * | Reads all columns (columnar penalty) | Select only needed columns |
| String date comparison | Full scan | Use date functions on column |
ClickHouse Architecture Overview
Data Model
- Append-first: No in-place updates (mutations are expensive)
- Parts and partitions: Parts merge into larger parts (background process)
- Two-level index: Sparse index (8192 rows/mark) + mark files
When ClickHouse Shines
â Wide tables (100+ columns), read few columns â Time-series with time-based filters â Aggregations over billions of rows â Append-only workloads (events, logs, metrics)
When to Avoid ClickHouse
â Point updates/deletes (use row store like PostgreSQL) â Heavy JOINs on non-sorted keys â Complex transactions (no ACID support) â Low-latency OLTP (use row store)
Key Topics by Reference
Schema & Table Design
| Topic | Reference | Description |
|---|---|---|
| Core Concepts | references/core-concepts.md |
Architecture, data model, internals |
| Schema Design | references/schema-design.md |
Database engines, migrations, version control |
| Table Design | references/table-design.md |
ORDER BY, partitioning, column selection |
| Table Engines | references/table-engines.md |
Complete MergeTree family reference |
Query & Performance
| Topic | Reference | Description |
|---|---|---|
| SQL Reference | references/sql-reference.md |
Complete SQL dialect, data types |
| Query Optimization | references/query-optimization.md |
EXPLAIN, JOINs, projections, skip indexes |
| Advanced Features | references/advanced-features.md |
Materialized views, mutations, TTL, dictionaries |
Operations & Cluster
| Topic | Reference | Description |
|---|---|---|
| Debugging | references/debugging.md |
Query debugging, merges, mutations, replication |
| Cluster Management | references/cluster-management.md |
Distributed tables, replication, sharding |
| Kubernetes Operator | references/kubernetes-operator.md |
K8s deployment and operations |
| Backup & Restore | references/backup-restore.md |
Backup strategies, disaster recovery |
| Monitoring | references/monitoring.md |
Query monitoring, health checks, system queries |
Integration & Best Practices
| Topic | Reference | Description |
|---|---|---|
| Integrations | references/integrations.md |
Kafka, S3, PostgreSQL, MySQL, BI tools |
| Best Practices | references/best-practices.md |
Complete checklist, anti-patterns |
| External References | references/external.md |
Altinity KB links, official docs |
Quick Decision Guides
Which Table Engine?
Need to store data?
âââ < 1M rows, dimension â Memory
âââ ⥠1M rows â MergeTree family
âââ Deduplication? â ReplacingMergeTree(version)
âââ Changelog? â CollapsingMergeTree(sign)
âââ Pre-aggregation? â AggregatingMergeTree()
âââ Replication? â ReplicatedMergeTree(...)
âââ Default â MergeTree()
See references/table-engines.md for complete reference.
Common Issues & Quick Fixes
| Issue | Quick Fix |
|---|---|
| Too many parts | OPTIMIZE TABLE table FINAL |
| Slow query | EXPLAIN SELECT ... to check index usage |
| Mutation stuck | Check system.mutations, use OPTIMIZE FINAL |
| Replication lag | Check system.replication_queue, ZooKeeper |
| OOM on query | Increase max_memory_usage, optimize query |
See references/debugging.md for detailed troubleshooting.
See Also
Core References
references/core-concepts.md– Architecture, data model, MergeTree internalsreferences/schema-design.md– Database engines, schema organization, migrationsreferences/table-design.md– ORDER BY design, partitioning strategies, column selectionreferences/table-engines.md– Complete guide to all MergeTree family enginesreferences/sql-reference.md– Full SQL dialect with data types and functions
Query & Performance
references/query-optimization.md– EXPLAIN, JOIN optimization, projections, skip indexesreferences/advanced-features.md– Materialized views, mutations, TTL, dictionariesreferences/debugging.md– Query debugging, merge issues, replication problems
Operations
references/cluster-management.md– Distributed tables, replication, shardingreferences/kubernetes-operator.md– K8s deployment with Altinity operatorreferences/backup-restore.md– Backup strategies and disaster recoveryreferences/monitoring.md– Query monitoring, health checks, system queries
Integration & Best Practices
references/integrations.md– Kafka, S3, PostgreSQL, MySQL, BI toolsreferences/best-practices.md– Comprehensive checklist and anti-patternsreferences/external.md– Altinity KB (200+ articles) and official docs
System Queries
references/system-queries.md– Ready-to-use queries for operations and monitoring
Version: 1.0.0 Sources: Altinity Knowledge Base (200+ articles) + ClickHouse Official Docs