clickhouse-best-practices
npx skills add https://github.com/clickhouse/agent-skills --skill clickhouse-best-practices
Agent 安装分布
Skill 文档
ClickHouse Best Practices
Comprehensive guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 rules across 3 main categories (schema, query, insert), prioritized by impact.
Official docs: ClickHouse Best Practices
IMPORTANT: How to Apply This Skill
Before answering ClickHouse questions, follow this priority order:
- Check for applicable rules in the
rules/directory - If rules exist: Apply them and cite them in your response using “Per
rule-name…” - If no rule exists: Use the LLM’s ClickHouse knowledge or search documentation
- If uncertain: Use web search for current best practices
- Always cite your source: rule name, “general ClickHouse guidance”, or URL
Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance.
For Formal Reviews
When performing a formal review of schemas, queries, or data ingestion:
Review Procedures
For Schema Reviews (CREATE TABLE, ALTER TABLE)
Read these rule files in order:
rules/schema-pk-plan-before-creation.md– ORDER BY is immutablerules/schema-pk-cardinality-order.md– Column ordering in keysrules/schema-pk-prioritize-filters.md– Filter column inclusionrules/schema-types-native-types.md– Proper type selectionrules/schema-types-minimize-bitwidth.md– Numeric type sizingrules/schema-types-lowcardinality.md– LowCardinality usagerules/schema-types-avoid-nullable.md– Nullable vs DEFAULTrules/schema-partition-low-cardinality.md– Partition count limitsrules/schema-partition-lifecycle.md– Partitioning purpose
Check for:
- PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
- Data types match actual data ranges
- LowCardinality applied to appropriate string columns
- Partition key cardinality bounded (100-1,000 values)
- ReplacingMergeTree has version column if used
For Query Reviews (SELECT, JOIN, aggregations)
Read these rule files:
rules/query-join-choose-algorithm.md– Algorithm selectionrules/query-join-filter-before.md– Pre-join filteringrules/query-join-use-any.md– ANY vs regular JOINrules/query-index-skipping-indices.md– Secondary index usagerules/schema-pk-filter-on-orderby.md– Filter alignment with ORDER BY
Check for:
- Filters use ORDER BY prefix columns
- JOINs filter tables before joining (not after)
- Correct JOIN algorithm for table sizes
- Skipping indices for non-ORDER BY filter columns
For Insert Strategy Reviews (data ingestion, updates, deletes)
Read these rule files:
rules/insert-batch-size.md– Batch sizing requirementsrules/insert-mutation-avoid-update.md– UPDATE alternativesrules/insert-mutation-avoid-delete.md– DELETE alternativesrules/insert-async-small-batches.md– Async insert usagerules/insert-optimize-avoid-final.md– OPTIMIZE TABLE risks
Check for:
- Batch size 10K-100K rows per INSERT
- No ALTER TABLE UPDATE for frequent changes
- ReplacingMergeTree or CollapsingMergeTree for update patterns
- Async inserts enabled for high-frequency small batches
Output Format
Structure your response as follows:
## Rules Checked
- `rule-name-1` - Compliant / Violation found
- `rule-name-2` - Compliant / Violation found
...
## Findings
### Violations
- **`rule-name`**: Description of the issue
- Current: [what the code does]
- Required: [what it should do]
- Fix: [specific correction]
### Compliant
- `rule-name`: Brief note on why it's correct
## Recommendations
[Prioritized list of changes, citing rules]
Rule Categories by Priority
| Priority | Category | Impact | Prefix | Rule Count |
|---|---|---|---|---|
| 1 | Primary Key Selection | CRITICAL | schema-pk- |
4 |
| 2 | Data Type Selection | CRITICAL | schema-types- |
5 |
| 3 | JOIN Optimization | CRITICAL | query-join- |
5 |
| 4 | Insert Batching | CRITICAL | insert-batch- |
1 |
| 5 | Mutation Avoidance | CRITICAL | insert-mutation- |
2 |
| 6 | Partitioning Strategy | HIGH | schema-partition- |
4 |
| 7 | Skipping Indices | HIGH | query-index- |
1 |
| 8 | Materialized Views | HIGH | query-mv- |
2 |
| 9 | Async Inserts | HIGH | insert-async- |
2 |
| 10 | OPTIMIZE Avoidance | HIGH | insert-optimize- |
1 |
| 11 | JSON Usage | MEDIUM | schema-json- |
1 |
Quick Reference
Schema Design – Primary Key (CRITICAL)
schema-pk-plan-before-creation– Plan ORDER BY before table creation (immutable)schema-pk-cardinality-order– Order columns low-to-high cardinalityschema-pk-prioritize-filters– Include frequently filtered columnsschema-pk-filter-on-orderby– Query filters must use ORDER BY prefix
Schema Design – Data Types (CRITICAL)
schema-types-native-types– Use native types, not String for everythingschema-types-minimize-bitwidth– Use smallest numeric type that fitsschema-types-lowcardinality– LowCardinality for <10K unique stringsschema-types-enum– Enum for finite value sets with validationschema-types-avoid-nullable– Avoid Nullable; use DEFAULT instead
Schema Design – Partitioning (HIGH)
schema-partition-low-cardinality– Keep partition count 100-1,000schema-partition-lifecycle– Use partitioning for data lifecycle, not queriesschema-partition-query-tradeoffs– Understand partition pruning trade-offsschema-partition-start-without– Consider starting without partitioning
Schema Design – JSON (MEDIUM)
schema-json-when-to-use– JSON for dynamic schemas; typed columns for known
Query Optimization – JOINs (CRITICAL)
query-join-choose-algorithm– Select algorithm based on table sizesquery-join-use-any– ANY JOIN when only one match neededquery-join-filter-before– Filter tables before joiningquery-join-consider-alternatives– Dictionaries/denormalization vs JOINquery-join-null-handling– join_use_nulls=0 for default values
Query Optimization – Indices (HIGH)
query-index-skipping-indices– Skipping indices for non-ORDER BY filters
Query Optimization – Materialized Views (HIGH)
query-mv-incremental– Incremental MVs for real-time aggregationsquery-mv-refreshable– Refreshable MVs for complex joins
Insert Strategy – Batching (CRITICAL)
insert-batch-size– Batch 10K-100K rows per INSERT
Insert Strategy – Async (HIGH)
insert-async-small-batches– Async inserts for high-frequency small batchesinsert-format-native– Native format for best performance
Insert Strategy – Mutations (CRITICAL)
insert-mutation-avoid-update– ReplacingMergeTree instead of ALTER UPDATEinsert-mutation-avoid-delete– Lightweight DELETE or DROP PARTITION
Insert Strategy – Optimization (HIGH)
insert-optimize-avoid-final– Let background merges work
When to Apply
This skill activates when you encounter:
CREATE TABLEstatementsALTER TABLEmodificationsORDER BYorPRIMARY KEYdiscussions- Data type selection questions
- Slow query troubleshooting
- JOIN optimization requests
- Data ingestion pipeline design
- Update/delete strategy questions
- ReplacingMergeTree or other specialized engine usage
- Partitioning strategy decisions
Rule File Structure
Each rule file in rules/ contains:
- YAML frontmatter: title, impact level, tags
- Brief explanation: Why this rule matters
- Incorrect example: Anti-pattern with explanation
- Correct example: Best practice with explanation
- Additional context: Trade-offs, when to apply, references
Full Compiled Document
For the complete guide with all rules expanded inline: AGENTS.md
Use AGENTS.md when you need to check multiple rules quickly without reading individual files.