clickhouse-best-practices

📁 clickhouse/agent-skills 📅 Jan 27, 2026

386

总安装量

386

周安装量

#711

全站排名

安装命令

npx skills add https://github.com/clickhouse/agent-skills --skill clickhouse-best-practices

Agent 安装分布

opencode 315

codex 308

gemini-cli 308

amp 281

kimi-cli 281

Skill 文档

ClickHouse Best Practices

Comprehensive guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 rules across 3 main categories (schema, query, insert), prioritized by impact.

Official docs: ClickHouse Best Practices

IMPORTANT: How to Apply This Skill

Before answering ClickHouse questions, follow this priority order:

Check for applicable rules in the rules/ directory
If rules exist: Apply them and cite them in your response using “Per rule-name…”
If no rule exists: Use the LLM’s ClickHouse knowledge or search documentation
If uncertain: Use web search for current best practices
Always cite your source: rule name, “general ClickHouse guidance”, or URL

Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance.

For Formal Reviews

When performing a formal review of schemas, queries, or data ingestion:

Review Procedures

For Schema Reviews (CREATE TABLE, ALTER TABLE)

Read these rule files in order:

rules/schema-pk-plan-before-creation.md – ORDER BY is immutable
rules/schema-pk-cardinality-order.md – Column ordering in keys
rules/schema-pk-prioritize-filters.md – Filter column inclusion
rules/schema-types-native-types.md – Proper type selection
rules/schema-types-minimize-bitwidth.md – Numeric type sizing
rules/schema-types-lowcardinality.md – LowCardinality usage
rules/schema-types-avoid-nullable.md – Nullable vs DEFAULT
rules/schema-partition-low-cardinality.md – Partition count limits
rules/schema-partition-lifecycle.md – Partitioning purpose

Check for:

PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
Data types match actual data ranges
LowCardinality applied to appropriate string columns
Partition key cardinality bounded (100-1,000 values)
ReplacingMergeTree has version column if used

For Query Reviews (SELECT, JOIN, aggregations)

Read these rule files:

rules/query-join-choose-algorithm.md – Algorithm selection
rules/query-join-filter-before.md – Pre-join filtering
rules/query-join-use-any.md – ANY vs regular JOIN
rules/query-index-skipping-indices.md – Secondary index usage
rules/schema-pk-filter-on-orderby.md – Filter alignment with ORDER BY

Check for:

Filters use ORDER BY prefix columns
JOINs filter tables before joining (not after)
Correct JOIN algorithm for table sizes
Skipping indices for non-ORDER BY filter columns

For Insert Strategy Reviews (data ingestion, updates, deletes)

Read these rule files:

rules/insert-batch-size.md – Batch sizing requirements
rules/insert-mutation-avoid-update.md – UPDATE alternatives
rules/insert-mutation-avoid-delete.md – DELETE alternatives
rules/insert-async-small-batches.md – Async insert usage
rules/insert-optimize-avoid-final.md – OPTIMIZE TABLE risks

Check for:

Batch size 10K-100K rows per INSERT
No ALTER TABLE UPDATE for frequent changes
ReplacingMergeTree or CollapsingMergeTree for update patterns
Async inserts enabled for high-frequency small batches

Output Format

Structure your response as follows:

## Rules Checked
- `rule-name-1` - Compliant / Violation found
- `rule-name-2` - Compliant / Violation found
...

## Findings

### Violations
- **`rule-name`**: Description of the issue
  - Current: [what the code does]
  - Required: [what it should do]
  - Fix: [specific correction]

### Compliant
- `rule-name`: Brief note on why it's correct

## Recommendations
[Prioritized list of changes, citing rules]

Rule Categories by Priority

Priority	Category	Impact	Prefix	Rule Count
1	Primary Key Selection	CRITICAL	`schema-pk-`	4
2	Data Type Selection	CRITICAL	`schema-types-`	5
3	JOIN Optimization	CRITICAL	`query-join-`	5
4	Insert Batching	CRITICAL	`insert-batch-`	1
5	Mutation Avoidance	CRITICAL	`insert-mutation-`	2
6	Partitioning Strategy	HIGH	`schema-partition-`	4
7	Skipping Indices	HIGH	`query-index-`	1
8	Materialized Views	HIGH	`query-mv-`	2
9	Async Inserts	HIGH	`insert-async-`	2
10	OPTIMIZE Avoidance	HIGH	`insert-optimize-`	1
11	JSON Usage	MEDIUM	`schema-json-`	1

Quick Reference

Schema Design – Primary Key (CRITICAL)

schema-pk-plan-before-creation – Plan ORDER BY before table creation (immutable)
schema-pk-cardinality-order – Order columns low-to-high cardinality
schema-pk-prioritize-filters – Include frequently filtered columns
schema-pk-filter-on-orderby – Query filters must use ORDER BY prefix

Schema Design – Data Types (CRITICAL)

schema-types-native-types – Use native types, not String for everything
schema-types-minimize-bitwidth – Use smallest numeric type that fits
schema-types-lowcardinality – LowCardinality for <10K unique strings
schema-types-enum – Enum for finite value sets with validation
schema-types-avoid-nullable – Avoid Nullable; use DEFAULT instead

Schema Design – Partitioning (HIGH)

schema-partition-low-cardinality – Keep partition count 100-1,000
schema-partition-lifecycle – Use partitioning for data lifecycle, not queries
schema-partition-query-tradeoffs – Understand partition pruning trade-offs
schema-partition-start-without – Consider starting without partitioning

Schema Design – JSON (MEDIUM)

schema-json-when-to-use – JSON for dynamic schemas; typed columns for known

Query Optimization – JOINs (CRITICAL)

query-join-choose-algorithm – Select algorithm based on table sizes
query-join-use-any – ANY JOIN when only one match needed
query-join-filter-before – Filter tables before joining
query-join-consider-alternatives – Dictionaries/denormalization vs JOIN
query-join-null-handling – join_use_nulls=0 for default values

Query Optimization – Indices (HIGH)

query-index-skipping-indices – Skipping indices for non-ORDER BY filters

Query Optimization – Materialized Views (HIGH)

query-mv-incremental – Incremental MVs for real-time aggregations
query-mv-refreshable – Refreshable MVs for complex joins

Insert Strategy – Batching (CRITICAL)

insert-batch-size – Batch 10K-100K rows per INSERT

Insert Strategy – Async (HIGH)

insert-async-small-batches – Async inserts for high-frequency small batches
insert-format-native – Native format for best performance

Insert Strategy – Mutations (CRITICAL)

insert-mutation-avoid-update – ReplacingMergeTree instead of ALTER UPDATE
insert-mutation-avoid-delete – Lightweight DELETE or DROP PARTITION

Insert Strategy – Optimization (HIGH)

insert-optimize-avoid-final – Let background merges work

When to Apply

This skill activates when you encounter:

CREATE TABLE statements
ALTER TABLE modifications
ORDER BY or PRIMARY KEY discussions
Data type selection questions
Slow query troubleshooting
JOIN optimization requests
Data ingestion pipeline design
Update/delete strategy questions
ReplacingMergeTree or other specialized engine usage
Partitioning strategy decisions

Rule File Structure

Each rule file in rules/ contains:

YAML frontmatter: title, impact level, tags
Brief explanation: Why this rule matters
Incorrect example: Anti-pattern with explanation
Correct example: Best practice with explanation
Additional context: Trade-offs, when to apply, references

Full Compiled Document

For the complete guide with all rules expanded inline: AGENTS.md

Use AGENTS.md when you need to check multiple rules quickly without reading individual files.

GitHub 仓库 ↗ ← 返回陌讯 Skills 聚合平台