clickhouse-schema-design

📁 obsessiondb/clickhouse-plugin 📅 14 days ago
Install command
npx skills add https://github.com/obsessiondb/clickhouse-plugin --skill clickhouse-schema-design


Skill Documentation

ClickHouse Schema Design

ALWAYS LOAD when creating or modifying ClickHouse tables.

Goals

  • Sub-second query response on billion-row tables
  • 10x+ compression ratios vs raw data
  • Zero-maintenance data retention and tiered storage
  • Queries that scan <1% of data instead of full table scans

Reference Documentation

Search terms: schema design, ORDER BY, PRIMARY KEY, PARTITION BY, data types, TTL, index_granularity

Critical Rules

  1. ORDER BY columns determine query performance – Put most-filtered columns first
  2. Low cardinality columns before high cardinality – tenant_id before user_id before timestamp
  3. PARTITION BY is for data management, not query speed – ORDER BY speeds up queries
  4. Use smallest data types possible – UInt32 not UInt64 if values fit
  5. LowCardinality for strings with <10K unique values – Saves storage and speeds queries
  6. Avoid Nullable when possible – Use DEFAULT values instead
  7. PRIMARY KEY can be a prefix of ORDER BY – Reduces index size

Engine Selection

| Engine | Use Case | Key Behavior |
| --- | --- | --- |
| MergeTree | Append-only data (events, logs) | Standard storage, no special merge logic |
| ReplacingMergeTree | Deduplication, upserts | Keeps latest row per ORDER BY key; query with argMax pattern |
| SummingMergeTree | Pre-aggregated counters | Automatically sums numeric columns on merge |
| AggregatingMergeTree | Complex aggregates (uniq, quantile) | Stores intermediate states; use -State/-Merge functions |
| CollapsingMergeTree | Mutable rows | Uses +1/-1 sign column to cancel/update rows |

For engine examples and Materialized Views, see clickhouse-materialized-views skill.
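As a quick illustration of the argMax pattern mentioned for ReplacingMergeTree, here is a minimal sketch; the table and column names (user_profiles, updated_at) are hypothetical:

```sql
-- ReplacingMergeTree deduplicates on merge, which happens eventually,
-- not at insert time.
CREATE TABLE user_profiles (
    user_id UInt64,
    email String,
    updated_at DateTime
) ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;

-- For correct reads before merges complete, pick the latest version
-- per key explicitly with argMax:
SELECT
    user_id,
    argMax(email, updated_at) AS email
FROM user_profiles
GROUP BY user_id;
```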

ORDER BY Selection

The 3-5 Column Rule

ORDER BY (
    tenant_id,      -- 1. Lowest cardinality, most filtered
    event_date,     -- 2. Time component (common filter)
    user_id,        -- 3. Medium cardinality
    event_type      -- 4. Higher cardinality (optional)
)

Decision Process

  1. Which columns appear in WHERE clauses? – These go first
  2. What’s the cardinality? – Lower cardinality columns before higher
  3. Is there a time dimension? – Usually included for range queries
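The cardinality check in step 2 can be run directly before choosing the key order; this sketch assumes an events table with the candidate columns shown:

```sql
-- Measure cardinality of candidate ORDER BY columns
-- (table and column names are placeholders)
SELECT
    uniqExact(tenant_id)  AS tenants,      -- low cardinality: goes first
    uniqExact(event_type) AS event_types,  -- low cardinality
    uniqExact(user_id)    AS users         -- high cardinality: goes last
FROM events;
```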

Good vs Bad

-- BAD: High cardinality first, rarely-filtered columns
ORDER BY (user_id, timestamp, tenant_id)

-- GOOD: Low cardinality first, commonly-filtered columns
ORDER BY (tenant_id, toDate(timestamp), user_id)

PRIMARY KEY as Prefix

-- Full sorting key for data layout
ORDER BY (tenant_id, event_date, user_id, event_type)

-- Shorter primary key for smaller index (optional)
PRIMARY KEY (tenant_id, event_date)

PARTITION BY Guidelines

Size Targets

| Scenario | Target Partition Size |
| --- | --- |
| General MergeTree tables | 1-300 GB |
| SummingMergeTree / ReplacingMergeTree | 400 MB – 40 GB |
| Small tables (<5 GB total) | No partitioning |

Common Patterns

-- Monthly (most common for analytics)
PARTITION BY toYYYYMM(event_date)

-- Daily (high volume, >1TB/month)
PARTITION BY toDate(event_date)

-- Multi-tenant with time
PARTITION BY (tenant_id, toYYYYMM(event_date))

-- No partitioning (small tables)
-- Simply omit PARTITION BY clause

Anti-Pattern

-- BAD: Over-partitioning creates thousands of small parts
PARTITION BY (toDate(event_date), user_id)

-- BAD: Partitioning by high-cardinality column
PARTITION BY user_id

Data Type Optimization

Numbers

-- Use smallest type that fits your data
count UInt16,           -- Max 65,535 (instead of UInt64)
percentage Float32,     -- Instead of Float64 if 6-7 digits precision is enough
flags UInt8,            -- For small integers, booleans

Strings

-- LowCardinality for <10K unique values
country LowCardinality(String),
status LowCardinality(String),
event_type LowCardinality(String),

-- Regular String for high cardinality
user_agent String,
url String,

Dates and Times

-- Use simplest type that meets requirements
event_date Date,                    -- If you only need date
event_time DateTime,                -- If you need seconds
event_time_precise DateTime64(3),   -- Only if you need milliseconds

Avoiding Nullable

-- BAD: Nullable adds overhead and complexity
user_id Nullable(UInt64),

-- GOOD: Use DEFAULT for missing values
user_id UInt64 DEFAULT 0,

-- GOOD: Use empty string for missing text
name String DEFAULT '',
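For reference, DEFAULT fills in columns omitted from an INSERT, which is what makes it a drop-in replacement for NULL; a minimal sketch with a hypothetical table:

```sql
CREATE TABLE users_demo (
    user_id UInt64 DEFAULT 0,
    name String DEFAULT ''
) ENGINE = MergeTree()
ORDER BY user_id;

-- user_id is omitted, so it is stored as 0 rather than NULL
INSERT INTO users_demo (name) VALUES ('alice');

-- "Missing" rows can then be filtered with a plain comparison
SELECT count() FROM users_demo WHERE user_id = 0;
```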

TTL Configuration

Delete Old Data

CREATE TABLE events (
    event_date Date,
    ...
) ENGINE = MergeTree()
ORDER BY (...)
TTL event_date + INTERVAL 90 DAY DELETE;

Tiered Storage

CREATE TABLE events (
    event_date Date,
    ...
) ENGINE = MergeTree()
ORDER BY (...)
TTL
    event_date + INTERVAL 7 DAY TO VOLUME 'hot',
    event_date + INTERVAL 30 DAY TO VOLUME 'warm',
    event_date + INTERVAL 365 DAY DELETE;

Column-Level TTL

CREATE TABLE events (
    event_date Date,
    user_id UInt64,
    -- Delete PII after 30 days, keep aggregated data
    email String TTL event_date + INTERVAL 30 DAY,
    ip_address String TTL event_date + INTERVAL 30 DAY
) ENGINE = MergeTree()
ORDER BY (event_date, user_id);

Complete Example

CREATE TABLE analytics_events (
    -- Time dimension
    event_date Date,
    event_time DateTime,

    -- Identifiers (low to high cardinality)
    tenant_id UInt32,
    user_id UInt64,
    session_id String,

    -- Categorical data (use LowCardinality)
    event_type LowCardinality(String),
    country LowCardinality(String),
    device_type LowCardinality(String),

    -- Metrics
    duration_ms UInt32,

    -- Flexible data
    properties String,  -- JSON as string

    -- Skip indices for secondary lookups
    INDEX idx_user user_id TYPE bloom_filter(0.01) GRANULARITY 3,
    INDEX idx_session session_id TYPE bloom_filter(0.01) GRANULARITY 3
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (tenant_id, event_date, user_id)
TTL event_date + INTERVAL 12 MONTH DELETE
SETTINGS index_granularity = 8192;

Anti-Patterns

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| ORDER BY (uuid, timestamp) | High cardinality first | Put low cardinality columns first |
| PARTITION BY toDate(ts) for small tables | Too many small partitions | Omit partitioning or use monthly |
| Nullable(UInt64) everywhere | Storage and query overhead | Use DEFAULT values |
| String for status codes | Wastes space | Use LowCardinality(String) or Enum |
| DateTime64(9) always | Nanosecond precision rarely needed | Use DateTime or DateTime64(3) |
| Putting timestamp first in ORDER BY | Poor compression and filtering | Put categorical columns first |

Verification Queries

-- Check table size and compression
SELECT
    table,
    formatReadableSize(sum(bytes_on_disk)) AS size,
    sum(rows) AS rows,
    round(sum(bytes_on_disk) / sum(rows), 2) AS bytes_per_row
FROM system.parts
WHERE active AND database = 'default'
GROUP BY table;

-- Check partition sizes
SELECT
    partition,
    count() AS parts,
    sum(rows) AS rows,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active AND table = 'your_table'
GROUP BY partition
ORDER BY partition;

-- Verify column types
SELECT name, type, compression_codec
FROM system.columns
WHERE database = 'default' AND table = 'your_table';

Troubleshooting

Always ask for user confirmation before applying schema changes (ALTER TABLE, recreating tables).

“Too Many Parts” Error

Problem: DB::Exception: Too many parts, inserts rejected, merge queue growing

Diagnose:

SELECT table, partition, count() AS parts
FROM system.parts
WHERE active AND database = 'default'
GROUP BY table, partition
HAVING parts > 300
ORDER BY parts DESC;

Solutions:

| Cause | Fix |
| --- | --- |
| Over-partitioning (daily + high cardinality) | Use monthly partitions: PARTITION BY toYYYYMM(date) |
| Too many small inserts | Batch inserts: 1000+ rows per INSERT |
| High-cardinality partition key | Remove high-cardinality columns from PARTITION BY |

-- Fix: Change from daily to monthly partitioning (requires table recreation)
CREATE TABLE events_new (...) PARTITION BY toYYYYMM(event_date) ...;
INSERT INTO events_new SELECT * FROM events;
RENAME TABLE events TO events_old, events_new TO events;
DROP TABLE events_old;
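
For the "too many small inserts" cause, batching can happen client-side or be delegated to the server via asynchronous inserts (available in recent ClickHouse versions). A sketch, with hypothetical column names:

```sql
-- Server-side batching: many small INSERTs are buffered and flushed
-- as fewer, larger parts
SET async_insert = 1;
SET wait_for_async_insert = 1;  -- acknowledge only after the buffer is flushed

-- Client-side batching: prefer one multi-row INSERT over many single-row ones
INSERT INTO events (event_date, tenant_id, user_id) VALUES
    ('2024-01-01', 1, 101),
    ('2024-01-01', 1, 102),
    ('2024-01-01', 2, 201);
```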

Poor Compression (<3x Ratio)

Problem: Table using more disk than expected, compression ratio below 3x

Diagnose:

SELECT
    column,
    type,
    formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
    round(sum(column_data_uncompressed_bytes) / sum(column_data_compressed_bytes), 2) AS ratio
FROM system.parts_columns
WHERE active AND table = 'your_table'
GROUP BY column, type
ORDER BY ratio ASC;

Solutions:

| Cause | Fix |
| --- | --- |
| High-cardinality column first in ORDER BY | Reorder: low cardinality columns first |
| String for low-cardinality data | Use LowCardinality(String) |
| Wrong codec for data pattern | Use DoubleDelta for timestamps, Gorilla for floats |
| Random UUIDs in ORDER BY | Move UUID later in ORDER BY, or use different key |

-- Check cardinality to decide LowCardinality usage
SELECT uniq(status) FROM events;  -- If <10K, use LowCardinality

-- Fix column type (requires recreation or new column)
ALTER TABLE events ADD COLUMN status_new LowCardinality(String);
ALTER TABLE events UPDATE status_new = status WHERE 1;
-- Then migrate queries to use status_new
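
The codec fix from the table above might look like this (a sketch; the tables and columns are assumed, and new codecs apply only to newly written parts):

```sql
-- DoubleDelta compresses monotonically increasing values like timestamps;
-- Gorilla suits slowly changing floats. Existing parts keep the old codec
-- until they are rewritten (e.g. by OPTIMIZE TABLE ... FINAL).
ALTER TABLE events
    MODIFY COLUMN event_time DateTime CODEC(DoubleDelta, LZ4);

ALTER TABLE metrics
    MODIFY COLUMN value Float64 CODEC(Gorilla, LZ4);
```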

Slow Queries Despite Good Schema

Problem: Queries slow even with proper ORDER BY and partitioning

Diagnose:

EXPLAIN indexes = 1 SELECT ... FROM your_table WHERE ...;
-- Check: Are granules being skipped? Is partition pruning happening?

Solutions:

| Cause | Fix |
| --- | --- |
| Query doesn’t filter on ORDER BY prefix | Add ORDER BY columns to WHERE clause |
| Function on filter column | Store computed column, filter on that |
| Missing skip index for secondary lookups | Add bloom_filter index |
| Selecting too many columns | Select only needed columns |

-- Example: Table ORDER BY (tenant_id, event_date, user_id)

-- BAD: Skips ORDER BY prefix
SELECT * FROM events WHERE user_id = 123;

-- GOOD: Include prefix
SELECT * FROM events WHERE tenant_id = 1 AND event_date = today() AND user_id = 123;

-- ALT: Add skip index for direct user_id lookups
ALTER TABLE events ADD INDEX idx_user user_id TYPE bloom_filter(0.01) GRANULARITY 4;