clickhouse-materialized-views
npx skills add https://github.com/obsessiondb/clickhouse-plugin --skill clickhouse-materialized-views
Agent 安装分布
Skill 文档
ClickHouse Materialized Views
Load when creating Materialized Views for real-time aggregation, ETL pipelines, or query optimization.
Prerequisite: Understand MergeTree engine variants in clickhouse-schema-design skill.
Reference Documentation
Search terms: materialized view, MV, real-time aggregation, AggregateFunction, -State, -Merge, SummingMergeTree, pre-aggregation, incremental
Critical Rules
- MVs are triggers, not caches – They process INSERT data, not query results
- Use correct engine – AggregatingMergeTree for complex aggregates, SummingMergeTree for counters
- Query with argMax or -Merge – Aggregation completes at query time, not insert time
- MV sees INSERT only – No backfill; existing data must be inserted manually
How Materialized Views Work
Source Table ââINSERTââ⺠MV Transform ââ⺠Target Table
â
ââ MV executes SELECT for each inserted block
Key insight: The MV’s SELECT query runs on each INSERT batch. Results go to the target table. The MV does NOT query historical data.
Pattern 1: Real-Time Counters (SummingMergeTree)
Best for simple sums and counts that need real-time updates.
-- Source: raw events
CREATE TABLE events (
event_time DateTime,
tenant_id UInt32,
event_type String,
user_id UInt64
) ENGINE = MergeTree()
ORDER BY (tenant_id, event_time);
-- Target: daily counters
CREATE TABLE daily_event_counts (
date Date,
tenant_id UInt32,
event_type LowCardinality(String),
event_count UInt64,
unique_users UInt64
)
ENGINE = SummingMergeTree()
ORDER BY (tenant_id, date, event_type);
-- MV: transform inserts
CREATE MATERIALIZED VIEW daily_event_counts_mv
TO daily_event_counts AS
SELECT
toDate(event_time) AS date,
tenant_id,
event_type,
count() AS event_count,
uniq(user_id) AS unique_users -- WARNING: Not additive!
FROM events
GROUP BY date, tenant_id, event_type;
Warning: uniq() in SummingMergeTree is not accurateâsums don’t equal unique counts. Use AggregatingMergeTree for unique counts.
Pattern 2: Complex Aggregates (AggregatingMergeTree)
For accurate uniq, quantiles, or any non-additive aggregate.
-- Target: uses AggregateFunction types
CREATE TABLE user_metrics_agg (
date Date,
tenant_id UInt32,
total_events AggregateFunction(sum, UInt64),
unique_users AggregateFunction(uniq, UInt64),
p95_duration AggregateFunction(quantile(0.95), Float32)
)
ENGINE = AggregatingMergeTree()
ORDER BY (tenant_id, date);
-- MV: use -State functions
CREATE MATERIALIZED VIEW user_metrics_mv
TO user_metrics_agg AS
SELECT
toDate(event_time) AS date,
tenant_id,
sumState(1) AS total_events,
uniqState(user_id) AS unique_users,
quantileState(0.95)(duration_ms) AS p95_duration
FROM events
GROUP BY date, tenant_id;
-- Query: use -Merge functions
SELECT
date,
tenant_id,
sumMerge(total_events) AS events,
uniqMerge(unique_users) AS users,
quantileMerge(0.95)(p95_duration) AS p95
FROM user_metrics_agg
WHERE tenant_id = 1
GROUP BY date, tenant_id;
Key pattern: -State to insert, -Merge to query.
Pattern 3: Data Transformation Pipeline
Transform/enrich data as it arrives.
-- Source: raw JSON logs
CREATE TABLE raw_logs (
timestamp DateTime,
raw_json String
) ENGINE = MergeTree()
ORDER BY timestamp;
-- Target: parsed structured data
CREATE TABLE parsed_logs (
timestamp DateTime,
level LowCardinality(String),
service LowCardinality(String),
message String,
trace_id String
)
ENGINE = MergeTree()
ORDER BY (service, timestamp);
-- MV: parse JSON on insert
CREATE MATERIALIZED VIEW parsed_logs_mv
TO parsed_logs AS
SELECT
timestamp,
JSONExtractString(raw_json, 'level') AS level,
JSONExtractString(raw_json, 'service') AS service,
JSONExtractString(raw_json, 'message') AS message,
JSONExtractString(raw_json, 'trace_id') AS trace_id
FROM raw_logs;
Pattern 4: Last Value Tracking (ReplacingMergeTree)
Track latest state per entity.
-- Target: latest user state
CREATE TABLE user_latest_state (
user_id UInt64,
last_seen DateTime,
last_action LowCardinality(String),
total_actions UInt64
)
ENGINE = ReplacingMergeTree(last_seen)
ORDER BY user_id;
-- MV: update on each event
CREATE MATERIALIZED VIEW user_state_mv
TO user_latest_state AS
SELECT
user_id,
max(event_time) AS last_seen,
argMax(event_type, event_time) AS last_action,
count() AS total_actions
FROM events
GROUP BY user_id;
Query with argMax pattern (avoid FINAL on large tables):
SELECT
user_id,
argMax(last_seen, last_seen) AS last_seen,
argMax(last_action, last_seen) AS last_action,
argMax(total_actions, last_seen) AS total_actions
FROM user_latest_state
WHERE user_id = 123
GROUP BY user_id;
Backfilling Existing Data
MVs don’t process existing data. Backfill manually:
-- Insert historical data into target table
INSERT INTO daily_event_counts
SELECT
toDate(event_time) AS date,
tenant_id,
event_type,
count() AS event_count,
uniq(user_id) AS unique_users
FROM events
WHERE event_time < '2024-01-01' -- Before MV was created
GROUP BY date, tenant_id, event_type;
MV Management
Check MV Status
-- List all MVs
SELECT name, engine, create_table_query
FROM system.tables
WHERE engine = 'MaterializedView';
-- Check target table size
SELECT
table,
formatReadableSize(sum(bytes_on_disk)) AS size,
sum(rows) AS rows
FROM system.parts
WHERE active AND table = 'daily_event_counts'
GROUP BY table;
Pause/Resume MV
-- Pause (stop processing inserts)
ALTER TABLE events DETACH MATERIALIZED VIEW daily_event_counts_mv;
-- Resume
ALTER TABLE events ATTACH MATERIALIZED VIEW daily_event_counts_mv;
Modify MV
MVs cannot be altered. Drop and recreate:
DROP VIEW daily_event_counts_mv;
-- Optionally truncate target: TRUNCATE TABLE daily_event_counts;
CREATE MATERIALIZED VIEW daily_event_counts_mv TO daily_event_counts AS ...;
-- Backfill if needed
Common Pitfalls
| Pitfall | Problem | Solution |
|---|---|---|
Using uniq() with SummingMergeTree |
Sums don’t equal uniques | Use AggregatingMergeTree with uniqState/uniqMerge |
| Forgetting argMax or -Merge | Incomplete aggregation results | Use argMax pattern for Replacing/Collapsing, -Merge for Aggregating |
| No backfill after MV creation | Missing historical data | Manually INSERT historical aggregates |
| MV on wrong table | Inserts to wrong source ignored | Ensure MV is on the table receiving INSERTs |
| Too many MVs on one source | Slow inserts | Consider fewer MVs or async processing |
Decision Tree
Need to aggregate data at query time?
â
ââ Yes, and data changes frequently â Query raw data
â
ââ Yes, but queries are slow â Create MV
â â
â ââ Simple sums/counts only?
â â ââ SummingMergeTree
â â
â ââ Need uniq, quantile, or complex aggregates?
â â ââ AggregatingMergeTree with -State/-Merge
â â
â ââ Need latest value per key?
â ââ ReplacingMergeTree with argMax pattern
â
ââ Need to transform/parse data on insert?
ââ MV with regular MergeTree target
Complete Example: Multi-Level Aggregation
-- Level 1: Raw events (source)
CREATE TABLE events (...) ENGINE = MergeTree() ORDER BY ...;
-- Level 2: Hourly aggregates
CREATE TABLE hourly_stats (
hour DateTime,
tenant_id UInt32,
events AggregateFunction(sum, UInt64),
users AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree() ORDER BY (tenant_id, hour);
CREATE MATERIALIZED VIEW hourly_mv TO hourly_stats AS
SELECT
toStartOfHour(event_time) AS hour,
tenant_id,
sumState(1) AS events,
uniqState(user_id) AS users
FROM events GROUP BY hour, tenant_id;
-- Level 3: Daily aggregates (from hourly)
CREATE TABLE daily_stats (
date Date,
tenant_id UInt32,
events AggregateFunction(sum, UInt64),
users AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree() ORDER BY (tenant_id, date);
CREATE MATERIALIZED VIEW daily_mv TO daily_stats AS
SELECT
toDate(hour) AS date,
tenant_id,
sumMergeState(events) AS events, -- Merge then re-State
uniqMergeState(users) AS users
FROM hourly_stats GROUP BY date, tenant_id;
Query any level:
-- Fast daily query
SELECT date, sumMerge(events), uniqMerge(users)
FROM daily_stats WHERE tenant_id = 1 GROUP BY date;
-- Drill down to hourly
SELECT hour, sumMerge(events), uniqMerge(users)
FROM hourly_stats WHERE tenant_id = 1 AND toDate(hour) = today() GROUP BY hour;
Troubleshooting
Always ask for user confirmation before creating/modifying MVs or target tables.
Wrong Aggregation Results
Problem: Counts are too high, uniq values don’t match raw data, aggregates seem doubled
Diagnose:
-- Compare MV result vs raw query
SELECT count() FROM target_table;
SELECT count() FROM source_table WHERE <same_filters>;
-- Check for duplicate keys in target
SELECT date, tenant_id, count() AS rows
FROM target_table
GROUP BY date, tenant_id
HAVING rows > 1;
Solutions:
| Cause | Fix |
|---|---|
Using uniq() with SummingMergeTree |
Switch to AggregatingMergeTree with uniqState/uniqMerge |
Forgetting -Merge in query |
Always use sumMerge(), uniqMerge() for AggregatingMergeTree |
Forgetting argMax for ReplacingMergeTree |
Use argMax pattern: SELECT key, argMax(col, version) ... GROUP BY key |
| Duplicate inserts to source | Deduplicate source or use ReplacingMergeTree for target |
-- BAD: uniq in SummingMergeTree (sums don't work)
ENGINE = SummingMergeTree()
-- SELECT uniq(user_id) AS users -- Wrong!
-- GOOD: AggregatingMergeTree with State/Merge
ENGINE = AggregatingMergeTree()
-- MV: uniqState(user_id) AS users
-- Query: uniqMerge(users)
MV Not Updating / Missing Data
Problem: Target table not receiving new data, counts stuck at old values
Diagnose:
-- Check if MV is attached
SELECT name, engine FROM system.tables WHERE engine = 'MaterializedView';
-- Check target table recent data
SELECT max(date), count() FROM target_table;
-- Verify source table is receiving inserts
SELECT max(event_time), count() FROM source_table WHERE event_time > now() - INTERVAL 1 HOUR;
Solutions:
| Cause | Fix |
|---|---|
| MV detached | ALTER TABLE source ATTACH MATERIALIZED VIEW mv_name |
| MV on wrong source table | Drop MV, recreate with correct source |
| Historical data not backfilled | Manually INSERT aggregated historical data |
| Inserts going to different table | Ensure app inserts to the MV’s source table |
-- Backfill historical data
INSERT INTO target_table
SELECT
toDate(event_time) AS date,
tenant_id,
sumState(1) AS events,
uniqState(user_id) AS users
FROM source_table
WHERE event_time < '2024-01-01' -- Before MV existed
GROUP BY date, tenant_id;
Slow Inserts After Adding MV
Problem: INSERT performance degraded after creating MV, insert latency increased
Diagnose:
-- Check MVs on this table
SELECT name, as_select FROM system.tables
WHERE engine = 'MaterializedView' AND as_select LIKE '%source_table%';
-- Check MV query complexity
EXPLAIN SELECT ... FROM source_table ...; -- Use MV's SELECT query
Solutions:
| Cause | Fix |
|---|---|
| Too many MVs on one source | Consolidate MVs or use async inserts |
| Complex MV query (JOINs, heavy transforms) | Simplify MV, move complexity to query time |
| MV target table has wrong ORDER BY | Match target ORDER BY to MV’s GROUP BY |
-- Example: MV groups by (tenant_id, date)
-- Target table ORDER BY should match
CREATE TABLE target (...)
ENGINE = AggregatingMergeTree()
ORDER BY (tenant_id, date); -- Matches GROUP BY in MV
Target Table Growing Too Large
Problem: MV target table larger than expected, not aggregating properly
Diagnose:
-- Check rows per key (should be 1 after merge for same GROUP BY)
SELECT date, tenant_id, count() AS rows
FROM target_table
GROUP BY date, tenant_id
ORDER BY rows DESC
LIMIT 10;
Solutions:
| Cause | Fix |
|---|---|
| ORDER BY doesn’t match GROUP BY | Recreate target with ORDER BY matching MV’s GROUP BY |
| Background merges haven’t run | Wait for automatic merge, or use argMax in query |
| Wrong engine (MergeTree instead of Summing/Aggregating) | Recreate with correct engine |
-- ORDER BY must match GROUP BY columns for proper aggregation
-- MV: GROUP BY (tenant_id, date, event_type)
-- Target: ORDER BY (tenant_id, date, event_type) -- Must match!