altinity-expert-clickhouse-index-analysis
0
总安装量
17
周安装量
安装命令
npx skills add https://github.com/altinity/skills --skill altinity-expert-clickhouse-index-analysis
Agent 安装分布
codex
16
claude-code
15
gemini-cli
5
kimi-cli
5
opencode
5
Skill 文档
Diagnostics
Run all queries from the file checks.sql and analyze the results.
Deep Dive Queries (Placeholder-Based)
EXPLAIN Index Usage for Specific Query
EXPLAIN indexes = 1
{query_without_format}
Look for:
PrimaryKeycondition should not betrue(means no filtering)Granules: X/Yratio shows selectivity (low X/Y = good)Skipindexes should reduce parts/granules further
Column Cardinality Analysis
SELECT
{columns} APPLY uniq
FROM {database}.{table}
WHERE {time_column} > now() - INTERVAL {days} DAY
Optimal ORDER BY ordering: Low cardinality columns first, high cardinality last.
Query Pattern WHERE Columns Extraction
WITH
any(query) AS q,
arrayJoin(extractAll(query, '\\b(?:PRE)?WHERE\\s+(.*?)\\s+(?:GROUP BY|ORDER BY|UNION|SETTINGS|FORMAT|$)')) AS w,
arrayFilter(x -> (position(w, extract(x, '\\.(`[^`]+`|[^\\.]+)$')) > 0), columns) AS c,
arrayJoin(c) AS c2
SELECT
c2,
count() AS usage_count
FROM system.query_log
WHERE event_time >= now() - toIntervalDay({days})
AND arrayExists(x -> x LIKE '%{table}%', tables)
AND query ILIKE 'SELECT%'
AND type = 'QueryFinish'
GROUP BY c2
ORDER BY usage_count DESC
FORMAT PrettyCompactMonoBlock
Normalized WHERE Clause Patterns
WITH
arrayJoin(extractAll(normalizeQuery(query), '\\b(?:PRE)?WHERE\\s+(.*?)\\s+(?:GROUP BY|ORDER BY|UNION|SETTINGS|FORMAT|$)')) AS w
SELECT
w AS where_pattern,
count() AS frequency
FROM system.query_log
WHERE event_time >= now() - toIntervalDay({days})
AND arrayExists(x -> x LIKE '%{table}%', tables)
AND query ILIKE 'SELECT%'
AND type = 'QueryFinish'
GROUP BY w
ORDER BY frequency DESC
LIMIT 20
Granule Selectivity from Query Log
SELECT
query_id,
normalized_query_hash,
selected_parts,
selected_marks,
read_rows,
round(read_rows / nullIf(selected_marks, 0)) AS rows_per_mark,
query_duration_ms,
formatReadableSize(read_bytes) AS read_bytes
FROM system.query_log
WHERE event_time >= now() - toIntervalDay({days})
AND arrayExists(x -> x LIKE '%{table}%', tables)
AND query ILIKE 'SELECT%'
AND type = 'QueryFinish'
ORDER BY selected_marks DESC
LIMIT 20
High selected_marks / total marks = poor index utilization.
Analysis Workflow
Step 1: Check Current Indexes
-- Table structure with ORDER BY, PRIMARY KEY, indexes
SHOW CREATE TABLE {database}.{table}
-- Skipping indexes
SELECT name, type, expr, granularity
FROM system.data_skipping_indices
WHERE database = '{database}' AND table = '{table}'
Step 2: Extract Query Patterns
Run the WHERE column extraction and normalized pattern queries to understand:
- Which columns appear most frequently in WHERE clauses
- What condition combinations are common
Step 3: Check Column Cardinalities
Compare cardinalities of columns in:
- Current ORDER BY key
- Frequently filtered columns from Step 2
Step 4: Evaluate Index Alignment
| Query Pattern | Index Support | Action |
|---|---|---|
| Filters on ORDER BY prefix | â Good | None |
| Filters on non-ORDER BY cols | â ï¸ Skip index? | Add bloom_filter or projection |
| Time range + entity | â ï¸ Check order | Time in ORDER BY or partition? |
| High-cardinality first in ORDER BY | â Bad | Reorder (lowâhigh cardinality) |
ORDER BY Design Guidelines
Column Order Principles
- Lowest cardinality first – maximizes granule skipping
- Most frequently filtered – columns in WHERE should be in ORDER BY
- Time column considerations:
- If most queries filter on time ranges â include in ORDER BY (possibly with lower resolution like
toDate(ts)) - If partition key handles time filtering â may not need in ORDER BY
- If most queries filter on time ranges â include in ORDER BY (possibly with lower resolution like
Common Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| High-cardinality UUID first | No granule skipping | Move after low-cardinality columns |
| DateTime64 microseconds first | Too granular | Use toDate() or toStartOfHour() |
| Column in WHERE not in ORDER BY | Full scan | Add to ORDER BY or create projection |
| Bloom filter on ORDER BY column | Redundant | Remove skip index |
| Time not in ORDER BY or partition | Range queries scan all | Add toDate(ts) to ORDER BY prefix |
Cardinality Ordering Example
Given cardinalities:
entity_type: 6entity: 18,588cast_hash: 335,620
Recommended ORDER BY: (entity_type, entity, cast_hash, ...)
Skipping Index Guidelines
When Skip Indexes Help
- Column NOT in ORDER BY
- Column values correlate with physical data order
- Low false-positive rate for the index type
When Skip Indexes Don’t Help
- Column already in ORDER BY prefix (use PRIMARY KEY instead)
- Column values randomly distributed (no correlation with ORDER BY)
- Very high cardinality with set/bloom_filter