managing-databases
npx skills add https://github.com/rileyhilliard/claude-essentials --skill managing-databases
# Database Management
Decision guidance for PostgreSQL, DuckDB, and Parquet in hybrid storage architectures.
## Contents
- When to use which database
- PostgreSQL quick reference
- DuckDB quick reference
- Parquet quick reference
- PGVector quick reference
- Cross-database conventions
- Performance debugging checklist
## When to use which database
| Workload | Use | Why |
|---|---|---|
| Transactional (CRUD, users, sessions) | PostgreSQL | ACID, row-level locking, indexes |
| Analytical (aggregations, scans) | DuckDB | Columnar, vectorized, parallel |
| Data storage/interchange | Parquet | Compressed, columnar, portable |
| Metadata + relationships | PostgreSQL | Foreign keys, constraints |
| Ad-hoc exploration | DuckDB | Fast on Parquet, no ETL needed |
| Time-series with point lookups | PostgreSQL + partitioning | Partition pruning + indexes |
| Time-series analytics | DuckDB on Parquet | Scan performance |
| Vector similarity search | PostgreSQL + PGVector | HNSW/IVFFlat indexes, hybrid search |
| RAG / semantic search | PostgreSQL + PGVector | Embeddings + metadata in same DB |
Hybrid pattern example:
- PostgreSQL: transactional data, relationships, users (metadata)
- DuckDB + Parquet: analytical content, aggregations, time-series
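A sketch of the hybrid pattern above, using DuckDB's `postgres` extension to join Parquet analytics against PostgreSQL metadata in one query (database, table, and column names are illustrative):

```sql
-- Attach the transactional PostgreSQL database from DuckDB.
INSTALL postgres;
LOAD postgres;
ATTACH 'dbname=appdb user=app host=localhost' AS pg (TYPE postgres);

-- Join Postgres metadata against Parquet event data, no ETL step.
SELECT u.name, sum(e.amount) AS total
FROM pg.public.users AS u
JOIN read_parquet('events/*.parquet') AS e
  ON e.user_id = u.id
GROUP BY u.name;
```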
## PostgreSQL quick reference
Use for: Metadata, relationships, OLTP workloads, anything needing ACID.
Key decisions:
- Partition tables >100M rows or with retention requirements
- Index columns in WHERE/JOIN clauses, not everything
- Tune autovacuum for high-churn tables
See references/postgres-architecture.md for maintenance patterns. See references/postgres-querying.md for advanced query techniques.
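The partitioning, indexing, and autovacuum decisions above might look like this in practice (table and column names, and the autovacuum threshold, are illustrative):

```sql
-- Range-partitioned table for data with retention requirements.
CREATE TABLE event (
  id         bigint GENERATED ALWAYS AS IDENTITY,
  user_id    bigint NOT NULL,
  created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE event_2024_06 PARTITION OF event
  FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');

-- Index only what WHERE/JOIN clauses actually use.
CREATE INDEX ON event (user_id, created_at);

-- High-churn table: vacuum more aggressively than the default.
ALTER TABLE event SET (autovacuum_vacuum_scale_factor = 0.02);
```

Dropping an old partition is a metadata operation, which is what makes partitioning attractive for retention policies.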
## DuckDB quick reference
Use for: Analytics, aggregations, Parquet queries, data exploration.
Key decisions:
- Prefer Parquet files over CSV (10-100x faster)
- Let DuckDB auto-parallelize; don’t micro-optimize
- For remote data, increase threads beyond CPU count
See references/duckdb-architecture.md for storage and parallelism. See references/duckdb-querying.md for DuckDB-specific SQL features.
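A minimal example of the decisions above: querying Parquet directly, letting filter pushdown and parallelism do the work (bucket path and columns are illustrative):

```sql
-- Query Parquet in place; the date filter is pushed down
-- to row-group statistics, so irrelevant data is skipped.
SELECT region, count(*) AS n, avg(amount) AS avg_amount
FROM read_parquet('s3://bucket/sales/*.parquet')
WHERE sale_date >= DATE '2024-01-01'
GROUP BY region;

-- Remote reads are I/O-bound, so more threads than cores can help.
SET threads = 16;
```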
## Parquet quick reference
Use for: Storing analytical data, data interchange, columnar compression.
Key decisions:
- Target 128MB-1GB file sizes
- Partition by low-to-moderate cardinality columns (date, region)
- Sort by columns used in filters for better pruning
See references/parquet-architecture.md for file design. See references/parquet-querying.md for query optimization.
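Writing Parquet that follows the decisions above could be done from DuckDB with `COPY` (table, path, and sizes are illustrative):

```sql
-- Partition by a low-cardinality column, sort by the filter
-- column so row-group min/max stats enable pruning.
COPY (
  SELECT * FROM sales ORDER BY sale_date
) TO 'datasets/sales'
  (FORMAT parquet,
   PARTITION_BY (region),
   COMPRESSION zstd,
   ROW_GROUP_SIZE 1000000);
```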
## PGVector quick reference
Use for: Similarity search, RAG applications, semantic search, recommendations.
Key decisions:
- HNSW for low-latency, high-recall (default choice)
- IVFFlat for memory-constrained or batch-updated data
- Use iterative scan for filtered queries
- Consider hybrid search (vector + keyword) for 8-15% accuracy boost
See references/pgvector-architecture.md for index configuration. See references/pgvector-querying.md for hybrid search and filtering.
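A minimal pgvector setup with the default HNSW choice (table name and the 3-dimensional embeddings are illustrative; real embeddings are typically hundreds of dimensions):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE document (
  id        bigserial PRIMARY KEY,
  body      text,
  embedding vector(3)
);

-- HNSW index for low-latency, high-recall cosine search.
CREATE INDEX ON document
  USING hnsw (embedding vector_cosine_ops);

-- Nearest neighbours by cosine distance.
SELECT id, body
FROM document
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector
LIMIT 10;
```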
## Cross-database conventions
### Naming
| Convention | Example | Applies to |
|---|---|---|
| snake_case tables | `dataset_jobs` | All |
| snake_case columns | `created_at` | All |
| Singular table names | `dataset`, not `datasets` | PostgreSQL |
| Plural for collections | `datasets/` directory | Parquet files |
### Normalization decisions
| Pattern | When to normalize | When to denormalize |
|---|---|---|
| Lookup tables | PostgreSQL, changes frequently | DuckDB/Parquet, static data |
| Repeated values | PostgreSQL, storage matters | Parquet, compression handles it |
| Joins at query time | PostgreSQL, complex relationships | Parquet, pre-join for analytics |
### Timestamps

- Always store UTC
- PostgreSQL: `TIMESTAMPTZ`
- Parquet: `TIMESTAMP` with `isAdjustedToUTC=true`
- DuckDB: reads both correctly
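One way the UTC convention plays out end to end (table name is illustrative):

```sql
-- PostgreSQL side: TIMESTAMPTZ normalizes input to UTC on storage.
CREATE TABLE job (
  id         bigserial PRIMARY KEY,
  created_at timestamptz NOT NULL DEFAULT now()
);

-- DuckDB side: with the session in UTC, TIMESTAMPTZ values are
-- written to Parquet as TIMESTAMP with isAdjustedToUTC=true.
SET TimeZone = 'UTC';
COPY (SELECT id, created_at FROM job)
  TO 'job.parquet' (FORMAT parquet);
```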
## Performance debugging checklist
### PostgreSQL slow query
- Run `EXPLAIN (ANALYZE, BUFFERS)` on the query
- Check for sequential scans on large tables
- Verify indexes exist on filter/join columns
- Check `pg_stat_user_tables` for bloat (dead tuples)
- Review `work_mem` if seeing disk sorts
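The first and fourth checks above might look like this (the query itself is illustrative):

```sql
-- Look for "Seq Scan" nodes on large tables, and for
-- "Sort Method: external merge" indicating disk sorts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM event WHERE user_id = 42;

-- Dead-tuple bloat: tables with high n_dead_tup and a stale
-- last_autovacuum are candidates for autovacuum tuning.
SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```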
### DuckDB slow query
- Check if reading CSV instead of Parquet
- Verify you're not doing `SELECT *` on remote data
- Check that the thread count matches the workload
- Look for unnecessary type conversions
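A quick way to work through this checklist in a DuckDB session (path is illustrative):

```sql
-- Per-operator timings show where the query spends its time.
EXPLAIN ANALYZE
SELECT count(*) FROM read_parquet('events/*.parquet');

-- Inspect and adjust parallelism.
SELECT current_setting('threads');
SET threads = 8;
```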
### Parquet slow reads
- Verify predicate pushdown is working (check query plan)
- Check file sizes (too small = overhead, too large = no parallelism)
- Confirm data is sorted by filter columns
- Look for high-cardinality partition keys (too many small files)
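DuckDB's `parquet_metadata()` table function can check several of these points directly (path and column name are illustrative):

```sql
-- File and row-group sizing: many tiny row groups mean overhead.
SELECT file_name, row_group_id, row_group_num_rows
FROM parquet_metadata('datasets/sales/*.parquet');

-- Min/max stats per column chunk drive predicate pushdown;
-- data sorted by the filter column gives tight, prunable ranges.
SELECT path_in_schema, stats_min_value, stats_max_value
FROM parquet_metadata('datasets/sales/*.parquet')
WHERE path_in_schema = 'sale_date';
```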
### PGVector slow search
- Verify index exists and is being used (EXPLAIN)
- Check `ef_search` (HNSW) or `probes` (IVFFlat) settings
- Enable iterative scan for filtered queries
- Check if IVFFlat recall degraded (rebuild index if heavily updated)
- Consider partial indexes for common filters
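The tuning knobs from this checklist, with illustrative values (the query mirrors the earlier HNSW example; `hnsw.iterative_scan` requires pgvector 0.8+):

```sql
SET hnsw.ef_search = 100;       -- higher = better recall, slower
SET ivfflat.probes = 10;        -- IVFFlat equivalent
SET hnsw.iterative_scan = 'relaxed_order';  -- for filtered queries

-- Confirm the vector index is actually used.
EXPLAIN
SELECT id FROM document
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector
LIMIT 10;
```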