database-decision
npx skills add https://github.com/lviladrich/database-decision --skill database-decision
Agent 安装分布
Skill 文档
Database Decision Framework
CRITICAL: DECIDE BEFORE YOU CODE
STOP. Before writing any database code, schema, or query â justify the choice first. Perfect code on the wrong foundation is technical debt from day one.
STEP 1: GATHER REQUIREMENTS
Ask the user if not provided:
- Data shape: structured / semi-structured / graph / time-series / objects?
- Schema stability: fixed or frequently changing?
- Read/Write ratio: read-heavy, write-heavy, balanced?
- Volume & scale: MBsâPBs? Users: tensâmillions? Single region or global?
- Consistency needs: strict ACID required? Eventual consistency acceptable?
- Business: team expertise, budget, existing infra, time pressure?
STEP 2: SELECTION MATRIX
Relational (PostgreSQL, MySQL) â STRUCTURED + STRONG CONSISTENCY
Choose when: stable schema, ACID mandatory, complex JOINs, referential integrity critical. CAP: CA. Signals: “users table”, “orders”, “bank transactions”, “billing”, “ERP”. AVOID: schema changes often, hierarchical data, need 100s of nodes, millions writes/sec.
Document (MongoDB) â FLEXIBLE SCHEMA + HIERARCHICAL DATA
Choose when: variable attributes per record, nested/JSON data, schema evolves fast, need sharding. CAP: CP. Sharding: by range, hash, or geographic. Signals: “product catalog”, “user profiles”, “CMS”, “different fields per record”. AVOID: highly relational with many JOINs, simple key-value pairs.
Key-Value (Redis) â SPEED + SIMPLE ACCESS
Choose when: sub-millisecond latency, get/set by key, caching, sessions, counters, leaderboards, queues. CAP: CP (cluster). Structures: Strings, Lists, Sets, Hashes, Sorted Sets, Streams. Signals: “cache”, “sessions”, “real-time”, “leaderboard”, “rate limiter”, “shopping cart”. AVOID: data exceeds RAM, complex queries needed, need persistent-first storage.
Graph (Neo4j) â RELATIONSHIP-CENTRIC DATA
Choose when: relationships ARE the value, path-finding, recommendations, fraud detection, social networks. CAP: CA. Clustering: Core servers (Raft, M=2F+1) + Read Replicas. Causal consistency. Signals: “relationships”, “recommendations”, “fraud”, “knowledge graph”, “who knows whom”. AVOID: massive distributed writes, tabular data without relationships, bulk updates on millions of nodes.
Wide-Column (Cassandra) â MASSIVE WRITES + GLOBAL DISTRIBUTION
Choose when: write-heavy (IoT, logs, metrics), multi-datacenter, known query patterns, linear scaling, zero downtime. CAP: AP. Consistency tuning: R+W>N = strong; W=1,R=N = fast writes; Quorum: W>N/2. Signals: “IoT”, “sensor data”, “event log”, “metrics”, “multi-region”, “always available”. AVOID: ad-hoc queries, JOINs, unpredictable read patterns, frequently changing data model.
Object DB (InterSystems IRIS) â COMPLEX OBJECT PERSISTENCE
Choose when: deep OOP hierarchies, no ORM needed, complex domains (healthcare, finance, manufacturing). Signals: “complex objects”, “inheritance”, “medical records”, “financial instruments”, “multi-model”. AVOID: simple structures, need max horizontal scaling, open-source requirement.
Time-Series (InfluxDB) â TIME-INDEXED DATA + ANALYTICS
Choose when: data indexed by time, monitoring/metrics/IoT, downsampling, retention policies, scientific data. Indexing: R-trees, KD-trees, LSM-trees. Compression: 90%+ reduction. Signals: “metrics”, “monitoring”, “Grafana”, “dashboards”, “sensor data over time”. AVOID: data not time-indexed, need relationships, general-purpose CRUD.
STEP 3: POLYGLOT PERSISTENCE
Most real systems need multiple databases. For each component independently ask: what’s the data shape, access pattern, consistency need, and performance requirement?
Classic E-Commerce (Polyglot)
| Component | DB | Why |
|---|---|---|
| Users & Auth | PostgreSQL | ACID, strong consistency, stable schema |
| Product Catalog | MongoDB | Variable attributes, flexible schema |
| Shopping Cart | Redis | Sub-ms latency, temporary, high concurrency |
| Recommendations | Neo4j | Graph traversal: “users who bought X also bought Y” |
| Order History | Cassandra | High writes, time-partitioned, multi-region |
| System Metrics | InfluxDB | Time-indexed, downsampling, dashboards |
ACID vs BASE by Component
- ACID required: user accounts, payments, orders, inventory
- BASE acceptable: recommendations, caches, sessions, analytics, feeds
STEP 4: JUSTIFICATION TEMPLATE
ALWAYS produce this for every database choice:
COMPONENT: [Name]
DATABASE: [Technology] ([Type])
DATA: shape, volume, schema stability
ACCESS: R/W ratio, query type, latency need
CONSISTENCY: ACID/eventual/tunable â CAP priority
SCALE: vertical/horizontal, distribution
REJECTED ALTERNATIVES: [why each was worse]
TRADE-OFFS ACCEPTED: [what you sacrifice and why]
STEP 5: ANTI-PATTERNS â STOP THE USER
- “MongoDB for everything” â Not a relational replacement. Push back if JOINs/ACID needed.
- “PostgreSQL for everything” â Not optimal for graph traversals, massive writes, sub-ms caching, or variable schemas at scale.
- “Redis as primary DB” â It’s a cache. Always pair with persistent store.
- “Neo4j for tabular data” â No benefit without relationship traversal.
- “Cassandra for ad-hoc queries” â Requires one table per query pattern. Unknown patterns = pain.
- “Single DB for complex system” â 3+ different access patterns = suggest polyglot.
- “Choosing by popularity” â “Everyone uses X” is not a justification. Defend with data shape + access patterns.
RESPONSE BEHAVIOR
- NEVER jump to code. Start with decision framework.
- Ask clarifying questions if ambiguous (STEP 1).
- Recommend type first, then specific technology.
- Always produce justification (STEP 4), even for simple cases.
- Suggest polyglot when appropriate (STEP 3).
- Call out anti-patterns proactively (STEP 5).
- Only after justification, help with schemas/queries/code.
- If user already chose, validate against requirements first.
QUICK REFERENCE
- “I need ACID” â PostgreSQL/MySQL
- “Schema changes a lot” â MongoDB
- “Need it in <1ms” â Redis
- “Relationships ARE my data” â Neo4j
- “Millions writes/sec, global” â Cassandra
- “Complex objects with inheritance” â IRIS
- “Time-indexed metrics” â InfluxDB
- “3+ different data needs” â Polyglot architecture