software-backend
npx skills add https://github.com/vasilyu1983/ai-agents-public --skill software-backend
Software Backend Engineering
Use this skill to design, implement, and review production-grade backend services: API boundaries, data layer, auth, caching, observability, error handling, testing, and deployment.
By default, favor type-safe boundaries (validation at the edge), OpenTelemetry for observability, zero-trust assumptions, idempotency for retries, RFC 9457 errors, Postgres with connection pooling, structured logs, timeouts, and rate limiting.
Quick Reference
| Task | Default Picks | Notes |
|---|---|---|
| REST API | Fastify / Express / NestJS | Prefer typed boundaries + explicit timeouts |
| Edge API | Hono / platform-native handlers | Keep work stateless, CPU-light |
| Type-Safe API | tRPC | Prefer for TS monorepos and internal APIs |
| GraphQL API | Apollo Server / Pothos | Prefer for complex client-driven queries |
| Database | PostgreSQL | Use pooling + migrations + query budgets |
| ORM / Query Layer | Prisma / Drizzle / SQLAlchemy / GORM / SeaORM | Prefer explicit transactions |
| Authentication | OIDC/OAuth + sessions/JWT | Prefer httpOnly cookies for browsers |
| Validation | Zod / Pydantic / validator libs | Validate at the boundary, not deep inside |
| Caching | Redis (or managed) | Use TTLs + invalidation strategy |
| Background Jobs | BullMQ / platform queues | Make jobs idempotent + retry-safe |
| Testing | Unit + integration + contract/E2E | Keep most tests below the UI layer |
| Observability | Structured logs + OpenTelemetry | Correlation IDs end-to-end |
Scope
Use this skill to:
- Design and implement REST/GraphQL/tRPC APIs
- Model data schemas and run safe migrations
- Implement authentication/authorization (OIDC/OAuth, sessions/JWT)
- Add validation, error handling, rate limiting, caching, and background jobs
- Ship production readiness (timeouts, observability, deploy/runbooks)
When NOT to Use This Skill
Use a different skill when:
- Frontend-only concerns -> See software-frontend
- Infrastructure provisioning (Terraform, K8s manifests) -> See ops-devops-platform
- API design patterns only (no implementation) -> See dev-api-design
- SQL query optimization and indexing -> See data-sql-optimization
- Security audits and threat modeling -> See software-security-appsec
- System architecture (beyond single service) -> See software-architecture-design
Decision Tree: Backend Technology Selection
Backend project needs: [API Type]
- REST API?
  - Simple CRUD -> Express/Fastify + Prisma/Drizzle
  - Enterprise features -> NestJS (DI, modules)
  - High performance -> Fastify (tight request lifecycle)
  - Edge/Serverless -> Hono (Cloudflare Workers, Vercel Edge)
- Type-Safe API?
  - Full-stack TypeScript monorepo -> tRPC (no schema, no codegen)
  - Public API with docs -> REST + OpenAPI
  - Flexible data fetching -> GraphQL + Pothos/Apollo
- GraphQL API?
  - Code-first -> Pothos GraphQL (TypeScript)
  - Schema-first -> Apollo Server + GraphQL Codegen
- Runtime Selection?
  - Enterprise stable -> Node.js (current LTS)
  - Performance-critical -> Bun (verify runtime constraints)
  - Security-focused -> Deno (verify platform support)
- Authentication Strategy?
  - Browser sessions -> httpOnly cookies + server-side session store
  - OAuth/Social -> OIDC/OAuth library (or platform auth)
  - Service-to-service -> short-lived JWT + mTLS where possible
- Database Layer?
  - Type-safe ORM -> Prisma (migrations, Studio)
  - SQL-first/perf -> Drizzle (SQL-like API)
  - Raw SQL -> driver + query builder (Kysely/sqlc/SQLx)
  - Edge-compatible -> driver/ORM + Neon/Turso/D1
- Caching Strategy?
  - Distributed cache -> Redis (multi-server)
  - Serverless cache -> managed Redis (e.g., Upstash)
  - In-memory cache -> process memory (single instance only)
- Edge Deployment?
  - Global low-latency -> Cloudflare Workers
  - Next.js integration -> Vercel Edge Functions
  - AWS ecosystem -> Lambda@Edge
- Background Jobs?
  - Complex workflows -> BullMQ (Redis-backed, retries)
  - Serverless workflows -> AWS Step Functions
  - Simple scheduling -> cron + durable storage
Runtime & Language Alternatives:
- Node.js (current LTS) (Express/Fastify/NestJS + Prisma/Drizzle): default for broad ecosystem + mature tooling
- Bun (Hono/Elysia + Drizzle): consider for perf-sensitive workloads (verify runtime constraints)
- Python (FastAPI + SQLAlchemy): strong for data-heavy services and ML integration
- Go (Fiber/Gin + GORM/sqlc): strong for concurrency and simple deploys
- Rust (Axum + SeaORM/SQLx): strong for safety/performance-critical services
See assets/ for language-specific starter templates and references/edge-deployment-guide.md for edge computing patterns.
API Design Patterns (Dec 2025)
Idempotency Patterns
All mutating operations MUST support idempotency for retry safety.
Implementation:
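// Read the client-supplied idempotency key from the request header
const idempotencyKey = request.headers['idempotency-key'];
// Reject requests without a key instead of caching under "idem:undefined"
if (!idempotencyKey) throw new Error('Idempotency-Key header is required');
// Replay the stored response if this key was already processed
const cached = await redis.get(`idem:${idempotencyKey}`);
if (cached) return JSON.parse(cached);
const result = await processOperation();
// Keep the response for 24 hours (86400 seconds)
await redis.set(`idem:${idempotencyKey}`, JSON.stringify(result), 'EX', 86400);
return result;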
| Do | Avoid |
|---|---|
| Store idempotency keys with TTL (24h typical) | Processing duplicate requests |
| Return cached response for duplicate keys | Different responses for same key |
| Use client-generated UUIDs | Server-generated keys |
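The snippet above checks the cache and only stores the result after processing, so two concurrent requests with the same key can both run the operation. One way to close that gap is to reserve the key before doing any work and to reject reuse of a key with a different payload. The sketch below illustrates this with an in-memory `Map` standing in for Redis; `IdempotencyStore`, `begin`, `complete`, and the `fingerprint` argument (for example, a hash of the request body) are hypothetical names, not part of any library.

```typescript
// Hypothetical in-memory idempotency store; a Map stands in for Redis here.
type Entry =
  | { state: "in_progress"; fingerprint: string; expiresAt: number }
  | { state: "done"; fingerprint: string; expiresAt: number; response: string };

type BeginResult =
  | { kind: "proceed" } // first time this key is seen: run the operation
  | { kind: "replay"; response: string } // already finished: return stored response
  | { kind: "in_progress" } // another request currently holds the key
  | { kind: "conflict" }; // same key reused with a different payload

class IdempotencyStore {
  private entries = new Map<string, Entry>();
  private ttlMs: number;
  private now: () => number;

  // ttlMs defaults to 24 hours; `now` is injectable so tests can control time.
  constructor(ttlMs: number = 24 * 60 * 60 * 1000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Reserve the key before doing any work so duplicates cannot run in parallel.
  begin(key: string, fingerprint: string): BeginResult {
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > this.now()) {
      if (existing.fingerprint !== fingerprint) return { kind: "conflict" };
      if (existing.state === "done") return { kind: "replay", response: existing.response };
      return { kind: "in_progress" };
    }
    this.entries.set(key, {
      state: "in_progress",
      fingerprint,
      expiresAt: this.now() + this.ttlMs,
    });
    return { kind: "proceed" };
  }

  // Record the final response so later duplicates receive the same answer.
  complete(key: string, fingerprint: string, response: string): void {
    this.entries.set(key, {
      state: "done",
      fingerprint,
      response,
      expiresAt: this.now() + this.ttlMs,
    });
  }
}
```

With Redis as the store, the reservation step would typically rely on an atomic `SET key value NX EX 86400` rather than a separate read followed by a write. How a handler maps `in_progress` and `conflict` to HTTP status codes is a design choice this sketch leaves open.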
Pagination Patterns
| Pattern | Use When | Example |
|---|---|---|
| Cursor-based | Large datasets, real-time data | ?cursor=abc123&limit=20 |
| Offset-based | Small datasets, random access | ?page=3&per_page=20 |
| Keyset | Sorted data, high performance | ?after_id=1000&limit=20 |
Prefer cursor-based pagination for APIs with frequent inserts.
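To make the cursor and keyset rows of the table concrete, here is a minimal sketch that pages through an in-memory array by "id greater than the last one seen". The `Row` shape, `listRows`, and the cursor encoding are assumptions for illustration; a real service would push the filter, sort, and limit into the database query.

```typescript
// Hypothetical row shape; in a real service this would come from the database.
interface Row {
  id: number;
  name: string;
}

interface Page {
  items: Row[];
  nextCursor: string | null; // null means there are no more rows
}

// The cursor is opaque to clients; here it simply wraps the last seen id.
function encodeCursor(lastId: number): string {
  return encodeURIComponent(JSON.stringify({ lastId }));
}

function decodeCursor(cursor: string): number {
  const parsed: unknown = JSON.parse(decodeURIComponent(cursor));
  const lastId = (parsed as { lastId?: unknown } | null)?.lastId;
  if (typeof lastId !== "number") throw new Error("invalid cursor");
  return lastId;
}

// Similar in spirit to: SELECT ... WHERE id > $lastId ORDER BY id LIMIT $limit + 1
function listRows(rows: Row[], limit: number, cursor?: string): Page {
  const lastId = cursor === undefined ? -Infinity : decodeCursor(cursor);
  const sorted = [...rows].sort((a, b) => a.id - b.id);
  // Take one extra row to learn whether another page exists.
  const candidates = sorted.filter((r) => r.id > lastId).slice(0, limit + 1);
  const items = candidates.slice(0, limit);
  const last = items[items.length - 1];
  const hasMore = candidates.length > limit;
  return { items, nextCursor: hasMore && last ? encodeCursor(last.id) : null };
}
```

Because each page starts strictly after the last returned id, rows inserted while a client is paging do not shift or duplicate earlier results, which is the main weakness of offset-based paging.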
Error Response Standard (Problem Details)
Use a consistent machine-readable error format (RFC 9457 Problem Details): https://www.rfc-editor.org/rfc/rfc9457
{
  "type": "https://example.com/problems/invalid-request",
  "title": "Invalid request",
  "status": 400,
  "detail": "email is required",
  "instance": "/v1/users"
}
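One way to keep such responses consistent is to build them from a small catalogue of problem types instead of assembling JSON by hand in each handler. The following sketch assumes a hypothetical `PROBLEMS` catalogue and `problemResponse` helper; only the member names and the `application/problem+json` media type come from RFC 9457, and the `not-found` URL is invented for the example.

```typescript
// Standard members of an RFC 9457 problem document.
interface ProblemDetails {
  type: string;
  title: string;
  status: number;
  detail?: string;
  instance?: string;
}

// Media type defined by RFC 9457 for JSON problem documents.
const PROBLEM_CONTENT_TYPE = "application/problem+json";

// Hypothetical catalogue of the problem types this API can return.
const PROBLEMS = {
  invalidRequest: {
    type: "https://example.com/problems/invalid-request",
    title: "Invalid request",
    status: 400,
  },
  notFound: {
    type: "https://example.com/problems/not-found",
    title: "Resource not found",
    status: 404,
  },
} as const;

// Build the status, headers, and serialized body for one problem occurrence.
function problemResponse(kind: keyof typeof PROBLEMS, instance: string, detail?: string) {
  const body: ProblemDetails = {
    ...PROBLEMS[kind],
    instance,
    ...(detail !== undefined ? { detail } : {}),
  };
  return {
    status: body.status,
    headers: { "content-type": PROBLEM_CONTENT_TYPE },
    body: JSON.stringify(body),
  };
}
```

Calling `problemResponse("invalidRequest", "/v1/users", "email is required")` yields a body with the same members as the JSON document above.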
Health Check Patterns
// Liveness: Is the process running?
app.get('/health/live', (req, res) => {
  res.status(200).json({ status: 'ok' });
});
// Readiness: Can the service handle traffic?
app.get('/health/ready', async (req, res) => {
  const dbOk = await checkDatabase();
  const cacheOk = await checkRedis();
  if (dbOk && cacheOk) {
    res.status(200).json({ status: 'ready', db: 'ok', cache: 'ok' });
  } else {
    res.status(503).json({ status: 'not ready', db: dbOk, cache: cacheOk });
  }
});
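The readiness handler above awaits each dependency without a deadline, so a hung database connection would hang the probe as well. A possible refinement, sketched here with the hypothetical helpers `runWithTimeout` and `readiness`, runs the checks in parallel and treats a slow or throwing check as a failure. The 1000 ms default is an arbitrary assumption.

```typescript
type Check = () => Promise<boolean>;

// Resolve to false if the check throws or takes longer than timeoutMs.
function runWithTimeout(check: Check, timeoutMs: number): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    Promise.resolve()
      .then(check)
      .then((ok) => resolve(ok), () => resolve(false))
      .then(() => clearTimeout(timer));
  });
}

// Run all dependency checks in parallel and summarize them for the probe.
async function readiness(checks: Record<string, Check>, timeoutMs: number = 1000) {
  const entries = Object.entries(checks);
  const results = await Promise.all(
    entries.map(([, check]) => runWithTimeout(check, timeoutMs)),
  );
  const details: Record<string, "ok" | "fail"> = {};
  entries.forEach(([name], i) => {
    details[name] = results[i] ? "ok" : "fail";
  });
  const ready = results.every((ok) => ok);
  return {
    status: ready ? 200 : 503,
    body: { status: ready ? "ready" : "not ready", checks: details },
  };
}
```

In an Express-style handler the result could be sent with `res.status(result.status).json(result.body)`, passing `checkDatabase` and `checkRedis` as the checks.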
Migration Rollback Strategies
| Strategy | Description | Use When |
|---|---|---|
| Backward-compatible | New code works with old schema | Zero-downtime deployments |
| Expand-contract | Add new, migrate, remove old | Schema changes |
| Shadow tables | Write to both during transition | High-risk migrations |
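As an illustration of the expand-contract row, consider a hypothetical rename of a `full_name` column to `display_name`. During the transition both columns exist, so application code can write both and read the new one with a fallback. The column names and helpers below are invented for the example.

```typescript
// Hypothetical row during an expand-contract migration that renames
// `full_name` to `display_name`. Both columns exist until the contract step.
interface UserRow {
  id: number;
  full_name?: string | null; // old column, dropped in the contract step
  display_name?: string | null; // new column, added in the expand step
}

// Read path: prefer the new column and fall back to the old one for rows
// that have not been backfilled yet.
function readDisplayName(row: UserRow): string {
  return row.display_name ?? row.full_name ?? "";
}

// Write path: populate both columns so old and new application versions can
// run side by side and an application rollback stays safe.
function buildWrite(id: number, name: string): UserRow {
  return { id, full_name: name, display_name: name };
}
```

Once every row is backfilled and no deployed version reads `full_name`, the contract step can drop the old column and the fallback can be removed.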
Common Backend Mistakes to Avoid
| FAIL Avoid | PASS Instead | Why |
|---|---|---|
| Storing sessions in memory | Use Redis/Upstash | Memory lost on restart, no horizontal scaling |
| Synchronous file I/O | Use fs.promises or streams | Blocks event loop, kills throughput |
| Unbounded queries | Always use LIMIT + cursor pagination | Memory exhaustion, slow responses |
| Trusting client input | Validate with Zod at API boundaries | Injection attacks, type coercion bugs |
| Hardcoded secrets | Use env vars + secret manager (Vault, AWS SM) | Security breach on repo exposure |
| N+1 database queries | Use include/select or DataLoader | 10-100x performance degradation |
| console.log in production | Use structured logging (Pino/Winston) | No correlation IDs, unqueryable logs |
| Catching errors silently | Log + rethrow or handle explicitly | Hidden failures, debugging nightmares |
| Missing connection pooling | Use Prisma connection pool or PgBouncer | Connection exhaustion under load |
| No request timeouts | Set timeouts on HTTP clients and DB queries | Resource leaks, cascading failures |
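To show what the structured-logging row buys over `console.log`, the sketch below formats one JSON log line that always carries a correlation ID and masks sensitive fields. It is a dependency-free illustration, not Pino's or Winston's API; `formatLog`, `redact`, and the `REDACTED_KEYS` list are hypothetical.

```typescript
type Level = "debug" | "info" | "warn" | "error";

// Hypothetical list of field names that must never reach the logs.
const REDACTED_KEYS = new Set(["password", "token", "authorization", "cookie"]);

// Mask sensitive top-level fields (nested objects are not inspected here).
function redact(fields: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(fields)) {
    out[key] = REDACTED_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : value;
  }
  return out;
}

// Build one JSON log line. The fixed keys come last so caller-supplied
// fields cannot overwrite the timestamp, level, or correlation ID.
function formatLog(
  level: Level,
  message: string,
  correlationId: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({
    ...redact(fields),
    time: now.toISOString(),
    level,
    correlationId,
    message,
  });
}
```

A production logger would normally handle this. Pino, for example, provides a `redact` option for masking configured paths.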
Security anti-patterns:
- FAIL Don’t use MD5/SHA1 for passwords -> Use Argon2id
- FAIL Don’t store JWTs in localStorage -> Use httpOnly cookies
- FAIL Don’t trust X-Forwarded-For without validation -> Configure trusted proxies
- FAIL Don’t skip rate limiting -> Use sliding window (Redis) or token bucket
- FAIL Don’t log sensitive data -> Redact PII, tokens, passwords
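For the rate-limiting item, a token bucket can be sketched in a few lines. The version below is a single-process illustration with an injectable clock; `TokenBucket` and `tryRemove` are hypothetical names, and a multi-instance deployment would need the bucket state in a shared store such as Redis.

```typescript
// Minimal single-process token bucket. Tokens refill continuously up to
// `capacity`; each request spends `cost` tokens or is rejected.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private capacity: number;
  private refillPerSecond: number;
  private now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = now();
  }

  // Returns true if the request may proceed, false if it should be
  // rejected (typically with HTTP 429).
  tryRemove(cost: number = 1): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = current;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```

In practice one bucket is kept per client key (API key, user ID, or validated client IP), for example in a `Map<string, TokenBucket>`.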
Optional: AI/Automation Extensions
Note: AI-assisted backend patterns. Skip if not using AI tooling.
AI-Assisted Code Generation
| Tool | Use Case |
|---|---|
| GitHub Copilot | Inline suggestions, boilerplate |
| Cursor | AI-first IDE, context-aware |
| Claude Code | CLI-based development |
Review requirements for AI-generated code:
- All imports verified against package.json
- Type checker passes (strict mode)
- Security scan passes
- Tests cover generated code
Infrastructure Economics and Business Impact
Why this matters: Backend decisions directly impact revenue. A 100ms latency increase can reduce conversions by 7%. A poorly chosen architecture can cost 10x more in cloud spend. Performance SLAs are revenue commitments.
Cost Modeling Quick Reference
| Decision | Cost Impact | Revenue Impact |
|---|---|---|
| Edge vs. Origin | 60-80% latency reduction | +2-5% conversion rate |
| Serverless vs. Containers | Variable cost, scales to zero | Better unit economics at low scale |
| Reserved vs. On-Demand | 30-60% cost savings | Predictable COGS |
| Connection pooling | 50-70% fewer DB connections | Lower database costs |
| Caching layer | 80-95% fewer origin requests | Reduced compute costs |
Performance SLA -> Revenue Mapping
SLA Target -> Business Metric
P50 latency < 100ms -> Baseline user experience
P95 latency < 500ms -> 95% of users satisfied
P99 latency < 1000ms -> Enterprise SLA compliance
Uptime 99.9% (43.8 min downtime/month) -> Standard SLA tier
Uptime 99.99% (4.4 min downtime/month) -> Enterprise tier ($$$)
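The downtime figures follow directly from the uptime percentage. A small sketch of the arithmetic, assuming an average month of 365.25 / 12 days (the function names are illustrative):

```typescript
// Average month length: 365.25 days / 12 = 30.4375 days = 43,830 minutes.
const MINUTES_PER_MONTH = (365.25 / 12) * 24 * 60;

// Minutes of downtime per month that a given uptime target permits.
function allowedDowntimeMinutes(uptimePercent: number): number {
  return ((100 - uptimePercent) / 100) * MINUTES_PER_MONTH;
}

// Round to one decimal place for display.
function roundTo1(value: number): number {
  return Math.round(value * 10) / 10;
}
```

`roundTo1(allowedDowntimeMinutes(99.9))` gives 43.8 and `roundTo1(allowedDowntimeMinutes(99.99))` gives 4.4, matching the figures in the mapping above. Each extra "nine" cuts the downtime budget by a factor of ten.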
Unit Economics Checklist
Before deploying any backend service, calculate:
- Cost per request: Total infra cost / monthly requests
- Cost per user: Total infra cost / MAU
- Gross margin impact: How does infra cost affect product margin?
- Scale economics: At 10x traffic, does cost scale linearly or worse?
- Break-even point: At what traffic level does this architecture pay for itself?
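The first three items of the checklist are simple ratios. The sketch below computes them for one service; the `ServiceCosts` shape and all figures in the usage note are made up for illustration.

```typescript
// Hypothetical monthly inputs for one backend service.
interface ServiceCosts {
  monthlyInfraCostUsd: number;
  monthlyRequests: number;
  monthlyActiveUsers: number;
  monthlyRevenueUsd: number;
}

// Compute the per-request, per-user, and share-of-revenue ratios.
function unitEconomics(c: ServiceCosts) {
  return {
    costPerRequestUsd: c.monthlyInfraCostUsd / c.monthlyRequests,
    costPerUserUsd: c.monthlyInfraCostUsd / c.monthlyActiveUsers,
    // Share of revenue spent on infrastructure; feeds into gross margin.
    infraShareOfRevenue: c.monthlyInfraCostUsd / c.monthlyRevenueUsd,
  };
}
```

For example, a service costing $5,000 per month that serves 50,000,000 requests and 20,000 monthly active users on $100,000 of revenue works out to $0.0001 per request, $0.25 per user, and 5% of revenue. Re-running the calculation with projected costs at 10x traffic shows whether cost scales linearly or worse.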
Architecture Decision -> Business Impact
| Architecture Choice | Technical Benefit | Business Impact |
|---|---|---|
| CDN + Edge caching | Lower latency | Higher conversion, better SEO |
| Read replicas | Scale reads | Handle traffic spikes without degradation |
| Queue-based processing | Decouple services | Smoother UX during high load |
| Multi-region deployment | Fault tolerance | Enterprise SLA compliance |
| Auto-scaling | Right-sized infra | Lower COGS, better margins |
FinOps Practices for Backend Teams
- Tag all resources – Every resource tagged with team, service, environment
- Set billing alerts – Alert at 50%, 80%, 100% of budget
- Review weekly – 15-minute weekly cost review meeting
- Right-size monthly – Check CPU/memory utilization, downsize overprovisioned
- Spot/Preemptible for non-prod – 60-90% savings on dev/staging
See references/infrastructure-economics.md for detailed cost modeling, cloud provider comparisons, and ROI calculators.
Navigation
Resources
- references/backend-best-practices.md – Template authoring guide, quality checklist, and shared utilities pointers
- references/edge-deployment-guide.md – Edge computing patterns, Cloudflare Workers vs Vercel Edge, tRPC, Hono, Bun
- references/infrastructure-economics.md – Cost modeling, performance SLAs -> revenue, FinOps practices, cloud optimization
- references/go-best-practices.md – Go idioms, concurrency, error handling, GORM usage, testing, profiling
- references/rust-best-practices.md – Ownership, async, Axum, SeaORM, error handling, testing
- references/python-best-practices.md – FastAPI, SQLAlchemy, async patterns, validation, testing, performance
- data/sources.json – External references per language/runtime
- Shared checklists: ../software-clean-code-standard/assets/checklists/backend-api-review-checklist.md, ../software-clean-code-standard/assets/checklists/secure-code-review-checklist.md
Shared Utilities (Centralized patterns – extract, don’t duplicate)
- ../software-clean-code-standard/utilities/auth-utilities.md – Argon2id, jose JWT, OAuth 2.1/PKCE
- ../software-clean-code-standard/utilities/error-handling.md – Effect Result types, correlation IDs
- ../software-clean-code-standard/utilities/config-validation.md – Zod 3.24+, Valibot, secrets management
- ../software-clean-code-standard/utilities/resilience-utilities.md – p-retry v6, opossum v8, OTel spans
- ../software-clean-code-standard/utilities/logging-utilities.md – pino v9 + OpenTelemetry integration
- ../software-clean-code-standard/utilities/testing-utilities.md – Vitest, MSW v2, factories, fixtures
- ../software-clean-code-standard/utilities/observability-utilities.md – OpenTelemetry SDK, tracing, metrics
- ../software-clean-code-standard/references/clean-code-standard.md – Canonical clean code rules (CC-*) for citation
Templates
- assets/nodejs/template-nodejs-prisma-postgres.md – Node.js + Prisma + PostgreSQL
- assets/go/template-go-fiber-gorm.md – Go + Fiber + GORM + PostgreSQL
- assets/rust/template-rust-axum-seaorm.md – Rust + Axum + SeaORM + PostgreSQL
- assets/python/template-python-fastapi-sqlalchemy.md – Python + FastAPI + SQLAlchemy + PostgreSQL
Related Skills
- ../software-architecture-design/SKILL.md – System decomposition, SLAs, and data flows
- ../software-security-appsec/SKILL.md – Authentication/authorization and secure API design
- ../ops-devops-platform/SKILL.md – CI/CD, infrastructure, and deployment safety
- ../qa-resilience/SKILL.md – Resilience, retries, and failure playbooks
- ../software-code-review/SKILL.md – Review checklists and standards for backend changes
- ../qa-testing-strategy/SKILL.md – Testing strategies, test pyramids, and coverage goals
- ../dev-api-design/SKILL.md – RESTful design, GraphQL, and API versioning patterns
- ../data-sql-optimization/SKILL.md – SQL optimization, indexing, and query tuning patterns
Freshness Protocol
When users ask version-sensitive recommendation questions, do a quick freshness check before asserting “best” choices or quoting versions.
Trigger Conditions
- “What’s the best backend framework for [use case]?”
- “What should I use for [API design/auth/database]?”
- “What’s the latest in Node.js/Go/Rust?”
- “Current best practices for [REST/GraphQL/tRPC]?”
- “Is [framework/runtime] still relevant in 2026?”
- “[Express] vs [Fastify] vs [Hono]?”
- “Best ORM for [database/use case]?”
How to Freshness-Check
- Start from data/sources.json (official docs, release notes, support policies).
- Run a targeted web search for the specific component and open release notes/support policy pages.
- Prefer official sources over blogs for versions and support windows.
What to Report
- Current landscape: what is stable and widely used now
- Emerging trends: what is gaining traction (and why)
- Deprecated/declining: what is falling out of favor (and why)
- Recommendation: default choice + 1-2 alternatives, with trade-offs
Example Topics (verify with fresh search)
- Node.js LTS support window and major changes
- Bun vs Deno vs Node.js
- Hono, Elysia, and edge-first frameworks
- Drizzle vs Prisma for TypeScript
- tRPC and end-to-end type safety
- Edge computing and serverless patterns
Operational Playbooks
- references/operational-playbook.md – Full backend architecture patterns, checklists, TypeScript notes, and decision tables