software-backend

📁 vasilyu1983/ai-agents-public 📅 Jan 23, 2026
Install command
npx skills add https://github.com/vasilyu1983/ai-agents-public --skill software-backend

Skill Documentation

Software Backend Engineering

Use this skill to design, implement, and review production-grade backend services: API boundaries, data layer, auth, caching, observability, error handling, testing, and deployment.

Default bias: type-safe boundaries (validation at the edge), OpenTelemetry for observability, zero-trust assumptions, idempotency for retries, RFC 9457 errors, Postgres + pooling, structured logs, timeouts, and rate limiting.


Quick Reference

| Task | Default Picks | Notes |
|---|---|---|
| REST API | Fastify / Express / NestJS | Prefer typed boundaries + explicit timeouts |
| Edge API | Hono / platform-native handlers | Keep work stateless, CPU-light |
| Type-Safe API | tRPC | Prefer for TS monorepos and internal APIs |
| GraphQL API | Apollo Server / Pothos | Prefer for complex client-driven queries |
| Database | PostgreSQL | Use pooling + migrations + query budgets |
| ORM / Query Layer | Prisma / Drizzle / SQLAlchemy / GORM / SeaORM | Prefer explicit transactions |
| Authentication | OIDC/OAuth + sessions/JWT | Prefer httpOnly cookies for browsers |
| Validation | Zod / Pydantic / validator libs | Validate at the boundary, not deep inside |
| Caching | Redis (or managed) | Use TTLs + invalidation strategy |
| Background Jobs | BullMQ / platform queues | Make jobs idempotent + retry-safe |
| Testing | Unit + integration + contract/E2E | Keep most tests below the UI layer |
| Observability | Structured logs + OpenTelemetry | Correlation IDs end-to-end |
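
The "validate at the boundary" default can be illustrated without any library. This is a minimal sketch only: the `CreateUserInput` shape, the `parseCreateUser` name, and the specific rules are hypothetical, and in practice a schema library such as Zod would likely replace the hand-written checks.

```typescript
// Hypothetical request shape, used only for illustration.
type CreateUserInput = { email: string; age?: number };

type ParseResult<T> =
  | { ok: true; value: T }
  | { ok: false; errors: string[] };

// Parse untrusted input once, at the API boundary. Code behind the
// boundary then works with CreateUserInput and never re-validates.
function parseCreateUser(body: unknown): ParseResult<CreateUserInput> {
  if (typeof body !== "object" || body === null) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const record = body as Record<string, unknown>;
  const errors: string[] = [];

  const email = record.email;
  if (typeof email !== "string" || !email.includes("@")) {
    errors.push("email must be a string containing '@'");
  }

  const age = record.age;
  if (age !== undefined && (typeof age !== "number" || !Number.isInteger(age) || age < 0)) {
    errors.push("age must be a non-negative integer when provided");
  }

  if (errors.length > 0) return { ok: false, errors };

  // Only fields that passed validation are copied into the typed value.
  const value: CreateUserInput = { email: email as string };
  if (typeof age === "number") value.age = age;
  return { ok: true, value };
}
```

A handler would call `parseCreateUser(request.body)` first and return a 400 response when `ok` is false, so unknown or malformed fields never reach the data layer.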

Scope

Use this skill to:

  • Design and implement REST/GraphQL/tRPC APIs
  • Model data schemas and run safe migrations
  • Implement authentication/authorization (OIDC/OAuth, sessions/JWT)
  • Add validation, error handling, rate limiting, caching, and background jobs
  • Ship production readiness (timeouts, observability, deploy/runbooks)

When NOT to Use This Skill

Use a different skill when:

Decision Tree: Backend Technology Selection

Backend project needs: [API Type]
  - REST API?
    - Simple CRUD -> Express/Fastify + Prisma/Drizzle
    - Enterprise features -> NestJS (DI, modules)
    - High performance -> Fastify (tight request lifecycle)
    - Edge/Serverless -> Hono (Cloudflare Workers, Vercel Edge)

  - Type-Safe API?
    - Full-stack TypeScript monorepo -> tRPC (no schema, no codegen)
    - Public API with docs -> REST + OpenAPI
    - Flexible data fetching -> GraphQL + Pothos/Apollo

  - GraphQL API?
    - Code-first -> Pothos GraphQL (TypeScript)
    - Schema-first -> Apollo Server + GraphQL Codegen

  - Runtime Selection?
    - Enterprise stable -> Node.js (current LTS)
    - Performance-critical -> Bun (verify runtime constraints)
    - Security-focused -> Deno (verify platform support)

  - Authentication Strategy?
    - Browser sessions -> httpOnly cookies + server-side session store
    - OAuth/Social -> OIDC/OAuth library (or platform auth)
    - Service-to-service -> short-lived JWT + mTLS where possible

  - Database Layer?
    - Type-safe ORM -> Prisma (migrations, Studio)
    - SQL-first/perf -> Drizzle (SQL-like API)
    - Raw SQL -> driver + query builder (Kysely/sqlc/SQLx)
    - Edge-compatible -> driver/ORM + Neon/Turso/D1

  - Caching Strategy?
    - Distributed cache -> Redis (multi-server)
    - Serverless cache -> managed Redis (e.g., Upstash)
    - In-memory cache -> process memory (single instance only)

  - Edge Deployment?
    - Global low-latency -> Cloudflare Workers
    - Next.js integration -> Vercel Edge Functions
    - AWS ecosystem -> Lambda@Edge

  - Background Jobs?
    - Complex workflows -> BullMQ (Redis-backed, retries)
    - Serverless workflows -> AWS Step Functions
    - Simple scheduling -> cron + durable storage

Runtime & Language Alternatives:

  • Node.js (current LTS) (Express/Fastify/NestJS + Prisma/Drizzle): default for broad ecosystem + mature tooling
  • Bun (Hono/Elysia + Drizzle): consider for perf-sensitive workloads (verify runtime constraints)
  • Python (FastAPI + SQLAlchemy): strong for data-heavy services and ML integration
  • Go (Fiber/Gin + GORM/sqlc): strong for concurrency and simple deploys
  • Rust (Axum + SeaORM/SQLx): strong for safety/performance-critical services

See assets/ for language-specific starter templates and references/edge-deployment-guide.md for edge computing patterns.


API Design Patterns (Dec 2025)

Idempotency Patterns

All mutating operations MUST support idempotency for retry safety.

Implementation:

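// Read the client-supplied idempotency key; refuse to proceed without one
const idempotencyKey = request.headers['idempotency-key'];
if (!idempotencyKey) throw new Error('Idempotency-Key header is required');

// Replay the stored response if this key was already processed
const cached = await redis.get(`idem:${idempotencyKey}`);
if (cached) return JSON.parse(cached);

// First request for this key: run the operation, then store the result for 24h (86400 s).
// Concurrent duplicates can still race between get and set; reserve the key
// atomically (for example with SET ... NX) if that matters for the operation.
const result = await processOperation();
await redis.set(`idem:${idempotencyKey}`, JSON.stringify(result), 'EX', 86400);
return result;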

| Do | Avoid |
|---|---|
| Store idempotency keys with TTL (24h typical) | Processing duplicate requests |
| Return cached response for duplicate keys | Different responses for same key |
| Use client-generated UUIDs | Server-generated keys |
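
The Do/Avoid rules above can also be sketched in a self-contained form that covers concurrent duplicates. This is an illustration under stated assumptions, not the skill's implementation: `InMemoryIdempotencyStore` is a hypothetical name, state lives in process memory (single instance only), and a multi-instance service would keep the same logic in Redis or the database.

```typescript
type Entry<T> = { promise: Promise<T>; expiresAt: number };

// Hypothetical single-process store: one in-flight or completed result per key.
class InMemoryIdempotencyStore<T> {
  private readonly entries = new Map<string, Entry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Runs `operation` at most once per key within the TTL. Duplicate calls,
  // including concurrent ones, receive the same promise and thus the same result.
  execute(key: string, operation: () => Promise<T>): Promise<T> {
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > this.now()) {
      return existing.promise;
    }
    const promise = operation();
    this.entries.set(key, { promise, expiresAt: this.now() + this.ttlMs });
    // Design choice in this sketch: failures are not remembered, so a retry
    // with the same key can run the operation again.
    promise.catch(() => {
      if (this.entries.get(key)?.promise === promise) this.entries.delete(key);
    });
    return promise;
  }
}
```

Storing the promise rather than the finished result is what closes the gap between "check" and "store": a second request that arrives while the first is still running joins the same execution instead of starting another.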

Pagination Patterns

| Pattern | Use When | Example |
|---|---|---|
| Cursor-based | Large datasets, real-time data | ?cursor=abc123&limit=20 |
| Offset-based | Small datasets, random access | ?page=3&per_page=20 |
| Keyset | Sorted data, high performance | ?after_id=1000&limit=20 |

Prefer cursor-based pagination for APIs with frequent inserts.
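
A minimal sketch of cursor-based pagination over an in-memory array, to show why inserts do not shift pages. The `Item` shape, `listPage`, and the cursor encoding are assumptions made for illustration; a real API would typically run the equivalent of `WHERE id > $1 ORDER BY id LIMIT $2` and might base64-encode or sign the cursor.

```typescript
type Item = { id: number; name: string };
type Page = { items: Item[]; nextCursor: string | null };

// The cursor wraps the last id the client saw. Clients treat it as opaque.
function encodeCursor(lastId: number): string {
  return encodeURIComponent(JSON.stringify({ lastId }));
}

function decodeCursor(cursor: string): number {
  const parsed = JSON.parse(decodeURIComponent(cursor));
  if (typeof parsed?.lastId !== "number") throw new Error("invalid cursor");
  return parsed.lastId;
}

function listPage(rows: Item[], limit: number, cursor?: string): Page {
  if (!Number.isInteger(limit) || limit < 1) throw new RangeError("limit must be >= 1");
  const afterId = cursor === undefined ? -Infinity : decodeCursor(cursor);

  // Fetch one extra row to learn whether another page exists.
  const candidates = [...rows]
    .sort((a, b) => a.id - b.id)
    .filter((row) => row.id > afterId)
    .slice(0, limit + 1);

  const hasMore = candidates.length > limit;
  const items = hasMore ? candidates.slice(0, limit) : candidates;
  const nextCursor = hasMore ? encodeCursor(items[items.length - 1].id) : null;
  return { items, nextCursor };
}
```

Because each page is defined by "rows after the last id seen" rather than by a position, rows inserted between requests cannot cause items to be skipped or repeated, which is the usual failure mode of offset-based paging.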

Error Response Standard (Problem Details)

Use a consistent machine-readable error format (RFC 9457 Problem Details): https://www.rfc-editor.org/rfc/rfc9457

{
  "type": "https://example.com/problems/invalid-request",
  "title": "Invalid request",
  "status": 400,
  "detail": "email is required",
  "instance": "/v1/users"
}
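
One way to produce this format consistently is a small helper that every error path goes through. The sketch below is framework-agnostic; `problemResponse` and the `HttpResponse` shape are hypothetical names, while the `application/problem+json` media type comes from RFC 9457.

```typescript
// Members shown in the example above; `detail` and `instance` are optional here.
type ProblemDetails = {
  type: string;
  title: string;
  status: number;
  detail?: string;
  instance?: string;
};

// Hypothetical framework-neutral response shape for illustration.
type HttpResponse = {
  status: number;
  headers: Record<string, string>;
  body: string;
};

// Builds a response whose HTTP status matches the `status` member
// and whose content type is the one registered by RFC 9457.
function problemResponse(problem: ProblemDetails): HttpResponse {
  return {
    status: problem.status,
    headers: { "content-type": "application/problem+json" },
    body: JSON.stringify(problem),
  };
}

const example = problemResponse({
  type: "https://example.com/problems/invalid-request",
  title: "Invalid request",
  status: 400,
  detail: "email is required",
  instance: "/v1/users",
});
```

Keeping the HTTP status and the `status` member derived from the same value avoids the common mismatch where the body says 400 but the response is sent with 500.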

Health Check Patterns

// Liveness: Is the process running?
app.get('/health/live', (req, res) => {
  res.status(200).json({ status: 'ok' });
});

// Readiness: Can the service handle traffic?
app.get('/health/ready', async (req, res) => {
  const dbOk = await checkDatabase();
  const cacheOk = await checkRedis();
  if (dbOk && cacheOk) {
    res.status(200).json({ status: 'ready', db: 'ok', cache: 'ok' });
  } else {
    res.status(503).json({ status: 'not ready', db: dbOk, cache: cacheOk });
  }
});

Migration Rollback Strategies

| Strategy | Description | Use When |
|---|---|---|
| Backward-compatible | New code works with old schema | Zero-downtime deployments |
| Expand-contract | Add new, migrate, remove old | Schema changes |
| Shadow tables | Write to both during transition | High-risk migrations |

Common Backend Mistakes to Avoid

| FAIL: Avoid | PASS: Instead | Why |
|---|---|---|
| Storing sessions in memory | Use Redis/Upstash | Memory lost on restart, no horizontal scaling |
| Synchronous file I/O | Use fs.promises or streams | Blocks event loop, kills throughput |
| Unbounded queries | Always use LIMIT + cursor pagination | Memory exhaustion, slow responses |
| Trusting client input | Validate with Zod at API boundaries | Injection attacks, type coercion bugs |
| Hardcoded secrets | Use env vars + secret manager (Vault, AWS SM) | Security breach on repo exposure |
| N+1 database queries | Use include/select or DataLoader | 10-100x performance degradation |
| console.log in production | Use structured logging (Pino/Winston) | No correlation IDs, unqueryable logs |
| Catching errors silently | Log + rethrow or handle explicitly | Hidden failures, debugging nightmares |
| Missing connection pooling | Use Prisma connection pool or PgBouncer | Connection exhaustion under load |
| No request timeouts | Set timeouts on HTTP clients and DB queries | Resource leaks, cascading failures |

Security anti-patterns:

  • FAIL Don’t use MD5/SHA1 for passwords -> Use Argon2id
  • FAIL Don’t store JWTs in localStorage -> Use httpOnly cookies
  • FAIL Don’t trust X-Forwarded-For without validation -> Configure trusted proxies
  • FAIL Don’t skip rate limiting -> Use sliding window (Redis) or token bucket
  • FAIL Don’t log sensitive data -> Redact PII, tokens, passwords
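
The token bucket option mentioned above can be sketched as follows. This is a single-process illustration with a hypothetical `TokenBucket` class and an injectable clock for testing; a service running several instances would need shared state (for example in Redis) rather than process memory.

```typescript
// Allows bursts up to `capacity` requests, refilled at `refillPerSecond`.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = now();
  }

  // Returns true and spends one token if the request is allowed.
  tryConsume(): boolean {
    const nowMs = this.now();
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    // Refill lazily based on elapsed time, never above capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A handler would typically keep one bucket per client identifier and respond with HTTP 429 when `tryConsume()` returns false.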

Optional: AI/Automation Extensions

Note: AI-assisted backend patterns. Skip if not using AI tooling.

AI-Assisted Code Generation

| Tool | Use Case |
|---|---|
| GitHub Copilot | Inline suggestions, boilerplate |
| Cursor | AI-first IDE, context-aware |
| Claude Code | CLI-based development |

Review requirements for AI-generated code:

  • All imports verified against package.json
  • Type checker passes (strict mode)
  • Security scan passes
  • Tests cover generated code

Infrastructure Economics and Business Impact

Why this matters: Backend decisions directly impact revenue. A 100ms latency increase can reduce conversions by 7%. A poorly chosen architecture can cost 10x more in cloud spend. Performance SLAs are revenue commitments.

Cost Modeling Quick Reference

| Decision | Cost Impact | Revenue Impact |
|---|---|---|
| Edge vs. Origin | 60-80% latency reduction | +2-5% conversion rate |
| Serverless vs. Containers | Variable cost, scales to zero | Better unit economics at low scale |
| Reserved vs. On-Demand | 30-60% cost savings | Predictable COGS |
| Connection pooling | 50-70% fewer DB connections | Lower database costs |
| Caching layer | 80-95% fewer origin requests | Reduced compute costs |

Performance SLA -> Revenue Mapping

SLA Target -> Business Metric

P50 latency < 100ms -> Baseline user experience
P95 latency < 500ms -> 95% of requests meet the target
P99 latency < 1000ms -> Enterprise SLA compliance
Uptime 99.9% (43.8 min downtime/month) -> Standard SLA tier
Uptime 99.99% (4.4 min downtime/month) -> Enterprise tier ($$$)
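
The downtime figures in the uptime rows follow from simple arithmetic. The sketch below assumes an average month of 365.25 / 12 days; the function name is hypothetical.

```typescript
// Minutes in an average month: (365.25 / 12) days * 24 h * 60 min = 43,830.
const MINUTES_PER_AVERAGE_MONTH = (365.25 / 12) * 24 * 60;

// Allowed downtime per month, in minutes, for a given uptime percentage.
function monthlyDowntimeBudgetMinutes(uptimePercent: number): number {
  if (uptimePercent < 0 || uptimePercent > 100) {
    throw new RangeError("uptimePercent must be between 0 and 100");
  }
  return MINUTES_PER_AVERAGE_MONTH * (1 - uptimePercent / 100);
}

const threeNines = monthlyDowntimeBudgetMinutes(99.9).toFixed(1);  // "43.8"
const fourNines = monthlyDowntimeBudgetMinutes(99.99).toFixed(1);  // "4.4"
```

Each additional nine cuts the budget by a factor of ten, which is why the step from 99.9% to 99.99% usually requires redundancy and automated failover rather than faster manual response.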

Unit Economics Checklist

Before deploying any backend service, calculate:

  • Cost per request: Total infra cost / monthly requests
  • Cost per user: Total infra cost / MAU
  • Gross margin impact: How does infra cost affect product margin?
  • Scale economics: At 10x traffic, does cost scale linearly or worse?
  • Break-even point: At what traffic level does this architecture pay for itself?
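
The first three checklist items reduce to a few divisions. The sketch below uses hypothetical field names and made-up numbers purely to show the calculation; it is not a cost model.

```typescript
// Hypothetical monthly inputs for illustration.
type UnitEconomicsInput = {
  monthlyInfraCostUsd: number;
  monthlyRequests: number;
  monthlyActiveUsers: number;
  monthlyRevenueUsd: number;
};

type UnitEconomics = {
  costPerRequestUsd: number;
  costPerUserUsd: number;
  infraShareOfRevenue: number; // fraction of revenue spent on infrastructure
};

function unitEconomics(input: UnitEconomicsInput): UnitEconomics {
  if (input.monthlyRequests <= 0 || input.monthlyActiveUsers <= 0 || input.monthlyRevenueUsd <= 0) {
    throw new RangeError("requests, users, and revenue must be positive");
  }
  return {
    costPerRequestUsd: input.monthlyInfraCostUsd / input.monthlyRequests,
    costPerUserUsd: input.monthlyInfraCostUsd / input.monthlyActiveUsers,
    infraShareOfRevenue: input.monthlyInfraCostUsd / input.monthlyRevenueUsd,
  };
}

// Made-up example: $5,000 infra, 10M requests, 50k MAU, $100k revenue.
const sample = unitEconomics({
  monthlyInfraCostUsd: 5000,
  monthlyRequests: 10_000_000,
  monthlyActiveUsers: 50_000,
  monthlyRevenueUsd: 100_000,
});
// sample.costPerRequestUsd is 0.0005, sample.costPerUserUsd is 0.1,
// and sample.infraShareOfRevenue is 0.05 (5% of revenue).
```

Re-running the same calculation with projected 10x traffic and the projected bill at that scale gives a quick read on the "scale economics" item: if cost per request rises, cost is growing faster than linearly.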

Architecture Decision -> Business Impact

| Architecture Choice | Technical Benefit | Business Impact |
|---|---|---|
| CDN + Edge caching | Lower latency | Higher conversion, better SEO |
| Read replicas | Scale reads | Handle traffic spikes without degradation |
| Queue-based processing | Decouple services | Smoother UX during high load |
| Multi-region deployment | Fault tolerance | Enterprise SLA compliance |
| Auto-scaling | Right-sized infra | Lower COGS, better margins |

FinOps Practices for Backend Teams

  1. Tag all resources – Every resource tagged with team, service, environment
  2. Set billing alerts – Alert at 50%, 80%, 100% of budget
  3. Review weekly – 15-minute weekly cost review meeting
  4. Right-size monthly – Check CPU/memory utilization, downsize overprovisioned
  5. Spot/Preemptible for non-prod – 60-90% savings on dev/staging

See references/infrastructure-economics.md for detailed cost modeling, cloud provider comparisons, and ROI calculators.


Navigation

Resources

Shared Utilities (Centralized patterns – extract, don’t duplicate)

Templates

Related Skills


Freshness Protocol

When users ask version-sensitive recommendation questions, do a quick freshness check before asserting “best” choices or quoting versions.

Trigger Conditions

  • “What’s the best backend framework for [use case]?”
  • “What should I use for [API design/auth/database]?”
  • “What’s the latest in Node.js/Go/Rust?”
  • “Current best practices for [REST/GraphQL/tRPC]?”
  • “Is [framework/runtime] still relevant in 2026?”
  • “[Express] vs [Fastify] vs [Hono]?”
  • “Best ORM for [database/use case]?”

How to Freshness-Check

  1. Start from data/sources.json (official docs, release notes, support policies).
  2. Run a targeted web search for the specific component and open release notes/support policy pages.
  3. Prefer official sources over blogs for versions and support windows.

What to Report

  • Current landscape: what is stable and widely used now
  • Emerging trends: what is gaining traction (and why)
  • Deprecated/declining: what is falling out of favor (and why)
  • Recommendation: default choice + 1-2 alternatives, with trade-offs

Example Topics (verify with fresh search)

  • Node.js LTS support window and major changes
  • Bun vs Deno vs Node.js
  • Hono, Elysia, and edge-first frameworks
  • Drizzle vs Prisma for TypeScript
  • tRPC and end-to-end type safety
  • Edge computing and serverless patterns

Operational Playbooks