lgtm
npx skills add https://github.com/pokgak/agent-skills --skill lgtm
LGTM Skill – Query Observability Backends
⚠️ CRITICAL: DO NOT RUN QUERIES DIRECTLY
STOP. Before running ANY lgtm command, you MUST spawn a subagent.
Never run lgtm commands directly in the main conversation. Raw JSON responses will bloat context.
Required Two-Phase Approach
Phase 1: DISCOVERY (haiku subagent)
First, discover what’s available instead of querying blindly:
Task tool call:
subagent_type: "Bash"
model: "haiku"
prompt: "Using lgtm CLI, discover available labels and services.
Run: lgtm loki labels
Run: lgtm loki label-values app
Run: lgtm loki label-values namespace
Return: list of available apps, namespaces, and other relevant labels."
Phase 2: INVESTIGATION (haiku subagent)
After discovery, query with specific filters:
Task tool call:
subagent_type: "Bash"
model: "haiku"
prompt: "Using lgtm CLI, investigate errors in the checkout app in prod namespace.
<specific queries based on discovery results>
Return ONLY a concise summary."
Orchestrator Pattern
- Opus (you): Coordinate the discovery → investigation flow. Evaluate summaries, decide next steps. NEVER execute queries.
- Haiku subagent: All query execution – discovery, investigation, analysis. Fast and sufficient for most tasks.
- Sonnet subagent: Reserved for complex multi-signal correlation or deep root cause analysis (user must explicitly request).
Parallel Execution
Run independent queries in parallel – spawn multiple Task calls in one message when queries don’t depend on each other (e.g., check logs AND metrics AND traces simultaneously after discovery).
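For example, after discovery the orchestrator could issue two independent Task calls in a single message. The prompts below are illustrative only (app and service names are assumptions):

```
Task tool call 1:
subagent_type: "Bash"
model: "haiku"
prompt: "Using lgtm CLI, count errors:
lgtm loki instant 'count_over_time({app="checkout"} |= "error" [15m])'
Return ONLY the count."

Task tool call 2:
subagent_type: "Bash"
model: "haiku"
prompt: "Using lgtm CLI, search error traces:
lgtm tempo search -q '{resource.service.name="checkout" && status=error}'
Return ONLY a concise summary."
```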
CLI Reference (FOR SUBAGENTS ONLY)
The following commands are for subagents to execute, NOT for direct use in main conversation.
Prerequisites
The CLI should be available via:
uvx --from git+https://github.com/pokgak/lgtm-cli lgtm --help
Configuration
Config file: ~/.config/lgtm/config.yaml
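A config might look roughly like the sketch below. This is a hypothetical illustration only; the key names, instance names, and URLs are assumptions, and the real schema is defined by the lgtm-cli repository:

```yaml
# Hypothetical sketch - actual keys are defined by lgtm-cli, not this document
default: production
instances:
  production:
    loki_url: http://loki.example.com:3100
    prom_url: http://mimir.example.com:9009
    tempo_url: http://tempo.example.com:3200
  sandbox:
    prom_url: http://localhost:8428
    # Instances inside a cluster need a kubectl port-forward before querying
    port_forward: kubectl port-forward -n monitoring svc/victoria-metrics-server 8428:8428 --context sandbox
```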
Loki (Logs)
Discovery First
# What labels exist?
lgtm loki labels
# What values for a label?
lgtm loki label-values app
lgtm loki label-values namespace
Query Logs
# Basic query (defaults: last 15 min, limit 50)
lgtm loki query '{app="myapp"}'
# Filter for errors
lgtm loki query '{app="myapp"} |= "error"'
# With custom time range and limit
lgtm loki query '{app="myapp"}' --start 2024-01-15T10:00:00Z --end 2024-01-15T11:00:00Z --limit 100
Metric Queries (Aggregations)
# Count errors - use this to get overview first
lgtm loki instant 'count_over_time({app="myapp"} |= "error" [5m])'
# Errors by level
lgtm loki instant 'sum by (level) (count_over_time({app="myapp"} | json [5m]))'
Prometheus/Mimir (Metrics)
Discovery First
# What labels exist?
lgtm prom labels
# What metrics exist?
lgtm prom label-values __name__
# Metric metadata
lgtm prom metadata --metric http_requests_total
Query Metrics
# Instant query
lgtm prom query 'up{job="prometheus"}'
# Rate of requests
lgtm prom query 'rate(http_requests_total[5m])'
# Range query (defaults: last 15 min, 60s step)
lgtm prom range 'rate(http_requests_total[5m])'
# Custom time range
lgtm prom range 'up' --start 2024-01-15T10:00:00Z --end 2024-01-15T11:00:00Z --step 5m
Tempo (Traces)
Discovery First
# What tags exist?
lgtm tempo tags
# What services?
lgtm tempo tag-values service.name
Search Traces
# Search by service (defaults: last 15 min, limit 20)
lgtm tempo search -q '{resource.service.name="api"}'
# Error traces
lgtm tempo search -q '{status=error}'
# Slow traces
lgtm tempo search --min-duration 1s
# Combined filters
lgtm tempo search -q '{resource.service.name="api" && status=error}' --min-duration 500ms
Get Specific Trace
# When you have a trace ID
lgtm tempo trace abc123def456
Instance Selection
# Use specific instance
lgtm -i production loki query '{app="api"}'
# List configured instances
lgtm instances
Kubernetes Port-Forward Instances
Some instances require kubectl port-forwarding to access services inside Kubernetes clusters.
Check if Port-Forward is Required
# List all instances and their port-forward requirements
lgtm instances
# Show port-forward commands for all instances that need them
lgtm port-forward
# Show port-forward command for specific instance
lgtm -i sandbox port-forward
Using Port-Forward Instances
Before querying an instance that requires port-forwarding, start the tunnel:
# 1. Get the port-forward command
lgtm -i sandbox port-forward
# Output: kubectl port-forward -n monitoring svc/victoria-metrics-server 8428:8428 --context sandbox
# 2. Start the tunnel (in background or separate terminal)
kubectl port-forward -n monitoring svc/victoria-metrics-server 8428:8428 --context sandbox &
# 3. Query the instance
lgtm -i sandbox prom query 'up'
Subagent Prompt Example for Port-Forward Instances
When querying instances that require port-forwarding:
Task tool call:
subagent_type: "Bash"
model: "haiku"
prompt: "Query sandbox cluster metrics using lgtm CLI.
1. First check if port-forward is needed:
uvx --from git+https://github.com/pokgak/lgtm-cli lgtm -i sandbox port-forward
2. If port-forward is needed, start it in background:
kubectl port-forward -n monitoring svc/victoria-metrics-server 8428:8428 --context sandbox &
sleep 2 # Wait for tunnel to establish
3. Run the query:
uvx --from git+https://github.com/pokgak/lgtm-cli lgtm -i sandbox prom query 'sandbox_running_count'
4. Return a summary of the results."
Best Practices Workflow
1. Discover → Filter → Query
# Step 1: What's available?
lgtm loki labels
lgtm loki label-values app
# Step 2: Get overview with aggregation
lgtm loki instant 'sum by (app) (count_over_time({namespace="prod"} |= "error" [15m]))'
# Step 3: Narrow down to specific app
lgtm loki query '{namespace="prod", app="checkout"} |= "error"' --limit 20
2. Use Specific Identifiers
# If you have a trace ID, fetch directly
lgtm tempo trace abc123def456
# Filter logs by request ID
lgtm loki query '{app="api"} |= "request_id=abc123"'
# Filter by pod name
lgtm loki query '{pod="api-server-xyz123"}'
3. Aggregations Over Raw Data
# BAD: Fetching all error logs
lgtm loki query '{app="api"} |= "error"'
# GOOD: Count first, then drill down
lgtm loki instant 'count_over_time({app="api"} |= "error" [5m])'
Subagent Prompt Examples
Example: Discovery (run this FIRST)
Use Task tool with subagent_type: "Bash" and model: "haiku":
Discover available observability data using lgtm CLI.
1. Get Loki labels: lgtm loki labels
2. Get app values: lgtm loki label-values app
3. Get namespace values: lgtm loki label-values namespace
4. Get Tempo services: lgtm tempo tag-values service.name
Return a concise list:
- Available apps: [list]
- Available namespaces: [list]
- Available services in traces: [list]
- Any other relevant labels discovered
Example: Investigate Error Spike (after discovery)
Use Task tool with subagent_type: "Bash" and model: "haiku":
Investigate errors in the checkout service over the last hour using the lgtm CLI.
1. First get error counts: lgtm loki instant 'sum by (level) (count_over_time({app="checkout"} | json [1h]))'
2. If errors found, get sample logs: lgtm loki query '{app="checkout"} |= "error"' --limit 30
3. Check for related traces: lgtm tempo search -q '{resource.service.name="checkout" && status=error}'
Summarize findings:
- Total error count and trend (up/down from normal)
- Top 3 most frequent error messages
- When the errors started
- Affected components/pods
- Any correlated trace IDs for debugging
Return ONLY the summary, not raw JSON output.
Example: Service Health Check
Use Task tool with subagent_type: "Bash" and model: "haiku":
Check health of the payment-service using lgtm CLI.
1. Error rate: lgtm loki instant 'sum(count_over_time({app="payment-service"} |= "error" [15m]))'
2. Request latency: lgtm prom query 'histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{service="payment"}[5m]))'
3. Recent errors: lgtm loki query '{app="payment-service"} |= "error"' --limit 10
Return a brief health summary:
- Status: healthy/degraded/unhealthy
- Error rate (errors per minute)
- P95 latency
- Any critical issues found
Example: Trace Investigation
Use Task tool with subagent_type: "Bash" and model: "haiku":
Investigate slow requests in the API gateway using lgtm CLI.
1. Find slow traces: lgtm tempo search -q '{resource.service.name="api-gateway"}' --min-duration 2s --limit 10
2. For the slowest trace, get details: lgtm tempo trace <traceID>
3. Check if downstream services are slow: lgtm tempo search -q '{resource.service.name="api-gateway"} >> {duration > 1s}'
Summarize:
- How many slow requests in the last 15 min
- Which downstream service is causing delays
- Common patterns in slow requests
NEVER paste raw JSON output into the main conversation. The subagent processes all data and returns only a concise summary. This is critical for maintaining context efficiency.
Output Formatting
All commands output JSON. Use jq for formatting:
# Extract just log lines
lgtm loki query '{app="api"}' | jq -r '.data.result[].values[][] | select(type == "string")'
# Extract metric values
lgtm prom query 'up' | jq -r '.data.result[] | "\(.metric.instance): \(.value[1])"'
# Trace summary
lgtm tempo search -q '{status=error}' | jq -r '.traces[] | "\(.traceID) | \(.rootServiceName) | \(.durationMs)ms"'
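When iterating on a jq filter, it can help to test it against a canned payload before running a real query. The JSON below is a hypothetical Prometheus-style instant-query response; lgtm's actual output shape may differ:

```shell
# Hypothetical payload mirroring a Prometheus instant-query response;
# the real shape of lgtm's output may differ.
echo '{"data":{"result":[{"metric":{"instance":"host1:9090"},"value":[1700000000,"1"]}]}}' |
  jq -r '.data.result[] | "\(.metric.instance): \(.value[1])"'
# → host1:9090: 1
```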
Reference
For query syntax, see:
reference/logql.md – LogQL syntax for Loki
reference/promql.md – PromQL syntax for Prometheus
reference/traceql.md – TraceQL syntax for Tempo