ai-tracing-requests
Install command
npx skills add https://github.com/lebsral/dspy-programming-not-prompting-lms-skills --skill ai-tracing-requests
See What Your AI Did on a Specific Request
Guide the user through tracing and debugging individual AI requests. The goal: for any request, see every LM call, retrieval step, intermediate result, token count, and latency.
When you need this
- A customer reports a wrong answer and you need to see exactly what happened
- Your pipeline is slow and you need to find which step is the bottleneck
- Compliance requires audit trails of every AI decision
- QA wants to inspect AI behavior before launch
- You’re debugging why an agent took unexpected actions
How it’s different from monitoring
| | Monitoring (/ai-monitoring) | Tracing (this skill) |
|---|---|---|
| Scope | Aggregate health across all requests | Single request, full detail |
| Question answered | “Is accuracy dropping this week?” | “Why did customer #12345 get a wrong answer at 2:14pm?” |
| Output | Scores, trends, alerts | Call traces, intermediate results, latencies |
| Timing | Periodic batch evaluation | Per-request, real-time |
Step 1: Understand the need
Quick decision tree:
What are you debugging?
|
+- A specific wrong answer right now?
|    -> Step 2: Quick debugging with dspy.inspect_history
|
+- Need to trace requests in a running app?
|    -> Steps 3-4: Add per-step tracing
|
+- Need a visual trace viewer for your team?
|    -> Step 5: Connect Langtrace, Phoenix, or Jaeger
|
+- Need to find patterns across many traces?
     -> Step 6: Search and filter traces
Step 2: Quick debugging (no extra tools needed)
Inspect the last LM calls
The fastest way to see what happened:
import dspy
# Run your program
result = my_program(question="What is our refund policy?")
# See the last 5 LM calls: shows full prompts and responses
dspy.inspect_history(n=5)
This shows:
- The full prompt sent to the LM (including system message, few-shot examples, input)
- The LM’s raw response
- How DSPy parsed the response into fields
Time individual steps
import time
result = my_program(question="test")
# Quick manual timing
start = time.time()
step1_result = my_program.step1(question="test")
step1_time = time.time() - start
print(f"Step 1: {step1_time:.2f}s")
start = time.time()
step2_result = my_program.step2(context=step1_result.context, question="test")
step2_time = time.time() - start
print(f"Step 2: {step2_time:.2f}s")
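The start/stop pattern above repeats for every step, so it can be wrapped in a small context manager. This is a minimal sketch using only the standard library; `timed` and `timings` are illustrative names, and the `time.sleep` calls stand in for real calls such as `my_program.step1(...)`.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(name, timings):
    """Record the wall-clock duration of the enclosed block, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the step raises, so failed steps are still timed.
        timings[name] = round((time.perf_counter() - start) * 1000)

# Stand-in steps: replace the sleeps with your own pipeline calls.
timings = {}
with timed("step1", timings):
    time.sleep(0.01)
with timed("step2", timings):
    time.sleep(0.02)

for name, ms in timings.items():
    print(f"{name}: {ms}ms")
```

`time.perf_counter()` is used here instead of `time.time()` because it is monotonic, so a system clock adjustment cannot produce a negative or inflated duration.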
JSONL trace logging
For persistent traces without any extra dependencies:
import json
import time
from datetime import datetime
import dspy
class TracedProgram(dspy.Module):
    """Wraps any DSPy program to log one trace entry per request to JSONL."""
    def __init__(self, program, log_path="traces.jsonl"):
        super().__init__()
        self.program = program
        self.log_path = log_path
    def forward(self, **kwargs):
        trace_id = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
        start = time.time()
        result = self.program(**kwargs)
        total_time = time.time() - start
        # Log the trace
        entry = {
            "trace_id": trace_id,
            "timestamp": datetime.now().isoformat(),
            "inputs": {k: str(v) for k, v in kwargs.items()},
            "outputs": {k: str(getattr(result, k, "")) for k in result.keys()},
            "total_latency_ms": round(total_time * 1000),
        }
        with open(self.log_path, "a") as f:
            f.write(json.dumps(entry) + "\n")
        return result
# Use it
traced = TracedProgram(my_program)
result = traced(question="How do refunds work?")
Step 3: Per-step tracing in pipelines
For multi-step pipelines, trace each stage separately to see exactly where things go wrong:
import json
import time
import uuid
from datetime import datetime
class StepTracer:
    """Collects per-step timing and intermediate results."""
    def __init__(self):
        self.steps = []
        self.trace_id = str(uuid.uuid4())[:8]
    def trace_step(self, name, func, **kwargs):
        """Run a step and record its inputs, outputs, and latency."""
        start = time.time()
        result = func(**kwargs)
        latency = time.time() - start
        self.steps.append({
            "step": name,
            # Truncate values so large prompts or passages do not bloat the log
            "inputs": {k: str(v)[:200] for k, v in kwargs.items()},
            "outputs": {k: str(getattr(result, k, ""))[:200] for k in result.keys()},
            "latency_ms": round(latency * 1000),
        })
        return result
    def summary(self):
        """Print a summary of all traced steps."""
        print(f"Trace {self.trace_id}:")
        total = sum(s["latency_ms"] for s in self.steps)
        for step in self.steps:
            pct = step["latency_ms"] / total * 100 if total > 0 else 0
            print(f" {step['step']}: {step['latency_ms']}ms ({pct:.0f}%)")
        print(f" Total: {total}ms")
    def to_dict(self):
        return {
            "trace_id": self.trace_id,
            "timestamp": datetime.now().isoformat(),
            "steps": self.steps,
            "total_latency_ms": sum(s["latency_ms"] for s in self.steps),
        }
# Use in a pipeline
class TracedRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)
        self.answer = dspy.ChainOfThought("context, question -> answer")
    def forward(self, question):
        tracer = StepTracer()
        retrieval = tracer.trace_step("retrieve", self.retrieve, query=question)
        answer = tracer.trace_step(
            "answer", self.answer,
            context=retrieval.passages, question=question,
        )
        tracer.summary()
        # Example output:
        # Trace a1b2c3d4:
        #  retrieve: 120ms (15%)
        #  answer: 680ms (85%)
        #  Total: 800ms
        return answer
Save traces for later analysis
def save_trace(tracer, path="traces.jsonl"):
    with open(path, "a") as f:
        f.write(json.dumps(tracer.to_dict()) + "\n")
# Load and analyze traces
def load_traces(path="traces.jsonl"):
    with open(path) as f:
        return [json.loads(line) for line in f]
def find_slow_traces(traces, threshold_ms=2000):
    return [t for t in traces if t["total_latency_ms"] > threshold_ms]
def find_failed_steps(traces):
    return [
        t for t in traces
        if any("error" in str(s.get("outputs", "")).lower() for s in t["steps"])
    ]
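`find_failed_steps` looks for the word "error" in step outputs, but `trace_step` as written records nothing when a step raises, because the exception propagates before the step is appended. One way to close that gap is to record the exception and re-raise it. This is a minimal sketch, not part of the original skill: `trace_step_safe` and `flaky_retrieve` are hypothetical names, and the record stores a plain `str(result)` rather than per-field outputs.

```python
import time

def trace_step_safe(steps, name, func, **kwargs):
    """Run func(**kwargs) and append a step record whether it succeeds or raises."""
    record = {"step": name, "inputs": {k: str(v)[:200] for k, v in kwargs.items()}}
    start = time.perf_counter()
    try:
        result = func(**kwargs)
        record["outputs"] = {"result": str(result)[:200]}
        return result
    except Exception as exc:
        # Store the failure under an explicit "error" key, then re-raise
        # so the caller still sees the original exception.
        record["outputs"] = {"error": f"{type(exc).__name__}: {exc}"}
        raise
    finally:
        record["latency_ms"] = round((time.perf_counter() - start) * 1000)
        steps.append(record)

# Hypothetical failing step, used only to exercise the error path.
def flaky_retrieve(query):
    raise TimeoutError("retriever timed out")

steps = []
try:
    trace_step_safe(steps, "retrieve", flaky_retrieve, query="refund policy")
except TimeoutError:
    pass

failed = [s for s in steps if "error" in s["outputs"]]
print(failed[0]["outputs"]["error"])  # TimeoutError: retriever timed out
```

Checking for an `"error"` key, as in the last lines, avoids false positives from successful outputs that merely contain the word "error".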
Step 4: OpenTelemetry instrumentation
For production tracing with any backend (Jaeger, Zipkin, Datadog, etc.):
import json
import time
import dspy
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
# Setup: do this once at app startup
# No exporter is attached here, so spans are not sent anywhere until a
# span processor is added (see Step 5, Option C)
provider = TracerProvider()
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("my-ai-app")
class OTelTracedProgram(dspy.Module):
    """Wraps a DSPy program with OpenTelemetry spans."""
    def __init__(self, program):
        super().__init__()
        self.program = program
    def forward(self, **kwargs):
        with tracer.start_as_current_span("ai_request") as span:
            span.set_attribute("ai.inputs", json.dumps({k: str(v) for k, v in kwargs.items()}))
            start = time.time()
            result = self.program(**kwargs)
            latency = time.time() - start
            span.set_attribute("ai.latency_ms", round(latency * 1000))
            span.set_attribute("ai.outputs", json.dumps(
                {k: str(getattr(result, k, "")) for k in result.keys()}
            ))
            return result
Trace individual pipeline steps with OTel
class OTelTracedRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)
        self.answer = dspy.ChainOfThought("context, question -> answer")
    def forward(self, question):
        # Child spans nest under the parent, so the viewer shows per-step latency
        with tracer.start_as_current_span("rag_pipeline") as parent:
            parent.set_attribute("question", question)
            with tracer.start_as_current_span("retrieve"):
                retrieval = self.retrieve(query=question)
            with tracer.start_as_current_span("generate_answer"):
                answer = self.answer(
                    context=retrieval.passages, question=question
                )
            return answer
Step 5: Connect a trace viewer
Option A: Langtrace (best DSPy integration)
First-class DSPy auto-instrumentation, with one line to trace all LM calls:
pip install langtrace-python-sdk
from langtrace_python_sdk import langtrace
langtrace.init(api_key="your-key") # or use LANGTRACE_API_KEY env var
# That's it: all DSPy calls are now traced automatically
result = my_program(question="test")
# View traces at app.langtrace.ai
Option B: Arize Phoenix (open-source, self-hosted)
pip install arize-phoenix openinference-instrumentation-dspy
import phoenix as px
from openinference.instrumentation.dspy import DSPyInstrumentor
# Launch local trace viewer
px.launch_app() # Opens at http://localhost:6006
# Auto-instrument DSPy
DSPyInstrumentor().instrument()
# All DSPy calls are now traced
result = my_program(question="test")
Option C: Jaeger (open-source, Docker)
docker run -d -p 16686:16686 -p 4317:4317 jaegertracing/all-in-one:latest
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
# Export spans to Jaeger
exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(exporter))
# View traces at http://localhost:16686
Comparison
| Feature | Langtrace | Arize Phoenix | Jaeger |
|---|---|---|---|
| DSPy auto-instrumentation | Yes (built-in) | Yes (plugin) | Manual |
| Setup effort | One line | Two lines + Docker | Docker + manual spans |
| Self-hosted option | Yes | Yes | Yes |
| Cloud option | Yes | Yes | No |
| LM call details | Prompts, tokens, cost | Prompts, tokens | Custom attributes |
| Best for | DSPy-first teams | Teams wanting open-source + UI | Teams already using Jaeger |
Step 6: Search and filter traces
Find traces by criteria
def search_traces(traces, **filters):
    """Search traces by time range, latency, or content."""
    results = traces
    if "min_latency_ms" in filters:
        results = [t for t in results if t["total_latency_ms"] >= filters["min_latency_ms"]]
    # Timestamps are compared as ISO 8601 strings, so all must share one format
    if "after" in filters:
        results = [t for t in results if t["timestamp"] >= filters["after"]]
    if "before" in filters:
        results = [t for t in results if t["timestamp"] <= filters["before"]]
    if "contains" in filters:
        keyword = filters["contains"].lower()
        results = [
            t for t in results
            if keyword in json.dumps(t).lower()
        ]
    return results
# Find slow requests from today
slow = search_traces(
load_traces(),
min_latency_ms=3000,
after="2025-01-15T00:00:00",
)
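`search_traces` compares timestamps as strings, which works as long as every timestamp is written in the same ISO 8601 format by `datetime.now().isoformat()`. If formats can vary, parsing before comparing is safer. This is a minimal sketch; `in_time_range` is a hypothetical helper and the sample traces are made up for illustration.

```python
from datetime import datetime

def in_time_range(trace, after=None, before=None):
    """Return True if the trace timestamp falls within [after, before]."""
    ts = datetime.fromisoformat(trace["timestamp"])
    if after is not None and ts < datetime.fromisoformat(after):
        return False
    if before is not None and ts > datetime.fromisoformat(before):
        return False
    return True

# Made-up traces covering the day before, the day of, and the day after.
traces = [
    {"trace_id": "a", "timestamp": "2025-01-14T23:59:59.500000"},
    {"trace_id": "b", "timestamp": "2025-01-15T09:30:00"},
    {"trace_id": "c", "timestamp": "2025-01-16T00:00:01"},
]
today = [
    t["trace_id"] for t in traces
    if in_time_range(t, after="2025-01-15T00:00:00", before="2025-01-15T23:59:59")
]
print(today)  # ['b']
```

Python raises `TypeError` when a timezone-aware datetime is compared with a naive one, so logged timestamps and filter bounds should either all carry a UTC offset or all omit it.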
Aggregate trace statistics
def trace_stats(traces):
    """Summary statistics across traces."""
    latencies = [t["total_latency_ms"] for t in traces]
    if not latencies:
        return "No traces found"
    latencies.sort()
    return {
        "count": len(latencies),
        "p50_ms": latencies[len(latencies) // 2],
        "p95_ms": latencies[int(len(latencies) * 0.95)],
        "p99_ms": latencies[int(len(latencies) * 0.99)],
        "max_ms": latencies[-1],
    }
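`trace_stats` summarizes whole-request latency. The same idea can be applied per step to show which stage dominates across many traces. This is a minimal sketch that assumes each trace has the shape produced by `StepTracer.to_dict()` (a `steps` list whose items have `step` and `latency_ms`); `step_stats` is a hypothetical name and the sample data is invented.

```python
from collections import defaultdict
from statistics import median

def step_stats(traces):
    """Group step latencies by step name and summarize each group."""
    by_step = defaultdict(list)
    for t in traces:
        # Traces without a "steps" list (request-level only) are skipped.
        for s in t.get("steps", []):
            by_step[s["step"]].append(s["latency_ms"])
    return {
        name: {"count": len(vals), "median_ms": median(vals), "max_ms": max(vals)}
        for name, vals in by_step.items()
    }

# Invented sample traces for illustration.
sample = [
    {"steps": [{"step": "retrieve", "latency_ms": 120}, {"step": "answer", "latency_ms": 680}]},
    {"steps": [{"step": "retrieve", "latency_ms": 200}, {"step": "answer", "latency_ms": 900}]},
    {"steps": [{"step": "retrieve", "latency_ms": 100}, {"step": "answer", "latency_ms": 700}]},
]
stats = step_stats(sample)
print(stats["retrieve"])  # {'count': 3, 'median_ms': 120, 'max_ms': 200}
```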
Step 7: Use traces to improve your AI
Traces aren’t just for debugging; they’re a source of improvement.
Find patterns in wrong answers
from collections import Counter
# Load traces where the answer was marked wrong by a user or metric
# (assumes each saved trace carries an "is_correct" field; the tracers above do not add one)
wrong_traces = search_traces(load_traces(), contains='"is_correct": false')
# Count which step is most often the slowest in those traces
slow_steps = Counter()
for t in wrong_traces:
    slowest = max(t["steps"], key=lambda s: s["latency_ms"])
    slow_steps[slowest["step"]] += 1
print(slow_steps)
# Counter({'retrieve': 23, 'answer': 7})
# -> Retrieval is usually the slowest step in wrong-answer traces, so inspect its outputs first
Build training data from failures
# Extract failed examples for re-optimization
failed_examples = []
for t in wrong_traces:
    ex = dspy.Example(
        question=t.get("inputs", {}).get("question", ""),
    ).with_inputs("question")
    failed_examples.append(ex)
# Add to training set and re-optimize
# See /ai-improving-accuracy
Key patterns
- Start with dspy.inspect_history: it’s free and solves most debugging needs
- Add JSONL tracing before you need it: you can’t debug traces you didn’t log
- Trace at the step level, not just the request level: per-step latency reveals bottlenecks
- Use OpenTelemetry for production: it’s the standard and works with any backend
- Langtrace is easiest for DSPy: one-line setup with automatic instrumentation
- Traces feed optimization: patterns in wrong answers tell you what to fix
Additional resources
- For worked examples, see examples.md
- Use /ai-monitoring for aggregate health checks across all requests
- Use /ai-fixing-errors for code-level debugging (crashes, config issues)
- Use /ai-building-pipelines to structure pipelines that are easy to trace
- Use /ai-improving-accuracy to optimize based on patterns found in traces