dotnet-ci-benchmarking
npx skills add https://github.com/novotnyllc/dotnet-artisan --skill dotnet-ci-benchmarking
Skill documentation
dotnet-ci-benchmarking
Continuous benchmarking guidance for detecting performance regressions in CI pipelines. Covers baseline file management with BenchmarkDotNet JSON exporters, GitHub Actions workflows for artifact-based baseline comparison, regression detection patterns with configurable thresholds, and alerting strategies for performance degradation.
Version assumptions: BenchmarkDotNet v0.14+ for JSON export, GitHub Actions runner environment. Examples use actions/upload-artifact@v4 and actions/download-artifact@v4.
Scope
- Baseline file management with BenchmarkDotNet JSON exporters
- GitHub Actions workflows for artifact-based baseline comparison
- Regression detection with configurable thresholds
- Alerting strategies for performance degradation
Out of scope
- BenchmarkDotNet setup and benchmark class design — see [skill:dotnet-benchmarkdotnet]
- Performance architecture patterns — see [skill:dotnet-performance-patterns]
- Profiling tools (dotnet-counters, dotnet-trace, dotnet-dump) — see [skill:dotnet-profiling]
- OpenTelemetry metrics and distributed tracing — see [skill:dotnet-observability]
- Composable CI/CD workflow design — see [skill:dotnet-gha-patterns]
Cross-references: [skill:dotnet-benchmarkdotnet] for benchmark class setup and JSON exporter configuration, [skill:dotnet-observability] for correlating benchmark regressions with runtime metrics changes, [skill:dotnet-gha-patterns] for composable workflow patterns (reusable workflows, composite actions, matrix builds).
Baseline File Management
BenchmarkDotNet JSON Export
BenchmarkDotNet’s JSON exporter produces machine-readable results for automated comparison. Configure the exporter in benchmark classes:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Exporters.Json;
[JsonExporterAttribute.Full]
[MemoryDiagnoser]
public class CriticalPathBenchmarks
{
[Benchmark(Baseline = true)]
public void ProcessOrder() { /* ... */ }
[Benchmark]
public void ProcessOrderOptimized() { /* ... */ }
}
Alternatively, configure the exporter once in a custom config that applies to all benchmark classes:
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Exporters.Json;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
var config = ManualConfig.Create(DefaultConfig.Instance)
.AddJob(Job.ShortRun) // fewer iterations for CI speed
.AddExporter(JsonExporter.Full)
.WithArtifactsPath("./benchmark-results");
BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args, config);
JSON Export Structure
The exported JSON file (*-report-full.json) contains structured benchmark results:
{
"Title": "CriticalPathBenchmarks",
"Benchmarks": [
{
"FullName": "MyApp.Benchmarks.CriticalPathBenchmarks.ProcessOrder",
"Statistics": {
"Mean": 1234.5678,
"Median": 1230.1234,
"StandardDeviation": 15.234,
"StandardError": 4.812
},
"Memory": {
"BytesAllocatedPerOperation": 1024,
"Gen0Collections": 0.0012,
"Gen1Collections": 0,
"Gen2Collections": 0
}
}
]
}
Key fields for regression comparison:
| Field | Purpose |
|---|---|
| `Statistics.Mean` | Average execution time (nanoseconds) |
| `Statistics.Median` | Middle execution time (more robust to outliers) |
| `Statistics.StandardDeviation` | Measurement variability |
| `Memory.BytesAllocatedPerOperation` | GC allocation per operation |
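For quick inspection in CI logs, a `jq` query against the exported report pulls the same fields (a sketch; `jq` is preinstalled on GitHub-hosted runners, and the report file name depends on your benchmark class):
# Print name, mean time, and allocated bytes for each benchmark in the full JSON report
jq '.Benchmarks[] | {name: .FullName, mean_ns: .Statistics.Mean, allocated_bytes: .Memory.BytesAllocatedPerOperation}' \
  BenchmarkDotNet.Artifacts/results/CriticalPathBenchmarks-report-full.json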
Baseline Storage Strategies
| Strategy | Pros | Cons | Best For |
|---|---|---|---|
| Git-committed baseline file | Versioned, auditable, no external deps | Repo size grows; must update deliberately | Small benchmark suites, stable hardware |
| GitHub Actions artifacts | No repo bloat; automatic retention | 90-day default retention; cross-workflow access requires tokens | Large benchmark suites, shared runners |
| External storage (S3/Azure Blob) | Unlimited history; cross-repo sharing | Extra infrastructure; credential management | Multi-repo benchmark comparison |
This skill focuses on the GitHub Actions artifact strategy as the default. For composable workflow patterns and reusable actions, see [skill:dotnet-gha-patterns].
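If you choose the git-committed baseline strategy instead, the comparison step simply points at the in-repo baseline directory. A minimal sketch (paths are illustrative, reusing the comparison script shown later in this skill):
- name: Compare with committed baseline
  shell: bash
  run: |
    set -euo pipefail
    python3 scripts/compare-benchmarks.py \
      --baseline benchmarks/baseline \
      --current benchmarks/BenchmarkDotNet.Artifacts/results \
      --threshold 10 \
      --output benchmark-comparison.md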
GitHub Actions Benchmark Workflow
Basic Benchmark Workflow
name: Benchmarks
on:
pull_request:
paths:
- 'src/**'
- 'benchmarks/**'
workflow_dispatch:
permissions:
contents: read
actions: read # required for artifact download
jobs:
benchmark:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup .NET
uses: actions/setup-dotnet@v4
with:
dotnet-version: '8.0.x'
- name: Run benchmarks
run: dotnet run -c Release --project benchmarks/MyBenchmarks.csproj -- --exporters json
- name: Upload benchmark results
uses: actions/upload-artifact@v4
with:
name: benchmark-results-${{ github.sha }}
path: benchmarks/BenchmarkDotNet.Artifacts/results/
retention-days: 90
Baseline Comparison Workflow
This workflow downloads the baseline from a previous run and compares against current results:
name: Benchmark Regression Check
on:
  push:
    branches: [main]
    paths:
      - 'src/**'
      - 'benchmarks/**'
  pull_request:
    paths:
      - 'src/**'
      - 'benchmarks/**'
permissions:
contents: read
actions: read
env:
BENCHMARK_PROJECT: benchmarks/MyBenchmarks.csproj
RESULTS_DIR: benchmarks/BenchmarkDotNet.Artifacts/results
jobs:
benchmark:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup .NET
uses: actions/setup-dotnet@v4
with:
dotnet-version: '8.0.x'
- name: Download baseline results
uses: actions/download-artifact@v4
with:
name: benchmark-baseline
path: ./baseline-results
continue-on-error: true
id: download-baseline
- name: Run benchmarks
run: dotnet run -c Release --project ${{ env.BENCHMARK_PROJECT }} -- --exporters json
- name: Compare with baseline
if: steps.download-baseline.outcome == 'success'
shell: bash
run: |
set -euo pipefail
python3 scripts/compare-benchmarks.py \
--baseline ./baseline-results \
--current "${{ env.RESULTS_DIR }}" \
--threshold 10 \
--output benchmark-comparison.md
- name: Upload current results as new baseline
if: github.ref == 'refs/heads/main'
uses: actions/upload-artifact@v4
with:
name: benchmark-baseline
path: ${{ env.RESULTS_DIR }}/
retention-days: 90
overwrite: true
- name: Upload comparison report
if: steps.download-baseline.outcome == 'success'
uses: actions/upload-artifact@v4
with:
name: benchmark-comparison-${{ github.sha }}
path: benchmark-comparison.md
retention-days: 30
Key design decisions:
- `continue-on-error: true` on baseline download handles the first run (no baseline exists yet); see the cross-run download sketch below
- Baseline is only updated from `main` branch merges to prevent PR branches from polluting the baseline
- `overwrite: true` replaces the previous baseline artifact
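Note that actions/download-artifact@v4 only sees artifacts from the current workflow run by default, so pulling a baseline produced by an earlier run on main requires targeting that run explicitly via its run-id and github-token inputs. A minimal sketch using the gh CLI to locate the latest successful main run (the workflow name and step ids are illustrative):
- name: Find latest baseline run on main
  id: baseline-run
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # Look up the most recent successful run of this workflow on main
    run_id=$(gh run list --workflow "Benchmark Regression Check" --branch main \
      --status success --limit 1 --json databaseId --jq '.[0].databaseId')
    echo "run-id=$run_id" >> "$GITHUB_OUTPUT"
- name: Download baseline results
  id: download-baseline
  uses: actions/download-artifact@v4
  continue-on-error: true
  with:
    name: benchmark-baseline
    path: ./baseline-results
    run-id: ${{ steps.baseline-run.outputs.run-id }}
    github-token: ${{ github.token }}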
For converting these inline workflows into reusable workflow_call patterns, see [skill:dotnet-gha-patterns].
Regression Detection Patterns
Threshold-Based Comparison
Compare current benchmark results against baseline using percentage thresholds. A regression is flagged when the current mean exceeds the baseline mean by more than the configured threshold:
#!/usr/bin/env python3
"""compare-benchmarks.py -- Detect benchmark regressions from BenchmarkDotNet JSON exports."""
import json
import sys
from pathlib import Path
def load_benchmarks(results_dir: str) -> dict:
"""Load benchmark results from BenchmarkDotNet JSON export files."""
benchmarks = {}
for json_file in Path(results_dir).glob("*-report-full.json"):
with open(json_file) as f:
data = json.load(f)
for bm in data.get("Benchmarks", []):
name = bm["FullName"]
benchmarks[name] = {
"mean": bm["Statistics"]["Mean"],
"median": bm["Statistics"]["Median"],
"stddev": bm["Statistics"]["StandardDeviation"],
"allocated": bm.get("Memory", {}).get("BytesAllocatedPerOperation", 0),
}
return benchmarks
def compare(baseline_dir: str, current_dir: str, threshold_pct: float) -> list:
"""Compare current results against baseline. Returns list of regressions."""
baseline = load_benchmarks(baseline_dir)
current = load_benchmarks(current_dir)
regressions = []
for name, curr in current.items():
if name not in baseline:
continue # new benchmark, no comparison possible
base = baseline[name]
if base["mean"] == 0:
continue # avoid division by zero
time_change_pct = ((curr["mean"] - base["mean"]) / base["mean"]) * 100
alloc_change = curr["allocated"] - base["allocated"]
if time_change_pct > threshold_pct:
regressions.append({
"name": name,
"baseline_mean": base["mean"],
"current_mean": curr["mean"],
"change_pct": time_change_pct,
"alloc_change": alloc_change,
})
return regressions
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description="Compare BenchmarkDotNet results")
parser.add_argument("--baseline", required=True, help="Path to baseline results directory")
parser.add_argument("--current", required=True, help="Path to current results directory")
parser.add_argument("--threshold", type=float, default=10.0,
help="Regression threshold percentage (default: 10)")
parser.add_argument("--output", default="comparison.md", help="Output markdown file")
args = parser.parse_args()
regressions = compare(args.baseline, args.current, args.threshold)
with open(args.output, "w") as f:
if regressions:
f.write("## Benchmark Regressions Detected\n\n")
f.write("| Benchmark | Baseline (ns) | Current (ns) | Change | Alloc Delta |\n")
f.write("|-----------|--------------|-------------|--------|-------------|\n")
for r in regressions:
f.write(f"| `{r['name']}` | {r['baseline_mean']:.2f} | "
f"{r['current_mean']:.2f} | +{r['change_pct']:.1f}% | "
f"{r['alloc_change']:+d} B |\n")
f.write(f"\nThreshold: {args.threshold}%\n")
else:
f.write("## Benchmark Results\n\nNo regressions detected ")
f.write(f"(threshold: {args.threshold}%).\n")
if regressions:
print(f"REGRESSION: {len(regressions)} benchmark(s) exceeded "
f"{args.threshold}% threshold", file=sys.stderr)
sys.exit(1)
Choosing Thresholds
| Environment | Suggested Threshold | Rationale |
|---|---|---|
| Dedicated benchmark hardware | 5% | Low noise floor; small regressions are signal |
| GitHub Actions shared runners | 10-15% | Shared runners introduce 5-10% variance from noisy neighbors |
| Self-hosted runners | 5-10% | More stable than shared, but still monitor variance |
Calibrate thresholds empirically: Run the same benchmark suite 5-10 times on your CI environment without code changes. The maximum observed variance sets your noise floor. Set the threshold above this noise floor (typically 2x the observed variance).
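A small sketch of that calibration, assuming each no-change run's results were copied into its own directory (e.g. runs/1, runs/2, ...):
#!/usr/bin/env python3
"""calibrate-threshold.py -- estimate the benchmark noise floor from repeated no-change runs (sketch)."""
import json
import sys
from pathlib import Path

def load_means(results_dir: Path) -> dict:
    """Map benchmark FullName -> mean time (ns) from one run's *-report-full.json files."""
    means = {}
    for json_file in results_dir.glob("*-report-full.json"):
        data = json.loads(json_file.read_text())
        for bm in data.get("Benchmarks", []):
            means[bm["FullName"]] = bm["Statistics"]["Mean"]
    return means

if __name__ == "__main__":
    if len(sys.argv) < 3:
        sys.exit("usage: calibrate-threshold.py <run-dir> <run-dir> [...]")
    runs = [load_means(Path(d)) for d in sys.argv[1:]]
    names = set.intersection(*(set(r) for r in runs))
    worst = 0.0
    for name in sorted(names):
        values = [run[name] for run in runs]
        spread_pct = (max(values) - min(values)) / min(values) * 100
        print(f"{name}: {spread_pct:.1f}% spread across {len(runs)} runs")
        worst = max(worst, spread_pct)
    print(f"Observed noise floor: {worst:.1f}% -- consider a threshold near {2 * worst:.0f}%")
Run it as `python3 calibrate-threshold.py runs/*` after collecting the no-change runs.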
Allocation Regression Detection
Memory allocation regressions are more reliable signals than timing regressions because allocations are deterministic (not affected by noisy neighbors):
# Add to the compare function:
if alloc_change > 0:
regressions.append({
"name": name,
"type": "allocation",
"baseline_alloc": base["allocated"],
"current_alloc": curr["allocated"],
"alloc_change": alloc_change,
})
Use allocation changes as a hard gate (zero tolerance for new allocations in zero-alloc paths) and timing changes as a soft gate (warning with threshold).
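One way to encode that split, continuing from the regression list built in the comparison script above (a sketch; the "type" key comes from the allocation snippet, and timing entries fall through to a warning):
# Split regressions into a hard allocation gate and a soft timing gate (sketch)
alloc_regressions = [r for r in regressions if r.get("type") == "allocation"]
timing_regressions = [r for r in regressions if r.get("type") != "allocation"]

for r in timing_regressions:
    # ::warning:: annotations appear in the GitHub Actions UI without failing the job
    print(f"::warning::{r['name']} mean time increased by {r['change_pct']:.1f}%")

if alloc_regressions:
    for r in alloc_regressions:
        print(f"::error::{r['name']} allocations increased by {r['alloc_change']} B")
    sys.exit(1)  # hard gate: any allocation growth fails the build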
Alerting Strategies
PR Comment with Regression Summary
Post benchmark comparison results as a PR comment for reviewer visibility:
- name: Comment PR with results
if: steps.download-baseline.outcome == 'success' && github.event_name == 'pull_request'
uses: actions/github-script@v7
with:
script: |
const fs = require('fs');
const body = fs.readFileSync('benchmark-comparison.md', 'utf8');
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.issue.number,
body: body
});
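Each push to the PR adds a fresh comment with the step above. A variant that updates a single comment in place instead (a sketch using a hidden marker string to find the earlier comment; the job also needs write permission for PR comments, e.g. `pull-requests: write`):
- name: Comment PR with results (update in place)
  if: steps.download-baseline.outcome == 'success' && github.event_name == 'pull_request'
  uses: actions/github-script@v7
  with:
    script: |
      const fs = require('fs');
      const marker = '<!-- benchmark-comparison -->';
      const body = marker + '\n' + fs.readFileSync('benchmark-comparison.md', 'utf8');
      // Find a previous comment carrying the marker and edit it; otherwise create one
      const { data: comments } = await github.rest.issues.listComments({
        owner: context.repo.owner,
        repo: context.repo.repo,
        issue_number: context.issue.number,
      });
      const existing = comments.find(c => c.body && c.body.startsWith(marker));
      if (existing) {
        await github.rest.issues.updateComment({
          owner: context.repo.owner,
          repo: context.repo.repo,
          comment_id: existing.id,
          body,
        });
      } else {
        await github.rest.issues.createComment({
          owner: context.repo.owner,
          repo: context.repo.repo,
          issue_number: context.issue.number,
          body,
        });
      }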
Fail the Build on Regression
Exit with non-zero status from the comparison script to fail the GitHub Actions job. This prevents merging PRs that introduce performance regressions:
- name: Check for regressions
if: steps.download-baseline.outcome == 'success'
shell: bash
run: |
set -euo pipefail
python3 scripts/compare-benchmarks.py \
--baseline ./baseline-results \
--current "${{ env.RESULTS_DIR }}" \
--threshold 10
# Script exits non-zero if regressions found -- fails the job
For required status checks and branch protection integration with benchmark gates, see [skill:dotnet-gha-patterns].
Trend Tracking
For long-term trend analysis beyond single-PR comparison, upload results to a persistent store and track metrics over time:
| Approach | Tool | Complexity |
|---|---|---|
| GitHub Actions artifacts | Built-in, 90-day retention | Low — artifact download/upload only |
| GitHub Pages with benchmark-action | benchmark-action/github-action-benchmark@v1 | Medium — auto-generates trend charts |
| External time-series DB | InfluxDB, Prometheus + Grafana | High — full observability stack |
The simplest approach for most projects is the artifact-based baseline comparison shown in this skill. Graduate to trend tracking when you need historical regression analysis across many releases.
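If you do graduate to the GitHub Pages approach, a minimal sketch of feeding the BenchmarkDotNet full JSON report into benchmark-action/github-action-benchmark (input names follow that action's documentation; verify them against the version you pin, and the report path is illustrative):
- name: Publish benchmark trend
  uses: benchmark-action/github-action-benchmark@v1
  with:
    tool: 'benchmarkdotnet'
    output-file-path: benchmarks/BenchmarkDotNet.Artifacts/results/CriticalPathBenchmarks-report-full.json
    github-token: ${{ secrets.GITHUB_TOKEN }}
    auto-push: true           # publish trend data to the gh-pages branch
    alert-threshold: '110%'   # alert when a result is 10% slower than the previous run
    comment-on-alert: true
    fail-on-alert: false      # soft gate; flip to true to block merges on alerts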
CI-Specific BenchmarkDotNet Configuration
ShortRun for CI Speed
Full benchmark runs take 10-30+ minutes. Use Job.ShortRun in CI to reduce iteration counts while retaining regression detection capability:
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
public class CiConfig : ManualConfig
{
public CiConfig()
{
AddJob(Job.ShortRun
.WithWarmupCount(3)
.WithIterationCount(5)
            .WithInvocationCount(1)
            .WithUnrollFactor(1)); // InvocationCount must be a multiple of UnrollFactor (default 16)
AddExporter(BenchmarkDotNet.Exporters.Json.JsonExporter.Full);
}
}
Apply conditionally based on environment:
var config = Environment.GetEnvironmentVariable("CI") is not null
? new CiConfig()
: DefaultConfig.Instance;
BenchmarkRunner.Run<CriticalPathBenchmarks>(config);
Filtering Benchmarks for CI
Run only critical-path benchmarks in CI to reduce pipeline duration:
# Run only benchmarks in the "Critical" category
dotnet run -c Release --project benchmarks/MyBenchmarks.csproj -- \
  --anyCategories Critical --exporters json
[BenchmarkCategory("Critical")]
[MemoryDiagnoser]
[JsonExporterAttribute.Full]
public class CriticalPathBenchmarks
{
[Benchmark]
public void ProcessOrder() { /* ... */ }
}
[BenchmarkCategory("Extended")]
[MemoryDiagnoser]
public class ExtendedBenchmarks
{
[Benchmark]
public void RareCodePath() { /* ... */ }
}
Run Critical benchmarks on every PR; run Extended benchmarks on a nightly schedule.
Nightly Benchmark Schedule
name: Nightly Benchmarks (Full Suite)
on:
schedule:
- cron: '0 3 * * *' # 3 AM UTC daily
workflow_dispatch:
jobs:
benchmark-full:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup .NET
uses: actions/setup-dotnet@v4
with:
dotnet-version: '8.0.x'
- name: Run full benchmark suite
run: dotnet run -c Release --project benchmarks/MyBenchmarks.csproj -- --exporters json
# No --filter: runs all benchmarks including Extended category
- name: Upload full results
uses: actions/upload-artifact@v4
with:
name: benchmark-full-${{ github.run_number }}
path: benchmarks/BenchmarkDotNet.Artifacts/results/
retention-days: 90
For scheduled workflow patterns and matrix builds across TFMs, see [skill:dotnet-gha-patterns].
Agent Gotchas
- Use `Job.ShortRun` in CI, not `Job.Default` — default benchmark jobs run many iterations for statistical precision, taking 10-30+ minutes per benchmark class. CI pipelines need faster feedback with `ShortRun` (3 warmup, 5 iteration).
- Set threshold above measured noise floor — shared CI runners introduce 5-10% timing variance from noisy neighbors. A 5% threshold on shared runners produces false positives. Calibrate by running the same code multiple times and measuring variance.
- Use allocation changes as hard gates — allocation counts are deterministic and unaffected by runner noise. A zero-to-nonzero allocation change is always a real regression, unlike timing variations.
- Only update baselines from main branch — if PR branches can update the baseline, a regression in one PR becomes the new baseline, masking it from subsequent comparisons.
- Always set `set -euo pipefail` in bash steps — without `pipefail`, a regression detection script that exits non-zero in a pipeline (e.g., `script | tee`) does not fail the GitHub Actions step.
- Handle missing baselines gracefully — the first CI run has no baseline to compare against. Use `continue-on-error: true` on the baseline download step and skip comparison when no baseline exists.
- Export JSON, not just Markdown — Markdown reports are human-readable but not machine-parseable for automated regression detection. Always include `[JsonExporterAttribute.Full]` or `JsonExporter.Full` in the config.