Install command
npx skills add https://github.com/copyleftdev/sk1llz --skill google-continuous-fuzzing
Google Continuous Fuzzing
Overview
Google’s continuous fuzzing infrastructure (OSS-Fuzz + ClusterFuzz) has found over 10,000 bugs in 1,000+ open source projects, including critical memory-safety vulnerabilities of the kind exemplified by Heartbleed. The approach turns fuzzing from a one-time activity into a continuous quality gate.
References
- Talk: “OSS-Fuzz – Google’s continuous fuzzing service for open source software” (USENIX Security ’17)
- Documentation: https://google.github.io/oss-fuzz/
- ClusterFuzz: https://google.github.io/clusterfuzz/
Core Philosophy
“Fuzzing should be continuous, not a one-time event.”
“Every bug found by fuzzing is a bug not found by attackers.”
Fuzzing is most effective when it runs continuously against the latest code, with automatic bug reporting and regression tracking.
Key Concepts
Coverage-Guided Fuzzing
Traditional Fuzzing: Random input generation
Coverage-Guided Fuzzing: Inputs that increase code coverage are kept
Corpus → Mutate → Execute → Measure Coverage → Keep interesting inputs
  ↑                                                        |
  └────────────────────────────────────────────────────────┘
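The loop above can be sketched in a few lines of Python. This is a toy illustration, not how libFuzzer or AFL++ are implemented: the `target` function, its branch labels, and the three mutation operators are all invented for the example, and coverage is reported by the target itself rather than collected through compiler instrumentation.

```python
import random


def target(data: bytes) -> set:
    """Toy program under test; returns the set of branch IDs it executed."""
    covered = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        covered.add("F")
        if len(data) > 1 and data[1] == ord("U"):
            covered.add("FU")
            if len(data) > 2 and data[2] == ord("Z"):
                covered.add("FUZ")
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: overwrite, insert, or delete."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif choice == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif buf:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(seeds, iterations, seed=0):
    """Corpus -> mutate -> execute -> measure coverage -> keep interesting inputs."""
    rng = random.Random(seed)
    corpus = list(seeds)
    seen = set()
    for item in corpus:
        seen |= target(item)
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        coverage = target(candidate)
        if not coverage <= seen:  # candidate reached at least one new branch
            seen |= coverage
            corpus.append(candidate)
    return corpus, seen


if __name__ == "__main__":
    corpus, seen = fuzz([b""], iterations=200_000)
    print(f"{len(corpus)} inputs kept, branches covered: {sorted(seen)}")
```

Purely random generation would need to guess all three bytes at once. Keeping each input that reaches a new branch lets the search solve one comparison at a time.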
The Fuzzing Pipeline
- Build: Compile with sanitizers (ASan, MSan, UBSan)
- Fuzz: Run fuzzers continuously on cluster
- Triage: Automatically deduplicate and file bugs
- Reproduce: Generate minimal reproducer
- Verify: Confirm fix eliminates the bug
- Regress: Add reproducer to regression corpus
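The triage step depends on grouping crashes that share a root cause. A minimal sketch of one common heuristic follows: bucket by crash type plus the top few stack frames. The report layout (`type`, `frames`, `input`) and the choice of three frames are assumptions made for illustration, not ClusterFuzz's actual data model.

```python
import hashlib
from collections import defaultdict


def crash_signature(crash_type: str, frames: list, top_n: int = 3) -> str:
    """Hash the crash type and the top stack frames into a short bucket ID."""
    key = crash_type + "\n" + "\n".join(frames[:top_n])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def bucket_crashes(reports: list) -> dict:
    """Group reproducer names by signature so each bucket is filed once."""
    buckets = defaultdict(list)
    for report in reports:
        signature = crash_signature(report["type"], report["frames"])
        buckets[signature].append(report["input"])
    return dict(buckets)


if __name__ == "__main__":
    # Hypothetical reports; real ones would be parsed from sanitizer output.
    reports = [
        {"type": "heap-buffer-overflow",
         "frames": ["parse_len", "parse_hdr", "parse", "main"],
         "input": "crash-1"},
        {"type": "heap-buffer-overflow",
         "frames": ["parse_len", "parse_hdr", "parse", "fuzz_entry"],
         "input": "crash-2"},
        {"type": "heap-use-after-free",
         "frames": ["free_node", "parse_hdr", "parse", "main"],
         "input": "crash-3"},
    ]
    for signature, inputs in bucket_crashes(reports).items():
        print(signature, inputs)
```

Ignoring frames below the top few keeps the bucket stable when the same bug is reached through different callers, at the cost of occasionally merging distinct bugs that crash in the same helper.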
When Implementing
Always
- Use sanitizers (AddressSanitizer, MemorySanitizer, UndefinedBehaviorSanitizer)
- Build seed corpus from existing tests and real inputs
- Integrate fuzzing into CI/CD pipeline
- Track coverage metrics over time
- Minimize reproducers for easier debugging
- Keep regression tests for all found bugs
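To illustrate reproducer minimization, here is a small greedy sketch that repeatedly deletes chunks of the input while the failure persists. It is a simplified relative of delta debugging, written for this document; libFuzzer's own `-minimize_crash=1` mode works differently. The `still_fails` callback is a hypothetical stand-in for re-running the target on a candidate input.

```python
def minimize(data: bytes, still_fails) -> bytes:
    """Shrink a failing input by deleting chunks, halving the chunk size each pass."""
    if not still_fails(data):
        raise ValueError("input does not reproduce the failure")
    chunk = max(len(data) // 2, 1)
    while chunk >= 1:
        offset = 0
        while offset < len(data):
            candidate = data[:offset] + data[offset + chunk:]
            if still_fails(candidate):
                data = candidate      # keep the smaller input, retry same offset
            else:
                offset += chunk       # this chunk is needed, move past it
        chunk //= 2
    return data


if __name__ == "__main__":
    # Hypothetical failure condition: the parser breaks whenever it sees "<!".
    def crashes(candidate: bytes) -> bool:
        return b"<!" in candidate

    print(minimize(b"aaaa<html><!bbbbcomment", crashes))
```

Each call to `still_fails` costs one execution of the target, so starting with large chunks removes most of the irrelevant bytes cheaply before the byte-by-byte pass.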
Never
- Fuzz only once and declare victory
- Ignore crashes in dependencies
- Skip sanitizers to “improve performance”
- Discard valuable corpus data
- Treat fuzzing as separate from testing
Prefer
- libFuzzer/AFL++ over basic random testing
- Structure-aware fuzzing for complex formats
- Continuous fuzzing over periodic runs
- Automated triage over manual analysis
- Coverage metrics over time-based metrics
Implementation Patterns
Basic Fuzz Target (C/C++)
// fuzz_target.cc
// A fuzz target is a function that accepts arbitrary bytes from the fuzzing engine
#include <stddef.h>
#include <stdint.h>

// Your library headers
#include "parser.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // Call the function under test with fuzzer-provided data
  parse_input(data, size);
  // Return 0; non-zero return values are reserved
  return 0;
}

// Build with:
//   clang++ -g -fsanitize=address,fuzzer fuzz_target.cc parser.cc -o fuzzer
// Run with:
//   ./fuzzer corpus_dir/
Fuzz Target with Structure
// Structure-aware fuzzing for better coverage
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Fuzz a function expecting a specific structure
struct Header {
  uint32_t magic;
  uint32_t version;
  uint32_t length;
  uint8_t flags;
};

// Function under test; assumed to be provided by your library
void process_packet(const Header *header, const uint8_t *payload, size_t length);

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // Need at least header size
  if (size < sizeof(Header)) {
    return 0;
  }

  Header header;
  memcpy(&header, data, sizeof(Header));

  // Reject inputs without the expected magic so that fuzzing time is
  // spent on the code behind the header check
  if (header.magic != 0xDEADBEEF) {
    return 0;
  }

  // Constrain length to available data
  size_t payload_size = size - sizeof(Header);
  if (header.length > payload_size) {
    header.length = static_cast<uint32_t>(payload_size);
  }
  const uint8_t *payload = data + sizeof(Header);

  // Now fuzz with valid-looking input
  process_packet(&header, payload, header.length);
  return 0;
}
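A target that rejects every input lacking the magic value benefits from seeds that already pass that check. The sketch below writes a few such seeds. It assumes a little-endian machine and the usual ABI layout in which `Header` occupies 16 bytes (13 bytes of fields plus 3 bytes of trailing padding); the seed names and payloads are invented for the example.

```python
import struct
from pathlib import Path

# magic, version, length (uint32 each), flags (uint8), then 3 padding bytes.
# Assumed layout: little-endian with sizeof(Header) == 16.
HEADER_FORMAT = "<IIIB3x"
MAGIC = 0xDEADBEEF


def make_seed(version: int, flags: int, payload: bytes) -> bytes:
    """Build one input: a header with valid magic and length, then the payload."""
    header = struct.pack(HEADER_FORMAT, MAGIC, version, len(payload), flags)
    return header + payload


def write_seeds(corpus_dir: str) -> list:
    """Write a handful of hypothetical seeds into the corpus directory."""
    out = Path(corpus_dir)
    out.mkdir(parents=True, exist_ok=True)
    seeds = {
        "empty_payload": make_seed(1, 0x00, b""),
        "small_payload": make_seed(1, 0x01, b"hello"),
        "all_flags": make_seed(2, 0xFF, bytes(range(32))),
    }
    paths = []
    for name, blob in seeds.items():
        path = out / name
        path.write_bytes(blob)
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in write_seeds("corpus_dir"):
        print(path, path.stat().st_size, "bytes")
```

If the real struct is packed or the target runs on a big-endian platform, `HEADER_FORMAT` has to change to match, otherwise the seeds fail the magic check just like random bytes would.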
Python Fuzzing with Atheris
#!/usr/bin/env python3
# fuzz_json_parser.py
import sys

import atheris

# Import the module to fuzz inside instrument_imports() so that Atheris
# can collect coverage from its Python code
with atheris.instrument_imports():
    import json


def test_one_input(data):
    """Fuzz target: called with fuzzer-generated bytes"""
    fdp = atheris.FuzzedDataProvider(data)
    # Convert bytes to string for JSON parsing
    json_str = fdp.ConsumeUnicodeNoSurrogates(
        fdp.ConsumeIntInRange(0, 1024)
    )
    try:
        # Invalid input should only ever raise ValueError
        json.loads(json_str)
    except ValueError:
        # Expected for invalid input (json.JSONDecodeError subclasses ValueError)
        pass
    # Any other exception propagates and is reported as a potential bug


def main():
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()


if __name__ == "__main__":
    main()

# Run with:
#   python fuzz_json_parser.py corpus_dir/ -max_len=1024
Go Fuzzing (Native)
// fuzz_test.go
// Go 1.18+ has built-in fuzzing support
package parser

import (
	"testing"
)

func FuzzParseInput(f *testing.F) {
	// Seed corpus with known inputs
	f.Add([]byte("valid input"))
	f.Add([]byte("{\"key\": \"value\"}"))
	f.Add([]byte(""))

	f.Fuzz(func(t *testing.T, data []byte) {
		// Call function under test
		result, err := ParseInput(data)
		if err != nil {
			// Errors are fine, panics are not
			return
		}
		// Optionally verify invariants
		if result != nil && result.Length < 0 {
			t.Errorf("negative length: %d", result.Length)
		}
	})
}

// Run with:
//   go test -fuzz=FuzzParseInput -fuzztime=60s
OSS-Fuzz Integration
# Dockerfile for OSS-Fuzz integration
FROM gcr.io/oss-fuzz-base/base-builder
RUN apt-get update && apt-get install -y \
    make \
    autoconf \
    automake \
    libtool
# Clone your project
RUN git clone --depth 1 https://github.com/your/project.git
WORKDIR project
COPY build.sh $SRC/

#!/bin/bash
# build.sh - OSS-Fuzz build script
# Build the library with fuzzing instrumentation
./configure
make clean
make -j$(nproc) CC="$CC" CXX="$CXX" CFLAGS="$CFLAGS" CXXFLAGS="$CXXFLAGS"
# Build fuzz targets
$CXX $CXXFLAGS $LIB_FUZZING_ENGINE \
    fuzz_target.cc -o $OUT/fuzz_target \
    -I. libproject.a
# Copy seed corpus
zip -j $OUT/fuzz_target_seed_corpus.zip seeds/*
# Copy dictionary if available
if [ -f project.dict ]; then
    cp project.dict $OUT/fuzz_target.dict
fi
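The dictionary copied in the last step is a plain-text list of tokens that the mutator can splice into inputs. As a rough sketch, a file in the quoted-string format accepted by libFuzzer and AFL++ could be generated as follows; the token list is hypothetical and should be replaced with keywords and magic values from your own format.

```python
def dict_entry(token: bytes) -> str:
    """Render one token as a quoted dictionary line with \\xNN escapes."""
    out = []
    for byte in token:
        char = chr(byte)
        if char in ('"', "\\"):
            out.append("\\" + char)          # quotes and backslashes are escaped
        elif 0x20 <= byte < 0x7F:
            out.append(char)                 # printable ASCII is kept as-is
        else:
            out.append(f"\\x{byte:02X}")     # everything else becomes \xNN
    return '"' + "".join(out) + '"'


def build_dictionary(tokens) -> str:
    """Return the text of a dictionary file, one unique token per line."""
    lines = ["# Tokens the mutator may insert into inputs"]
    lines += [dict_entry(token) for token in sorted(set(tokens))]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical tokens for a format with a binary magic and text keywords.
    tokens = [b"\xef\xbe\xad\xde", b"version", b'"key"', b"\r\n"]
    with open("project.dict", "w", encoding="ascii") as handle:
        handle.write(build_dictionary(tokens))
```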
Corpus Management
# corpus_manager.py
# Manage and minimize fuzzing corpus
import hashlib
import subprocess
from pathlib import Path


class CorpusManager:
    def __init__(self, corpus_dir: str):
        self.corpus_dir = Path(corpus_dir)
        self.corpus_dir.mkdir(parents=True, exist_ok=True)

    def add(self, data: bytes) -> str:
        """Add input to corpus with content-based filename"""
        hash_name = hashlib.sha256(data).hexdigest()[:16]
        path = self.corpus_dir / hash_name
        if not path.exists():
            path.write_bytes(data)
        return str(path)

    def minimize(self, fuzzer_binary: str) -> int:
        """Minimize corpus using fuzzer's merge feature"""
        minimized_dir = self.corpus_dir.parent / "corpus_minimized"
        minimized_dir.mkdir(exist_ok=True)
        # libFuzzer merge copies only coverage-increasing inputs
        # into the first directory
        subprocess.run([
            fuzzer_binary,
            "-merge=1",
            str(minimized_dir),
            str(self.corpus_dir)
        ], capture_output=True, check=True)
        return len(list(minimized_dir.iterdir()))

    def get_coverage_report(self, fuzzer_binary: str) -> dict:
        """Generate coverage report for corpus"""
        # Replay the existing corpus without generating new inputs
        result = subprocess.run([
            fuzzer_binary,
            "-runs=0",  # Don't generate new inputs
            str(self.corpus_dir)
        ], capture_output=True, text=True)
        # Parse coverage from result.stderr
        # (actual implementation depends on the fuzzer's output format)
        return {"corpus_size": len(list(self.corpus_dir.iterdir()))}
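The parsing step left open in `get_coverage_report` could look roughly like the sketch below. It assumes the status lines libFuzzer prints to stderr, of the form `#<execs> <EVENT> cov: <n> ft: <n> corp: <units>/<size> ...`. That layout is a human-readable log rather than a stable interface, so the regular expression and the returned field names are assumptions to verify against your toolchain.

```python
import re

# Matches status lines such as:
#   #2<TAB>INITED cov: 12 ft: 13 corp: 1/1b exec/s: 0 rss: 26Mb
STATUS_RE = re.compile(
    r"^#(?P<execs>\d+)\s+(?P<event>[A-Za-z]+)\s+"
    r"cov: (?P<cov>\d+) ft: (?P<ft>\d+) corp: (?P<units>\d+)/(?P<size>\S+)"
)


def parse_status(stderr_text: str) -> list:
    """Extract every status line as a dict with integer counters."""
    rows = []
    for line in stderr_text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            row = match.groupdict()
            for key in ("execs", "cov", "ft", "units"):
                row[key] = int(row[key])
            rows.append(row)
    return rows


def coverage_summary(stderr_text: str) -> dict:
    """Summarize the last status line, or return {} if none was found."""
    rows = parse_status(stderr_text)
    if not rows:
        return {}
    last = rows[-1]
    return {
        "cov": last["cov"],             # covered edges or blocks
        "ft": last["ft"],               # finer-grained coverage features
        "corpus_units": last["units"],  # inputs currently in the corpus
        "executions": last["execs"],    # total target executions so far
    }
```

Passing `result.stderr` from the `-runs=0` replay to `coverage_summary` yields numbers that can be recorded per run and compared over time.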
CI/CD Integration
# .github/workflows/fuzz.yml
name: Continuous Fuzzing
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 0 * * *' # Daily
jobs:
  fuzz:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build fuzzer
        run: |
          clang++ -g -O1 \
            -fsanitize=address,fuzzer \
            -fno-omit-frame-pointer \
            fuzz_target.cc parser.cc -o fuzzer
      - name: Download corpus
        uses: actions/cache@v4
        with:
          path: corpus
          key: fuzz-corpus-${{ github.sha }}
          restore-keys: fuzz-corpus-
      - name: Run fuzzer
        run: |
          mkdir -p corpus
          # The outer timeout is a safety net; libFuzzer stops itself first
          timeout 660 ./fuzzer corpus/ -max_total_time=600 || true
      - name: Upload crash artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: crashes
          path: crash-*
          if-no-files-found: ignore
      - name: Check for crashes
        run: |
          if ls crash-* 1> /dev/null 2>&1; then
            echo "Crashes found!"
            exit 1
          fi
Mental Model
Google’s fuzzing approach asks:
- Is this running continuously? One-time fuzzing misses regressions introduced by later commits.
- Are sanitizers enabled? Without them, many memory errors never crash and go undetected.
- Is the corpus growing? Coverage should increase over time.
- Are bugs being tracked? Findings should be filed and deduplicated automatically.
- Are fixes verified? Reproducers become regression tests.
Signature Moves
- Coverage-guided mutation (libFuzzer, AFL++)
- Sanitizer builds (ASan, MSan, UBSan, TSan)
- Automatic corpus management and minimization
- CI/CD integration for every commit
- Regression corpus from found bugs
- Structure-aware fuzzing for protocols