semgrep
Install command
npx skills add https://github.com/semgrep/skills --skill semgrep
Skill Documentation
Semgrep Static Analysis
Fast, pattern-based static analysis for security scanning and custom rule creation.
When to Use Semgrep
Ideal scenarios:
- Quick security scans (minutes, not hours)
- Pattern-based bug and vulnerability detection
- Enforcing coding standards and best practices
- Finding known vulnerability patterns (OWASP, CWE)
- Creating custom detection rules for your codebase
- Data flow analysis with taint mode
Installation
# pip (recommended)
python3 -m pip install semgrep
# Homebrew
brew install semgrep
# Docker
docker run --rm -v "${PWD}:/src" semgrep/semgrep semgrep --config auto /src
Part 1: Running Scans
Quick Scan
semgrep --config auto . # Auto-detect rules
Using Rulesets
semgrep --config p/<RULESET> . # Single ruleset
semgrep --config p/security-audit --config p/trailofbits . # Multiple
| Ruleset | Description |
|---|---|
| p/default | General security and code quality |
| p/security-audit | Comprehensive security rules |
| p/owasp-top-ten | OWASP Top 10 vulnerabilities |
| p/cwe-top-25 | CWE Top 25 vulnerabilities |
| p/trailofbits | Trail of Bits security rules |
| p/python | Python-specific |
| p/javascript | JavaScript-specific |
| p/golang | Go-specific |
Output Formats
semgrep --config p/security-audit --sarif -o results.sarif . # SARIF
semgrep --config p/security-audit --json -o results.json . # JSON
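The scan and output options above can be driven from a script. The following is a minimal sketch, not an official client: `build_scan_command` and `summarize` are hypothetical helper names, and the field names (`results`, `check_id`, `path`, `start.line`, `extra.severity`) assume the JSON layout of recent Semgrep releases, so check them against the output of your installed version. `SAMPLE` is hand-written illustrative data, not real scan output.

```python
import json
from collections import Counter


def build_scan_command(rulesets, target=".", json_out=None):
    """Return the argv list for a semgrep scan with one --config per ruleset."""
    cmd = ["semgrep"]
    for ruleset in rulesets:
        cmd += ["--config", ruleset]
    if json_out is not None:
        cmd += ["--json", "-o", json_out]
    cmd.append(target)
    return cmd


def summarize(report):
    """Count findings per severity and list (rule, path, line) triples."""
    results = report.get("results", [])
    by_severity = Counter(r["extra"]["severity"] for r in results)
    locations = [(r["check_id"], r["path"], r["start"]["line"]) for r in results]
    return by_severity, locations


# Hand-written sample shaped like a results.json file; not real scan output.
SAMPLE = json.loads("""
{"results": [
  {"check_id": "hardcoded-password", "path": "app.py",
   "start": {"line": 3, "col": 1}, "end": {"line": 3, "col": 20},
   "extra": {"severity": "ERROR", "message": "Hardcoded password detected"}}
 ],
 "errors": []}
""")

if __name__ == "__main__":
    print(build_scan_command(["p/security-audit", "p/trailofbits"],
                             json_out="results.json"))
    print(summarize(SAMPLE))
```

To run a real scan, pass the returned list to `subprocess.run` and load the written file with `json.load` before calling `summarize`.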
Scan Specific Paths
semgrep --config p/python app.py # Single file
semgrep --config p/javascript src/ # Directory
semgrep --config auto --include='**/test/**' . # Scan only paths matching the pattern
Configuration
.semgrepignore
tests/fixtures/
**/testdata/
generated/
vendor/
node_modules/
Suppress False Positives
password = get_from_vault() # nosemgrep: hardcoded-password
dangerous_but_safe() # nosemgrep
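Inline suppressions tend to accumulate, so it can help to list them periodically. The sketch below is a plain-text audit and not part of Semgrep: `find_suppressions` is a hypothetical helper, and its regex assumes rule IDs consist of letters, digits, dots, underscores, and hyphens, optionally comma-separated. It also matches the word `nosemgrep` inside string literals, which a real parser would not.

```python
import re

# Matches "nosemgrep", optionally followed by ": id1, id2".
NOSEMGREP = re.compile(
    r"nosemgrep(?::\s*(?P<ids>[\w.\-]+(?:\s*,\s*[\w.\-]+)*))?"
)


def find_suppressions(source):
    """Return (line_number, [rule_ids]) for each line with a nosemgrep comment.

    An empty rule-id list means a blanket suppression of every rule on that line.
    """
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = NOSEMGREP.search(line)
        if match:
            ids = match.group("ids")
            found.append((number, re.split(r"\s*,\s*", ids) if ids else []))
    return found


EXAMPLE = (
    "password = get_from_vault()  # nosemgrep: hardcoded-password\n"
    "dangerous_but_safe()  # nosemgrep\n"
)

if __name__ == "__main__":
    for line_number, rule_ids in find_suppressions(EXAMPLE):
        print(line_number, rule_ids or "ALL RULES")
```

Blanket suppressions (the empty list) deserve the closest review, since they silence every rule on that line.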
Part 2: Creating Custom Rules
When to Create Custom Rules
- Detecting project-specific vulnerability patterns
- Enforcing internal coding standards
- Building security checks for custom frameworks
- Creating taint-mode rules for data flow analysis
Approach Selection
| Approach | Use When |
|---|---|
| Taint mode | Data flows from untrusted source to dangerous sink (injection vulnerabilities) |
| Pattern matching | Syntactic patterns without data flow requirements (deprecated APIs, hardcoded values) |
Prioritize taint mode for injection vulnerabilities. A syntactic pattern such as eval(...) matches both eval(user_input) (vulnerable) and eval("safe_literal") (safe); it cannot tell whether a value originates from untrusted input.
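The difference between the two approaches can be illustrated with a toy analysis built on Python's `ast` module. This is only a sketch of the idea and does not reflect how Semgrep is implemented: it assumes `input()` as the untrusted source, `eval` as the sink, and handles straight-line top-level code only. All function names are hypothetical.

```python
import ast

# Toy target program: input() stands in for an untrusted source.
SOURCE = '''
user_input = input()
eval(user_input)
eval("safe_literal")
'''


def is_call_to(node, name):
    """True if node is a call to a bare function called `name`."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    )


def syntactic_hits(source):
    """Pattern-style check: every eval(...) call, whatever its argument."""
    tree = ast.parse(source)
    return sorted(n.lineno for n in ast.walk(tree) if is_call_to(n, "eval"))


def taint_hits(source):
    """Taint-style check: eval(...) calls whose argument came from input()."""
    tainted, hits = set(), []
    for stmt in ast.parse(source).body:  # straight-line code only
        if isinstance(stmt, ast.Assign):
            value = stmt.value
            from_source = is_call_to(value, "input")
            from_tainted = isinstance(value, ast.Name) and value.id in tainted
            if from_source or from_tainted:
                tainted.update(
                    t.id for t in stmt.targets if isinstance(t, ast.Name)
                )
        elif isinstance(stmt, ast.Expr) and is_call_to(stmt.value, "eval"):
            args = stmt.value.args
            if args and isinstance(args[0], ast.Name) and args[0].id in tainted:
                hits.append(stmt.lineno)
    return hits


if __name__ == "__main__":
    print(syntactic_hits(SOURCE))  # [3, 4]: both eval calls
    print(taint_hits(SOURCE))      # [3]: only the call fed by input()
```

The syntactic check reports both calls, while the taint check reports only the one whose argument traces back to the source.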
Quick Start: Pattern Matching
rules:
  - id: hardcoded-password
    languages: [python]
    message: "Hardcoded password detected: $PASSWORD"
    severity: ERROR
    pattern: password = "$PASSWORD"
Quick Start: Taint Mode
rules:
  - id: command-injection
    languages: [python]
    message: User input flows to command execution
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
    pattern-sinks:
      - pattern: os.system(...)
      - pattern: subprocess.call($CMD, shell=True, ...)
    pattern-sanitizers:
      - pattern: shlex.quote(...)
Pattern Syntax Quick Reference
| Syntax | Description | Example |
|---|---|---|
| ... | Match anything | func(...) |
| $VAR | Capture metavariable | $FUNC($INPUT) |
| <... ...> | Deep expression match | <... user_input ...> |

| Operator | Description |
|---|---|
| pattern | Match exact pattern |
| patterns | All must match (AND) |
| pattern-either | Any matches (OR) |
| pattern-not | Exclude matches |
| pattern-inside | Match only inside context |
| pattern-not-inside | Match only outside context |
| metavariable-regex | Regex on captured value |
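One way to reason about how the boolean operators compose is to treat each sub-pattern's matches as a set of code spans. The sketch below is a mental model only, with hypothetical function names and made-up spans; Semgrep's actual matching engine is more involved (metavariable bindings, for example, are not modeled here).

```python
# Each match is modeled as a (start_line, end_line) span. All spans are made up.


def all_of(*match_sets):
    """patterns: keep spans matched by every sub-pattern (AND)."""
    return set.intersection(*match_sets)


def any_of(*match_sets):
    """pattern-either: keep spans matched by any sub-pattern (OR)."""
    return set.union(*match_sets)


def without(matches, excluded):
    """pattern-not: drop spans that the excluded pattern also matches."""
    return matches - excluded


def inside(matches, contexts):
    """pattern-inside: keep spans enclosed by at least one context span."""
    return {
        m for m in matches
        if any(c[0] <= m[0] and m[1] <= c[1] for c in contexts)
    }


def not_inside(matches, contexts):
    """pattern-not-inside: keep spans enclosed by no context span."""
    return matches - inside(matches, contexts)


eval_calls = {(3, 3), (7, 7), (12, 12)}  # hypothetical hits for eval(...)
literal_calls = {(7, 7)}                 # hypothetical hits for eval("...")
test_functions = {(10, 14)}              # hypothetical span of a test function

# eval(...) calls that are not literal and not inside a test function.
findings = not_inside(without(eval_calls, literal_calls), test_functions)

if __name__ == "__main__":
    print(sorted(findings))  # [(3, 3)]
```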
Testing Rules
Test-first is mandatory. Create test files with annotations:
# test_rule.py
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))
Run tests:
semgrep --test --config rule.yaml test-file
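To make the annotation convention concrete, the sketch below parses a test file and records which lines are expected to match and which are expected to stay clean. It is a simplified illustration, not Semgrep's test runner: `expected_lines` is a hypothetical helper, it assumes each annotation targets the line directly below it, and it only recognizes `#` comments with the `ruleid:` and `ok:` markers.

```python
import re

ANNOTATION = re.compile(r"#\s*(?P<kind>ruleid|ok):\s*(?P<rule>[\w.\-]+)")


def expected_lines(test_source):
    """Map each rule ID to the lines that must match and must not match."""
    expected = {}
    for number, line in enumerate(test_source.splitlines(), start=1):
        found = ANNOTATION.search(line)
        if found:
            entry = expected.setdefault(found["rule"], {"ruleid": [], "ok": []})
            # Assumption: the annotation applies to the next line.
            entry[found["kind"]].append(number + 1)
    return expected


TEST_FILE = '''\
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))
'''

if __name__ == "__main__":
    print(expected_lines(TEST_FILE))
```

A rule passes this file only if it reports line 4 and stays silent on line 9.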
Command Reference
| Task | Command |
|---|---|
| Run tests | semgrep --test --config rule.yaml test-file |
| Validate YAML | semgrep --validate --config rule.yaml |
| Dump AST | semgrep --dump-ast -l <lang> <file> |
| Debug taint flow | semgrep --dataflow-traces -f rule.yaml file |
Rule Creation Workflow
- Analyze the problem – Understand the bug pattern, determine taint vs pattern approach
- Create test cases first – Write ruleid: and ok: annotations before the rule
- Analyze AST – Run semgrep --dump-ast to understand code structure
- Write the rule – Start simple, iterate
- Test until 100% pass – No “missed lines” or “incorrect lines”
- Optimize patterns – Remove redundancies only after tests pass
Output structure:
<rule-id>/
├── <rule-id>.yaml # Semgrep rule
└── <rule-id>.<ext> # Test file
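The workflow and directory layout above can be scaffolded with a short script. This is a sketch under stated assumptions: `scaffold` and `workflow_commands` are hypothetical helpers, the files are created empty, and the command lists reuse only the invocations from the Command Reference table. Nothing is executed; the lists can be handed to `subprocess.run` once the rule and test file have content.

```python
from pathlib import Path
import tempfile


def scaffold(rule_id, ext, root):
    """Create <rule-id>/<rule-id>.yaml and <rule-id>/<rule-id>.<ext> as empty files."""
    rule_dir = Path(root) / rule_id
    rule_dir.mkdir(parents=True, exist_ok=True)
    rule_file = rule_dir / f"{rule_id}.yaml"
    test_file = rule_dir / f"{rule_id}.{ext}"
    for path in (rule_file, test_file):
        path.touch()
    return rule_file, test_file


def workflow_commands(rule_file, test_file, lang):
    """Return the semgrep invocations for the workflow steps, in order."""
    rule, test = str(rule_file), str(test_file)
    return [
        ["semgrep", "--dump-ast", "-l", lang, test],    # inspect the AST
        ["semgrep", "--validate", "--config", rule],    # check the rule YAML
        ["semgrep", "--test", "--config", rule, test],  # run annotated tests
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        rule_file, test_file = scaffold("my-rule-id", "py", root)
        for command in workflow_commands(rule_file, test_file, "python"):
            print(" ".join(command))
```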
Detailed References
Official Semgrep Documentation:
- Rule Syntax – Complete YAML structure, operators, and options
- Rule Schema – Full JSON schema specification
Local References:
- Workflow Guide – Complete step-by-step rule creation process
- Quick Reference – Pattern operators and taint components
Anti-Patterns to Avoid
Too broad:
# BAD: Matches any function call
pattern: $FUNC(...)
# GOOD: Specific dangerous function
pattern: eval(...)
Missing safe cases:
# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)
# GOOD: Include safe cases
# ruleid: my-rule
dangerous(user_input)
# ok: my-rule
dangerous(sanitize(user_input))
Rationalizations to Reject
| Shortcut | Why It’s Wrong |
|---|---|
| “Semgrep found nothing, code is clean” | Semgrep is pattern-based; can’t track complex cross-function data flow |
| “The pattern looks complete” | Untested rules have hidden false positives/negatives |
| “It matches the vulnerable case” | Matching vulnerabilities is half the job; verify safe cases don’t match |
| “Taint mode is overkill” | For injection vulnerabilities, taint mode gives better precision |
| “One test case is enough” | Include edge cases: different coding styles, sanitized inputs, safe alternatives |
CI/CD Integration
GitHub Actions
name: Semgrep
on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 0 1 * *'
jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Run Semgrep
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }}
          else
            semgrep ci
          fi
        env:
          SEMGREP_RULES: >-
            p/security-audit
            p/owasp-top-ten
            p/trailofbits
Resources
Rule Writing:
- Rule Syntax: https://semgrep.dev/docs/writing-rules/rule-syntax
- Pattern Syntax: https://semgrep.dev/docs/writing-rules/pattern-syntax
- Rule Schema: https://github.com/semgrep/semgrep-interfaces/blob/main/rule_schema_v1.yaml
General:
- Registry: https://semgrep.dev/explore
- Playground: https://semgrep.dev/playground
- Docs: https://semgrep.dev/docs/
- Trail of Bits Rules: https://github.com/trailofbits/semgrep-rules