opengrep

📁 opengrep/skills 📅 7 days ago
10
总安装量
5
周安装量
#30450
全站排名
安装命令
npx skills add https://github.com/opengrep/skills --skill opengrep

Agent 安装分布

opencode 5
github-copilot 5
codex 5
kimi-cli 5
gemini-cli 5

Skill 文档

Opengrep Static Analysis

Opengrep is a community-maintained, open-source static analysis tool forked from Semgrep. It uses the same rule syntax and CLI interface, making existing Semgrep rules and knowledge transferable.

Two Use Cases

1. Semantic Code Search (grep alternative)

When exploring a codebase, grep finds text patterns but misses structural patterns. Opengrep understands code structure:

Task Grep Opengrep
Find text “execute” Fast, works Overkill
Find cursor.execute(...) calls May match comments, strings Matches only actual calls
Find functions that call os.system Difficult pattern-inside + pattern
Find unparameterized SQL queries Nearly impossible Taint mode

Use Opengrep when:

  • You need to find function/method calls with specific arguments
  • Grep returns too many false positives (matches in comments, strings, similar names)
  • You need to find patterns inside specific contexts (inside loops, inside try blocks)
  • The pattern has structural meaning, not just text

Stick with grep when:

  • Simple text search
  • Speed is critical
  • The pattern is a literal string or simple regex

2. Security Scanning

Run rulesets to detect vulnerabilities and insecure patterns.

Installation

Linux / macOS

curl -fsSL https://raw.githubusercontent.com/opengrep/opengrep/main/install.sh | bash

Windows (PowerShell)

irm https://raw.githubusercontent.com/opengrep/opengrep/main/install.ps1 | iex

Manual Install

Download binaries from the releases page.

Verify

opengrep --version

Self-contained binaries are available for macOS, Linux, and Windows. No Python required.

Run opengrep scan --help to discover all available options and flags.

Code Search Patterns

Quick One-Liners

# Find all calls to a function
opengrep scan -e 'dangerous_function(...)' -l python .

# Find method calls on specific objects
opengrep scan -e '$OBJ.execute(...)' -l python .

# Find assignments to a variable name
opengrep scan -e '$VAR = os.environ.get(...)' -l python .

# Find function definitions
opengrep scan -e 'def $FUNC(...): ...' -l python .

Performance Note

Opengrep parses the entire file into an AST. For a quick text search, grep is 10-100x faster. Use Opengrep when the structural match is worth the overhead.

Security Scanning

Quick Scan

# Scan with a ruleset
opengrep scan --config p/security-audit .

# Multiple rulesets
opengrep scan --config p/security-audit --config p/owasp-top-ten .

# Scan specific paths
opengrep scan --config p/python src/

Output Formats

# SARIF for tooling
opengrep scan --config p/default --sarif -o results.sarif .

# JSON for automation
opengrep scan --config p/default --json -o results.json .

# Show data flow traces
opengrep scan --dataflow-traces -f rule.yaml .

# Include enclosing context (function/class) in JSON output (experimental)
opengrep scan --output-enclosing-context --json -f rule.yaml . --experimental

Filtering

# By severity
opengrep scan --config p/default --severity ERROR .

# By path
opengrep scan --config p/default --include='src/**' --exclude='**/test/**' .

# Apply exclusions to explicitly passed file targets (not just directory scan roots)
opengrep scan --force-exclude --exclude='**/vendor/**' -f rule.yaml vendor/lib.py

Intrafile Cross-Function Tainting

Opengrep supports tracking taint across functions within a file:

opengrep scan --taint-intrafile -f taint-rule.yaml .

This enables higher-order function support and is similar to Semgrep Pro’s --pro-intrafile.

Writing Custom Rules

Basic Rule Structure

rules:
  - id: hardcoded-secret
    languages: [python]
    message: "Hardcoded secret detected in $VAR"
    severity: ERROR
    patterns:
      - pattern: $VAR = "$VALUE"
      - metavariable-regex:
          metavariable: $VALUE
          regex: ^sk_live_

Rule ID Uniqueness

Important: Rule IDs must be unique across all rules in a configuration. If multiple rules share the same ID, only one will be used due to deduplication during rule loading.

This is particularly important when writing rules for multiple languages. You cannot reuse the same rule ID even if the rules target different languages:

# WRONG - both rules have id: taint, only one will be active
rules:
  - id: taint
    languages: [python]
    pattern: dangerous_call(...)
    # ...

  - id: taint
    languages: [rust]
    pattern: unsafe_fn(...)
    # ...
# CORRECT - unique IDs for each rule
rules:
  - id: taint-python-dangerous-call
    languages: [python]
    pattern: dangerous_call(...)
    # ...

  - id: taint-rust-unsafe-fn
    languages: [rust]
    pattern: unsafe_fn(...)
    # ...

Use descriptive IDs that include the language or context to avoid collisions.

Pattern Syntax

Syntax Meaning
... Match zero or more arguments/statements
$VAR Metavariable (captures any expression)
$...ARGS Ellipsis metavariable (captures zero or more)
<... $X ...> Deep expression match (nested)

Pattern Operators

Operator Purpose
pattern Match exact pattern
patterns All must match (AND)
pattern-either Any can match (OR)
pattern-not Exclude matches
pattern-inside Must be inside context
pattern-not-inside Must not be inside context
pattern-regex Regex matching
metavariable-regex Filter captured values by regex
metavariable-comparison Compare captured values

Combining Patterns

rules:
  - id: dangerous-deserialization
    languages: [python]
    message: "Unsafe pickle load on potentially untrusted data"
    severity: ERROR
    patterns:
      - pattern-either:
          - pattern: pickle.load(...)
          - pattern: pickle.loads(...)
      - pattern-not-inside: |
          def $FUNC(...):
            ...

Taint Mode

For tracking data flow from sources to sinks:

rules:
  - id: sql-injection
    languages: [python]
    message: "User input flows to SQL query without parameterization"
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
    pattern-sinks:
      - pattern: cursor.execute($QUERY, ...)
        focus-metavariable: $QUERY
    pattern-sanitizers:
      - pattern: int(...)
      - pattern: escape(...)

Key taint concepts:

  • Sources: Where untrusted data enters
  • Sinks: Where data becomes dangerous
  • Sanitizers: What makes data safe

YAML Pitfalls

Patterns are YAML string values. Any pattern containing : (colon-space) will be misinterpreted as a YAML mapping and cause a validation error. Always quote such patterns:

# BROKEN -- YAML parser sees "shell: true" as a nested mapping
- pattern: spawn($CMD, { ..., shell: true, ... })

# CORRECT -- quoted string, parsed as a single pattern value
- pattern: "spawn($CMD, { ..., shell: true, ... })"

Common triggers: shell: true, mode: 0o777, redirect: "follow", error: $ERR. When in doubt, quote the pattern.

For patterns containing both double quotes and colons, use escaped inner quotes:

- pattern: "fetch($URL, { ..., redirect: \"follow\", ... })"

Scanning Non-Code Files (generic language)

For XML configs, YAML pipelines, Dockerfiles, and other non-code files, structural patterns may not work. Use languages: [generic] with pattern-regex:

rules:
  - id: cleartext-traffic
    patterns:
      - pattern-regex: 'cleartextTrafficPermitted\s*=\s*"true"'
    languages: [generic]
    severity: WARNING
    paths:
      include:
        - "app/src/"

The paths key restricts which files a rule applies to:

paths:
  include:
    - ".github/workflows/"    # Only scan GHA workflow files
  exclude:
    - "test/"                 # Skip test directories

Note: languages: [dockerfile] exists but pattern-not-inside does not work reliably across multiple Dockerfile directives. Prefer generic + regex for Dockerfile security checks that span multiple lines.

metavariable-pattern

Use metavariable-pattern to constrain what a metavariable can match. It must be a list item inside patterns:, alongside the pattern that captures the metavariable:

rules:
  - id: dynamic-property-assignment
    patterns:
      - pattern: $OBJ[$KEY] = $VALUE
      - metavariable-pattern:
          metavariable: $KEY
          patterns:
            - pattern-not: "..."    # Exclude literal string keys
    languages: [typescript, javascript]
    severity: WARNING

This is distinct from metavariable-regex which filters by regex rather than by pattern.

Pattern Parsing Limitations

Some language constructs cannot be matched as standalone fragments:

  • catch blocks: catch ($ERR) { ... } alone is invalid. You must include the try: try { ... } catch ($ERR) { ... }. Even then, complex ellipsis inside the catch body may fail to parse for TypeScript.
  • Workaround: Match the dangerous expression directly instead of wrapping in try/catch context.

Rule Options (Opengrep-specific)

rules:
  - id: expensive-rule
    options:
      timeout: 10              # Per-rule timeout (requires --allow-rule-timeout-control)
      dynamic_timeout: true    # Scale timeout with file size
      max_match_per_file: 100  # Limit matches per file
    # ... rest of rule

Use --allow-rule-timeout-control to enable per-rule timeouts.

Testing Rules

Test File Annotations

Place # ruleid: on the line immediately before the expected finding (for taint rules, before the sink):

# test_rule.py

def vulnerable():
    user_id = request.args.get("id")
    # ruleid: sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + user_id)

def safe():
    user_id = int(request.args.get("id"))
    # ok: sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + user_id)

Running Tests

# Test a rule (supports multiple target files)
opengrep test --config rule.yaml test_file.py test_file2.py

# Validate rule syntax (positional argument, not --config)
opengrep validate rule.yaml

# Debug taint flow
opengrep scan --dataflow-traces -f rule.yaml test_file.py

Configuration

.semgrepignore

Opengrep uses .semgrepignore for compatibility. Custom filename via --semgrepignore-filename:

# Ignore directories
vendor/
node_modules/
**/testdata/

# Ignore patterns
*.min.js
*.generated.go

Inline Suppressions

Default annotations: nosemgrep, nosem, noopengrep (all work):

password = get_from_vault()  # nosemgrep: hardcoded-password
password = get_from_vault()  # nosem: hardcoded-password
password = get_from_vault()  # noopengrep: hardcoded-password

Extend with additional patterns using --opengrep-ignore-pattern:

# Add nosec as an additional suppression annotation
opengrep scan --opengrep-ignore-pattern='nosec' -f rule.yaml .

Rule Metadata

rules:
  - id: command-injection
    metadata:
      category: security
      subcategory:
        - vuln              # vuln = confirmed vulnerability, audit = needs review
      cwe:
        - "CWE-78: Improper Neutralization of Special Elements used in an OS Command"
      owasp:
        - "A03:2021 - Injection"
      confidence: HIGH      # HIGH, MEDIUM, or LOW
      references:
        - https://owasp.org/Top10/A03_2021-Injection/
    # ... rest of rule

The category/subcategory/confidence fields are used by tooling to classify and filter findings. Use subcategory: [vuln] for high-confidence vulnerabilities and subcategory: [audit] for patterns that need manual review.

Use --inline-metavariables to include metavariable values in metadata output.

Common Rulesets

Ruleset Focus
p/security-audit Comprehensive security rules
p/owasp-top-ten OWASP Top 10 vulnerabilities
p/cwe-top-25 CWE Top 25 vulnerabilities
p/python / p/javascript / p/golang Language-specific

Note: Ruleset availability may differ from Semgrep registry.

Differences from Semgrep

Opengrep is forked from Semgrep 1.100.0. Key differences:

  • Semgrep Pro features, open: Intrafile cross-function tainting (--taint-intrafile), higher-order function support, and inter-method taint flow — all available without a commercial license
  • Additional languages: Visual Basic (not available in Semgrep CE or Pro), Apex, Elixir (not in Semgrep CE), improved Clojure with taint support
  • Windows: Native support
  • Per-rule timeouts: timeout and dynamic_timeout rule options
  • Match limits: max_match_per_file rule option and --max-match-per-file CLI flag
  • Context output: --output-enclosing-context shows function/class context
  • Custom ignore patterns: --opengrep-ignore-pattern extends default suppressions

For full changelog, see: https://github.com/opengrep/opengrep/blob/main/OPENGREP.md

Existing Semgrep rules and documentation generally apply.

Resources