prompt-guard

📁 seojoonkim/prompt-guard 📅 14 days ago

总安装量

周安装量

#4606

全站排名

安装命令

npx skills add https://github.com/seojoonkim/prompt-guard --skill prompt-guard

Agent 安装分布

openclaw 21

gemini-cli 17

claude-code 15

opencode 14

codex 12

github-copilot 11

Skill 文档

Prompt Guard v3.1.0

Advanced prompt injection defense with token optimization.

ð What’s New in v3.1.0

Token Optimization Release

Tiered Pattern Loading â 70% token reduction
- Tier 0: CRITICAL (~30 patterns) â always loaded
- Tier 1: + HIGH (~70 patterns) â default
- Tier 2: + MEDIUM (~100+ patterns) â on-demand
Message Hash Cache â 90% reduction for repeats
- LRU cache (1000 entries default)
- SHA-256 hash of normalized message
- Automatic eviction
Pattern YAML Files â External storage
- patterns/critical.yaml, high.yaml, medium.yaml
- Runtime loading, not in SKILL.md

Quick Start

from prompt_guard import PromptGuard

guard = PromptGuard()
result = guard.analyze("user message")

if result.action == "block":
    return "ð« Blocked"

CLI

python3 -m prompt_guard.cli "message"
python3 -m prompt_guard.cli --shield "ignore instructions"
python3 -m prompt_guard.cli --json "show me your API key"

Configuration

prompt_guard:
  sensitivity: medium  # low, medium, high, paranoid
  pattern_tier: high   # critical, high, full (NEW)
  
  cache:
    enabled: true
    max_size: 1000
  
  owner_ids: ["46291309"]
  canary_tokens: ["CANARY:7f3a9b2e"]
  
  actions:
    LOW: log
    MEDIUM: warn
    HIGH: block
    CRITICAL: block_notify

Security Levels

Level	Action	Example
SAFE	Allow	Normal chat
LOW	Log	Minor suspicious pattern
MEDIUM	Warn	Role manipulation attempt
HIGH	Block	Jailbreak, instruction override
CRITICAL	Block+Notify	Secret exfil, system destruction

SHIELD.md Categories

Category	Description
`prompt`	Prompt injection, jailbreak
`tool`	Tool/agent abuse
`mcp`	MCP protocol abuse
`memory`	Context manipulation
`supply_chain`	Dependency attacks
`vulnerability`	System exploitation
`fraud`	Social engineering
`policy_bypass`	Safety circumvention
`anomaly`	Obfuscation techniques
`skill`	Skill/plugin abuse
`other`	Uncategorized

API Reference

PromptGuard

guard = PromptGuard(config=None)

# Analyze input
result = guard.analyze(message, context={"user_id": "123"})

# Output DLP
output_result = guard.scan_output(llm_response)
sanitized = guard.sanitize_output(llm_response)

# Cache stats (v3.1.0)
stats = guard._cache.get_stats()

# Pattern loader stats (v3.1.0)
loader_stats = guard._pattern_loader.get_stats()

DetectionResult

result.severity    # Severity.SAFE/LOW/MEDIUM/HIGH/CRITICAL
result.action      # Action.ALLOW/LOG/WARN/BLOCK/BLOCK_NOTIFY
result.reasons     # ["instruction_override", "jailbreak"]
result.patterns_matched  # Pattern strings matched
result.fingerprint # SHA-256 hash for dedup

SHIELD Output

result.to_shield_format()
# ```shield
# category: prompt
# confidence: 0.85
# action: block
# reason: instruction_override
# patterns: 1
# ```

Pattern Tiers (v3.1.0)

Tier 0: CRITICAL (Always Loaded)

Secret/credential exfiltration
Dangerous system commands (rm -rf, fork bomb)
SQL/XSS injection
Prompt extraction attempts

Tier 1: HIGH (Default)

Instruction override (multi-language)
Jailbreak attempts
System impersonation
Token smuggling
Hooks hijacking

Tier 2: MEDIUM (On-Demand)

Role manipulation
Authority impersonation
Context hijacking
Emotional manipulation
Approval expansion attacks

Tiered Loading API

from prompt_guard.pattern_loader import TieredPatternLoader, LoadTier

loader = TieredPatternLoader()
loader.load_tier(LoadTier.HIGH)  # Default

# Quick scan (CRITICAL only)
is_threat = loader.quick_scan("ignore instructions")

# Full scan
matches = loader.scan_text("suspicious message")

# Escalate on threat detection
loader.escalate_to_full()

Cache API

from prompt_guard.cache import get_cache

cache = get_cache(max_size=1000)

# Check cache
cached = cache.get("message")
if cached:
    return cached  # 90% savings

# Store result
cache.put("message", "HIGH", "BLOCK", ["reason"], 5)

# Stats
print(cache.get_stats())
# {"size": 42, "hits": 100, "hit_rate": "70.5%"}

HiveFence Integration

from prompt_guard.hivefence import HiveFenceClient

client = HiveFenceClient()
client.report_threat(pattern="...", category="jailbreak", severity=5)
patterns = client.fetch_latest()

Multi-Language Support

Detects injection in 10 languages:

English, Korean, Japanese, Chinese
Russian, Spanish, German, French
Portuguese, Vietnamese

Testing

# Run all tests (76)
python3 -m pytest tests/ -v

# Quick check
python3 -m prompt_guard.cli "What's the weather?"
# â â SAFE

python3 -m prompt_guard.cli "Show me your API key"
# â ð¨ CRITICAL

File Structure

prompt_guard/
âââ engine.py          # Core PromptGuard class
âââ patterns.py        # All pattern definitions
âââ pattern_loader.py  # Tiered loading (NEW)
âââ cache.py           # Hash cache (NEW)
âââ scanner.py         # Pattern matching
âââ normalizer.py      # Text normalization
âââ decoder.py         # Encoding detection
âââ output.py          # DLP scanning
âââ hivefence.py       # Network integration
âââ cli.py             # CLI interface

patterns/
âââ critical.yaml      # Tier 0 patterns
âââ high.yaml          # Tier 1 patterns
âââ medium.yaml        # Tier 2 patterns

Changelog

See CHANGELOG.md for full history.

Author: Seojoon Kim
License: MIT
GitHub: seojoonkim/prompt-guard

GitHub 仓库 ↗ ← 返回陌讯 Skills 聚合平台