regex-builder

📁 mathews-tom/praxis-skills 📅 Today

总安装量

周安装量

#76170

全站排名

安装命令

npx skills add https://github.com/mathews-tom/praxis-skills --skill regex-builder

Agent 安装分布

amp 1

cline 1

pi 1

opencode 1

cursor 1

kimi-cli 1

Skill 文档

Regex Builder

Transforms matching requirements (positive and negative examples) into tested regex patterns with component-by-component explanations, capture group documentation, edge case identification, and ready-to-use code in Python and JavaScript.

Reference Files

File	Contents	Load When
`references/character-classes.md`	Character class reference, Unicode categories, POSIX classes	Always
`references/quantifiers.md`	Quantifier behavior, greedy vs lazy vs possessive, backtracking	Pattern needs repetition
`references/common-patterns.md`	Validated patterns for email, URL, phone, IP, date, UUID, etc.	Common validation requested
`references/flavor-differences.md`	Syntax differences between Python, JavaScript, PCRE, POSIX	Multi-language usage needed

Prerequisites

Clear specification: what should match and what should not
Target regex flavor (Python re, JavaScript, PCRE) â defaults to Python

Workflow

Phase 1: Collect Examples

Gather positive (should match) and negative (should not match) examples:

From user â Explicit examples provided
From context â If the user says “match email addresses,” infer standard positive and negative examples
From data â If sample data is provided, identify the pattern within it

Minimum: 3 positive examples and 3 negative examples. Fewer examples risk overfitting the pattern to specific cases.

Phase 2: Infer Pattern

Analyze the examples to build a pattern:

Identify fixed literals â Characters that appear in the same position across all positive examples
Identify character classes â Positions where different characters appear but follow a pattern (digits, letters, alphanumeric)
Identify repetition â Elements that appear a variable number of times
Identify optional elements â Parts present in some positive examples but not others
Identify anchoring â Must the pattern match the entire string or can it be a substring?

Phase 3: Explain Pattern

Break down the pattern into a component table:

Component	Meaning
`^`	Start of string
`[A-Za-z]`	One letter (upper or lower)
`\d{3,5}`	3 to 5 digits
`$`	End of string

Document capture groups separately if the pattern uses them.

Phase 4: Generate Edge Cases

For every pattern, identify inputs that are likely to cause problems:

Empty string â Does the pattern handle it correctly?
Almost-matching strings â One character off from a valid match
Boundary lengths â Minimum and maximum valid lengths
Special characters â Dots, brackets, backslashes in the input
Unicode â Multi-byte characters, emoji, diacritics
Catastrophic backtracking â Inputs that cause exponential matching time

Phase 5: Output

Produce the pattern, explanation, test cases, and usage examples.

Output Format

## Regex Pattern: {Brief Description}

### Requirements
- **Must match:** {description of valid inputs}
- **Must reject:** {description of invalid inputs}
- **Flavor:** {Python re | JavaScript | PCRE}

### Pattern
```regex
{pattern}

Explanation

Component	Meaning
`{component}`	{what it matches and why}

Capture Groups

Group	Name	Captures	Example
1	{name}	{what}	{example value}

Test Cases

#	Input	Should Match	Reason
1	`{input}`	Yes	{why â happy path}
2	`{input}`	Yes	{why â boundary}
3	`{input}`	No	{why â invalid}
4	`{input}`	No	{why â near-miss}
5	“ (empty)	No	Empty input

Edge Cases

{Edge case 1}: {what to watch for}
{Edge case 2}: {what to watch for}

Usage

Python:

import re

pattern = re.compile(r'{pattern}')

# Match entire string
if pattern.fullmatch(text):
    ...

# Search within string
match = pattern.search(text)
if match:
    captured = match.group(1)

# Find all matches
matches = pattern.findall(text)

JavaScript:

const pattern = /{pattern}/;

// Test
if (pattern.test(text)) { ... }

// Match
const match = text.match(pattern);
if (match) {
    const captured = match[1];
}

// Find all
const matches = [...text.matchAll(/{pattern}/g)];


## Calibration Rules

1. **Correctness over cleverness.** A readable, slightly longer pattern is better than
   a cryptic short one. `[A-Za-z0-9]` is clearer than `\w` when you specifically mean
   alphanumeric without underscores.
2. **Test negatives as rigorously as positives.** A pattern that matches everything
   technically matches all positive examples. Negative examples prevent over-matching.
3. **Anchor when appropriate.** `^\d{3}$` matches exactly 3 digits. `\d{3}` matches
   3 digits anywhere in the string. State the anchoring intent explicitly.
4. **Avoid catastrophic backtracking.** Nested quantifiers like `(a+)+` cause exponential
   time on non-matching input. Test with adversarial inputs.
5. **Named groups over numbered groups.** `(?P<year>\d{4})` (Python) or `(?<year>\d{4})`
   (JS) is self-documenting. Use numbered groups only for simple patterns.
6. **Specify the flavor.** Python `re`, JavaScript, and PCRE have different feature sets.
   Lookaheads, lookbehinds, and Unicode support vary.

## Error Handling

| Problem | Resolution |
|---------|------------|
| Insufficient examples | Ask for more. Minimum 3 positive, 3 negative. |
| Contradictory examples | Flag the contradiction. Ask which examples are correct. |
| Requirements too complex for regex | Suggest a parser instead. Regex cannot handle recursive structures (nested brackets, HTML). |
| Pattern causes backtracking | Rewrite with atomic groups or possessive quantifiers. Test with worst-case input. |
| Unicode requirements unclear | Ask if the pattern needs to handle non-ASCII. Default to ASCII unless specified. |
| Multiple valid patterns | Present the simplest one. Mention alternatives if they have meaningful tradeoffs (performance vs readability). |

## When NOT to Build Regex

Push back if:
- The input requires parsing a recursive grammar (HTML, JSON, nested expressions) â use a parser
- The validation is for a standard format with a library (email validation, URL parsing) â use the standard library
- The pattern is for security-critical input validation as the sole defense â regex is a first filter, not a security boundary
- The user wants to modify matched content in complex ways â regex replacement has limits; suggest code instead

GitHub 仓库 ↗ ← 返回陌讯 Skills 聚合平台