regex-builder
Install command
npx skills add https://github.com/mathews-tom/praxis-skills --skill regex-builder
Skill documentation
Regex Builder
Transforms matching requirements (positive and negative examples) into tested regex patterns with component-by-component explanations, capture group documentation, edge case identification, and ready-to-use code in Python and JavaScript.
Reference Files
| File | Contents | Load When |
|---|---|---|
| `references/character-classes.md` | Character class reference, Unicode categories, POSIX classes | Always |
| `references/quantifiers.md` | Quantifier behavior, greedy vs lazy vs possessive, backtracking | Pattern needs repetition |
| `references/common-patterns.md` | Validated patterns for email, URL, phone, IP, date, UUID, etc. | Common validation requested |
| `references/flavor-differences.md` | Syntax differences between Python, JavaScript, PCRE, POSIX | Multi-language usage needed |
Prerequisites
- Clear specification: what should match and what should not
- Target regex flavor (Python `re`, JavaScript, PCRE); defaults to Python
Workflow
Phase 1: Collect Examples
Gather positive (should match) and negative (should not match) examples:
- From user: explicit examples provided
- From context: if the user says “match email addresses,” infer standard positive and negative examples
- From data: if sample data is provided, identify the pattern within it
Minimum: 3 positive examples and 3 negative examples. Fewer examples risk overfitting the pattern to specific cases.
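The collected examples can be sanity-checked before any pattern is written. The sketch below shows one possible way to do that. `ExampleSet`, `problems`, and the sample strings are hypothetical names and data introduced for illustration, not part of the skill itself.

```python
from dataclasses import dataclass, field

MIN_EXAMPLES = 3  # minimum per side, as stated above


@dataclass
class ExampleSet:
    """Hypothetical container for the examples gathered in Phase 1."""
    positive: list[str] = field(default_factory=list)
    negative: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the set is usable."""
        issues = []
        if len(set(self.positive)) < MIN_EXAMPLES:
            issues.append(f"need at least {MIN_EXAMPLES} distinct positive examples")
        if len(set(self.negative)) < MIN_EXAMPLES:
            issues.append(f"need at least {MIN_EXAMPLES} distinct negative examples")
        # The same string on both sides is a contradiction to raise with the user.
        for text in sorted(set(self.positive) & set(self.negative)):
            issues.append(f"contradictory example: {text!r}")
        return issues


examples = ExampleSet(
    positive=["A123", "b4567", "Z00000"],
    negative=["", "1234", "AB123", "A123"],
)
print(examples.problems())  # ["contradictory example: 'A123'"]
```

Counting distinct strings rather than raw list length keeps a duplicated example from satisfying the minimum.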
Phase 2: Infer Pattern
Analyze the examples to build a pattern:
- Identify fixed literals: characters that appear in the same position across all positive examples
- Identify character classes: positions where different characters appear but follow a pattern (digits, letters, alphanumeric)
- Identify repetition: elements that appear a variable number of times
- Identify optional elements: parts present in some positive examples but not others
- Identify anchoring: must the pattern match the entire string, or can it be a substring?
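The first two inference steps (fixed literals and character classes) can be sketched for the simplest case, where all positive examples have the same length. This is an illustrative sketch only: `classify` and `infer_fixed_width` are hypothetical helpers, and the sketch ignores negative examples, variable-length repetition, and optional elements, which still need the manual analysis described above.

```python
import re


def classify(chars: set[str]) -> str:
    """Map the characters seen at one position to a regex fragment."""
    if len(chars) == 1:
        return re.escape(next(iter(chars)))       # fixed literal
    if all(c.isascii() and c.isdigit() for c in chars):
        return r"\d"
    if all(c.isascii() and c.isalpha() for c in chars):
        return "[A-Za-z]"
    if all(c.isascii() and c.isalnum() for c in chars):
        return "[A-Za-z0-9]"
    return "."                                     # fallback: any character


def infer_fixed_width(positives: list[str]) -> str:
    """Infer an anchored pattern from positive examples of equal length."""
    if len({len(p) for p in positives}) != 1:
        raise ValueError("this sketch only handles equal-length examples")
    parts = [classify(set(column)) for column in zip(*positives)]
    # Collapse runs of the same fragment into a counted repetition.
    merged: list[tuple[str, int]] = []
    for part in parts:
        if merged and merged[-1][0] == part:
            merged[-1] = (part, merged[-1][1] + 1)
        else:
            merged.append((part, 1))
    body = "".join(p if n == 1 else f"{p}{{{n}}}" for p, n in merged)
    return f"^{body}$"


pattern = infer_fixed_width(["AB-1234", "CD-5678", "EF-9012"])
print(pattern)
```

A pattern inferred this way is only a starting point. It should still be checked against the negative examples, because nothing in the inference step prevents over-matching.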
Phase 3: Explain Pattern
Break down the pattern into a component table:
| Component | Meaning |
|---|---|
| `^` | Start of string |
| `[A-Za-z]` | One letter (upper or lower) |
| `\d{3,5}` | 3 to 5 digits |
| `$` | End of string |
Document capture groups separately if the pattern uses them.
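One way to keep the explanation and the pattern in sync is to build both from the same list of components. The sketch below assumes that approach. `component_table` is a hypothetical helper, and the `dated` pattern is an invented example used only to show how Python's `re` exposes named groups through `groupindex`.

```python
import re

# Components of the example pattern from the table above, in order.
components = [
    (r"^", "Start of string"),
    (r"[A-Za-z]", "One letter (upper or lower)"),
    (r"\d{3,5}", "3 to 5 digits"),
    (r"$", "End of string"),
]

# The pattern is the concatenation of the documented fragments.
pattern = re.compile("".join(fragment for fragment, _ in components))


def component_table(rows):
    """Render (fragment, meaning) pairs as a markdown table."""
    lines = ["| Component | Meaning |", "|---|---|"]
    lines += [f"| `{fragment}` | {meaning} |" for fragment, meaning in rows]
    return "\n".join(lines)


print(component_table(components))

# Named groups can be listed straight from a compiled pattern.
dated = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")  # hypothetical example
print(dict(dated.groupindex))  # {'year': 1, 'month': 2}
```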
Phase 4: Generate Edge Cases
For every pattern, identify inputs that are likely to cause problems:
- Empty string: does the pattern handle it correctly?
- Almost-matching strings: one character off from a valid match
- Boundary lengths: minimum and maximum valid lengths
- Special characters: dots, brackets, backslashes in the input
- Unicode: multi-byte characters, emoji, diacritics
- Catastrophic backtracking: inputs that cause exponential matching time
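The categories above translate naturally into a table of inputs with expected outcomes. The following sketch uses the Phase 3 example pattern and a hypothetical `failures` helper. The specific inputs are illustrative assumptions, chosen to show that in Python 3 `\d` also matches non-ASCII decimal digits unless `re.ASCII` is set.

```python
import re

# (input, should_match) pairs covering the edge-case categories above.
edge_cases = [
    ("", False),                    # empty string
    ("A12", False),                 # boundary: one digit short
    ("A123", True),                 # boundary: minimum length
    ("A12345", True),               # boundary: maximum length
    ("A123456", False),             # boundary: one digit too many
    ("1A123", False),               # almost matching: leading digit
    ("A.123", False),               # special character inside
    ("\u00c9123", False),           # non-ASCII letter (E with acute accent)
    ("A\u0661\u0662\u0663", False), # Arabic-Indic digits
]


def failures(compiled, cases):
    """Return the cases whose actual result differs from the expectation."""
    return [(text, expected) for text, expected in cases
            if (compiled.fullmatch(text) is not None) != expected]


unicode_aware = re.compile(r"[A-Za-z]\d{3,5}")
ascii_only = re.compile(r"[A-Za-z]\d{3,5}", re.ASCII)

print(failures(unicode_aware, edge_cases))  # reports the Arabic-Indic digit case
print(failures(ascii_only, edge_cases))     # []
```

Whether the Unicode-digit input should match is a requirements question. The harness only makes the current behavior visible.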
Phase 5: Output
Produce the pattern, explanation, test cases, and usage examples.
Output Format
## Regex Pattern: {Brief Description}
### Requirements
- **Must match:** {description of valid inputs}
- **Must reject:** {description of invalid inputs}
- **Flavor:** {Python re | JavaScript | PCRE}
### Pattern
```regex
{pattern}
```
### Explanation
| Component | Meaning |
|---|---|
| `{component}` | {what it matches and why} |
### Capture Groups
| Group | Name | Captures | Example |
|---|---|---|---|
| 1 | {name} | {what} | {example value} |
### Test Cases
| # | Input | Should Match | Reason |
|---|---|---|---|
| 1 | `{input}` | Yes | {why: happy path} |
| 2 | `{input}` | Yes | {why: boundary} |
| 3 | `{input}` | No | {why: invalid} |
| 4 | `{input}` | No | {why: near-miss} |
| 5 | `""` (empty) | No | Empty input |
### Edge Cases
- {Edge case 1}: {what to watch for}
- {Edge case 2}: {what to watch for}
### Usage
Python:
import re
pattern = re.compile(r'{pattern}')
# Match entire string
if pattern.fullmatch(text):
    ...
# Search within string
match = pattern.search(text)
if match:
    captured = match.group(1)
# Find all matches
matches = pattern.findall(text)
JavaScript:
const pattern = /{pattern}/;
// Test
if (pattern.test(text)) { ... }
// Match
const match = text.match(pattern);
if (match) {
  const captured = match[1];
}
// Find all
const matches = [...text.matchAll(/{pattern}/g)];
## Calibration Rules
1. **Correctness over cleverness.** A readable, slightly longer pattern is better than
a cryptic short one. `[A-Za-z0-9]` is clearer than `\w` when you specifically mean
alphanumeric without underscores.
2. **Test negatives as rigorously as positives.** A pattern that matches everything
technically matches all positive examples. Negative examples prevent over-matching.
3. **Anchor when appropriate.** `^\d{3}$` matches exactly 3 digits. `\d{3}` matches
3 digits anywhere in the string. State the anchoring intent explicitly.
4. **Avoid catastrophic backtracking.** Nested quantifiers like `(a+)+` cause exponential
time on non-matching input. Test with adversarial inputs.
5. **Named groups over numbered groups.** `(?P<year>\d{4})` (Python) or `(?<year>\d{4})`
(JS) is self-documenting. Use numbered groups only for simple patterns.
6. **Specify the flavor.** Python `re`, JavaScript, and PCRE have different feature sets.
Lookaheads, lookbehinds, and Unicode support vary.
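Rules 1, 3, and 5 can be checked directly in Python's `re`. The snippet below is a small illustration, not a test suite. The sample strings and the `month` group are assumptions added for the example.

```python
import re

# Rule 1: \w also accepts the underscore; the explicit class does not.
assert re.fullmatch(r"\w+", "user_1")
assert re.fullmatch(r"[A-Za-z0-9]+", "user_1") is None

# Rule 3: anchoring changes what "3 digits" means.
found = re.search(r"\d{3}", "order 12345")       # substring match
assert found.group() == "123"
assert re.fullmatch(r"\d{3}", "12345") is None   # whole string must be 3 digits
assert re.fullmatch(r"\d{3}", "123")

# Rule 5: named groups document themselves at the call site.
m = re.fullmatch(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-06")
assert m is not None
print(m.group("year"), m.group("month"))  # 2024 06
```

Using `fullmatch` expresses the "entire string" intent in the call itself, so the pattern does not need its own `^` and `$`.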
## Error Handling
| Problem | Resolution |
|---------|------------|
| Insufficient examples | Ask for more. Minimum 3 positive, 3 negative. |
| Contradictory examples | Flag the contradiction. Ask which examples are correct. |
| Requirements too complex for regex | Suggest a parser instead. Regex cannot handle recursive structures (nested brackets, HTML). |
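| Pattern causes catastrophic backtracking | Rewrite to remove nested or overlapping quantifiers. Where the flavor supports them (PCRE, Python 3.11+), use atomic groups or possessive quantifiers; JavaScript `RegExp` supports neither. Test with worst-case input. |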
| Unicode requirements unclear | Ask if the pattern needs to handle non-ASCII. Default to ASCII unless specified. |
| Multiple valid patterns | Present the simplest one. Mention alternatives if they have meaningful tradeoffs (performance vs readability). |
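For the backtracking row, the most portable fix is often to remove the nested quantifier altogether. The sketch below checks that such a rewrite accepts the same strings as the original on a small sample. The sample inputs and the `accepts` helper are illustrative assumptions, and the inputs are kept short on purpose so the risky pattern stays fast.

```python
import re

risky = re.compile(r"(a+)+$")  # nested quantifier from the calibration rules
safe = re.compile(r"a+$")      # accepts the same strings with a single quantifier

# Short inputs only: the risky pattern slows down sharply as the run of "a" grows.
samples = ["", "a", "aaaa", "aaaa!", "baaa", "aab", "b"]


def accepts(compiled, text):
    """True if the pattern matches starting at the beginning of the text."""
    return compiled.match(text) is not None


mismatches = [s for s in samples if accepts(risky, s) != accepts(safe, s)]
print(mismatches)  # an empty list means the rewrite agrees on every sample
```

Agreement on a sample is evidence, not proof, of equivalence. The rewritten pattern should still go through the full positive and negative test cases.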
## When NOT to Build Regex
Push back if:
- The input requires parsing a recursive grammar (HTML, JSON, nested expressions): use a parser
- The validation is for a standard format with a library (email validation, URL parsing): use the standard library
- The pattern is for security-critical input validation as the sole defense: regex is a first filter, not a security boundary
- The user wants to modify matched content in complex ways: regex replacement has limits; suggest code instead
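As an illustration of the first two points, Python's standard library already covers several formats that are awkward to validate with regex. The helper names `is_ip_address` and `is_json` are hypothetical; the underlying calls (`ipaddress.ip_address`, `json.loads`, `urllib.parse.urlsplit`) are standard-library functions.

```python
import ipaddress
import json
from urllib.parse import urlsplit


def is_ip_address(text: str) -> bool:
    """Validate IPv4 or IPv6 with the standard library instead of a regex."""
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


def is_json(text: str) -> bool:
    """JSON is a recursive grammar, so let the parser decide."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_ip_address("192.168.0.1"), is_ip_address("256.1.1.1"))   # True False
print(is_json('{"a": [1, {"b": 2}]}'), is_json('{"a": [1, 2}'))   # True False
print(urlsplit("https://example.com:8443/path?q=1").hostname)     # example.com
```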