regex-log

📁 letta-ai/skills 📅 Jan 24, 2026
22
总安装量
19
周安装量
#16886
全站排名
安装命令
npx skills add https://github.com/letta-ai/skills --skill regex-log

Agent 安装分布

claude-code 14
codex 13
gemini-cli 13
opencode 13
antigravity 12
cursor 10

Skill 文档

Regex Log Parsing Skill

This skill provides a systematic approach for constructing complex regular expressions that extract and validate structured data from log files.

When to Use This Skill

This skill applies when:

  • Building regex patterns to extract data from log entries
  • Validating specific formats (IPv4 addresses, dates, timestamps) within logs
  • Handling requirements for first/last occurrence selection
  • Enforcing word boundary conditions
  • Combining multiple validation constraints in a single pattern

Approach: Decomposition Strategy

Complex log parsing regex should be built by decomposing the problem into sub-patterns:

Step 1: Identify All Requirements

Before writing any regex, create a complete list of requirements:

  • What data needs to be validated (present but not captured)?
  • What data needs to be captured?
  • What boundary conditions apply (word boundaries, line anchors)?
  • Are there positional requirements (first, last, nth occurrence)?
  • What constitutes an invalid match?

Step 2: Build Sub-Patterns Independently

Construct each validation pattern separately before combining:

IPv4 Address Pattern

For valid IPv4 addresses (0-255 per octet, no leading zeros except for 0 itself):

  • Octet pattern: (?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])
  • Order alternatives from most specific to least specific
  • Full IPv4: (?:(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])

Date Pattern (YYYY-MM-DD)

For valid dates with proper month-day validation:

  • 31-day months: (?:0[13578]|1[02])-(?:0[1-9]|[12][0-9]|3[01])
  • 30-day months: (?:0[469]|11)-(?:0[1-9]|[12][0-9]|30)
  • February (up to 29): 02-(?:0[1-9]|1[0-9]|2[0-9])
  • Combine with year: [0-9]{4}-(?:...combined month-day patterns...)

Step 3: Apply Positional Requirements

Selecting Last Occurrence

To capture the last valid pattern in a line:

^.*<pattern>(?!.*<pattern>)
  • Use ^.* to greedily consume characters
  • Use negative lookahead (?!.*<pattern>) to ensure no pattern follows

Selecting First Occurrence

To capture the first valid pattern:

^(?:(?!<pattern>).)*<pattern>

Or simply rely on regex engines returning the first match by default.

Step 4: Apply Validation Without Capture

To require presence of a pattern without capturing it:

  • Use lookahead: (?=.*<pattern>) at the start of the regex
  • This validates the line contains the pattern without affecting the capture

Step 5: Apply Word Boundaries

For patterns that must not be adjacent to alphanumeric characters:

  • Use \b word boundaries: \b<pattern>\b
  • Be aware that \b matches between word and non-word characters

Verification Strategy

Create Comprehensive Test Cases

Organize tests by category:

  1. Valid cases: Confirm expected matches

    • Minimum/maximum valid values (e.g., 0.0.0.0, 255.255.255.255)
    • Edge values for each component
  2. Invalid format cases: Confirm rejection

    • Out-of-range values (e.g., 256.0.0.0)
    • Invalid formatting (leading zeros where prohibited)
    • Invalid months (00, 13) or days (32)
  3. Boundary condition cases:

    • Pattern at start/end of line
    • Pattern adjacent to alphanumeric characters (should fail with word boundaries)
    • Pattern adjacent to punctuation (should pass with word boundaries)
  4. Positional cases:

    • Multiple valid patterns in one line (verify correct one is captured)
    • Single pattern in line
    • No valid pattern in line

Test File Structure

Create a structured test file that:

  • Groups tests by category
  • Uses clear naming for each test case
  • Reports pass/fail status for each test
  • Summarizes overall results

Example structure:

test_cases = {
    "valid_ipv4": [...],
    "invalid_ipv4": [...],
    "valid_dates": [...],
    "invalid_dates": [...],
    "last_occurrence": [...],
    "boundary_conditions": [...]
}

Common Pitfalls

1. Incomplete First Attempt

  • Problem: Creating incomplete or truncated test files
  • Solution: Plan the full test structure before writing; validate file completeness before execution

2. Environment Assumptions

  • Problem: Assuming python command exists when only python3 is available
  • Solution: Check the Python environment first or use python3 explicitly

3. Scattered Reasoning

  • Problem: Disorganized thought process leading to repeated work
  • Solution: Follow the decomposition strategy linearly; complete each sub-pattern before moving to the next

4. Duplicate Patterns Without Abstraction

  • Problem: Same regex pattern repeated multiple times, increasing error risk
  • Solution: Define complex sub-patterns once in reasoning, then reference them; in code, use variables

5. Missing Edge Cases

  • Problem: Focusing only on happy path validation
  • Solution: Explicitly test:
    • Boundary values (min/max for each component)
    • Invalid values just outside valid range
    • Empty and null cases
    • Patterns at different positions in the line

6. Order of Alternatives

  • Problem: Less specific alternatives matching before more specific ones
  • Solution: Order regex alternatives from most specific to least specific (e.g., 25[0-5] before 2[0-4][0-9] before [0-9])

7. Greedy vs Non-Greedy Matching

  • Problem: Unexpected capture due to greedy quantifiers
  • Solution: Understand when to use .* vs .*?; for “last occurrence” patterns, greedy .* is typically correct

Workflow Summary

  1. List all requirements explicitly
  2. Build and test sub-patterns independently
  3. Combine sub-patterns with appropriate anchors and lookaheads
  4. Create comprehensive test cases covering all categories
  5. Run tests and verify all pass
  6. Clean up test files after validation