split

📁 bdambrosio/cognitive_workbench 📅 8 days ago
Total installs: 1
Weekly installs: 1
Site rank: #76281
Install command:
npx skills add https://github.com/bdambrosio/cognitive_workbench --skill split

Agent install distribution

mcpjam 1
claude-code 1
replit 1
junie 1
windsurf 1
zencoder 1

Skill documentation

Split Primitive

Transform a Note’s internal structure into a Collection of Notes.

Overview

The split primitive transforms a single Note into a Collection by splitting its content based on structure or delimiter. This is a structure transformation, not content inspection. To view Collection contents, use display (show to user) or flatten (merge into single Note).

Input Types

The split primitive handles four input types:

  1. JSON Array: Directly splits array elements (e.g., [1, 2, 3] → Collection of 3 Notes)
  2. JSON Object with Array Field: Extracts array from specified field (default: results)
  3. JSONL Format: Multiple JSON objects separated by newlines
  4. Plain Text: Splits by delimiter (default: sentence boundaries)
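
The four input types above can be told apart with a small classifier. The following is a minimal sketch, not the skill's actual implementation: the function name classify_note_content and the order of the checks are assumptions, since the documentation does not say how detection is performed.

```python
import json


def classify_note_content(content: str, field: str = "results"):
    """Hypothetical classifier: return (kind, items).

    items is a list for structured input, or None when the content
    should fall through to plain-text delimiter splitting.
    """
    text = content.strip()

    # Cases 1 and 2: the whole content parses as a single JSON value.
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, list):
        return "json_array", parsed
    if isinstance(parsed, dict) and isinstance(parsed.get(field), list):
        return "json_object_field", parsed[field]

    # Case 3: JSONL, assumed here to mean every non-empty line is a JSON object.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) > 1:
        try:
            objs = [json.loads(ln) for ln in lines]
        except json.JSONDecodeError:
            objs = None
        if objs is not None and all(isinstance(o, dict) for o in objs):
            return "jsonl", objs

    # Case 4: anything else is treated as plain text.
    return "text", None
```

In this sketch a JSON object that lacks the requested field falls through to plain-text handling; the real primitive may treat that case differently.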

Parameters

Required

  • target: $variable – Note containing JSON array, JSON object with array field, JSONL, or plain text
  • out: $variable – Variable name for resulting Collection

Optional

  • field: string (default: 'results') – Name of array field in JSON object case
  • delimiter: string (default: 'sentence') – For plain text splitting:
    • 'sentence' – Splits on sentence boundaries: ., !, or ? followed by a space or newline
    • 'paragraph' – Splits on double newlines (\n\n)
    • 'line' – Splits on single newlines (\n)
    • Custom string – Splits on the specified delimiter string
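
As an illustration of how the required and optional parameters combine, here is a small sketch that assembles a split step as a dictionary. The helper make_split_step is hypothetical, as is the field name "hits"; the check that variable names start with $ is an assumption drawn from the $variable notation above.

```python
import json


def make_split_step(target, out, field=None, delimiter=None):
    """Hypothetical helper: build a split step, omitting unset optional keys."""
    for name in (target, out):
        # Assumption: variables are always written with a leading "$".
        if not name.startswith("$"):
            raise ValueError(f"expected a $variable, got {name!r}")
    step = {"type": "split", "target": target, "out": out}
    if field is not None:
        step["field"] = field          # falls back to 'results' when omitted
    if delimiter is not None:
        step["delimiter"] = delimiter  # falls back to 'sentence' when omitted
    return step


# "hits" is an illustrative field name, not one defined by the skill.
print(json.dumps(make_split_step("$data_note", "$items", field="hits")))
```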

Behavior

For Plain Text (default delimiter: 'sentence')

When splitting plain text with the default 'sentence' delimiter:

  • Splits on sentence boundaries: period (.), exclamation (!), or question mark (?) followed by space or newline
  • Normalizes whitespace: Within each segment, internal newlines are converted to spaces and runs of spaces are collapsed to a single space
  • Filters empty segments: Zero-length or whitespace-only segments are removed
  • Preserves semantic units (complete sentences) even when text spans multiple lines (e.g., PDF-extracted text)
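
The sentence behavior described above can be approximated in a few lines. This is a sketch under stated assumptions, not the skill's code: the function name split_sentences is hypothetical, and keeping the terminal punctuation on each segment is an assumption the documentation does not confirm.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Approximate the default 'sentence' delimiter."""
    # Split on a space or newline that directly follows ., !, or ?.
    parts = re.split(r"(?<=[.!?])[ \n]", text)
    # Turn internal newlines into spaces and collapse repeated whitespace.
    cleaned = [" ".join(part.split()) for part in parts]
    # Drop zero-length and whitespace-only segments.
    return [segment for segment in cleaned if segment]


# PDF-style text where one sentence wraps across lines.
pdf_text = "The vendor shall\nprovide support.  Pricing is\nfixed.\n\n"
print(split_sentences(pdf_text))
```

A rule this simple also splits after abbreviations such as "e.g. ", so the real primitive may behave differently on such input.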

For Other Delimiters

  • 'paragraph': Splits on double newlines, normalizes whitespace within paragraphs
  • 'line': Splits on single newlines, normalizes whitespace within lines
  • Custom delimiter: Splits on the specified string, normalizes whitespace within segments
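
The non-sentence delimiters follow the same split, normalize, and filter pattern. A minimal sketch, assuming the named delimiters map to literal separator strings; split_by_delimiter is a hypothetical name, and the sentence case is deliberately left out.

```python
def split_by_delimiter(text: str, delimiter: str) -> list[str]:
    """Approximate 'paragraph', 'line', and custom-string splitting."""
    if delimiter == "sentence":
        raise ValueError("sentence splitting is not covered by this sketch")
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    # Named delimiters map to literal separators; anything else is used as-is.
    named = {"paragraph": "\n\n", "line": "\n"}
    separator = named.get(delimiter, delimiter)
    segments = text.split(separator)
    # Normalize whitespace inside each segment, then drop empty segments.
    cleaned = [" ".join(segment.split()) for segment in segments]
    return [segment for segment in cleaned if segment]
```

Under these assumptions, a paragraph that contains single newlines comes back as one segment with the newlines turned into spaces.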

Examples

JSON Array

{"type":"split","target":"$json_array_note","out":"$items"}

Input Note: [1, 2, 3] → Collection of 3 Notes

JSON Object with Array Field

{"type":"split","target":"$data_note","out":"$items"}

Input Note: {"results": [{"x":1}, {"x":2}]} → Collection of 2 Notes

JSONL Format

{"type":"split","target":"$jsonl_note","out":"$items"}

Input Note: {"key":"val1"}\n{"key":"val2"} → Collection of 2 Notes

Plain Text – Sentence Splitting (Default)

{"type":"split","target":"$text_note","out":"$sentences"}

Input Note: "First sentence. Second sentence! Third sentence?" → Collection of 3 Notes

Plain Text – Paragraph Splitting

{"type":"split","target":"$text_note","delimiter":"paragraph","out":"$paragraphs"}

Input Note: "Para 1\n\nPara 2" → Collection of 2 Notes

Plain Text – Line Splitting

{"type":"split","target":"$text_note","delimiter":"line","out":"$lines"}

Input Note: "Line 1\nLine 2\nLine 3" → Collection of 3 Notes

Plain Text – Custom Delimiter

{"type":"split","target":"$text_note","delimiter":"---","out":"$sections"}

Input Note: "Section 1---Section 2---Section 3" → Collection of 3 Notes

Use Cases

  • Document Processing: Split RFP documents into sentences for compliance analysis
  • Text Analysis: Break down large text documents into semantic units
  • Data Extraction: Transform structured JSON arrays into Collections for processing
  • PDF Text Processing: Handle PDF-extracted text where sentences span multiple lines

Important Notes

  • NOT for inspecting Collections: Collections are already split. Use display to view Collection contents or flatten to merge back into a single Note.
  • search-web and semantic-scholar: These tools return Collections directly – NO split needed.
  • Whitespace Normalization: For plain text, internal newlines are removed and multiple spaces are collapsed to ensure clean semantic units.
  • Empty Filtering: Empty or whitespace-only segments are automatically filtered out.

Common Mistakes

  • Trying to split a Collection to “see inside it” – Collections are already split; use display instead
  • Forgetting that default behavior is sentence splitting (not line splitting) for plain text
  • Expecting line breaks to be preserved – they are normalized to spaces for semantic processing