transcript-pipeline

📁 prakharmnnit/skills-and-personas 📅 6 days ago
3
总安装量
3
周安装量
#58248
全站排名
安装命令
npx skills add https://github.com/prakharmnnit/skills-and-personas --skill transcript-pipeline

Agent 安装分布

codex 3
mcpjam 2
claude-code 2
junie 2
windsurf 2
zencoder 2

Skill 文档

Transcript Pipeline Skill

Run a deterministic, auditable transcript-to-tutorial workflow with optional resource enrichment.

Purpose

Use this skill to convert raw class captions into high-quality study notes while preserving accountability through ledger + validation artifacts.

Use scripts for deterministic work. Use chat/stage prompts for language-heavy transformation.

Core Contract

  1. Keep stage order: ingest -> refine -> synthesize -> enhance -> validate -> publish.
  2. Run deterministic gates with scripts, never with LLM self-certification.
  3. Preserve traceability in .pipeline/* artifacts.
  4. Keep learner-facing notes readable and sanitized.
  5. Treat validation status as PASS/FAIL source of truth.

Scripts

Use these scripts from scripts/:

  • ingest_zoom_captions.py – deterministic ingestion and segment ledger creation
  • run_chat_pipeline.py – guided orchestration for stage handoffs and validation
  • validate_coverage.py – hard-gate coverage validation
  • publish_tutorial_notes.py – learner-facing file naming and sanitization
  • merge_chunks.py – merge chunk outputs for large transcripts
  • run_colab_notebook_pipeline.py – AI/ML Colab appendix and code explainer pipeline
  • update_ai_notes_with_resources_and_colab.py – AI/ML notes enrichment utility
  • resource_enrichment.py – authenticated enrichment for Notion/Canva/Drive resources

Stage Workflow

Stage 0: Ingest (Deterministic)

Run:

python scripts/ingest_zoom_captions.py "<transcript_or_session_path>"

Required outputs:

  • .pipeline/segment_ledger.jsonl
  • .pipeline/segment_manifest.jsonl

Stage 1: Refine (Chat Stage)

Load references/stage1-refine.md.

Produce:

  • .pipeline/refined_transcript.md
  • .pipeline/topic_inventory.json
  • .pipeline/corrections_log.csv
  • .pipeline/uncertainty_report.json

Stage 2: Synthesize (Chat Stage)

Load references/stage2-synthesize.md.

Produce:

  • .pipeline/structured_notes.md
  • .pipeline/coverage_matrix.json

Stage 3: Enhance (Chat Stage)

Load:

  • references/stage3-enhance.md
  • references/tutorial-tech-bar-raiser.md

Produce:

  • .pipeline/enhanced_notes.md
  • final_notes.md
  • bootcamp_index.md

Stage 4: Validate (Deterministic)

Run:

python scripts/validate_coverage.py --pipeline-dir .pipeline

Validation guidance: references/stage4-validate.md.

Hard gates:

  1. Segment coverage accountability
  2. Uncertainty retention
  3. No orphan claims

Stage 5: Publish

Run:

python scripts/publish_tutorial_notes.py --root "<sessions_root>" --session-dir "<session_dir>"

Result:

  • Published tutorial filename in canonical format
  • Learner-safe note without noisy source tags
  • Updated course index links

One-Command Guided Mode

Use guided runner for chat-window workflows:

python scripts/run_chat_pipeline.py run "<transcript_or_session_path>" --deep-pass

This enforces required handoffs and deep quality gates.

Optional Resource Enrichment Stage

Run when class notes include external links (Notion/Canva/Drive):

python scripts/resource_enrichment.py --all-sessions

Single session:

python scripts/resource_enrichment.py --session-dir "<session_dir>"

Auth options:

  • Notion: NOTION_TOKEN_V2, NOTION_ACTIVE_USER
  • Canva: RESOURCE_PLAYWRIGHT_STORAGE_STATE

Reference: references/resource-enrichment-authenticated-flow.md.

Optional AI/ML Colab Enrichment

Run for Colab-backed AI/ML classes:

python scripts/run_colab_notebook_pipeline.py

Reference: references/colab-notebook-explainer-pipeline.md.

Large Transcript Handling

If input exceeds context comfort:

  1. Run Stage 1 by chunks.
  2. Merge chunk artifacts:
python scripts/merge_chunks.py --chunk-dirs "<chunkA/.pipeline>" "<chunkB/.pipeline>" --output-dir "<session/.pipeline>"
  1. Continue Stage 2 onward on merged artifacts.

Required Outputs Checklist

Learner-facing:

  • final_notes.md
  • <Domain> Class <NN> [DD-MM-YYYY] - <Topic>.md
  • bootcamp_index.md

Pipeline/audit:

  • .pipeline/segment_ledger.jsonl
  • .pipeline/segment_manifest.jsonl
  • .pipeline/refined_transcript.md
  • .pipeline/topic_inventory.json
  • .pipeline/corrections_log.csv
  • .pipeline/uncertainty_report.json
  • .pipeline/structured_notes.md
  • .pipeline/coverage_matrix.json
  • .pipeline/enhanced_notes.md
  • .pipeline/validation_report.md
  • .pipeline/exceptions.json (if fail)

Quality gates:

  • .pipeline/deep_pass_report.md (when --deep-pass)
  • .pipeline/deep_pass_exceptions.json (when --deep-pass)

Resource enrichment (optional):

  • .resources/resource_enrichment_report.json

Execution Rules

  • Fail fast on missing required artifacts.
  • Report missing outputs explicitly by file path.
  • Retry only from earliest failing stage.
  • Keep resource extraction status explicit (success/fallback/blocked).