transcript-pipeline
npx skills add https://github.com/prakharmnnit/skills-and-personas --skill transcript-pipeline
Transcript Pipeline Skill
Run a deterministic, auditable transcript-to-tutorial workflow with optional resource enrichment.
Purpose
Use this skill to convert raw class captions into high-quality study notes while preserving accountability through the segment ledger and validation artifacts.
Use scripts for deterministic work. Use chat/stage prompts for language-heavy transformation.
Core Contract
- Keep stage order: ingest -> refine -> synthesize -> enhance -> validate -> publish.
- Run deterministic gates with scripts, never with LLM self-certification.
- Preserve traceability in .pipeline/* artifacts.
- Keep learner-facing notes readable and sanitized.
- Treat the validation status as the PASS/FAIL source of truth.
Scripts
Use these scripts from scripts/:
- ingest_zoom_captions.py – deterministic ingestion and segment ledger creation
- run_chat_pipeline.py – guided orchestration for stage handoffs and validation
- validate_coverage.py – hard-gate coverage validation
- publish_tutorial_notes.py – learner-facing file naming and sanitization
- merge_chunks.py – merge chunk outputs for large transcripts
- run_colab_notebook_pipeline.py – AI/ML Colab appendix and code explainer pipeline
- update_ai_notes_with_resources_and_colab.py – AI/ML notes enrichment utility
- resource_enrichment.py – authenticated enrichment for Notion/Canva/Drive resources
Stage Workflow
Stage 0: Ingest (Deterministic)
Run:
python scripts/ingest_zoom_captions.py "<transcript_or_session_path>"
Required outputs:
- .pipeline/segment_ledger.jsonl
- .pipeline/segment_manifest.jsonl
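The .jsonl extension indicates JSON Lines: one JSON object per line. The sketch below shows how such a ledger could be written and read. It assumes hypothetical per-segment fields (segment_id, start, end, text). The actual schema produced by ingest_zoom_captions.py is not documented here and may differ.

```python
import json
import tempfile
from pathlib import Path


def write_ledger(path: Path, segments: list[dict]) -> None:
    """Write one JSON object per line (JSON Lines), creating parent dirs."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for seg in segments:
            fh.write(json.dumps(seg, ensure_ascii=False) + "\n")


def read_ledger(path: Path) -> list[dict]:
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    # Hypothetical segment records; field names are illustrative only.
    demo = [
        {"segment_id": "seg-0001", "start": "00:00:05", "end": "00:00:12", "text": "Welcome to class."},
        {"segment_id": "seg-0002", "start": "00:00:12", "end": "00:00:20", "text": "Today: gradient descent."},
    ]
    # Use a temporary directory so a real .pipeline/ folder is never overwritten.
    with tempfile.TemporaryDirectory() as tmp:
        ledger = Path(tmp) / ".pipeline" / "segment_ledger.jsonl"
        write_ledger(ledger, demo)
        for record in read_ledger(ledger):
            print(record["segment_id"], record["text"])
```

Because each line is independent, a ledger can be appended to or streamed without loading the whole file. This also makes line-level diffs readable in audits.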
Stage 1: Refine (Chat Stage)
Load references/stage1-refine.md.
Produce:
- .pipeline/refined_transcript.md
- .pipeline/topic_inventory.json
- .pipeline/corrections_log.csv
- .pipeline/uncertainty_report.json
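corrections_log.csv keeps each caption fix auditable. The sketch below appends rows to such a log with Python's csv module. It assumes a hypothetical column layout (segment_id, original, corrected, reason). The real format is set by references/stage1-refine.md.

```python
import csv
from pathlib import Path

# Hypothetical column layout; the actual log format is defined by the Stage 1 prompt.
FIELDS = ["segment_id", "original", "corrected", "reason"]


def append_corrections(path: Path, rows: list[dict]) -> None:
    """Append correction rows, writing the header only when the file is new or empty."""
    path.parent.mkdir(parents=True, exist_ok=True)
    new_file = not path.exists() or path.stat().st_size == 0
    # newline="" lets the csv module control line endings and quoting.
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerows(rows)


def load_corrections(path: Path) -> list[dict]:
    """Read the log back as a list of dicts keyed by the header row."""
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

Appending rather than rewriting keeps earlier entries intact if a refine pass is rerun. Duplicate rows would then need to be handled downstream.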
Stage 2: Synthesize (Chat Stage)
Load references/stage2-synthesize.md.
Produce:
- .pipeline/structured_notes.md
- .pipeline/coverage_matrix.json
Stage 3: Enhance (Chat Stage)
Load:
- references/stage3-enhance.md
- references/tutorial-tech-bar-raiser.md
Produce:
- .pipeline/enhanced_notes.md
- final_notes.md
- bootcamp_index.md
Stage 4: Validate (Deterministic)
Run:
python scripts/validate_coverage.py --pipeline-dir .pipeline
Validation guidance: references/stage4-validate.md.
Hard gates:
- Segment coverage accountability
- Uncertainty retention
- No orphan claims
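The three hard gates are set-membership checks that a script can decide without any LLM judgment. The sketch below is not the logic of validate_coverage.py. It assumes hypothetical inputs: ledger segment IDs, a coverage matrix mapping topics to segment IDs, uncertainty IDs flagged in Stage 1 versus those still present downstream, and claims mapped to the segment IDs they cite.

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    name: str
    passed: bool
    details: list[str] = field(default_factory=list)  # offending IDs, sorted


def check_segment_coverage(ledger_ids: set[str], coverage: dict[str, list[str]]) -> GateResult:
    """Every ledger segment must be accounted for by at least one topic."""
    covered = {sid for sids in coverage.values() for sid in sids}
    missing = sorted(ledger_ids - covered)
    return GateResult("segment_coverage", not missing, missing)


def check_uncertainty_retention(flagged_ids: set[str], retained_ids: set[str]) -> GateResult:
    """Every uncertainty flagged during refinement must still be visible downstream."""
    dropped = sorted(flagged_ids - retained_ids)
    return GateResult("uncertainty_retention", not dropped, dropped)


def check_orphan_claims(claims: dict[str, list[str]], ledger_ids: set[str]) -> GateResult:
    """A claim is orphaned if none of the segments it cites exists in the ledger."""
    orphans = sorted(c for c, sids in claims.items() if not set(sids) & ledger_ids)
    return GateResult("no_orphan_claims", not orphans, orphans)


def overall_status(results: list[GateResult]) -> str:
    """A single failing gate makes the whole run FAIL."""
    return "PASS" if all(r.passed for r in results) else "FAIL"
```

Returning the offending IDs, not just a boolean, supports the rule that failures are reported explicitly.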
Stage 5: Publish
Run:
python scripts/publish_tutorial_notes.py --root "<sessions_root>" --session-dir "<session_dir>"
Result:
- Published tutorial filename in canonical format
- Learner-safe note without noisy source tags
- Updated course index links
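The canonical filename pattern appears in the Required Outputs Checklist as "<Domain> Class <NN> [DD-MM-YYYY] - <Topic>.md". The sketch below builds that name and strips inline source tags. Two parts are assumptions, not the behavior of publish_tutorial_notes.py: zero-padding <NN> to two digits, and the "[seg-0012]" tag pattern.

```python
import re
from datetime import date

# Characters that are unsafe in filenames on common file systems.
_UNSAFE = re.compile(r'[<>:"/\\|?*]')
# Hypothetical inline source-tag pattern, e.g. "[seg-0012]" or "[seg-0012, seg-0013]".
_SOURCE_TAG = re.compile(r"\s*\[seg-\d+(?:,\s*seg-\d+)*\]")


def canonical_filename(domain: str, class_no: int, day: date, topic: str) -> str:
    """Build '<Domain> Class <NN> [DD-MM-YYYY] - <Topic>.md' with a filesystem-safe topic."""
    safe_topic = _UNSAFE.sub("", topic).strip()
    return f"{domain} Class {class_no:02d} [{day:%d-%m-%Y}] - {safe_topic}.md"


def strip_source_tags(text: str) -> str:
    """Remove inline source tags so the learner-facing note reads cleanly."""
    return _SOURCE_TAG.sub("", text)
```

Square brackets are deliberately left out of the unsafe set because the canonical format uses them around the date.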
One-Command Guided Mode
Use the guided runner for chat-window workflows:
python scripts/run_chat_pipeline.py run "<transcript_or_session_path>" --deep-pass
This enforces required handoffs and deep quality gates.
Optional Resource Enrichment Stage
Run when class notes include external links (Notion/Canva/Drive):
python scripts/resource_enrichment.py --all-sessions
Single session:
python scripts/resource_enrichment.py --session-dir "<session_dir>"
Auth options:
- Notion: NOTION_TOKEN_V2, NOTION_ACTIVE_USER
- Canva: RESOURCE_PLAYWRIGHT_STORAGE_STATE
Reference: references/resource-enrichment-authenticated-flow.md.
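A preflight check can show which providers lack credentials before enrichment starts, so their status can be reported instead of silently skipped. The sketch below assumes the auth options listed above are read from environment variables. It does not reproduce how resource_enrichment.py actually loads them. Drive is omitted because no Drive auth option is listed.

```python
import os
from collections.abc import Mapping

# Auth option names from the list above, grouped per provider.
REQUIRED_ENV = {
    "notion": ("NOTION_TOKEN_V2", "NOTION_ACTIVE_USER"),
    "canva": ("RESOURCE_PLAYWRIGHT_STORAGE_STATE",),
}


def missing_auth(env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """Return, per provider, the auth variables that are unset or empty."""
    return {
        provider: [name for name in names if not env.get(name)]
        for provider, names in REQUIRED_ENV.items()
    }
```

Passing a plain dict as env keeps the check testable without touching the real environment.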
Optional AI/ML Colab Enrichment
Run for Colab-backed AI/ML classes:
python scripts/run_colab_notebook_pipeline.py
Reference: references/colab-notebook-explainer-pipeline.md.
Large Transcript Handling
If the input is too large to process comfortably in a single context window:
- Run Stage 1 by chunks.
- Merge chunk artifacts:
python scripts/merge_chunks.py --chunk-dirs "<chunkA/.pipeline>" "<chunkB/.pipeline>" --output-dir "<session/.pipeline>"
- Continue Stage 2 onward on merged artifacts.
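Merging JSONL artifacts amounts to concatenating chunks in order while keeping segment IDs unique. The sketch below illustrates that for one artifact. It is not the behavior of merge_chunks.py. It assumes records carry a segment_id field and that overlapping chunk boundaries may repeat a segment, in which case the first occurrence wins.

```python
import json
from pathlib import Path


def merge_jsonl(chunk_dirs: list[Path], name: str, output_dir: Path) -> int:
    """Concatenate <chunk>/<name> files in order, keeping the first record per segment_id.

    Raises FileNotFoundError if any chunk is missing the artifact (fail fast).
    Returns the number of records written to <output_dir>/<name>.
    """
    seen: set[str] = set()
    merged: list[dict] = []
    for chunk in chunk_dirs:
        src = chunk / name
        if not src.exists():
            raise FileNotFoundError(f"missing chunk artifact: {src}")
        for line in src.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            key = record["segment_id"]
            if key in seen:
                continue  # duplicate from an overlapping chunk boundary
            seen.add(key)
            merged.append(record)
    output_dir.mkdir(parents=True, exist_ok=True)
    out = output_dir / name
    out.write_text("".join(json.dumps(r, ensure_ascii=False) + "\n" for r in merged), encoding="utf-8")
    return len(merged)
```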
Required Outputs Checklist
Learner-facing:
- final_notes.md
- <Domain> Class <NN> [DD-MM-YYYY] - <Topic>.md
- bootcamp_index.md
Pipeline/audit:
- .pipeline/segment_ledger.jsonl
- .pipeline/segment_manifest.jsonl
- .pipeline/refined_transcript.md
- .pipeline/topic_inventory.json
- .pipeline/corrections_log.csv
- .pipeline/uncertainty_report.json
- .pipeline/structured_notes.md
- .pipeline/coverage_matrix.json
- .pipeline/enhanced_notes.md
- .pipeline/validation_report.md
- .pipeline/exceptions.json (if validation fails)
Quality gates:
- .pipeline/deep_pass_report.md (when --deep-pass is set)
- .pipeline/deep_pass_exceptions.json (when --deep-pass is set)
Resource enrichment (optional):
- .resources/resource_enrichment_report.json
Execution Rules
- Fail fast on missing required artifacts.
- Report missing outputs explicitly by file path.
- Retry only from the earliest failing stage.
- Keep resource extraction status explicit (success/fallback/blocked).
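The first three execution rules can be combined into one check: walk the stages in order, stop at the first stage whose artifacts are missing, and report those paths. The sketch below builds its stage-to-artifact mapping from the checklist above. Two points are assumptions: that all paths are relative to the session directory, and that validation_report.md belongs to Stage 4.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Required artifacts per stage, in pipeline order (assumed relative to the session dir).
STAGE_ARTIFACTS: List[Tuple[str, List[str]]] = [
    ("ingest", [".pipeline/segment_ledger.jsonl", ".pipeline/segment_manifest.jsonl"]),
    ("refine", [".pipeline/refined_transcript.md", ".pipeline/topic_inventory.json",
                ".pipeline/corrections_log.csv", ".pipeline/uncertainty_report.json"]),
    ("synthesize", [".pipeline/structured_notes.md", ".pipeline/coverage_matrix.json"]),
    ("enhance", [".pipeline/enhanced_notes.md", "final_notes.md", "bootcamp_index.md"]),
    ("validate", [".pipeline/validation_report.md"]),
]


def earliest_failing_stage(session_dir: Path) -> Tuple[Optional[str], List[str]]:
    """Return (stage, missing_paths) for the first stage with missing artifacts.

    Returns (None, []) when every stage's artifacts exist.
    """
    for stage, rel_paths in STAGE_ARTIFACTS:
        missing = [str(session_dir / p) for p in rel_paths if not (session_dir / p).exists()]
        if missing:
            return stage, missing  # fail fast: later stages are not checked
    return None, []
```

Stopping at the first gap means a retry restarts from that stage rather than rerunning the whole pipeline. Downstream artifacts produced from stale inputs would still need to be regenerated.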