visual-explainer
npx skills add https://github.com/davistroy/claude-marketplace --skill visual-explainer
Agent 安装分布
Skill 文档
Visual Concept Explainer
You are orchestrating a visual concept explanation workflow that transforms text or documents into AI-generated infographic pages. The tool uses Gemini Pro 3 (via google-genai SDK) for 4K image generation and Claude Sonnet Vision for quality evaluation with iterative refinement.
Infographic Mode (Recommended)
The --infographic flag enables information-dense infographic generation optimized for 11×17 inch printing at 4K resolution. This mode:
- Adaptive page count (1-6 pages) based on document complexity, word count, and content types
- Zone-based layouts with explicit text placement, typography specifications, and content zones
- 8 page types: Hero Summary, Problem Landscape, Framework Overview, Framework Deep-Dive, Comparison Matrix, Dimensions/Variations, Reference/Action, Data/Evidence
- Information density: Each page can hold 800-2000 words of readable text plus diagrams, tables, and charts
Page Type Selection
The system automatically selects appropriate page types based on content:
| Content Pattern | Page Type | Purpose |
|---|---|---|
| Executive summary needed | Hero Summary | One-page overview with key stats |
| Challenges, pain points | Problem Landscape | Issues visualization with severity |
| Multi-step processes | Framework Overview | Visual framework with connections |
| Deep component analysis | Framework Deep-Dive | Detailed component exploration |
| Multiple options to compare | Comparison Matrix | Side-by-side analysis table |
| Variations, types, categories | Dimensions/Variations | Category breakdown visualization |
| Statistics, research data | Data/Evidence | Charts and data visualization |
| Action items, checklists | Reference/Action | Actionable takeaways and guides |
Technical Notes
Image Generation:
- Uses
google-genaiSDK with modelgemini-3-pro-image-preview - Configuration:
response_modalities=["IMAGE"]withImageConfigfor aspect ratio/size - 4K images are approximately 6-7.5MB each (JPEG format)
Image Evaluation:
- Claude Vision API has 5MB limit for base64-encoded images
- Tool automatically resizes images >3.5MB before evaluation (accounts for base64 overhead)
- Uses PIL/Pillow for high-quality LANCZOS resampling
Platform Compatibility:
- Windows: Folder names are sanitized to remove invalid characters (
:,*,?,",<,>,|) - All platforms: Full Unicode support in Rich terminal UI
Tested Results (4 documents, 17 images):
- Formats tested: URL (Substack), Markdown, DOCX
- Average scores: 0.76-0.88 (all passing with threshold 0.75)
- Generation time: 5-10 minutes per document (~2 min/image including analysis)
- Only 1 retry needed across 17 images (image scored 0.72 â refined to 0.82)
- Recommended pass threshold: 0.75-0.85 for good quality without excessive refinement
Input Validation
Required Arguments:
- Input content (one of the following):
- Raw text (pasted directly)
- Document path (
.md,.txt,.docx,.pdf) - URL (to fetch and extract content)
Optional Arguments:
| Parameter | Default | Options | Description |
|---|---|---|---|
--infographic |
false | flag | Recommended. Generate information-dense infographic pages (11×17 format) |
--max-iterations |
5 | 1-10 | Max refinement attempts per image |
--aspect-ratio |
16:9 | 16:9, 1:1, 4:3, 9:16, 3:4 | Image aspect ratio |
--resolution |
high | low, medium, high | Image quality (high=4K/3200×1800) |
--style |
professional-clean | professional-clean, professional-sketch, or path | Visual style |
--output-dir |
./output | path | Output directory |
--pass-threshold |
0.85 | 0.0-1.0 | Score required to pass evaluation |
--concurrency |
3 | 1-10 | Max concurrent image generations |
--no-cache |
false | flag | Force fresh concept analysis |
--resume |
null | path | Resume from checkpoint file |
--dry-run |
false | flag | Show plan without generating |
--setup-keys |
false | flag | Force re-run API key setup |
--json |
false | flag | Output results as JSON (for programmatic use) |
Input Format Handling:
| Format | Handling |
|---|---|
.md, .txt |
Direct text extraction |
.docx |
Requires python-docx – extracts paragraphs preserving headings |
.pdf |
Requires PyPDF2 – extracts text content |
| URL | Requires beautifulsoup4 – fetches and extracts main content |
| Web content | Best practice: Save as markdown first for reproducibility and future reference |
DOCX Conversion Tip: For best results with DOCX files, pre-convert to markdown:
from docx import Document
doc = Document('document.docx')
with open('document.md', 'w') as f:
for para in doc.paragraphs:
style = para.style.name if para.style else ''
if style.startswith('Heading'):
level = int(style[-1]) if style[-1].isdigit() else 1
f.write('#' * level + ' ' + para.text + '\n\n')
else:
f.write(para.text + '\n\n')
Environment Requirements:
API keys must be configured in environment variables or .env file:
GOOGLE_API_KEY– For Gemini Pro 3 image generationANTHROPIC_API_KEY– For Claude concept analysis and image evaluation
Tool vs Claude Responsibilities
Understanding what the Python tool handles vs what you (Claude) must do:
| Component | Responsibility | What It Does |
|---|---|---|
| visual-explainer (Python tool) | Core pipeline | Concept analysis, prompt generation, Gemini API calls, evaluation, refinement loop, output organization |
| You (Claude) | Input collection | Gather input text/path/URL from user |
| You (Claude) | Interactive confirmation | Style selection, image count confirmation |
| You (Claude) | Progress display | Show generation progress to user |
| You (Claude) | Results presentation | Display completion summary |
Workflow
Phase 1: Setup and Dependency Check
The tool is bundled at ../tools/visual-explainer/ relative to this skill file.
Step 1: Set Up Tool Path
# Determine the plugin directory
PLUGIN_DIR="${CLAUDE_PLUGIN_ROOT:-/path/to/plugins/personal-plugin}"
TOOL_SRC="$PLUGIN_DIR/tools/visual-explainer/src"
Step 2: Check Dependencies
The tool automatically checks dependencies when run. You can also do a dry-run to verify setup:
PYTHONPATH="$TOOL_SRC" python -m visual_explainer --dry-run --input "test content"
Required packages:
- Core:
google-genai,anthropic,httpx,python-dotenv,pydantic,aiofiles,rich,pillow - Optional (format-specific):
python-docx(DOCX),PyPDF2(PDF),beautifulsoup4(URLs)
If packages are missing, install them:
pip install google-genai anthropic httpx python-dotenv pydantic aiofiles rich pillow
pip install python-docx PyPDF2 beautifulsoup4 # Optional, for specific formats
Phase 2: API Key Setup (if needed)
If API keys are missing, guide the user through setup:
API Key Setup Required
======================
This tool requires two API keys:
- Google Gemini API - for image generation
- Anthropic API - for concept analysis and image evaluation
Missing keys detected:
- GOOGLE_API_KEY - not found
- ANTHROPIC_API_KEY - not found
Would you like me to help you set up the missing API keys? (yes/skip)
For GOOGLE_API_KEY:
GOOGLE API KEY SETUP (for Gemini)
---------------------------------
1. Go to: https://aistudio.google.com/apikey
2. Sign in with your Google account
3. Click "Create API key"
4. Select or create a Google Cloud project
5. Copy the key (starts with "AIza...")
Note: Gemini image generation requires credits.
Free tier: 60 requests/minute.
Paste your Google API key (or 'skip' to skip this provider):
For ANTHROPIC_API_KEY:
ANTHROPIC API KEY SETUP
-----------------------
1. Go to: https://console.anthropic.com/settings/keys
2. Sign in or create an account
3. Click "Create Key"
4. Name it something like "visual-explainer"
5. Copy the key (starts with "sk-ant-...")
Note: New accounts get $5 free credits.
Paste your Anthropic API key (or 'skip'):
After collecting keys, create/update .env file and confirm:
API keys saved to .env
Keys configured:
- GOOGLE_API_KEY: (AIza...xxxx)
- ANTHROPIC_API_KEY: (sk-ant-...xxxx)
Security reminder: Never commit .env to version control.
Phase 3: Input Collection
If no input was provided in arguments, prompt:
I'll help you create visual explanations for your content.
Please provide your input in one of these formats:
1. Paste text directly
2. Provide a file path (e.g., ./docs/concept.md)
3. Provide a URL to fetch content from
Phase 4: Content Analysis
After receiving input, run concept analysis:
PYTHONPATH="$TOOL_SRC" python -m visual_explainer analyze \
--input "<input_text_or_path>" \
--output-json
Display the analysis summary:
Content Analysis
================
Document: "Understanding Quantum Entanglement"
Word Count: 1,847 words
Key Concepts: 5 concepts identified
Recommended Images: 3 images
Concept Flow:
1. Classical Physics Background
-> 2. Quantum Superposition
-> 3. Entanglement Phenomenon
-> 4. Applications
-> 5. Future Implications
Phase 5: Style Selection (Interactive)
Prompt for style selection:
Visual Style Selection
======================
What style would you prefer?
1. Professional Clean (Recommended)
- Clean, corporate-ready with warm accents
- Best for: Business, presentations, reports
2. Professional Sketch
- Hand-drawn sketch aesthetic
- Best for: Creative, educational, informal
3. Custom
- Provide path to your own style JSON
4. Skip (use Professional Clean default)
Select style [1-4]:
Phase 6: Image Count Confirmation
Image Generation Plan
=====================
Based on analysis, I recommend 3 images:
Image 1: "The Classical Foundation"
- Covers: Classical Physics Background
- Intent: Establish baseline understanding
Image 2: "Quantum Superposition"
- Covers: Superposition, probability states
- Intent: Introduce quantum concepts
Image 3: "Entanglement Synthesis"
- Covers: Entanglement, applications, future
- Intent: Tie concepts together
Would you like to:
1. Proceed with 3 images (Recommended)
2. Use fewer images (condense concepts)
3. Use more images (expand detail)
4. Adjust settings (aspect ratio, iterations)
Phase 7: Generation Execution
Execute the full generation pipeline:
PYTHONPATH="$TOOL_SRC" python -m visual_explainer generate \
--input "<input_text_or_path>" \
--style "<selected_style>" \
--max-iterations <n> \
--aspect-ratio "<ratio>" \
--resolution "<level>" \
--output-dir "<output_path>" \
--pass-threshold <threshold> \
--concurrency <n>
Progress Display Format:
Starting Image Generation
=========================
Image 1 of 3: "The Classical Foundation"
----------------------------------------
Attempt 1/5:
[=========> ] Generating... (4.2s)
Generated
Evaluating...
- Concept clarity: 72%
- Visual appeal: 85%
- Flow continuity: 60%
Overall: 72% - NEEDS_REFINEMENT
Attempt 2/5:
Refining: Adding visual flow indicators
[=========> ] Generating... (3.8s)
Generated
Evaluating...
- Concept clarity: 91%
- Visual appeal: 88%
- Flow continuity: 85%
Overall: 88% - PASS
Image 1 complete. Best version: Attempt 2
Image 2 of 3: "Quantum Superposition"
-------------------------------------
...
Phase 8: Completion Summary
Generation Complete
===================
Results:
- Images generated: 3 of 3
- Total attempts: 7
- Average quality score: 89%
- Estimated cost: $0.70
Output saved to:
./output/visual-explainer-quantum-entanglement-20260118-143052/
Final Images:
1. 01-classical-foundation.jpg (Score: 88%)
2. 02-quantum-superposition.jpg (Score: 91%)
3. 03-entanglement-synthesis.jpg (Score: 88%)
Output Structure:
metadata.json # Full generation metadata
concepts.json # Extracted concepts
summary.md # Human-readable summary
all-images/ # Final images only
01-classical-foundation.jpg
02-quantum-superposition.jpg
03-entanglement-synthesis.jpg
image-01/ # All attempts for image 1
final.jpg
prompt-v1.txt
attempt-01.jpg
evaluation-01.json
...
Would you like to:
1. View the summary report
2. Regenerate a specific image
3. Open output folder
Resume from Checkpoint
If generation was interrupted, resume with:
PYTHONPATH="$TOOL_SRC" python -m visual_explainer generate \
--resume "./output/visual-explainer-[topic]-[timestamp]/checkpoint.json"
The checkpoint contains:
- Generation state
- Completed images
- Current progress
- Configuration used
Cost Estimation
Estimated costs per session:
| Scenario | Images | Avg Attempts | Est. Total |
|---|---|---|---|
| Simple doc, 1 image | 1 | 2 | ~$0.28 |
| Medium doc, 3 images | 3 | 2.3 | ~$0.95 |
| Complex doc, 5 images | 5 | 3 | ~$2.10 |
Component costs:
- Gemini image generation: ~$0.10 per image
- Claude concept analysis: ~$0.02 per document
- Claude image evaluation: ~$0.03 per evaluation
Error Handling
| Error | Response |
|---|---|
| Missing API key | Run setup wizard, guide user through key creation |
| Rate limit (429) | Exponential backoff, respect Retry-After header |
| Safety filter | Log, skip to next attempt with modified prompt |
| Timeout | Retry with increased timeout (up to 5 min) |
| All attempts exhausted | Select best attempt, report scores |
| Network error | Retry up to 3 times with backoff |
Examples
Infographic mode (recommended for complex documents):
/visual-explainer --input docs/architecture-overview.md --infographic
Infographic with dry-run preview:
/visual-explainer --input whitepaper.md --infographic --dry-run
Basic usage (interactive):
/visual-explainer
With document path:
/visual-explainer --input docs/architecture-overview.md
Custom settings:
/visual-explainer --input concept.txt --style professional-sketch --max-iterations 3 --aspect-ratio 1:1
High quality infographic:
/visual-explainer --input whitepaper.md --infographic --resolution high --max-iterations 7 --pass-threshold 0.90
Dry run (plan only):
/visual-explainer --input document.md --dry-run
Resume interrupted generation:
/visual-explainer --resume ./output/visual-explainer-topic-20260118/checkpoint.json
Execution Summary
Follow these steps in order:
- Setup – Parse arguments, set up tool path
- Dependency Check – Verify packages and API keys
- API Key Setup – If missing, guide user through setup wizard
- Input Collection – Get text/path/URL from user (if not provided)
- Content Analysis – Extract concepts, determine image count
- Style Selection – Let user choose or use default
- Image Count Confirmation – Confirm plan with user
- Generation Execution – Run full pipeline with progress display
- Completion Summary – Display results and output locations