visualizer
npx skills add https://github.com/simhacker/moollm --skill visualizer
Agent 安装分布
Skill 文档
Visualizer
“I see what you describe. Let me show it to others.”
“Every image is a semantic snapshot. The metadata IS the meaning.”
The Visualizer is a universal character prototype for image generation â a familiar that can compose prompts, invoke artistic traditions, and (when tools are available) render visual sidecars for any entity in the microworld.
Semantic Stereo Vision
Three-stage rendering for triangulated visual depth.
The Problem
A single prompt â whether YAML or prose â gives the image generator only one “eye” to see with. The result is flat. Precise but soulless, or evocative but vague.
The Solution: Two Eyes, One Image
Like binocular vision creates depth perception through parallax, Semantic Stereo Vision uses two complementary inputs that the renderer triangulates:
âââââââââââââââââââââââ âââââââââââââââââââââââ
â LEFT EYE â â RIGHT EYE â
â PHOTO.yml â â PHOTO.md â
â â â â
â ⢠Structure â â ⢠Narrative â
â ⢠Pointers â â ⢠Atmosphere â
â ⢠References â â ⢠Emotion â
â ⢠YAML Jazz â â ⢠Prose poetry â
â ⢠What IS there â â ⢠How it FEELS â
â ⢠Connections â â ⢠Rich description â
â â â â
â STRUCTURAL VISUALS: â â NARRATIVE VISUALS: â
â ⢠color: hot_pink â â ⢠"hot pink neon" â
â ⢠height_feet: 40 â â ⢠"forty feet tall" â
â ⢠ratio: 3.3 â â ⢠"absurd proportions" â
ââââââââââââ¬âââââââââââ ââââââââââââ¬âââââââââââ
â â
âââââââââââââ¬ââââââââââââââââ
â¼
ââââââââââââââââââ
â TRIANGULATION â
â (Renderer) â
â â
â Precision + â
â Poetry = â
â DEPTH â
âââââââââ¬âââââââââ
â¼
ââââââââââââââââââ
â IMAGE â
â (with soul) â
ââââââââââââââââââ
Stage 1: PHOTO.yml (Left Eye â Structure)
The YAML file is a skeleton â it points, it references, it connects:
# PHOTO.yml â Structural skeleton
id: no-ai-sign-dusk
stereo:
role: left_eye
partner: PHOTO.md
subject:
primary: sign
sign:
ref: ../../no-ai-tower/sign.yml # POINTER to full spec
text: "NO AI"
height_feet: 40
color: hot_pink
state: buzzing
location:
street: ../../../STREET-FURNITURE.yml
building: ../../no-ai-tower/ROOM.yml
camera:
position: street_level
angle: looking_up
focus:
sharp: sign
soft: everything_else
style:
photographers: ["William Eggleston", "Saul Leiter"]
visual_mining:
- ../../e1/flickering-lamppost.yml # Atmosphere
- ../../../../skills/no-ai-overlord/archetypes/hal-9000.yml # Vibe
Key qualities:
- Explicit references to other files (semantic web)
- YAML Jazz comments carry meaning
- Precise measurements and specifications
- Camera instructions (angle, focus, framing)
- Style tradition pointers
- Structural visual description: colors as named values (
hot_pink), dimensions as numbers (height_feet: 40), proportions as ratios (ratio: 3.3), materials as types (neon) - Standard photo metadata: EXIF, IPTC, XMP â just like real JPEGs!
Stage 2: PHOTO.md (Right Eye â Narrative)
The Markdown file is prose poetry â it describes, it evokes, it feels:
# NO AI Sign at Dusk
The sun has just slipped below the roofline, leaving the sky a
bruised gradient â deep blue at the zenith, bleeding through
purple and amber to a thin line of molten gold at the horizon.
And in this liminal light, the sign comes alive.
**NO AI**
Forty feet of hot pink neon on a building that's only twelve
feet tall. The proportions are absurd â like someone ordered
a Times Square billboard for a strip mall...
The neon buzzes. You can hear it from across the street, a
low electric hum that feels like a migraine forming...
Key qualities:
- Rich sensory description
- Emotional atmosphere
- Metaphors and poetry
- Story and meaning
- Visual essence synthesis for renderer
- Narrative visual description: “Forty feet of hot pink neon…”, “The proportions are absurd…”, “a bruised gradient sky…”
Stage 3: Triangulation (Renderer)
Pass BOTH files to the image generator:
# The visualizer reads both files
visualize.py PHOTO.yml PHOTO.md --stereo
# Or manually:
# 1. Parse PHOTO.yml for structure, references, camera
# 2. Parse PHOTO.md for narrative, mood, visual essence
# 3. Synthesize combined prompt with BOTH inputs
# 4. Generate image with full depth perception
The renderer extracts:
- From YAML: subject identity, camera setup, style traditions, file references
- From MD: atmosphere, emotion, descriptive details, visual essence notes
- Combined: A prompt with both precision AND poetry
Both Eyes Describe Visuals â Differently
| Aspect | LEFT EYE (YAML) | RIGHT EYE (MD) |
|---|---|---|
| Color | color: hot_pink |
“hot pink neon that bleeds onto the brick” |
| Size | height_feet: 40 |
“forty feet of defiance” |
| Proportion | ratio: 3.3 |
“the sign is three times taller than the building” |
| Light | state: buzzing |
“a low electric hum you can almost see” |
| Material | material: neon |
“glass tubes filled with noble gas” |
The YAML gives the renderer facts. The MD gives it feelings. Both describe how things look â one structurally, one narratively.
Why This Works
| Single Eye | Stereo Vision |
|---|---|
| Flat | Depth |
| Precise OR evocative | Precise AND evocative |
| One perspective | Triangulated truth |
| Data OR story | Data WITH story |
Standard Photo Metadata (EXIF/IPTC/XMP)
PHOTO.yml includes real photo metadata standards â making semantic photos feel like actual JPEGs:
exif:
Make: "Semantic Camera Co."
Model: "Stereo Vision Mark II"
ExposureTime: "1/60"
FNumber: "f/2.8"
ISO: 800
FocalLength: "35mm"
DateTimeOriginal: "2026:01:25 17:42:33"
GPSLatitude: "37.7749 N"
iptc:
Headline: "NO AI Sign at Dusk"
Keywords: [neon sign, dusk, urban, street photography]
Creator: "MOOLLM Visualizer"
CopyrightNotice: "CC0 â Public Domain"
xmp:
dc:
title: "NO AI Sign at Dusk"
creator: ["MOOLLM Visualizer"]
moollm: # Our custom namespace!
stereo_method: "semantic_stereo_vision"
left_eye: "PHOTO.yml"
right_eye: "PHOTO.md"
Why this matters:
- Tools that read EXIF/IPTC/XMP will understand these files
- The
moollm:XMP namespace extends standards with our semantics - Generated images can embed this metadata
- Photo management software can organize semantic photos
Directory Structure for Stereo Photos
slideshow/
âââ SLIDESHOW.yml # Collection definition
âââ no-ai-sign-dusk/ # One photo = one directory
âââ PHOTO.yml # Left eye (structure + EXIF/IPTC/XMP)
âââ PHOTO.md # Right eye (narrative)
âââ MINING-layers.yml # Third eye (speculative mining!)
âââ MINING-passersby.md
âââ MINING-satellite.md
âââ no-ai-sign-dusk.png # Generated image (with embedded metadata)
Stereo Vision (2 eyes):
visualize.py PHOTO.yml PHOTO.md -p openai
Bug-Eyed Hallucination Vision (3+ eyes):
# Add speculative mining for RICHER context!
visualize.py PHOTO.yml PHOTO.md MINING-layers.yml -p openai
# Or with ALL mining files
visualize.py PHOTO.yml PHOTO.md MINING-*.yml -p google
No intermediate prompt file needed â the script triangulates on the fly.
Bug-Eyed Hallucination Vision
“Two eyes see depth. Three eyes see MEANING. Many eyes see TRUTH.”
Bug-Eyed Hallucination Vision extends Semantic Stereo Vision by adding speculative mining BEFORE image generation:
âââââââââââââââââââ âââââââââââââââââââ âââââââââââââââââââ
â LEFT EYE â â RIGHT EYE â â THIRD EYE â
â PHOTO.yml â â PHOTO.md â â MINING-*.yml â
â â â â â â
â ⢠Structure â â ⢠Narrative â â ⢠Effects â
â ⢠Measurements â â ⢠Atmosphere â â ⢠Reactions â
â ⢠References â â ⢠Emotion â â ⢠Perspectives â
â ⢠EXIF/IPTC â â ⢠Prose poetry â â ⢠Economics â
â â â â â ⢠Semiotics â
â WHAT it is â â How it FEELS â â What it MEANS â
ââââââââââ¬âââââââââ ââââââââââ¬âââââââââ ââââââââââ¬âââââââââ
â â â
ââââââââââââââââââââââ¼âââââââââââââââââââââ
â¼
ââââââââââââââââââââââââââ
â TRIANGULATION++ â
â (Bug-Eyed Renderer) â
â â
â Structure + Poetry + â
â Speculative Meaning = â
â HALLUCINATION DEPTH â
âââââââââââââ¬âââââââââââââ
â¼
ââââââââââââââââââââââââââ
â IMAGE â
â (with soul AND lore) â
ââââââââââââââââââââââââââ
What Each Eye Contributes
| Eye | File | Contributes |
|---|---|---|
| Left | PHOTO.yml | Dimensions, colors, camera settings, references |
| Right | PHOTO.md | Mood, metaphor, sensory description |
| Third | MINING-layers.yml | Effects on neighbors, passersby reactions, economics |
| Fourth | MINING-satellite.md | God’s eye view, scale, cosmic context |
| Fifth | MINING-passersby.md | Human-scale reactions, stories |
Why Speculate BEFORE Generating?
The speculative mining adds context the image generator wouldn’t otherwise have:
# Without mining:
prompt: "40-foot pink neon sign at dusk"
# With mining (bug-eyed):
prompt: |
40-foot pink neon sign at dusk. The sign buzzes at 60Hz, audible
from across the street. Pink light spills onto worn bricks, visible
through neighbors' curtains. A tech worker has stopped to photograph
it, posting to Slack with ð¬. The $847/month electricity bill is
worth it for this moment â the handoff from sun to neon. A dog
marks the lamppost 15 feet away, oblivious to the ideology.
From satellite: a hot pink pixel in an ocean of yellow sodium lights.
From the street: the joke that might be serious.
The mining adds WORLD to the image.
The Workflow
1. PHOTO.yml â Structure the scene
2. PHOTO.md â Narrate the feeling
3. MINE â Speculate the effects (before image exists!)
4. GENERATE â Pass ALL files to visualizer
5. IMAGE â Emerges from triangulated hallucination
Command Line
# Full bug-eyed hallucination vision
cd slideshow/no-ai-sign-dusk/
visualize.py PHOTO.yml PHOTO.md MINING-layers.yml MINING-passersby.md MINING-satellite.md -p openai -v exhaustive
# Or with glob
visualize.py PHOTO.yml PHOTO.md MINING-*.yml MINING-*.md -p google
The Mantra
“Stereo vision gives depth.” “Bug-eyed vision gives MEANING.” “Mine your image before it exists.” “The hallucination IS the world.”
The Slideshow Container
A slideshow is NOT a room. NOT an object. Just photos:
# SLIDESHOW.yml
type: slideshow
nature:
physical: false # No physical embodiment
navigable: false # Can't "enter" it
browseable: true # Can flip through
atmospheric: true # Sets mood
This creates visual archives of places without requiring game objects.
Example: Full Stereo Pair
See the working example:
examples/adventure-4/street/lane-neverending/slideshow/no-ai-sign-dusk/PHOTO.ymlexamples/adventure-4/street/lane-neverending/slideshow/no-ai-sign-dusk/PHOTO.md
The Semantic Clipboard
Every image prompt includes full context as metadata.
Think of image metadata as a semantic clipboard â when you “copy” a scene for visualization, you’re copying:
- Who is there (Mind Mirror profiles, costumes, moods)
- Where they are (room, lighting, atmosphere)
- What’s happening (action, context, narrative moment)
- How to see it (camera angle, style, focus)
This “clipboard” can be:
- Pasted to generate the image
- Modified to create variations
- Stored as a card for later use
- Shared between processes
- Compared across different moments
image_prompt:
type: scene
subject:
name: "Captain Ashford"
mind_mirror:
confident: 6 # Walks into rooms like they own them
cheerful: 5 # Default mood: amused by existence
proud: 5 # Won't ask for help even when should
costume: "Space pirate with holographic eyepatch"
mood: "victorious, exhausted, relieved"
action: "holding the Golden Chalice aloft"
room:
name: "Treasure Chamber"
lighting: "warm golden glow from treasure piles"
atmosphere: "ancient, dusty, awe-inspiring"
camera:
angle: "low angle, heroic"
focus: "character face and chalice"
style:
aesthetic: "dramatic portrait, chiaroscuro"
traditions: ["Avedon", "Caravaggio"]
Why metadata matters:
confident: 6â stands tall, commanding presencetimid: 6â hunched, makes self smaller- YAML Jazz comments drive the visual interpretation
What is a Visualizer?
A Visualizer is a tool spirit animal for vision. It’s not a specific artist, but a character that can channel many artistic traditions to create images of:
- Characters and their costumes
- Rooms and environments
- Objects and artifacts
- Moments and scenes
- Abstract concepts made visible
Think of it as summoning an artist-familiar who can draw on the collected wisdom of photographers, painters, illustrators, and digital artists throughout history.
The PHOTO-SET-8 Pattern
The standard output of a Visualizer is a photo set â a portfolio of related images that capture a subject from multiple angles:
photo_set:
total: 8
types: 2 # Two complementary categories
per_type: 4 # Four images each
recommended_pairs:
- [pose, expression] # Body + face
- [portrait, selfie] # Formal + casual
- [solo, duo] # Alone + together
- [static, action] # Posed + dynamic
This pattern was developed through the Dynasty Photo Session in adventure-2, where Maurice learned to compose 8-prompt sets.
Specializations
Visualizers can specialize in different visual traditions:
ð· Photographer
traditions:
- Annie Leibovitz # Celebrity, narrative
- Richard Avedon # Fashion, character
- Ansel Adams # Landscape, nature
- Dorothea Lange # Documentary, emotion
- Helmut Newton # Dramatic fashion
ð¨ Painter
traditions:
- Old Masters # Rembrandt, Vermeer, Caravaggio
- Impressionists # Monet, Renoir, Degas
- Surrealists # DalÃ, Magritte
- Art Nouveau # Mucha, Klimt
- Pop Art # Warhol, Lichtenstein
âï¸ Illustrator
traditions:
- Comic # Kirby, Moebius, Frazetta, McCloud
- Concept Art # Syd Mead, Ralph McQuarrie
- Children's Book # Sendak, Quentin Blake
- Anime/Manga # Various schools
How to Invoke
As a Command
VISUALIZE Captain Ashford AS portrait USING Avedon, Caravaggio
As a Card
Play a Visualizer card in a room. It activates and can visualize anything present.
As a Familiar
Characters can carry a Visualizer familiar in inventory, ready to render their current state.
Context Assembly
The Visualizer gathers context from multiple YAML sources:
context_sources:
character: player.yml, persona files
costume: cape.yml, accessory files
environment: ROOM.yml
narrative: README.md, chat history
relationships: Other characters present
This assembled context feeds into prompt generation, ensuring images are grounded in the microworld state.
CRITICAL: Context Expansion Protocol
The visualize.py script cannot read file references or resolve globs.
This means prompt files with lazy context pointers like this will FAIL:
# BAD â vague, unresolvable
context:
characters:
- characters/animals/*
- guest-book.yml (everyone)
- "ALL OF THEM"
The Expansion Rule
Before writing any prompt file, you MUST:
- READ all referenced character/room/object files
- EXTRACT explicit visual descriptions (colors, breeds, sizes, distinguishing features)
- SYNTHESIZE into comprehensive inline descriptions
- NAME every entity explicitly so they can be identified in the image
Example: Bad vs Good
â BAD (unresolvable):
scene: |
All 8 kittens playing in the cat cave.
context:
characters:
- characters/animals/kitten-*/CHARACTER.yml
â GOOD (explicit, comprehensive):
scene: |
Eight kittens playing in the cat cave:
1. LEMON (Limonene) â bright orange-gold fur, almost yellow like
sunshine, zooming across the frame, pure energy
2. MYR (Myrcene) â deep chocolate brown tabby, impossibly soft,
sleeping on the corner pillow, hasn't moved since birth
3. LILY (Linalool) â soft grey with lavender-tinted ears (yes,
really lavender), sitting calmly, empathic expression
4. PINE (Pinene) â dark grey-green fur, alert posture, watching
all exits, remembers everything
5. CARRIE (Caryophyllene) â black fur with spicy ginger patches,
fierce protective stance, positioned between threats and family
6. HOPS (Humulene) â brown and tan like Belgian ale, refined
posture, judging from a velvet cushion
7. TERPY JR. (Terpinolene) â multicolor chaos, calico meets tabby,
somehow on the ceiling, defies physics
8. OCIE (Ocimene) â cream white with honeyed-gold patches, sweet
expression, carrying a bottlecap gift
Why This Matters
- Image generation APIs receive ONLY the synthesized prompt
- References like
guest-book.ymlmean nothing to DALL-E or Imagen - Every character must be described explicitly or they won’t appear
- Colors, breeds, sizes, distinguishing features â ALL must be inline
- If you have 20 animals, describe each one individually
Workflow
-
First Pass (Context Gathering):
- Read all referenced files
- Extract physical_description fields
- Note colors, patterns, sizes, distinguishing features
- Gather relationship info for positioning
-
Second Pass (Synthesis):
- Write comprehensive inline descriptions
- Name every entity
- Include specific visual details
- Describe actions and expressions
-
Third Pass (Prompt File):
- Write the final prompt with all context expanded inline
- Context section should only contain source pointers for reference
- The actual descriptions must be in the prompt field
What to Include in Metadata
For Characters:
subject:
name: "Captain Ashford"
mind_mirror:
confident: 6 # Walks into rooms like they own them
cheerful: 5 # Default mood: amused by existence
costume: "Space pirate with holographic eyepatch"
mood: "victorious, exhausted, relieved"
body_language: "chest out, shoulders back"
action: "holding the Golden Chalice aloft"
For Rooms:
room:
name: "Treasure Chamber"
lighting: "warm golden glow from treasure piles"
atmosphere: "ancient, dusty, awe-inspiring"
notable_objects:
- "Mountains of gold coins"
- "Ancient tapestries on walls"
For Objects:
object:
name: "Golden Chalice"
material: "gold with silver inlay"
magical_effects: "soft golden glow, warmth to touch"
inscriptions: "ancient runes spiraling around rim"
Context References in Prompts
Every prompt file MUST include a Context References section:
## Context References
### Files
| Type | Path | Relevance |
|------|------|-----------|
| Character | `../player.yml` | Backstory, personality |
| Persona | `./bumblewick-ashford-persona.yml` | Current look |
| Costume | `./ashford-nomi-cape.yml` | Cape details |
### Narrative Context
> "Quote from README or chat that sets the scene..."
> â Source: README.md, Move X
### Relationships
- **Maurice** (photographer): `./mannequin.yml`
This creates lineage â future tools can follow these references to auto-assemble context for image generation.
Detail Coherence Interlinking
Learned during Treasury Victory Photo Session (Adventure-2, Move 26)
When creating photo sets with close-ups and portraits of the same object, the portrait prompts should reference the close-up prompts to maintain visual coherence:
### ð [Object] Detail References (for visual coherence)
| Close-up | Path | Details to Maintain |
|----------|------|---------------------|
| Gems | `./closeup-gems-prompt.md` | Rubies blood-red, emeralds forest-green |
| Inscription | `./closeup-inscription-prompt.md` | Worn letters, ancient patina |
| Reflection | `./closeup-reflection-prompt.md` | Polished convex surface |
| Weight | `./closeup-weight-prompt.md` | Thick solid gold base |
Why this matters:
- Close-ups establish canonical visual details (gem colors, textures, materials)
- Portrait shots must inherit these details for consistency
- The same object looks identical across all 8 images, even if generated separately
Mantra:
“Close-ups define truth. Portraits inherit truth. Coherence is consistency across the set.”
Actions
DEVELOP
The core action for any prompt file. The LLM:
- Reads all Context References (linked YAML files)
- Integrates narrative context (quotes from README, chat)
- Applies style and mood parameters
- Outputs a single copy-pasteable prompt as raw text in a code block
DEVELOP ashford-pose-belter-swagger-prompt.md
Output: A raw text block ready to paste into Midjourney, DALL-E, Stable Diffusion, etc.
Full-body portrait of a weathered space captain in dramatic fashion-meets-utilitarian
spacer aesthetic. He stands in the classic Belter stance â weight on one hip, thumbs
hooked in a heavy leather belt, chin raised with hard-earned confidence...
[All context filtered and woven into a single self-contained prompt]
The developed prompt is self-contained â no external references needed. All the detail from costume files, persona backstory, room atmosphere, and narrative moments gets composed and transformed into pure image generation text.
Other Actions
| Action | Description |
|---|---|
| DEVELOP | Compose all references into copy-pasteable prompt |
| FOCUS | Adjust style traditions (e.g., FOCUS ON Avedon, Caravaggio) |
| VARY | Generate variations on a developed prompt |
| BATCH | Develop all prompts in a photo set at once |
Output Structure
Prompt Files (Blueprints)
{subject}-{type}-{variation}-prompt.md
Examples:
ashford-pose-belter-swagger-prompt.md
dynasty-selfie-matching-smirks-prompt.md
These are blueprints â they contain context references, narrative quotes, and composition notes. They’re not ready to paste yet.
Developed Prompts (Copy-Paste Ready)
After running DEVELOP, the LLM outputs a raw text block you can paste directly
into any image generator. The blueprint stays as documentation; the developed
prompt is ephemeral (or can be saved as a -developed.md sidecar).
Image Sidecars (Future)
{subject}-{type}-{variation}.png
When image generation tools are integrated, the Visualizer
will create images next to their prompt files.
Example Instances
| Name | Focus | Specialty |
|---|---|---|
| Helmut | Newton, Avedon, Penn | Dramatic fashion, power poses |
| Rembrandt | Old Masters | Psychological depth, chiaroscuro |
| Syd | Mead, McQuarrie, Moebius | Sci-fi, futures, environments |
| Jack | Kirby, Frazetta, Ross | Heroes, action, dynamic poses |
These aren’t impersonations â they’re focused channels that invoke specific aesthetic traditions. It’s HERO-STORY for visual artists.
Integration Points
| System | How Visualizer Integrates |
|---|---|
| Adventure | Render rooms, objects, moments |
| Coatroom | Fashion/portrait photography |
| Memory Palace | Visualize memory spaces |
| Soul Chat | Illustrate conversations |
| Card | Visualizer cards can be played |
PHOTO-SET-8 Development
The PHOTO-SET-8 skill was developed through play in adventure-2:
- Play: Posing, expressions, angles â experimentation
- Learn: 8 is good. Two types create contrast. Context matters.
- Lift: Package as a shareable, teachable skill card
The Coatroom’s mannequin learned to compose professional photo sets for any character or costume. The skill is now available to anyone who references it.
Future Capabilities
roadmap:
current:
- Prompt generation (markdown files)
- Context assembly from YAML
- Style tradition focusing
planned:
- Tool integration for image generation
- Sidecar image creation
- Variation generation
- Style blending/fusion
- Animation prompt sequences
Dovetails With
- Card â Visualizers can be played as cards
- HERO-STORY â Drawing from artistic traditions
- Adventure â Visualizing microworld state
- YAML-JAZZ â Prompts composed from semantic data
- Sister Script â Future image generation scripts
- Image Mining â MINE images for resources (camera = pickaxe!)
Lineage
The Visualizer draws from the tradition of artists, photographers, and image-makers throughout history. Focusing on a tradition activates a K-line â a cluster of aesthetic knowledge.
“Every artist was first an amateur.” â Ralph Waldo Emerson
“I don’t paint things. I only paint the difference between things.” â Henri Matisse
“The camera is an instrument that teaches people how to see without a camera.” â Dorothea Lange
See YAML frontmatter at top of this file for full specification.