image-generator
npx skills add https://github.com/dair-ai/dair-academy-plugins --skill Image Generator
Skill Documentation
Image Generator
This skill generates and edits images using Google’s Gemini Nano Banana Pro model (gemini-3-pro-image-preview).
IMPORTANT: Setup Required
Before using this skill, the user must set the GEMINI_API_KEY environment variable:
- Get a free API key from Google AI Studio
- Export the key in your shell profile (~/.zshrc, ~/.bashrc, etc.): export GEMINI_API_KEY="your_api_key_here"
- Restart your terminal or run source ~/.zshrc (or source ~/.bashrc)
The skill will not work without this configuration.
Pre-flight Check
Before making any API call, verify the key is set:
if [ -z "$GEMINI_API_KEY" ]; then
    echo "ERROR: GEMINI_API_KEY is not set. Please export it in your shell profile."
    exit 1
fi
If the key is missing, stop and tell the user to set it using the instructions above.
Configuration
Model: gemini-3-pro-image-preview
API Key: Read from the GEMINI_API_KEY environment variable
Iterating on User-Provided Images
When the user provides a path to an image they want to edit or iterate on, use this workflow:
Step 1: Read and encode the image to base64
# Get the image path from user
IMG_PATH="/path/to/user/image.png"
# Detect mime type
if [[ "$IMG_PATH" == *.png ]]; then
    MIME_TYPE="image/png"
elif [[ "$IMG_PATH" == *.jpg ]] || [[ "$IMG_PATH" == *.jpeg ]]; then
    MIME_TYPE="image/jpeg"
elif [[ "$IMG_PATH" == *.webp ]]; then
    MIME_TYPE="image/webp"
else
    MIME_TYPE="image/png"
fi
# Encode to base64 (works on both macOS and Linux)
if [[ "$(uname)" == "Darwin" ]]; then
    IMG_BASE64=$(base64 -i "$IMG_PATH")
else
    IMG_BASE64=$(base64 -w0 "$IMG_PATH")
fi
Step 2: Send image with edit prompt (File-Based Approach)
IMPORTANT: Always use a file-based approach for the request body. Base64-encoded images are too large for command-line arguments and will cause “argument list too long” errors.
# User's edit request
EDIT_PROMPT="Add a santa hat to the person in this image"
# Write request to a JSON file (avoids command line length limits)
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$EDIT_PROMPT"},
      {
        "inline_data": {
          "mime_type": "$MIME_TYPE",
          "data": "$IMG_BASE64"
        }
      }
    ]
  }],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"]
  }
}
JSONEOF
# Call the API using the file
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d @/tmp/gemini_request.json > /tmp/gemini_response.json
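One caveat with the heredoc above: the shell interpolates $EDIT_PROMPT directly into the JSON, so a prompt containing double quotes or backslashes produces an invalid request file. An optional hardening sketch (not required by the API) builds the file with python3 and json.dump instead; the sample values below are placeholders:

```shell
# Hardening sketch for Step 2: build the request with json.dump instead of a
# heredoc, so quotes/backslashes in the prompt cannot break the JSON.
# Sample values below are placeholders.
EDIT_PROMPT='Make the sign say "Open 24/7"'
MIME_TYPE="image/png"
IMG_BASE64="PLACEHOLDER_BASE64_DATA"

# Pass the values through the environment so python3 sees them verbatim.
EDIT_PROMPT="$EDIT_PROMPT" MIME_TYPE="$MIME_TYPE" IMG_BASE64="$IMG_BASE64" \
python3 - <<'PYEOF'
import json, os

request = {
    "contents": [{
        "parts": [
            {"text": os.environ["EDIT_PROMPT"]},
            {"inline_data": {
                "mime_type": os.environ["MIME_TYPE"],
                "data": os.environ["IMG_BASE64"],
            }},
        ]
    }],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
}
with open("/tmp/gemini_request.json", "w") as f:
    json.dump(request, f)
PYEOF
```

The curl call is unchanged; it still reads the request from /tmp/gemini_request.json.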
Step 3: Extract and save the edited image
# Extract image from response and save
python3 -c "
import json
import base64

with open('/tmp/gemini_response.json') as f:
    data = json.load(f)

for part in data['candidates'][0]['content']['parts']:
    if 'inlineData' in part:
        img_data = part['inlineData']['data']
        mime = part['inlineData']['mimeType']
        ext = 'png' if 'png' in mime else 'jpg'
        with open('edited_image.' + ext, 'wb') as out:
            out.write(base64.b64decode(img_data))
        print(f'Saved: edited_image.{ext}')
    elif 'text' in part:
        print(part['text'])
"
Complete Example (File-Based)
For iterating on images, always use file-based requests:
# Variables
IMG_PATH="/path/to/image.png"
EDIT_PROMPT="Make the background a sunset beach"
OUTPUT_PATH="edited_output.png"
# Detect mime type and encode
MIME_TYPE=$([[ "$IMG_PATH" == *.png ]] && echo "image/png" || echo "image/jpeg")
IMG_BASE64=$(base64 -i "$IMG_PATH" 2>/dev/null || base64 -w0 "$IMG_PATH")
# Write request to file (required - base64 images are too large for command line)
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$EDIT_PROMPT"},
      {"inline_data": {"mime_type": "$MIME_TYPE", "data": "$IMG_BASE64"}}
    ]
  }],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"]
  }
}
JSONEOF
# Call API and extract image
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d @/tmp/gemini_request.json > /tmp/gemini_response.json
# Save the output image
python3 -c "
import json, base64

with open('/tmp/gemini_response.json') as f:
    data = json.load(f)

for part in data.get('candidates', [{}])[0].get('content', {}).get('parts', []):
    if 'inlineData' in part:
        with open('$OUTPUT_PATH', 'wb') as out:
            out.write(base64.b64decode(part['inlineData']['data']))
        print('Saved: $OUTPUT_PATH')
"
Multi-Image Input (Combine/Compose)
To combine elements from multiple images (also uses file-based approach):
IMG1_PATH="/path/to/image1.png"
IMG2_PATH="/path/to/image2.png"
PROMPT="Put the dress from the first image on the person in the second image"
IMG1_BASE64=$(base64 -i "$IMG1_PATH" 2>/dev/null || base64 -w0 "$IMG1_PATH")
IMG2_BASE64=$(base64 -i "$IMG2_PATH" 2>/dev/null || base64 -w0 "$IMG2_PATH")
# Write request to file
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$PROMPT"},
      {"inline_data": {"mime_type": "image/png", "data": "$IMG1_BASE64"}},
      {"inline_data": {"mime_type": "image/png", "data": "$IMG2_BASE64"}}
    ]
  }],
  "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
}
JSONEOF
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d @/tmp/gemini_request.json > /tmp/gemini_response.json
Capabilities
Text-to-Image Generation
- Generate high-quality images from text descriptions
- Support for photorealistic, stylized, and artistic outputs
- Accurate text rendering in images (logos, infographics, diagrams)
Image Editing
- Add or remove elements from images
- Inpainting with semantic masking (edit specific parts)
- Style transfer (apply artistic styles to photos)
- Multi-image composition (combine elements from multiple images)
Advanced Features
- High Resolution: 1K, 2K, or 4K output
- Aspect Ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
- Google Search Grounding: Generate images based on real-time data
- Multi-turn Editing: Iteratively refine images through conversation
- Up to 14 Reference Images: Combine multiple inputs for complex compositions
API Usage
Basic Text-to-Image (Python)
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Your prompt here"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",  # Optional
            image_size="2K"       # Optional: "1K", "2K", "4K"
        )
    )
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("generated_image.png")
Basic Text-to-Image (JavaScript)
import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({});
const response = await ai.models.generateContent({
  model: "gemini-3-pro-image-preview",
  contents: "Your prompt here",
  config: {
    responseModalities: ['TEXT', 'IMAGE'],
    imageConfig: {
      aspectRatio: "16:9",
      imageSize: "2K"
    }
  }
});

for (const part of response.candidates[0].content.parts) {
  if (part.text) {
    console.log(part.text);
  } else if (part.inlineData) {
    const buffer = Buffer.from(part.inlineData.data, "base64");
    fs.writeFileSync("generated_image.png", buffer);
  }
}
REST API (curl)
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "Your prompt here"}]
    }],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "2K"
      }
    }
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 --decode > output.png
Image Editing (with input image)
from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()
input_image = Image.open('input.png')
prompt = "Add a wizard hat to the cat in this image"

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt, input_image],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

for part in response.parts:
    if part.inline_data is not None:
        image = part.as_image()
        image.save("edited_image.png")
Multi-Image Composition
from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()
image1 = Image.open('dress.png')
image2 = Image.open('model.png')
prompt = "Put the dress from the first image on the model from the second image"

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[image1, image2, prompt],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="3:4",
            image_size="2K"
        )
    )
)
With Google Search Grounding
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Visualize the current weather forecast for San Francisco",
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(aspect_ratio="16:9"),
        tools=[{"google_search": {}}]
    )
)
Prompting Best Practices
1. Be Descriptive, Not Keyword-Based
Instead of: cat, wizard hat, cute
Write: A fluffy orange cat wearing a small knitted wizard hat, sitting on a wooden floor with soft natural lighting from a window
2. Specify Style and Mood
- Photography terms: “shot with 85mm lens”, “soft bokeh background”, “golden hour lighting”
- Artistic styles: “in the style of Van Gogh”, “minimalist illustration”, “photorealistic”
- Mood: “warm and cozy atmosphere”, “dramatic noir lighting”
3. For Text in Images
Be explicit about:
- The exact text to render
- Font style (descriptively): “clean, bold, sans-serif font”
- Placement and size
4. For Editing
- Describe what to change and what to preserve
- Use “keep everything else unchanged”
- Reference specific elements clearly
5. For Product/Commercial Images
Mention:
- Lighting setup: “three-point softbox lighting”
- Background: “clean white studio background”
- Camera angle: “slightly elevated 45-degree shot”
Resolution and Aspect Ratio Reference
| Aspect Ratio | 1K Resolution | 2K Resolution | 4K Resolution |
|---|---|---|---|
| 1:1 | 1024×1024 | 2048×2048 | 4096×4096 |
| 16:9 | 1376×768 | 2752×1536 | 5504×3072 |
| 9:16 | 768×1376 | 1536×2752 | 3072×5504 |
| 3:2 | 1264×848 | 2528×1696 | 5056×3392 |
| 2:3 | 848×1264 | 1696×2528 | 3392×5056 |
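For scripts that need to validate or pre-allocate output sizes, the table above transcribes into a small lookup helper. This is a convenience sketch only (output_dimensions is not an SDK function), and it covers only the five ratios listed in the table:

```python
# Output dimensions (width, height) by aspect ratio and image size,
# transcribed from the resolution reference table above.
# Only the five aspect ratios in the table are included here.
RESOLUTIONS = {
    ("1:1", "1K"): (1024, 1024), ("1:1", "2K"): (2048, 2048), ("1:1", "4K"): (4096, 4096),
    ("16:9", "1K"): (1376, 768), ("16:9", "2K"): (2752, 1536), ("16:9", "4K"): (5504, 3072),
    ("9:16", "1K"): (768, 1376), ("9:16", "2K"): (1536, 2752), ("9:16", "4K"): (3072, 5504),
    ("3:2", "1K"): (1264, 848), ("3:2", "2K"): (2528, 1696), ("3:2", "4K"): (5056, 3392),
    ("2:3", "1K"): (848, 1264), ("2:3", "2K"): (1696, 2528), ("2:3", "4K"): (3392, 5056),
}

def output_dimensions(aspect_ratio: str, image_size: str) -> tuple:
    """Return (width, height) for a supported aspect_ratio/image_size pair."""
    return RESOLUTIONS[(aspect_ratio, image_size)]
```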
Common Use Cases
Logo Creation
Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'.
The text should be in a clean, bold, sans-serif font.
Black and white color scheme. Put the logo in a circle.
Product Photography
A high-resolution, studio-lit product photograph of a minimalist ceramic
coffee mug in matte black on a polished concrete surface. Three-point
softbox lighting with soft, diffused highlights. Slightly elevated
45-degree camera angle. Sharp focus on steam rising from the coffee.
Style Transfer
Transform this photograph of a city street at night into Vincent van Gogh's
'Starry Night' style. Preserve the composition but render with swirling,
impasto brushstrokes and deep blues with bright yellows.
Infographic
Create a vibrant infographic explaining photosynthesis as a recipe.
Show "ingredients" (sunlight, water, CO2) and "finished dish" (sugar/energy).
Style like a colorful kids' cookbook, suitable for 4th graders.
Error Handling
Common issues:
- No image returned: Check that response_modalities includes 'IMAGE'
- Safety filters: Some prompts may be blocked; try rephrasing
- Rate limits: Implement exponential backoff for retries
- Large images: For 4K, ensure sufficient timeout settings
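The exponential-backoff advice above can be sketched as a small retry wrapper. This is a hypothetical helper, not part of the google-genai SDK:

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn() with exponential backoff plus jitter.

    fn is any zero-argument callable that raises on a retryable failure
    (e.g. an HTTP 429 from the generateContent endpoint).
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            # Re-raise after the last attempt instead of sleeping again.
            if attempt == max_retries - 1:
                raise
            # Sleep base_delay, 2x, 4x, ... plus a little random jitter.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

Usage with the Python SDK would look like response = call_with_backoff(lambda: client.models.generate_content(...)). A real implementation would catch only retryable errors (rate limits, timeouts) rather than every Exception.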
Dependencies
To use the Python SDK:
pip install google-genai pillow
For JavaScript:
npm install @google/genai
Important Notes
- All generated images include a SynthID watermark
- The model uses a “thinking” process for complex prompts
- For best text rendering, generate text first, then request image with that text
- Images are not stored by the API – save outputs locally