image-generator

Install command
npx skills add https://github.com/dair-ai/dair-academy-plugins --skill "Image Generator"

Skill Documentation

Image Generator

This skill generates and edits images using Google’s Nano Banana Pro model (gemini-3-pro-image-preview), part of the Gemini API.

IMPORTANT: Setup Required

Before using this skill, the user must set the GEMINI_API_KEY environment variable:

  1. Get a free API key from Google AI Studio (https://aistudio.google.com)
  2. Export the key in your shell profile (~/.zshrc, ~/.bashrc, etc.):
    export GEMINI_API_KEY="your_api_key_here"
    
  3. Restart your terminal or run source ~/.zshrc (or ~/.bashrc)

The skill will not work without this configuration.

Pre-flight Check

Before making any API call, verify the key is set:

if [ -z "$GEMINI_API_KEY" ]; then
  echo "ERROR: GEMINI_API_KEY is not set. Please export it in your shell profile."
  exit 1
fi

If the key is missing, stop and tell the user to set it using the instructions above.

Configuration

Model: gemini-3-pro-image-preview

API Key: Read from the GEMINI_API_KEY environment variable
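
As a quick reference, here is a minimal Python sketch of this configuration (assuming the google-genai SDK; the client also picks up GEMINI_API_KEY automatically if api_key is omitted):

import os
from google import genai

MODEL = "gemini-3-pro-image-preview"

# Fail fast if the key is missing, mirroring the pre-flight check above.
api_key = os.environ.get("GEMINI_API_KEY")
if not api_key:
    raise SystemExit("GEMINI_API_KEY is not set; export it in your shell profile.")

client = genai.Client(api_key=api_key)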

Iterating on User-Provided Images

When the user provides a path to an image they want to edit or iterate on, use this workflow:

Step 1: Read and encode the image to base64

# Get the image path from user
IMG_PATH="/path/to/user/image.png"

# Detect mime type
if [[ "$IMG_PATH" == *.png ]]; then
    MIME_TYPE="image/png"
elif [[ "$IMG_PATH" == *.jpg ]] || [[ "$IMG_PATH" == *.jpeg ]]; then
    MIME_TYPE="image/jpeg"
elif [[ "$IMG_PATH" == *.webp ]]; then
    MIME_TYPE="image/webp"
else
    MIME_TYPE="image/png"
fi

# Encode to base64 (works on both macOS and Linux)
if [[ "$(uname)" == "Darwin" ]]; then
    IMG_BASE64=$(base64 -i "$IMG_PATH")
else
    IMG_BASE64=$(base64 -w0 "$IMG_PATH")
fi

Step 2: Send image with edit prompt (File-Based Approach)

IMPORTANT: Always use a file-based approach for the request body. Base64-encoded images are too large for command-line arguments and will cause “argument list too long” errors.

# User's edit request
EDIT_PROMPT="Add a santa hat to the person in this image"

# Write request to a JSON file (avoids command line length limits)
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$EDIT_PROMPT"},
      {
        "inline_data": {
          "mime_type": "$MIME_TYPE",
          "data": "$IMG_BASE64"
        }
      }
    ]
  }],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"]
  }
}
JSONEOF

# Call the API using the file
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/gemini_request.json > /tmp/gemini_response.json
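
Note that the heredoc above substitutes $EDIT_PROMPT directly into JSON, so a prompt containing double quotes, backslashes, or newlines can produce an invalid request body. If that is a concern, the same file can be built with Python's json module, which handles escaping; a sketch (build_request.py is a hypothetical helper name):

# build_request.py (hypothetical helper)
# Usage: python3 build_request.py /path/to/image.png "Add a santa hat to the person in this image"
import base64
import json
import mimetypes
import sys

img_path, edit_prompt = sys.argv[1], sys.argv[2]

# Detect the mime type from the file extension; default to PNG like the shell snippet above.
mime_type = mimetypes.guess_type(img_path)[0] or "image/png"

with open(img_path, "rb") as f:
    img_base64 = base64.b64encode(f.read()).decode("ascii")

request = {
    "contents": [{
        "parts": [
            {"text": edit_prompt},
            {"inline_data": {"mime_type": mime_type, "data": img_base64}},
        ]
    }],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
}

# json.dump escapes quotes and newlines in the prompt, so the request file is always valid JSON.
with open("/tmp/gemini_request.json", "w") as f:
    json.dump(request, f)

The curl call is unchanged; it still reads the request from /tmp/gemini_request.json.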

Step 3: Extract and save the edited image

# Extract image from response and save
python3 -c "
import json
import base64

with open('/tmp/gemini_response.json') as f:
    data = json.load(f)

for part in data['candidates'][0]['content']['parts']:
    if 'inlineData' in part:
        img_data = part['inlineData']['data']
        mime = part['inlineData']['mimeType']
        ext = 'png' if 'png' in mime else 'jpg'
        with open('edited_image.' + ext, 'wb') as out:
            out.write(base64.b64decode(img_data))
        print(f'Saved: edited_image.{ext}')
    elif 'text' in part:
        print(part['text'])
"

Complete Example (File-Based)

For iterating on images, always use file-based requests:

# Variables
IMG_PATH="/path/to/image.png"
EDIT_PROMPT="Make the background a sunset beach"
OUTPUT_PATH="edited_output.png"

# Detect mime type and encode (tr strips the line wraps that GNU base64 adds by default)
MIME_TYPE=$([[ "$IMG_PATH" == *.png ]] && echo "image/png" || echo "image/jpeg")
IMG_BASE64=$(base64 < "$IMG_PATH" | tr -d '\n')

# Write request to file (required - base64 images are too large for command line)
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$EDIT_PROMPT"},
      {"inline_data": {"mime_type": "$MIME_TYPE", "data": "$IMG_BASE64"}}
    ]
  }],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"]
  }
}
JSONEOF

# Call API and extract image
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/gemini_request.json > /tmp/gemini_response.json

# Save the output image
python3 -c "
import json, base64
with open('/tmp/gemini_response.json') as f:
    data = json.load(f)
for part in data.get('candidates', [{}])[0].get('content', {}).get('parts', []):
    if 'inlineData' in part:
        with open('$OUTPUT_PATH', 'wb') as f:
            f.write(base64.b64decode(part['inlineData']['data']))
        print('Saved: $OUTPUT_PATH')
"

Multi-Image Input (Combine/Compose)

To combine elements from multiple images (also uses file-based approach):

IMG1_PATH="/path/to/image1.png"
IMG2_PATH="/path/to/image2.png"
PROMPT="Put the dress from the first image on the person in the second image"
IMG1_BASE64=$(base64 < "$IMG1_PATH" | tr -d '\n')
IMG2_BASE64=$(base64 < "$IMG2_PATH" | tr -d '\n')

# Write request to file
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$PROMPT"},
      {"inline_data": {"mime_type": "image/png", "data": "$IMG1_BASE64"}},
      {"inline_data": {"mime_type": "image/png", "data": "$IMG2_BASE64"}}
    ]
  }],
  "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
}
JSONEOF

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/gemini_request.json > /tmp/gemini_response.json

Capabilities

Text-to-Image Generation

  • Generate high-quality images from text descriptions
  • Support for photorealistic, stylized, and artistic outputs
  • Accurate text rendering in images (logos, infographics, diagrams)

Image Editing

  • Add or remove elements from images
  • Inpainting with semantic masking (edit specific parts)
  • Style transfer (apply artistic styles to photos)
  • Multi-image composition (combine elements from multiple images)

Advanced Features

  • High Resolution: 1K, 2K, or 4K output
  • Aspect Ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
  • Google Search Grounding: Generate images based on real-time data
  • Multi-turn Editing: Iteratively refine images through conversation (see the sketch after this list)
  • Up to 14 Reference Images: Combine multiple inputs for complex compositions
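
The multi-turn editing feature maps onto the SDK's chat interface; a minimal sketch (assuming client.chats.create accepts the same GenerateContentConfig and that earlier image turns remain in context):

from google import genai
from google.genai import types

client = genai.Client()

chat = client.chats.create(
    model="gemini-3-pro-image-preview",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# First turn: generate an image.
first = chat.send_message("A watercolor painting of a lighthouse at dusk")

# Second turn: refine it; the previous image stays in the conversation context.
refined = chat.send_message("Same scene, but add a small sailboat near the horizon")

for part in refined.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("refined_image.png", "wb") as f:
            f.write(part.inline_data.data)  # inline_data.data is raw bytes in the Python SDK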

API Usage

Basic Text-to-Image (Python)

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Your prompt here"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",  # Optional
            image_size="2K"       # Optional: "1K", "2K", "4K"
        )
    )
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("generated_image.png")

Basic Text-to-Image (JavaScript)

import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({});

const response = await ai.models.generateContent({
    model: "gemini-3-pro-image-preview",
    contents: "Your prompt here",
    config: {
        responseModalities: ['TEXT', 'IMAGE'],
        imageConfig: {
            aspectRatio: "16:9",
            imageSize: "2K"
        }
    }
});

for (const part of response.candidates[0].content.parts) {
    if (part.text) {
        console.log(part.text);
    } else if (part.inlineData) {
        const buffer = Buffer.from(part.inlineData.data, "base64");
        fs.writeFileSync("generated_image.png", buffer);
    }
}

REST API (curl)

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "Your prompt here"}]
    }],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "2K"
      }
    }
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 --decode > output.png

Image Editing (with input image)

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

input_image = Image.open('input.png')
prompt = "Add a wizard hat to the cat in this image"

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt, input_image],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

for part in response.parts:
    if part.inline_data is not None:
        image = part.as_image()
        image.save("edited_image.png")

Multi-Image Composition

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

image1 = Image.open('dress.png')
image2 = Image.open('model.png')
prompt = "Put the dress from the first image on the model from the second image"

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[image1, image2, prompt],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="3:4",
            image_size="2K"
        )
    )
)

With Google Search Grounding

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Visualize the current weather forecast for San Francisco",
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(aspect_ratio="16:9"),
        tools=[{"google_search": {}}]
    )
)

Prompting Best Practices

1. Be Descriptive, Not Keyword-Based

Instead of: cat, wizard hat, cute

Write: A fluffy orange cat wearing a small knitted wizard hat, sitting on a wooden floor with soft natural lighting from a window

2. Specify Style and Mood

  • Photography terms: “shot with 85mm lens”, “soft bokeh background”, “golden hour lighting”
  • Artistic styles: “in the style of Van Gogh”, “minimalist illustration”, “photorealistic”
  • Mood: “warm and cozy atmosphere”, “dramatic noir lighting”

3. For Text in Images

Be explicit about:

  • The exact text to render
  • Font style (descriptively): “clean, bold, sans-serif font”
  • Placement and size

4. For Editing

  • Describe what to change and what to preserve
  • Use “keep everything else unchanged”
  • Reference specific elements clearly

5. For Product/Commercial Images

Mention:

  • Lighting setup: “three-point softbox lighting”
  • Background: “clean white studio background”
  • Camera angle: “slightly elevated 45-degree shot”

Resolution and Aspect Ratio Reference

Aspect Ratio    1K Resolution    2K Resolution    4K Resolution
1:1             1024×1024        2048×2048        4096×4096
16:9            1376×768         2752×1536        5504×3072
9:16            768×1376         1536×2752        3072×5504
3:2             1264×848         2528×1696        5056×3392
2:3             848×1264         1696×2528        3392×5056

Common Use Cases

Logo Creation

Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'.
The text should be in a clean, bold, sans-serif font.
Black and white color scheme. Put the logo in a circle.

Product Photography

A high-resolution, studio-lit product photograph of a minimalist ceramic
coffee mug in matte black on a polished concrete surface. Three-point
softbox lighting with soft, diffused highlights. Slightly elevated
45-degree camera angle. Sharp focus on steam rising from the coffee.

Style Transfer

Transform this photograph of a city street at night into Vincent van Gogh's
'Starry Night' style. Preserve the composition but render with swirling,
impasto brushstrokes and deep blues with bright yellows.

Infographic

Create a vibrant infographic explaining photosynthesis as a recipe.
Show "ingredients" (sunlight, water, CO2) and "finished dish" (sugar/energy).
Style like a colorful kids' cookbook, suitable for 4th graders.

Error Handling

Common issues:

  • No image returned: Check that response_modalities includes 'IMAGE'
  • Safety filters: Some prompts may be blocked; try rephrasing
  • Rate limits: Implement exponential backoff for retries (see the sketch after this list)
  • Large images: For 4K, ensure sufficient timeout settings
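
For the rate-limit case, a minimal retry sketch with exponential backoff and jitter (assuming the SDK's errors.APIError exposes the HTTP status as .code):

import random
import time

from google import genai
from google.genai import errors, types

client = genai.Client()

def generate_with_retry(prompt, max_attempts=5):
    """Retry transient failures (429 and 5xx) with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return client.models.generate_content(
                model="gemini-3-pro-image-preview",
                contents=[prompt],
                config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
            )
        except errors.APIError as e:
            retryable = getattr(e, "code", None) in (429, 500, 502, 503, 504)
            if not retryable or attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt + random.uniform(0, 1))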

Dependencies

To use the Python SDK:

pip install google-genai pillow

For JavaScript:

npm install @google/genai

Important Notes

  • All generated images include a SynthID watermark
  • The model uses a “thinking” process for complex prompts
  • For best text rendering, generate text first, then request image with that text
  • Images are not stored by the API – save outputs locally