inworld

📁 itechmeat/llm-code 📅 Jan 29, 2026
Total installs: 0
Weekly installs: 8
Install command
npx skills add https://github.com/itechmeat/llm-code --skill inworld

Agent install distribution

opencode 6
github-copilot 6
codex 6
cursor 6
gemini-cli 5

Skill documentation

Inworld AI

Text-to-Speech platform with voice cloning, audio markups, and timestamp alignment.

Quick Navigation

| Topic | Reference |
|---|---|
| Installation | installation.md |
| Voice Cloning | cloning.md |
| Voice Control | voice-control.md |
| API Reference | api.md |

When to Use

  • Text-to-speech audio generation
  • Voice cloning from 5-15 seconds of audio
  • Emotion-controlled speech ([happy], [sad], etc.)
  • Word/phoneme timestamps for lip sync
  • Custom pronunciation with IPA

Models

| Model | ID | Latency | Price |
|---|---|---|---|
| TTS 1.5 Max | inworld-tts-1.5-max | ~200 ms | $10 per 1M characters |
| TTS 1.5 Mini | inworld-tts-1.5-mini | ~120 ms | $5 per 1M characters |
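To make the pricing concrete, here is a small cost estimate based on the per-character prices listed above. It is a sketch only: it assumes billing scales linearly with the length of the input text, and the helper name `estimate_cost_usd` is hypothetical. How the platform counts characters (for example, whether markups are billed) is not stated here.

```python
# Prices copied from the Models table, in USD per 1,000,000 characters.
PRICE_PER_MILLION_CHARS = {
    "inworld-tts-1.5-max": 10.0,
    "inworld-tts-1.5-mini": 5.0,
}


def estimate_cost_usd(char_count, model_id):
    """Estimate cost assuming simple linear per-character billing (an assumption)."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return char_count * PRICE_PER_MILLION_CHARS[model_id] / 1_000_000


# A 250,000-character script under this assumption:
print(estimate_cost_usd(250_000, "inworld-tts-1.5-max"))   # 2.5
print(estimate_cost_usd(250_000, "inworld-tts-1.5-mini"))  # 1.25
```

Under these assumptions, Mini costs half as much as Max for the same text and has lower latency.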

Minimal Example

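import base64
import os

import requests

# Read the key from the environment; a missing variable raises KeyError.
api_key = os.environ["INWORLD_API_KEY"]

response = requests.post(
    "https://api.inworld.ai/tts/v1/voice",
    headers={"Authorization": f"Basic {api_key}"},
    json={"text": "Hello!", "voiceId": "Ashley", "modelId": "inworld-tts-1.5-max"},
    timeout=30,
)
response.raise_for_status()  # stop on HTTP errors before parsing the body
# The audio arrives as a base64 string in the "audioContent" field.
audio = base64.b64decode(response.json()["audioContent"])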
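The request body and the response decoding from the example can be split into two small helpers, which makes them easy to test without a network call. This is a sketch: `build_tts_payload` and `decode_audio` are hypothetical names, and only the fields shown in the minimal example (`text`, `voiceId`, `modelId`, `audioContent`) are used.

```python
import base64


def build_tts_payload(text, voice_id="Ashley", model_id="inworld-tts-1.5-max"):
    """Return the JSON body used in the minimal example."""
    if not text:
        raise ValueError("text must be non-empty")
    return {"text": text, "voiceId": voice_id, "modelId": model_id}


def decode_audio(response_json):
    """Decode the base64 'audioContent' field into raw audio bytes."""
    try:
        encoded = response_json["audioContent"]
    except KeyError:
        raise ValueError("response has no 'audioContent' field") from None
    return base64.b64decode(encoded)


# Offline check with a fake response body (not real audio):
fake_response = {"audioContent": base64.b64encode(b"abc").decode("ascii")}
print(decode_audio(fake_response))  # b'abc'
```

The decoded bytes can then be written to a file opened in binary mode. The container format of the audio is not specified in this document, so the file extension is left to the caller.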

Key Features

  • 15 languages: en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar
  • Instant cloning: 5-15 seconds of audio, no training
  • Audio markups: [happy], [laughing], [sigh] (English only)
  • Timestamps: word, phoneme, and viseme timing for lip sync
  • Streaming: /voice:stream endpoint
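A client can check the language list above before sending a request. The sketch below is an illustration, not part of the Inworld API: `is_supported` is a hypothetical helper, and it assumes callers may pass locale strings such as `en-US` or `pt_BR`, of which only the primary language code is compared.

```python
# The 15 language codes listed under Key Features.
SUPPORTED_LANGUAGES = frozenset({
    "en", "zh", "ja", "ko", "ru", "it", "es", "pt",
    "fr", "de", "pl", "nl", "hi", "he", "ar",
})


def is_supported(locale):
    """Return True if the primary language code of `locale` is in the list."""
    primary = locale.replace("_", "-").split("-")[0].lower()
    return primary in SUPPORTED_LANGUAGES


print(is_supported("pt_BR"))  # True
print(is_supported("sv"))     # False
```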

Prohibitions

  • Audio markups work only in English
  • Use only ONE emotion markup, placed at the beginning of the text
  • Match voice language to text language
  • Instant cloning may not work for children’s voices or unique accents
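The markup rules above can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: `check_markup` is a hypothetical helper, markups are taken to be lowercase words in square brackets, and the set of emotion tags contains only the two examples named in this document (`happy`, `sad`); the full list is in the Voice Control reference.

```python
import re

# Assumption: only the emotion tags mentioned in this document.
EMOTION_TAGS = {"happy", "sad"}
# Assumption: a markup is a lowercase word (or words) in square brackets.
MARKUP_RE = re.compile(r"\[([a-z_ ]+)\]")


def check_markup(text, language="en"):
    """Return a list of rule violations; an empty list means no problems found."""
    problems = []
    tags = [(m.group(1), m.start()) for m in MARKUP_RE.finditer(text)]
    if tags and language != "en":
        problems.append("audio markups work only in English")
    emotions = [(name, pos) for name, pos in tags if name in EMOTION_TAGS]
    if len(emotions) > 1:
        problems.append("use only one emotion markup")
    # Index of the first non-whitespace character in the text.
    text_start = len(text) - len(text.lstrip())
    if emotions and emotions[0][1] != text_start:
        problems.append("place the emotion markup at the beginning of the text")
    return problems


print(check_markup("[happy] Great news!"))          # []
print(check_markup("[sigh] Hola", language="es"))   # ['audio markups work only in English']
```

The voice-language rule and the cloning caveat are not covered here, since they depend on the selected voice and the source audio rather than on the text.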

Links