inworld

📁 itechmeat/llm-code 📅 Jan 29, 2026


Install command

npx skills add https://github.com/itechmeat/llm-code --skill inworld


Skill documentation

Inworld AI

Text-to-Speech platform with voice cloning, audio markups, and timestamp alignment.

Quick Navigation

Topic	Reference
Installation	installation.md
Voice Cloning	cloning.md
Voice Control	voice-control.md
API Reference	api.md

When to Use

Text-to-speech audio generation
Voice cloning from 5-15 seconds of audio
Emotion-controlled speech ([happy], [sad], etc.)
Word/phoneme timestamps for lip sync
Custom pronunciation with IPA

Models

Model	ID	Latency	Price
TTS 1.5 Max	`inworld-tts-1.5-max`	~200ms	$10/1M chars
TTS 1.5 Mini	`inworld-tts-1.5-mini`	~120ms	$5/1M chars
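Both models are priced per million characters, so the cost of a request scales linearly with the length of the text. Here is a minimal sketch that uses the prices from the table above. `estimate_cost_usd` is a hypothetical helper and not part of any Inworld SDK. It assumes every character of `text` is billed, markup tags included, and that the listed prices are current. Check Inworld's pricing page before relying on it.

```python
# Prices in USD per 1M characters, copied from the Models table (assumed current).
PRICE_PER_MILLION_CHARS = {
    "inworld-tts-1.5-max": 10.0,
    "inworld-tts-1.5-mini": 5.0,
}


def estimate_cost_usd(text: str, model_id: str) -> float:
    """Hypothetical helper: estimate synthesis cost from the character count of `text`."""
    if model_id not in PRICE_PER_MILLION_CHARS:
        raise ValueError(f"unknown model: {model_id}")
    # Assumes billing counts every character, including any [markup] tags.
    return len(text) * PRICE_PER_MILLION_CHARS[model_id] / 1_000_000


if __name__ == "__main__":
    script = "x" * 250_000  # a 250k-character script
    print(estimate_cost_usd(script, "inworld-tts-1.5-max"))   # 2.5
    print(estimate_cost_usd(script, "inworld-tts-1.5-mini"))  # 1.25
```

At these prices Mini costs half as much as Max for the same text. The trade-off is the quality and latency difference between the two models, not the character count.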

Minimal Example

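```python
import base64
import os

import requests

# The API key from INWORLD_API_KEY is sent as-is in a Basic auth header
response = requests.post(
    "https://api.inworld.ai/tts/v1/voice",
    headers={"Authorization": f"Basic {os.environ['INWORLD_API_KEY']}"},
    json={"text": "Hello!", "voiceId": "Ashley", "modelId": "inworld-tts-1.5-max"},
    timeout=30,
)
response.raise_for_status()  # fail on HTTP errors instead of a KeyError below
audio = base64.b64decode(response.json()["audioContent"])  # raw audio bytes
```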
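It helps to keep the HTTP call separate from the decoding step, because then the decoding can be tested without network access. The sketch below is minimal and makes a few assumptions. The response body carries the Base64 audio in `audioContent`, as in the example above. `decode_audio` and `synthesize_to_file` are hypothetical helper names. The sketch also uses the standard library's `urllib.request` instead of `requests`, so it needs no extra dependencies.

```python
import base64
import json
import os
import urllib.request

TTS_URL = "https://api.inworld.ai/tts/v1/voice"


def decode_audio(payload: dict) -> bytes:
    """Decode the Base64 `audioContent` field of a TTS response body."""
    if "audioContent" not in payload:
        raise ValueError(f"no audioContent in response: {payload}")
    return base64.b64decode(payload["audioContent"])


def synthesize_to_file(text: str, path: str, voice_id: str = "Ashley",
                       model_id: str = "inworld-tts-1.5-mini") -> None:
    """Hypothetical wrapper: one non-streaming TTS request, audio written to `path`."""
    request = urllib.request.Request(
        TTS_URL,
        data=json.dumps({"text": text, "voiceId": voice_id, "modelId": model_id}).encode(),
        headers={
            "Authorization": f"Basic {os.environ['INWORLD_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = decode_audio(json.load(response))
    with open(path, "wb") as f:
        f.write(audio)
```

Call it as `synthesize_to_file("Hello!", path)`. Give `path` an extension that matches the audio encoding the API returns, because this page does not specify that encoding.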

Key Features

15 languages: en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar
Instant cloning: 5-15 seconds of audio, no training
Audio markups: [happy], [laughing], [sigh] (English only)
Timestamps: word, phoneme, and viseme timing for lip sync
Streaming: /voice:stream endpoint
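The streaming endpoint returns audio incrementally instead of as a single `audioContent` field. The sketch below rests on a few assumptions that should be checked against the API reference. It assumes the stream is newline-delimited JSON and that each line looks like `{"result": {"audioContent": "<base64>"}}`. It also assumes that an error arrives as a line with an `error` key. `iter_audio_chunks` and `stream_tts` are hypothetical names. Keeping the line parser separate means it can be tested offline.

```python
import base64
import json
import os
import urllib.request
from typing import Iterable, Iterator

STREAM_URL = "https://api.inworld.ai/tts/v1/voice:stream"


def iter_audio_chunks(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Decode audio chunks from newline-delimited JSON (message shape is assumed)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between messages
        message = json.loads(line)
        if "error" in message:  # assumed error shape
            raise RuntimeError(message["error"])
        content = message.get("result", {}).get("audioContent")
        if content:
            yield base64.b64decode(content)


def stream_tts(text: str, voice_id: str = "Ashley",
               model_id: str = "inworld-tts-1.5-mini") -> Iterator[bytes]:
    """Hypothetical wrapper: yield audio chunks as the server sends them."""
    request = urllib.request.Request(
        STREAM_URL,
        data=json.dumps({"text": text, "voiceId": voice_id, "modelId": model_id}).encode(),
        headers={
            "Authorization": f"Basic {os.environ['INWORLD_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        # Iterating an HTTP response yields its body line by line as bytes.
        yield from iter_audio_chunks(response)
```

How the chunks should be joined or played back depends on the audio encoding, which this page does not cover.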

Prohibitions

Audio markups work only in English
Use only one emotion markup, placed at the beginning of the text
Match the voice's language to the language of the text
Instant cloning may not work for children’s voices or unique accents
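The markup rules above can be checked before a request is sent. Below is a minimal sketch of a hypothetical pre-flight linter. `check_markups` is not part of any Inworld SDK. It treats only `[happy]` and `[sad]` as emotion markups because those are the ones this page names, so extend the set from Inworld's markup documentation. It also assumes the caller already knows the text's language as one of the codes listed under Key Features.

```python
import re

# Emotion markups named on this page; the complete set is an assumption to extend.
EMOTION_TAGS = {"happy", "sad"}
MARKUP = re.compile(r"\[([a-z_]+)\]")


def check_markups(text: str, language: str) -> list[str]:
    """Hypothetical linter: return a list of rule violations (empty if none)."""
    problems = []
    if MARKUP.search(text) and language != "en":
        problems.append("audio markups are only supported for English text")
    emotions = [m for m in MARKUP.finditer(text) if m.group(1) in EMOTION_TAGS]
    if len(emotions) > 1:
        problems.append(f"use one emotion markup, found {len(emotions)}")
    # The first emotion markup must start at the first non-whitespace character.
    first_content_index = len(text) - len(text.lstrip())
    if emotions and emotions[0].start() != first_content_index:
        problems.append("place the emotion markup at the beginning of the text")
    return problems
```

For example, `check_markups("[happy] Great to see you!", "en")` returns an empty list. `check_markups("Great news! [happy]", "en")` flags the misplaced markup.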