voice-localization
npx skills add https://github.com/guia-matthieu/clawfu-skills --skill voice-localization
AI Voice Localization
Scale your brand voice across multiple languages using AI voice synthesis, maintaining consistent character and quality for global content.
When to Use This Skill
- Expanding video content to new language markets
- Creating multilingual courses or training
- Localizing ads and marketing videos
- Dubbing existing content for international audiences
- Building a consistent global brand voice
- Deciding between dubbing vs. subtitles
Methodology Foundation
Source: ElevenLabs Multilingual + Global Content Best Practices
Core Principle: True localization means the same perceived person speaks each language natively: not a translated voice, but a voice that sounds local while maintaining brand character. AI voice synthesis enables this at scale by preserving voice identity while adapting pronunciation and rhythm to each language.
Why This Matters: Global content traditionally required separate voice actors per language, which eroded brand consistency. AI voice localization keeps the same “person” across 29+ languages, creating a unified brand experience worldwide while reducing production costs by an estimated 70-90%.
What Claude Does vs What You Decide
| Claude Does | You Decide |
|---|---|
| Structures production workflow | Final creative direction |
| Suggests technical approaches | Equipment and tool choices |
| Creates templates and checklists | Quality standards |
| Identifies best practices | Brand/voice decisions |
| Generates script outlines | Final script approval |
What This Skill Does
- Maintains voice identity across languages – Same character, different language
- Handles cultural adaptation – Beyond translation to localization
- Manages multilingual production – Efficient workflows for many languages
- Ensures quality per market – Native speaker validation
- Calculates ROI – Traditional dubbing vs. AI localization costs
How to Use
Plan Localization Project
Help me plan voice localization for [content].
Source language: [original]
Target languages: [list]
Content type: [video/audio/course]
Volume: [duration/number of assets]
Evaluate Localization Approach
Should I use AI voice localization or traditional dubbing?
Content: [describe]
Markets: [target countries]
Budget: [range]
Timeline: [deadline]
Instructions
When localizing voice content, follow this methodology:
Step 1: Assess Localization Needs
Determine the right approach for your content.
## Localization Decision Matrix
### When to Use AI Voice Localization
✓ Same brand voice needed across markets
✓ Frequent content updates (efficiency matters)
✓ Educational/informational content
✓ Budget constraints
✓ Quick turnaround needed
✓ 5+ languages needed
### When to Use Traditional Dubbing
✓ Character-driven content (emotions critical)
✓ One-time major production
✓ Markets expect dubbed content (Germany, France)
✓ Complex lip-sync requirements
✓ Budget allows $1,000+ per language
### When to Use Subtitles Instead
✓ Documentary/interview content
✓ Authenticity of original voice matters
✓ Lowest budget option
✓ Markets prefer subtitles (Nordics, Netherlands)
✓ Legal/compliance content (exact words matter)
### Hybrid Approach
Hero content → Traditional dubbing
Supporting content → AI localization
Supplementary → Subtitles
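The decision matrix above can be approximated as a simple tally of matched criteria. This is a minimal sketch rather than part of the skill itself: the `Project` fields, the one-point-per-criterion weighting, and the tie-breaking order (first listed approach wins) are illustrative assumptions. Only the 5-language and $1,000 thresholds come from the matrix.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical project profile; field names are illustrative."""
    languages: int
    budget_per_language: float
    frequent_updates: bool = False
    emotion_critical: bool = False
    needs_lip_sync: bool = False
    original_voice_matters: bool = False
    exact_wording_required: bool = False


def score_approaches(p: Project) -> dict[str, int]:
    """Count how many matrix criteria each approach satisfies."""
    return {
        "ai_localization": sum([
            p.languages >= 5,
            p.frequent_updates,
            p.budget_per_language < 1000,
        ]),
        "traditional_dubbing": sum([
            p.emotion_critical,
            p.needs_lip_sync,
            p.budget_per_language >= 1000,
        ]),
        "subtitles": sum([
            p.original_voice_matters,
            p.exact_wording_required,
        ]),
    }


def recommend(p: Project) -> str:
    """Return the approach with the highest tally (ties go to the first listed)."""
    scores = score_approaches(p)
    return max(scores, key=scores.get)


course = Project(languages=5, budget_per_language=300, frequent_updates=True)
print(recommend(course), score_approaches(course))
```

A tally like this is only a starting point for discussion. The hybrid approach in the matrix corresponds to running it per content tier rather than once per project.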
Step 2: Select Languages Strategically
Prioritize languages based on market opportunity.
## Language Prioritization Framework
### Tier 1: High Volume Languages (500M+ speakers)
| Language | Global Speakers | Key Markets |
|----------|----------------|-------------|
| English | 1.5B | Global |
| Mandarin | 1.1B | China |
| Spanish | 550M | LATAM, Spain |
| Hindi | 600M | India |
### Tier 2: High Value Languages
| Language | Economic Value | Markets |
|----------|---------------|---------|
| German | High GDP | DACH |
| French | Broad Francophone reach | France, Africa |
| Japanese | High spending | Japan |
| Portuguese | Large market | Brazil |
### Tier 3: Strategic Languages
| Language | Strategic Value | Markets |
|----------|----------------|---------|
| Arabic | Growing middle class | MENA |
| Korean | Tech-forward | South Korea |
| Italian | Fashion/luxury | Italy |
| Dutch | High English proficiency | Benelux |
### ElevenLabs Supported Languages (29+)
English, Spanish, French, German, Italian, Portuguese,
Polish, Dutch, Hindi, Arabic, Chinese, Japanese, Korean,
Turkish, Swedish, Indonesian, Filipino, Malay, Russian,
Czech, Danish, Finnish, Greek, Romanian, Ukrainian,
Vietnamese, Norwegian, Hungarian, Tamil, and more.
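The tiers and the supported-language list above can be combined into a quick feasibility check for a target list. This is a sketch under stated assumptions: `SUPPORTED` is copied from the list above (a snapshot that ends in "and more", so it should be confirmed against the platform's current documentation), Mandarin is treated as the "Chinese" entry, and the `plan_languages` helper and the fallback tier 4 for unranked languages are hypothetical.

```python
# Snapshot of the list above; verify against current platform documentation.
SUPPORTED = {
    "English", "Spanish", "French", "German", "Italian", "Portuguese",
    "Polish", "Dutch", "Hindi", "Arabic", "Chinese", "Japanese", "Korean",
    "Turkish", "Swedish", "Indonesian", "Filipino", "Malay", "Russian",
    "Czech", "Danish", "Finnish", "Greek", "Romanian", "Ukrainian",
    "Vietnamese", "Norwegian", "Hungarian", "Tamil",
}

# Tier numbers from the prioritization tables (Mandarin is listed as "Chinese").
TIER = {
    "English": 1, "Chinese": 1, "Spanish": 1, "Hindi": 1,
    "German": 2, "French": 2, "Japanese": 2, "Portuguese": 2,
    "Arabic": 3, "Korean": 3, "Italian": 3, "Dutch": 3,
}


def plan_languages(targets: list[str]) -> tuple[list[str], list[str]]:
    """Split targets into (supported, ordered by tier) and (unsupported)."""
    supported = [t for t in targets if t in SUPPORTED]
    unsupported = [t for t in targets if t not in SUPPORTED]
    # Languages without a tier sort last; the sort is stable within a tier.
    supported.sort(key=lambda name: TIER.get(name, 4))
    return supported, unsupported


print(plan_languages(["Dutch", "Polish", "Spanish", "German"]))
```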
Step 3: Prepare Content for Localization
Translation alone isn’t enough; prepare the script for voice adaptation.
## Content Preparation Checklist
### Script Adaptation
**Text expansion/contraction**:
| Language | vs English |
|----------|-----------|
| German | +30% longer |
| French | +15-20% longer |
| Spanish | +15-25% longer |
| Chinese | -30% shorter |
| Japanese | Variable |
**Implications**:
- Video may need re-timing
- Allow flexibility in pacing
- Consider sentence splitting for longer languages
**Localization notes to provide**:
□ Brand terms (don't translate, keep English)
□ Product names (pronunciation guide)
□ Numbers (format varies by locale)
□ Dates (format varies by locale)
□ Currency (localize amounts)
□ Cultural references (adapt or explain)
### Voice Consistency Notes
**Preserve across languages**:
- Character/personality
- Energy level
- Authority/warmth balance
- Pace relative to content
**Adapt per language**:
- Natural rhythm and cadence
- Pronunciation of brand terms
- Formal/informal register (varies by culture)
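The text expansion table in the checklist above can be turned into a rough re-timing estimate. This sketch assumes that spoken duration scales in proportion to text length, which is a simplification. The percentages come from the table, Japanese is omitted because the table lists it as variable, and the `needs_retiming` helper with its 5% tolerance is a hypothetical illustration.

```python
# (low %, high %) text-length change relative to English, from the table above.
EXPANSION_PCT = {
    "German": (30, 30),
    "French": (15, 20),
    "Spanish": (15, 25),
    "Chinese": (-30, -30),
}


def estimate_duration(source_seconds: float, language: str) -> tuple[float, float]:
    """Return the (shortest, longest) expected duration in seconds.

    Assumes duration scales linearly with text length.
    """
    low, high = EXPANSION_PCT[language]
    return (
        source_seconds * (100 + low) / 100,
        source_seconds * (100 + high) / 100,
    )


def needs_retiming(source_seconds: float, language: str, tolerance_pct: int = 5) -> bool:
    """Flag a language whose worst-case duration exceeds the source by more than the tolerance."""
    _, longest = estimate_duration(source_seconds, language)
    return longest > source_seconds * (100 + tolerance_pct) / 100


for lang in EXPANSION_PCT:
    print(lang, estimate_duration(600, lang), needs_retiming(600, lang))
```

For a 10-minute (600-second) source, German comes out at about 780 seconds, which is why the checklist suggests re-timing the video or splitting sentences for longer languages.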
Step 4: Production Workflow
Efficient process for multilingual voice production.
## Multilingual Production Pipeline
### Phase 1: Source Production
1. Finalize English script
2. Record/generate English voice
3. Lock timing and pacing
4. Create master video/audio
### Phase 2: Translation
1. Professional translation (not machine)
2. Localization review (cultural adaptation)
3. Timing adaptation (fit original duration)
4. Brand term glossary enforcement
### Phase 3: Voice Generation
**Per language**:
- Load translated script
- Apply same voice settings as source
- Generate voice in target language
- Check pronunciation of brand terms
- Adjust pacing if needed
- Review for naturalness
### Phase 4: Quality Control
**Native speaker review checklist**:
□ Natural pronunciation
□ Correct emphasis and intonation
□ Brand terms handled correctly
□ No awkward phrasing
□ Appropriate formality level
□ Cultural appropriateness
### Phase 5: Integration
1. Replace audio track in video
2. Re-sync if timing changed
3. Update text overlays
4. Localize captions/subtitles
5. Final review per language
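Phase 3 of the pipeline above ("apply same voice settings as source") can be sketched as one request builder reused for every language. Everything here should be treated as an assumption to verify against the current ElevenLabs API reference: the endpoint path, the `xi-api-key` header, the `eleven_multilingual_v2` model ID, and the `stability` / `similarity_boost` settings reflect my understanding of the public text-to-speech REST API. The helper names and default values are hypothetical. The sketch only builds requests and does not send them.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the current API reference before use.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(api_key, voice_id, text, *,
                      model_id="eleven_multilingual_v2",
                      stability=0.5, similarity_boost=0.75):
    """Build (but do not send) a text-to-speech request for one script."""
    payload = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        f"{API_BASE}/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def build_batch(api_key, voice_id, scripts, **settings):
    """One request per language, all sharing the same voice and settings."""
    return {
        lang: build_tts_request(api_key, voice_id, text, **settings)
        for lang, text in scripts.items()
    }


batch = build_batch("YOUR_API_KEY", "YOUR_VOICE_ID",
                    {"es": "Hola y bienvenidos.", "fr": "Bonjour et bienvenue."})
for lang, req in batch.items():
    print(lang, req.get_method(), req.full_url)
```

Keeping the voice ID and settings in one place is the point of the design: only the translated script changes per language. Sending a request would be a separate step, for example with `urllib.request.urlopen(req)`, writing the returned audio bytes to a per-language file.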
Step 5: Quality Assurance
Ensure each language meets standards.
## Localization QA Framework
### Technical QA
□ Audio levels consistent across languages
□ No clipping or distortion
□ Background music balanced correctly
□ Transitions smooth
□ Sync with video acceptable
### Linguistic QA
□ Translation accuracy (spot check 10%)
□ Natural flow and rhythm
□ Brand voice maintained
□ Technical terms correct
□ No machine-translation artifacts
### Cultural QA
□ No offensive content for market
□ References appropriate
□ Humor/idioms adapted correctly
□ Visual content appropriate
□ Call-to-action localized
### Native Speaker Sign-Off
For each language:
- [ ] Spanish (Reviewer: _____) → Approved
- [ ] French (Reviewer: _____) → Approved
- [ ] German (Reviewer: _____) → Approved
- [ ] [Add languages...]
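The first two technical QA items above (consistent levels, no clipping) can be partly automated. The sketch below uses only the Python standard library and assumes 16-bit PCM WAV input. The function names, the 1 dB consistency tolerance, and the "peak at full scale" clipping test are illustrative assumptions, not a broadcast loudness standard.

```python
import array
import math
import sys
import wave

FULL_SCALE = 32768  # magnitude of the most negative 16-bit sample


def _to_dbfs(value: float) -> float:
    """Convert a linear sample magnitude to dB relative to full scale."""
    return 20 * math.log10(value / FULL_SCALE) if value > 0 else float("-inf")


def measure_levels(wav_file) -> dict:
    """Return peak and RMS level (dBFS) for a 16-bit PCM WAV path or file object."""
    with wave.open(wav_file, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        raise ValueError("no audio frames")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return {
        "peak_dbfs": _to_dbfs(peak),
        "rms_dbfs": _to_dbfs(rms),
        "clipped": peak >= 32767,  # assumption: a full-scale peak suggests clipping
    }


def levels_consistent(rms_by_language: dict, tolerance_db: float = 1.0) -> bool:
    """True if all languages fall within tolerance_db of each other."""
    values = list(rms_by_language.values())
    return max(values) - min(values) <= tolerance_db


print(levels_consistent({"es": -20.0, "fr": -20.6, "de": -19.8}))
```

In use, `measure_levels` would run once per language file and the resulting `rms_dbfs` values would be passed to `levels_consistent`. Music balance, transitions, and sync still need a human listener.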
Step 6: Calculate ROI
Compare AI localization to traditional approaches.
## Localization Cost Comparison
### Traditional Dubbing (per language)
| Component | Cost |
|-----------|------|
| Translation | $0.15/word |
| Voice talent | $300-1,000/hour finished |
| Studio time | $100-200/hour |
| Direction | $50-100/hour |
| Engineering | $50-100/hour |
**Example**: 10-minute video (1,500 words)
- Translation: $225
- Voice talent: $400
- Studio: $200
- Direction: $150
- Engineering: $100
- **Total: ~$1,075 per language**
### AI Voice Localization
| Component | Cost |
|-----------|------|
| Translation | $0.15/word |
| ElevenLabs Pro | $99/mo (includes a monthly usage quota) |
| QA review | $50-100/language |
**Example**: 10-minute video (1,500 words)
- Translation: $225
- Voice generation: ~$0 (within plan)
- QA review: $75
- **Total: ~$300 per language**
### ROI Summary
| Languages | Traditional | AI Localization | Savings |
|-----------|-------------|-----------------|---------|
| 5 | $5,375 | $1,500 | 72% |
| 10 | $10,750 | $3,000 | 72% |
| 20 | $21,500 | $6,000 | 72% |
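**Bottom line**: In this comparison, AI localization costs about 72% less than traditional dubbing at every language count, because both totals scale linearly per language (the flat monthly subscription is not included).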
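The ROI summary table can be reproduced from the per-language examples. This is a minimal sketch using the figures quoted in this section ($0.15/word translation, the example line items for a 1,500-word video, and a $75 QA review). The function names are hypothetical, and the monthly subscription is left out of the per-language totals, as it is in the table.

```python
TRANSLATION_CENTS_PER_WORD = 15  # $0.15/word, kept in cents to avoid rounding noise


def traditional_cost(words, voice=400, studio=200, direction=150, engineering=100):
    """Per-language cost of traditional dubbing, using the example line items."""
    return words * TRANSLATION_CENTS_PER_WORD / 100 + voice + studio + direction + engineering


def ai_cost(words, qa_review=75):
    """Per-language cost of AI localization, excluding the flat subscription."""
    return words * TRANSLATION_CENTS_PER_WORD / 100 + qa_review


def roi_row(languages, words):
    """Return (traditional total, AI total, savings in whole percent)."""
    traditional = languages * traditional_cost(words)
    ai = languages * ai_cost(words)
    savings_pct = round(100 * (1 - ai / traditional))
    return traditional, ai, savings_pct


for n in (5, 10, 20):
    print(n, roi_row(n, 1500))
```

Because both totals scale linearly with the number of languages, the savings percentage stays constant across rows. Adding the subscription as a fixed cost would lower it slightly for small projects.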
Examples
Example 1: E-Learning Course Localization
Context: Localizing a 20-video course to 5 languages.
Input:
We have a 20-video SaaS training course in English (total 3 hours). Need to localize to Spanish, French, German, Portuguese, and Japanese. What’s the best approach?
Output:
Course Localization Plan
Project Scope
- Content: 20 videos, 3 hours total
- Source: English
- Targets: Spanish, French, German, Portuguese, Japanese
- Content type: Educational (clarity > emotion)
Recommended Approach: AI Voice Localization
Why AI is ideal here:
- Educational content (fact delivery, not drama)
- 5 languages (efficiency critical)
- Likely ongoing updates (need repeatable process)
- Consistent “instructor” voice across all versions
Implementation Plan
Platform: ElevenLabs Pro ($99/mo)
Voice Strategy:
- Clone original instructor OR design consistent voice
- Same voice speaks all 6 languages
- Maintains authority and teaching style
Workflow per Video:
1. Translation (outsource)
   - Professional translation (not machine)
   - Provide glossary of product terms
   - Timing guidance for text expansion
2. Voice Generation
   - Generate all 5 languages from same voice
   - ~15 minutes per language per video
   - Total: ~25 hours voice generation
3. QA Review
   - Native speaker review per language
   - Focus: pronunciation, naturalness, accuracy
   - Budget 1 hour review per language per video
4. Video Integration
   - Replace audio tracks
   - Adjust timing if needed
   - Verify captions match
Timeline
| Phase | Duration | Notes |
|---|---|---|
| Translation (all) | 2 weeks | Parallel |
| Voice generation | 1 week | ~5 hours/day |
| QA review | 2 weeks | Parallel per language |
| Integration | 1 week | Parallel |
| Total | 5-6 weeks | Phases overlap; includes buffer |
Budget Estimate
| Item | Cost |
|---|---|
| Translation (5 lang × 27k words) | $20,250 |
| ElevenLabs (2 months) | $200 |
| QA review (5 lang × 20 videos × $30) | $3,000 |
| Integration (20 hours @ $50) | $1,000 |
| Total | ~$24,500 |
vs. traditional dubbing: ~$107,500 (20 videos × 5 languages × ~$1,075 each), about 77% savings
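The workload and budget figures in this example follow from a few multiplications, worked through here as a check. The constants are the ones stated in the plan. The traditional-dubbing comparison assumes the earlier ~$1,075 per-language figure applies to each of the 20 videos, which is how the $107,500 total appears to be derived. The function names are hypothetical.

```python
VIDEOS = 20
LANGUAGES = 5
SOURCE_WORDS = 27_000  # about 3 hours of narration


def workload_hours():
    """Hours of voice generation and native-speaker review across the course."""
    generation = VIDEOS * LANGUAGES * 15 / 60  # 15 minutes per language per video
    review = VIDEOS * LANGUAGES * 1            # 1 hour per language per video
    return generation, review


def course_budget():
    """Line items from the budget table, plus their exact total."""
    items = {
        "translation": LANGUAGES * SOURCE_WORDS * 15 / 100,  # $0.15/word
        "elevenlabs": 2 * 100,                                # 2 months, rounded up
        "qa_review": LANGUAGES * VIDEOS * 30,
        "integration": 20 * 50,
    }
    items["total"] = sum(items.values())
    return items


def savings_vs_traditional(per_video_per_language=1075):
    """Whole-percent savings against an assumed per-video dubbing cost."""
    traditional = VIDEOS * LANGUAGES * per_video_per_language
    return traditional, round(100 * (1 - course_budget()["total"] / traditional))


print(workload_hours())
print(course_budget())
print(savings_vs_traditional())
```

The exact total is $24,450, which the table rounds to ~$24,500. The 25 hours of generation match the one-week estimate at about 5 hours per day.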
Example 2: Global Ad Campaign
Context: 30-second ad needs localization for global launch.
Input:
We’re launching a 30-second video ad globally. Need 10 language versions in 2 weeks. Budget is $15,000.
Output:
Ad Localization: Rapid Global Launch
Constraints
- 30-second ad (tight timing)
- 10 languages
- 2-week deadline
- $15,000 budget (~$1,500/language)
Recommended: Hybrid Approach
Tier 1 (Hero Markets) – Traditional Dubbing
- English (source)
- Spanish (largest reach)
- German (high value)
- French (high value)
Tier 2 (Scale Markets) – AI Localization
- Portuguese, Italian, Dutch, Polish, Japanese, Korean
Rationale
- Hero markets get premium treatment
- AI handles scale efficiently
- Both meet deadline
Production Schedule
Week 1:
| Day | Task |
|---|---|
| 1-2 | All translations complete |
| 2-3 | Traditional dubbing sessions (4 languages) |
| 3-4 | AI voice generation (6 languages) |
| 4-5 | QA review all versions |
Week 2:
| Day | Task |
|---|---|
| 1-2 | Revisions and fixes |
| 3-4 | Video integration all versions |
| 5 | Final review and delivery |
Budget Allocation
| Item | Cost |
|---|---|
| Translation (10 × ~120 words, ~$180 per language) | $1,800 |
| Traditional dubbing (4 lang) | $4,800 |
| AI generation (6 lang) | $600 |
| QA review (10 lang) | $2,000 |
| Integration (10 lang) | $2,500 |
| Buffer | $3,300 |
| Total | $15,000 |
Checklists & Templates
Localization Project Checklist
## Pre-Production
□ Languages selected and prioritized
□ Budget allocated per language
□ Timeline established
□ Translation vendor selected
□ Brand glossary prepared
□ Voice consistency plan defined
## Production
□ Translations complete
□ Translations reviewed for brand terms
□ Voice generated per language
□ Pronunciation verified
□ Timing adjusted if needed
## Quality Assurance
□ Native speaker review complete
□ Technical QA passed
□ Brand guidelines verified
□ Cultural review passed
□ Legal/compliance check (if needed)
## Delivery
□ Files named correctly per language
□ All formats delivered
□ Captions/subtitles provided
□ Documentation complete
□ Source files archived
Brand Glossary Template
## [Brand] Localization Glossary
### Never Translate
| English | Note |
|---------|------|
| [Brand Name] | Keep English, pronunciation: [X] |
| [Product Name] | Keep English |
| [Feature Name] | Keep English, explain in context |
### Translate Consistently
| English | Spanish | French | German |
|---------|---------|--------|--------|
| Dashboard | Panel | Tableau de bord | Dashboard |
| Workflow | Flujo de trabajo | Flux de travail | Arbeitsablauf |
| [Term] | | | |
### Pronunciation Guide
| Term | Pronunciation |
|------|--------------|
| [Brand] | /brănd/ |
| [Feature] | /fē-chər/ |
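The "Never Translate" section of the glossary lends itself to an automated check before voice generation. This is a minimal sketch, assuming protected terms must appear verbatim (case-sensitive, whole word) in the translation whenever they appear in the source. The function name and the sample brand terms "Acme" and "FlowSync" are hypothetical.

```python
import re


def find_glossary_violations(source: str, translation: str, never_translate: list[str]) -> list[str]:
    """Return protected terms present in the source but missing from the translation."""
    missing = []
    for term in never_translate:
        # Match the term as a whole word, exactly as written in the glossary.
        pattern = re.compile(rf"(?<!\w){re.escape(term)}(?!\w)")
        if pattern.search(source) and not pattern.search(translation):
            missing.append(term)
    return missing


source = "Open the Acme dashboard in FlowSync."
translation = "Abra el panel de Acme en SincroFlujo."
print(find_glossary_violations(source, translation, ["Acme", "FlowSync"]))
```

A check like this only catches dropped or translated brand terms. Pronunciation and the "Translate Consistently" entries still need native speaker review.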
Skill Boundaries
What This Skill Does Well
- Structuring audio production workflows
- Providing technical guidance
- Creating quality checklists
- Suggesting creative approaches
What This Skill Cannot Do
- Replace audio engineering expertise
- Make subjective creative decisions
- Access or edit audio files directly
- Guarantee commercial success
References
- ElevenLabs. “Multilingual Voice Synthesis” – Platform documentation
- CSA Research. “Global Content Strategy” – Localization best practices
- Unbabel. “The State of Localization” – Industry benchmarks
- Nimdzi. “Localization Market Research” – Cost and ROI data
Related Skills
- voice-design – Creating the base voice
- voiceover-direction – Quality control principles
- transcription-to-content – Preparing source content
Skill Metadata (Internal Use)
name: voice-localization
category: audio
subcategory: voice
version: 1.0
author: MKTG Skills
source_expert: ElevenLabs, Localization Best Practices
source_work: Multilingual Content Production
difficulty: intermediate
estimated_value: 70%+ cost savings vs. traditional dubbing
tags: [localization, multilingual, dubbing, ai-voice, global]
created: 2026-01-26
updated: 2026-01-26