Voice Conversion Guide

Convert the timbre of a vocal recording to your chosen target voice while preserving the original audio's emotion, rhythm, and expression. This guide walks through the conversion workflow, quality tips, quota rules, and troubleshooting.

Accessing the Feature

Visit the Voice Conversion page to get started.

Basic Usage Workflow

Step 1: Select Target Character

Choose the target voice you want to convert to:

My Voices

  • Use your cloned characters
  • Convert to your exclusive voice
  • Suitable for personal brand content

Voice Square

  • Use community-shared characters
  • Try different voice styles
  • Explore creative possibilities

Step 2: Upload Source Audio

Provide the original audio to convert:

Method 1: File Upload

Supported formats:

  • WAV
  • MP3
  • OGG

File requirements:

  • Maximum duration: 5 minutes
  • Maximum size: 20MB
  • Recommended sample rate: 16kHz or higher
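
The file requirements above can be checked locally before uploading. The following is a minimal sketch, not part of the product: the function and constant names are illustrative, "20MB" is assumed to mean 20 × 1024 × 1024 bytes, and only WAV files are inspected for duration and sample rate because the Python standard library cannot decode MP3 or OGG.

```python
import os
import wave

# Limits taken from this guide; all names here are illustrative.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg"}
MAX_DURATION_SECONDS = 5 * 60
MAX_SIZE_BYTES = 20 * 1024 * 1024  # assumption: "20MB" means 20 MiB
RECOMMENDED_SAMPLE_RATE_HZ = 16_000


def check_source_audio(path):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
        return problems
    if os.path.getsize(path) > MAX_SIZE_BYTES:
        problems.append("file is larger than 20MB")
    # Duration and sample rate are only checked for WAV files;
    # MP3 and OGG would need a third-party decoder.
    if extension == ".wav":
        with wave.open(path, "rb") as wav_file:
            sample_rate = wav_file.getframerate()
            duration = wav_file.getnframes() / sample_rate
        if duration > MAX_DURATION_SECONDS:
            problems.append(f"duration {duration:.0f}s exceeds 5 minutes")
        if sample_rate < RECOMMENDED_SAMPLE_RATE_HZ:
            problems.append(f"sample rate {sample_rate} Hz is below the recommended 16 kHz")
    return problems
```

A low sample rate is reported as a problem here even though the guide lists it as a recommendation rather than a hard limit, so treat that message as a warning.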

Method 2: Online Recording

Record audio directly:

  1. Click the "Online Recording" tab
  2. Grant microphone permission when prompted
  3. Click the record button to start
  4. Speak, or play the content you want to convert
  5. Click stop to finish recording

Step 3: Start Conversion

Click the "Start Conversion" button to begin. Typical processing times and status values are listed below.

Processing Time:

  • Short audio (up to 30 seconds): 10-20 seconds
  • Medium audio (1-3 minutes): 1-2 minutes
  • Long audio (3-5 minutes): 1-3 minutes

Conversion Status:

  • Uploading: The source file is being uploaded
  • Processing: The timbre is being converted
  • Completed: The result is ready to preview and download
  • Failed: The conversion did not finish; view the error message

Step 4: Preview and Download

Online Preview:

  • Compare original and converted audio
  • Check timbre conversion effect
  • Verify emotion preservation

Download Audio:

  • Format: MP3
  • Sample rate: 24kHz
  • Bit rate: 128kbps
  • Preserves original audio duration
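
Because the download is a 128 kbps MP3 with the same duration as the source, its approximate size follows from the duration alone. The sketch below assumes a constant bit rate and ignores metadata and container overhead, so the real file may differ slightly; the function name is illustrative.

```python
BIT_RATE_BPS = 128_000  # 128 kbps, as listed in this guide


def estimated_mp3_bytes(duration_seconds):
    """Approximate download size in bytes, assuming a constant bit rate."""
    return int(duration_seconds * BIT_RATE_BPS / 8)


# A 5-minute (300-second) conversion comes to roughly 4.8 MB.
print(estimated_mp3_bytes(300))  # 4800000
```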

Core Features

Timbre Conversion

Preserved Content:

  • ✅ Original emotional expression
  • ✅ Speaking rhythm and pace
  • ✅ Speech content and pronunciation
  • ✅ Volume variations and emphasis

Changed Content:

  • 🔄 Voice timbre and characteristics
  • 🔄 Pitch and range
  • 🔄 Voice age and gender traits

Use Cases

Speech Content Conversion

  • Convert interviews to a unified voice
  • Multi-character dubbing production
  • Voice style unification
  • Anonymization processing

Singing Conversion

  • Song cover creation
  • Voice style experimentation
  • Music production assistance
  • Creative audio art

Advanced Tips

Choosing the Right Target Character

Timbre Matching Principles:

Similar timbre conversion works better:

  • Male voice → Male voice
  • Female voice → Female voice
  • Adult voice → Adult voice
  • Child voice → Child voice

Cross-type conversion:

  • May lose some quality
  • Suitable for creative experiments
  • May require multiple attempts

Source Audio Quality Optimization

Best Source Audio:

  • ✅ Clear vocals
  • ✅ Minimal background noise
  • ✅ Stable volume
  • ✅ Standard sample rate

Avoid Using:

  • ❌ Multiple people speaking simultaneously
  • ❌ Severely distorted audio
  • ❌ Over-compressed audio
  • ❌ Excessive background music

Handling Background Music

Audio with Background Music:

Option 1: Vocal Separation

  • Use audio editing software to extract vocals
  • Convert only the vocal part
  • Mix background music in post-production

Option 2: Direct Conversion

  • Background music is kept in the output
  • Background music may reduce conversion quality
  • Suitable only when the background music is minimal

Common Application Scenarios

Scenario 1: Video Dubbing

Requirement: Dub video content with a unified voice

Workflow:

  1. Record or prepare voiceover speech
  2. Select target character (brand voice)
  3. Convert all dubbing segments
  4. Use in video editing software

Advantages:

  • Maintain voice consistency
  • Save professional dubbing costs
  • Quick iteration and modification

Scenario 2: Multi-Character Content

Requirement: One person voicing multiple characters

Workflow:

  1. Record all character lines with your own voice
  2. Select different target voices for each character
  3. Convert each character's audio separately
  4. Merge to create final content

Advantages:

  • Reduce production costs
  • Full control over timing and rhythm
  • Flexible adjustments and modifications
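
The multi-character workflow is easier to manage if each recorded line keeps its script position, so the converted clips can be merged back in order. The sketch below only plans the batches and calls no product API; the function name, file names, and voice names are hypothetical.

```python
from collections import defaultdict


def plan_conversion_batches(script_lines, voice_for_character):
    """Group recorded lines by target voice while remembering script order.

    script_lines: list of (character, audio_file) tuples in script order.
    voice_for_character: mapping from character name to target voice name.
    """
    batches = defaultdict(list)
    for position, (character, audio_file) in enumerate(script_lines):
        target_voice = voice_for_character[character]
        batches[target_voice].append((position, audio_file))
    return dict(batches)


script = [
    ("Narrator", "line_01.wav"),
    ("Hero", "line_02.wav"),
    ("Narrator", "line_03.wav"),
]
voices = {"Narrator": "Voice A", "Hero": "Voice B"}
print(plan_conversion_batches(script, voices))
```

Each batch can then be converted with its target voice, and the stored positions give the order for the final merge.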

Scenario 3: Song Covers

Requirement: Sing songs with different voices

Workflow:

  1. Prepare an a cappella version or a version with accompaniment
  2. Select target singer voice
  3. Convert singing audio
  4. Mix to create complete version

Notes:

  • Respect original copyright
  • For personal learning or with authorization only
  • Indicate use of AI technology

Scenario 4: Voice Anonymization

Requirement: Protect speaker identity

Workflow:

  1. Upload audio requiring anonymization
  2. Select a completely different voice
  3. Convert and verify effect
  4. Use converted audio

Applications:

  • News interview protection
  • Privacy content processing
  • Sensitive information sharing

Quality Optimization Tips

Achieving Best Conversion Results

Source Audio Preparation:

  1. Use high-quality recording equipment
  2. Record in quiet environment
  3. Maintain stable volume
  4. Avoid over-processing

Target Character Selection:

  1. Choose characters with similar timbre
  2. Listen to character samples
  3. Compare multiple tests
  4. Select best match

Adjusting Results:

  • If the timbre deviates noticeably from the target, try other characters
  • If emotion is lost, check the source audio quality
  • If there is noise, run the source audio through a noise reduction tool

History Management

View Conversion History

The history panel on the right side of the page shows:

  • Recent conversion records
  • Character information used
  • Conversion time and status

Re-download

  • No need to reconvert
  • Download historical results directly
  • Save quota and time

Quota Management

Duration Calculation

Quota is calculated from the duration of the source audio:

  • Measured in minutes
  • Audio shorter than 1 minute counts as 1 minute
  • Failed conversions do not deduct quota
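
The quota rules above can be expressed as a short calculation. This is a sketch under one assumption the guide does not state: partial minutes beyond the first are also rounded up, so a 90-second clip is treated as 2 minutes. The function name is illustrative.

```python
import math


def quota_minutes(duration_seconds, succeeded=True):
    """Minutes deducted for one conversion under the assumed rounding rule."""
    if not succeeded:
        return 0  # failed conversions do not deduct quota
    # Assumption: partial minutes round up; the guide only states the 1-minute minimum.
    return max(1, math.ceil(duration_seconds / 60))


print(quota_minutes(20))                    # 1
print(quota_minutes(90))                    # 2 (under the round-up assumption)
print(quota_minutes(300))                   # 5
print(quota_minutes(300, succeeded=False))  # 0
```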

Check Usage

Plan       Text-to-Speech              Voice Conversion
Starter    80,000 characters/month     440 minutes/month
Standard   270,000 characters/month    1,500 minutes/month
Premium    540,000 characters/month    3,000 minutes/month

Quotas are subject to change; refer to the Pricing page for current rates.

Quota Optimization

  • Merge short audio before conversion
  • Delete failed conversion records
  • Upgrade plan for more quota
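
Merging pays off because every clip shorter than 1 minute counts as a full minute: three 20-second clips converted separately use 3 minutes of quota, while one merged 60-second clip uses 1. Below is a minimal sketch of merging with Python's standard `wave` module. It assumes all clips are WAV files with the same channel count, sample width, and sample rate, and that the output path is not one of the inputs; the function name is illustrative.

```python
import wave


def merge_wav_clips(clip_paths, output_path):
    """Concatenate same-format WAV clips; return the merged duration in seconds."""
    if not clip_paths:
        raise ValueError("no clips to merge")
    expected_format = None
    total_frames = 0
    with wave.open(output_path, "wb") as merged:
        for path in clip_paths:
            with wave.open(path, "rb") as clip:
                clip_format = (
                    clip.getnchannels(),
                    clip.getsampwidth(),
                    clip.getframerate(),
                )
                if expected_format is None:
                    # The first clip defines the output format.
                    expected_format = clip_format
                    merged.setnchannels(clip_format[0])
                    merged.setsampwidth(clip_format[1])
                    merged.setframerate(clip_format[2])
                elif clip_format != expected_format:
                    raise ValueError(f"{path} has a different audio format")
                merged.writeframes(clip.readframes(clip.getnframes()))
                total_frames += clip.getnframes()
    return total_frames / expected_format[2]
```

Keep the merged file within the 5-minute and 20MB upload limits, and split the converted result afterwards if the segments are needed individually.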

Technical Limitations

Current Limitations

Unsupported Content:

  • Severely distorted or damaged audio
  • Non-vocal content (pure music, ambient sounds)

Factors Affecting Quality:

  • Source audio quality
  • How closely the target character matches the source voice
  • Audio complexity

Troubleshooting

Conversion Failed

Common Causes:

  • Unsupported audio format
  • File too large or too long
  • Audio quality too low
  • Contains content that violates the usage standards

Solutions:

  1. Convert audio format
  2. Trim audio length
  3. Improve audio quality
  4. Check content compliance

Unsatisfactory Results

Large Timbre Deviation:

  • Change target character
  • Select character with similar timbre
  • Adjust source audio quality

Lost Emotion:

  • Check source audio clarity
  • Ensure obvious emotional expression
  • Try different characters

Noise or Distortion:

  • Use noise reduction tools on source audio
  • Improve source audio quality
  • Reduce background noise

Usage Standards

Allowed:

  • ✅ Personal learning and experimentation
  • ✅ Authorized commercial use
  • ✅ Voice conversion of original content

Prohibited:

  • ❌ Unauthorized commercial use
  • ❌ Infringing others' copyright
  • ❌ Creating illegal or policy-violating content
  • ❌ Impersonating other people

Disclaimer

  • Users are responsible for converted content
  • Comply with local laws and regulations
  • Respect intellectual property
  • Use reasonably and legally

Best Practices Summary

  1. Prepare High-Quality Source Audio - Clear, noise-free
  2. Choose Matching Target Character - Similar timbre works better
  3. Preview and Verify Results - Ensure satisfaction before download
  4. Use Quota Reasonably - Optimize audio length
  5. Follow Usage Guidelines - Legal and compliant use

Need help? Contact [email protected]