
How to evaluate the quality of speech synthesis?

Evaluating the quality of speech synthesis (TTS, Text-to-Speech) involves assessing both objective and subjective metrics to determine how natural, intelligible, and human-like the synthesized speech sounds. Here’s a breakdown:

1. Subjective Evaluation (Human Judgment)

This is the most reliable method: human listeners rate the synthesized speech on:

  • Naturalness: How human-like the voice sounds (e.g., smoothness, rhythm, intonation).
  • Intelligibility: How clearly the words can be understood.
  • Overall Preference: Which of two or more voices (possibly including recorded human speech) listeners prefer when hearing them side by side, as in A/B tests.

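Example: Listeners rate each TTS output on a 1-5 scale for naturalness, and the ratings are averaged into a Mean Opinion Score (MOS). An average close to 5 indicates high-quality synthesis. A single listener's score is too noisy to rely on by itself.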
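A minimal Python sketch of turning pooled listener ratings into a MOS with an approximate 95% confidence interval. The function name and sample ratings are hypothetical, and the normal approximation (z = 1.96) assumed here is only reasonable with enough ratings. Reporting the interval next to the mean shows whether a gap between two systems is larger than the listener noise.

```python
import math
import statistics


def mean_opinion_score(ratings: list[int]) -> tuple[float, float]:
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes 1-5 ratings pooled from several listeners. The interval uses a
    normal approximation (z = 1.96), which is only a rough guide for small n.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    stderr = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, 1.96 * stderr


if __name__ == "__main__":
    # Hypothetical naturalness ratings for one TTS system.
    ratings = [5, 4, 4, 5, 3, 4, 5, 4, 4, 5]
    mos, ci = mean_opinion_score(ratings)
    print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```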

2. Objective Evaluation (Automated Metrics)

These metrics compare the synthesized speech with ground truth (human-recorded speech) or measure inherent qualities:

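  • MOS prediction: MOS (Mean Opinion Score) is itself a subjective metric, the average of human ratings on a 1-5 scale, but it can be approximated automatically by AI models trained to predict those ratings from audio.
  • Mel-Cepstral Distortion (MCD): Measures the spectral distance, in dB, between synthesized and reference speech, usually after aligning their frames in time (lower = better).
  • Word Error Rate (WER): The synthesized speech is transcribed by an automatic speech recognition (ASR) system, and the transcript is compared with the input text (higher WER = lower intelligibility).
  • Prosody Metrics: Compare pitch (F0) contours, durations, and energy against reference recordings to evaluate rhythm, stress, and intonation accuracy (e.g., F0 root-mean-square error).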
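The WER check from the list above can be sketched as a word-level edit distance, assuming the synthesized audio has already been transcribed by some ASR system (not shown). `word_error_rate` is a hypothetical helper: it only lowercases and splits on whitespace, whereas real evaluations usually also normalize punctuation and numbers before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    `reference` is the text sent to the TTS system; `hypothesis` is the ASR
    transcript of the synthesized audio (the ASR step is not shown here).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, if the reference is "the stock price dropped twenty percent today" and the ASR hears "prize" instead of "price", that is one substitution over seven reference words, a WER of about 0.14.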

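Example: If MCD is low (e.g., <5 dB) and WER is near zero, the synthesis is likely high-quality. Absolute MCD values depend on the feature extraction and alignment used, so compare systems only under the same setup.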
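A minimal pure-Python sketch of MCD, assuming mel-cepstral frames have already been extracted from the reference and synthesized recordings (not shown) and that coefficient c0 (energy) sits at index 0 and is excluded, as is common. Frames are aligned with plain dynamic time warping (DTW), and the per-frame distortion (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2) is averaged along the alignment path. The function names are hypothetical, and published implementations differ in alignment and normalization details.

```python
import math

# 10 / ln(10) * sqrt(2): scales the cepstral Euclidean distance to decibels.
MCD_CONST = 10.0 / math.log(10.0) * math.sqrt(2.0)


def frame_mcd(ref: list[float], syn: list[float]) -> float:
    """MCD in dB between two mel-cepstral frames; index 0 (c0, energy) is skipped."""
    return MCD_CONST * math.sqrt(sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:])))


def mel_cepstral_distortion(ref: list[list[float]], syn: list[list[float]]) -> float:
    """Average frame MCD along the DTW path that minimizes total MCD."""
    if not ref or not syn:
        raise ValueError("both utterances need at least one frame")
    n, m = len(ref), len(syn)
    inf = float("inf")
    # cost[i][j]: smallest accumulated MCD aligning ref[:i] with syn[:j];
    # steps[i][j]: number of aligned frame pairs on that best path.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    steps = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Best predecessor among diagonal, vertical, and horizontal moves
            # (ties broken in favor of the shorter path).
            best_cost, best_steps = min(
                (cost[i - 1][j - 1], steps[i - 1][j - 1]),
                (cost[i - 1][j], steps[i - 1][j]),
                (cost[i][j - 1], steps[i][j - 1]),
            )
            cost[i][j] = best_cost + frame_mcd(ref[i - 1], syn[j - 1])
            steps[i][j] = best_steps + 1
    return cost[n][m] / steps[n][m]
```

The DTW loop is O(N x M) in pure Python, so this sketch only suits short utterances; a vectorized implementation is preferable for large test sets.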

3. Practical Testing

  • Edge Cases: Test rare words, numbers, and emotional text. For example, in "The stock price dropped 20% today!", "20%" should be read as "twenty percent" and the sentence should sound urgent.
  • Speaker Consistency: Ensure the voice sounds like the same speaker, with consistent quality, across different sentences.
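Speaker consistency is often checked by comparing speaker embeddings of the synthesized utterances. The sketch below assumes each utterance has already been mapped to an embedding vector by some speaker-verification model (not shown) and flags utterances whose cosine similarity to the average embedding falls below a threshold. The helper names and the 0.75 cutoff are hypothetical; a usable threshold depends on the embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero-length vector")
    return dot / (norm_a * norm_b)


def inconsistent_utterances(
    embeddings: dict[str, list[float]], threshold: float = 0.75
) -> list[str]:
    """Return sorted IDs of utterances whose embedding drifts from the centroid.

    `embeddings` maps utterance IDs to speaker embeddings (hypothetical
    input from a speaker-verification model); `threshold` is an assumed cutoff.
    """
    if not embeddings:
        return []
    vectors = list(embeddings.values())
    dim = len(vectors[0])
    # Centroid = element-wise mean of all embeddings, a proxy for "the voice".
    centroid = [sum(vec[k] for vec in vectors) / len(vectors) for k in range(dim)]
    return sorted(
        uid for uid, vec in embeddings.items() if cosine(vec, centroid) < threshold
    )
```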

Tencent Cloud Recommendation

For building or evaluating TTS systems, Tencent Cloud Text-to-Speech (TTS) provides high-fidelity voices with natural prosody. It supports multiple languages and styles (e.g., conversational, news reading) and offers APIs for easy integration. You can also use Tencent Cloud AI Lab’s evaluation tools to assess speech quality metrics.

Example Use Case: A customer service bot using Tencent Cloud TTS delivers clear, natural responses, improving user satisfaction.