The accuracy of speech recognition is typically measured by comparing the system's output (transcribed text) to a reference or ground truth transcript (the correct, human-annotated text). The most common metrics used are:
Word Error Rate (WER) – The most widely used metric. It is the minimum number of word-level edits (substitutions S, deletions D, and insertions I) needed to turn the recognized text into the reference text, divided by the number of words N in the reference: WER = (S + D + I) / N. Lower is better, and because insertions count as errors, WER can exceed 100%.
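Character Error Rate (CER) – The same calculation as WER, applied to characters instead of words. It is useful for languages without clear word boundaries (e.g., Chinese or Japanese), for languages with long compound or heavily inflected words, and for evaluating short phrases.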
Accuracy (or Match Rate) – The percentage of words or characters recognized correctly. Word accuracy is often reported as 1 − WER.
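The WER and CER definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a production scorer: the function names (`edit_distance`, `wer`, `cer`) are made up for this example, tokenization is assumed to be a plain whitespace split, and no text normalization (case, punctuation, numerals) is applied, although real evaluations usually normalize both transcripts first.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis: i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra item in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference transcript is empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

In this example the hypothesis has one substitution ("sat" became "sit") and one deletion (the second "the"), so WER is 2 / 6, or about 33%. At the character level the same errors cost 5 edits against 22 reference characters, so CER is about 23%.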
Example in Real Use:
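Cloud-based speech recognition services such as Tencent Cloud ASR (Automatic Speech Recognition) offer transcription models optimized for different industries (e.g., finance, healthcare). They support both real-time and batch processing and provide metrics for monitoring performance. Because accuracy varies with domain vocabulary, accents, and recording quality, the most reliable comparison is to measure WER or CER on a sample of your own audio with human-verified transcripts.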