The accuracy and accessibility of automatic speech-recognition (ASR) applications such as speech-to-text APIs (Application Programming Interfaces) have increased dramatically in recent years, automating the generation of useful, digestible transcripts for companies and platforms alike.
On top of this transcription data, platforms now offer natural-language processing (NLP) and natural-language understanding (NLU) applications, which some refer to as audio-intelligence tools, to identify trends across feedback data that human analysis might not have surfaced.
This article examines the three main ways in which ASR, NLP, and NLU, all backed by cutting-edge AI and machine-learning research, are transforming today’s best customer-research platforms:
- By facilitating a more efficient review process
- By surfacing key highlights, themes, and trends
- By creating smart tags to categorize and search for information
Facilitating a More Efficient Review Process
The best speech-to-text APIs today integrate with customer-research platforms to transcribe both asynchronous and live voice or video customer feedback, with nearly the same accuracy as a human transcriber as measured by word error rate.
Word error rate (WER) is the de facto standard of measurement for accuracy in speech recognition. Essentially, we calculate WER by counting the word substitutions, deletions, and insertions in an automatic transcript relative to a human transcriber’s reference, then dividing by the number of words in the reference. While WER is a great starting point for comparing accuracy across transcriptions, keep in mind that it doesn’t consider other factors such as transcript readability, context, the use of text versus numbers (for example, seven versus 7), capitalization, and paragraph structure.
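The calculation above can be sketched in a few lines of Python: word-level edit distance between the reference and the automatic transcript, divided by the reference length.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of a six-word reference gives a WER of 1/6.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Production systems typically normalize both texts (lowercasing, stripping punctuation) before scoring, which is exactly why WER says nothing about formatting quality.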
Thankfully, today’s speech-to-text APIs don’t just supply a wall of text. Imagine reading a novel without quotation marks, “he said” or “she said” markers, paragraphs, or punctuation. Reading it would take an enormous amount of effort, right? Unfortunately, that is how speech-to-text APIs used to output transcripts.
But speech transcription has come a long way since its inception. In addition to their significantly higher accuracy, today’s speech-to-text APIs produce formatted documents that include automated punctuation and casing, paragraph structure, and speaker labels, if applicable. This automated formatting greatly increases transcripts’ readability, as well as their utility.
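As a sketch, a formatted transcript from such an API might arrive as structured data along these lines. The field names here are purely illustrative, not any particular vendor’s schema.

```python
# Hypothetical shape of a formatted transcript payload; the field names
# are illustrative and do not match any specific speech-to-text vendor.
transcript = {
    "text": "Thanks for joining. Could you walk me through your last order?",
    "utterances": [
        {"speaker": "A", "start_ms": 0, "end_ms": 1800,
         "text": "Thanks for joining."},
        {"speaker": "A", "start_ms": 1900, "end_ms": 4600,
         "text": "Could you walk me through your last order?"},
    ],
}

# Speaker labels plus punctuation make it trivial to render a readable script.
for u in transcript["utterances"]:
    print(f"Speaker {u['speaker']}: {u['text']}")
```

Because each utterance carries timestamps and a speaker label, a research platform can link any quote in the transcript back to the exact moment in the original recording.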
In addition, some speech-to-text APIs can redact personally identifiable information (PII) from text automatically, in the event that companies are obligated contractually or legally to redact personal or sensitive information. Redacted PII can include phone numbers, addresses, social-security numbers, and credit-card numbers, with each digit in a sequence replaced by a # symbol.
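A minimal sketch of the digit-to-# replacement described above might look like the following. Real services use trained entity-recognition models rather than regular expressions, and the patterns below are simplified, US-centric assumptions purely for illustration.

```python
import re

# Hypothetical, simplified patterns for common numeric PII. Production
# redaction relies on NER models, not regexes; this only illustrates
# the replacement of each digit in a matched span with '#'.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                   # US social-security numbers
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),                  # credit-card numbers
    re.compile(r"\b\(?\d{3}\)?[ -.]?\d{3}[ -.]?\d{4}\b"),   # US phone numbers
]

def redact(text: str) -> str:
    """Replace every digit inside a matched PII span with '#'."""
    for pattern in PII_PATTERNS:
        text = pattern.sub(lambda m: re.sub(r"\d", "#", m.group()), text)
    return text

print(redact("Call me at 555-123-4567, SSN 123-45-6789."))
```

Replacing digits one-for-one, rather than deleting the span, preserves the transcript’s layout and makes it obvious to a reviewer that something was redacted there.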
Through automated transcription, readability features, and PII redaction, customer-research platforms can already facilitate a much simpler review process for the companies they serve.