CHINA TOPIX

03/28/2024 04:17:55 pm


Baidu Voice System Recognizes Languages Better Than Humans Can


(Photo : Getty Images) Baidu CEO Robin Li introduces the new AI-powered digital assistant 'Duer' during the 2015 Baidu Technology Innovation Conference in September.

Chinese Internet-search giant Baidu (BIDU) has created a voice system that uses deep learning to recognize English and Mandarin, in some cases more accurately than a person can.

In a paper posted to arXiv, the preprint server hosted by Cornell University, Baidu researchers showed that "an end-to-end deep learning approach" can be used to recognize two very different languages, English and Mandarin.


Deep learning is a branch of machine learning that trains multi-layer neural networks directly on data. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning lets a single system handle a diverse variety of speech, including accents and different languages.

The key to the researchers' approach is their application of high-performance computing (HPC) techniques, which resulted in a seven-fold increase in speed over their previous system. Because of this efficiency, experiments that previously took weeks now take only days to complete.

"This enables us to iterate more quickly to identify superior architectures and algorithms," said the researchers in their report. "As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets."

Decades' worth of hand-engineered domain knowledge has gone into current automatic speech recognition (ASR) pipelines. According to the Baidu researchers, a simple yet powerful alternative is to train ASR models end-to-end, using deep learning to replace most modules with a single model.

Known as Deep Speech 2, Baidu's speech system approaches or exceeds the accuracy of Amazon Mechanical Turk human workers on several benchmarks, the company says. It also works in multiple languages with little modification and is deployable in a production setting.

"It thus represents a significant step towards a single ASR system that addresses the entire range of speech recognition contexts handled by humans," said the report. "We show that through these techniques we are able to reduce error rates of our previous end-to-end system in English by up to 43%, and can also recognize Mandarin speech with high accuracy."
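To make the error-rate figures concrete, here is a minimal sketch of how such numbers are commonly computed. The article does not say which metric the researchers used; this sketch assumes word error rate (word-level edit distance divided by the number of reference words), a standard measure for speech transcription. The function names, the sample sentences and the 10% and 5.7% rates are hypothetical and chosen only for illustration; they are not results from the report.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (previous row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction by which new_rate improves on old_rate."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (old_rate - new_rate) / old_rate


if __name__ == "__main__":
    # Hypothetical transcript pair: one substituted word, one dropped word.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"word error rate: {wer:.3f}")

    # Hypothetical rates: a drop from 10% to 5.7% is a 43% relative reduction.
    print(f"relative reduction: {relative_reduction(0.10, 0.057):.0%}")
```

A reduction "by up to 43%" in this sense is relative to the older system's error rate, not a drop of 43 percentage points.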
