An ancient story tells of a human civilization that began united by a common language, only to be scattered and its people confounded as varied dialects developed. In reality, the growth of population and communication has produced about 7,000 languages in the world today.
But this month, Microsoft unveiled a technology that has the potential to break centuries of language barriers. The company demonstrated its translation method that can, within seconds, allow English speakers to have their own words, in their own voice, played back in Mandarin.
“One of the many scenarios we envision for this type of technology is within the public sector, helping government officials, law enforcement and immigration officers overcome language barriers when necessary to serve their communities,” said Rick Rashid, chief research officer at Microsoft Research.
Say, for example, a law enforcement official is working in the field and quickly needs to communicate with a Mandarin-speaking civilian. The officer could use speech translation technology to communicate in English, while the civilian hears Mandarin speech in the officer’s voice.
Speech translation technology has already shown the potential to augment human translators in combat zones. In 2007, IBM provided the military with two-way automatic translation devices to support better communication in Iraq. The systems could recognize and translate a vocabulary of more than 50,000 English and 100,000 Arabic words.
The hope, Rashid said, is that within a few years such systems will be able to break down the language barriers between people, according to GeekWire.com. He explained that the company uses a technique called deep neural networks, mathematical models patterned after the human brain, to improve automatic speech recognition. To demonstrate the technology in Rashid’s voice, Microsoft researchers built a system that draws on speech from a native Chinese speaker and an hour of recordings of past speeches by Rashid.
One of the most familiar uses of speech recognition technology is the automated message callers hear when they phone customer service at a bank, utility company or airline. These systems ask customers to state the reason for the call and may then repeat it back to confirm the request. Although there is room for improvement, technology such as Apple’s Siri is a promising sign of how accurate voice recognition has become.
In a blog post on Technet.com, Rashid writes that Microsoft’s technology reduces the word error rate for speech by more than 30 percent compared to previous methods.
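For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn a system's transcript into the correct one, divided by the number of words in the correct transcript. The short Python sketch below illustrates that standard definition only; it is not Microsoft's evaluation code. The function names and sample sentences are hypothetical, and the example rates are made-up numbers chosen to show what a 30 percent drop would look like if the figure is read as a relative reduction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction by which new_rate improves on old_rate."""
    return (old_rate - new_rate) / old_rate


# Hypothetical transcript: one dropped word ("to") and one wrong word ("a").
correct = "the officer needs to speak with the civilian"
heard = "the officer needs speak with a civilian"
print(word_error_rate(correct, heard))  # 0.25

# Illustrative rates only: a drop from 20% to 14% is a 30% relative reduction.
print(f"{relative_reduction(0.20, 0.14):.0%}")  # 30%
```

Under this reading, a 30 percent reduction does not mean 30 percent of words are now correct that were wrong before in absolute terms; it means the share of misrecognized words shrinks by about a third.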
“For the last 60 years, computer scientists have been working to build systems that can understand what a person says when they talk,” Rashid said. “While there’s still work needed to perfect such a system, we’re very excited about the possibilities of this achievement in real time translation technology.”