November 30, 2000 By Peter M. Hermsen
Difficulty of Dictation
Dictation is perhaps the most difficult task for speech recognition systems to perform. Free-form speech is usually understandable to human beings. However, the very nature of the way we communicate with one another, using accent, inflection and emotion, makes it more difficult for a computer to discriminate the words being spoken. Nonetheless, numerous products have appeared on the market, indicating that while the technology may not be perfect, with some training on the part of both the user and the computer, these systems can be highly effective and useful in a controlled environment.
Recognition accuracy with such products improves substantially as the system is trained to understand an individual speaker. Also, during the dictation process, if the system does not understand a word or utterance, the user may be prompted to type the misunderstood word or spell it verbally.
Microphone quality also plays a substantial role in defining the overall quality of the user experience. Poor-quality microphones, including many that are built into monitors and laptop computers, yield less-than-desirable results in dictation systems. The preferred type of microphone for this environment is a headset microphone, which can be placed in a fixed position in front of the user's mouth to provide consistent audio quality. A microphone with some form of noise cancellation is also preferred. Noise cancellation is the ability of a microphone to ignore unwanted noises. This type of microphone is typically sensitive in only one direction, and sounds reaching it from any direction other than the speaker's are largely ignored or canceled.
But even the best microphones cannot compensate for too much background noise. For a dictation system to perform acceptably, both ambient noise and spurious noise must be kept to a minimum. If the noise level increases beyond the system's threshold, dictation accuracy diminishes rapidly.
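The idea of a noise threshold can be illustrated with a minimal sketch. It assumes audio samples normalized to the range -1.0 to 1.0 and measures the ambient level as an RMS value in decibels relative to full scale. The function names and the -40 dB threshold are hypothetical, chosen only for illustration; real dictation products do not necessarily work this way or publish such a figure.

```python
import math

# Hypothetical threshold for illustration only; not taken from any product.
NOISE_THRESHOLD_DBFS = -40.0


def rms_dbfs(samples):
    """Return the RMS level of normalized samples in dB relative to full scale."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        # Digital silence has no finite level in decibels.
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def room_is_quiet_enough(ambient_samples, threshold_dbfs=NOISE_THRESHOLD_DBFS):
    """Return True if the ambient level is at or below the assumed threshold."""
    return rms_dbfs(ambient_samples) <= threshold_dbfs
```

A dictation front end could record a second or so of room noise before the user starts speaking and call `room_is_quiet_enough` on it to decide whether to warn about the environment.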
As both physical and data security become greater concerns, new methods of uniquely identifying people are emerging. These technologies rely largely upon the physical differences that make us individuals. Because every human being has a unique voice, voice can be used as a form of biometric user verification to physically secure an area, limit access to personnel files or verify a claimant's identity over the telephone for unemployment insurance processing.
This type of system works by enrolling each new user. The enrollment process consists of directing the user to repeat a series of numeric or verbal prompts. Once this is complete, the system generates a model of the user's vocal patterns. This model is unique to that individual. When used in conjunction with other forms of identification, such as username and password, a physical key or combination, voice biometrics provides a very high degree of confidence in verifying a user's identity.
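The enroll-then-verify flow described above can be sketched in a few lines. This is a simplified illustration, not how any vendor's product works: it assumes each utterance has already been reduced to a fixed-length list of numeric features, builds the "model" by averaging the enrollment utterances, and compares a new utterance to the model with cosine similarity. The function names and the 0.9 threshold are hypothetical, and production systems use considerably richer statistical models.

```python
import math


def enroll(utterance_features):
    """Build a voice model by averaging feature vectors from enrollment prompts."""
    if not utterance_features:
        raise ValueError("at least one enrollment utterance is required")
    dim = len(utterance_features[0])
    if any(len(v) != dim for v in utterance_features):
        raise ValueError("all feature vectors must have the same length")
    count = len(utterance_features)
    return [sum(v[i] for v in utterance_features) / count for i in range(dim)]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(model, features, password_ok, threshold=0.9):
    """Accept only if the second factor passed AND the voice matches the model."""
    return password_ok and cosine_similarity(model, features) >= threshold
```

Requiring `password_ok` alongside the voice match mirrors the point made above: the voice check adds confidence to another form of identification rather than replacing it.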
Companies such as VeriVoice, T-Netix, Keyware and others offer products that perform the task of speaker recognition. These companies offer applications that handle:
* physical access;
* time and attendance;
* network and data security;
* securing Web-based applications and data; and
* custom application development.
As such, speaker recognition technology can let you into the building, clock you in at the start of the workday, give you