OpenSpeech Recognizer
OpenSpeech Recognizer deployment density, the number of concurrent recognition channels a single server can support, varies significantly with CPU speed, the language packs installed, grammar size and structure, duty cycle, required response time, and the features enabled. Most server applications support at least 60 channels per server, and 90 channels per server is common.
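As a rough capacity-planning sketch, assuming the density figures above hold for your hardware, grammars, and duty cycle, the number of servers needed for a target peak load is a ceiling division. The function `servers_needed` is hypothetical and not part of OSR; it defaults to the conservative 60-channel figure, and real sizing should use densities measured on your own deployment.

```python
def servers_needed(peak_channels: int, channels_per_server: int = 60) -> int:
    """Return how many servers are needed to host peak_channels concurrent calls.

    Hypothetical helper: channels_per_server defaults to 60, the conservative
    figure quoted in the text; actual density depends on CPU speed, language
    packs, grammars, duty cycle, response time, and enabled features.
    """
    if peak_channels < 0:
        raise ValueError("peak_channels must be non-negative")
    if channels_per_server <= 0:
        raise ValueError("channels_per_server must be positive")
    # Integer ceiling division: a partially used server still counts as one.
    return (peak_channels + channels_per_server - 1) // channels_per_server


if __name__ == "__main__":
    print(servers_needed(500))      # 9 servers at 60 channels each
    print(servers_needed(500, 90))  # 6 servers at 90 channels each
```

For 500 peak channels this gives 9 servers at the conservative density and 6 at the commonly reported one, which shows how much the measured density drives hardware cost.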
OpenSpeech Recognizer (OSR) uses a two-level dictionary lookup to determine the pronunciation of words:
OSR includes a "system" dictionary that covers most words in general usage.
OSR also supports a supplemental "user" dictionary, defined by the developer or system administrator, whose entries override OSR's default pronunciations. This is especially helpful for applications with task-specific jargon and for deployments where callers have strong regional dialects.
The user dictionary can be edited in text form or, using the supplied graphical tool, by selecting phonemes from a palette and listening to the constructed pronunciations.
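The two-level lookup described above can be sketched as "user dictionary first, system dictionary second". This is a minimal illustration, assuming in-memory dictionaries keyed by lowercased word; the names `SYSTEM_DICT`, `USER_DICT`, and `lookup` are hypothetical, the ARPAbet-style phoneme strings are chosen for illustration only, and none of this reflects OSR's actual file format or phoneme set.

```python
from typing import Dict, List, Optional

# Hypothetical stand-ins for OSR's two dictionaries. Each word maps to a list
# of pronunciations written as space-separated ARPAbet-style phonemes
# (illustrative notation only, not OSR's real phoneme inventory).
SYSTEM_DICT: Dict[str, List[str]] = {
    "bernstein": ["B ER N S T AY N"],
    "stock": ["S T AA K"],
}

USER_DICT: Dict[str, List[str]] = {
    # Overrides the system entry and adds a second pronunciation.
    "bernstein": ["B ER N S T AY N", "B ER N S T IY N"],
    # Task-specific jargon missing from the system dictionary.
    "osr": ["OW EH S AA R"],
}


def lookup(word: str,
           user: Dict[str, List[str]] = USER_DICT,
           system: Dict[str, List[str]] = SYSTEM_DICT) -> Optional[List[str]]:
    """Two-level lookup: the user dictionary takes precedence, and the system
    dictionary is consulted only if the word is not found there. Returns None
    when neither dictionary contains the word; how OSR itself handles such
    words is not covered by this sketch."""
    key = word.lower()
    if key in user:
        return user[key]
    return system.get(key)
```

With these sample entries, `lookup("Bernstein")` returns both user-defined pronunciations, `lookup("stock")` falls through to the system dictionary, and an unknown word returns `None`.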
OpenSpeech Recognizer works well without modification for a broad range of applications and callers. However, the population using an application is sometimes biased toward a particular channel type (wireline or wireless), dialect (New England or Southern), or environment (quiet or noisy). The application domain itself may also favor one pronunciation over another, or one phrase over a similar-sounding one. OSR includes an administration tool called "LEARN" that analyzes caller responses and makes adjustments to maximize accuracy.
LEARN is unique in that it optimizes all three components of a speech application that determine recognition accuracy:
Acoustic models are adjusted to the acoustic environments observed and to regional variation in phoneme pronunciation. For example, the models may be tuned to work best with cellular signals in a mobile voice-dialing application.
Pronunciation models are extended to cover novel pronunciations. For example, a second pronunciation for "Bernstein" may be added in a call-routing application.
Language models are adjusted to favor frequently occurring phrases. For example, "IBM" may be preferred over "IDN" in a stock-trading application.
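LEARN's internal adaptation algorithms are not described here. As an illustration of the language-model point only, one simple way to make a grammar favor frequently occurring phrases is to turn logged counts of confirmed results into smoothed phrase weights. The function `phrase_weights` and the choice of add-k (Laplace) smoothing are assumptions for this sketch, not LEARN's documented method.

```python
from collections import Counter
from typing import Dict, Iterable


def phrase_weights(grammar_phrases: Iterable[str],
                   confirmed_results: Iterable[str],
                   k: float = 1.0) -> Dict[str, float]:
    """Estimate a prior weight for each grammar phrase from how often callers
    confirmed it. Add-k smoothing keeps phrases never seen in the logs at a
    small non-zero weight. Results outside the grammar are ignored."""
    if k <= 0:
        raise ValueError("k must be positive")
    vocab = set(grammar_phrases)
    counts = Counter(r for r in confirmed_results if r in vocab)
    total = sum(counts.values()) + k * len(vocab)
    # Weights sum to 1.0 over the grammar's phrases.
    return {p: (counts[p] + k) / total for p in vocab}
```

With eight confirmed "IBM" results and one confirmed "IDN", add-one smoothing gives "IBM" a weight of 9/11 (about 0.82) and "IDN" a weight of 2/11 (about 0.18), so the recognizer would lean toward "IBM" when the acoustics are ambiguous.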
LEARN works by analyzing data logged while the application runs. It focuses on utterances whose recognition confidence score was low enough to trigger a confirmation prompt, but where the caller's confirmation showed that the recognizer had been correct. LEARN is typically run frequently when an application is first deployed and less often, if at all, once the application is mature. LEARN can reduce error rates by up to 70%.
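The selection rule described above can be sketched as a filter over logged utterances. This assumes a hypothetical log record with a confidence score on a 0 to 1 scale and a flag recording whether the caller confirmed the result; `LoggedUtterance`, `select_training_data`, and `confirm_threshold` are illustrative names, not OSR's actual log schema or API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LoggedUtterance:
    # Hypothetical log record; OSR's real log format is not shown here.
    audio_id: str
    hypothesis: str
    confidence: float  # recognizer confidence, assumed to lie in [0, 1]
    confirmed: bool    # True if the caller said "yes" to the confirmation prompt


def select_training_data(log: Iterable[LoggedUtterance],
                         confirm_threshold: float) -> List[LoggedUtterance]:
    """Keep utterances that scored below the confirmation threshold (so the
    caller was asked to confirm) and that the caller then confirmed. These are
    correct recognitions that the models scored too low."""
    return [u for u in log
            if u.confidence < confirm_threshold and u.confirmed]
```

For example, with a threshold of 0.6, a confirmed "IBM" at confidence 0.42 is selected, while a rejected low-confidence result and a high-confidence result that never needed confirmation are both skipped.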