OpenSpeech Recognizer Overview
OpenSpeech Recognizer (OSR) is a software package that provides high-performance, speaker-independent speech recognition for telephony applications. It handles extremely large vocabularies, even those of more than one million words, with outstanding accuracy and efficiency. OSR is available in all-in-one and client-server configurations that share a common API for integration flexibility. In addition, it includes several features that make it well suited to VoiceXML systems.
High-performance combination of outstanding accuracy and efficiency
Supports multiple languages
Choice of all-in-one or client-server architectures
Features to simplify VoiceXML support
Integrated grammar compiler with 2-level cache and remote fetching
Parallel, shared, and scriptable grammars
Adaptation of acoustic models, pronunciations, and grammars
OpenSpeech Recognizer is provided as an SDK for integration with a telephony platform along with tools to assist in the development and maintenance of speech applications. The platform provides OSR with a digitized audio stream, directs the recognition actions to be performed, and acts on the results returned.
OSR provides large-vocabulary, speaker-independent, continuous speech recognition for telephone-quality signals. It is not designed for desktop or dictation applications. OSR has been deployed in a wide range of applications, including call center screening, automated attendants, name and address capture, travel reservations, information retrieval, voice portals, and customer self-service.
OSR includes many parameters for fine-tuning its behavior, as well as several replaceable modules for added integration flexibility. The SpeechWorks Solutions group can provide additional adjustments and customization to support unusual applications.
OSR is written in C and is multi-threaded.
OpenSpeech Recognizer is available in two configurations: all-in-one and client-server. Both share a common API, allowing developers to switch between the two configurations without recompiling their code.
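One way a shared API can let a platform switch configurations without recompiling is to resolve the backend at run time, for example from a configuration setting. The following is a minimal sketch of that general idea only. Every name in it (`recog_backend`, `select_backend`, `local_send_audio`, `remote_send_audio`) is hypothetical and is not taken from the OSR SDK, and the two backends are stubs that simply accept the audio.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration: none of these names come from the OSR SDK. */
typedef struct {
    const char *name;
    /* Returns the number of samples accepted. */
    int (*send_audio)(const short *samples, size_t count);
} recog_backend;

/* Stub for an all-in-one backend: a real one would decode in-process. */
static int local_send_audio(const short *samples, size_t count) {
    assert(samples != NULL);
    return (int)count;
}

/* Stub for a client-server backend: a real one would forward to a server. */
static int remote_send_audio(const short *samples, size_t count) {
    assert(samples != NULL);
    return (int)count;
}

static const recog_backend BACKENDS[] = {
    { "all-in-one",    local_send_audio  },
    { "client-server", remote_send_audio },
};

/* Pick a backend by name at run time; unknown or NULL falls back to all-in-one. */
const recog_backend *select_backend(const char *mode) {
    size_t n = sizeof BACKENDS / sizeof BACKENDS[0];
    for (size_t i = 0; mode != NULL && i < n; i++) {
        if (strcmp(BACKENDS[i].name, mode) == 0) {
            return &BACKENDS[i];
        }
    }
    return &BACKENDS[0];
}
```

Because the calling code only ever sees the `recog_backend` function table, the same compiled application can be pointed at either configuration by changing the mode string, which might come from a configuration file or environment variable.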
In the all-in-one configuration, speech recognition is typically performed on the host where calls terminate. This configuration is easy to manage and resource-efficient. It readily delivers the low-latency responses favored for high-quality speech interfaces. However, the number of channels supported in this configuration is limited by the available processing power and memory. This limitation can be a concern when applications must support high traffic volumes and very large vocabularies, or when older, less capable hardware must be used.
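The capacity limit described above can be estimated with simple arithmetic: the channel count is bounded by whichever of processing power or memory runs out first. The sketch below is a back-of-the-envelope helper, not an OSR sizing tool. The function name `max_channels` and all of its inputs are hypothetical, and the per-channel figures would have to be measured on the integrator's own hardware and grammars.

```c
#include <assert.h>

/*
 * Hypothetical capacity estimate for one host.
 *   cpu_capacity       : CPU available for recognition (e.g. in cores)
 *   cpu_per_channel    : measured CPU cost of one active channel
 *   mem_mb             : memory available for recognition, in MB
 *   shared_mem_mb      : memory used once per host (e.g. loaded vocabularies)
 *   mem_per_channel_mb : measured memory cost of one active channel
 * Returns the smaller of the CPU-bound and memory-bound channel counts.
 */
long max_channels(double cpu_capacity, double cpu_per_channel,
                  double mem_mb, double shared_mem_mb,
                  double mem_per_channel_mb) {
    assert(cpu_per_channel > 0.0 && mem_per_channel_mb > 0.0);

    double free_mem = mem_mb - shared_mem_mb;
    if (cpu_capacity <= 0.0 || free_mem <= 0.0) {
        return 0; /* nothing left after fixed costs */
    }

    long by_cpu = (long)(cpu_capacity / cpu_per_channel);  /* truncates */
    long by_mem = (long)(free_mem / mem_per_channel_mb);   /* truncates */
    return by_cpu < by_mem ? by_cpu : by_mem;
}
```

With these made-up inputs, a host with 8 cores and 4096 MB, a 1024 MB shared footprint, and per-channel costs of 0.25 cores and 32 MB would be CPU-bound at 32 channels; halving the memory to 2048 MB and doubling the per-channel memory to 64 MB would make it memory-bound at 16.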
In the client-server configuration, the client typically runs on the system where calls terminate, sending speech data over a network to one or more servers where speech recognition is performed. This generally allows for much higher line termination densities, as the OSR client consumes little computation. However, it does impose a greater systems management burden and can result in longer latencies if not carefully engineered. For this reason, speech detection is often run on the client host to ensure that callers perceive the application as responsive.
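To illustrate why client-side speech detection is cheap enough to run where calls terminate, here is a minimal sketch of a generic energy-threshold detector. It is not OSR's detector: the type `speech_detector`, the functions `detector_init`, `frame_level`, and `detector_push`, and the threshold values are all assumptions chosen for illustration. It declares speech once a given number of consecutive audio frames exceed an average amplitude threshold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical illustration: a generic energy-threshold detector, not OSR's. */
typedef struct {
    long threshold;     /* mean absolute amplitude treated as "loud" */
    int  frames_needed; /* consecutive loud frames before speech is declared */
    int  loud_run;      /* current run of consecutive loud frames */
    int  in_speech;     /* 1 once speech has been detected */
} speech_detector;

void detector_init(speech_detector *d, long threshold, int frames_needed) {
    assert(d != NULL && frames_needed > 0);
    d->threshold = threshold;
    d->frames_needed = frames_needed;
    d->loud_run = 0;
    d->in_speech = 0;
}

/* Mean absolute amplitude of one frame of 16-bit samples. */
long frame_level(const short *samples, size_t count) {
    assert(samples != NULL && count > 0);
    long sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += labs((long)samples[i]);
    }
    return sum / (long)count;
}

/* Feed one frame; returns 1 once speech has been detected, else 0. */
int detector_push(speech_detector *d, const short *samples, size_t count) {
    assert(d != NULL);
    if (frame_level(samples, count) >= d->threshold) {
        d->loud_run++;
    } else {
        d->loud_run = 0; /* a quiet frame resets the run */
    }
    if (d->loud_run >= d->frames_needed) {
        d->in_speech = 1;
    }
    return d->in_speech;
}
```

A client could call `detector_push` on each incoming frame and react locally as soon as it returns 1, for example by stopping prompt playback, without waiting for a network round trip to the recognition server. Production detectors are usually more elaborate, with adaptive noise floors and end-of-speech timing.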