Automated Speech Recognition (ASR) Overview

Automated Speech Recognition (ASR) allows contacts to respond to IVR prompts by speaking, either instead of or in addition to pressing keys on their telephone. An IVR (Interactive Voice Response) is an automated phone menu that allows callers to interact through voice commands, key inputs, or both, to obtain information, route an inbound voice call, or both. NICE inContact offers ASR as an optional feature using the industry-leading Nuance ASR engine.

As of the Fall 18 release, the Nuance ASR engine has been upgraded. Existing ASR users should review the release information for details on what this upgrade means for them.

ASR is meant to simplify and speed up your callers' experience with your IVR. An ASR-enabled IVR should recognize not only words but also phrases, match them with values you have pre-defined, and route or answer calls accordingly.

Terminology

You should be familiar with ASR-specific usage of the following terms:

  • Utterance — Words or phrases spoken by a caller in response to IVR prompts.
  • Grammar file — Provides rules for the ASR engine that cover the words or phrases callers can be expected to say in response to a prompt and that assign content to variables based on those responses. Limiting recognition to these expected responses makes the process more efficient and more accurate. Many ASR Studio actions have built-in grammar files. You can also use custom grammar files, or grammars, for some actions. These are typically written in XML and saved as .grxml files. They should be compiled prior to use in your NICE inContact system.
  • Phrase list — Provides a simple list of phrases that callers can be expected to say in response to a prompt, one per line. Phrase lists are typically entered using the PhraseList property of a Studio action.
  • Confidence percentage — Also referred to as recognition percentage. When the ASR engine recognizes a phrase spoken by a caller, it also returns a percentage that indicates how confident it is in its interpretation, that is, in matching the utterance to the phrase list or grammar file. The confidence percentage can be used to route calls to different branches in your ASR-enabled IVR script. Confidence levels used in NICE inContact are:
    • High — Confidence percentage is high; typically, 75% or greater. The contact can be routed through the OnHighConfidence branch without any further confirmation of the utterance.
    • Medium — Confidence percentage is mid-range; that is, somewhere between high and minimum. The contact can be routed through the OnMedConfidence branch and asked to confirm the utterance.
    • Minimum — Confidence percentage is at the minimum acceptable level. This value is typically used to set a bottom number for the OnMedConfidence branch.
    • No Confidence — The utterance was unrecognizable and the ASR engine cannot interpret it. The contact can be routed through the OnNoConfidence branch and asked to repeat the utterance.
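The way confidence levels map to branches can be sketched outside Studio as a simple threshold check. The following Python snippet is only an illustration, not Studio code: the function and class names are hypothetical, the minimum value of 50 is an assumed placeholder (the text above gives a typical value only for High), and routing anything below the minimum to OnNoConfidence is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceThresholds:
    """Hypothetical container for the two cut-off values."""
    high: int = 75     # "typically, 75% or greater" per the definitions above
    minimum: int = 50  # assumed placeholder; set to suit your own script


def choose_branch(confidence: int,
                  thresholds: ConfidenceThresholds = ConfidenceThresholds()) -> str:
    """Return the name of the branch a contact would follow."""
    if confidence >= thresholds.high:
        # No further confirmation of the utterance is needed.
        return "OnHighConfidence"
    if confidence >= thresholds.minimum:
        # Mid-range: ask the caller to confirm the utterance.
        return "OnMedConfidence"
    # Assumed: anything under the minimum is treated as unrecognized.
    return "OnNoConfidence"


if __name__ == "__main__":
    for pct in (92, 60, 20):
        print(pct, choose_branch(pct))
```

Raising the minimum narrows the range in which callers are asked to confirm, at the cost of asking more callers to repeat themselves.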

ASR Actions

For production IVR scripts, Studio offers seven ASR actions designed for specific types of prompts, as well as two more general actions. All of these actions allow you to capture and interpret an utterance, populate a variable based on the utterance, and route the contact based on the variable value, the confidence percentage, or both. Choosing the best action for each prompt will help your scripts process speech effectively. Here's a brief description of each action:

  • ASR — Accepts any type of utterance and interprets it based on a custom phrase list or grammar file you provide. This action offers a great deal of flexibility but is also more complicated to set up.
  • ASRALPHANUM — Accepts an utterance of a combination of letters, numbers, or both (for example, a password or email address). This action comes with a built-in grammar file.
  • ASRCURRENCY — Accepts an utterance of a monetary value (for example, a payment amount). This action comes with a built-in grammar file for one or more currencies, based on the language pack for your business unit.
  • ASRDATE — Accepts a variety of utterances related to dates, based on its built-in grammar file. This includes full dates, days of the week, relative date references (such as yesterday), and more.
  • ASRDIGITS — Accepts an utterance of a string of digits (for example, a phone number or social security number). This action comes with a built-in grammar file.
  • ASRMENU — Accepts utterances that you define to create a speech-enabled menu. This action can use a custom phrase list or grammar file, or you can use the branch variables you create for the menu itself as a basis for interpreting the caller's utterance.
  • ASRNUMBER — Accepts an utterance of a numeric value. For example, this action would interpret an utterance of "five six" as the number "fifty-six", whereas ASRDIGITS would interpret the same utterance as two separate digits, "five" and "six". This action comes with a built-in grammar file.
  • ASRTIME — Accepts a variety of utterances related to time, based on its built-in grammar file. This includes durations (such as "twelve hours") in addition to specific times (such as "three p m").
  • ASRYESNO — Accepts positive or negative utterances based on its built-in grammar file. For example, there are multiple variations on how a caller might say "yes" (yes, yeah, yep, yup, okay, and so on). This action recognizes such variations.
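The difference between the ASRDIGITS and ASRNUMBER readings of the same utterance can be illustrated with a toy Python sketch. This is not how the built-in grammar files work internally; the function names are hypothetical and only single-digit words are handled. It shows why a digit string (where leading zeros and length matter) and a numeric value are different results.

```python
# Toy vocabulary: single spoken digits only (an assumption for this sketch).
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def as_digit_string(utterance: str) -> str:
    """ASRDIGITS-style reading: keep each spoken digit as a separate character."""
    return "".join(DIGIT_WORDS[word] for word in utterance.lower().split())


def as_number(utterance: str) -> int:
    """ASRNUMBER-style reading: treat the spoken digits as one numeric value."""
    return int(as_digit_string(utterance))


if __name__ == "__main__":
    print(as_digit_string("zero five six"))  # digit string keeps the leading zero
    print(as_number("zero five six"))        # numeric value drops it
```

A digit string suits identifiers such as phone numbers, while a numeric value suits quantities.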

Studio also offers two actions that can be used to build a custom grammar file from an existing database. For example, your IVR might ask a caller for a part number. Or you might want to let the caller select an extension by giving an employee's name. In either case, you likely already have a database that contains the possible values a caller might utter, and it makes sense to build your grammar file from the data you already have. The two actions used for this purpose are:

  • ASRCOMPILE — Used to compile custom grammar files into the .gram format used by the Nuance ASR engine. This action is used in scripts that are run once, or at most, on an occasional basis. The script can be used to process existing .grxml files or in combination with ASRSQL to create a new custom grammar file.
  • ASRSQL — Works with the DB Connector feature to pull a file of values from an existing database. This file can then be formatted and compiled into a grammar file for your ASR-enabled IVR.
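To make the "format" step more concrete, the following Python sketch turns a list of values (such as employee names pulled from a database) into a minimal grammar in the W3C SRGS XML form that .grxml files typically use. It is an assumption-laden illustration that runs outside Studio: the function name, rule id, and language code are hypothetical, and the exact file layout that ASRSQL produces or that ASRCOMPILE expects is not described here.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"   # W3C SRGS namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang


def build_grxml(phrases, rule_id="values", lang="en-US"):
    """Return SRGS XML text with one <item> per distinct, non-empty phrase."""
    ET.register_namespace("", SRGS_NS)  # serialize SRGS as the default namespace
    grammar = ET.Element(f"{{{SRGS_NS}}}grammar", {
        "version": "1.0",
        "mode": "voice",
        "root": rule_id,
        f"{{{XML_NS}}}lang": lang,
    })
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule",
                         {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")

    seen = set()
    for phrase in phrases:
        text = " ".join(phrase.split())  # collapse stray whitespace
        key = text.lower()
        if not text or key in seen:      # skip blanks and duplicates
            continue
        seen.add(key)
        ET.SubElement(one_of, f"{{{SRGS_NS}}}item").text = text

    if not seen:
        raise ValueError("at least one non-empty phrase is required")
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_grxml(["Alice Smith", "  bob   jones ", "alice smith"]))
```

The resulting text would still need to be saved as a .grxml file and compiled before the ASR engine could use it.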

Best Practices

As you develop ASR-enabled IVR scripts, keep the following in mind:

  • Familiarize yourself with the ASR actions so you can choose the right action for each prompt.
  • Several actions offer a choice between spoken and DTMF input. (DTMF, or Dual-Tone Multi-Frequency, signaling tones are generated when a user presses or taps a key on their telephone keypad.) In some cases, DTMF might actually provide a better caller experience. For example, keying a social security number is just as easy as speaking it, and may be easier for the system to interpret.
  • Languages available for speech recognition vary depending on where your business unit is housed. Ask your account manager for more information.
  • You can use phonetic spellings in your phrase lists or grammar files to increase accuracy. This can be especially helpful if the prompt may elicit responses that are often mispronounced (for example, some city or county names).
  • Scripts should include routing in case there is a failure in the ASR functionality, such as reverting to DTMF-only mode or playing a failure message before terminating the interaction.
  • You can engage Professional Services to assist you in developing ASR-enabled IVR scripts and their components, such as custom grammar files built from your existing database. Contact your account manager to learn more.