Speech recognition has made significant progress over the past decade, to the point that on some benchmarks its transcription accuracy now matches or surpasses that of human transcribers. This advancement is a key component of the next generation of mobile interactions, where your voice will be the interface to your device. Although speech recognition is cool, it is only one of three critical components that will create the next-generation customer experience.
Lost in the current wave of speech recognition noise and marketing are the two other elements necessary to deliver a richer customer experience: natural language understanding and context. We can think of the three components as working together the way they do in a human. We have ears and a mouth for communicating, but without the brain’s understanding, our interaction is meaningless. Likewise, speech recognition acts as a device’s ears and mouth, but without natural language understanding (the brain), the application can’t do anything with the transcribed text.
Natural language technology can make unstructured data, such as a conversation or input from a user, understandable to computers and other smart devices, allowing them to “know” the user’s intent and interact appropriately. While technically impressive, combining speech recognition with natural language processing is still not enough to build a smart, next-generation solution. To move the needle on the user experience and really emulate human conversation, the system must be able to engage in mixed-initiative dialogue: prompting the user to clarify their intent, or mining important pieces of data from their input to complete an action on the user’s behalf. This ability allows the system to move beyond just processing text, achieving a new paradigm where conversation becomes the centerpiece.
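To make the idea of mixed-initiative dialogue concrete, here is a minimal Python sketch of one common way to approach it, often called slot filling. Everything in it is an assumption made for illustration: the money-transfer scenario, the slot names, the `extract_slots` and `next_turn` helpers, and the toy regular expressions that stand in for a real natural language understanding component.

```python
import re
from dataclasses import dataclass, field

# Hypothetical slots for an illustrative "send money" request.
PROMPTS = {
    "amount": "How much would you like to send?",
    "recipient": "Who should receive the money?",
}


@dataclass
class DialogueState:
    """Holds the pieces of data gathered so far in the conversation."""
    slots: dict = field(default_factory=dict)


def extract_slots(utterance):
    """Toy stand-in for NLU: mine an amount and a recipient from the text."""
    found = {}
    amount = re.search(r"\$(\d+(?:\.\d{2})?)", utterance)
    if amount:
        found["amount"] = amount.group(1)
    # Naive pattern: takes the word after "to" as the recipient.
    recipient = re.search(r"\bto (\w+)", utterance, re.IGNORECASE)
    if recipient:
        found["recipient"] = recipient.group(1)
    return found


def next_turn(state, utterance):
    """Merge what the user volunteered, then ask for whatever is still missing."""
    state.slots.update(extract_slots(utterance))
    for slot, prompt in PROMPTS.items():
        if slot not in state.slots:
            return prompt  # the system takes the initiative to clarify
    return f"Sending ${state.slots['amount']} to {state.slots['recipient']}."


state = DialogueState()
first_reply = next_turn(state, "Send $20")
second_reply = next_turn(state, "to Alice")
```

In this sketch the user can supply the details in any order or all at once, and the system only prompts for what it could not mine from the input, which is the behavior the paragraph above describes.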
The final leg of the stool is context. This invisible element of human conversation lets us infer aspects of the conversation that are left unspoken. It must be present in the next-generation experience simply because we, as humans, expect it in our daily conversations. Contextual elements in a smart interaction can be as simple as what was last said in the conversation or as complex as an array of information from an enterprise CRM system.
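As a rough illustration of how context might fill in what the user leaves unsaid, the Python sketch below resolves missing details first from earlier turns in the conversation and then from a customer record. The function name, the field names, and the shape of the CRM record are all hypothetical; a real system would define its own.

```python
def resolve_with_context(parsed, history, crm_record):
    """Fill in unspoken details from context.

    Priority: what the user just said, then the most recent earlier turns,
    then the (hypothetical) CRM profile.
    """
    resolved = dict(parsed)
    # Most recent turn first, CRM record last.
    sources = list(reversed(history)) + [crm_record]
    for source in sources:
        for key, value in source.items():
            # setdefault keeps any value already supplied by a higher-priority source.
            resolved.setdefault(key, value)
    return resolved


# The user asks "What's my balance?" without naming an account.
parsed = {"intent": "check_balance"}
history = [{"account": "savings"}, {"account": "checking"}]  # oldest to newest
crm_record = {"account": "joint", "customer_id": "C-1"}

resolved = resolve_with_context(parsed, history, crm_record)
```

Here the account last mentioned in the conversation wins over the default on file, while details the conversation never touched, such as the customer identifier, come from the CRM record.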
By looking beyond the surface and making speech recognition smart, businesses have an opportunity to deliver an interface that is incredibly natural and more effective than traditional speech-recognition offerings for customer experience.