Editor's Note: The following is a guest article from Werner Goertz, a research director in Gartner's personal technologies research team, where he covers personal devices and IoT.
Consumption of audio content, management of connected home services and shopping bots have spawned a multibillion-dollar market for virtual personal assistant (VPA) speakers.
Today, the market for VPA-enabled wireless speakers such as Amazon Echo and Google Home is expanding with more vendors, device types and use cases. The addition of Apple, Samsung, Alibaba and others to the vendor portfolio — plus a new generation of video-enabled products — should further entice buyers.
While VPA devices are thought of primarily as consumer products, that is changing as enterprises adopt the technology through workplace applications and professional user experiences.
Enterprise VPA adoption will accelerate in 2019, driven by IT architecture integration, biometric authentication and privacy and security features. AI-enabled VPA speakers with on-device machine learning can reduce response latency and address privacy and security concerns, further accelerating adoption.
Speakers built for business
VPA speakers with custom-made hardware and software configurations will roll out in retail applications in 2020, enabling new brand experiences. Healthcare and elder-care applications will be facilitated by VPA speakers, and the cost of hardware and services will be at least partly subsidized by healthcare ecosystem partners that are likely to gain efficiencies and encourage patient adoption.
By 2019, virtually all required languages will be supported. Gartner expects coverage of the languages spoken in the major markets to improve in the near term, and any gaps in language support that remain in 2019 will not measurably impact growth.
Note, however, that the term "language support" implies more than just speech-to-text and text-to-speech capabilities: The ability to place requests in their cultural and historical context is a prerequisite for language and regional support.
Today's VPA interactions are still practically limited to single-command and fact-based interactions. The ability to link together multiple statements into a single command and the ability to entertain threaded conversations are improving daily but are still lacking in quality.
Gartner believes that, by 2020, systems will have improved — and more than half of all interactions will indeed be conversational, as opposed to single-command.
Artificial intelligence at the edge
Today's VPA speaker architecture processes only trigger-word recognition locally. All other features are executed in the cloud and therefore require an active broadband internet connection.
Natural language processing is subject to latency, and the user experience suffers if response times of 400 to 600 milliseconds are not met. Processing certain AI functions at the edge or in the device can alleviate concerns around latency, network availability and privacy, and the improved user experience will drive sales growth.
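The trade-off between device and cloud processing can be made concrete with a minimal sketch. It is illustrative only: the `Route` type, the `pick_route` function and the latency estimates are hypothetical and do not describe any vendor's implementation. The 600-millisecond budget is simply the upper end of the response window cited above.

```python
from dataclasses import dataclass

# Upper end of the 400-600 ms response window cited in the article.
LATENCY_BUDGET_MS = 600


@dataclass
class Route:
    name: str               # "edge" or "cloud" (hypothetical labels)
    est_latency_ms: float   # assumed end-to-end latency estimate
    available: bool = True  # e.g. False for cloud when broadband is down


def pick_route(edge: Route, cloud: Route, sensitive: bool = False) -> Route:
    """Choose where to process one voice request."""
    # Privacy-sensitive requests stay on the device whenever it is available.
    if sensitive and edge.available:
        return edge
    candidates = [r for r in (edge, cloud) if r.available]
    if not candidates:
        raise RuntimeError("no processing route available")
    within_budget = [r for r in candidates
                     if r.est_latency_ms <= LATENCY_BUDGET_MS]
    # Prefer the fastest route inside the budget; otherwise the fastest overall.
    pool = within_budget or candidates
    return min(pool, key=lambda r: r.est_latency_ms)


if __name__ == "__main__":
    choice = pick_route(Route("edge", 250.0), Route("cloud", 480.0))
    print(choice.name)
```

In this sketch a request falls back to the device when the network is unavailable, which mirrors the network-availability argument for edge processing.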
The other primary driver for AI processing at the edge comes from enterprises that, for regulatory, confidentiality and legal reasons, still insist on on-premises AI training and modeling and will not accept cloud-based AI.
To address these inhibitors, VPA speaker architectures will progressively add local processing and storage capabilities. By 2021, Gartner expects more than one-third of new shipments to offer some form of local AI support.
By 2019, more than half of the enterprise VPA speakers shipped will be enabled for biometric authentication. Advances in biometric authentication technologies, namely voice authentication and facial recognition, will allow multifactor authentication methods to achieve acceptably low false acceptance rates and false rejection rates.
Finally, ensuring that cloud-based processing of voice commands does not violate regulatory requirements or shareholder interests is key for any enterprise.
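For readers unfamiliar with the two metrics, the sketch below shows how false acceptance rate (FAR) and false rejection rate (FRR) combine when two biometric factors are fused. It assumes the factors err independently, which real systems may not satisfy, and the function names and sample rates are hypothetical, chosen only for illustration.

```python
def fuse_and(far_a: float, frr_a: float, far_b: float, frr_b: float):
    """Both factors must accept (AND rule), assuming independent errors."""
    far = far_a * far_b                  # an impostor must fool both factors
    frr = 1 - (1 - frr_a) * (1 - frr_b)  # a genuine user must pass both
    return far, frr


def fuse_or(far_a: float, frr_a: float, far_b: float, frr_b: float):
    """Either factor may accept (OR rule), assuming independent errors."""
    far = 1 - (1 - far_a) * (1 - far_b)  # an impostor needs to fool only one
    frr = frr_a * frr_b                  # a genuine user must fail both
    return far, frr


if __name__ == "__main__":
    # Hypothetical rates: voice FAR 1%, FRR 5%; face FAR 2%, FRR 3%.
    print("AND:", fuse_and(0.01, 0.05, 0.02, 0.03))
    print("OR: ", fuse_or(0.01, 0.05, 0.02, 0.03))
```

Under these assumptions, the AND rule cuts false acceptances sharply but raises false rejections, while the OR rule does the opposite. Which balance counts as acceptable depends on the enterprise use case.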
While concerns over privacy have little technical merit (the device has no processing capability unless the trigger word is recognized), market psychology is a current obstacle. Gartner believes that, by 2020, such concerns will have largely been mitigated through educational efforts, adoption by peers and regulatory approvals of the device category.
VPAs in the enterprise
Hospitality, higher education and financial technologies are the sectors in which early external, customer-facing VPA speaker adoption is becoming apparent.
To improve customer experience, retention and efficiency, hotel brands such as Marriott are deploying in-room VPA speakers to give guests voice-driven access to amenities, services and account management.
Colleges such as Saint Louis University and Arizona State University are expanding their service offerings to students, parents and faculty through voice-enabled, interactive applications. Many financial services companies are now extending traditional interactive voice response (IVR) systems with natural language interfaces and multimodal UIs that incorporate screens in addition to voice.
Internal enterprise use cases for VPA speakers come in the form of extending companies' Unified Communications (UC) strategies. Integration with enterprise assets such as email, ERM and Active Directory Services will allow use cases such as "dial me into my 10 am meeting" or "pull up the Southwestern sales numbers for Q3."