What the Voice-Recognition Industry Needs Most

I'm a big believer that voice-recognition technology will play an increasingly prominent role in how we interact with technology -- so much so that I've made a bet on Nuance Communications (Nasdaq: NUAN), which I see as the clear technological leader in the field.

So when the CEO of a small voice-recognition software company, Datria, reached out to me for a conversation, I jumped at the chance. Datria is a small private player with about 50 employees, and it resells Nuance's speech engine while also counting software giant SAP (NYSE: SAP) as an investor.

Jim Greenwell has been CEO for 12 years, and he provided valuable insight into the industry at large, as well as what trends users and investors should be on the lookout for.

Who will rally the troops?
If there's one primary takeaway from our conversation, it's that the voice-recognition industry needs leadership more than anything. I did just call Nuance the clear leader, but let me offer some additional context.

There are two primary layers to serving up voice recognition: the speech engine and the application software that taps into it. And as Greenwell says, "Nuance clearly has the best [speech] engine out there." He notes that it works in more than 60 languages and has underlying linguistic algorithms to recognize small "phonetic sound bites," which vary greatly between languages.

Datria is one of many voice application software providers that tap into the engine and put it to use. Datria uses a plug-and-play model, so it could theoretically swap out the engine if needed, but it has been reselling Nuance's engine for 14 years.
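For readers who think in code, here is a minimal sketch of what a plug-and-play split between the two layers might look like. It is purely illustrative: the names `SpeechEngine`, `StubEngineA`, `StubEngineB`, and `VoiceApp` are hypothetical, the stub engines just decode text bytes instead of processing real audio, and none of this reflects Datria's or Nuance's actual software.

```python
from typing import Protocol


class SpeechEngine(Protocol):
    """Hypothetical engine-layer contract: audio in, transcript out."""

    def transcribe(self, audio: bytes) -> str: ...


class StubEngineA:
    """Stand-in engine; pretends the 'audio' is already UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8").lower()


class StubEngineB:
    """A second stand-in engine that also trims surrounding whitespace."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8").strip().lower()


class VoiceApp:
    """Application layer: depends only on the SpeechEngine contract."""

    def __init__(self, engine: SpeechEngine) -> None:
        self.engine = engine

    def handle(self, audio: bytes) -> str:
        text = self.engine.transcribe(audio)
        return f"heard: {text}"
```

Because `VoiceApp` only relies on the `transcribe` method, swapping `StubEngineA` for `StubEngineB` requires no change to the application code, which is the property a plug-and-play model is after.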

Please don't make me call the cable company
It's this application software layer that can cause a negative perception of voice technology in general. Let's say you call up one of your various service providers, be it your cable provider, discount broker, or favorite airline. Many companies use automated voice now, but sometimes those systems come up short on actually interpreting what you mean.

Many of those companies license Nuance's engine, and the recognition side works great, but if the application (which is frequently built in-house) isn't programmed to interpret all the ways you can say "yes," then it might not realize that "absolutely" means the same thing. This leaves many users with an impression that voice-recognition technology is broken, when it's really the application software layer that needs improvement.
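To make the "yes" versus "absolutely" problem concrete, here is a small hedged sketch of the kind of interpretation step an application layer might run on the text the engine hands back. The word lists and the function name `interpret_yes_no` are assumptions for illustration only; real call-center applications use far richer grammars.

```python
import re

# Hypothetical vocabularies; a production system would need many more entries.
AFFIRMATIVE = {"yes", "yeah", "yep", "sure", "absolutely", "correct", "right"}
NEGATIVE = {"no", "nope", "nah", "never", "incorrect", "wrong"}


def interpret_yes_no(transcript: str) -> str:
    """Map an already-recognized transcript to 'yes', 'no', or 'unclear'."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    has_yes = bool(words & AFFIRMATIVE)
    has_no = bool(words & NEGATIVE)
    if has_yes and not has_no:
        return "yes"
    if has_no and not has_yes:
        return "no"
    # Neither list matched, or the caller said both: ask again.
    return "unclear"
```

If "absolutely" were missing from `AFFIRMATIVE`, the engine could transcribe the caller perfectly and the application would still answer "unclear," which is exactly the failure that gets blamed on voice recognition as a whole.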

That disjointed experience is what the industry needs to address.

Maybe the big boys can help
Enter Apple (Nasdaq: AAPL). The Mac maker is doing wonders in terms of consumer awareness with Siri, which serves as the app software layer while Nuance's engine runs on the back end. Siri used Vlingo's engine in the beginning but eventually switched to Nuance, while Nuance has since acquired Vlingo.

Google (Nasdaq: GOOG), on the other hand, has Voice Actions built into Android and is likely to be working on a Google Assistant to boot. Unlike Apple, Big G builds its own speech engine in-house, with Nuance co-founder Mike Cohen leading its speech technologies group.

Microsoft (Nasdaq: MSFT) also has a back-end speech engine that competes with Nuance's, which it built after purchasing Tellme five years ago, but this isn't an area where Mr. Softy has dedicated a lot of attention.

Nuance takes different strategic approaches to its different markets. In health care and medical transcription, it builds the app software that runs on its engine, which matters because this is its largest segment (38% of revenue last quarter). On the mobile and consumer side, Nuance also has its popular Dragon suite of apps.

On the enterprise side, by contrast, which barely put up any growth last quarter, Nuance sells only its engine and leaves the app software layer to third parties such as Datria.

Keyboards and mice are so last century
Greenwell spends a lot of time evangelizing for the benefits of voice recognition and debating with naysayers, whose perceptions have been skewed by limited exposure to quality applications. This is where the industry needs leadership: driving awareness to uncover more useful applications. Greenwell is mostly indifferent as to where this leadership could come from -- be it Apple, Google, Microsoft, or Nuance.

He even envisions a corporate virtual assistant that could do wonders for the enterprise. While Siri helps you manage your personal affairs and taps into public content sources such as Yelp or Wolfram Alpha, imagine an enterprise doppelganger that can access HR files and proprietary corporate databases through voice interaction, complete with advanced biometric security (which Nuance already offers) to prevent unauthorized access.

One of the biggest challenges is uncovering all the niche uses of voice technology and subsequently monetizing them, which just makes leadership in awareness that much more crucial. There are plenty of ways to boost productivity for increasingly mobile workforces with voice, and Greenwell says it's "impractical to perpetuate the legacy UI of keyboards and mice" for mobile workers.

A twofer
Enterprise voice applications are a huge market that Datria is tapping into with help from Nuance. It's further evidence of the consumerization of IT and the mobile revolution, which are two of the largest technological shifts in generations -- two trends that investors won't want to miss.


Fool contributor Evan Niu has a synthetic long options position in Nuance Communications and owns shares of Apple and Nuance Communications, but he holds no other position in any company mentioned. The Motley Fool owns shares of Microsoft, Apple, and Google. Motley Fool newsletter services have recommended buying shares of Apple, Google, Microsoft, and Nuance Communications and creating bull call spread positions in Apple and Microsoft. We Fools don't all hold the same opinions, but we all believe that considering a diverse range of insights makes us better investors. The Motley Fool has a disclosure policy.



Comments from our Foolish Readers


  • On March 20, 2012, at 8:20 PM, tikwart wrote:

    Nice article...

    check out Verbble, the first speech recognition app for the enterprise, available on all major platforms, including iOS, Android, Blackberry, and Windows.

    Here is a great demo w/ Salesforce - http://www.verbble.com/tour/#ipad

    Verbble can work with any form, webpage, PDF or anything that has a data input and optimize it so that you can fill it out with Verbble’s Talk, Type, or Click capabilities (preserving all of the native inputs) on any mobile device or your desktop, providing up to a 5x increase in data input on mobile devices. Verbble is able to send that data to virtually any endpoint (CRM, CMS, website, e-mail, etc.). This type of functionality adds a whole new dimension to mobile productivity. The Company is currently in trials with a number of SMBs as well as Fortune 500 companies, which are looking for ways to increase productivity with their current mobile devices, including iPads and Android devices.

  • On March 20, 2012, at 10:57 PM, Flaksman wrote:

    You guys should do your homework. Microsoft has had a speech recognition engine since the early 1990s. TellMe in fact relied on Nuance until it was purchased by Microsoft, upon which it switched to the Microsoft Speech Engine. TellMe provided an application layer on top of the speech recognition engine. And speech has been a major component of Microsoft's strategy for some time. This includes Ford Sync, Kinect, and Speech in Windows. Please get your facts straight ... this is just irresponsible reporting.

  • On March 21, 2012, at 12:18 AM, wmeisel wrote:

    Datria adds particular value, making Nuance technology work in a warehouse environment, with noise and some workers whose native language isn't English. It turns out that stock picking applications using voice to allow hands-free operation and realtime reporting of stock outages, etc., has made this a quiet killer application.

    The call center applications everyone hates are not a limitation of the technology, but a limitation of what companies are willing to spend to make it better. This has historical roots in a blindness to customer service being part of marketing and brand building. Instead of treating it as such and being willing to spend part of the ad/marketing budget to make it like Siri (which the technology allows), companies justify it on the basis of cost avoidance: how much agent time the automation saves and how short they can make the call. How many companies would try to drive people from their web site as quickly as possible? Until this organizational blind spot is eliminated, it will continue to make the technology look bad. With Siri as an example of what it can be, the companies won't be able to hide behind call center budgets for long.

    - Bill Meisel

    Speech Strategy News (www.tmaa.com)

  • On March 21, 2012, at 5:33 PM, PaulH7171 wrote:

    The big problem I have with voice recognition technology when calling the cable company, the phone company, etc., is that I often have my one-year-old or my three-year-old nearby at the time. You see, normally I don't see a need to go to a quiet part of the house simply for navigating through a menu and waiting on hold for ten minutes. I'll wait until I'm finished waiting on hold before I go somewhere quiet, and in the meantime I can help my wife with keeping an eye on the kids. But the problem is that with little kids crying or yelling in the background, the voice recognition often just doesn't pick up what I say.

  • On March 22, 2012, at 9:17 AM, rafuse wrote:

    "To learn the German language I am using SpeechTrans, and I think SpeechTrans has more advanced technology than what Microsoft is offering. SpeechTrans is the most accurate app with the most organic output voices, and it works on all versions of the iPhone, 3rd Generation iPod Touch, iPads and Android devices. The app helps me a lot while travelling because SpeechTrans supports a total of 28 languages and also adds new languages for existing users for free.

    The InterprePhone service is the latest innovation and lets the users communicate, without the need of an interpreter, via a telephone conference call.

    SpeechTrans apps can be used as your personal portable interpreter. Facebook chat service integration allows users to communicate in different languages with outstanding clarity and minimal translation processing delay. Learn more at http://speechtrans.com"
