Comment & Analysis

Through the fog... Automated speech recognition

"Computer - take away two wrong answers..." - only much better than that

By Quocirca

Published: 21 February 2003 09:35 GMT

It seems every sci-fi film assumes we'll be talking to computers in the not too distant future. What's the reality? Quocirca analyst Dale Vile sees through the hype...

When it comes to talking to machines, most people's dialogue is restricted to a one-way flow of unprintable expletives directed at their PC, often calling into question the parenthood of a certain multi-billionaire. If they have experience of voice recognition, through trying dictation software or voice dialling on mobile phones, the chances are it has been pretty unimpressive. Which means there are many people out there who have written off automated speech recognition (ASR).

In the meantime, ASR vendors and implementers have been quietly getting on with it. Having come to terms with the fact that the market was unlikely to explode and make ASR the next big thing, they directed their efforts at specific problems. This has allowed them to deliver real business benefit, albeit in niche areas, that has in turn generated income and allowed them to continuously improve the technology. Those who have not experienced speech recognition recently might be surprised at the level of accuracy and functionality that can now be delivered.

In order to understand the ASR space, we need to bear in mind that there are some quite different categories of solution out there. At the highest level, we can consider these to be embedded command and control systems, desktop dictation systems and server-based speech applications.

The first category is concerned with embedding relatively simple ASR capability into a piece of equipment so it may be controlled in a hands-free manner by voice. The most familiar incarnation is voice dialling on mobile phones. In its current form, this entails recording voice labels for contacts, then saying a name to dial the relevant number. It sounds simple, but for most users it is a lot of hassle to set up and get working reliably, so it rarely gets used.

Companies like ART, however, have moved this kind of technology forward to deliver speaker-independent dialling. What we mean by this is that the little speech engine embedded in the phone or PDA no longer needs to be trained: it will interpret voice commands from any user straight out of the box. Furthermore, the user may issue commands using phrases like "Call Clive Longbottom on his mobile". This is known as a natural language approach, allowing the user to speak normally and have the system pick out key words to form a precise command behind the scenes.
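To make the "pick out key words" idea concrete, here is a minimal Python sketch. It is not how ART or any other vendor's engine works; it assumes the speech engine has already turned the audio into text, and the contact book, phone numbers, `ACTIONS` table and `parse_command` function are all hypothetical names invented for illustration.

```python
import re

# Hypothetical contact book; names and numbers are illustrative only.
CONTACTS = {
    "clive longbottom": {"mobile": "07700 900001", "office": "020 7946 0001"},
    "dale vile": {"mobile": "07700 900002"},
}

# Several spoken verbs map onto one precise command.
ACTIONS = {"call": "DIAL", "dial": "DIAL", "phone": "DIAL", "ring": "DIAL"}
LINE_TYPES = ("mobile", "office", "home")


def parse_command(utterance):
    """Pick out action, contact and line type from already-transcribed text."""
    # Lower-case the text and replace anything that is not a letter or space.
    text = re.sub(r"[^a-z\s]", " ", utterance.lower())
    words = text.split()
    normalised = " ".join(words)

    action = next((ACTIONS[w] for w in words if w in ACTIONS), None)
    contact = next((name for name in CONTACTS if name in normalised), None)
    if action is None or contact is None:
        return None  # not enough key words to form a command

    numbers = CONTACTS[contact]
    # Use the requested line type if the contact has it, else the first number.
    line = next((w for w in words if w in LINE_TYPES and w in numbers), None)
    if line is None:
        line = next(iter(numbers))
    return {"action": action, "contact": contact, "line": line,
            "number": numbers[line]}
```

Calling `parse_command("Call Clive Longbottom on his mobile")` yields a `DIAL` command for the mobile number, while filler words such as "on" and "his" are simply ignored. A real embedded engine would work on acoustic input with a constrained grammar rather than on clean text, so treat this only as an illustration of the key-word principle.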

This approach is likely to make voice control of communication devices actually useful. ART, along with other embedded systems vendors like IBM and Scansoft, is also working with automotive manufacturers to put similar capability into cars, where the safety and convenience of hands-free control of climate, entertainment and navigation systems is attracting significant investment.

These embedded command and control systems sound very clever but in fact the number of commands they need to recognise is relatively small, so data storage and processing power are not big issues. Small footprint speech engines are therefore likely to find their way into domestic appliances such as video recorders and central heating systems in the near future.

At the other end of the spectrum, we have desktop dictation systems such as Scansoft's NaturallySpeaking and IBM's ViaVoice. These are very clever indeed, allowing a user to dictate directly into Microsoft Word, for example. There was an attempt to distribute these into the mass market but the reality is that unless they have been trained to understand not only an individual user's voice but their writing style and vocabulary too, they do not work well enough to be useful to most people.

Today, there is a good market for such products but only in specialist areas such as the legal and medical professions where there is a culture of dictating client and patient notes. Vendors now produce specific vocabularies of legal and medical terms to speed their customers to productivity in these areas.

The most interesting ASR solutions for mainstream businesses are server-based. Imagine a larger multi-user version of the natural language embedded systems described above that runs on a serious computing platform and can be accessed by any user over a telephone line. With such a system we can implement exactly the same functionality as our phone-based voice dialling but based on the corporate phone directory rather than the contact book. This is an area in which vendors such as IBM and Telephonetics have pre-built solutions.

Taking the idea a step further, the underlying raw speech engines that drive these systems (from the likes of Nuance, Scansoft and IBM) can also be used to build fully functional customer service solutions. These may be used to offload work from a call centre in a much better way than those detested touch-tone systems that callers try to 'zero out' of at the earliest opportunity to speak to that expensive live agent. A single natural language command such as "I would like to know the balance of my savings account" can bypass quite a few prompts from traditional touch-tone menus.

Organisations like the BBC with its Freeview digital TV enquiry line have made use of ASR to avoid the massive cost of setting up or outsourcing additional call centre facilities. Other examples are the British Airways flight information system and Powergen's direct debit management. The reason such big players have taken ASR seriously is simply that a return on investment can be achieved in six months or less while simultaneously enhancing customer experience and thus the customer's perception of service quality. ASR systems are available 24x7 and callers don't have to wait in long queues for the next available agent.

From an implementation perspective, the technical aspects of ASR projects are becoming easier as components have become more capable and open. Speech solutions nowadays should snuggle quite nicely into a company's existing web/internet architecture, allowing reuse of existing application components and content. What's more critical to success is the design of the interaction between the customer and the system: the so-called 'human factors' element. This determines how many users stick with the dialogue rather than zeroing out and undermining the business benefit.

Looking ahead, there is no doubt ASR can deliver a great deal of value across the embedded, desktop and enterprise marketplaces. Some of the above players, along with specialists like Telisma, will also help telcos deliver voice-driven information services to their subscribers. In addition, we are beginning to see off-the-shelf applications emerging such as Avaya's voice access to Microsoft Exchange. Such pre-packaging is important to the development of the market.

There is unlikely to be a big explosion in activity, however; a steady proliferation is more likely. Nevertheless, there will come a day in the not too distant future when we look up and realise that it is actually quite normal to be talking to machines rather than just cursing them.

Quocirca will be producing a full report on speech recognition that can be obtained by sending an email to info@quocirca.com.

Quocirca is a leading, user-facing analyst house known for its focus on the 'big picture'. For a full summary of its activities see www.quocirca.com, or reach the company's founding directors by emailing quocirca@silicon.com.

Also in this series: Through the fog... PKI; Through the fog... Vendor-channel relationships; Through the fog... What future photo messaging?

A leading user-facing analyst house known for its focus on the 'big picture', Quocirca is made up of a team of experts in technology and its business implications, including Clive Longbottom, Bob Tarzey, Rob Bamforth, Elaine Axby, Louella Fernandes, Sharon Crawford and Dennis Szubert. Their series of columns for silicon.com seeks to demystify the latest jargon and business thinking. For a full summary of the consultancy's activities, see www.quocirca.com.
