
Comment & Analysis

The Ovum View: Speech - IT meets telephony?

Check out the robot debt collectors...

By editorial@silicon.com

Published: 7 January 2002 14:43 GMT

Speech input and output to IT systems has passed the credibility threshold. Systems that can make sense of punters placing on-course bets prove the resilience of the technology. Graham Titterington, senior analyst at Ovum, reviews the prospects for speech interaction with IT systems...

The advantages of speech are not all obvious. Most of us are getting used to the experience of dialling into a conversation with a robot, but the converse of being called by a robot is not so common.

Hotel wake-up calls are an early example of such a service, and one that has already tarnished the image of the concept through its unreliability. However, other applications are gaining ground.

The US has a one-year lead over Europe in deploying these systems. A debt collection service has found that robots are now achieving higher rates of collection than human-based systems because most debtors are people who are embarrassed by their shortage of cash rather than people who deliberately avoid payment. These people find the lack of emotion in the robot-based dialogue conducive to working out a repayment plan.

The call centre automation market is leading the take-up of speech recognition services. The economic case for this is clear. A typical call costs between $2 and $15 to handle using a human operator (depending on the level of expertise needed by the operator), whereas automated systems can cut this to 20 cents. For example, UPS said that it had reduced its call-handling cost to just one-fourteenth of what it had been on the manual system.
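The arithmetic behind these figures can be sketched in a few lines. This is only an illustration using the per-call costs quoted above; the function name `automation_saving` is hypothetical, and costs are held in cents to keep the sums exact.

```python
def automation_saving(human_cost_cents, automated_cost_cents=20):
    """Return (saving per call in cents, cost reduction factor).

    The 20-cent default is the automated cost quoted in the article;
    the function itself is an illustrative assumption, not a vendor formula.
    """
    saving = human_cost_cents - automated_cost_cents
    factor = human_cost_cents / automated_cost_cents
    return saving, factor


# The article's range for a human-handled call: $2 to $15.
for human_cost in (200, 1500):
    saving, factor = automation_saving(human_cost)
    print(f"${human_cost / 100:.2f} call: saves ${saving / 100:.2f}, "
          f"{factor:.0f}x cheaper")
```

On the quoted figures this gives a saving of between $1.80 and $14.80 per call, a 10- to 75-fold reduction, which brackets the one-fourteenth figure reported by UPS.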

A European telco said that it saves $1m per year by cutting just one second off all of its directory assistance calls.

The company Nuance has conducted a survey comparing touchtone menu systems with speech input. It reported that average call duration fell by 50 per cent (giving economies to both the enterprise and the customer).

Usage of the systems increased by between 20 per cent and 60 per cent, giving more custom, and 80 per cent of respondents preferred the speech system. More surprisingly, when the survey was extended to compare speech-based systems with human call centre operators, 84 per cent of respondents said they preferred the automated system.

This was largely because the automated system was available 24 hours per day and because users had more confidence in the information they received. There is evidence that in repetitive tasks, speech recognition systems make fewer errors than humans do. Call centre automation also cuts queuing time.

Enterprises can extend business applications and information to their mobile workforce without incurring the expense of new client infrastructure for these users. For example, any handset can be used for limited interaction with a CRM system. This can later be integrated with other channels when the mobile employees have more powerful devices.

The fundamental levels of speech recognition can be performed on a server or embedded in a user device, such as a PDA. Clearly, the more sophisticated applications with large vocabularies will need to be server-based.

However, basic command and control can be programmed into a small device of modest power. We can expect to see embedded speech processors becoming commonplace in automobiles - many up-market manufacturers are already providing such devices.

In addition to their economy, embedded systems have some advantages over server-based systems, such as responding to, and compensating for, the noise level in the user's environment.

A distributed processing approach is valuable for mobile applications so that the embedded system can continue processing when the user passes through areas of poor radio network coverage. This is particularly relevant to the automobile market.
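The division of labour between device and server might look something like the sketch below. It is an assumption-laden illustration rather than any manufacturer's design: the command list, the `recognise` function and the `server_recogniser` callback are all hypothetical, and the utterance is taken to be already transcribed to text.

```python
# Hypothetical small vocabulary that an embedded processor could handle alone.
LOCAL_COMMANDS = {"call home", "volume up", "volume down"}


def recognise(utterance, network_available, server_recogniser):
    """Route an utterance to the embedded or the server-based recogniser.

    Basic command and control is handled on the device, so it keeps working
    in areas of poor coverage. Anything else goes to the server when the
    network is available, and is otherwise held back for later.
    """
    text = utterance.lower().strip()
    if text in LOCAL_COMMANDS:
        return ("embedded", text)
    if network_available:
        return ("server", server_recogniser(text))
    return ("deferred", text)


# Stand-in for a large-vocabulary server; a real one would do far more.
def fake_server(text):
    return f"parsed:{text}"


print(recognise("Volume up", False, fake_server))
print(recognise("find the nearest petrol station", True, fake_server))
print(recognise("find the nearest petrol station", False, fake_server))
```

The point of the design is simply that the small vocabulary never depends on the radio link, while the large vocabulary degrades gracefully when the link drops.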

It is essential that automated systems can understand what the user is telling them in nearly all cases. This is not just a problem of understanding dialects and accents - it also requires the system to recognise the many different ways of asking the same question. For example, Charles Schwab's share-trading system caters for 3.5 billion ways in which callers can say what they want to do, and claims a 95 per cent accuracy rate in its service level.

Vendors have used a wide range of techniques, allowing enterprises to choose from a range of products with significantly different approaches and economics. Many products on the market are large server-based applications that give the impression of using a 'brute force' approach to cracking this problem.

Nuance's approach to the problem is to allow users to define 'grammars' for dialogues and to have a backup word recognition system to try to understand queries that fail to match the grammar.
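The general idea of a grammar with a word-level fallback can be illustrated with a toy example. This is not Nuance's implementation or API: the grammar, the keyword table and the `interpret` function are hypothetical, the patterns are ordinary regular expressions, and the input is assumed to be text that has already been transcribed.

```python
import re

# Hypothetical dialogue grammar: one pattern per caller intent.
GRAMMAR = {
    "buy": re.compile(
        r"^(?:i want to |please )?buy (?P<qty>\d+) shares of (?P<stock>\w+)$"
    ),
    "quote": re.compile(
        r"^(?:what is |get me )?(?:the )?price of (?P<stock>\w+)$"
    ),
}

# Hypothetical backup: single words that hint at an intent.
KEYWORDS = {"buy": "buy", "purchase": "buy", "price": "quote", "quote": "quote"}


def interpret(utterance):
    """Return (intent, details, how) for a transcribed utterance."""
    text = utterance.lower().strip()
    # First try to match the whole utterance against the grammar.
    for intent, pattern in GRAMMAR.items():
        match = pattern.match(text)
        if match:
            return intent, match.groupdict(), "grammar"
    # Otherwise fall back to spotting individual words.
    for word in text.split():
        if word in KEYWORDS:
            return KEYWORDS[word], {}, "fallback"
    return None, {}, "unrecognised"


print(interpret("Buy 100 shares of IBM"))
print(interpret("Could I get a quote for IBM"))
print(interpret("Hello there"))
```

A grammar match yields the full details of the request, whereas the fallback recovers only the likely intent, so a real dialogue would then need to ask the caller for the missing details.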

On the other hand, UK-based Vocalis - which started its interest in the field by developing systems for the Eurofighter project - prefers to focus on words to establish the context of a question.

However, all vendors are moving forward and the technology - both hardware and software - has reached the point where impressive business returns can be achieved.

As with speech recognition, voice authentication is an emerging field with many competing technological approaches. Although figures for the accuracy of these methods are available, the tests involve so many parameters that comparison is difficult. Most of the vendors claim that their technology is always improving.

Smaller companies can still find a role in the authentication market in particular. For example, Israeli company Configate believes it has identified approximately 200 significant parameters describing human voice, out of about 500 that are measurable. This enables it to implement faster and more lightweight processes to authenticate users.

Future speech recognition products may be able to interpret moods and intonation, as well as words - notably, to detect the 'yes' that means 'no'! But don't expect speech products to give the full richness of human-to-human interaction.

Psychologists have found that the strongest signals come from body language, followed by words and then by the manner in which the words are spoken. In other words, when a speaker emits conflicting signals the recipient normally reacts to the signals in this order.

There has already been much progress in identifying vocal characteristics as a means of authenticating the identity of the speaker. It is conceivable that by 2004 systems will be able to use the first 30 seconds of a conversation to authenticate the user for access to secure applications without the need for specific identification actions by the user, such as giving a password. This information can also be used to configure the portal for the session.

Speech will be an integral part of the IT applications of the future - things will never look, or sound, the same again.
