Loud and clear: Advances in speech recognition technology are rapidly changing the channels of business
by Anne Schmitt Daily Herald Business Writer
Posted on Monday, December 14, 1998

Anyone who has ever tried to punch the name of a person, movie title or stock listing on a telephone keypad will appreciate this news: soon you may not have to.

For years, television and movies have dangled before us the possibilities of a Star Trek-like dialogue with our computers and appliances.

We have had glimpses of speech technology through computer dictation programs, particularly for the legal and medical professions, and through the phone companies' collect call services.

But speech-based interaction with computers now is poised to take off in a broad array of uses thanks to the convergence of improved speech recognition technology, better sound cards and microphones, and faster, more powerful and less expensive computers, say people in the industry.

"We've just hit that threshold where it's real," said David Barnes, who holds the lofty title of lead evangelist for speech technology for International Business Machines. IBM, one of the biggest players in speech recognition, has been investing in the technology for 25 years. "It's going to happen," Barnes said.

What does that mean for most people? It means being able to call MovieFone and tell the computer answering your call that you want to see "A Bug's Life" at a theater near your home in Naperville. It means telling a Fidelity Investments' computer to "Buy 100 shares of AOL." It means practicing Spanish at your home computer with a program that lets you know if your pronunciation is wrong.

The market for speech technology today remains relatively small. TMA Associates, a speech technology consultancy in Tarzana, Calif., estimates worldwide sales of the technology will reach $1.1 billion this year and take off during the next couple years.

By 2002, total worldwide revenue from speech recognition software, hardware, services and license fees for telephony uses could be more than $22 billion, according to TMA estimates. Other industry estimates place the market for speech technology applications at about $3.4 billion by 2002.

How does it work? The computer breaks up the words into segments, comparing them to its database of real words. Then the computer calculates the probability that a speaker is saying a particular word or sentence. Once it has identified the words, the computer throws out the words that don't advance the meaning of the sentence and compares what is left to words and commands in its database.
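The steps described above can be sketched in a few lines of Python. This is a toy illustration only, not how IBM, Lucent or any other vendor named here builds its systems: it works on rough text spellings rather than audio, uses simple string similarity from the standard library as a stand-in for acoustic scoring, and the vocabulary, filler list and command table are all invented for the example.

```python
from difflib import SequenceMatcher

# Hypothetical vocabulary, filler list and command table, invented for this sketch.
VOCABULARY = ["buy", "sell", "quote", "shares", "of", "please", "100", "aol", "ibm"]
FILLER_WORDS = {"shares", "of", "please"}
COMMANDS = {"buy": "PLACE_BUY_ORDER", "sell": "PLACE_SELL_ORDER", "quote": "GET_QUOTE"}


def word_probabilities(segment):
    """Score a segment against every known word and normalize the scores to sum to 1."""
    scores = {w: SequenceMatcher(None, segment, w).ratio() for w in VOCABULARY}
    total = sum(scores.values())
    if total == 0:
        # Nothing in the vocabulary resembles this segment at all.
        return {w: 0.0 for w in VOCABULARY}
    return {w: s / total for w, s in scores.items()}


def recognize(segments):
    """Pick the most probable vocabulary word for each segment."""
    words = []
    for segment in segments:
        probs = word_probabilities(segment)
        words.append(max(probs, key=probs.get))
    return words


def interpret(segments):
    """Drop words that carry no meaning, then match what is left to a command."""
    words = recognize(segments)
    meaningful = [w for w in words if w not in FILLER_WORDS]
    if meaningful and meaningful[0] in COMMANDS:
        return COMMANDS[meaningful[0]], meaningful[1:]
    return None, meaningful
```

Given the imperfectly heard segments `["by", "100", "shars", "of", "aol"]`, `interpret` returns `("PLACE_BUY_ORDER", ["100", "aol"])`: each segment is mapped to its closest known word, "shares" and "of" are discarded, and "buy" is matched to a command.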

Over the phone

To date, companies have shown the most interest in developing speech recognition applications for telephone systems, said William Meisel, a consultant with TMA Associates. The reason? Most people don't like using the touch-tone phone keypads for anything but the simplest entries, such as a phone number or social security number, he said. The keypad systems also can be unforgiving if a caller presses a wrong button or changes his mind about a request.

Speech-based systems remedy some of those problems, while still diverting simple requests away from more costly live operators, Meisel said.

Like other financial services companies, Fidelity Investments is moving gradually to offer more speech-based services to its customers. Today, Fidelity overlays its speech recognition software over its current "press-one-for-your-account-balance" menu. Callers still have to navigate the touch-tone menu, but they are able to speak the name of the security or fund they want to order, rather than take the time to spell it, said Judith McMichael, Fidelity's vice president of marketing.

Starting early next year, Fidelity will broaden the service to provide an interaction more closely patterned after the way people talk to each other, McMichael said. "The system essentially says, 'What would you like to do?' " she said.

Customers will be able to get stock quotes and account balances, set up personalized portfolio information, place orders for stocks, options, and mutual funds, get information, or change pending orders. A selected group of customers will begin testing the system later this month.

"The payback for us is more of our customers will be able to use an automated system happily," McMichael said.

Lucent Technologies has developed a speech-based system for MovieFone, the nationwide movie guide and ticketing service, that will let callers to the service say the names of movies or give other information, such as a zip code or credit card number, rather than punching it into the touch-tone keypad.

Over time, speech-based systems will simply ask callers, "How can I help you," predicts David Thomson, technical manager for Lucent's speech processing group.

The company already has been testing such systems.

Lucent tried out a speech system that let Naperville residents call and ask, for example, "Where can I see 'Home Fries' in Naperville?" rather than answering questions from a prepared menu.

"We were trying to answer 'Are people able to construct a meaningful sentence that reflects what they want?' " Thomson said. They were. "People really liked it," he said.

Another natural language test with an insurance company showed that a system could be tailored to take and respond to a spontaneous call with an error rate lower than a live operator's, Thomson said.

"We're getting better at the natural language speech," he said. "Ultimately we would like to completely eliminate menus. That menu system is very tedious."

Schaumburg-based Motorola Inc. has introduced its VoxML, or Voice Markup Language, that ultimately will let people get information and online content from the Internet over the phone. The language translates the voice request to the language of the World Wide Web. Once the system retrieves the information, the computer provides it in VoxML, and the answer is translated from text to speech for the caller.

Motorola said Internet content providers, including The Weather Channel, CBS MarketWatch.com, Biztravel.com and SmartRoute, already have used VoxML to create voice interactive services.

TMA's Meisel said the technology is potentially very significant because it would give people Internet access through the most pervasive technology we have - the telephone.

"If you can call an 800 number and talk, immediately every phone becomes a new device," Meisel said. "There's room for explosive growth."

VoxML's success depends on the creation of so-called voice portals, say a Yahoo! or America Online service that works by voice command instead of through the PC, Meisel said.

It is an attractive concept to companies that already have Web pages: it could increase their site's exposure at little additional cost, because VoxML uses the same database as the existing site, Meisel said.

On the desktop

As speech technology improves, more applications are being developed for desktop computers, too.

A recent speech recognition conference in China hosted by Intel drew 600 software developers, said IBM's Barnes. At the fall Comdex show in Las Vegas, speech recognition software maker Dragon Systems had one of the most popular booths, said William Sell, general manager of the computer industry trade show.

"The power of the equipment that is out there has advanced nicely so people can readily use the technology," Sell said.

Viable speech recognition systems for desktop computers will help computer makers reach a population that believes computers are too hard to use, Barnes said. Businesses that use computers like the idea of voice-activated commands because they will require less employee training to use, he said.

IBM, Dragon Systems and Belgian firm Lernout & Hauspie have the best known off-the-shelf versions of speech recognition software. The newest speech dictation programs let people talk at a pace closer to their natural speed, said Kristin Wahl, an IBM spokeswoman. They are also better able to pick up the subtleties of human language, she said. So if you say "twenty-five dollars," the machine understands "$25."
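The "twenty-five dollars" example can be illustrated with a small Python sketch that turns a spoken dollar amount into its written form. It is a simplified assumption of how such a step might look, not a description of any vendor's software: it only handles amounts from zero to ninety-nine, and the function names are invented for the example.

```python
# Number words the sketch understands (zero through ninety-nine only).
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def words_to_number(phrase):
    """Add up the number words in a phrase; return None if a word is unknown."""
    total = 0
    for token in phrase.replace("-", " ").split():
        if token in UNITS:
            total += UNITS[token]
        elif token in TENS:
            total += TENS[token]
        else:
            return None
    return total


def normalize_dollars(spoken):
    """Rewrite '<number words> dollars' as '$<digits>'; leave other text unchanged."""
    words = spoken.lower().split()
    if len(words) >= 2 and words[-1] in ("dollar", "dollars"):
        amount = words_to_number(" ".join(words[:-1]))
        if amount is not None:
            return f"${amount}"
    return spoken
```

Calling `normalize_dollars("twenty-five dollars")` returns `"$25"`, while a phrase that does not end in a dollar amount is passed through untouched.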

Now, someone with speech recognition software can direct their computer to, say, "Start program Lotus Notes," Barnes said. Within the next two years, computer users will be able to give the more general command, "Check my e-mail," he predicts.

"At some point we'll be able to do that with all of those applications and, eventually, we'll build applications with the voice in mind," Barnes said.

Aurolog, a French company that makes the "Tell Me More" language learning software series, uses speech recognition technology as the core of its instruction model.

"It provides real interactivity. Without the speech recognition, what you end up having is a book or a tape on CD. It doesn't provide the dialogue capability speech recognition has," said Bradley Holcomb, vice president of marketing and sales for Aurolog's U.S. operations. "It allows for interactivity between the student and the computer and it allows for natural learning, which is mimicking."