Loud and clear
Advances in speech recognition technology rapidly changing the channels of business
By Anne Schmitt
Daily Herald Business Writer
Posted on Monday, December 14, 1998
Anyone who has ever tried to punch the name of a person, movie title or stock listing into a telephone keypad will appreciate this news: soon you may not have to.
For years, television and movies have dangled before us the possibilities of a Star Trek-like dialogue with our computers and appliances.
We have had glimpses of speech technology through computer dictation programs, particularly for the legal and medical professions, and through the phone companies' collect-call services.
But speech-based interaction with computers is now poised to take off in a broad array of uses, thanks to the convergence of improved speech recognition technology, better sound cards and microphones, and faster, more powerful and less expensive computers, say people in the industry.
"We've just hit that threshold where it's real," said David Barnes, who holds the lofty title of lead evangelist for speech technology at International Business Machines. IBM, one of the biggest players in speech recognition, has been investing in the technology for 25 years. "It's going to happen," Barnes said.
What does that mean for most people? It means being able to call MovieFone and tell the computer answering your call that you want to see "A Bug's Life" at a theater near your home in Naperville. It means telling a Fidelity Investments computer to "Buy 100 shares of AOL." It means practicing Spanish at your home computer with a program that lets you know when your pronunciation is wrong.
The market for speech technology remains relatively small today. TMA Associates, a speech technology consultancy in Tarzana, Calif., estimates that worldwide sales of the technology will reach $1.1 billion this year and take off during the next couple of years.
By 2002, total worldwide revenue from speech recognition software, hardware, services and license fees for telephony uses could be more than $22 billion, according to TMA estimates. Other industry estimates place the market for speech technology applications at about $3.4 billion by 2002.
How does it work? The computer breaks the spoken words into segments of sound and compares them with its database of real words. It then calculates the probability that the speaker is saying a particular word or sentence. Once it has identified the words, the computer throws out those that don't advance the meaning of the sentence and compares what is left with the words and commands in its database.
Over the phone
To date, companies have shown the most interest in developing speech recognition applications for telephone systems, said William Meisel, a consultant with TMA Associates. The reason? Most people don't like using touch-tone phone keypads for anything but the simplest entries, such as a phone number or Social Security number, he said. Keypad systems also can be unforgiving if a caller presses a wrong button or changes his mind about a request.
Speech-based systems remedy some of those problems while still diverting simple requests away from more costly live operators, Meisel said.
Like other financial services companies, Fidelity Investments is moving gradually to offer more speech-based services to its customers. Today, Fidelity overlays its speech recognition software on its current "press-one-for-your-account-balance" menu. Callers still have to navigate the touch-tone menu, but they can speak the name of the security or fund they want to order rather than take the time to spell it, said Judith McMichael, Fidelity's vice president of marketing.
Starting early next year, Fidelity will broaden the service to provide an interaction more closely patterned after the way people talk to each other, McMichael said. "The system essentially says, 'What would you like to do?' " she said.
Customers will be able to get stock quotes and account balances, set up personalized portfolio information, place orders for stocks, options and mutual funds, get information, or change pending orders. A selected group of customers will begin testing the system later this month.
"The payback for us is more of our customers will be able to use an automated system happily," McMichael said.
Lucent Technologies has developed a speech-based system for MovieFone, the nationwide movie guide and ticketing service. The system will let callers say the names of movies or give other information, such as a ZIP code or credit card number, rather than punch it into the touch-tone keypad.
Over time, speech-based systems will simply ask callers, "How can I help you?" predicts David Thomson, technical manager for Lucent's speech processing group.
The company already has been testing such systems.
Lucent tried out a speech system that let Naperville residents call and ask, for example, "Where can I see 'Home Fries' in Naperville?" rather than answer questions from a prepared menu.
"We were trying to answer, 'Are people able to construct a meaningful sentence that reflects what they want?' " Thomson said.
They were. "People really liked it," he said.
Another natural language test, with an insurance company, showed that a system could be tailored to take and respond to a spontaneous call with a lower error rate than a live operator, Thomson said.
"We're getting better at the natural language speech," he said. "Ultimately we would like to completely eliminate menus. That menu system is very tedious."
Schaumburg-based Motorola Inc. has introduced VoxML, or Voice Markup Language, which ultimately will let people get information and online content from the Internet over the phone. The language translates the voice request into the language of the World Wide Web. Once the system retrieves the information, the computer provides it in VoxML, and the answer is translated from text to speech for the caller.
Motorola said Internet content providers, including The Weather Channel, CBS MarketWatch.com, Biztravel.com and SmartRoute, already have used VoxML to create voice-interactive services.
TMA's Meisel said the technology is potentially very significant because it would give people Internet access through the most pervasive technology we have: the telephone.
"If you can call an 800 number and talk, immediately every phone becomes a new device," Meisel said. "There's room for explosive growth."
VoxML's success depends on the creation of so-called voice portals, such as a Yahoo! or America Online service that works by voice command instead of through the PC, Meisel said.
The concept is attractive to companies that already have Web pages because it could increase their sites' exposure at little additional cost, since VoxML uses the same database as the existing site, Meisel said.
On the desktop
As speech technology improves, more applications are being developed for desktop computers, too.
A recent speech recognition conference in China hosted by Intel drew 600 software developers, said IBM's Barnes. At the fall Comdex show in Las Vegas, speech recognition software maker Dragon Systems had one of the most popular booths, said William Sell, general manager of the computer industry trade show.
"The power of the equipment that is out there has advanced nicely so people can readily use the technology," Sell said.
Viable speech recognition systems for desktop computers will help computer makers reach a population that believes computers are too hard to use, Barnes said. Businesses that use computers like the idea of voice-activated commands because such commands will require less employee training, he said.
IBM, Dragon Systems and Belgian firm Lernout & Hauspie have the best-known off-the-shelf versions of speech recognition software. The newest speech dictation programs let people talk at a pace closer to their natural speed, said Kristin Wahl, an IBM spokeswoman. They are also better able to pick up the subtleties of human language, she said. So if you say "twenty-five dollars," the machine understands "$25."
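The "twenty-five dollars" example describes what is often called inverse text normalization, which turns recognized words into their written form. Below is a minimal Python sketch under stated assumptions. It assumes the recognizer has already produced the words, and it handles only whole-dollar amounts from zero to 99. The word tables and function names are hypothetical, and commercial dictation products rely on far more complete grammars.

```python
# Hypothetical number-word tables; this sketch covers 0-99 only.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def words_to_number(text: str) -> int:
    """Convert spelled-out English numbers from 0 to 99 into an int."""
    parts = text.lower().replace("-", " ").split()
    if len(parts) == 1 and parts[0] in UNITS:
        return UNITS[parts[0]]
    if parts and parts[0] in TENS:
        value = TENS[parts[0]]
        if len(parts) == 1:
            return value
        # A tens word may be followed by one digit word: "twenty five".
        if len(parts) == 2 and 1 <= UNITS.get(parts[1], 0) <= 9:
            return value + UNITS[parts[1]]
    raise ValueError(f"unsupported number words: {text!r}")


def normalize_dollars(phrase: str) -> str:
    """Rewrite '<number words> dollar(s)' as '$N'; leave anything else alone."""
    words = phrase.lower().split()
    if len(words) >= 2 and words[-1] in ("dollar", "dollars"):
        try:
            return f"${words_to_number(' '.join(words[:-1]))}"
        except ValueError:
            pass  # not a number this sketch understands; keep the words
    return phrase


print(normalize_dollars("twenty-five dollars"))  # $25
print(normalize_dollars("ninety dollars"))       # $90
print(normalize_dollars("a few dollars"))        # a few dollars
```

Phrases the sketch cannot parse are passed through unchanged, so vague amounts such as "a few dollars" are never turned into a wrong figure.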
Now, someone with speech recognition software can tell the computer, for example, "Start program Lotus Notes," Barnes said. Within the next two years, computer users will be able to give the more general command "Check my e-mail," he predicts.
"At some point we'll be able to do that with all of those applications and, eventually, we'll build applications with the voice in mind," Barnes said.
Aurolog, a French company that makes the "Tell Me More" language-learning software series, uses speech recognition technology as the core of its instruction model.
"It provides real interactivity. Without the speech recognition, what you end up having is a book or a tape on CD. It doesn't provide the dialogue capability speech recognition has," said Bradley Holcomb, vice president of marketing and sales for Aurolog's U.S. operations. "It allows for interactivity between the student and the computer and it allows for natural learning, which is mimicking."