Speech recognition began in the 1950s with “Audrey,” which could understand only digits spoken by a single voice. Advances in the 1970s and 80s gave the technology a much larger vocabulary and the ability to recognize multiple voices, but a commercially available product was not released until 1990, when Dragon launched Dragon Dictate, later followed by the more advanced Dragon NaturallySpeaking. The software was prohibitively expensive, however, which kept it from widespread use.
By the millennium, voice recognition technology could function at 80 percent accuracy, but consumers remained frustrated with the software because the keyboard and mouse were still easier to use. It was not until Google designed the Google Voice app for the iPhone that users began to see voice recognition as a more convenient alternative to punching words into a tiny virtual keyboard. Google made further improvements when it moved to Android, adding a Voice Search feature that could record users’ voices for use with its Chrome browser.
Siri offers features similar to Google Voice Search in that it is based on cloud-computing technology, which draws upon a variety of shared resources to retrieve data. This enables the program to gather what it knows about the user and generate a more accurate reply. Siri is also the first voice recognition technology with a “personality,” as it sometimes responds to users in an amusing way. For example, if a user asks Siri for a good place to hide a body, it recommends nearby garbage dumps. If you ask it the meaning of life, it responds with “42,” a reference to the popular science fiction novel The Hitchhiker’s Guide to the Galaxy. Apple has made voice recognition both fun and relevant with the arrival of Siri.
How does Siri really work? When a user speaks into the iPhone, the speech is encoded into digital form and relayed over the user’s Internet connection to a cloud-based server designed to comprehend language. The iPhone itself also analyzes the user’s speech so that Siri can decide whether the command can be handled on the device (playing music, for example) or whether it needs to connect to a network. Every aspect of the user’s speech is then compared against a model to generate a list of candidate interpretations. Siri chooses the most likely result, but will prompt the user for clarification if any ambiguity remains. Computing and IT news sources have claimed that this method is highly accurate and becomes even more so as Siri “learns” the user’s voice.
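The decision flow described above can be sketched in miniature. The following is a hypothetical illustration, not Apple’s implementation: the command set, scores, and the 0.1 ambiguity threshold are all invented for the example, and the candidate list stands in for what a real language model would produce.

```python
# Toy sketch (not Apple's actual pipeline): route a command on-device or
# to the cloud, rank candidate interpretations, and ask for clarification
# when the top two results score too closely to choose between.

# Hypothetical set of commands the device can handle without a network trip.
LOCAL_COMMANDS = {"play music", "set alarm", "set reminder"}

def interpret(candidates):
    """Pick the best interpretation, or request clarification.

    `candidates` is a list of (interpretation, score) pairs, standing in
    for the scored hypotheses a speech model would generate.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best, best_score = ranked[0]
    # If the runner-up scores almost as high, the request is ambiguous.
    if len(ranked) > 1 and best_score - ranked[1][1] < 0.1:
        return ("clarify", [c[0] for c in ranked[:2]])
    route = "on-device" if best in LOCAL_COMMANDS else "cloud"
    return (route, best)

print(interpret([("play music", 0.92), ("play movie", 0.41)]))
# → ('on-device', 'play music')
print(interpret([("call John", 0.55), ("call Joan", 0.53)]))
# → ('clarify', ['call John', 'call Joan'])
```

A clear winner is dispatched locally or to the cloud depending on whether the device can handle it; near-ties fall back to asking the user, mirroring how Siri prompts for clarification when ambiguity remains.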
Siri is expected to become extremely popular with iPhone users because they can use it to access their calendars, gather information from their Contacts, set reminders and alarms, send e-mails and texts, get directions, play music, and even search the web – all without touching a single button.