When I was on the plane from San Fran to Toronto, I managed to watch a Nova episode called “The Smartest Machine on Earth” about the development of Watson, IBM’s computer that bested Ken Jennings and Brad Rutter in the 3-day Jeopardy challenge.
It documented the challenges of computationally interpreting the English language. For those unfamiliar with Jeopardy, contestants are confronted with a factual statement and must correctly provide the question (e.g., for $200, a contestant might see "This device lightly burns slices of bread" and must answer, "What is a toaster?").
It immediately reminded me of the challenges Apple has gone through in developing Siri, and why Siri is still in "Beta". It's not the voice recognition algorithms; it's all about machine learning and the gathering of voice data.
Generally, I have a hard time using any type of voice recognition software. I attribute it to my rather monotone, low-pitched voice. It just doesn't register. Every time I have to go through a voice-controlled menu, I cringe. It just doesn't work.
The only success I've ever had is with Google Android's voice transcription. Similar to Siri, it uses a data connection to process the sound in the Google cloud. The reason it's so accurate is that Google has a huge corpus of voice data collected through a short-lived service called Google 411:
GOOG-411 (or Google Voice Local Search) was a telephone service launched by Google in 2007, that provided a speech-recognition-based business directory search, and placed a call to the resulting number in the United States or Canada. The service was accessible via a toll-free telephone number. It was an alternative to 4-1-1, an often-expensive service provided by local and long-distance phone companies, and was therefore commonly known as Google 411. This service was discontinued on November 12, 2010.
More importantly, it allowed Google to:
…build a large phoneme database from users’ voice queries. This phoneme database, in turn, allowed Google engineers to refine and improve the speech recognition engine that Google uses to index audio content for searching.
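To make that idea concrete, here's a toy sketch of what aggregating a phoneme database from transcribed queries might look like. The lexicon, the query strings, and the phoneme symbols are all made up for illustration; a real engine would work from acoustic data and a full pronunciation dictionary, not a handful of hardcoded words.

```python
from collections import Counter

# Hypothetical grapheme-to-phoneme lexicon (ARPAbet-style symbols, invented here).
LEXICON = {
    "pizza": ["P", "IY", "T", "S", "AH"],
    "near": ["N", "IH", "R"],
    "me": ["M", "IY"],
    "coffee": ["K", "AA", "F", "IY"],
    "toronto": ["T", "ER", "AA", "N", "T", "OW"],
}

def phonemes(transcript):
    """Map a transcribed query to a phoneme sequence, skipping unknown words."""
    seq = []
    for word in transcript.lower().split():
        seq.extend(LEXICON.get(word, []))
    return seq

# Stand-ins for transcribed caller queries.
queries = ["pizza near me", "coffee near toronto"]

# Aggregate phoneme counts across queries -- the raw material an engineer
# could use to see which sounds real callers actually produce.
db = Counter()
for q in queries:
    db.update(phonemes(q))

print(db.most_common(5))
```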
So I'm sure that by the time Siri comes out of beta, Apple will have built a similar phoneme database from all the people using the service right now. It will only get better, and I'm sure this is the next interface revolution coming to mobile.