A very brief history of talking to computers
Possibly the first voice-activated machine was a $2 toy called Radio Rex, produced in the 1920s. It was a spring-activated dog that popped out of a little kennel whenever it detected a sound in the 500 Hz range. It wasn’t Siri, Cortana or Alexa, but it was quite impressive for its time. Later, in the 80s, IBM introduced a dictation-taking computer, which didn’t really catch on. In the 90s there were voice dialers, which allowed people to dial up to ten different phone numbers by voice. By the 2000s, Interactive Voice Response (IVR) systems had become fairly commonplace. Still, it wasn’t until Apple unveiled Siri in 2011 on the iPhone 4S that the fiction of Star Trek or HAL started approaching reality, and it was the introduction of Amazon’s Alexa in 2014 that brought voice interfaces to the forefront. Every major tech company followed suit later.
Hey! But I already use my fingers to swipe on Tinder and scroll on Instagram. Why should I care about voice?
Voice has some clear advantages:
- In many cases, it is faster to say something than to perform a series of actions to complete a task. It is easier to say ‘set an alarm for 7 AM’ than to find your alarm app, tap the set alarm button, select 7, select AM and finally tap the save button.
- When users can’t use their hands, such as while driving or cooking, speaking is much safer and more practical.
- There is little or no learning required to interact with machines by voice. Talking is intuitive. Humans already know how to talk. Communication by speech is ubiquitous.
- Voice can convey much more information than text alone. Speech carries tone of voice, intonation and rate of speech, and embedded in these is information about emotions. A computer that picks up on these cues can respond much better to speech than to text alone.
That said, it’s not always appropriate to use voice as the only medium of communication with computers:
- It is downright irritating to operate computers with voice in public spaces. Imagine everyone in an office environment interacting with their computers through voice. It would be chaotic.
- Sometimes typing is preferable. Right now, it is difficult to undo what you’ve said, whereas text is easy to delete and revise. That matters when you are communicating something that requires thought and introspection.
- Privacy is a major concern. You wouldn’t want to describe your health issues aloud to a computer in a crowded environment. It would also be a violation of privacy if the computer read your messages out loud without asking first.
The most important thing is to start thinking of voice interfaces as just another mode of interacting with machines. In many cases, talking to your computer will be cumbersome or downright inappropriate. Saying ‘scroll up’ every time you want to scroll your Facebook feed isn’t very intuitive, but saying ‘share my trip photos with my girlfriend’ is.