Public attention to the voice industry has centered primarily on smart speakers. Dubbed “voice first” devices by marketers, these are cylinders (or more recently other shapes) that sometimes come with screens. Ask a question or make a request, and the devices can access a huge number of information sources including app-like add-ons contributed by thousands of companies, nonprofits, and even individuals. Owners most typically use the devices to check the weather, set timers, learn recipes, listen to music, play games, ask for facts, and buy things. In the United States the explosion of smart-speaker sales began around 2014 with the introduction of Amazon’s Echo and its assistant, Alexa. The Google Home came out almost exactly two years later, and then smart speakers from other firms came tumbling out. Apple and Samsung used preexisting assistants (Siri for Apple, Bixby for Samsung), and companies like Sonos built speakers that link to Alexa or Google Assistant, or both.
Press attention during this period has see-sawed between the latest capabilities built into these devices and the new social dangers they represent. Many stories center on the smart speaker’s ability to “listen” and then answer. The gizmo starts recording whenever it hears the wake word (“Alexa,” “Hey Google,” “Siri”), and it tracks sound for up to sixty seconds each time. Ask “Alexa, what’s the temperature in Chicago?” and a (so far) immutable woman’s voice will provide a direct response. Try “Hey Google, how many plays did Shakespeare write?” and a female voice (which in this case you can change to male) will give a concise (and correct) answer (thirty-seven), along with two sentences that elaborate.
But there have been incidents. A six-year-old girl in Texas used Alexa to order a $170 dollhouse and four pounds of sugar cookies—by simply asking for them. An Amazon user in Germany requested data about what he had said to the device, and he instead received 1,700 audio recordings of someone he didn’t know. Alexa users have reported bursts of unexpected and scary laughter coming from the cylinder. A woman in Portland, Oregon, found out that the Echo had recorded a conversation she had had with her husband without the couple’s knowledge, then sent the recording to a random person on their contacts list. Amazon had rational explanations for all these issues—a parent not setting the purchase-protection code, an Amazon employee’s error, Alexa’s mistakenly hearing a command to laugh, a rare combination of speaking-and-listening accidents. “As unlikely as this string of events is, we are evaluating options to make this case even less likely” was the official comment on the Portland couple’s case, but it could have been the same for any of them. The voice firms are also playing whack-a-mole with hackers and trying to rid the system of bugs that could open the data to snoopers. A writer wryly described one of the privacy intrusion incidents as “the latest nightmare scenario for the tech-phobic.”
The real difficulties with the smart speakers and the voice intelligence industry, however, have yet to emerge. The unwanted incidents will come not from bugs, hacks, or glitches, but from features of technology that work properly. That’s because the system is evolving into a blueprint for marketers to use your body’s signals for gain. Consider:
- A cartoon drawing accompanying an Amazon patent depicts a woman “coughing” with a “sniffle” as she tells one of Amazon’s smart speakers, “I’m hungry.” The device picks up speech irregularities that imply a cold (“based at least in part on an analysis of pitch, pulse, voicing, jittering, and/or harmonicity of a user’s voice, as determined from processing the voice data”). Based on that conclusion, Alexa asks if the person wants chicken soup, and when she says no, offers to sell her cough drops with one-hour delivery. The scenario may sound helpful, but learning how often someone will need to drink chicken soup and agree to buy cough drops can lead to an AI program drawing inferences about a person’s short- or long-term health. These conclusions would have marketing value. Knowing via voice if someone is sick could benefit Amazon Pharmacy, the firm’s advising, ordering, and delivery service for prescription medicines.
- Another patent has Alexa listening through a smart speaker for “keywords” such as enjoyed or love. When it hears a trigger word, it “captures adjacent audio that can be analyzed on the device or remotely,” to figure out what the person enjoyed or loves; the individual might say I enjoy traveling to San Francisco or I love hip-hop or I love Judy. Tracking the keywords would allow Amazon to add information to people’s profiles so it can sell them items related to what they like and not what they dislike, and sell advertisers the ability to reach people with messages that reflect those sentiments. Amazon and its advertisers may also avoid making offers to people who say they love or enjoy what the advertisers disapprove of, or who for personal or cultural reasons don’t use those specific words to express happiness.
- A Google patent application describes the firm’s ability to use “characteristics of audio signatures, such as speech patterns, pitch, etc.,” to figure out who is in a room, whether they are “moving or performing other actions,” and how quietly they are doing it. Google describes a situation where parents can know from afar whether their children are sleeping or whispering. The latter, says Google, would indicate “mischief . . . is occurring,” and the system would notify parents and other adults so they could “exercise control.” The patent clearly aims at building the firm’s “smart home” business, an enterprise centering around devices like lamps, thermostats, and locks that respond to an owner’s commands through voice and touch.
Two Amazon representatives who wanted anonymity told me it is company policy not to comment about patents. Both also said patents take a long time to bear fruit. That should not prevent our discussing them. Amazon, Google, and other voice intelligence firms are in business for the long term, and our society will likely continue to be influenced by their innovations for generations to come. In fact, as if to underscore the utility of those patents, Amazon announced during fall 2020 that its just-released Halo health and wellness band is able to analyze the tone of its owner’s voice for “qualities . . . like energy and positivity.” Amazon declared that getting people to consider the emotions that their voice emits will encourage them to adopt healthier communication practices with their loved ones and bosses. The company asserted that the Halo’s security features would make its analysis off-limits to anyone but the person speaking; the voice profile, too, is explicitly not for use by third parties. Yet in the face of all the developments you’ll see in this book, it is hard not to understand the Halo’s professed capability as a proof of concept. The entire voice profiling idea demonstrated here can, as the patents suggest, easily be ported to the marketing realm and beyond.
From The Voice Catchers by Joseph Turow. Published by Yale University Press in 2021. Reproduced with permission.
Joseph Turow is the Robert Lewis Shayon Professor of Communication at the University of Pennsylvania’s Annenberg School for Communication. He is the author of numerous books, most recently The Aisles Have Eyes.