There’s been a lot of angry chatter lately about our digital assistants responding to commercials. A commercial saying “Hey Siri” or “Okay Google” or “Alexa” invokes our now-omnipresent “friends”. Many of these scenarios could be avoided, because whoever produced the content knows in advance that the “wake word” will be spoken. The AI, however, doesn’t know this. So what can we do? One obvious solution has existed for decades and comes from a non-obvious place: amateur radio. I propose a human-imperceptible “suppression tone”.
The digital assistants we love listen for their “wake word”. They’re capable of recognizing it without asking “The Cloud” for help. In order to recognize their wake word in all kinds of noisy environments, with all kinds of accents, from all kinds of voices, they need to have rather “loose” requirements for what qualifies. This means, for example, Alexa hears “Alexa” in some cases where nobody said “Alexa”. Modern AI is pretty good at filtering out those situations but since we’re not yet at a point where these AIs can be both reliable at language recognition and good at identifying a user by voice, we’re not yet ready to establish protocols for what an AI should be willing to do for “just anybody”.
What, then, about unintentional invocations of these digital djinn? There’s been a lot of justified bitching about the rudeness of commercials and even television shows that feature people using the technology by invoking the wake words. Since this is all relatively new to society, there is, of course, very little awareness (or established rules) of conduct here. So what do we do?
What we need is some sort of non-intrusive “mulligan” signal that tells the AI to ignore even its wake word. Turns out this problem has been solved before.
It’s not commonly known that amateur radio enthusiasts are responsible for developing much of the underlying technology we enjoy today (from the hardware that powers the Internet to cellular radios in our phones) but these multi-generational electronics and signaling tinkerers have come up with solutions to a variety of problems.
Take, for example, a “repeater”. To a ham (an amateur radio enthusiast), a repeater is a radio communications machine that has a receiver listening to an “input frequency” and a transmitter that directly transmits what it hears on its input to its “output frequency”. This lets us use shorter-range radio frequencies (such as the “2-meter” band, let’s say 144 MHz to 148 MHz) to communicate over a much wider area than is normally possible on such a “line-of-sight” band. It also lets much lower-power handheld radios, not unlike Citizens Band (CB) radios, cover that same larger area. To do so, you configure your 2-way radio to transmit on the repeater’s input frequency and to listen on the repeater’s output frequency. If everyone’s listening to the (known and documented) output frequency of a repeater whose antenna is very high up (such as on a mountain peak or a commercial radio tower), little Joe or Mary Ham with their tiny 8-watt handheld can be heard 40 miles away even though their own radio’s line-of-sight might only be useful up to 5 miles from their location.
Let me unpack this. First, the handheld 2-meter-band radio, like a walkie-talkie (but a bit more powerful for a ham), may only be “heard” by another handheld a few miles away because of local terrain (i.e., both antennas are “down in a ditch”). On the other hand, if a repeater’s antenna is up much higher, it can “hear” that little handheld from much farther away (like cellphone towers do) and, due to its height and much higher stationary power (think 50 watts vs. 8 watts), repeat that signal up to 50 miles (or more, if it’s on a mountain peak). That lets this tiny little voice be heard as if it’s the Booming Voice of God from On High to anyone within that repeater’s “footprint”.
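To put rough numbers on why antenna height matters so much, here is a minimal sketch using the common radio-horizon rule of thumb (about 4.12 km times the square root of the antenna height in meters). It assumes smooth, flat terrain and standard atmospheric refraction, so real-world coverage will differ; the function names and example heights are mine, purely for illustration.

```python
import math

def radio_horizon_km(antenna_height_m):
    """Approximate radio horizon over smooth terrain (rule of thumb)."""
    return 4.12 * math.sqrt(antenna_height_m)

def link_range_km(height_a_m, height_b_m):
    """Two stations can roughly 'see' each other up to the sum of their horizons."""
    return radio_horizon_km(height_a_m) + radio_horizon_km(height_b_m)

# Hypothetical heights: a handheld at about 1.5 m, a repeater antenna at 300 m.
handheld_to_handheld = link_range_km(1.5, 1.5)
handheld_to_repeater = link_range_km(1.5, 300.0)
print(round(handheld_to_handheld, 1), "km between two handhelds")
print(round(handheld_to_repeater, 1), "km from handheld to repeater")
```

Terrain, power and antenna quality all change the real answer, but the square-root relationship is why putting one antenna on a mountain helps everybody.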
There’s just one problem: stray signals. Should the repeater repeat just any old radio signal it hears, including noise? Sure, a lot of noise can be filtered out with “squelch” settings, but a sufficiently strong signal (because it’s either close to the repeater or far away and strong) will still make it through and “open up” the repeater.
So what’s the parallel? How many times have you lowered your voice while saying a known wake word so as to avoid invoking the AI you know is nearby? Maybe you’ve used the cover of background noise, hoping it couldn’t hear you. Maybe you tried saying it funny. Regardless, you tried masking your “noise” that you didn’t want the AI to recognize as a “valid signal”. Sound familiar?
Amateur radio’s got’chu, fam! This was solved long ago using sub-audible tones. They’re like telephone touchtones (“DTMF”), except listeners don’t hear them: they sit below the voice band that radios pass through to the speaker.
Most repeaters require an “access tone” to cause the repeater to re-transmit what it hears on its input frequency. The tone is mixed in with the user’s transmission and, as long as the tone is present, the repeater will re-transmit (repeat) on its output frequency the signal it hears on its input frequency. Users must know the repeater’s access tone and program their radio to mix it in with their transmission. The access tone might be 100 Hz or 88.5 Hz or any of a number of other common (or uncommon) tones. You can’t hear it, but the repeater can. It’s just a constant sine wave at the specified frequency.
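Since the access tone is just a constant sine wave added to the transmission, the mixing step can be sketched in a few lines. This is only an illustration on a plain list of audio samples, not how any particular radio implements it; the 8000 samples-per-second rate, the tone level and the function name are assumptions of mine.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)

def mix_access_tone(voice, tone_hz=100.0, tone_level=0.1, sample_rate=SAMPLE_RATE):
    """Return the voice samples with a constant low-level sine tone added."""
    return [
        sample + tone_level * math.sin(2 * math.pi * tone_hz * i / sample_rate)
        for i, sample in enumerate(voice)
    ]

# One second of silence stands in for the operator's voice.
silence = [0.0] * SAMPLE_RATE
transmitted = mix_access_tone(silence)
```

The tone runs for as long as the operator is transmitting, which is exactly what the repeater checks for before it repeats anything.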
Such tones are used for receiving as well. For example, a police and emergency scanner (a receive-only radio) will stay quiet until it hears a valid signal on one of its monitored channels. This is useful because there is always background noise at any radio frequency; without it, you’d hear nothing but static until a signal came through. You can adjust for the (always changing) noise floor using a squelch knob, but that’s not so convenient. Using a “squelch tone”, you can “open up” the receiver only if the signal it’s hearing is accompanied by a constant sub-audible tone at an agreed-upon frequency. But that’s kind of the opposite of what I’m proposing.
I’m proposing a “suppression tone”. That is, a universal standard sub-audible frequency that all AIs listen for and that causes them to ignore their wake words while it is present. Let’s say 100 Hz, borrowing a common repeater tone; whatever frequency gets standardized has to be one we can’t hear but speakers can easily reproduce. If any AI hears its wake word in the presence of that constant tone, it should ignore the wake word.
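On the listening side, the assistant would need to check for the tone in the same stretch of audio where it thinks it heard its wake word. Here is a minimal sketch of that check using the Goertzel algorithm, a standard way to measure the strength of a single frequency. The frequency, the threshold and the function names are all assumptions for illustration; no assistant exposes an interface like this today.

```python
import math

def tone_amplitude(samples, freq_hz, sample_rate):
    """Estimate the amplitude of one frequency using the Goertzel algorithm."""
    n = len(samples)
    if n == 0:
        return 0.0
    k = round(n * freq_hz / sample_rate)  # nearest frequency bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2 = s1
        s1 = s0
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    return 2.0 * math.sqrt(max(power, 0.0)) / n

def should_respond(wake_word_heard, samples, sample_rate=8000,
                   suppress_hz=100.0, threshold=0.02):
    """Ignore the wake word when the (hypothetical) suppression tone is present."""
    if not wake_word_heard:
        return False
    return tone_amplitude(samples, suppress_hz, sample_rate) < threshold

# Stand-in audio: a 440 Hz "voice", with and without a 100 Hz suppression tone.
rate = 8000
voice = [0.5 * math.sin(2 * math.pi * 440 * i / rate) for i in range(rate)]
masked = [v + 0.1 * math.sin(2 * math.pi * 100 * i / rate)
          for i, v in enumerate(voice)]
print(should_respond(True, voice), should_respond(True, masked))
```

A real device would run this over short windows of live audio and would need a threshold tuned against room noise, but the decision itself stays this simple: wake word plus tone means stay quiet.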
This would allow any commercial, movie, television show, etc. to mask their use of a wake word without pissing off their AI-owning viewers. Obviously there should be guidelines. They should only emit the tone while the wake words are being uttered. To do otherwise would render AIs useless as long as the content is playing.
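That guideline is easy to picture on the production side: the tone gets added only across the time spans where a wake word is spoken and nowhere else. A rough sketch, assuming the producer already knows those spans (in seconds) and using my own made-up function name, sample rate and tone level:

```python
import math

def mask_wake_words(audio, spans, sample_rate=8000, tone_hz=100.0, tone_level=0.1):
    """Add the suppression tone only inside the given (start_s, end_s) spans."""
    out = list(audio)
    for start_s, end_s in spans:
        start = max(0, int(start_s * sample_rate))
        end = min(len(out), int(end_s * sample_rate))
        for i in range(start, end):
            out[i] += tone_level * math.sin(2 * math.pi * tone_hz * i / sample_rate)
    return out

# One second of silent "programme audio" with a wake word from 0.25 s to 0.5 s.
programme = [0.0] * 8000
masked_programme = mask_wake_words(programme, [(0.25, 0.5)])
```

Everything outside the listed spans is left untouched, so the assistant stays usable for the rest of the content.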
This brings me to my final point.
Like those TV zapper devices that let bar patrons turn off annoying TVs by sending every known “TV off” infrared code, this could be easily abused by assholes. The FCC already has hefty fines for cell phone signal jammers so the same thing should exist for AI suppression tones. Otherwise a simple app for iOS, Android, etc. could easily emit this tone and annoy anyone gleefully shouting “Okay Google!” This may or may not be desirable but it’s certainly an asshole thing to do.
Conclusion / TL;DR
If we standardize on an inaudible (and so, unobtrusive) suppression tone and require its use in any content that uses known wake words, we can mitigate one of the biggest complaints about – and the potential abuse of – the ongoing proliferation of suggestible AI devices.