Speaking to Machines

If you use computers every day, you're probably pretty comfortable with a mouse and keyboard. Err, who am I kidding-"comfortable"? When it comes to mouse and keyboard, it's embarrassing how good we are. We're like the master swordsman Kyūzō in Seven Samurai, if you replace crossing deadly steel with clicking the delete button on marketing emails.

But this reflects a lot of training on our part. It's not because the mouse and the keyboard are freakishly efficient and easy to use; it's because some of us use them so much we can't help but be aces. But without some serious training time, these tools can be brutal.

Everything we take for granted when it comes to computer input is actually the legacy of older technology. Take QWERTY keyboards-common around the world, almost ubiquitous in the United States and parts of Europe. But why are the keys laid out like this? Is this really the best way to get character-based input into the machine? If you listen to some fans of the alternative Dvorak keyboard layout, you'd think QWERTY was a torture device designed by the demon Moloch inside a horrible fortress of chaos. Some Dvorak advocates claim that the alternative layout can increase your typing speed by up to 20 percent.

But why do so many computer keyboards use QWERTY in the first place? Turns out it's because if you go back half a century or so, most typewriters used QWERTY. Why did most typewriters use QWERTY? To keep their clunky mechanical keys from jamming. In other words, the original justification for QWERTY has long vanished, but millions of children every year are still working hard, not looking at the keys, trying to learn this archaic, inefficient method of getting the words in their brain translated into digital data.

The way to the future is to break our dependence on this arbitrary legacy-not just on keyboard layouts, but on keyboards themselves; not just on the two-button mouse, but on the whole mouse concept. So what's next?

A few things in the near future:

Gesture control. We're not waiting on some kind of brand new technological paradigm here - gesture control is already pretty amazing. Take the standard Xbox Kinect, which Microsoft released to the public in late 2010. It is, essentially, gesture control - cameras record your body's movements and translate those movements into data, which can be used to control games and apps for the Xbox. When the Kinect debuted, people went nuts. It quickly became, if we believe the folks at Guinness World Records, the "fastest-selling consumer electronics device" in world history. It's easy to see why gesture and movement control would open up lots of possibilities in gaming-especially in environments where you're supposed to dance, jump, swing a club or a paddle, or do anything involving large movements.

Microsoft has released a subsequent version of the Kinect for Windows, though it hasn't taken off with everyday consumers in the same way. Part of the issue is context: we're used to playing games from across the room, where a novel, camera-based interface feels natural. When we're doing our everyday computing for work or recreation, we expect to sit close to the screen-in fact, too close for the Kinect for Windows to register us.

What does a real upgrade in computer gesture control look like? I think one example is demonstrated in John Underkoffler's TED talk from a few years back:

It's easy to see how gesture control like this could be a huge upgrade for some users-say, anyone trying to manipulate objects in a 3D virtual space. Imagine you're an engineer designing a new desalination pump in a CAD program. Wouldn't it be great if you could manipulate the parts of this design with six degrees of freedom, as if you were actually reaching straight into the screen? Same goes if you want to sculpt a smooth 3D object out of digital modeling clay.

But when it comes to other actions, especially abstract ones like the entry of text data, gesture control doesn't seem poised to overtake even something as legacy-bound as the keyboard anytime soon. But perhaps something else could ...

Voice/speech recognition. Like gesture control, speech recognition has already made it into some widely available consumer technology, and you could argue that what we're waiting on is some attainable degree of usefulness rather than a seriously major breakthrough. When you call your bank or insurance company these days, you might navigate automated menus by speaking commands into the receiver rather than by punching numbers on a touch-tone keypad. "Payment. PAY-MENT." While this system isn't really much easier than the old way, it does represent a huge investment of money and research. Just getting a computer to register the sounds coming out of your mouth is far harder than it seems, as we discuss in our podcast on voice recognition. And keep in mind that recognizing a single word is not the same as recognizing speech.

On the iPhone 4S we first met Siri, the voice-activated digital assistant who can respond to questions like "Should I wear a coat today?" and "Where can I hide a body?" (wry suggestions include "swamps," "dumps," and "metal foundries"). Not to take sides in a tech industry war, but raw word-by-word voice recognition isn't the only thing that makes Siri somewhat more popular than Clippy. Siri is also fairly smart-some would say downright useful-because she has moderately powerful language-processing capabilities. Not only can she recognize the words you say; she can often make sense of the sentences you speak and do something helpful in response.

The example I used above - "Should I wear a coat today?" - is a good one, though I suspect Siri's ability to respond to this question with weather information is the result of a lot of hard-coding of responses on the back end. In other words, my guess is that someone brute-force programmed Siri to understand that questions about coats and umbrellas should get responses about local weather. You can see the seams in this approach: as many YouTube videos demonstrate, it's possible to confuse Siri in lots of funny ways just by asking unpredictable questions that don't translate easily to keyword-based commands. As smart as she is, she doesn't really understand English grammar and the meanings of sentences in a deep way - she responds to certain keywords and frequently asked questions that were easy for her designers to anticipate.
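To see why keyword matching breaks down on unpredictable phrasings, here's a toy sketch of the kind of brute-force intent mapping I'm guessing sits behind the coat-and-umbrella trick. To be clear, this is purely illustrative: the intent names and keyword lists are hypothetical, not anything from Apple's actual system.

```python
# Toy keyword-based "intent matching": map an utterance to a canned
# category by checking for keyword overlap. All names here are made up.

INTENTS = {
    "get_weather": {"coat", "umbrella", "rain", "weather", "jacket"},
    "set_alarm": {"alarm", "wake", "remind"},
}

def match_intent(utterance: str) -> str:
    """Return the first intent whose keyword set overlaps the utterance."""
    words = set(utterance.lower().replace("?", "").split())
    for intent, keywords in INTENTS.items():
        if words & keywords:  # any shared word counts as a match
            return intent
    return "fallback"  # the "I don't understand" response

print(match_intent("Should I wear a coat today?"))
# -> get_weather: "coat" is on the list

print(match_intent("Is it the kind of day where I'd regret short sleeves?"))
# -> fallback: same meaning, but no keyword matches
```

The second query means roughly the same thing as the first, but because it dodges every keyword, a system like this draws a blank - which is exactly the failure mode those YouTube videos exploit.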

Deeper language comprehension is where I think many of the advances in future speech recognition will come from. A field called natural language processing is dedicated to getting us there. Once a computer can reliably translate the sounds you make with your mouth into text data, how does it make sense of that data? Will you ever be able to talk to your computer in a way that's not constrained by command protocols, but as free as the way you speak to your co-workers?

Both gesture control and speech recognition are already well underway, and they're only getting better. But what lies in the far future for human-computer interfacing? Is there something we can use to control computers even more quickly and seamlessly than our hands and our voices?

Yes. I'll give you a hint: It weighs about 3 pounds, and it looks like Krang.

Brain-computer interfaces are not just creepy sci-fi. We already have them. And while these capabilities are much more primitive than what we can already do with gesture or voice, closing that gap may also be a simple matter of degree rather than a goal hidden behind some unknown fundamental breakthrough.

Multiple experiments have successfully trained monkeys to use neural implants-that is, impulse-sensing electrodes surgically inserted into their brains-to manipulate digital information, doing everything from moving computer cursors to controlling robotic arms. As for the latter, we're no longer in simian-only test territory. For example, researchers at the University of Pittsburgh Medical Center have used neural implants to help a woman with total paralysis below the neck feed herself with a mechanical arm wired up to her brain. If the idea of having a wire poking out of your head seems too frightening, take a look at this upgrade from researchers at Brown: a rechargeable, self-contained implant that is fully wireless.

In the immediate future, these advances will be most useful to people who have lost the use of their limbs, but far enough down the road, will the temptation to control our electronic devices wirelessly with our thoughts prove irresistible? In 1960, you had to walk across the room to change the channel on your TV. In 2010, you had to reach for the remote and press a button. Within 50 or 100 years, will even the remote control be an archaic chore?