Statistically Speaking, Robot Vision Is Hard

Jonathan Strickland

Next time a robot calls you a "feeble human," just ask it to pick out the one tangerine in a pile of assorted fruits by sight alone. Who's feeble now? (Image credit: iStock/Thinkstock)

You've probably seen -- or maybe even built -- robots that use sensors to gather information about their environment. These robots can detect objects in their path and steer around them (or sometimes plow right through them). But these basic robots aren't "seeing" anything -- they're sensing an obstacle rather than identifying it. As it turns out, teaching robots to see the way we do naturally is tough.

Let's imagine a quick scenario: You're searching for a particular object amongst a clutter of other stuff. For me, it's nearly always my master remote control for my man-cave entertainment system. But I've accumulated so many whozits and whatsits (not to mention gadgets and gizmos) that it can take me a while to find the remote. Fortunately, I can distinguish between objects by sight pretty efficiently, no matter what orientation they might be in.

That's quite a trick, as it turns out. I can recognize my remote whether it's face up, face down, upside down or partially covered by my latest copy of Ukulele International. But imagine teaching that same skill to a robot. Suddenly this completely natural ability we humans possess becomes a head-scratchingly difficult problem.

Over at MIT, graduate student Jared Glover is exploring a new way to teach robots how to see. Glover's method builds on a decades-old statistical tool called the Bingham distribution, a probability distribution that's well suited to describing uncertainty about orientation. Working with former MIT student (and current Google employee) Sanja Popovic, Glover designed an algorithm that helps robots identify objects through a probabilistic approach.
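If you're curious what that distribution looks like in practice, here's a minimal sketch (mine, not Glover and Popovic's code) of evaluating an unnormalized Bingham density with NumPy. The Bingham distribution assigns probabilities to points on a unit sphere -- including unit quaternions, the four-number representation of a 3-D rotation -- and it gives x and -x the same probability, which is convenient because the quaternions q and -q describe the same rotation.

```python
import numpy as np

def bingham_unnormalized(x, V, z):
    """Unnormalized Bingham density: exp(sum_i z_i * (v_i . x)^2).

    x : a unit vector, e.g. a quaternion as a 4-vector
    V : matrix whose columns v_i are orthogonal concentration directions
    z : nonpositive concentration values; more negative = more peaked
    """
    x = x / np.linalg.norm(x)           # the density lives on the unit sphere
    return np.exp(np.sum(z * (V.T @ x) ** 2))

# Example: a distribution sharply peaked near the rotation [1, 0, 0, 0].
V = np.array([[0., 0., 0.],
              [1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])            # directions orthogonal to the peak
z = np.array([-50., -50., -50.])        # tight concentration
print(bingham_unnormalized(np.array([1., 0., 0., 0.]), V, z))  # 1.0 at the peak
print(bingham_unnormalized(np.array([0., 1., 0., 0.]), V, z))  # nearly 0 far away
```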

Here's how it works -- first, you have to teach the robot about objects, like my remote control. You build a virtual model of the remote control, something like a grid outline. The robot then knows the proportions of the remote control, what its edges look like, and which parts of it are flat and which are curved. Then you show the robot a cluttered scene (like my coffee table).
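As a rough illustration of what such a model might contain -- a hypothetical sketch, not the actual training format -- imagine a small point cloud with a surface normal at each point, so flat regions and curved regions look different to the robot:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ObjectModel:
    name: str
    points: np.ndarray   # (N, 3) points sampled from the object's surface
    normals: np.ndarray  # (N, 3) unit surface normals at those points

def build_remote_model() -> ObjectModel:
    # Sample the flat top face of a roughly 16 x 5 cm remote as a coarse
    # grid sitting 2 cm above the table plane. (Dimensions are made up.)
    xs, ys = np.meshgrid(np.linspace(0.0, 0.16, 9), np.linspace(0.0, 0.05, 4))
    points = np.column_stack([xs.ravel(), ys.ravel(), np.full(xs.size, 0.02)])
    normals = np.tile([0.0, 0.0, 1.0], (len(points), 1))  # flat face: normals point up
    return ObjectModel("remote", points, normals)
```

A curved part of the object would show up as neighboring points whose normals fan out instead of all agreeing.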

The robot then looks for objects that match the model it has learned, comparing surfaces, edges, corners and other distinguishing features of the model against the objects on the table. If an object matches the model with a high enough degree of certainty, the robot identifies it as the one you were looking for.
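To make that accept-or-reject step concrete, here's a deliberately simplified sketch (again mine, not the published algorithm): score each candidate's features against the model's, then accept the best match only if the score clears a confidence threshold. The feature pairing and the 0.9 cutoff are illustrative assumptions.

```python
import numpy as np

def match_score(model_normals: np.ndarray, scene_normals: np.ndarray) -> float:
    # Crude agreement measure: how well paired-up surface normals align.
    # Assumes features are already in correspondence; a real matcher would
    # also compare positions, edges, corners, and curvature.
    dots = np.sum(model_normals * scene_normals, axis=1)
    return float(np.mean(np.clip(dots, 0.0, 1.0)))

def identify(model_normals, scene_objects, threshold=0.9):
    """Pick the best-matching object in the scene, or None if nothing
    clears the confidence threshold (i.e., the remote isn't there)."""
    scores = {name: match_score(model_normals, normals)
              for name, normals in scene_objects.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```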

This differs from other approaches to robot vision that rely on high-resolution imagery and complicated algorithms. And the best part is that it seems to work! Glover and Popovic's algorithm correctly identified objects more often than the leading methods used in robotics today.

Of course, teaching a robot to see is still an arduous task. We can't just hold up a representative object -- like my remote control -- and expect the robot to identify all remote controls based on that one example. But every step forward brings us closer to a world where robots can navigate and interact with their environments much the way we do.