A human child is able to reliably grasp objects
by about one year of age, and takes around four
more years to acquire sophisticated precision grasps.
However, networked robots can instantaneously
share their experience with one another, so
if we dedicate 14 separate robots to the job
of learning grasping in parallel, we can acquire
the necessary experience much faster.
Google research scientists are working on
implementing this concept.
Initially the grasps are executed at random
and succeed only rarely. Each day, however,
the latest experiences are used to train a deep
convolutional neural network (CNN) to predict
the outcome of a grasp, given a camera image
and a potential motor command.
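
As a rough illustration, a grasp-outcome predictor of this kind might be structured as below. This is a minimal sketch in PyTorch under assumed shapes: the class name, layer sizes, and the seven-dimensional motor command are illustrative choices, not the architecture the researchers actually used.

```python
import torch
import torch.nn as nn

class GraspPredictor(nn.Module):
    """Predicts the probability that a candidate motor command,
    executed from the state shown in the camera image, will end
    in a successful grasp."""

    def __init__(self, command_dim=7):
        super().__init__()
        # Convolutional trunk: encodes the camera image.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Small MLP: encodes the candidate motor command
        # (e.g. a proposed change in gripper pose).
        self.command = nn.Sequential(
            nn.Linear(command_dim, 64), nn.ReLU(),
        )
        # Head: fuses both encodings into a success probability.
        self.head = nn.Sequential(
            nn.Linear(64 + 64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, image, motor_command):
        z = torch.cat([self.vision(image),
                       self.command(motor_command)], dim=-1)
        return torch.sigmoid(self.head(z))  # P(successful grasp)
```

Training then reduces to binary classification: each logged attempt supplies an image, the command that was executed, and a success or failure label, so the network can be fit to each night's accumulated data with a standard cross-entropy loss.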
This CNN is then deployed on the robots the
following day, in the inner loop of a servoing
mechanism that continually adjusts the robot’s
motion to maximize the predicted chance of
a successful grasp.
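
One way such an inner loop could work, continuing the sketch above: repeatedly sample candidate motor commands, score each with the CNN, and execute the most promising one. The uniform sampling range and candidate count here are placeholders, standing in for whatever optimizer the deployed system actually uses.

```python
import torch

def servo_step(model, image, num_candidates=64, command_dim=7):
    """One iteration of the servoing inner loop: sample candidate
    motor commands, score each with the grasp-outcome CNN, and
    return the command with the highest predicted success."""
    # Hypothetical sampling range; a real controller would sample
    # around the current gripper pose, within safety limits.
    candidates = torch.empty(num_candidates, command_dim).uniform_(-1.0, 1.0)
    images = image.expand(num_candidates, -1, -1, -1)
    with torch.no_grad():
        scores = model(images, candidates).squeeze(-1)
    best = torch.argmax(scores)
    return candidates[best], scores[best].item()
```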
In essence, the robot is constantly predicting,
by observing the motion of its own hand, what
kind of subsequent motion will maximize its
chances of success.
The result is continuous feedback: what we
might call hand-eye coordination.
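
Put together with the sketches above, that feedback loop might read as follows. Here `camera`, `execute`, and `close_gripper` are hypothetical stand-ins for the robot's sensing and actuation interfaces, and the 0.9 threshold is purely illustrative.

```python
import torch

# Hypothetical stand-ins for the robot's real interfaces.
def camera():
    return torch.zeros(1, 3, 64, 64)   # placeholder camera frame

def execute(command):
    pass                               # send the motor command to the arm

def close_gripper():
    pass                               # commit to the grasp

model = GraspPredictor()               # weights from the latest nightly run

for _ in range(20):                    # bounded loop for the sketch
    image = camera()                   # observe hand and scene
    command, p_success = servo_step(model, image)
    if p_success > 0.9:                # confident enough: grasp now
        close_gripper()
        break
    execute(command)                   # move a little, then re-observe
```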
Neural networks have made great strides in
allowing us to build computer programs that
can process images, speech, text, and even
draw pictures.
However, introducing actions and control adds
considerable new challenges, since every decision
the network makes will affect what it sees
next.
Overcoming these challenges will bring us
closer to building systems that understand
the effects of their actions in the world.
If we can bring the power of large-scale machine
learning to robotic control, perhaps we will
come one step closer to solving fundamental
problems in robotics and automation.
