Think of a familiar face— maybe that of a
classmate, a family member, or even a
famous celebrity— and ask yourself how
you recognize it. This question, known as
the facial recognition problem, might
seem easy to answer.
However, it's surprisingly difficult to
say exactly how you use visual
information to identify someone! You are
not alone! Computer scientists struggle
to develop facial recognition software
using traditional computer programs,
which are essentially step-by-step
instructions. It would be near-impossible
for them to develop instructions that
are general enough for all faces, yet
specific enough to identify an
individual among thousands of others.
Therefore, we need a different approach,
one for tasks that are too complex or
subjective for someone to translate into
a series of steps. Often this approach is
a machine learning algorithm. This type
of algorithm, or problem-solving method,
learns from large sets of known correct
and incorrect examples to gradually
improve its effectiveness. To help
explain machine learning, let's consider
a simple problem:balancing a stick. At
first, we might try to balance the stick
using random movements. This approach
won't work, but it helps us develop a
feel for which movements will recenter
the stick as it starts falling. After
trying again and again, we become
excellent at balancing the stick. Drawing
from thousands of trials, we produce an
intuitive set of rules that maximizes
how long the stick remains upright. What's key is that we've learned to
balance the stick without explicit
instructions— we only knew that our goal
was to balance the stick. This is roughly
how reinforcement learning, a type of
machine learning, is used to find
solutions to difficult problems. To
approach the facial-recognition problem,
scientists often use artificial neural
networks. This machine learning method is
based on webs of simple computational
units that work together to solve
complex problems. Each unit has a simple
task, outputting true or false based on
several inputs. These units are grouped
into three categories: the input layer,
the hidden layers, and the output layer.
The input layer reads an image of
someone's face. Each unit in this layer
uses a single pixel's color as its input.
The hidden layers recombine the input
layer's outputs. The units in each hidden
layer draw from the previous layer's
units to create new outputs. The meaning
behind the hidden layers' outputs is
often too abstract for humans to
understand. Finally, the output layer
gathers the last hidden layers outputs
and translates them into understandable
results. For facial recognition, the
output could consist of a single unit
whose output says whether the image
matches a given person's face. Once it's
set up, the neural network can learn from
incorrect guesses by adjusting how its
units react to their inputs. This process,
called training usually unfolds over
millions of attempts. Training gradually
improves the network until it's accurate
enough to be used for real-world face
matching with unknown data. Since machine
learning requires little task-specific
expertise, it can be applied to a wide
variety of challenges. Tasks like
translating between languages,
identifying objects and images, and
transcribing speech are where machine
learning excels. Other emerging
applications include medical diagnosis,
financial decision-making and
self-driving cars. These applications are
likely the first of many more to come.
 
