AI today is being successfully
applied to image and video data,
to language data, to speech data,
and to many other areas.
In this video, you'll
see a survey of
AI applied to these
different application areas,
and I hope that
this may spark some ideas
of how you might be able
to use these techniques
someday for your own
projects as well.
Let's take a look. One
of the major successes
of deep learning has
been Computer Vision.
Let's take a look at
some examples of
computer vision applications.
Image classification and
object recognition refer to
taking as input a picture like
that and telling us what
is in this picture.
In this case, it'd be a cat.
Rather than just
recognizing cats,
I've seen AI algorithms able to
recognize specific
types of flowers,
and AI able to recognize
specific types of food.
The ability to take a picture
as input and classify it
by the type of object it contains
is being used in many applications.
One specific type of
image classification
that has had a lot of
traction is face recognition.
This is how face recognition
systems today work.
A user might register
one or more pictures of
their face to show the AI
what they look like.
Given a new image,
the AI system can then say
is this the same person?
Is this you?
Or is this a different person?
It can then make a decision:
unlock the door,
unlock the cell phone,
unlock the laptop, or something
else based on the identity
of the person.
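One common way such verification systems are built is to map each face image to a vector and compare vectors. The sketch below is a toy illustration of that idea, not the exact method used by any real system: the `embed` function is a hypothetical stand-in for a deep network, and the pixel lists and threshold are made-up example values.

```python
import math

# Toy sketch of face verification. In a real system, "embed" would be a
# deep network mapping a face image to a vector; here it just normalizes
# a small list of numbers standing in for an image.
def embed(pixels):
    norm = math.sqrt(sum(p * p for p in pixels)) or 1.0
    return [p / norm for p in pixels]

def same_person(registered, new_image, threshold=0.9):
    # Cosine similarity between the stored embedding and the new one;
    # above the threshold, treat it as the same person.
    a, b = embed(registered), embed(new_image)
    similarity = sum(x * y for x, y in zip(a, b))
    return similarity > threshold

registered_face = [0.2, 0.8, 0.5, 0.1]
print(same_person(registered_face, [0.21, 0.79, 0.5, 0.12]))  # True (very similar)
print(same_person(registered_face, [0.9, 0.1, 0.05, 0.7]))    # False (different)
```

The system then makes its decision, such as unlocking a device, based on this yes/no answer.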
Of course, I hope
face recognition will only
be used in ways that respect
individuals' privacy;
we'll talk more about AI in
society next week as well.
A different type of
computer vision algorithm
is called object detection.
So, rather than just trying to
classify or recognize an object,
you're trying to detect
if and where the object appears.
For example, in building
a self-driving car,
we've seen how an AI system
can take as input a picture
like this and not just tell
us yes or no, is there a car,
yes or no, is there a pedestrian,
but actually tell
us the positions of
the cars as well as
the positions of the
pedestrians in this image.
An object detection
algorithm can also
take as input a picture
like that and just say,
no, I'm not finding any cars
or any pedestrians in that image.
So rather than taking
a picture and labeling
the whole image which is
image classification,
instead, an object
detection algorithm
will take as input
an image and tell us
where in the picture
the different objects are,
as well as what the types
of those objects are.
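As an illustration of that difference, the sketch below shows the shape of an object detector's output: a list of labeled boxes with confidence scores, rather than one label for the whole image. The boxes, labels, and scores are invented example values, not real model output.

```python
# Sketch of what an object detector returns: a list of detections, each
# with a class label, a bounding box (left, top, right, bottom), and a
# confidence score. These values are made up for illustration.
detections = [
    {"label": "car", "box": (40, 60, 180, 140), "score": 0.92},
    {"label": "pedestrian", "box": (200, 50, 240, 160), "score": 0.88},
    {"label": "car", "box": (15, 70, 55, 110), "score": 0.31},
]

def confident_detections(detections, threshold=0.5):
    # Keep only detections the model is reasonably sure about.
    return [d for d in detections if d["score"] >= threshold]

for d in confident_detections(detections):
    print(d["label"], "at", d["box"])
```

An empty list would correspond to the "no cars or pedestrians found" case described above.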
Image segmentation takes
this one step further.
Given an image like this,
an image segmentation
algorithm will output
not just where the cars and
pedestrians are, but,
for every single pixel,
whether that pixel is part of
a car or part of a pedestrian.
So it doesn't just draw
rectangles around
the objects it detects;
instead it draws
very precise boundaries
around the objects that it finds.
So, in reading
x-rays for example,
it would be an image
segmentation algorithm
that could look at
an x-ray scan or
some other image of
a human body and
carefully segment out,
where's the liver or
where's the heart
or where is the bone
in this image.
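To make "a label for every pixel" concrete, here is a toy sketch of a segmentation output. The tiny 4x4 mask and the class numbering below are made up purely for illustration.

```python
# Sketch of a segmentation output: one class label per pixel, here on a
# tiny 4x4 "image". 0 = background, 1 = car, 2 = pedestrian.
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 0, 0, 0],
    [2, 0, 0, 0],
]

def pixels_per_class(mask):
    # Count how many pixels were assigned to each class.
    counts = {}
    for row in mask:
        for label in row:
            counts[label] = counts.get(label, 0) + 1
    return counts

print(pixels_per_class(mask))  # {0: 10, 1: 4, 2: 2}
```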
Computer vision
can also deal with
video and one application
of that is tracking.
In this example, rather than just
detecting the runners
in this video,
it is also tracking
how the runners are
moving over time.
So, those little tails below
the red boxes show
how the algorithm
is tracking different people
running across several
seconds of the video.
So, the ability to track
people and cars and
maybe other moving objects
in a video helps
a computer figure out
where things are going.
If you're using a video camera to
track wildlife for example,
say birds flying around,
a tracking algorithm will
also help you track
individual birds flying across
the frames of your video.
These are some of
the major areas of
computer vision and
perhaps some of
them will be useful
for your projects.
AI, and deep learning
specifically, is also
making a lot of progress in
Natural Language Processing.
Natural Language
Processing or NLP
refers to AI understanding
natural language,
meaning the language
that you and I might
use to communicate
with each other.
One example is text
classification
where the job of the AI
is to take as input a piece of
text, such as an email, and tell
us the class or category
of that email,
such as spam or non-spam.
There are also
websites that take as
input a product description.
For example, you might write,
"I have a secondhand cellphone
for sale," and they automatically
figure out
the product category in
which to list the product.
So, that would go
under cellphones
or electronics or if you write,
I have a new t-shirt
to sell then it
would list it automatically
under clothing.
One type of text
classification that has had
a lot of attention is
sentiment recognition.
For example, a sentiment
recognition algorithm can take as
input a review of a restaurant
like "The food was good,"
and automatically try to tell
us how many stars
this review might get.
"The food was good" is
a pretty good review;
maybe that's a four out of
five-star review.
Whereas if someone writes,
"Service was horrible,"
then the sentiment recognition
algorithm should
be able to tell us
that this corresponds maybe
to a one-star review.
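A real sentiment system would be a trained model; the toy keyword lookup below only illustrates the input and output of the task. The word lists and the mapping from scores to stars are made up for this sketch.

```python
# Toy sketch of sentiment recognition: count positive vs. negative
# keywords and map the result to a rough star rating. A real system
# would learn this from labeled reviews instead.
POSITIVE = {"good", "great", "delicious"}
NEGATIVE = {"horrible", "bad", "awful"}

def estimate_stars(review):
    words = review.lower().replace(".", "").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return 4   # roughly a four-star review
    if score < 0:
        return 1   # roughly a one-star review
    return 3       # neutral when no sentiment words are found

print(estimate_stars("The food was good"))     # 4
print(estimate_stars("Service was horrible"))  # 1
```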
A second type of NLP or
Natural Language Processing
is information retrieval.
Web search is perhaps
the best known example of
information retrieval
where you type
in a text query and you want
the AI to help you find
relevant documents.
Many corporations will also have
internal information
retrieval systems
where you might have
an interface to
help you search just within
your company's set of
documents for something relevant
to a query that you might enter.
Named entity recognition is
another natural language
processing technology.
Let's illustrate it
with an example.
Say you have this sentence
and you want to find
all the people names
in the sentence.
So, Queen Elizabeth II
is a person;
Sir Paul McCartney is a person.
So, given the sentence
"Queen Elizabeth II knighted
Sir Paul McCartney
for services to music at
Buckingham Palace,"
a named entity
recognition system
can find all the people's
names in the sentence like this.
If you want to find
all the location names,
all the place names in
a sentence like that,
a named entity recognition
system can also do so.
Named entity recognition
systems can also
automatically extract
names of companies,
phone numbers, names
of countries, and so on.
If you have a large document
collection and you want to
automatically find
all the company names,
or all the company
names that occur
together, or all the
people's names,
then a named entity
recognition system
would be the tool you
could use to do that.
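As a toy illustration of the task's input and output (not how real systems work, which learn to recognize names from context), the sketch below just matches a small hand-made list of names against a sentence.

```python
# Toy sketch of named entity recognition: look for known person names
# in a sentence. The name list and example sentence are made up for
# illustration; real NER models generalize to names they never saw.
KNOWN_PEOPLE = {"Queen Elizabeth II", "Paul McCartney"}

def find_people(sentence):
    return [name for name in KNOWN_PEOPLE if name in sentence]

text = "Queen Elizabeth II knighted Paul McCartney for services to music."
print(sorted(find_people(text)))  # ['Paul McCartney', 'Queen Elizabeth II']
```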
Another major AI application
area is machine translation.
So, for example, if
you see this sentence
in Japanese, AI [inaudible].
Then hopefully a machine
translation system can take
that as input and output the translation,
"AI is the new electricity."
The four items on this slide:
text classification,
information retrieval,
named entity recognition,
and machine translation, are
four major categories of
useful NLP applications.
If you work with an NLP team
you may also hear them talk
about parsing and
part-of-speech tagging technologies.
Let me tell you what these are.
Let's take the example sentence,
"The cat on the mat".
A part-of-speech tagging
algorithm will go through
all the words and
tell you which of
these words are nouns,
which of these words
are verbs, and so on.
For example, in the
English language, "cat"
and "mat" in this
sentence are nouns.
So, the part-of-speech
tagger will
label these two words as nouns.
According to the theory
of the English language,
the word "the" is a determiner.
Don't worry if you've never
heard of a determiner before;
it's a term from the theory
of the English language.
And the word "on" is a preposition.
So, the part-of-speech
tagger will label
these words like that.
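Here is a minimal sketch of that tagging step, using a tiny hand-built dictionary covering just this sentence. A real tagger also uses context, since many English words can take several different tags.

```python
# Toy part-of-speech tagger: a dictionary lookup per word. The tag
# dictionary below covers only "The cat on the mat" and is made up
# for illustration.
TAGS = {"the": "determiner", "cat": "noun", "on": "preposition", "mat": "noun"}

def tag_sentence(sentence):
    # Return (word, tag) pairs, falling back to "unknown" for new words.
    return [(w, TAGS.get(w.lower(), "unknown")) for w in sentence.split()]

print(tag_sentence("The cat on the mat"))
# [('The', 'determiner'), ('cat', 'noun'), ('on', 'preposition'),
#  ('the', 'determiner'), ('mat', 'noun')]
```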
Well, why do you care?
If you're building
a sentence classifier
for restaurant reviews,
then a part-of-speech tagging
algorithm would
be able to tell you which
are the nouns, which
are the verbs,
which are the adjectives,
which are the adverbs,
and so on, and therefore,
help your AI system
figure out which of the words
to pay more attention to.
For example, you
should probably pay
more attention to the nouns
since those seem like important
words. Maybe the verbs.
Certainly the adjectives,
words like good, bad,
delicious are adjectives, and
your AI system may learn
to ignore the determiners.
Words like "the"
may matter less in
terms of how a user
actually feels
about the restaurant.
A part-of-speech
tagging system is
usually not a final application.
You hardly ever wake up
in the morning and think,
"Boy, I wish I could get
all the words in
my sentence tagged."
Instead, it's often an important
pre-processing step,
an intermediate step
in a longer AI pipeline,
where the first step is
part-of-speech tagging,
or parsing, which we'll
talk about in a second,
and then the later steps are
an application like
sentence classification,
or machine translation,
or web search.
Now, what is a parser?
Given these five words,
a parser helps group the
words together into phrases.
For example, the cat is a phrase,
and the mat is a phrase.
So, a parser will draw
these lines on top
of the words to say,
those words go together.
On the mat is another phrase.
Finally, the two
phrases, the cat,
as well as on the mat,
these two phrases
are then combined
to form the overall sentence.
So, this thing that
I drew on top with
the sentence tells you
what words go with what words,
and how the different words
relate to each other.
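One simple way to write down such a grouping is as nested tuples, as in the sketch below; this particular nesting is just one illustrative way to bracket "the cat on the mat", and the helper only flattens the tree back into the word sequence it covers.

```python
# Sketch of a parse for "the cat on the mat": the parser's output groups
# words into phrases. Here the grouping is written as nested tuples:
# ("the", "cat") is one phrase, and "on" combines with ("the", "mat").
parse = (("the", "cat"), ("on", ("the", "mat")))

def words_in(node):
    # Flatten a parse tree back into the word sequence it covers.
    if isinstance(node, str):
        return [node]
    return [w for child in node for w in words_in(child)]

print(words_in(parse))  # ['the', 'cat', 'on', 'the', 'mat']
```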
A parsing algorithm is
also not usually a final
end-user product;
it's a commonly used step
that helps other AI algorithms,
such as those that classify text,
translate it, and so on.
Modern AI, specifically
deep learning
has also completely transformed
how software processes
audio data such as speech.
How is speech represented
in a computer?
This is an audio
waveform of one of
my friends saying the phrase
machine learning.
The x-axis here is time,
and the vertical axis
is what the microphone
is recording.
What the microphone is
recording is little variations,
very rapid variations
in air pressure,
which your ear and your brain
then interpret as sound.
This plot shows, as a function
of time on the horizontal axis,
how the air pressure
changes very
rapidly in response to
someone saying the phrase
"machine learning."
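As a sketch of that data format, the snippet below builds a toy "waveform" as a plain list of air-pressure samples measured at a fixed rate. The 440 Hz sine wave and 8 kHz sample rate are stand-in values for illustration, not measurements from the recording described here.

```python
import math

# Sketch of how audio reaches a speech system: a list of air-pressure
# samples taken at a fixed rate. A pure 440 Hz tone stands in for a
# real recording of someone speaking.
sample_rate = 8000   # samples per second (8 kHz)
duration = 0.01      # 10 milliseconds of audio
samples = [math.sin(2 * math.pi * 440 * t / sample_rate)
           for t in range(int(sample_rate * duration))]

print(len(samples))  # 80 samples for 10 ms at 8 kHz
```

A speech recognition system takes a (much longer) list like this as input and outputs the words that were said.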
The problem of
speech recognition,
also known as speech-to-text,
is the problem of taking as
input a plot like this
and figuring out what
words someone said.
A lot of speech recognition's
recent progress
has been due to deep learning.
One particular type
of speech recognition
is trigger word detection
or wakeword detection.
You saw this in
an earlier video: having
an AI system detect
a trigger word or
wakeword such as Alexa,
or Hey Google, or Hey device.
Speaker ID is a specialized
speech problem where
the task is to listen to someone
speak and figure out
the identity of the speaker.
Just as face recognition helps
verify your identity
by taking a picture,
speaker ID can also help verify
your identity by
listening to you speak.
Finally, speech
synthesis, also called
text-to-speech or TTS is also
getting a lot of traction.
Text-to-speech is
the problem of taking as input
a sentence written in text
and turning it
into an audio file.
Interestingly, whereas,
text-to-speech is
often abbreviated TTS,
I don't often see
speech-to-text abbreviated STT.
One quick example.
Let's take the sentence,
"The quick brown fox
jumps over the lazy dog."
This is a fun sentence that
you often see NLP people use
because this sentence contains
every single letter from A to Z.
So, that's A, B, C, all the
way up to X, Y, and Z.
You can check that all 26 letters
appear in this sentence.
Some letters appear
more than once.
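That claim is easy to verify with a few lines of Python:

```python
import string

# "The quick brown fox jumps over the lazy dog" is a pangram: it
# contains every letter of the alphabet. Intersecting its letters with
# a-z and counting them confirms this.
sentence = "The quick brown fox jumps over the lazy dog"
letters_used = set(sentence.lower()) & set(string.ascii_lowercase)
print(len(letters_used))  # 26
```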
If you pass this sentence
into a TTS system,
then you might get an
audio output like this:
"The quick brown fox jumps
over the lazy dog."
Modern TTS systems
are increasingly
sounding more and more
natural and human-like.
AI is also applied to
many applications in
robotics and you've already seen
one example in
the self-driving car.
In robotics, the term
perception means figuring
out what's in
the world around you
based on the senses you have,
be it cameras, or
radar, or lidar.
Shown on the right is
the 3D laser scan, or
lidar scan, of
a self-driving car, as well
as the vehicles that
the self-driving car in the
middle has detected in
its vicinity.
Motion planning refers to
finding a path for
your robot to follow.
So, if your car wants
to make a left turn,
the motion planner might
plan a path as well
as a speed for the car to
make a left turn that way.
Finally, control refers
to sending commands
to the motors such as
your steering wheel motor,
as well as your gas pedal,
and brake motors in order to make
the car smoothly follow
the path that you want.
On this slide, I'll focus on
the software and
the AI aspects of robotics.
Of course, there's also a lot
of important work being
done to build hardware
for robotics as well.
But a lot of the AI work on
perception, motion planning,
and control has focused
on the software
rather than the hardware
of robotics.
In addition to these
major application areas,
machine learning is
also very broadly used.
The examples you've seen in
this video relate mainly to
unstructured data such as
images, audio, and text.
Machine learning is applied at
least as much to structured data,
meaning tables of
data, some of which you saw
in the earlier videos.
But because unstructured
data such as
images is so easy for
humans to understand,
there's something very universal,
very easy for any person to
understand and empathize with
when we talk about an AI system
that recognizes a cat.
So, the popular press tends
to cover AI progress on
unstructured data much more than
it does AI on structured data.
Structured data also tends to be
more specific to
a single company,
so it's harder for people to
write about or understand.
But AI on structured data,
or machine learning on
structured data, is creating
tremendous economic value today,
as is AI on
unstructured data.
I hope this survey of
AI application areas
gives you a sense of
the wide range of data
that AI is successfully
applied to today,
and maybe this even inspires
you to think of how some of
these application areas may be
useful for your own projects.
Now, so far the one AI technique
we've spent the most time talking
about is supervised learning.
That means learning
input-to-output,
or A-to-B, mappings from
labeled data, where you
give the AI system both A and B.
But that's not the only
AI technique out there.
In fact, the term supervised
learning almost invites
the question of
what is unsupervised learning,
or you might also have
heard from media articles,
from the news about
reinforcement learning.
So, what are all these
other techniques?
In the next video,
the final optional video
for this week,
we'll do a survey
of AI techniques,
and I hope that through that,
maybe you'll see whether some of
these other AI techniques,
alongside supervised learning,
could be useful for your projects as well.
Let's go on to the final
optional video for the week.
