My name is Kristinn Thórisson; I'm a professor here at the school of computer science, and I wanted to tell you about some of the progress that my collaborators and I have made on the issue of artificial general intelligence in the past five, six, seven years. I have worked with a number of people over the past few years, and here are at least some of them; you will of course be able to look at my collaborators' publication lists if you Google any of these names.
We've already talked a little about what AGI means, and there's a nuance in there that I would like to address very briefly before I show you some results. You may be familiar with Google's attempts at creating self-driving cars, and frankly, if you've been paying attention, you will probably know my opinion of where this fits in the spectrum from specific to general: obviously, this is more or less a one-trick pony. And a fundamental thing here, which actually runs through all of the talks, is that we want these systems to learn.
Now the question comes up, and it sounds like a pretty simple question: how do we create a machine that learns? Well, there are many ways to learn, it turns out, and one of the hardest, but also one of the most powerful, is learning by observation. If you're a person and not a robot, you know how you came to acquire the information that you have in your head, and you know that imitation is one of the strongest ways that students learn. We probably have plenty of examples of this in nature, although I'm not going to argue that here, but even the most advanced skills are acquired in part by observation. Now it turns out that we really don't have any systems that can properly learn by observation, at least not in the same sense as we are familiar with from human learning. So we asked ourselves:
could we create a system that would be able to learn a fairly complex task, one that no system to date has been demonstrated capable of learning, purely by observation? What we picked was basically the task of a TV-style interview. The observation material that our system should be looking at is not trained actors but average human beings who are talking about a particular topic, and I'll get to that in a minute, with naturally occurring ambiguities, like "this one" (how do you know I'm not talking about this one?) and so on, a vocabulary of at least a hundred words, and a largely non-scripted scenario.
We knew fairly early on that we were not going to use computer vision, so we moved this interaction between the two humans into cyberspace. Essentially it is like a video conference, except that you don't see the other's video feed; you see the other's avatar. But otherwise the situation is exactly like a Skype call: you are talking to each other, in this case with a table between you, and there are objects on the table.
The aim here is to demonstrate that it is possible for a machine to learn a very complex temporal skill, one that has to do with the timing of events on many levels, with very little information up front. So, unlike domains where you can write down all the rules a priori, this is a naturally occurring interaction between two humans; there's a lot of variation, and you can't really control it scientifically.
If we want a system to learn the interview setting from observing two people interacting like that, it's not enough to imitate movements; it's not enough to imitate surface phenomena. The system has to figure out the reasons for why people are doing what they're doing; otherwise it will be more or less like a video recording, maybe cleverly indexed. So what we're after here is a system that can acquire some rules about the task that it is observing, and do so autonomously, with very little knowledge up front.
What we are really talking about here is trying to move towards true machine understanding. To achieve this we had to do a number of things; it took quite a bit of work, in fact addressing the whole stack all the way down to the operating system. We did not create a new operating system, but we did invent a new programming language, a new kind of reasoning to go with it, and a new kind of programming paradigm for this new programming language and logic. On top of this we implemented an architecture that runs the system, looks at data streaming in real time, and tries to figure out what is in it. Basically, what we're talking about here is a new methodology for creating AI systems.
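To make the idea of an architecture watching a real-time stream a bit more concrete, here is a minimal sketch in Python. It is entirely hypothetical (the actual system uses its own programming language and data formats, which are not shown in this talk): an observer consumes timestamped, multimodal events and simply tallies what it sees, leaving it to later stages to find structure.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    t: float       # time the event occurred, seconds from session start
    channel: str   # e.g. "speech", "gesture", "object"
    payload: dict  # raw data, e.g. {"token": "bottle"} or {"xyz": (0.1, 0.2, 0.0)}

@dataclass
class Observer:
    """Consumes a multimodal event stream and accumulates simple tallies."""
    counts: dict = field(default_factory=dict)

    def observe(self, event: Event) -> None:
        # The agent is given no syntax and no notion of "words"; it just
        # tallies what arrives on each channel, leaving later stages to
        # find structure in the statistics.
        key = (event.channel, str(sorted(event.payload.items())))
        self.counts[key] = self.counts.get(key, 0) + 1

obs = Observer()
stream = [
    Event(0.00, "speech", {"token": "tell"}),
    Event(0.20, "speech", {"token": "us"}),
    Event(1.10, "gesture", {"type": "point"}),
]
for ev in stream:
    obs.observe(ev)
```

The point of the sketch is only the shape of the problem: everything the agent knows must come in through events like these.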
The setup for testing our system is essentially like this: we have this video-conference interaction, where the actions of person B are shown immediately on person A's screen, and vice versa, speech and gesture alike. We track the movements with special apparatus, and we basically record speech with microphones. The S1 agent, which is our first agent in this architecture, is watching the data stream. I'm not going to go into details about what exactly we put into the agent; feel free to ask me afterwards and I can tell you all about it, and there are papers out there as well; it's too technical for now.
But let me just give you some examples of the kinds of sentences. Remember, the agent has very little knowledge a priori: it is not given any syntax whatsoever; it doesn't even know what words are. Here are some examples of utterances that S1 has to observe and figure out: why they are the way they are, where they are produced, and what rules in fact govern the sequence of these tokens (which the words actually are) in the data stream. As you can tell, the interaction is about recycling. There are six objects on the table in front of the people: a plastic bottle, a glass bottle, a newspaper, and so on. The answers here are all true statements about recycling, and comparisons of the cost and pollution involved in recycling some of these materials.
So S1 observed the interaction between humans for 20 hours. We gave ourselves basically the task, or the goal, of having the system observe this interaction until its performance was pretty good; that's a rough outline. And then, once it had observed this interaction, and seen plenty of examples of questions and answers, of sequences of turn-taking and of manipulations of objects, and so on, it actually replaced one of the people: it went into the stream that controls the avatar on one person's screen directly.
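The step of swapping the trained agent in for one of the humans can be sketched abstractly. Assuming, hypothetically, that both a human's sensor feed and the agent expose the same control interface (this interface and these class names are my own illustration, not the system's actual API), the swap is just a change of source:

```python
from typing import Protocol

class AvatarController(Protocol):
    """Anything that can drive the avatar on the other person's screen."""
    def next_action(self, observation: dict) -> dict: ...

class HumanSource:
    """Wraps tracked motion and speech from a real person (stubbed here)."""
    def next_action(self, observation: dict) -> dict:
        return {"speech": "...", "gesture": None}  # fed from sensors in reality

class LearnedAgent:
    """Stand-in for the trained agent: emits actions from its learned model."""
    def next_action(self, observation: dict) -> dict:
        return {"speech": "Tell us about this object.", "gesture": "point"}

def drive_avatar(controller: AvatarController, observation: dict) -> dict:
    # Because both sources share one interface, the agent can replace a
    # human in the loop without the rest of the pipeline changing.
    return controller.next_action(observation)

action = drive_avatar(LearnedAgent(), {"objects": ["glass bottle"]})
```

The design point is that the rest of the pipeline never needs to know whether a human or the agent is on the other end.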
So now let me show you some videos. Here is just a brief example of the behavior of the two humans; this is what S1 sees as the humans interact.
Tell us about this object.
This is an aluminum can.
The main ingredient in aluminum is Bauxite.
What else?
All metals are recyclable.
Aluminum can be recycled virtually endlessly.
A recycled aluminum can cost 95% less than a new one.
Which takes more energy to make, an aluminum can
or a plastic bottle?
It takes more energy to produce a can of aluminum
So this is not a lecture on recycling, but you get the picture. Now, after 20 hours of observing real, non-scripted interaction, this is what we see S1 doing as an interviewer:
Tell us about this object
This, is a glass bottle.
Glass can be recycled by grinding into filler.
Most glass can be recycled endlessly,
as long as it has been sorted properly.
What else can you tell about it?
Glass bottles can be washed and refilled.
So you see, here the machine is on the right-hand side.
What is this thing?
This is a cardboard box.
Let's quickly go to the last one, which is the machine being interviewed by the human; this one is my favorite.
What is this object?
This cardboard box is made from natural wood fibers.
Wood comes from the forest, or from recycled fiber.
Fiber cannot be recycled endlessly, like metals.
Cardboard boxes are biodegradable.
Cardboard box takes two months to disintegrate in the sun.
What is this?
This newspaper is made from natural wood fibers.
Recycled paper can be used to make new paper.
So nothing about this behavior of S1 was scripted or given a priori. Essentially all of the grammar rules, if you could call them that, are acquired by the system autonomously. The system did not know anything about an interview, except that the goal of an interviewer is to get the interviewee to talk, and the goal of the interviewee is to please the interviewer. So: very high-level goals, and then plenty of observation time. That is essentially the scenario, and the results we got.
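As a hypothetical illustration of how sparse that seed is, the only task-specific knowledge given up front amounts to something like the following (the structure and field names here are my own rendering, not the system's actual format):

```python
# A minimal, hypothetical rendering of the "seed": the only task knowledge
# given up front is a pair of very high-level goals, one per role.
seed = {
    "roles": {
        "interviewer": {"goal": "get the interviewee to talk"},
        "interviewee": {"goal": "please the interviewer"},
    },
    # Everything else (grammar, timing, turn-taking, gesture use)
    # must be acquired autonomously from observation.
    "grammar": None,
    "lexicon": None,
}
```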
Now, what is really learned here? Well, the interleaving of questions and answers. As an interviewee: what and when to answer when asked a particular question; the timing of that, the timing of the word output; how to formulate an answer; how to refer to objects with a deictic gesture, a manipulation, this sort of thing. As an interviewer: what to ask and when to ask it; how to formulate a question; how to use particular techniques to keep the interview within time limits; how to sequence wh- and choice questions in an interview. Multimodal coordination: taking turns when speaking; that silence from the interviewer probably means "go on", and a nod means "continue", and so on. And language: what to answer, what to put into an answer when asked a particular question with a particular set of words or a deictic gesture ("tell me about this").
And maybe the most astonishing fact here, to me at least (this was a surprise), is that S1 makes no mistakes: there is no way to tell that it is a machine actually doing the interview, except for the voice. So we have a new kind of learning, goal-level imitation learning, and we're very excited about it. It has some features which we won't go into details on; essentially, this is model-based, autocatalytic, attention-driven, time-sensitive knowledge acquisition based on observation, and we have here an example of bounded autonomy.
We're very excited about what can happen in the next few years with this technology. We have a small team, so we are working very hard to explore some of the most exciting possibilities. That's it for me; thank you.
Q: When you did the experiment, did the other person know that there was a machine on the other end?

A: Yes; for one thing, of course, they heard it. Due to the experimental setup this was an unavoidable side-effect of the methods we used, but that would be an interesting variation to explore.
Q: Did you translate the voices to text before feeding them to the system?

A: Yes, we used the best speech-to-text recognizer we could find at the time, and we primed it to work; this was something that Eric Nivel, my close collaborator, had to do, because most speech recognizers actually don't timestamp the words. But because S1, our system, has a very keen sense of time, and this is an essential property of the system and quite a novelty actually, we had to ensure that it knew at what time things happened; otherwise it could not learn properly. So yes, the words are essentially piped through a speech recognizer, and we stamp them with a timestamp that indicates when each was uttered. Otherwise the system knows nothing about words per se; it's just data, typed data.
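A sketch of what that timestamped recognizer output might look like as data (the format is hypothetical; the real system's data types are not described in this talk):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    """One unit from the speech recognizer. To the system this is just
    typed data; it carries no linguistic meaning up front."""
    t: float    # seconds from session start, when the word was uttered
    data: str   # the recognized character string

utterance = [
    Token(12.40, "tell"),
    Token(12.62, "us"),
    Token(12.80, "about"),
    Token(13.05, "this"),
    Token(13.30, "object"),
]

# The timing is essential: without knowing *when* each token occurred,
# the system could not correlate speech with gestures and object events.
gaps = [b.t - a.t for a, b in zip(utterance, utterance[1:])]
```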
Q: What does the system know? How does it differentiate actions in the 3D space?

A: Good question. The whole geometry is inspectable by the AI, so the AI actually sees everything that happens as labeled objects: 3D geometric objects, graphical objects, labeled with a few tags, like "arm", and a color, something like that. So the plastic bottle has that tag in these data structures, and it has a color, say "blue", but there is no knowledge in the system that "blue" refers to something called color; it has to figure this out. It has to figure out that something in the data stream coming from the speech recognizer may actually map to some tags on objects; for all it knows, those tags could be on the avatar's head. This is also how we link the goals to the avatars: we tell the system, purely through the top-level goals, about the interviewer and interviewee roles and about the whole object; the object is actually labeled "avatar", so that's what we refer to in the goal structure, in the seed knowledge that we give the system.
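A hypothetical rendering of the kind of labeled scene data just described (the names and fields are illustrative only, not the system's actual representation):

```python
# What the AI "sees": fully inspectable geometry, each object labeled
# with a few opaque tags. Nothing tells the system that "blue" is a
# color, or that the tag "plastic-bottle" matches any spoken word.
scene = [
    {"id": 1, "tags": ["plastic-bottle", "blue"], "xyz": (0.40, 0.10, 0.75)},
    {"id": 2, "tags": ["glass-bottle", "green"],  "xyz": (0.55, 0.10, 0.75)},
    {"id": 3, "tags": ["avatar", "arm"],          "xyz": (0.00, 0.60, 1.10)},
]

def objects_with_tag(scene, tag):
    # To the system the tags are just strings to be correlated with the
    # speech stream; this kind of lookup is all the structure there is.
    return [o for o in scene if tag in o["tags"]]

matches = objects_with_tag(scene, "plastic-bottle")
```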
As for manipulation, there is nothing that says, you know, when I move something, or when the x, y, z of an object changes due to an action of the other avatar, that this is something the system should pay attention to in the stream; it has to figure that out. It also has to figure out the timing of when you do this, and that "tell me about this thing" is exchangeable with "tell me about this plastic bottle", and in fact that the "this" is redundant in "tell me about this plastic bottle", because there's only one plastic bottle there. So anyway, I hope that answers your question.
Q: [partly inaudible] What if the labels are in the wrong places?

A: I'm not sure I understand your question. Maybe... so, if the labels are in the wrong places, and people are actually using the same words to refer to different objects? That's a good question. It would not matter, actually; it would figure out the correspondences, because the system bootstraps itself by observing correlations and co-occurrences of things. You know, it's very bad in the beginning, but it quickly ramps up as it gets a good foundation to quickly layer on top of.
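The bootstrapping idea, learning word-to-tag correspondences purely from co-occurrence, can be sketched like this (a toy illustration of the general technique, not the system's actual mechanism):

```python
from collections import Counter
from itertools import product

# Count how often each spoken token appears close in time to each object
# tag; strong counts suggest a mapping, even when labels are noisy or
# the same words get reused for different objects.
episodes = [
    (["tell", "us", "about", "this", "bottle"], ["plastic-bottle", "blue"]),
    (["the", "bottle", "is", "recyclable"],     ["plastic-bottle", "blue"]),
    (["tell", "us", "about", "this", "paper"],  ["newspaper", "grey"]),
]

cooc = Counter()
for tokens, tags in episodes:
    for token, tag in product(tokens, tags):
        cooc[(token, tag)] += 1

# The pairing seen most often is the best current hypothesis.
best = max(cooc, key=cooc.get)
```

With more observation the signal sharpens, which matches the "bad at the beginning, ramps up quickly" behavior described above.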
This is in fact why my PhD student is focusing on how to guide such systems, which goes back to the earlier comment about teaching a machine. Because we are now talking about systems that learn by experience, we also have to think about how the teaching and learning environments are designed; if you start with the hardest material and go backwards, you won't get very far. Okay, I think we're out of time. Thank you very much.
