So I'm very happy to
introduce today's speaker.
He's actually a Colorado alum.
He finished his PhD here in 2012.
Yes.
Is that right?
Yeah.
Before that he did a master's degree
at the University of Pennsylvania.
And that's actually where we first met.
He was in a class I taught at Penn.
And then he did his
military service in Korea.
And then applied to do a PhD at Colorado,
and I was delighted to see his application
I thought, yes, great.
Let's bring him here.
And he did a lot of amazing
work while he was here,
he is one of the people
who's really pushed
dependency parsing,
the algorithms for dependency parsers,
and also the current version
of Universal Dependencies.
He contributed a lot to
helping people come to a consensus
on dependency labels
for multiple languages.
And he went on from his PhD here,
to do a postdoc with Andrew McCallum
at the University of Massachusetts.
And Andrew is very well
known for being very much
at the cutting edge of machine learning
and NLP and his postdocs
are very much sought after.
And Jinho impressed him so much that
for every year after that,
Andrew would say to me,
"Okay, Martha, when are you
gonna send me another Jinho?"
[all laughing]
And I finally did this year:
Tim Hammond has just started
his postdoc at the university,
and if Jinho hadn't
paved the way so well for Tim,
I doubt that would have happened.
He now has a faculty
position at Emory University
where he's continuing to contribute
to a lot of different
state of the art approaches
in natural language processing.
But he's also taken on
an additional challenge
that's very much on the
frontier of research
in natural language
processing today dialogue,
interactive conversation,
and he's doing well enough at that
that of the 400 teams
that applied to Amazon
to be chosen for the Alexa Prize,
his team at Emory was
one of the 10 that was chosen.
and the work that he's gonna present today
which is kind of another perspective
on that same work on conversation
was nominated for the Best paper award
at [murmurs] last summer.
So we're delighted to have you here,
we look forward to hearing
what you have to say.
Thank you very much.
[audience clapping]
I always say Martha
is like my academic mom.
Because,
[all laughing]
I was a foreign student, and
she has been inviting me
to every Thanksgiving
dinner and all these events,
and I always have that deep appreciation
and love for my advisors.
CU is the only one of my Boulder.
So this is another thing I
always keep saying to people.
Boulder is definitely my favorite city.
I've been to many cities in the States,
and Boulder is by far
the best city I've been to,
though that is defined very
subjectively, of course.
But yeah, the
technology always fails you, but..
[laughs]
Let me see.
This is the slide I use for my defense.
So by the time I left CU, Boulder was
top one for many of these things.
I don't know how many of these are true.
So I graduated 2012.
And hoping a lot of these
are still true and even more.
So I just want you to
feel like the excitement
and the usefulness of all the experience
that you're getting here,
and all the wonderful
city with the Boulders so,
and it's great to see all the old faces
and new faces here too.
So this happens to me like every time,
and this clicker is not working,
so I may have to stay here more.
The project that I'm
gonna introduce today
is known as Character Mining.
So this is a project I started
doing when I came to Emory,
which was 2014.
And I had been doing
what Martha introduced,
parsing, for about 10 years,
practically 2003 to
2008, all the way to '12.
And at some point, I just
didn't wanna do parsing anymore.
So I started doing something different.
At that point, I actually
really had this interest
in the multi party dialogue.
And I kind of saw dialogue as the
kind of domain that we have to tackle.
And this is the reason:
understanding the contextual information
in dialogue is a very
difficult thing to do.
But yet dialogue probably is the most
important data set you can get.
And here's why.
So this is nothing really new to see.
If you actually take a
look at how the data sets
have been constructed,
the bottom line here
is Snapchat or Twitter,
which a lot of people actually
think is still growing.
This is what we call microblogs.
And there used to be
a time when microblogs
completely dominated
social media as a data set.
And this is where a lot
of research came out
and is still heavily used,
especially by the US president these days.
But as you can see,
the size of the data
is not actually increasing anymore.
It had increased a lot before that;
before 2015 it was increasing
almost exponentially,
but now it's not increasing anymore.
The next one,
Instagram, was
the exact same story.
But now it started increasing
again for one reason:
it started letting people
add stories.
So it's not like a blog,
but very short stories.
And this is another thing that Facebook
actually tried to do too.
So instead of writing
only a tweet, very short,
it used to be that only about 150
characters were allowed,
you are actually writing a story.
So people are curious about, like,
what's coming next in your life.
So this is when they started
adding these features, and they now
start growing again.
People like to hear something more than
just a microblog.
So, you can guess,
the one that's actually
growing quadratically right now is
Facebook Messenger or WhatsApp,
these kinds of chatting apps.
So here's the thing: how
often do you actually tweet
or post something on Facebook?
Probably not even once a day.
If you are, get a life.
[audience laughing]
Most people don't.
But how often do you actually put down
something in your chat?
Text messaging is, like, every day,
not even every day, every hour.
So the data that's actually
accumulating for dialogue
is increasing at a much greater rate than
any other kind of data.
And this is the reason,
and this is when I actually
decided to do the first phase
of this Character Mining project.
That was the motivation
behind this project.
So I actually wanted to
do conversational AI,
as I'm doing with the
Alexa Prize this year,
from the beginning,
but then I realized there's no
such machine learning
algorithm or NLP model
that actually has a good understanding
of the context of dialogues.
So that's why I started
doing this project.
The very first bottleneck
of this project was,
we want to have a lots of conversations.
But at the beginning
2014, I asked my student
to send me their text messages.
[all laughing]
And you can guess how many
of them agreed to do that.
[all laughing]
So it was actually impossible
to get this kind of
actual conversation data.
So these are the types of
data that we wanted.
We want to have something about daily lives.
We don't wanna have a conversation between,
like, a travel agency and their customers,
some kind of customer
service, or some kind of
log from some open
customer service dialogue.
We want to have daily-life conversations,
also possibly involving
multiple speakers.
Because you don't only
talk to people one on one,
you often talk to people in groups,
which adds another level
of challenge to the task,
but we want to have the multiple people.
And also, from a
machine learning perspective,
you need to have conversations
that actually carry
on for a long period of time
among the same group of people,
so you can actually do
some kind of mining.
If you have a conversation
with a group of people
for only, like, 10 minutes,
you won't be able to
do any kind of mining.
But if you have that 10
minutes happening every week,
for another, like, 10 years,
you can probably do some good
amount of mining.
So this is why we started
a project using TV shows.
TV shows are actually better than movies,
because movies try to
condense all the stories
into two or three hours,
whereas TV shows can
actually be more explicit
about what they wanna talk about.
There are lots of TV shows out there,
but not many of them are actually
appropriate for our research.
For instance, the most popular
TV show these days is
"The Big Bang Theory," which is completely
not the way that people talk.
So, we did not want that.
The choice of TV show that we made
is a show called "Friends."
How many of you actually know this show?
Good?
None of my first year
PhD or undergrad students
actually know this show.
[laughs]
Which is very, very sad.
And you guys heard my console people.
The show ended in 2004, and
these are all old people now.
[all laughing]
Admittedly, the language
they are using
is not the most recent,
but they are actually
talking about daily life.
And it carried on for 10 years,
and the six main characters actually
appear throughout the entire show.
So we actually decided to
do the research on this.
So there are a total of 10 seasons,
236 episodes,
and over 3,000 scenes
that are still available.
So that's the data set that we used.
The long-term objective is very clear:
we want to be able to understand it
and be able to do actual
machine comprehension on it.
So,
we want to be able to have the machine
be able to answer these type of questions.
And this is probably
familiar to you, right?
Back in 2010, I remember at Boulder,
we had all our research groups get together
and watch the show "Jeopardy!" together,
while IBM Watson was beating
the human expert champions
in "Jeopardy!";
actually Martha was there too.
So, I wanted to develop something
like this for TV shows.
That was the original intention.
So originally, I planned for 10 years
to have this happen.
It's been five years, so I
should be at least halfway.
We got the first version of a
question answering task introduced,
which I'm gonna
show you later today.
But we are not there yet.
Answering these questions
is really difficult.
Project overview.
So I started the project in 2015.
In 2016, we introduced the first task,
called character identification.
And in 2017, we also decided
to do emotion detection.
So in this task, we try to see
what the emotion is,
how the emotion changes for each speaker.
So this is the project
I also wanted to do.
And the third one is
reading comprehension.
So in this one,
given the dialogue,
you ask an SAT
or, like, TOEFL or GRE type of
reading comprehension question,
and see how well the machine
can answer these questions.
And this year, we actually successfully
got our first working
question answering model,
for the "Jeopardy!" kind of
questions that I showed you.
So we made our very first attempt
at question answering on dialogue,
and I'll show you what kind
of results we got.
All our work is available on our GitHub.
So if you just type "character mining,"
GitHub will find it for you.
Today, I'm gonna focus on these two tasks,
since these are the intermediate tasks
that we developed upon for this task.
There's actually one more task we did,
personality detection.
So we try to see the
Big Five type personality
for each character.
But it was actually very difficult to do
based on only the text.
But we just submitted a paper to AAAI,
so hopefully it will get accepted.
Maybe next time I come here,
I can present that work.
So this character identification
task is pretty interesting.
Given a dialogue,
and it's also a multiparty dialogue,
so there are three
speakers in this dialogue,
first, identify what we
call personal mentions.
So all these mentions are referring
to certain characters in the show.
And the eventual task is to map
each mention to the actual
character in the show.
Can anybody guess how many
characters are in "Friends"?
Including all the extra
characters that appear only once.
Nobody can guess.
So everyone knows, like these ones are,
these are the ones that appear
pretty frequently, right?
So there are about 400 characters up here.
So the original
task goal is to map this to
any of those 400 characters,
but we actually got it reduced
down to only 20 characters,
including the six main
characters, at the end.
And that actually covers 99%
of the entire show,
because all those extra
characters are long-tail things.
So this is the actual goal.
Let me try to explain
why this is challenging
compared to some of the
traditional NLP tasks,
like co-reference resolution.
So the first challenges that we have,
let me show you.
So the very first challenge is,
even if you have the same "I,"
this "I" represents different
things in different utterances.
So this is what I call
heterogeneous pronouns.
So, if you think
about Wikipedia,
and you find, say,
Donald Trump's page,
it says, "Donald Trump is
the US president,
he was born [murmurs], he was
CEO of some trucking company."
And all these "he"s actually
refer to one person.
And this is actually true for most
co-reference resolution tasks.
But this is almost
never true for dialogue,
because everyone can be
referring to something different.
So that's actually the first
challenge that we have.
The second challenge is,
you need to link mentions across
multiple utterances too.
The transition between utterances
is very different from the transition
between sentences.
Sentences still talk
about things coherently,
in a similar manner,
but with utterances,
each speaker talks in
their own individual manner,
and when this person says
"somebody," that doesn't
mean the same person.
So linking across utterances
gives you
another level of challenge.
So this Ross actually is all linked here.
And of course, to all the way to the top.
The third one, probably
the most challenging
part is across document resolution.
So in this context,
there are three speakers.
But this person, Ross, is
talking about mom and dad,
who happen to be the mom
and dad of Monica as well;
they're brother and sister.
But this mom and dad never
appear as speakers.
So you actually have to infer somebody
who does not even appear in the dialogue.
We know this mom and
dad do appear
in some other dialogues,
so you need to actually
bring the information
from other dialogues to understand,
okay, mom and
dad are Judy and Jack,
and try to do this kind of resolution.
The very last part,
which we just introduced last year:
there are lots of plural mentions too.
I think we are actually the first group
who literally tried to
tackle plural mentions
for co-reference resolution.
So in here, "they" actually
refers to both mom and dad,
which none of the
existing systems actually
try to link together.
So these are the challenges
that we have for this task.
Let me try to demonstrate
how we tackle these problems,
and what the state of the project is.
First Character Identification.
So this is the dataset we
created; we downloaded it.
We had to get all these transcripts.
The transcripts are voluntarily
generated by the fans,
and there are millions of fans of "Friends";
they are just dying to watch the show
and write the transcripts for it.
So we got many of those.
It did take us about half a year,
more than a semester of
work, to clean up the data
and put it into JSON format.
So it is now clean,
structured data in JSON format.
And all the mentions
are manually annotated.
And all the entities
are also manually found.
So the entire annotation
actually did take us
about two years to get it done.
But the data is available there,
and is probably the largest
resource you can find
for the dialogue parts.
So you can probably use it for your work.
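To give a feel for what working with such a release looks like, here is a minimal sketch of loading and walking a season of transcripts; the field names (`season_id`, `episodes`, `scenes`, `utterances`, `speakers`, `transcript`) are modeled loosely on the released data and may differ from the actual Character Mining schema:

```python
import json

# Hypothetical structure, loosely modeled on the Character Mining JSON release.
sample = {
    "season_id": "s01",
    "episodes": [{
        "episode_id": "s01_e01",
        "scenes": [{
            "scene_id": "s01_e01_c01",
            "utterances": [{
                "speakers": ["Monica Geller"],
                "transcript": "There's nothing to tell!",
            }]
        }]
    }]
}

def iter_utterances(season):
    """Yield (scene_id, speakers, transcript) for every utterance in a season."""
    for episode in season["episodes"]:
        for scene in episode["scenes"]:
            for utt in scene["utterances"]:
                yield scene["scene_id"], utt["speakers"], utt["transcript"]

# Round-trip through json as if the season had been read from disk.
season = json.loads(json.dumps(sample))
rows = list(iter_utterances(season))
```

Since each scene is treated as an independent dialogue, iterating per scene like this matches how the experiments are set up.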
So, oh, I was wrong.
A mention can be linked to
any of those 781 characters.
[laughs]
I don't think I'm wrong, actually.
It's 781 characters
in total over the 10 seasons.
I think if we consider
only the first four seasons,
it was actually about 400.
In our experiment each scene is considered
an independent dialogue.
So each episode is about 30 minutes long,
and it's broken down into about 20 scenes.
So each scene is one dialogue.
So, mention annotation:
all nominals are considered mentions
if they are person named entities,
so proper nouns mostly,
and some pronouns that refer to a person.
And also we have our
personal noun gazetteers,
so all things like
brother, sister, mom, dad,
teacher, all these kinds of things.
We consider
all those as mentions.
So this was initially done in a rule-based way,
which actually did really well:
the F1 score of the
rule-based one was about 96%,
which is pretty high accuracy.
We actually recently started doing
this mention annotation
using an end-to-end
co-reference resolution system,
which does not give this
kind of accuracy, actually.
So the machine learning
approach was actually worse,
but not much worse; it
was about 94% accuracy.
These are the annotations.
The majority of them
are singular mentions,
that is, mentions that
link to only one entity,
and about 10% of the
mentions are plural mentions,
which can be linked to more than one.
And those are really,
really challenging.
Entity annotation:
there are one, two, three, four, five
types of entities that we annotated.
The first two are pretty easy.
Primary is the
main six characters;
secondary is every other extra
character that has a name.
Generic is the interesting one.
With generic, you actually know
this is an actual person,
if you look at this example:
"That waitress is really cute,
I'm going to ask her out."
Guess who said this?
Hey, Joey, yes.
[all laughing]
And that waitress is a real person;
she even shows up, yeah,
the person actually exists in the show,
but you never have the
name of this person.
So we consider this a generic,
and these get clustered together.
The general type is actually a different thing.
If you look at this example,
"The ideal guy you look for doesn't exist,"
the guy is not necessarily
referring to anybody.
It is a personal mention, but
not referring to anybody.
Those we consider general;
it's a general term.
So something like the "you" in "you know,
you know" is a general term.
And then other.
Others are the ones
where we actually know
the character exists,
but there's no contextual way
that we can tell
who this person is from the
context of the dialogue.
So those we distinguish as others.
They are actual
characters in "Friends,"
they may even have a name,
but we cannot disambiguate them.
So in the previous example that
I showed you, mom and dad,
those we can actually disambiguate,
because we know Ross's mom
and dad are Jack and Judy.
But these are cases
ambiguous enough
that we couldn't even
tell from the context.
If you actually
watched the entire season, you
probably know who that person is,
but given the dialogue, it was impossible.
So, these are the types of entities,
and this is the distribution.
Primary actually accounts for 67%,
so the majority of them;
secondary accounts for 25%;
generic is very small at 5%;
and general and other are 2%.
So, let me try to show how this
can be tackled with
co-reference resolution first.
The task that we are trying to do:
we're not thinking about the
entity linking at the moment,
but given the dialogue, we try to
cluster the mentions together.
but we want to cluster them together.
So, this is a very prototypical
co-reference resolution task,
but this is task is actually,
what we are trying to do
is a little bit different
from the general one
because we're trying to handle
the plural mentions as well.
So we had to design a new
algorithm for doing this.
So there's a parsing algorithm
that we had to design.
The algorithm always
compares two mentions,
where MI is the left mention
and MJ is the right mention.
So there are two mentions,
and each mention MJ is compared to all
of the preceding mentions MI,
the ones that come before it.
So given that this one is going
to be compared
to every other mention,
and this one will be compared
to every other mention,
it's an N-squared algorithm.
And these are the three
labels that we are injecting
for the machine learning algorithm to learn.
N means, when you're
comparing MI and MJ,
that there's no relation,
so we skip that pair.
L means MJ goes to the left:
MJ gets assigned to whatever
cluster MI belongs to.
So you're comparing these two mentions,
and MJ now belongs to whatever
cluster MI already has.
If MI does not have a cluster yet,
it will create a new cluster and assign
both MI and MJ to that cluster.
That's the left relation,
because it's linking the
right one to the left,
and the right relation goes
the other direction:
now MI gets assigned to the
cluster that MJ belongs to.
So this is basically a flip, right?
From the machine learning point of view,
these decisions look exactly
identical at this point.
So the reason why we made
this distinction was this:
between MI and MJ,
the one that actually creates the cluster
has to be a singular mention.
We are not creating any
clusters for plural mentions.
So, if you found out
this is a singular mention
that will be linked to something,
you will create a cluster for it,
but if you have a mention
like "they,"
we're not creating
a cluster for "they."
Instead, we create each cluster
from the singular mentions
and have "they" belong to them.
And this is why the
algorithm distinguishes
these two labels.
So, MI in this case will
be the plural mention.
So, this is how the algorithm works,
and this is probably the first
co-reference resolution algorithm
that tries to handle
plural mentions in this way.
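As a minimal sketch of the decoding just described (my own illustration, not the authors' code), the N/L/R decisions can be turned into clusters like this, with the stated rule that a plural mention never seeds a new cluster; `classify` stands in for the learned model:

```python
def decode(mentions, is_plural, classify):
    """Decode N/L/R pairwise decisions into clusters.

    mentions: mention ids in textual order.
    is_plural: dict mention -> bool.
    classify: (mi, mj) -> "N" | "L" | "R"  (stand-in for the trained model).
    Returns dict mention -> set of cluster ids (plurals may belong to several).
    """
    clusters = {}   # mention -> set of cluster ids
    next_id = 0
    for j, mj in enumerate(mentions):
        for mi in mentions[:j]:            # compare MJ with every preceding MI
            label = classify(mi, mj)
            if label == "N":               # no relation: skip the pair
                continue
            # L: MJ joins MI's cluster(s); R: MI joins MJ's cluster(s).
            anchor, mover = (mi, mj) if label == "L" else (mj, mi)
            if not clusters.get(anchor):
                if is_plural.get(anchor, False):
                    continue               # plural mentions never seed a cluster
                clusters[anchor] = {next_id}
                next_id += 1
            clusters.setdefault(mover, set()).update(clusters[anchor])
    return clusters
```

On the walkthrough example, an "our" that relates L to Jack's "I" and R to a later "mom" ends up in both the Jack and Judy clusters, which is exactly the plural behavior described.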
So, let me demonstrate how this works.
These are all the mentions here:
Jack and Monica are talking;
"I" one is this, "woman" two is this,
"I" three is this, and so on.
So, these are all the mentions up there,
and we have two extra mentions we add,
which are general and other.
So, each mention could be considered
to be general or other,
so it can be compared to these
two arbitrary clusters.
At the very beginning, there is
a cluster for general and one for other,
which are currently empty.
Now, we try to compare this "I"
to be either general or other,
and since it is neither of them,
the N label, no
relation, will be assigned.
So, we'll just move on
to the next one,
the "women" here:
"And I read about these women."
So, this is a prototypical
case of the general.
Now, this will be
compared to all three of them;
these two will get no relation,
but it will actually have a relation
with the general.
So, we'll actually
assign it to this general cluster.
So, let me go on.
So, this "I," which is here,
"I thank God," is Jack.
So this "I" will be in the
same cluster as the "I" here;
both are referring to Jack.
At this point, there's no cluster for Jack,
so this is a left relation:
we will create a new
cluster and assign both
of these mentions to this cluster,
the Jack cluster, okay.
Next one, "our";
this is a plural case.
For plurals, I told you, we cannot create
a new cluster,
but if we already have
a cluster generated,
we can assign the mention to it.
So in this case, "our"
will be Jack and Judy,
who is Jack's wife.
"Our" actually has a relation with "I,"
which is Jack, and also this "I."
So "our" can actually belong to Jack,
but we don't have a cluster for Judy yet,
so it doesn't do anything for Judy at the moment;
it will just pass.
One question is, if
Judy never appears
in this context, you will
not have a cluster for that part of "our."
And that's okay, because there's no way
we can actually tell from the context
that "our" means Jack and Judy.
So, that will pass.
"Monica," which is Monica:
there is no mention of Monica before,
so it will get passed.
"Ross" will not be matched to anything;
there is no prior mention of Ross,
so it will also be passed.
Now "you," which is this
part, "What's going on with you?"
So this is actually the tricky part
about co-reference resolution:
the pronoun "you" can be
either plural or singular.
In this case, do you think
it's plural or singular?
It has to be plural because of this "two."
So we know it's plural,
and we try to compare "you"
to the mentions before;
there is actually Ross here,
and also there is a second "you,"
which we cannot
interpret from this context.
So, "you" will also get
clustered with those.
So, yes.
[Audience] How do you look ahead
to the "two" that's actually there?
How do you combine, [murmurs]
how do you compare these things?
You said "you" is plural or singular?
The pronoun "you" can be
either plural or singular;
how do you look ahead to
the "two" that exists?
Oh, in our model, the entire
context is already there,
so the embedding space
will be constructed
with the "two."
So hopefully the model learns
that the "two" means plural.
So, yes.
So given this, now "you" will belong
to the Ross cluster as well
as the other cluster.
In this case, it's a plural mention.
And similarly, "mom"; "mom" is
another interesting case.
So sure, "our" was actually
both mom and dad,
but we only had a cluster for dad.
So "our" actually got assigned
to the Jack cluster, but it never
got assigned to anything else.
So at this point, we actually create
a Judy cluster and try to
assign "mom" and "our" here.
"Our" gets assigned here.
So that's
a challenging part.
From the machine's point of view,
"mom" may be hard
to disambiguate in context,
so it could get assigned to other instead.
The system sometimes does that.
In this particular example,
Mom and Dad, Judy and Jack,
do appear a lot in other dialogues,
so it doesn't make a
mistake in this case,
but for some characters
who did not appear so much,
it could actually make mistakes.
It is very possible.
Yeah, especially if a character
didn't appear in the training data,
or appears only in the test data set,
the system will actually assign
the mention to other,
instead of, say, mom.
Same story for
these three; they will
actually get assigned here.
So let me move on.
So this is basically,
"you guys," a similar story;
"which guy" is not gonna
be assigned to anything,
so it gets passed.
"Me" gets its label,
and these get passed and passed.
Here we go.
This "guy" is the example of the generic.
Look at this:
"Okay, I just got this
from the guy next to me,
he was selling a whole
bunch of stuff," right?
So here "guy," at the same time,
could possibly be assigned to other,
but he is actually
mentioned again here, as "he."
So it is not some random person;
we need to create a cluster for it.
So at that point, you
anticipate not assigning it
to other, but
making a cluster with the "he."
So now this generic cluster will
have both the "he" and the "guy."
So at the end,
at the moment we've actually
gone through everything,
there's only one last step.
There's only one mention here
that was not assigned to any cluster.
Those are what we call singletons.
So we collect those
singletons individually
and give each its own cluster.
Some co-reference resolution systems
don't care about singletons;
all they care about is
clustering mentions.
But we actually do, because even if it is
a singleton that doesn't
have any other mention
associated with it in this dialogue,
it can be linked to something else.
And in this case, it should
be linked to Monica.
So for the entity
linking, we actually
do care about the singletons as well.
So at this point, the co-reference
resolution is done.
If the machine learning actually made
the right prediction every time,
these are the clusters that would have
been generated by this algorithm.
The nice thing is it handles
the plural mentions quite well,
and also the general and other entities
are getting assigned [murmurs].
So, this is the co-reference
resolution parsing algorithm.
Now, each of these decisions has
to be made by machine learning.
During training, we
compare the two mentions
MI and MJ and feed them to the
machine learning algorithm,
and it will give you a
prediction of N, L, or R,
the three classes.
So, that's classic machine learning,
and we, unfortunately, do
use deep learning for this.
This is the model we
introduced, called an
agglomerative
convolutional neural network.
Given mention MI
and mention MJ,
a mention doesn't have to be one word;
a lot of the mentions in the
examples I showed are one word,
but it can be multiple words.
So we are actually extracting
lots of agglomerative
convolution features from each mention.
And it all gets projected into one vector.
So each of these n-gram features
is extracted into one vector,
so it's an embedding for that.
And we actually have
lots of these kinds of features;
it's not just about the mention itself,
but also the sentence
associated with it,
the utterance associated with it,
plus misspelled words.
So we have many of these n-gram features.
All these n-gram features
get an embedding representation,
and we have all those sets.
And for each mention,
we actually run another convolution
to generate an embedding
per mention, too.
So, once you do all these convolutions,
you have one vector generated,
and this is what we
call the mention embedding.
This embedding represents MI,
this embedding represents MJ.
We use lots of features
to extract them.
And we also have discrete features.
There are some traditional
features that people use;
a good, easy example
is gender features.
There are some prototypical names
that people use, like Mary or John;
those are gender features,
so we actually included
some of those features.
These together make
the mention embedding.
These again are fused together
to do the convolution again.
Every convolution is
done this way, to create
this one vector,
which is what we call the
mention-pair embedding.
And this mention-pair embedding is now
combined with some discrete pair features,
like whether the pair
agrees with each other,
then fed to a multilayer perceptron
using the ReLU activation,
which will classify
the three classes.
So that's exactly how we learn,
and this decision is used by the
parsing algorithm to generate the clusters.
So this is the model that we use.
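As a toy illustration of only the final pair-scoring step (concatenate the two mention embeddings with a discrete pair feature, one ReLU hidden layer, softmax over N/L/R), here is a sketch; the convolutional feature extraction is replaced by precomputed vectors, and the weights are random stand-ins, not the trained model:

```python
import math
import random

random.seed(0)
N_LABELS = ("N", "L", "R")

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def score_pair(emb_i, emb_j, pair_feats, W1, W2):
    """Score one mention pair: concat -> ReLU hidden layer -> softmax over N/L/R."""
    x = emb_i + emb_j + pair_feats          # concatenation of list features
    h = relu(matvec(W1, x))
    return softmax(matvec(W2, h))

dim = 4                                      # toy mention-embedding size
W1 = [[random.uniform(-1, 1) for _ in range(2 * dim + 1)] for _ in range(8)]
W2 = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(3)]

probs = score_pair([0.1] * dim, [0.2] * dim, [1.0], W1, W2)
label = N_LABELS[max(range(3), key=probs.__getitem__)]
```

The predicted label for each pair is then fed to the cluster-decoding step described earlier.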
At this point, we are done with
the co-reference resolution,
so we know how to group mentions together
within a dialogue, but we still don't know
which entity each cluster actually refers to.
We know a bunch of mentions
like "he" and "dad" are all here,
but we don't know whether
this is actually Jack.
So we have to do the entity linking part.
So this is the model that we used:
a multitask-learning
entity linking model.
We actually make another
pass of classification.
This part is per mention.
Given a mention, each
mention should belong to
at least one cluster, right?
We collect all the mentions
within the cluster
and run them through,
and as well, each
mention is also compared
against a lot of other mentions,
so we collect all the
ones it was compared to
that belong to the same cluster.
So these are all the mention pairs
within the same cluster.
And we do max pooling and average pooling,
and do all that again with the
one-dimensional convolution,
and generate a cluster embedding and also
a cluster-pair embedding.
So this basically represents a cluster
that this mention belongs to,
and this is the cluster
representation also
made from the mention pairs.
So we have these two cluster representations,
and since we are handling
plural mentions as well,
we can potentially have many clusters
per mention.
So, for the K clusters,
we put it all together:
given the mention embedding
from the previous slide,
we take the average cluster
representation
and the average pair representation.
So, there is the cluster representation,
the average cluster representation,
and the average paired
cluster representation.
And now everything is ready to go to a multi-layer perceptron to make the final decision. Every mention represents a character, and I told you we reduced it down to about 20 characters at the end, so the dimension of this output is about 20.
For singular mentions, this is done by softmax, so whichever class fires is the one optimized for; but we have one extra slot that says this is a plural mention. If that slot is not fired, we trust the decision made by the other dimensions, but if it is fired, then we pass it to separate sigmoid layers. Those do the multi-label classification: each entity slot is optimized individually, so the model is able to produce multiple labels.
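A hedged sketch of that decision head, using an invented four-character inventory (instead of ~20) and an assumed 0.5 sigmoid threshold: a softmax over the characters plus one extra "plural" slot, falling back to per-entity sigmoids when the plural slot fires.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decide(singular_logits, plural_logit, sigmoid_logits, threshold=0.5):
    """singular_logits: one score per character; plural_logit: the extra
    plural slot; sigmoid_logits: per-character multi-label scores."""
    probs = softmax(singular_logits + [plural_logit])
    if probs.index(max(probs)) < len(singular_logits):
        # Plural slot did not fire: trust the singular softmax decision.
        return [probs.index(max(probs[:-1]))]
    # Plural slot fired: multi-label output via independent sigmoids.
    return [i for i, z in enumerate(sigmoid_logits)
            if sigmoid(z) >= threshold]

# Toy logits, not model outputs.
print(decide([2.0, 0.1, 0.0, 0.0], -1.0, [0.0, 0.0, 0.0, 0.0]))  # singular case
print(decide([0.1, 0.1, 0.1, 0.1], 5.0, [3.0, 2.5, -4.0, -4.0]))  # plural case
```

In the plural case the head can return several characters at once, which is the whole point of the extra slot.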
So this is the task that we have. Am I doing okay on time? I guess so.
So, experiments; let me show you the results. These are the prototypical co-reference resolution metrics, B-cubed, CEAF, and BLANC. I don't think people here care about these metrics so much, so let me explain a little bit about B-cubed.
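For reference, B-cubed scores each mention by the overlap between the cluster the system puts it in and its gold cluster, averaged over mentions; a small sketch of the standard definition (the example clusters are invented):

```python
def b_cubed(gold_clusters, pred_clusters):
    """B-cubed precision/recall/F1. Each cluster is a set of mention ids;
    every mention is assumed to appear in exactly one cluster per side."""
    gold_of = {m: c for c in gold_clusters for m in c}
    pred_of = {m: c for c in pred_clusters for m in c}
    mentions = list(gold_of)
    # Per-mention precision: how pure is the predicted cluster?
    p = sum(len(gold_of[m] & pred_of[m]) / len(pred_of[m]) for m in mentions)
    # Per-mention recall: how much of the gold cluster was recovered?
    r = sum(len(gold_of[m] & pred_of[m]) / len(gold_of[m]) for m in mentions)
    p, r = p / len(mentions), r / len(mentions)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = [{"a", "b", "c"}, {"d"}]
pred = [{"a", "b"}, {"c", "d"}]
p, r, f = b_cubed(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))
```

Splitting a gold cluster hurts recall, merging unrelated mentions hurts precision, which is why B-cubed is sensitive to exactly the plural-mention errors discussed next.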
So this is the paper that we published at CoNLL 2017. There was a previous approach of ours that didn't really handle the plural nouns. Without handling the plural nouns, this whole system was getting about 70% accuracy in the end. For co-reference resolution, 70% is actually not bad, you have to understand.
But with our new approach, we can handle both plural and singular nouns, and we got 74% on those scores. The entity linking score is actually much lower. If you look at only the singular mentions, the F1 scores from the previous approach and our current approach are about the same, because the previous approach already did well for the singular. For the plural mentions, though, it was doing noticeably worse, 29 or 30%, and ours is about 41. And if you think about it, a 41% F1 score is still pretty bad; still, I guess random chance is one out of 20, so it is better than that. But that accuracy is still very difficult for deployment. Overall, the accuracy increased to about 67%.
So that's what we were getting.
If you are curious about the character-wise entity linking results: the first six characters are the main characters, and if you look at their entity linking results, they're actually pretty good. If you consider only the singular mentions, they're above 80% on average.
So this means in a domain like Friends, if you run this tool for the main characters, you can reliably find them with about 80% accuracy, which is actually pretty good. But for all the other characters that appear in only like 1% of the data, it doesn't really do that well.
How do people do at this? Sorry, oh, how do people do at this? I don't actually have separate numbers for plural or singular, but humans were doing about low 90s or high 80s; that was the number we got. So yeah, do humans actually do better? I think they do.
I didn't put the number in the paper, because I think our evaluation was biased: the people who evaluated this as humans know the show very well, so they don't really even need to look at the text to know this stuff. That's why I don't think that number was actually legitimate.
So this is the character identification task that we have tackled, and I think it is a very fundamental task that has to be solved in order to understand the context well.
Given this work, the very latest thing that we did was question answering, so let me introduce this a little bit; let me try to do it in five minutes. If you're not familiar with question answering tasks in NLP, there are mainly three types of question answering tasks that people work on.
One is reading comprehension, which we have actually done before, too. Given this kind of text, you ask a question like, what's the name of the trouble-making turtle? It's James the turtle, so we all know the name, but it actually gives you multiple choices, and some of the choices are not even reasonable, right? James and Jane, maybe reasonable; "Freezer," seriously? Splinter, I don't know. So anyway, this is the reading comprehension task.
And the second task is cloze-style QA, which is also pretty fascinating. Given a news article like this, you take a description of the event happening in the article, make one word a blank, and have the machine guess it out of the context; here it actually has to guess the person. So this is cloze-style QA. Cloze-style QA can also be tackled as a multiple-choice kind of thing, because if you know you're always asked for a person, then you can run a named entity recognizer, extract all the persons first, and try to pick one of them. So that can be done.
The task that we are trying to tackle is span-based. This is probably the most popular form, if you know datasets like SQuAD: given a context, you have a question, and you're supposed to find the answer span within the context. So this is the task that we are doing in FriendsQA.
The major challenge in this task is that the evidence is spread across multiple utterances in the dialogue. This actually makes a whole lot of difference. In those other cases, the context from Wikipedia is very similar to the question in the way it is written; both are written in some formal, descriptive way. But in a dialogue, you may not even complete your sentence.
So say we're talking about a birthday cake. Somebody starts with, "Hey," and the second speaker says, "I got something for you." "What do you have?" "I have a cake." "What is it for?" "It's for a birthday." And if a person then asks, "Okay, what did Mary buy John?" you will never see "birthday cake" said outright, right? All the information is scattered around, so unless you understand the entire context, you can't really answer these kinds of questions.
Whereas with that kind of text, you can more or less do some kind of string matching to get the answers out. So that's where the challenge comes from.
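That string-matching intuition for formal text can be sketched as a toy sliding-window overlap baseline; this is only an illustration of why it works on descriptive text, not one of the systems evaluated here:

```python
def best_window(question, context, size=5):
    """Return the context window with the most question-token overlap."""
    q = {w.strip(".,?!").lower() for w in question.split()}
    tokens = context.split()
    best, best_score = [], -1
    for i in range(max(1, len(tokens) - size + 1)):
        window = tokens[i:i + size]
        score = sum(1 for t in window if t.strip(".,?!").lower() in q)
        if score > best_score:
            best, best_score = window, score
    return " ".join(best)

# Invented, formally written context: overlap alone finds the answer.
context = ("Mary bought a birthday cake for John because "
           "his birthday was on Friday.")
question = "What did Mary buy for John?"
print(best_window(question, context))
```

On a dialogue like the cake example above, no single window contains both the question words and "birthday cake", which is exactly why this baseline breaks down there.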
Let me elaborate a little bit more. Entity resolution, which we looked at in the character identification task, is a very difficult thing to do.
If you read this context and ask, who forced Rachel to raise the state? Who is it? Monica, yes; in which season?
[all laughing]
If you don't know the season, then you have a life, unlike me.
[laughing]
So it's Monica. Now take a look at this: here it says, "she forced me." So first, you have to know who "she" is; hypothetically, we could resolve this to Monica by using our previous task, and "raise the state" is written here, right? But then, "no one forced you to raise the state," right? So you need to know who "you" is, and you have to relate this part to that part. And the literal meaning here is not true, because sure, somebody says no one forced you, but here she's saying somebody did force you, and that somebody is "she." You see why this task is difficult?
This is what's actually happening in our dataset; these are real examples. So you need to handle all of these constructions. These are obviously classic examples in NLP, not necessarily specific to dialogue, but we have a lot more of them, because we are handling a TV show, which is supposed to be funny, and people try to be intelligently funny, like Chandler, and use lots of metaphors and sarcasm.
And look at this. "Hey, Joey, what would you do if you were omnipotent?" "Probably kill myself." "Excuse me?" And then the line about little Joey. It's a very funny example, right? That's probably the only reason why this paper was accepted.
[all laughing]
Why would Joey want to kill himself? To answer this, you have to do the inference about what Joey understood: he heard "omnipotent" as "impotent," and that's probably why Joey would want to kill himself.
No deep learning will give you that.
[all laughing]
There's no way.
These difficult cases happen a lot. Sarcasm, the big one [laughs]; there's one person whose speech is basically 50% sarcasm. So, a Tuesday morning: "Hey, you made the pancakes?" "Yeah, like there's any way I could ever do that." Did Chandler make the pancakes? The machine will say yes, because he said "yeah." Well, we all know he did not. So this is the challenging part about dialogue.
So let me go into the types of questions, with real examples. This is one dialogue that we pulled out from our dataset, and these are the types of questions that we try to generate. They're the 5W1H questions, which are prototypically known as [murmurs] questions.
But these are pretty difficult. This one has to do with a character, and there is more than one place that talks about it. Where are Joey and Chandler? Central Perk. Who is Joey getting a phone number from? It's Casey.
If you notice, for this question, who is Joey getting a phone number from, we actually annotated only this occurrence, not the later "Casey," because we want the machine to learn that this is the context from which it should get the answer, not some other occurrence just because "Casey" appears a lot. We don't want it to pick those.
So, obviously, what do you think are the most difficult questions? These two, right? Those are the long-answer questions, so those accuracies are really, really low.
Let me try to finish in two minutes so we can actually talk. For answer spans, we have some rules; these are some conventions we made in the annotation guidelines. And we did lots of crowdsourcing. In the end, our annotation quality is actually pretty good: after we did all the cleanup and pruning, it's above 80%; actually, we made it to the high 80s, which is a very acceptable quality.
So the data quality is pretty good, and over the summer we cleaned it up more. We used the same first four seasons of Friends to create all three datasets, and we did a lot of analysis. But let me skip this so you guys can ask me questions; I think I'm running out of time.
So we ran this with three state-of-the-art question answering systems out there, including the latest approach, BERT. I think BERT is now almost obsolete; it's been out only for six months and it's already being superseded. If you're in a machine-learning-related computer science field, too bad: your state of the art will not last for two months. When I was a PhD student and established a state of the art, it used to last about a year; now it doesn't last two months.
So, experimental results. There are many different settings; in the end, the best result that we got for span matching is about 64%. Compare that to the Wikipedia question answering on SQuAD that we showed, where I think people are getting up to 90%. But this is a much more challenging setting, and I think our dataset is also smaller. We currently have something better than this, [murmurs] about 67%, but it's still a very low accuracy.
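Span-matching numbers like these are typically computed with a SQuAD-style token-level F1 between the predicted and gold answer spans, which gives partial credit for overlapping spans; a minimal sketch of that metric:

```python
from collections import Counter

def span_f1(prediction, gold):
    """Token-level F1 between a predicted and a gold answer span."""
    pred_toks = prediction.lower().split()
    gold_toks = gold.lower().split()
    # Multiset intersection counts shared tokens with multiplicity.
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(span_f1("a birthday cake", "birthday cake"))  # partial credit
print(span_f1("central perk", "Central Perk"))      # exact match -> 1.0
```

The official SQuAD scorer also strips articles and punctuation before comparing; that normalization is omitted here for brevity.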
So we did error analysis, and many different types of errors come out. This is the overview of the Character Mining project. I'll be happy to talk about it afterwards; I'll be staying here until five, so please do.
One thing that I do want to show is my academic tree. Two familiar faces you probably know; they are, more or less, my academic father and mother.
[laughing]
They produced a lot more students [murmurs]. I'm hoping this will happen to all of you. And really happy to see you.
Thank you.
[audience clapping]
So we have a few minutes for questions.
Can you apply this to novels as well?
Novels. Yes, I think so; novels may actually be even easier, because they're more coherent and there's more narrative. But I guess people are attracted to novels for reading comprehension and also cloze-style QA; I don't think there's work on the span-based form for novels, but I can't imagine it cannot be done. So novels, maybe; I thought about doing children's books, so that's the domain I originally thought about. I want to move on to the [murmurs], so yeah, I believe so.
Can you do, like, is the relationship exactly...
Right, so that's our eventual goal: we eventually want to build a big knowledge graph out of all these characters and match it to some metadata that we have. At the moment, we are not there yet; the character identification is not yet as good as it needs to be, and there's also another level of semantic parsing involved, which we are working on currently, but it will take another couple of years, probably, to get there.
Because your model has been trained on scripted language, do you think it could be problematic if you try to use...
[laughs]
So, this model may not actually work for all domains. For example, the question answering model can work, but the character identification model, when we tried to apply it to the "Big Bang Theory," obviously doesn't work that well. But the thing is, I think our model is actually pretty good; scripted language is actually harder than real conversation, for all the reasons I was showing: they intentionally use sarcasm and metaphors so much, and also humor, which we don't use as much. In actual conversation, there is difficulty at the discourse level, since the sentences are possibly not well organized, but other than that, scripted language is actually harder to do.
As for the co-reference resolution tool, we're currently trying to use that for the Alexa Prize project. It's been working okay; not the best, but it's been working okay.
Any other questions?
It's 4:30.
Yeah.
So there's cookies and
coffee and lemonade up here.
And Jinho will be here for another 20 or 30 minutes or so, if anybody wants to follow up with some more questions or ask other questions; he will be happy to.
So thank you very much.
[audience laughing]
