Welcome back. We're going to talk about
the history of computational linguistics, a field of research that has been active
for at least 80 years. So fun fact: the
term artificial intelligence was coined
here at Dartmouth College! There was a meeting
in 1956 where a group of researchers were trying
to figure out what would it mean for a
computer to think, and what - how could you
get a computer to demonstrate
intelligence. And they believed that "every aspect of learning, or any other feature of intelligence, can in principle be so precisely described that a machine can be made to simulate it." Notice the
optimism in this phrase. This kind of
optimism in the belief that computers
could model any problem and could do so
fairly easily was a hallmark of this
first stage in research in artificial
intelligence in the 1950s and 60s.
So here we have a timeline of research in
computational linguistics and natural
language processing, and the timeline has
two components: on the upper part you
will see approaches that hold that statistical extrapolation, that is, trying to find statistical patterns or frequency patterns in a text, is the way to learn a language. This is the way a neural network works, for example, or any kind of statistical machine learning: trying to find which words go with which and how often they do so.
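To make that concrete, here is a minimal sketch, my own illustration rather than anything from the lecture, of counting which words follow which in a tiny invented text. This is exactly the kind of frequency pattern a statistical learner extrapolates from.

```python
from collections import Counter

# Tiny invented corpus; any plain text would do.
text = "the cat sat on the mat and the cat slept"
words = text.split()

# Count how often each word is immediately followed by each other word (bigrams).
bigram_counts = Counter(zip(words, words[1:]))

for (first, second), count in bigram_counts.most_common(3):
    print(f"{first} -> {second}: {count}")
# "the -> cat" shows up twice: a frequency pattern a statistical model can exploit.
```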
On the bottom of the timeline, we have
approaches that hold that you learn languages through rules: that formal grammars, rules, and in general the manipulation of symbols are how you should learn a language, and how you should describe one as well. So let's
start with the top of the chart. In the
early 20th century, people conducted research on neurons, both on biological neurons and on computer structures like switches that are turned on and off, and an idea called connectionism emerged, which proposed that learning resides in the connections between units, whether biological neurons or computer switches. Those connections between things are where learning would happen. And indeed this inspired the first neural networks, such as the perceptron in the late 1950s.
Throughout the late 50s and 60s, there was research
on neural networks, but unfortunately
these neural networks were fairly
limited. They did not have hidden layers,
they only had very few input neurons, and
so they could not succeed in the way their creators had foreseen.
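As a rough illustration of that limitation (a sketch of my own, not code from the era), a single-layer perceptron with no hidden layer can only separate classes with a straight line, so it famously fails on something as simple as XOR.

```python
# Minimal single-layer perceptron: one weight per input, no hidden layer.
def train_perceptron(samples, epochs=20, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = 1 if (w[0] * x[0] + w[1] * x[1] + b) > 0 else 0
            err = target - pred
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

# XOR is not linearly separable, so this perceptron never gets all four cases right.
xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w, b = train_perceptron(xor)
for x, target in xor:
    pred = 1 if (w[0] * x[0] + w[1] * x[1] + b) > 0 else 0
    print(x, "expected", target, "got", pred)
```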
This led to pessimism about artificial
intelligence and to a period called the
first artificial intelligence winter in
the late sixties and seventies. Despite
these problems, progress was being made
on accumulating data. For example, the
first million word corpus for English
was created in the early 60s, and people were working on applications like optical character recognition and authorship attribution, which is trying to figure out who wrote a document based on how often they use certain words. At the
bottom of the chart, regarding formal grammars and rules: in 1954, for example, came the first attempts at machine translation, and these programs used rules that would find a word in Russian, for example, and swap it for a word in English. They would do so using a file
that would essentially be a dictionary
telling you that this word in Russian
corresponds to this word in English. They were successful with small examples. One of the early demonstrations was translating a couple dozen sentences about coal in scientific papers from Russian to English, and it worked, so there was optimism about what these rule systems could do.
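To give a flavor of how such a rule system works, here is a deliberately naive word-for-word sketch. I use a handful of invented Spanish-to-English dictionary entries instead of the original Russian ones, and real systems also needed rules for word order and morphology.

```python
# Toy bilingual dictionary (invented Spanish-English entries for illustration).
dictionary = {
    "el": "the",
    "gato": "cat",
    "come": "eats",
    "pescado": "fish",
}

def translate(sentence):
    # Look each word up and swap it; leave unknown words untouched.
    return " ".join(dictionary.get(word, word) for word in sentence.split())

print(translate("el gato come pescado"))  # -> "the cat eats fish"
```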
This was happening at the same time as research was being conducted on how to formalize natural languages. Noam Chomsky, during the 50s and early 60s, developed theories of grammar that were based on rules and transformations. And so knowledge of natural language could be systematized into these rules, which people thought could be put into a computer. As a matter of fact, the 1960s are when the program ELIZA was created, which is a simple psychologist chatbot. You put in a greeting or a question, and then the computer, using pattern matching with regular expressions, rephrases your input and returns it to you, so it appears to be a conversation, and it does so, again, because it has rules modeling language.
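Here is a minimal sketch in the spirit of ELIZA, with a couple of made-up rules rather than Weizenbaum's actual script, showing how pattern matching with regular expressions can turn your input back into a question.

```python
import re

# A couple of ELIZA-style rewrite rules: match a pattern, reuse the captured text.
rules = [
    (re.compile(r"i feel (.*)", re.IGNORECASE), r"Why do you feel \1?"),
    (re.compile(r"i am (.*)", re.IGNORECASE), r"How long have you been \1?"),
]

def respond(utterance):
    for pattern, template in rules:
        match = pattern.match(utterance)
        if match:
            return match.expand(template)
    return "Please tell me more."

print(respond("I feel tired today"))  # -> "Why do you feel tired today?"
print(respond("I am worried"))        # -> "How long have you been worried?"
```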
So again, the late 1960s and the 1970s were the first artificial intelligence winter, and during this era people were developing rule-based approaches to parse human language and trying to see whether this parsing could be put into computer programs. As usually
happens in human endeavors, people
disagreed on how this could be done and
many theories of syntax splintered from
there. If you're interested, this was
actually called the linguistics wars.
There were so many theories that you
couldn't go to a computer scientist and tell them "please implement the theory of syntax," because there were so many of them. This led to increased interest in statistical approaches, in statistical extrapolation. People were
doing things like statistical
translation, which was trying to find correspondences between words in two languages, so how often would the phrase "you are" be translated as "estás" in Spanish, and these efforts were fairly successful. So successful that there's someone called Fred Jelinek, and no one is sure exactly what he said, but the most usual citation for it is: "Every time we fire a linguist, the performance of our system goes up." People had a lot of faith that, simply by using numerical systems, they could find enough patterns for the computer to model human language. Unfortunately, hardware in the 1980s was not up to this task.
The volume of data you would need to
analyze would be too much for the
computers at the time, and artificial intelligence went into a second winter, through the 80s and into the early 90s. On the other side of the fence,
people were still trying to work with
formalizations of language. People were
building automata, for example finite state machines and finite state transducers, to model aspects of language, but it ultimately turned out, as we'll see next week, that these models would be incredibly large and almost unmanageable. People were also building a kind of
system called an expert system, where
they believed, for example, that you could extract the knowledge of a thousand doctors through questions and answers, put all of those questions, answers, and knowledge into a kind of decision tree, and model that so that another doctor could query the decision tree and get all of those decisions back. But it turns out that extracting that knowledge from humans is extremely work-intensive and extremely expensive, and research in that kind of paradigm has not kept up the way it has with statistically based machine learning.
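As a rough sketch of the idea, with invented rules rather than any real medical knowledge base, an expert system stores the experts' question-and-answer knowledge as rules that another user can then query.

```python
# Hand-written rules of the kind an expert system would store
# (the medical content here is invented purely for illustration).
def diagnose(answers):
    if answers.get("fever"):
        if answers.get("cough"):
            return "suspect a respiratory infection"
        return "suspect a general infection"
    if answers.get("rash"):
        return "suspect an allergic reaction"
    return "no rule matched; refer to a specialist"

# Another doctor "queries" the tree by supplying answers to the questions.
print(diagnose({"fever": True, "cough": True}))
print(diagnose({"rash": True}))
```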
In the 1990s, two things happened. First, hardware improved enough that you could now reliably run neural networks and other learning algorithms on your home computer. But also, enough corpora had been accumulated that you could find patterns in large
collections of texts. This has led to the era where we are right now. Between the 2000s and the 2010s, people have invented new architectures for neural networks: architectures with more hidden layers, with more input neurons, and with more interconnections between those neurons. We call these kinds of networks deep learning, in general. So these new neural network architectures, combined with more computing power, have allowed for a furthering of methods where you're trying to find numerical correlations in the data.
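As a very small sketch of what stacking hidden layers looks like (my own minimal example with made-up sizes, not any particular architecture from this period), here is the forward pass of a network with two hidden layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny feedforward network: 4 inputs -> two hidden layers -> 1 output.
# "Deep" just means stacking more of these hidden layers.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 1)), np.zeros(1)

def forward(x):
    h1 = np.maximum(0, x @ W1 + b1)   # first hidden layer (ReLU)
    h2 = np.maximum(0, h1 @ W2 + b2)  # second hidden layer (ReLU)
    return h2 @ W3 + b3               # output layer

x = rng.normal(size=(1, 4))           # one example with 4 input features
print(forward(x))
```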
Something else has changed too: people are now used to using human languages with computers. In the 20th century, most of the communication with computers was done through keyboards and through programming commands. Nowadays, language-mediated interactions are very common.
All of you have cell phones where you can
say "Hey Google" or "Hey Siri" and the cell
phone will understand what you're saying.
In the near future, the field of computational linguistics and natural language processing is trying to figure out how to better understand sentences: not only to model human languages, like which words come after which, but to try to extract knowledge from that, to extract embodied knowledge, so that the computer is aware not only of the linguistic context but also of what it would mean in the real world, and whether this will lead to better software applications.
On the formalization side, to be honest, most research nowadays is in numerically based machine learning. However, progress has been made in trying to get simpler formal grammars, and we'll see if these two fields can merge in the future.
So in summary, there have been several
artificial intelligence winters when
research turned away from connectionist or numerical methods and more towards rule-based approaches, but the moment we're in right now is a boom in deep learning methods, and people have applied
many of them to natural language
processing, as we will see in the
following weeks.
