So let’s talk about complexity. For a codebreaker
who spends all day decrypting messages and
unlocking secrets, knowing what kind of cipher
was used makes puzzling out the answer a lot easier.
But the vast machine of everyday communication
didn't come with any blueprints. We need to
work extra hard to reverse engineer language into its component parts, to figure out just how complicated a system it really is.
So what do we find when we finally crack the
code? I’m Moti Lieberman, and this is The
Ling Space.
Any language can be broken up into different
systems: syntax, semantics, morphology, phonology.
And different languages divvy up the workload
in different ways. But linguists generally
agree that, on the whole, all languages are
equally capable of expressing pretty complicated
ideas. But what about human language overall?
Just how complex is it, and how do we even
measure that? To get to the bottom of things,
we’ll need to home in on what language actually is.
As we’ve touched on before, one thing that
we can rule out pretty much right away is
the idea that a language is just a bunch of
sentences, pre-built and ready to be pronounced.
If that were true, each of us would need an
infinite memory to carry them all around in
our heads. That’s because, in principle,
there’s no upper limit on how many sentences
there are, or even how long each one can be!
Like, take the sentence “Joan should be
allowed to work at Bletchley Park”.
We can always add a few words to the beginning
of it and still end up with a perfectly good
sentence, like in “Alan thinks that Joan
should be allowed to work at Bletchley Park”.
And there's nothing stopping us from doing that again. And again. And again.
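If you want to see that runaway process happen mechanically, here’s a minimal sketch in Python; it’s just a toy, using the example sentence from above.

sentence = "Joan should be allowed to work at Bletchley Park"
for _ in range(3):  # make this 3, 30, or 3000; the result is still a sentence every time
    sentence = "Alan thinks that " + sentence
print(sentence)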
So, it doesn’t seem very likely that we’ve
just got all these possible sentences memorized
ad infinitum, especially since you’ve probably
never even heard most of them before, but
you still recognize them as English. Instead
of a list, language is more like a set of
rules encoded in our heads, which we can tap
into at any moment, to produce any sort of
sentence we like!
And we’ve seen these rules in action before.
When we talked about X-bar theory, we showed
you the sorts of trees that we use to represent
the structures of sentences. And at the roots
of these trees are the rules that generate
them. Take a sentence like “Christopher
might help”, which looks like this.
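The tree itself is on screen; in rough bracket notation, assuming the standard X-bar labels, it’s something like [IP [NP Christopher] [I' [I might] [VP help]]].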
Underneath this lie a couple of rules, which
can be put together and fed into each other.
These say that a sentence — or inflectional
phrase — can be made up out of a noun phrase,
plus some indication of tense or modality,
plus a verb phrase.
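In notation, those rules would be something like IP → NP I' and I' → I VP. Here’s a minimal Python sketch of them being fed into each other to build the example; the labels and the lexical rules at the bottom are just assumed for illustration.

# A toy top-down expansion of the (assumed) rules IP -> NP I', I' -> I VP,
# plus word-level rules for the example sentence.
rules = {
    "IP": ["NP", "I'"],
    "I'": ["I", "VP"],
    "NP": ["Christopher"],
    "I":  ["might"],
    "VP": ["help"],
}

def expand(symbol):
    # Words never appear on the left of a rule, so they just come out as-is.
    if symbol not in rules:
        return [symbol]
    words = []
    for part in rules[symbol]:
        words.extend(expand(part))
    return words

print(" ".join(expand("IP")))   # Christopher might help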
There are a lot of different kinds of rules
we can come up with to show how elements come
together, different combinations that encode
any linguistic system. But how complex do
the rules need to be to fully capture actual
human language? Depending on the answer, we
can get an idea of how complex natural language
really is compared to other systems, like
the artificial languages that we program computers
with. That’s because once we can nail down
the kinds of rules we need to account for
it, we can fit them into the Chomsky hierarchy,
named after linguist Noam Chomsky.
The Chomsky hierarchy is like a ranking of
different types of rules, along with the different
kinds of languages they’re able to generate.
The rules at the very bottom of the hierarchy
form what are called Type-3 grammars, with
more sophisticated rules showing up the higher
you go on the list, all the way up to Type-0.
It’s like golf, where a smaller number means a better score.
So where does natural language fit? Well,
let’s start with the simplest possible grammars,
the Type-3s, and see if it fits there. These
work using rules that look a bit like this.
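Specifically, something along these lines, with “phrase” standing in as the placeholder: phrase → word phrase, and phrase → word.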
That first rule works by outputting some word
in the language you want to generate, followed
by a placeholder which can be filled in with
more words later on. So this rule could start
by producing something like “If phrase”.
Then, we can apply the rule again to replace
that phrase part with another word, followed
by yet another phrase; so, we get “If you
phrase”, “If you pass phrase”, and so
on.
We can keep going like this as long as we
like, until we get to the end of our sentence
and apply our second rule. It says we can
fill in that last bit with just one word,
so we can get “If you pass the Turing test,
then you’re conscious”.
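Here’s a minimal sketch of that derivation in Python, just to make the step-by-step rewriting concrete; the word list is lifted straight from the example.

# Start with the placeholder, then apply rule 1 over and over:
# swap the placeholder for a word plus a fresh placeholder.
sentence = "phrase"
for word in ["If", "you", "pass", "the", "Turing", "test,", "then", "you're"]:
    sentence = sentence.replace("phrase", word + " phrase")
# Finish with rule 2: swap the last placeholder for a single word.
sentence = sentence.replace("phrase", "conscious")
print(sentence)   # If you pass the Turing test, then you're conscious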
The sort of language generated by a Type-3
grammar is known as a regular, or finite
state, language. Since we managed to get
our rule to produce an English sentence, maybe
English is a finite state language!
But as you probably guessed, it isn’t quite
that simple. For one, the number of "if"s in
a sentence like that usually has to match
the number of "then"s. So, you can’t have
sentences like “If if you pass the Turing
test, then you’re conscious” or “You
pass the Turing test, then then then you’re
conscious”. And our rules are too simple
to account for that - they’ll let you stack
things up wrong, or give you garbled results.
Now, there is a bit of flexibility here, and
with the right amount of tinkering, we could
probably debug our rules and sidestep these
specific cases. But the problem only gets worse.
Since any clause in a finite state grammar can be replaced with any other clause of the
same sort, we can replace the first half of
that sentence with something like “either
you pass the Turing test, or you have a mind”,
so the end result winds up as “If either
you pass the Turing test or you have a mind,
then you’re conscious”.
And again, nothing’s stopping us at just
one swap, so we can also get a sentence like
“If either you pass either the Turing test
or a psychological exam, or you have a mind,
then you’re conscious”. In general, you
need as many ors as there are eithers; there’s
really no room for error.
This is working so far, with a lot of effort,
but it looks like we really need some kind
of counter to keep track of how many "if"s and
"either"s we use, or the rules might just start giving
us nonsense. Our grammar needs some kind of
memory, which a Type-3 grammar just doesn’t have.
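To see what that missing memory would buy us, here’s a minimal sketch of a counter-based check in Python; keeping a running count like this is exactly the kind of bookkeeping a finite state device can’t do.

# Count "if"s up and "then"s down; the string only passes if the count
# never dips below zero and ends at exactly zero.
def ifs_match_thens(tokens):
    depth = 0
    for t in tokens:
        if t == "if":
            depth += 1
        elif t == "then":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

print(ifs_match_thens("if you pass the Turing test then you're conscious".split()))     # True
print(ifs_match_thens("if if you pass the Turing test then you're conscious".split()))  # False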
So, what about the next level up — Type-2?
Well, these grammars give us a lot more flexibility.
Some of the many rules allowed by a Type-2
grammar look like this.
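Roughly, rules along these lines (the exact set shown on screen is assumed here): clause → if clause then clause; clause → either clause or clause; clause → some basic clause. And here’s a toy Python generator built on that idea; every “if” comes packaged with its “then”, and every “either” with its “or”, so the pairs can’t fall out of step.

import random

def clause(depth=0):
    # With some probability, wrap two smaller clauses in a matched pair.
    if depth < 2 and random.random() < 0.5:
        left, right = clause(depth + 1), clause(depth + 1)
        return random.choice([f"if {left} then {right}",
                              f"either {left} or {right}"])
    # Otherwise, bottom out in a basic clause.
    return random.choice(["you pass the Turing test",
                          "you have a mind",
                          "you're conscious"])

print(clause())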
This new set-up looks a lot like our old one,
just with a couple of extra rules that make sure
our sentences don’t run amok with too many
ifs or too few ors. With just a couple of extra,
more powerful rules, we’re able to keep these
pairs consistently matched up. And we can
get a good sense of just how complex language
has to be, too, since now we can compare it
to other Type-2 languages! For instance, the
artificial language of sentential logic, which
we talked about a while back, falls into this
class, along with most computer programming
languages. And our X-bar rule for sentences
does too! So we’re all done! Right?
Well… not so fast. As it happens, there
are patterns in language that can’t be generated
by these rules, either. Take a sentence like
“Charles says that we helped John paint
the house”. Like we saw before, we can keep
extending this sentence indefinitely, so “Charles
says that we let the children help John paint
the house”, “Charles says that we let
Mary let the children help Peter help John
paint the house”, and so on. It’s not
a problem for English, since our existing
rules are more than capable of handling it.
But when we translate these sentences into
a dialect of Swiss German, things get tricky.
For one, the verbs “let” and “help”
each need their own kind of specially marked
noun phrase — an accusative one, in the
case of “let”, and a dative one, in the
case of “help”. Overall, there need to
be as many accusative nouns as there are lets,
and as many dative nouns as there are helps.
And there’s no upper limit to how many lets
and helps there can be.
What really throws a wrench into the works,
though, is that in Swiss German, all of the
nouns actually get grouped together at the
front of the embedded clause, before all the verbs.
To visualize what’s going on, you can think
of the overall pattern like this: sandwiched
in between the beginnings and ends of each
of these sentences, we always find some number
of accusative nouns, followed by some other
number of dative nouns, followed by however
many lets you need, followed by however many
helps.
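Schematically, and abstracting away from the actual Swiss German words, it’s: NP-acc … NP-acc NP-dat … NP-dat let … let help … help, where the number of accusative NPs has to match the number of “let”s, and the number of dative NPs has to match the number of “help”s; in the abstract, a pattern something like a^m b^n c^m d^n.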
And because of this bunching together of all
the nouns and all the verbs, when we draw
lines connecting each noun to each verb, those
lines cross. For this reason, Swiss German
is said to have crossing dependencies. And
that’s a real problem, since it can be mathematically
proven that this kind of language can’t
be generated by a Type-2 grammar - check the
video description if you want to know how.
Because of that, we’ve concluded that natural
language is at least mildly context-sensitive.
The rules of the next grammar up the list,
so Type-1, can reference contexts, and they
generally look like this:
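The general template is something like α A β → α γ β: you’re allowed to rewrite A as γ, but only when A shows up with α on its left and β on its right; that surrounding material is the “context” the rule is sensitive to.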
And we’ve actually seen these kinds of rules
already. When Chomsky was originally developing
his theory of grammar, he proposed adding
context-sensitive transformations to solve
the problem posed here. In modern linguistics,
this has taken the form of syntactic movement —
something we now see just about everywhere,
from question formation to raising verbs,
and more! So since natural language needs the kind of rules that only Type-1 systems or higher can provide, we know that’s at least the level we’re looking for.
But what about Type-0? Well, as it turns out,
Type-0 grammars end up being needlessly powerful for human language, like using a supercomputer to do your taxes. Type-0
grammars have rules which are “unrestricted”,
which means that you can have any combination
of symbols to the left of the arrow, and any
combination of symbols on the right. There’s
no real template for these sorts of rules,
except something like “A→B”, where “A”
and “B” can be literally anything. So
although we can make machines talk to each
other with unrestricted grammars, when you
line that up with what we know of natural languages,
it just doesn’t compute. That’s why Type-1
has just the right degree of complexity.
So, we’ve reached the end of The Ling Space
for this week. If you solved the enigma of
language, you learned that we can think of
a language like a set of rules that come together
to generate all of its sentences; that we
can rank those rules, using the Chomsky hierarchy;
and that human language lands somewhere in
the upper half, in terms of power and sophistication.
The Ling Space is produced by me, Moti Lieberman,
and directed by Adèle-Elise Prévost. This
week’s episode was written by Stephan Hurtubise.
Our editor is Georges Coulombe, our music
is by Shane Turner, and our graphics team
is atelierMUSE. We’re down in the comments
below, or you can bring the discussion back
over to our website, where we’ll have some
extra material on this topic.
Check us out on Tumblr, Twitter, and Facebook,
and try dropping by our store. And if you
want to keep expanding your own personal Ling
Space, please subscribe. And we’ll see you
next Wednesday. Vidimo se uskoro! (See you soon!)
