>> Well, everybody, welcome back for episode 2 of What Kind of Computation Is Human Cognition? A Brief History of Thought.
Let me begin by reminding you that I've been asked to give a super compressed, but still fully coherent, synopsis of my graduate course at Johns Hopkins, Foundations of Cognitive Science.
If you kindly enter your questions
during the talk into the Teams chat,
I would appreciate it.
Our MSR summer intern, Tom McCoy,
who taught this course with me
in the spring of this year,
has kindly agreed to
curate the chat channel.
You'll recall that we started
the lectures by noting
the historical swinging
between building models of
intelligence on symbolic
versus neural computation,
and claiming that this
instability will only be
resolved by adopting
neurosymbolic computation.
Today's episode will in
fact lead up to an argument
for neurosymbolic computation
of a particular type.
Let me remind you also of Dennett's geographical metaphor, on which a pretty cohesive set of theories of cognition, clustering around the East Pole, has grown out of a rationalist tradition based in symbolic computation.
In addition to many authors too deep in history to make it onto this chart, we discussed these authors last time, and today we'll go over the remaining authors on the chart, devoting more attention to Western views than last time.
You will also recognize from last time this list of major theoretical issues in cognitive science, the dimensions of the space of theories of cognition on which different proposals differ.
We discussed the green ones in episode 1, and we'll continue with the orange ones today, in episode 2.
Episode 2 is going to start off with the same issue as episode 1 did, one that divides contemporary East and West: the status of formal rules. Does our knowledge take the form of rules, with concepts characterized by formal definitions as in mathematics, or are our knowledge and concepts characterized by examples?
Major scholarship supporting the West Coast position favoring examples is Eleanor Rosch's legendary work on categorization.
Concerning the vertical dimension
of hierarchies like this taxonomy,
she showed that not all
categories are equal.
There is a distinguished level
which she called the basic level.
This is the most inclusive level at
which category exemplars
have high similarity.
In this case, it's here, chairs.
Concerning the horizontal, within-category dimension, she highlighted typicality structure: within a category, some members are more typical than others, unlike Boolean mathematical category membership.
She showed that basic level categories and prototypical category exemplars are psychologically privileged: recall and reasoning about them are faster and more accurate.
Many people took the results
on typicality as evidence
against the possibility that
human categories are defined by
Boolean properties and evidence for
the importance of
individual exemplars as
opposed to general
categories in cognition.
For instance, in phonology, exemplar theory defines the phoneme b as the set of speech segments in memory that have been labeled as b, as opposed to the East Pole's discrete category defined by Boolean features such as [+consonantal], [+voiced], [+labial], and so on.
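To make the exemplar picture concrete, here is a minimal sketch of an exemplar model in Python. The two acoustic features and all the numbers are invented for illustration, in the spirit of Nosofsky-style exemplar models; nothing here comes from the lecture slides.

```python
import numpy as np

# Exemplar-theory sketch: the phoneme /b/ is just the cloud of stored
# tokens labeled "b"; a new speech segment is categorized by its summed
# similarity to each labeled cloud. Features and values are hypothetical.
rng = np.random.default_rng(1)
b_tokens = rng.normal([10, 100], [3, 10], size=(50, 2))   # e.g., VOT (ms), F1 onset
p_tokens = rng.normal([60, 120], [8, 10], size=(50, 2))

def summed_similarity(x, exemplars, c=0.05):
    # Similarity decays exponentially with distance (a common choice)
    return np.exp(-c * np.linalg.norm(exemplars - x, axis=1)).sum()

def categorize(x):
    sb = summed_similarity(x, b_tokens)
    sp = summed_similarity(x, p_tokens)
    return ("b" if sb > sp else "p"), sb / (sb + sp)

print(categorize(np.array([15.0, 105.0])))   # lands near the /b/ cloud
```

No Boolean feature definition of /b/ appears anywhere; the category just is its remembered members.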
Although I do try hard to maintain an optimal degree of pedagogical neutrality in presenting this material, I do feel compelled to point out the fallacy here. Even in a Boolean category, some members are more typical than others.
Suppose the category
grandmother were really defined
formally as a woman who bore a child,
who bore a child, and
that membership in this
category were all or none.
You're either a grandmother
by that rule or you are not.
It would still be true that some grandmothers are more typical than others, in that some grandmothers, more than others, share properties with other grandmothers. A 30-year-old grandmother is less similar in age, in health, in hair color, etc., to other grandmothers than is your prototypical 65-year-old, white-haired granny.
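The fallacy can be made concrete in a few lines: membership is all-or-none under the Boolean rule, yet typicality is graded. The features and values below are hypothetical, chosen only to mirror the granny example.

```python
# Even with an all-or-none Boolean rule for membership, typicality
# gradients arise from similarity on features irrelevant to the rule.
grandmothers = [                    # all satisfy the same Boolean rule
    {"age": 65, "hair": "white", "knits": True},
    {"age": 70, "hair": "white", "knits": True},
    {"age": 68, "hair": "gray",  "knits": False},
    {"age": 30, "hair": "brown", "knits": False},   # atypical, still a member
]

def similarity(a, b):
    matches = sum(a[k] == b[k] for k in ("hair", "knits"))
    return matches + 1.0 / (1.0 + abs(a["age"] - b["age"]))

def typicality(member, category):
    # Typicality = summed similarity to the other members
    return sum(similarity(member, other)
               for other in category if other is not member)

for g in grandmothers:
    print(g["age"], round(typicality(g, grandmothers), 2))
# The 30-year-old scores lowest: less typical, yet no less a grandmother.
```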
Nonetheless, the idea that categories in particular, and knowledge in general, take the form of stored examples rather than general formal definitions and rules is very popular outside the East Pole.
Rosch's work had a strong influence on George Lakoff, one of the founders of self-proclaimed cognitive linguistics, a diverse collection of Western theories that have in common a strong opposition to East Pole, especially Chomskyan, linguistics.
There are other anti-formal linguistic theories too, which go by the name functional linguistics and place the function of language at the center of linguistic theory. But here it's the connections to cognition that are emphasized.
Lakoff argued that human linguistic categories, like noun or the phoneme b, display typicality effects, that they are therefore not definable by formal definitions, and that therefore linguistic knowledge cannot be a formal system. Therefore, we can toss out the formal approach of the East Pole to linguistics.
On the general question of the appropriate level of formality for cognitive theory (is human knowledge a formal system?), cognitive linguistics says, "Hell no."
Lakoff opposed formal approaches
to cognitive science in
general arguing that
cognition must be
understood through
how it is situated in
the body and in the
rest of the world.
He says, "Do meaningful
thought and reason concern
really the manipulation of
abstract symbols and their
correspondence to an objective reality,
independent of any
embodiment (except,
perhaps, for limitations
imposed by the organism)?
Or do meaningful thought and
reason essentially
concern the nature of
the organism doing the thinking-including
the nature of its body,
its interactions in its environment,
its social character, and so on?"
Of course he certainly advocates
the second view there.
There are some experimental
results that I find
pretty spooky concerning embodiment.
Two days ago, at the cognitive science meeting, the eminent developmentalist Janet Werker described [inaudible] by her student, Alison [inaudible], showing that six-month-old English-learning babies, who normally react to the difference between two Hindi consonants, dental d and retroflex d, which they've never heard, can no longer do so when a teething toy is placed in their mouth that prevents their tongue tip from moving. It's the curling of the tongue tip that distinguishes these d's.
These, by the way, are
kids too young to babble.
But if a different teething toy is used, one that prevents their lips from moving rather than their tongue, this sucker shown here, then they lose their ability to distinguish the contrast between d and b in their native language, where it is now the lip movement that makes the difference.
This change is evidenced
not only in their behavior,
it's visible, as shown in this work,
in their brain activity as
recorded by
electroencephalography, EEG,
[inaudible] in these regions here,
where a distinction is made for
one contrast and not for the other.
Another argument concerning
the formality of cognition is
the famous Chinese room
thought experiment.
This is a thought experiment
that takes issue with
the strong AI position that
following the right program,
like one that passes the Turing Test,
is sufficient for a system to have
human-like understanding
and intelligence.
I've omitted the name of the author, John Searle, because sadly, his fame came in part at the expense of his students, some of whom were driven out of the field by sexual harassment, for which UC Berkeley stripped him of his status as emeritus professor in 2019.
The thought experiment consists of
a closed room containing
a person and two slots.
Into the input slot goes a
story written in Chinese,
along with a question about
the story also in Chinese.
From the room's output slot
pops the answer to the
question also in Chinese.
The person inside, Joanna,
knows only English,
and she follows the
instructions in a book,
a manual in English,
telling her what
strokes to write when
she sees certain
strokes in the input.
The book says nothing about the
meaning of Chinese characters.
The intuition that this thought
experiment is designed to
deliver is that there is nothing
here that understands Chinese.
Joanna doesn't.
Obviously, the instruction manual,
a mere book, can't
understand anything.
The conclusion then follows: manipulation of meaningless symbols, even following a program that passes the Turing Test, can't be sufficient to generate understanding or intelligence.
After all, executing a program that
simulates a firestorm doesn't
burn the building down.
Why should executing a
program that simulates
thought generate understanding
or intelligence?
Believers in strong AI must
think that intelligence,
unlike fire, lives in another world,
a mental world disconnected
from the physical world.
They must be dualists in the sense of
[inaudible] , but we, of course,
are not dualists, we are
physicalists who believe that
cognition is a property
of the physical brain.
So, this paper argues, it must matter that in our case it's a physical brain that is performing the symbol manipulation; that's what creates intelligence. So it's about time we bring the brain into the story.
We turn finally to the remaining answer to our initial question: what is cognitive science the science of? Historically, the East Pole position, in opposition to the Chinese Room argument, is that cognition is all about the program running in the mind, or the virtual machine that runs the program; it doesn't matter what physical device implements [inaudible]. The opposing view, however, is that cognitive science should pay careful attention to brain dynamics.
In his landmark 1949 book, The Organization of Behavior, Donald Hebb laid the foundation for modern cognitive neuroscience.
He attempted to link neural
properties to abstract
mental properties and behavior via
his notion of cell assemblies.
These are circuits of
interconnected neurons
that mutually excite each
other and allow neural activation to
reverberate for extended
periods of time,
enabling memory and sequential
thought processes to link
an input stimulus to a
later output response.
These cell assemblies come into
existence through what we
now call Hebbian learning.
The well-worn neuroscience cliche is,
cells that fire
together, wire together.
Hebb didn't take credit
for this actually.
He says the general idea is an old one, that any two cells or systems of cells that are repeatedly active at the same time will tend to become associated, so that activity in one facilitates activity in the other.
Hebbian learning is the foundation
on which all subsequent
neural learning algorithms rest.
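Here is the rule in its simplest possible form, a sketch rather than any particular published model; the decay term is an addition to keep the weights bounded, not part of Hebb's proposal.

```python
import numpy as np

# Hebbian learning: the weight between units i and j grows when the two
# units are co-active ("cells that fire together, wire together").
def hebbian_step(W, activity, lr=0.1, decay=0.01):
    return W + lr * np.outer(activity, activity) - decay * W

n = 5
W = np.zeros((n, n))
pattern = np.array([1.0, 1.0, 0.0, 0.0, 1.0])   # units 0, 1, 4 fire together
for _ in range(50):
    W = hebbian_step(W, pattern)
np.fill_diagonal(W, 0.0)                         # no self-connections

# The co-active units are now strongly interconnected: activating unit 0
# excites units 1 and 4, the rest of its cell assembly.
print(W[0].round(2))
```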
Above the level of cell assemblies is
the structure of
brain areas, patches,
that most believe are
at least somewhat
specialized in the type of
information they process;
low level visual properties,
object level visual
properties, speech, etc.
At the level of general
domains of information then,
neural representation is generally
believed to be localized.
This is what supports the field of
cognitive neuropsychology
pioneered by
Alfonso Caramazza and others.
This field studies the
cognitive impairments
induced by brain damage,
most often strokes.
Strokes typically kill neurons in
a local portion of the brain
and this often corresponds,
well enough, to a functionally
defined component of cognition.
Here's a picture of a
functional architecture for
reading single words aloud
from Caramazza's paper.
Some stroke patients lose
this functional component and
cannot pronounce non-word
sequences of letters.
If they can't look up the
pronunciation in their lexicon,
they can't pronounce the word.
Whereas patients who lose that box can sound out letter sequences using letter-to-sound rules, but cannot access the meaning of words from their written form, although possibly they can through their spoken form.
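The logic of such a dual-route architecture, and of "lesioning" one route at a time, can be sketched in a few lines; the mini-lexicon and letter-to-sound rules below are invented for illustration and are not Caramazza's actual model.

```python
# Dual-route reading aloud (toy sketch): a lexical route (whole-word
# lookup) and a sublexical route (letter-to-sound rules).
LEXICON = {"yacht": "yot", "have": "hav"}            # whole-word route
RULES = {"ch": "ch", "ck": "k", "y": "y", "a": "a",  # sublexical route
         "b": "b", "l": "l", "i": "i", "t": "t",
         "h": "h", "v": "v", "e": ""}

def sound_out(word):
    out, i = [], 0
    while i < len(word):                 # try two-letter rules first
        if word[i:i + 2] in RULES:
            out.append(RULES[word[i:i + 2]]); i += 2
        else:
            out.append(RULES.get(word[i], "?")); i += 1
    return "-".join(p for p in out if p)

def read_aloud(word, lexicon_ok=True, rules_ok=True):
    if lexicon_ok and word in LEXICON:
        return LEXICON[word]             # lexical lookup
    if rules_ok:
        return sound_out(word)           # sounding out regularizes exceptions
    return None                          # neither route: no response

print(read_aloud("blick"))                     # nonword: needs the rule route
print(read_aloud("blick", rules_ok=False))     # "lesioned" rules: no nonword reading
print(read_aloud("yacht", lexicon_ok=False))   # "lesioned" lexicon: regularized "y-a-ch-t"
```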
It's patterns of cognitive deficits of this sort across individual patients that constitute the data of cognitive neuropsychology for inferring the functional organization that yields unimpaired cognition in intact brains.
Though functional components or
information types seem to be
localized to brain
areas, within an area,
information is often encoded by
overlapping distributed
patterns of activity,
rather than the activation
of single neurons.
On the largely outdated grandmother cell view, recognizing a face as Kim's (the upper face, with the red dot) involves activating the Kim neuron, as opposed to the blue Sandy neuron.
DiCarlo and Cox, in this paper, argue against this view, proposing instead that all the representations of Kim's face in different poses fill out the red manifold of states here, in the brain area called IT, each point on the manifold being a vector of activation distributed throughout the area.
They view the job of the neural pathway for object recognition as taking the crumpled manifolds of activity patterns in low-level visual areas that encode low-level visual features, and mapping them to manifolds like these in the middle, which can be separated by hyperplanes.
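Here is a toy numerical version of that picture: two synthetic "manifolds" (concentric rings standing in for the Kim and Sandy manifolds) that no hyperplane can separate in the raw space become linearly separable after a fixed nonlinear re-embedding. Everything here is illustrative; it is not DiCarlo and Cox's data or model.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
kim   = np.c_[np.cos(theta[:100]), np.sin(theta[:100])] * 1.0   # inner ring
sandy = np.c_[np.cos(theta[100:]), np.sin(theta[100:])] * 2.0   # outer ring
X = np.vstack([kim, sandy])
y = np.r_[np.ones(100), -np.ones(100)]

def remap(P):
    # Stand-in for the ventral-stream transformation: append squared coords
    return np.c_[P, P[:, 0] ** 2, P[:, 1] ** 2]

def perceptron_accuracy(F, y, epochs=50):
    Fb = np.c_[F, np.ones(len(F))]          # add a bias input
    w = np.zeros(Fb.shape[1])
    for _ in range(epochs):
        for f, t in zip(Fb, y):
            if t * (w @ f) <= 0:            # misclassified: nudge the hyperplane
                w += t * f
    return (np.sign(Fb @ w) == y).mean()

print("raw space:      ", perceptron_accuracy(X, y))         # near chance
print("after remapping:", perceptron_accuracy(remap(X), y))  # a hyperplane separates
```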
How does bringing the neural
level into cognitive science
affect the computational
analysis of cognition?
What are its implications for
this fundamental computational issue?
Should a cognitive theory be a computational theory, viewing the mind as a machine that takes data in, processes it according to an algorithm, and outputs behavior?
In his brilliant 1979 book, Gödel, Escher, Bach, and more concisely in this paper, Doug Hofstadter lays out an interesting Midwest position: that cognition is computational at the hardware level, but not at the abstract software level.
Symbols are instantiated as mutually
activating neural assemblies,
like Hebb's cell assemblies.
These are active symbols that
drive the computational
dynamics themselves.
They're not passive symbols as in
standard symbolic computation,
where symbols just sit around until
some external processor moves them.
The dynamics of symbols does
not follow an algorithm.
The neurons that make
up the symbols do.
This makes the abstract
symbolic level much more
fluid than in standard
discrete symbolic computation.
A year after Hofstadter's paper,
the PDP books broke on the scene.
Note the subtitle,
Explorations in the
Microstructure of Cognition.
Consistent with Hofstadter's view,
the PDP approach strove to take
computational cognitive modeling
down closer to the neural level.
As mentioned at the beginning of episode 1 of this series, children's over-regularization of the regular rule for forming the past tense of English verbs, producing comed as the past tense of come, after previously having correctly used came, was a poster-child phenomenon supporting the view that learning language is learning rules.
In this chapter, Rumelhart
and McClelland say,
"There is no denying
that rules still provide
a fairly close characterization of
the performance of our subjects,
and we have no doubt
that rules are even more
useful in characterizations
of sentence production,
comprehension, and
grammaticality judgments.
We would only suggest that
parallel distributed
processing models
may provide a mechanism
sufficient to capture
lawful behavior without requiring
the postulation of explicit,
but inaccessible rules."
As with Hofstadter,
they're claiming that
micro-level elements,
neurons, obey algorithms,
and that approximate
abstract macro-level rules
emerge from the global
system behavior,
but the system cannot be described
precisely as algorithmically
following rules.
They're presenting an alternative
to postulation of explicit,
but inaccessible rules of language.
Who are they arguing against?
Production rule learning theories of
the CMU School, like John Anderson's,
do treat rules as symbolic
data structures that are
examined and revised during
learning algorithms,
a clear case of explicit
rules in my view,
but [inaudible] don't commit
to their rules being explicit.
They characterized
linguistic competence
as describable with rules,
as rule-governed,
while remaining uncommitted
on how that is achieved.
They say nothing about
whether there are
algorithms to manipulate these rules.
They say nothing at all
about algorithms, period,
so they can remain
agnostic as to whether
linguistic rules are
operated on explicitly.
This is where it's convenient to work exclusively at Marr's most abstract, computational level, abstracting away above the level of algorithms, as theoretical linguistics historically has done.
Now, on the flip side,
neural models commit to a low-level
description of processing,
but do not generally make explicit
claims on more abstract levels.
In 1991, McCloskey pointed out that connectionist models do not come with the theory that they instantiate, even in a weak sense of theory, such as a functional architecture like the one we saw from Caramazza earlier.
Which details of the model are
relevant to the hypotheses
being tested and which are just
arbitrary implementation choices?
This is especially difficult when
the model's structure
arises from learning.
All of this makes it difficult to assign credit and blame, to determine which parts of the model are responsible for its successes and which for its failures, and makes it very difficult to compare the model to previous theories, to say which aspects of those theories are falsified by it.
Two years after Rumelhart and
McClelland's paper on the past tense,
Steven Pinker and Alan
Prince published a review,
which I think they hoped
would have the same effect as
Chomsky's review 30 years
earlier of Skinner's book,
which we talked about in episode 1.
The goal being to convince
the East Pole that
the latest rebellion on
the West Coast had been
completely crushed.
Early in the paper, they fired off this bullet cartridge loaded with a dozen bullets: all the failures, one after the other, of Rumelhart and McClelland's analysis.
To show the inferiority of the PDP account relative to the symbolic orthodoxy that they were defending, they had to overcome McCloskey's observation of the incommensurability of PDP models and bona fide theories. They had to construct a McCloskey-style theory of the Rumelhart and McClelland model, to assign credit and blame for aspects of the account that could be compared with symbolic-rule theories.
This was a tour de force
and not quickly done.
This paper was 120 pages long,
more than twice as long as the
55-page paper they were critiquing.
I would say it achieved what
Pinker and Prince hoped it would.
The damage was so severe
that 30 years later,
[inaudible] found it
worth the effort,
which I mentioned in episode 1,
to resuscitate the PDP
approach to morphology with
a paper in [inaudible] in 2018.
Pinker and Prince's paper was one of three long papers in a special issue of the journal Cognition, which was the official party-sanctioned mouthpiece of the East Pole.
The papers in this crushed
connectionism issue were predictably
convincing to those
of the East Pole and
profoundly irritating
to those of the West.
The lead article defended the philosophical sector of the East Pole. It was authored by Jerry Fodor, the epitome of East Pole philosophy, and Zenon Pylyshyn, a prominent flag bearer for East Pole cognitive science generally, especially in vision.
Fodor and Pylyshyn's
paper is frequently cited
nowadays by deep learning researchers
working on compositionality,
one of four key cognitive
properties they discussed.
Their critique assumed the context of philosophical semantics, where the meaning of a thought is its truth conditions, so that the meaning of the thought that Kim loves Sandy is the proposition loves(Kim, Sandy), which is true in any world where the person referred to by Kim stands in the love relation to the person referred to by Sandy. In this expression, μ is the meaning function.
Fodor and Pylyshyn argued that
an adequate theory of
cognition must explain
why cognition necessarily
displays four key properties.
Systematicity: the set of meanings of all possible thoughts of an agent is the set of all well-formed propositions built from the sets of predicates and individuals known to the agent. Thus, if an agent can think the thought with meaning loves(Kim, Sandy), they must know the predicate loves and the individuals Kim and Sandy, so the agent can also think the thought loves(Sandy, Kim). The second property of cognition that a theory must explain, according to Fodor and Pylyshyn, is productivity: the set of an agent's possible thoughts, and hence of possible meanings, is infinite. The third property is compositionality: meanings compose, so that the meaning of SVO is the meaning of V applied to the meanings of S and O, as we saw directly with Kim loves Sandy in the orange box above. Finally, the last property was inferential coherence: the set of possible inferences from a set of beliefs B that a cognitive agent possesses is the closure of B under the agent's known rules of inference.
Here, I've given a
rational reconstruction
of Fodor and Pylyshyn's argument,
which is stated much less precisely.
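As a toy rendering of two of these properties, here is how systematicity and inferential coherence look when written out directly; the formalization, the predicates, and the inference rule are illustrative inventions, not Fodor and Pylyshyn's notation.

```python
from itertools import product

predicates = ["loves", "fears"]
individuals = ["Kim", "Sandy"]

# Systematicity: the thinkable thoughts are ALL well-formed combinations.
thoughts = {(p, a, b)
            for p, (a, b) in product(predicates,
                                     product(individuals, repeat=2))}
assert ("loves", "Kim", "Sandy") in thoughts
assert ("loves", "Sandy", "Kim") in thoughts    # comes for free

# Inferential coherence: beliefs are closed under the agent's known rules.
def fear_is_mutual(belief):                     # a made-up inference rule
    p, a, b = belief
    return {("fears", b, a)} if p == "fears" else set()

def closure(beliefs, rules):
    done, frontier = set(beliefs), set(beliefs)
    while frontier:
        new = set()
        for b in frontier:
            for r in rules:
                new |= r(b)
        frontier = new - done
        done |= new
    return done

print(closure({("fears", "Kim", "Sandy")}, [fear_is_mutual]))
```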
Fodor and Pylyshyn claimed that the language of thought theory meets this quadruple requirement, where the language of thought [inaudible]. Fodor asserts that an agent's set of thoughts is a formal language generated by a grammar over thought constituents like loves and Kim and Sandy.
Their position, of course,
is that the structure in
representations is discrete.
Their grand conclusion was that
connectionism must implement
a language of thought or it can't
meet their adequacy condition.
I had a prolonged debate
in the literature
with Fodor and his collaborators,
spanning over a half
dozen publications.
In a 1994 paper, I pointed out that the language of thought only entails the four target properties when it is supplemented with several strong assumptions. The language of thought's grammar must be assumed to be recursive, or you don't derive productivity. The language of thought's semantics must be assumed to be compositional, or you don't get compositionality. Inference within the language of thought must be unconstrained rule application. The point is that these assumptions simply stipulate, as bald assumptions, the truth of their four conditions.
Their symbolic theory
doesn't explain them,
it just describes them.
It doesn't derive them from any
principles other than themselves.
I also explicitly presented a connectionist theory that truly does solve the explanation problem for systematicity and productivity. As for compositionality and inferential coherence, it would be necessary to supplement the connectionist theory I gave with assumptions directly parallel to the ones that they have to make for the language of thought.
From that, I want to pass to a much more recent debate, an East-West debate in Trends in Cognitive Sciences in 2010. In this debate, the main issue was whether cognitive theory should be built bottom-up or top-down.
Tom Griffiths, Nick Chater,
Charles Kemp, Amy Perfors,
and Josh Tenenbaum argued that
probabilistic models over
symbolic hypothesis spaces enable
exploration of a variety of
representational structures,
not just the vector
spaces of neural nets.
With Marr's top-down approach, computational modeling does not depend on correctly characterizing implementation-level structure, for example the structure of the neural net, before knowing what higher-level capabilities it enables.
Another important argument comes from a later paper by Sam Gershman, Eric Horvitz, and Tenenbaum, namely that these models allow decision-making to optimize utility, taking into account computational costs and computational constraints.
This work, of course,
defends the proposition that
cognitive representations have
discrete structure, in large part.
On the other side, Jay McClelland, Matt Botvinick, David Noelle, Dave Plaut, Tim Rogers, Mark Seidenberg, and Linda Smith advocate a bottom-up approach, starting with a continuous dynamical computational system, one type being neural networks, which allows learning, rather than the modeler, to determine the structure appearing at higher levels, and does not depend on correctly characterizing computational-level structure in advance. Rather, it commits to fundamental computational mechanisms.
They say, "From our perspective,
the hypotheses, hypothesis space,
and data structures of the structured
probabilistic approach,"
defended by Griffiths et al,
"are not the building blocks
of an explanatory theory.
Rather, they are sometimes
helpful but often misleading
approximate characterizations
of the emergent consequences
of the real underlying processes.
Likewise, the entities over which
these hypotheses are predicated,
such as concepts, words,
morphemes, syllables,
and phonemes, are
themselves best understood
as sometimes useful but sometimes
misleading approximations."
Neural network representations are continuous vectors having no explicit structure. So what do McClelland et al. mean by structure emerging at higher levels in their neural models?
This structure is not explicit,
but it can be extracted by
the modeler by analyzing
properties of the network's
continuous vector representations.
Here are some examples.
First, an example from the paper itself, where this model was discussed. In this model, activation flows from left to right. It completes relational fact triples such as, in this picture, canary can ?, which completes to four properties: grow, move, and so on.
The representation layer learns
a distributed representation
of the entities,
with structure in this space
organizing gradually as shown
over these successive
snapshots during training.
Eventually, it learns a phylogenetic
hierarchy revealed through
a hierarchical cluster analysis of
the vectors in the
representation layer.
Here are the same vectors plotted in
2D via principal components analysis.
The structure of the hidden space
here is rearranged depending on
which relation is being queried.
The rearrangement reflects
the similarity of
the entities as arguments
of the particular relation.
In the top box here, the relation is "is," and in the bottom box we see the reconfiguration for the relation "can." This is pretty much what they gave as an example of structure emerging at higher levels in this paper in 2010.
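The extraction step itself is simple enough to show. Here is a sketch of how a modeler reads structure out of a representation layer via hierarchical clustering and principal components; the vectors below are random stand-ins, nudged into clusters only to mimic what training produces in the real model.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
items = ["canary", "robin", "salmon", "sunfish", "oak", "rose"]
reps = {name: rng.normal(size=16) for name in items}
for pair in [("canary", "robin"), ("salmon", "sunfish"), ("oak", "rose")]:
    mean = (reps[pair[0]] + reps[pair[1]]) / 2       # fake "learned" similarity
    for name in pair:
        reps[name] = 0.2 * reps[name] + 0.8 * mean

R = np.array([reps[n] for n in items])

# Hierarchical cluster analysis of the representation vectors
Z = linkage(R, method="average")
dendrogram(Z, labels=items, no_plot=True)            # set no_plot=False to draw

# The same vectors in 2D via principal components analysis
R0 = R - R.mean(axis=0)
U, S, Vt = np.linalg.svd(R0, full_matrices=False)
coords = R0 @ Vt[:2].T
for name, (x1, x2) in zip(items, coords):
    print(f"{name:8s} {x1:+.2f} {x2:+.2f}")          # related items plot nearby
```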
But I'll mention some
examples from another era.
This is from Jeff Elman's epoch-making paper, Finding Structure in Time, where he introduced what we now call the recurrent neural network, with input symbols coming in one at a time and the network simply predicting the next symbol. The structure here is in the mean squared error of the prediction over time; that's what's plotted here.
Segmenting the string at the peaks of this error does a pretty good job of parsing out the words. You can see that at each peak a new word begins, mostly: many years ago, a boy and a girl lived by, and so on. You can see that it groups girl and boy together, as many such learning models end up doing, as a matter of fact. But in any event, the point is that the model has extracted structure in the stream of letters, which was generated by a sentence grammar.
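The whole pipeline, predict the next letter and then cut the stream where prediction error peaks, fits in a short sketch. This is not Elman's exact network or training procedure (for simplicity only the output weights are trained here, reservoir-style, and the corpus is a single repeated sentence), but it shows the mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
text = "manyyearsagoaboyandgirllivedbythesea" * 200   # toy letter stream
chars = sorted(set(text))
V = len(chars)
ix = {c: i for i, c in enumerate(chars)}
X = np.eye(V)[[ix[c] for c in text]]                  # one-hot letters

H = 32
Wxh = rng.normal(0, 0.1, (V, H))                      # fixed input weights
Whh = rng.normal(0, 0.1, (H, H))                      # fixed recurrent weights
Why = rng.normal(0, 0.1, (H, V))                      # trained output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for epoch in range(3):                                # crude training pass
    h = np.zeros(H)
    for t in range(len(text) - 1):
        h = np.tanh(X[t] @ Wxh + h @ Whh)
        p = softmax(h @ Why)
        Why -= 0.05 * np.outer(h, p - X[t + 1])       # output-layer gradient step

sent = "manyyearsagoaboyandgirllivedbythesea"
h, errors = np.zeros(H), []
for t in range(len(sent) - 1):
    h = np.tanh(np.eye(V)[ix[sent[t]]] @ Wxh + h @ Whh)
    p = softmax(h @ Why)
    errors.append(-np.log(p[ix[sent[t + 1]]]))        # surprisal of the next letter

# Cut the stream at local error peaks: candidate word boundaries
peaks = [t + 1 for t in range(1, len(errors) - 1)
         if errors[t] > errors[t - 1] and errors[t] > errors[t + 1]]
print(peaks)   # peaks tend to line up with word onsets, though not perfectly
```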
A year later, Elman
presented a similar model,
now taking as input words and
sentences generated by a
simple grammar with embedding.
These are the first two principal
components of the hidden state,
starting at the lower right,
while processing the sentence,
boy chases boy, who chases
boy, who chases boy.
The encodings of who
cluster together,
but each encodes the depth of
embedding of the
clause it introduces.
This enabled the model to learn
subject verb agreement across
intervening embedded clauses.
It learned to correctly match boy
with chases and boys with chase.
This is another example of structure of the sort that, I take it, McClelland et al. were referring to.
I'd like to close with
an argument of my
own that coordinating bottom-up and
top-down theory development allows
progress on one of
the most abstract problems
of cognitive science,
the theory of universal grammar.
That is to say, giving
an account of the properties
that the grammars of
all human languages
share and precisely
characterizing the limited ways
in which they may differ
from one another.
At the same time,
this coordination of bottom-up and
top-down theory development allows
progress on neural
network architecture.
This argument was made in a two-volume book, The Harmonic Mind, published in 2006. Later work took on the unification of discrete and continuous structure in cognitive representations.
This is my final slide before the summary. It's an elaboration of the summary slide of a talk that you can watch on YouTube for more information; that talk was presented at the inaugural meeting of the Society for Computation in Linguistics.
The main claim of the talk
is that strong synergies
exist between the grammar of
symbol structures and
neural network computation,
that is, integrating
the East Pole and
West Coast perspectives
could strengthen them both.
This is the neural symbolic approach
to modeling intelligence
that I advocate.
The idea is to integrate these perspectives vertically, viewing symbolic and connectionist models as characterizing one and the same computational system at two different levels of description. At the micro-level, the microstructure looks like a neural network; viewing the same thing at a higher level, the macro-level, we see symbol structures.
This is modeled after the same cross-level structure that you see in, say, computer science, where virtual machines that are folder manipulators or graphics processors or NLP programs, these macro-structural machines, emerge from microstructures that do bit processing.
In physics, we also see this: the macro-structural properties of macroscopic materials, like the gas in this tank, include, at this level of description, things like temperature and pressure. But the exact same system, described at the microstructural level, consists of a set of molecules, and they have neither temperature nor pressure; they have velocity and momentum. The relation between the two levels is mathematically prescribed by statistical mechanics.
We're emulating this
relationship here,
but now we're linking
a discrete symbolic structure
with a continuous
lower level structure.
What we have embedded in
the continuous vector space is
approximately discrete structure.
This is the structure-in-neurons approach to neurosymbolic computation, which I favor, as opposed to the neurons-in-structure approach, which takes a symbolic model and replaces some of its components with neural networks. This second approach develops hybrid architectures, as currently advocated by Gary Marcus.
Now, I am also a fan of this second type of neurosymbolic computation: it is possible to take advantage of the power of symbolic structure right away using these hybrids, in a way that is only gradually emerging in the structure-in-neurons approach, which is a more long-term paradigm.
The neurons-in-structure approach has been well developed at MSR by Alex [inaudible] and collaborators, achieving remarkable results in automatic program synthesis.
In the vertically integrated system, at the lower level, the micro-level, we have neurons that follow precise algorithms but cannot be meaningfully interpreted. The macro-level symbols can be meaningfully interpreted, but they do not follow precisely specifiable algorithms. This instantiates Hofstadter's general picture.
With that background out of the way,
let me summarize the four
theoretical synergies
developed in this work.
First, from the top down. An important part of symbolic structure is compositional structure. It's possible to take compositional structure, which is natively given at the higher level, and push it down to realize compositional structures in activation patterns, which is exactly the pattern we were looking at here in this structure on the right. It's possible to realize these compositional structures in activation vectors that themselves have a compositional structure.
These are special vectors called
tensor product representations.
I don't have time to go into the details there; that's another talk of its own. Suffice it to say that this top-down route enriches the theory of neural computation with the notion of these embedded compositional structures, in the form of TPRs.
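But the core binding operation is compact enough to show here as a minimal sketch; the dimensions and vectors are arbitrary.

```python
import numpy as np

# Tensor product representation: bind each symbol (filler) to its
# position (role) with an outer product, then superpose the bindings.
rng = np.random.default_rng(0)
fillers = {s: rng.normal(size=4) for s in "abc"}   # filler vectors
roles = rng.normal(size=(3, 3))                    # one role vector per position

def tpr(string):
    return sum(np.outer(fillers[s], roles[i]) for i, s in enumerate(string))

T = tpr("cab")                                     # a single 4x3 array

# Unbinding: with linearly independent roles, projecting onto the dual
# role vectors recovers each filler exactly.
duals = np.linalg.inv(roles).T                     # roles[i] @ duals[j] = delta_ij

def symbol_at(T, position):
    f = T @ duals[position]
    f = f / np.linalg.norm(f)
    return max(fillers,                            # nearest filler by cosine
               key=lambda s: fillers[s] @ f / np.linalg.norm(fillers[s]))

print([symbol_at(T, i) for i in range(3)])         # ['c', 'a', 'b']
```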
On the side of the higher-level description, grammars are all about well-formedness, and it's possible to take the notion of well-formedness down to the neural level too.
You can actually realize grammatical well-formedness as a function called harmony, which evaluates the degree of satisfaction of micro-constraints that are embedded in the network weights. The harmony of a network state is a measure of its well-formedness; processing in the neural network seeks out states of high harmony and avoids states of low harmony. This can all be viewed as a constraint satisfaction process, in which the individual weights are constraints on the co-activation of the units that they connect.
Again, we have enriched the lower level with a concept of well-formedness, which can do a lot of the work that grammars do in the higher-level theory as [inaudible].
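Here is the harmony idea in its smallest runnable form; the three-unit network and its weights are invented just to show one conflict being resolved.

```python
import numpy as np
from itertools import product

# Harmony of a state a under weight matrix W: H(a) = 1/2 a^T W a.
# Each weight is a micro-constraint on the co-activation of two units.
W = np.array([[ 0.0,  1.0, -1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0, -1.0,  0.0]])   # units 0 and 1 want to co-fire; unit 2 conflicts

def harmony(a):
    return 0.5 * a @ W @ a

# Search binary states for the maximum-harmony (best-formed) state;
# a real network finds this by spreading activation instead.
best = max((np.array(a) for a in product([0, 1], repeat=3)), key=harmony)
print(best, harmony(best))   # [1 1 0]: the constraints are best satisfied
```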
Now, in the bottom-up direction: these networks that I was just referring to optimize harmony; their activation-spreading algorithms are optimization algorithms that seek out states of high harmony.
This conception of what you do
with the notion of well-formedness
can be pulled up to the higher
level to give new grammars.
Because this is not how
grammatical well-formedness
was conceived of before.
Now it's conceived of as something
that you need to optimize,
and harmony is brought up
to the grammatical level.
It is used to define a new kind of grammar called harmonic grammar, where the grammatical discrete structures are those that have maximal harmony, harmony now being a numerical measure of well-formedness. Grammaticality now becomes equated with optimality.
There are
deterministic and probabilistic
versions of harmonic grammar.
What harmonic grammars negotiate are soft grammatical constraints that say, for example, sentences should have subjects, or syllables should start with a consonant. These are the constraints that populate the grammar, but they conflict with each other, and the only way to determine what is optimal is to assign different strengths to the constraints, so that the stronger constraints have priority over the weaker constraints.
In harmonic grammar,
the notion of strength
is formalized in numerical weights.
So we actually compute
a numerical penalty score for
violating some set of constraints,
and that's what we try to
optimize in order to find
the grammatical structures.
However, very shortly after harmonic
grammar was created in 1990,
Alan Prince and I moved
the notion of grammatical
strength out of
the realm of numbers into
a more symbolic realm.
What resulted was Optimality Theory, in which a grammar, as before, consists of a bunch of constraints that often conflict with each other,
and grammatical structures
are the ones that
best satisfy these constraints
given their relative strengths.
But now, the relative strength
is encoded in a hierarchy such
that higher constraints have
absolute priority over
lower constraints.
The constraint ranking, the hierarchy, is specific to a language, but the constraints are not; they're universal.
This is where the contribution
to universal grammar arises.
It provides a way of
specifying the content of
universal grammar as a set of
constraints that operate to determine
well-formedness in all
natural languages.
The only way that languages can differ is in how they priority-rank those universal constraints.
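Computationally, the switch from weights to ranking is just the switch from a weighted sum to a lexicographic comparison. Here is the same toy example under a ranking; again, all details are illustrative.

```python
# Optimality theory sketch: the same conflicting constraints, now ranked
# rather than weighted; higher-ranked violations take absolute priority.
RANKING = ["ONSET", "FAITH"]             # a language-particular ranking

candidates = {
    "a.pa":  {"ONSET": 1, "FAITH": 0},
    "?a.pa": {"ONSET": 0, "FAITH": 1},
}

def profile(violations):
    # Tuple comparison implements strict domination of higher constraints
    return tuple(violations.get(c, 0) for c in RANKING)

winner = min(candidates, key=lambda cand: profile(candidates[cand]))
print(winner)   # "?a.pa": no number of FAITH violations can outweigh ONSET
```

Re-rank the constraints as ["FAITH", "ONSET"] and the other candidate wins; that single permutation is the model of cross-linguistic variation.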
It is symbolic.
Actually, I don't have it on the slide here, but in terms of trying to reconcile what were previously viewed as conflicting neural network versus symbolic approaches: it happens that Prince and I met at a debate where I was to defend connectionism and he was to defend rule-based linguistics, he being the Prince of Pinker and Prince. But we saw value in each other's differing positions and worked together, and Optimality Theory was the result.
The last connection between
the levels here, again,
a bottom-up connection, is
that neural network computation has
continuously varying
vector representations.
This kind of continuity is often called gradience in cognitive science; the representations are graded rather than discrete.
This also can be pulled up to
the higher level to
introduce the notion of
gradient symbol structures.
These gradient symbol
structures can be
evaluated by harmonic grammar
so you can still define
what is a grammatical structure
even though the structure consists of
partially active symbols arranged
in different positions
in the structure.
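In the TPR setting from before, a gradient structure is just a blend of fillers in a role; this fragment is self-contained and, as before, purely illustrative.

```python
import numpy as np

# A gradient symbol structure: the filler bound to a role is a weighted
# blend of symbols rather than a single discrete one.
rng = np.random.default_rng(0)
f_a, f_b = rng.normal(size=4), rng.normal(size=4)    # fillers for a and b
r0 = rng.normal(size=3)                              # role for position 0

state = np.outer(0.7 * f_a + 0.3 * f_b, r0)          # 70% a, 30% b in position 0

# Because harmony is a smooth function of the activation vector, a
# harmonic grammar assigns this blended state a well-formedness value
# just as it would a purely discrete one.
print(state.shape)
```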
I believe that the contributions to
both macro theory and micro theory
here provide strong
evidence for the value of
this integrated bottom-up,
top-down research strategy.
The value of integrating the best of
symbolic and neural
network approaches
to the study of intelligence.
That was the value of one of the contributions of Optimality Theory. You don't need to take my word for it, thanks to a quote that I just got from Tom McGuire a few days ago. Thanks, Tom.
The paper called the "On
the Rapid Expansion
of Optimality Theory
at the end of the Twentieth Century."
This is not the same
[inaudible] by the way.
As I said, this is based on the summary slide of the talk you can see on YouTube. That talk goes considerably further, into a fifth point, which is the application of these ideas in NLP, work done here at Microsoft. But that is a whole other talk which, unfortunately, we don't have time for today.
But [inaudible] into that in this video, and you can see many of the relevant papers on arXiv, authored by me and collaborators here at Microsoft.
Let me summarize. We've seen from many different authors a number of claims, arguments, [inaudible] examples. Specific examples, and not just general rules, figure centrally in human knowledge.
Knowledge is situated and embodied,
even speech perception
in infants is strongly
connected to the motoric
aspects of speech.
It matters what physical machine
an intelligent program is running in.
Micro-level, algorithm-following neurons can self-organize to form macro-level symbol systems with structure and dynamics that are flexible and non-discrete, and can only be approximated by discrete symbolic algorithms.
Cognition requires
compositionally structured,
but not necessarily
discrete representations.
Finally, I was arguing that
development of cognitive theory
needs to integrate bottom-up
and top-down methods,
and that doing so can lead to
unified neural-symbolic styles of
computation that strengthen both
micro and macro-level theories.
I thank you for your attention.
Happy to get to your questions.
>> Tom, actually, let me take charge of the moderation, because the time is [inaudible]; some people have a lot of questions. Just raise your hand and then [inaudible]. Yosh was first.
>> Hey, Paul, I'm Yosher.
I just had a question.
I've been looking into the
OpenCog system recently,
and it has a similar idea of
combining symbolic system and
the neural net system to create AGI.
What are your thoughts on that?
Have you ever worked with the system?
>> Unfortunately, I'm not familiar with this system, and I look forward to learning about it from the others. But let me just say that the contrast that I was trying to make between neurons and symbols versus symbols and neurons is surely relevant here, because most of the accessible kind of integrated systems are these hybrid systems, for which it would make sense to develop an API that combines primitives from symbolic computation and neural network computation, which can then be woven together, essentially, as I was saying, by conceptually building a symbolic [inaudible] and replacing certain parts of it, certain functions, for example, with neural networks. But I'm afraid I don't know enough to say more than that. Does that have any relevance?
>> Yeah, that does make sense. Regarding your idea of the hybrid approach: is it aimed toward the AGI dream that we all have in AI? Is that something you think the hybrid system is capable of achieving?
>> Capable of achieving what? AGI?
>> AGI. Yeah.
>> Despite the tradition of making
naively over-optimistic
predictions about
the success of one's approach to AI,
I think I will refrain from that.
I think this structure
in neurons approach that I was
outlining in the last slide offers,
in the long-term, our best
chance of combining the
best of both worlds of putting
neural and symbolic computation
together in a way that allows
the strengths of
neural computation to
overcome the weaknesses of symbolic
computation and vice versa.
>> Got you. It makes
sense. Thank you so much.
>> If anyone else had questions,
you are free to raise your
hand on teams. Gonzalo?
>> More than a question, it's a fun series of thoughts. You made a point about the set of all possible thoughts being infinite. I was wondering if you can elaborate on this more. Do you mean coherent thoughts or just grammatically correct thoughts? I couldn't help thinking about [inaudible]'s universal library, where all possible works of literature exist, yet the library is finite. I don't know if there is a distinction to be made between generating sentences that are infinite in quantity and the ones that are sensible.
>> Well, the comment about the infinity of thoughts was in the context of this very strong East Pole view that Fodor and Pylyshyn were advocating, where they are working at a level of abstraction that has no regard for any kind of practical or sensible limits on what a mind might actually be able to do. It's an idealization. I personally think it's a valuable idealization. I've learned a lot from trying to see how neural networks can try to approach that idealization, but it's not that work that's going to tell you the right way to take into consideration the limitations that human cognitive architecture imposes on thought. There are certainly plenty of other places to look for that, but not with Fodor and Pylyshyn.
The infinitely many books
in a finite library.
In the spirit of your considering fun things to think about along the lines of issues in the talk, I can say that there are approaches to embedding, in the most worked-out case I know, symbolic strings in vector spaces, which use fractal encodings.
embedding symbolic
strings in vector spaces,
which use fractal encodings.
So basically every time a new
symbol is added to the string,
the representational space available
shrinks to a region inside
what you had before,
and the next symbol comes
in, it shrinks again.
But within the region accessible
to you when each symbol comes in,
you do the same thing to
store which symbol it is.
In a bounded vector space,
you can embed arbitrarily
long strings.
Of course, it requires infinite
precision to be [inaudible].
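Here is one concrete version of that fractal scheme, reduced to a one-dimensional space for clarity; this is a minimal illustrative sketch, not necessarily the particular encoding Paul has in mind.

```python
# Fractal string encoding: each new symbol halves the available interval,
# so arbitrarily long strings over {a, b} embed in the bounded space [0, 1).
def encode(string):
    lo, width = 0.0, 1.0
    for s in string:
        width /= 2.0
        if s == "b":
            lo += width          # 'a' keeps the lower half, 'b' the upper
    return lo, lo + width        # the sub-interval representing this string

def decode(x, length):
    out, lo, width = [], 0.0, 1.0
    for _ in range(length):
        width /= 2.0
        if x >= lo + width:
            out.append("b"); lo += width
        else:
            out.append("a")
    return "".join(out)

lo, hi = encode("abba")
print(decode((lo + hi) / 2, 4))   # 'abba': exact, given enough precision
```

As Paul says, unbounded string length trades off against unbounded precision.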
>> That's cool.
>> [inaudible] would have been
happy with that, I don't know.
>> Thank you.
>> Sure, thanks for the question.
>> We also had a couple questions left over from last time. I think a couple of those might be interesting, since we have a little bit of time here. First, Matt had asked: "It seems like there are two possible goals for the types of work you showed. Number one, the scientific value of understanding people and their methods for cognition, or number two, to inform attempts at building something intelligent. Are both directions goals for cognitive scientists? Does the work you presented here tend to have as a goal one of those as opposed to the other?"
>> At the birth of cognitive science, AI was a central component of the coalition of fields that were coming together. I think there was really wide acceptance of the idea that intelligence or cognition could be characterized at a sufficiently abstract computational level, so that what you're talking about could just as easily be implemented in a computing system or in a brain, or maybe not just as easily, just as difficultly.
At that time, many of the founders of AI were in this cognitive science realm. I don't think there was a lot of attention, though I very well could be wrong, to making an effort to distinguish the study of cognition in machines and in people.
But as the field of computer science in general within academia turned from a more theoretical to a more applied emphasis, I think AI came along with it.
Now it's not so common to find prominent researchers in AI who think that what they're working on is in large part trying to understand cognition, as well as to build smart machines. However, there are some people who obviously think that.
The kind of work that I talked about through most of these lectures was by people who, I would say, felt that contributing to the understanding of human cognition was a non-negotiable aspect of doing cognitive science, although the contribution could be pretty indirect and abstract, as I was saying the original founders of the field conceived it to be.
I do think that, just as building [inaudible] implemented systems to demonstrate that your ideas actually work in some scaled-up environment is important, which is the computer science culture, it's also important, on the more empirical side of cognitive science, that when you build models, you look at human data: you should see whether your model can be informed by human data, like ways in which it fails to predict the data and ways you can improve the model, or ways in which your model makes new predictions that can drive new experimentation.
For many people in cognitive science, it is probably an almost non-negotiable requirement that to do cognitive science, you must be somehow speaking in a direct way to empirical data about humans doing cognitive tasks.
I don't know if I've actually
addressed that question,
I danced around it more than
I addressed it, I suppose.
But that was from Matt, you said?
>> Yes.
>> Would you like to say anything, Matt, about this question or answer?
>> Yeah. I think it's a good answer. I was mostly curious; I agree with you. Most people working on AI are probably not as interested in the human connection, but when you were speaking, especially last week, about models, you showed architectures with different kinds of memory and different kinds of functioning units; this was sort of a model for cognition. So I guess I'm just curious: for cognitive scientists, is the goal to understand how humans do cognition and build models informed by human data, or is the goal to understand cognition so we can build something intelligent? It sounds like you're saying the first one, at least most of the time, and then maybe a direction toward the second. You totally answered my question. Yes, thank you.
>> But let me also remind you of something that [inaudible] raised in a question for you last time, which Eric Horvitz followed up on in subsequent communication. The study of cognition under constraint has seen a lot of progress; in the paper that he was an author on, which I mentioned today, for example, a lot of it takes the view that cognition under the kind of constraints that short-term memory imposes on the human system, which Simon worried about, could result in very different computation than in systems that have nothing resembling that kind of constraint [inaudible].
>> Awesome, thank you.
>> As I said last time also, people working on human-computer systems try to make that a feature, not a bug: to make good use of the differences between what current computers and people do well, in virtue of their very different constraints, to decompose tasks appropriately.
>> There's a question from Ida.
>> Yes. Thank you so much for this
wonderful series of lectures.
It's been such a pleasure.
>> Thank you.
>> My question concerns
the universality
of the language of thought.
If I understood the earlier
'80s ideas correctly,
it is supposed to be universal.
Whereas Stan Dehaene, during the [inaudible] 2020, when he was giving his award lecture, showed a huge body of work where geometric processing in humans seems to have similar language-like properties, but the brain areas don't converge with the natural language areas. So there was the possibility that there might be not a universal language of thought, but these different, separate pathways to it.
I'm wondering what is the perspective
of your harmonic model or
how do you see it now in this
synergy between the East and West?
>> Well, that's a fascinating question. Thank you, Ida.
I would say that
the approach that I am pursuing,
the structure in neurons approach,
aims to try to provide
deep learning systems with
the primitive materials
that they need so that
they can learn to build their own
symbolic computational systems.
It seems that, following what Stan was saying in that [inaudible], you could very well imagine that the primitive materials that you use to build your systems will not be domain universal. So a deep learning system given the primitives that are ideal for building numerical cognition might end up proceeding in a rather different way than another deep learning system given the raw material to construct hierarchical syntax. I think that there's a nice opportunity to pursue that domain-dependent structural cognition hypothesis.
Does that make sense, Ida?
>> Absolutely. Thank you so much.
>> I think a good question to end on would be the one that Jason [inaudible] left last week. He gave a general form and a specific form; you could answer either. The general form was: what's your modern perspective on the intellectual history you've recounted? The more specific form was: of the scholars you've discussed, were they usually disagreeing about actual mental behavior, questions like whether language and reasoning behave categorically or what kind of inductive bias humans have, or were they maybe arguing about how to accurately describe the behavior, for example, when we should use the word intelligence, or when we should use a model with latent variables?
>> Well, taking the more specific question first, I do think that maybe only the Turing paper, of those that I discussed, would I consider to be addressing how we should talk about cognition. I think the other work, if I'm not missing anything in my mental run-through, was all about how the mind works and the scientific description of its workings, as opposed to how we should talk about it in a less scientific context.
I think these are
genuine debates about
what's the right formal description
of what's happening inside our heads.
As for this question: last week, I didn't say much when Jason asked what my modern perspective on these classical debates, shall we say, would be. But in today's presentation, I tried to show at the end what I thought about that, which is that I do think that what emerges from this back-and-forth pendulum swing is that the compositional structure of symbolic representations is a fundamental part of what makes cognition possible, and likely what makes AGI possible, in my view.
But at the same time,
using continuous spaces to embed
these compositional
representations in is
crucial for allowing the magic
of deep learning to operate.
I do think that there are ways to pull together a coherent subset of views that were often conceived of as in opposition before, but which can actually be reconciled with an appropriate mixture of techniques and perspectives from both the East and the West. That's a general answer to a general question.
>> Thanks. It looks like Yosh has a question, and actually we do have a couple minutes. So Yosh, if you want to ask it.
>> Hey, Paul. I have one more question regarding the hybrid approach. How much, or in what way, do you think the hybrid approach can help us with the transfer learning problem that we are seeing in deep neural nets right now? We train the models on a narrow subset, then we want to transfer the learning to a different domain, maybe a related domain, but maybe even an unrelated domain. Do you think the hybrid approach can help us solve that problem somehow?
>> Well, I do see a way. I do think that symbolic representations, in virtue of the abstract general semantics of the symbols and their structures, do provide a more universal foundation for performing different tasks, for using the knowledge in different ways, as opposed to knowledge that has been learned by deep learning in the context of performing some particular task, with the input from some particular dataset. I do think that the value of these abstract representations will play an important role in bringing neuro-symbolic systems to the level of general-purpose knowledge that we're looking for.
I think that traditional symbolic AI had general-purpose knowledge; it's just that the knowledge it had wasn't really the right knowledge. That is to say, squeezing our human common sense and knowledge of language and all that into the constraints of discrete representations and computation meant that the knowledge being utilized in traditional AI systems really did not have the richness to cope with the complexities of the problem in the way that modern distributed representations begin to do.
I do think that what was right was that the form of the knowledge made it very general, but what was wrong was that the formal character of the knowledge was very limited.
Well, [inaudible] The
neuro-symbolic approach
holds the potential for
bringing together the
generality that we want without
the confines and restrictions of
discrete symbolic representation.
>> Got you. Thank you so much.
>> Thanks, Yosh.
>> We're out of time. I guess,
Matt, if you wanted to-
>> Yeah.
>> Thanks, Tom. I
appreciate you doing this.
>> Thanks, Tom. That was great.
Thank you again, Paul.
Really appreciate the
lectures and your expertise.
Anyone else who has questions,
I know Paul is a very
approachable person.
I'm sure, he'd be happy to
take questions offline through
e-mail. Let's thank Paul again.
>> Thank you so much for your questions. You're welcome to join again.
>> Goodbye everybody.
Thank you, Paul.
>> Goodbye, everybody. Take care.
