- Hello.
So, yes, as you've just heard,
I'm a computer scientist.
I'm not an artist, not a philosopher.
So, this talk will come from
a different angle, perhaps,
than the other ones we've heard today,
and in some sense I'm gonna
try to look at it from
almost the opposite perspective.
So, I think what you'll
hear quite a lot about today
is how artists can use AI,
can use neural networks
and these technologies in a creative way,
and what I'm gonna talk about
is how the internal processes
that are going on in these
models that generate images
and sounds and things like
that, how those processes,
in some way, resemble
human creative processes.
So, the talk's called
"Madness in the Machine:
Generative Models in the
Context of Human Creativity."
And I should stress, this description
of artificial intelligence
and of these kinds of
algorithms and networks
isn't necessarily the one that most people
in the field of, say,
deep learning would give.
This is just my way
of thinking about how
these algorithms operate,
and I'm hoping maybe it will help
give you all a peek under the bonnet
at what's going on
with these algorithms.
Okay, so, I think one
really fundamental thing
is this idea of an interplay, or contrast,
between structure and noise.
You could call it order and chaos,
madness and reason,
things that are fabricated
versus things that are found.
It feels like all of creativity
somehow involves this dichotomy.
This is why I've called
it Madness in the Machine,
and so we have these two quotes
here from famous writers.
"The most beautiful things are those
that madness prompts and reason writes."
and then we have, you know,
this quote from Hemingway.
"The good parts of a book
may be only something
a writer is lucky enough to overhear
or it may be the wreck
of his whole damn life."
And I think what they're both getting at
is that there's something
kind of arbitrary
about the source of inspiration.
There's this idea that
something comes along
that really wasn't under
the artist's control.
It wasn't reasoned or planned by them,
but is then transformed into art
by the internal processes of the artist,
and I think there's
definitely a parallel here
with the way we attempt to generate things
using algorithms.
So, yes, the idea is that
the inspiration is random,
but the outcome is,
and I use this word a lot, structured.
I don't know if that's
a familiar word here.
It means that there's a pattern,
a shape going on here,
that isn't just something
that would occur randomly.
Okay, so it's all very
well to talk like that,
but, you know, what does that
actually mean in practice?
So, you know, underlying
most of these AI techniques,
these generative models,
what you really have
is good, old-fashioned statistical models.
So, you have machine learning algorithms.
What do they do?
They find structure, more or less,
by fitting probability
distributions to data.
And so we have here an example
of a very simple probability distribution.
I probably just took
this off Wikipedia.
It's a two-dimensional
Gaussian distribution.
By the way, I should say,
this talk is gonna be very non-technical.
There may be one or two equations
or graphs like this in it,
but it's absolutely not important
to worry about the details here.
This is just a, sort
of, illustrative example
of what a simple probability
distribution looks like.
Okay, so, given that distribution,
which is basically these,
kind of, contour lines
that appear in the graph,
you can then start generating things.
You can generate data
by picking random points
on that graph.
That doesn't sound very
creative so far, right?
All we've got is a little dot
appearing at some x y position.
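Just to make that concrete, here is a minimal sketch of generating such a dot by sampling from a 2D Gaussian. The mean and standard deviation values are placeholders I've chosen, not anything from the talk.

```python
import random

# Sample a random (x, y) point from a 2D Gaussian with independent axes.
# Mean and standard deviation per axis are illustrative placeholder values.
def sample_2d_gaussian(mean=(0.0, 0.0), std=(1.0, 1.0)):
    x = random.gauss(mean[0], std[0])
    y = random.gauss(mean[1], std[1])
    return (x, y)

# Each call produces one "little dot at some x y position".
points = [sample_2d_gaussian() for _ in range(5)]
```

Each sample is just a point; the creativity, as discussed next, comes from what space those points live in.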
So, how do we get from that
to something that resembles
a creative process?
Well, one of the key ideas
to get your head around
is that when you're
creating anything real,
anything interesting, you're working
in what we tend to refer to
in machine learning
as a high-dimensional space.
A high-dimensional space of data.
What does that mean?
So, it's actually, in some
ways, kind of misleading.
It just comes back to our human tendency
to want to reason about things
in geometric space,
to talk about two dimensions
and three dimensions.
Once you go beyond three dimensions,
calling it a space is,
you know, kind of, a,
it's just a metaphor.
It's an analogy.
So, anyway, what do I mean
by high-dimensional space?
So, you know, there's this famous story,
The Library of Babel.
I'm sure a lot of you have read this
where Borges imagined this idea
of a library that held all possible books.
So, he says, you know,
the shelves register
all the possible
combinations of the 20-odd
orthographical symbols, a
number which, though vast,
is not infinite.
So, that's important.
There's not an infinite
number of possible books,
but it's just ridiculously
large, and so, you know,
my back-of-the-envelope calculation here:
if we allow up to 10,000,000
characters per book,
and there's 25 possible characters,
then you have this
number that is, you know,
impossibly larger than
the number of particles
in the universe.
So, to actually build this
library is clearly unthinkable.
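That back-of-the-envelope number can be checked mechanically. Treating a book as exactly 10,000,000 characters drawn from 25 symbols is my simplification of the estimate above:

```python
import math

# Rough size of the Library of Babel as estimated in the talk:
# 25 possible characters, 10,000,000 characters per book,
# giving 25**10_000_000 possible books. That number is far too
# large to compute directly, so count its decimal digits instead.
digits = 10_000_000 * math.log10(25)
print(f"25^10,000,000 has about {digits:,.0f} digits")
# By comparison, the number of particles in the observable
# universe has only around 80 digits.
```

So the count of books is a number with roughly fourteen million digits, which is why the library can never actually be built.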
Another example here, you know,
this famous Michelangelo quote.
"Every block of stone
has a statue inside it,
and it is the task of the sculptor
to discover it."
So, he's taking this big
block of Italian marble
and finding all of these things inside it.
And so, again, imagine somehow
parameterizing the space
of possible statues
that you can get from a block.
There are lots of ways you could do that:
you could think about it
in terms of voxels,
or you could think about
the actual chiseling strokes
taken by the sculptor.
Obviously, there would be
a lot of parameters.
It would be a very high-dimensional space.
So, unlike the example here,
where we can just plot the whole space
on a piece of paper because
it's two-dimensional,
with any kind of realistic data
we can't do that.
We can't even enumerate the options.
We can't count them.
We can't write down all of
the different possibilities
that we have.
So, what do we have instead?
Well, ultimately what algorithms have
is actually the same
thing that people have,
which is a generative process.
The simplest example
of a generative process
in artificial intelligence, I think,
is an autoregressive model,
and the kinds of autoregressive
models I'm talking about,
these recent ones, are generally
neural networks.
They're parameterized by neural networks.
That point, in some
sense, isn't so important,
but the basic concept is this:
you've got some very
high-dimensional data,
say a whole book full of words,
and what you can do is
split that data up
into a sequence of very small pieces.
So, you look at each word individually,
each letter, each pixel, each voxel,
and then you predict each piece
conditioned on the previous one.
So, very simple example,
if you have a language model
that is modeling one word at a time,
the probability of the sentence
"the sky is blue"
can be split up into
what we call the prior probability of "the",
then the probability of "sky" given "the",
then the probability of "is"
given "the sky", and so on.
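That chain-rule factorisation can be written out directly. The probabilities below are invented purely to illustrate the idea, not learned from any data:

```python
# Toy illustration of the chain-rule factorisation:
# P("the sky is blue") = P("the") * P("sky" | "the")
#                        * P("is" | "the sky") * P("blue" | "the sky is").
# All numbers are made up for the example.
cond_probs = {
    (): {"the": 0.2},
    ("the",): {"sky": 0.1},
    ("the", "sky"): {"is": 0.5},
    ("the", "sky", "is"): {"blue": 0.3},
}

def sentence_probability(words):
    p = 1.0
    for i, w in enumerate(words):
        # multiply in the probability of each word given all words before it
        p *= cond_probs[tuple(words[:i])][w]
    return p

p = sentence_probability(["the", "sky", "is", "blue"])
# p is 0.2 * 0.1 * 0.5 * 0.3, i.e. 0.003 up to float rounding
```

A real language model replaces the hand-made table with a network that produces these conditional probabilities, but the factorisation is the same.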
And then, once you have
these probabilities,
you can start generating data by guessing
what will come next one step at a time.
So, if you've got this distribution
of all the possibilities,
you can pick a sample from it.
You can choose something.
You know, maybe after the word the sky,
the word is is quite likely.
There's a few other words
that could also, you know,
you could also think of.
The sky over.
The sky above.
Something like that.
All of these will be given
a different probability,
and you're gonna pick something
according to that probability,
and then once you've picked it,
you, kind of, treat the guess
as if it's a real thing.
You feed it back into the system,
and then you guess what comes next,
and this is why I think that
generating autoregressively
is a little bit like
hallucinating or dreaming.
It has this flavor of: here's something
that you just made up, and then,
having made it up,
you're gonna treat it as if it's real,
and your brain is going to deal with it
as if it's something
that actually happened.
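That guess-and-feed-back loop can be sketched in a few lines. The tiny table of next-word probabilities here is hand-made and stands in for a real model:

```python
import random

# Minimal autoregressive sampling loop: look up a distribution over next
# words given the last word, sample one in proportion to its probability,
# then feed the guess back in as if it were real. The "model" is invented.
model = {
    ("the",): [("sky", 0.5), ("sea", 0.5)],
    ("sky",): [("is", 0.6), ("above", 0.4)],
    ("sea",): [("is", 1.0)],
    ("is",):  [("blue", 1.0)],
}

def generate(start, steps):
    words = [start]
    for _ in range(steps):
        options = model.get((words[-1],))
        if options is None:
            break                      # no known continuation
        choices, weights = zip(*options)
        nxt = random.choices(choices, weights=weights)[0]  # sample ∝ probability
        words.append(nxt)              # treat the guess as real and continue
    return words

print(" ".join(generate("the", 3)))
```

Every run walks one branch of the underlying tree of possibilities, which is exactly the picture the next slide illustrates.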
And so what is this?
This is a very nice illustration
of an autoregressive model
that I found on the internet.
It's by someone called Dan Katz,
who kindly gave me permission to use this,
and what it illustrates is that underneath
these kinds of autoregressive models,
there's basically a tree of possibilities.
So, what he's done here is trained,
actually quite a simple language model.
So this is not a state-of-the-art,
powerful neural-network language model.
It's a simple trigram
language model, I think,
which means every word just depends
on the two words before it.
Something like that, anyway.
So, and it was generated by
taking a novel by Jeff Noon.
Channel Skin, I think it's called.
Or Channel One Skin, I thought it was.
Anyway, trained on that novel,
he gets a set of probabilities,
an autoregressive model that
has a certain belief
about which words will follow which.
So, rather like the example we saw before
with the generated poem, what this does
is reveal the kinds of patterns
that there are in the
text that it's trained on.
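A model of the kind described can be built in a few lines by counting which word follows each pair of words. The training text below is a made-up stand-in, not Jeff Noon's novel:

```python
import random
from collections import defaultdict

# Sketch of a simple n-gram model: each word is predicted from the two
# words before it, with probabilities counted directly from a training
# text. The text here is an invented stand-in for illustration.
text = ("the whole forest had been wired "
        "the whole forest had been glowing").split()

counts = defaultdict(list)
for a, b, c in zip(text, text[1:], text[2:]):
    counts[(a, b)].append(c)           # record every observed continuation

def generate(w1, w2, steps):
    out = [w1, w2]
    for _ in range(steps):
        followers = counts.get((out[-2], out[-1]))
        if not followers:
            break
        out.append(random.choice(followers))  # sampling ∝ observed counts
    return " ".join(out)

print(generate("the", "whole", 4))
```

At the pair "had been" the model has two recorded continuations, "wired" and "glowing", which is exactly one of the branching points the tree diagram makes visible.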
Anyway, he then generated
this sample, the sentence
the whole forest had been
anesthetized and so on.
I think, maybe, the first
few words were provided
by the author, and then it
was allowed to generate,
and what this image illustrates
is all of the other branches,
all of the other possibilities
that could've been made
when that sentence was generated.
So, you can see there's
these certain points
where there's different choices
that the model could've made.
At the point when it said anesthetized,
it could've said recorded, issued, set-up,
you know, lots of other things,
all of which would have been,
at least in this kind of,
this world, somewhat consistent
with the start of the sentence.
Her temporals could've glowed
instead of her temples wired
and so forth, and essentially
any one of these branches
could've been followed to generate
some other piece of text,
and I just find this helpful
to keep in mind when thinking
about these types of models:
there's always this tree underneath them.
And so, if we go back to this
idea of auto-suggesting text
if you're using, you know,
for example, your Gmail app,
and you see that having
typed part of a sentence,
it suggests a continuation
for that sentence.
What that's basically
doing is taking a tree,
a much richer and more
complex tree than this one,
and picking out branches that are
particularly high probability,
things that are really likely
for you to actually type.
Of course, it could show you far more.
It could completely
overwhelm you with choices,
but then it would just get
in the way, basically.
So, that basic model,
which is very simple,
has now been taken quite far.
I don't know how many of you have seen
this unicorn story that's become
a sort of poster child
for generative models.
This is from earlier this year,
from a group at OpenAI.
They trained a very large,
very powerful neural network
on a huge amount of data, and then
basically followed the
procedure I just described.
So, in this case, they gave it a paragraph.
It starts "in a shocking finding"
and runs down to "the unicorns
spoke perfect English",
and then they allowed it to free-generate
conditioned on that paragraph.
So, it's essentially
asking the network to say
okay, if this is how this article starts,
how should it continue?
And it continues with this
remarkably consistent description
of how the scientists
named this population
Ovid's Unicorn, and of the scientist
from the University of La Paz,
which fits with the fact that
it's in the Andes Mountains.
It's a little bit hard
to get across the flavor
of why this is interesting.
And so, it's very easy to
look at a generated image
and say, okay, that
looks like a real image.
That's amazing.
Generating text doesn't seem
like such a difficult thing.
Like, a person can sit down
and write a whole bunch
of text very quickly, in a
way that we can't generate
a photo-realistic image very quickly.
But actually text, in some ways,
is harder for these algorithms,
because in order to write
something like this,
it needs to remain
contextually consistent.
It has to keep around the fact
that it's talking about unicorns
and the Andes Mountains,
that they spoke English,
and importantly, this is obviously
something it had never read before.
Right?
There were no articles in the training set
about unicorns in the Andes Mountains.
So, given some new piece of information,
it has to assimilate it
and stay relatively consistent with it,
and another thing is that
when we look at
these generative models,
the mistakes they make
are super interesting,
'cause the mistakes reveal something
about what the network's
world model is,
what it's learned.
So, there's some odd things here.
One of them is that the
unicorns are four-horned,
and so, you know, the defining
property of a unicorn
is that it has one horn.
Even the name's got uni in it.
So, that feels like a mistake
that a person just wouldn't make;
a person would automatically
have visualized these things
with a single horn.
Okay, so.
So, rewinding a little bit here
to some work that I did.
By deep-learning standards,
this is basically a prehistoric paper;
it's from six years ago,
and the point here was just to show
that this same method of
autoregressive prediction
can be used with continuous data.
So, it's not just about
predicting one word after another.
It can be used for continuous things,
like images and sounds,
things that don't consist
of a set of discrete words.
In this case, it was
for online handwriting,
and the data was recorded as
a set of pen coordinates
as a person wrote on a whiteboard.
And what this image shows
is the set of predictions
made by the network.
So, after each blob in this image,
the next blob is the network's prediction
for where it thinks it will go next,
and the heat map shows
where it thinks it's most likely
that the pen will be next,
with the red parts most likely,
and so forth further out.
And it shows that just by
following this simple protocol
of, basically, whatever I've given you,
try to predict what comes next,
the network learns
a lot of rich structure.
It's amazing just how much you can learn
simply by predicting data.
And, yes, again, you can think about this
as a branching tree.
So, this illustration shows:
given a particular series of pen strokes,
which is this kind of
nonsensical stream of letters,
you can look at possible continuations,
the branching points,
the places where the tree
could have grown elsewhere
according to this predictive model.
You can also, of course,
and this was the original
motivation for this work,
do what we call condition or control
the predictions made by the system
by feeding it some real text.
So, this is a handwriting
synthesis program, essentially.
So, the idea is that,
given some text,
from his travels It Might Have Been,
the network then produces these images,
and there's an online demo.
I haven't put the URL on here,
but it still works after six years.
It's taken some work to keep it going
on one of the University
of Toronto servers.
Anyway.
This idea of conditioning is important
because this comes back to the notion
of how do artists use
these systems as tools.
So, basically, you know, as
long as they receive some input,
in this case, the text,
then you have some control
over what they do next.
You know, in the case of the text itself,
the input is, sort of, like a prefix.
It's a start to the story
and then you let it continue,
but of course, there's so many other ways
we could think about
controlling these systems
and having more, you know, more ability
to modulate what they do, but
I think that will be covered
a lot more in the other talks today.
One thing we can also say is that
if you take away that conditioning signal
and just allow it to free-generate,
you get this kind of nonsense,
but within this nonsense
you see little islands of structure.
So, you see, like, the word "the",
or the word "he",
appearing in there, and what that shows
is that even at this
low level of prediction,
the network has learned some
of the high-level structure
in the system, which is something
that is remarkably difficult to do.
Okay, I'm gonna skip
through the next few slides.
I think I've already.
- [Automated Voice] It's
never men in their songs.
- Oh, tell you what, I'll go back.
It's too much fun.
So, a bunch of you may
have heard of WaveNet,
you know, that basically
took the same principle
and applied it directly to,
you know, raw audio data.
Now, if I can get this to play.
- [Automated Voice] The
first commercial flights
took place between the United
States and Canada in 1919.
- [Alex] So, this is now used
in production by Google.
Often, if your phone is speaking to you,
then it's running technology
like this under the hood,
and, of course, that means
that the people designing
these technologies
can choose what your system says to you.
To some extent, they can
modulate how it says them
and get it transferred into speech,
and, obviously, there's
lots of creative things
you can do there.
Because it's a raw wave form,
you can also apply this
to music, for example.
(piano music)
That's from a network
trained on a whole bunch
of classical piano music,
and you can also run this thing, again,
with the conditioning taken away.
- [Automated Voice] It's
never men and their songs
on a designed.
(automated gibberish)
- So, once again you hear
something that's gibberish,
but you can hear little islands
of words in there.
Maybe the key point to take away from this
is just how important that
conditioning signal is.
That's the thing that provides
the high-level structure for these models,
and they fill in the gaps;
that's basically how they work.
Okay, so one thing you might ask is
how would you do this with images?
Right?
It's all very well with
a sequence of words
or a sequence of audio samples.
Images aren't sequences.
Well, what you do is you
turn them into sequences,
and you predict all the
pixels in a particular order,
one after another.
So, going through this image
here that's being generated:
at each point in time, it's predicting
each one of these pixels,
using all of the previous ones
that have already been
predicted as context.
This bar on the right
shows, basically,
the prediction distribution.
It shows what the network
thinks will come next
for a particular color channel
within the image.
It's just a number between zero and 255,
and what's interesting
is that sometimes it makes
predictions that are
what we call multimodal.
There's a strong probability
that the next thing will be zero.
There's some probability
that it'll be 255.
So, there's a probability
of a complete absence
of a particular color, or a probability
of it very strongly being there,
and that's important because
those are the branching points.
These are the decision points.
So, it's back to the tree again.
The points where it has these
two very distinct choices
are the points where two different images,
or two different pieces of images,
branch off from one another,
and it has the possibility
to create a diversity of things.
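A sketch of one such multimodal prediction for a single colour channel is below. The probability mass is hand-assigned for illustration; in a real model the network itself outputs this distribution at every pixel:

```python
import random

# Toy multimodal distribution over the 256 possible intensity values of
# one colour channel: a lot of mass at 0 (colour absent), some at 255
# (colour saturated), and a little everywhere else. Values are invented.
probs = [0.0] * 256
probs[0] = 0.6                  # strong chance the colour is absent
probs[255] = 0.3                # some chance it is fully present
for v in range(1, 255):
    probs[v] = 0.1 / 254        # the remaining 10%, spread thinly

# Sampling from this distribution is a branching point: most draws land
# on one of the two modes, committing the image to one branch or the other.
value = random.choices(range(256), weights=probs)[0]
```

Whichever mode the sample lands on, the rest of the image is then generated conditioned on that choice, which is how two very different images can grow from the same context.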
Okay, so again, we saw before how
quickly GANs have advanced.
The same thing is true
of autoregressive models.
This is from 2016,
what they looked like
trained on ImageNet.
This is what you got on ImageNet in 2018,
and this is what you now get in 2019.
So, it's basically reached
the point of photorealism
in a few years.
And I should credit, these are all works
by my colleagues at DeepMind.
One name you might see
in a lot of these slides
is Aäron van den Oord, who's
really been pushing the envelope
as far as these autoregressive models go.
In case you're still not convinced,
it does faces as well,
and I think it's safe to say
that these are now indistinguishable
from real images.
One of the key things that makes
this most recent model so powerful
is that it's got a kind
of hierarchical structure.
It does this next-step, tree-like
autoregressive modeling
at several different
levels in the system,
and that's important, again,
because that way you can
create the high-level structure first
and then fill in the lower-level
details afterwards,
and these details turn
out to be very important
as far as the actual end product
of these models goes.
So you can think of this,
and maybe I'm stretching
things a little bit here,
as something like the
creative process of the model.
When you're designing these models,
that's the thing that
you're kind of creating.
Right?
Mathematically speaking,
it's hard to argue
whether it would be better
to do it hierarchically like this,
or whether it would be better
just to predict one pixel
at a time, as was being done before,
but intuitively, we have this idea that,
well, if I was going to
create a complicated picture
like that, I'd want the
high-level structure first,
and then I'd want to go
down and put in the details.
And that, I think,
is the illustrative example
of how, in some sense,
human-like creative processes
end up in these algorithms:
the people designing the algorithms
appeal to their own human intuitions
when they're making
these sorts of decisions
about how the algorithms should work.
(student coughs)
And another interesting point
about that particular model is
that it's relatively diverse
in what it generates.
So, these are the ostriches,
trained on the ImageNet ostrich class.
Real images are on the
right, and you can see
they're very diverse.
Some of them are close-ups
of ostriches' faces.
Others are an ostrich
behind a metal fence,
two or three ostriches
together, or whatever,
and if we look at the generated ones,
it's not quite as broad
or as diverse as that,
but there's still
a lot of variation there.
I don't know if you can see,
but there's also one ostrich
with two heads.
So, again, it's interesting,
the kinds of glitches
that these systems make.
You'd think that it has
a pretty consistent model
of this bird as it generates
all these images,
but then it can make
something with two heads,
and an ongoing area of research
is trying to understand
the actual world models,
the actual representations,
underlying these systems.
What have they actually
learned about the world?
They've clearly learned something,
or they wouldn't be able to give us back
things that look real,
but anyway, the point from this issue
of image diversity is that
they're really trying to
learn something not just about
a particular image of an ostrich,
but really this whole set of images,
which, when you think about it,
is an extremely difficult thing to do.
I mean, there's this infinite number
of possible images for
all of these, you know,
types of animal, types of
building, and so forth.
Some other models, and
this is a GAN model,
I'll talk about GANs a little
bit more later, I think,
tend to be a little bit less diverse,
and that comes back to the idea
that the goal of the GAN
is just to create something
that looks very convincing,
as opposed to actually attempting
to model a complete distribution.
I mean, people have added
specific things to address that,
but I find there's an
interesting parallel there,
again, with human artists,
in the sense that it's not necessary.
A writer doesn't
have to be able to create
any sort of character.
It's sufficient for them
to be able to create
certain types of characters.
You know, there's always a sort of bias
that every artist has
towards the particular things
that they seek to represent, and I think
it's an interesting thought
that, unlike the
traditional statistical
machine learning approach,
an artist is not attempting to
ingest everything and reflect
everything exactly as it is.
It's fine for them, in fact
it's sort of necessary for them,
to focus on one particular thing.
That's all in very vague,
high-level terms, of course.
So, you know, what does this mean?
You know, we talk about
autoregressive generation
as being just this series of things
that are somehow dreamed or
imagined one after another.
I feel like there's a sort of a, you know,
a direct parallel there with, you know,
what you can think of as
something like autoregressive art.
So, I think about Picasso
drawing, you know,
his horses or whatever
with a single pencil line.
Jack Kerouac typing out
On The Road in a few days
on a continuous scroll.
There's this quote by Allen Ginsberg.
"First thought, best thought."
This idea of spontaneity being important
in the creative process.
There's this idea of
surprising yourself somehow
because, and it's again,
back to this notion
of the mixture of structure and noise,
and actually there's a story
that some of you might have heard
about this Picasso, or one
of these Picasso doodles.
It's probably apocryphal,
but I think it's quite interesting.
Supposedly, he drew, he
was asked to draw a sketch
on a napkin in a restaurant,
and he quickly drew something like that
and then handed it to
the woman who asked him
and told her that will be,
you know, 10,000 francs
or something like that,
and she said, oh, but it only took you,
you know, five seconds.
That's crazy.
And he said, no, it took me my whole life,
and obviously what he meant was
his whole life had been
spent training himself
to be able to do this.
And this is kind of, again,
the process that we see
in these types of models.
It takes a huge amount of
time, a huge amount of data
to train these models.
This is where the GPUs and the TPUs
and all the powerful
computational hardware comes in.
There's a massive amount of
number crunching involved
in learning all of this structure,
but then once you have
it and you've embedded it
in the system, at least
if you're willing to,
kind of, operate in this very, kind of,
this autoregressive forward way,
then it's very efficient.
You can generate things very rapidly.
Okay, so, really though,
you know, most novels
aren't written like that.
They're not written like On The Road.
It's rather this, you
know, slow, painful process
of, you know, planning,
sketching, writing, editing.
It's very iterative.
There's, like, one little
thing changed at a time,
and then you move on.
There's this great quote from
John Berger that I like here.
He says, he's talking about
his own writing process,
and he says, "I modify the
lines, change a word or two
and submit them again.
Another confabulation begins."
There's this idea that the words
are talking among one another.
Every time you add a new word,
that changes the discussion,
and you have to talk
for quite a while longer
before you can decide
what that word should be.
There's a perhaps somewhat tenuous way
that you could see this
type of process reflected
in generative models as well.
This happens when those models
are not autoregressive.
It's not just a forward
chain of probabilities;
rather, you have
mutual interdependencies
between many variables.
So, x depends on y, and y depends on x.
In that case, generating
those things together
is not so simple.
You can't just pick one
and then get the other,
because they both influence
one another back and forth.
So, you end up with
something that's more like a
sort of network of things
that constantly interact
with one another.
So, the Boltzmann Machine
is, I guess, one of
the canonical examples
of that type of model.
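One standard way to generate from such mutually dependent variables, and the usual procedure for Boltzmann-machine-style models, is Gibbs sampling: alternate between the variables, resampling each conditioned on the other. The coupling strength below is invented for illustration:

```python
import random

# Gibbs sampling sketch for two mutually dependent binary variables x and y.
# Neither can simply be drawn first, so we alternate, resampling each
# conditioned on the current value of the other. The 0.9/0.1 coupling,
# which makes the pair prefer to agree, is an invented example.
def p_given(other):
    """Probability that a variable is 1, given the other variable's value."""
    return 0.9 if other == 1 else 0.1

def gibbs(steps=1000):
    x, y = 0, 0
    for _ in range(steps):
        x = 1 if random.random() < p_given(y) else 0  # resample x given y
        y = 1 if random.random() < p_given(x) else 0  # resample y given x
    return x, y

x, y = gibbs()
```

After enough back-and-forth steps the pair settles into the joint distribution implied by the coupling, which is the "confabulation" between the variables that the Berger quote evokes.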
Here's another one.
This is from 2015.
This was a model called DRAW
that, again, was worked on by
my colleagues at DeepMind,
and it's a very interesting
one in that, internally,
it builds up what we call a canvas
inside the network, iteratively.
The movement of these pink boxes
shows how the attention of the system
shifts around while it's
creating these images.
Obviously, these are faces up here,
and these are Street View
house numbers on the right.
And the process of
finishing these things
is iterative here.
It maybe stays in
one part of the region,
kind of crystallizes that,
and then moves to another
region of the image afterwards.
So, there's something,
if you think about it
as a creative process, it's
a little bit more holistic.
It's a little bit more
like take this sketch
or this blurry kind of idea
and gradually, kind of, make it manifest
into something concrete.
Another, more recent model
that works on this principle is
what's called a Flow-Based Model,
and the idea here,
apologies for the maths,
but if you just look
at this little curve on the left,
the idea is to take something very simple,
in this case white noise,
a standard Gaussian,
and then iteratively transform it
until you end up with
something complicated.
And I think the key thing here
is that it's not autoregressive.
It's not actually outputting
one pixel or one concrete
thing at a time,
but it is still iterative.
So, there's still this notion
that if someone asks
you to create an image,
it's hard to create the complete thing
in one go.
Rather, it's a process.
It's a series of steps.
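The idea can be sketched with toy invertible transforms, invented here for illustration: start from Gaussian noise and push it through a chain of steps, each of which can be undone exactly.

```python
import random

# Toy sketch of a flow: start with standard Gaussian noise and apply a
# chain of simple invertible (affine) transformations. Real flow models
# use much richer invertible layers, but the principle is the same:
# every step can be exactly reversed, so densities can be tracked.
def forward(z):
    z = 2.0 * z + 1.0     # step 1: scale and shift
    z = 0.5 * z - 3.0     # step 2: another invertible step
    return z

def inverse(x):
    x = (x + 3.0) / 0.5   # undo step 2
    x = (x - 1.0) / 2.0   # undo step 1
    return x

# Transform simple noise into samples from the new distribution.
samples = [forward(random.gauss(0.0, 1.0)) for _ in range(1000)]
```

The invertibility is what lets these models compute exact probabilities for their outputs, rather than only sampling from them.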
Here are some faces
created with a Flow Model.
Interestingly, you can see
these faces are realistic,
but they're noticeably
more synthetic looking
than some of the other
ones I showed before.
It's trained on celebrities,
so the faces are probably
improbably perfect to start with,
but these just look too perfect.
They look airbrushed.
They look symmetrical.
And that's actually a
common failure mode,
in some sense, of generative models:
they have a tendency to take the mean,
to average things out.
Getting away from that,
and creating things with
what someone once referred
to as digital dirt,
I think that's been a really key part
of when artists started
to get really interested
in generative models:
when they started to create things
that looked kind of
messed up and interesting,
as opposed to things that
looked too perfect.
I'll talk now about GANs,
which we've already mentioned,
and which you've probably
heard about a lot.
In some ways, what I think is interesting
about GANs, if we think back to
how these things reflect human processes,
is that the models we've seen so far
are kind of like the positive
mode of creativity.
They just generate stuff,
but we know that when we
are making things ourselves,
there's a critic there.
There's a self-conscious critical phase
where we look at what we've done
and we decide if it's good
and we modify it accordingly and so forth,
and in some ways the GAN bakes that notion
into its structure.
So, it has two networks, a
generator and a discriminator.
The generator is trying to make data
that will fool the discriminator
into thinking it's real
and the discriminator's
trying to do the opposite,
trying to distinguish it from reality.
So you can think of it
as a, sort of, battle
between a forger and a detective
or, maybe a little more
broadly, an artist and a critic.
I mean, that's a little bit unfair:
an artist isn't attempting
to convince a critic that
something's real, but anyway.
There's this basic opposition
between the two things
and that's, it's proved
incredibly fruitful.
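The opposition just described can be sketched in a few lines. This is a deliberately tiny hypothetical toy, a one-dimensional GAN with hand-derived gradients rather than any real architecture, but the alternating generator/discriminator updates are the actual GAN recipe: the discriminator learns to tell real from fake, and the generator learns to fool it.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Real data: samples from N(4, 1). The generator starts producing
# samples near N(0, 1) and must learn to fool the discriminator.
a, c = 0.1, 0.0   # discriminator D(x) = sigmoid(a*x + c)
w, b = 1.0, 0.0   # generator   G(z) = w*z + b
lr = 0.1

for _ in range(5000):
    real = rng.normal(4.0, 1.0, 64)
    z = rng.standard_normal(64)
    fake = w * z + b

    # Discriminator ascent: push D(real) toward 1, D(fake) toward 0.
    dr, df = sigmoid(a * real + c), sigmoid(a * fake + c)
    a += lr * (np.mean((1 - dr) * real) - np.mean(df * fake))
    c += lr * (np.mean(1 - dr) - np.mean(df))

    # Generator ascent (non-saturating loss): push D(fake) toward 1.
    df = sigmoid(a * fake + c)
    w += lr * np.mean((1 - df) * a * z)
    b += lr * np.mean((1 - df) * a)

# The generator's offset should have drifted toward the real data mean.
print(round(float(b), 2))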
So these were the images, I think,
that first caught everyone's attention
in terms of photorealism
coming from neural networks,
and it's funny to think
that was only, you know,
two years ago.
Right?
It seems like a long time, to me at least.
But it just shows
how powerful this idea is
of explicitly having a model
that attempts to refine
and to criticize the thing
that the system is doing.
Anyhow.
What I often think of,
yes, I think I've got time
to play this clip here.
So, this is from the movie
Solaris by Andrei Tarkovsky
in the 70s, and, very briefly, the plot
is that a bunch of scientists
are orbiting the planet Solaris
and finding that it's creating things
out of their memories.
It's sort of like a big generative model,
and at the very end of the film
the hero finds himself back
in what appears to be his
childhood home with his father.
He's looking through the window of it,
but there's something not
quite right with the scene.
All right.
It's raining on the inside of the house,
and the rain appears
to be hot or something.
It's steaming as it hits.
So, there's something,
it's regenerated reality,
looks very convincing,
but it's not quite right,
and sometimes I think of this
when I think about mistakes made
by generative models, where as they get
more and more powerful, the mistakes get
more and more high-level and
more and more interesting,
and they reveal more semantic information
about what it is that they've understood
and what it is that
they haven't understood.
(intense music)
So,
(chuckles)
and so from there, I'm
just gonna skip past.
I had some slides, now somewhat outmoded,
about artistic tools based on GANs,
but I think you're gonna hear so much
more interesting stuff about that.
I think maybe one last
important point I'd like to make
is that I've talked
about these models so far
as if they're somehow
creative in the human sense,
but all they are trying to do
is to make things that look or sound real.
Right?
It's just verisimilitude.
It's just: give me
an image that I can't tell
isn't a real image.
But that's not enough;
we expect a lot more than
that from human creativity.
So, these just, you know, some canonical,
and I apologize for my lack of diversity
in my choice of artists here.
I'm just, I'm lazily reaching
for canonical examples,
and of course these tend to be
dead, white, European males.
Lots of other people
you could put on here,
but the idea was just, you know,
this is non-figurative art.
This is, you know, the 20th century.
You know, obviously, we moved away
from slavishly recreating
reality a long time ago
in terms of our, this
Malevich had this quote
"trying desperately to free
art from the dead weight
of the real world."
So there's this reaction
against this idea
of just recreating things,
and so, you know,
the question arises.
Obviously, one answer
to that is: that's fine.
The machines are tools that create things
and we'll just use those tools,
but one question you could ask
is could the machines
themselves actually be creative,
and how would you do that?
Right?
Like, you can't have
the objective functions
we've had so far, because those
just try to model reality,
to predict things correctly.
We'd have to have systems that
could actually surprise us
for us to consider them creative.
So, how could we train them to do that?
One train of thought here,
and this is not specifically
about creativity,
it also relates to the ideas
of exploration and curiosity,
is to try to make systems that are
what we call intrinsically motivated.
So, they're driven to learn
for the sake of learning
rather than being extrinsically motivated
by targets or rewards
that we've provided them.
So, as long as we've specified
what the targets are,
as long as we've said
you'll get rewarded for
this and not for that,
it seems pretty hard for the
system to really be creative,
for it really to be open-ended
and to really surprise us.
So, it's more like an idea
of can we embody the notion
of art for art's sake.
Okay, I apologize for the equation here,
but it is an extremely
fundamental equation.
You know, this is from
Shannon's paper of 1948.
It's kind of, like, the thing
that underpins the whole
of the information age in some sense,
and the idea is that
the length of a message
is related to the log
probability of that message.
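In code, Shannon's relationship is a single line: the optimal code length for a message is the negative log, base 2, of its probability.

```python
import math

# Shannon (1948): an optimal code assigns a message of probability p
# a codeword of length -log2(p) bits. Likely messages get short
# codes, unlikely ones get long codes.
def message_length_bits(p: float) -> float:
    return -math.log2(p)

print(message_length_bits(0.5))       # a fair coin flip: 1.0 bit
print(message_length_bits(1 / 1024))  # a 1-in-1024 event: 10.0 bits
```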
So, in very simple terms, this means
that all of these algorithms,
as I said at the start,
are fitting probability distributions.
One way you can think about that is that
they're learning how to compress data.
They're building a model that
somehow shrinks the world
to something more compact,
and there's lots of kinds of
formalizations of this idea.
People talk about the
Kolmogorov complexity,
the minimum message length,
minimum description length.
It has the same flavor that, basically,
the more you can learn about the system,
the more compact your
description of it becomes,
but, you know, what happens then
if you want to generate?
You want to use this in
some sort of creative way.
So, what happens if you're
able to create new data
or even just choose where to look,
choose where to point your camera.
Then, suddenly this idea of compressing
doesn't make much sense.
Right?
If your goal is just to compress things
and to learn them well,
then you should create nothing at all
or you should just go,
you know, sit in a corner
and stare at the wall so you
never get any more information,
but the information you
have is very compressible.
And so my PhD supervisor,
Jürgen Schmidhuber,
has this theory of how to
resolve this paradox,
which is that you're not just learning
how to compress everything.
Rather, you're looking for things
that maximize what he
calls compression progress.
So, seek out data that
maximizes the decrease in bits
of everything you have ever observed,
and then, perhaps, everything you
and the culture that you're
in have ever observed.
So, create the thing that makes
the most sense of the world,
where sense is measured
as your ability
to now further compress it.
Right?
In the sense that you
now have a better,
and therefore more compact, model.
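As a sketch of that idea, with a deliberately tiny stand-in for a "model", here a Laplace-smoothed unigram symbol model: the intrinsic reward is the number of bits the improved model saves when re-encoding the same history, per Shannon's length formula.

```python
import math
from collections import Counter

def bits(history, counts):
    """Bits to encode 'history' under a Laplace-smoothed unigram model
    given by symbol 'counts' (Shannon: -sum of log2 p(symbol))."""
    total, vocab = sum(counts.values()), len(counts)
    return -sum(math.log2((counts[s] + 1) / (total + vocab))
                for s in history)

history = list("aaaaaaab")       # skewed data: mostly 'a'

weak = Counter("ab")             # naive model: thinks a, b equally likely
strong = Counter(history * 10)   # model after more experience of the source

before = bits(history, weak)     # 8 symbols at 1 bit each = 8.0 bits
after = bits(history, strong)    # fewer bits: the model now expects 'a'

# Compression progress = bits saved on everything observed so far.
# In Schmidhuber's framing, this decrease IS the intrinsic reward.
progress = before - after
print(round(progress, 2))
```

The absolute compressed size is not what matters; the reward is the derivative, how much the encoding of your whole history just shrank.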
And one of his, sort of,
bumper sticker, you know,
slogans for this is
happiness is the first derivative of life.
So, there's this idea
that what makes you happy
is this change.
It's not the absolute point
of how well you've modeled things,
but rather the rate at
which it's changing,
the rate at which it's decreasing,
and I think, as far
as creativity goes,
it's fair to say
that this is, more or
less, unexplored territory.
I mean, Jürgen has had these ideas
for quite a while,
but obviously it's much
harder to put this
sort of thing into practice
than it is to explicitly train
to generate, say, a
convincing audio signal
or something like that.
And I guess the other interesting
question we would ask here,
and this sort of goes back
to some of the things
Georgia was saying,
is: what if we did
allow them to freely create stuff?
Well, for one thing, you know,
what would we then be doing?
What would our role in the
creative process actually be?
Right?
There has to be some way
in which we're guiding it.
Otherwise, we're just spectators to it.
And the other thing that
I think is interesting
to think about is would we even recognize
their creativity as our own?
So, even if the machines were able
to do this, able to create things
that help them to make
more sense of their world
as they've experienced it, would it be
just fundamentally alien to us or not?
And I think it's a
very interesting question
whether, if you
left them to create things
in this very open-loop way,
there would actually be
some common ground
between that and what
human artists create.
And I think I'm gonna wrap up there,
because I'm running short of time.
So, thank you very much.
(audience applauds)
