So, we just had Steven from Rigetti explain the software stack behind the Forest ecosystem, and today I'll present a bunch of algorithms in the nascent field of quantum deep learning that we discovered using their software ecosystem for experimentation. The paper is on arXiv if you want to check it out; it's pretty recent, and it's with Jason and Michael, some of my co-authors who are in the room. Alright, so
hopefully you're at least slightly familiar with classical deep learning. You've probably heard of AI, Skynet, all that stuff; there's a lot of hype in the general media about classical deep learning. There's DeepMind and all sorts of companies doing impressive things, and the successes of classical deep learning came from deep theory work back in the 80s. In that sense, for the theory of quantum deep learning, we're in the 80s, going on the 90s very soon, because it's moving very, very fast. This paper is, in a sense, an attempt to unify all the efforts done so far under one formalism.
Steven mentioned variational algorithms. Those are circuits where you parameterize a bunch of unitaries: each unitary is parametrized by, say, an exponential of a Pauli, something like e^(i theta X-hat). That's a parametric unitary. There are all sorts of possible choices of parametric unitary, but that's what we mean by a quantum parametric circuit: a circuit composed of a bunch of unitaries, each parameterized by a real number. These have been the algorithms people have been running in the near-term era of quantum computing, and coincidentally they're very similar to classical deep learning.
In classical deep learning you have an input x, a vector, and then a bunch of little transformations. A neuron usually takes in a bunch of contributions from neurons in a previous layer, applies an affine transformation (a linear combination plus a shift), and then applies a squashing function, which is nonlinear. In general you're always applying the same class of elementary function at each node in this graph, and you have a forward flow of computation represented by the structure of the deep neural network. This is what we call a feed-forward neural network: you give an input, you apply many, many layers of these tiny parametrized transformations, and ultimately you get an output that depends on your input and also on your choice of parameters. Here Phi is a vector of real numbers: we have our parameters Phi, our input x, our feed-forward operation f, and then some metric to benchmark how well our output is doing versus what it should be.
In general, for classification or supervised learning, we have a desired output y for a given input. My loss function, or error function, quantifies how well my current output did for that input. That's for a single data point: the loss is a function of my output (for the given data point and parameters) and the desired output, and it's a real number we want to minimize. It could be a distance; the label could be binary, like cat versus dog, and if I want this picture to be more cat than dog, I assign a penalty accordingly. The goal is to find optimization strategies that minimize this loss function. Here I only have a single data point, but in general it's the average loss over a bunch of pairs (x_j, y_j), and we want to find the best set of parameters. Deep learning itself is just setting up this problem, having your data, and then running an optimization algorithm that efficiently minimizes this loss function subject to variations of the parameters Phi.
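As a classical sketch (the linear model and toy data here are my own choices for illustration, not from the talk), minimizing such an averaged loss by gradient descent looks like:

```python
import numpy as np

# Toy supervised-learning setup: a linear model f(x; phi) = phi . x,
# squared-error loss, averaged over the data set.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))            # inputs x_j
true_phi = np.array([1.0, -2.0, 0.5])
Y = X @ true_phi                        # desired outputs y_j

def avg_loss(phi):
    """Average loss over the data set: (1/N) sum_j (f(x_j; phi) - y_j)^2."""
    return np.mean((X @ phi - Y) ** 2)

phi = np.zeros(3)
lr = 0.1
for _ in range(200):
    grad = 2 * X.T @ (X @ phi - Y) / len(X)   # gradient of the average loss
    phi -= lr * grad                           # gradient-descent update
```

After a couple hundred steps the recovered parameters sit near `true_phi` and the averaged loss is essentially zero.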
That sounds very similar to these variational algorithms, or quantum parametric circuits; they go by various names, and people even call them quantum neural networks, which is ruining the punchline. The point is that people have been using these for a few years now, but their connection to classical deep learning had not been fully fleshed out. So let's compare. We
have a bunch of classical parameters for our circuit, which is a bunch of transformations: little one- and two-qubit rotations, sometimes more complicated transformations. We have, say, an input state psi-naught, and depending on our choice of parameters we'll get a certain output. Now my feed-forward operation is a parametric unitary, the composition of all these little transformations, similar to before, and our goal will be to minimize my loss. The loss has to be a real value, so how do I get a real value out of a quantum computer? I can look at the expectation value of an observable. If I define an operator for my desired output, whose eigenvalue quantifies how well we're doing, then my goal is to minimize the expectation value of this operator subject to variations of the parameters. If you're doing the variational quantum eigensolver, you can phrase it as a problem like this, and actually you can phrase a whole lot of problems this way, and we do so in the paper. Namely, you can possibly phrase classical deep learning as a kind of quantum parametric circuit optimization problem. How do we do that? That's what I'll answer today, along with how to optimize quantum parametric circuits.
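A minimal numerical sketch of this setup (a one-parameter toy of my own choosing in pure numpy, not any quantum SDK): a circuit U(theta) acting on |0>, with the loss defined as the expectation value of an observable.

```python
import numpy as np

# One-parameter circuit U(theta) = RX(theta) on |0>, loss = <psi| Z |psi>.
Z = np.diag([1.0, -1.0])

def rx(theta):
    """Single-qubit X-rotation, exp(-i theta X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def loss(theta):
    psi = rx(theta) @ np.array([1.0, 0.0])   # feed-forward: U(theta)|0>
    return np.real(psi.conj() @ Z @ psi)     # expectation value of the loss operator

# Here <Z> = cos(theta), so the loss is minimized (= -1) at theta = pi.
```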
So what can be phrased as a quantum parametric circuit? In this paper we show how to, in a sense, quantize feed-forward neural networks. In a previous paper from December, with one of my co-authors, we showed how to convert Boltzmann machine training, a completely different type of neural network from feed-forward ones, into training with quantum parametric circuits. So if we put everything under the same roof and have a strategy to optimize parametric circuits, we have one optimizer to rule them all. So then: how do we
efficiently optimize quantum parametric circuits? The current paradigm, which has been really successful for the near-term era, where operations are noisy and we can't do many quantum operations before the party is over, is to take these parametric circuits and do quantum-classical hybrid optimization. That means a classical processing unit feeds the current guess for the parameters to the quantum processing unit; the QPU executes multiple runs, doing the feed-forward and estimating the expectation value of the loss many times, and returns that expectation value for the current parameters. The classical processing unit just gets back a loss, a single real number, after having given the QPU a big vector. So the CPU sees the QPU as a black box: I feed it a vector, it tells me how well I'm doing, and I work with that. I could do finite-difference gradient descent, I could do Nelder-Mead; there are all sorts of black-box optimizers in classical computing. That works really well for noisy intermediate-scale devices.
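That hybrid loop can be sketched as follows; here the "QPU" is faked by a classical quadratic function, which is purely my own stand-in for demonstration:

```python
import numpy as np

# Hybrid loop sketch: the CPU treats the QPU as a black box phi -> loss
# estimate and runs finite-difference gradient descent on it.
def qpu_expected_loss(phi):
    """Stand-in for the QPU's estimated <loss> (a toy quadratic)."""
    return float(np.sum((phi - 1.0) ** 2))

def finite_difference_grad(f, phi, eps=1e-5):
    """Central differences: ~2n black-box calls for n parameters."""
    grad = np.zeros_like(phi)
    for i in range(len(phi)):
        e = np.zeros_like(phi)
        e[i] = eps
        grad[i] = (f(phi + e) - f(phi - e)) / (2 * eps)
    return grad

phi = np.zeros(4)
for _ in range(100):
    phi -= 0.1 * finite_difference_grad(qpu_expected_loss, phi)
```

Note the cost scaling: every gradient estimate needs on the order of n black-box evaluations, which is the slowness the quantum gradient methods below try to avoid.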
It works for the near-term era, but we'd like one quantum theory of everything: just quantize all the things, right? That's what physicists do, they put hats on things. So how are we going to leverage quantum properties to accelerate this optimization? We're using classical computers to optimize quantum circuits; if we're already using quantum circuits, why not make the optimization quantum too? Let's try to phrase our problems such that we can have superpositions of parameters. The parameter vector is a vector of real numbers; once we figure out how to encode that on a quantum computer, I can have a wavefunction over my possible parameters, and in each branch of the wavefunction, when I apply my circuit, I get an entangled superposition of having a different parameter and applying the corresponding parametric circuit. How can I do this, and how can I leverage it to perform a fully quantum optimization of these quantum-neural-network-like parametric circuits? That's what we're going to treat today.
The trick: given the classically parametrized version of a circuit, there's a way to upgrade it. You convert your parametric operations into quantum parametric operations; in a sense it's like converting an X gate to a controlled-X gate, as a very simple version, except that now the control is a real number, so it has to be a register that holds a real number. That's something we talk about in classical computing, floats and 16-bit precision registers for real numbers, but we don't talk about it as much in quantum computing. What we do know about is phase estimation, and we have intuitions from Fourier transforms and so on. So how can we represent a real number on a digital quantum computer made of qubits, like Rigetti's? How do we upgrade our gates to be quantum-parametric? The notation here just means that all of these gates together make one big, complicated controlled unitary, where each operation has a quantum register that stores the real-number value. So what am I talking about
when I say an observable with a nice, smooth, real-number spectrum? When we talk about continuous-variable quantum systems, we talk about modes, qumodes: a resonator, a quantum harmonic oscillator. If you're a physicist you learn everything about these. But the reality is that you can simulate a continuous-variable object on a digital quantum computer using logarithmically many qubits in the range that you want: a discrete-variable quantum computer can approximate a continuous-variable object with exponential precision via this conversion. So what
you do is this. Let Phi be position and Pi be what we consider momentum; in quantum mechanics, momentum is the canonical conjugate to position, which just means you go from one to the other by a Fourier transform, and here the Fourier transform acts like a 90-degree rotation in phase space. The point is that you can encode position in binary form. If I have a bunch of qubits and I scale the gaps of each qubit by a factor of two each time, then the joint spectrum, the readout of all my qubits together, can be treated as a real-valued readout: I can encode a real value into a bunch of bits classically, and I do the same with the standard basis of my qubits. So if I encode a position in these exponentially scaled zeros and ones of each qubit, how do I get the momentum? The good old quantum Fourier transform.
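Here's a small numerical sketch of that construction (grid of my own choosing; the position values could be rescaled to any interval): the position operator is diagonal in the computational basis, and conjugating it with the QFT gives the momentum operator.

```python
import numpy as np

# n qubits encode a position grid; the QFT conjugates the diagonal
# position operator into the momentum operator.
n = 3
d = 2 ** n

# Position eigenvalues: binary readout of the qubits, exponentially scaled.
phi_vals = np.arange(d, dtype=float)
Phi = np.diag(phi_vals)

# Quantum Fourier transform matrix F[j, k] = omega^(j*k) / sqrt(d).
omega = np.exp(2j * np.pi / d)
F = np.array([[omega ** (j * k) for k in range(d)] for j in range(d)]) / np.sqrt(d)

# Momentum operator: position conjugated by the QFT.
Pi = F @ Phi @ F.conj().T
```

Since the QFT is unitary, Pi is Hermitian with the same spectrum as Phi, just diagonal in the Fourier-transformed basis.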
If we apply the quantum Fourier transform to this position operator (and there's a very general formula for approximating it on any interval of the reals you want), we convert from position to momentum via the quantum Fourier transform, which is really efficient to do on a quantum computer. In some cases, if you have an analog qumode, the Fourier transform is very natural: you get it from the natural dynamics. On a digital computer there's a very efficient gate synthesis for it. So I'll talk in terms of position and momentum, continuous-variable degrees of freedom, but keep in mind that the precision is tunable: you can go down to a few qubits (I think we used about three qubits per parameter for our numerics), and maybe someday we'll have 16- or 32-bit precision, but for now we'll probably use one, two, or three qubits at a time to simulate each qumode. To give you an intuition of phase
phase space which is this position
momentum kind of I'm plotting kind of a
pseudo probability distribution over
position in momentum it's like the Vigna
function if you're familiar so so just
quantum mechanics 101 again what are
what does it look like in phase space
when I apply various operators that are
Exponential's of these observables you
know Phi and Pi
right well if i exponentiate momentum
momentum right generates shifts in
position right so the position is
horizontal so I generate a shift so it's
like a flow it's like a wind that takes
this state and each piece is gonna you
know get shifted according to the you
know the vector field that's local there
okay so what would PI squared look like
well PI squared would be depending on my
value of momentum which is vertical I'll
shift you more you know forward or
backwards depending if you're negative
right so it's going to look like this is
going to be greater and greater this
could be parabolic in amplitude and
yeah so similarly just like we said if
you rotate 90 degrees you get to
position so similarly a shift in
position or an exponential position
generates a shift in momentum right and
that's actually the basis for phase
estimation we'll get to that that's a
very common algorithm in corn computing
if you're familiar so okay so what if I
do five squared well sooner it's just
the 90 degree rotated picture and for
those of you know this Hamiltonian which
is a harmonic oscillator Hamiltonian if
you combine the flows of both operators
it becomes kind of you know you go here
and then you go down and then you go
here you go up so I mean becomes kind of
a circular flow right and that's the
harmonic oscillator Hamiltonian which is
gonna you know this describes kind of a
oscillation and a parabolic potential
right okay so this is just like face
based intuition but that's all yeah
that's all we need to figure out all of quantum deep learning, actually. A very common optimizer in classical deep learning is gradient descent: the gradient gives the direction of steepest change of the loss function, and you step against it to go down the hill. So what if we pulsed an exponential of position, or rather, an exponential of a function of the position operator? It could be a polynomial, say. Suppose this is my function of Phi, and I take the derivative, or the gradient in high dimensions. Now if I look at my flow, its direction and magnitude are controlled by the value of the gradient: I'm generating shifts in momentum according to the gradient. If I have an exponential like this and look at the Heisenberg picture of how my operators get shifted around (another way to see it: instead of looking at how states transform, you look at how operators transform), the point is that momentum got shifted by a value proportional to the negative gradient. I've kicked my variable: a kick means I've changed its momentum, and if I then had kinetic energy it would start flying off, like a soccer ball in classical mechanics. That's something we're going to use: simulated kinetics to optimize over the space of quantum neural nets, or quantum parametric circuits.
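Numerically (a sketch on a discretized 1-D parameter grid of my own choosing), a phase kick exp(-i eta f(Phi)) shifts the mean momentum by -eta <f'(Phi)>, as the Heisenberg-picture argument says:

```python
import numpy as np

# Phase kick exp(-i * eta * f(phi)) shifts momentum by -eta * f'(phi).
# Checked with f(phi) = phi^2, so f'(phi) = 2 * phi.
N, L = 1024, 20.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)     # position grid
p = 2 * np.pi * np.fft.fftfreq(N, d=L / N)            # matching momentum grid

def mean_p(psi):
    """Momentum expectation value computed via FFT."""
    psi_p = np.fft.fft(psi)
    return np.sum(p * np.abs(psi_p) ** 2) / np.sum(np.abs(psi_p) ** 2)

x0, eta = 1.0, 0.3
psi = np.exp(-(x - x0) ** 2)                 # Gaussian at phi = x0, with <p> = 0
psi /= np.linalg.norm(psi)

kicked = np.exp(-1j * eta * x ** 2) * psi    # phase kick by f(phi) = phi^2

# Expected: <p> shifts from 0 to -eta * <f'(phi)> = -2 * eta * x0.
```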
A little aside, because a lot of people learn about phase estimation in their Quantum Computing 102 course and never get an intuition for how it works: it's like magic, there's a whole bunch of phases everywhere and then it suddenly works. The intuition is that, from a continuous-variable standpoint, I can have a squeezed state, literally squeezed like this: a Gaussian that's very thin. Phase estimation is basically addition conditioned on a certain variable: here I'm going to apply a linear shift that depends on the value of this variable. You could replace Phi-1 here by a general observable, and then this is the phase estimation algorithm. So what happens is a shift: say I'm trying to add this guy's position, which is the observable, to this other guy, my pointer state. I apply the Fourier transform (well, the inverse Fourier transform, a 90-degree rotation), I do a phase shift depending on this guy's position, and then I undo the rotation, and I get this. So what is it, one plus two equals three? Grade school, yeah, we did it!
Addition using a whole lot of Hilbert space. Okay, so that's continuous variables, but we just described a way to simulate continuous variables with a bunch of discrete variables. So what if my pointer state were a bunch of qubits forming, say, a 64-dimensional qudit? If I phase-estimate a position onto it, I generate a shift, this way, by that position's amount. And this is an actual full numerical calculation with phase estimation: it looks funky, but basically you always measure the right position here, so it's just readout.
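The "phase estimation is addition" point can be checked directly (a small sketch; the dimension 8 is chosen arbitrarily): conjugating diagonal phases by the QFT gives a modular-addition unitary on the pointer register.

```python
import numpy as np

# F^dag * diag(omega^(a*k)) * F sends |j> to |j + a mod d>:
# phase shifts in the Fourier basis are shifts in the computational basis.
d = 8
omega = np.exp(2j * np.pi / d)
F = np.array([[omega ** (j * k) for k in range(d)] for j in range(d)]) / np.sqrt(d)

def shift_by(a):
    """Modular-addition unitary built from QFT + diagonal phases."""
    phases = np.diag([omega ** (a * k) for k in range(d)])
    return F.conj().T @ phases @ F

pointer = np.zeros(d)
pointer[1] = 1.0                       # pointer state |1>
result = shift_by(2) @ pointer         # phase shift conditioned on "position" 2
# result is |3>: one plus two equals three, as in the talk.
```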
In a sense, phase estimation is just applying a linear shift that depends on an observable's eigenvalue. Getting a quantum gradient works the same way: everything in quantum computing, or at least a whole ton of algorithms, is based on phase kickback, and just like phase estimation, gradient estimation is based on phase kickback. But instead of needing a very sharp pointer state, measuring, gathering statistics to estimate the gradient, and passing it back to my classical computer, one thing we can do is just let it slide: give it a kick and let it rip, as we say in Canada. Okay, so how are we
going to do that? How do we induce a phase shift of all our quantum parameters according to our loss function, since that's what we want to minimize? We can use phase kickback, because we have these quantum-controlled parameters. You take the quantum-parametrized unitary with a superposition of parameters, which is like a superposition over the space of quantum neural nets, and in each branch you apply a different parametric unitary; there's a way to synthesize this big gate. This is our compute register, where the parametric unitary is applied, and this is the register for our parameters. So you apply the feed-forward unitary, then a phase kick according to the loss function: depending on the loss function, I kick your momentum accordingly. If you have a large eigenvalue of the loss function, you've made a big mistake, so I'm going to punish you: I'll kick very hard. You've probably seen that if you have a CNOT, apply a Pauli Z, and then apply another CNOT, as in quantum error correction, the Pauli error creeps up onto the control. This is the same principle, phase kickback, except here it's a multivariate, continuous-variable phase kickback. This circuit is just a fancy version of that one; it's the same principle.
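That CNOT version of the kickback is easy to verify; it's the standard identity that conjugating a Z on the target by CNOTs propagates the phase onto the control, checked here with explicit matrices:

```python
import numpy as np

# Phase kickback in its simplest discrete form:
# CNOT (I tensor Z) CNOT = Z tensor Z, so the Z "creeps up" to the control.
I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)   # control = first qubit

conjugated = CNOT @ np.kron(I2, Z) @ CNOT
# conjugated equals Z tensor Z: the phase now also acts on the control qubit.
```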
Okay, so the point is: if I apply a kick at my output and then uncompute, what I've applied is a nonlinear shift of the momentum of all my parameters. You can do the algebra, and up to second order in eta, the constant setting how hard you're kicking, you get a phase given by the expectation value of how your output did: the expectation value of the loss operator we wanted to minimize. So we did exactly what we wanted: we have a phase kick according to our loss function, and from before we know that kicking according to a function of a continuous parameter shifts the momentum according to the gradient. Now we'll have two ways to use this.
But before that: so far I've been talking about one data point. You can have pairs of loss functions and inputs; it's a very general formalism and you can do all sorts of stuff, which we'll get to in a second. What you can do is sequentially stack these kicks: it's as if time were frozen while I accumulate punches, and then I let time evolve and suddenly my ball is flying in the right direction. In the paper we discuss how to do this in quantum parallel, in classical parallel, and in sequence. In classical machine learning you usually talk about batching your data, and it's really important to be able to parallelize over batches, so we treated that fully in the paper, and I'll say a bit more later. The point is that if you batch a bunch of phase kicks (these script-L loss functions, say batched over my whole data set for convenience, so the kick is the same every time), and I define my cost function as the average loss over the whole batch, then I can stack these kicks and get a phase kick according to my full-batch cost function.
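Stacked diagonal phase kicks literally multiply into a single kick by the batch-summed cost; a quick check on a discretized parameter grid (the toy quadratic per-example losses are my own choices):

```python
import numpy as np

# Stacking phase kicks: one diagonal kick per data point multiplies into
# a single kick by the summed (batch) cost function.
phi = np.linspace(-2, 2, 101)                              # parameter grid
eta = 0.05
losses = [(phi - 1.0) ** 2, (phi + 1.0) ** 2, phi ** 2]    # per-example losses

stacked = np.ones_like(phi, dtype=complex)
for L_j in losses:
    stacked *= np.exp(-1j * eta * L_j)         # one kick per data point

batch_kick = np.exp(-1j * eta * sum(losses))   # single kick by the batch cost
# stacked and batch_kick agree elementwise.
```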
So how are we going to use this? We've just induced a phase kick according to a function of position in a multidimensional continuous-variable space. If you remember Schrödinger mechanics from basic quantum mechanics, the Hamiltonian that drives the Schrödinger equation is usually of the form momentum squared plus a potential that depends on position. So if I simulate dynamics with a tunable, time-dependent potential, then, as I said, I'm inducing a force on my parameters, and we can put that force to work by adding a kinetic term: by simulating a very coarse-grained time evolution, alternating between my phase kicks and my kinetic-term exponentials, suddenly I'm stepping down the hill. And by tuning the mass over time in this time-dependent simulation, there are ways to hit the adiabatic limit: if you're close to a local minimum that I can Taylor-expand to second order, I can tune my mass over time to stick the landing, because if I just kick it and let go it will swing right through the minimum. The intuition: I start with a small mass and kick it with a certain force, so it goes flying. But in machine learning you want to tune the learning rate as you go, so you start with a small mass and then simulate increasing the mass (by tuning how hard you kick with the kinetic term). If I start with a small mass and suddenly have a big mass, by conservation of momentum I slow down: same momentum, much larger mass, hence much smaller velocity. That analogy can be connected to the adiabatic theorem to show this converges. We call this method quantum dynamical descent, and for those familiar with it, it's a type of quantum algorithm for classical optimization (this is classical optimization of parameters, done in a quantum way), in the class called the quantum alternating operator ansatz: you alternate an exponential of an operator within which everything commutes with something that doesn't commute, a potential you want to optimize and a driver that mixes things. So quantum dynamical descent is a form of QAOA for optimizing neural nets. So let's
look at how this works, let's look at a circuit. First, notation: what's a good initial state for my parameters? I can prepare a Gaussian state, say, because it has nice mathematical properties for analysis; it's easy to create in analog qumodes, and there are algorithms for it in discrete qumodes. The point is that I have an initial position, an initial momentum, and initial uncertainties in position versus momentum, with trade-offs between the two. That's interesting because we're using quantum uncertainty to fuzz both where we are and where we're going. All of these we call the initialization hyperparameters, theta. Okay, big complicated diagram; let's break it down. This is the quantum dynamical descent algorithm. You have your preparation unitary at the beginning: you prepare this Gaussian state, which depends on your hyperparameters. Hyperparameters are the parameters you assumed were true, like the architecture of your neural network: anything you assume is parametrized a certain way, and then you optimize under that assumption. Maybe "this is the best architecture for my neural net," and then I optimize it; well, maybe that was a bad assumption, which is why you often have to do hyperparameter optimization in machine learning. As you'll see later, the reason I keep all these hyperparameters around is that we're going to quantumly optimize the hyperparameters too; it's quantum all the way down. So this is quantum dynamical descent: you prepare the state, you apply the phase kick (compute, phase kick, uncompute), then you apply Phi-squared conjugated by a Fourier transform, which becomes momentum squared, and you alternate, and keep going. That's quantum dynamical descent. This is the parameter register, this is the compute register, and this is classical data: the loss function and desired output can depend on classical or quantum data, so you could have a data register; it's a fairly general formalism. We also have what we call the rate hyperparameters, which are just how hard you kick in the phase-kick and kinetic terms. And in the near term, we know we can't run super-long algorithms.
So at some point we want a version where we can just measure and checkpoint classically: keep in classical memory where we are and where we're going. What you can do, since the phase kicks generate a shift in momentum according to the gradient, is repeat the circuit a few times and get an estimate of the gradient of the landscape in a quantum fashion. This shows a speedup over a black-box finite-difference gradient, because that's slow: with n parameters you usually need expectation values at n different parameter settings to estimate the gradient, whereas this is order 1, roughly single-shot. Then you can use an update rule where you update your parameters according to the shifted momentum, and if you also simulate conservation of momentum, it becomes gradient descent with momentum, a standard technique in classical machine learning. So it's nice: we're taking our classical optimization and quantizing everything. Okay, so that's momentum measurement gradient descent; before that, quantum dynamical descent.
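The measure-and-checkpoint loop can be emulated classically; in this sketch the measured momentum shift is replaced by its ideal value -eta * grad(L), and the momentum-decay factor and toy loss are my own assumptions:

```python
import numpy as np

# Classical emulation of the momentum-measurement update rule:
# kick the momentum by -eta * grad(L), then move by the carried momentum.
def grad_L(phi):
    """Gradient of the toy loss L(phi) = (phi - 3)^2."""
    return 2.0 * (phi - 3.0)

phi, momentum = 0.0, 0.0
eta, gamma, decay = 0.3, 0.5, 0.7
for _ in range(100):
    momentum = decay * momentum - eta * grad_L(phi)   # momentum kicked by -eta * grad
    phi = phi + gamma * momentum                      # position updated by momentum
```

This is just classical gradient descent with momentum (heavy-ball), which is the point of the correspondence: the quantum procedure reduces to it when you measure and checkpoint each round.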
Here's a phase-space cartoon comparing the two techniques. Say I phase-kick according to a cubic function; then my gradient is a quadratic function, which is why you get the Star Trek symbol. If I shift the momentum according to a quadratic function, it looks like this once I've kicked it. Then I measure the shift in momentum and shift the position: you can see it's shifted slightly along the gradient, and I iterate like this. Quantum dynamical descent, on the other hand, keeps everything coherent: it's a wavefunction where each piece goes down its local slope, each possibility feeling a different value of the force. That allows for tunneling in the optimization space, and we're hoping it gives speedups in cases where the landscape is very difficult. There's a whole theory of adiabatic algorithms and optimization from annealing, and now you can apply it to feed-forward neural nets; it's the same kind of theory. We haven't proven any speedups per se, but since everything is coherent, it's actual Schrödinger dynamics in high-dimensional spaces, so you get tunneling, and we hope this allows more powerful optimization in general. People are hinting that QAOA has a square-root speedup over classical optimizers; I don't know if that's been formally proven yet, but it's what insiders in the industry currently think.
Okay, so now we know how to port (well, we haven't seen it all yet, but theoretically): if we can port classical neural nets to quantum parametric circuits, and we can optimize them there, then everything's great. But I haven't said what you can actually do with quantum parametric circuits; that was all kept abstract for the optimization section. So what can you do with them? How do you machine-learn quantum data? There are a few things you can do, a couple of apps, no big deal. These are the ones we covered in the paper in full detail (there are more, but this is what fits on the slide), and I'll go through some of the more near-term ones. The point is that as long as you have a way to plug in your data and a way to induce a shift in the momentum, an exponential of the eigenvalue of the observable you want to optimize, you're good. There are crazier ways too, like using copies of states to induce a loss function based on an inner product, but that's probably for the long term of quantum computing. But the point is that you can learn
States you can learn mixed States you
know the circuit decomposition of next
day it's like in the dilated picture you
can learn unit area channels so on and
so forth one that is of particular
interest is you know learning a quantum
to classical map so how to classify
quantum states or how to have a
continuous label for classification
which is basically regression or how to
learn a measurement in general it's all
the same class it's all the same problem
there's also quantum auto-encoders
whether you want to compress data or you
want a denoise data there's ways to use
autoencoders there's something in while
we'll get to the the generative
adversarial circuits but that's a
quantum analog of generative adversarial
networks from classical machine learning
parametric hamiltonian optimization is
includes all these chemistry algorithms
and these optimization algorithms and
cue AOA itself
so using that we're going to be able to do meta-learning, but we'll get to that. And there's a way to plug a quantum neural network, at its output, into a classical neural network and have this kind of backpropagation of errors all the way through. I didn't mention why we call it backpropagation, but the point is that the circuit is made up of a bunch of tiny transformations, so as you uncompute, in a sense you get a phase error, because it's phase shifts, and as you uncompute you get kicked one transformation at a time. So you can trace back the signal of the quantum loss function that's propagating through the circuit by looking at the algebra of how things get nudged, and we do that in crazy detail in the paper. That's important if we're going to design better quantum parametric circuits that don't have problems with their gradient signals, which is a big problem right now
okay, so let's look at some of these examples. Quantum classification: for a general measurement I'll use an ancilla; I'll have a state, I'll apply a unitary, and then I'll measure. And for classification I have multiple qubits, and each qubit, if it's one, indicates that class. Let's say I have a unary encoding; then I can classify very simply: a unary encoding means I have multiple qubits and only one qubit at a time is on, or "hot", and each possible on-state is a different class. So then what I can do is have a loss function that's very simple, and there are ways to do this in the near term. In general you can have a mean-squared loss, you can have all sorts of loss functions; it looks simple in our abstracted-out formalism, but the point is you plug in a certain label value and then you have a loss function that depends on this label, and we show how to use this numerically in the paper a bit later. Quantum Hamiltonian optimization is another application that's relevant for the near term: that's when you want to prepare a state that is of low eigenvalue of a certain Hamiltonian.
I want to approximately find the ground state, or, in general, it could be a non-commuting Hamiltonian, so you can have multiple terms that don't commute, in which case there are ways to parallelize your gradient accumulation. For each term in the Hamiltonian you can compute the gradient; you can split this up as if the Hamiltonian were a batch, each term becoming like a batch element, then add up the contributions to the gradient from each term and do gradient descent on the total. Then you can find a ground state of a non-commuting Hamiltonian, which would otherwise be difficult. There are also ways to quantumly parallelize this using GHZ entanglement. In a sense, you create the GHZ entanglement using a kind of multi-target adder, and you apply the phase kicks; if I have a superposition that's GHZ-entangled, say |00...0> + |11...1> over n copies, and I apply a relative phase shift like this in each copy, then I get a factor of n, and as I uncompute I'm left with the register tensored with a bunch of zeros. So I create the GHZ state, apply the phase kick, then uncompute. This is a trick from quantum sensing, but we use it to parallelize over data for quantum deep learning. It's kind of nuts; I invite you to check out the paper.
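The GHZ phase-accumulation trick can be seen in a few lines of statevector arithmetic (my own illustration, not the paper's implementation): a phase kick of phi applied to every qubit of |0...0> + |1...1> adds up coherently to a relative phase of n*phi.

```python
import numpy as np

# Illustration of the GHZ phase-kick trick: kicks on n qubits add coherently.
n = 4
phi = 0.3

# GHZ state over n qubits: (|0...0> + |1...1>) / sqrt(2)
dim = 2**n
ghz = np.zeros(dim, dtype=complex)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)

# Apply the single-qubit phase kick diag(1, e^{i*phi}) to every qubit.
phase_one = np.array([1.0, np.exp(1j * phi)])
kick = phase_one
for _ in range(n - 1):
    kick = np.kron(kick, phase_one)
state = kick * ghz  # the kick is diagonal, so elementwise multiply

# |1...1> acquired phase e^{i*n*phi} relative to |0...0>
rel_phase = np.angle(state[-1] / state[0])
print(round(rel_phase, 6), round(n * phi, 6))  # 1.2 1.2
```

A single copy would only pick up phi; the entanglement is what buys the factor of n per kick.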
okay, I'm almost there. Generative adversarial networks: this is a way to have two networks that battle each other, where one network is trying to fool the discriminator. The discriminator gives you a label for whether it thinks its input was true or false: was it from the real data set or the fake data set? And the generator is trying to generate inputs for the discriminator network that fool it. Using this, plus a way to flip the gradient signal (I won't have time to go into the details), you can basically train a discriminator and generator all at once. What this allows you to do is create states, find ways to generate a circuit. Say, classically, I have pictures of bedrooms: how do I define the problem of having a good picture of a bedroom? How do I quantify my loss function for "oh, this is bedroom-y" versus "no, that's not very bedroom-y", and it's got to be continuous? Well, I use a different neural network for that. In the same vein, I use a parametric circuit to convert from a binary loss function to a loss function on the input: the loss function gets converted from binary to whatever the format of the input is, it gets routed to the generator, and then the generator does gradient ascent on this loss function instead of descent. In a sense, you can generate states that mimic samples from a certain data distribution. Okay, we're
almost done. So this is how to merge quantum and classical deep learning, to seamlessly merge our backprop with the original Hinton-style backpropagation. What you do is you have a quantum classifier with a kind of vector of observables; you feed the expectation values as inputs to the classical neural network, you feed forward, you backpropagate the gradient classically, and then you feed that gradient back in as a linear phase shift. That linear phase shift is going to induce a shift in momentum depending on the value of the backpropagated gradient, and we showed that this works really well. How did we get this? Well, we converted feed-forward neural networks to quantum circuits; I invite you to check out the paper to see how we did that. There are ways to do ReLU and all sorts of cool activation functions.
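A minimal sketch of that hybrid chain rule (illustrative only: the toy `expectations` function, the single linear layer, and the finite-difference stand-in for the circuit gradient are my assumptions, not the paper's phase-kick machinery):

```python
import numpy as np

# Hybrid chain rule, sketched with a toy "circuit" and finite differences.
def expectations(theta):
    """Toy stand-in for a quantum classifier's vector of expectation values."""
    return np.array([np.cos(theta[0]), np.sin(theta[0] + theta[1])])

def classical_net(q, w):
    """One linear classical layer on top of the quantum readout."""
    return np.tanh(w @ q)

def loss(y, target):
    return 0.5 * (y - target)**2

theta = np.array([0.4, -0.2])   # circuit parameters
w = np.array([0.7, -1.3])       # classical weights
target = 1.0

# Forward pass: quantum readout -> classical layer -> loss
q = expectations(theta)
y = classical_net(q, w)

# Classical backprop down to the quantum readout: dL/dq
dL_dy = y - target
dy_dq = (1 - y**2) * w          # derivative of tanh(w @ q) w.r.t. q
dL_dq = dL_dy * dy_dq

# Chain into the circuit parameters (finite differences standing in for the
# phase-kick gradient): dL/dtheta = dL/dq . dq/dtheta
eps = 1e-6
dL_dtheta = np.zeros_like(theta)
for i in range(len(theta)):
    t = theta.copy()
    t[i] += eps
    dL_dtheta[i] = dL_dq @ (expectations(t) - q) / eps

theta -= 0.1 * dL_dtheta        # one hybrid gradient step
```

The interface point is the vector `q`: everything above it is ordinary backpropagation, and everything below it is whatever gradient mechanism the circuit provides.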
There's a whole bunch of other stuff in the paper, including meta-learning, which is where the optimizer optimizes itself, and parallelization and swarms and all sorts of crazy stuff. But let's just blast through some numerics, to show you it works and it's not just my crazy thoughts. So this is a neural network learning an XOR function using a circuit we converted quantumly: tunneling in the space of neural networks. Quantum dynamical descent minimizes the loss function to learn the XOR function, which is zero when you have (0, 0) or (1, 1). It also works with momentum measurement gradient descent, and these are the decision boundaries.
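For reference, the classical version of that XOR task (my sketch of an ordinary 2-2-1 network trained by plain gradient descent, not the converted quantum circuit from the paper) looks like this:

```python
import numpy as np

# Tiny 2-2-1 network learning XOR with plain gradient descent on MSE.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0.0, 1.0, 1.0, 0.0])   # XOR: 0 for (0,0) and (1,1)

W1 = rng.normal(0, 1, (2, 2)); b1 = np.zeros(2)
W2 = rng.normal(0, 1, 2);      b2 = 0.0

def forward(X):
    h = np.tanh(X @ W1 + b1)          # hidden layer
    return h, h @ W2 + b2             # linear output

def mse(X, Y):
    return np.mean((forward(X)[1] - Y)**2)

loss0 = mse(X, Y)
lr = 0.1
for _ in range(5000):
    h, out = forward(X)
    err = 2 * (out - Y) / len(Y)          # d(mse)/d(out)
    gW2 = h.T @ err; gb2 = err.sum()
    dh = np.outer(err, W2) * (1 - h**2)   # backprop through tanh
    gW1 = X.T @ dh; gb1 = dh.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(mse(X, Y))  # much smaller than the initial loss
```

XOR is the classic example of a function a single linear layer cannot represent, which is why it shows up as the "hello world" benchmark here too.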
You can learn unitaries, we won't go into that, and you can optimize QAOA, which is this optimization algorithm, here applied to a MaxCut problem. The point is that since our optimization algorithm itself is a QAOA problem, here we show you can optimize QAOA, so it's kind of like it can optimize itself forever. And this is learning, let's say I prepare an eigenstate of momentum as an input; now I have a hybrid neural network where the circuit has to learn the Fourier transform, and the neural net connected to its output has to combine the readouts of the different bits to give the correct eigenvalue. So both have to learn jointly how to optimize the loss function, which is the mean-squared loss: I'm trying to decode the eigenvalue of a state. I give you the observable, I give you a parametric circuit connected to a neural net, and then both have to talk to each other, they've got to optimize jointly. So with hybrid backprop between classical neural nets and this quantum circuit, we can optimize both together, and as you can see, it learns the eigenvalue really well. So the point is, we now have a formalism for all of classical deep learning as quantum deep learning, and we have quantum ways to optimize quantum deep learning, and we have classical ways to optimize quantum deep learning. The theory of quantum deep learning is expanding, which is great, and now that we have the backprop we can move on to specific ansätze: what are good circuits, what are good parametric transformations such that I get cool applications to work? Classically you have convolutional nets, all sorts of networks; we need the equivalents for quantum deep learning.
Another thing I'm exploring personally is how quantum deep learning can influence classical deep learning. We have an interesting mechanism here of how, through dynamics, weights can optimize themselves, so that's something I'm exploring: can you have a model where, just from the physics, the system learns? I mean, our brain's not quantum, I'm not going to say that, but if you have a quantum model then maybe you can convert it later to a kind of classical stochastic model. So that's something I'm looking at. It's a very nascent field: six months ago it more or less didn't exist, and now I have two papers in it, and things are moving very fast. So if you're looking to do any work in quantum deep learning and you're around IQC, I'm always looking for collaborators, so come talk to me. Cool, thanks.
[audience question]
Yeah, so if you're doing quantum parametric circuits and you're doing, say, Nelder-Mead or finite-difference gradient descent, then with n parameters you need to get n expectation values for each iteration. If you're doing our gradient descent, the momentum measurement gradient descent, then it's order-1 expectation values per update, so it's much faster once you have a large network with a lot of parameters. In terms of quantum dynamical descent, we connected it to the whole theory of adiabatic optimization and whatnot, but we didn't formally prove a speedup, because it's kind of a moving target: we don't have just one application, we have about fifty in the paper, so there are various cost functions and various scenarios. But yeah, we're looking into fixing a certain scenario where we could potentially show a speedup
for the optimization. Well, okay, so it's actually not adiabatic, it's QAOA, which is kind of a near-term version of quantum adiabatic optimization. So Eddie Farhi invented this; it's called the quantum approximate optimization algorithm, and it's when you have alternating operators. It's basically like a Trotterized simulation of adiabatic evolution, but with very coarse-grained operators, like big blocks. People are trying to prove things about QAOA; it's an active field of research. But the point is that if your optimization is coherent, then you can do meta-optimization, including of hyperparameters. Optimizing hyperparameters of classical neural nets is really hard: people just do a grid search, they just try a bunch, or they do grad student descent, they have their grad student try a bunch of them, which is really inefficient. But here you can get gradients on the hyperparameters really efficiently: you can have superpositions of different architectures of neural networks, have an optimization in each branch of the superposition, and then see what the loss function, what the performance was, once it was trained.
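To make the "alternating operators" concrete, here is a small statevector sketch of one QAOA layer for MaxCut on a triangle graph (my own toy example; the graph, parameter grid, and single-layer depth are illustrative assumptions):

```python
import numpy as np

# One-layer QAOA sketch for MaxCut on a 3-node triangle.
n = 3
edges = [(0, 1), (1, 2), (0, 2)]
dim = 2**n

# Cut value of each computational basis state
costs = np.array([sum(((z >> i) & 1) != ((z >> j) & 1) for i, j in edges)
                  for z in range(dim)], dtype=float)

def mixer(beta):
    """e^{-i*beta*X} applied to every qubit (tensor product)."""
    x1 = np.array([[np.cos(beta), -1j * np.sin(beta)],
                   [-1j * np.sin(beta), np.cos(beta)]])
    M = np.array([[1.0]])
    for _ in range(n):
        M = np.kron(M, x1)
    return M

def expected_cut(gamma, beta):
    psi = np.full(dim, 1 / np.sqrt(dim), dtype=complex)  # |+>^n
    psi = np.exp(-1j * gamma * costs) * psi              # cost-phase layer
    psi = mixer(beta) @ psi                              # mixer layer
    return np.sum(np.abs(psi)**2 * costs)

# gamma = 0 leaves |+>^n untouched (expected cut 1.5, the random-guess
# average); a coarse grid search looks for parameters that beat it.
grid = np.linspace(0, np.pi, 25)
best = max(expected_cut(g, b) for g in grid for b in grid)
print(best)
```

Each layer is exactly the "big block" structure described above: a diagonal phase in the cost function alternating with a transverse-field mixer, a very coarse Trotter step of an adiabatic sweep.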
Yeah, we haven't formally proved a speedup yet; we're proposing another heuristic. But I think for near-term optimization, if you can implement this, it would be useful; people would see a big difference in wall-clock time, because getting many, many expectation values when you have a certain block of time on a QPU is difficult. So if you have, let's say, 20 parameters, and we can cut that cost by a factor of 20, you have a factor-of-20 speedup, and that's valuable.
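To put numbers on that, here is a quick instrumented sketch (the cost function and parameter count are toy assumptions) of why per-parameter gradients are expensive: central finite differences cost 2n expectation estimates per update for n parameters, versus the order-1 claimed above for momentum measurement gradient descent.

```python
import numpy as np

calls = 0  # number of expectation-value estimates requested

def expectation(theta):
    """Toy stand-in for estimating an observable's expectation on a QPU."""
    global calls
    calls += 1
    return float(np.sum(np.cos(theta)))

def finite_diff_grad(theta, eps=1e-4):
    """Central finite differences: 2 expectation estimates per parameter."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        tp, tm = theta.copy(), theta.copy()
        tp[i] += eps
        tm[i] -= eps
        g[i] = (expectation(tp) - expectation(tm)) / (2 * eps)
    return g

theta = np.linspace(0.1, 1.0, 20)   # 20 parameters, as in the example above
g = finite_diff_grad(theta)
print(calls)  # 40: a single update already costs 2 * 20 expectation estimates
```

Since each expectation estimate itself means many repeated circuit runs on hardware, the per-update count is what dominates wall-clock time.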
