Hi, it's Cary.
A few weeks ago, I decided I really wanted to get my computer to generate jazz on its own.
What happened?
*sigh*
Well, a lot happened.
But, here's a brief summary.
*Sexy Baroque music*
Bully #1: Hey, look! That loser's listening to Baroque music!
*Top 10 Saddest Anime Deaths*
COMPUTERY!
Cary: Could you maybe compose your own jazz music?
Computery: I'm generating it now.
*jazz music*
Cary: How was Computery able to generate those jazz
MASTERPIECES!
Peering deep into Computery's brain
exposes three types of neural networks sitting on its hard drive.
The Character Based LSTM,
Robot Flower: HELLO!
Cary: HyperGAN,
Weird Speaker Box: HULLOO!
Cary: and PixelCNN.
Roboty: BEEP BOP BOOP
Cary: Computery definitely used one of them to generate that lovely jazz music.
But I don't know which!
I've been meaning to clear up some file space.
How about each of you neural nets show me what music you got,
and I'll OBLITERATE the two of you that performed the worst!
Character-Based LSTM, you're up first.
Present Cary: Here are CBLSTM's pros and cons. If you want more info, watch part one.
Past Cary: So CBLSTM, whatcha got?
Present Cary: And now,
After three weeks of waiting,
it's finally time to see what CBLSTM's got!
CBLSTM, take it away!
*What CBLSTM's got*
Yeah you suck.
Time to die.
*Switch*
BANG BANG BANG
*CBLSTM's screams of pain*
BANG BANG BANG BANG BANG
*CBLSTM dying in the background*
Up next is HyperGAN by 255BITS.
*CBLSTM still dying*
Which I've drooled over before.
*Poor CBLSTM*
It's a type of Generative Adversarial Network.
*Poor, poor CBLSTM*
Which I'll describe in more detail in a future video.
For now,
just know that there are two neural nets,
A generator and a discriminator,
that are each co-evolving to outsmart the other.
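[Editor's sketch] The tug-of-war between the two nets can be written down as a pair of opposing loss functions. This is the textbook binary cross-entropy formulation (with the common "non-saturating" generator loss), not necessarily what HyperGAN uses internally. The names `bce`, `discriminator_loss`, and `generator_loss` are made up for illustration.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    eps = 1e-12  # clamp so we never take log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))

def discriminator_loss(p_real, p_fake):
    """Low when real samples score near 1 and generated samples score near 0."""
    return bce(p_real, 1) + bce(p_fake, 0)

def generator_loss(p_fake):
    """Low when the discriminator scores generated samples near 1."""
    return bce(p_fake, 1)

# p_real / p_fake: the discriminator's probability that a sample is real.
# Case 1: the discriminator is fooled by the fake (scores it 0.9).
print(generator_loss(0.9), discriminator_loss(0.9, 0.9))
# Case 2: the discriminator catches the fake (scores it 0.1).
print(generator_loss(0.1), discriminator_loss(0.9, 0.1))
```

Whatever lowers one net's loss tends to raise the other's, which is the "outsmarting" described above.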
If we want our generator to produce images,
we can use a deconvolutional neural network.
I could be entirely wrong about how this works.
But,
put simply,
imagine starting with a low-res image of random pixels.
Then,
gradually apply specific Instagram filters
Photoshop effects,
and upscalings
until you..
uh...
magically get an image of some object.
It takes a very complicated, fine-tuned network of convolutions to get this to work.
*look at the text on-screen*
HyperGAN has already proven to be effective at producing good face images.
But,
how the heck are we going to get it to produce music?!
Well, if we could just convert music into images,
we'd be good to go!
Lucky for us, MIDI files are great for that!
They explicitly describe the pitch and time of every note played,
which we can then graph on a 2-D image like this!
Up and down mean higher and lower in pitch,
in semitone intervals,
and right and left mean forward and backward in time,
in twentieth-of-a-second intervals.
A white pixel means a note is being played at that pitch, at that time,
while a black pixel means no note.
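[Editor's sketch] Here is roughly how that note-to-pixel mapping could look in code. It assumes the notes have already been read out of the MIDI file as (pitch, start, end) tuples, that the 96 in "96 by 64" is the pitch axis, and that the bottom row is MIDI pitch 24. `LOWEST_PITCH` and `notes_to_roll` are hypothetical names, not anything from the video's actual code.

```python
STEP_SECONDS = 0.05   # one column per twentieth of a second
NUM_PITCHES = 96      # assumed: image height, one row per semitone
NUM_STEPS = 64        # image width: 64 * 0.05 s = 3.2 s
LOWEST_PITCH = 24     # hypothetical: MIDI pitch drawn on the bottom row

def notes_to_roll(notes):
    """notes: iterable of (midi_pitch, start_sec, end_sec).
    Returns rows of 0/1; row 0 is the highest pitch, column 0 the earliest time."""
    roll = [[0] * NUM_STEPS for _ in range(NUM_PITCHES)]
    for pitch, start, end in notes:
        row = NUM_PITCHES - 1 - (pitch - LOWEST_PITCH)  # higher pitch sits higher up
        if not 0 <= row < NUM_PITCHES:
            continue  # pitch falls outside the image, so skip it
        first = max(0, round(start / STEP_SECONDS))
        last = min(NUM_STEPS, round(end / STEP_SECONDS))
        for col in range(first, last):
            roll[row][col] = 1  # white pixel: a note is sounding here
    return roll

# Middle C for half a second, then the E above it for half a second.
roll = notes_to_roll([(60, 0.0, 0.5), (64, 0.5, 1.0)])
print(sum(map(sum, roll)))  # number of white pixels
```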
But wait!
What do these colors mean?
Well, HyperGAN works faster on smaller images,
so I kept my music images to a size of 96 by 64.
That's 3.2 seconds,
which is a little too short for my tastes. ( ͡° ͜ʖ ͡°)
Luckily, we can encode more information per pixel
by taking apart the red, green, and blue channels of each pixel
and treating each channel as a separate pixel placed horizontally.
Now, each of these mini-pixels can be turned on or off, independently of the others.
In this example, watch as we encode 27 bits of information about our music,
into just 9 pixels!
That simple tactic expands our time window to 192 pixels,
or 9.6 seconds.
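[Editor's sketch] The channel trick amounts to grouping every three consecutive time steps into one (red, green, blue) pixel. The version below assumes red holds the earliest of the three steps and uses plain 0/1 values instead of 0/255. `pack_rgb` and `unpack_rgb` are illustrative names.

```python
def pack_rgb(roll):
    """Pack a 0/1 roll with 3*W columns into rows of W (r, g, b) pixels."""
    packed = []
    for row in roll:
        if len(row) % 3 != 0:
            raise ValueError("row length must be a multiple of 3")
        packed.append([tuple(row[i:i + 3]) for i in range(0, len(row), 3)])
    return packed

def unpack_rgb(image):
    """Inverse of pack_rgb: lay each pixel's three channels side by side."""
    return [[bit for pixel in row for bit in pixel] for row in image]

# 27 time steps of a single pitch squeeze into 9 pixels.
bits = [1, 0, 0, 1, 1, 0, 0, 0, 1] * 3
image = pack_rgb([bits])
print(len(bits), "bits ->", len(image[0]), "pixels")
print(unpack_rgb(image) == [bits])  # the round trip loses nothing

# 64 pixels wide * 3 channels = 192 time steps, at 0.05 s each: about 9.6 s.
print(64 * 3, "steps")
```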
Admittedly, this method is not very elegant,
failing to use the intuitive, spatial nature of convolutional filters.
Oh well.
I mean, we get these DOPE-ASS LOOKING FRENCH FLAGS AS A RESULT SO WHO COULD COMPLAIN?!
Anyway,
are there any pros & cons?
OF COURSE!
um,
of course.
I originally read these pros and cons out loud,
but that took too long and the video got boring,
so if you want to read them you can.
The main downsides are that there's no sense of time,
and the training progress can go backwards.
But you know WHAT?
None of those caveats even MATTER
because HyperGAN's training process is by far the prettiest to watch.
It's so mesmerizing and makes me forget all those other problems.
[Great Days by Joakim Karud]
Well,
that was a lot of hard work, HyperGAN!
You must have some good music to play for us now!
*HyperGAN's "good" music*
*The best note you will ever hear in your entire life*
that
was
nauseating.
I hope I never cross paths with you for the rest of my life.
leave immediately.
*switch*
*death*
*death in the background*
Last up is PixelCNN,
which was written by these people?
AND these people?
and Andrej Karpathy is in there again?
Anyway,
this is the neural net I know the least about.
But what I believe it does is
traverse the pixels in reading order,
one by one.
In this way, it's behaving a lot like a recurrent neural net,
because you've got iterations performed in sequence,
NOT all at once.
ALSO NO LOOKING INTO THE FUTURE!
But at each iteration, meaning the microscopic level,
PixelCNN gets its input information by performing convolutions on the surrounding pixels it's seen before.
So there IS still some aspect of spatial reasoning going on here.
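[Editor's sketch] The "no looking into the future" rule is usually enforced by masking the convolution kernel so it can only see pixels that come earlier in reading order. Below is a minimal version of such a mask. `causal_mask` is an illustrative name, and real PixelCNN implementations add further details (such as per-channel ordering) that are left out here.

```python
def causal_mask(size):
    """size x size kernel mask: 1 where the kernel may look, 0 where it may not.
    Allowed: every row above the center, plus pixels left of the center in its row.
    The center itself (the pixel being predicted) is hidden."""
    center = size // 2
    mask = []
    for r in range(size):
        row = []
        for c in range(size):
            already_seen = r < center or (r == center and c < center)
            row.append(1 if already_seen else 0)
        mask.append(row)
    return mask

for row in causal_mask(3):
    print(row)
# [1, 1, 1]
# [1, 0, 0]
# [0, 0, 0]
```

Multiplying a kernel's weights by this mask zeroes out everything at or after the current pixel, so the convolution only ever reads pixels that were generated earlier.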
So in a way,
PixelCNN is like a blend between CBLSTM and HyperGAN
using the powers of sequences and 2-D space together!
That sounds really corny, like
Critic: wowow. there's three options, gosh darnit, sounds just like another boring old Goldilocks story.
huh?
too big, too small, JUST RIGHT
too hot, too cold, JUST RIGHT
too time-based, not time-based enough, JUST RIGHT!!
Cary: I'd like to point out that because PixelCNN generates pixels in reading order,
there is a specific directionality to its output,
top to bottom.
You can kind of see that in certain training sessions of PixelCNN
which expose the way it actually works.
And ESPECIALLY in how it can be used to complete images missing their bottom half.
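[Editor's sketch] Completing a half-finished image falls straight out of the reading-order idea: keep the top rows and sample the rest one pixel at a time. In the sketch below, `predict` stands in for a trained network, and `toy_predict` is a deliberately silly hypothetical replacement that mostly copies the pixel above. Neither comes from the real PixelCNN code.

```python
import random

def complete_bottom_half(image, predict, rng):
    """Fill the rows below the midpoint one pixel at a time, in reading order.
    predict(image, r, c) returns the probability that pixel (r, c) is on; it
    should only read pixels above (r, c) or to its left in the same row."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]          # leave the caller's image untouched
    for r in range(height // 2, height):     # top to bottom
        for c in range(width):               # left to right
            p_on = predict(out, r, c)
            out[r][c] = 1 if rng.random() < p_on else 0
    return out

def toy_predict(image, r, c):
    """Hypothetical stand-in for a trained net: mostly copy the pixel above.
    Only meant for images at least two rows tall."""
    return 0.95 if image[r - 1][c] else 0.05

top = [[1, 0, 1, 0],
       [1, 0, 1, 0],
       [0, 0, 0, 0],   # bottom half: unknown, to be generated
       [0, 0, 0, 0]]
done = complete_bottom_half(top, toy_predict, random.Random(0))
for row in done:
    print(row)
```

Each new pixel is written into `out` before the next one is predicted, so later predictions can depend on earlier ones, which is what makes the process sequential.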
Is that a good thing or a bad thing?
I honestly have no idea, but in my opinion, it's a good thing,
because music is also directional.
It doesn't go backward in pitch, forward in pitch, or backward in time.
It only goes forward in time!
Okay, so now we're watching PixelCNN train, and yeah,
I'll admit, it's not as pretty as HyperGAN training.
But, I mean, you saw what the networks looked like,
PixelCNN is rigid and square,
in contrast to HyperGAN's curves and fluids,
so, of course, their outputted images will reflect those traits.
And if you're wondering why the music looks rotated, it is!
I wanted to make the time dimension correspond to going down on the y-axis instead of across the x-axis,
because then it more closely aligns with the order in which PixelCNN draws pixels, top to bottom.
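[Editor's sketch] One simple way to get time running downward is a plain transpose, so that each row of the image becomes one time step. The video may well have used a true 90-degree rotation instead. `time_down` is an illustrative name.

```python
def time_down(roll):
    """Transpose a pitch-by-time roll so that each row is one time step."""
    return [list(step) for step in zip(*roll)]

roll = [[0, 1, 1],   # higher pitch, sounding at steps 1 and 2
        [1, 0, 0]]   # lower pitch, sounding at step 0
rotated = time_down(roll)
for step in rotated:
    print(step)      # rotated[t] lists which pitches sound at time step t
```

With this layout, a generator that works top to bottom finishes each moment in time before moving on to the next.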
Now, I'm a little nervous, because
I already crushed CBLSTM and HyperGAN without even knowing if PixelCNN could do any better.
sooo...
now it's our only option left.
Well, life's just not worth living if you can't take a few risks, right?
Why don't you play us your best melodies, PixelCNN?
*PixelCNN's music*
Hey, that's not too bad!
I guess YOU get to stay on Computery's hard drive after all.
*on-screen*
*A few more of Cary's favorites from PixelCNN*
*on-screen*
NO BRO
YOU EVER SEEN TESTING LOSS SO LOW BRO?
Also don't judge me by my debug logs...
*on-screen*
College: YOU'RE NOT A REAL [??] STUDENT.
Crying Cary: I know, I knoooow. D:
*on-screen*
here ya go!
3 hours of unedited Computery jazz.
At the link in the description.
*on-screen*
*awful music*
*on-screen*
*PixelCNN's music with horror in it*
*on-screen*
Thanks to all my patrons who are supporting me on Patreon!
Tantusar's gonna read your names!
T: Carykh is brought to you at the five dollar level by:
Nullpersona,
Samuel Ytterbrink,
Jeremy Neander,
Jakob Persson,
taterbits,
kitkatyj,
Benjamin Gordon,
Trevor,
Dunkel Blau,
Brady Wagner,
and Mike Koss.
*on-screen*
Finally, thanks to Hazel Cricket for animating this!
You know what guys?
I think this experiment has been a moderate success!
I wanted Computery to make me some Jazz piano so I wouldn't be bullied,
and Computery definitely pulled through!
Oh man!
I get to go to school in peace now!
I can hardly believe it!
I'm tearing up with joy just THINKING ABOUT IT!
Let's go experience that magnificent bliss!
*sexy jazz music*
Bully #1: Hey, look! That loser's listening to
TTS: JAZZ.
Bully #1: Let's beat him up!
Cary: No! It-It's not what you think! It's not what you think! No!
*Top 10 saddest anime deaths part 2*
bye
