Vsauce!
Kevin here.
Please name one single object in this image.
Just one.
This kind of looks like a typewriter?
This is a blue balloon?
Some kind of chunky cream cheese?
And Uhh..
I don’t even wanna guess what that is…
Yeah, this is impossible.
You CAN’T accurately identify even one element
of this photo.
A similar image went viral last year, but don’t
feel bad that your brain can’t make any
sense of this image.
Neither can artificial intelligence, at least
not yet.
Google Cloud Vision API says this photo of
Jake and me contains hats, glasses, people,
food…
The best it can do on this photo is… packaged
goods.
The image does, however, I promise, contain
real objects that are distorted by AI, and
I’ll reveal what they are a little later.
Our brains love identifying, classifying,
and making sense of meaningful inputs.
And without the ability to focus on meaningful
information, you’re just… confused.
I’ve been working with LG, who sponsored
this video, on exactly what all this means
for technology in our everyday life.
We weren’t really able to filter and
prioritize audio-visual information until
pretty recently, and until very recently,
it definitely wasn’t easy or affordable.
They sent me the LG Velvet so I can go over
a real example of how tech has to be human-focused
and work with our brains -- and explore how
premium technology is packed within devices
accessible to more and more people.
That democratization of technology is amazing,
but when you hear about “pixel binning”
and “audio bokeh,” it’s hard to get
a sense of why it actually matters to you.
So here’s a question: how exactly does this
phone take your picture?
Light reflects off a subject and into a lens,
which focuses it.
Then it’s passed through a mosaic filter
onto an image sensor, then through an analog-to-digital
converter, where it’s processed into the data
that gets stored in memory.
So, yeah, it’s a lot like what an eyeball
does as it sends an image to your brain.
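To make that pipeline concrete, here’s a minimal Python sketch of the stages just described: a mosaic color filter sampling one channel per photosite, then an analog-to-digital conversion into the values that get stored. Every name here is illustrative, not LG’s actual implementation.

```python
import numpy as np

# Toy version of the capture pipeline described above: lens-focused
# light -> mosaic filter -> sensor -> analog-to-digital converter.
# All names are illustrative, not any phone's real firmware.

def mosaic_filter(scene_rgb):
    """Sample one color channel per photosite in an RGGB Bayer layout."""
    h, w, _ = scene_rgb.shape
    raw = np.zeros((h, w))
    raw[0::2, 0::2] = scene_rgb[0::2, 0::2, 0]  # red sites
    raw[0::2, 1::2] = scene_rgb[0::2, 1::2, 1]  # green sites
    raw[1::2, 0::2] = scene_rgb[1::2, 0::2, 1]  # green sites
    raw[1::2, 1::2] = scene_rgb[1::2, 1::2, 2]  # blue sites
    return raw

def analog_to_digital(raw, bits=10):
    """Quantize the analog sensor signal into discrete levels for memory."""
    levels = 2 ** bits - 1
    return np.round(np.clip(raw, 0.0, 1.0) * levels).astype(np.uint16)

scene = np.random.rand(8, 8, 3)   # stand-in for light focused by the lens
stored = analog_to_digital(mosaic_filter(scene))
```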
The larger that sensor, the more information
it’s capturing… and that’s hard to achieve
when a phone’s lens is tiny.
A premium smartphone like the LG Velvet uses
Pixel Binning technology to overcome this
hurdle.
Pixel binning takes data from four adjacent pixels and
uses an algorithm to combine them into a single,
more precise pixel.
Sometimes color information is pretty clear
-- but in low light shots, you either need
an exceptional sensor or a way to interpolate
the visual information.
A quad Bayer filter that’s 50% green, 25%
red, and 25% blue creates an image in RGB
the same way your eye does, so paired with
the pixel binning algorithm, you’re able
to take limited visual information and get
a clearer, brighter photo.
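Here’s roughly what the binning step could look like, assuming the simplest possible combining algorithm: a plain average of each 2x2 same-color block. LG’s actual algorithm is surely more sophisticated; this only shows why averaging four readings helps in low light.

```python
import numpy as np

def bin_2x2(raw):
    """Average each 2x2 block of same-color quad-Bayer photosites into
    one pixel, trading resolution for a cleaner signal."""
    h, w = raw.shape
    blocks = raw[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

# Simulated dim, noisy channel: averaging four readings cuts random
# noise by about half (a factor of sqrt(4) = 2).
raw = 0.1 + np.random.normal(0.0, 0.05, (16, 16))
print(raw.std(), bin_2x2(raw).std())
```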
The Velvet further enhances camera performance
with Multi-image Fusion, which uses LG's proprietary
algorithm to combine several images at once
as it identifies and eliminates the unwanted
noise in each one.
The images are then fused cleanly, which also
highlights the most dynamic visual values
-- and it just plain interprets visual data
more clearly than we can.
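Since LG’s fusion algorithm is proprietary, here’s only a rough sketch of the principle: stack several frames of the same scene, treat per-pixel outliers as noise, and down-weight them when fusing.

```python
import numpy as np

def fuse_burst(frames):
    """Fuse aligned frames of one scene, down-weighting outlier noise."""
    stack = np.stack(frames)                     # (n_frames, h, w)
    med = np.median(stack, axis=0)               # robust per-pixel estimate
    weights = 1.0 / (1.0 + np.abs(stack - med))  # far from median = noisier
    return (weights * stack).sum(axis=0) / weights.sum(axis=0)

truth = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
burst = [truth + np.random.normal(0.0, 0.08, truth.shape) for _ in range(6)]
print(np.abs(burst[0] - truth).mean(), np.abs(fuse_burst(burst) - truth).mean())
```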
The clearer the picture, the less room there
is for cognitive biases to creep in and make
our brains see what isn’t there.
The Bokeh Effect works similarly.
It’s from the Japanese boke, meaning blur,
and it’s that tasteful blur that minimizes
what we don’t want to focus on and highlights
what we really need to look at.
Do something for me.
Hold your hand about 15 centimeters from your
face, then focus your eyes right on your palm.
Everything in the background is blurred now.
It’s about playing with the depth of field
and distance from the lens, whether it’s
your eyes or your camera.
Bokeh happens when a large-aperture lens opens
up… and you can see the bokeh effects develop
as the aperture gets broader.
Smartphones just do not have large, sophisticated
lenses with easily varying apertures, but
a dual lens camera allows for one lens to
focus on the image and another to capture
information like distance.
With information from both, the LG Velvet
can apply a Gaussian blur algorithm to the
portion of the image that should be out of
focus.
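As a sketch, assuming we already have a depth map from the second lens, the blur step could look something like this. The falloff constant and mask shape are made up for illustration, not pulled from LG’s pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def synthetic_bokeh(image, depth, focus_depth, sigma=5.0, falloff=0.02):
    """Blend a sharp image with its Gaussian-blurred copy, keeping
    pixels near the focal plane sharp and blurring the rest."""
    blurred = gaussian_filter(image, sigma=sigma)
    in_focus = np.exp(-((depth - focus_depth) ** 2) / falloff)  # 1 near plane
    return in_focus * image + (1.0 - in_focus) * blurred

image = np.random.rand(64, 64)                       # grayscale for simplicity
depth = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))  # depth map from second lens
portrait = synthetic_bokeh(image, depth, focus_depth=0.2)
```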
Bokeh looks great, but it’s actually really
important.
Too much information, or irrelevant information,
is a cognitive distraction from what we need
to focus on.
When our eyes are distracted, they tend to
focus on what’s closest to us no matter
if it’s what we should be paying attention
to.
In 1960, J. Mandelbaum showed that the eyes
of pilots and drivers fixate on the dirty
parts of a windshield instead of the full
view of everything in front of them.
It’s not an eye malfunction, it’s just
your brain…
Okay.
It’s just your brain focusing on the wrong
thing.
Like, if my glasses are dirty I have to clean
them immediately or it drives me insane.
That’s me fighting the Mandelbaum Effect.
Bokeh prioritizes what we want to focus on
so our brain doesn’t have to fight itself
to sort out information.
Audio bokeh has the same effect, but the technology
is different.
Rather than softening less-important parts of the picture,
audio bokeh eliminates unwanted sound so you
can hear what you need to hear.
Check it out, I’m outside right now where
cars and construction and nature are all noise
competing with my voice to get to your ears.
This is recorded in basic mode so you can
hear what basic mode sounds like.
This is voice bokeh: now you’re really just
hearing my voice, the meaningful thing that
you want to hear when you’re on the phone.
Without that, we’re subject to what cognitive
scientist Colin Cherry called ‘The Cocktail
Party Problem,’ where the superior temporal
gyrus has to engage to prioritize auditory
information, which makes it possible to hear
a single conversation at a noisy cocktail
party.
The Velvet does this for you by breaking down
sound frequencies and using a series of algorithms
to identify the subject’s voice.
Then, using Fourier analysis, it’s separated
from all the other sounds being captured.
The spectrum of tones that make up sound is
mapped and processed to make a sort of fingerprint
of static noise, then the lesser tones are
turned down.
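The Velvet’s exact algorithms aren’t public, but the noise-fingerprint idea itself can be sketched with a plain Fourier transform: measure the average spectrum of a voice-free stretch of audio, then attenuate any frequency that doesn’t rise well above it. The function and parameter names below are my own, for illustration.

```python
import numpy as np

def spectral_gate(signal, noise_sample, frame=512, threshold=2.0):
    """Turn down frequency bins that don't beat the noise fingerprint."""
    # Fingerprint: mean magnitude per frequency in a voice-free sample.
    usable = len(noise_sample) // frame * frame
    fingerprint = np.abs(
        np.fft.rfft(noise_sample[:usable].reshape(-1, frame), axis=1)
    ).mean(axis=0)

    out = np.zeros(len(signal))
    for start in range(0, len(signal) - frame + 1, frame):
        spectrum = np.fft.rfft(signal[start:start + frame])
        keep = np.abs(spectrum) > threshold * fingerprint  # voice beats noise
        out[start:start + frame] = np.fft.irfft(spectrum * keep, n=frame)
    return out
```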
Your ears can’t filter any of that, especially
not in real-time.
You’ve got to deal with all the noise at
once and use your mental energy to focus on
the parts you want to hear.
With audio bokeh, the tech handles that for
you and frees up your gray matter.
Now let’s free up the confusion over the
photo from the beginning of the video.
I created this image with a tool that uses
AI and machine learning to combine and edit
images.
Everything came from an original set of real
images, and the tool tracks the movement of those
images throughout the evolution of distortions,
storing them as ‘Genes.’
The Genes for my image were ‘Shopping Basket,
Cauliflower, Necklace, School Bus, Stethoscope,
Toyshop, Syringe, Upright, Manhole Cover,
and Grand Piano.’
The result of all that is a frustrating-to-look-at,
mind-bending pile of noise.
It’s an exaggerated example of the noise
problems tech is tackling with amazing speed,
precision, and efficiency, and it's increasingly
available to more of us every day.
So yeah, when a new phone comes out, it’s
easy to get wrapped up in how many megapixels
the camera has and how much storage it has.
But technology isn’t actually about any
of that.
It’s about whether the thing in our pocket
interfaces with our actual human existence
in a way that does something better than we
can or takes the pressure off our bodies.
‘Technology’ was a word that was barely used
in English until the Second Industrial Revolution.
It just wasn’t the word we used to talk about
innovations.
And throughout the 20th century, technology
increasingly described machines that extended
our capabilities and allowed us to do more
work.
That’s still true, but it’s more than
that now.
In his 1904 book “The Theory of Business
Enterprise,” economist and sociologist Thorstein
Veblen theorized that social institutions
determined how technology was used.
In 2020, the institution is… you.
And as always, thanks for watching.
I don’t know why I did that.
