Dear Fellow Scholars, this is Two Minute Papers
with Dr. Károly Zsolnai-Fehér.
These days, we see so many amazing uses for
learning-based algorithms, from enhancing
computer animations and teaching virtual animals
to walk, to giving self-driving cars depth
perception, and more. It truly feels like
no field of science is left untouched by these
new techniques, including the medical sciences.
You see, a common problem in medical imaging
is that there are so many diagnostic images
out there in the wild that it is becoming more
and more infeasible for doctors to look at
all of them. What you see here is a work from
scientists at DeepMind Health that we covered
a few hundred episodes ago. The training set
contains about 14 thousand optical coherence
tomography scans; this is the OCT label you
see on the left. These images are cross-sections
of the human retina.
We first start out with this OCT scan. Then,
a manual segmentation step follows, where
a doctor marks up the image to show where
the most relevant parts are, such as retinal
fluid or elevations of the retinal pigment
layer.
After the learning process, this method can
reproduce these segmentations really well
by itself, without the doctor’s supervision,
and you see here that the two images are almost
identical in these tests.
Now that we have the segmentation map, it
is time to perform classification. This means
that we look at this map and assign a probability
to each possible condition that may be present.
Finally, based on these probabilities, a final
verdict is made on whether the patient needs
to be seen urgently, needs just a routine check,
or requires no check at all.
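If it helps to picture this two-stage pipeline, here is a minimal
sketch in Python. The segmentation_net and classification_net
callables are hypothetical stand-ins for the trained networks, not
DeepMind's actual models.

```python
import numpy as np

def triage(oct_scan, segmentation_net, classification_net):
    """Sketch of the segmentation-then-classification idea.

    `segmentation_net` and `classification_net` are assumed,
    hypothetical models -- NOT the ones from the paper.
    """
    # Stage 1: assign a tissue-type label to every pixel of the
    # OCT scan, mimicking the doctor's manual markup.
    segmentation_map = segmentation_net(oct_scan)

    # Stage 2: turn the segmentation map into one probability
    # per referral decision.
    probabilities = classification_net(segmentation_map)

    # Final verdict: pick the most probable referral urgency.
    verdicts = ["urgent referral", "routine check", "no check required"]
    return verdicts[int(np.argmax(probabilities))]
```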
This was an absolutely incredible piece of
work. However, it is of utmost importance
to evaluate these tools together with experienced
doctors, and hopefully, on international datasets.
Since then, in this new work, DeepMind has
knocked the evaluation out of the park for
a system they developed to detect breast cancer
as early as possible. Let’s briefly talk
about the technique, and then I’ll try to
explain why it is sinfully difficult to evaluate
it properly.
So, onto the new problem. These mammograms
contain four images that show the breasts
from two different angles, and the goal is
to predict whether the biopsy taken later
will be positive for cancer or not. This is
especially important because early detection
is key for treating these patients.
And the key question is, how does it compare
to the experts? Have a look here. This is
a case of cancer that was missed by all six
experts in the study, but was correctly identified
by the AI. And what about this one? This case
didn’t work so well — it was caught by
all six experts but was missed by the AI.
So, one reassuring sample, and one failed
sample. And with this, we have arrived at
the central thesis of the paper, which asks
the question: what does it really take
to say that an AI system has surpassed human experts?
To even have a fighting chance in tackling
this, we have to measure false positives and
false negatives. A false positive means that
the AI mistakenly predicts that the sample
is positive, when in reality, it is negative.
A false negative means that the AI thinks
that the sample is negative, whereas it is
positive in reality.
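If you prefer to see this in code, here is a tiny, self-contained
example with made-up labels, purely to show how the two quantities
are counted. None of these numbers come from the paper.

```python
import numpy as np

# Toy data: 1 means "biopsy positive for cancer", 0 means negative.
truth      = np.array([1, 0, 1, 1, 0, 0, 1, 0])
prediction = np.array([1, 1, 0, 1, 0, 0, 1, 0])

false_positives = np.sum((prediction == 1) & (truth == 0))  # healthy flagged as sick
false_negatives = np.sum((prediction == 0) & (truth == 1))  # sick patients missed

# The rates normalize by the number of actual negatives and positives.
fpr = false_positives / np.sum(truth == 0)
fnr = false_negatives / np.sum(truth == 1)
print(f"false positive rate: {fpr:.2f}, false negative rate: {fnr:.2f}")
```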
The key is that in every decision domain,
the permissible rates for false negatives
and false positives are different. Let me try
to explain this through an example. In cancer
detection, misclassifying a sick patient as
healthy is a grave mistake that can lead to
serious consequences. But if a healthy patient
is misclassified as sick, the positive cases
get a second look from a doctor, who can easily
identify the mistake. The consequences, in this
case, are much less problematic, and can be
remedied by spending a little time checking
the samples that the AI was less confident
about. The bottom line is that there are many
different ways to interpret the data, and it
is by no means trivial to find out which one
is the right way to do so.
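To make this trade-off concrete, here is a small sketch with made-up
scores: moving the decision threshold of a classifier trades false
negatives for false positives, and which trade is acceptable depends
on the domain.

```python
import numpy as np

# Made-up model scores (estimated probability of cancer) and
# ground-truth labels, purely for illustration.
scores = np.array([0.95, 0.80, 0.60, 0.55, 0.45, 0.30, 0.20, 0.05])
truth  = np.array([1,    1,    0,    1,    0,    1,    0,    0])

for threshold in (0.50, 0.25):
    prediction = (scores >= threshold).astype(int)
    fp = int(np.sum((prediction == 1) & (truth == 0)))
    fn = int(np.sum((prediction == 0) & (truth == 1)))
    # Lowering the threshold flags more patients: fewer missed cancers
    # (false negatives) at the price of more false alarms (false positives).
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```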
And now, hold on to your papers because here
comes the best part. If we compare the predictions
of the AI to the human experts, we see that
the false positive cases in the US have been
reduced by 5.7%, while the false negative
cases have been reduced by 9.7%. That is the
holy grail! We don't need to consider the
relative cost of false positives and false
negatives here, because the system reduced
both at the same time. Spectacular! Another important detail
is that these numbers came out of an independent
evaluation. This means that the results did
not come only from the scientists who wrote
the algorithm; they were also thoroughly checked
by independent experts who have no vested
interest in this project. This is also why
you see so many authors on this paper.
Excellent.
Another interesting tidbit is that the AI
was trained on subjects from the UK, and the
question was how well this knowledge generalizes
to subjects from other places, for instance,
the United States. Is this UK knowledge reusable
in the US? I was quite surprised by the answer,
because the system never saw a sample from
anyone in the US, and still did better than
the experts on US data. This is a very reassuring
property, and I hope to see more studies that
show how general the knowledge is that these
systems are able to obtain through training.
And perhaps most importantly: if you remember
one thing from this video, let it be the following.
This work, much like most other AI-infused
medical solutions, is not made to replace
human doctors. The goal is, instead, to empower
them, and to take as much weight off their
shoulders as possible. We have hard numbers
for this, as the results showed that this
work reduces the doctors' workload by 88%,
which is an incredible result. Among other
far-reaching consequences, I would like to
mention that this would substantially help
not only the work of doctors in wealthier,
more developed countries, but it may single-handedly
enable proper cancer detection in developing
countries that cannot afford to check these
scans.
And note that in this video, we have truly
just scratched the surface; whatever we can
talk about here in a few minutes cannot be
as rigorous and accurate a description as the
paper itself, so make sure to check it out
in the video description. And with that, I
hope you now have a good feel for the pace
of progress in machine learning research.
The retinal fluid project was the state of
the art in 2018, and now, less than two years
later, we have a proper, independently evaluated
AI-based detection system for breast cancer.
Bravo, DeepMind. What a time to be alive!
This episode has been supported by Linode.
Linode is the world’s largest independent
cloud computing provider. Unlike entry-level
hosting services, Linode gives you full backend
access to your server, which is your step
up to powerful, fast, fully configurable cloud
computing. Linode also has One-Click Apps
that streamline your ability to deploy websites,
personal VPNs, game servers, and more. If
you need something as small as a personal
online portfolio, Linode has your back, and
if you need to manage tons of clients' websites
and reliably serve them to millions of visitors,
Linode can do that too. What’s more, they
offer affordable GPU instances featuring the
Quadro RTX 6000, which is tailor-made for AI,
scientific computing and computer graphics
projects. If only I had access to a tool like
this while I was working on my last few papers!
To receive $20 in credit on your new Linode
account, visit linode.com/papers or click
the link in the description and give it a
try today! Our thanks to Linode for supporting
the series and helping us make better videos
for you.
Thanks for watching and for your generous
support, and I'll see you next time!
