[MUSIC PLAYING]
HANS BOEHM: Hello, welcome.
I'm Hans Boehm.
I'm a software engineer in
the Android Runtime Group,
and I'm here to talk to you
about how to manage native C++
memory in Android.
Generally, most
Android applications
are written in Java today,
maybe Kotlin in the future.
But there are sometimes
reasons to also write pieces
of the application
in C or C++ code.
It may be the case that you can
implement some algorithm more
efficiently in a
language like C++.
Or you may have a native library
that already exists that you
wanted to use or the like.
Here, we're really talking about
multi-language applications
that combine Java and C++.
It turns out that even if you're
in the business of writing
100% Java apps, as
you may be, some
of what I'm going to
talk about may still
be an issue in certain
isolated cases.
For example, if you look
at the Android platform
implementation, the BigInteger
implementation
that you are using, if you're
using BigInteger at all,
is actually implemented in
terms of native code underneath.
And in other cases,
that sort of thing
may actually show through.
So that was one of the reasons
I got involved in this.
I actually spent
some time working
on the Android
calculator app which
happens to be a major
client of BigInteger.
Here we're going to
use a running example
of a hypothetical C++ package
that we want to access from
Java language code that
manipulates polynomials over
GF2.
Does anybody know
what that means?
Good.
You're not supposed to.
It doesn't matter.
As far as we are
concerned, these
are basically bit vectors
with an odd notion
of multiplication.
But I don't even
care about that.
I'm not going to show you
the C++ code actually.
This particular example is
good because it turns out
that in spite of the fact
that it seems esoteric,
modern hardware often has
hardware support for this.
So you can actually build a
blazingly fast really low level
implementation.
So what we want to do
is we have this library,
and we want to call that
library from Java code.
So what this looks
like, roughly,
is the
code that you see up here.
So the important part here
is we have a Java class,
which logically owns a C++
object that actually contains
the guts of the implementation,
that actually implements
the real functionality, which
I'm not going to show you.
What the Java object
holds is a Java long,
which is really a C++
pointer in disguise.
What it's going to
do is for example,
when you want to
multiply two of these
is it's eventually going to call
this native multiply function.
It's going to pass
it two Java longs.
This is going to go
into some JNI code,
which will cast those Java
longs to C++ pointers,
and manipulate the C++ objects.
And in this particular case,
we return another Java long,
which is really a C++ pointer.
So the Java level multiplication
routine is really one that just
gives me back a new BinaryPoly
object containing this Java
long, which really
points to a C++ object,
and I obtain the underlying
C++ pointer by calling this
native multiply routine.
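A minimal sketch of the wrapper described here, written so that it runs without JNI. The `FakeNative` class, its handle table, and the `fromBits` and `bits` helpers are assumptions made for illustration and are not from the talk; in a real binding `nativeMultiply` would be declared `private static native long nativeMultiply(long h1, long h2);` and implemented in C++ by casting the longs back to pointers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the C++ side: a table from handle to bit vector.
final class FakeNative {
    private static final Map<Long, Long> HEAP = new ConcurrentHashMap<>();
    private static final AtomicLong NEXT = new AtomicLong(1);

    static long nativeCreate(long bits) {
        long handle = NEXT.getAndIncrement();
        HEAP.put(handle, bits);
        return handle;
    }

    static long nativeBits(long handle) {
        Long bits = HEAP.get(handle);
        if (bits == null) throw new IllegalStateException("dangling handle " + handle);
        return bits;
    }

    // Carry-less (GF(2)) product of two bit vectors; bits above 63 are dropped.
    static long nativeMultiply(long h1, long h2) {
        long a = nativeBits(h1), b = nativeBits(h2), product = 0;
        for (int i = 0; i < 64; i++) {
            if (((b >>> i) & 1L) != 0) product ^= a << i;
        }
        return nativeCreate(product);
    }
}

final class BinaryPoly {
    private final long mNativeHandle; // really a C++ pointer in disguise

    private BinaryPoly(long handle) { mNativeHandle = handle; }

    static BinaryPoly fromBits(long bits) {
        return new BinaryPoly(FakeNative.nativeCreate(bits));
    }

    // The Java-level multiply just wraps the handle returned by the native call.
    BinaryPoly multiply(BinaryPoly other) {
        return new BinaryPoly(FakeNative.nativeMultiply(mNativeHandle, other.mNativeHandle));
    }

    long bits() { return FakeNative.nativeBits(mNativeHandle); }
}
```

Nothing in this sketch ever frees a handle, which is exactly the gap the rest of the talk is about.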
So here is a
pictorial representation
of what this looks like.
I have the Java
object at the top
of the slide, which
is the only thing
that my client actually sees.
So I want the client to be
able to treat this Java object
as though it were implemented
completely in Java,
and ignore the fact that
there's C++ code underneath.
The dot dash line
here is the Java long,
which is really a C++ pointer,
which sort of points to the C++
object representation, though
Java doesn't know that.
And inside the C++ object
representation there may be
additional pointers that
point to additional C++ data
structures.
So what's the problem
with doing this?
The problem comes
into play when we
try to think about how
memory gets managed
and how objects get allocated.
On the Java language side
we have a garbage collector
that cleans up after
us, and we generally
don't have to worry
about this too much.
When things are no longer
referenced, they go away.
When they become
unreachable they go away.
On the C++ side we have a manual
memory management discipline
usually, which means we
explicitly need to call some
delete function, some deleter in
order to deallocate the memory
when it's no longer reachable.
So how do we actually do that?
We have to arrange for somebody
to call the delete function
on the C++ object.
So here's the traditional
way to do this.
And the point of
this talk is largely
to talk you out of
doing it this way.
So the traditional
way of doing this,
which has the
attraction admittedly
that it's relatively simple to
write the code compared to what
I'm going to show you,
is that in addition
to things like the
native multiply method,
I'll have a native delete
method to which I also
pass a Java long.
And then the native delete
method will again convert that
to the C++ pointer and
invoke delete on it or invoke
the appropriate deleter on it.
I will call that native delete
from a Java finalizer.
The finalizer is
invoked more or less
when the object
becomes unreachable.
So that then goes ahead and
invokes native delete,
and deallocates the
C++ object as well.
I'll show you some reasons
why this is problematic here.
So I'll go through a long
list of finalizer problems.
Finalizers have a deserved
reputation for being hazardous.
And I'll only confirm that here.
But I'll actually emphasize
some relatively lesser
known problems,
which in my opinion,
are actually the most
serious ones to try
to come to grips with.
And then I'll show you
how to work around those.
So the first problem is that if
two objects become unreachable,
the finalizers actually
run in arbitrary order.
That includes the case
in which two objects that
point to each other become
unreachable at the same time:
they can be finalized
in the wrong order.
Meaning that the second one
to be finalized actually
tries to access an object
that's already been finalized.
I'll go into some
more details there.
So as a result of that, what can
happen is that you can then get
dangling pointers to
deallocated C++ objects.
The thing to keep in mind here
is that as a result of dangling
pointers in this environment,
you very often end up
corrupting the C++ heap.
But worse than that, the Java
language runtime actually
relies on that C++ heap as well.
So when this happens,
often what you end up with
is also a corrupted
Java runtime.
And you may end up seeing crashes
that actually look like Java
garbage collector crashes, or the like.
So to illustrate
how this can go wrong,
here's a sample client
application that
actually uses the BinaryPoly
class that we just
talked about.
And this, in combination
with the previous class,
will definitely break.
So definitely do not do this.
So what this is doing
in its finalize method,
it's just doing
something innocuous
on mMyBinaryPoly, which is
this field that happens to hold
a pointer to the BinaryPoly.
Now the problem here is
that it's entirely possible
that my BinaryPoly object
actually gets finalized first.
So what you are accessing
here is an mMyBinaryPoly,
which actually has
already been finalized.
So by the time you
access it at that point,
the C++ object underlying it
will have been deallocated.
And the native pointer
that you're using actually
points to nowhere.
So that's a bad thing.
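The ordering hazard can be sketched as follows. This is an anti-pattern shown only to make the failure visible, not something to copy. The `LIVE` set is a hypothetical stand-in for the C++ heap, and `degree` and `PolyClient` are illustrative names that are not from the talk; a real use-after-free would corrupt memory silently rather than throw.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class BinaryPoly {
    // Stand-in for the C++ heap: the handles that are still allocated.
    static final Set<Long> LIVE = ConcurrentHashMap.newKeySet();
    private final long mNativeHandle;

    BinaryPoly(long handle) {
        mNativeHandle = handle;
        LIVE.add(handle);
    }

    // Stand-in for any native call that dereferences the C++ pointer.
    int degree() {
        if (!LIVE.contains(mNativeHandle)) {
            throw new IllegalStateException("use after free: handle " + mNativeHandle);
        }
        return 0;
    }

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
        LIVE.remove(mNativeHandle); // stand-in for nativeDelete(mNativeHandle)
    }
}

final class PolyClient {
    private final BinaryPoly mMyBinaryPoly;

    PolyClient(BinaryPoly poly) { mMyBinaryPoly = poly; }

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
        // Looks innocuous, but mMyBinaryPoly may already have been finalized.
        mMyBinaryPoly.degree();
    }
}
```

If the runtime happens to finalize the `BinaryPoly` before the `PolyClient`, the call to `degree()` touches a handle that has already been deleted.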
There's a second
problem with finalizers,
which actually
turns out to be more
complicated to work around.
And it's less
important on Android,
but it's generally
important when
writing Java language code.
So by Java language rules--
and this is not currently
true on Android,
not currently believed
to be true on Android--
object x's finalizer
may actually
be invoked while one of x's
methods is still running.
So while I'm still running
a method on that object,
that object may end
up getting finalized.
This can result in the
same sort of phenomena.
I get native heap
corruption, and as a result,
possibly Java
runtime corruption.
Let me explain why that happens.
So here's again, an excerpt
from the binary poly class.
I used multiply before.
I'll use it again here.
So I have this native
multiply method
that gets called by multiply.
If you look at what this
actually gets compiled to--
and this isn't real.
This is sort of pseudo code.
The pseudo code it gets compiled
to looks something like this.
So what happens when I want
to multiply two values,
I first end up retrieving
the native handles
from both of them.
In this case, from this.
And I've put this in here
explicitly to make that clear.
So I retrieve a native
handle from this.
And I retrieve a native
handle from other.
Then I might allocate the
new binary poly object.
And then I will go ahead
and call the native method.
So the problem
with this is if we
look at the uses of
the actual Java object,
the last use of this
and other actually
happened before I allocate
the new binary poly.
And from that point
on, this method
doesn't use either
this or other anymore.
And it may so happen
that this is, in fact,
the last call on those.
So this is in fact, the point
at which those become garbage.
And the garbage collector
notices that they're
no longer reachable.
So what can happen at
that point, if the ABIs
are designed to allow this,
and if all the stars
line up just right?
It can in fact happen that the
garbage collector at that point
decides that this and
other are no longer needed,
and arranges for the finalizers
on both of those to get invoked,
roughly where the new
BinaryPoly allocation goes.
What then happens is that
when we call native multiply,
we don't need the
objects anymore,
but we still need
the native handles.
But now the native handles
have been deallocated.
So in fact, I'm
accessing objects
that are no longer around.
This is allowed by the Java
language specification,
and is something that has been
seen on occasion in the wild,
but it doesn't
happen very often.
There are more problems
with finalizers.
You can see a lot
of them by looking
at Joshua Bloch's
"Effective Java" book, which
actually has a section on
it, entitled I believe,
Avoid Finalizers.
One thing to
indicate the problems
is that the plan is, I
believe, to deprecate
finalize in JDK 9.
Another issue, which I'll
touch on occasionally
here, is that
for this to work correctly,
if you run an application
that allocates lots
of native memory and
relatively little Java memory,
it may actually not be the case
that the garbage collector runs
promptly enough to
actually invoke finalizers,
or the other mechanism I'll
suggest in a minute here.
So you actually
may have to invoke
System.gc
and System.runFinalization
occasionally,
which is tricky to do,
because if you do it too much,
it will greatly slow down
your application.
And many people have
fallen into that trap.
There's a more subtle
issue, that sometimes
finalizers actually extend the
lifetime of the Java object
for another garbage
collection cycle, which
means for generational
garbage collectors,
they may actually
cause it to survive
into the old
generation, and the lifetime
may be greatly
extended as a result
of just having a finalizer.
And there are other issues
which I won't go into here.
So how do you really
delete C++ objects?
I should point out that
the usual advice here, which
I'll skim over briefly,
is to use explicit close
and try-with-resources.
So Java has a syntactic facility
which allows essentially C++
style destruction.
When you leave a scope, you
can arrange to explicitly call
the close function.
And that works when
it's applicable.
So there are cases like file-like
objects, system-allocated
objects, and so on where that's
generally the right approach.
And that's the main
recommendation.
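For the cases where explicit close does apply, a minimal sketch looks like the following. `NativeBuffer`, its `liveCount` counter, and the fixed `size()` value are hypothetical, with the counter standing in for a native allocation and its matching delete.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class NativeBuffer implements AutoCloseable {
    // Counts stand-in "native allocations" that have not been released yet.
    static final AtomicInteger liveCount = new AtomicInteger();
    private boolean closed;

    NativeBuffer() {
        liveCount.incrementAndGet(); // stand-in for a native allocation
    }

    int size() {
        if (closed) throw new IllegalStateException("already closed");
        return 16;
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            liveCount.decrementAndGet(); // stand-in for nativeDelete
        }
    }

    static int useAndRelease() {
        try (NativeBuffer buffer = new NativeBuffer()) {
            return buffer.size();
        } // close() runs here, even if size() throws
    }
}
```

The release point is tied to the scope rather than to the garbage collector, which is why this works well for file-like objects and poorly for small value-like objects.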
On the other hand, there are
many cases in my experience
where that doesn't work.
And in general, people
already use this
in those cases where
it's applicable.
So for example,
I mentioned
java.math.BigInteger as the
example in the platform.
You really don't
want to have to call
BigInteger.close every time
one of those goes out of scope.
That would be
completely untenable.
And there are many
more examples like that
in the Android platform.
I should warn you
that for the solutions
I'm talking about
here, a lot of this
is actually not fully settled.
And the community as a whole,
this is sort of beyond Android
as well, is still
trying to figure out
what the right way to do this is.
These are mostly sort
of general Java language
issues that are actually
not specific to Android.
But there seems to be agreement
that you shouldn't use
Object.finalize, as
evidenced by the fact
that it's going to be deprecated
sometime in the future.
So the advice here,
and I'll go into how
to do this, is that we use
something called
java.lang.ref.PhantomReference instead.
Many of you may have
encountered that.
As far as I can tell,
most people look at it
and ignore it because
it's somewhat complicated
and probably appears even more
complicated than it actually
is.
It's not very commonly
used, but it actually
is a better replacement
for finalizers.
It avoids the problem
that the finalizer
can see finalized objects
because of the ordering issues.
Phantom references ensure that
you run the cleanup code
only when the object
really is about to go away
and nobody can use it and
nobody can see it anymore.
It certainly deals with the
finalize deprecation issue.
It's not going away.
It also ends up dealing with
some of the more subtle issues
there, though not all of them.
It does not deal with the
premature cleanup issue
that I mentioned earlier
that an object can
be finalized while one of
its methods is still running,
for example.
The major difficulty
with using it--
and I'll go through
an example here--
is that it's
relatively complicated
to use at the moment.
And we're in the process
of making that better.
I think various groups
are in the process
of making that better.
So Java 9 actually
has this notion
of a Cleaner, which makes
this a little bit simpler.
Inside the platform
we actually have
something called
NativeAllocationRegistry
that deals with some of this.
It is at the moment
not a public API,
but if you're
interested in this,
we're trying to
assess whether it
would be good to make
that a public API
and whether this is the
right API to do that with.
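A minimal sketch of the Java 9 `java.lang.ref.Cleaner` route, for runtimes that provide it. The `CleanedPoly` and `Deleter` names and the `freed` counter are assumptions for illustration; the counter stands in for a real native delete call.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

final class CleanedPoly {
    private static final Cleaner CLEANER = Cleaner.create();
    static final AtomicInteger freed = new AtomicInteger();

    // The cleanup action must not hold a reference to the CleanedPoly itself,
    // otherwise the object would never become unreachable.
    private static final class Deleter implements Runnable {
        private final long handle;

        Deleter(long handle) { this.handle = handle; }

        @Override
        public void run() {
            freed.incrementAndGet(); // stand-in for nativeDelete(handle)
        }
    }

    private final long mNativeHandle;
    final Cleaner.Cleanable cleanable;

    CleanedPoly(long handle) {
        mNativeHandle = handle;
        cleanable = CLEANER.register(this, new Deleter(handle));
    }
}
```

The cleaner runs the action once the object becomes phantom reachable; calling `cleanable.clean()` explicitly runs it early, and the action runs at most once either way.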
So what should you use instead?
Well, I mentioned these
phantom references.
So what are they?
A phantom reference is shown
here with a ghost next to it.
It's an object that's
associated with the object whose
lifetime you want to monitor.
So it doesn't actually point
to or refer to the BinaryPoly
object that we want
monitored and that we
want to clean up after.
So you can sort of think
of the phantom reference
as the last will and
testament of the Java object.
It tells you what to do,
how to clean up the object
once the Java object dies.
So the way this works
normally, in order
to actually use a
phantom reference,
you'll usually inherit from it.
And the derived class,
the class derived from
PhantomReference, will have a
pointer to the C++ object,
along with the Java
BinaryPoly object.
There's also a reference queue
off to the side here,
and I'll show you what role
that plays in a second here.
Notice that I've put those
ground symbols in here
in a few places to indicate
that those are objects that we
need to make sure don't
get garbage collected.
So we need to have some
mechanism for keeping those
around.
That applies to both the phantom
reference and the reference
queue.
So what happens?
After binary poly
becomes unreachable,
the phantom
reference itself gets
added to the associated
reference queue.
And that's basically
all that happens.
The Java object gets
immediately collected.
There's no longer
any need for it,
because the Phantom reference
itself knows how to clean up.
So I'll show you a sort of
quick implementation of this
here in this case.
This part actually
is fairly easy.
So we've modified
BinaryPoly to deal
with this sort of
reference cleanup.
And what I've done
is on the next slide
I'll show you a BPPhantomReference
implementation that
inherits from PhantomReference.
It's a kind of phantom
reference that's
specialized with some
additional functionality here.
And what this does here is we
still have the native delete
method that actually
deallocates the C++ object.
Whenever I allocate
a BinaryPoly object,
I now do it through this
factory method which [INAUDIBLE]
and allocates the object.
But then it immediately
goes and registers it
through a static method
in BPPhantomReference,
the way I've
implemented it here.
Now what's BPPhantomReference?
So that fits on a slide
in the same sense as in
the old Midas commercial,
if you remember that.
So I've stretched it to
fit it on a slide here.
So what that does is
it introduces a couple
of static data structures.
One of them is the
actual reference queue,
which will be used to enqueue
these BPPhantomReferences
once the corresponding Java
object is no longer needed.
And I also need to have
a concurrent set here,
some way to just keep the
BPPhantomReferences around
so they don't get garbage
collected themselves.
So I just keep those
around, basically until they
are explicitly removed.
So what I do then is
whenever I construct
one of these
BPPhantomReferences, I first of all
construct the
PhantomReference,
giving it the Java object
and the reference queue.
This tells the underlying
phantom reference
implementation to
watch that Java object,
and put the reference on the
queue that I gave it
when the Java object goes away.
And then I also remember
the native handle,
so I can actually invoke
the deallocation function
when it's time.
So what happens when I
register one of these things?
So I register the Java object
and the corresponding native
handle.
I create a new
BPPhantomReference.
And I add it to my set
here, just to make sure
that it doesn't go away.
And this is sort of part of
the ugliness of this scheme,
unfortunately, that
we're trying to address.
At some point, every once
in a while I actually
need to arrange that everything
that's on the reference
queue gets deleted.
And I've done that here by
providing a method doDeletes,
which just checks
whether there's anything
available for deletion.
And if so, it goes
ahead and deletes that.
In other contexts, you may
want to do this differently.
Rather than just polling
for whether something is available,
you may actually want a separate
thread for this and block.
It sort of depends
on the context.
It's a little bit tricky here
to do this well in a large
system, because the
easy way to do this
is to create a new thread
for every class that
does this, which may or may
not be acceptable depending
on how many of these you have.
So then in order to
actually make this work,
my application needs to do the
following things periodically.
In simple cases, you can skip
the System.gc and
System.runFinalization calls.
If you notice that the garbage
collector isn't running enough
because you're not allocating
enough Java objects in order
to actually trigger it, you
may have to explicitly do that.
But usually, that's
not an issue.
But you do have
to regularly call
BPPhantomReference.doDeletes
to actually do
the cleanup, the way
I've done things here.
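The scheme walked through above can be sketched roughly as follows. The slide code is not part of this transcript, so this is a reconstruction under assumptions: the `LIVE_HANDLES` set stands in for the C++ heap, `create` is a hypothetical factory method, and `register` returns the reference and `doDeletes` returns a count only so the behavior can be observed.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class BPPhantomReference extends PhantomReference<BinaryPoly> {
    // Queue on which references are placed once their BinaryPoly is dead.
    private static final ReferenceQueue<BinaryPoly> QUEUE = new ReferenceQueue<>();
    // Keeps the references themselves reachable until they are processed.
    private static final Set<BPPhantomReference> REFS = ConcurrentHashMap.newKeySet();

    private final long mNativeHandle;

    private BPPhantomReference(BinaryPoly owner, long handle) {
        super(owner, QUEUE);
        mNativeHandle = handle;
    }

    static BPPhantomReference register(BinaryPoly owner, long handle) {
        BPPhantomReference ref = new BPPhantomReference(owner, handle);
        REFS.add(ref);
        return ref;
    }

    // Polls the queue and deletes the native object behind each dead owner.
    static int doDeletes() {
        int count = 0;
        Reference<? extends BinaryPoly> ref;
        while ((ref = QUEUE.poll()) != null) {
            BPPhantomReference bpRef = (BPPhantomReference) ref;
            if (REFS.remove(bpRef)) {
                BinaryPoly.nativeDelete(bpRef.mNativeHandle);
                count++;
            }
        }
        return count;
    }
}

final class BinaryPoly {
    // Stand-in for the C++ heap: handles that are still allocated.
    static final Set<Long> LIVE_HANDLES = ConcurrentHashMap.newKeySet();
    private final long mNativeHandle;

    private BinaryPoly(long handle) { mNativeHandle = handle; }

    // Factory method: allocate, then immediately register for cleanup.
    static BinaryPoly create(long handle) {
        LIVE_HANDLES.add(handle);
        BinaryPoly result = new BinaryPoly(handle);
        BPPhantomReference.register(result, handle);
        return result;
    }

    // Stand-in for the real native delete method.
    static void nativeDelete(long handle) { LIVE_HANDLES.remove(handle); }
}
```

An application would call `BPPhantomReference.doDeletes()` periodically. A blocking variant on a dedicated thread could use `ReferenceQueue.remove()` instead of `poll()`, at the cost of one thread per such class.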
So the next problem
here that I mentioned
is the premature enqueuing
while a method is still
running.
As I said, this is not
actually an issue for Android,
but it is an issue
for portable code.
So there's a partial solution
to this, actually, in Java 9,
which is this thing called
reachabilityFence, which
you can explicitly invoke to
tell the implementation: don't
let the argument go away yet.
It should still be live
to the garbage collector
even though it might
not look like it.
That's not really
available basically,
in any implementation that
you can use at the moment.
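On runtimes that do provide it (it was added in Java 9 as `java.lang.ref.Reference.reachabilityFence`), the multiply method could be guarded roughly like this. The `nativeMultiply` body here is a plain Java stand-in so the sketch runs without JNI, and `handle()` is a hypothetical accessor added for illustration.

```java
import java.lang.ref.Reference;

final class BinaryPoly {
    private final long mNativeHandle;

    BinaryPoly(long handle) { mNativeHandle = handle; }

    // Stand-in for the JNI call; a real version would be a static native method.
    private static long nativeMultiply(long h1, long h2) { return h1 * h2; }

    BinaryPoly multiply(BinaryPoly other) {
        long h1 = this.mNativeHandle;  // last ordinary use of this
        long h2 = other.mNativeHandle; // last ordinary use of other
        try {
            return new BinaryPoly(nativeMultiply(h1, h2));
        } finally {
            // Keep both objects reachable until the native call has returned,
            // so their cleanup cannot run while the handles are still in use.
            Reference.reachabilityFence(this);
            Reference.reachabilityFence(other);
        }
    }

    long handle() { return mNativeHandle; }
}
```

Without the fences, nothing after the two field reads uses `this` or `other`, which is the window in which premature cleanup is permitted.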
So the best solution that my
colleagues and I could come up
with at the moment
is the following,
which is relatively
simple, but not
exactly performance neutral.
It is that instead of having
simple methods like multiply
that just invoke the native
method with the native handles,
what we do instead is we
invoke the native method
with both the native handles
and the Java objects.
So what I do instead is I have
a native multiply that takes
the two jobjects as well.
And current implementations,
though the spec
doesn't 100% guarantee that,
essentially guarantee
that this and other turn
into local refs, which tell
the garbage collector to keep
these around as long as the
native method is running,
so things will actually
work out correctly,
at the expense of passing
additional parameters through
JNI. Again, this
is recommended
if you write portable code.
For Android, I currently
wouldn't recommend doing that.
Eventually, sometime
in the future,
I think we will probably
have a better solution to this.
So one more hazard that I
want to go over quickly,
in part because this is
near and dear to my heart.
Because it took several months
of several people's time
to debug platform code
that had this issue.
If you are using C++
code in this way,
you need to be really careful
that you are actually calling
it correctly and not violating
the C++ rules underneath.
And that can be quite
tricky to do correctly.
So in particular, if you're
calling C++ code from Java
threads, it still has to be
thread safe as though you were
calling it from
multiple C++ threads.
And you have to make sure
to follow those rules.
This is a bit aggravated by
the fact that some Android
framework classes actually
use C++ code internally.
So this gets back to the point
I was making at the beginning
that you have to make sure to
use those framework classes
correctly, because otherwise
you can run into this problem
even without actually
writing native code.
One issue that's particularly
subtle on the C++ side that
actually contributed to this
problem that we spent lots
of time on is that often on the
C++ side when you call delete,
you actually end up invoking
some reference counting
mechanism that then takes care
of the underlying C++ objects
that are indirectly referenced.
You have to be really careful
in doing that correctly.
I highly recommend using an
expert developed implementation
of reference counting.
Not to [INAUDIBLE] your own.
I spent a lot of
time recently fixing
bugs in the Android
platform reference counting
implementations.
So it's not too likely,
I think, that most people
will get this right
implementing it themselves.
There are a whole bunch
of different issues
here having to do with
memory ordering bugs and self
assignments and so on.
It only is a tiny
amount of code,
but it's a really
tricky piece of code.
But assuming you have a
correct reference counting
implementation, it's still
hard to use it correctly.
So one thing you
have to remember,
the rules vary a
little bit, depending
on the implementation.
But for something
like std::shared_ptr,
when you create a
shared_ptr to something,
the object will be
deallocated when
the reference count associated
with copies of that shared_ptr
goes to 0.
So creating multiple
shared_ptrs
corresponding to the same
underlying [INAUDIBLE] pointer
is not likely to work well.
It's also not
likely to work well
to generate a reference
counted pointer
to this in the constructor
for a similar sort of reason.
Because probably by the time
you leave the constructor,
that reference count
will have gone to 0,
and you will
deallocate the object
before you ever leave the
constructor, which is not good.
So you have to be careful
with that sort of thing.
The thread safety rules are
in some ways even more subtle,
and also something that you
really want to watch out for.
So assuming here I have a bunch
of shared_ptrs, it's actually
OK, according to the normal
C++ thread safety rules,
if I take x, the shared_ptr
x, and copy it simultaneously
in two different threads to
two different other pointers.
This will [INAUDIBLE]
simultaneously update
the reference count
associated with x.
But that's OK.
It's the implementation's
job to make sure
that that works correctly.
So that's fine.
What's not fine,
and what in fact
caused this sort of
long standing bug
that we have to deal with here,
is the last thing on the slide
here, simultaneous assignments
to the same reference
counted pointers.
If I simultaneously assign
even null to the same reference
counted pointer, that
looks pretty benign.
I'm assigning the same
thing concurrently
in two different threads,
what could go wrong?
The problem is it's disallowed
by C++ rules for good reason.
Because what happens
here is both threads
will try to simultaneously
[INAUDIBLE] the reference
count associated with
the original value of P.
If they race with each other just
right so they both do this
before either one actually
assigns null to the pointer,
they will end up
both [INAUDIBLE]
that reference count, which
originally had a value of 1
because it was referenced
by P. That doesn't go well.
In the [INAUDIBLE]
reference count decrement
is always death.
So summarizing here, the important
point is: avoid finalization.
Use java.lang.ref.PhantomReference
instead.
Currently, that's a
little bit clumsy,
but it saves you all
sorts of potential headaches.
If you want to read
up on this more,
I suggest you actually look up
the Java 9 reachabilityFence
construct and discussion.
That'll give you a
little more insight
as to why some of these
things are a problem,
and in particular why you have to
worry about premature cleanup
with the Java semantics.
Again, not so much on Android.
And stay tuned for future
improvements in this area here.
If you allocate a
lot of C++ objects,
if that's where most of your
heap space ends up getting
spent, then you should think
about explicit GC triggering.
But doing it very carefully.
Make sure you keep track
of how much C++ memory
you've allocated.
And when you've allocated a
lot, so you think it might be
useful, then you can invoke the
garbage collector so phantom
references for dead objects
get enqueued at that point,
so you can actually do the
C++ cleanup at that point.
Otherwise, if you are not
allocating any Java memory,
usually the Java
garbage collector
will not otherwise
get triggered.
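One way to keep explicit GC triggering under control is to count native bytes and only request a collection past a threshold. The sketch below is an assumption about how that bookkeeping might look; `NativeMemoryTracker`, its threshold, and the injectable `collect` action are all hypothetical names, and the action is injectable so the logic can be exercised without really running the collector.

```java
import java.util.concurrent.atomic.AtomicLong;

final class NativeMemoryTracker {
    private final long thresholdBytes;
    private final Runnable collect;
    private final AtomicLong bytesSinceLastCollect = new AtomicLong();

    NativeMemoryTracker(long thresholdBytes, Runnable collect) {
        this.thresholdBytes = thresholdBytes;
        this.collect = collect;
    }

    // Convenience factory that requests a real collection via System.gc().
    static NativeMemoryTracker withSystemGc(long thresholdBytes) {
        return new NativeMemoryTracker(thresholdBytes, System::gc);
    }

    // Call after each native allocation; returns true if a collection was requested.
    boolean recordAllocation(long bytes) {
        if (bytesSinceLastCollect.addAndGet(bytes) < thresholdBytes) {
            return false;
        }
        bytesSinceLastCollect.set(0);
        collect.run();
        return true;
    }
}
```

After a collection has been requested, the caller would then drain the reference queue to do the actual C++ cleanup. Under contention two threads may occasionally both trigger a collection, which is wasteful but harmless; choosing the threshold too low is the trap the talk warns about.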
Be careful with C++ memory
management and understand
the rules.
So we actually have a little
bit of time for questions
here, if there are any.
That's the end of it.
Thank you very
much for attending.
[APPLAUSE]
[MUSIC PLAYING]
