Good afternoon, everyone.
I'm Karan Ahuja,
a third-year PhD student
at Carnegie Mellon University.
Today I'm going to be presenting
my work on LightAnchors.
This work was done with
the amazing collaboration
of Sujeath Pareddy, Robert
Xiao, Chris Harrison, and Mayank
Goel.
So we have already heard a
bit of motivation from Jackie.
So I'm not going to spend
too much time on it,
but I'll just
briefly go over it.
Augmented reality
requires overlay
of digital information
and interactive content
onto scenes and objects.
This creates the need for
fast and robust in-view
anchoring and data transmission.
While we expect this transfer of
data to be seamless, in reality
it often requires marker tags
that are large and pronounced.
For example, large
QR codes on trains
that transmit their schedule,
or ArUco codes in art galleries
that help us to do image
overlays on top of them.
Apart from being
visually obtrusive,
these tags can also transmit
only a fixed data payload.
Alternatively, another approach
is to pre-scan the environment
with markerless techniques.
For example, this is the Apple
ARKit object registration
pipeline.
So first you have
to scan the object,
define a bounding box around
it, and after you scan it
you can adjust the origin
of it, and then export it
so that people can use it.
But as you can see, this
process is very tedious
and time consuming.
And every time you
move into a new room,
you have to do this
again and again.
So what we ideally
want is a system
that requires no
pre-registration
and no special
instrumentation of objects.
It should also be unobtrusive
or at least attractive.
We also want the ability to
recognize many objects
simultaneously,
while having no handshake,
and all this in a completely
cloudless environment.
We also want the ability to
support dynamic data payloads.
All at the same time,
having a downloadable app
for a smartphone
so that everybody
can use it in a
software only approach.
So let's go over
how we can do this.
So we noticed that
many devices around us
have some sort of status
LEDs on top of them.
For example, the smoke
alarm mounted to the wall
uses lights to tell the
user about the smoke
status in the room.
Or this power strip relays state
information through the LED
below its switch.
Keep in mind that these devices
have no internet, Bluetooth,
or any connectivity
with the outside world.
It would be great if we
could give these dumb devices
a voice in the AR world in a
completely cheap and cloudless
manner.
Another interesting
development over the last years
is in camera technology.
In recent years cameras have
advanced by leaps and bounds.
A particularly
interesting development
is the prevalence of
high speed cameras,
which have become commoditized.
These can not only capture
at 240 frames per second,
but also process that data
in real time as well.
Thus, utilizing the
pervasiveness of point light
sources and high
speed cameras, we
developed LightAnchors, a
new method for displaying
spatially anchored data.
Unlike most prior
tracking methods,
which instrument
objects with markers,
we take advantage
of these lights,
such as LEDs and light bulbs for
both in view anchoring and data
transmission, as can be seen in
this payment terminal example.
So we are not the first ones
to think along this direction.
LightAnchors overlaps with
several bodies of literature.
We will now briefly
look at different parts
of these research areas.
So in terms of marker
based strategies,
there have been
approaches that use
fiducial markers and QR codes,
such as ARTag and ArUco markers.
There have also been approaches
that use custom patterns
and overlay those custom
patterns with information,
based on their known
placement in the environment.
Markerless strategies
rely on computer vision,
such as the object registration
in papers like Snap-To-It,
which pre-registers an object,
detects it later,
and based on that loads
a specific web page.
Or Vuforia,
where they pull up
information on a pre-scanned
markerless environment.
In the visible light
communication domain,
there have been approaches that
use a mobile device's photodiode
to receive information and
communicate between devices.
And also approaches that
use the rolling shutter
of the camera to
transmit data.
But all these approaches cannot
localize the source, rely
on a single light in
a particular room,
and cannot scale well to many
simultaneously transmitting
light sources.
So now let's go over the
implementation
and the algorithm
of LightAnchors.
So for every incoming
frame of video,
we create an image pyramid
such that lights big or small,
close or far, are
guaranteed to be roughly one
pixel in size at some level.
Our algorithm then
searches for LightAnchors
using a max pooling
template that
finds bright pixels
surrounded by darker pixels.
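The pyramid-plus-template step just described can be sketched as follows. This is a minimal illustration, not our optimized implementation; the pyramid depth and brightness margin below are assumed parameters.

```python
import numpy as np

def build_pyramid(gray, levels=4):
    """Repeatedly 2x2 max-pool so that a light source of any
    apparent size becomes roughly one bright pixel at some level."""
    pyramid = [gray.astype(np.int16)]
    for _ in range(levels - 1):
        g = pyramid[-1]
        h, w = (g.shape[0] // 2) * 2, (g.shape[1] // 2) * 2
        pooled = g[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
        pyramid.append(pooled)
    return pyramid

def find_candidates(gray, margin=60):
    """Return (level, row, col) of pixels that are at least
    `margin` brighter than all eight of their neighbors."""
    hits = []
    for level, g in enumerate(build_pyramid(gray)):
        padded = np.pad(g, 1, constant_values=0)
        for r in range(g.shape[0]):
            for c in range(g.shape[1]):
                block = padded[r:r + 3, c:c + 3].copy()
                center = block[1, 1]
                block[1, 1] = -1  # exclude center from the neighbor max
                if center - block.max() >= margin:
                    hits.append((level, r, c))
    return hits
```

A real pipeline would vectorize the inner loops, but the template logic, a bright pixel surrounded by darker pixels, is the same.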
This process produces
many candidate anchors
such as the buttons on
a keypad, which are also
light in color surrounded
by darker material, as you
can see over here.
To filter these, we track the
candidate anchors over time,
looking for a blinked
binary pattern.
All the candidates with
the correct preamble
are accepted, after which
we can decode their payload.
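In essence, each surviving candidate's brightness is thresholded into one bit per frame, and that bit history is scanned for the preamble. The frame layout below (6-bit preamble, 8-bit payload) matches what I describe later in the Q&A, but the actual preamble bit pattern is an assumption for illustration.

```python
# Hypothetical 6-bit preamble; the real bit pattern is an assumption.
PREAMBLE = [1, 0, 1, 1, 0, 0]
PAYLOAD_BITS = 8

def decode_anchor(bits):
    """Scan a candidate's observed blink history for the preamble
    and, if found, return the payload that follows as an integer."""
    n = len(PREAMBLE)
    for start in range(len(bits) - n - PAYLOAD_BITS + 1):
        if bits[start:start + n] == PREAMBLE:
            payload = bits[start + n:start + n + PAYLOAD_BITS]
            return sum(b << (PAYLOAD_BITS - 1 - i)
                       for i, b in enumerate(payload))
    return None  # preamble never seen: reject this candidate
```

Candidates whose histories never contain the preamble, like keypad buttons that are merely light-colored, get rejected at this stage.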
As high speed cameras are now
becoming common in smartphones,
our approach can run at
very high frame rates.
On the iPhone X, our process
runs at 120 frames per second,
allowing us to decode a 32-bit
LightAnchor in roughly a quarter
of a second.
Now I'm quickly going to
switch over
and show you a live demo
of the system working.
Just give me a second.
Let's see.
OK.
Just give me a second.
Yeah.
There we go.
OK, great.
Sorry for the delay guys.
OK.
So now this is the live demo.
So now you can see
the camera feed.
Let me take you through
this step by step.
So here you can see the
candidate LightAnchors
that are being transmitted.
Now these are all the candidate
detections of the LightAnchor.
Now I'm going to
show you the tracking
phase of the algorithm.
As you can see, the
LightAnchor that
is on the conference
phone over here,
the green LED, if I move
it, it is tracked over time.
Now once we detect it
and track it over time,
we can also decode its payload.
So this will show the decoded
payload of this LightAnchor.
And then we can also
use this information
and also show an example
application on top.
Like we can see that this
is a conference phone.
OK.
Yep, we can see this
conference phone.
The cable is not
long enough, sorry.
So for the further demos, please
come to our stand and I'll
show you the demos in person.
This cable is not long
enough and if I extend it,
I guess it will snap.
OK.
Now let's go over the evaluation
protocol of our system.
So what we do is, we use an
iPhone 7 sampling at 720p
and 120 frames per
second for our system.
We evaluate our system in three
environments-- a workshop,
classroom, and office--
across three
lighting conditions--
lights off, artificial light
only, and artificial light
with diffuse sunlight.
We captured this data across
two motion conditions,
still and walking.
All this across six distances
ranging from 2 meters
to 12 meters.
We do this for two LED sizes
to see the effect of size
on our data capture rate, which
are a 3-millimeter LED, and a
100-by-100-millimeter LED matrix.
For detection,
we designed the system such
that it has high recall.
The rationale was that
bright light sources
can be filtered with the
preamble matching step.
We see that our true positive
rate for our system is 99.6%.
And we found that most false
positives were actually
caused by reflections
of a LightAnchor
already blinking in the scene,
which is why they showed up with
the correct payload as well.
So what we do to counter
this is take the payload with
the highest signal-to-noise ratio.
Across all conditions
and distances,
we found a mean bit
error rate of 5.2%, which
roughly means 1 in every 20
transmitted bits was incorrect.
This level of
corruption can typically
be handled by error
correction techniques.
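The talk does not specify which error-correction scheme would be used; as one illustrative option, a classic Hamming(7,4) code corrects any single flipped bit per 7-bit block, which is well matched to short payloads with a roughly 1-in-20 bit error rate.

```python
def hamming74_encode(nibble):
    """Encode a 4-bit value into a 7-bit Hamming(7,4) codeword."""
    d = [(nibble >> i) & 1 for i in (3, 2, 1, 0)]
    p1 = d[0] ^ d[1] ^ d[3]  # parity over codeword positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over positions 2,3,6,7
    p3 = d[1] ^ d[2] ^ d[3]  # parity over positions 4,5,6,7
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def hamming74_decode(code):
    """Correct up to one flipped bit and return the 4-bit value."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # syndrome = 1-indexed error position
    if pos:
        c[pos - 1] ^= 1  # flip the corrupted bit back
    return c[2] << 3 | c[4] << 2 | c[5] << 1 | c[6]
```

The cost is overhead: 7 transmitted bits carry only 4 data bits, lengthening each blink frame accordingly.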
We can see from this graph that
LightAnchors perform better
when there is low
ambient illumination,
since a LightAnchor is brighter
compared to the background.
Also, when we look at the
performance across light sizes,
we see that bigger
LightAnchors can be detected
from a further distance.
This is because they maintain
a higher SNR at distance.
We also found that the
performance is better
when the person is still
rather than walking,
because there is motion blur
when the person is walking.
So across all conditions we
found a mean recognition time
of 464 milliseconds.
Our LightAnchors
are 22 bits long.
They take a minimum
of 183 milliseconds
to transmit at 120 fps.
Because there is
no synchronization,
detection of a LightAnchor
will almost certainly
start in the
middle of a transmission,
meaning that the system
has to wait for the message
to be transmitted again
before it can decode it
correctly the next time.
So let me conclude with
some example use cases.
First and most straightforward
is to transmit fixed data.
For example, the name of a
building and its opening hours.
More interesting is
the ability to decode
a small dynamic payload.
For example, this
glue gun provides
its current temperature.
Sorry-- and this power strip
provides its live power
draw, while the smoke alarm
shows its battery life.
Apart from instrumenting
new devices
with microprocessors,
many devices that exist
today already
have microprocessors
that can control the
status LEDs inside them.
We can now exploit these and use
them to transmit dynamic data.
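On the device side, the microcontroller only needs to blink its status LED with the preamble followed by the current reading, repeating forever. Here is a sketch in Python for clarity; the preamble and frame layout are assumptions, and real firmware would toggle a GPIO pin at a camera-visible rate instead.

```python
PREAMBLE = [1, 0, 1, 1, 0, 0]  # hypothetical preamble bits

def blink_frame(value, payload_bits=8):
    """One blink frame: preamble, then the sensor reading
    MSB-first. A device repeats this frame indefinitely."""
    assert 0 <= value < (1 << payload_bits), "value must fit the payload"
    payload = [(value >> (payload_bits - 1 - i)) & 1
               for i in range(payload_bits)]
    return PREAMBLE + payload
```

With a dynamic payload, the same glue gun can blink a fresh temperature reading every frame repetition, with no firmware change beyond updating the value.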
You can imagine a
scenario in which you
walk into your Starbucks
cafe and it has a router
mounted on the side of it.
You just point your
phone towards the router,
and click on the connect button
after you see the LightAnchor,
and you can connect to it
through the guest Wi-Fi.
So this way you are the only
one who can connect to it
and nobody else
can steal the internet
in Starbucks, because you're
physically located in that
space and have line of sight.
Similarly, a security camera
can share its privacy policy.
So whenever you
walk into a building
and see a security
camera, you can point at it
and see whether it's recording
audio, video, or whatever
the case may be.
So LightAnchor
payloads can also be
used to provide
connection information.
For example, let's take the
case of this thermostat.
Once you are using
a LightAnchor payload,
you can simply point
your phone towards it,
and this can open a gateway
to a Wi-Fi connection
or something else.
Now it can send over a
little app through which you
can actually control
your thermostat.
For example, over
here you can see
me increasing the temperature.
A similar example holds
for smart light switches,
where it can provide you with an
IP address allowing you to open
a socket to your light bulb.
And then there is no
need for any tedious
IoT pairing anymore.
Our iOS app is
currently open source.
Please email me and I will
send you a TestFlight link.
There is also starter
code for Arduino,
and there is more information
on the LightAnchors website.
And also please come and see
our demo today at the demo
reception, where we'll be
showing these and more demos
running in real time.
So thank you and now
I'll take questions.
Thank you.
Hi Karan.
Thanks for the talk, and bonus
points for the live demo.
Always appreciated.
Can you talk a little bit
about discoverability, right.
You said you wanted
the things to be
attractive in your
opening statement,
but then we're pretty well
conditioned to scan QR codes
and whatnot, and we're also
pretty well conditioned
to ignore the status LEDs
in most of our devices.
That's a very good question.
So yes.
What I'm imagining is a case
where most of these devices
are already
augmented with LightAnchors.
So it will be as simple as
taking out your device
and opening the app.
In this markerless
AR world, you do not
know where the markers
are anyway, right.
It's the same as that
markerless strategy,
but now instead of this
markerless, new environment,
it's the LED which provides
you that experience.
Thanks for your talk.
Can you talk a little
bit about authentication.
So communication flowing
the other direction.
Let's go with a
thermostat example.
Maybe in my home it's
realistic that anyone
who has physical access
to the thermostat
can change the temperature, but
in an institutional setting,
that's not true, right.
Only the building
manager should be
able to change the temperature
and not anyone who walks
into a classroom, or this room.
That's a very good question.
So actually what I'm more
envisioning is, for example,
in a home use case scenario, you
can have no security settings.
So the idea is once
you connect to it
or once you have this connection
payload, it opens a web page.
And in that web page you can
imagine it asks you for maybe
your Wi-Fi--
for your credentials
to log on to it.
Thank you.
How long is your
payload and how do you
imagine, for example,
a privacy policy
to be transmitted from a camera.
What was the second part, sorry?
The first part is, how long is
your payload,
and how long does a
packet take to transmit?
And then how would you
imagine the privacy policy
to be transmitted
from the camera?
So our data-- let me answer
that question in a bit.
So we have a 6-bit
preamble and then
an 8-bit data payload in
our current implementation.
And we have a latency
of around a third
of a second currently.
And in that privacy
policy, I'm assuming
you're talking about
the example application.
So the idea is simple
that you point your phone
towards the camera
and based on that,
the camera will transmit
to you the privacy policy.
It can be for
example, that hey, I'm
a security webcam over here.
I'm blurring faces.
I'm not storing
any of your data.
[INAUDIBLE]
Yes.
That is very true.
So the idea is that
with an 8-bit payload,
you can open another
connection, through which you
can send more data.
Or we are also working
on another implementation
in which we can transmit
at higher frame rates.
So you can imagine that once
you have at least 3 or 4 bytes,
or can transmit over a
longer period, you can
transfer more data that way.
