My name's Yuan Li.
I'm a third-year PhD student from Virginia Tech.
And the title has already
been said, so let's get to it.
First and foremost, it is believed that awareness information is quite important in any collaborative work. Here, we refer to joint attention on an object of mutual interest, which is a common requirement for many collaborative tasks.
One example could be surveyors trying to triangulate a distant target simultaneously. And as in much human collaboration, things get much easier when the collaborators are close to each other and to the target, where one pointer can just point at the target and then confirm via vocal communication.
However, at a greater distance between the participants and the target, a.k.a. a wide area, things get much more complex, because you have limited channels of communication.
And in real life, a laser pointer is a commonly used tool. The effectiveness of a laser pointer is affected by the lighting conditions, the distance to the target, and the visual information from the pointer.
Even when the light is too bright and you cannot see the laser, the observer can still get the reference to the object without even looking at the pointer, and that works in small spaces.
However, when you move to a wider area, things become much harder, because the intersection point might simply be too tiny to spot, and the pointer is too far away to give enough cues about the pointing direction.
Now, we think that augmented reality can help with this, like the popular virtual ray techniques in many VR applications. So the ideal way to convey gaze direction in AR would be for the AR system to display a virtual ray that stops right at the target and even highlights the target intersection.
However, this would require the system to know where the target is, which requires a model of the environment. In the world we live in, when the target is too far away, we just don't have a reliable geometric model.
So in the end, the AR system can only display a virtual ray like this. And what the observer will see from his point of view is a ray crossing all the objects in the scene. So he would just be confused.
Now here is an example of the problem in actual AR. Please pay attention to the rays. The observer is gradually walking closer to the pointer. The ray is fixed in space, but you can see how it perceptually keeps re-aligning relative to the actual target. Now, see the problem?
So the term model-free not only describes the case when the model is missing but also the case when the model is not accurate enough. And in the case of exchanging object-reference information, having an inaccurate model is as bad as having no model at all, because the observer will eventually get the reference wrong.
And this could happen in high-stakes scenarios of extreme uncertainty, like when you try to rescue stranded people in the wilderness, where acquiring an accurate environment model is quite challenging.
Now, for model-free AR, we give two definitions of the ambiguities that affect the effectiveness of the rays. Visual ambiguity, which we describe as the ray visually appearing to occlude all the objects in the scene. And spatial ambiguity: even if the user can somehow perceive the direction of the ray, because the target is surrounded by other objects in the scene, he still cannot make a correct spatial judgment.
And we use the term gaze ray in this project because in virtual environments, the ray can be generated by a lot of tracking devices: hand tracking, head tracking, eye tracking. Head tracking is the most available in state-of-the-art devices, and the user can just put the target at the center of his field of view. So to some extent, you can use the head orientation to get an approximate gaze direction. And so in this paper, we use the term gaze ray to describe a head ray that approximates gaze.
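As an aside, the head-ray approximation just described can be sketched in a few lines. This is our illustrative sketch, not the paper's implementation; the function name, pose representation, and the +z-forward convention are all assumptions:

```python
import numpy as np

def head_gaze_ray(head_pos, head_rot):
    """Approximate the gaze ray from a tracked head pose: the origin is
    the head position, and the direction is the head's forward axis.
    head_rot is a 3x3 rotation matrix; we assume +z means 'forward',
    as in Unity's convention."""
    forward = head_rot @ np.array([0.0, 0.0, 1.0])
    return np.asarray(head_pos, float), forward / np.linalg.norm(forward)
```

This works as a gaze proxy only because the user keeps the target centered in the field of view.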
So the general research question here is: how can we design gaze ray visualizations for wide-area, model-free augmented reality? And let's take a look at our proposed gaze ray techniques.
Now, since visual ambiguity is introduced when a single ray appears to cross everything in the scene, the double ray technique is based on the assumption that if we ask the pointer to point to more features in the scene, in this case two features, normally the top and bottom of the target, then in the observer's view it will in fact form a bracket. Statistically, this is a much stricter visual condition to match than the single ray, and therefore it can reduce the visual ambiguity to some extent.
Now, of course, if we used more rays, it would be even better in terms of accuracy. But at the same time, you would ask the pointer to do more work, you would need a more complicated pre-agreement between the pointer and the observer, and you would also require more geometric features on the target. So in this work, we focus on the two-ray case because of its greater applicability.
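To make the bracket idea concrete, here is a minimal geometric sketch. The function name and the vertical-plane simplification are ours, assuming the decoys lie roughly along the pointing azimuth: a candidate sphere counts as bracketed when, at its own distance, it fits vertically between the two rays aimed at the target's top and bottom poles.

```python
import numpy as np

def bracketed(pointer, target_c, target_r, cand_c, cand_r, eps=1e-9):
    """Does a candidate sphere fit between the two rays that the
    pointer casts at the target's top and bottom poles?"""
    def horiz(p):  # horizontal distance from the pointer to a point
        return np.hypot(p[0] - pointer[0], p[2] - pointer[2])
    # Elevation angles of the two rays (aimed at the target's poles).
    elev_top = np.arctan2(target_c[1] + target_r - pointer[1], horiz(target_c))
    elev_bot = np.arctan2(target_c[1] - target_r - pointer[1], horiz(target_c))
    # Heights of the two rays at the candidate's horizontal distance.
    x = horiz(cand_c)
    top_h = pointer[1] + x * np.tan(elev_top)
    bot_h = pointer[1] + x * np.tan(elev_bot)
    return bool(top_h >= cand_c[1] + cand_r - eps
                and bot_h <= cand_c[1] - cand_r + eps)
```

Note that the vertical gap between the two rays widens with distance from the pointer, which is exactly why a same-sized decoy far beyond the target can still end up bracketed.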
Now, the other technique is designed to overcome the spatial ambiguity, as we defined it before. The assumption is that if the user can somehow perceive the direction of the ray, he can make the spatial judgment. But due to the very nature of model-free AR, the system does not have information about the environment to create depth cues like occlusion. So we have to create our own artificial depth cues, our orientation cues.
Now, while this AR system has little knowledge of the environment, it does have the 3D information of the rays. So we explored multiple cues, which you can read about in the paper, but eventually we settled on the parallel bar technique.
It works like this: based on the orientation of the gaze ray, we create a virtual bar at the chest height of the observer that is parallel to the single ray. Then, from the center of that bar, we calculate a direction that is perpendicular to the gaze ray, and we replicate the first bar and place the copies along that direction. So the user can both directly perceive the orientation of the ray by looking at the parallel bars, and also get a rough idea of the distance by counting how many bars are in between, if we use a fixed spacing, right?
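The construction just described can be sketched as follows. This is a hedged illustration rather than the study's Unity code; the chest height, bar length, spacing, and bar count are placeholder parameters:

```python
import numpy as np

def parallel_bars(observer_pos, ray_dir, chest_height=1.3,
                  bar_length=10.0, spacing=20.0, n_bars=5):
    """Return (start, end) endpoints of short bars that are parallel
    to the gaze ray, placed at the observer's chest height and
    replicated at fixed spacing along the horizontal direction
    perpendicular to the ray (assumes the ray is not near-vertical)."""
    up = np.array([0.0, 1.0, 0.0])
    d = np.asarray(ray_dir, float)
    # Bar axis: the ray direction flattened to the horizontal plane,
    # so every bar stays at chest height.
    axis = d - np.dot(d, up) * up
    axis /= np.linalg.norm(axis)
    # Replication direction: horizontal, perpendicular to the ray.
    perp = np.cross(axis, up)
    perp /= np.linalg.norm(perp)
    center0 = np.array([observer_pos[0], chest_height, observer_pos[2]])
    bars = []
    for i in range(n_bars):
        c = center0 + i * spacing * perp
        bars.append((c - 0.5 * bar_length * axis, c + 0.5 * bar_length * axis))
    return bars
```

Because the spacing is fixed and known, counting the bars gives the observer a rough distance estimate along the perpendicular direction.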
So here is an actual example of our AR implementation. This is the single ray case. When the observer switches to the double ray, it creates the bracket effect that makes the target more prominent than the other decoys. And this is the parallel bar, where the user can align the bars and get a rough idea of the direction.
Another thing is that specifying the gaze rays is relatively easy in our current implementation. Figure one is the target, and figure two is the pointer's view, with the crosshair. The center crosshair is used to generate the single ray. And there are another two cursors, at the top and bottom, whose separation the user can adjust to create the double rays. The observer's view is in figures three and four.
We tested both the double ray and the parallel bars in both indoor and outdoor AR and saw their potential, but we also found that the current limitations of AR displays and tracking could significantly affect the user experience and the task performance.
So, to study our techniques in a more valid and controlled fashion, we designed an experiment using simulated AR.
Here is an overview of the experiment. The overall goal is to evaluate the effectiveness of the gaze ray visualizations under varying levels of visual ambiguity and spatial ambiguity. To do that, we conducted a user study in mixed reality simulation, for systematic control and to avoid tracking errors. Mixed reality simulation is basically using VR to simulate AR, for cases in which either experimental control is critical or technological limitations are the hurdle.
And in this experiment, human participants always assumed the role of a passive observer, and the active pointer was simulated by the computer. The experiment was implemented in Unity.
The main scene is a giant eight-by-eight chessboard that was designed to enhance the perception of distance, both to the collaborator and to the target. The target objects are red spheres. At any given time, there are a total of six red spheres in the scene, and only one of them is the actual target.
On the left side of the screen is the simulated collaborator, which is a human-sized yellow capsule with an orange-bordered box on its face to give the participant a better idea of its facing direction. The collaborator can cast a perfect ray, pointing to the center of the targeted sphere in the single ray case, or to the top and bottom poles in the double ray case.
And it is the human participant's job to identify the correct target via one of the given gaze ray techniques. Now, we used an HTC Vive Pro headset for both viewing and tracking the user's head, and an Xbox controller for simple system input, like confirming the selection.
Now, the study followed a two-by-two-by-three-by-two within-subjects design. The two levels of ray basically mean we use a single ray or a double ray, and the two levels of parallel bar mean we use the parallel bars or not.
And there are three levels of visual ambiguity. For low visual ambiguity, we lifted the collaborator slightly up, so that when using a single ray, the ray will cross the center of one and only one red sphere. So that should be the simplest condition. For medium and high, the single ray will cross the centers of all the spheres, whereas for medium visual ambiguity, the double ray will bracket only one target, and for high, the double ray will actually bracket two red spheres, to create more visual ambiguity in the double ray case.
And as for spatial ambiguity: in the low spatial ambiguity case, we placed no decoys within 15 degrees of the line between the actual target and the collaborator, and in the high spatial ambiguity case, we placed one decoy within 15 degrees but outside 10 degrees.
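The decoy placement rule can be captured with a simple angular check; the helper names below are ours, for illustration only:

```python
import numpy as np

def decoy_angle_deg(collaborator, target, decoy):
    """Angle (degrees) at the collaborator between the line to the
    actual target and the line to a decoy sphere."""
    a = np.asarray(target, float) - collaborator
    b = np.asarray(decoy, float) - collaborator
    cosang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def spatial_ambiguity_ok(collaborator, target, decoys, high):
    """Check a decoy layout against the placement rule:
    low ambiguity  -> no decoy within 15 degrees of the target line;
    high ambiguity -> a decoy between 10 and 15 degrees."""
    angles = [decoy_angle_deg(collaborator, target, d) for d in decoys]
    if high:
        return any(10.0 < a < 15.0 for a in angles)
    return all(a >= 15.0 for a in angles)
```
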
And we implemented four repetitions of the three-by-two ambiguity conditions, making 24 trials per gaze ray technique and 96 trials per participant. The dependent variables were many, including the error rate and the successful task completion time, and we gathered simple subjective feedback from the users through interviews afterwards. We recruited 24 participants for this study.
And here is a video of the experiment scene. We first asked the user to stand at a specific point, and then press the white button to start the trial; that's when we started the timer. Then they used the crosshair to look for the referenced object. This is actually a high visual ambiguity case, because there are two spheres that are bracketed.
Now, let's see the experiment results. Here is an overview. We gathered 576 data points in total. The yellow colors are for the double ray techniques, and the less saturated colors are for the no-parallel-bar conditions. You can see that the performance gain from using the double ray is quite visible, but you cannot tell the same for whether or not we used the parallel bar.
Here is another plot, for successful task completion time. And again, the performance gain from using the double ray is pretty dominant.
Based on the results, we tested several hypotheses. For the first one, we thought that the double ray would be effective at eliminating or decreasing the effect of visual ambiguity, which was confirmed by the study, where we actually found that the double ray is better than the single ray in all visual ambiguity conditions. Originally, we thought it would only be effective at medium and high ambiguity, because at low ambiguity, the single ray should work. But we did find a significant improvement from using the double ray technique even in the low ambiguity cases, meaning that the bracket visual effect is more prominent than the crossing-the-center visual effect.
The second hypothesis we tested is that a single ray would only be effective at low visual ambiguity, basically to test whether our definitions of the ambiguities actually hold. The study did find that, on one hand, the single ray is only effective at low visual ambiguity; on the other hand, the double ray is equally accurate at both low and high, meaning that our definitions of the ambiguities actually meet our requirements.
The third hypothesis we tested is that the parallel bar can help increase the accuracy of the overall task performance, but that meanwhile the user will take a longer time trying to interpret the parallel bars to get the spatial reference orientation information. Well, what we actually found is that the parallel bars did not really help with improving accuracy, but they did slow the user down. So, thinking optimistically, the users definitely took their time to use the parallel bar technique, but its effectiveness is not confirmed in the current study.
For the fourth hypothesis we tested, based on the user feedback, we expected the double rays to be favored, which is shown here. People do prefer the double ray over the parallel bar. But still, 13 out of 24 participants thought the parallel bar would help in the medium or high visual ambiguity cases.
The last hypothesis we tested is that users with high spatial orientation ability will make better use of the parallel bar technique. Spatial orientation ability describes the ability to imagine the appearance of objects from different locations. So we suspected it could affect how users would perceive the parallel bars and how they would use the bars during the experiment.
We used a perspective-taking task to obtain a score of each participant's spatial orientation ability. And we did find that people with better spatial orientation ability performed generally better. However, we only observed that the interaction between perspective-taking score and parallel bar approached the borderline of significance.
Now, to summarize: our work showed that, assuming reliable tracking and perfect pointing, the double ray is highly effective and efficient, and that it is difficult to interpret the spatial information from the parallel bar in its current implementation and setup. And we think the findings should be further confirmed by a more ecologically valid evaluation.
Thank you.
[INAUDIBLE]
At least for me, it looked like there was a fixed distance between those bars [INAUDIBLE]. Did you consider investigating the [INAUDIBLE] distances of the parallel bar technique in some, let's say, pre-studies to investigate the parameter? I couldn't [INAUDIBLE].
So, thank you. That's a good question. In this MR simulation study, we did test varying distances. And what I couldn't mention in much detail is the scale of the experiment: it's an almost 160-meter-long chessboard. So we used 20 meters as the fixed distance, so that it informs the user about the orientation. And the spacing between the bars keeps them quite sparse, so that they don't create visual clutter.
And in the AR evaluation, which I'm running right now, we did enable a varying-distance technique, where the user can adjust the length of the parallel bars, the height of the parallel bars, and the distance in between.
[INAUDIBLE]
Yes, yes.
[INAUDIBLE]
So if I understand this correctly, the person pointing needs to spend time specifying the size of the object? How much time does that take? Did you evaluate that?
We didn't evaluate the pointer side at all in this study. In the MR simulation, we used the computer to simulate the role of the pointer. And I agree that if we included a human subject to do that job, we could measure that. But the downside is that we should expect a further performance drop, because errors could happen both on the pointing side and the viewing side.
And the purpose of this study, as I mentioned, is to evaluate the effectiveness of the visualizations. So, introducing a human subject would, like you said, help us measure a more ecologically valid performance of the task, which I agree with. And that is what I am running right now: we are setting up the experiment outside, 100 meters apart, with two HoloLenses and two human users. One will specify the ray, and one will try to comprehend it. Excellent point.
[INAUDIBLE]
I understand that you're trying to be model-free, but do you think that using a known ground plane could help with this problem? If you assume the ground plane is known?
I beg your pardon?
Could you assume that the ground plane is known in many cases, and could that additionally be used as a way of visualization?
Yes, I actually thought about that. But the downside is that in the real world, at least in Blacksburg, where I come from, finding a perfectly planar ground is really hard. And you will encounter various issues with the perspective views. So yes, I think it would be hard to do. I know what you're trying to say, but yes.
[INAUDIBLE]
