- That's Huy Viet Le and Niels Henze.
And it's about how to
estimate finger orientation
from the capacitive screen
using neural networks.
And as I think every one of you has already used a touchscreen today, you know it's basically limited to the X and Y coordinates of the finger's position.
So in the last year Apple
came up with this idea of
using force as an additional input.
And I think many researchers have already tried to establish other dimensions to enrich input on touchscreens.
And this work is using the finger orientation. Here, Wang et al., already back in 2009, used a tabletop with back projection and a camera to estimate the finger orientation, to overcome the offset between the finger and the target.
And then later in 2013,
Kratz et al. came up with the
idea of using a depth camera
in order to estimate the
finger's orientation,
and here they mounted the depth camera onto a tablet, which was the first portable version of this enhanced interaction for the touchscreen, with the ability to use the pitch and yaw of the finger.
And then two years ago in Madeira, Xiao et al. presented the first idea of using the raw capacitive values of the touch sensor to estimate the finger's pitch and yaw. They used Gaussian processes, and I will later describe why we think this is a first step and how we can extend this work.
So what we did as a first step in our paper: we collected ground-truth data, where we hacked a Nexus 5, as Xiao et al. did, to get the raw capacitive data values from the phone.
And we equipped the surroundings
with a motion-tracking system
as we saw in the last presentation
to track the finger's orientation
as ground-truth values
for our learning model.
So in our study, we asked participants
to explore the range
between a very flat finger
and a steep finger, so 90 degrees,
and from all the way to the left to all the way to the right, so from minus 90 to 90 degrees, so they explored the full space with the right hand.
And during the exploration in the study, the experimenter was able to observe which areas had already been explored, and could then hint to the participant that there was an under-explored area, asking them to please explore this area as well, so that we captured the whole volume of pitch and yaw and had good ground truth for our machine learning model.
And we conducted a study
with 33 participants
with an age range from 20 to 33.
And what we then did: on the right side you can see a simple touch, what we get from the capacitive values. Here's an index finger touching the surface, highlighted here in yellow. And as you can also see, there is some filtering we need to do to really get that one blob of the finger. So we first did outlier detection, because sometimes the capacitive values are not perfect, so we have some peaks and we got rid of them, and then we did center blob detection, which ensured that there was a finger on the surface.
Because in the study,
participants, every once in a while,
just lifted their finger.
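A minimal sketch of how such a filtering step could look (illustrative only, not the exact pipeline from the paper; the noise threshold and the blob-selection heuristic are assumptions):

```python
import numpy as np
from scipy import ndimage

def filter_capacitive_frame(frame, noise_threshold=30):
    """Illustrative filtering for one capacitive image: smooth out
    single-electrode spikes, then keep only the blob closest to the
    image center (assumed to be the touching finger)."""
    # Outlier handling: a 3x3 median filter removes isolated peaks.
    smoothed = ndimage.median_filter(frame, size=3)

    # Binarize and label connected blobs of activated electrodes.
    mask = smoothed > noise_threshold
    labels, num_blobs = ndimage.label(mask)
    if num_blobs == 0:
        return None  # no blob -> finger was lifted, drop the sample

    # Keep the blob whose centroid lies closest to the image center.
    centroids = ndimage.center_of_mass(mask, labels, range(1, num_blobs + 1))
    center = np.array(frame.shape) / 2.0
    distances = [np.linalg.norm(np.array(c) - center) for c in centroids]
    best_label = int(np.argmin(distances)) + 1
    return np.where(labels == best_label, smoothed, 0)
```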
So then, after the filtering and the blob detection, we ended up with roughly half a million labeled samples, where we have the capacitive data values and the pitch and yaw of the finger.
And then in the following slides
I'm gonna show you what we
did with machine learning.
The machine learning part
is done with TensorFlow,
and to ensure the quality of the models,
we used 26 participants for training,
and seven participants for testing.
So there was no overlap between the test and the training sets.
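A hedged sketch of such a participant-wise split (the column name and file layout are assumptions, not the released format):

```python
import pandas as pd

# Hypothetical layout: one row per sample, with a participant id, the
# flattened capacitive matrix, and the ground-truth pitch and yaw.
data = pd.read_csv("samples.csv")

train_ids = set(range(1, 27))   # 26 participants for training
test_ids = set(range(27, 34))   # 7 participants for testing

train = data[data["participant"].isin(train_ids)]
test = data[data["participant"].isin(test_ids)]

# No participant appears in both sets.
assert set(train["participant"]).isdisjoint(set(test["participant"]))
```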
Going back to what's out there: what Xiao et al. did, they used this capacitive matrix and then extracted, in the paper, 42 features, which they then fed into Gaussian process models to estimate the finger's pitch and yaw. And Gaussian processes have one limitation here: the space required for training is O(N²), which for our dataset is not feasible on a decent machine.
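A back-of-the-envelope calculation shows why (assuming the exact N x N kernel matrix is stored in double precision):

```latex
N^2 \cdot 8\ \text{bytes} \;\approx\; (5 \times 10^{5})^2 \cdot 8\ \text{bytes} \;\approx\; 2\ \text{TB}
```

for our roughly half a million samples.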
And so that was one problem, and I'm gonna show how we tackled it in the next slides.
Our new approach, what we did: we used machine learning, and furthermore we did not extract features here; we used the raw values from the capacitive matrix and piped them in as the input. And we had two baselines for representation learning: a k-nearest neighbor approach and a random forest approach. For both we trained for pitch and yaw. And then we used deep neural networks and convolutional neural networks, and we tried several variants of them.
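As a hedged sketch of what such baselines could look like on the flattened raw matrices (the data here is a random placeholder, and the matrix size and hyperparameters are assumptions rather than our grid-search results):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor

# Placeholder data standing in for flattened raw capacitive matrices (X)
# and the ground-truth [pitch, yaw] labels in degrees (y).
X_train, y_train = np.random.rand(1000, 27 * 15), np.random.rand(1000, 2)
X_test = np.random.rand(100, 27 * 15)

knn = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100).fit(X_train, y_train)

pitch_yaw_knn = knn.predict(X_test)      # shape (100, 2): pitch and yaw
pitch_yaw_forest = forest.predict(X_test)
```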
For all the models I'm gonna show you on the following slides, we used a grid search to estimate the best parameters and achieve those results.
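A minimal sketch of such a grid search (the parameter grid is made up, and build_model / train_and_evaluate are hypothetical stand-ins for a model factory and a full training run, which are not shown):

```python
import itertools

param_grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "l2_scale": [0.0, 0.015, 0.1],
    "filters": [(32, 72, 160), (16, 32, 64)],
}

best_error, best_params = float("inf"), None
for values in itertools.product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    model = build_model(**params)         # hypothetical model factory
    error = train_and_evaluate(model)     # hypothetical training/eval run
    if error < best_error:
        best_error, best_params = error, params
```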
So as I said, Gaussian processes have this problem of the quite big space they need for training, and therefore we basically did two implementations of the Xiao model.
We first used the
Gaussian processes models,
but only with 1.4% of the data,
which is roughly the
amount that Xiao et al.
used in the paper.
And then this worked out
for our implementation.
So we have this Gaussian
processes reimplementation,
and then additionally, to make use of the big dataset which we recorded, we swapped out the Gaussian process learning and used a deep neural network here. Three layers, actually.
And then we have our
k-nearest neighbor model,
the random forest,
then we have one deep neural network.
And the first deep neural network, the DNN here, is actually two deep neural networks: one predicts pitch and one predicts yaw.
And then in the later steps,
we figured that it's also a good idea
to train one model for pitch
and yaw at the same time.
So we trained a combined
deep neural network
with two output neurons for pitch and yaw.
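A hedged, Keras-style sketch of that combined idea, one network with two output neurons trained jointly on pitch and yaw (layer count, layer sizes, and the input size are illustrative, not our grid-search results):

```python
import tensorflow as tf

combined_dnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(27 * 15,)),    # assumed flattened matrix size
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(2),                   # two outputs: pitch and yaw
])
combined_dnn.compile(optimizer="sgd", loss="mse")
```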
And the three CNN models at the bottom of the slide are all combined as well; they all train pitch and yaw at the same time. This worked out better for us. And then we tweaked the CNNs with L2 regularization and batch normalization.
Yeah, these are the overall results: root-mean-square errors, absolute means, and SDs. And what stands out here in comparison to the Xiao et al. model is that we can reduce the pitch error.
So the deep neural network worked better
than the Gaussian processes
for both pitch and yaw.
And from there on we see that the k-nearest neighbor and the random forest with the raw feature representation were also better than the feature extraction which was used by Xiao et al. And then we have the DNN, which worked slightly better than the k-nearest neighbor, but not better than the random forest.
And we went on with the combined neural network, which improved on the DNN. And then we switched to the CNN, which really gave us a boost, and here we were better than the random forest. And in the end the CNN with L2 worked out best for us: that's an 8.9% improvement over our reimplementation of Xiao et al. in pitch, and 45.7% in yaw.
So what we actually did for our best model, that's what I'm gonna present to you now: we started with a momentum optimizer. Basically you can also achieve the same results we achieved with the momentum optimizer by using plain gradient descent, but with a way longer training time, so you need way more epochs than with the momentum optimizer.
And then, for the weights, we used Xavier initialization, which also gave us a training-time boost, by basically setting the weights to good initial values that already fall into the range where you want to end up.
And then we used constant biases of .01.
And for the training, we used an exponentially declining learning rate, starting at .01.
And as I said, the CNN
with L2 regularization
worked out best for us, so we applied
the L2 regularization of .015.
So these are the hyperparameters
we used for our model,
and for the model structure itself, we used three convolutional layers, and after each layer we used a pooling layer.
And the filter size was always the same
for the convolutional neural networks,
and we applied 32, 72, 160 filters.
And we arrived at all those values by applying a grid search and really narrowing down from a wide variety of options, as this worked out best for us.
And at the end of our model we applied
fully connected layers
with softplus activation function
and in the middle we had 2,000 neurons.
And as I said, two
output neurons at the end
to estimate the pitch and the yaw.
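Putting those hyperparameters together, a hedged Keras-style sketch of the architecture could look like this (the convolution kernel size, the pooling configuration, the activation of the convolutional layers, the exact decay schedule, the 27x15 input size, and the single 2,000-neuron hidden layer are assumptions where the talk doesn't spell out the details):

```python
import tensorflow as tf

l2 = tf.keras.regularizers.l2(0.015)                 # L2 regularization of .015
xavier = tf.keras.initializers.GlorotUniform()       # "Xavier" initialization
bias = tf.keras.initializers.Constant(0.01)          # constant biases of .01

def conv_block(filters):
    # Assumed 3x3 kernels (same size for every conv layer), each
    # convolutional layer followed by a pooling layer.
    return [
        tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu",
                               kernel_initializer=xavier, bias_initializer=bias,
                               kernel_regularizer=l2),
        tf.keras.layers.MaxPooling2D(pool_size=2, padding="same"),
    ]

model = tf.keras.Sequential(
    [tf.keras.layers.Input(shape=(27, 15, 1))]        # assumed capacitive image size
    + conv_block(32) + conv_block(72) + conv_block(160)
    + [
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(2000, activation="softplus",
                              kernel_initializer=xavier, bias_initializer=bias,
                              kernel_regularizer=l2),
        tf.keras.layers.Dense(2),                      # output neurons: pitch, yaw
    ]
)

# Momentum optimizer with an exponentially declining learning rate (start .01).
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=10000, decay_rate=0.96)
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=schedule,
                                                momentum=0.9),
              loss="mse")
```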
So looking at the results of the model, then: how did we actually achieve this average? When looking at pitch, you can clearly see there's a problem in the low degrees, so from a flat finger, zero to 10 degrees. Basically this is due to an underexplored area of the flat finger on the phone in our study, so we have too few samples in that range, basically because it was really hard in the study to perform zero degrees. But besides that first peak from zero to 10, the values were quite stable across the range.
And then looking at the yaw results,
we have almost the same.
There's some jitter around it, but overall we got a stable error rate.
So to conclude, we achieved an 8.9% reduction in error for pitch, and a 45.7% reduction in error for yaw, over the Xiao implementation, by applying feature representation learning with a convolutional neural network with L2 regularization.
And all the data we recorded, plus the training, the models, and the testing code, are all available on GitHub, and you can see the link here on the slide.
And as future steps, we plan to move the trained model to a phone and actually implement it and use it in some use cases.
And as we used TensorFlow,
this is a rather easy step.
So if you want to do that as well,
you're encouraged to use
the model from GitHub,
and just apply it to your phone.
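As one hedged example of what that export step can look like with today's tooling (TensorFlow Lite; the SavedModel path and output file name are placeholders):

```python
import tensorflow as tf

# Convert a trained SavedModel into a .tflite file that can be bundled
# with an Android app.
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
tflite_model = converter.convert()

with open("finger_orientation.tflite", "wb") as f:
    f.write(tflite_model)
```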
Thank you for your attention
and I'm happy to take your questions.
(audience applauds)
- [Man] Hello.
Thank you for going over
all those different options,
CNN, DNN, and all those...
You did mention that the
limitation of the previous work,
which you have marked 41 there,
in the training was the running time,
I think you said, was N squared?
Is that what you said, for the training?
- No, it's the space you need
on your drive in your app.
- [Man] So what is the running
time for your classification?
- I don't know since we
haven't ported it into a phone.
- [Man] All right, so it's
just a model right now.
- Yep.
- [Man] All right,
thank you for your work.
- Thank you.
- [Man] Thanks for your work.
I have a question about
interaction paradigm.
You decided to monitor.
It looks like it's maybe two-handed interaction, with just one finger over the mobile phone.
- Yeah.
- [Man] I was wondering if
you know what would happen
if the user is using just one-handed interaction, maybe with the thumb instead of the index finger,
and whether the model would either tell you that might be like several fingers, or yeah...
- Yeah, so with one-handed interaction, I guess it's only possible to use the thumb in a very limited way, so you can't really, like... You may be able to have different pitches, but different yaws are very hard.
So that's why we went with two-handed interaction. If you want to go with one-handed interaction and apply that model, I guess it wouldn't work that well.
Since the blobs or the shape
of the thumb on the touchscreen
is very different from the
index finger which we used.
Yeah, please.
- [Man] Hello.
Is the trained model only usable on the Nexus 5,
or can we also use different phones?
Or do we have to retrain
it for every phone?
- The problem I guess is
not using a different phone,
it's more of actually getting
those values from the phone.
So I think that's more the problem.
And then, when scaling to the same size, I would say that would work. So if the real-world pixel size is the same, or you scale it to the same size, I would say the model works.
But it's more of the problem
of getting those actual values.
It's quite hard.
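A minimal sketch of that rescaling idea, assuming you know the physical electrode pitch of both devices (the pitch values and the helper itself are illustrative, not part of the released code):

```python
from scipy.ndimage import zoom

def rescale_to_reference(frame, source_pitch_mm, reference_pitch_mm=4.1):
    """Resample a capacitive image so that one cell covers the same
    physical area as on the reference device (pitch values made up)."""
    factor = source_pitch_mm / reference_pitch_mm
    return zoom(frame, factor, order=1)   # bilinear resampling
```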
- [Man] Could probably use
transfer learning, I guess,
if you have a different finger
or different sizes maybe.
Any other questions?
- [Layna] I'm Layna from Waterloo.
I may have missed it.
How do you create or obtain
the dataset to train...
(Layna mumbles)
- Like from the phone?
- [Layna] Yes, yeah.
- So you can make some kernel changes to the Nexus 5 and flash the phone, and then you can livestream the data from the modified kernel into Android, and then you can save it.
And there's a link in the paper
where it explains how to do that.
- [Layna] Okay, thank you.
- [Man] Okay, no more questions.
So let's thank our speaker again.
(audience applauds)
