- My talk is really about tactile sensing and how we can build better sensors for a variety of different applications.
So, let's look at a typical tactile sensing scenario. We have an object that we are trying to grasp, and the tactile sensors in our hand capture this information and give us feedback on how we should interact with the object.
We have tons of sensors in our hands, and what we get from them is very dense tactile information.
Today in HCI, if you look at the different types of sensors, they are kind of bulky and not exactly usable. That is the current state of the art.
In robotics there are many different approaches to grasping. For example, vision-based techniques are extremely important and very prominent today: 3D sensors and cameras are typically coupled with robotic systems to guide them.
There are also a number of mechanical sensors that let you capture this information, but their resolution is typically rather limited.
And then we have the recent wave of optical sensors; you are going to hear about those later this afternoon. This is also a very interesting technology emerging from the field.
Okay, so how does our group try to improve tactile sensing? We have come up with a new type of sensor, and we are striving to deliver three properties. First, the sensor needs to be high density. Second, in our opinion the sensor should be flexible and stretchable, so it can be applied to different types of surfaces. And finally, it should be robust and durable, so it works in the field for extended periods of time. These are the properties we would like the sensor to have.
This is what the sensor we have built looks like. We tried many, many different things, and this is what we converged on. This particular incarnation has 32x32 resolution; it's flexible, it's stretchable, and it lets you measure tactile information for extended periods of time.
What you get from this kind of sensor is this type of map. For example, if you apply the sensor to a human hand, larger circles correspond to larger forces, while the remaining areas indicate that little force is applied there.
So, how is this type of sensor built? It is very, very simple: a piezoresistive film is sandwiched between two sets of conductive electrodes, with a protective cover on each sheet. The architecture is embarrassingly simple and can be built from almost entirely off-the-shelf components.
This type of sensor can be built for less than $10, so it is really widely accessible. You can start building and using it yourself.
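To give a sense of how such a matrix is read out, here is a minimal sketch of row/column scanning. It is an illustration only: the `select_row` and `read_column_adc` calls stand in for whatever GPIO/ADC interface the hardware exposes, and are assumptions rather than the actual firmware.

```python
import numpy as np

ROWS, COLS = 32, 32  # sensor grid size mentioned in the talk

def select_row(r: int) -> None:
    """Drive row electrode r, leave the others floating (hypothetical GPIO call)."""
    ...

def read_column_adc(c: int) -> int:
    """Read the voltage on column electrode c (hypothetical ADC call)."""
    ...

def scan_frame() -> np.ndarray:
    """Scan the full piezoresistive matrix. Pressing the film lowers its
    resistance, so the reading tracks the local force at each crossing."""
    frame = np.zeros((ROWS, COLS), dtype=np.uint16)
    for r in range(ROWS):
        select_row(r)
        for c in range(COLS):
            frame[r, c] = read_column_adc(c)
    return frame
```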
So, this is how the sensor looks, and this is data captured from it; as you can see, it's highly flexible. The data is read out through a set of wires, so building a proper connector was, of course, quite challenging.
In this capture scenario, the sensor is mounted on a human hand, and the person is folding a paper plate. What I think is most interesting is the readout from the sensor shown in the corner: as the person manipulates the object, a vast amount of tactile data is being captured.
Okay, we have also looked at a number of properties and very carefully evaluated the performance of this type of sensor. It can sense forces from roughly 30 millinewtons to 0.5 newtons, which is a pretty good range. We have cycled the sensor over thousands of repetitions and the performance doesn't degrade, and it has a pretty wide operating temperature range, up to 80 degrees Celsius.
We have also looked at auxetic designs for these sensors. What does auxetic mean? You cut small holes within the sensor sheet, which makes the sensor stretchable while the performance stays the same. This is what it looks like when you cut out these small auxetic structures, and it shows that even if you pull on different corners of the sensor array, it still captures good-quality data.
So, the summary so far: we have a pressure sensor that is very inexpensive and very robust, operates over a wide range of temperatures, and can be applied to human hands and to different types of robotic grippers.
So, the question is: now that we have these types of sensors, can we start combining them with modern machine learning techniques? I'm going to show you a few different applications.
I would like to start with some relatively simple tasks: we would like to apply this to understanding how humans grasp objects. People can perform simple tasks, such as classifying an object or estimating its weight, from touch sensing alone. I'm going to show you that, using modern machine learning methods, we can build computational models that perform these tasks almost equally well.
First of all, we built a very big dataset. We identified 26 different objects, had people interact with them, and captured 130,000 tactile frames from all of these object interactions.
This is how a typical interaction session looks. The participant is instructed to manipulate the object, touch it from different sides, and perform typical tasks with that object type. As you can see from the close-up, this is a very rich representation of the data: you can see the pressure maps captured directly as the person manipulates the object.
So, what is the touch data representation here? You can think of it as a two-dimensional array: the sensor locations are approximated on a 2D grid, so each frame effectively looks like an image. Grid cells that have a sensor carry its reading, cells without one are set to zero, and you can treat the result as a two-dimensional image of all the tactile information.
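As a concrete illustration of this mapping, here is a short sketch; the `locations` lookup table and its format are assumptions for illustration, not the actual pipeline.

```python
import numpy as np

def to_pressure_image(readings, locations, grid=(32, 32)):
    """Place scattered sensor readings onto a 2D grid so that a tactile
    frame can be treated like a grayscale image. `locations` maps each
    sensor index to an approximate (row, col) cell; cells with no sensor
    simply stay at zero."""
    image = np.zeros(grid, dtype=np.float32)
    for value, (r, c) in zip(readings, locations):
        image[r, c] = value
    return image
```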
So, how are we going to look at this type of data? We use a modern neural network architecture, a ResNet, applied to this data. Multiple frames are treated as multiple inputs to the system, and the output is a classification vector that tells us which type of object we have interacted with. So the input is just the tactile information, these pressure maps, and the output is the type of object we are interacting with: the highest value in the classification vector tells you which object it is.
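To make this concrete, here is a minimal PyTorch sketch of that setup. The 26-way output matches the dataset from the talk, but stacking tactile frames as the input channels of an off-the-shelf ResNet-18, and the frame count, are illustrative assumptions rather than the exact published architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

NUM_OBJECTS = 26  # object classes from the talk
NUM_FRAMES = 8    # tactile frames per sample (illustrative choice)

# Stack N 32x32 pressure maps as the input channels of a small ResNet.
model = resnet18(num_classes=NUM_OBJECTS)
model.conv1 = nn.Conv2d(NUM_FRAMES, 64, kernel_size=7,
                        stride=2, padding=3, bias=False)

frames = torch.randn(1, NUM_FRAMES, 32, 32)  # one batch of pressure maps
logits = model(frames)                       # 26-way classification vector
predicted = logits.argmax(dim=1)             # highest value = predicted object
```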
So, let's look at some of the results. This is a typical input, one of these sampled images, and the output is the vector that tells you which type of object we have interacted with. For this data, the neural network correctly tells you that we have interacted with a mug. For another frame, the score for the correct object is slightly lower, still pretty high, but two other objects are predicted higher: the Coke cans. So even when the network misses the object, it behaves reasonably; it predicts very similar objects in these cases.
What happens when we extend to multiple frames? It works better, basically. We extend the architecture so that more frames are fed into the network, and this shows that with a larger number of frames the classification accuracy goes up. The black curve shows whether the best prediction is the correct object, and the red curve shows whether the correct object is within the network's best 3 predictions.
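For concreteness, the top-1 and top-3 metrics behind those two curves can be computed as in this short sketch (variable names are illustrative):

```python
import torch

def topk_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    topk = logits.topk(k, dim=1).indices             # (batch, k) best classes
    hits = (topk == labels.unsqueeze(1)).any(dim=1)  # true label among them?
    return hits.float().mean().item()

# top1 = topk_accuracy(logits, labels, k=1)  # the black curve
# top3 = topk_accuracy(logits, labels, k=3)  # the red curve
```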
So, let's look at another example: a weight estimation problem. Same type of data: the person is interacting with the objects, and in this case the neural network is supposed to tell you the weight of the object. Again, we trained the network on this data, and as you can see, it predicts the weight of the object more accurately than just summing over all of the forces, and it's reasonably good compared to the ground truth.
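The naive baseline mentioned above, total force times a single learned calibration factor, can be sketched like this; the class and the regression-head swap are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SumOfForcesBaseline(nn.Module):
    """Naive weight estimate: sum every sensor reading and apply one
    learned scale factor. It ignores where on the hand the pressure is."""
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(1.0))

    def forward(self, frames):                 # frames: (batch, T, 32, 32)
        return frames.sum(dim=(1, 2, 3)) * self.scale

# The learned alternative reuses the classification network but swaps the
# final layer for a single regression output trained on ground-truth weights:
# model.fc = nn.Linear(model.fc.in_features, 1)
```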
What is also very interesting, and this is another extension of this work, is to look at what the network is actually learning. Typically, when people train these networks on computer vision tasks, the lower layers learn characteristic types of filters. Here the filters are learned completely automatically: we started with a completely blank slate and wanted to figure out what kind of filters the networks trained on our captured pressure maps would learn. As you can see, the features learned from this data are quite similar to what typical networks for visual classification learn on their own. This is actually a very interesting finding, although there are some small discrepancies between them as well.
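Inspecting what the first layer has learned is straightforward; assuming the ResNet-style model sketched earlier, it can be done roughly like this:

```python
import matplotlib.pyplot as plt

# First-convolution weights: (out_channels, in_channels, kH, kW).
filters = model.conv1.weight.detach().cpu()

fig, axes = plt.subplots(8, 8, figsize=(8, 8))  # 64 filters on an 8x8 grid
for ax, f in zip(axes.flat, filters):
    ax.imshow(f.mean(dim=0), cmap="gray")  # average over the input frames
    ax.axis("off")
plt.show()
```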
The other interesting question is: what is in the data? When you capture pressure maps, you in fact record a mixture of different signals. One is the object-hand interaction, which gives you an object signature. The other is the hand itself: the current hand pose also leaves a signature that is embedded in the data. So for each frame you have both an object signature and a hand pose signature.
So, the question here is: can we somehow decouple this data? As you can see, depending on the pose of your hand, the response of the sensor is slightly different. What we do is try to decouple these two information streams. The way to do this is to look at the timing of the captures: we can observe the whole temporal stream of the data, which gives us frames from just before the object was grasped and from just after. In these cases the pose really doesn't change; the only thing that changes is whether the object is grasped by the hand. As you can see, we can decouple the two data streams this way and essentially subtract out the hand pose.
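A minimal sketch of that subtraction idea, with illustrative names: take a frame from just before the grasp (hand pose only) and remove it from a frame during the grasp (pose plus object).

```python
import numpy as np

def object_signature(pre_grasp: np.ndarray, during_grasp: np.ndarray) -> np.ndarray:
    """Approximate the object's contribution by removing the hand-pose
    pressure pattern recorded just before the grasp. Clip at zero, since
    negative pressure is not physical."""
    return np.clip(during_grasp - pre_grasp, 0.0, None)
```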
What is also interesting in this data is sensor correlation. If you analyze all of these data streams after interactions with different types of objects, you will see that different areas of the hand are quite correlated, and this is embedded directly in the data. We can run an analysis that tells you, when you are working with a given part of the hand, which other areas of the hand also help. For example, if you highlight the fingertips, you will see that other areas of the hand also help when you perform complicated tasks: whenever you are touching something with the tips, the green areas show which other sensors typically assist.
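That kind of analysis can be sketched by correlating each sensor's time series with the mean signal of a region of interest, such as the fingertips; the function and mask format here are illustrative assumptions.

```python
import numpy as np

def correlation_with_region(frames: np.ndarray, region: np.ndarray) -> np.ndarray:
    """frames: (T, 32, 32) tactile recording; region: boolean (32, 32) mask
    marking, e.g., the fingertips. Returns a (32, 32) map of how strongly
    each sensor correlates with the region's mean signal over time."""
    T = frames.shape[0]
    flat = frames.reshape(T, -1)                   # (T, 1024) time series
    target = flat[:, region.ravel()].mean(axis=1)  # region's mean signal
    flat_c = flat - flat.mean(axis=0)
    target_c = target - target.mean()
    denom = flat_c.std(axis=0) * target_c.std() * T
    corr = (flat_c * target_c[:, None]).sum(axis=0) / np.maximum(denom, 1e-8)
    return corr.reshape(frames.shape[1:])
```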
All right, I would like to summarize this talk. I have shown you how to build high-resolution tactile sensing that is inexpensive, robust, and operates over a wide range of conditions. I have also shown you how to build a machine learning workflow that utilizes this data, and we believe these new types of hardware and software open up many new applications in HCI, robotics, prosthetics, AR, and VR.
I would like to acknowledge our colleagues who helped with and participated in this project, and I would also like to acknowledge the support from the National Science Foundation and the Toyota Research Institute. Thank you, and this is the website where you can access all of the data about the project.
(audience applause)
