>> Kevin Patrick: So, it's
a real pleasure
to be here.
We appreciate this
opportunity to talk to such
a distinguished group 
on the exposome and how
our research is
influencing this.
So there you see 
a picture of our building, 
and this is the 
title of my talk.
[unintelligible]
introduced me.
Our laboratory is a highly
interdisciplinary laboratory
here in the 
Qualcomm Institute 
with representatives from the
School of Medicine, 
the School of Engineering,
School of Public Health
across the way at 
San Diego State University, 
and, of course, the wonderful 
group of Ph.D. students, 
and post-docs that
make things happen.
This is the way I like to
depict the type of research
that we do.
We do work in health and all
the health related sciences.
We bring in technology to
support what it is that
we're doing -- the different
types of technologies from
mobile devices to wearable
sensors to cloud computing
to different forms
of data analytics.
And then we do a lot of
intervention research.
So a lot of our work
focuses on how to get these
technologies to be used by
individuals in places where
they live and in ways that
hopefully support
their health.
A very important thing --
you'll see this diagram
a couple of times in
both my presentation
and Jacqueline's.
The thing that influences
our research is a
very multifactorial approach
to envisioning what
contributes to health.
It's a 
cells-to-society approach.
And what we try to focus on
is essentially everything in
this stack of data that
we'll talk about today.
So, again, I'm going to
cover three projects:
the Health Data Exploration Project,
the CitiSense and MetaSense projects,
and then
a project where we're trying
to pull the data together
from these various
levels in the stack.
So, what is the Health
Data Exploration Project?
This is a project funded by  
the Robert Wood Johnson 
Foundation and the subtitle is, 
"Personal Data for the Public Good."
And this is predicated on
this notion that health
happens where we live,
learn, work, and play.
The vast majority of things
that influence our health
are things outside of the
traditional medical care system.
And the way that we've
understood these influences
on health have been through
things like randomized
controlled trials, or periodic
population level surveys,
or electronic medical records,
or the various biomarkers
that we can use to determine
what people might have been
exposed to.
Well, things are
changing a lot.
This notion of people being
online all the time with
various wearable devices and
sensors -- it's just amazing
what's happened over
the last decade.
And it's going to
continue to change.
And I love this
Economist cover.
You know, by the end of this
decade 80 percent of adults
are going to have a
supercomputer in their pocket.
So this is just an
extraordinary explosion 
of capability to measure
and monitor things.
And this is increasingly
called the Internet of things,
or what Qualcomm,
our local big tech company,
calls the Internet
of everything.
Essentially, everything is
getting connected, and this
also is just an
extraordinary number 
of devices generating lots 
and lots of traces, 
digital traces, of everyday life,
essentially pixelating 
a very rich picture of what
people are doing on
a daily basis.
So, we have this
increasingly diverse and
expanding ecosystem of
devices, apps, and services
that are generating
lots and lots of data.
So, this ecosystem
of data involves the
individuals who are
producing it and the companies
who are largely collecting
it, obviously for
their own commercial
purposes.
But then, as researchers,
those of us who want to
understand the exposome, 
we have an interest in getting
at these data.
But that's a
non-trivial issue.
And so, we have lots of
questions about how can we
use these data to
improve health research.
There are new models of
inquiry that are raised.
There are ethical issues
with respect to accessing
and using these data --
privacy, informed consent, 
a variety of things.
But can we use these data to
address disparities and can
we use these data to address
this explosion of runaway
costs in healthcare?
Lots and lots of issues that
come up -- again, research
methods -- what sorts
of designs do we use?
How do we handle the
scale of the data?
What about the
quality of this data?
Are the data
representative of the things
that we might want
to know?
What about ownership of the
data, and privacy, ethics,
informed consent, et cetera?
So, this is just one example
of kind of the way we're
looking at this.
This notion of the
traditional research data
that we might use in a
clinical study compared to
these new forms
of personal data.
The context of collection is
everyday life, as opposed 
to research studies.
These data are cheap, rather
than expensive, but they're
almost always unvalidated,
especially at scale.
And there are questions of specificity,
comparability, completeness.
Was informed consent
used to capture the data,
or not?
Again, when we do our
traditional studies these
things are often
very, very clear.
So, these comparisons of how
we look at these data are
really informing what
we're doing as a project.
And this project is building
a network of researchers who
are interested in the use
of these data for research.
We have 130 companies,
researchers and
strategic partners.
We're particularly
interested in getting
companies engaged in
this, because, again, the
companies who have these
data are often, again, just
using them for a
single purpose.
But imagine if we can bring
data together from several
companies and ask questions
about what contributes to,
again, quality of life in
any particular community, or
in a particular age group.
And so, our
project is moving forward
with core research
activities and activities
across the network.
We've got a terrific group
of advisors on this now,
ranging from the upper
West Coast -- Julie
Kientz, who's a researcher
in computer science at the
University of Washington, to
the quantified self folks in
the Bay Area, all the way
across the country to John
Brownstein, who some of
you might actually know.
He might have even given
one of your webinars.
John's a computational
epidemiologist at Harvard,
who's doing wonderful work
on social networks
and surveillance.
Tanzeem Choudhury at Cornell
Tech is, again, another
computer scientist doing
some terrific work in
affective assessment using
wearable technologies.
And then, PatientsLikeMe:
groups of people with
similar diseases sharing
information on what it is
that might contribute to
either their health or
their illness.
So, our project has core
activities, and we use this
Venn diagram to
describe them.
We're looking
at essentially the
representativeness of the
data, exploring methods and
metrics for how one might
think about using personal
health data in research, and
then the utility and safety
of this: how can
we push the envelope a bit
further to understand
how we as researchers
who are used to using
traditional forms of data
might use these
new types of data?
So, an example of this --
we're doing a systematic
review right now of the
validity of wearable devices
currently on the market.
There's a surprising paucity
of information out there
about just how valid these
are in terms of comparison
of the data that they
generate to what we would
consider as gold
standards in our world.
But there is some data.
We actually just
completed this review of
everything that we could
find, including published
literature, grey
literature, etcetera.
And so, you'll be
hearing more about this.
We'll be posting the results
of this on our website and,
we hope, publishing
in a very strong journal.
But this is, again,
something that's important
if we're going to be
thinking about using
these devices.
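To make "comparison to gold standards" concrete, here is a minimal sketch of two agreement statistics often reported in device-validation work: mean absolute percentage error (MAPE) and Bland-Altman limits of agreement. The function names and the step counts are hypothetical illustrations, not data or methods taken from the review itself.

```python
from statistics import mean, stdev


def mape(device, criterion):
    """Mean absolute percentage error (%) of device readings vs. a criterion measure."""
    return 100 * mean(abs(d - c) / c for d, c in zip(device, criterion))


def bland_altman(device, criterion):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    diffs = [d - c for d, c in zip(device, criterion)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # 1.96 SD of the differences on each side
    return bias, bias - spread, bias + spread


# Hypothetical daily step counts: a consumer wearable vs. a research-grade reference.
wearable = [10200, 8100, 12050, 6400, 9900]
reference = [10000, 8000, 12000, 6500, 10000]

print(f"MAPE: {mape(wearable, reference):.1f}%")
bias, lower, upper = bland_altman(wearable, reference)
print(f"bias {bias:.0f} steps, 95% limits [{lower:.0f}, {upper:.0f}]")
```

A small MAPE on its own can hide wide day-to-day disagreement, which is one reason validation studies tend to report limits of agreement alongside it.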
We're also funding
a few agile projects.
These are short pilot
studies that
involve researchers
using these data.
And I've given you a couple
examples of what I thought
you folks might
be interested in.
This is a project that Julie
and Tanzeem are doing on
passive sensing of circadian
rhythms for models of
cognitive performance, and
how can we use some of these
new forms of data to capture
what happens to people over
any given day, and whether
they're likely to be up and
whether they're likely to be
having problems with their
performance or
their function.
And then, another really
interesting study that Ruby
Chanara [phonetic sp] is
doing with Runkeeper data is
looking, again, at
population level
relationships in
Runkeeper data.
And Runkeeper has data on 27
million individuals -- lots
and lots of data.
And they're actually very
proactive in terms of trying
to explore the implications
of their data with respect
to, again, the health
of their users.
And so, this is, again, I
think an excellent example
of, you know, what we might
be able to do with Fitbit
data or with JogRun
[phonetic sp] data, or others.
Another thing we're doing is
exploring this
issue of data ownership.
This always comes up.
Technically, if any of you
use these devices, when you
sign the terms and
conditions to use a wearable
device you're giving up the
ownership of data to the
company that
captures the data.
But what does ownership
really mean in terms of
health related data?
Is that actually a
construct that's useful?
And we had Barbara out at
a meeting a few weeks ago
talking about the fact that
these types of data might be
considered as a new natural
resource, something that
could actually be very
helpful to all of us, to the
commonwealth if you will.
So, we're digging into this
because there's surprisingly
little scholarly
work in this area.
So, the next project I'm
going to talk about is
a project we've
been working on for the last
few years called the
CitiSense project.
And I think some of you -- I
know David is
aware of this.
So, this is focusing on
air quality monitoring.
And in San Diego County the
current monitoring network
for 3.2 million residents in
4,000 square miles consists of 10
monitoring sites.
And these monitoring sites
-- if you go online right
now you can find your
local air quality district
information where you are.
In our area it puts
up a map like this.
And so, these maps are not very
spatially specific.
So, the vision for this
project -- this is funded by
the National Science
Foundation -- was
participatory sensing with
individuals carrying around
a device that would allow
them to capture air quality.
And, actually, more
than one individual.
Imagine having a variety
of individuals across the
community who are sensing,
contributing, and bringing
the data up into a cloud
that could be used by
others, and, on bad days,
be used by public health
researchers or even
clinicians or hobbyists, if
hobbyists are interested in
going online and getting an
understanding of what's
happening in their
neighborhood --
or activists.
And you'll hear a little bit
about that in a moment or two.
So, on this project, we've
developed this platform that
consists of a sensor board measuring carbon
monoxide, NO2, ozone,
humidity, pressure,
and temperature, which
essentially in real time on
a mobile app can give you an
air quality reading --
that's this mobile air
quality reading.
And you can see it can be
shared, and Tweeted, or
shared with your
friends on Facebook.
You can bring these data up,
drill down, and get a bit more
information on this -- but,
importantly, these
data also go into a back-end
server that allows
tracking of the data.
So, here is the path of
someone capturing these data
over a given day, and
the graph on the bottom is
essentially the area under
the curve of what they might
have been exposed to, in terms
of the aggregate air
quality index.
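The "area under the curve" in that graph is a time-integrated exposure. A minimal sketch, assuming readings arrive as (minutes since start, AQI) pairs at irregular intervals, applies the trapezoidal rule; the function name and the commute readings are hypothetical, not CitiSense's actual code or data.

```python
def exposure_auc(readings):
    """Trapezoidal area under an AQI-vs-time curve, in AQI-minutes.

    readings: list of (t_minutes, aqi) tuples sorted by time (hypothetical format).
    """
    total = 0.0
    for (t0, a0), (t1, a1) in zip(readings, readings[1:]):
        total += (t1 - t0) * (a0 + a1) / 2  # area of one trapezoid
    return total


# Hypothetical 30-minute commute with irregularly spaced readings.
commute = [(0, 40), (5, 60), (12, 110), (20, 90), (30, 50)]
auc = exposure_auc(commute)
duration = commute[-1][0] - commute[0][0]
print(f"{auc:.0f} AQI-minutes, time-weighted mean AQI {auc / duration:.1f}")
```

Dividing by the trip duration gives a time-weighted average that could, for example, be set against the single value reported by the nearest fixed monitoring station.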
And so, we conducted a study
a couple of years ago with
commuters to UCSD, people
who commuted via a variety
of methods and
for at least 20
minutes, because we wanted
to understand what
they were exposed to.
And this gives you an
example of what we found.
So, here are a
number of users, and that
black line across the bottom
is actually the closest air
quality monitoring
station to UCSD.
So, if you went to the air
quality monitoring station
and dialed in that
particular one you would
find a black line.
And you can see what the
various users were in fact
exposed to, quite a bit
more on some occasions.
And so, what's interesting
with this is if you get
these data and then begin to
use methods of interpolation
and use statistical Kriging,
which is what one of our
machine learning folks did
with this, you get a much,
much more granular ability
to understand, "Well, if
your house, or if your
building is located in this
particular area, this is
what it is that shows up,"
as opposed to a map
that looks like this.
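Kriging estimates a pollutant level at an unsampled location as a weighted average of nearby readings, with weights derived from a variogram model of how similarity decays with distance. The sketch below is ordinary kriging under assumed parameters (an exponential variogram with sill 1 and range 2 km, no nugget) and hypothetical sensor coordinates; it is not the CitiSense team's code, and a real analysis would fit the variogram to the data.

```python
import math


def exp_variogram(h, sill=1.0, range_=2.0):
    """Exponential semivariogram with no nugget (assumed model and parameters)."""
    return sill * (1.0 - math.exp(-h / range_))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_krige(points, values, target, gamma=exp_variogram):
    """Ordinary-kriging estimate at `target` from (x, y) `points` and their `values`."""
    n = len(points)
    # Variogram matrix between sensors, bordered by the constraint sum(weights) == 1.
    a = [[gamma(math.dist(p, q)) for q in points] + [1.0] for p in points]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical sensor locations (km) and AQI readings.
sensors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
aqi = [45.0, 60.0, 52.0, 80.0]
print(f"estimated AQI at (0.5, 0.5): {ordinary_krige(sensors, aqi, (0.5, 0.5)):.1f}")
```

Two properties are worth noting: with no nugget, the estimate at a sensor's own location reproduces that sensor's reading exactly, and in ordinary kriging the sill cancels out of the estimate, so it is the range (how quickly correlation fades with distance) that shapes the weights.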
And so, we felt that
this is very promising.
We presented this at the
Wireless Health meeting,
again, a couple years ago,
and it actually
won the best paper award.
And we were excited, but one
of the problems with
this is calibration.
The CitiSense device was
actually calibrated -- I
think we sent it back to
your institute, as a
matter of fact.
There was an effort to
calibrate it; it was either
your group, NIEHS, or
EPA that was doing this.
And so, what we felt we
wanted to do was to increase
the ability to calibrate
on the fly if you will.
So, we now have a new
grant building on CitiSense
that we're calling
MetaSense, and we brought in
a new investigator, Mike
Hannigan from the University of
Colorado, who's an
expert on
calibrating environmental
sensors in the field.
He had not done this
with these kinds of sensors,
but we added him to
our CitiSense team.
And what we're trying to do
is to take the sensors that
we have available -- and
we've actually upgraded them,
which I'll comment on
in a moment --
and the 10 monitoring
sites that we've got in the
community, and improve our
ability to calibrate these
devices, because we know that
they drift out of calibration.
Even though we
calibrated them under the
hood before they went to the
field, we know that at some
points in time they lost
that calibration, and that's
an issue.
So, can we use the software
that's on the device?
Can we use software that's in
the cloud or on our machines
here at UCSD to
improve this?
And so, this is a schematic
that our research group has
put together about the
fact that we're doing
measurements and
we've got noise.
This noise has certain
components that are related
to various other factors,
since this is mostly
electrochemical sensing.
The idea is to extract this information
from the noise and use
it, combined with other
data that we might have --
historical data from
previous readings on days
like that -- to develop a model
of what the
outputs should be on that
particular day.
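One common way to frame field calibration, offered here only as an assumed illustration, is to regress a co-located reference station's reading on the raw sensor signal plus temperature and humidity, then apply the fitted coefficients to later raw readings. The sketch fits ordinary least squares via the normal equations; the function names, the linear model form, and the synthetic (noise-free) training data are all hypothetical, and MetaSense's actual models may differ.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_calibration(features, reference):
    """OLS fit of reference ~ b0 + b1*raw + b2*temp + b3*rh (assumed linear model)."""
    x = [[1.0, *row] for row in features]  # prepend an intercept column
    k = len(x[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, reference)) for i in range(k)]
    return solve(xtx, xty)


def apply_calibration(beta, raw, temp, rh):
    """Corrected concentration for one new raw reading."""
    return beta[0] + beta[1] * raw + beta[2] * temp + beta[3] * rh


# Hypothetical co-located training data: (raw signal, temperature in C, humidity %).
features = [(100, 20, 40), (120, 22, 55), (90, 25, 30),
            (150, 18, 60), (110, 30, 45), (130, 27, 70)]
# Synthetic, noise-free "reference station" values for the same hours.
reference = [2 + 0.5 * r - 0.1 * t + 0.05 * h for r, t, h in features]

beta = fit_calibration(features, reference)
print([round(b, 3) for b in beta])
print(round(apply_calibration(beta, 115, 24, 50), 2))
```

Because the training targets here are generated from known coefficients, the fit should recover them; with real co-located data the residuals, and how they grow as the sensor ages, are what would signal that recalibration is due.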
So, this depicts what
we're doing in each year.
In the first year, we're actually building a
new board right now and
developing the architecture
for this system.
The second year of the
project's going to be very
important because here's
where we're going to be
collecting lots of data
and one of our
coinvestigators, Chandra
Dasgupta, is a machine
learning expert who helped
us with the first
version of this.
And so, we're going to be
learning from the data,
essentially using machine
learning approaches to
understand what patterns
might be, both from people
using these devices and
patterns in the background,
collecting these data, and
experimenting with
self-calibration, pollutant
discovery, and the ability to
fix things
on the fly.
And the system actually
supports this because we've
got a replaceable
sensor port.
We're doing a
much better board.
When we first built this
board almost five years ago,
the technologies
were only so good.
Now, we're building a better
board to support iterative
improvements of this with
replaceable and new sensors.
We're also improving the
software that's on the board
to support the calibration
algorithms that are
necessary on the board.
We have a replaceable
communication module
because, also, the radios
that we're using for this
are improving over time.
And then, the CI, which is
the cyber infrastructure,
essentially the
computational processes on
this, are really distributed
between the board, the
phone, and the cloud.
And so, this notion of
doing this in these three
locations allows us to,
among other things, improve
the algorithms, but also,
very importantly, address
battery-related issues, to
optimize how
frequently or
infrequently we can do any
updating that we need to do,
because power is always the
issue, especially with
any portable device.
So, we're actually really
excited about this project
and with luck we will have
a much richer and a better
system to assess things.
So, again, this constant
feed of data, creating a map
of the sensor readings,
calibration performed by the
phone in the field, and
then ultimately
improving confidence.
And, importantly, these
types of systems don't have
to be used by a
lot of people.
They just need to be used by
enough people distributed
across the population to
begin to get a better
understanding of what's
happening in that
environment in a
real time basis.
So, one thing I wanted to
show you is that we've got a
proposal actually under
review right now where in
addition to building this
platform we're working with
colleagues over in the
School of Public Health,
Elva Arredondo, and Jenny
Quintana, who's actually an
exposure scientist in
environmental health over
there, and John Elder, who's
a behavior specialist, to
look at community action.
There's a call for
proposals now about getting
communities to better
understand themselves, what
it is that's going on
in their community.
And this is in South Bay,
San Diego, if you know our
area -- the San Ysidro
community approaching the
Mexican border is heavily
congested with traffic, with
trucks, with problems.
But it's also a predominantly
Latino community with lots
of kids and lots of schools
that are close to freeways
and whatnot.
So, this project is really
focusing on whether we can
deploy a system like
CitiSense, and we're
actually
working with college
students who grew up in
San Ysidro, working with
elementary school students
in their science classes to
get them to better
understand what's going on
with the air quality in
their area, and when it
might be healthy for them
to go outside and play, and
when it might not be.
And then, we're looking at
different levels of influence
on this, from the
individual, to the school,
to the community --
ultimately, we hope,
influencing policy in terms
of what it is that might
happen, including policy
with respect to how traffic
is managed in that
particular environment.
So, this is right now under
review with this initiative.
And we're really excited
about this, as is the
community, because
getting tools like
this into the hands of
communities is actually
very, very exciting.
And so, putting all of this
together: the
final project I
want to talk about is how do
we think about personal
data, environmental data,
and other kinds of data that
we might have, and think
about bringing them together
in a way that can help
accomplish personalized
population health?
And so, this is an NSF
project that we started a
couple of years ago.
It's in the smart and
connected health portfolio.
The National Science
Foundation doesn't usually
fund much in health, but
over the last few years
they have.
And the aim of this is
actually summed up nicely in
a JAMA
commentary that came out
towards the end
of last year.
And it's this notion of
models of data capture
expanding across all
levels, from genomic,
molecular, cellular,
organ levels, etcetera, to
electronic health records --
of course we know what's
happened with electronic
health records and
their digitization.
But at the same time
physical activity,
lifestyle, socioeconomic
and environmental data are
being captured, and there are
ongoing discussions about
ways to harness this.
So, how do we put -- how do
we bring all these together
in some meaningful way?
Well, there are two important
trends: big data
approaches that include
personal health data, and
then supercomputing -- new
tools to actually handle
what, you know, a decade ago
was simply unthinkable.
And leading the way,
obviously, have not been
those of us in health.
It's been Google, and
Amazon, and others that have
set the stage for this.
But we have this terrific
opportunity to take
advantage of this.
So, again, many, many forms
of data -- genomic data,
microbiome data, medical
record data, personal health
data, environmental data,
pollution and whatnot from
devices like CitiSense.
And then public health and
social determinants data.
So, how do we wrap our mind
around actually getting a
look at all of these data
in ways that allow us to
understand the relationships
between and among these?
Providing healthcare and
population health requires
reasoning across these
layers of data, sort of like
putting them in
a stack.
Again, this is another way
of looking at that other stack.
And take just
simple things like managing
diabetes, you know?
You've got to know
everything from what
somebody's HbA1c might be,
from their medical records,
to whether they're getting
enough physical activity.
Why is their diabetes
out of whack?
Well, maybe they were
fully sedentary this last week.
And then, if you're managing
their diabetes, do they have
access to appropriate food?
Asthma care is of interest,
obviously, to the
community of us who are
doing work in exposure and
air quality.
To really optimize asthma
care, you've got to know about
things across multiple
levels of data.
And, even just something as
simple as tracking obesity
for public health -- it's
not enough to just do
periodic self-report
surveys, or even look at
medical records.
You've got to look at a
variety of things that can
actually get a better
understanding of what's
happening both at any point
of time, and across time.
So, most of these data are
ignored, or functionally
unavailable because they're
collected and maintained by
different entities,
and different groups.
Physical activity
by one set of folks.
Social interaction data
-- very, very important
collected by another
set of folks.
Weight data
collected by others.
Genomic data -- how in the
world do we think about
putting this into the
context of understanding
epigenetic processes?
Nutritional data, medical
record data, and
air quality data.
So, what's exciting is we're
actually at a time when it's
thinkable to connect the
dots between these
forms of data.
And so, the vision of this
project, when we pitched it,
was that we would try to
integrate these data into
some sort of single,
uniform database, implement
analytics and visualization
across the top of it -- and
I'll show that here in a
second -- and then, very importantly,
open this up for other
researchers, because we're
also at a time in history
when we know that no one
group, and no group of
groups, is
sufficient.
There's just a terrific
opportunity to crowd source,
and to open things up and
have many, many people
working on different aspects
of the problem at the same
time to improve things.
So, what we've done here in
San Diego, is we're modeling
how this might be done in a
community like San Diego.
And so, we've pulled in
partners -- Qualcomm, our
San Diego Association
of Governments.
We have one of the health
information exchanges here,
San Diego
Health Connect.
It's actually one of the
more successful exchanges.
It's not as successful as
many of us would like, but
it's actually doing
reasonably well connecting
the various competitors in
the health care environment
and allowing sharing of
health related information
that foreshadows what we
might be able to do -- and
this is medical information
-- with respect to
registries of that
information, and potentially
registries over time.
And then, we have a very
proactive public health
department that's benefitted
from two or three CDC
grants, and real visionary
leadership to basically pull
everyone together,
and then connect.
We're connecting in with the
entrepreneurial community.
We've got a pretty strong
life sciences, and bio life
sciences community.
And if we can get
entrepreneurs to look at
this issue, we think that
that would be important.
So, the architecture for
this project is, again,
thinking of all these
sources of data that we
might have, bringing these
data into what we're calling
a whole health
information model.
We're modeling this -- the
way we might think about
these data.
And then, importantly, as
shown on the right side
here, opening this up for
three broad types of uses.
First, individual uses
by patients or
parents.
Many of you have a health
app on your phone, or a
fitness app, or whatever.
So, you know, what if you
wanted to localize some of
those data for your
community in ways that could
benefit others in
your community?
Medical personnel -- so,
the whole notion of opening
these data up for people in
medical settings who might
want to use it.
And then, population health
-- so, this would be public
health personnel who might
be interested in population
statistics and analytics.
And so, for the modeling that
we're doing on this project,
we actually have more
data scientists than we have
health scientists.
And these data scientists
have done this kind of
modeling work in marketing,
in transportation, and
other areas.
And so, we're trying to
follow that lead and think
about how we can model the
data in ways that allow
people to make sense of it.
We have an interface
in the platform that we're
developing for registering new sources, because sources
of data change pretty quickly.
So, this allows new sources
of data to be registered
to the platform.
And then, we have this API,
which we call the whole
health application interface,
where we can open this up to,
again, this ecosystem of
people who might want
to access the data.
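The registration interface and application interface described here can be pictured, very roughly, as a registry that new data sources join by declaring what kind of data they carry, plus a query call that returns records for a place and time window. Everything in this sketch (the class names, fields, the census-tract place key, and the integer day index) is a hypothetical illustration, not the project's actual whole health API.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Hypothetical description of one registered data source."""
    name: str
    kind: str          # e.g. "air_quality", "activity", "medical_record"
    spatial_unit: str  # metadata only in this sketch, e.g. "census_tract"
    records: list = field(default_factory=list)  # (place, day, value) tuples


class WholeHealthRegistry:
    """Hypothetical registry: sources register once, clients query by place and time."""

    def __init__(self):
        self._sources = {}

    def register(self, source):
        if source.name in self._sources:
            raise ValueError(f"source {source.name!r} already registered")
        self._sources[source.name] = source

    def query(self, place, start_day, end_day, kinds=None):
        """Return {source name: [values]} for one place within [start_day, end_day]."""
        out = {}
        for src in self._sources.values():
            if kinds and src.kind not in kinds:
                continue
            vals = [v for p, d, v in src.records
                    if p == place and start_day <= d <= end_day]
            if vals:
                out[src.name] = vals
        return out


registry = WholeHealthRegistry()
registry.register(DataSource("citisense_aqi", "air_quality", "census_tract",
                             [("tract_A", 1, 55), ("tract_A", 2, 70), ("tract_B", 1, 40)]))
registry.register(DataSource("step_counts", "activity", "census_tract",
                             [("tract_A", 1, 6200), ("tract_A", 3, 7100)]))
print(registry.query("tract_A", 1, 2))
# {'citisense_aqi': [55, 70], 'step_counts': [6200]}
```

Keeping a shared place and time key on every source is what lets one query line up air quality, activity, and other layers for the same neighborhood; a production system would also need access control and consent handling, which this sketch omits.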
And then, importantly, both
analytics and visualization
-- so, the analytics, again,
we'll populate this with
some analytical tools
that we develop.
But we also want others to
be adding to this
over time, and sharing
through an open market so
that others can
learn from it.
And then, very importantly,
visualization -- when we
have large amounts of data,
sometimes the first and most
important thing to do is
just step back and take a
look at it and generate some
hypotheses from this to get
a better understanding
of what it might mean.
So, the research challenges
on this are many, and I
don't have the time to
dig into all of them.
But they essentially come down
to new data types, and,
again, highly different
forms of data at a spatial
level, at a temporal level,
and at just a type level.
I mean, these are very,
very different,
dynamic environments.
It's, you know -- that
train's left the station.
There's a lot that's going
to continue to happen.
We need to either
address that or not.
And then, modeling this --
again, very important, and
you'll hear this from
Jacqueline in her talk.
Again, a lot of what we work
on reflects the fact that you've
got to be thinking about
how this is anchored in a
particular location.
And this is what's exciting
about talking to people
interested in the exposome,
because place influences health.
You've got to be doing this
in ways that take into
account local phenomena.
So, we've got two use cases
that we're working on right now.
Asthma is one.
And, obviously, that's a
logical extension of what
we've done with the
CitiSense project, where
we're using data from
sensors like CitiSense, and
capturing environmental
data, and medical record
data to develop improved
abilities for people to
understand what's going
on with their asthma.
And we're actually in
partnership right now with
Kaiser [phonetic sp], local
Kaiser, to explore
what some of the
requirements for this might
be, and to develop an asthma
app that could be used.
And what we want to include
in this are things that
right now are really very
difficult to include
because, again, the data
sources are not easily
capturable and
[unintelligible].
And, you know, everybody's
heard about what Propeller
Health has done with, you
know, GPS devices, and
asthma inhalers,
and whatnot.
But that's only
part of the picture.
I mean, the big part of the
picture is giving real-time
data to people, combining
things like their activity,
what the local air quality
might be in their particular
neighborhood, and what their
individual
medical characteristics
might be.
Are they on a medication,
or are they not?
How are they
responding to that?
And so, this is something
that we're
teasing out right now.
And then our second use case
is supported
by the Robert Wood Johnson
Foundation through another project.
We're working with the
Virginia Commonwealth
University Center on
Society and Health,
looking at how we can study
the entire community of
San Diego with as many
different kinds of data as
we can.
And here you see the
structure of what we're
looking at -- social and
economic data, health
systems data, the physical
and social environment,
public policies, and
individual behaviors.
And they've done some
preliminary work across the
state of California looking
at life expectancy at the
census tract level using
both traditional techniques
of population --
epidemiological research --
but then, also, some new
methods of machine learning
that can take into account
many, many more variables
than we might ordinarily,
and really looking for both
expected, but then
unexpected influences on health.
And we're trying to
replicate that in a more
granular and more specific
level here in San Diego.
And, again, what this does
at the broadest level is
allow us to take into
account what this exposome
is all about.
And then, we'll be looking
at asthma, diabetes, cancer,
Alzheimer's
disease, etcetera.
So, essentially, we're
trying to set up an
environment where we can
step back and look at the
big picture of all
influences on health,
because we think this is
what policymakers want to
know: where
do we invest?
Do we invest in
safer sidewalks?
Do we invest in medical
care for things?
Do we invest in
immunization programs?
What's really the most
important thing to put our
resources in?
And as I say on the final
slide here, it's important
to do this in a way that
demonstrates that it's
possible in a
community the size of San Diego.
We are already talking about
replicating some of these in
Virginia and other areas.
But again, we think that San
Diego's kind of, you know,
not too hot, not too cold.
It's sort of just the right
size to sort of be able to
do this, and in part because
we have a variety of the
data sources that are
pretty rich here locally.
And we can also deploy
new kinds of tools like
CitiSense and others.
And I think you'll hear how
Jacqueline's [unintelligible]
-- how she's characterizing
this as well.
So, with that, I will close.
Here's a view from
the left coast.
And I'll open it
up to questions.
>> Jacqueline Kerr: So, I'll
launch straightaway
into this MIPARC study.
Kevin was able to provide
a very large, big
picture view of this,
and I'll answer some very
specific research questions
in my presentation today.
So, I won't talk in too much
detail about this
MIPARC intervention.
And you can look it up.
There's a very nice
video on YouTube.
So, if you type M-I-P-A-R-C
into YouTube, you could see
that video, including
perspectives from
our participants.
But this is really one
of the earlier trials.
And now, there's an NIH call
for proposals to do more
multilevel interventions.
But this intervention
was based in retirement
communities, so we knew that
we'd be able to expose our
participants to the
intervention that we were
delivering because they
lived in the location we
were working, and if we
influenced any of the
neighborhood around those
locations, again, they would
be exposed to those.
So, this sort of retirement
community environment was
like a microcosm for
a larger community.
And that's still
the challenge.
How do you take this type
of intervention into a much
broader community?
But this is the first
step for learning.
And, essentially, following
the ecological model, we
were doing interventions
at all different levels.
So, the more novel and
challenging intervention
components were
saying, "Okay.
If you're walking, can we
get you to walk further into
your community?"
Now, these are older adults.
The average age was 84.
We had folks from 67
to over 100 years old.
So, of course we start by
encouraging them to walk
indoors, where it's safest,
on their campus, depending
on the campus size,
where it also feels safe.
But we want to give them
that opportunity to walk in
the neighborhood, and
particularly for older
adults, because purposeful
walking is important.
If they're able to walk to
the grocery store and walk
for transportation, that's
extremely important to them.
So, we worked with a local
community advocacy agency,
Walk San Diego, who are now
called Circulate San Diego.
And, basically, they went out
around the intervention
sites and first did a
walking audit to see,
"What are the barriers
to walking in
these communities?"
And on the top left, for
example, you can see a very
clear sidewalk.
And that was only made
possible because we put in
the blocks to stop the cars
going over the sidewalk.
Originally, all the cars
were blocking the sidewalk
and participants would not
walk in this area, even
though it provided access
to great walking routes.
In another community, they
didn't have an auditory
or countdown signal
for the crossing.
It's a large road that
is challenging to cross.
We couldn't do much about
the speed limit on that road
or the width of the road,
but certainly the local
transportation agency came
in and immediately put in
auditory and countdown
crossings.
And this was a key access
point to a local
shopping center.
And then, for example,
the bridge here had
been completely overgrown
and full of trash, and it was
again a key access point to
a local shopping center.
Otherwise, participants
would have had to cross
roads that sometimes
didn't have crosswalks or
timed crossings.
So, this bridge was
a key access point.
And it wasn't until the
older adults advocated and
got this letter on the
mayor's desk that they worked
out, "Okay.
Who is responsible for the
maintenance of this bridge?"
and then actually
got it cleaned.
So, the question is, "If
we're going to the effort to
do these interventions, and
to try and encourage people
to walk further, and walk
to different places, can we
actually measure that?"
So, these are the key
exposome questions that I'll
be talking to
you about today.
So, one -- our participants
wore accelerometers and GPS
in these studies.
So, the first question for
me was, "Are our traditional
accelerometer thresholds
appropriate to identify
walking in these
older adults?"
As I say, average age 84,
so older than in many studies
that have been performed,
and also in an
intervention context.
Can we actually measure
change in these participants
with these tools?
And then, can we assess if
older adults changed where
they walked because
of our intervention?
And does where they walk
impact their health in any way?
So, I'm going to start
with the first one.
So, essentially, our
participants do a 400 meter
walking test as part of
their functional fitness test.
And, during that test, we
asked them to wear
their accelerometers.
And, straightaway, from
baseline data as we started
this study, we could see
that their average counts,
which is the metric normally
used in physical activity
research, varied greatly
by age group.
So, the three different
colors are the three
different age groups.
Now, the black lines
crossing those graphs at
around the 2,000
mark are what we would
use as a traditional
accelerometer cut point to
identify physical activity.
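A minimal sketch of how a fixed cut point like this is applied. It assumes the 2,000 figure is accelerometer counts per minute; the function name and the walk-test values are made up for illustration and are not the study's data.

```python
# Hypothetical fixed cut point; assumes the "2,000 mark" is counts per minute.
TRADITIONAL_CUT_POINT = 2000


def classify_minutes(counts_per_minute, cut_point=TRADITIONAL_CUT_POINT):
    """Label each minute 'active' if its counts reach the cut point."""
    return ["active" if c >= cut_point else "inactive" for c in counts_per_minute]


# Illustrative counts for four minutes of a 400 m walk test (not real data).
walk_test_counts = [1850, 1420, 2100, 1630]
labels = classify_minutes(walk_test_counts)
share_active = labels.count("active") / len(labels)
print(labels, share_active)
```

In this toy example three of the four minutes of known walking fall below the threshold, which is the kind of misclassification the black lines reveal.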
So, these people were
performing their walk test;
they're asked to walk as
quickly and as comfortably
as they can.
We know they're walking,
which is the behavior we're trying
to encourage with
this intervention.
But the majority of
participants were not
walking at that level.
And I've also included a
dotted line which is an
alternative cut point that
has been proposed for
older adults.
And, again, not appropriate
for many of our over
90 year olds.
And when we then looked at
the speeds at which they
walked during their 400
meter walking test, we
calibrated an individual
cut point for each person
and then applied it to
their seven days of wear
after this particular test.
And, particularly the
younger folks, even though
they could walk at this
intensity during their 400
meter walk, they never
reached that intensity when
they were walking
in free living.
So, this does not reflect
what people are doing in
their everyday lives.
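One way to picture the individual calibration is sketched below. The rule used here (a person's cut point equals a chosen fraction of their mean counts per minute during the 400 meter walk) is an assumption for illustration; the talk does not specify the exact calibration, and the counts are invented.

```python
from statistics import mean


def individual_cut_point(walk_test_counts, fraction=1.0):
    """Hypothetical rule: cut point = fraction x mean counts/minute in the walk test."""
    return fraction * mean(walk_test_counts)


def minutes_at_or_above(free_living_counts, cut_point):
    """Count free-living minutes that reach the person's own walk-test intensity."""
    return sum(1 for c in free_living_counts if c >= cut_point)


walk_test = [1500, 1600, 1700]          # illustrative walk-test minutes
free_living = [300, 1200, 1550, 900]    # illustrative minutes from the wear week

cp = individual_cut_point(walk_test)
print(cp, minutes_at_or_above(free_living, cp))        # full walk-test intensity
cp_90 = individual_cut_point(walk_test, fraction=0.9)
print(cp_90, minutes_at_or_above(free_living, cp_90))  # relaxed threshold
```

With the full walk-test intensity as the threshold, none of these free-living minutes qualify, mirroring the observation that participants never reached that intensity in everyday life.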
So, that straightaway
alerted us to some issues.
So, essentially, this is
what then we used to try and
develop training data to
understand how we could use
machine learning methods
to better assess these
accelerometer data.
So, rather than using a
crude threshold, can we use
more information from the
accelerometer to actually
inform what we're doing?
And as accelerometers
improved, raw data became
available on three axes,
which enabled those
techniques to be used.
But, what's important for
us, is that we have some
sort of ground truth to tell
us what these participants
are doing.
And as I say, we don't want
the ground truth to be a
400 meter walk that they're
doing in an artificial setting.
We want the ground truth to
be what they do in their
everyday lives.
So, that's why we have this
SenseCam, an outward-facing
camera that shows you
what the participant is doing.
So, these are example
images, and we developed a
coding system to be able to
code the behaviors
participants were doing.
And you can see the example
sensors in the picture:
we had sensors
on the hip and wrist.
The wrist accelerometer, which is
now being used by NHANES, is
able to capture 24 hours of
data, and you can also see the
size of the
GPS device that we're using.
And, although some of these
devices are now available in
phones, particularly for our
older adult population we
focused on these research
grade devices that we could
be sure captured the data
we needed all the time.
So, essentially, these are
the types of studies where we
have this mobile
measurement of behaviors.
The important part, really,
is that we've been able to
have diverse samples wear
these SenseCams, the hip
and wrist accelerometers, and GPS,
and that we
have over 400 participants
wearing these devices.
We're still coding a lot of
this data, but we'll have
great training sets that
will hopefully enable us to
develop an
algorithm that is robust
across many
population groups.
So, obviously, coding this
data is exceptionally time
consuming for humans.
So, we have also invested in
machine learning techniques
to help us more
quickly identify what might
be occurring in the image,
while still having human
coders verify whether that's
the correct classification.
So, the machine learning
technique we use is
basically training the
machine on the
known truths that you
have, and then testing
the classification.
We create 41 features
from the GPS and the
accelerometer data, and then
we use a random forest
classifier as the first step
in the machine learning process.
And then we also
apply time smoothing using
a hidden Markov model.
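The two-stage idea (per-minute classifier probabilities, then temporal smoothing) can be sketched with a small Viterbi decoder. This is a generic "sticky" hidden Markov model, not the study's fitted model: the states, the 0.9 stay probability, and treating the random forest's probabilities directly as emission scores are all simplifying assumptions.

```python
import math


def viterbi_smooth(prob_seq, states, stay_prob=0.9):
    """Return the most likely state path given per-minute class probabilities.

    prob_seq: one dict per minute mapping state -> classifier probability.
    Probabilities are used directly as emission scores (a simplification), and
    every state keeps itself with stay_prob, switching uniformly otherwise.
    """
    eps = 1e-12  # avoid log(0)
    switch_prob = (1.0 - stay_prob) / (len(states) - 1)

    def log_trans(a, b):
        return math.log(stay_prob if a == b else switch_prob)

    # Uniform initial distribution over states.
    scores = {s: math.log(1.0 / len(states)) + math.log(prob_seq[0][s] + eps)
              for s in states}
    backpointers = []
    for probs in prob_seq[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: scores[p] + log_trans(p, s))
            new_scores[s] = scores[prev] + log_trans(prev, s) + math.log(probs[s] + eps)
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best path backwards from the highest-scoring final state.
    path = [max(states, key=scores.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


states = ["sit", "walk"]
walk_probs = [0.8, 0.9, 0.4, 0.85, 0.9]  # illustrative classifier output
prob_seq = [{"walk": p, "sit": 1.0 - p} for p in walk_probs]
raw = [max(states, key=d.get) for d in prob_seq]
smoothed = viterbi_smooth(prob_seq, states)
print(raw)       # one isolated 'sit' minute
print(smoothed)  # absorbed into the surrounding walk
```

The stay probability controls how strongly isolated minutes are absorbed into their neighbors; set too high, it would also erase genuine short bouts.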
And these are examples
of the types of features
that are
important in the process.
And, although the data I'm
going to show you is
based only on
accelerometer outcomes, you
can see that the GPS features in
green also contribute to
us being able to better
classify these behaviors.
So, the examples I'm
going to show you,
just to make the point about
why it's so important to
have free living data, are
comparing the
performance of classifiers
based on a study where we
collected 500 different
trips around San Diego with
two studies where we
had cyclists and overweight
females
wearing these devices.
So, these were the first
cohorts that wore the
SenseCam, and we now have
data as well from the older
adults that will enable us
to apply these to them.
But, some of the points I
want to make is when it's
free living you don't
necessarily have equal
examples of all the
behaviors you might be
wanting to study.
So, for example, when we did
this prescribed
transportation study, we had a
lot more riding-in-vehicle
time and more walking time.
And certainly in the cyclist
cohort we got some cycling,
but in the overweight
females we didn't.
And when you're developing a
machine learned classifier, it
benefits from having
a balance of behaviors.
But the point is, in
free living there isn't that
balance of behaviors.
So, the classifier is going to
be more likely to predict a
certain behavior based on
the number of training
examples you provided.
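A common mitigation for unbalanced training labels, not one the talk says the team used, is to weight each class inversely to its frequency. The formula below matches the "balanced" heuristic in scikit-learn (n_samples / (n_classes x class_count)); the minute counts are invented.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Illustrative free-living training minutes: lots of vehicle time, little cycling.
minutes = ["vehicle"] * 60 + ["walk"] * 30 + ["cycle"] * 10
weights = balanced_class_weights(minutes)
for cls, w in sorted(weights.items()):
    print(f"{cls}: {w:.3f}")
```

Weighting only rebalances classes that are present in the training data; it cannot help with a behavior, such as cycling in the overweight cohort, that contributed no examples at all.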
So, as you can see, this
is another box plot of the
intensity of physical
activity that occurred when
we knew that these people
were doing these particular
behaviors, walking, or
running, or cycling.
So, again, it just
demonstrates to you how the
existing cut point isn't
functioning very well.
But, importantly, the box
plot on the far left of your
screen is the folks that
were asked to do these
different transportation
modes around the city and
note what they were doing.
So, this is often what
other researchers do:
ask people to do
these prescribed activities.
Now, honestly, these
researchers that did this
had 500 trips to make, so
they may have been motivated
to walk a little bit faster
to actually get
these trips finished.
So, again, we don't really
know what is occurring
in naturalistic settings
when we don't ask people to do
these things.
But, again, you can see from
the other points that in
naturalistic settings
these cut points
definitely aren't working.
And you can see very different
levels of walking across the
three different studies, which
had three different cohorts
of people.
And, particularly here, the
cycling is just totally
missed by these cut points.
And, again, is cycling
a relevant behavior?
To me, yes.
It really could be something
that could contribute to
public health if we could
build better environments to
support this.
So, our study essentially
built this machine
learned algorithm.
And then, we tested it.
So, in the first study,
basically, when you have
prescribed data you can see
that the algorithm performed
very well for these
different transportation modes.
You get up to 93
percent accuracy; when
you develop a
classifier in free living,
perhaps you lose a few
percent of accuracy,
because, again, it's a
much harder environment to
predict in.
But what's really important
is what happens when you use the
classifier trained
on the prescribed data.
So, when you apply the study one
prescribed training data to
study three, the overweight
women, its accuracy drops
by 13 percent.
So, essentially, something
that's trained in a
laboratory, or trained in
an artificial environment,
doesn't predict free
living behavior so well.
And you can also see
that the cycling cohort did
not look the same as the
overweight and obese women,
so, again, the classifier
trained on the cycling cohort
did not perform as well on
these women.
So, this shows you the
minutes of each behavior that
were predicted and
mispredicted.
And one of the things you
can see is that the classifier
trained on cohort study two
predicted cycling in these
women even though there was
no cycling in these women.
So, again, if you train a
classifier on a cohort where
they cycle a lot, the
classifier definitely is
looking for that type
of behavior to predict.
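The cross-cohort check described here boils down to comparing predicted and true labels minute by minute. A minimal sketch, with invented labels standing in for a cohort that never cycled, shows how the accuracy and the "phantom cycling" minutes fall out of a simple confusion count:

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count minutes for each (true, predicted) label pair."""
    return Counter(zip(true_labels, predicted_labels))


def accuracy(true_labels, predicted_labels):
    """Share of minutes where the prediction matches the ground truth."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Illustrative SenseCam-coded truth for a cohort with no cycling, and the
# predictions of a classifier trained on a cycling-heavy cohort.
truth = ["sit", "walk", "walk", "vehicle", "walk"]
pred = ["sit", "cycle", "walk", "vehicle", "cycle"]

cm = confusion_counts(truth, pred)
phantom_cycling = sum(n for (t, p), n in cm.items() if p == "cycle" and t != "cycle")
print(accuracy(truth, pred), phantom_cycling, cm[("walk", "cycle")])
```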
So, that's why it's so
important to be having the
right training data set.
And these algorithms work
equally well whether you're
using them on
the hip or wrist.
And, again, they perform
better than other algorithms
developed in a
laboratory setting.
The GGIR classifiers were
developed in a laboratory
setting, so we're more
confident in these free
living algorithms.
So, in terms of answering
the exposome question, "Are
traditional accelerometer
thresholds appropriate for
older adults,"
the answer is no.
And machine learning
techniques can improve our
classification accuracy.
But the training data really
needs to be collected on the
population of interest
and in a
free living context.
So, moving to the next
exposome questions, "Can we
assess if older adults
changed where they walked?
And, what's the impact
on their health?"
Essentially we're focusing
on their GPS data to do this.
So, the first part of being
able to analyze GPS data
essentially came from a
system called PALMS,
funded by GEI, the Genes,
Environment and Health Initiative.
PALMS has taken
many years to develop, but
it's now being used
by 147 different users
around the world.
We have data on over 16,000
participants in this system,
and over 2.4 billion
observations.
Another important part of
this process: all this
demonstrates that it's a
usable system, but are the
algorithms and the
classifications we put into
this system actually valid?
So, again, the prescribed
trips study that we did was
able to help us validate the
mode, the indoor/outdoor,
and the length of the trip.
But also, our SenseCam data
has been able to help us
validate whether trips are in
fact trips, the mode
in which they occur,
and indoor/outdoor time.
And we even had
observations of indoor/outdoor
time in a preschool.
So, we have many papers now
that are validating the
algorithms within this
system, and then over 30
papers of people using the
system in their research.
So, again, understanding
this GPS and accelerometer
data wouldn't have been
possible without investments
in a system like this that
helps us process and clean
the data.
So, we've collected GPS data
over a number of studies
including the MIPARC
intervention, and, as I say,
over multiple time points.
So, again, what happens when
you first look at the data?
Well, GPS data
has missing data
simply because of
signal interference.
Another reason for
missing data is that,
unfortunately, at the moment
you still have to charge
your GPS device every
night, whereas with the
accelerometer we can deploy
that for nine days and not
have any problems with
battery life or memory.
So, one of the first things
[unintelligible] looking at
is whether we can impute some of this
missing data, and what are the
implications if we do?
So, again, fantastic to have
the SenseCam, because it
allowed us to say, "Okay.
If we see a point -- we lose
data and then we see a point
and it's very close to where
the original data loss
occurred, could we be
confident that between those
two time points the person
didn't actually change
location; that simply the
building is interfering with
the signal?"
And the SenseCam data
enabled us to confirm
whether that was the case.
And certainly in about 17
percent of the cases that
was the case, and we were
able to impute the data.
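The imputation rule described here (if the first fix after a gap is close to the last fix before it, assume the person stayed put) can be sketched as follows. The 50 meter distance threshold, the 60 second epoch, and the record layout are assumptions for illustration; the talk does not give the study's values.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def impute_static_gaps(fixes, epoch_s=60, max_dist_m=50.0):
    """Fill GPS gaps where the person appears not to have moved.

    fixes: (t_seconds, lat, lon) tuples sorted by time. Returns
    (t_seconds, lat, lon, imputed) tuples; imputed epochs reuse the last
    location seen before the gap.
    """
    if not fixes:
        return []
    out = [(*fixes[0], False)]
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(fixes, fixes[1:]):
        gap = t1 - t0 > epoch_s
        if gap and haversine_m(lat0, lon0, lat1, lon1) <= max_dist_m:
            for t in range(t0 + epoch_s, t1, epoch_s):
                out.append((t, lat0, lon0, True))
        out.append((t1, lat1, lon1, False))
    return out


fixes = [
    (0, 32.8800, -117.2300),
    (60, 32.8800, -117.2300),
    (300, 32.8801, -117.2301),   # ~15 m away after a 4-minute gap: imputed
    (360, 32.9000, -117.2500),
    (600, 33.0000, -117.0000),   # far away after a gap: left missing
]
filled = impute_static_gaps(fixes)
print([t for t, *_ in filled])
print(sum(flag for *_, flag in filled), "epochs imputed")
```

Because a rule like this only fires for gaps that begin and end in the same place, participants who spend long stretches inside large buildings end up with many more imputed epochs than others.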
But you can see from these
graphs that there are very
different patterns of data
imputation depending on how
much time the person is
spending in these large
buildings or sort of high
rise buildings where there
might be more signal loss.
So, it matters: it affects
who has imputations, but
also, if you're looking
at sedentary behavior, you
would lose a lot of
knowledge about where that
sedentary behavior occurred
if you didn't impute some of
these variables.
So, that certainly was an
important first stage for
us, to understand what we're
missing and what happens
when we impute that data.
And then we also had to
develop a geodatabase
to help us
actually analyze the data.
So, you know, previous
studies had looked at
perhaps single journeys
and small numbers
of participants.
But, when you actually want
to know every point in time,
and exactly where it's
occurring and what sort of
locations it might be
occurring in, it soon
became really taxing for
systems like our GIS.
Instead, you have to
create this separate
geodatabase using Python and
PostgreSQL to
help this process
happen more quickly.
And so, we invested in
that, and in particular in a
HIPAA compliant cloud supported
by a supercomputer center,
so we could be confident that
those GPS data were being
stored and handled securely.
So, again, we were able to
leverage funds from other
grants to enable us to make
this next step so that we
could match GIS
and GPS data.
And then, as we're doing all
this work, we still need to
understand what it
is we're looking at.
And so, a framework was
something we definitely had
to develop and have more
theoretical understanding of
what it was we
were looking at.
And in particular,
what is exposure?
Because that's what we're
now starting to look at.
So, is exposure to do
with time, in particular
something like time domains?
And I'll show you an example
of a time domain with this
life space idea.
Or is it to do
with location?
And then, what actually
is a location? Or is it to do
with behaviors?
And as we move and start
to understand that our
traditional studies have
really been about access --
when you look at somebody's
neighborhood and say,
"What's around them, and
does that influence their
behavior," that is
to do with access.
And even if you use your GPS
data and create something
like an activity space, or
a standard deviational ellipse,
that's still
not exactly where you're going.
So, what we're trying to get
to now is looking at how we
can use kernel buffers that
might be weighted for the
speed or the transportation
mode, and actually get at
measures of exposure.
And, again, as we do that,
and we're talking about
every single data point, it
has great implications for
our statistics as well,
because we're looking now at
three levels of nesting,
which is minutes within days
and days within people.
So, it's definitely been
challenging from that
perspective as well.
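To make the three levels of nesting concrete, the sketch below splits the variation in a minute-level measure into between-person, between-day-within-person, and within-day parts. It is a descriptive decomposition on invented data, not the multilevel model the team would actually fit; the (person, day, value) record layout is an assumption.

```python
from collections import defaultdict
from statistics import fmean


def nested_sums_of_squares(records):
    """records: (person_id, day_id, value) minute-level rows.

    Splits the total sum of squares into between-person,
    between-day-within-person, and within-day (minute-level) parts.
    """
    values = [v for _, _, v in records]
    grand = fmean(values)
    by_person, by_day = defaultdict(list), defaultdict(list)
    for person, day, v in records:
        by_person[person].append(v)
        by_day[(person, day)].append(v)
    person_mean = {p: fmean(vs) for p, vs in by_person.items()}
    day_mean = {k: fmean(vs) for k, vs in by_day.items()}

    ss = {"person": 0.0, "day_within_person": 0.0, "minute_within_day": 0.0}
    for person, day, v in records:
        pm, dm = person_mean[person], day_mean[(person, day)]
        ss["person"] += (pm - grand) ** 2
        ss["day_within_person"] += (dm - pm) ** 2
        ss["minute_within_day"] += (v - dm) ** 2
    total = sum((v - grand) ** 2 for v in values)
    return ss, total


# Illustrative minutes: person A over two days, person B over one day.
records = [("A", 1, 0.0), ("A", 1, 2.0), ("A", 2, 4.0), ("A", 2, 6.0),
           ("B", 1, 10.0), ("B", 1, 12.0)]
ss, total = nested_sums_of_squares(records)
print({k: round(v / total, 3) for k, v in ss.items()})
```

The three parts always add up to the total, which is why ignoring one level of nesting misattributes variation to the others.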
So, here's an example if
you think about location.
When the PALMS system
picks out locations, you can
see along the pink dots
of the GPS trace
that there are three
green dots here.
So, this is how PALMS would
cluster and say, "This is
the location."
And our SenseCam provides
us images of what those
locations are.
So, there're periods in
time, say 20 minutes or so,
when someone has clearly
stopped and is taking a break.
And that would be
considered a location.
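A simple stay-point rule captures the idea of a location as a place where someone lingers. This is a generic sketch, not PALMS's actual clustering algorithm; the 50 meter radius is an assumption, and the 20 minute minimum just echoes the example above.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(math.sqrt(a))


def detect_stays(fixes, radius_m=50.0, min_duration_s=20 * 60):
    """fixes: (t_seconds, lat, lon) tuples sorted by time.

    Returns (start_t, end_t, mean_lat, mean_lon) for each run of fixes that
    stays within radius_m of the run's first fix for at least min_duration_s.
    """
    stays, i, n = [], 0, len(fixes)
    while i < n:
        j = i + 1
        while j < n and haversine_m(fixes[i][1], fixes[i][2],
                                    fixes[j][1], fixes[j][2]) <= radius_m:
            j += 1
        run = fixes[i:j]
        if run[-1][0] - run[0][0] >= min_duration_s:
            stays.append((run[0][0], run[-1][0],
                          sum(f[1] for f in run) / len(run),
                          sum(f[2] for f in run) / len(run)))
            i = j  # continue after the stay
        else:
            i += 1
    return stays


# Illustrative trace: 24 minutes in one spot, then walking away ~1.1 km per fix.
fixes = [(60 * k, 32.88, -117.23) for k in range(25)]
fixes += [(1500 + 60 * k, 32.88 + 0.01 * (k + 1), -117.23) for k in range(5)]
stays = detect_stays(fixes)
print(stays)
```

Shrinking the minimum duration or radius yields the shorter, smaller places that a kernel density approach tends to find, while a very large radius drifts toward treating a whole outing as one location.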
There's also another approach
you can use, which is
kernel density, to
identify locations.
And this seems to have much
smaller periods of time, but
also perhaps still
meaningful places.
And then, another way of
thinking about locations is
with an amoeba approach.
And that basically says that
this whole hiking trip
is a location.
So, if you think about it,
if we're thinking about
walking in the community, is
something like Chinatown a
destination and a location?
It's not a single point.
It's a whole area.
So, that's even sort
of showing some of the
challenges of simply trying
to define something
like location.
It really depends on
your research question.
So, what we look at in our
older adults is
these life spaces.
And people have shown that
self-reported life space --
so, for example, "Do
you leave your bedroom?
Do you leave your house?
Do you leave your porch?
Do you leave your garden?
Do you leave your
neighborhood" -- that those
things are related to
mortality and cognition.
So, what we can do in our
older adults, in these
retirement communities, is
look at the percent of time
at home, on campus, in the
neighborhood, and beyond
the neighborhood.
And then, we can also see
the percent time walking in
those domains.
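The life-space summary can be sketched as assigning each epoch to a domain and tallying percentages. Here, concentric distance bands around home stand in for the real home, campus, and neighborhood boundaries, which a GIS analysis would take from polygons; the band radii, the epoch record layout, and the example points are hypothetical.

```python
import math
from collections import Counter

DOMAINS = ["home", "campus", "neighborhood", "beyond"]
# Hypothetical distance bands (meters from home) standing in for GIS polygons.
BAND_LIMITS_M = [("home", 30.0), ("campus", 250.0), ("neighborhood", 1600.0)]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(math.sqrt(a))


def domain_for(dist_m):
    """Map distance from home to a life-space domain."""
    for name, limit in BAND_LIMITS_M:
        if dist_m <= limit:
            return name
    return "beyond"


def life_space_summary(epochs, home_lat, home_lon):
    """epochs: (lat, lon, is_walking) tuples of equal length.

    Returns (percent of all epochs spent in each domain,
             percent of all epochs spent walking in each domain).
    """
    in_domain, walking_in_domain = Counter(), Counter()
    for lat, lon, walking in epochs:
        d = domain_for(haversine_m(home_lat, home_lon, lat, lon))
        in_domain[d] += 1
        if walking:
            walking_in_domain[d] += 1
    n = max(len(epochs), 1)
    pct_time = {d: 100 * in_domain[d] / n for d in DOMAINS}
    pct_walk = {d: 100 * walking_in_domain[d] / n for d in DOMAINS}
    return pct_time, pct_walk


home = (32.88, -117.23)
epochs = ([(32.88, -117.23, False)] * 6     # at home
          + [(32.881, -117.23, True)] * 2   # ~110 m away: campus
          + [(32.89, -117.23, True)]        # ~1.1 km away: neighborhood
          + [(32.93, -117.23, False)])      # ~5.6 km away: beyond
pct_time, pct_walk = life_space_summary(epochs, *home)
print(pct_time)
print(pct_walk)
```

Because every participant's time is expressed as percentages over the same four domains, the results can be averaged and compared between control and intervention groups.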
So, this is a way of us
being able to aggregate this
data across the whole group
and, for example, compare
control and
intervention groups.
Other questions that we're
looking at come down
more to the
individual level.
So, for example, here you
can see where a participant
walked in the red at
baseline, and then where
they walked in the green
after they followed some of
the mapped routes we had.
Now, this is still at the
individual level, and so,
our challenge is, "how can
we have metrics of this
across the whole sample?"
And then, also, here this is
an example where you can see
a change in the walking due
to installation of a safer
crosswalk.
Just going to pause here,
and maybe show some
animations if we have time.
Yeah.
Okay.
So, again, these animations
are showing
aggregate data, which is
important, but they're not
necessarily giving a
statistical metric.
So, what you can see here:
the blue line identifies the
campus of this retirement
community, which, as you can
see, is a beautiful location
in a very walkable neighborhood.
But there's really
not much activity.
The orange dots are when
activity occurs at
higher intensity.
The yellow dots are probably
walking speed in older adults.
And, even despite the
walkable community, at
baseline there's not much
physical activity occurring.
And these retirement
communities are also well
resourced with physical
activity resources.
So, let's go to the next
animation, which is at the
three month time point.
And, as I say, although these
aren't statistical metrics
that we can report on yet,
these types of animations, I
think, are very important
for policy makers to be able
to see, because I think
these types of animations
are motivating.
So, here you can now see
that our participants are
taking walks that are of
a decent intensity, and
walking out into the
community and using their
community more.
So, although, as I say, I'm
still trying to work out how I can
capture this statistically
so that I can show
that there's a significant
spatial and physical
activity difference, it's
very apparent visually.
And if I was talking to my
policy makers down at SANDAG
who I meet with regularly, I
would use an animation like
this to make that point more
clearly than perhaps any of
my p-values could
demonstrate.
So, we're really excited to
have this capacity and to
see this.
So, we can now go back
to the presentation.
Thank you, Kevin.
So, essentially, answering
this question, "Can we
assess if older
adults changed where they
walked because of
my intervention?"
Yes.
I feel like GPS devices do have
the accuracy these days to
be able to help us do that,
but the data preparation and
the analysis are
still challenging.
So, the next question is,
"Does where they walk impact
their health?"
So, this is one analysis;
again, we've only done it at
baseline at the moment,
but essentially we grouped
people, starting
in the black bars with those who
spent less than 30
minutes outdoors and
did not have 30 minutes of
physical activity in the day,
going through to those
that had some activity, but
not outdoors, or just
some outdoor time, but no
physical activity, to the
group in the white bars,
those that
basically did both:
they spent more than 30
minutes outdoors and they
had 30 minutes of
activity in a day.
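The four groups can be expressed as a simple rule on two daily totals. Treating exactly 30 minutes as meeting each threshold is an assumption, since the talk describes the cut-offs only as "less than" and "more than" 30 minutes; the participant values and group labels are invented for illustration.

```python
from collections import Counter


def outdoor_activity_group(outdoor_min, active_min, threshold=30):
    """Assign a participant to one of four groups from two daily totals."""
    outdoors = outdoor_min >= threshold
    active = active_min >= threshold
    if outdoors and active:
        return "outdoors and active"     # white bars
    if active:
        return "active, not outdoors"
    if outdoors:
        return "outdoors, not active"
    return "neither"                      # black bars


# Illustrative (outdoor minutes, active minutes) per participant.
participants = {"p1": (10, 5), "p2": (10, 45), "p3": (60, 5), "p4": (45, 40)}
groups = {pid: outdoor_activity_group(*mins) for pid, mins in participants.items()}
print(Counter(groups.values()))
```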
And, essentially, you can
see across these different
variables -- fear of falling
[unintelligible], cognitive
functioning tests -- again
their 400 meter walk time,
or the self-reported
functioning where higher
scores are better.
In the other ones lower
scores are better.
So, you can really see this
difference between the
groups -- that those that
are indoors doing no
physical activity are
definitely not doing as well
as those who are outdoors
and doing physical activity.
And, you know, when these
type of studies are covered
in the press I get a lot of
older adults calling me and
saying, you know, "Can we --
can we show that our hiking
group is doing better?"
Or people saying, "I do
indoors -- should I
be going outdoors?"
To me, my first message
is always about safety.
And, again, if we can
demonstrate that the outdoor
activity does have some
benefits, then I think that
would be really important
for policy makers to make
those outdoor
environments safe.
So, until we have a
randomized controlled trial
actually testing this, you
know, I'm very careful about
what I claim here.
But I think there are also
new thoughts that we can
think about.
So when our new colleague
here, Rob Knight, came to
UCSD, he was very interested
in the fact that we could
capture indoor/outdoor
time across populations.
And this basically can
be related to the
gut microbiome.
So, going forward, we're thinking
about ways of including
those sorts of data
in our analyses.
So, that takes me to
the next study that we had
funded from NCI, which is
really trying to capture
both the home environment
through traditional GIS and
also the exposure
environment through GPS, in
a cohort in
San Diego, half of
them Hispanic.
So we're really getting
that variation in
environment from the start,
from where they live.
And we're taking it through
to being able to have blood
biomarkers of insulin
resistance and inflammation.
So, there are very few
neighborhood studies that
have these types of
biomarker outcomes.
And when they do, they're
about the neighborhood, not
about total exposure.
So, this study is crossing
the range from 35 to
85 year olds.
We've also been able
to leverage this data set: in
a small sample, participants will be
wearing the SenseCam again
to help validate the
algorithms we'll be applying
to this population to get
more data about context.
We're going to have a small
sample also using the phone,
including audio,
because our machine
learning colleagues are
very interested in that.
But, as Kevin said, we're
bringing in the genome and
the microbiome as well, and
we're really excited that
this will be a small data set, but
one of the first
that's actually able to look
at many more layers of this
issue than in the past.
So, just to finish, I've
been focusing mostly on
physical activity in space,
but I would be remiss if I
didn't remind you of the
media attention that has
been given to sitting.
And, again, there are good
ways now with these thigh
worn inclinometers to
measure sitting behaviors,
and we've done several
studies looking at this.
So, for me I think, it
really just takes us back to
the public health picture,
which is we need better
measurement of
the 24 hour day.
I want to be able to see
where and when behaviors
occur and how they
might be interrelated.
So, for example, time of day
and location of walking, 
for example, how is that
related to sleep?
And, really, looking to see
if we can increase the piece
of the pie that is exercise,
and reduce the piece of the
pie that is sitting.
And so, it's important we
capture all 24 hours of the day.
So, that's me.
Thank you.
