So my thesis topic is designing interactive systems for community citizen science. During my talk I will first introduce the concept of community citizen science, then discuss my research questions and contributions, and then present two projects: the Shenango Channel, an air quality monitoring system, and Smell Pittsburgh. I will conclude
with design implications. Citizen science in general refers to
empowering amateurs and professionals to
form partnerships and produce scientific
knowledge, and I want to emphasize the importance of partnerships. Traditionally, when people think about science, they picture projects conducted by experts in research institutes. Citizen science is really different because it emphasizes collaboration between amateurs, meaning ordinary people or citizens, and experts. In general, citizen science
has two strands: one is more research oriented, and the other is more community oriented. I'm going to use our projects to explain this. The research-oriented strand includes, for example, Galaxy Zoo, a project for classifying a large number of galaxies online. In this type of project, one really important thing is that the research question is often defined by scientists, so it asks how ordinary people can help scientists collect a lot of data. Usually these problems are so large that scientists cannot solve them alone, so they invite citizens to participate.
The second strand is community-oriented projects, which is what I care more about in my thesis. One example is the Louisiana Bucket Brigade. That project was started by an attorney around 1995 who was working on suing a local polluter. The attorney worked with several engineers to create low-cost sensing devices, so citizens could use the devices to collect air quality samples and use them as evidence; they could take the air bags to a lab for further analysis. This type of project is really different from the research-oriented one, because now the research question is defined by citizens, and we are asking how professional scientists can help citizens. So you can see these two approaches are really different. OK, so in general we can think about citizen science from three different aspects.
The first one is values: scientific research values versus participatory democracy values, which is the distinction I just described. The second dimension is participation levels, ranging from citizens as tools to citizens as scientists. In the Galaxy Zoo project it makes more sense to treat citizens as sensors that provide labels for a large number of galaxies. On the citizens-as-scientists end, it makes sense to go further, because communities have local attachments to their concerns, so it makes more sense to treat citizens as scientists who can help design system features together with experts. The third dimension is governance structures. Research-oriented projects, because the research question is defined by scientists, are usually top-down, driven by academic institutes or governments. Community-oriented projects are more grassroots, so it makes sense to use a bottom-up or multi-party structure. OK, so my research scope, community citizen science, is the community-oriented strand that I just described.
It focuses on empowering citizens to produce scientific knowledge. One really important property of this type of problem is that it is a wicked problem. The problems community citizen science faces are usually complicated, for example air quality problems. This type of problem occurs at a really large scale and has no clear definition; the conditions in every city are really different, for example the number of pollution sources and the scale and structure of the relationships between communities and regulators. Wicked problems also depend heavily on context, which differs in every city. There is no room for trial and error: if we want to deploy a system in a city, we cannot just deploy it and see whether it works, because every deployment consumes a lot of financial resources and labor. There are also no right or wrong solutions to this kind of problem. Because of this complexity, we need to collect a lot of evidence to support and produce scientific knowledge, and that's why we need information technology. In general, empowering citizens to form scientific knowledge with information technology poses three challenges.
The first one is data quality. For example, it is really difficult to have people monitor air quality 24/7, so we need ways to automate the data collection process; also, during collection we often gather sensitive personal information, such as GPS locations, which we somehow need to handle when we analyze the data. The second challenge is communication: if we collect a lot of data from multiple sources, how do we visualize them together, and beyond just visualizing the data, how can we further extract knowledge from it to use as evidence? The third one is evaluation metrics: how do we measure the social impact after we deploy the system? So this is my main research question:
how can we design interactive systems with visualization, crowdsourcing, and AI techniques to support the life cycle of community citizen science? Now I'm going to explain this life cycle. Remember I talked about wicked problems; the wicked problem is what makes it hard to apply traditional software design principles here. This idea is actually inspired by architectural design. When architects solve problems like building a public space for a community, that is also a wicked problem. Architects use buildings to address this type of problem, and I am using interactive systems; that's the similarity. When architects design buildings or public spaces, they need to understand community needs, and after they build something it is important that they keep observing how people use the space and change their design methods accordingly, so they can feed that into another design iteration. This idea inspired me to consider the cycle. The first step is how we initiate community engagement: for initiation we can visualize data, and we can also break a large task into small tasks for crowdsourcing the data. The second step is
maintenance: after we initiate engagement, how do we maintain participation? For example, how do we automate repetitive tasks when people are using the data to collect evidence, and can we actually extract knowledge from the data and show it to people? This step is important because a lot of citizen science projects stall or die midway because they do not have enough participation. The third step is evaluation: after we deploy the system, how do we evaluate its impact? For example, we can measure behavior changes or attitude changes, and these evaluations can provide design insights or design implications that inspire another design iteration. So that's the idea of the cycle. Here are my contributions. My
contributions are to the sustainable HCI field, which studies interventions that use interactive systems to influence attitudes and behaviors toward sustainability. There are two types of contributions: one is methodological, the four systems I built in the thesis, and the other is empirical, the generalizable design implications I derive from them. These are the four systems. During this talk I will briefly mention the air quality monitoring project, the Shenango Channel, and then talk about Smell Pittsburgh in more detail.
Current research on designing interactive systems usually focuses on either human-generated or machine-generated data. These two types of data are really different, and each serves as a fragment of evidence. Human-generated data can show how residents are affected by local issues, but it is typically hard to quantify and typically noisy. For example, Ushahidi is a platform for crowdsourcing crisis information online: people can send information by SMS text messages or email, and the information is posted on the website. This type of data is really useful when the mainstream media is filled with doubts and rumors, because citizens can submit firsthand information to the platform, for example about post-election violence. Machine-generated data, on the other hand, is good at providing temporally dense information, but it fails to explain how residents experience the environment. An example is the Speck sensor, an indoor air quality sensor. You can place this type of sensor indoors, and when you cook you can watch the sensor readings; when you see the readings rising, you know you need to turn on the fan. That's how we use machine-generated data. So now I'm going
to talk about an example of integrating multiple types of data: the air quality monitoring system, which integrates images, sensor readings, and smell reports from the community. Some time ago we worked with ACCAN on solving a local pollution problem. CMU is there, and we worked with a community to the northwest of Pittsburgh. The community has suffered from air pollution problems for a long time, and a lot of the pollution comes from the coke refinery there: when it emits pollution and the wind blows toward the community, the community gets all of it. So we wanted to help them collect strong, data-driven evidence, driven by sensor readings and cameras. We helped people deploy cameras and sensors in the community; the sensors and cameras were provided by us, but the locations were chosen by community members. We wanted to present the evidence to the government and media and raise public awareness of air quality issues. These are example images.
OK, so now I'm going to give a demo of the system. Here you can see a zoomable and pannable timelapse, so you can pan and zoom. There's an area that has a lot of pollution, and we have a computer vision algorithm watching that location, so instead of having people check all the video frames, the computer now checks them. There is also a camera calendar, the smell reports we get from community members, and the sensor readings. That's the smoke detection part: the algorithm lets people quickly check which parts have smoke, so they don't need to go through all the images, because we have years of data and it would take a lot of time to go through all of it. They can also fast-forward through the footage without watching the entire video. Going further, the system actually provides smoke images, so people don't have to generate them all manually; the system provides a lot of images, and people can just choose one and put it inside a document as
evidence. OK, so this is the result. At a November 2015 meeting, we projected this on a large screen in front of the EPA and the ACHD, the Allegheny County Health Department in the Pittsburgh area, and projecting it actually influenced the attitudes of the regulators.
The idea is that community members are now able to tell a story using the data. For example, they can look at the videos to show what's happening, and they can report smells to show how the pollution affects their area. The sensor readings show that when the wind blows toward the community, the readings rise, and that's evidence of how the pollution affects the community. Then there's smoke detection for people to collect evidence. All of this forms really strong evidence that community members can combine with their personal stories. That's the coverage from the Post-Gazette; I'm quoting from there.
One of them, the acting director of EPA Region 3, actually pointed at this image and said it was totally unacceptable. So you can see the data is powerful: when people use it, they can actually change the attitudes of regulators.
OK, so the previous project was at a community scale; now I'm going to describe a project operating at a city scale, the city of Pittsburgh. We took the smell-report part and wanted to expand it to the city scale to empower all Pittsburgh citizens to submit smell reports. Just to give you an idea of how bad Pittsburgh's air quality is, the American Lung Association lists Pittsburgh among the top 10 most polluted cities by year-round particle pollution. This is a map created by Albert Presto in 2016, a black carbon map; black carbon is a component of fine-particle (PM2.5) pollution. The red areas indicate the bad parts, and as you can see there are a lot of red areas. Our goal now is to empower all Pittsburgh citizens to contribute scientific evidence, but we face different types of challenges, because the scale is now larger and there are more residents in this area. Also, in the previous project we had one known pollution source, but now we don't really know which sources cause the problem in Pittsburgh.
Moreover, currently when people report smells to the government, the process is not transparent and the data quality is doubtful. This is the current website for reporting smells: you go to the health department's website and enter the information in a text box. The problem is that asking people to report a smell retrospectively, after they experience it, causes problems, because many people may forget the exact time or the exact location of the smell event, and if this information is incorrect it introduces extra noise when we analyze the data, which can make the findings misleading. So that part is really important for data quality. It's also not transparent, because when people submit a report it goes into a black box; no one knows what happened except the people who received the report. For example, if you submit a report, it's hard to know whether the problem is at a local scale or a city scale; we don't really know how others see it. It could be just your neighbor's problem, or it could be the whole city's problem. So we
designed this system, and here is the demo. People can select a smell rating from one to five, and they can also submit the smell source, symptoms, or any comments for the Health Department. When they press the submit button, the smell report is sent to the Health Department. There is a settings page people can use to provide contact information for the Health Department, and they can also receive notifications, such as smell report alerts or air quality alerts. On the visualization you can pan and zoom; the triangle icons indicate smell reports and the circle icons indicate sensors, and you can click on the icons to see their details.
You can also animate all the smell reports. Watch the sensors on the right: when they turn red and the wind blows north, you see a lot of smell reports popping up. This visualization provides a way of knowing what happened; if we only provided a static image, it would be hard to know what happened that day. Just some numbers for comparison: in 2016 the health department collected about eight hundred smell complaints, but our system collected about a tenfold increase, roughly 8,700 reports. Moreover, among the eight hundred reports collected by the Health Department, forty-five percent had missing location information, but in our data there is no missing location information, and all the timestamps are correct. Next is the system usage study, which is something I did.
After we collected the data, I checked the server logs and the distribution of smell reports to analyze usage patterns. The study showed that user contributions were highly skewed, with most contributions coming from less than half of the users. This table shows three different types of users: one type both submits smell reports and interacts with the system, while the other two types either only submit reports or only interact with the system. The first type makes up about 47 percent of users, but they contribute about 91 percent of the smell reports, 94 percent of the content, and about 76 percent of the interaction events. I got these interaction events from Google Analytics, which is a way of tracking how people use the system, for example when you click on a smell report to see its details.
Another view is the distribution: this is a box plot where I computed the number of smell reports submitted by each user. You can see the values are heavily skewed toward zero. The red bar marks the median, the 50th percentile, so about half of the users submitted fewer than three reports. That means a small number of users contribute most of the data, which is a common pattern in citizen science projects, and we found it here too; the same thing happens for the interaction events. The second finding is that although our user base grew
over the 11 months after the soft and official launches, there was a recent decrease in engagement. Looking at the data, the x-axis indicates time: I computed the number of smell reports for each month and plotted them across the time axis. The y-axis shows the counts: the number of smell reports, the number of interaction events, and the number of unique users. These two bars indicate the soft launch and the official launch. Soft launch means we released the system only to a subset of users; official launch means we released it publicly to everyone. After the soft launch there is a bump, which is because we did a media release that month, and here you can see another huge bump. Overall our user base increased over time, but recently we found a decrease in engagement: the number of smell reports dropped, and the number of interactions dropped too.
I also did a text analysis of the data people submitted in the smell reports, and the results show that these data are highly related to industrial pollution, especially hydrogen sulfide. Here are the symptoms and the descriptions from the comments; the parts I marked in red are related to hydrogen sulfide. For example, these symptoms are associated with long-term exposure to hydrogen sulfide, and in the descriptions you can see "rotten egg smell", which is characteristic of hydrogen sulfide, as well as "sulfur" and "industrial". This finding inspired me to further investigate the effects of hydrogen sulfide on the communities. OK, so
remember when I talked about our system, there's a push notification feature; now I'm going to describe how that works. We want to send push notifications to encourage engagement and to inform users about potential smell events. For example, you can see a message like this on the phone, a smell event alert: it tells you that local weather and pollution data indicate there may be a Smell Pittsburgh event, so pay attention and keep a nose out. These are the regions we are interested in forecasting; the numbers indicate the number of smell reports, the black dot indicates CMU, which is where we are right now, and the regions correspond to zip codes.
To enable the prediction, we want to estimate a function that maps an input to an output: the input is air quality sensor data and the output is smell events. So what's the input? The input is the air quality data we get from a number of monitoring stations near Pittsburgh. These are hourly readings, because that's how the county provides them, and I include all the readings for the current hour plus the previous three hours. We've collected about 13,000 samples, where a sample is one observation, and we have about 195 predictors, where predictors are things like particle concentrations or chemicals such as H2S, or wind directions. And what's the
output? That's the "y", which I call a smell event. The way I compute it is to sum the smell report ratings over the next eight hours and check whether the value is larger than 40; if it is, I say yes, there's an event, so it's basically yes or no. To give you the proportion of yes and no: only about eight percent of samples say there is a smell event, and about 92 percent say no. In more technical terms this is called an imbalanced dataset, meaning the yes and no classes are really imbalanced. Now I'm going to describe the model, the f of X part, which is called a random forest.
The idea is that it's a collection of decision trees. We want to use the air quality data to predict smell events, so we take all the data and first divide it into different random subsets. The data includes what I just talked about, the readings provided by the county: SO2, CO, NO, PM, etc. For each random subset, we train a tree to describe it, and the trees together act like a committee running the prediction: each tree is like one person looking at only part of the data. To give you an idea of how a tree works, look at a node. For example, that expert might first inspect the H2S reading and check whether it's larger than zero; if yes, it goes one way and then checks whether the reading is larger than 0.1; if no, it goes the other way, and eventually you reach the end of the tree. It's like how doctors make decisions: when you go to a medical office, the doctor may ask whether you have a fever to determine if you have the flu. It's that kind of process, and eventually you reach the end and get an answer. Each tree gives its own label, yes we have an event or no we don't, and then all the results are aggregated to form the final answer, so it's like voting.
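To make the pipeline concrete, here is a rough sketch of the setup just described: lagged hourly predictors, a label that is 1 when the smell-report ratings summed over the next eight hours exceed 40, and a random forest whose trees vote. The column names and the synthetic readings are hypothetical stand-ins, not the real county data:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500  # hourly samples (the real dataset has about 13,000)

# Hypothetical hourly readings, stand-ins for the county monitors.
sensors = pd.DataFrame({
    "H2S": rng.random(n),
    "PM25": rng.random(n),
    "wind_dir": rng.random(n) * 360,
})
# Hypothetical sum of smell-report ratings submitted in each hour.
ratings = pd.Series(rng.poisson(4, n))

# Predictors: the current reading plus the previous three hours.
X = pd.concat(
    [sensors.shift(lag).add_suffix(f"_lag{lag}") for lag in range(4)],
    axis=1,
)

# Label: 1 if ratings summed over the NEXT eight hours exceed 40.
future_sum = pd.Series(
    [ratings.iloc[t + 1 : t + 9].sum() if t + 8 < n else np.nan
     for t in range(n)]
)
y = (future_sum > 40).astype(int)

# Keep rows where both the lags and the eight-hour lookahead exist,
# then fit the forest: each tree sees a random subset and they vote.
mask = X.notna().all(axis=1) & future_sum.notna()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X[mask], y[mask])
print("positive rate:", round(y[mask].mean(), 3))
```

This is only a shape-of-the-data sketch; the real system uses about 195 predictors from multiple monitoring stations rather than these three made-up columns.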
This is the distribution of smell reports over time: the x-axis is the hour of day from 0 to 23, and the y-axis is the day of week. We only predict events during the daytime, because as you can see there are not really many events at nighttime, when people are sleeping. When we do the eight-hour prediction, we issue predictions from 5 a.m. to 11 a.m., which covers the time from 5 a.m. to 7 p.m., and the reports are concentrated around 9 a.m. So now I'm
going to describe how we evaluate the model. The model is evaluated by first computing true positives, false positives, and false negatives, and I'm going to describe how these work. The top line is the crowdsourced events, the ground truth we get from the data, and the bottom line is the prediction. The x-axis is time, for example from 5 a.m. to 7 p.m., and the y-axis indicates no event or yes, there's an event. If we predict an event and there actually is one, we say it's a true positive: we said yes, and yes, something happened. If we predict an event but there is actually no event, it's a false positive: we said yes but the truth is no. And if we miss an event, that's a false negative, because we didn't predict anything for it.
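This event-level counting can be sketched with hypothetical (start, end) intervals; a predicted event counts as a true positive if it overlaps a real event at any point:

```python
def overlaps(a, b):
    """True if half-open intervals (start, end) share any time."""
    return a[0] < b[1] and b[0] < a[1]

def event_metrics(true_events, pred_events):
    # True positive: a predicted event that overlaps some true event.
    tp = sum(any(overlaps(p, t) for t in true_events) for p in pred_events)
    # False positive: a predicted event overlapping no true event.
    fp = len(pred_events) - tp
    # False negative: a true event that no prediction overlapped.
    fn = sum(not any(overlaps(t, p) for p in pred_events) for t in true_events)
    return tp, fp, fn

# Hypothetical timelines: events as (start, end) hours.
true_events = [(6, 9), (13, 16)]
pred_events = [(8, 10), (17, 18)]
print(event_metrics(true_events, pred_events))  # → (1, 1, 1)
```

Here the 8-to-10 prediction overlaps the 6-to-9 event (one hit), the 17-to-18 prediction matches nothing (one false alarm), and the 13-to-16 event was never predicted (one miss).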
So those are the counts we first compute for the performance. After that, we compute precision and recall. To give an intuitive explanation: precision means, when our system says yes, how likely is it actually a yes? We can make false predictions, saying yes when the answer is no; if precision is really high, our predictions are pretty accurate when we make them. Recall measures whether we missed a lot of events or not; if recall is really high, we didn't miss many events. There's usually a balance between these two metrics. At one extreme, if precision is really high but recall is really low, it means we only make predictions when the sky is completely dark or smoky or hazy, and that's not good because we are too conservative. If recall is really high but precision is really low, we are predicting events all the time, sending people lots of notifications saying yes, there is a problem, but many of them could be wrong. We want to balance the two extremes and find something in the middle, and that's the F-score, a combination of these two metrics, which I use to measure performance. (Audience question: just a quick clarification, you're not dealing with the duration of the event, so you still count it as correct if you predict an event at any time overlapping the true event?) There's more about that on the last slide, but yes: the two on the left are a true positive because the predicted event overlapped the real one, even though the durations obviously didn't match. The reason I do it this way is that I think it's more reflective of the real situation: when an event happens, if we can predict it at some point, then I'd say that's good enough.
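Precision, recall, and the F-score that balances them follow directly from those counts; a minimal sketch with made-up numbers:

```python
def precision_recall_f1(tp, fp, fn):
    # When we say yes, how often is it actually a yes?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # How many of the real events did we catch?
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of the two: high only when both are high.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# An "always yes" baseline never misses events (recall = 1)
# but piles up false positives, so precision and F suffer.
print(precision_recall_f1(tp=10, fp=40, fn=0))  # precision 0.2, recall 1.0, F ≈ 0.33
print(precision_recall_f1(tp=8, fp=1, fn=2))
```

Note how the first, always-yes-style case lands near the 0.33 F-score the talk reports for its baseline, even with perfect recall.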
OK, now I'm going to explain how I evaluate the model on this dataset. The method I'm using is cross-validation: the idea is to separate the entire dataset into a training set and a testing set; the training set is used to train the model, and the model is evaluated on the testing set. We have 79 weeks of data in total. I take the first 48 weeks to train the model, then test it on the following week of data, and I keep iterating this 31 times; I then run the entire experiment 100 times. This is the final result. The baseline I'm using is called "always yes": it makes a prediction by always telling you there is a problem, every day. What does that do to the metrics? If we always say yes, we never miss any events, so the recall is one, which looks pretty good, but the precision is really bad because we are giving a lot of false predictions. You can see the combined measure is really bad, about 0.33 (the minimum of that measure is zero and the maximum is one). Our model performs a lot better: we can reach a precision of up to 0.88, and our F-score is 0.76. We still miss a lot of events that the model is unable to pick up, because crowdsourced data actually contains a lot of noise.
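The week-by-week evaluation above can be sketched as follows. Whether the 48-week training window slides forward or grows is a detail I'm assuming here; this sketch slides a fixed window forward one week at a time, which yields exactly the 31 iterations mentioned:

```python
import numpy as np

def rolling_week_splits(n_weeks=79, train_weeks=48):
    """Yield (train, test) week indices: train on 48 consecutive weeks,
    test on the single week that follows, then slide forward one week."""
    for start in range(n_weeks - train_weeks):  # 79 - 48 = 31 iterations
        train = np.arange(start, start + train_weeks)
        test = np.array([start + train_weeks])
        yield train, test

splits = list(rolling_week_splits())
print(len(splits))  # → 31
```

Keeping the test week strictly after the training weeks matters for time-series data like this: a random shuffle would leak future readings into training.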
OK, so next, beyond just providing a prediction, I'm trying to understand what's happening in the data: the idea is to understand which environmental factors matter the most and what their joint effects are. Remember, these are the input and output of the data. Before, we used a model I call a black-box model, because it's pretty hard to analyze; we don't really know what happens inside it, and it's hard to understand how it makes decisions. So instead I take only a part of the data and train a white-box model, meaning a model we can easily analyze. For this part I use knowledge I got from community members: I talked to several people in the community and learned that VOCs, volatile organic compounds, have an effect on smell events, and from the text analysis I found that hydrogen sulfide is really important because people keep mentioning it, so I selected those predictors. For the samples, remember there are 8% "yes" and 92% "no"; I use a method to select a group of positive samples that are similar to each other and use those in my explanation. The details of how I selected these samples and predictors are in my document; here I'm just giving the high-level concept, and I'll jump to the conclusion immediately. My model is able to explain about half of the smell events through the joint effect of wind direction and hydrogen sulfide. I'm going to talk about the patterns right now. So this
is the pattern we see when we look at the model. The model is a decision tree, which we can trace from top to bottom. The first node, this one, is the Parkway East wind direction in the north-south direction multiplied by the Liberty monitor reading from two hours ago; being the first node usually means it is the most important one. The second node is the interaction effect of the Lawrenceville monitoring station wind direction, in the east-west direction, multiplied by the H2S reading.
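These interaction predictors (wind-direction components multiplied by H2S readings) and the analyzable white-box tree can be sketched roughly like this. The feature names, synthetic readings, and labels below are all hypothetical, not the real monitor data:

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 400

# Hypothetical raw features: an H2S reading lagged two hours
# and a wind direction in degrees.
h2s_2h = rng.random(n)
wind_deg = rng.random(n) * 360

# Decompose wind into north-south / east-west components, then
# form the interaction terms the tree is meant to surface.
wind_ns = np.cos(np.radians(wind_deg))
wind_ew = np.sin(np.radians(wind_deg))
X = pd.DataFrame({
    "wind_ns*H2S_2h": wind_ns * h2s_2h,
    "wind_ew*H2S_2h": wind_ew * h2s_2h,
})

# Hypothetical labels so the sketch runs end to end.
y = (X["wind_ns*H2S_2h"] > 0.3).astype(int)

# A shallow tree is the "white box": its splits can be read directly.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```

The point of capping the depth is exactly the traceability described above: a three-level tree can be printed and read node by node, unlike the full random forest.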
I'm only plotting three levels of the tree; there is a lot more, but after that the nodes are basically just combinations of wind directions. So this forms the pattern: if there is an emission, if that monitoring station is detecting some H2S, and the wind somehow matches the pattern, then the air could be trapped in the Pittsburgh area, and then we get a lot of smell reports. That's basically the pattern I was able to find in the data, but I can only explain about half of the smell events; the other half could be noise, like wood smoke or something else. After I deployed the system, I did a study to understand its impact. It's a survey
study; the survey tool was developed by the Cornell Lab of Ornithology and is a validated instrument. We measured two things: one is self-efficacy and the other is motivation. Self-efficacy means whether the community is confident in the goals they can achieve. Here is the result: we found an increase in self-efficacy of about one point on the Likert scale. The other is motivation: we asked internal-motivation and external-motivation questions and found that motivations are mainly driven by internal factors. External factors are things like rewards or fame that people want to get; internal motivations are more like wanting to contribute data because they think it is good. To understand the motivations further, we looked at the open-response fields, and we found that about 36% of participants mentioned that the system allowed them to easily collect data and contribute scientific evidence. For example, they feel they can contribute data, and they no longer have to call the health department because they can easily report data using the system.
About six participants mentioned altruism as their main motivation, which is concern for the welfare of others. For example, one participant commented that they demonstrate the system to other people to raise their awareness, because a lot of people have lived in the polluted area for a long time, so to them the pollution seems normal; they don't really sense it anymore, and that's dangerous. The second point is about building momentum in the air quality community: new activists can use the system and see that a lot of other people are also concerned about air quality, so they are not alone. About four
participants, which is 16% of them, mentioned that the tool is good for validating their personal experience. For example, they can now use the tool and see that they are not alone; it validates and confirms what they are seeing, because after submitting a report they can check the app and see whether other people experience the same problem. About 8% of participants used personal resources to help promote the system: for example, one participant ran Google AdWords, and another printed screenshots and brought them to public meetings to get the attention of regulators. Okay,
so now I'm going to sum up with the design implications. There are three design implications here. The first one is that I think it is important to treat community members as co-designers, like I mentioned before, not as tools or sensors. I also want to emphasize these two directions: citizens as scientists is one, but I want to emphasize the importance of scientists as citizens; we are also part of the people experiencing the problem, so it's important to think about it this way. It's also important to ask what the community really needs, not what the researchers think the community needs; these two are really different. So when we design the system, it is important to attend community meetings, talk to people, and understand what people really need. For example, in the Shenango Channel project, which is the air quality monitoring system, we actually had several people in the lab attend the community meetings, show them how the system works, and develop the features with them that suit their needs.
In Smell Pittsburgh, the smell ratings were actually designed together with the community, because the community knows exactly what each of these ratings means, so they were able to come up with them. Also, to identify the joint effect of hydrogen sulfide and wind direction, I actually used community knowledge; this is knowledge I got from meeting with people in informal meetings.
The second design implication is about contextualizing scientific evidence, which means forming the context by using evidence from different perspectives. It's important to integrate both types of data, human-generated and machine-generated. For example, in the air quality monitoring system, I talked about how I integrated sensors, reports, and videos, so community members can use it to tell stories.
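One way to contextualize evidence like this is to join human-generated reports with machine-generated readings on a shared timestamp; a minimal sketch with made-up records (the field names and values are assumptions, not the actual data schema):

```python
from collections import defaultdict

# Hypothetical records: smell reports (human-generated) and sensor
# readings (machine-generated), both keyed by the hour they were recorded.
reports = [
    {"hour": "2019-06-01T08", "rating": 4, "text": "industrial smell"},
    {"hour": "2019-06-01T08", "rating": 5, "text": "rotten egg odor"},
    {"hour": "2019-06-01T09", "rating": 3, "text": "smoky"},
]
sensors = [
    {"hour": "2019-06-01T08", "h2s_ppb": 6.2},
    {"hour": "2019-06-01T09", "h2s_ppb": 1.1},
]

# Group the reports by hour, then attach the matching sensor reading,
# so each hour carries both kinds of evidence side by side.
by_hour = defaultdict(list)
for r in reports:
    by_hour[r["hour"]].append(r)

combined = [
    {"hour": s["hour"], "h2s_ppb": s["h2s_ppb"], "reports": by_hour[s["hour"]]}
    for s in sensors
]
print(len(combined[0]["reports"]))  # 2 reports line up with the 08:00 reading
```

Pairing the two streams this way lets a viewer see, for a given hour, both what the sensors measured and what residents reported.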
Next, in the Smell Pittsburgh project, I talked about not just performing prediction but also understanding the structure of the data and the implications inside it; that's what I did for interpreting the patterns of hydrogen sulfide and wind direction. We can also use computer vision or other techniques to automate tasks when people are collecting evidence: in the air quality monitoring system, it's really difficult, maybe not impossible, to collect all the evidence by looking at all the images, so we can automate this and speed it up. The third design implication
is about evaluating the impact of the system. It is important to think about "is the system influential" rather than "is the system useful". I am not saying usefulness is unimportant, but in this thesis I care more about "is the system influential". When people talk about evaluating an interactive system, some refer to usability, which means whether the system is effective in helping people complete a task; for example, how much time you need to generate an image could be a measurement of usability. Right now I care more about impact: does the system actually affect the community in some way? It turns out this is difficult to evaluate, because if we want to claim that deploying the system influences the community, we are making a causal statement, and that requires experiments. We would have to run a randomized experiment, assigning at random one group of people that has access to the technology and another group that has no access; this would cost a lot of resources, and it could also be unethical. So instead I'm asking whether the community thinks the system is influential, or how the system can be influential; that's why I did all the studies after deploying the system. In the air quality monitoring
project, I found that manual and automatic approaches are both important, so I won't recommend automating everything, because it is important for people to actually participate in the project and use the system to generate images, to see the complexity of the problem; later on we can automate parts of it for them. The second finding, from the survey study of Smell Pittsburgh, is that the system allows people to contribute data easily and to validate their own experience, and that altruism matters as a motivation; these findings are important for the next design iteration, when we design new system features. So thank you for coming. I would like to thank my committee members and all the lab members, and also ACCAN, and Ryan from the GCC, who helped me a lot in writing the thesis, and also my wife. Thank you.
