First off, thanks everybody for joining us today. We have an exciting topic. It's one of my favorite topics of all time: data privacy and data ethics, and we're focusing on this specifically in the time of coronavirus.
This series started a couple weeks ago, when we wanted to take Tech Salons, which are normally in-person events, and see how we could move them to an online space while keeping as much of the quality of a face-to-face interaction as possible. We've been doing Tech Salons for about 10 years in different places around the world: DC, New York City, San Francisco, Bangkok, London, Helsinki, Osaka. We have some folks
here today that have been working on
other city Salons, which is great. We're really focused on bringing together the tech sector and the development, humanitarian, and aid sector to try to figure out how we can do a better job working together, take advantage of the skills that each sector has, and also take advantage of all the people right at the center who work in both of those areas. The salon idea is really an informal conversation, so have breakfast or lunch with us, or, for people in the UK who might be ready for happy hour, feel free to just be comfortable and be yourselves on the call. I wanted to introduce the organizers. You've heard Andy, who's been doing a lot of the backend work, and then Joy, Allana, Aly, and I have also been organizing the series.
These are the folks who are currently
supporting this series of salons. We have
DIAL, and the Principles for Digital
Development, Pivotal Act, ThoughtWorks,
GitHub, and The Engine Room. Big thanks to
all of these folks for helping us put these on, because it seems it's actually just as much work to do an online event as an offline one. So lots of time
and effort put in. So thanks to
everybody there. The first event we did
was really a big picture event around
the tech sector overall and what the
kinds of responses might be from the
tech sector in response to COVID, and
then last week Lizzie led a really great salon with speakers from Uganda, Pakistan, and Ghana talking about what life is like for local organizations that are preparing and getting ready for COVID in their own environments and contexts. Today, as
I mentioned we're going to talk about
data ethics and privacy. Next salon on
May 14th, is going to be around how to best volunteer your tech skills during the
COVID crisis and then we're looking at
a last salon probably around the end of
May that's going to be around what the
corporate and foundation response could
be to the COVID crisis. After that, we're thinking about doing one focused on monitoring and evaluation and how to do that type of work remotely, because people can't be in face-to-face contact with each other anymore. This is our topic today, and
this first part of the salon is being
recorded and then we will post that
along with a report on the Tech Salon
website so that folks who couldn't be
here today can watch the recording.
If you want to tweet and share about
this first part that's absolutely fine
you can use either of those hashtags. But for the second hour we're going to go into moderated, off-the-record discussions, because again we're trying to replicate the idea of these more informal discussions. We have so many people here today who have amazing amounts of expertise on this topic, and I want to make sure that those folks also get a chance to chat with and meet each other here at the salon. So that part of the salon is off the record. It's not going to be recorded, and we will shut off the ability to record, so it is really, really off the record. We would just ask that you don't
share what happens in that second part
you can talk about what you learned but
don't attribute names or organizations
without permission. Our general rules of engagement at salons are really just to get people together to share challenges and learning. We also ask you to bring yourself and your open mind. This
isn't a space for pitching. If you have a
great project that you're trying to get
funding for that's awesome, but this
isn't the place to talk about it and to
pitch it. This is really a space for
learning. We also ask for speakers as
well as participants to try to limit the
use of acronyms because we have people
here across various sectors and we want
to make sure that everyone can
understand each other. We'll go ahead
and get started
As I mentioned we have five speakers today. We have Zara Rahman who is the Deputy Director at
The Engine Room. Zara also runs the
responsible data list which if you're
not part of that list I encourage you to
join it, because there's great discussion on the list around this topic of responsibility and ethics with data. Then
we have Sean who's the co-founder of
Digital Public and the CEO of
FrontlineSMS who's got quite a lot of
experience around data governance and
pretty strong opinions about contact
tracing and some of the downstream
effects of some of these new suggested
technology solutions that are being
proposed for the COVID time. Reema Patel
comes from the Ada Lovelace Institute
and Ada Lovelace has just put out a really
interesting report about the privacy
concerns and ethics concerns on some of
these new proposed technology solutions
for COVID. Then we have Tracey, who is currently independent; she's going to a new job soon, I understand, and she was at DataKind UK before. Tracey is going to talk a little bit
about the use of data and artificial
intelligence and some of the challenges
with those types of approaches in these
times. Then Amanda Makulec is at
Excella and also from the Data
Visualization Society. She's going to
talk about some of the ways to make sure that you're doing responsible data visualization and not encouraging bad decisions with visualizations that are irresponsible, inaccurate, or poorly explained. So
having said all that, I'm going to pass over
to Zara. Thank you so much Linda, and thanks everyone for coming. I know there's a lot of competition right now with online events and webinars, and I hope you're all managing to take care of yourselves. I guess I'll
just do a brief introduction to a
bunch of issues that I hope that we can
kind of dive into in more detail on the
breakout rooms. These are drawn from
lots of discussions on the responsible
data list as well as you know things
that have been happening around the
place. So I'll dive right in. To start off with, one big challenge that I see is this big rush for more new data, almost as a silver bullet for everything that's going on. There are obviously a lot of people in leadership positions, a lot of governments, really scrambling to know what to do, and many of them are seeing more data as the answer. I think the big problem with that is that they don't quite know what they need, they don't quite know what to do with it when they get it, and in the bigger picture they don't know what question to ask when they try to design or gather data. I'm
sure Sean's going to go into this in more detail, but for example with contact tracing, there's evidence to suggest that using people to do contact tracing has worked, but relatively little to suggest that using data and smartphones will necessarily meet the goal of reducing the spread of COVID. There's also a lot more to be said about the privacy implications of those apps, which I'm sure Sean will go into in more detail. Secondly, I think there's a big
lack of transparency and accountability
in public-private partnerships that are
being made. As I said, people in leadership positions are scrambling to know what to do and where to go for answers. It feels like scrutiny is almost at an all-time low, when it should be the opposite, just because of the crisis situation that we're in. There have been so many interesting writings about how transparency really needs to be at an all-time high, but for that to happen it needs people in those positions of leadership to be opening themselves up to that transparency and accountability, and we've not been seeing much of that. For example, Palantir is partnering with the National Health Service in the UK for the first time. I feel like in a normal situation that would raise a lot more questions or attention than it has right now, just because everyone's at capacity and there's so much going on. Thirdly, and this
is a really big but simple one is
increased surveillance. More data means more surveillance by someone depending
on how the data is being gathered. There have obviously been efforts to take more decentralized approaches that would build in more controls and access restrictions, but there are also countries taking a much more centralized approach, even approaches where intelligence agencies are working directly with people in public health to build apps or to gather data, which is deeply concerning. There are also more indirect
consequences aside from the really kind
of obvious ones of people trying to
gather data to map or to reduce the
spread of COVID such as many places are
moving to more digital payments instead
of cash. Generally, a lot more is happening in digital spaces instead of offline, and I guess we'll continue to see the side effects or less direct consequences, but generally, as we can tell from being at a digital event, there's lots more happening in the digital space.
Then, just thinking long term, even though that is a little bit tricky right now: the long-term impact of fast-moving, really quickly made decisions. If new sets of data are being gathered, what will happen to that data after this is over, if it's over, depending on what it being over means? In the partnerships, what does that mean for apps that are being built together? What kind of access has been granted now in order to make those things happen, and how will that change afterwards? Generally, it
seems that there are multiple purposes for the data being gathered, whether that's people's locations or other things that I'm sure intelligence agencies would love to get their hands on. There seems to be a real lack of focus on what the specific purpose is, and a lack of commitments that we can hold them to about what those specific purposes would be. Then one other
concern is an over-focus on the end results, the shiny data visualizations or the big-picture numbers, instead of on the process. I think we'll hear more about data visualization later in this call. Of course, methods matter just as much as the output. How the data is being gathered makes a huge difference to what those numbers look like. We've seen that
with different countries' approaches to collecting and counting. For example, in the UK we've not been collecting the number of people who've been dying in care homes, whereas in Belgium they've been counting people who might have died of COVID even when it's not confirmed. That makes those data sets far harder to compare. It just requires a lot more attention on process that I think a lot of people just aren't paying right now. There's a really good book, Data Feminism, that was published recently, which really goes into the power dynamics and the more behind-the-scenes aspects of gathering data, and which I would really recommend as well. Then, I guess,
I don't know if this is quite the audience for it, but I wonder a lot about all this focus on data. There are a lot of bad leadership decisions happening that go directly against the evidence that's being gathered. It makes me wonder whether harder data or more facts and numbers will actually help. That's not to say we shouldn't be trying to gather it, but what are we trying to do? Will we be trying to hold power accountable for those poor decisions, perhaps after or during the fact, or do we expect that the data will actually inform what's happening right now? There's been
a lot of sociological research into how facts don't change minds or behavior; it's other things around them that do. I think it's important to always put that data in context and not forget the bigger goal of why we're using that data and how we're trying to use it to inform people.
With that, I'll hand back over to Linda. Thanks Zara, that was really, really interesting and super useful. I love the links and resources that you shared as well. Folks, I've shared those in the chat if you're interested in getting links to any of those books or to the responsible data forum that The Engine Room runs. I'm going to hand off to
Sean. So I thought I'd start by taking one step back and thinking through the context and environment that all of this is happening in. Currently, 84 separate countries have declared some form of emergency or have suspended basic rights in some way. A number of other countries have done that informally, or have encouraged managed lockdowns.
Essentially, we're seeing quite a lot of lockdowns start to become a law enforcement issue. We're also seeing that a lot of what starts as medical advice can often become direct physical violence and conflict in some places, and it has already resulted in death in a number of places around the world. So I think that the
thing that I really want to focus on here, really drawing on Zara's point, is the importance of process, infrastructure, and context in the way that we do these things, in the way that we intervene with tech and the way
that we intervene with data. One of the really interesting observations, I think, is that we see the real boundaries of fields here. So much of the commentary is from epidemiologists, or computer security experts, or statisticians, people who are very good at algorithmic modeling, or people who are very good at deploying tech during emergencies. All of these fields need to be talking together right now. So there are these real-time translation issues, in that each of these fields has its own approach to making sure that the things it produces are moral and ethical and fit for purpose and actually good: at their core, validated approaches to something. And
right now, both because those approaches are often very disconnected, and also because the public authorities that would normally be focusing on scrutiny here are instead repurposed or rededicated to focusing on the response, we have this real set of institutional gaps where we're not seeing the kind of quality control, the kind of due diligence, and the kind of basic oversight, both from an ethical perspective and also from a quality control perspective, that you would hope for.
This happens quite often during
disasters. A few years ago I got really
concerned about this with Frontline and
started looking into this work
around Ebola. Essentially, the main problem with Ebola was that proximity was not a very good indicator of transmission. So when people were talking about using mobile phone data to track proximity and to track contacts, it was a really weak signal.
Here we have that same problem where
proximity alone is not a terribly good
indicator of transmission. So a lot
of the apps, a lot of the technologies
that we're hearing about are sort of
trying to reverse engineer the
significance of proximity data more than
they are necessarily trying to solve an
important healthcare system problem. A lot of what we see in these systems and approaches is trying to learn new things, trying to solve the unsolved. So we see a lot of really interesting and valuable experimentation, but I think a lot of the challenges arise out of that experimentation not having a barrier to market, being able to move really fluidly from lab to application in the kind of chaos that Zara has described. In that chaos, I think one looks to every kind of tool you can use to try to arrange and secure relationships. So I
think a lot of what we're seeing, in addition to large app deployments and large conversations about the use of data, is this move to figuring out what accountability might look like in process: how do we rebuild some of those quality control and, ultimately, accountability guardrails? I think that's where we'll see a lot of this play out. One place that I would
really encourage this community to look is in the contracting, in the way that we secure and formalize those relationships, because there really is both a lot of good opportunity there and a lot of very concerning open-ended pieces. So it's a real opportunity to engage with data rights and responsibility right now, in real time, and to make a positive difference in terms of what kinds of proposals we're considering from technologists and from public health responses, as well as how we might hope to architect accountability when we get to whatever end state we get to where those things feel possible. I would just offer
that some of you may have read a piece
called Distracted by Data by Elizabeth
Renieris, which is just a really
wonderful short piece that I'd recommend.
Ultimately, the point is that these are all power relationships. When we're talking about data, we're talking about sharing both the information that we all know to be contained in it and, to an extent, the social license to act on that data. The way that we as a responsible data community talk about apps, and the way that we talk about how we use data, also has an impact on what everybody thinks is acceptable and good practice. So there is this huge
amount of opportunity as well to start taking the statements of principle that many of the organizations on this call and in this community have worked very hard on, turning them into the kinds of operational practices that actually have teeth in supply chains, and building out some of these data governance frameworks. Just to say that there's a lot of daunting technology out there, but ultimately the power relationships aren't all that complicated. There are a lot of ways that we can intervene, both by strategically negotiating and renegotiating some of the terms of those relationships, and also by engaging with communities as capacity support for the institutions that are trying to deal with this in real time. So with that, I'll stop.
Great, thank you so much. We're going to pass over to Reema now to tell us a little bit about some of the research that Ada Lovelace has just put together on some of these specific apps, a nice feed-in from what Sean was just talking about.
Thank you. I think I too want to take a bit of a step back and walk you through some of the reasons why we published the research and why we did this work. We recently published a report called Exit Through the App Store. We were looking at three key digital technologies that looked like they were going to be quite important subjects of debate during this time. One was tracking, another was digital contact tracing, and the other, which we're having less of a conversation about but which is really interesting, is immunity certification.
The reason we felt that we wanted to
have a look at these technologies is
because it became immediately apparent
that the conversation wasn't really
about technology, the conversation was
about the implication for rights and the
implication for our society and our
democracy, should we turn to the
technology. And the work that we're doing
looks at these issues through the lenses
of questions such as purpose limitations.
So actually, with the technology, is it clear what the purpose is, and what the limitation to that purpose is? Is it clear what data is being shared or accessed or controlled, or not? I think
related to this, there's a really important point about mission and scope creep. The rate of change in many of these technologies means that you could start off with something that had relatively decent legitimation, for instance through people voluntarily accessing it, but that creeps into something that lacks a clear remit or functionality, that has escalated out of control or had some mission creep associated with the app. So you could start off with something that people voluntarily accessed or downloaded by way of consent, and end up with something much more problematic that you are mandatorily required to access or be part of. And that's before you even get into the issues around accuracy and quality.
Then there's the point that Sean made very well, I thought, about whether the app is even addressing the problem or the challenge. In a public health crisis, if one deploys technology without really thinking about whether it addresses the challenge, it feels like there hasn't really been a societal debate or conversation about why a particular app is being deployed, other than that the app was introduced as a kind of knee-jerk, panic-induced reaction. Zara mentioned this
wider point, which draws upon some of the other research we've been doing on the foundations of fairness for health data sharing, particularly when private sector organizations are involved. The Palantir and NHS example is a really good one: there's a real risk of the power asymmetry that already exists between the private sector and the public sector being exacerbated, being worsened, in this case because the NHS is
incredibly overstretched particularly at
this time. We're pretty concerned about
that and we ran some citizen juries in
order to support the government to
understand a bit better what the public's privacy concerns about that were. Off the back of that research report, which was published before we went into lockdown, we called on the UK government to be much more transparent about the nature of these partnerships, and to enable public engagement and participation on the nature of each partnership. We've also called on the government to tighten up its auditing and scrutiny of these partnerships. These concerns we still have, and we think that what's happening at this moment in time with the Palantir and NHS initiative illustrates an ongoing problem there. I think there are
a few aspects around digital contact tracing in particular that are challenging.
One is that many nation states have been requiring mandatory use of contact tracing, and there are also very particular ramifications for nudging and behavior control. This is where we move out of the remit of wanting to use data in order to understand the scale of a crisis, or to essentially conduct surveillance to understand the nature of the disease, towards surveillance that is about shaping individuals' behavior. I think this speaks to a
wider societal question, which is: what is the implication of COVID-19 in terms of the type of democracy we want to create and build, and how should government legitimately behave, through technology offering remediation or enforcement, in relation to people? Should government be thinking about using technologies to shape the way people behave? One thing that we called for
through the report was the point that immunity certification lacks an evidence base for its introduction, for various reasons. Through the rapid review, we identified very limited, almost non-existent, evidence that either digital contact tracing or immunity certification actually works. The WHO have actually put out an official policy position confirming that immunity certification in particular doesn't clearly work, because it's not clear that if you are identified as being immune now, you would still be immune later on. So one could end up being certified as immune, and then have some real problems and recontract the disease later on down the line. So this goes
back to this question: is the technology, or the technological development, being introduced for the right reasons and the right purposes, or are there other pressures and conditions in the wider social and political landscape shaping government's consideration of the decision? So in a nutshell, I think the questions about privacy and data ethics are less about the technologies directly and more about the type of society we want to build, and the conversations that we're having about digital tracking, digital contact tracing, and immunity certification are actually about what we understand from the evidence base and what we choose to make our decisions on. Are we making our decisions on the evidence base, or are we actually making them in a pressured climate, under political pressure? If anyone wants to see the precise recommendations we made around the technology, go to our website and have a look at the report. But I think that ultimately that's the crux of the issue. Thank you.
Thank you, Reema. Yeah, it's never only just the technology; it's always all these issues of power and all these different imbalances, so I appreciate that take on it. We're going to hand it over to Tracey. Tracey, over to you. Thank you, Linda, and thanks for
the invitation to speak today, and share
the platform with so many people in the
civil society space that I admire.
There's just so much to discuss about how to uphold data ethics and privacy during COVID-19 times. Today I just want to focus on the fundamentals, following the theme of what Sean and everyone else has said; maybe the theme is the need to reflect, step back, and think about fundamentals, so I'm going to carry that on. I'm going to discuss two things, really. First, that strange thing of how sometimes we seem to overlook what the problem is, so let's talk a little bit about that. Secondly, the need to collect data to fully understand whether what we're proposing to do is actually working, so the importance of monitoring and evaluation within the tech space. So I guess first off, what is the problem?
Frequently in the data science and AI world, which is where I've come from after recently leaving DataKind UK, I hear about a technological product or service without hearing a clear definition of the problem the service is trying to fix. It's just classic tech solutionism, and my main contention is that if we don't fully understand the problem and the key stakeholders it affects, then any proposed solution is bound to fail and can increase harm to stakeholders who are not in positions of power and influence. Now this shouldn't
be controversial, and I'm sure you'd agree it's common sense, but the allure of data science and AI can be intoxicating, especially for some social organizations who want to appear innovative, in part because they associate innovation with funding. It really challenges organizations to reflect upon their values and to weigh up their integrity versus their sustainability.
So how should we go about defining the problem? Well, we do have to consult, and we need to do thorough research; Sean's already discussed some of the issues about how different sectors have different ways of speaking. So there's definitely more that we need to learn, perhaps from the design sector and from community leaders, about how to bring disparate groups together. One
thing that I've seen and am rather irritated about is that in the UK we have a government that continually states that it is following the science, as if science is entirely neutral and devoid of human assumptions and choices. It does provide a useful narrative, granted, in that we can avoid lots of the gnarly ethical decisions by blaming this neutral science, but pandemics and catastrophes are inherently gnarly.
Choices have to be made within a world of imperfect knowledge; that's why transparency and accountability are so key, and I really welcome all the civil society actors who are here today and who are continually and consistently working to explain and clarify the problems we are facing and pushing for stronger tech and data governance. This, I think, would reduce some of the potency that some of these
technology products have. Take, for example, the recent story of Clearview AI being in talks with federal and state agencies in the U.S. to track COVID-19 using facial recognition. What problem does this really solve? It has massive issues of efficacy.
Not to mention privacy, transparency,
accountability. It's not a realistic
solution within the pandemic we face. We
need our COVID-19 responses to be open,
fair, and transparent. This not only helps
society to hold organizations to account,
but it also fosters trust. This also
applies to charity funders. For example, a recent but small-scale study has estimated that due to COVID-19, 9 out of 10 Black, Asian, and ethnic minority led charities in the UK are in danger of closing within three months. By publishing open data on grant funding, we should be able to assess the flow of funding to the charity sector and identify where the gaps are. The charity 360Giving in the UK, which advocates for this data, has supported funders to tag their grants data with COVID-19, and some charity funders are using it as a basis to form collaborations. So in focusing on the
problem, we need to discuss what the
potential consequences will be for
everyone especially the most
marginalized in our society, and use this
information when deciding on the various
actions we should take. Now this virus
has been said by many to be a great
equalizer and whilst it certainly can
affect anyone we must recognize that it
will affect some people more than others.
Opportunities to live healthy lives are
not equally distributed. We know poorer
communities experience worse health
outcomes than affluent ones and
I'm not surprised that early data from the UK and US suggests disproportionate negative effects of COVID-19 on ethnic minorities, who are largely in low-paid jobs, providing essential services and facing structural inequalities. Of course we cannot ignore the impact on women, who make up a disproportionate amount of the health and social care workforce required to care for the public, who often shoulder the core care responsibilities at home and in the community, and who carry the frustrations of male partners, indicated by the increase in domestic violence incidents. We must not neglect that
all these various demographic classifications aren't necessarily independent. If we don't center the social determinants of health and the groups who are likely to be impacted the most, I fear that our technological solutions will increase inequality. This isn't easy, I'm not saying
it is, I've laid out a lot of problems.
Those who are at the front of this
battle they may well be so stretched
that they may lack the capacity and the
state of mental health to take up the
invitation to discuss the implications
of various technologies. This
shouldn't be an excuse not to ask, nor to avoid suggesting ways they can be included. Inclusion must stretch
across the whole tech development
pipeline. We must collect, monitor, and
analyze demographic data: gender,
ethnicity, disability, and so on. This has to be done under strict codes of data management and governance, because if we're not collecting this data, both qualitative and quantitative, we will not understand how these groups are faring with COVID-19. We won't know if
interventions are ethically just; I mean that in the sense that the benefits and risks of the solution are distributed fairly. While this again seems to be common sense, throughout my career I've been frustrated by culture and politics. So, I'm going to take an
example from the UK legal sector, which, due to COVID-19, is ramping up its already planned digital services, including moving in-person court hearings to remote video hearings. Now the Equality and Human Rights Commission has examined this and
in their recent interim report found
that video hearings can significantly
impede communication and understanding
for disabled people with certain impairments, such as learning disabilities, autism spectrum disorders,
and mental health conditions. It has also expressed concern about the lack of data currently available and is encouraging the government to begin data collection. So
the move to digital may be affecting disabled people's human right of access to justice. While there is a
public sector equality duty, which requires public bodies to advance equality and eliminate discrimination,
the need for the collection of
demographic data by courts is a battle
still to be won. I know the director of research of the Legal Education Foundation, Natalie Byrom, is on this call, so if you want to find out more it would be great to have a chat with her afterwards. Now,
due to the time, I'm just going to throw out some other advice and questions that I have, which I think are pertinent, and we can take these into the breakout rooms. How are we going to include the
range of stakeholders that we need in
this problem definition and throughout
the technology product service
development and implementation? I know that Reema has already mentioned her work on citizens' juries, and there are others on this call who will have some great ideas about this. So, as I say,
as a minimum we must value people's time to contribute. Now is not the time to be extractive without compensation, nor to place the burden on communities to continually advocate for inclusion. So I'm actually
heartened to see a recent paper from
computer and data scientists at Google
and DeepMind, which explores how data
scientists can involve communities at
the problem definition stage. I think
that's a great move from the data
science community. We should not be afraid to press the pause, or even the rewind, button on the use of big data and AI. I'm just going to say that, full stop. We should take
ideas and learn from the mistakes of
previous pandemics and natural
catastrophes, and not be bamboozled by whizzy tech. I'm sure someone has done
lots of great work on this already.
If we are pressing 'go' on a data science or AI project, understand the data and its social context: Where is it from? What is its history? Who stands to win and lose? Do your research. Look at who is building it. You know, monocultural teams, with the best intentions, are
going to miss something. So development teams need to be diverse, and they shouldn't be composed only of technologists. And then finally, to end: if
COVID-19 is an opportunity to make
societal changes as some leaders are
suggesting, and we are seeing some policies that were said to be impossible now being made possible, then we all need
to raise our voices and actively work
towards social, economic, and
environmental justice. Thank you.
That was a lot, Tracey, and it's so useful. I think you really hit at the heart of a lot of the challenges before us, and all the really critical things that people in this community and outside of it need to be thinking about, as well as the work that civil society has ahead in advocating on these issues of inclusion and injustice. So thank you so much for that. We're going to hand right over to Amanda to talk a little bit more about
responsible data visualization.
Thanks, Linda. I'm Amanda Makulec, excited to be here today and dig into perhaps the piece of the COVID-19 story that you've seen and heard the most about in the news, because there's this saturation of charts and graphs and dashboards and other information being presented visually. And
the reason for that, as I think most of
us probably know, is because of the ways
that data visualization makes
information sticky and it helps people
kind of understand and look at and make
sense of data and information. But I want
to point today to a couple of the challenges, concerns, and constraints we face around visualizing data for COVID-19: both how to be better visualizers of that data, for those of you on this call who are analysts, and how to be better readers and consumers of that information, as we figure out what we share and how we demand more in terms of visualizing the uncertainty in what we know. So first I want to tackle what are
our biggest challenges that we've seen
visualizing COVID case data and death
data, focusing specifically on that data since it's at the core of a lot of the models being built: models trying to somehow create a crystal ball into the future of what could or would happen in different states or countries, and to help policymakers make sense of when they should reopen economies and weigh what the implications might be of reopening this week versus next month. So a couple big
constraints and challenges we've had. First, I just want to point to that saturation
piece. There are so many charts and
graphs about COVID-19 that are in the
news right now and I've heard from
journalists that part of it is because
of just the draw that this topic has. It
is a global challenge and a global crisis.
As we look to different outlets, like the Financial Times or the Washington Post, their most viewed pages of all time on their websites are graphs and charts related to COVID-19; for the Washington Post, that's specifically the animated 'flatten the curve' explainer graphic that Harry Stevens did. So these
visualizations certainly have wide reach,
but we've kind of reached a saturation point where I have to ask: how relevant is this data or information to me, and how is it helping me in my day-to-day decision-making or with my own mental
health. So, I would say, as we dive into this conversation, if you've seen too many charts and graphs and are sick of looking at log scales, it's okay to step away and decide what data serves you. We
have the challenge that a lot of data is
incomplete.
So, the data that we get about cases, as we know, is a function of how many tests are being done, and some of the previous conversations in this talk today have really focused on the gaps and challenges we have in the data: the fact that we don't always have good disaggregated data, and the challenges we face with known unknowns. When you look at a table of information on cases in the US, for example, from the CDC, you can see that a number of cases don't have data on hospitalization or on race recorded in the current data being reported out. Those are known unknowns, which we should demand to see and understand so that we know what the denominators are in some of these charts and graphs being presented. But there are also all of the unknown unknowns: we can't make a really firm estimate of how many cases or infected persons there are that haven't been confirmed with a test, for various reasons.
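To make those known unknowns concrete, here is a minimal sketch with entirely made-up numbers (the counts and field names are hypothetical, not CDC figures) of how the choice of denominator changes a reported percentage when many records are missing a field:

```python
# Hypothetical line listing: each record is a confirmed case; None means
# the hospitalization field was never reported (a "known unknown").
cases = [
    {"id": 1, "hospitalized": True},
    {"id": 2, "hospitalized": False},
    {"id": 3, "hospitalized": None},
    {"id": 4, "hospitalized": None},
    {"id": 5, "hospitalized": False},
]

known = [c for c in cases if c["hospitalized"] is not None]
hospitalized = sum(1 for c in known if c["hospitalized"])

# Two very different "hospitalization rates" from the same table:
rate_all_cases = hospitalized / len(cases)   # treats missing as not hospitalized
rate_known_only = hospitalized / len(known)  # drops missing records entirely

print(f"missing data on {len(cases) - len(known)} of {len(cases)} cases")
print(f"rate over all cases:  {rate_all_cases:.0%}")   # 20%
print(f"rate over known only: {rate_known_only:.0%}")  # 33%
```

Neither rate is wrong on its own, but a chart that shows one of them without stating its denominator, and how many records are missing, invites readers to over-trust it.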
We have to think about all of that uncertainty. Uncertainty is a really important piece, and the better models, I think, try to visualize that uncertainty as best they can, while other models plot very firm, fixed lines when really they're a best guess or estimate. As you look at models and visualizations, look for people who are transparent about the uncertainty, the information, and the limitations of what they've created. And if you're really curious about the details of how we visualize, and see or don't see, uncertainty in data, there's a great recorded webinar from Matthew Kay and Jessica Hullman at Northwestern all about visualizing uncertainty around COVID-19, which draws on a lot of their in-depth user research on how we see and understand uncertainty in something that is ultimately meant to enable understanding and show some certainty in the data. We believe data that we see on charts and graphs. We
have the challenge that a lot of reporting, percentages, and calculations are more complex than they might seem on the surface. Just because you can divide, say, the number of deaths by the number of cases doesn't mean you're calculating an accurate case fatality rate. That was especially true early on in the epidemic, when there were very small samples of people we actually had data on, so I really think we need to look to epidemiologists and other experts who can build those kinds of models and make those estimations. Just because a tool like Tableau or Power BI lets you run a mathematical function or calculation doesn't mean the number you're pulling out is accurate or correct. Then when we start to plot those
numbers, as we've seen in some of the widely circulated charts and graphs, suddenly they seem more certain; they seem more like numbers that we know, rather than calculations run on the best data we have without really knowing all its nuances. I've done a couple of interviews with epidemiologists and others talking about that uncertainty and those challenges, and I really want us to look to them as experts. I think that we also finally
need to not forget the humanity behind
this data. In some ways, the bigger the numbers grow, the less they seem like people and stories. As we think about responsible data visualization, it's important to remember that whether we're looking at a chart of cases and deaths or a chart around unemployment, each individual data point on that chart is a person. I think we sometimes forget that as we're creating these aggregate charts and graphs, and we really need to return to some of the principles of Giorgia Lupi's data humanism manifesto and think about the ways in which data is represented, and represented as people.
For me, seeing some of the pictures of the lines of cars outside food banks waiting for support in the U.S., juxtaposed against piles of potatoes being thrown away in Idaho because there wasn't the supply chain or the demand to get them out to places that could buy them, tells a starker story of what's happening here in the U.S. in this current environment than any chart or graph could. And so, as we think about the data, I think we need to keep that data humanism in mind. As we think about this, I
think it's good to also make sure we're
anchored in who is involved: there are data collectors; there are analysts and visualizers; there are journalists and other translators of information; and there are readers. I think the burden and the responsibility are greater on us as visualizers who present and share information, and on journalists who then take that information and make it accessible to the public; data journalism sits at that intersection in some ways. The burden and the responsibility to communicate these things well are greater because the unintended, or perhaps in some cases intentional, consequences of making misguided comparisons or presenting incorrect stories are much greater. If the chart I create makes you
think that this COVID-19 situation is just another flu, that's problematic, because it could cause you not to take some of the basic precautions, like wearing a mask in public to protect other people and essential workers, or keeping social distance. So, as we think about that
responsibility, where it sits, and where we bear it, I think it's important that we think about the things we can do both as analysts and tech folks, as well as readers of data visualizations. One is we can demand and
look for more honesty about the
uncertainty in the numbers. I remember seeing an NPR chart where the footnote said there were six thousand six hundred cases represented in the disaggregated data, but I knew that at that point in time we'd had over a hundred and twenty thousand cases in the U.S. Going back to the data source from the CDC, it was clear that the story not represented in that bar chart was really a story about the unknowns in that data.
Always consider the people behind the data, like we talked about. Think about the stories there. Bring those voices to the table; bring the patient voice to the table as you're visualizing and sharing information. Think about how someone would feel if they had lost someone to COVID-19 and were looking at your chart and graph. Think about the people represented
there. Collaborate with those subject
matter experts, epidemiologists, and
others. And vet your work before
publishing. Look and see if a colleague, or maybe your roommate or spouse sitting at home with you, can take a look at your chart or graph and say, 'hey, I think I see it this way', which might not be what you intended. And
then finally, if you're actually going ahead and doing data visualization, if I could close with these words: please think before you publish into the public space, especially around case and death data. There's so much uncertainty and nuance in that information, even if it's data that seems staggeringly available through all of the virtual spaces -- the AWS data lake, a Tableau data repository, the JHU GitHub. You can grab that data and do things with it, but the big question on the responsibility and ethics side for me is:
Should we? Should we use it for our own exploratory data analysis, or should we be publishing in the public domain because it's actually helping to enable understanding for others? So, think before
you publish and feel free to play with
that data on your own, but consider how
someone else might see it if they look
to you as an authority in the data space.
And then finally, look for other data to
visualize. Think about the other things
happening in our world right now. The
story isn't just about cases. The story
is about all the other things that we're seeing, so think about where we can divert our attention to raise and elevate those kinds of stories and look
at that data. So, I think I've hopefully given you a lot to think about as we shift over into breakout rooms around the data we're consuming every day. Linda, back over to you. Thanks so much, Amanda, and thanks to everybody for these amazing points. I think there's going to be quite a lot to discuss in the breakout rooms.
