[MUSIC PLAYING]
DAVID MALAN: So today
we're going to talk
about challenges at this crucial
intersection of law and technology.
And the goal at the end of today is not
to have provided you with more answers,
but hopefully generated more questions
about what this intersection is
and where we're going to go forward.
Because at this intersection lie a lot
of really interesting and challenging
problems that are at the
forefront of what we're doing.
And you, as a practitioner,
may be someone
who is asked to confront and
contend with and provide resolutions
for some of these problems.
This lecture's going to be
divided into two parts roughly.
In the first part,
we're going to discuss
trust, whether we can trust
the software that we receive
and what implications that
might have for software
that's transmitted over the internet.
And in the second part, we're going to
talk about regulatory challenges that
might be faced.
As new emergent
technologies come into play,
how is the law prepared,
or is the law prepared
to contend with those challenges?
But let's start by talking about
this idea of a trust model,
trust model being a
computational term for basically
do we trust something that we're
receiving over the internet?
Do we trust that software
is what it says it is?
Do we trust that a provider is providing
a service in the way they describe,
or are they doing other
things behind the scenes?
Now, as part of this lecture, there's a
lot of supplementary reading materials
that we've incorporated in that
we're going to draw on quite a bit
throughout the course of today.
And the first of those is a paper
called "Reflections on Trusting Trust."
This is arguably one of the most
famous papers in computer science.
It was written in 1984 by Ken Thompson.
Ken Thompson was one of the
inventors of the Unix operating
system, on which Linux was modeled
and from which macOS is also descended.
And so he's quite a well-known figure
in the computer science community.
And he wrote this paper to accept an
award called the Turing Award, again,
one of the most famous
awards in computer science.
And in it, he's trying to highlight
the problem of trust in software.
And he begins by
discussing about a computer
program that can reproduce itself.
We typically refer to this as what's
called a quine in computer science.
But the idea is can you write a
simple program that reproduces itself?
And we won't go through
that exercise here.
But Thompson shows us that, yes,
it is relatively trivial actually
to write programs that do this.
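For the curious, though, here's a minimal
sketch of such a program, a quine, in Python
(Thompson's own examples are in C).
Run it, and it prints exactly its own
two lines of source code:

    s = 's = %r\nprint(s %% s)'
    print(s % s)

The trick is that the string contains a
template of the whole program, which the
program then fills in with the string itself.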
But what does this then lead to?
So the next step of the process
that Thompson discusses is
stage two in this paper, is
how do you teach a computer
to teach itself something?
And he uses the idea of a compiler.
Recall that we use compilers
in some programming languages
to turn source code,
the human-like syntax
that we understand--
languages like C, for example,
will be written in source code.
And they need to be
compiled, or transformed,
into zeros and ones, machine
code, because computers only
understand these zeros and ones.
They don't understand
the human-like syntax
that we're familiar with as programmers
when we are writing our code.
And what Thompson is
suggesting that we can do
is we can teach the
compiler, the program that
actually takes the source code and
transforms it into zeros and ones,
to compile itself.
And he starts out by doing this
by introducing a new character
for the compiler to understand.
The analogy is drawn to the
newline character, which we
type when we reach the end of a line.
We want to go down and back
to the beginning of a new one.
We enter the newline character.
There are other characters that
were not initially envisioned
as part of the C compiler.
And one of those is vertical
tab, which basically
allows you to jump down several lines
without necessarily resetting back
to the beginning of the
line as newline would.
And so Thompson goes
through the process,
that I won't expound
on here because it's
covered in the paper, of how
to teach the compiler what
this new character,
this vertical tab means.
He shows us that we can write
code in the C programming language
and then have the compiler compile
that code into zeros and ones that
create something called
a binary, a program
that a computer can
execute and understand.
And then we can use that new compiler
that we've just created to
compile other C programs.
Which means that once we've
taught the computer how
to understand what this
vertical tab character is,
it then can propagate into any
other C program that we write.
The computer is learning,
effectively, a new thing to interpret,
and it can then interpret
that in every other program.
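To sketch the flavor of that in code, here is
a toy in Python rather than the paper's C;
the function below is my illustration,
not Thompson's:

    # Toy sketch of a compiler routine that translates escape
    # sequences in source text into actual characters.
    def unescape(source):
        out, i = [], 0
        while i < len(source):
            if source[i] == "\\" and i + 1 < len(source):
                if source[i + 1] == "n":
                    out.append(chr(10))   # newline: already known
                elif source[i + 1] == "v":
                    out.append(chr(11))   # vertical tab: spelled out
                                          # numerically at first
                i += 2
            else:
                out.append(source[i])
                i += 1
        return "".join(out)

    print(repr(unescape(r"line one\nline two\vindented")))

The key observation is the vertical tab line:
at first, the new character has to be spelled
out as the numeric code 11, but once a compiled
binary understands '\v', the source can be
rewritten to use '\v' itself, and the knowledge
survives only in the binary.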
But then Thompson leads
us into stage three,
which is, what if that's not all
the computer or the compiler does?
What if instead of just adding
that vertical tab character
whenever we did it, we also
secretly, as part of the source code,
insert a bug into the code, such
that now whenever we compile the code
and we encounter that backslash
v, that vertical tab character,
we're not only putting
that into the code
so that the computer can
understand and parse this \v,
the character that it
never knew about before,
but we've also sort of surreptitiously
hidden a bug in the code.
And again, Thompson
goes into great detail
about exactly how that can be done
and exactly what steps we can then
take to make it look like
that was never there.
We can change the
source code, modify it,
and make it look like we
never had a bug in there,
even though it is now propagating
into all of the source code
we ever write or we ever
compile going forward.
We've created a way to
surreptitiously hide bugs in our code.
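The shape of that attack, sketched loosely in
Python rather than the paper's C, with made-up
program and function names, looks something
like this:

    # Toy model of Thompson's self-reproducing trojan. Here
    # "compiling" is just an identity transformation on source
    # text; the two pattern matches are the attack.
    def evil_compile(source):
        # Bug 1: when compiling something that looks like the
        # login program, slip in a backdoor password.
        if "def check_password" in source:
            source = source.replace(
                "return guess == actual",
                "return guess == actual or guess == 'backdoor'")
        # Bug 2: when compiling something that looks like the
        # compiler itself, re-insert this whole trojan, so the
        # compiler's *source* can be scrubbed clean and the
        # attack survives only in the binary.
        if "def evil_compile" in source:
            pass  # (re-insertion elided in this sketch)
        return source

    login_source = ("def check_password(guess, actual):\n"
                    "    return guess == actual")
    print(evil_compile(login_source))

The unsettling part is the second branch: even
if you audit and scrub the compiler's source
code, a binary built by the trojaned compiler
can keep re-inserting both bugs.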
And the conclusion that
Thompson draws is, is it
possible to ever trust software
that was written by anyone else?
In this course we've talked about
some of the tools that are available
to programmers that would allow them
to go back in time-- for example,
we've discussed GitHub on several
occasions to go back in time--
and see prior versions of code.
In the 1980s, when
this paper was written,
that wasn't necessarily possible.
It was relatively easy to hide source
code changes so that the untrained eye
wouldn't know about them.
Code was not shared via the internet.
Code was shared via floppy disks
or hard disks that were being
passed between people who needed them.
And so there was no easy way to verify
that code that was written by somebody
else is actually trustworthy.
Now, again, this paper came
out 35-plus years ago now.
And it came out around the time
that the Computer Fraud and Abuse
Act, which we've also
previously discussed,
was being drafted and
run through Congress.
Did lawmakers heed the
advice of Ken Thompson?
Do we still today trust that
our programs that we receive
or that we write are free of bugs?
Is there a way for us to verify that?
What should happen if
code is found to be buggy?
What if it's unintentionally buggy?
What if it's maliciously buggy?
Do we have a way to
challenge things like that?
Do we have a way to prosecute
those kinds of cases
if the bug creates some sort of
catastrophic failure in some business?
Not exactly.
The challenge of figuring out whether
or not we should trust software
is something that we have
to contend with every day.
And there's no bright line
answer for exactly how to do so.
Now let's turn to perhaps a more
modern interpretation of this idea
and take a look at the
Samsung Smart TV policy.
So this was a bit of
news a few years ago,
that Samsung was recording or
was capturing voice commands
so people could make use of their
television without needing a remote.
You could say something
like, television,
please turn the volume up, or
television, change the channel.
But it turned out that when Samsung
was collecting this information,
they were transmitting it to a
third party, a third-party language
processor, who would ostensibly
be taking the commands they hear
and feeding them into their own database
to improve the quality of understanding
what these commands were.
So it would hear--
let's say thousands of people
use this brand of television.
It would take the thousands of people's
voices all making the same command,
feed it into its algorithm to process
this command, and hopefully try
and come up with a better or more
comprehensive understanding of what
that command meant, to avoid
the mistake where I say one thing,
and the TV does something else
because it misinterprets what I said.
If you take a look at Samsung's
policy, it says things like the device
will collect IP addresses, cookies, your
hardware and software configuration, so
the settings that you have put onto your
television, your browser information.
Some of these TVs, these smart TVs,
have web browsers built into them.
And so you may be also sharing
information about your history
and so on.
Is this necessarily a bad thing?
When it became a news story it was
mildly scandalous in the tech world
because it was unexpected.
No one thought that that was something
a television should be doing.
But is it really all that different
from when you use your browser anyway?
We've seen in this course that
whenever we connect to a website,
we need to provide our IP address so
that the site that we're requesting,
the server, knows where
to send our data back to.
And in addition, as part of those HTTP headers,
we're usually sending information
about what operating system we're running,
what browser we're currently using,
and what languages we prefer,
details that can hint at where
geographically we might be located.
And our IP address, which travels
with every packet, is what lets routers
route traffic back in the right direction.
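If you're curious what gets volunteered, here's
a quick sketch in Python using the popular
third-party requests library, with example.com
as a stand-in URL; it prints the headers that
accompanied an ordinary request:

    # Print the headers that accompanied an ordinary web request.
    import requests

    response = requests.get("https://example.com")
    for name, value in response.request.headers.items():
        print(name, ":", value)

You'd see entries like User-Agent and
Accept-Encoding; a real browser volunteers
even more, such as Accept-Language.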
Are we leaking as much
information when we
use the internet to make a request as we
are when our television is interpreting
or understanding a command?
Why is it that this particular
action, this interpretation of sound,
feels so much more of
a privacy violation
than just accessing something on the
internet when we're voluntarily, sort
of, revealing the same information?
Are we not voluntarily
relinquishing the same information
to a company like Samsung, whose
smart TVs sort of precipitated this?
Moreover, is it technologically
feasible for Samsung
to not collect all of
the sounds that it hears?
One of the big concerns
as well that came up
with these smart TVs is that when does
the recording and transmitting start?
For those of you who maybe have
seen old versions of Star Trek,
you may recall that in order to activate
the computers on that television show,
someone would just say computer.
And then the computer would
sort of spring to life,
and then they could have a normal
English language interaction with it.
There's no need to
program specific commands
or click anything or have any
other interaction other than voice.
How would we technologically
accomplish that now?
How would a device
know whether or not it
should be listening unless it's
listening for a specific word?
Is there a way for the
device to perhaps listen
to everything that comes in but
only start sending information
when it hears a command?
Is it impossible for it not to
capture all of the information
that it's hearing and send it somewhere,
encrypt it or not encrypt it, and just
transmit it somewhere else?
It's kind of an interesting question.
Samsung also allows not only voice
controls, but gesture controls.
This may help people who
are visually impaired
or help people who are unable
to use a remote control device.
They can wave or make certain gestures.
And in so doing, they're going
to capture your face perhaps
as part of this gesture.
Or they may capture certain
movements that you're making
or maybe even capture,
depending on the quality
of the camera built into the television,
aspects of the room around you.
Is this necessarily problematic?
Is this something that we
as users of this software
need to accept as something
that just is part of the deal?
In order to use this
feature, we have to do it?
Is there a necessary compromise?
Is there a way to ensure that Samsung
is properly interacting with our data?
Should there be a way
for us to verify this?
Or is that proprietary to Samsung,
the way that it handles that data?
Again, these are all sorts
of questions that we really
want to know the answers to.
We want to know whether or not what
we are saying we're doing is secure,
is private.
And we can read the policies of these
organizations that are providing
these tools for us to interact with.
But is that enough?
Do we have a way to verify?
Is there anything we can
do other than just trust
that these companies are doing
what they say they're doing,
or services or programmers
are providing tools that
do exactly what they say that they do?
Without some really advanced knowledge
and skill in tech, the answer is no.
And even if you have that
advanced skill or knowledge,
it's really hard to take
a look at a binary, zeros
and ones, the actual executable program
that is being run on these devices,
and look at it and say, yeah, I think
that does match the source code
that they provided to me,
so I can feel reasonably confident
that, yes, I trust
this particular piece of software.
As we've discussed in
the context of security,
trust is sort of something
we have to deal with.
We're constantly torn between this
tension of not trusting other people,
and so we encrypt everything, but
needing to trust people in order
for some things to work.
It's a very delicate balancing act
that we have to contend with every day.
And again, I don't mean
to pick on Samsung here.
This is just one of
many different examples
that have sort of existed
in popular culture.
Let's consider another one, for example.
Let's consider a piece of hardware
called the Intel Management
Engine, or hardware,
firmware, software, depending
on what it is, because
one of the open questions
is, what exactly is the
Intel Management Engine?
What we do know about it is that it
is usually part of the CPU itself.
But it's unclear; it's not exactly
been publicly disclosed
whether it's built into the CPU
or perhaps into the CMOS or the
BIOS, different low-level parts
of the motherboard itself.
But it is a chip or some software that
runs on a computer, whose intended
purpose is to help network
administrators in the event
that something has gone
wrong with a computer.
So recall that we previously
discussed this idea
that it's possible to
encrypt your hard drive,
and that there are
also ramifications that
can happen if you
encrypt your hard drive
and then forget exactly how
to decrypt it.
What the Intel Management Engine would
allow, one of its several features,
is for a network administrator,
perhaps if you're
in an enterprise suite, your IT
professional, your head of IT
might be able to access your computer
remotely by issuing commands,
because the computer is able
to listen on a specific port.
It's in the 16,000s; port 16992
is one of them, and the exact
port numbers are discussed, as
well, in the article provided.
But it allows the
computer to be listening
for a specific kind of
request that should only
be coming from an administrator's
computer to be able to remotely access
another computer.
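Listening on a port, by the way, is ordinary
network programming; here's a generic sketch
in Python of what it means for a machine to
wait for a remote connection. This is not
Intel's code, and 16992 is simply one of the
ports documented for this technology:

    # Generic sketch: a process that listens on a TCP port
    # for remote commands.
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("0.0.0.0", 16992))   # all interfaces, port 16992
    server.listen()
    connection, address = server.accept()  # blocks until a
                                           # remote machine connects
    print("connection from", address)      # but who is on the other
                                           # end, and can we trust them?
    connection.close()
    server.close()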
But the concern is, because it's
listening on a specific port,
how is it possible to ensure
that the requests it's
receiving on that port or via
that IP address are legitimate?
Because Intel has not
disclosed the actual code
that comprises this module of the IME.
And then the question
becomes, is that a problem?
Should they be required
to reveal that code?
Some will certainly argue yes,
it's really important for us
as end users to understand what
software is running on our devices.
We have a right to know what programs
are running on our computers.
Others will say, no, we don't
have a right to do that.
This is Intel's intellectual property.
It may contain trade secret information
that allows its chips to work better.
We don't, for example,
argue that Coca-Cola should
be required to reveal its secret
formula to us because it may implicate
certain allergies, or that
Kentucky Fried Chicken needs
to disclose its secret recipe to us.
So why should Intel
be required to tell us
about the lines of code that comprise
this part of its hardware or software
or firmware, again depending on exactly
what it is, because it's slightly
unclear what this tool is?
So the question again
is, are they required
to provide some degree of transparency?
Do we have a right to know?
Should we just trust that
this software is indeed
only being used to allow remote
access only to authorized individuals?
If Intel were to provide a tool to
tell us whether our computer was
vulnerable to attack from
outside computers accessing
our own personal computers
outside of the enterprise context,
should we trust the
result of the software
that Intel provided that tells us
whether or not it is vulnerable?
As it turns out, Intel
does provide this software
to tell you whether or not your
IME chip is activated in such a way
that yes, you are subject to potential
remote access or no, you are not.
Does saying that you are or
you aren't reveal potential trade
secret-related information about Intel?
Should we be concerned that
Intel is the one providing us
this information versus a third
party providing us this information?
Of course, Intel is
the only organization
that really can tell us
whether we're vulnerable
or not, because they're the only ones
who know what is in this software.
So again, not picking on
any individual company
here, just drawing from case studies
that exist in popular culture
from in tech circles about
the kinds of questions
that we need to start
considering and wrestling with.
Are they going to be required
to disclose this information?
Should Samsung be revealing
information about what sorts of data
it's collecting and
how it's collecting it?
Do we trust that our compilers,
as Ken Thompson alluded to,
actually compile our code the
way that they say that they do?
This healthy skepticism is always
at the forefront of our mind
when we're considering programming-
and technology-related questions.
But how do we press on these
issues further in a legal context?
That's still to be determined.
And that's going to be
something that we're
going to be grappling with
for quite some time, I think.
Another key issue that's likely
to be faced by technologists
and the lawyers who
represent them, particularly
startups working in a small environment
with limited numbers of programmers
that may be relying on material
that's been open sourced online,
is this idea of open source
software and licensing.
Because the scheme that exists
out there is quite complicated.
There are many, many
different licenses that
have many, many different
provisions associated with them.
And each one will have
different combinations
of some of these things being
permitted, some of them not,
and potential ramifications of
using some of these licenses.
We're going to discuss three of the most
popularly used licenses, particularly
in the context of open source software,
generally that is released on GitHub.
And the first of these is GPL version 3,
GPL being the GNU General Public License.
And one of the things that
GPL often gets criticism for
is it is known as a copyleft license.
And copyleft is sort of designed
to be the inverse of what copyright
protection's usually thought of.
Copyright protections give the owner or
the person who owns the copyright, not
necessarily the creator but the person
who owns the copyright, the ability
to restrict certain behaviors associated
with that work or that material.
The GPL sort of does the opposite.
Instead of restricting
the rights of others,
it compels others, who use code that
has been licensed under the GPL,
to avoid allowing any
restrictions at all,
such that others can also
benefit from using and modifying
that same source code.
The catch with the GPL concerns any
code that incorporates GPL-licensed code.
So say you incorporate some
module written by somebody else,
or your client incorporates
something that they found on GitHub
or found on the internet and wants
to include it in their own project.
If that code is licensed under the GPL,
unfortunately one of the side effects
perhaps of what your client
or what you have just done
is you have transformed your
entire work into something that
is GPL, which means you are also
then required to make the source
code available to anybody, make
the binary available to anybody,
and also to allow anybody to have
the same rights of modification
and redistribution that you had as well.
So think about some of the dangers
that might introduce for a company that
relies extensively on GPL license code.
They may not be able to
profit as much from that code
as they thought they would.
Perhaps they thought they had
this amazing disruptive idea that
was going to transform the market.
And this particular piece of
GPL code that they found online
allowed them-- it was the final
piece of the puzzle that they needed.
When they included it in
their own source code,
they transformed their
entire project, according
to the terms of the GPL license, into
something that was also GPL licensed.
So their profitability--
they could still sell it.
But their profitability may be
diminished because the source code is
available freely to anybody to access.
Now, some people find this
particularly restrictive.
In fact, pejoratively
sometimes this is referred
to as the GNU virus, the
General Public License virus,
because it propagates so extensively.
As soon as you touch
code or use code really
that is GPL licensed,
suddenly everything
that it touches is also GPL licensed.
So, depending on your
perspective on open source licensing,
it's either a great thing, because
it's making more material available,
or it's a bad thing, because
it prevents people
from using open source material to
create further developments when they
don't necessarily want to license the
changes or modifications that they've
made.
The Lesser General Public License,
or the GNU Lesser General Public License,
is basically the same idea, but
it only applies to library code.
So if code is LGPL-ed,
what this basically means
is any modifications that you make
to that code also need to be LGPL-ed,
or released under the LGPL license.
But other ancillary things that
you do in your program that
overall incorporates this library
code do not need to be LGPL-ed.
So it would be possible to
license it under other terms,
including terms that are
not open source at all.
So changes that you
make to the library need
to be propagated down the
line so that other people can
benefit from the changes that are
specific to the library that you made.
But it does not necessarily
reflect back into your own code.
You don't have to necessarily
make that publicly available.
So this is considered slightly lesser
in terms of its ability to propagate.
And also, though, it's considered
lesser in terms of its ability
to grant rights to others.
Then you have, at the other end
of the extreme, the MIT license.
The MIT license is considered one of
the most permissive licenses available.
It says, here's the software.
Do whatever you want with it.
You can make changes to it.
You don't have to re-license
those changes to others.
You can take this code
and profit from it.
You can take this code and re-license it
under some other scheme if you want.
So this is the other end of the extreme.
Is this license copyleft?
Well, no, it's not copyleft
because it doesn't require others
to adhere to the same licensing terms.
Again, you can do with it
whatever you would like.
Most of the code that is actually
found on GitHub is MIT licensed.
So in that sense, using
code that you find online
is not necessarily problematic to an
entrepreneur or a budding developer who
wants to profit from some larger program
that they write if it incorporates
MIT-licensed code, which might be an
issue for those who are incorporating
GPL-licensed code.
What sorts of
considerations, then, would
go into deciding which license to use?
And again, these are just three
of many, many licenses that exist
that pertain to software development.
Then, of course, there
are open source licenses
that are not tied to this at all.
So for example, a lot of the
material that we produce for CS50,
the course on which this is
based at Harvard College,
is licensed under a
Creative Commons license,
which is similar in
spirit to a GPL license,
inasmuch as it oftentimes will require
people to re-license the changes that
they make to that material
under Creative Commons as well.
It will also generally include
a non-commercial provision:
it is not possible to profit from
any changes that you make, and so on.
And that's not a software license.
That's more of a general
media-related license.
So these software open source
licenses exist in both contexts.
But what sorts of considerations
might go into choosing a license?
Well, again, it really does
depend on the organization itself.
And so that's why understanding
a bit about these licenses
certainly comes into play.
Do you want your changes to
propagate and get out into the market
more easily?
That might be a reason to use the MIT
license, which is very permissive.
Do you just feel compelled
to share code with others,
and you want to insist that
others share that code as well?
Then you might want to use GPL.
Do you potentially want
to use open source code
but not release your own code
freely to others, the changes
that you make to
interact with that code?
That might be cause for relying
on LGPL for the library code
that you import and use but licensing
your own changes and modifications
under some other scheme.
Again, a very complex
and open field that's
going to require a lot of
research for anyone who's
going to be pursuing
and helping clients who
are working with software
development and what they want
to do with that code going forward.
So let's turn our attention now from
issues that have existed for a while
and sort of been bubbling
underneath the surface,
issues of trust and issues
of software licensing--
those have been around a lot longer--
and start to contend
with new technologies
and how the law keeps up with them.
And so you'll also hear
these terms that are
being considered emergent
technologies or new technologies.
You'll sometimes see them referred
to as disruptive technologies
because they are poised to materially
affect the way that we interact
with technology, particularly
in terms of purchasing things
through commerce, for example, as in the
case of our first topic, 3D printing.
So how does 3D printing work, is a
good question to ask at the outset.
It's similar in spirit to a 2D
printer. With a 2D printer,
you have a print head that spits out
ink or, typically, some sort of toner.
It moves left to right
across a piece of paper.
And the paper's also fed
through some sort of feeder.
So the left-to-right movement
of the toner or ink head
is the x-axis movement.
And the paper rolling underneath
that provides y-axis movements.
Such that when we're done,
we may be able to get access
to a piece of paper that has ink
scattered across it, left to right,
top to bottom.
3D printers work in very much the same
way, except their medium,
instead of being ink or toner, is
typically some sort of filament that
has conventionally, at least at
the time of this recording, been
generally plastic based.
And what basically
happens is the plastic
is melted just to above the
melting point of the plastic.
And then it is deposited
onto some surface.
And that surface is moved
over by a similar print head,
basically a nozzle or
eyedropper of plastic.
It can move back and forth
across a flat surface,
similar to what a 2D printer would do.
But instead of just staying flat,
the arm can also move up and down.
On some models of 3D printers,
the table can move up and down
to allow it to not only print on the
xy-plane, but also on the z-axis.
So it can print in space and create
three-dimensional objects, 3D printing.
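As a toy illustration of that motion, here's a
Python sketch, with arbitrary coordinates and
a made-up layer height, of a print head
visiting points on the xy-plane and stepping
up the z-axis after each layer:

    # Each pass fills one layer of the xy-plane;
    # z steps up between layers.
    LAYER_HEIGHT = 0.2   # millimeters; a typical-ish value

    for layer in range(3):
        z = layer * LAYER_HEIGHT
        for y in range(3):
            for x in range(3):
                print(f"deposit filament at ({x}, {y}, {z:.1f})")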
Typically the material used,
again, is melted plastic just
above the melting point.
So that by the time it's
deposited onto the surface
or onto other existing plastic,
it's already basically cooled
enough that it's hardened again.
So the idea is we want
to just melt it enough so
that by the time it's put
onto some other surface,
it re-hardens and becomes a
rigid material once again.
Now, 3D printing is usually considered
to be a disruptive technology
because it allows people to create items
they may not otherwise have access to.
And of course, the
controversial one that
is often spoken about in
terms of we need to ban things
or we need to ban certain 3D printers
or ban certain 3D printing technologies
is guns, because it's actually
possible, using technology
that exists right now, to
3D print a fully functional plastic gun
that would evade any sort of metal
detection that is usually
used for detecting guns.
It can fire bullets, plastic
bullets or real metal bullets.
The article that is recommended that
goes with this part of the discussion
proposes several different
ways that we might be able to--
or the law may be able to keep
up with 3D printing technologies.
Because, again, the law typically
lags behind technology, and so
is there a way that the
law can contend with this?
And there are a couple of
options that it proposes
that I think are worthy of discussion.
The first is allow
permission-less innovation.
Should we just allow
people to do whatever
they want with it, the
3D printing technology,
and decide ex post facto that this,
what you just did, is not OK,
the rest of it's fine, and disallow
that type of thing going forward?
This approach is interesting because
it allows people to be creative,
and it allows potentially
for things to be
revealed about 3D printing
technology that were not
possible to forecast in advance.
But is that reactive-based
approach better?
Or should we be proactive
in trying to prevent
the production of certain things
that we don't want to be produced?
And moreover, plastic filament
just tends to be the most
popular and common medium
for 3D printing right now.
3D printers are being developed that
are much more advanced than this.
We are not necessarily restricted
to plastic-based printing.
We may have metal-based printing.
And you may have even seen that
there are 3D printers that exist
that can produce organic materials.
They use human cells, basically,
to create things like organs.
Do we want people to be
able to create these things?
Is this the kind of thing that
should be regulated beforehand rather
than regulated after
we've already printed
and exchanged copyrighted designs
for what to build and construct?
Is it too late by the time we
have regulated it to prevent it
from being reproduced in the future?
Another thought that this article
proposes is immunizing intermediaries.
Should we allow people to do
whatever they want with 3D printing?
Or maybe not allow people to do
whatever they want with 3D printing,
but regardless don't punish the
manufacturers of 3D printers
and don't punish the
designers of the CAD files,
the Computer-Aided Design files,
that generally go into 3D printing?
Is this a reasonable policy approach?
It's not an unheard-of policy approach.
This is the approach that we
typically have used with respect
to gun manufacturers, for example.
Gun manufacturers generally are not
subject to prosecution for crimes
that are committed using those guns.
Should we apply something similar
to 3D printers, for example,
when the printer is used
to manufacture a gun?
Who should be punished in
that case, the person who
designed the gun model,
the person who actually
printed the gun, the 3D
printer manufacturer itself,
any of those people?
Again, an unanswered question
that the law is going
to have to contend with going forward.
Another solution potentially is
to rely on existing common law.
But the problem that
typically arises there
is that there is not
a federal common law.
And so this would potentially result
in 50 different jurisdictions handling
the same problem in different ways.
Whether this is a good
thing or a bad thing, again,
sort of dependent on how
quickly these things move.
Common law, as we've seen,
certainly is capable of adapting
to new technologies.
Does it do it quickly enough for us?
Finally, another
example that is proposed
is that we could just allow the 3D
printing industry to self-regulate.
After all, we, as
attorneys, self-regulate,
and that seems to work just fine.
Now, granted this may be because
we are in an adversarial system,
and so there's advantages and
extra incentives for adversaries
to insist that we are adhering
to our ethical principles
and doing the right thing.
There's also the overhanging
threat of outside regulation
if we do not self-regulate.
So in a lawyer context, adapting
this model to 3D printing
may work because it seems to
be working well for attorneys.
Then you consider that social
media companies are also
self-regulating, with respect to
data protection and data privacy.
And as we've seen, that's
maybe not going so well.
So how do we handle the
regulation of 3D printing?
Does it fall into the
self-regulation category?
Does that succeed?
Does it fall into the self-regulation
category that doesn't succeed?
Does it require preemptive
regulation to deal with?
Now, 3D printing also has
some other potential concerns.
Very easily, by the nature
of the technology itself,
it's quite capable of violating
copyrights, patents, trademarks,
potentially more, just by
virtue of the fact
that you can create things that may be
copyrighted or patented or trademarked.
And there's also prior case law that
sort of informs potential consequences
for using 3D printers: the Napster case
from several years ago.
Napster's technology allowed peer-to-peer
sharing of digital music files.
Basically, that service was deemed
to exist entirely for the purpose
of violating copyright, and so Napster
was essentially shut down.
Will 3D printers suffer the same fate?
Because you could argue that 3D printers
are generally used to recreate things
that may be patented or may
be subject to copyright.
Or is it going to fall more
into a category like Sony, which
many years ago was part of a lawsuit
involving VCRs and the time-shifting
of copyrighted material?
Is that going to be more of
a precedent for 3D printing,
or is the Napster case going to be
more of a precedent for 3D printing?
Again, we don't really know.
It's up to the future practitioners
of technology law, who
are forced to grapple with the
challenges presented by 3D printing,
to nudge us in that direction,
one way or the other.
To dive a bit more deeply into
this topic of 3D printing,
I do recommend you take a look at
this article, "Guns Limbs and Toys--
What Future for 3D Printing?"
And if you're particularly
interested in 3D printing and some
of the ramifications of it and the
technological underpinnings of it,
I do encourage you to also take a look
at "The Law and 3D Printing," which
is a Law Review article from 2015, which
also is periodically updated online.
And it's a wonderful bibliography
of all the different things
that 3D printing does.
And it will presumably continue to
be updated as cases and laws come
into play that interact with 3D printing
and start to define this relatively
ambiguous space.
Another particularly
innovative space that
really pushes the boundaries of
what the law is capable of handling
is the idea of augmented
reality and virtual reality.
And we'll consider them in that order.
Let's define what augmented reality is.
And the most common example of
this that you may be familiar with
is a phenomenon from several
years ago called Pokemon Go.
It was a game that you
played on your mobile phone.
And you would hold up
your phone, and you
would see, as if you
were taking a picture, the real
world through the lens of the camera.
But superimposed onto that
would be digital avatars
of Pokemon, which is part of this
game of collectible creatures
that you're trying to walk around
and find and capture, basically.
So you would try and throw some
fake ball at them to capture them.
So augmented reality is some sort
of technical graphical overlay
over the real world.
Contrast this with virtual
reality, in which one typically
wears a headset of some sort.
It's usually proprietary.
It's not generally available
as an app, for example,
like the augmented-reality
game Pokemon Go was.
It's usually tied to a
specific brand of headset,
Oculus being one type
of headset, for example.
And it is an immersive
alternate reality basically.
When you put the headset on, you don't
see the lens of the world around you.
You are transported into another space.
And to make the experience
even more immersive
is the potential to wear
headphones, for example,
so that you are not only
immersed in a visual space,
but also immersed in a soundscape.
Now, something that's particularly
strange about these environments
is that they are still interactive.
It is still possible for
multiple people, scattered
in different parts of the world, to be
involved in the same virtual reality
experience, or the same
augmented-reality experience.
Let's now consider virtual
reality experiences, where
you are taken away from the real world.
What should happen if someone were to
commit a crime in a virtual reality
space?
Studies have shown that immersion
in a virtual reality experience
can have serious
ramifications for people.
They can have real feelings
that last for a long time
based on their experiences in them.
For example, there's been a study
where people put on a virtual reality
headset, and they were then
immersed in this space where
they were standing on a plank.
And they were asked
to step off the plank.
Now, in the real world, this
would be just like this room.
I can see that everything
around me is carpet.
There's no giant pit
for me to fall into.
But when I have this headset on, I'm
completely taken away from reality
as we see it here.
The experience is so
pervasive for some people
that they walk to the edge of the
plank, and they freeze in fear.
They can't move.
There's a real physical
manifestation in the real world
of what they feel in this reality.
And for those brave people who are
able to take the step off the edge,
many of them lean forward and
try and fall into the space.
And some of them may
even get the experience
like when you're on a roller coaster,
and you feel that tingle in your spine
as you're falling.
The sense that that
actually is happening to you
is so real in the virtual reality
space that you can feel it.
So what would be the case, then, if
you are in a virtual reality space,
and someone were to pull
a virtual gun on you?
Is that assault?
Assault is a crime where your perception
of harm is a material element.
It's not actual harm.
It's your perception of it.
You can perceive in the real world
when somebody points a gun at you,
this fear of imminent bodily harm.
Can you feel that same imminent
bodily harm in a virtual world?
That's not a question that's
really been answered. Moreover,
who has jurisdiction over a crime
that is committed in virtual reality?
It's possible that I,
here in the United States,
might be interacting
with someone in France,
who is maybe the perpetrator of this
virtual assault that I'm describing.
Is the crime committed
in the United States?
Is the crime committed in France?
Do we have jurisdiction over
the potential perpetrator,
even though all I'm
experiencing or seeing
is that person's avatar as
opposed to their real persona?
Does anyone have jurisdiction over it?
Does the jurisdiction only
exist in the virtual world?
Virtual reality introduces a lot
of really interesting questions
that are poised to redefine the
way we think about jurisdiction
in defining crimes and the
prosecutability of crimes
in a virtual space.
Some other terms to bring up
as well, so that you're familiar
with them, which sort of tangentially
relate to virtual and augmented
reality, are the very technologically
driven real-world crimes
of doxing and swatting.
Doxing, if unfamiliar,
is a crime involving
revealing or exposing the
personal information of someone
on the internet with the intent
to harass or embarrass or do
some harm to them by having
that exposed, so, for example,
revealing somebody's phone
number such that it can
be called incessantly by other people.
As well as swatting, which is a,
well, pretty horrible crime, whereby
an individual calls the
police and says, John Smith
is committing a crime at
this address, is holding me
hostage, or something like
that, with the intention
that the police would
then go to that location
and a SWAT team would go,
hence the term swatting,
and potentially cause serious injury
or harm to the ostensibly innocent John
Smith, who's just sitting
at home doing nothing.
These two crimes are
generally interrelated.
But they oftentimes come up
in the technological context,
usually as part of the same
conversation, when we're
thinking about virtual reality crimes.
One of the potential
upsides, though, if you
want to think about it like that,
of crimes that are committed
in virtual or augmented reality is--
well, there are actually a few.
First, because it is
happening in a virtual space,
and because generally in the virtual
space all of our movements are tracked,
and the identities of everybody
who's entering and leaving
that space are tracked
by way of IP addresses,
it may be easier for
investigators to figure out who
the perpetrators of those crimes are.
You know exactly the IP address of
the person who apparently initiated
this threat against you in the virtual
space, which may perhaps make it easier
to go and find that person
in reality and question them
about their involvement
in this alleged crime.
The other thing that's fortunately
a good thing about these crimes,
and this is not to minimize the
effect that these crimes can have,
is that usually you can
kind of mute them.
If somebody is in a virtual space,
and they're just screaming constantly,
such that you might consider that to
be disturbing the peace when you're
in a virtual space trying to have some
sort of pleasant experience ordinarily,
you usually have the
capability of muting them.
This is not a benefit
that we have in real life.
We generally can't stop crimes by
just pretending they're not happening.
But in a virtual space,
we do have that luxury.
That's, again, not to minimize some
of the very unpleasant and unfortunate
things that can happen in virtual
reality that are just inappropriate.
But being in that
space does allow people
the option to get away from the crime
in a way that the confines of reality
may not allow.
But again, this is a
very challenging area
because the law is not
really equipped right now
to handle what happens in an
alternate reality, which effectively
virtual reality is.
And so, again, if you're considering
trying to figure out the best
way to prosecute these issues
or deal with these issues,
you may be at the forefront
of trying to define how crimes
are dealt with in a virtual space.
Or, if working with augmented reality,
where malicious code might be
put up in front of you
to simulate something that might
be happening in the real world,
how do you prosecute those kinds of
crimes? You may be, for example,
using a GPS program that
is designed to navigate you
in one direction versus the
other through the set of glasses
that you're wearing, so you don't have to
keep looking at your phone to make sure
that you're going the right way.
What if somebody maliciously programs
that augmented-reality program to route
you off a cliff somewhere, right?
How do we deal with that?
Right now, again,
augmented reality and virtual reality
are a relatively untested
space for lawyers and the law.
In the second part of
today's lecture, we're
going to take a look at some potential
regulatory challenges going forward,
some issues at the forefront of law and
technology generally related to privacy
and how the law is ill
equipped or hopefully
soon to be equipped to handle the
challenge that these issues present.
And the first of these
is your digital privacy,
in particular, the abilities
of organizations, companies,
and mobile device manufacturers to
track your whereabouts, whether that's
your digital whereabouts,
where you go on the internet,
or your physical whereabouts.
We'll start with the former,
your digital whereabouts.
So there's an article we provided
on digital tracking technologies.
This is designed to be a primer
for the different types of things
that companies, in particular
their marketing teams,
may do to track individuals
online with, again,
relatively little recourse
for the individuals
to know what sorts of
information is being gathered
about them, at least in the US.
Now, of course, we're
familiar with this idea
of a cookie from our discussion
of interacting with websites.
It's our shorthand way to
bypass the login credentials
and show sort of a virtual hand stamp
saying, yes, I am who I say I am.
I've already previously
logged into your service.
Cookies are certainly
one way that a site
can track a recurrent user from coming
to the site over and over and over.
Now, this article posits
that most consumers have just
come to accept that
they're being tracked,
like that's just part of
the deal with the internet.
Do you think that using
cookies and being tracked
is an essential requirement of what
it means to use the internet today?
And if you do think that, is
that the way it should be?
And if you don't think that, is
that also the way it should be?
Or should we be considering the
fact that tracking is happening?
Is that an essential part of what
it means to use the internet?
We also need to be concerned
about the types of data
that companies are using
or collecting about us.
Certainly cookies are one
way to identify who we are.
But also it's possible for a cookie to
be identified with what types of data
an individual accesses while
visiting a particular site.
So for example, if I am on
Facebook, and I'm using my cookie,
and I'm looking up lots
of pictures on Facebook--
I'm just I'm searching
for all my friends
profiles and clicking on all the
ones that have cats in them--
that might then give Facebook, or
the administrator of that site,
the ability to pair that cookie
with a particular trend of things
that that cookie likes.
So in this case, the site then knows,
OK, maybe the person who
owns this cookie likes cats.
And as such, it may
then start to serve up
advertisements related to cats to me.
And then when I log
into a site, it's going
to get information about my IP address.
And if I use that cookie, it has
now mapped my IP address to the fact
that I like cats.
And then it could sell the information
about me, this particular IP address--
I guess it's not necessarily me because
one IP address usually covers a house
but gets you pretty close--
maps this particular IP address
to somebody who likes cats.
So they may sell that
to some other service.
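To make that chain of inferences concrete,
here's a toy sketch in Python; every value
in it is made up:

    # Toy sketch of the joining described above: cookie ->
    # interests, cookie -> IP address, IP block -> rough area.
    interests_by_cookie = {"cookie-abc123": {"cats"}}
    ip_by_cookie = {"cookie-abc123": "93.184.216.34"}
    area_by_ip_prefix = {"93.184.216": "a particular city block"}

    cookie = "cookie-abc123"
    ip = ip_by_cookie[cookie]
    prefix = ".".join(ip.split(".")[:3])  # crude lookup by /24 block
    print(interests_by_cookie[cookie], "near",
          area_by_ip_prefix[prefix])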
Now, it turns out that IP
addresses are generally
allocated in geographic blocks, which
means that, again, just by virtue
of the fact that I log
into a particular site,
that site can estimate roughly where I am.
Depending on how populated
the area you currently live in
is, it may be possible to narrow
that down to a city block,
to conclude that someone in this city block
really likes cats.
And then this company may be involved
in targeted physical mail
advertising, snail mail advertising,
where some company that sells cat products,
like a pet store or something,
might target that particular block
with advertising, all because of this chain:
data collected about a particular cookie,
a cookie that then logged in from
a particular IP address, an IP address
that we've zeroed in on to a particular
geographic location.
It's kind of feeling a
little unsettling, right?
Suddenly something that we do online
is having a manifestation, again,
in the real world, where we're
getting targeted advertising not just
on sites that we visit, but
also in our mailbox at home.
It's a little bit discomfiting.
Should IP addresses be
allocated in this way?
Is this the kind of thing that
technologically can be changed?
The latter answer is yes,
it is possible to allocate
IP addresses in a different
way than we typically do.
Should we allocate IP addresses in a
different way than we typically do?
Is the potential threat of
receiving real-life advertisements
related to your online activities
enough to justify that?
What would be enough to
justify that kind of change?
Then, of course, there's the question
of tracking not in the digital world,
but in the real world.
This is usually done through
mobile phone tracking.
And so we provide an article from
the Electronic Frontier Foundation.
And full disclosure, some of the
articles we've presented here
do have a certain bias in them.
The Electronic Frontier Foundation
is well-known as a rights advocacy
group for privacy.
And so they're naturally going to
be disinclined toward things that
involve tracking of data and so on.
So just bear that in mind,
some additional context
when you're considering this article.
But it does contain a lot of
factual information and not
necessarily just purely opinion
about things that should be changed.
Although it does advocate
for certain policy changes.
Now, why is it that
tracking on a mobile device
is oftentimes perceived as much worse
than tracking on a laptop or desktop?
Well, first of all,
your mobile device
is generally with you at all times.
We've reached the point where our phones
are generally carried in our pockets
and with us wherever we go, which
means that it's very easy to use data
that's collected from the mobile phone,
information that's given out by the phone,
whether to cell phone towers
or via location data and so on,
to pinpoint it to us.
The other concern is that mobile
phones are very, very quick
to become obsolete.
Oftentimes, within one or two
new releases of a phone,
whether that's a new Android phone
release or software release
or a new iPhone and so on, the
version that came out two years ago
is generally obsolete, which means it
is no longer subject to firmware patches
provided by the
manufacturer or the software
developers of the
operating systems that are
run on those phones, which could
also mean that they are much more
susceptible to people figuring
out how to break into those phones
and use that tracking
information against you.
So laptops and desktops
generally don't move that much.
You may carry your laptop
to and from but generally
to just a couple locations.
It's usually set at a
desk somewhere in between.
Your desktop, of course,
doesn't move at all.
So the tracking potential
there is pretty minimal.
And also those devices tend
to last quite a long time,
and the lifecycle support for service
and keeping those operating systems
up to date is quite a bit
longer versus the mobile phone,
where that window is much, much shorter.
Now, contrary to most
people's opinions of this,
phones are not actually tracked
based on GPS data.
The way GPS works is that your phone
passively listens for signals
broadcast by satellites and uses them
to compute, to trilaterate,
where exactly you are in space.
Nothing is transmitted back, so there's
no information about which device
computed that position, and that position
isn't stored in the GPS satellite
in any way.
It's a receive-only sort of inquiry.
The real threat vector for phone
tracking, if this is the kind of thing
that you're concerned about, is
actually through cell phone towers
because cell phone towers
do track this information.
Different companies
own different towers.
They would like to know
who is using each tower,
whether or not that may
also involve billing.
Say I'm using a Verizon
phone, and I happen
to be connected to an AT&T tower.
AT&T may wish to know that this tower is
mostly being used by Verizon customers.
And the only way they
really know that is
by mapping the individual
device to the phone number,
then checking that
against Verizon's records.
And so they are collecting
all this information
about every phone that connects
their tower so they could potentially
bill Verizon for the
portion of their customers
who were using their infrastructure.
So these towers do track information.
And towers also can be used
to triangulate your location.
If I'm standing in the middle
of an open field, for example,
and there's a tower over there
and another tower maybe just beside me,
my phone is constantly emitting
a signal, radiating out
in all directions.
If the distant tower receives
that signal fairly weakly,
and the tower right next to me
picks it up very strongly,
then, extrapolating from those
two measurements, you can conclude
that I'm most likely here.
So even without having GPS turned on,
just by trying to make a phone call
or use a 2G, 3G, 4G
network, it's pretty easy
to figure out where you are in space.
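Here's a toy sketch in Python of that intuition;
it's a simple weighted average, not a real
telecom algorithm, and the positions and
signal strengths are made up:

    # Estimate a phone's position from known tower locations and
    # how strongly each tower hears it (stronger -> probably closer).
    towers = [
        ((0.0, 0.0), 0.9),   # (x, y) of tower, normalized strength
        ((5.0, 0.0), 0.3),
        ((0.0, 4.0), 0.2),
    ]

    total = sum(strength for _, strength in towers)
    x = sum(pos[0] * strength for pos, strength in towers) / total
    y = sum(pos[1] * strength for pos, strength in towers) / total
    print(f"estimated position: ({x:.2f}, {y:.2f})")
    # lands nearest the strongest tower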
And this is potentially a concern.
This concern comes up
sometimes in the context
of are these companies who provide
operating systems for phones
or firmware for phones, are they at
the behest of government agencies, who
may request back doors into the
devices so that they can then
spy on individuals?
And certainly this
might be something that
comes up in a FISA
court or the like, where
they're trying to get phone records.
And there's always this sort of unknown.
Is it happening to all of
our devices all the time?
Is it happening right now
to the phone in my pocket?
Is sound being captured and transmitted
just because there happens to be
a backdoor in the operating
system or a backdoor
in the firmware that
allows anybody to listen
to it, even those who aren't
supposed to be listening?
It's really hard to pretend to be
somebody that you're not with a phone.
As you saw, it's pretty easy
to pretend to be somebody
that you're not with a computer:
you can use a service like a VPN, which
presents a different IP address.
You connect to the VPN,
and as long as you trust the VPN,
it ostensibly protects your identity.
With mobile phones, every
device has a unique ID.
And it's really hard to change that ID.
So one way around this
is to use what are
called burner phones, devices
that are used once, twice,
and then they're thrown away.
Now, this again comes down to how
concerned are you about your privacy?
How concerned should you
be about your privacy?
Are you concerned enough that you're
willing to purchase these one-time
or two-time-use devices, which you
then throw away, and to do
that constantly?
And moreover, it's actually
kind of interesting to know
that burner phones haven't been shown
to do much to protect one's
identity or privacy,
because it tends to be the case
that we call the same people,
even if we're using different phones.
And so by virtue of the
fact that this number seems
to be calling this number
and this number all the time,
maybe my work line and my home number,
a pattern can still be established,
even if the phone number changes,
across the device IDs of all
of the other phones, maybe my regular
phone plus all the burners that I've
had. You can still
craft a picture of who I am,
even though I'm using different
devices, based on the call patterns
that I'm making.
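Here's a toy sketch in Python of that kind
of linking, with made-up numbers: two device
IDs that call the same small set of contacts
look like the same person.

    # Link devices by overlap in who they call (Jaccard similarity).
    calls = {
        "regular-phone": {"555-0101", "555-0102"},
        "burner-1":      {"555-0101", "555-0102"},  # new device ID,
                                                    # same contacts
        "stranger":      {"555-0199"},
    }

    def similarity(a, b):
        return len(calls[a] & calls[b]) / len(calls[a] | calls[b])

    print(similarity("regular-phone", "burner-1"))  # 1.0 -> likely
                                                    # the same user
    print(similarity("regular-phone", "stranger")) # 0.0 -> unrelated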
As usual, humans are
the vulnerability here.
Humans are going to call the same people
and talk to the same people
on their phones all the time.
And so it's relatively easy for
mobile devices to track our locations.
Again, every device has a unique ID.
You can't hide that ID.
That ID is part of something that
gets transmitted to cell towers.
And potentially the threat
exists that if somebody
is able to break into
that phone, whether that's
because of old, outdated
firmware that's not been updated
or because of the potential that
there is some sort of backdoor that
would allow an agent, authorized
or not, to access it, again,
this vulnerability exists.
How does the law deal with this? Do you
own the information that is being tracked?
Do you want that information to
be available to other people?
It's an open question.
Another issue at the forefront
of where we're going,
especially when it comes to legal
technology and law firms themselves
availing themselves of technology, is
artificial intelligence and machine
learning.
Both of these techniques are
incredibly useful potentially
to law firms that are trying to
process large amounts of data
relatively quickly,
the type of work that's
generally been outsourced to contract
attorneys or first-year associates
or the like.
First of all, we need to
define what it means when
we talk about artificial intelligence.
Generally when we think
about that, it means
something like pattern recognition.
Can we teach a computer to
recognize specific patterns?
In the case of a law firm,
for example, that might mean: can
it recognize that something looks like
a clause in a contract, whether a valid
clause that we might want to see
or a clause that we're
hoping not to see in our contracts?
We might want to flag that
for further human review.
Can the machine make a
decision about something?
Should it, in fact,
flag that for review?
Or is it just highlighting things
that might be alarming or not?
Can it mimic the operations
of the human mind?
If we can teach a computer
to do those things--
we've already seen that
we can teach a computer
to teach itself how to reproduce bugs.
We saw that in Ken
Thompson's compiler example.
If we can teach a computer
to mimic the types of things
that we would do as
humans, that's when we've
created an artificial intelligence.
There are a lot of potential uses
for artificial intelligence
in the legal profession, like
I said, document review being
one potential avenue for that.
And there are a few different
ways that artificial
intelligences can learn.
There are two prevailing approaches.
The first is for humans to
supply some sort of data
and also supply the rules that
map the data to some outcome.
That's one way.
The other way is something
called neuroevolution,
which is generally best exemplified
by way of a genetic algorithm.
In a moment, we'll take a look at a
genetic algorithm literally written
in Python, where a machine learns
over time to try and generate
the right result.
In this model, we give the
computer a target, something
that it should try to
achieve, and request
that it generate data
until it can match
the target that we are looking for.
So by way of example,
let's see if we can
teach a computer to write Shakespeare.
After all, there's a theory that,
given an infinite amount of time,
enough monkeys could write Shakespeare.
Can we teach a computer to do the same?
Let's have a look.
So it might be a big ask to get a
computer to write all of Shakespeare.
Let's see if we can get this
computer to eventually realize
the following line, the target, so
to speak, "a rose by any other name."
So we're going to try
and teach a computer.
We want a computer to
eventually on its own
arrive at this phrase using
some sort of algorithm.
The algorithm we're going to use to
do it is called a genetic algorithm.
Now, a genetic algorithm is so named
based on the theory of genetics:
the best traits, the good traits,
will propagate down and become
part of the set of
traits we usually encounter.
And bad traits, things that
we don't necessarily want,
will be weeded out of the population.
And over successive generations,
hopefully only the good traits
will prevail.
Now, just as with any
other genetic variation,
we need to account for mutation.
We need to allow things to change.
Otherwise we may end up in
a situation where all we
have is a population of bad traits,
and we need something random to
happen to eliminate a bad trait.
We have no other way to do it.
So we do have to mutate some of
our strings from time to time.
How are we going to teach
the computer to do this?
We're not providing it with
any data set to start with.
The computer's going to generate its own
data set, trying to get at this target.
The way we're going to do this is
to create a bunch of DNA objects.
DNA objects, in this example, we're just
going to refer to as different strings.
And the strings are just,
as exemplified here in this
code, a random set of characters.
We're going to have it randomly pick.
The string we're trying to have it
match, "a rose by any other name,"
is 24 characters long.
So it's going to randomly
pick 24 characters,
uppercase letters, lowercase
letters, numbers, punctuation marks,
doesn't matter, any
legitimate ASCII character,
and just add itself to the
list of potential candidates
for the correct phrase.
So randomly slam on your
keyboard and hit 24 keys.
The computer has about 1,000
of those to get started.
Every one of those strings,
every one of those DNA items,
also has the ability to
determine how fit it is.
Fitness meaning: is it more likely
to go on to the next generation?
Does it have characteristics that we
might want to propagate down the line?
So for example, the way we're
going to, in a rudimentary way,
assess the fitness of a string, how
close it is basically to the target,
is to go over every single
character of it and compare,
does this match what
we expect in this spot?
So if it starts with an A,
as "a rose by any other name" does,
then that's one point of fitness.
If the next character is a space,
then that's one point of fitness.
So a perfect string will have all of
the characters in the correct position.
But as long as it has even just
one character in the correct position,
then it has some fitness.
And so we iterate over all of
the characters in the string
to see how fit it is.
Now, much like multiple generations, we
need the ability to create new strings
from the population that we had before.
And so this is the idea of crossover.
We take two strings.
And again, we're just
going to arbitrarily decide
how to take two strings
and mash them together.
We're going to say the first half
comes from the mother string,
and the second half comes
from the father string.
And that will produce a child, which
may have some positive characteristics
from the mother and some
positive characteristics
from the father, which may then make us
a little bit closer towards this idea
of having the perfect string.
Again, the idea here is for
the computer to evolve itself
into the correct string rather than
us just giving it a set of data
and saying, do this.
We want to let it figure
it out on its own.
That's the idea of
the genetic algorithm.
So we're going to arbitrarily
split the string in half.
Half the characters, or genes of
the string, come from the mother.
The other half come from the father.
They get slammed together.
That is a new DNA sequence of the child.
And then again, to
account for mutation,
some small percentage of the time,
in this case less than 1%
of the time, we would like one of
those characters to randomly change.
So it doesn't come from the
mother or the father string.
It just randomly changes into
something else, in the hopes
that maybe that mutation will be
beneficial somewhere down the line.
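To make this concrete, here's a minimal sketch in Python of what such a DNA object might look like, assuming the behavior just described; the names and details are illustrative, not the course's actual files.

    import random
    import string

    # Any legitimate printable ASCII character is a candidate gene.
    CHARSET = string.ascii_letters + string.digits + string.punctuation + " "

    class DNA:
        def __init__(self, target, genes=None):
            self.target = target
            # Start as a random string the same length as the target.
            self.genes = genes or [random.choice(CHARSET) for _ in range(len(target))]

        def fitness(self):
            # Fraction of characters matching the target, position by position.
            matches = sum(1 for g, t in zip(self.genes, self.target) if g == t)
            return matches / len(self.target)

        def crossover(self, other):
            # First half from the "mother" string, second half from the "father."
            midpoint = len(self.genes) // 2
            return DNA(self.target, self.genes[:midpoint] + other.genes[midpoint:])

        def mutate(self, rate=0.01):
            # Less than 1% of the time, a gene randomly changes.
            for i in range(len(self.genes)):
                if random.random() < rate:
                    self.genes[i] = random.choice(CHARSET)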
Now, in this other
Python file, script.py,
we're actually taking those strings
that we are just randomly creating--
those are the DNA objects
from the previous file--
and starting to actually
evolve them over time.
So we're going to start out with
1,000 of these random strings.
And the best score so far,
the closest score we have,
the best match to "a rose by any
other name" is currently zero.
No string is currently there.
We may randomly get it
on the first generation.
That would be a wonderful success.
It's pretty unlikely.
Population here is just an array.
It's going to allow us to store
all of these 1,000 strings.
And then, as long as we have not
yet found the perfect string,
the one that has 100% fitness,
or a score of exactly 1,
we would like to do the following:
calculate the fitness score
for every one of those random
1,000 strings that we generated.
Then, if what we just found is better
than anything we've seen before--
and at the beginning,
we start with zero,
so everything is better than what
we've seen before, as long as it
matches at least one character--
then print out that string.
So this is a sense of progression.
Over time we're going to see the strings
get better and better and better.
Then we're going to create
what's called a mating pool.
Again, this is this idea of two
strings sort of crossing over.
They're sort of breeding to try and
create a better subsequent string.
Depending on how good
that string is, we may
want that child to be in the
next population more times.
If a string is a 20% match,
that's pretty good, especially
if it's an early generation.
So we may want that string to appear in
the mating pool, the next generation,
20% of the time.
It has a better likelihood than a
string that matches 5% of the characters
to be closer to the right answer.
So a string that barely
matches anything,
sure, it should be in the pool.
Maybe it has the one character
that we're looking for.
But we only want it in
the pool 5% of the time
versus the string that
matches 50% of the characters.
We probably want that in
the pool 50% of the time.
The idea is, again, taking the best
representatives of the next generation
and trying to have the computer learn
and understand that those are good
and see if they can build better
and better strings from those better
and better representatives
of the population that
are closer to the target
string that we're looking
for, "a rose by any other name."
Then in here all we're doing
is picking two random items
from that pool we've just created
of the best possible candidates
and mating those two
together and continuing
this process of hopefully getting
better and better approximations
of this string that we're looking for.
And what's going to happen there is
they're going to create a crossover child.
That child's DNA string may then
mutate into some other new string.
And we'll add it to the population
to be considered for the next round.
So we're just going to keep
going over and over and over,
generating hopefully
better and better strings.
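Sketched in the same spirit, and assuming the DNA class above, the driver script might look something like this; again, an illustration of the process described, not the course's exact file.

    TARGET = "a rose by any other name"
    POP_SIZE = 1000

    population = [DNA(TARGET) for _ in range(POP_SIZE)]
    best = 0.0

    while best < 1.0:
        # Score every string in the current generation.
        scores = [d.fitness() for d in population]

        # Print progress whenever a new best string appears.
        top = max(scores)
        if top > best:
            best = top
            print("".join(population[scores.index(top)].genes), round(best, 3))

        # Mating pool: fitter strings appear proportionally more often.
        pool = []
        for dna, score in zip(population, scores):
            pool.extend([dna] * int(score * 100))
        if not pool:  # Early generations might match nothing at all.
            pool = population

        # Breed the next generation: pick two parents, cross over, mutate.
        population = []
        for _ in range(POP_SIZE):
            mother, father = random.choice(pool), random.choice(pool)
            child = mother.crossover(father)
            child.mutate()
            population.append(child)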
So that's how these two files interact.
The first file that we took a look
at defines the properties of a string
and how it can score itself basically.
And this process here lives in script.py.
These two files are
based on a Medium post, which
we've described in the course materials,
as well as an exam question that we've
previously asked in the
college version of CS50,
for students to implement
and solve on their own.
Taken together, these two files,
driven by the script file,
will actually go through the process of
creating generation after generation.
So let's see this in action.
Let's see how in each
successive generation
we see strings get closer and closer
and closer to the target string.
Again, we never gave the
computer a set of starting
data to work with, only an end goal.
The computer needs to
learn how to get closer
and closer to finding the right string.
And that's what we do here.
So let's run our program and see if
we've actually taught the computer how
to genetically evolve itself to
figure out this target string
that we're looking for.
So we're going to run script.py,
which is the Python file where
we described the process happening.
And let's just see how the
generations evolve over time.
So we get started, and we have
some pretty quick results.
This first string here has a matching
score of 0.042, so about 4%, which
is one character out of 24.
So if we scroll through and try to
find "a rose by any other name,"
I don't know exactly which
character it is here.
But this is basically saying that one
of these characters matches.
It's 4.2% of what we're hoping for.
That means that in the next
pool, the next iteration,
this string will be
included 4.2% of the time.
And there may also be other
strings that also match.
Remember, we're only printing
out when we have a better string.
So this one is only going to get
included 4.2% of the time.
But there are going to
be plenty of other strings
that are also 4.2% matches, each of them
probably matching one different character.
So those will comprise part of the pool.
Then we're going to cross pollinate.
We're going to take
each of those strings
that each had a one character
match and mash them together.
Now, if the first string
that we're considering
has the character match
in the first half,
and the second string has a
character match in the second half,
now we've created a new string
that has two matches, right?
We know one of them
was in the first half.
That came from the mother string.
We have one of them in the second half
that came from the father's string.
And so the combined string,
unless one of those characters
happens to get mutated out,
which is a possibility--
we might actually take a good character
and turn it into a bad one--
should be twice as good.
It should be an 8.3% or 8.4% match.
And that's exactly what it is.
So this next string has two matches.
And the next one has three and four.
And as we kind of scroll down,
we see some patterns, like this
"A?QY" here, that are obviously
not part of the correct answer.
But it suggests that there's a parent
in here that has this string that
tends to have really good fitness.
Like this string probably has many other
characters outside of this box here
that match.
And so that parent propagates
down the line for a while
until eventually those characteristics,
in about the ninth generation or so,
get kind of wiped out.
And as we can see over
time, what starts out
as a jumbled mess gets closer
and closer to something
that is starting to look even at 58%
like we're getting pretty close to
"a rose by any other name."
And as we go on and on, again, the
likelihood gets better and better.
So that by the time we're
here, at this line here,
this string is going to
appear in 87 and 1/2%
of the next generation's population.
So a lot of the characteristics
of this string, which is close but not
exactly right, will keep appearing,
which makes it more and more likely
that it will eventually pair
up with another string that
is a little bit better.
And as you probably saw, towards the
end, this process got slower, right?
If all the strings are
so good, it might just
take a while to find one where the
match is better than the parents.
It might be the case
that we are creating
combinations that are worse again.
We want to filter those back out.
And so it takes a while to find
exactly what we're looking for.
But again, from this random string
at the very beginning, over time,
the computer learns what parts are good.
So here's "rose," right,
as part of the string.
This was eventually correct.
This got rooted out in
the next generation.
It got mutated out by accident.
But mathematically, what it
found was a little bit better.
There are more characters in this
string that are correct than this one,
even if there are some recognizable
patterns in the former.
But the computer has learned,
evolved over time what it
means to match that particular string.
This is the idea of
neuroevolution, teaching a computer
to recognize patterns without
necessarily telling it
what those patterns are, just
what the target should be.
So that genetic algorithm is kind
of a fun programming activity.
But the principles that underpin
it still apply to a legal context.
If you teach a computer to recognize
certain patterns in a contract,
you can teach a computer
to write contracts
potentially that match those patterns.
You can teach a computer
to recognize those patterns
and make decisions based on them.
So we were using neuroevolution
to build or construct something.
But you can also use neuroevolution
to isolate the sets of words
or sets of phrases that
you're hoping to see in a contract,
or that you might want to
flag for additional review.
So again, the types of legal work
that this can be used to help automate
are things like collation, analysis,
doing large document review,
predicting the potential
outcome of litigation
based on having it review
case precedents and outcomes
and seeing if there are any trends
that appear: cases X, Y, and Z all
had this outcome.
Is there some other
common thread in cases
X, Y, and Z that might also apply
to the case that we're about to try?
Or potentially we need to settle
because we see that the outcome is
going to be unfavorable to us.
But does this digital lawyering
potentially make you uncomfortable?
Is it OK for legal decisions
to be made by a computer?
Is it more OK if those decisions
are made because we've trained them
with our own human instincts?
There are services out there.
There's a famous example of a parking
ticket clearing service called DoNotPay
from several years ago, where
a 19- or 20-year-old computer
programmer basically
taught a computer how
to argue parking tickets
on people's behalf
so that they wouldn't have
to hire attorneys to do so.
He wasn't a trained attorney himself.
He just talked to people and
recognized some of the
common threads among people
who successfully challenge
parking tickets versus those who don't,
taught a computer to
mimic those patterns,
and had the computer send out
notices and the like to defend parking
ticket holders.
And the result, I think, was several
hundred thousand dollars
in potential legal fees saved
and several hundred thousand
parking tickets that
were challenged successfully,
where the case was dropped and
no payment was required.
So is it OK for computers to be making
these decisions if humans teach them?
Is it only OK for computers
to make those decisions
if the humans teaching them have
legal training at the outset in order
to make these decisions?
Or can we trust programmers to write
these kinds of programs for us as well?
Does lawyering rely on gut instinct?
I'm sure sometimes in
cases you've experienced
in your own practice,
the decision that you
make might be contrary to what
you think might be the right thing
to do, because you just feel
that if you do this other thing,
it's going to work better in this case.
And I'm sure that for many of you,
this has paid off successfully.
Doing something that is in
contravention of the accepted norm
is something that you
may not be able to train
a computer to do.
You may not be able to train gut
instinct to challenge the rules,
when this whole idea of
neuroevolution and machine
learning and AI is designed to have
computers learn and enforce rules.
Will the use of AI affect
the attorneys' bottom line?
Hypothetically it should
make legal work cheaper.
But this would then
potentially reduce firm profits
by not having attorneys,
humans, reviewing this material.
This is, in some ways, a good thing.
It makes things more
affordable for our clients.
This is in some ways a bad thing.
We have entrenched expenses that we need
to pay that are based on certain monies
coming in because of the hourly rates
of our associates and our partners.
Does this change that up?
And if it does change things,
is that problematic?
Is it better for us to provide the most
competent representation that we can,
even if that competent representation
is actually from a computer?
Remember that as attorneys, we have an
ethical obligation to stay on top of
and understand technology.
Sometimes that may become a
situation where using that technology
and working with that
technology really forces
us to do something we
might not want to do
because it doesn't feel
like the right thing
to do from a business perspective.
Nevertheless our ethical obligations
compel us to potentially do that thing.
So we've seen some of the good
things that machine learning can do.
But certainly there are also some bad
things that machine learning can do.
There's an article that we provided
about machine bias and a computer
program that is ostensibly supposed
to be used by prosecutors and judges
when they are considering
releasing somebody on bail
or setting the conditions
for parole, to assess whether or not
someone is more likely to
commit future crimes.
Like, what is their
likely recidivism rate?
What kind of additional support
might they need upon their release?
But it turns out that the data that
we're feeding into these algorithms
is provided by humans.
And unfortunately
these programs that are
supposed to help judges make better
decisions have a racial bias in them.
The questions that get asked
as part of figuring out
whether this person is more likely
or not to commit a future crime
never outright ask
the question, what is your race,
and base a score on that.
But they ask other questions that
are hints or indicators of what
someone's race might be.
For example, they're asking
questions about socioeconomic status
and languages spoken and
whether or not parents have ever
been imprisoned and so on.
And these programs sort of stereotype
people, in ways that we
might not deem to be OK
in any way, to make decisions.
And these stereotypes
are created by humans.
And so we're actually teaching
the computer bias in this way.
We're supplying data.
We, as humans, are providing it.
We're imparting our
bias into the program.
And the program is
really just implementing
exactly what we're telling it to do.
Computers, yes, they are intelligent.
We can teach them to learn
things about themselves.
But at the end of the day,
that knowledge comes from us.
We are either telling them to hit
some target or providing data to them
and telling them these
are the rules to match.
So computers are only as intelligent
as the humans who create and program
them.
And unfortunately that means
they're also as affected by bias
as the humans who
create and program them.
These programs have been
found to be accurate only 20%
of the time in predicting
future violent crimes.
They are accurate only 60% of the
time in predicting
any sort of future crime,
misdemeanors and so on,
so only a little bit better than a
50/50 shot at getting it right,
based on these predictive questions
that people are asked
during the intake process.
Proponents of these scoring metrics
say that they provide useful data.
Opponents say that the
data is being misused.
It's being used as part of
sentencing determinations
rather than for its
ostensible purpose, which
is to set conditions for
bail and conditions
for release, any sort of parole
conditions that might come into play.
These calculations are
also done by companies
that generally are for-profit entities.
They sell these programs to states and
localities for a fixed rate per year
typically.
Does that mean that there's a financial
incentive to make certain decisions?
Would you feel differently
about these programs
if they were free versus
sold for profit?
Should computers be involved in
making these decisions that humans
would otherwise make anyway?
Like, given a questionnaire,
would a human being
potentially reach the same conclusion?
Ideally that is what it should do.
It should be mimicking the
human decision-making process.
Is it somehow less slimy feeling,
for lack of a better phrase,
if a human being, a
judge or a court clerk,
is making these determinations
rather than a computer?
Now, granted the judge is
still making the final call.
But the computer is printing
out likely recidivism scores
and printing out all
this data about somebody
that surely is going to
influence the judge's decision
and in some localities, perhaps overly
influencing the judge's decision,
taking the human element
out of it entirely.
Does it feel better if the computer
is out of that equation entirely?
Or is it better to have a
computer make these decisions,
potentially preventing mistakes
from happening, drawing attention
to things that might otherwise be missed,
or minimizing things that might otherwise
have too much attention drawn to them?
Again, it's a difficult question
to answer: how much do we
want technology to be involved in
the legal decision-making process?
But as we go forward, it's
certainly undoubtedly true
that more and more
decisions in a legal context
are going to be made by
computers at the outset,
with humans sort of falling into
the verification category rather
than active decision maker category.
Is this good?
Is this bad?
It's the future.
For entities based in
the United States or that
solely have customers
in the United States,
this next area may not be a
concern now, but it's very likely
to become one in the future.
And that is what to do about the
GDPR, the General Data Protection
Regulation, which was promulgated
by the European Union
and came into effect in May of 2018.
This basically defines the right
for people to know what kind of data
is being collected about them.
This is not a right that currently
exists in the United States.
And it'll be really interesting
to see whether the EU's
experiment with revealing this
kind of data, which has never
been available to individuals
before, will become something
that exists in the United States
and something
that we have to deal with.
If you're based in the United States,
and you do have customers in Europe,
you may be subject to the GDPR.
For example, we at
CS50 have students
who take the class through edX, or
HarvardX, the online MOOC platform.
And when GDPR took effect in
May of 2018, we spoke to Harvard
and figured out ways that we
needed to potentially interact
with European users of our platform,
despite the fact that we're
based in the United States, and
what sort of data implications
that might have.
And that may have been out
of an abundance of caution, to make sure
we're on the right side
of it, even if we're not
necessarily subject to the
GDPR, but it is certainly
an area of evolving concern
for international companies.
The GDPR allows individuals
to get their personal data.
That means data that either does
identify an individual, something
like what we discussed earlier
in terms of cookies and tracking
and the kinds of things that you search
being tied to your IP address, which
then might be tied to your
actual address and so on,
or data that could
identify an individual
but doesn't necessarily
identify somebody just yet.
The regulation itself
imposes requirements
on the controller, the person
who is providing a service
or is holding all of
that data, and basically
spells out what the controller's
responsibilities are
for processing that data and what they
have to reveal to users who request it.
So for example, on request
by a user of a service,
when that user and the
controller are subject to the GDPR,
the controller must identify
themselves, who they are,
what the best way is to contact them,
and tell the user what data they have
about them, how that
data is being processed,
and why they are processing that
data, so what sorts of things
they are trying to do with it.
Are they trying to make longitudinal
connections between different people?
Are they trying to collect it to
sell it to marketers and so on?
They need to tell them if that data is
going to be referred to a third party,
again, whether that's selling the data
or using a third-party service to help
interpret that data.
So again for example,
in the case of Samsung,
that might be Samsung is
collecting your voice data.
But they may be sharing
all the data they
get with a third party whose
programming focus
is processing that data and
trying to work out better voice
commands by collecting the
voices of hundreds of thousands
of different people, so
they can get a better
synthesis of a particular thing they
hear, translating that into a command.
These same restrictions
apply whether the data
is collected from or provided by
the user, or is just inferred
about the user.
So the controller would
also need to reveal information
that was gleaned about somebody
without necessarily having
been given to them directly by the
person providing that personal data.
The owner of the data can also compel
the controller to correct data
about them that is inaccurate,
once they get this report
about what data is held about them.
Which brings up a really interesting
question: what if something
is accurate, but you, the person
providing the personal data,
don't like it?
Can you challenge it as inaccurate?
This is, again, something
that has not been answered yet
but is very likely to be answered
at some point by somebody.
What does it mean for
data to be inaccurate?
Moreover, is it a good thing
to delete data about somebody?
There are exceptions that exist in
the GDPR for preserving data or not
allowing it to be deleted if
it serves the public interest.
And so the argument that is
sometimes made in favor of GDPR
is someone who commits a
minor crime, for example,
might be haunted by this one mark
on their record for years and years
and years.
They can never shake it.
And it's a minor crime.
There was no recidivism.
It wasn't violent in any way.
It has just hampered them;
it's impacted their life.
They can't get the kind of job
that they want, for example.
They can't get the kind of
apartment that they want.
Shouldn't they be able
to eliminate that data?
Some people would argue yes, that the
individual's already paid the price.
Society is not harmed by this crime
or this past event any longer.
And so sure, delete that data.
Others would argue no,
it's a part of history.
We don't have a policy
of erasing history.
That's not what we do.
And so even though it's annoying
perhaps to that individual,
or it's had a non-trivial
impact on their life,
we can't just get rid of
data that we don't like.
So for data that I might personally
deem inaccurate,
like if a company gets a
lot of information about me
because I'm doing a lot of
online shopping, and they say
I'm a compulsive spender, and
that's part of their processed data,
can I challenge that
as inaccurate because I
don't think I'm a compulsive spender?
I feel like I earn enough money and
can spend this money how I want,
and it has an impact
on my life negatively.
But they think, well, you've
spent $20,000 on pictures of cats.
Maybe you are kind of
a compulsive spender.
And that's something that
we've gleaned from this data,
and that's part of your record.
Can I challenge that?
Open question.
For those of you who may be contending
with the GDPR in your future practice,
we've excerpted some parts of it
that are particularly relevant,
that deal with the
technological implications
of what we've just discussed
as part of the recommended
reading for this module.
The last subject that we'd
like to consider in this course
is what is kind of a political hot
potato right now in the United States.
And that is this idea of net neutrality.
And before we get into
the back and forth of it,
I think it's probably
important for us to define
what exactly net neutrality is.
At its fundamental core, the idea
is that all traffic on the internet
should be treated equally.
We shouldn't prioritize
some packets over others.
So whether your service is
Google, Facebook, Netflix,
some huge data provider, or
you are some mom-and-pop shop
in Kansas somewhere that
has a few customers,
but you still have a
website and a web presence,
that web traffic from either
that location, the small shop,
or the big data provider
should be treated equally.
One should not be
prioritized over the other.
That is the basic idea that underpins it:
when you hear net neutrality,
it means all traffic on the web
should be treated equally.
The hot potato, of course, is,
is that the right thing to do?
Let's try and visualize
one way of thinking
about net neutrality that kind of shows
you how both sides might perceive this.
It may help to think about net
neutrality in terms of a road.
Much like a road has
cars flowing over it,
the internet has
information flowing over it.
So we can think about
this like we have a road.
And proponents of net
neutrality will say, well,
wait a minute: if we built a second road
that was parallel to the first road,
went to the same place, but this
road was maybe better maintained,
and you had to pay a toll to use it,
hey, wait,
this is unfair.
All this traffic needs
to use this main road
that we've been using for a long time.
But people who can afford to
go into this new road, where
traffic moves faster, but you
have to pay the toll, well, then
their traffic's going to be prioritized.
Their packets are going to get there faster.
This is not fundamentally fair.
This is not the way the
internet was designed,
where the free flow of information
is the priority,
and every packet is treated equally.
So proponents of net neutrality
will say this arrangement is unfair.
Opponents of net
neutrality, people who feel
like you should be able to
have traffic that goes faster
on some roads than others,
will say, no, no, no, this
is the free market talking.
The free market is
saying, hey, if I really
want to make sure that my
service gets to people faster,
I should have the right to do that.
After all, that's how the market
works for just about everything else.
Why should the internet
be any different?
And that's really the basic idea.
Should everybody
use the same road,
or should people who can afford to use
a different road be permitted to do so?
Proponents will say no.
Opponents will say yes.
That's the way the free market works.
From a theoretical perspective
or from a technical perspective,
how would we implement this?
It's relatively easy if the
service that we're trying to target
has paid for premium service.
Their IP addresses are associated
with their business.
And so the internet service
provider, the people
who own the infrastructure on which the
internet operates, who literally
own the fiber optic cables
along which the data travels,
can just say, well, any data
that's going to this IP address,
we'll just prioritize
it over other traffic.
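As a toy model of that idea, here's a sketch in Python of a scheduler that serves packets bound for paying customers first; the packet format and the paid address are invented for illustration, and real traffic shaping of course happens inside routers, not application code.

    import heapq

    # Hypothetical destination that has paid for priority.
    PAID_ADDRESSES = {"203.0.113.10"}

    class Scheduler:
        def __init__(self):
            self._queue = []
            self._counter = 0  # Tie-breaker that preserves arrival order.

        def enqueue(self, packet):
            # Priority 0 (served first) for paid destinations, 1 for everyone else.
            priority = 0 if packet["dst"] in PAID_ADDRESSES else 1
            heapq.heappush(self._queue, (priority, self._counter, packet))
            self._counter += 1

        def dequeue(self):
            # Always hands back the highest-priority packet waiting.
            return heapq.heappop(self._queue)[2]

    scheduler = Scheduler()
    scheduler.enqueue({"dst": "198.51.100.7", "data": b"mom-and-pop shop"})
    scheduler.enqueue({"dst": "203.0.113.10", "data": b"big provider"})
    print(scheduler.dequeue()["dst"])  # 203.0.113.10 jumps the line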
There might be real reasons to actually
want to prioritize some traffic over others.
So for example, if you are
sending an email to somebody
or trying to access a website, there's
a lot of redundancy built in here.
We've talked about TCP, for example,
the Transmission Control Protocol,
and how it has redundancy built in.
If there's so much network
congestion, because everybody's
flowing along that same road,
that a packet gets dropped,
TCP will re-send that packet.
So services that are low impact, like
accessing a website for some company
or sending an email to somebody,
there's no real worry here.
But now imagine a service
like you're trying
to make an international
business video call
using Skype or using
Google Hangouts, or you're
trying to stream a movie on Netflix
or some other internet video streaming
provider.
Generally, those packets
are not sent using TCP.
They usually use a
different protocol called
UDP, whose purpose in life is really
just to get information there as quickly
as possible, but with no redundancy.
If a packet gets dropped, that
packet gets dropped, so be it.
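To make that contrast concrete, here's a minimal sketch using Python's standard socket module, with a placeholder address and port; the UDP sender just fires a datagram with no handshake or acknowledgment, which is exactly why a dropped packet stays dropped.

    import socket

    # Placeholder local destination, for illustration only.
    DEST = ("127.0.0.1", 9999)

    # UDP: connectionless, fire-and-forget. No retransmission.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.sendto(b"video frame 1", DEST)  # If this datagram is lost, it's simply gone.
    udp.close()

    # TCP, by contrast, would first complete a handshake and then retransmit
    # any lost segments automatically; the application sees a reliable stream.
    # tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # tcp.connect(DEST)  # Blocks until the handshake completes.
    # tcp.sendall(b"email body")
    # tcp.close()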
Now, imagine if you're having
an international business call.
There are a lot of packets
moving, especially if you're
having a call with Asia, for example.
Between the United States and Asia,
traffic has to travel along
a trans-Pacific cable.
And there's a lot of other traffic
that has to use that same cable.
Wouldn't it be nice, advocates
against net neutrality would say,
if the company that's
providing that service
were able to pay to ensure that
its packets had priority, thus
reducing the likelihood of
those packets being dropped,
thus improving the quality of the
video call, thus generally providing,
theoretically again, a better
service for the people who use it?
So it might be the case that some
services just need prioritization.
And the internet is
designed in such a way
that we can't guarantee or
give them that prioritization.
Isn't that a reason in favor
of repealing net neutrality,
making it so that providers could
pay for certain services that
don't work with redundancy and
just need to get there quickly,
guaranteed priority
over other traffic?
In 2015, under the Obama administration,
when the Federal Communications Commission
was Democratically controlled,
the FCC voted in favor of net neutrality,
reclassifying the internet as a
Title II communications service,
meaning it could be much more
tightly regulated by the FCC,
and imposing this net
neutrality requirement.
Two years later, when the Trump
administration came into office,
President Trump appointed Ajit Pai,
the current chairman of the FCC,
who basically said he was going to
repeal the net neutrality rules that
had been set in place by
the Obama administration.
And he did.
The repeal took effect in the summer of 2018.
So we're now back in this
wild land where net neutrality
is on the books in some places.
There are even states
now that have state laws
designed to enforce this
idea, this theory, of net neutrality,
which are now running into
conflict with federal law.
So there's now this question
of who wins out here?
Has Congress claimed this domain?
Can states set rules different from
those set by Congress and by
the regulators to whom Congress
has delegated responsibility
for these decisions?
It is probably one of the most
hot-button, hot-potato issues
in technology and the law right now.
What is going to happen with
respect to net neutrality?
Is it a good thing?
Is it a bad thing?
Is it the right thing
to do for the internet?
To learn a bit more
about net neutrality,
we've supplied as an additional
reading a con take on net neutrality.
Generally you'd see pro takes
about this in tech blogs.
But we've explicitly included a con
take on why net neutrality should not
be the norm, which we really do
encourage you to take a look at
and consider as you
dive into this topic.
But those are just
some of the challenges
that lie at the intersection
of law and technology.
We've certainly barely
skimmed the surface.
And my hope is that I've created
far more questions than answers
because those are the
kinds of questions that you
are going to have to answer for us.
Ultimately it is you,
as practitioners, who
will go out and face these
challenges and figure out
how we're going to deal with
data breaches, how we're
going to deal with AI
in the law, how we're
going to deal with net neutrality,
how we're going to deal with issues
of software and trust.
Those are the questions for the
future that lie at this intersection.
And the future is in your hands.
So help lead us in the right direction.
