Hello everyone! Virtual welcome to the very
first lecture of CS 220 / CS 319. As I mentioned
as part of my welcome video, I am planning
to split every 50-minute lecture into two
or three shorter videos, so that they are
easier to follow. And I will be applying the
same to the introduction lecture as well.
So let’s get started and talk about what
exactly the "Data Programming I" course is all
about. As the name of the course suggests,
it is all about data analysis. The world
right now is swimming in more data than
humankind has ever known. We have a data
explosion in various fields, even in fields
that you might not think of off the top of
your head when it comes to data explosion. The
natural question that you get out of this
is, "so we have an abundance of data, what
exactly can we do with that?" That is the
second part, which is the programming. So
if you have good programming skills, you can
do a lot of analysis on datasets, ask specific
questions about them, find answers to those
questions, and find good ways of visually
representing such answers. That’s exactly what all of
you are going to learn throughout this semester.
So let me give you a few examples about fields
that you might not think of when it comes
to data explosion. One such example is journalism.
This image on the top left is from a website
called 538 (https://fivethirtyeight.com/).
If you do not know what 538 is, I would suggest
that you go check it out. It is a new,
innovative kind of journalism where the
articles are accompanied by statistics and
graphical visualizations and things like that,
as opposed to the historical, anecdote-based
journalism. Let me move on to the next field
that I want to talk about, which is biology.
The human brain is a fascinating organ! It is
just amazing to understand various aspects
of the human brain and to be able to figure
out how exactly it works. That is, again, a lot
of data to process. You can probably observe
and record everything, but you need to be
able to analyze the data in order to answer
specific questions from the abundance of data
that you can possibly record. DNA is another
example that you can think of in terms of
finding interesting information about
a human being. DNA is a lot of data, and it
can be obtained very cheaply nowadays.
Moving on to the next example, the image on
the bottom left is from a Hadron Collider.
Let us talk about physical sciences. As you
can see, experiments in the physical sciences
have to deal with so much data every second,
which means that your analysis also
needs to be efficient enough to
answer really good questions based on that
data. The final example that I would like
to provide is engineering-based simulation,
which is given by the aircraft image on the
bottom right. As you can see, some parts of
the aircraft are marked in blue, while other
parts are marked in green. This is a
stress-based simulation to understand
how exactly stress impacts airline passengers.
So you do not need to actually do physical
experiments; just with data, you can even come
up with simulations, which enable you to answer
questions about the data. We have seen that
a data explosion is happening in many fields.
As I mentioned, the natural follow-up question
is "how can we get insights from this data?"
The answer to that question is computation.
So let's talk about two different approaches
to computation. The first one is something
that I am calling "human computation".
In the early days, when we did not have
machine computers, "computer" was a job that
humans did. It was a job just like typist or
banker or something like that. So what exactly
did computers do? They followed
a step-by-step algorithm in a meticulous fashion,
applying it in order to crunch
numbers and find answers to questions.
Obviously, this is not the approach that
we are going to be following. So let's talk
about the alternative approach, which is
machine-based computation. You should be aware
that right now computing machines are extremely
cheap and easy to access. There are many
warehouses full of servers that you can
use to do data analysis. There are a lot
of data centers that host cloud-based
compute, which you can very easily take
advantage of if you know how to program.
So, CS 220 is obviously about the second approach,
where we apply fast and reliable programming
principles to analyze data and to find meaningful
insights in it. Let me talk
about a quote from Larry Page, who is
a co-founder of Google. The quote is all about
"Find the leverage in the world, so that you
can be more lazy". It might surprise
you that Larry Page said that, right? You
wouldn't expect Larry Page
to say that people can
be lazy! I am sure he didn't build Google
just by being lazy and lying around. What
he means in this quote is not
the traditional sense of lazy. It is actually
about saving the precious time of human beings
by leveraging the computation
power that is available through machines,
and about writing programs that make your
life easier, so that you can use your precious
time for other important tasks. And that is
what we are going to learn as part of 220
this semester. So what exactly is programming?
It is just the ability to tell
computers what to do and how to do it. That
is exactly what we are going to learn as part
of this course. Now, I am going to introduce
a term: "bilingual".
It is related to speaking two different
languages, not in the literal sense
that you can speak Spanish and English, but
in the sense that you are an expert
in your field of study, which could be
anything like biology, journalism, or
social sciences. When you combine your expertise
in your area of study with your programming
skills and the computation that you
can perform, you become a very, very useful
resource. That is exactly what a Data Scientist
position is all about. Data Science is not
just about computer scientists analyzing a
lot of data; it is all about domain experts
using both their domain knowledge and their
programming skills to manipulate data specific
to their domain. So let's talk about the difference
between CS 220 and a typical introductory
CS course, which introduces you to programming
languages like C++ or Java. One of the
main differences between a regular
CS intro course and CS 220 is that the emphasis
in CS 220 is on data, and we do not
put an emphasis on theory of computation,
unlike the other introductory courses.
The programming language that we are going
to learn in 220 is Python.
It is very powerful, yet easy to learn as a
first programming language. What
you can do by learning Python programming
is ask specific questions about datasets and
focus on how exactly you can communicate
answers to such questions in the form of visualizations
to other people, so that they can actually
understand how your analysis works. As you can
see from this Glassdoor screenshot, Data Scientist
is considered
one of the top 50 best jobs in America, and
I am quite sure that still applies in
2020 as well. Given the current situation
in the world right now, programming is a
very useful and resourceful skill for
anyone to learn. Alright, let me actually
browse through the rest of my introduction.
I am Meena, again. So what did I miss in my
introduction video? I am passionate about
running and working out in general. That is
my husband and me doing the Degree Dash (Doctoral
Derby) in 2018, I think; hopefully I am
correct :). The image on the right, beside
our picture, is from one of my research publications,
which has to do with how exactly to use
cell phone towers' physical locations to deploy
specific mobile edge computing servers. I
am not going to talk more about it; I
am just briefly saying what exactly my research
is about. Alright, let me move on to the next
slide, which introduces Mike. Mike is going
to be my co-instructor; he is going to be
leading lecture 003. As you can see, just
like me, Mike is also into physical fitness.
He is actually a chemist; he has his Ph.D.
in chemistry, which is amazing. He decided
to switch over to Computer Science somewhere
mid-career. Mike and I will be
coordinating everything in this course, so you can
attend any of the Q/A sessions
for any of the lectures. You won't
miss out on anything by doing that.
Alright, so the third person that I want to
introduce is Tyler. You probably won't interact
much with Tyler, but I still want to introduce
him because he designed this entire course,
and you still might see his name
here and there. That is because Mike and I
are still using all the infrastructure built
by Tyler for the course website, for project
submission, and things like that. Tyler
is right now teaching CS 320, which is a follow-up
course to CS 220, just in case
you are interested in enrolling in that
in a future semester. Alright, so obviously
we can't do student introductions. Even if
this lecture were in person, we wouldn't have
been able to do that! What you need to do instead
is go fill out this "Who are
you?" survey, which you should have gotten
a link to at this point. Let me mention
a couple of things about this survey. You
need to be logged into your wisc account
in order to get access to the survey form.
Sometimes this might be a problem
because the Chrome browser often stores cookies,
so it might directly sign you into your Gmail
account. If that is the case, then I would
suggest that you use an incognito window
in Chrome to get around this problem.
So why exactly should you fill out this survey?
It counts for participation. I will talk more
about it when we discuss how the grading split
works for this course. Alright, let me
talk a little bit more about how exactly we
use this data. As you would have guessed,
this is a course which is all about data analysis.
So we are obviously going to use the data
that comes out of any survey that we
send out. Unfortunately, the charts
we have were plotted almost a year ago. We
haven't gotten a chance to update them yet,
but I feel these charts are still
relevant. Let me give a couple of examples
of questions that we ask as part of the "Who
are you?" survey. The first question here
asks whether this is the first time that you are
taking a CS course. As you can see, there
are students who have taken a CS course previously,
whereas there are also students in this
course who have never taken a programming course
before. So, let me talk about the second
question here, which asks what the
largest program is that you have written prior
to taking this particular course. As you can
see, there are various options. These
options are the same in your current "Who
are you?" survey as well. The main
part of the pie that I am interested in here
is this part (blue pie), where you have
N/A. These are the students who
have never written a program in their life,
but come into this course, get really good
programming skills, and apply those to
their real-world requirements. So this course
is all about introducing programming to brand
new programmers. If you are enrolling
in this course thinking that you can get
a lot of advanced topics from it,
this is probably not the right course for
you. Also, right now, CS 220 is a mandatory
course if you are declaring a Data Science
major. We are trying to create a work-around
for people who are really experienced,
for example this 1K slice of pie (purple pie),
where people might have taught themselves
Python and even the Data Science part of
this course. It is just that we don't have a
solution for that right now. Alright, let's take this
second pie chart over here and look in more
detail at the students who had
programming experience before. The next chart is
a bar chart, which plots what
grades each of those categories of students
got. We take each slice of the pie in this
chart and plot it as one bar in the
bar chart. As you can see, as expected,
people who have prior experience do really
well, but people who have never had experience
also do get an A grade in this course. I want
to take this moment to mention that, if you
work really hard and if you are dedicated,
you can achieve really good grades in this
course. But at the same time, I want to caution
you that this is a very competitive course.
So stay on top of things and you should be
fine; just keep up with all the projects
and deadlines. Alright, let's talk about specific
feedback left by students. I have
three different examples here. The first
one is from a CS senior who mentioned
that this class was very easy for them. Like
I mentioned, this is an introductory class
for people who have no programming background
or very little programming
background. The second example talks about
how none of them will ever code again.
That kind of makes me feel sad. I absolutely
believe that all of you will be applying programming
and coding skills to real-world problems, and
that's what motivates me to be an instructor
for this course. The third example is the
right kind of example that I can put up. Here
is a student who had zero programming
experience prior to this course, and they came
in, found the course to be well
paced, and really enjoyed learning programming.
I hope the majority of you will fall under
the third category. Alright, at this point,
I am going to take a break, and the next video
is going to be about course logistics.
