- This is the first of a
three-part workshop series
that we're bringing to you,
Introduction to Data Analysis
for Aspiring Data Scientists.
So, this workshop is gonna
be an Introduction to Python
on Databricks.
And we have two more workshops coming up,
so the next one, part two,
is gonna be next week on April 15,
and then the third part
is Machine Learning
on April 22.
So, visit the Data + AI
Online Meetup page
and RSVP for those events
so you get the notifications.
We'd love for all of you to join us,
especially if you find
this workshop useful.
So, just a few quick reminders.
We have a bunch of
different kinds of upcoming
online meetups,
interviews,
workshops, and tech talks,
and we launch all of them
from the Data + AI Online Meetup group.
So, I'll drop these links in the chat,
and I'll also include a follow-up
message with these URLs,
so make sure to join that group
so you can check out what's upcoming.
And I know
we have a bunch of folks
joining us from YouTube Live,
so thank you for that.
We always broadcast there
and set reminders,
so make sure to subscribe on YouTube
and turn your notifications on.
Whenever we launch
a new online offering,
it syncs to YouTube,
so you can set a reminder
for yourself to join us
live there if you'd like to.
Joining
this Zoom broadcast also
allows you to participate
in the chat
and to ask questions in the Q&A.
I'll drop these links
in the chat as well.
I also just sent out a message
to everyone who RSVP'd
with two links that will
be useful for this session.
Again, I'll drop them in
the chat in a few minutes,
once I've handed over to Chengyin,
but I just wanted to put
these up here for a second
in case anybody wanted to grab them.
So, now I'd like to have
our instructor Chengyin
and our TAs
do a quick introduction,
maybe just 30 seconds to wave and say hi.
We have a bunch of TAs
who are gonna be answering Q&A questions,
so that's where you're gonna
wanna drop your questions,
and they'll be helping
throughout the presentation.
So, Amir, maybe we can
just start with you,
if you wouldn't mind saying a quick hello.
- Sure, can you hear me now?
- Yeah.
- Okay, hi everyone.
My name is Amir Issaei,
I'm a senior data science
consultant at Databricks.
I work on the same team as Conor,
Brooke and Chengyin.
I spend about
40% of my time training customers
and 56% of the time implementing
machine learning solutions.
So I'm gonna be a TA, as Karen mentioned,
so if you have questions,
just drop them in the chat,
and Conor, Brooke, and I
will try to help you understand
the things you're asking about
by answering your questions.
Thank you.
- Thanks, Conor.
- Hi folks, my name is Conor Murphy,
and I also work with
Amir, Brooke, and Chengyin.
We all work very closely together
on consulting engagements for data science
and on some training as well.
So if you have any
questions, feel free to ping us;
we're happy to answer them.
- My name is Brooke Wenig.
I lead the machine learning practice team
along with Chengyin, Amir and Conor.
Just as a side note,
Conor will be teaching
the next session on pandas,
and Amir will be teaching
the following session
on machine learning.
So I'm really excited; this
is really a whole team effort
to put together this series for you.
- Thanks everyone,
yeah, now I'll pass it to Chengyin Eng
who is our instructor for today.
So I'll stop sharing my screen
and let you take it away.
- That's good.
Hi, everyone,
my name is Chengyin Eng,
you can call me Chengyin.
I think my coworkers have introduced
my responsibilities pretty well.
I work with customers
on consulting projects,
helping them implement
data science solutions,
and I also deliver machine learning trainings.
So we have a packed schedule here,
so I'm just gonna go ahead and get started.
I'm gonna go ahead and share my screen.
Oops, sorry,
that was the wrong screen.
This one.
Okay.
So to get started with this workshop,
you will need a Community Edition account
to log into the Databricks platform.
If you don't have an account yet,
please go to this link,
databricks.com/try-databricks,
and sign up for your account.
You will receive an
email to verify your account,
and that's when you can log into
the Community Edition portal.
So I'm gonna give all of you
a few minutes here to get set up.
If you don't mind,
go to the chat
and tell me when you're done;
then I'll know that
I can proceed to logging
into the Community Edition.
Cool, I'm just gonna give
it a couple more minutes.
Yeah, so you can also
see on my screen here
there's a link, try-databricks.
Yep, we are using Community Edition,
so make sure you sign up for
a Community Edition account.
Okay, I think some of you have already
completed the sign-up process.
Don't worry if you are
lagging behind a little bit,
because the notebook will
also be available later
for you to work on in your own time.
So let's go ahead and log
into the Community Edition.
Quick log in.
Okay, so if you are seeing this page,
it means that
you have successfully logged
into the Databricks platform,
which is a cloud-based big data platform.
There are a bunch of icons
here in the sidebar
on the left-hand side.
I'll be introducing some
of them during the session,
but first, let's go ahead and
click on the cluster icon,
which is this icon that
looks like a graph or tree.
So during today's session,
rather than using your computer's
local computing resources,
we'll all be using
Databricks computing resources.
To do that, we
need to create a cluster first.
So let's go ahead and hit
the Create Cluster button.
I'll explain what a
cluster means in a second,
and what these details mean here as well,
but for now, let's give
our cluster a name,
and then you can hit Create Cluster.
So we go to the cluster page.
When you have successfully
set up the cluster,
you should see a spinning
icon that tells you that
its status is pending,
because it's still being set up.
If you click into this cluster,
you can look at the
configuration of this cluster.
So what is a Databricks cluster?
A Databricks cluster is a
set of computation resources.
The edition that we're using
now is Community Edition,
which means that it is free.
It comes with 15 GB of memory,
and it will also terminate automatically
two hours after your
last command was executed.
So Community Edition gives you
access to a small cluster
to run your code in a notebook environment.
It is good for prototyping
simple applications, but it's
not meant for production.
Community Edition also
uses Amazon Web Services,
AWS, under the hood,
so the cluster will take a few minutes to spin up.
So on your page right now,
you should still see
a spinning green icon.
You will see that there is
a Databricks Runtime version
over here.
What this means is that
each runtime version
is backed by Apache Spark,
and that's all you need to know for now.
In this introductory class,
we will just be using regular Python,
so we won't be getting into
the nitty-gritty of
distributed computing with Spark.
So that's what a cluster is:
the computing resources
you're using on Databricks.
So if you want to work
together with your peers
on your notebook,
you can invite them to your workspace,
and that's when the workspace
icon comes in handy.
If we click on this
workspace icon,
you can see that there's
a Shared folder and a Users folder.
Under Users,
there should be just your email,
because you're the only
person in the workspace.
If you want to invite your
friends to this workspace,
you can click on this person icon
in the top right-hand corner
and click on Admin Console.
That's where you can add users
to invite friends.
In Community Edition,
you can invite up to two users,
which means that in your workspace
you can have three user accounts altogether.
So now I'm gonna show you
how to access the notebook
that we're going to use
during this session.
Let's go to the GitHub page,
which is this one,
github.com/databricks/techtalk.
If you are not familiar
with GitHub,
it's a bit like Google Drive for code:
it hosts your code
and keeps track of changes to it.
I would encourage you to
follow along with what I'm doing
if you want to run code cells
while I conduct this session,
but you can also choose to sit
back, watch my demo,
and play with the
notebooks later in your own time.
So you can see that there
are a bunch of folders here.
All the resources
for the upcoming workshops
will also be posted in
this GitHub repo.
For today's workshop,
you will be working with this folder,
Introduction to Python,
so let's click into it.
You'll see different files here.
There is a lab for you
to do in your own time,
but during today's session,
we'll just be using the
Python fundamentals file,
so let's click into it
and then grab this link:
Command + C to copy this link.
You will also notice that
this file has the extension
.ipynb.
It means that it's a Jupyter Notebook,
so you can also access this notebook
in your local Jupyter environment.
But I'm gonna show you how you can import
a regular Jupyter Notebook
into the Databricks environment.
So now that you have copied this link,
go to the Databricks
homepage again.
You can click on the Home
button right here in the sidebar,
and you can see that your name is up here,
and there is an
arrow for you to click on,
which opens a drop-down menu.
You can choose to create
a new notebook,
library, or folder,
but for today's purpose,
because we already have a link to import,
let's go ahead and click Import.
You can see that you
can import either from a file
or from a URL,
so you can choose to download
the file and upload it,
but because we have a link,
I'm just gonna use URL,
paste it in
here, and click Import.
It should take you automatically
to the notebook that you wanted to access.
Beneath the title of this notebook,
you will see the word Detached here.
If yours says that right now,
it's not attached to a cluster,
so you need to select a cluster to attach it to.
You'll also notice that
this icon looks familiar,
because it is the cluster
icon that we just clicked on.
So we are going to attach
this notebook to a cluster
by clicking on this drop-down
and selecting the cluster
that you just set up.
Attaching
a notebook to a cluster
means you can execute
code in this notebook
on that particular cluster:
the notebook
is using the resources
of this cluster.
So in this session,
we'll be covering numbers,
strings, variables,
print statements,
lists, for loops, functions,
conditional statements,
and checking different types.
There is a good reference
sheet for you to bookmark.
If you click into it, you'll see
a Python cheat sheet
that shows you the
different Python syntax.
There is also a link to the
official Python tutorial,
developed by the Python developers,
if you want to learn more
about Python after this;
you can follow along with
their lessons as well.
You may also notice that this notebook
has a combination of text and code,
and you might also wonder
what a Markdown cell is.
If I double-click into this cell,
you can see that it
starts with %md,
and md stands for Markdown.
So I can write my text
and render it as text rather than code.
If I hit Shift + Enter,
it will render the cell
and show it as text.
You'll also notice
that I started the headings
with different numbers of pound signs.
Here the top header has
only one pound sign,
and the second
header has two pound signs,
so the top level will be bigger.
So that's how you can
write a Markdown cell.
You can also run
a cell by hovering over
the top right-hand part of the cell,
where you can see a run button,
a delete button, a
minimize button,
and also an edit menu
to perform different operations
on this notebook cell.
So now let's go ahead and
try to interact with Python
in this Databricks notebook environment.
Let's use this as a calculator,
just real simple:
we're gonna type in one plus one.
You can either press Shift
+ Enter to run this cell,
or you can go to the
top right corner over here
and hit the run button,
Run Cell.
So you can see here one plus one;
that was our first Python code
in the Python environment.
You can see that we already
have our output, we know it's two,
so as a quick sanity check,
that was correct.
You can also interact with strings.
For example, if I type in ice cream
and wrap it in quotes
(it doesn't matter if you use single quotes
or double quotes, they
do the same thing)
and press Shift + Enter,
you can see that it is
outputting ice cream for me.
I can also choose to
concatenate strings together,
so you can see that here's
the first part of the string,
here is the second part,
and I'm going to concatenate
them using the plus sign.
So I'm gonna hit Shift + Enter again,
and you can see that it's
now printing ice cream
is paradise for me,
but we got no space,
because Python is not smart enough
to figure out that we
need a space in between.
So if you do want a
space, you
have to enter a space in there yourself.
So notice that with numbers,
Python knows that you
want to add the numbers,
but with strings,
it thinks that you want to
concatenate those strings together.
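A minimal sketch of the calculator and string cells just described, assuming plain Python expressions like these; the exact strings in the workshop notebook may differ, and the names `total`, `no_space`, and `with_space` are illustrative.

```python
# Numbers: + adds them
total = 1 + 1
print(total)  # 2

# Single and double quotes create the same string
print('ice cream' == "ice cream")  # True

# Strings: + concatenates them and does NOT insert a space for you
no_space = "ice cream" + "is paradise for me"
with_space = "ice cream " + "is paradise for me"  # space added by hand
print(no_space)    # ice creamis paradise for me
print(with_space)  # ice cream is paradise for me
```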
So now let's move on to variables.
A variable in Python really
is just a named unit of data.
You can assign a value,
like this value for example,
to any name that you want,
but of course, the more intuitively
you name your variable,
the more helpful it is for you
when you want to look
back at your own code.
For example,
take this line,
I like ice cream, with
the pound sign in front.
The pound sign means that this line
of code is commented out,
and to uncomment or
comment out a line of code,
all you need to do is press Command + /
if you are on a Mac.
So if I try to run this cell,
it's gonna give me an error
that says invalid syntax,
because Python is gonna read
I, like, ice, and cream
as separate names,
and it doesn't know what to do
with them.
It's not going to work,
because we need to assign
the value to a variable.
To assign a value to a variable,
all you need to do is
use an equal sign.
You can see that now
I'm going to assign ice cream
as the best food,
using the equal
sign to assign the value.
So we run the cell, again with Shift + Enter,
and you will see that the
best food is now ice cream.
Notice that I can also
update this best food
variable whenever I want.
For example,
I can choose to update
my best food to be pizza,
and notice that
here I'm using double quotes
and here single quotes, but
they really work the same.
So what best food
prints is actually pizza,
because Python remembers
the latest value assigned to the variable,
which is pizza rather than ice cream.
So I'm gonna run this,
and you can see that it's showing us pizza
rather than ice cream.
Okay, because I like ice cream better,
I'm just gonna comment out the pizza line
and run the cell again,
so that the best food is
always gonna be ice cream.
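Here is a small sketch of the comment, syntax-error, and reassignment behavior described above. Using `compile()` to trigger the SyntaxError without stopping the cell is an illustrative choice of ours, not something shown in the session; the name `raised` is likewise hypothetical.

```python
# I like ice cream
# ^ Commented out. Uncommented, Python reads I, like, ice, cream as
#   separate names and raises "SyntaxError: invalid syntax".

# Demonstrate that error without crashing the cell
try:
    compile("I like ice cream", "<cell>", "exec")
    raised = False
except SyntaxError:
    raised = True
print(raised)  # True

best_food = "ice cream"  # one = assigns a value to a name
print(best_food)         # ice cream

best_food = 'pizza'      # reassignment replaces the old value; quote style doesn't matter
print(best_food)         # pizza

best_food = "ice cream"  # set it back to the favorite
```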
So moving on,
you can also use print statements.
You may wonder why we
even need to specify print,
because here I am not using print
and things display just fine.
The utility of a print statement
is that it forces
Python to print out every line
that you want.
For example,
if I didn't have
print over here
and I ran this,
it would only display best food
from the last line,
because a cell only displays
the value of its last line.
So for me to print both lines,
I need to add
print
to the first line as well.
Now you can see
that it's explicitly forcing
Python to print out both lines
rather than just the last one.
You can also be more explicit
about what you are printing.
For example,
if you want to remind yourself
what best food is,
and you don't want to
keep typing two lines,
you can wrap this variable
within a string.
The style that I'm using here is f-string formatting,
so all I need to do is
add f in front of the quotes
and then wrap the variable
that I want to print
within curly brackets,
and then you can print this.
Python now automatically knows
to retrieve the best food variable,
and it prints the
statement correctly.
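A minimal sketch of `print()` versus automatic display, plus an f-string. The sentence inside the f-string is our own example, not necessarily the notebook's wording.

```python
best_food = "ice cream"

# A notebook cell auto-displays only the value of its last expression;
# print() writes out each line explicitly.
print(best_food)
print(best_food)

# f-string: put f before the quotes and the variable inside curly brackets
message = f"My best food is {best_food}"
print(message)  # My best food is ice cream
```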
Now, let's take a look at lists.
I'm gonna make a list of what
I think everybody ate
for breakfast this morning.
Say that you went really
fancy and had pancakes,
eggs, and waffles.
I'm gonna make this a list.
You can see here that again
I'm using strings
over here,
and I'm going to wrap them
within square brackets;
the square brackets mean that
this is a list.
I'm gonna use
variable assignment,
the equal sign,
to assign
this list of strings
to breakfast list.
Again, I can name this however I want,
but because I want to remember
what the list actually contains,
I name it breakfast list.
I can also choose to add
more items to this list.
For example, say you are telling me,
Oh, I actually had milk too,
so I'm gonna append milk
to this list as well,
using the append method over here.
So if I print this,
you should see
that this breakfast list
has four items:
pancakes, eggs, waffles, and milk.
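The list steps above can be sketched as follows. We assume the variable is spelled `breakfast_list`, following the spoken name; the notebook's exact spelling may differ.

```python
breakfast_list = ["pancakes", "eggs", "waffles"]  # square brackets make a list
breakfast_list.append("milk")   # append adds one item to the end, in place

print(breakfast_list)       # ['pancakes', 'eggs', 'waffles', 'milk']
print(len(breakfast_list))  # 4
```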
So let's try to get the
first breakfast element
from this list.
We know just by
looking at this list
that the first item is pancakes,
but because Python
is zero-indexed,
the first element is at position zero.
So if you want to get the first item,
instead of
using the number one,
you use the number zero,
because it's zero-indexed.
And you use square
brackets again to index
into the list.
So let's run this cell.
Yes, I'm getting pancakes,
so that is correct.
So what if I want the last item
from this list?
All I need to do is add a minus sign,
because negative indices
count from the very end,
so the last item is at
minus one,
and that gets me milk.
I can also choose to print
the second breakfast item
and onward.
Remember that Python is
zero-indexed,
so if I want the second
item, I will need one,
and to include everything after it,
I will also need a colon.
So that's the second breakfast item and onward.
I'm gonna press Shift + Enter again
to get eggs, waffles, and milk.
You can see that
pancakes is excluded here
because it is the first item.
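Indexing and slicing as just described, in a short sketch. The names `first`, `last`, and `rest` are illustrative.

```python
breakfast_list = ["pancakes", "eggs", "waffles", "milk"]

first = breakfast_list[0]   # zero-indexed: position 0 is the first item
last = breakfast_list[-1]   # negative indices count from the end
rest = breakfast_list[1:]   # slice from position 1 through the end

print(first)  # pancakes
print(last)   # milk
print(rest)   # ['eggs', 'waffles', 'milk']
```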
So now let's move on to conditionals.
Sometimes, depending on conditions,
we want to execute certain lines of logic,
and we can control this by using if,
elif, and else statements.
You can really think of conditionals
as a type of branching statement.
For example, if
you get enough sleep,
then you have enough energy for tomorrow;
if not, you feel tired.
So it's kind of like:
if A, then do this;
otherwise, if B, do that;
otherwise, do C.
That's why we think of them as
branching statements.
So say that we want to print
the plural forms of foods.
As a really simple example,
we just wanna add
an s to the end of a string
if it doesn't already have an s there,
to indicate that it's a plural form.
Say that I've changed my mind
and my best food
is actually chocolate.
I'm checking if this
best food ends with s.
You can see there's
another built-in string method
that I'm using here, endswith,
and I'm just checking
whether chocolate ends with an s.
If it does, it's gonna print it as is;
if it doesn't,
it's gonna add an s to it.
So here I'm expecting the
output to be chocolates:
it's gonna go through
the second branch of logic
and add an s
to this chocolate string.
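The pluralizing logic above could look like this. It is a sketch using the built-in `str.endswith` method; the name `plural` is our own.

```python
best_food = "chocolate"

# str.endswith is a built-in string method that returns True or False
if best_food.endswith("s"):
    plural = best_food        # already ends in s: keep it as is
else:
    plural = best_food + "s"  # otherwise add an s

print(plural)  # chocolates
```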
We can also make if-else statements
a little bit more complicated
by adding elif.
So here, for example,
I say that
the best food is ice cream
and the number of ice cream cones is a thousand.
If best food
is equal to ice cream,
then I want a thousand cones of ice cream.
If the best food is blank,
then I actually want you to tell me
what your favorite food is.
Else: Really? Isn't ice cream better?
So let's try this.
Here,
you can see that it's printing out
a thousand cones of ice cream.
Now say that I can't make up my mind,
and I just leave it blank.
It's going to go to the if statement first
and check if this equals
ice cream, but it doesn't,
so it's gonna jump to the next line
and check if the best
food is equal to blank,
and yes, it is equal to blank,
so it asks me for my favorite food.
And if I say it's something else,
then it would
jump to the third branch
and say, Oh, really?
Isn't ice cream better?
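A sketch of the if / elif / else chain described above. The reply strings paraphrase the talk, and the names `ice_cream_cones` and `reply` are assumptions, not the notebook's exact code.

```python
best_food = "ice cream"
ice_cream_cones = 1000

if best_food == "ice cream":      # checked first
    reply = f"{ice_cream_cones} cones of ice cream"
elif best_food == "":             # only checked if the first test failed
    reply = "Tell me your favorite food!"
else:                             # runs when nothing above matched
    reply = "Really? Isn't ice cream better?"

print(reply)  # 1000 cones of ice cream
```

Changing `best_food` to `""` or to `"pizza"` and rerunning walks through the second and third branches.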
So we can check the equality of variables
by using the double equal sign,
or, to check that they are not equal,
you can use the exclamation
point and equal sign.
You can see here that I secretly
already used the equality
concept by checking
if best food
is equal to ice cream
with the double equal sign.
So now I can check
whether ice cream is indeed the best food,
but it's not, because I
just assigned pizza to it.
So again, remember that
for variable assignment,
you use one equal sign,
but when you want to check equality,
you use two equal signs, or
an exclamation point and equal sign for inequality.
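A quick sketch contrasting assignment (`=`) with the comparison operators (`==`, `!=`):

```python
best_food = "pizza"                # one = assigns

print(best_food == "ice cream")    # False: == checks equality
print(best_food != "ice cream")    # True:  != checks inequality

best_food = "ice cream"
print(best_food == "ice cream")    # True
```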
So now let's move on to for loops.
For example,
I really want to print
out every breakfast item
that we had this morning,
and I want a
simple way to do that,
because I would not want
to say print waffles,
print eggs,
print pancakes; that
is really cumbersome.
A much easier way to
do it is to use loops,
which repeat a block of code
until a certain condition is satisfied,
or, like here,
until we reach the end of a sequence.
So for example, here
is the breakfast list,
and I want to print every
single food in the breakfast list.
All I need to say is:
for food in breakfast list,
print the food.
Python knows
that it's gonna iterate
over the sequence,
so it prints out
the first item first,
then goes to the second item,
the third item, and the fourth item.
So what if I want to
count the number of letters
in each word?
Again, you can see that
I'm incorporating different
Python concepts
in this print statement:
I'm using the variable,
I'm using f-strings over here,
and I'm using
a function called len,
which really just means length,
so I can count the number
of letters in each word.
Now it's gonna loop
through this sequence,
the breakfast list,
and you can see that it just
executed everything,
and all we need is just one line of code
rather than four different lines to count
the letters for every single food item.
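A sketch of both loops just described. The `letter_counts` list is an addition of ours that collects the lengths so they can be inspected afterward.

```python
breakfast_list = ["pancakes", "eggs", "waffles", "milk"]

# The loop body runs once per item, in order
for food in breakfast_list:
    print(food)

# len() returns the number of characters in a string
letter_counts = []
for food in breakfast_list:
    print(f"{food} has {len(food)} letters")
    letter_counts.append(len(food))

print(letter_counts)  # [8, 4, 7, 4]
```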
So now let's move on to functions.
You may ask why we even need functions,
because honestly, this is
already doing what we want.
But functions are really helpful
when we want code
to be more repeatable
and more organized,
grouping lines that accomplish the same task.
So if I want to generalize
this print logic
into a function that works
for other breakfasts
and other lists,
I can build a function
by using the def keyword,
short for define,
followed by the
function name that you want.
Again,
the more intuitively you name a function,
the easier it is for you
to interact with it later,
because it's easier to remember.
Then you add
a parameter name,
wrapped in
parentheses over here,
followed by a colon.
So you can see here I defined
a really simple function
that essentially does
the same thing as the loop above.
You can see that
this line and this line
are exactly the same.
So I've defined print length,
and now I can execute the function
by passing a list
to the print length function.
If I were to comment this call out,
you would see that this cell
actually does nothing,
because all I'm doing
right now in this cell
is defining a function.
So I need to execute the function
by calling it
and supplying it with a list,
the argument that I am interested in.
Here, my argument of
interest is breakfast list,
so I'm gonna pass it in
and execute this
cell with Shift + Enter again,
and it is doing exactly what
we expect it to do.
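A minimal sketch of the single-parameter function just described, assuming it is named `print_length` as spoken in the talk; the second list passed to it is a hypothetical example.

```python
def print_length(food_list):
    """Print each item in food_list with its number of letters."""
    for food in food_list:
        print(f"{food} has {len(food)} letters")

# Defining the function runs nothing; calling it does.
breakfast_list = ["pancakes", "eggs", "waffles", "milk"]
print_length(breakfast_list)

# The same function works for any other list of strings
print_length(["toast", "coffee"])
```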
We can also make functions a
little bit more complicated
by giving them more arguments.
This is a single-argument function,
but I can choose to have
multiple arguments.
Here I'm just gonna keep
it a little bit simpler
and have only two arguments,
but you can have
as many as you want,
though probably not too many,
because it's gonna be hard
for you to read as well.
So say that now I want to
count the favorite foods
that I have.
I have an ice cream count
and I have a chocolate count,
and I want to add them together to count
how much food I have (mumbles).
You can see here that
I am just naming this one ice cream
and this one chocolate, and again,
how you want to name these
parameters is really up to you.
I could choose to name them x and y;
it doesn't really matter,
as long as you are supplying
the right values
to the right parameters,
but again, the more
explicitly you name them,
the easier they are to recall later.
So you can see here
that now it's doing the
sum automatically for me,
because I specified
the addition over here
in the return statement
with a plus sign.
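A sketch of the two-argument function. The name `food_count` is hypothetical. The counts 1000 and 500 are assumptions chosen to be consistent with the 500 default and the 33.33% result mentioned later in the session.

```python
def food_count(ice_cream, chocolate):
    """Return the total number of ice cream and chocolate items."""
    return ice_cream + chocolate   # return sends the result back to the caller

total = food_count(1000, 500)
print(total)  # 1500

# Arguments can also be passed by parameter name, in any order
print(food_count(chocolate=500, ice_cream=1000))  # 1500
```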
So you may also wonder: what if I know that
one of the arguments is
always going to be 500,
or some other fixed number?
You can always set a default value,
so you don't have to keep
supplying the same argument over
and over again.
So here, say that I know that
chocolate is always
gonna be 500.
I can supply this default value
within the definition of the function itself,
and then I can just choose
to pass in the ice cream count,
because that is the
only thing that changes,
and the function already knows
that chocolate is 500.
It's smart enough to know that
that is a default value
and it should use that.
So if I run this,
you should see that the
output is exactly the same.
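Default values as just described, in a sketch. The function name `food_count` is hypothetical, and the default of 500 follows the talk.

```python
def food_count(ice_cream, chocolate=500):
    """Return the total food count; chocolate defaults to 500."""
    return ice_cream + chocolate

print(food_count(1000))       # 1500: chocolate falls back to the default 500
print(food_count(1000, 200))  # 1200: an explicit argument overrides the default
```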
So now say that we are just
really crazy about data,
and we want to know how much chocolate we eat,
or how much we like chocolate.
Here I can calculate a percentage.
Again, it's the same idea,
but over here you can see that
instead of just having
a two-line function,
I have a little
bit more involved logic
between the def and return statements.
You can see that I'm calculating
the percentage of chocolate
that I have by dividing
chocolate by the sum of chocolate and ice cream
and multiplying by a hundred.
And I'm going to use another
built-in function here, round,
to round off the numbers.
So if I run this,
and again you can see that
there is a print statement
over here that uses
an f-string
and also the variable right here,
you can see that it's telling
me that I like chocolate 33%
of the time.
If you do forget the
parameters of the function,
you can always call help.
Just wrap the function that you want
in help,
and then Shift + Enter again,
and it will tell you
that the parameters are ice cream
and chocolate.
So this is another reason why
you want to give your function
and its parameters reasonable,
intuitive names.
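A sketch of the percentage function and the `help()` lookup. The name `chocolate_percentage` and the docstring are our own; `round(number, ndigits)` is the built-in `round`. We leave `help()` as a comment because in some environments it opens a pager.

```python
def chocolate_percentage(ice_cream, chocolate=500):
    """Return chocolate as a percentage of all food, rounded to 2 decimals."""
    percent = chocolate / (chocolate + ice_cream) * 100
    return round(percent, 2)   # built-in round(number, ndigits)

percent = chocolate_percentage(1000)
print(f"I like chocolate {percent}% of the time")  # I like chocolate 33.33% of the time

# In a notebook cell, help(chocolate_percentage) shows the signature
# (ice_cream, chocolate=500) together with the docstring above.
```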
So we have been defining
quite a number of variables,
so there is a chance
that we will forget what the variables are,
but we don't have to worry,
because we can always check them.
Just a quick recap:
there is the percent variable,
best food,
and the breakfast list variable
that we have defined.
So we can check the type of percent,
and it is a float.
A float really just means a
number with a decimal point.
If you remember, the
percent up here is 33.33,
and this is why it is a float.
If I check the type of best food,
it is a string.
I don't remember whether
it's chocolate or pizza
or ice cream anymore,
but we know that it is a string,
and str indicates string.
And here, for
the breakfast list,
we probably know from the name itself
that this is a list,
which is why naming is really important,
and yes, this is a list.
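Checking types with the built-in `type()`, sketched with values like the ones defined earlier:

```python
percent = round(500 / 1500 * 100, 2)   # 33.33
best_food = "ice cream"
breakfast_list = ["pancakes", "eggs", "waffles", "milk"]

print(type(percent))         # <class 'float'>
print(type(best_food))       # <class 'str'>
print(type(breakfast_list))  # <class 'list'>
```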
So in summary,
we have gone through
different types of variables.
An int is numeric:
it's a whole number
without a decimal point.
For example, one
plus one equals two,
and that two would be an integer
rather than a float.
A float is also numeric,
but it's a number that has decimal places,
like the percent, 33.33.
And a string is a
sequence of characters.
For example,
chocolate, pizza, and ice cream
are all strings,
but a string can be a
sequence of any characters.
For example,
I can choose to enclose
text and also numbers
within double quotes or single quotes,
and anything enclosed
within the quotes
will be a string.
So we can try that real
quick, just to check.
For example, if I were to set
number to one, two, three in quotes,
and then check
the type of this number,
it's gonna say string,
but if I take away the quotes,
it's an integer.
And if I
add hello in there,
it is also a string.
So as long as
the text that you have is
wrapped within quotes,
double quotes or single quotes,
it's gonna give you a string.
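A sketch of the int / float / str check just described. The last two lines are our own addition. They show why the distinction matters: `+` behaves differently for each type.

```python
print(type(1 + 1))     # <class 'int'>: a whole number
print(type(33.33))     # <class 'float'>: has a decimal point

number = "123"
print(type(number))    # <class 'str'>: digits inside quotes are text
number = 123
print(type(number))    # <class 'int'>: no quotes, so it's a number

greeting = 'hello 123'
print(type(greeting))  # <class 'str'>

# + concatenates strings but adds numbers
print("123" + "1")     # 1231
print(123 + 1)         # 124
```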
We also sort of hinted
at the Boolean type in the beginning.
A Boolean is either True or False.
For example, here I can
define a Boolean variable
by saying
you are cool
equals True,
and then if I check the type of
you are cool,
this is a Boolean variable.
A Boolean variable just
means that it has two possible values:
it can be True or False.
So I can check the
equality of this Boolean variable
by using a double equal sign:
you are cool equals equals True.
It's gonna check whether you
are cool is equal to True.
So what are you expecting here?
We should be expecting
this result to be True,
because this variable is indeed True.
So let's check:
it is True.
And if we check whether it equals False,
then because
it's not equal to False,
it's gonna return False.
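A short sketch of the Boolean example, assuming the variable is spelled `you_are_cool`:

```python
you_are_cool = True
print(type(you_are_cool))     # <class 'bool'>
print(you_are_cool == True)   # True
print(you_are_cool == False)  # False
```

Comparing to `True` with `==` mirrors the session. In everyday Python you would usually test the value directly, as in `if you_are_cool:`.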
So there you go: you have completed
your first lesson on Python.
You can see that
the foundational concepts
of Python are really just
how to interact with
different types of variables,
how to print them,
and how to work with for loops, functions,
conditional statements,
and checking types.
A good practice is always to
name your variables
intuitively, so that it helps you
later when you look at your own code.
So that is the
wrap-up for this lesson.
You can go to the
GitHub page:
if you go back to the tech talk page
and go into this folder,
again, Introduction to Python,
you can see that there is
one lab, the FizzBuzz lab.
I'm gonna click into it to show you.
It's a really common
interview question,
so I can definitely
recommend trying it out.
There is also a solution in there as well,
so you can take a look at it
after you have given it a try.
So now we have 20 more minutes,
so we are open to take any questions.
What happened--
- [Karen] Just quickly...
Sorry, I was just gonna say a reminder
for everyone to post in the Q&A,
but if you can grab them
from the chat, go for it.
- Yeah, so I see a question here:
what happens when you assign
a variable a number
and then reassign the same
variable to a string?
So let's try that.
So here, if I were to assign
a variable a number,
I assume this is what you mean,
we'll say that it's 120,
and then I'll assign the same
variable a string.
Does anybody wanna take
a guess at
what this will output?
Can you answer in the chat?
Okay, I see string;
two people have answered string so far.
Yes, that is correct.
It will print out string.
And why is that?
It's because Python remembers
the latest variable assignment,
which is this one
rather than this one.
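A small sketch of the reassignment demo. The name `my_variable` is hypothetical, and 120 stands in for the number used on screen; the point is that a Python name simply refers to its most recent value, whatever that value's type.

```python
my_variable = 120           # first bound to an int
print(type(my_variable))    # <class 'int'>

my_variable = "string"      # rebinding the same name to a str
print(my_variable)          # string
print(type(my_variable))    # <class 'str'>
```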
So yes, we can definitely
use the export function
in the
Databricks notebook environment.
So here I can see that
I can export as
an IPython Notebook,
which means that you can use
it later in Jupyter as well.
You can export it as a source file,
which means that it will be written out
as a .py Python file that
you can access locally.
You can also
export it as HTML,
or you can export it as a
DBC archive, which is a
Databricks notebook archive.
So if you export it to a DBC file,
you do need to upload it back
to the Databricks environment.
Does this file get
saved to GitHub automatically?
No, it doesn't.
You can choose to integrate with GitHub,
but right now I am not.
So if I were to
show you the Home button right here,
you can see there are
the files that you have
uploaded,
and if you have multiple files
and you want to go
back to the most recent file
that you have worked on,
then you go to this Recents
tab over here.
You can see there is the Python fundamentals
file right here,
and you can click it to
retrieve the latest file
that you have worked on.
Yeah, so there's a lot of documentation on
how you can integrate with GitHub;
we can definitely send that out as well.
Is there a difference between Databricks
and Azure Databricks?
Let me defer that question to later,
because I'm gonna focus on
Python fundamentals for now.
Can we call one notebook from another?
You mean like running
a notebook from another one?
Yes, Brooke just answered that.
- [Presenter] Actually, Chengyin Eng,
I think there's a question,
or a few questions, about
why you use Databricks
for this when Jupyter exists?
- Yeah, so in the Databricks environment,
you probably remember
that at the start
I showed that you can invite your peers
to collaborate with you on the notebook.
That is a really handy feature:
you can just invite users
to the same workspace
and collaborate
on this very easily.
And what is also different about Databricks
is that you have
the cluster management
feature over here,
so if you are using
Databricks to run Python
or any code that you have,
then Databricks will manage the cluster,
the environment, for you.
On Community Edition,
you can only run one cluster at a time.
Can you show one more time
how you got stuff from GitHub
into the Databricks environment?
Yes, I can.
So let me go to this GitHub link again,
and just so
I can show you,
I'm gonna do the same thing.
Here you can see that
I have the link over here in the top bar,
and I can press
Command or Control + C to copy it,
and then go back to
the page that you have
on Databricks,
and then you can
hit the Home button right here,
and then open the drop-down
menu under your email,
and then you can click Import.
Again, you can import a file or a URL.
A file just means,
you know, something in
your Downloads folder
or whichever directory you have (mumbles)
you can upload it.
And you can also
choose to import via URL,
so I can put in this URL
and then just click Import.
And because they
are both named the same,
this one is gonna get
a suffix of 1.
Yes, so to create a cluster,
you can see that here I have
the cluster running,
so I wouldn't be able
to create a second cluster
within the Community Edition,
but all you need to do is go to this
sidebar, you know, we see this sidebar everywhere,
I mean, everywhere in
your workspace,
and then you can click on this icon
and it will take you to this page,
and then there's a blue
button right here that says
Create Cluster.
You can create a cluster,
and then you can put in
your cluster name right here.
So if you want to go
back to the landing page
of the Databricks environment,
you can also just hit the
top-left Databricks icon right here,
and then you can see that you
can just add a notebook from
the homepage, rather than
going to the Home icon
and clicking to create a new notebook.
So these are the two different
ways to create a notebook:
either from the Home icon,
going to the drop-down
menu underneath the email,
and then clicking Create Notebook,
or you can create a
notebook from the landing page,
the homepage of the Databricks environment.
- [Presenter] And then, Chengyin Eng,
it looks like we have a
question asking you to create
an example using methods
rather than functions,
if you could show that.
- [Chengyin] Where is that question?
- [Presenter] It's in the Q&A.
So for example, like string
dot upper, string dot lower,
like showing the difference
between a function and a method.
- [Chengyin] Yeah, let
me go back to this page.
Yeah, so for example,
here, if I were to,
I guess, say that I want to remove
an item from this breakfast list,
I can choose to use...
oh, this is another feature that
I wanted to show you first.
So if I've already named a variable,
say that I want breakfast list,
but I can't remember how
to spell the variable name,
or I'm too lazy to type it out,
all you need to do is
press Tab on your keyboard,
and you can see that it's
suggesting different
auto-completions
for the variable
or for different functions.
So here, because I'm
interested in breakfast list,
I am going to pick
breakfast list right here
and just click into it
and then press Enter,
and then you will see that
the name of the variable
is now auto-completed.
So if I were to remove the last item,
or remove an item from the breakfast list,
I can say remove,
and then say that I want to remove X.
So let's try that.
Okay, actually before this,
let me print out my breakfast list,
and I want you to take a
guess at what this will output.
So here now it has pancakes,
X, waffles and milk.
So what items will
remain in this breakfast list
after I remove X?
It should be everything except X,
so you can see here that,
okay, now we don't have X anymore.
So this is another built-in
method that you can use.
You can also, if you remember,
there was...
oh, sorry, that was with strings, yup.
So this is another built-in
method, remove.
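To make the function-versus-method distinction from the Q&A concrete, here is a short sketch. `len()` is a built-in function that takes the list as an argument, while `remove()`, `upper()`, and `lower()` are methods called on an object with dot notation. The list mirrors the demo, with `"X"` standing in for the placeholder item.

```python
breakfast_list = ["pancakes", "X", "waffles", "milk"]

# Function: called on its own, with the list passed in as an argument
print(len(breakfast_list))   # 4

# Method: called on the list with dot notation; remove() deletes the first
# matching item in place and returns None (it raises ValueError if absent)
breakfast_list.remove("X")
print(breakfast_list)        # ['pancakes', 'waffles', 'milk']

# String methods return a new string and leave the original unchanged
food = "Pancakes"
print(food.upper())          # PANCAKES
print(food.lower())          # pancakes
print(food)                  # Pancakes
```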
What makes Python so great at data science
and data engineering compared to other
programming languages?
Python is multipurpose,
so it's very versatile
across different purposes.
For example,
you can choose to use Python
for website development
or data analysis,
or just building applications,
and because it's multipurpose,
it makes it easier to
collaborate across teams
while still staying in the same language.
Can you explain the for loop
break and continue features?
Sure, I can show
how you can break a loop.
So for example, right here,
I want to,
say that I want to,
maybe,
say that
if food equals
waffles,
then I want
to break this loop.
And let me just make sure of
what the breakfast list
contains at this point.
So yes,
pancakes, waffles, and milk.
So this means that if I use
a break statement right here,
it will not print out milk anymore.
So if I run this with Shift + Enter,
oops, sorry, I need to print.
Oh, so this is doing something different,
because it's checking whether or not
this is waffles,
and if it's waffles,
then it's gonna print it,
and if it's not, it's gonna break,
so this is not very interesting.
So I'm going to just
say print food,
and then I'm going to say,
if food equals
waffles, then I'm going to break.
Yeah, so here you can see
that now it's checking,
and I'm printing the food
right here to know
where in the sequence it is.
So here we know that it has
already gone through pancakes
and it has already gone through waffles,
and if
the food is waffles,
then it's gonna break out of the loop,
and it's not gonna print milk anymore.
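The question also asked about `continue`, which the demo didn't get to. Here is a minimal sketch of both, reusing the same breakfast list: `break` exits the loop entirely, while `continue` skips only the current item and moves on. The two `printed_with_...` lists are just a hypothetical way to make the behavior easy to check.

```python
breakfast_list = ["pancakes", "waffles", "milk"]

# break: print each food, then stop the whole loop once we reach waffles
printed_with_break = []
for food in breakfast_list:
    print(food)
    printed_with_break.append(food)
    if food == "waffles":
        break  # "milk" is never reached

# continue: skip waffles, but keep looping over the remaining items
printed_with_continue = []
for food in breakfast_list:
    if food == "waffles":
        continue  # jump straight to the next item
    print(food)
    printed_with_continue.append(food)
```

The first loop prints pancakes and waffles; the second prints pancakes and milk.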
Yeah, so just so everybody saw
how we can create a new cell:
if I hover over the notebook
environment right there,
you can see that there is
a plus sign in the center,
and you can insert a new cell like this,
or again you can also
go to the top-right
corner of the cell
to add cells or remove cells.
So here I can say that I
want to add a new cell above,
or add a cell below,
I can also move cells,
and I can also show a title.
Say that I want to show a title,
I can just click into this and say,
this is a breakfast
list.
And you can also choose to
go to this Run Cell option,
and you can see that I can
run this cell, or Run All Above,
or Run All Below.
So this is really handy
when you know for sure
that everything above is gonna work,
and you don't wanna go cell
by cell to execute it yourself,
so you can choose Run All Above
or Run All Below.
Any specific advantage of
using a Jupyter Notebook
over Python on our local machine?
So it really just depends
on your personal preference.
There are people who
really like the
notebook environment,
and there are also people who really
don't like the notebook environment.
So we can see here that
the advantage of having a
notebook environment is that
you can see the output
right alongside the execution,
so I know for sure that
this is one plus one equals two,
and we can see the output
that results from it right away.
But if we use a local Python file,
then you will not
be able to
see the output immediately
together with the code.
So it really just depends
on your personal preference.
I mean, of course now there are
IDEs that help you
render output
in the same console,
but, you know, everybody uses Python differently,
so I would say that it boils
down to personal preference.
How many notebooks can one cluster serve?
This is a good question.
So it depends on how memory-intensive
or how heavy your
task is in this notebook.
For example,
there's no fixed number of notebooks (mumbles)
you can run on a specific cluster.
Of course, think of a cluster as a resource,
and if more people
access the same resource,
or more (mumbles) access
the same resource,
that resource has to work
harder to execute your cells.
So yeah, if I have one
really heavy notebook,
and I know that on my local machine
it's already gonna take like 10 hours,
and then even on a small cluster,
you know, for example,
maybe you just have
basically limited
memory on the cluster,
if you were to run two notebooks,
one very memory (mumbles)
intensive and the other one light,
then it's definitely
going to slow down the
execution of the code in the
environment, on the cluster.
So the answer is it depends.
Do we have a cheat sheet
of all the functions in (mumbles)
provided anywhere?
There are the Spark docs,
not the Spark documentation, but there is the Python one.
You know, if you go back
to this website, actually,
let me just show you this.
So you can see that this
is the official tutorial,
so this does show a bunch of
syntax that you can look at
over here,
and if you are interested
in break and continue,
you can also look at this.
There is also
really just an overview of
what you can do in Python,
so I recommend going
through this tutorial
if you are interested in
learning more about Python,
but you can also go to
the documentation as well.
You can see here in the
top bar over here there is English
and then the
Python version of the documentation.
And you can take a look at the
documentation, or I mean, one
really easy way to just
check the syntax is to search,
for example, list append,
and then type Python
three syntax, for example,
and then
here you can see that docs.python.org
already brings you to what
you can do with the data structures.
So you can see there is append, extend,
insert, remove,
so this is how you can
get more information
about what you can do with Python.
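The list methods listed on that docs.python.org page can be tried directly in a notebook cell. A short sketch with made-up breakfast items:

```python
foods = ["pancakes", "waffles"]

foods.append("milk")               # add one item to the end
foods.extend(["eggs", "toast"])    # add every item from another iterable
foods.insert(0, "coffee")          # insert an item at a given index
foods.remove("waffles")            # remove the first matching item

print(foods)  # ['coffee', 'pancakes', 'milk', 'eggs', 'toast']
```

Note the difference between `append` and `extend`: `foods.append(["eggs", "toast"])` would add the whole list as a single nested item.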
Yeah, we have five more minutes left;
any more questions about
Python fundamentals?
And yes, in future sessions
we are going to talk about
pandas, but not Spark,
because the purpose of
this series of workshops
is really to get you to understand
how you can use regular Python,
you know,
without any, I guess, prereq
of knowing what Spark is
or interacting with Spark.
So we're just gonna be
using Python and pandas
for the upcoming workshops.
So the number of collaborators
on the workspace is three,
including you.
So here, for example,
I can go to this workspace
and say that I want
to invite more people.
I can go to the Admin Console,
and because this is Community Edition,
it allows three user
accounts altogether.
I can just add a user
and say,
okay, I am just gonna be
cheating and just
put a plus one over here,
and now it says that
I can add this person;
the person should receive an email.
Okay, add a second person,
an email is sent,
but if I try to add a third person,
you know, which means four in total,
then this button is sort of, I guess, faded,
and it says that your plan
doesn't allow more than
three user accounts,
so you can't add more users
to collaborate in Databricks on this plan.
So for the Community Edition you can only
have up to three user
accounts in one workspace.
The link for the next classes:
so let me put it this way,
all of the resources will be posted
on the same GitHub page.
So you can see here that
there is a tech-talks
repo over here where folders
are prefixed with the date.
So for the next workshop,
you can see 2020 April 15,
and it will be Introduction to
Python, and you click into it,
and then again,
using the same process of
importing the link from GitHub,
you can copy the link
that you're looking at here,
you know, for a notebook,
and then you can import it
into your notebook environment.
So it will be the same GitHub link
that you'll be interacting with,
but the link that you will
be using for each notebook
will be different,
because it's a different file.
Or the training session link?
I'm not sure about that.
Oh yeah, Karen, do you know?
- [Karen] Yeah, so I just
dropped it in the chat,
if you wanna RSVP to the
two upcoming workshops,
I dropped the link in the chat
and then I'll also include
that in the follow up email.
- Yeah, so here is the link
that I just opened up from
Karen's message in the chat.
So here there is the first part,
which is today's,
and then there is the second part,
which is next week,
and then there is another
tech talk about a different
part of Databricks,
which you can also attend
if you like,
and the third part of the
series will be on April 22nd.
You can also choose to click RSVP here,
and then you will receive the email
on how to join this webinar.
We have two minutes left;
I guess it's a good time
to wrap this up, Karen.
- Yeah, yeah that's great.
Yeah, there are no more questions.
Thank you so much, Chengyin, for this,
this was great,
I think everybody really
enjoyed the content.
We had just over
300 folks attending,
which is really awesome,
so thank you so much, and
thank you to all the team.
