- Hello, hello!
What is up everyone?
Welcome to Help Me Data Geek number two.
The second week now, I've
got a new setup here,
hopefully the audio's better.
The office is complete,
or, I need some acoustic stuff in here,
but overall, it's pretty
close to being done.
What I want to talk about today
is which technology in the
data realm should I learn?
So, this is a fun topic,
I get this question all the time.
I probably have a dozen emails
from aspiring data analysts
and data scientists
and business people asking me
"What should I do and
what should I learn?"
So, I figured I would try
to cover this topic today,
and we'll have some fun
exploring these technologies.
And, if you have any questions
during the broadcast here,
just post them in the chat on the right,
whatever side that is,
and I'll try to get to
them at the end of this.
And if not, whatever,
we can come back and talk about it later.
You can email me at
help@bensullins.com anytime,
and we'll try to get a
question answered soon,
from whenever you send that over, so cool.
Let's take a look now,
I actually created some slides.
I'm a geek about slides, I love slides,
but I like to make them not horrible.
So I'll do that now,
so we hop over to that.
I'm gonna kick up the slides ...
which are going, and ...
or not.
Let me see if I can get that going.
OK, I got that.
Now, let's go here, and I click that one.
Sweet.
OK, cool.
So you should be seeing now
the Which Data Technology Should I Learn?
So the first thing to talk about here,
what we're gonna learn today,
we're gonna talk about the background.
I think there's some
important context here
about this topic, and
about these technologies.
So I'll just talk briefly
about that kind of thing.
Then we get into the actual
technologies themselves,
we'll talk a little bit about
the different technologies
in the data realm,
and we'll get into the personas,
so the people that actually make up
or that use these technologies every day.
And then the marrying of those two,
so who should learn what.
So hopefully this will help
you answer the question,
like "What should you learn?"
depending on who you are,
and what your goals are.
And the last thing, I'll point you at
where you can find more
info, we'll actually
jump over and I'll show you some websites
that are great to learn on.
OK, cool.
So, first, take a look at this.
So, this is from indeed.com
which is like a job website,
a lot of career stuff on there,
and these are the programming languages
ranked by number of programming jobs.
So I think this is relevant
because the very first one
is a database language,
it is SQL, so, this is one
that's been around forever,
so the point here is that SQL
is probably one of the most
universal programming languages.
So, whether you're a developer,
or whether you're a data geek,
SQL is there and that's pretty awesome.
Also down on the list a
little way is this Python,
and Python is another, granted
it does a lot of things,
but it is really popular
in the data science realm,
as well as the data engineering realm.
So, two out of the, I dunno what is this,
top nine or ten languages
right here are data languages,
so that's incredible,
just to show the popularity
of data technologies today.
Then, this other piece I wanted to share
was Glassdoor put out the
25 Best Jobs in America,
and this is cool
because the number one best
job was data scientist.
Now, I'm gonna go off script
and generalize this for a little,
pause for a second here and say,
and I'll talk about this a little more,
data scientist to me is akin
to data analyst as well.
I'm not sure when they're
talking about it here,
if they're talking about more
of a true data scientist,
somebody who does formal,
statistical modeling
and comes up with machine learning APIs
and those kind of things,
or if they're just talking about
somebody that uses data to solve problems,
which is a very general
way of thinking about
the process of data analysis,
a data scientist being
probably the most advanced role
in the realm of data analysis.
So, just think about that,
that the number one best job in America,
and this includes all kinds of jobs,
in fact I think number two was a CPA,
like a tax guy or something like that.
So, that's just insane that,
I mean data's hot right now,
and so I hope what I share with you today
is gonna be important for you
to understand the journey you wanna go on.
So let's talk about some technologies now.
Alright, and this one,
I dunno if this will be a
surprise to people or not,
people that know me it certainly won't be,
but Excel.
I can't say enough about Excel.
It is probably the most powerful piece
of software ever made.
It helps us run the world essentially,
I still think that OPEC,
the cartel that creates,
controls oil prices from the Middle East
probably sits around with a pivot table,
figuring out what the price of
oil for the world should be.
Excel is that kind of
a thing, it comes up,
the famous Harvard
economist study about GDP,
which turned out to be
wrong, it was an Excel error,
I'll blog about that
in the future I'm sure,
I mean Excel really is one
of the most powerful things
so I think it's really critical for anyone
in any role in the tech world today.
I mean, or really in
business, or anything,
and most people probably know that,
that's probably not a big surprise there.
The thing that I would say
is that the people that
maybe are skeptics of this,
so probably the biggest
skeptics I've encountered
are database people,
so people that are hardcore
database developers
that know SQL, they think
"Excel, it can't handle too many rows,
"I can't write SQL against it, you know,
"that's my hammer, you know
that I use for everything."
I would encourage you to take a look,
I use Excel to actually automate
the writing of SQL at times,
and I also use it to do things
like build simple data models
and create database tables.
So you can use Excel to save yourself time
in other programming languages,
so whether or not Excel is your hammer
that you use for everything, which,
that'd be tough if you're
doing a lot of data work,
but, it can have many different purposes,
and that's actually the
gift and the curse of Excel
is that it's such a generic
product because they have
such a wide audience
that they try to serve,
that it won't take you all
the way to completion there,
it'll get you about,
I don't know, 50, 60%,
but a lot of it is gonna be on you
to finally complete the
project using Excel.
And so, that's one of those things
that people love and hate,
I love it and I recommend,
regardless of what job
or what your role is,
that you take a look and
see how it can benefit you.
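To make the "use a sheet to write your SQL" idea concrete, here is a minimal sketch in Python rather than Excel. It assumes a made-up column list (one row per column, a name and a type, the way you might lay it out in a sheet) and a hypothetical helper called build_create_table. In Excel you would get the same effect by concatenating cells with the & operator, so treat this as an illustration of the idea, not the exact workflow from the talk.

```python
# Hypothetical column spec, laid out like rows in a sheet:
# one row per column, with a name and a SQL type.
columns = [
    ("customer_id", "INTEGER"),
    ("customer_name", "VARCHAR(100)"),
    ("signup_date", "DATE"),
]


def build_create_table(table_name, column_rows):
    """Build a CREATE TABLE statement from (name, type) rows."""
    column_lines = ",\n".join(
        f"    {name} {sql_type}" for name, sql_type in column_rows
    )
    return f"CREATE TABLE {table_name} (\n{column_lines}\n);"


# The table name "customers" is invented for this example.
ddl = build_create_table("customers", columns)
print(ddl)
```

The point is that the repetitive part of the statement comes from the rows, so adding a column means adding a row, not retyping SQL.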
OK, now onto the hot stuff,
the fun tech that's going
on in the data world.
But first I need to pause
and have some coffee.
(sighs) Brought to you
by Stone Brewing Company.
If you guys, anyone from Stone is watching
you wanna send me a beer
to drink on the show,
I will happily do that, so.
Anyways, back to regularly
scheduled programming.
The Key Technologies (right now).
So the first one I
wanna talk about is SQL.
This is a query language
that is universal to all databases.
Now, caveat or asterisk there that,
NoSQL databases, which by the
way stands for not only SQL,
not no SQL like the absence of SQL,
those things are something
like MongoDB, or HDFS,
or some of the other databases
that we call databases,
that aren't really, exclude
them from that list.
If you're a database you support SQL.
If you don't you're not a database,
I guess that's my stance on it.
So, SQL works with all the databases,
and each one has its own flavor,
so MySQL has its own flavor of SQL,
so it supports the
ANSI standard SQL,
and there aren't many programming languages
that have an ANSI standard.
MySQL has its own
flavor, Oracle has PL/SQL,
SQL Server, another
Microsoft thing, has T-SQL,
that Microsoft thing
like it's just a thing,
Microsoft SQL Server's huge.
All of them, Postgres,
they all have their own version of SQL
that extends beyond the ANSI standard,
but at the base level they all support
a lot of the common functionality,
so Select statements,
Group By's, Where's and
all those kinds of things.
So what that means is that,
if you know this one language you can talk
to nearly all databases that exist,
which is great because if
you're a data geek, like me,
or you're in a data role, you don't care,
you can come into a company,
"Oh OK, what kind of
database do you have?"
Cool I just need the right tool
that I can connect to that database
and then I can execute my queries
cos I can write queries,
cos I know the standards.
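Here is a minimal sketch of that common core (Select, Where, Group By), run against an in-memory SQLite database from Python. The orders table and its rows are invented for illustration. The query sticks to plain standard constructs, so it should carry over to other databases with little or no change, though each flavor has its own quirks.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("West", "widget", 10),
        ("West", "gadget", 25),
        ("West", "widget", 5),
        ("East", "widget", 8),
    ],
)

# SELECT, WHERE, and GROUP BY: the basics most SQL databases share.
query = """
    SELECT region, COUNT(*) AS num_orders, SUM(amount) AS total_amount
    FROM orders
    WHERE product = 'widget'
    GROUP BY region
    ORDER BY region
"""
widget_sales = conn.execute(query).fetchall()
print(widget_sales)
conn.close()
```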
The next one is Python, and
Python I absolutely love.
It is one of the few technologies
that actually incorporates Zen into it,
they have these principles,
and the Zen of Python
and it's one that is just
beautiful and easy to read,
and incredibly simple to learn.
And of course because
there's a big community,
like a lot of these,
whatever you're looking to
do has been done before,
so you don't need to reinvent the wheel,
you can Google, and copy and
paste from Stack Overflow,
or whatever you wanna do.
So Python is another one
that is huge in the data world right now.
Python of course is more
general than just data,
but in the data realm,
especially data science
and data engineering, it's huge.
Tableau is another one,
and this may be a bit
controversial, or maybe not,
if you guys follow my stuff
you know that I love Tableau
and I teach and talk about it a lot,
I'm hoping to speak at the
Tableau conference this year,
all those kind of things.
So, anyways, this one is huge,
but I'll generalize this
a little bit and say that,
the BI and analytics tools,
and Tableau is in my mind
the best one out there,
QlikView is another popular
one that is also a leader,
if you saw the recent BI Magic Quadrant,
there was the three leaders left
in the top right quadrant there,
Tableau was one, QlikView was another,
and Microsoft was the other.
I would say Microsoft is
definitely playing catch-up
to the other two,
and Tableau I think is the true leader
because they're the ones that
really have revolutionized
the whole BI and analytics world
with their approach to
self-service analytics,
and making it simple and easy
for people to visually explore their data.
I don't wanna get on a
sales pitch about Tableau,
you guys have heard me do that enough,
but, the point being, your
BI and analytics tool,
I recommend Tableau, is huge and right now
it's super-important for
people to learn that.
And the other one is
R, and I hate the name,
just because anytime you search for it,
you just get all kinds of crap results,
but R is an open source,
statistical modeling programming
language essentially,
and there's variants of it,
there's RStudio, R Server,
there's Shiny dashboards,
there's a whole realm of
stuff popping up around R.
And this is largely
used by data scientists,
but I would say that it's
finding other applications
outside of that,
so people that aren't
classically trained in statistics
or some of the other applied
mathematical principles
are using R to understand data better
and to make graphics and
everything like that,
so a really powerful tech.
Alright so those in my
world, or in my opinion,
are the top four technologies
in data right now.
Now we'll switch gears, and I
wanna talk about who you are,
and hopefully if you're watching this,
you're one of these three roles,
and I'll have a fourth role I'll mention
but I don't wanna highlight
it as a data role.
So, the first one is the knowledge worker.
So the knowledge worker is the person
that is a business person,
that is using data to make
decisions to run the business,
to do whatever their business role,
I say business but I mean organization.
I used to work for Mozilla,
so we had a foundation
and we didn't refer to
ourselves as a business,
but whatever your organization or company
or business or whatever it is,
there are people that use
data to make decisions,
hopefully a lot of people,
hopefully this role,
this persona hopefully applies
to a really broad range of folks.
And so, I would contend
that even C-level folks
should be knowledge workers,
in that this is a big, big market
and it's really the ones that,
the people that take the
insights that were developed
for you or the dashboards or whatever,
and apply them and actually
make the difference.
So it's the last mile
in the journey, as it were,
from where data starts to where
it actually has an impact.
The next is the analyst and the scientist,
and the data analysts and scientists
are the persons that, the
people that will take data
generally either from
collecting it however they can,
from scraping it from the web,
or pulling it from a database,
or downloads from CSV files, or whatever,
and making sense of it.
So this is the real exploratory work,
this is really fun work
cos you get to learn a lot
and this is constantly evolving,
and there's just a huge opportunity
to be really creative here
about how you use data.
Then you have the engineer,
and the engineer is the one
that really makes this whole thing hum,
without them the pieces
don't fit together,
the data doesn't flow.
Someone told me recently, I was
having a chat with a friend,
and they were saying that
something like 70 to 80%
of a data scientist's job is
collecting and organizing data
so that then they can do analysis on it,
and I thought that was ridiculous.
I think that's just not how I,
I've not structured my teams,
my organizations that way so,
that's insane to me that
companies would hire somebody,
or expect somebody who is
extremely hard to find,
extremely valuable,
and have them do the heavy lifting
of just moving data around.
I mean a data scientist should have,
in theory it's like a chef
coming in to the restaurant,
where they should have
all the tools laid out,
prepped, cleaned exactly
how they like them,
and then they should have
all the food ready to go,
and they just make these beautiful dishes,
these beautiful creations.
That's what the analysts' and
scientists' role should be,
it shouldn't be, you
know Emeril doesn't come
into his kitchen and go chop
tomatoes, to make the salad.
Somebody's chopped the tomatoes for Emeril.
So that's my point.
You should have somebody chop the tomatoes
for your data analysts
and data scientists first.
And that would be the data engineer,
or the data engineering team.
OK, so, onto the next one.
Who Should Learn What?
Well on the left here
I'm just gonna put up
our knowledge worker, our
analysts and scientists
and our engineer,
and then on top I'll put
our programming languages.
So the first one is Excel,
so the knowledge worker
obviously needs to know that,
in fact they're probably
the most familiar with it
and they probably try to
do everything in Excel.
One of the most ironic
things that you find,
and I found throughout my career,
is you spend all this time
building these dashboards
and trying to make it
easy for knowledge workers
to find answers to their
questions and get their job done.
And, still the most common denominator is
"Can I download it to Excel?"
And that's,
it's unfortunate because the
idea is to not have to do that,
cos often what people do then
is they try to join it up
with other data or they
try to mangle it together,
or fit it into the model they want
and then make their own thing in Excel
and it's like "We can do
that for you, you know,
"or we can teach you other
tools and ways of doing it."
So, anyways, Excel is obviously key,
SQL's another one.
I have a fun story back
in my first real data role
I was working at a call center
in Phoenix in the late 90s,
and my boss, who was a pure business guy,
his role was to help understand
customer service staffing,
so what we did, or what I did is
we looked at the schedules for
our inbound sales actually,
so it was customer service
and sales calls coming in,
and we're trying to balance
the staffing levels,
like how many people are
on the phones at this time,
which means we have to predict
how many calls we're going to get,
which depends on things like marketing
campaigns or whatever,
if there's an outage, that kind of stuff.
And then, think about other
people's situations like, hey,
so-and-so has vacation they
need to take and all this,
and we're talking about 2,000
people in a call center,
so, lots of data.
And it's really a numbers game,
trying to fit all these things together.
His job, he actually ran that
for a number of call centers,
my job was to help work
with the data there,
and the funny part about the story
is he's a complete business person,
he's not a tech person,
he's not a developer.
He knew SQL, and it blew me away.
I thought, "Holy crap, so
here's a business person",
and we're talking late 90s,
"Where he's writing SQL code
"to figure out how to do his job."
And at first I thought "This
guy's frigging awesome!"
And then I also thought
"Man if he's writing SQL I
need to be stepping it up."
cos I thought SQL was
the end of the skills
I needed at the time.
So anyways, knowledge workers, yes to SQL.
The other one is Tableau,
and again if you don't have Tableau,
you should go try it out,
but if you have a different BI tool,
whatever it may be
that's fine, do that one,
knowledge workers need to use this,
this is the nature of
self-service analytics,
because Excel and SQL,
well SQL's gonna be hard,
especially for complex analysis.
And Excel has its limitations,
for as great as it is.
Tableau goes beyond that,
it is the best of both worlds there,
it's easy to use like Excel,
there's really no coding,
or you can code in it,
but it's not really required,
you can get a lot done without coding,
and it's one that can
handle large amounts of data,
connect to databases,
connect to web data sources
and all that kind of thing,
so Tableau is an absolute
must for the knowledge worker.
Then you have on the
analysts and scientists
of course Excel, SQL too,
the analysts and scientists
are gonna have to get down
and query databases.
I know a lot of people that
are actually Tableau experts,
or QlikView experts, that don't know SQL,
and that blows my mind.
I think it depends where you come from,
some people come from the
knowledge worker's side,
like they're a business person
and they just learnt Tableau
and now they're a Tableau expert,
but they aren't really tech,
they're not a technologist,
which is a term from the
90s that we used to use.
We used to think of
ourselves not as a developer,
an IT guy, an administrator or whatever,
we were technologists,
we were Jack of all trades, so to speak.
So, if you're an analyst
or scientist SQL is a must,
I don't care what your background is,
if you now are in that role, learn SQL,
it's super-easy, it's
not crazy-hard to learn,
so don't be intimidated by it.
R is another one that I
would say is required,
or becoming required, if you're
a data scientist absolutely,
and I guess there's some
difference of opinions
between R and SPSS or some
of the other ones, whatever,
but a stats package is the
one there, I recommend R.
Then I'm gonna put a
dotted one around Python,
because I think this is really powerful,
again part of an analyst's
and scientist's job
is to claw and scratch the data together
and Python can allow you
to do that in unique ways
that none of these other tools can,
so I recommend learning at
least the basics of that.
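As a small taste of that clawing and scratching, here is a sketch that stitches together two sources using only Python's standard library. Both inputs are made-up stand-ins: csv_text plays the part of a CSV download and api_response plays the part of a JSON payload from some web API. The field names are assumptions for the example.

```python
import csv
import io
import json

# Made-up sample data standing in for a CSV download and an API response.
csv_text = "customer_id,amount\n1,20.5\n2,10.0\n1,4.5\n"
api_response = '[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]'

# Total the amounts per customer from the CSV.
totals = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    customer_id = int(row["customer_id"])
    totals[customer_id] = totals.get(customer_id, 0.0) + float(row["amount"])

# Join the totals to the names from the JSON payload.
names = {item["id"]: item["name"] for item in json.loads(api_response)}
combined = [
    {"name": names[customer_id], "total": total}
    for customer_id, total in sorted(totals.items())
]
print(combined)
```

In real work the two strings would come from a file and an HTTP request, but the joining logic stays about this simple.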
And then Tableau is a must as well,
so there's a lot in the data realm
that analysts and scientists,
there's a lot on your plate,
you are really the workhorse
of this whole process,
and so it really revolves around you,
so there really is nothing
that you shouldn't become good with
or at least proficient
with to some extent.
Not to say that you won't have the things
that you lean towards
based on your experience
or whatever you like.
So the engineer then,
Excel obviously, SQL yep,
and this is where the difference is,
the engineer is really heavy in Python.
So the data engineer
uses Python to move data
from place to place, to manipulate data.
A common framework is we use Python
to take data from wherever it comes from,
from an FTP site, from an API,
from a database,
wherever it lives outside
of our data warehouse
or analytics, you know warehouse,
and we pull that in using Python,
and then we use things like
SQL to actually manipulate it
through the process
inside of our environment.
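That pattern, Python to pull the data in and SQL to reshape it once it lands, can be sketched in a few lines. Everything named here is an assumption for illustration: raw_extract stands in for a file pulled from an FTP site or an API, an in-memory SQLite database stands in for the warehouse, and the staging_orders and region_totals tables are invented.

```python
import csv
import io
import sqlite3

# Made-up extract standing in for a file pulled from an FTP site or an API.
raw_extract = "order_id,region,amount\n1,west,10\n2,east,8\n3,west,5\n"

conn = sqlite3.connect(":memory:")  # stands in for the warehouse
conn.execute(
    "CREATE TABLE staging_orders (order_id INTEGER, region TEXT, amount REAL)"
)

# Step 1: Python parses the extract and loads it into a staging table.
rows = [
    (int(r["order_id"]), r["region"], float(r["amount"]))
    for r in csv.DictReader(io.StringIO(raw_extract))
]
conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", rows)

# Step 2: SQL reshapes the data inside the environment.
conn.execute(
    """
    CREATE TABLE region_totals AS
    SELECT UPPER(region) AS region_name, SUM(amount) AS total_amount
    FROM staging_orders
    GROUP BY UPPER(region)
    """
)
region_totals = conn.execute(
    "SELECT region_name, total_amount FROM region_totals ORDER BY region_name"
).fetchall()
print(region_totals)
conn.close()
```

Keeping the load step dumb and doing the shaping in SQL means the transformation logic lives in one place, next to the data.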
And of course there are
lots of other tools there,
and other ways to do that,
I really hate getting stuck in
the data engineering toolset,
because they all have their limitations
and when you get stuck and
you can't do something,
it's just incredibly infuriating, so,
the thing is, and I'll rant
about that for a second,
is if you're going to write the code,
you can get done all the things
you could get done with say,
an Informatica or tool like that
in probably about the same amount of time,
and it's really just about as hard
to maintain and everything,
I mean it's a wash.
But you have the ultimate flexibility,
some people gravitate towards tools
because they're afraid to write code,
or they're afraid of the command line,
don't be, it's not, if
you're a technologist,
especially if you're a data engineer,
then the command line is your friend
and writing code should
be your friend too.
I put Tableau on here as well
with a dotted line around it,
and I think this one is,
this one is good because
engineers need to show their work,
and just like everyone else.
Now it's not required, you don't have to,
but I actually know a
lot of data engineers
that have benefited from knowing
how to do stuff in Tableau,
or at least the basics.
They can spit out some results,
they can test something,
performance on the server, whatever,
and you use Tableau to
visualize that data,
it's pretty straightforward.
OK, so hopefully that is
a good picture for you,
of, depending on what role
you're in, or wanna be in,
and what types of skills you should learn.
I'm not gonna go into which
one you should learn first,
I would probably just
recommend going after the one,
I'd try them all out,
go after the one that you find the most,
have the most interest in,
and try to go down that rabbit hole,
and have some success there,
sort of play to your strengths at first,
without worrying about your
weaknesses, to get going.
Alright.
Where to Find More.
So here I'm gonna talk about a
couple of Pluralsight courses,
obviously I'm an author on Pluralsight,
and I have a lot of stuff there,
these first two are courses
of mine that I recommend
if you're going down this path.
Data Analytics Hands On
takes you from soup to nuts,
it covers all of these
and many other topics
at the one-inch-deep level,
and then points you
where you can go deeper,
if you want to get into
that, so data modeling,
star schemas, ETL, all
those kind of things.
And then Tableau Fundamentals of course
is just how to get up
and running with Tableau,
and by the way, we're doing
some new Tableau stuff,
there's a new partnership,
I'll show you some cool stuff.
Tableau just announced they
have learning partnerships
with us, Pluralsight, Lynda
and a couple of others,
so lots more Tableau courses coming out
if you're interested
in that on Pluralsight.
There's also an Introduction to SQL,
so this is a great way
to get going with SQL,
and there's a Beginning
Data Visualization with R,
so you've got all these things covered
on the Pluralsight courses.
I forgot to add the Python one,
so yeah, there's tons of Python courses
on Pluralsight as well.
So on Code School, we've got two,
so Pluralsight costs money,
you can do a free trial for 14 days,
or email me and I can hook you
up with a longer-term trial,
and on Code School
though this is all free.
There is a paid membership as well,
but these two courses totally free,
you probably have to
create an account, I think,
but whatever, you don't
have to pay for anything.
And Try SQL and Try R,
now the cool thing about Code
School, the difference there,
is that these are for
the absolute beginners,
so if you're brand new
to SQL or brand new to R,
it is probably the best way to learn.
You have a person talking,
explaining the concepts very clearly,
you have great graphics
and then you have the
coding in the browser,
so it's a very interactive way,
well it's like a person talking,
diagram explaining something,
now it's your turn,
it's like a little coding
challenge in your browser,
and these guys, the production
quality of their content
is just beyond anyone else's,
it really is the greatest stuff.
Now, contrasting that with Pluralsight,
Pluralsight is more I'd say
advanced, so more professional,
so if you already
know how to install tools
and connect to a database
and stuff like that,
that's a great place to get
going much deeper in the space.
Code School's much higher-level,
and much more beginner content,
and a great way to dive
in to a new technology.
They have a ton more stuff,
but relevant to our talk here
there's these couple of courses.
And then I'm gonna just
mention one other one as well,
it's not just about promoting
stuff that I get paid for,
it's about promoting,
sharing my knowledge with you
and helping you guys learn.
datacamp.com is really cool
and has whole different tracks,
kind of like Python and R tracks
for becoming a data scientist.
OK, so,
that's all for the slides,
let me jump over now and I will show you,
alright we'll check if
there's any questions,
and if not, we'll call it a day.
Alright, looks like we don't
have any questions now,
so if you do have anything
and you wanna follow-up,
or questions about this
talk or about this podcast,
this video blog,
email me at help@bensullins.com,
and I'll see you guys next week.
Ciao.
