As we all know, learning is a continuous journey
you have to constantly upskill to stay relevant
and so on.
So can you tell us a little bit about some
of the things you learn on the job?
Yes.
So, essentially, data science is like a very
huge field.
You will have different tasks every day.
Some days, you might want to automate a process.
Some days, you might want to work on big data.
So it depends on the multiple hats that you
wear.
Again, some days you will have to kind of
go out of your way to understand different
business problems.
So essentially, what I do best is I just go
with the flow, whatever is given in front
of me, I kind of pick that up.
I kind of hustle around with it.
Recently, I was working
with a package called Helium, which is built
on top of Selenium, which is used for automation.
Now, the essence is that, before you even
build models, you have to automate a process
to create models every day.
Not having like a standard data structure
kind of derails the process of creating a
model.
So that is where I was learning Helium recently.
So yeah, I mean, I kind of keep learning different
things.
So, I mean, as there is a requirement, I kind
of pick that up and I kind of hustle with
it, I struggle with a problem.
One thing that I've noticed
is that not a lot of people have that energy
to struggle with problems.
I believe that is one big thing that I learned
at Flexiloans that it's okay to struggle.
It's okay to spend time when you're stuck
somewhere.
But it's not that cool to kind of give up
if you're facing some difficulties.
Understandable.
It's a very good life lesson also.
So, um, tell us a little bit about, you know,
when and how and why did you decide to start
a YouTube channel?
Yeah, so it was, if I'm not wrong,
around September, during the batch
itself, during the sessions.
I kind of picked up this hobby of writing
blogs.
So I remember writing a simple blog on trying
to solve the mystery of what true positive
means, true negative means, and false positive
and false negatives.
These are simple terms for
identifying how well your classifier
is performing.
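As an aside to the terms mentioned here, the four outcomes can be counted directly from a list of true labels and a list of predictions. This is only a minimal sketch: the function name `confusion_counts` and the sample labels are hypothetical and are not taken from the blog being described.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = tn = fp = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive and truth == positive:
            tp += 1  # predicted positive, really positive
        elif guess != positive and truth != positive:
            tn += 1  # predicted negative, really negative
        elif guess == positive and truth != positive:
            fp += 1  # predicted positive, really negative
        else:
            fn += 1  # predicted negative, really positive
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}


# Hypothetical labels: 1 = positive class, 0 = negative class.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
# Precision: of everything flagged positive, how much was right.
precision = counts["tp"] / (counts["tp"] + counts["fp"])
# Recall: of everything really positive, how much was found.
recall = counts["tp"] / (counts["tp"] + counts["fn"])
```

Precision and recall are two common ways of turning these four counts into a single "how well is the classifier doing" number.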
So I wrote a simple blog and I caught hold
of a software that could record videos on
my laptop.
This is just something that happened accidentally.
And I created my first video, which was like
a very, very shabby version of what I create
today.
But it was a good starting point for me.
I recorded the video, I just went through
the blog.
So if anyone who kind of visits my channel
would see, the first video that I've created
is mostly like a walkthrough of the blog that
I have written.
The audio is not proper, the content is all
around the place.
But that was like the starting point for me.
After that, whatever little I kept learning,
say on the job or on the whole trajectory
of learning data science, whatever little
I kind of learned, I wanted to keep a placeholder
wherein I could refer to that same video
or content, again, from any part of the globe.
So YouTube is accessible everywhere.
And the whole journey kind of became very
simple for me; I started enjoying the process
of creating videos.
That is how I started the YouTube channel.
I mean, it's grown leaps and bounds.
I didn't expect it to grow so much, but yeah,
I mean, it's doing really well in terms of
the views.
It's getting the appreciation it is getting.
So yeah, that's how the whole thing started.
It was more of an accident.
And nothing planned as such that I kind of
thought and kind of executed.
Right, but I mean, your YouTube video, I mean,
channel is doing so well.
The videos are striking a chord with people
because that's the content people want to
see.
So is this, or are there any other factors,
what has motivated you to mentor (because
you are a mentor) aspiring data scientists?
So I started my teaching journey, so to say,
during my master’s.
So when you're doing a master’s, if you
have to essentially make a stipend during
the stay, that’s when you teach.
That is the time you get paid every month
a small amount.
So it was then that I started teaching.
That was the first time that I started teaching
and a couple of — or say a few people — from
the batches that I taught, say, power electronics
or some different subjects, they came up to
me and they told me that you're really good
in terms of teaching concepts and making concepts
very clear.
So I had this in the back of my mind that
I have that knack of teaching that was already
existing.
Even at Cisco, I remember I used to give,
say, I used to hold small discussions, although
I stayed in Cisco for a very small time.
But I understood the concepts of, say protocols
like multicasting, and I used to deliver sessions
for my team.
So I always knew that I have a knack
for teaching, that I'm good at
simplifying complex structures or complex
algorithms.
So that was it.
The whole basis was an accident clubbed with
a strong point that I have, which is to teach.
So that is why I kind of took that forward.
It takes a good amount of time, it consumes
a good amount of time to create videos, but
I enjoy the process thoroughly.
I mean, there's nothing that I don't like
about the process.
At the end, the total comments, the total
likes that I get, kind of motivate me to create
more videos on machine learning.
And yeah, I mean, that's the whole journey
of how I create videos.
So that's, that's amazing.
By the way, we know our mentors are a very
big part of the GreyAtom ecosystem.
So okay, next.
So the journey to 40 under 40!
Congratulations!
It's a huge honor.
And we're really excited at GreyAtom you know,
for your achievement.
What did it take for you?
What generally does it take to make that grade,
you know, for being recognized?
Yeah.
So there are factors that contribute
to people being selected as a 40 under 40
Data Scientist. For me, one of the driving
factors was my open source contribution with
respect to the videos that I create.
So again, a lot of things that kind of, say,
went my way were contributed to the content
that I create, to the reach that I have through
the content that I create.
And also the work that I've been doing at
my organizations with respect to machine learning,
deep learning, and the different things that
I've picked up on the way.
So I've also reviewed a couple of books with
different publication houses.
I've reviewed a couple of books on machine
learning and on ensemble learning as well.
And looking at the contributions that I'd
made so far, I was nominated by one of
my colleagues for the 40 under 40 Data Scientist
award.
Then the selection panel kind of evaluated
it, evaluated me on different parameters,
such as the open source contribution, what
is the overall impact of the work that I've
done in terms of business, in terms of returns
on investment, that is made on machine learning.
That is how I guess I could kind of go through
the whole process that was there.
Okay.
That's really something.
All right.
So now we go on to like the real meat and
potatoes of this entire webinar which is the
thousands of people who want to transition
to data science.
Firstly, should they find a mentor or sign
up for a course?
What is the first step they should take if
they want to go into data science?
So, for me the first step that I would say
every time that I recommend people picking
up data science is how good you are with respect
to mathematics, understanding mathematics.
Everything can fall in place, if there is
an interest to learn mathematics, if there
is an interest to understand what you're doing
in terms of mathematics.
A lot of times, in the current organization
that I work for, I kind of interact with my
boss more in terms of statistics than actual
code, or we don't even discuss a lot about
the coding practices which are there.
So, the discussions are mostly centered around
mathematics.
So if you like, or if you have a knack of
understanding mathematics, you have a liking
towards statistics, that is what is going
to drive you in the long run in this field
of data science and machine learning.
So the first and foremost step that I can
think of is having that understanding
and that liking towards mathematics.
For a lot of people with, say, 5-10 years of
experience, I know it's difficult to go back
to the roots and say that I really liked mathematics
at one point.
But if you're willing to put in the
effort that goes behind
making that transition,
I think it is a good step to make in terms
of joining an institution as well.
The reason being there is more collected energy
that goes in, you have multiple people with
the same aim of transitioning into data science.
So the first question that I'll always ask
anyone who is making or thinking of making
a transition to data science is mostly how
good or how comfortable are you, if you are
thrown with mathematical equations.
If you have that knack of taking it up, maybe
not say immediately, but in some amount of
time, after some revision, then data science
is a really good field for you to be in.
Okay, and the importance of, you know, an
institutional learning because there's just
that much content out there.
So do you feel that having a course is helpful?
Yes.
I mean, there's tons of resources out there
for every topic that you can search for.
I mean, say for example, exploratory data
analysis or a PCA-like algorithm or anything
of that sort.
You will have tons of resources available,
but what to pick out from where, and where
to start from?
What's the starting point?
Should you start with a language first?
Or should you learn Python first or should
you learn mathematics first?
All of that comes into place when you join
an institution, when you have something that
is channelized, and which is goal-oriented
as well.
So yeah, I mean, joining an institution is
always helpful in terms of learning data science
and machine learning in a more structured
way.
The only thing that is there is the amount
of motivation and the amount of effort the
learners can put in.
That is what kind of takes the whole journey
forward.
Okay.
What do you think is the right temperament
for becoming a data scientist?
The ability to fail and then get up again
every time.
The whole process is really difficult.
There are things that keep coming up.
So going back to the automation I've recently
picked up.
There’s this famous framework called Selenium
that I was kind of using to automate a process
in terms of data extraction and data manipulation.
But one thing that I discovered is there is
a new library in town, which is built over
that Selenium library, which kind of simplifies
the whole process of, say automating the process.
So the idea is you have to keep learning,
you have to be open to learning new things,
you have to take the whole thing, one step
at a time.
Wherein you should be okay to fail in this
field, because a lot of questions are still
very much unanswered.
But you should always be open to trying out
different things and coming up with simpler
solutions that may or may not work for you.
So one of the main defining factors that I
look for when I start hiring people
is how well
you can rise from a position where you're
down and out.
How much can you rise from that position and
come up with a small little concept or a small
MVP that we can showcase to our businesses
as well.
Okay, that's solid advice for life as well.
Okay, and what skills should people build:
the must-haves, the good-to-haves?
You've mentioned statistics and maths but
beyond that, technical and specific skills?
So technically, I would say if you are kind
of aiming for a proper machine learning-based
role or a data science-based role, one thing
that I really recommend all of you to build
is your concepts in terms of mathematics.
Until you understand the algorithm
end-to-end with respect to the mathematics,
you're just playing around with libraries
and you're not doing a lot of good for the
company either.
At the end, I'm pretty sure a lot of you would
directly use the libraries, and there
is no harm in doing that because, essentially, they're
created for the purpose of people using them.
But when things start going wrong or when
you're not having the right model, you should
know what is going on behind the scenes of
that algorithm to understand how it is functioning
and how you can kind of modify the whole algorithm
code to take it from there.
So, one essential skill that I always look
for in people who are making that transition is
the ability to understand the mathematics
behind algorithms, which is very important
according to me.
Okay.
Do you have any advice to say to freshers,
experienced folks, anybody, about this job
or transitioning to data science?
One piece of advice that I can readily give at
this stage is: start enjoying what you're doing
in terms of data science and machine learning.
Go beyond the hype that is there.
I know there is a lot of hype in the market
that it is the best job around.
It is, but once you start enjoying the process
of creating models or creating solutions that
are making an impact,
that is where you kind of start succeeding
in the initiatives that you have taken in
terms of transitioning into data science and
machine learning.
So don't think of it as a new career path
that will give you a good amount of money.
Money is like a byproduct of what you contribute
to the organization.
But essentially, start enjoying the workflow
of what you're doing, using, say, data science
and machine learning.
Like as a hobby project, something that I’ve
picked up recently and executed is I created
a convolutional neural network for detecting
Braille characters, which is for people who
cannot see, for people who are visually impaired.
What I've done is I've created a small model
that extracts out individual letters from
the transcript of the people who cannot read.
So there is this language of dots, which they
can kind of read directly by touching the
transcripts.
So what I've done is I aim to create an Android
app out of it or create a Google Assistant
action for that.
So ideally, what can happen is you can just
point your cell phone to that transcript,
and it would kind of read or understand the
language first, extract out the simple characters
which are there, and then read it out in the
form of audio.
So this is something that I've currently executed
independently.
This is not tied to an organization as such,
this is more of a hobby project.
So start picking up such hobby projects as
well.
I mean, that keeps you motivated in terms
of doing different things which can make a
significant contribution as well.
Okay.
That's, that's amazing.
So um, you know, you have been an amazing
mentor, even people at GreyAtom keep talking
about what an amazing mentor you are.
Do you have any mentee transition stories
that you can share with us?
Yeah, I mean, a lot of them, in fact,
among the people
who I’ve taught so far.
A lot of them have reached out to me once
they've gotten or secured a job.
A couple of them have entered
proper machine learning-based roles in
companies like Quantiphi and Fractal.
A lot of them that I kind of mentored, they've
kind of made the right successful transition
from their previous software engineering role,
or a business analyst role to a proper data
science or machine learning engineer role.
So yeah, I mean, I've had a lot of people
who have made that transition.
So any extraordinary stories from unusual
backgrounds?
Like you said, right, that people need to
understand math.
So sometimes we have people who come from
many varied backgrounds and they've absorbed
the math, they’ve absorbed the Python and
things.
Any personal stories you have to share about
those kinds of transitions?
So I remember — I don't remember the student's
name — he was from an arts background, and
if I recollect, he was in the HR domain.
He was managing the operations for a company,
and he had decided to do a course on data
science and machine learning.
So I was part of the initial discussion that
he had before joining the course.
And luckily, I was the mentor for his batch
as well.
So I could see his transition, there was a
good amount of struggle for him because not
coming from a coding background, you essentially
have that feeling that will I be left out,
since a lot of people in the batch are from
the engineering background or from backgrounds
which have a statistical base.
Coming from entirely a different background,
he did put in his bit of effort.
He went out of his way; he used to go through
different online courses as well.
He started writing blogs as well.
And suddenly, one fine day, he messaged me
that he's made a transition.
He's working as a data scientist in a company.
So yeah, I mean, that was something that was
very special to me. Going from an entirely non-programming
background to transitioning into data science
is like a very big achievement, according
to me, and he put in his bit of effort as
well.
I mean, with all due credit to him, he's made
like a very good transition from where he
was to where he is now.
Okay, that's extraordinary.
So somebody who's looking for a job, let's
say, what should they be looking out for?
Company-wise, role-wise, you know, any tips
that you would have?
So ideally, when someone's looking for a job,
understand what the company is doing. The
alignment between what you want to do in that company
and what the company is doing is very important.
There are companies that are okay with a simple
automation-based task as well and they would
kind of classify that person as a data scientist.
So it is all dependent on what you want out
of the organization.
What is your vision or what is your take on
what data science is and what do you want
to execute for a company.
Try to evaluate what the company is doing
in terms of data science and machine learning.
Try reaching out to folks in the team who
may or may not be interviewing you.
Just get a hang of what the company is doing,
say, understand what their product is, if
you're targeting a product-based company;
if you're targeting a service-based company,
try to reach out to people who can give you
some insights of the projects that the company's
taken up in terms of the services that they're
providing.
How much time do they allocate for research,
if at all, they're allocating some time to
research as well.
What are the different cutting-edge technologies
that they're working on?
So a lot of factors go into deciding which
company you want to join, but the main essence
is both the goals have to be aligned in terms
of the organization and in terms of what you
want to solve.
If you are looking out for like a very high
fundo ML-based role and the company is okay
with like analytics or automation as
data science, then there is a mismatch that's
gonna arise in some time.
So it's okay to wait, but get that
match between the company's ideology
and your own.
So that is something that I would highly recommend
to everyone.
So this is from a candidate point of view.
Now from a company point of view: You've taken
interviews, you do hiring.
So what do you judge when you interview somebody?
And what should a candidate evaluate when
giving an interview?
So essentially, I have a lot of weird ways.
So I kind of conduct interviews in a very
different way.
Not the normal, say, 10-12 rounds where
I'm still not sure if I have to hire someone
or not.
My idea is I have one face-to-face interview,
wherein I ask about projects that are mentioned
in the resume.
So one tip that I can give everyone out here
is, whatever you write in your resume, is
something that is very important.
You have to basically own up to every project
that you’ve mentioned.
So if you mentioned a simple concept, like
term frequency and inverse document frequency,
you should know or you should be able to defend
what advantage that approach has over a count
vectorizer.
You have to understand every bit of detail
and be able to give the answer as well.
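To make the comparison concrete, here is a small sketch of the two ideas in plain Python. It assumes the textbook definition of TF-IDF (term frequency times the log of inverse document frequency); library implementations typically use smoothed and normalised variants, so their numbers will differ. The function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Raw term counts per document, which is what a count vectorizer gives."""
    return [Counter(doc.lower().split()) for doc in docs]


def tfidf_vectors(docs):
    """Textbook TF-IDF: (count / doc length) * log(n_docs / doc frequency)."""
    counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in c.items()
        })
    return vectors


# Hypothetical mini-corpus.
docs = [
    "the loan was approved",
    "the loan was rejected",
    "the customer called",
]
counts = count_vectors(docs)
tfidf = tfidf_vectors(docs)
```

With raw counts, "the" and "approved" both score 1 in the first document. With TF-IDF, "the" drops to zero because it appears in every document, while "approved" keeps the highest weight because it is specific to that document. That down-weighting of common words is the usual argument for TF-IDF over plain counts.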
So the basics again — it all boils down
to the basics — that you have in terms of
the clarity, in terms of concept clarity.
So I look at concept clarity, I look at how
clear the algorithms are to you, say from a
mathematical perspective as well.
There's one situation when I had to create
or I had to write my own outlier-based algorithm
or time series-based anomaly detection algorithm.
That is the place where my mathematical skills
were put to test.
If I had just been someone who is
heavily reliant on libraries, then
executing what I've executed so far wouldn't
have been possible.
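The interview does not describe the algorithm that was actually written, so the following is not that method. It is only a generic sketch of one simple way to flag anomalies in a time series by hand: compare each point with the mean and standard deviation of the preceding window (a rolling z-score). The function name, window size, threshold and readings are all assumptions made for illustration.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value is far from the mean of the previous window."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if series[i] != mu:
                anomalies.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical readings with one obvious spike at index 7.
readings = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10, 12, 11]
flagged = rolling_zscore_anomalies(readings, window=5, threshold=3.0)
```

One known weakness of this simple version is that a spike inflates the standard deviation of the windows that follow it, so a second anomaly arriving soon after could be missed. Knowing the mathematics is what lets you spot and fix that kind of behaviour.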
So that is why one thing that I really look
at in every person that I interview is his
clarity in terms of mathematics of algorithms.
Then I also run through a very weird round
called a Google search round.
So I basically have a simple data set with
me.
And I want someone to manipulate that data
set and bring it into a form that I want to
consume, which is like a real business problem.
I kind of hide some amount of data and
I give that laptop to the candidate who has
come in for a face-to-face interview.
And I expect him to use Pandas or NumPy, the
basic Python libraries, to come up with
that solution.
It's an open laptop, open book, open Google
kind of a test that I conduct.
And for people who are unable to solve it,
then I don't go forward with them.
But for people who are kind of quick in coming
up with solutions using Google search, then
I happily hire them as well.
So yeah, I mean, that's how I kind of evaluate
candidates, because not everything can be
memorized.
There are so many functions in Pandas, NumPy,
that, at least at the human level, you cannot
memorize.
So ideally, the start point and the end point
of what you want to solve are
defined by the business and the end
goal of what your business wants to achieve.
What you decode in the journey is where your
expertise shows: struggling with problems, coming
up with solutions using a Google search, how
effectively you use code from StackOverflow.
Do you just copy-paste?
Or do you modify it for your use case?
All of those skills are kind of evaluated in
that Google Search round that I do.
So that is how I evaluate candidates.
Okay, that's very practical.
So your emphasis is really on whether somebody
can take a concept and apply it.
So the application is very important.
Yes.
All right.
And that I kind of recommend, and I've also
realized this as well.
I don't know if there are people from different
companies who are data scientists, say, watching
or going through this or not, but
what I find is that a lot of interviewers
say, “Go through this simple
concept”, without having a proper
direction in terms of where they want to lead
the interview.
So, say a lot of companies ask questions or
a lot of interviewers ask questions, which
are heavily bent on what they know as compared
to what the candidate knows.
So essentially, I've always made this a point
to keep the interviews in such a way that
I only focus on what the person has written
in the resume, which he knows and not what
I know.
So this is something that I always take care
of in terms of conducting an interview.
I understand.
I see where you're coming from.
Okay, so our last few questions before we
move on to the participant questions; we've
received quite a few.
What do you think are the factors that drive
demand for data science professionals in India
today?
One of the main factors that is driving a
lot of demand is the data that is generated.
So, data is being generated from every possible
source.
If you think of an e-commerce company, then
the e-commerce company's decision of launching
new products is data-driven today.
Say for example, if there is a new product
that is launched, then how is that product
being recommended by people who are using
it?
Whether it is actually a good product or not is something
that is formulated using the reviews that
you collect from Amazon or the different e-commerce
sites which are there.
So, essentially what happens is a lot of companies
are making decisions which are data driven.
So, previously business was more of an intuition-based
decision that people used to take.
But nowadays even business decisions are backed
by data.
Running a simple marketing campaign: does
it add to my revenue or does it reduce my
revenue?
All of these factors that are considered are
essentially data-driven
now.
So, that is why the demand for a data scientist
who can kind of play around with data, create
data in the right format and then come up
with solutions is really essential in terms
of driving business.
And considering the pace at which data is
evolving and that data is generated every
second — you will have, say, a Facebook
post, going live every second, you will have
reviews coming up online.
Data is generated, and for driving business
now, you will need data to come up
with the right insights and the right models
to solve business problems.
Okay.
So now we've talked a whole bunch about
what we should do and what data scientists should
do, and so on.
Are there any things that they shouldn't do?
Like, what are the common mistakes that
aspiring data scientists make when they're building
their career?
One of the main things where I feel a lot
of people go wrong is they get carried away
by a lot of things that are happening currently.
So there's a good amount of research that's
happening.
So every day Google is releasing a new model
in NLP, then Microsoft comes up with a bigger
model and says this is more accurate than
this.
There is a good amount of research happening.
And it's like a really golden time for people
entering the field because you have a lot
of automation already made in terms of data
science and machine learning.
But essentially, everything that kind of comes
down at the end is your basics.
So if people don't focus on the basics, like
what term frequency or a count vectorizer
is, or why an X algorithm is advantageous
over the Y algorithm,
then essentially they are just relying on predefined
packages, and there are chances that once
that person is hired, he may not be able to
add a lot of value to business as well.
So my only recommendation is keep the basics
going well.
Focus on the basics, which is where a lot
of people go wrong.
If you don't understand simpler concepts then
there is no point talking about a neural network.
So, yeah, I mean, for me, what stands out
is how good your basics are. Not every task
is meant to be solved using a complex neural
network or a complex convolutional neural
network.
Some tasks can help you achieve a higher accuracy
or a higher metric, using simpler algorithms
as well.
So, it all boils down to how good your basics
are.
So having a solid foundation is basically the
point.
All right.
Okay.
So, we finished with your questions for the
moment.
Now, a lot of people have written in, wanting
questions of theirs answered specifically.
So, from the top, yes, we have one from Chinmay.
He has two questions.
The first one is: is the role of SQL underestimated
for a data science job role?
Many profiles as well as most of the interviewers
have exceptionally high focus on advanced
SQL, whereas not many programs offer it exclusively
for data science program courses.
So again, data science as a role itself is
very diverse in terms of what the company
expectations are.
So before I kind of jump into answering this,
what I would highly recommend everyone applying
for jobs is to go through the job description
really well.
Again, it is more about the alignment of what
the company is executing, and what do you
want to execute being a part of that organization.
So having that clarity kind of gives you an
insight as to how the interviews are gonna
shape up.
Coming to the question in terms of SQL.
SQL is a very important piece of what you
do in terms of data science, because until
you can extract data from a proper
MySQL database, or the different databases which
are there, you essentially don't have access
to data, so you cannot build the fancy models
either.
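As a rough illustration of that extraction step, the sketch below pulls an aggregated result out of a database with SQL before any modelling happens. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the MySQL database mentioned here, purely so the example runs without a server. The `loans` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a real MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (customer TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [
        ("a", 1000.0, "approved"),
        ("a", 500.0, "rejected"),
        ("b", 2000.0, "approved"),
        ("b", 1500.0, "approved"),
    ],
)

# Filter, group and aggregate in SQL so only model-ready rows come back.
rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_loans, SUM(amount) AS total_amount
    FROM loans
    WHERE status = 'approved'
    GROUP BY customer
    ORDER BY customer
    """
).fetchall()
conn.close()
```

The same `SELECT ... WHERE ... GROUP BY` pattern carries over to MySQL and most other relational databases; only the connection code changes.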
So SQL is an integral part of the whole data
science journey.
But again, it is not the only thing that is
there.
So in terms of giving emphasis
to it: in a lot of places you will
find SQL topics or SQL tutorials, and there
are some sections in different organizations
which cover SQL as well.
But the idea is that in-depth knowledge
is something that comes with experience; it
comes with practice.
So, ideally, putting in your
bit of effort after getting started is what
companies are looking for in terms of
SQL understanding as well.
So, Chinmay has a second question.
So we've covered this a little bit before,
so if you can just, you know, summarize again.
Almost all industry experts and academicians
say that data engineer, data scientist and
machine learning engineer are three independent
job profiles with some of the work overlapping
in transition.
On the other hand, fresher job profiles expect
candidates to be proficient in all three work
areas.
Why is that?
So the simple answer is that the idea is people
are taking a risk.
Someone who is hiring a fresher
is essentially taking a risk on someone who
has very little to no experience in data science.
So ideally, you would want someone who has
some mixture of everything so that he can
contribute to the overall business, in terms
of deriving insights and deriving a return
on investment from the whole experience.
So companies are expecting a lot,
I don't deny it; the expectations of
companies are kind of increasing.
How do you increase the total
return on investment even for people who are
just fresh out of college?
So that's the reason why companies expect
everything.
But ideally, good companies that have the
intent of hiring freshers would kind of evaluate
a lot of good projects that you have on your
resume as well.
So, ideally, yes, you are expected to know
a bit of data engineering, because ideally,
even if you create a model, that model is
not of great use if you don't, if you're not
able to productionise it.
So that is where your data engineering skills
come in, in terms of DevOps, in terms of deploying
the model, and in terms of data extraction
as well.
Not a lot of companies have a separate data
engineering team.
So you are the one who has to make sense of
data.
So ideally, you will have to capture data
as well.
So yeah, I mean, hopefully that kind of gives
a good insight of why companies want everything.
But if you know one thing really well, and
if you have bits and pieces of knowledge of
different things as well, then you're good
to go in terms of the company.
So the next question is from Kriti.
She says: in my understanding people from
different working backgrounds can do the career
transition to data science.
But what about people who have graduated long
ago and have no proper work experience?
What are the chances for them to restart
their career in data science?
So, long story short, I mean, the whole idea
is if you have 10-15 years of experience,
you would be like a very specialized person
in terms of one domain that she's handled
so far.
So I'm assuming she has 10-15 years of experience.
Then the best possible route that I see for
people with so much experience is to think
of data science and machine learning as a
stepping stone in their current roles itself.
So data science and ML is not like an isolated
field wherein it's mostly centered around one
field.
It is kind of making an impact in healthcare,
it's making an impact in e-commerce and the
different fields which are there.
So ideally, the best possible place or the
best possible way that they can transition
is pick up skills from data science courses,
pick up skills from different channels, understand
how algorithms can make a difference in their
current job profile, and then try to kind
of start showing impact to business using
data in their current role and then make a
proper data science related transition.
That is how I see the transition coming around
really smoothly.
I see.
So to join data science, integrate data science
in their domain and then move on to pure data
science thereafter.
Yes.
Great.
All right.
Next question is from Sandhya.
I'm looking out for opportunities after a
family break of almost 10 years.
I’m doing an online course, which is quite
comprehensive, but so much knowledge together
is really difficult to remember.
I’m also doing some projects but need a
lot of help from Google.
How do I cope with the interviews?
So with the interviews as well, try to bifurcate
which companies you want to target and
what kind of role you're targeting.
Apart from that, yes, data science is not simple.
I mean, it is overwhelming to have so many
things given to you at once.
So you'll have a mixture of statistics, you
will have a mixture of coding, you'll have
a mix of creating algorithms, maybe from scratch
as well.
Then you will have to work on data sets.
You’ll have to derive insights, so it's
a multidisciplinary field.
What you have to do is manage
everything and try to pick out which field
or which companies are most suitable for you.
And then kind of make an overall transition
in terms of data science and machine learning.
It's an overwhelming field, it's not something
that is like a one-go solution if you do a
particular course.
It's more, say, it's very diversified in terms
of creating insights.
So yeah, I mean, she can go through
multiple courses, participate in Kaggle
competitions as well, and go through kernels.
I mean, that is a very good source of information
as well in terms of how you create good
models using the data that is there, using
simpler algorithms.
Go through the kernels themselves on Kaggle.
That is a very informative point to start
off a career in data science and machine learning.
Okay, great.
Thank you for that.
Amanpreet has three questions.
The first one is does a data scientist need
to be a good statistician and then a software
engineer, or is it vice versa?
Ah, okay.
So it's a tricky question, but I would say
companies expect you to be good at both.
Companies are trying to derive business, and
they evaluate how good you are in
terms of deriving business.
So, essentially, everything boils down to
money. If the company's not making money through
data science, then there is no point in having
people in data science as well.
So ideally the company wants to see that the
return on investment is there.
What a lot of things boil down
to in the end is how much impact you can create.
So for every company the impact is different.
Some companies are okay with a proof of concept
or a business insight given to them or a regression
model that is not even deployed there.
So there they don't expect you to be like
super champions in terms of coding, but your
statistical knowledge kind of gives you that
insight that this feature for a defaulter
model is very important as compared to the
others which are there.
So that is the place where your ML skills
or statistics skills kind of override your
software skills.
On the other hand, if you think of the software
skills that are there, then there are companies
that are kind of automating the whole data
science workflow as well.
That is where your data science skills are
good enough, your statistics are, say 19 out
of 20, then you're still good if you have
a good software engineering base.
Your software knowledge can be used in terms
of like a full stack developer as well.
So companies would also want someone who can
create models; may not be the most accurate
models, which the person can learn over time,
but he can help us scale that solution using
that machine learning model in terms of your
final output as well.
So it again depends on the company.
But yeah, I mean, having a mixture of both
of them is something that is advisable.
Okay.
Amanpreet’s next question is very interesting.
He says: How can a data scientist deal with
unreasonable management expectations?
So again, it's a very valid question.
Every company has very different expectations
from a data scientist.
And people want to see the return on investment
quickly, which is where there is a big
gap in terms of what a data scientist expects
and what the companies expect out of them.
So ideally, it is good if the person can start
creating some proof of concept, start pushing
things from his end in terms of seeing the
overall business impact.
So for every new feature the person wants
to add, or any model that he thinks
can add value, essentially what that
person can do is create a small proof of concept.
Say, do an A/B test or start throwing output
from that model, and start showing the overall
business impact. Again, through data is how
you can kind of come to a reasonable understanding
of how much impact this model would add to
the overall workflow in terms of data science.
So yes, I mean, handling business expectations,
or management expectations is a tough thing
to crack.
But you have to figure out a way in terms
of getting that right balance of creating
models and trying to justify what that model
is in terms of business as well.
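The A/B test mentioned in this answer could be sketched roughly as below. This is only a minimal illustration, not something described in the webinar: the function name two_proportion_z_test and all the numbers are made up, and it assumes the business impact is measured as a conversion rate compared between a control group (A) and a group served by the new model (B).

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of B vs A; return (lift, z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value

# Hypothetical numbers, purely for illustration.
lift, z, p_value = two_proportion_z_test(conv_a=120, n_a=2000,
                                         conv_b=156, n_b=2000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p_value:.4f}")
```

A small p-value together with a positive lift is the kind of evidence that can be taken to management when arguing that a model adds value.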
Okay.
Next question: how important is it to add
a context around data for a data scientist,
or is just having data sufficient for a data
scientist to run his or her algorithms?
So ideally, if you have not understood your
data, then you cannot even create new features.
So ideally, data just as a whole is not very
insightful.
And ideally, you would want something to come
out of that data.
So adding context is really important.
So I'll be very honest.
And this is something that my previous boss
used to say.
He said this a long time back that no one
likes watching data.
It’s a fact.
No one enjoys having numbers on his shelf
every day or having Excels only.
Can things become more automated in terms
of deriving the right business insights automatically
using data?
So having that context of data, delivering
data in the form of something which is actionable
becomes really important.
So even if you don't create the most fancy
algorithm, even if you don't use XGBoost or
say boosting algorithms, can you come up with
insights that can make an impact on the business?
Say for example, if I am getting a lot of
ad revenue from Maharashtra, can I create
an advertisement around the state and then
push it forward for making more improvement
in business?
So think of everything that ties back to your
overall business.
So, how much of an impact data creates
for the end business is what every data scientist’s
goal should be.
Okay, so we have a question from Ashwini and
from Ajay.
How are data science and machine learning
connected?
So essentially, data science is something
that is like a bigger field that encompasses
machine learning.
So machine learning talks about mostly building
algorithms.
So all the fancy things that happen, that
is, you have your research that is going around
in computer vision, you have your good amount
of deep learning models that are being open
sourced day in and day out.
All of this is something that is fancy machine
learning, which a lot of companies want
to use, but they also want an additional bit
of help from the person that they hire, that is,
in data engineering.
People should be able to extract data.
People should be able to create models on
their own.
People should be able to automate a process.
People should be able to create websites.
People should be able to create web applications
out of the models that you create.
So, essentially, data science encompasses everything.
You will have business analysis-based
decisions.
You will have more of model creation activities.
You will have insight generation activities
in the form of data science.
A machine learning engineer focuses more on
the research part.
If there is an end-to-end research project
that is coming up, a machine learning engineer
wouldn't care much about where the data is.
The data would already be given to him.
His job is to come up with the most scalable
and the most accurate model that is possible
at that particular point of time.
Okay, several of the questions now — I'm
guessing this is because people are working
in these profiles — Ashwini is working in
a support profile, I presume.
Somebody else is a software developer and
Nikita Soni is from software.
There is another software engineer as well,
and a BI analyst.
And they all want to know how to transition
to data science.
Now, unfortunately, we are running out of
time, so we can't take them individually.
So can you give a generalized approach to
this?
So ideally, what I would recommend is, again,
start with mathematics.
One book that I really admired is this book
called Business Statistics by Ken Black.
A PDF is like very readily available.
And you can purchase that book as well.
Get the fundamentals of statistics clear,
because that is part of every interview that
you give, irrespective of the company, if
it's doing some heavy duty work or heavy duty
lifting in terms of software or not.
Your business sense is very important for understanding
the different domains which are there and
finalizing in which domain you want to pursue your
career in data science.
Secondly, finalizing your interest in terms
of where you want to go.
Are you more inclined towards computer vision?
Are you more of a problem solver, or more
of a generalist that is okay to pick up any
problem as and when they come in?
So the first starting point is to get statistics
laid out well.
Statistics is something that will help you
understand all the fancy things that are going
around in the market.
Once the statistics piece is set up, then
start going to Kaggle kernels, start participating
in the different competitions that are happening
on Kaggle.
Understand algorithms really well. After you've
done the statistics part, don't jump into
the fancy algorithms like XGBoost.
They're really superb in terms of the accuracy
that they provide, but what I would still
recommend is: can you solve this same problem
using a simpler algorithm like logistic regression
or a decision tree or a random forest?
Can you understand them?
Can you help business understand how an algorithm
works? Because, essentially, business people
don't understand the complex, fancy mathematics
as well.
So you have to understand the mathematical
part of the algorithms.
Once you're clear with that, then you can
start participating in different hackathons.
Come up with good ranks on that
platform, and then kind of apply for jobs
and then take it from there.
So the path is not easy.
I mean, I would be lying if I said it's very
easy, but if you put in your bit of the effort,
that journey would kind of become much more
streamlined.
So that is something that I can recommend.
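To make the point about understanding a simpler algorithm concrete, here is a minimal sketch of logistic regression written from scratch. It is an illustration added for this transcript, not the speaker's code: the helper names sigmoid and fit_logistic, the learning rate, the number of epochs and the toy data are all assumptions.

```python
import math

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy, made-up data: one feature, label 1 when the feature is large.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(w, b, predictions)
```

The sign and size of w can be explained to a business audience directly: a positive w means a higher feature value pushes the prediction towards class 1, which is much harder to say about a boosted ensemble.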
Great.
Thank you.
There is one question over here, which you
kind of answered in part, but it's a very
specific one from Rajesh: What are the top
five ways or forums where an enthusiast or
beginner data science person can work or take
up side projects besides their full time job?
Ideally, more than, say, me giving out platforms,
try to think of things that you think you
can make an impact with using machine learning.
Since there is this big outbreak of Coronavirus
that's going around, there are openly available
data sets as well.
You don't have to go in for a fancy algorithm
right away, but can you create visualizations?
Can you create some fancy visualizations that
tell you or give you an insight of how that
outbreak is happening?
So try to come up with your own problems.
I wouldn't even recommend any portal specifically
right now.
Because portals mostly have everything very
structured, like a Kaggle competition or different
competition places as well.
Everything's very structured, you'll have
a ton of data in CSV format.
So you're not going through the pain of extracting
data from a database, you're not modifying
the data as well.
So ideally, I would want people who want a
one-stop solution to come up with their own problems
that they want to solve.
I mean, I built that small Braille convolutional
neural network which reads out their transcript.
So you can attack different problems, there
are different things that you can think of,
in terms of achieving, using machine learning,
or even say simple business analytics or visualizations.
So yeah, I mean, I would refrain from giving
out five top sources but I would want people
to kind of come up with innovative ideas to
solve problems in general not specifically
to business.
Okay, three people have asked for
something similar.
I’m going to club the questions together.
Apurv, Sahib and, one more person, Siddharth
have asked for a learning path for data science,
mathematics, and other prerequisites.
And Sahib has gone further to say: How should
he start preparing for data science?
He works on Tableau, SQL, Excel and a bit
of Python.
Is any full resource available? He finds
resources scattered everywhere and didn't
find all resources collated in one place.
So that is still going to exist, even if you
kind of make that transition.
Resources would be scattered everywhere.
So really, I cannot give you like a one stop
solution.
If I find something like that, nothing like
it.
But ideally, what I can recommend is that, in terms
of learning
different technologies as well, people will
have to be open to searching the
internet well.
So getting the broader concept is one thing,
that is the initial part.
If you're doing an end-to-end machine learning
flow, then if you start with a principal component
analysis as a concept, then you should be
open enough to explore 10 different blogs,
10 different videos, 10 different courses,
which kind of specifically teach that topic.
And that's how you kind of diversify your
learning as well.
One resource itself is not going to be sufficient
in this field.
So the more you experiment with things, the better. But
it's good to get the initial base in terms
of the main topics ready, and then start working
on it and start going through different content
which is available online.
That's a good point.
Okay, so Adil has a question.
He wants to know whether an internship helps.
It does.
It does help a lot.
With the whole internship process in place,
what you get is a very specific project in
that two or three months, the duration
of the internship.
I’ve had a lot of interns working with me.
So with internships, you eventually have to
give out good work to them as well because
interns are very young in this field and they
want good, motivated work.
So you don't want business-driven work from
them.
But you want something that would add value
in the longer run.
So essentially, internships really help a
lot.
But before you even join the internship, it's
good to understand what the company is doing.
What is the end outcome of the internship?
What are they trying to solve
during your internship tenure?
So having clarity in terms of what is being
solved is good before you take up the internship.
But any internship is good enough to start
in the field of data science and ML.
Last question from Nikita, we've covered this
in the main thing, but any last like tips
for the first interview?
So one thing that I would say is that whatever
videos I've created were made keeping this
in mind.
If you want to go through a concept quickly,
you can go through my videos in the last
4-5 minutes; that is what I do generally when
I kind of prepare for interviews.
But have like a good set of notes prepared,
which you can utilize as a last moment revision
resource, and be thorough with your projects,
whatever you mentioned in your projects.
If you are not sure about a complex algorithm
like support vector machine, don't ever mention
it on your resume.
So if you've used it, you should be able to
own up to that algorithm as well.
So that is what I would recommend.
Okay, we have time for one last question.
Sudarshan is asking: Is there a freelance
option for data science?
It is there.
Such options are not very prevalent,
because companies are a bit reluctant
to share data just like that.
It helps if you have a good amount of experience
in terms of data science and ML, because the people
companies are dealing with will have access to
their data.
So essentially, it's not as widespread as,
say, how companies give out freelance
opportunities for creating websites and creating
different solutions.
But yes, freelance opportunities are there;
not as scattered or not as prevalent as in
software engineering, but they do exist.
I think we've reached our time limit.
We don't want to take too much of your time.
Thank you so much for this.
Any last words you want to close up with?
No, I mean, I had a great time conducting
this webinar as well.
So thank you to everyone who has been a part
of this webinar.
I mean, it really means a lot because, given
it's a weekend, given it's like between 11
and 12, there is a good number of people who
have turned up for the webinar as well.
So a big thank you to everyone who has participated
in this webinar and hopefully we will conduct
more such webinars going forward.
Absolutely.
Thank you so much.
Thank you so much.
Bye.
