Hello and welcome to KnowledgeHut.
In this video, let us learn what big data is
and how it is classified.
Let us begin by defining the term "data".
If you do a quick Google search, you will
find that “Data” is defined as ‘the
quantities, characters, or symbols on which
operations are performed by a computer, which
may be stored and transmitted in the form
of electrical signals and recorded on magnetic,
optical, or mechanical recording media’.
In simple words, we can say that all the facts
and figures which can be stored in digital
format can be termed as data.
All the text, numbers, images, audio files, and videos
stored in our phones and computers are
examples of data.
They are all stored digitally and consist
of zeros and ones.
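To make the idea of "zeros and ones" concrete, here is a small Python sketch that shows how a short piece of text is stored as bits and read back. It assumes UTF-8 encoding, which is our choice for illustration; the function names `to_bits` and `from_bits` are hypothetical.

```python
def to_bits(text: str) -> str:
    """Encode text as UTF-8 and show each byte as a group of 8 bits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Turn space-separated 8-bit groups back into bytes, then decode as UTF-8."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


print(to_bits("Hi"))             # 01001000 01101001
print(from_bits(to_bits("Hi")))  # Hi
```

The letter "H" is stored as the number 72 and "i" as 105; the computer only ever keeps those numbers in binary form.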
Please remember that data is a plural term;
the singular form of data is ‘datum’.
The concept of Big Data is not complex;
as the name suggests, “Big Data” refers
to amounts of data so large that they cannot
be processed and analyzed by traditional
tools.
Because the amount of Big Data grows exponentially
(more than 500 terabytes of data are uploaded
to Facebook’s database alone in a single
day), analyzing it is a real challenge.
Now you may be thinking: if Big Data
is so problematic, why is everyone so
obsessed with it?
Well, the answer lies in the benefits it provides.
Here are some real-world examples of the ways
in which Big Data is used.
Netflix collects user behavior data from its
more than 100 million customers.
This data helps Netflix in understanding what
every individual customer wants to see.
Based on this analysis, it recommends movies
and TV shows that the viewer is likely to enjoy.
As a result, the customer is happy because
they get what they like without even searching
for it, and Netflix is happy because delighted
customers are more likely to stay, which results
in higher customer retention.
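As a rough illustration only (this is not Netflix's actual system), here is a minimal Python sketch of one simple recommendation idea: suggest titles watched by other users whose viewing history overlaps with yours, weighting each user by how many titles you share. The user IDs, titles, and the `recommend` function are all hypothetical.

```python
from collections import Counter

# Hypothetical viewing histories: user -> set of titles watched
history = {
    "u1": {"Drama A", "Thriller B", "Comedy C"},
    "u2": {"Drama A", "Thriller B"},
    "u3": {"Thriller B", "Documentary D"},
    "u4": {"Comedy C", "Documentary D"},
}


def recommend(user: str, k: int = 2) -> list[str]:
    """Rank titles the user has not seen by how often similar users watched them."""
    seen = history[user]
    scores: Counter[str] = Counter()
    for other, titles in history.items():
        if other == user:
            continue
        overlap = len(seen & titles)  # shared titles act as a similarity weight
        if overlap:
            for title in titles - seen:  # only suggest unseen titles
                scores[title] += overlap
    return [title for title, _ in scores.most_common(k)]


print(recommend("u2"))  # ['Comedy C', 'Documentary D']
```

Real recommendation systems work with far more users and signals, which is exactly why this kind of analysis becomes a Big Data problem.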
Credit card companies collect and store the
real-time data of when and where the credit
cards are being swiped.
This data helps them detect and thwart fraud.
Suppose a credit card is used at location
A for the first time.
Two hours later, the same card is used
at location B, which is 5,000 kilometers
away from location A. Since it is practically
impossible for a person to travel 5,000 kilometers
in two hours, it becomes clear
that someone is trying to fool the system.
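The check described above is often called an "impossible travel" rule. Below is a minimal Python sketch of it, assuming each swipe is recorded with a timestamp and latitude/longitude. The `Swipe` record, the function names, and the 1,000 km/h speed threshold are hypothetical choices for illustration, not how any particular card company does it; the distance uses the standard haversine formula.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_SPEED_KMH = 1000.0  # hypothetical threshold, roughly airliner speed


@dataclass
class Swipe:
    card_id: str
    time: datetime
    lat: float  # degrees
    lon: float  # degrees


def distance_km(a: Swipe, b: Swipe) -> float:
    """Great-circle distance between two swipe locations (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_impossible_travel(prev: Swipe, curr: Swipe) -> bool:
    """Flag two swipes of the same card that imply an implausible travel speed."""
    hours = (curr.time - prev.time).total_seconds() / 3600
    km = distance_km(prev, curr)
    if hours <= 0:
        return km > 0  # same moment, different places
    return km / hours > MAX_PLAUSIBLE_SPEED_KMH


# Location A, then location B about 5,000 km away two hours later
a = Swipe("card-1", datetime(2024, 1, 1, 10, 0), 0.0, 0.0)
b = Swipe("card-1", datetime(2024, 1, 1, 12, 0), 0.0, 45.0)
print(round(distance_km(a, b)))   # 5004
print(is_impossible_travel(a, b))  # True
```

Here the implied speed is about 2,500 km/h, well above the threshold, so the second swipe is flagged.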
These are just two examples; Big Data
has hundreds of applications in
many different fields.
Be it banking, communication, healthcare,
media, advertising, manufacturing, transportation,
or retail, Big Data can be used everywhere, and
this is why more and more businesses are trying
to harness its power.
Classification is essential for the study
of any subject,
so Big Data is commonly classified into three
main types: structured, unstructured,
and semi-structured data.
Structured data
Structured data refers to data
that is already stored in databases in an
ordered manner, typically as rows and columns.
It accounts for about 20% of the total existing
data and is used the most in programming and
computer-related activities.
There are two sources of structured data:
machines and humans.
All the data received from sensors, weblogs,
and financial systems is classified as
machine-generated data.
This includes data from medical devices, GPS data,
usage statistics captured by servers and
applications, and the huge amount of data that
moves through trading platforms, to
name a few.
Human-generated structured data mainly includes
all the data that humans input into computers,
such as names and other personal details.
When a person clicks a link on the internet
or even makes a move in a game, data is created.
Companies can use this data to understand
customer behavior and make appropriate
business decisions and modifications.
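To illustrate what "stored in an ordered manner" means, here is a small Python sketch using the standard-library `sqlite3` module. Every record has the same columns with fixed types, which is what lets us query it directly with SQL. The `customers` table and its contents are hypothetical.

```python
import sqlite3

# Throwaway in-memory database; every row follows the same fixed schema
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO customers (name, city, age) VALUES (?, ?, ?)",
    [("Asha", "Pune", 34), ("Ben", "Leeds", 27), ("Chen", "Pune", 41)],
)

# Because the columns are known in advance, filtering and sorting is a one-line query
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY age", ("Pune",)
).fetchall()
print(rows)  # [('Asha',), ('Chen',)]
conn.close()
```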
Unstructured data
While structured data resides in traditional
row-column databases, unstructured data is
the opposite: it has no clear storage
format.
The rest of the data created, about 80% of
the total, is unstructured.
Most of the data a person encounters belongs
to this category, and until recently there
was not much one could do with it except store it
or analyze it manually.
Unstructured data is also classified based
on its source, machine-generated or human-generated.
Machine-generated unstructured data includes
satellite images, scientific data from
various experiments, and radar data captured
by various technologies.
Human-generated unstructured data is found
in abundance across the internet since it
includes social media data, mobile data, and
website content.
This means that the pictures we upload to
Facebook or Instagram, the videos we
watch on YouTube, and even the text messages
we send all contribute to the gigantic
heap of unstructured data.
Semi-structured data
The line between unstructured and semi-structured
data has always been blurry, since much
semi-structured data appears unstructured
at a glance.
Semi-structured data is information that is not stored in the
traditional database format used for structured data, but that contains
some organizational properties, such as tags or keys, that make
it easier to process.
For example, documents in NoSQL document databases are considered
semi-structured, since they contain
keys (field names) that can be used to process the document
easily.
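Here is a minimal Python sketch of that idea, assuming JSON documents of the kind a NoSQL document database might store; the field names and values are made up for illustration. The two records do not share a fixed schema, yet their keys still let a program pick out exactly the fields it needs.

```python
import json

# Two hypothetical documents with different shapes (no fixed schema)
raw_documents = [
    '{"user": "asha", "post": "Loving the new phone!", "tags": ["tech", "review"]}',
    '{"user": "ben", "post": "Weekend hike", "location": {"city": "Leeds"}}',
]

docs = [json.loads(d) for d in raw_documents]

# The keys make processing easy even though the structure varies
users = [d["user"] for d in docs]
tags = [tag for d in docs for tag in d.get("tags", [])]  # missing key -> empty list
cities = [d["location"]["city"] for d in docs if "location" in d]

print(users)   # ['asha', 'ben']
print(tags)    # ['tech', 'review']
print(cities)  # ['Leeds']
```

Compare this with the free text inside each "post" field: the keys around it are what make the document semi-structured rather than unstructured.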
Big Data analysis has been found to have clear
business value, since analyzing and processing
Big Data can help a company reduce costs
and achieve dramatic growth.
So it is imperative that you do not wait too
long to exploit the potential of this excellent
business opportunity.
That was all about Big Data and the types
of Big Data.
Don’t forget to subscribe to KnowledgeHut
for more informational videos like this one.
