These days, much of the information we have is digital. New information is created every day
at an incredible pace, because more and more people have access to digital media,
which they use to create social network profiles, consume some kind of entertainment
and produce material, in the form of videos, photos or text, that gets uploaded.
Every time someone does any of that, they leave a piece of themselves in some database somewhere in the world,
and that information says a lot about the person who created it.
Everything you do on the Internet and on your personal devices has great value,
even if your searches or activities seem boring.
Since you are not the only human being who does these things, identifying the person or people
who have them in common is valuable.
Data tells us stories and helps us understand the world better, so a database on any subject is important:
it may hide a pattern that reveals something we did not know.
Data mining (DM), or "minería de datos" in Spanish, is precisely about that:
discovering patterns in large amounts of data. To do so, it draws mainly on statistics and computer science.
Many believe that DM is responsible for collecting, extracting, storing and processing data,
but that is not really the case.
Let us clarify: DM is not the same as knowledge discovery from data (KDD),
or "descubrimiento de conocimiento en bases de datos" in Spanish.
Rather, DM is one step of the KDD process: the one that comes after we have selected the data and purged all irrelevant information.
DM's only job is to analyze that data to extract relevant or interesting patterns.
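To make the distinction concrete, here is a minimal Python sketch of a KDD-style pipeline. It assumes a hypothetical list of shopping-basket records, and the function names `select`, `clean` and `mine_patterns` are made up for illustration. Only the last step, which counts item pairs that frequently appear together, plays the role of DM; the earlier steps are the selection and cleaning that happen before it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records: (user_id, basket of items).
RAW = [
    ("u1", ["tacos", "salsa", "soda"]),
    ("u2", ["tacos", "salsa"]),
    ("u3", []),                      # empty basket: purged during cleaning
    ("u4", ["soda", "chips"]),
    (None, ["tacos", "salsa"]),      # unknown user: dropped during selection
    ("u5", ["tacos", "salsa", "chips"]),
]

def select(records):
    """Selection: keep the baskets of identified users, dropping the user id itself."""
    return [items for user, items in records if user is not None]

def clean(baskets):
    """Cleaning: purge empty baskets and remove duplicate items within a basket."""
    return [sorted(set(basket)) for basket in baskets if basket]

def mine_patterns(baskets, min_support):
    """The DM step: find item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(basket, 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

patterns = mine_patterns(clean(select(RAW)), min_support=2)
print(patterns)  # {('salsa', 'tacos'): 3}
```

The mining function never sees the raw records: it only receives data that earlier KDD steps have already selected and cleaned.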
But how does it find patterns?
To find something relevant in the information we have, it is important to use "classifiers",
which are models that describe how the data in a database behaves.
Based on that behavior, the classifier assigns a label to each kind of data it finds.
If this sounds familiar, it is because DM relies on artificial intelligence algorithms, such as
neural networks, linear regression, decision trees and others. In that sense, DM is closely tied to machine learning.
These algorithms are responsible for finding the patterns needed
to label the information properly.
For that to happen, the classifier is first given some training data and then evaluated
to see how reliably it labels each data point.
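As an illustration of "train first, then evaluate", here is a minimal Python sketch that uses a one-level decision tree (a decision stump) as the classifier. The loan-style data, the two features (monthly income and missed payments) and the function names are all hypothetical; in practice you would normally use a library such as scikit-learn instead of a hand-rolled model like this.

```python
# Hypothetical samples: ((monthly_income, missed_payments), label).
TRAIN = [
    ((1200, 3), "risky"),
    ((1500, 2), "risky"),
    ((1800, 4), "risky"),
    ((3000, 0), "reliable"),
    ((3500, 1), "reliable"),
    ((4200, 0), "reliable"),
]
# Held-out data the classifier never sees during training.
TEST = [
    ((1300, 2), "risky"),
    ((3900, 0), "reliable"),
    ((2600, 1), "reliable"),
    ((1600, 5), "risky"),
    ((2000, 4), "risky"),
]

def train_stump(samples):
    """Training: try every (feature, threshold, label order) split and keep the one with fewest errors."""
    labels = sorted({label for _, label in samples})
    best = None
    for fi in range(len(samples[0][0])):
        for t in sorted({x[fi] for x, _ in samples}):
            for lo in labels:
                for hi in labels:
                    if lo == hi:
                        continue
                    errors = sum((lo if x[fi] <= t else hi) != y for x, y in samples)
                    if best is None or errors < best[0]:
                        best = (errors, fi, t, lo, hi)
    _, feature, cut, low, high = best

    def classify(x):
        # Label below-or-at the threshold with `low`, above it with `high`.
        return low if x[feature] <= cut else high

    return classify

def accuracy(classifier, samples):
    """Evaluation: fraction of samples whose label is predicted correctly."""
    return sum(classifier(x) == y for x, y in samples) / len(samples)

stump = train_stump(TRAIN)
print(accuracy(stump, TEST))  # 0.8
```

On this toy data the stump splits on income (1,800 or less means "risky") and labels 4 of the 5 held-out applicants correctly; spotting that kind of miss is exactly what the evaluation step is for.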
If you would like to know more about this, I recommend watching my video "What is machine learning?".
You may wonder: what is the use of finding patterns? As I mentioned, every piece of information tells a story,
so finding patterns has applications in every field you can imagine.
It is usually used to predict what type of information a database contains.
For example, in medicine it can be used to estimate how effective a drug will be,
based on certain data about the patient.
A bank can predict how reliable a person is as a borrower.
A company can sell you certain products based on your likes, the photos you upload, where you live,
how often you post... and plenty of other personal data that Facebook has sold to them without your consent.
Data mining is a very powerful tool in an era when more and more information is available,
and that pushes us to keep improving technology and making it more capable,
because we all want to know which patterns are hidden in that information.
With it, you can do both good and harm, so… be careful with the information you give away.
I want to thank my patrons, like Atahualpa Bravo, Héctor Pulido, Jesús Adán, Kalcifer Vallarta,
Maximiliano Camilo, Shenka_007 and Yago Galleta,
for giving me the opportunity to do data mining in order to extract tacos...
I mean, data.
As you know, if you want to support me, all you have to do is go to www.patreon.com/MindMachineTV.
Thank you very much and until next time!
