This video explains what data aggregation is. Data aggregation essentially consists of two steps:
First, the data grouping identifies one or more data groups based on the values in selected features.
Then, the data aggregation puts together the values in one or more selected columns for each group.
Data grouping identifies one or more groups based on values in selected features.
In this example, we could identify all men, all women, or groups of people with different opinions
about a given product.
We could also identify groups based on two features, gender and sentiment: for example, all women, or all
men, with a positive sentiment about the topic or product. Data aggregation is
the second step. We have already identified the groups. We now calculate an aggregated measure for each group.
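To make the two steps concrete, here is a minimal sketch in plain Python. The records, the field names (gender, sentiment, age), and the helper group_by are all made up for illustration; they are not taken from the video's dataset or from any particular tool.

```python
from collections import defaultdict

# Hypothetical survey records; field names and values are assumptions.
records = [
    {"gender": "F", "sentiment": "positive", "age": 34},
    {"gender": "M", "sentiment": "negative", "age": 45},
    {"gender": "F", "sentiment": "positive", "age": 29},
    {"gender": "M", "sentiment": "positive", "age": 52},
    {"gender": "F", "sentiment": "negative", "age": 41},
]


def group_by(rows, features):
    """Step 1 (grouping): collect rows by their values in the selected features."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in features)  # one key per combination of values
        groups[key].append(row)
    return dict(groups)


# Groups based on one feature, and on two features together.
by_gender = group_by(records, ["gender"])
by_gender_and_sentiment = group_by(records, ["gender", "sentiment"])

# Step 2 (aggregation): one measure per group, here a simple count.
counts = {key: len(rows) for key, rows in by_gender_and_sentiment.items()}

for key, n in counts.items():
    print(key, n)
```

Each group key is a tuple of feature values, so the same helper works for one grouping feature or for several.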
An aggregated measure can mean many things.
It can be a simple count of points, or the percentage of points in each group
with respect to the whole dataset. It can be an average, a standard deviation, or any other statistical measure
on some feature, for example age. It can be a sum, for example of the amounts of all contracts in each group.
It can be a list of products bought by each group, or the number of days between the first purchase date and the last
purchase date. And there are many more.
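The measures just listed could be computed along these lines. This is only a sketch under assumptions: the purchase records, the field names, and the aggregate helper are hypothetical, and the standard deviation used here is the sample standard deviation from Python's statistics module.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev

# Hypothetical purchase records; all field names and values are assumptions.
purchases = [
    {"gender": "F", "age": 30, "amount": 100.0, "product": "A", "date": date(2023, 1, 10)},
    {"gender": "F", "age": 40, "amount": 250.0, "product": "B", "date": date(2023, 3, 1)},
    {"gender": "M", "age": 50, "amount": 80.0, "product": "A", "date": date(2023, 2, 5)},
    {"gender": "M", "age": 20, "amount": 120.0, "product": "C", "date": date(2023, 2, 20)},
]

# Grouping step: one group per gender.
groups = defaultdict(list)
for row in purchases:
    groups[row["gender"]].append(row)


def aggregate(rows, total_rows):
    """Aggregation step: compute several measures for one group."""
    ages = [r["age"] for r in rows]
    dates = [r["date"] for r in rows]
    return {
        "count": len(rows),
        "percent": 100.0 * len(rows) / total_rows,  # share of the whole dataset
        "mean_age": mean(ages),
        # Sample standard deviation needs at least two points.
        "std_age": stdev(ages) if len(ages) > 1 else 0.0,
        "total_amount": sum(r["amount"] for r in rows),
        "products": sorted({r["product"] for r in rows}),
        "days_first_to_last": (max(dates) - min(dates)).days,
    }


summary = {key: aggregate(rows, len(purchases)) for key, rows in groups.items()}

for key, measures in summary.items():
    print(key, measures)
```

All measures are computed per group from the same grouped rows, which is why the grouping step only has to run once.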
Thank you.
