So another useful subcategory of record data is document data. In this case, it is somewhat similar to a data matrix: every term, every entry, every data attribute has a numeric value. But here the values are counts, so they are discrete.

Here, each row (each data object) is represented by what we call a term vector. There are several ways to build a term vector, but in this case it simply counts the number of times a given word appears in the document.

So document 1 has "team" appear three times and "play" five times, but "coach" not at all. Document 2, on the other hand, has "coach" appear seven times, but "play" never appears over the course of the document.

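To make the counting concrete, here is a minimal sketch of building term vectors. The vocabulary and the two document strings are made up so that they reproduce the counts just described for "team", "play", and "coach"; splitting on whitespace is a simplifying assumption, not how a real tokenizer works.

```python
from collections import Counter

# Hypothetical vocabulary: one attribute (column) per term.
VOCABULARY = ["team", "coach", "play", "ball", "score"]

# Hypothetical documents chosen to match the counts in the example above.
doc1 = "team play play team ball play score play team play"
doc2 = "coach team coach score coach coach ball coach coach coach"


def term_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Count how many times each vocabulary word appears in text."""
    # Simplifying assumption: lowercase and split on whitespace.
    counts = Counter(text.lower().split())
    # Counter returns 0 for missing keys, so absent terms get a count of 0.
    return [counts[word] for word in vocabulary]


rows = [term_vector(doc, VOCABULARY) for doc in (doc1, doc2)]
print(rows)  # [[3, 0, 5, 1, 1], [1, 7, 0, 1, 1]]
```

Each row is one document and each column one term, so the result has the same shape as a data matrix, just restricted to non-negative integer counts.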
Because these attributes are all discrete (they're all integer attributes), different kinds of algorithms and processing methods are more appropriate here than they would be for data matrices or mixed data.

All right, the last special kind of record data that we're going to talk about here is transaction data. It shares some similarities with document data, and you can use some of the same analysis, but there are different semantics around it as well.

Transaction data is exactly what it sounds like: record data where each record involves a set of items. If we're at a grocery store, the set of products purchased by a customer during one shopping trip constitutes a transaction, and the individual products that were purchased are the items.

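As a rough illustration, each transaction can be stored as a set of items. The transaction IDs and grocery items below are hypothetical. One way to reuse document-style analysis, assumed here for illustration, is to encode each basket as a 0/1 vector over every item seen, much like a term vector that only records presence or absence.

```python
# Hypothetical grocery transactions: each record is the set of items from one trip.
transactions = {
    "T1": {"bread", "milk"},
    "T2": {"bread", "eggs", "butter"},
    "T3": {"milk", "eggs"},
}


def to_binary_rows(transactions: dict[str, set[str]]) -> tuple[list[str], dict[str, list[int]]]:
    """Encode each transaction as a 0/1 vector over all items seen, in sorted order."""
    items = sorted(set().union(*transactions.values()))
    rows = {
        tid: [1 if item in basket else 0 for item in items]
        for tid, basket in transactions.items()
    }
    return items, rows


items, rows = to_binary_rows(transactions)
print(items)  # ['bread', 'butter', 'eggs', 'milk']
print(rows)   # {'T1': [1, 0, 0, 1], 'T2': [1, 1, 1, 0], 'T3': [0, 0, 1, 1]}
```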
The difference between this and document data is that these items usually have more information associated with them than just a count. It's not only bread: there's a price associated with it, maybe an inventory stock level (how many are left), and all of those sorts of things.

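A minimal sketch of that extra per-item information follows. The `Item` class, its fields, and the catalog values are all hypothetical. The transaction itself stays a set of item names, while a separate catalog carries attributes such as price and remaining stock that document data would not have. Prices are kept in integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    price_cents: int  # hypothetical unit price, in cents
    in_stock: int     # hypothetical remaining inventory


# Hypothetical catalog mapping item names to their extra attributes.
CATALOG = {
    "bread": Item("bread", 250, 14),
    "milk": Item("milk", 125, 30),
    "eggs": Item("eggs", 300, 0),
}


def basket_total(basket: set[str], catalog: dict[str, Item]) -> int:
    """Sum the unit prices (in cents) of the items in one transaction."""
    return sum(catalog[name].price_cents for name in basket)


def out_of_stock(basket: set[str], catalog: dict[str, Item]) -> list[str]:
    """List, in sorted order, the items in the basket with no inventory left."""
    return sorted(name for name in basket if catalog[name].in_stock == 0)


trip = {"bread", "milk", "eggs"}
print(basket_total(trip, CATALOG))  # 675
print(out_of_stock(trip, CATALOG))  # ['eggs']
```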
So we can do things similar to document analysis, but there's other information we have to consider as well. So that's transaction data.
