>> You're not going to want to miss
this episode of the AI Show,
where we talk about language,
we talk about recipes,
and we tie it all together with
search. Make sure you tune in.
[MUSIC]
Hello and welcome to this
episode of the AI Show.
My name is Seth Juarez, and I
have a special guest today.
Tell us who you are
and what you do, buddy.
>> I am Luis Cabrera, I am
a Program Manager in
the Azure Search team.
>> Fantastic. So tell us,
Azure Search but special,
I've heard of something
called Cognitive Search.
>> Right.
>> We might have done some episodes,
but why don't you fill
us in on what that is?
>> That is right. So the
Cognitive Search part
of it has to do with the
ability to enrich information,
so that you can search
this information.
You actually did an AI Show on
the Knowledge Store recently.
>> That's right.
>> So now we're going even
beyond the search scenario.
So actually on the screen right now,
you can see like several
parts of the pipeline.
You can see how we take data
from a variety of data sources,
then we actually crack,
we say Document Cracking.
We essentially know the format of
different file types to be able
to extract their text, their images,
we can read different data sources,
we can also read structured
data sources such as databases,
whether they're relational
or non-relational.
So then we apply
Machine Learning AI algorithms
that we call Cognitive
Skills to the information.
Sometimes let's say, that you
have some customer feedback,
you may want to do sentiment
analysis on that information,
or maybe today we'll be talking about
how we can translate some
information as well.
So essentially, you can apply
different Machine Learning models
to the information to extract
a tree of information that is
then projectable into
either the Knowledge Store,
or you can project it into
the search index, so
that you can query it.
>> So let's slow down,
let's bring this image back up,
because I've been having them
work on a demo for this.
>> Yes.
>> I ran into some things
and I have some questions.
>> Ask me.
>> So let's get the screen up
and let's ask a little bit.
Because here's the thing
that I want to ask,
the first thing is; so
this is for searching?
>> That is right.
>> Okay.
>> So Azure Search will index
your information so that you
can quickly retrieve the data.
>> Got it.
>> That said, people told us,
"Look, this is super valuable.
Once you enrich this
information like I really,
really like that you extracted
essentially structure of
what was unstructured data.
Can you please share
that structure with us?"
>> Got it.
>> That enables other scenarios
beyond Search such as
Analytics or Machine Learning.
>> Got it. So the first thing
I ran into is this notion of,
if I were just to do
regular Azure Search,
there is this thing
that happens called
Document Cracking,
and you see it there.
>> That is right.
>> What are you doing to the
document when you're indexing it?
>> Think of a PDF,
let's say that it's a
very valuable contract.
If you can find this contract,
that's a million dollar
difference, right?
>> Right.
>> Whether you can find it or not,
or amend it, or something like
that. Guess what happens?
This contract, it's a PDF,
and inside the PDF,
there are just a bunch of images,
so you need to extract the text
out of those for instance.
So the first thing that we do
as part of the Document
Cracking is we say,
is there any text in that document?
Because sometimes they
have structured text.
>> Okay.
>> Think of a Word document that
has both image and some text.
So we extract the text and we say,
"Are there any images
embedded in this document,
so we can extract the
normalized images as well?"
>> I see.
>> That means that we have
to understand not only PDFs,
the PowerPoints, Word documents,
a bunch of different file formats.
>> So what's happening is because
we understand a bunch
of file formats,
and this is before we
even get to the AI bit,
we have to crack this document open.
It'd be like here's the text,
here's the images, what other
things do they crack open?
>> So really it's the text,
and the images, and some metadata.
>> Okay.
>> Sometimes there is metadata
on the files themselves.
>> Like how old the file is.
>> The title, sometimes in a PDF,
you can have a title field for
instance or the
timestamp of the file,
but sometimes you can also
put metadata about the file,
not in the file itself but
let's say in Blob Storage.
>> I see.
>> You can have additional columns,
so we also know how to read those.
>> I see. So not even getting
into the cognitive part,
you basically take the
document as it is indexed,
you crack it open,
here is the text, here's the
images, here's some metadata,
and now we can do
glorious stuff with AI to make
that even more intelligent.
>> Exactly.
>> Okay.
>> Exactly.
>> Cool.
>> That's exactly what we do.
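The document cracking step described above can be pictured with a small sketch. This is a toy model only, not the service's implementation: `crack_document` and `add_enrichment` are hypothetical helpers, and the field names `content`, `normalized_images`, and `metadata` are assumptions chosen to mirror the text, images, and metadata mentioned in the conversation.

```python
def crack_document(file_name, text, images, blob_metadata=None):
    """Toy stand-in for document cracking: split one file into text,
    images, and metadata, and return them as the root of a tree."""
    metadata = {"file_name": file_name}
    metadata.update(blob_metadata or {})
    return {
        "document": {
            "content": text,                    # text found directly in the file
            "normalized_images": list(images),  # embedded images, ready for OCR
            "metadata": metadata,               # title, timestamp, blob columns
        }
    }


def add_enrichment(tree, name, value):
    """Attach the output of a later skill as a new node under the document."""
    tree["document"][name] = value
    return tree


# A scanned contract: no embedded text, only page images (made-up values).
tree = crack_document("contract.pdf", "", ["page1.png", "page2.png"],
                      {"title": "Supply contract"})
print(sorted(tree["document"]))
```

Each skill that runs afterwards would call something like `add_enrichment`, which is why the tree keeps growing as more enrichments are applied.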
>> That's awesome.
So when we do that,
how hard is it to do this?
>> Well, I'm going to show you,
there is a very, very easy
path that you can follow.
>> Okay.
>> Without writing a
single line of code.
>> Okay.
>> I'm going to show you a scenario
where sometimes you need
to translate content.
Imagine you have a Swedish company,
so all your technical documents
are written in Swedish,
and then it turns out that
your company just went global.
>> Right.
>> Now, you have support
centers in Mexico, in Brazil,
in Denmark, or here in the US.
>> Yeah.
>> These customer support
representatives may
do a query not in Swedish,
they don't know Swedish.
>> Yeah, I don't know.
>> So they would type
English most likely, right?
So wouldn't you want all
your technical documents to
be translated to English
before you can search them?
>> It's obviously very costly,
but AI has made it a
little bit easier.
>> Right, it should be super
easy because we have
all the AI models.
>> What I mean by
costly is having humans
translate stuff is
ridiculously expensive.
>> That is right.
>> Yeah.
>> Yeah, and the time also,
imagine having to hire all kinds
of people that speak Swedish.
>> That's right.
>> Yeah. So I have a
problem closer to home.
>> Okay, let's hear that.
>> So I'm Hispanic, I believe
you're Hispanic as well.
>> We are both Hispanic.
>> Okay. I moved to this
country pretty early,
and then when my kids were born,
my wife didn't speak much Spanish,
and my kids didn't learn
Spanish. Shame on me.
>> Camera two here, I am the same,
my wife is really and all my
kids don't speak Spanish,
so we will hang our head in
shame after this episode
on behalf of all of
us hispanohablantes.
>> That's right. So now,
there is one thing that is
harder to lose than language,
which is this wonderful
food that we cook.
My mom comes, and grandma cooks
you all these amazing meals.
>> Yes, right.
>> Then when she leaves,
the kids are like, "We love
that dish that grandma makes."
Then we're like, "Which one, the
one that has chicken and sauce?"
>> That's right.
>> They don't even know what
the name of the dish is,
so my mom is amazing.
She said, "Luis, no problem.
I'll make a recipe book for you."
>> Okay.
>> So she started to give me a ton of
pages of different recipes
because she wants her
legacy to continue on.
>> That's right.
>> At least the food part of it.
>> Yeah.
>> So I thought,
"Wouldn't it be great for
my kids to be able to
search these recipes,
to find the recipes
that they care about?"
So it's exactly the same problem.
>> It's exactly the same problem.
>> All right. So if I am going
now to the Azure Portal,
you can see my Blob Storage account,
and you can see that
I have all kinds of
images that I downloaded,
and that's an example.
>> But they're all in Spanish.
>> Yeah, she put the
ingredients and everything,
that's some eggplant right there.
So I already put this
in Blob Storage,
I'm going to go to
my search index and there is this
beautiful "Import data" wizard.
So I'm going to click on that,
I'm going to say that I want to
index a Blob Storage account
because I already put it somewhere,
I'm just going to select it.
I believe, I put it in one
that is called luiscastore,
and I called it multiLang
because it has different
recipes in different languages.
So now I'm just going
to give it a name,
I'm going to call it "myrecipes" ,
and I am going to add
Cognitive Search,
or AI skills to enrich
my information.
So right now, the system
went to Blob Storage,
understood what kind of
data I have there in
order to tell me what enrichment
it can apply to this information.
>> So for those that are wondering
and looking for answers,
once you crack that document,
enrichments are things that you
can add to either the
text, the images,
or the metadata, or
all three of them,
so that the tree then gets bigger.
>> Bigger and bigger.
>> Bigger and bigger,
and then you can
search on more things. Okay, cool.
>> Exactly. So now,
just a few technicalities,
I do need to attach a
Cognitive Services resource so
that I can actually get billed
for these capabilities.
So I selected the CustomerDemosCS
resource that I had created.
This resource has to be in the
same region as your Azure Search.
>> Which makes sense, right?
You don't want to be sending-
>> Exactly. Now, we get to
add all these enrichments.
So I'm going to say, let's OCR
the content because I know
that the content is just a bunch
of scanned images that
she sent over e-mail.
That's going to create
a Merged Content field.
So what does that mean?
It looks at the file,
if there are images inside it,
it extracts the text out of it.
If there is text in the file,
it will also use that and
merge it together into
one single big field that has text.
>> So in this case, if
you have information and
images even in a Word document,
then it will be able to find
the OCR text as well
as the text in it.
>> That is correct.
Consider technical documents
that have diagrams
showing things like that.
Okay. So now I can apply all
kinds of cognitive skills,
but today I'm going to
just do one of them.
>> That's new. Because I
haven't seen that one before.
>> That's new, exactly. The
new translate text skill.
You can call it
programmatically as well.
But today we'll do it
through the portal.
I can tell it what
language to translate it
to because this
presentation is in English,
we'll just keep the English language.
It's going to put that in
the translated text field.
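The same choices made in the wizard (OCR, merged content, translation to English, an attached Cognitive Services resource) can also be written as a skillset definition when calling the service programmatically. The sketch below only builds the request body as a Python dict and sends nothing. The `@odata.type` values and input/output names follow the REST skill reference as best recalled and should be checked against the current documentation; the skillset name, the key placeholder, and the target field names `merged_content` and `translated_text` are assumptions.

```python
import json


def build_skillset(name, cognitive_services_key, to_language="en"):
    """Build a skillset body: OCR the images, merge OCR text with any
    embedded text, then translate the merged text."""
    ocr = {
        "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
        "context": "/document/normalized_images/*",
        "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
        "outputs": [{"name": "text", "targetName": "text"}],
    }
    merge = {
        "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
        "context": "/document",
        "inputs": [
            {"name": "text", "source": "/document/content"},
            {"name": "itemsToInsert", "source": "/document/normalized_images/*/text"},
            {"name": "offsets", "source": "/document/normalized_images/*/contentOffset"},
        ],
        "outputs": [{"name": "mergedText", "targetName": "merged_content"}],
    }
    translate = {
        "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
        "context": "/document",
        "defaultToLanguageCode": to_language,  # target language picked in the wizard
        "inputs": [{"name": "text", "source": "/document/merged_content"}],
        "outputs": [{"name": "translatedText", "targetName": "translated_text"}],
    }
    return {
        "name": name,
        "skills": [ocr, merge, translate],
        # The attached Cognitive Services resource is what gets billed.
        "cognitiveServices": {
            "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
            "key": cognitive_services_key,
        },
    }


# Hypothetical name and placeholder key, for illustration only.
skillset = build_skillset("recipes-skillset", "<cognitive-services-key>")
print(json.dumps(skillset, indent=2))
```

Each skill reads from a path in the enrichment tree and writes a new node, so the translation skill can consume `/document/merged_content` only because the merge skill produced it first.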
>> Cool. Which is another
part of the tree.
>> Another node in that tree of
enrichments that you
were talking about.
That's all I'm going to do.
Just for awareness, you
could project that into
the knowledge store.
There is an AI Show just for that.
>> Yeah.
>> So I am not going to go
into too much detail there.
Then I can click "Next" to
customize the target index.
>> Okay.
>> This will allow me to select
the different fields that I want
to have in my search index.
In this case, for instance,
you will notice that I
have my translated text,
and I want it to be
searchable and retrievable.
There's one field here, layout text,
that has all the x y coordinates
for each of the pieces of text.
I don't need that in this case.
I'm just going to remove it.
One more thing that is interesting
in the translated text,
because I told it the
language of the translation,
it automatically picked the
English analyzer for it.
>> That's cool.
>> So we can do better indexing
given the knowledge of the language.
There are two English analyzers here.
There is the Microsoft
and the Lucene analyzer.
Why do we expose both?
So the Lucene one is faster,
the Microsoft one is better.
>> Okay.
>> So one is richer than the other.
Let's do the Microsoft
one in this case.
>> It's always a trade-off
whether you want speed.
>> It's always a
trade-off. Exactly. So we
figured we will offer both of them.
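As a rough illustration of the field settings just discussed, an index definition with a searchable, retrievable translated-text field and a choice of English analyzer might look like the sketch below. It only builds the definition as a Python dict. The analyzer names `en.microsoft` and `en.lucene` are the service's identifiers for the two English analyzers; the index name, the key field `metadata_storage_path`, and the field names are assumptions made for this example.

```python
ENGLISH_ANALYZERS = {"en.microsoft", "en.lucene"}


def build_index(name, analyzer="en.microsoft"):
    """Build an index definition whose translated text uses an English analyzer."""
    if analyzer not in ENGLISH_ANALYZERS:
        raise ValueError(f"expected one of {sorted(ENGLISH_ANALYZERS)}, got {analyzer!r}")
    return {
        "name": name,
        "fields": [
            # Assumed key field: the blob path uniquely identifies each document.
            {"name": "metadata_storage_path", "type": "Edm.String",
             "key": True, "searchable": False, "retrievable": True},
            # Original-language text after OCR and merging.
            {"name": "merged_content", "type": "Edm.String",
             "searchable": True, "retrievable": True},
            # English translation, analyzed with the chosen English analyzer.
            {"name": "translated_text", "type": "Edm.String",
             "searchable": True, "retrievable": True, "analyzer": analyzer},
        ],
    }


index = build_index("recipes")             # richer Microsoft analyzer
fast_index = build_index("recipes", "en.lucene")  # faster Lucene analyzer
```

The analyzer is set per field, so only the translated text needs the English analyzer while the original Spanish text can keep the default.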
>> So for those who are watching,
because you mentioned
two terms that confused
me initially, indexer and an index.
>> That's right.
>> An index is a table that's
going to store all this stuff.
>> That is right. The index
is what you actually will
query at some point.
>> Got it.
>> Right. So think of
it as having, yes,
it has certainly the information
about the different fields.
Internally, it's structured in
a way that is optimized
for fast queries.
>> Right.
>> The indexer is really what
connects the data source,
the skill set, or the enrichments,
and the index together.
>> I see.
>> So this is the orchestrator.
>> It's like the workhorse.
>> Yeah, exactly. Go get the data
from the data source, document
crack it, enrich it,
and then push it into the index.
>> It's like the construction
worker of the actual search.
Because it has its tools,
which is the skill sets.
>> Exactly.
>> It bangs on the
data that comes out,
and puts it in the index.
>> That's a fair analogy.
>> I love it.
>> All right. Perfect.
Now that we have that,
I'm going to give my index a name.
I'm going to call it Recipes.
I am going to create the
indexer, or this orchestrator.
For instance, I can tell it
whether I run it hourly, or daily,
or at some custom schedule.
Because sometimes I may
have millions of files.
So I can say, once a
day just check what's
new and then update the index.
I'll call it the Recipes Indexer,
and I'll click "Submit".
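To make the indexer's role concrete, here is a minimal sketch of an indexer definition that ties a data source, a skillset, and an index together and runs on a schedule. It builds the body as a Python dict and sends nothing. The property names follow the REST API as best recalled and should be verified; the resource names are hypothetical, lowercased versions of the ones used in the demo, and the schedule interval is an ISO 8601 duration (`PT1H` for hourly, `P1D` for daily).

```python
def build_indexer(name, data_source, skillset, index, interval="P1D"):
    """Build an indexer body connecting data source, skillset, and index."""
    return {
        "name": name,
        "dataSourceName": data_source,   # where the files live (Blob Storage)
        "skillsetName": skillset,        # the enrichments to apply
        "targetIndexName": index,        # where the results are pushed
        "schedule": {"interval": interval},
        "parameters": {
            "configuration": {
                # Crack both content and metadata, and emit images for OCR.
                "dataToExtract": "contentAndMetadata",
                "imageAction": "generateNormalizedImages",
            }
        },
        # Copy enrichment-tree nodes into index fields.
        "outputFieldMappings": [
            {"sourceFieldName": "/document/merged_content",
             "targetFieldName": "merged_content"},
            {"sourceFieldName": "/document/translated_text",
             "targetFieldName": "translated_text"},
        ],
    }


# Hypothetical names, for illustration only.
indexer = build_indexer("recipes-indexer", "myrecipes", "recipes-skillset", "recipes")
hourly = build_indexer("recipes-indexer", "myrecipes", "recipes-skillset", "recipes",
                       interval="PT1H")
```

With a daily interval the indexer would wake up once a day, pick up new or changed files from the data source, and update the index.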
So I have, I don't know, fewer
than 100 files here.
So it shouldn't take too long.
So actually it's going on right now.
It would probably take
about a minute or so.
But just to save time
while we are doing this,
I have the Recipes Index
that I just created,
and I have the Cooked Recipes indexer
that I pre-cooked ahead of time.
>> That's just awesome.
>> So let's just go to the indexes.
I have the Cooked Recipes here.
>> Nice. We're going to run out
of analogies here pretty soon.
>> That's right.
>> Yes.
>> So let's say that we're
searching for chicken recipes.
You can see that there is
some kind of eggplant,
and then the procedure.
You can see the text in Spanish.
So I was able to find
the berenjena recipe,
even though the text was not
in English in the beginning.
>> That's cool.
>> So I am hitting right
now the index directly.
So you can see I'm essentially
making a request on the index.
You can see the translated text
and the score for each
of them and so forth.
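A query against the index is a plain REST call, which can be sketched as follows. The code only builds the URL and headers and reads a response body; it does not contact any service. The service name, index name, API version, and the sample response are assumptions made up for illustration, and `translated_text` is the assumed field name from the earlier steps.

```python
from urllib.parse import quote, urlencode

API_VERSION = "2019-05-06"  # assumption: use whichever version your service supports


def build_search_request(service, index, query, api_key, select=None):
    """Return the URL and headers for a simple search query on one index."""
    params = {"api-version": API_VERSION, "search": query}
    if select:
        params["$select"] = ",".join(select)  # limit which fields come back
    url = (f"https://{service}.search.windows.net/indexes/"
           f"{quote(index)}/docs?{urlencode(params)}")
    return url, {"api-key": api_key}


def top_hits(response_body, field="translated_text"):
    """Return (score, text) pairs from a response body, best match first."""
    hits = sorted(response_body.get("value", []),
                  key=lambda hit: hit["@search.score"], reverse=True)
    return [(hit["@search.score"], hit.get(field, "")) for hit in hits]


url, headers = build_search_request("my-search-service", "cooked-recipes",
                                    "chicken", "<query-key>")

# Made-up response body, shaped like a search result, for illustration only.
sample = {"value": [
    {"@search.score": 0.4, "translated_text": "placeholder text B"},
    {"@search.score": 1.2, "translated_text": "placeholder text A"},
]}
print(top_hits(sample))
```

The English query matches the translated field, which is how a recipe written in Spanish can come back for an English search term.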
>> Right.
>> I want to make you
aware of one more thing
because usually you will
not show this in just
a REST call, right?
So there is a solution accelerator
that we have put up for you. If
you go to aka.ms/kmsolutions.
The solution accelerator
has many capabilities.
It can teach you how to
create custom skills,
but it also has a web UI template.
>> Okay.
>> That web UI template,
it's a small application
that has a little front-end.
Essentially, I just modified
my appsettings.json.
Let me make this a little
bit bigger for you,
essentially a set of
credentials there.
>> I see.
>> Your storage
account, your service,
the key to hit the
service, and so forth.
I ran it already so we
can see what happened.
I believe I have it right here.
>> So this is
just a nicer way of querying
the index, it looks better.
>> It's so nice to
visualize the results.
Just to connect the
scenario end to end here.
Let's say I have some
beets in my fridge.
Let me make this just
a little bit bigger.
So I'm going to search for beets.
Yes. So for instance, it will
show me my mom's enchiladas
right here that have beets.
It turns out that Guatemalan
enchiladas have beets.
>> Really? I had no idea.
>> Yeah. They're delicious.
>> I'm learning cultural
and technical things.
>> That's right.
>> This is awesome.
>> So in retrospect, what did we do?
We learned a little bit
about cognitive search.
We learned about this new skill,
the translate skill that makes it.
We didn't write a
single line of code.
>> That's right.
>> Makes it very easy for me
to find technical content.
We also learned how to visualize
that if you use the
solution accelerator.
>> Well, that is fantastic.
This has been pretty cool
for a number of reasons.
Number one, because I love Spanish,
and I should teach my kids Spanish.
Number two, because we
saw some cool recipes.
The thing that was new
because we learned
a little bit about this stuff before.
But the thing that was new for
me is that little check box that
has those extra skills
to do the translation,
which is really cool.
>> That's right.
>> All right. Well, where can
people go to find out more?
>> So you can go to
aka.ms/cognitivesearch.
>> Fantastic.
aka.ms/cognitivesearch.
Thank you so much for watching.
Thank you so much for
watching the AI Show.
If you have any questions or
comments, we'd love to hear them.
Thanks for watching. We'll
see you next time. Take care.
>> Thank you. Bye-bye.
[MUSIC]
