
English: 
In the previous videos we talked about text preprocessing,
clustering and classification.
We worked with Grimm's tales, a data set I have prepared in a spreadsheet.
But working with spreadsheets and long text can be a pain.
Is there any other way we can import texts into Orange?
Of course there is!
This time I will work with Kennedy's speeches.
I have 17 of them in a Kennedy folder,
each in its own file.
Files can be Word documents, PDFs or plain text files.
Here, for example, is a speech from the Democratic National Convention.
To load a corpus into Orange,

Serbian: 
U prethodnim klipovima, 
govorili smo o
preprocesiranju, klasterovanju i 
klasifikaciji teksta.
Radili smo sa bajkama braće Grim,
skupom podataka koji smo pripremili u
programu za tabelarna izračunavanja.
Ali rad sa ovim programima i tekstom
ume da bude naporan.
Postoji li drugi način da uvezemo 
tekst u Orange?
Naravno da postoji.
Ovog puta ćemo raditi sa
Kenedijevim govorima.
Imamo 17 njegovih govora u folderu -
svaki u zasebnom fajlu.
Fajlovi mogu biti Word 
dokumenti, PDF-ovi,
ili običan tekst.
Ovo je, na primer, govor sa
nacionalnog sabora demokratske stranke.
Da biste učitali korpus u Orange,

Serbian: 
pokrenite 'Text' dodatak, zatim postavite 
'Import Documents' operator na platno,
i pokrenite ga.
Kliknite na ikonicu foldera
i odaberite folder koji želite da uvezete.
Hajde da pregledamo
podatke u 'Corpus viewer'-u.
Evo ga govor koji smo videli ranije.
Sada možemo da izvršimo klasterovanje.
Koristićemo 'Preprocess text' operator,
'Bag of words',
'Distances',
i 'Hierarchical Clustering'.
Ovo smo dosta brzo prošli.
Za više detalja o preprocesiranju 
teksta i klasterovanju,
možete pogledati naše prethodne klipove.
Izgleda da imamo dva 
zanimljiva klastera:
jedan o nuklearnom oružju,
a drugi o Kenedijevim 
predsedničkim obraćanjima.

English: 
open the Text add-on,
place the Import Documents widget on the canvas,
and open it.
Click on the folder icon
and select the folder you wish to import.
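
Outside the Orange canvas, the same idea can be sketched in a few lines of Python. This is only an illustration, not how the Import Documents widget works internally: the function name `load_text_folder` is hypothetical, and the sketch assumes plain text files only (Word documents and PDFs would need extra libraries).

```python
from pathlib import Path


def load_text_folder(folder):
    """Read every .txt file under `folder` into a list of (name, text) pairs.

    Hypothetical helper for illustration; it handles plain text files only.
    """
    documents = []
    for path in sorted(Path(folder).rglob("*.txt")):
        # Use the file name without its extension as the document title.
        documents.append((path.stem, path.read_text(encoding="utf-8")))
    return documents
```

Calling `load_text_folder("Kennedy")` would then return one pair per speech, assuming the speeches are saved as `.txt` files in that folder.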
Let us observe our data in Corpus Viewer.
Here's the speech we saw earlier.
Now we can do some clustering.
I'll use Preprocess Text,
Bag of Words,
Distances,
and Hierarchical Clustering.
I went through that quickly.
For details on text preprocessing and clustering
you can check our previous videos.
Looks like I have two interesting clusters:
one on nuclear arms
and the other with Kennedy's presidential addresses.
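
For readers who want to see what the Bag of Words, Distances and Hierarchical Clustering steps amount to, here is a rough sketch in plain Python. It is a simplification under stated assumptions, not Orange's implementation: the tokenizer is a bare regular expression, the distance is cosine distance, the linkage is average linkage, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercase the text and count word occurrences (very naive tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty document is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def hierarchical_clusters(texts, n_clusters):
    """Average-linkage agglomerative clustering.

    Returns a list of clusters, each a sorted list of document indices.
    """
    bags = [bag_of_words(t) for t in texts]
    dist = {(i, j): cosine_distance(bags[i], bags[j])
            for i, j in combinations(range(len(texts)), 2)}
    clusters = [[i] for i in range(len(texts))]

    def linkage(c1, c2):
        # Average distance over all pairs of documents from the two clusters.
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > max(n_clusters, 1):
        # Merge the two clusters that are closest on average.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters
```

Asking for two clusters, as in `hierarchical_clusters(texts, 2)`, mirrors cutting the dendrogram where it splits into two branches.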

English: 
Clustering is fine, but what about classification?
Can I tell Orange some documents belong to one group
and others to the other?
Let us put Kennedy's speeches into two folders,
say 'pre-1962'
and 'post-1962'.
Now, reload the folder.
Orange recognized subfolders as class categories.
If we observe the corpus in a data table,
we can see that Orange put 'pre-1962'
and 'post-1962' in the gray class column.
You can check our previous videos on text classification
to learn how to proceed.
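
The subfolder-as-class idea is easy to picture in code. The sketch below is an assumption-laden illustration rather than Orange's actual logic: `load_labeled_folder` is a hypothetical helper, it reads plain text files only, and it takes the first-level subfolder name as the class label.

```python
from pathlib import Path


def load_labeled_folder(folder):
    """Return (label, name, text) triples for every .txt file under `folder`.

    The label is the name of the first-level subfolder the file sits in,
    or None for files placed directly in `folder`.
    """
    root = Path(folder)
    rows = []
    for path in sorted(root.rglob("*.txt")):
        relative = path.relative_to(root)
        # A file directly in the root has no subfolder, hence no class label.
        label = relative.parts[0] if len(relative.parts) > 1 else None
        rows.append((label, path.stem, path.read_text(encoding="utf-8")))
    return rows
```

With the speeches sorted into 'pre-1962' and 'post-1962', every row would carry one of those two folder names as its label.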

Serbian: 
Klasterovanje je u redu, 
ali šta je sa klasifikacijom?
Možemo li reći Orange-u da neki dokumenti 
pripadaju jednoj, a neki drugoj grupi?
Stavićemo Kenedijeve govore u
dva zasebna foldera.
Recimo, 'Pre 1962.' i
'Posle 1962'.
Sada ćemo ponovo učitati folder.
Orange je prepoznao podfoldere
kao različite oznake klasa.
Ako pogledamo korpus u tabeli,
videćemo da je Orange smestio 
'Pre 1962.'
i 'Posle 1962.'
u sivu kolonu klase.
Možete pogledati naše prethodne 
klipove o klasifikaciji teksta,
da saznate kako da 
nastavite dalje sa analizom.

Serbian: 
Uvoženje dokumenata olakšava 
organizovanje fajlova u istraživanjima.
Danas smo naučili kako da uvezemo 
sopstvene podatke za analizu teksta,
i kako da definišemo 
klasni atribut, od nule.

English: 
Import Documents makes it so much easier to organize your files in your research.
Today we've learned how to import our own data for text analysis
and how to define class values from scratch.
