Hi, my name is Maya, and I will be guiding you through the Orange Single Cell software tutorials.
In the Getting Started with Orange series, we learned the basics of data mining with visual programming.
This time we've prepared a brand new flavor of Orange,
tailor-made for the analysis of single-cell RNA expression data.
In this video, we will take a quick tour of scOrange and look at the basics of handling,
visualizing, and clustering single-cell gene expression data.
First, we will close scOrange's welcome screen and minimize the toolbar.
Single Cell Orange comes with a database of example expression datasets.
Here, we will look at data on bone marrow mononuclear cells from healthy donors and from patients with acute myeloid leukemia (AML),
and we'll use the version of the data sampled down to 1000 cells.
Like any single cell data in Orange,
the bone marrow data comes in a table with cells in rows and genes in columns.
The first column in this data set is special: it states
whether the cell came from a healthy donor or from a patient with leukemia.
Notice that the data also includes the replicate, the cell ID, and the cell barcode.
The remaining columns store gene expression values.
Unfiltered single-cell data is sparse; that is, it contains many zeros.
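To make "cells in rows, genes in columns" and "sparse" concrete, here is a minimal sketch in plain Python. The gene names, barcodes, and values are hypothetical toy data, not the bone marrow dataset, and the meta information is simply kept next to each row rather than in scOrange's own table format.

```python
# Hypothetical toy expression table: one row per cell, one value per gene.
# Meta information (donor type, barcode) is kept separately from the values.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

cells = [
    ({"type": "healthy", "barcode": "AAAC"}, [0, 3, 0, 0]),
    ({"type": "healthy", "barcode": "AAAG"}, [0, 0, 1, 0]),
    ({"type": "AML", "barcode": "AACT"}, [2, 0, 0, 0]),
]

def zero_fraction(rows):
    """Return the fraction of expression values that are exactly zero."""
    values = [v for _, expr in rows for v in expr]
    return sum(1 for v in values if v == 0) / len(values)

print(f"{zero_fraction(cells):.2f}")  # 9 of 12 values are zero -> 0.75
```

In real single-cell data the share of zeros is typically much higher, which is why such data is usually stored in sparse form.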
We'll use scOrange to find subgroups of cells.
Let's use Louvain clustering,
a network-based clustering method that is often used on single-cell data sets.
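Louvain clustering works on a graph: it greedily merges nodes into communities so as to increase a score called modularity. In single-cell work the graph usually connects each cell to its most similar cells; here we assume a hypothetical toy graph instead. This sketch only computes the modularity of a given partition, the quantity Louvain tries to maximize. It is not the full Louvain algorithm and not scOrange's implementation.

```python
from collections import defaultdict

def modularity(edges, communities):
    """Modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs; communities: list of sets of nodes.
    Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where L_c is
    the number of edges inside c, d_c the total degree of the nodes in c,
    and m the total number of edges.
    """
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    q = 0.0
    for community in communities:
        inside = sum(1 for u, v in edges if u in community and v in community)
        total_degree = sum(degree[n] for n in community)
        q += inside / m - (total_degree / (2 * m)) ** 2
    return q

# Hypothetical toy graph: two triangles of "cells" joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

print(round(modularity(edges, [{0, 1, 2}, {3, 4, 5}]), 3))  # 0.357
print(round(modularity(edges, [{0, 1, 2, 3, 4, 5}]), 3))    # 0.0
```

Splitting the graph into the two triangles scores higher than lumping everything together, so a modularity-maximizing method such as Louvain would prefer the two-cluster answer.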
To see the results of clustering, let's use t-distributed stochastic neighbor embedding,
or t-SNE for short.
In the t-SNE plot, each point represents a cell.
Let's color them by cluster label.
Although clustering was done on the original data set, not on the t-SNE projection,
the cells in each cluster map together nicely in the two-dimensional t-SNE space.
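Why do clusters found in the original data stay together in the plot? The standard t-SNE formulation, as usually stated, is given below; scOrange's implementation may use approximations or different parameters. Here $x_i$ is the expression profile of cell $i$, $y_i$ its position in the 2-D plot, $n$ the number of cells, and $\sigma_i$ a per-cell bandwidth set through the perplexity parameter.

```latex
\begin{aligned}
p_{j\mid i} &= \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                    {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n},\\[4pt]
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
               {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.
\end{aligned}
```

Minimizing $C$ moves the points $y_i$ so that cells with similar expression profiles (large $p_{ij}$) end up close together in the plot. Cells grouped by a clustering of the original data therefore tend to land near each other.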
We can select cells in this visualization and check their type, that is, whether they came from a healthy donor or a patient with leukemia.
It looks like the cluster we have found includes cells that are mostly from a healthy donor.
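Checking "what are the selected cells mostly?" amounts to counting the values of the type column within the selection. This is a minimal sketch, assuming a hypothetical selection and hypothetical label strings ("healthy", "AML"); the actual values in the dataset may be named differently.

```python
from collections import Counter

# Hypothetical selection: the "type" meta value of each selected cell.
selected_types = ["healthy", "healthy", "AML", "healthy", "healthy"]

counts = Counter(selected_types)
label, n = counts.most_common(1)[0]   # most frequent type and its count
share = n / len(selected_types)       # fraction of the selection it covers
print(label, f"{share:.0%}")  # healthy 80%
```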
Are there any subpopulations of cells in our data?
If so, are there any marker genes that define them?
Let's use a set of markers from our database
and select those for natural killer cells.
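Selecting markers for one cell type is a filter over a table of (gene, cell type) pairs. In this sketch the marker table is hypothetical: NKG7, GNLY, CD79A, MS4A1, and CD3E are commonly used markers, but they are only illustrative and need not match the entries in scOrange's marker database.

```python
# Hypothetical marker table: (gene, cell type) pairs, for illustration only.
marker_table = [
    ("NKG7", "NK cells"),
    ("GNLY", "NK cells"),
    ("CD79A", "B cells"),
    ("MS4A1", "B cells"),
    ("CD3E", "T cells"),
]

def markers_for(cell_type, table):
    """Return the marker genes annotated with the given cell type."""
    return [gene for gene, ctype in table if ctype == cell_type]

print(markers_for("NK cells", marker_table))  # ['NKG7', 'GNLY']
```

Swapping the argument to "B cells" is the code equivalent of changing the marker selection in the workflow.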
We'll use these markers to score our cells.
Score Cells adds a column to the data that reports, for each cell, the mean expression of the marker genes.
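The score described here is a per-cell average over the marker columns. This is a minimal sketch under two assumptions of ours: raw values are averaged as they are, and marker genes missing from the data are ignored. scOrange may normalize values or handle missing genes differently. The function name `score_cells` and the toy data are hypothetical.

```python
def score_cells(expression, genes, markers):
    """Append a score to each cell: the mean expression of the marker genes.

    expression: list of per-cell expression lists, ordered like `genes`.
    Marker genes that are not in `genes` are ignored (an assumption).
    """
    idx = [genes.index(g) for g in markers if g in genes]
    if not idx:
        raise ValueError("none of the marker genes are present in the data")
    return [row + [sum(row[i] for i in idx) / len(idx)] for row in expression]

genes = ["NKG7", "GNLY", "CD79A"]
expression = [
    [4.0, 2.0, 0.0],  # hypothetical NK-like cell
    [0.0, 0.0, 3.0],  # hypothetical B-like cell
]
scored = score_cells(expression, genes, ["NKG7", "GNLY", "PRF1"])
print([row[-1] for row in scored])  # [3.0, 0.0]
```

The new last column can then drive point color and size in a scatter plot, as done in the t-SNE view.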
Let's visualize this in t-SNE.
I'll change the point color and size to correspond to the score.
t-SNE indeed reveals subpopulations,
and natural killer cells are nicely clustered together.
Great!
scOrange is all about designing workflows where, just like with Lego bricks,
even combining the simplest components can yield interesting results.
In our workflow, I can change the marker gene selection to, say, B cells.
See how the rendering in t-SNE changes?
B cells are clustered together as well.
This is just the beginning, and there is much more.
scOrange can do clustering,
differential expression analysis and Gene Ontology analysis,
data preprocessing and filtering,
removal of batch effects, cell type classification,
pseudotime analysis, and so much more.
We've designed this series of videos to guide you through all of these.
In our next video, however, we'll start with the basics:
how to load your own single cell data.
