In our introductory video on Single Cell Orange
we used Orange's own small repository of single-cell data sets.
Here we will show you how to load your own data from a 10x Genomics matrix data file.
Let's start with an example data set from the 10x Genomics repository of single cell data sets.
We can pick the data from the studies of acute myeloid leukemia
and compare the cells before and after the transplant of bone marrow mononuclear cells.
Let's download the filtered gene and cell matrix for both,
unzip the files,
and name the directories to reflect the origin of the cells.
10x Genomics stores the single-cell expression data
in a matrix file that is accompanied by two files with annotations of genes and cells.
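To make the three-file layout concrete, here is a minimal sketch of a reader. It assumes the matrix file uses the Matrix Market coordinate layout (comment lines starting with `%`, a size line, then one `gene_index cell_index count` triplet per line with 1-based indices), that the gene file has an ID column followed by a symbol column, and that the barcode file has one barcode per line. The helper name `parse_10x` and the toy file contents are made up for illustration; this is not how the Load Data widget is implemented.

```python
def parse_10x(matrix_text, genes_text, barcodes_text):
    """Parse toy 10x-style files given as strings (hypothetical helper).

    Returns (genes, barcodes, counts), where counts maps
    (barcode, gene) -> nonzero expression count.
    """
    # Assumed layout: first column is the gene ID, second the gene symbol.
    genes = [line.split("\t")[1] for line in genes_text.strip().splitlines()]
    barcodes = barcodes_text.strip().splitlines()

    # Drop comment and header lines, which start with '%'.
    rows = [ln for ln in matrix_text.strip().splitlines()
            if not ln.startswith("%")]
    n_genes, n_cells, n_entries = (int(x) for x in rows[0].split())
    assert n_genes == len(genes) and n_cells == len(barcodes)

    counts = {}
    for line in rows[1:]:
        g, c, value = line.split()
        # Indices in the file are 1-based; rows are genes, columns are cells.
        counts[(barcodes[int(c) - 1], genes[int(g) - 1])] = int(value)
    assert len(counts) == n_entries
    return genes, barcodes, counts


# Toy stand-ins for the matrix, gene, and barcode files.
matrix_text = """%%MatrixMarket matrix coordinate integer general
3 2 3
1 1 5
3 1 1
2 2 7
"""
genes_text = "ENSG01\tGENE_A\nENSG02\tGENE_B\nENSG03\tGENE_C\n"
barcodes_text = "AAAC-1\nTTTG-1\n"

genes, barcodes, counts = parse_10x(matrix_text, genes_text, barcodes_text)
print(counts[("AAAC-1", "GENE_A")])
```

Only the non-zero entries are stored, which is why the matrix file stays small even for tens of thousands of genes.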
We will use the Load Data widget from Single Cell Orange.
The simplest way to find this widget is to right-click on the empty canvas of Orange,
start typing the name of the widget,
and then once it's there,
select it and confirm the choice by pressing the return key.
Double-click the widget to open it.
Load Data can open and align multiple data sets.
We start with our pre-transplant data and drag the data matrix file to the widget.
Double-click on the source name
and rename the data accordingly.
We do the same for the post-transplant data.
There are about 4,000 cells in each data set
and over 30,000 entries for the genes.
The data is sparse: only about 2% of the gene expression entries are non-zero.
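As a quick sanity check on that percentage, the share of non-zero entries is just the number of stored entries divided by cells times genes. The sketch below uses round illustrative numbers, not values read from the actual files, and the function name `nonzero_fraction` is made up for this example.

```python
def nonzero_fraction(n_cells, n_genes, n_nonzero):
    """Fraction of matrix entries that hold a non-zero count."""
    return n_nonzero / (n_cells * n_genes)


# Illustrative round numbers only (assumed, not taken from the data files).
n_cells, n_genes = 4_000, 30_000
n_nonzero = 2_400_000

print(f"{nonzero_fraction(n_cells, n_genes, n_nonzero):.1%}")  # prints 2.0%
```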
Press the Load Data button to load the data from the files.
Ok, we're done.
What next? Obviously we would first like to see the data.
We'll do this by adding a Data Table widget.
Right-click on the empty canvas,
start typing the name of the widget,
and confirm the choice by pressing return
once the line with the widget is selected.
If we choose the wrong widget, like I just did, no problem.
We select the widget and remove it by pressing delete.
Ok again, here is my Data Table widget.
There is no data in the Data Table yet, so the widget is empty.
Let's connect the output of the Load Data to the input of the Data Table widget.
Orange stores the cells in rows and genes in columns.
The first two columns are in beige and store cell barcodes and the labels of the corresponding data files.
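The table layout described here can be sketched in a few lines: one row per cell, with the barcode and the source label in front of the gene columns. This is only an illustration. The helper name `stack_datasets` is hypothetical, and the rule that a gene missing from a cell is filled with 0 is an assumption of the sketch, not a statement about how the widget aligns data sets.

```python
def stack_datasets(datasets):
    """Stack per-source count tables into one cell-by-gene table.

    `datasets` maps a source label to {barcode: {gene: count}}.
    Assumption for this sketch: genes are aligned by name, and a gene
    that is missing for a cell is recorded as 0.
    """
    genes = sorted({g for cells in datasets.values()
                    for row in cells.values() for g in row})
    header = ["barcode", "source"] + genes
    rows = []
    for source, cells in datasets.items():
        for barcode, row in cells.items():
            rows.append([barcode, source] + [row.get(g, 0) for g in genes])
    return header, rows


# Toy data: one cell per source, with partly overlapping genes.
datasets = {
    "pre-transplant": {"AAAC-1": {"GENE_A": 2, "GENE_B": 1}},
    "post-transplant": {"TTTG-1": {"GENE_B": 4, "GENE_C": 3}},
}
header, rows = stack_datasets(datasets)
print(header)
for row in rows:
    print(row)
```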
Single-cell data is normally sparse.
There are many 0 entries in our data matrix.
Let's filter out the genes that are expressed in only a small fraction of the cells.
We will use the Filter widget, connect its input to the Load Data widget,
and include only the genes that are expressed in at least 200 cells.
There are about 6,000 such genes.
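The filtering rule itself is simple to state in code: count, for every gene, the cells in which it has a non-zero value, and keep the gene if that count reaches the threshold. The sketch below is a plain-Python illustration with made-up names and toy data, using a threshold of 2 instead of 200; it does not reproduce the Filter widget's implementation.

```python
from collections import Counter


def genes_expressed_in_at_least(counts, min_cells):
    """Return genes with a non-zero count in at least `min_cells` cells.

    `counts` maps (cell, gene) -> expression count; zeros may be absent.
    """
    cells_per_gene = Counter(
        gene for (cell, gene), value in counts.items() if value > 0
    )
    return sorted(g for g, n in cells_per_gene.items() if n >= min_cells)


# Toy data: GENE_A is seen in three cells, GENE_B and GENE_C in one each.
counts = {
    ("cell1", "GENE_A"): 3, ("cell2", "GENE_A"): 1, ("cell3", "GENE_A"): 2,
    ("cell1", "GENE_B"): 4,
    ("cell2", "GENE_C"): 1, ("cell3", "GENE_C"): 0,
}
kept = genes_expressed_in_at_least(counts, min_cells=2)
print(kept)  # prints ['GENE_A']
```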
Now finally, let's check if we can separate pre- and post-transplant cells in a low-dimensional visualization.
We will use t-SNE, a widget for t-distributed stochastic neighbor embedding,
give it the filtered data
and check its visualization.
Each dot in this plot denotes a distinct cell.
Let's color the cells according to their data source.
Looks like the two types of cells mix globally,
but seem to form clusters locally.
We can verify this by clustering the original data
and then checking the composition of the clusters.
Let's do this by invoking network-based Louvain clustering
and a Box Plot.
Cool! The clusters seem to indeed favor one or the other cell source.
Our largest cluster includes primarily post-transplant cells
and our second largest cluster contains mainly pre-transplant cells.
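What the Box Plot shows here is a cross-tabulation of cluster against data source. A minimal sketch of that tally, assuming we already have one cluster label and one source label per cell, could look like this. The labels and counts below are toy values, not results from the data, and `cluster_composition` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def cluster_composition(cluster_labels, sources):
    """Count, for each cluster, how many cells come from each source."""
    composition = defaultdict(Counter)
    for cluster, source in zip(cluster_labels, sources):
        composition[cluster][source] += 1
    return {c: dict(cnt) for c, cnt in composition.items()}


# Toy labels for six cells (assumed, for illustration only).
clusters = ["C1", "C1", "C1", "C2", "C2", "C2"]
sources = ["post", "post", "pre", "pre", "pre", "post"]

comp = cluster_composition(clusters, sources)
for cluster, by_source in sorted(comp.items()):
    total = sum(by_source.values())
    shares = {s: round(n / total, 2) for s, n in by_source.items()}
    print(cluster, shares)
```

A cluster whose shares are far from the overall pre/post ratio is one that favors a single cell source.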
In this video, we showed you how to load data from a 10x Genomics matrix file.
We also showed you that Orange's Load Data widget
can merge data from different experiments.
In the next video,
we'll load a tab-delimited data file from Broad Institute's Single Cell Portal
and show you how to assign cell labels according to their textual annotation.
