starTracer is an R package for marker gene identification in single-cell RNA-seq data analysis. This package comprises two primary functional modules: “searchMarker” and “filterMarker”. The “searchMarker” function operates independently, taking an expression matrix as input and generating a marker gene matrix as output. On the other hand, the “filterMarker” function is tailored to work as a complementary pipeline to the Seurat “FindAllMarkers” function, offering a more accurate list of marker genes for each cluster in conjunction with Seurat results.
NOTE: Test data from mouse kidney and human PFC are uploaded to Google Drive (for reviewers). Other data (human heart, mouse kidney) can be obtained from the Single Cell Portal.
This is a beta version. The package can be used to search for marker genes efficiently. It takes a SeuratObject or the data.frame returned directly by the Seurat::FindAllMarkers() function, and searches for the ideal top N marker genes with higher specificity and efficiency. It also provides an intuitive way to describe the specificity of the selected top N marker genes.
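To make the idea of "specificity" concrete, the sketch below ranks genes in a toy average expression matrix by a deliberately simple score: the fraction of a gene's summed average expression that falls in each cluster. This score and the helper `top_n_specific()` are hypothetical illustrations chosen for readability; they are not starTracer's `del_MI` method or any other function exported by the package, and the expression values are invented.

```r
# Toy average expression matrix: features in rows, clusters in columns.
# Values are made up for illustration only.
avg <- matrix(
  c(5.0, 0.1, 0.2,
    0.3, 4.0, 0.1,
    2.0, 2.1, 1.9,
    0.1, 0.2, 3.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"),
                  c("C1", "C2", "C3"))
)

# Hypothetical specificity score: the fraction of a gene's summed average
# expression that falls in a given cluster (1 = expressed in that cluster only).
# This is NOT starTracer's del_MI method.
specificity <- avg / rowSums(avg)

# Hypothetical helper: for each cluster, return the top n genes by the score.
top_n_specific <- function(score, n = 1) {
  lapply(setNames(colnames(score), colnames(score)), function(cl) {
    ord <- order(score[, cl], decreasing = TRUE)
    rownames(score)[ord[seq_len(n)]]
  })
}

top_n_specific(specificity, n = 1)
# C1 -> "GeneA", C2 -> "GeneB", C3 -> "GeneD"
```

GeneC is expressed at a similar level in every cluster, so it never ranks first despite its relatively high expression; this is the kind of gene a search for the most specific genes is meant to avoid.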
starTracer can be installed from GitHub with the following commands:
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("JerryZhang-1222/starTracer")
starTracer can be used to search for marker genes in single-cell/single-nucleus sequencing data. Marker genes are defined as the most up-regulated and most specific genes in each cluster. starTracer generates the top N marker genes for each cluster efficiently. The two most commonly used functions are:
starTracer::searchMarker(): a de novo pipeline. The function takes a Seurat object, a sparse matrix plus an annotation, or an average expression matrix, and then calculates the marker genes for each cluster.
starTracer::filterMarker(): an in-conjunction pipeline. The function takes the data.frame returned by Seurat::FindAllMarkers() and returns a new data.frame that can be used to reorder the marker genes in the data.frame provided by the user.
searchMarker is designed to accept multiple kinds of input data. Users may provide input data in any of the following 3 formats:
Importing a Seurat object
The object should meet certain requirements:
res <- searchMarker(
x = x, # the input Seurat object
thresh.1 = 0.5,
thresh.2 = NULL,
method = "del_MI",
num = 2,
gene.use = NULL,
meta.data = NULL,
ident.use = NULL
)
Importing a sparse matrix
Users who processed their single-cell data with software other than Seurat should export the expression data as a sparse matrix (dgCMatrix) with features in rows and cells in columns, populated with the normalized data. An annotation matrix is also required, with one row per cell, each row corresponding to a column of the expression matrix.
res <- searchMarker(
x = x, #the input dgCMatrix
thresh.1 = 0.5,
thresh.2 = NULL,
method = "del_MI",
num = 2,
gene.use = NULL,
meta.data = meta.data, # the annotation matrix
ident.use = NULL #the ident to use in the annotation matrix
)
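A minimal sketch of inputs in this format, assuming the Matrix package (which provides the dgCMatrix class) and an annotation stored as a data.frame. The gene names, cell names, values and the column name `cluster` are all hypothetical. The important part is the alignment check at the end: each annotation row must correspond to a column of the expression matrix.

```r
library(Matrix)

# Hypothetical toy data: 3 features x 4 cells, normalized values.
genes <- c("GeneA", "GeneB", "GeneC")
cells <- paste0("cell", 1:4)

# Features in rows, cells in columns, stored as a dgCMatrix.
x <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),           # row (feature) indices of non-zero entries
  j = c(1, 2, 3, 3, 4),           # column (cell) indices of non-zero entries
  x = c(2.1, 1.8, 1.5, 0.7, 2.4), # normalized expression values
  dims = c(length(genes), length(cells)),
  dimnames = list(genes, cells)
)

# Annotation: one row per cell; rownames must match colnames(x).
# The column name "cluster" is an assumption for this example.
meta.data <- data.frame(
  cluster = c("C1", "C1", "C2", "C2"),
  row.names = cells,
  stringsAsFactors = FALSE
)

# Sanity checks before calling searchMarker().
stopifnot(
  inherits(x, "dgCMatrix"),
  identical(rownames(meta.data), colnames(x))
)
```

With these objects, the call above would presumably use `ident.use = "cluster"`; that the argument expects a column name of the annotation is an assumption based on the inline comment, so check the package documentation.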
Importing an average expression matrix
Users who already have an average expression matrix, with features in rows and clusters in columns, may input it directly.
res <- searchMarker(
x = x, # the input average expression matrix
thresh.1 = 0.5,
thresh.2 = NULL,
method = "del_MI",
num = 2,
gene.use = NULL,
meta.data = NULL, # no annotation matrix is needed for averaged input
ident.use = NULL # no ident is needed for averaged input
)
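If you have a sparse matrix and per-cell labels but want to supply an average expression matrix instead, one way to build it is sketched below with the Matrix package. The toy data and variable names are hypothetical; the block takes the mean of the normalized values within each cluster, which yields features in rows and clusters in columns. Whether starTracer expects averages of log-normalized values or of un-logged values is not stated here, so treat the averaging scale as an assumption to check against the original article.

```r
library(Matrix)

# Hypothetical toy normalized expression (features x cells) and cell labels.
x <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 2, 3, 3, 4),
  x = c(2.0, 1.0, 1.5, 0.5, 2.5),
  dims = c(3, 4),
  dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("cell", 1:4))
)
cluster <- c("C1", "C1", "C2", "C2")

# Group cell indices by cluster, then average each feature within a group.
# sapply() returns a matrix with features in rows and clusters in columns.
avg <- sapply(split(seq_len(ncol(x)), cluster), function(idx) {
  Matrix::rowMeans(x[, idx, drop = FALSE])
})

avg
```

For this toy input, GeneA averages 1.5 in C1 and 0 in C2, while GeneB and GeneC average 0.75 and 1.5 in C2 and 0 in C1.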
Please refer to the following links for details of our original research article, where you can also find basic usage examples:
In the following 3 scripts, we test the specificity, calculation speed and accuracy of starTracer on three samples:
Here we use the human brain sample, with the annotations “bio_clust”, “major_clust” and “sub_clust”, to test the ability of starTracer::searchMarker to find marker genes across different annotation levels.
Here we show the results obtained with different values of thresh.2 (denoted S2 in the original research article). You may refer to these results to better understand how thresh.2 influences the output.
Here we show the results of filterMarker. filterMarker takes the output data.frame of Seurat’s FindAllMarkers function. Note that running the calculation on samples with large numbers of cells can be time-consuming.
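For orientation, the sketch below builds a small mock data.frame with the columns returned by `Seurat::FindAllMarkers()` in Seurat v4 and later (`p_val`, `avg_log2FC`, `pct.1`, `pct.2`, `p_val_adj`, `cluster`, `gene`) and extracts the conventional top N genes per cluster by fold change. The values and the helper `top_by_fc()` are hypothetical. This shows the kind of input and baseline ranking filterMarker works with, not filterMarker itself, whose arguments are not documented here.

```r
# Mock FindAllMarkers()-style output (Seurat v4+ column names; older
# versions call the fold-change column avg_logFC). Values are made up.
markers <- data.frame(
  p_val      = c(1e-10, 1e-8, 1e-6, 1e-12, 1e-9, 1e-5),
  avg_log2FC = c(2.5, 1.2, 3.1, 0.9, 2.2, 1.7),
  pct.1      = c(0.90, 0.80, 0.95, 0.70, 0.85, 0.60),
  pct.2      = c(0.10, 0.40, 0.60, 0.05, 0.20, 0.30),
  p_val_adj  = c(1e-6, 1e-4, 1e-2, 1e-8, 1e-5, 1e-1),
  cluster    = factor(c("0", "0", "0", "1", "1", "1")),
  gene       = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE", "GeneF"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: conventional top n genes per cluster by avg_log2FC.
# This is the baseline ordering, not filterMarker().
top_by_fc <- function(df, n = 2) {
  by_cluster <- split(df, df$cluster)
  do.call(rbind, lapply(by_cluster, function(d) {
    head(d[order(d$avg_log2FC, decreasing = TRUE), ], n)
  }))
}

top_by_fc(markers, n = 2)$gene
# "GeneC" "GeneA" "GeneE" "GeneF"
```

In the mock data, GeneC ranks first in cluster 0 by fold change although 60% of the cells outside that cluster express it (`pct.2`), an example of a weakly specific gene that a fold-change-only ranking does not penalize.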