Title: | Guiding the Integration of Multiple Single-Cell RNA-Seq Datasets |
---|---|
Description: | The accumulation of single-cell RNA-seq (scRNA-seq) studies highlights the potential benefits of integrating multiple datasets. By augmenting sample sizes and enhancing analytical robustness, integration can lead to more insightful biological conclusions. However, challenges arise due to the inherent diversity and batch discrepancies within and across studies. SCIntRuler, a novel R package, addresses these challenges by guiding the integration of multiple scRNA-seq datasets. |
Authors: | Yue Lyu [aut, cre] |
Maintainer: | Yue Lyu <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.99.3 |
Built: | 2024-11-14 05:32:57 UTC |
Source: | https://github.com/yuelyu21/scintruler |
The SCIntRuler package addresses the challenges of integrating multiple single-cell RNA-seq (scRNA-seq) datasets. It provides tools to enhance analytical robustness by augmenting sample sizes and reducing batch discrepancies. Developed using the Seurat framework, SCIntRuler includes both existing and novel workflows for single-cell analysis.
This is the main page for the SCIntRuler package.
Informed Decision Making: Helps researchers decide on the necessity of data integration and the most suitable method.
Flexibility: Suitable for various scenarios, accommodating different levels of data heterogeneity.
Robustness: Enhances analytical robustness in joint analyses of merged or integrated scRNA-seq datasets.
User-Friendly: Streamlines decision-making processes, simplifying the complexities involved in scRNA-seq data integration.
Refer to the "Getting Started with SCIntRuler" article in the package vignettes for detailed user instructions.
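The function examples in this reference hint at the order in which the pieces fit together. The following is a minimal sketch of that order, assuming SCIntRuler and Seurat are installed and using only the calls and argument values that appear in those examples. It is not a substitute for the vignette.

```r
# Sketch of the end-to-end order suggested by the function examples.
# Assumption: both packages are installed; otherwise the block is skipped.
have_pkgs <- requireNamespace("SCIntRuler", quietly = TRUE) &&
  requireNamespace("Seurat", quietly = TRUE)

if (have_pkgs) {
  library(SCIntRuler)
  data("sim_data_sce", package = "SCIntRuler")

  # Convert the bundled example to Seurat and split it by study
  sim_data   <- SCEtoSeurat(sim_data_sce)
  seuratlist <- Seurat::SplitObject(sim_data, split.by = "Study")

  # Cluster, normalize, compute neighbor distances, then test
  fullcluster <- GetCluster(seuratlist)
  normCount   <- NormData(seuratlist)
  distmat     <- FindNNDist(fullcluster, normCount, meaningn = 20)
  testres     <- PermTest(fullcluster, distmat, 15)

  # Summarize and plot the result
  scir <- CalcuSCIR(fullcluster, seuratlist, testres, p = 0.1)
  print(scir)
  print(PlotSCIR(fullcluster, seuratlist, testres))
}
```

The values 20, 15, and 0.1 are the ones used in the documented examples and signatures, not tuned recommendations.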
Yue Lyu
Calculate SCIntRuler
CalcuSCIR(fullcluster, seuratlist, testres, p = 0.1)
fullcluster |
A list of clusters generated by GetCluster(). |
seuratlist |
A list of Seurat objects, typically obtained with SplitObject(). |
testres |
The result returned by PermTest(). |
p |
The p-value used as the cut-off. Defaults to 0.1. |
SCIntRuler
data(sim_result)
data(sim_data_sce)
# Create example data for fullcluster (mock data)
sim_data <- SCEtoSeurat(sim_data_sce)
seuratlist <- Seurat::SplitObject(sim_data, split.by = "Study")
# seuratlist <- InputData(sim_data_sce,"Study")
# fullcluster <- GetCluster(seuratlist)
# testres <- PermTest(fullcluster,distmat,15)
CalcuSCIR(sim_result[[1]], seuratlist, sim_result[[4]])
Computes the pairwise Euclidean distance between rows of two matrices.
crossdist(m1, m2)
m1 |
Numeric matrix. |
m2 |
Numeric matrix. |
Numeric matrix of distances.
mat1 <- matrix(1:4, ncol = 2)
mat2 <- matrix(5:8, ncol = 2)
dist_matrix <- crossdist(mat1, mat2)
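To make the computation concrete, here is a base-R sketch of pairwise Euclidean distances between the rows of two matrices. The helper name `pairwise_euclid` is hypothetical, and the assumption that the result has one row per row of `m1` and one column per row of `m2` is ours; this is not the package's implementation of crossdist().

```r
# Hypothetical helper, not the package's crossdist() implementation.
pairwise_euclid <- function(m1, m2) {
  stopifnot(is.matrix(m1), is.matrix(m2), ncol(m1) == ncol(m2))
  # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b, evaluated for all row pairs
  sq <- outer(rowSums(m1^2), rowSums(m2^2), "+") - 2 * tcrossprod(m1, m2)
  # Clamp tiny negative values caused by floating-point rounding
  sqrt(pmax(sq, 0))
}

mat1 <- matrix(1:4, ncol = 2)  # rows: (1, 3) and (2, 4)
mat2 <- matrix(5:8, ncol = 2)  # rows: (5, 7) and (6, 8)
d <- pairwise_euclid(mat1, mat2)
print(d)
```

Entry `d[i, j]` is the distance between row `i` of `mat1` and row `j` of `mat2`; for instance, rows (1, 3) and (5, 7) are sqrt(32) apart.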
Find cells indicating shared biological features across conditions
FindCell(seuratobj, seuratlist, fullcluster, distmat, firstn = 15)
seuratobj |
The Seurat object in which all samples/subjects are merged. |
seuratlist |
A list of Seurat objects, typically obtained with SplitObject(). |
fullcluster |
A list of clusters generated by GetCluster(). |
distmat |
A list of distance vectors generated by FindNNDist(). |
firstn |
The number of nearest cells to include in the permutation test. Defaults to 15. |
A list of two vectors: the first identifies which cluster of which sample will be highlighted, and the second identifies which cells will be selected.
# Create example Seurat object
data(sim_data_sce)
data(sim_result)
# Create example list of Seurat objects
sim_data <- SCEtoSeurat(sim_data_sce)
seuratlist <- Seurat::SplitObject(sim_data, split.by = "Study")
# seuratlist <- InputData(sim_data_sce,"Study")
# Create example fullcluster (mock data)
# fullcluster <- GetCluster(seuratlist)
# Create example distmat (mock data)
# distmat <- FindNNDist(fullcluster, normCount, meaningn = 20)
FindCell(sim_data, seuratlist, sim_result[[1]], sim_result[[3]], 15)
Find the nearest neighbors
FindNNDist(fullcluster, normCount, meaningn = 20)
fullcluster |
A list of clusters generated by GetCluster(). |
normCount |
A list of normalized gene count matrices generated by NormData(). |
meaningn |
Defaults to 20. |
A list of distance vectors
data(sim_result)
# Create example data for fullcluster (mock data)
# fullcluster <- GetCluster(seuratlist)
# Create example data for normCount (mock data)
# normCount <- NormData(seuratlist)
# Define meaningn
meaningn <- 20
FindNNDist(sim_result[[1]], sim_result[[2]], meaningn = meaningn)
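As a generic illustration of what a nearest-neighbor distance looks like, the base-R sketch below returns, for each query row, the sorted distances to its `k` closest reference rows. The helper `knn_dist` and its arguments are hypothetical; how FindNNDist() selects neighbors and what `meaningn` controls are not described in this reference, so none of this should be read as the package's algorithm.

```r
# Hypothetical helper: distances from each row of `query` to its k nearest
# rows of `ref`. A generic k-nearest-neighbor sketch, not FindNNDist() itself.
knn_dist <- function(query, ref, k) {
  stopifnot(is.matrix(query), is.matrix(ref),
            ncol(query) == ncol(ref), k >= 1, k <= nrow(ref))
  res <- vapply(seq_len(nrow(query)), function(i) {
    # Euclidean distance from query row i to every row of ref
    d <- sqrt(colSums((t(ref) - query[i, ])^2))
    sort(d)[seq_len(k)]
  }, numeric(k))
  # One row per query point, one column per neighbor rank
  matrix(res, nrow = nrow(query), ncol = k, byrow = TRUE)
}

ref   <- rbind(c(0, 0), c(3, 4), c(6, 8))
query <- rbind(c(0, 0), c(6, 8))
nn <- knn_dist(query, ref, k = 2)
print(nn)
```

Both query points coincide with a reference row, so each has a nearest distance of 0 and a second-nearest distance of 5.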
Find the nearest neighbors
FindNNDistC(fullcluster, normCount, meaningn = 20)
fullcluster |
A list of clusters generated by GetCluster(). |
normCount |
A list of normalized gene count matrices generated by NormData(). |
meaningn |
Defaults to 20. |
A list of distance vectors
data(sim_result)
# Create example data for fullcluster (mock data)
# fullcluster <- GetCluster(seuratlist)
# Create example data for normCount (mock data)
# normCount <- NormData(seuratlist)
# Define meaningn
meaningn <- 20
FindNNDistC(sim_result[[1]], sim_result[[2]], meaningn = meaningn)
Get broad and fine clusters
GetCluster(seuratlist, n1 = 50, n2 = 200)
seuratlist |
A list of Seurat objects, typically obtained with SplitObject(). A SingleCellExperiment object is also accepted. |
n1 |
If a broad cluster contains fewer than n1 cells, it is left unchanged and is called a rare cluster. Defaults to 50. |
n2 |
If a broad cluster contains more than n2 cells, it is randomly subdivided into three fine clusters. If the cell count falls between n1 and n2, it is randomly split into two fine clusters. Defaults to 200. |
A list of data frames.
data(sim_data_sce)
# Assuming "seuratlist" is a list of Seurat objects
# if(is(sim_data_sce, "SingleCellExperiment")){ sim_data <- as.Seurat(sim_data_sce) }
sim_data <- SCEtoSeurat(sim_data_sce)
seuratlist <- Seurat::SplitObject(sim_data, split.by = "Study")
# seuratlist <- InputData(sim_data_sce,"Study")
fullcluster <- GetCluster(seuratlist)
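The n1/n2 rule described in the arguments can be sketched in base R as follows. The helpers `n_fine_clusters` and `split_randomly` are hypothetical, and the balanced random assignment is an assumption: the reference only says that broad clusters are subdivided randomly, not how.

```r
# Hypothetical helper: number of fine clusters for a broad cluster of a
# given size, following the documented n1/n2 thresholds.
n_fine_clusters <- function(size, n1 = 50, n2 = 200) {
  ifelse(size < n1, 1L,           # rare cluster, left unchanged
         ifelse(size > n2, 3L,    # large cluster, three fine clusters
                2L))              # between n1 and n2, two fine clusters
}

# Hypothetical helper: assign each cell a fine-cluster label at random.
# Assumption: labels are balanced; the package may split differently.
split_randomly <- function(cells, n1 = 50, n2 = 200) {
  k <- n_fine_clusters(length(cells), n1, n2)
  sample(rep_len(seq_len(k), length(cells)))
}

sizes  <- c(10, 50, 120, 200, 201)
counts <- n_fine_clusters(sizes)

set.seed(1)
labels <- split_randomly(seq_len(300))
print(counts)
print(table(labels))
```

With the defaults, a cluster of exactly 50 or exactly 200 cells falls in the two-way split, because the documented conditions are "smaller than n1" and "more than n2".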
Normalized RNA data matrix
NormData(seuratlist)
seuratlist |
A list of Seurat objects, typically obtained with SplitObject(). |
A list of matrices.
data(sim_data_sce)
# seuratlist <- InputData(sim_data_sce,"Study")
# if(is(sim_data_sce, "SingleCellExperiment")){ sim_data <- as.Seurat(sim_data_sce) }
sim_data <- SCEtoSeurat(sim_data_sce)
seuratlist <- Seurat::SplitObject(sim_data, split.by = "Study")
normCount <- NormData(seuratlist)
Permutation Test
PermTest(fullcluster, distmat, firstn)
fullcluster |
A list of clusters generated by GetCluster(). |
distmat |
A list of distance vectors generated by FindNNDist(). |
firstn |
The number of nearest cells to include in the permutation test. The usage above declares no default; the example passes 15. |
A list of two lists: the relative within-between distances and the p-values of the permutation test.
data(sim_result)
# fullcluster <- GetCluster(seuratlist)
# Assuming 'distmat' is a list of distance vectors from FindNNDist()
# distmat <- FindNNDist(fullcluster, normCount, meaningn = 20)
testres <- PermTest(sim_result[[1]], sim_result[[3]], 15)
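For readers unfamiliar with permutation tests, the base-R sketch below shows the general idea on two sets of distances: shuffle the group labels many times and check how often the shuffled difference is at least as large as the observed one. The helper `perm_pvalue`, its one-sided mean-difference statistic, and the toy data are all assumptions for illustration; the statistic PermTest() actually uses is not specified in this reference.

```r
# Hypothetical helper: one-sided label-permutation test asking whether
# `between` distances tend to exceed `within` distances.
# A generic sketch, not the statistic used by PermTest().
perm_pvalue <- function(within, between, n_perm = 999) {
  observed <- mean(between) - mean(within)
  pooled   <- c(within, between)
  n_w      <- length(within)
  perm_stats <- replicate(n_perm, {
    idx <- sample(length(pooled), n_w)   # random relabeling
    mean(pooled[-idx]) - mean(pooled[idx])
  })
  # Add-one correction keeps the p-value strictly positive
  (1 + sum(perm_stats >= observed)) / (n_perm + 1)
}

set.seed(42)
within_d  <- seq(1, 1.5, length.out = 15)    # toy within-sample distances
between_d <- seq(10, 10.5, length.out = 15)  # toy between-sample distances

p_far  <- perm_pvalue(within_d, between_d)   # clearly separated groups
p_same <- perm_pvalue(within_d, within_d)    # identical groups
print(c(p_far = p_far, p_same = p_same))
```

A small p-value for the separated groups and a large one for the identical groups is the behavior a cut-off such as the `p` argument of CalcuSCIR() would act on.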
Plot SCIntRuler
PlotSCIR(fullcluster, seuratlist, testres, legendtitle = NULL, title = NULL)
fullcluster |
A list of clusters generated by GetCluster(). |
seuratlist |
A list of Seurat objects, typically obtained with SplitObject(). |
testres |
The result returned by PermTest(). |
legendtitle |
Title of the legend. Defaults to NULL. |
title |
Title of the figure. Defaults to NULL. |
A ggplot2 object.
# Create example data for fullcluster (mock data)
data(sim_data_sce)
data(sim_result)
# if(is(sim_data_sce, "SingleCellExperiment")){ sim_data <- as.Seurat(sim_data_sce) }
sim_data <- SCEtoSeurat(sim_data_sce)
seuratlist <- Seurat::SplitObject(sim_data, split.by = "Study")
# fullcluster <- GetCluster(seuratlist)
# testres <- PermTest(fullcluster,distmat,15)
PlotSCIR(sim_result[[1]], seuratlist, sim_result[[4]])
This function takes a SingleCellExperiment object and converts it to a Seurat object. The result can then be split by a metadata variable, for example with Seurat::SplitObject().
SCEtoSeurat(sce)
sce |
A SingleCellExperiment object. |
A Seurat object.
data(sim_data_sce)
# seuratlist <- InputData(sim_data_sce,"Study")
seuratobj <- SCEtoSeurat(sim_data_sce)
Example PBMC data in SingleCellExperiment format, with 3000 genes and 800 cells.
sim_data_sce
Example PBMC data in SingleCellExperiment format:
A DataFrame with 3000 rows and 1 column, storing simulated gene information.
A DataFrame with 800 rows and 3 columns, representing metadata for each cell.
A list containing two elements that provide additional global metadata about the experiment.
A CompressedGRangesList object providing genomic range data associated with each row/gene.
A DataFrame with 800 rows and 8 columns, detailing cell-level metadata.
A SimpleAssay object with matrix dimensions 3000x800, representing the gene expression matrix.
A DataFrame linked with assays, providing gene-level metadata.
The "sim_data_sce" object is designed to serve as a teaching and development aid for methods that require complex single-cell expression data. It includes several typical features found in single-cell datasets, such as varied levels of gene expression and metadata describing both cells and genes.
The data within this object are entirely synthetic and should not be used for real analysis. The main use case is for testing and development of single-cell analysis methodologies.
Simulation data to exemplify the usage of the method.
The data were generated using a combination of random number generation for expression values and curated sources for metadata to simulate realistic experimental scenarios.
data("sim_data_sce")
Example result data containing the outputs of several functions.
sim_result
Example result data:
A runnable example of GetCluster output, which is a list of clusters for each study.
A runnable example of NormData output, which is a list of normalized RNA expression matrices for each study.
A runnable example of FindNNDist output, which is a list of distance matrices for each study.
A runnable example of the test results used by CalcuSCIR (passed as testres in the examples), which is a list of test results for each study.
Simulation data to exemplify the usage of the method.
# Load the data
data("sim_result")
Get maximum number of broad clusters
SummCluster(fullcluster)
fullcluster |
A list of clusters generated by GetCluster(). |
A list.
data(sim_result)
# Assuming "fullcluster" is a list of clusters
# fullcluster <- GetCluster(seuratlist)
SCout <- SummCluster(sim_result[[1]])