Batch-effect correction over a list of single-cell batches
CorrectBatches(
  lsBatches,
  hierarchical = TRUE,
  queNumCelltypes = NULL,
  maxMem = 5,
  sampling = FALSE,
  numSamples = NULL,
  kNN = 30,
  pcaDim = 50,
  pairsFilter = FALSE,
  perCellMNN = 0.08,
  fuzzy = TRUE,
  fuzzyPCA = 10,
  estMethod = "Median",
  clusterMethod = "louvain",
  doCosNorm = FALSE,
  fracSampling = NULL,
  debug = FALSE,
  verbose = FALSE,
  ...
)
lsBatches: List of batches to integrate. Batches should contain the same number of genes, with genes as rows.
hierarchical: Use a hierarchical integration scheme when correcting more than two batches. If set to FALSE, the input batches are sorted by number of cells and integrated in descending order.
queNumCelltypes: Number of cell types in the query batch. By default, Canek searches for the number of cell types using a heuristic algorithm. Set this parameter if you know the number of cell types in advance.
maxMem: Maximum number of memberships from the query batch. This parameter is used by the heuristic algorithm to find the number of cell types.
sampling: Sample the MNN pairs when using a Kalman filter to estimate the correction vector.
numSamples: If sampling is TRUE, the number of MNN pair samples to use in the estimation process.
kNN: Number of k-nearest neighbors used to define the MNN pairs.
pcaDim: Number of PCA dimensions to use.
pairsFilter: Filter MNN pairs before estimating the correction vectors. If TRUE, outlier pairs are removed using an interquartile range method.
perCellMNN: Threshold value used to decide whether a membership's correction vector is calculated. As a rough interpretation, this value can be thought of as the proportion of cells in a membership with an associated MNN pair. If the proportion is low, a specific correction vector is not calculated for this membership.
fuzzy: Use fuzzy logic to join the local correction vectors.
fuzzyPCA: Number of PCs to use in the fuzzy process.
estMethod: Method to use when estimating the correction vectors:
  Median. Use the cells' median distance.
  EKF. Use an extended Kalman filter.
clusterMethod: Method used to identify memberships.
doCosNorm: Whether to apply cosine normalization.
fracSampling: Fraction of cells to sample in the hierarchical selection (default is NULL, no sampling).
debug: Return the correction information.
verbose: Print output.
...: Pass down methods from RunCanek().
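Two of the options above, doCosNorm and pairsFilter, can be illustrated in base R. This is a minimal sketch and not Canek's internal code: the helper names cos_norm and iqr_keep are hypothetical, the filter is assumed to act on the distances between paired cells, and the 1.5 x IQR fence is an assumed, conventional choice (the description above only states that an interquartile range method is used).

```r
# Hypothetical helpers for illustration; not part of the Canek API.

# Cosine normalization: scale each cell (column) to unit L2 norm.
cos_norm <- function(x) {
  norms <- sqrt(colSums(x^2))
  sweep(x, 2, norms, "/")
}

# IQR outlier rule on pair distances: keep values inside
# [Q1 - k * IQR, Q3 + k * IQR]. k = 1.5 is an assumed default.
iqr_keep <- function(d, k = 1.5) {
  q <- quantile(d, c(0.25, 0.75), names = FALSE)
  fence <- k * (q[2] - q[1])
  d >= q[1] - fence & d <= q[2] + fence
}

set.seed(1)
x <- matrix(abs(rnorm(20)), nrow = 5)  # toy data: 5 genes x 4 cells
x_norm <- cos_norm(x)
col_norms <- sqrt(colSums(x_norm^2))   # every cell now has unit norm

pair_dist <- c(1.0, 1.1, 0.9, 1.2, 1.05, 8.0)  # last pair is far from the rest
keep <- iqr_keep(pair_dist)                    # logical mask of pairs to retain
```

Under these assumptions, only the pair with distance 8.0 falls outside the fences and would be dropped before the correction vectors are estimated.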
A list containing the integrated datasets as a matrix and the correction data.
CorrectBatches is a method to correct batch effects from two or more single-cell batches. Batch-effect observations are defined using mutual nearest neighbor (MNN) pairs, and cell groups from the query batch are distinguished using clustering. We estimate a correction vector for each cluster using its MNN pairs and use these vectors to remove the batch effect from the query batch in two ways:
A linear correction is performed by applying the same correction to all cells in the same cluster.
A non-linear correction is performed by correcting each cell differently using fuzzy logic.
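The idea described above can be sketched on toy data in base R. This is an illustration under stated assumptions, not Canek's implementation: k-means stands in for the default louvain clustering, distances are computed in gene space rather than on principal components, the fuzzy memberships are assumed to be normalized inverse squared distances to the cluster centroids, and a cluster without MNN pairs simply receives a zero correction. All object names are hypothetical.

```r
# Illustrative sketch only; none of these objects are Canek internals.
set.seed(42)

# Toy data: genes in rows, cells in columns; two cell types per batch.
type_centers <- cbind(c(0, 0, 0), c(6, 6, 6))
cell_type <- rep(1:2, each = 20)
noise <- function() matrix(rnorm(3 * 40, sd = 0.3), nrow = 3)
ref <- type_centers[, cell_type] + noise()
que <- type_centers[, cell_type] + noise() + c(3, -2, 1.5)  # simulated batch effect

# 1. MNN pairs: query cell i and reference cell j are paired when each is
#    among the other's k nearest neighbors.
k <- 5
n_q <- ncol(que)
n_r <- ncol(ref)
d <- as.matrix(dist(t(cbind(que, ref))))[seq_len(n_q), n_q + seq_len(n_r)]
nn_q <- t(apply(d, 1, function(v) order(v)[1:k]))  # query -> nearest reference
nn_r <- t(apply(d, 2, function(v) order(v)[1:k]))  # reference -> nearest query
pairs <- do.call(rbind, lapply(seq_len(n_q), function(i) {
  j <- nn_q[i, ]
  j <- j[vapply(j, function(jj) i %in% nn_r[jj, ], logical(1))]
  if (length(j)) cbind(que = i, ref = j) else NULL
}))

# 2. Cluster the query batch (k-means is a stand-in for louvain).
n_cl <- 2
cl <- kmeans(t(que), centers = n_cl, nstart = 5)$cluster

# 3. One correction vector per cluster: gene-wise median of the
#    reference-minus-query differences over that cluster's MNN pairs.
diffs <- ref[, pairs[, "ref"], drop = FALSE] - que[, pairs[, "que"], drop = FALSE]
corr <- sapply(seq_len(n_cl), function(g) {
  idx <- cl[pairs[, "que"]] == g
  if (!any(idx)) return(rep(0, nrow(que)))  # no pairs: leave cluster uncorrected
  apply(diffs[, idx, drop = FALSE], 1, median)
})  # genes x clusters

# 4a. Linear correction: every cell receives its own cluster's vector.
que_linear <- que + corr[, cl]

# 4b. Non-linear correction: each cell receives a membership-weighted
#     mix of all cluster vectors (assumed inverse squared distance weights).
centroids <- sapply(seq_len(n_cl), function(g) rowMeans(que[, cl == g, drop = FALSE]))
d2 <- sapply(seq_len(n_cl), function(g) colSums((que - centroids[, g])^2))
w <- 1 / (d2 + 1e-8)
w <- w / rowSums(w)                 # memberships sum to 1 for each cell
que_fuzzy <- que + corr %*% t(w)
```

In this toy setting the two clusters are well separated, so the fuzzy memberships are close to 0 or 1 and both corrections give similar results; the two approaches differ mainly for cells that lie between clusters.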