Batch effect correction on two single-cell batches

CorrectBatch(
  refBatch,
  queBatch,
  cnRef = NULL,
  cnQue = NULL,
  queNumCelltypes = NULL,
  maxMem = 5,
  pairs = NULL,
  kNN = 30,
  sampling = FALSE,
  numSamples = NULL,
  idxQuery = NULL,
  idxRef = NULL,
  pcaDim = 50,
  perCellMNN = 0.08,
  fuzzy = TRUE,
  fuzzyPCA = 10,
  estMethod = "Median",
  clusterMethod = "louvain",
  pairsFilter = FALSE,
  doCosNorm = FALSE,
  verbose = FALSE
)

Arguments

refBatch

Reference batch.

queBatch

Query batch (batch to correct).

cnRef

Cosine normalization of the reference batch.

cnQue

Cosine normalization of the query batch.

queNumCelltypes

Number of cell types in the query batch. By default, Canek estimates the number of cell types using a heuristic algorithm. Set this parameter if you know the number of cell types in advance.

maxMem

Maximum number of memberships from the query batch. This parameter is used by the heuristic algorithm to find the number of cell types.

pairs

A numeric matrix containing the cell indices of the MNN pairs. The first column corresponds to query batch cell indices.

kNN

Number of nearest neighbors (k) used to define the MNN pairs.
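
To make the role of kNN concrete, the following is a minimal base-R sketch of how mutual nearest neighbor pairs can be found by brute force. The helper find_mnn_pairs is hypothetical and not part of Canek, it takes cells in rows for simplicity, and it is not meant to reflect how Canek searches for neighbors internally. The output mirrors the layout described for pairs, with query indices in the first column.

```r
# Hypothetical helper (not part of Canek): brute-force MNN pairs.
# ref, que: numeric matrices with cells in rows and features in columns.
find_mnn_pairs <- function(ref, que, k = 1) {
  n_ref <- nrow(ref)
  n_que <- nrow(que)
  # Cross-batch Euclidean distances: rows are query cells, columns are reference cells.
  d <- as.matrix(dist(rbind(que, ref)))[seq_len(n_que), n_que + seq_len(n_ref), drop = FALSE]
  # k nearest reference cells of each query cell, and vice versa.
  que_nn <- lapply(seq_len(n_que), function(i) order(d[i, ])[seq_len(k)])
  ref_nn <- lapply(seq_len(n_ref), function(j) order(d[, j])[seq_len(k)])
  # Keep a (query, reference) pair only when each cell is in the other's neighbor list.
  do.call(rbind, lapply(seq_len(n_que), function(i) {
    mutual <- Filter(function(j) i %in% ref_nn[[j]], que_nn[[i]])
    if (length(mutual) == 0) return(NULL)
    cbind(query = i, ref = mutual)
  }))
}

# Toy data: the query batch is the reference batch shifted by one unit.
ref <- rbind(c(0, 0), c(10, 10), c(20, 20))
que <- rbind(c(1, 0), c(11, 10), c(21, 20))
pairs <- find_mnn_pairs(ref, que, k = 1)
print(pairs)
```

In this toy case every query cell pairs with its shifted counterpart. A larger k admits more candidate pairs at the cost of noisier matches.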

sampling

Sample the MNN pairs when using a Kalman filter to estimate the correction vector.

numSamples

If sampling is TRUE, the number of MNN pair samples to use in the estimation process.

idxQuery

Numeric vector with the indices of the query batch cells to use in the correction vector estimation.

idxRef

Numeric vector with the indices of the reference batch cells to use in the correction vector estimation.

pcaDim

Number of PCA dimensions to use.

perCellMNN

Threshold used to decide whether a membership's correction vector is calculated. As a rough interpretation, this value can be thought of as the proportion of cells in a membership that have an associated MNN pair. If the proportion is low, a specific correction vector is not calculated for that membership.
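
The rough interpretation above can be illustrated with a small base-R sketch. The variable names are hypothetical, and the use of >= for the comparison is an assumption; the exact rule Canek applies is not stated here.

```r
# Toy memberships: 10 query cells in membership 1, 20 in membership 2.
membership <- rep(c(1, 2), times = c(10, 20))
# Hypothetical query cell indices that appear in at least one MNN pair.
paired_cells <- c(2, 5, 9, 15)

# Proportion of cells with an MNN pair, per membership.
prop <- tapply(seq_along(membership) %in% paired_cells, membership, mean)

# Assumption: a membership gets its own correction vector when the
# proportion reaches the threshold (the default perCellMNN is 0.08).
gets_vector <- prop >= 0.08
print(prop)
print(gets_vector)
```

Membership 1 has 3 of 10 cells paired and passes the threshold, while membership 2 has 1 of 20 cells paired and does not.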

fuzzy

Use fuzzy logic to join the local correction vectors.

fuzzyPCA

Number of PCs to use in the fuzzy process.

estMethod

Method to use when estimating the correction vectors:

  • Median. Use the cells' median distance.

  • EKF. Use an extended Kalman filter.
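
As an illustration of the Median option, the sketch below takes one plausible reading of it: the correction vector is the per-feature median of the differences between paired reference and query cells. This reading, the toy data, and the variable names are assumptions, not a description of Canek's internals.

```r
# Toy MNN pairs: row i of ref_cells is paired with row i of que_cells.
# Rows are cells, columns are features.
ref_cells <- rbind(c(5, 1), c(6, 2), c(7, 9))
que_cells <- rbind(c(3, 0), c(4, 1), c(5, 2))

# Per-pair differences; the third pair is an outlier in the second feature.
diffs <- ref_cells - que_cells

# Assumed estimator: per-feature median of the pair differences.
correction <- apply(diffs, 2, median)

# Apply the same vector to every query cell.
corrected <- sweep(que_cells, 2, correction, "+")
print(correction)
print(corrected)
```

The median ignores the outlying pair, which is the usual motivation for preferring it over a mean when some pairs are unreliable.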

clusterMethod

Method used to identify memberships.

pairsFilter

Filter the MNN pairs before estimating the correction vectors. If TRUE, outlier pairs are removed using an interquartile range method.
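
A minimal sketch of an interquartile range filter follows. Filtering on the distance between paired cells and the conventional 1.5 multiplier are both assumptions made for illustration; the quantity and fences Canek actually uses are not specified here.

```r
# Toy MNN pairs (query index, reference index) and their pair distances.
pairs <- cbind(query = 1:6, ref = c(3, 1, 2, 6, 5, 4))
pair_dist <- c(1.0, 1.2, 0.9, 1.1, 1.0, 8.0)

# Quartiles and interquartile range of the distances.
q <- unname(quantile(pair_dist, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Keep pairs inside the fences [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
keep <- pair_dist >= q[1] - 1.5 * iqr & pair_dist <= q[2] + 1.5 * iqr
filtered_pairs <- pairs[keep, , drop = FALSE]
print(filtered_pairs)
```

Only the last pair, whose distance is far above the rest, is dropped.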

doCosNorm

Whether to do cosine normalization.
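
For reference, cosine normalization rescales each cell's expression vector to unit Euclidean length. The sketch below uses a hypothetical helper, cos_norm, and assumes cells are stored in columns, as the t(cbind(x, y)) call in the Examples suggests.

```r
# Hypothetical helper (not part of Canek): scale each column to unit L2 norm.
# m: numeric matrix with features in rows and cells in columns.
cos_norm <- function(m) {
  norms <- sqrt(colSums(m^2))
  sweep(m, 2, norms, "/")
}

m <- cbind(c(3, 4), c(0, 2))
cn <- cos_norm(m)
print(cn)
print(colSums(cn^2))  # every cell now has squared length 1
```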

verbose

Print output.

Value

A list containing the input batches, the corrected query batch, and the correction data.

Details

CorrectBatch corrects batch effects between two single-cell batches. Batch-effect observations are defined using mutual nearest neighbor (MNN) pairs, and cell groups in the query batch are distinguished using clustering. We estimate a correction vector for each cluster from its MNN pairs and use these vectors to remove the batch effect from the query batch in two ways:

  • A linear correction applies the same correction to every cell in a cluster.

  • A non-linear correction corrects each cell differently using fuzzy logic.
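
The difference between the two modes can be sketched with toy numbers. The cluster vectors and membership weights below are hypothetical, and how Canek derives the fuzzy memberships is not covered here; the sketch only shows hard assignment versus a weighted blend.

```r
# Two hypothetical cluster correction vectors (rows) over 2 features.
vectors <- rbind(c(2, 0), c(0, 4))

# Three query cells (rows) and their hard cluster labels.
cells <- rbind(c(1, 1), c(1, 1), c(1, 1))
cluster <- c(1, 2, 2)

# Linear: every cell receives the vector of its own cluster.
linear <- cells + vectors[cluster, , drop = FALSE]

# Fuzzy: each cell receives a membership-weighted blend of all vectors.
# Rows of membership sum to 1; cell 2 sits halfway between the clusters.
membership <- rbind(c(1, 0), c(0.5, 0.5), c(0, 1))
fuzzy <- cells + membership %*% vectors

print(linear)
print(fuzzy)
```

Cells 1 and 3 are corrected identically in both modes, while cell 2 receives an intermediate correction under the fuzzy scheme, which avoids an abrupt jump at cluster boundaries.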

Examples

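# Reference (x) and query (y) batches from SimBatches
x <- SimBatches$batches[[1]]
y <- SimBatches$batches[[2]]
# Correct the query batch y against the reference batch x
z <- CorrectBatch(x, y)
Corrected <- z$`Corrected Query Batch`

# PCA of the batches before correction
Uncorrected_PCA <- prcomp(t(cbind(x, y)))
plot(Uncorrected_PCA$x[, 1:2])

# PCA of the batches after correction
Corrected_PCA <- prcomp(t(cbind(x, Corrected)))
plot(Corrected_PCA$x[, 1:2])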