The Sequential and Orthogonalised PLS Discriminant Analysis method

Metabolomics is a powerful phenotyping tool, generating complex data that need dedicated treatments to enrich knowledge of biological systems. In particular, to investigate relations between experimental factors and metabolism, discriminant statistical analyses are generally performed on metabolomic datasets. In this context, the SOPLSDA method makes it possible to assess the complementarity of data blocks in predicting or explaining a categorical variable.

Reference: Næs T, Tomic O, Afseth NK, Segtnan V, Måge I. Multi block regression based on combinations of orthogonalisation, PLS-regression and canonical correlation analysis. Chemom Intel Lab Syst. 2013;124:32-42.

A selection of reduced non-targeted metabolomics datasets from an article of Zhang et al. have been used to illustrate the method application. (Zhang, Y.; Barupal, D.K.; Fan, S.; Gao, B.; Zhu, C.; Flenniken, A.M.; McKerlie, C.; Nutter, L.M.J.; Lloyd, K.C.K.; Fiehn, O. Sexual Dimorphism of the Mouse Plasma Metabolome Is Associated with Phenotypes of 30 Gene Knockout Lines. Metabolites 2023, 13, 947. https://doi.org/10.3390/metabo13080947).

In our example, the 6 samples from 5 mutant groups (Dhfr, Gnpda1, Plk1, Sra1, Ulk3) and the 40 controls were retained, with the exception of 2 animals that had missing values in non-targeted metabolomics. Three HILIC-NEG and 2 HILIC-POS variables were then removed due to missing or infinite values. Lastly, we kept only 2 data blocks (GCTOF and HILIC NEG) to reduce the computation time. Always in our example, the objective was to highlight metabolites discriminating males and females.

R environment preparation

#install.packages("knitr")
library(knitr)
opts_chunk$set(echo = TRUE)

# To ensure reproducibility
set.seed(12)

pkgs <- c("rchemo")
sapply(pkgs, function(x) {
  if (!requireNamespace(x, quietly = TRUE)) {
    install.packages(x)
  }
})

library(rchemo)  # to load rchemo

Data preprocessing

A demonstration dataset used for this example is imported. It contains 2 metabolomics datasets and 1 sample metadata block, with the groups to be discriminated.

data(Zhang2023, package = "rchemo")

All data blocks should have exactly the same samples (rows). The block dimension can be checked with the following command:

# Check dimension
BlockNames <- names(Zhang2023)
nbrBlocs <- length(BlockNames)
dims <- lapply(X=Zhang2023[BlockNames], FUN=dim)
names(dims) <- BlockNames
dims

## $CHSNEG
## [1]  68 163
## 
## $CHSPOS
## [1]  68 288
## 
## $GCTOF
## [1]  68 108
## 
## $HILICPOS
## [1]  68 133
## 
## $HILICNEG
## [1] 68 44
## 
## $metadata
## [1] 68  2

# Remove unuseful object for the next steps
rm(nbrBlocs, dims)

The identical order of samples in the three blocks should be ensured.

# Check rows names in any order
row_names <- lapply(X=Zhang2023[BlockNames], FUN=rownames)
rns <- do.call(cbind, row_names)
rns.unique <- apply(rns, 1, function(x) length(unique(x)))
if (max(rns.unique) > 1) {
  stop("Rows names are not identical between blocks.")
}

# Check order of samples
check_row_names <- all(sapply(X=row_names, FUN=identical, y = row_names[[1]]))
if (!check_row_names && max(rns.unique) == 1) {
  print("Rows names are not in the same order for all blocks.")
}

# Remove unuseful object for the next steps
rm(row_names, rns, rns.unique, check_row_names)

GCTOF <- Zhang2023$GCTOF
HILICNEG <- Zhang2023$HILICNEG
sample_metadata <- Zhang2023$metadata
rm(Zhang2023)

Data visualization by PCA on separate data blocks

PCA on different datablocks is necessary to verify that there is no outlier, and that the data scaling is suited. In our example, unit variance scaling is suited: For both metabolomic dataset, the scatterplot of loadings is homogeneous, and there are no outliers. With other datasets, particularly those containing more noise, it would have been necessary to apply a log transformation with Pareto scaling.

# GCTOF with unit variance scaling
pcaGCTOF <- pcanipalsna(X = scale(GCTOF[,1:dim(GCTOF)[2]]), nlv = nrow(GCTOF), 
                         gs = TRUE,
                         tol = .Machine$double.eps^0.5, maxit = 200)

## diagram of explained variance
barplot(summary(pcaGCTOF,X = scale(GCTOF[,1:dim(GCTOF)[2]]))$explvar$pvar * 100, names.arg = 1:nrow(summary(pcaGCTOF,X = scale(GCTOF[,1:dim(GCTOF)[2]]))$explvar), main = "diagram of explained variance - PCA GCTOF")

## score plot
plotxy(X= pcaGCTOF$T, group = sample_metadata$Gender, 
       asp = 0, col = 3:4, alpha.f = .8,
       zeroes = TRUE, circle = FALSE, ellipse = FALSE,
       labels = FALSE,
       legend = TRUE, main = "scores - PCA GCTOF", ncol = 1,
       pch=16)

## loading plot
plotxy(X= pcaGCTOF$P, group = NULL, 
       asp = 0, col = NULL, alpha.f = .8,
       zeroes = TRUE, circle = FALSE, ellipse = FALSE,
       labels = TRUE,
       legend = FALSE, main = "loadings - PCA GCTOF", ncol = 1,
       cex=0.8)

#HILICNEG with unit variance scaling
pcaHILICNEG <- pcanipalsna(X = scale(HILICNEG[,1:dim(HILICNEG)[2]]), nlv = nrow(HILICNEG), 
                         gs = TRUE,
                         tol = .Machine$double.eps^0.5, maxit = 200)

## diagram of explained variance
barplot(summary(pcaHILICNEG,X = scale(HILICNEG[,1:dim(HILICNEG)[2]]))$explvar$pvar * 100, names.arg = 1:nrow(summary(pcaHILICNEG,X = scale(HILICNEG[,1:dim(HILICNEG)[2]]))$explvar), main = "diagram of explained variance - PCA HILIC NEG")

## score plot
plotxy(X= pcaHILICNEG$T, group = sample_metadata$Gender, 
       asp = 0, col = 3:4, alpha.f = .8,
       zeroes = TRUE, circle = FALSE, ellipse = FALSE,
       labels = FALSE,
       legend = TRUE, main = "scores - PCA HILIC NEG", ncol = 1,
       pch=16)

## loading plot
plotxy(X= pcaHILICNEG$P, group = NULL, 
       asp = 0, col = NULL, alpha.f = .8,
       zeroes = TRUE, circle = FALSE, ellipse = FALSE,
       labels = TRUE,
       legend = FALSE, main = "loadings - PCA HILIC NEG", ncol = 1,
       cex=0.8)

# Remove unuseful object for the next steps
rm(pcaGCTOF, pcaHILICNEG)

SO-PLS-DA model

In SO-PLS-DA, datablocks are involved sequentially in the model.

The first explanatory block is scaled and a PLS-DA model is build to predict the response variable.
Then, the next explanatory block is orthogonalised to the PLS components, before a PLS model to explain the residuals, etc…
The final prediction is the results of the sum of the predictions by the PLS models.

The interpretation is based on the separate PLS models included in the SO-PLS modelling.

The main steps to apply this method are:

the determination of the optimal number of latent variables of each model included in the SO-PLS-DA one, proposed in the package by cross-validation
the model validation, proposed by permutation test
the model analysis, based on the scatter plot, the loading plot, the Variable Importance in Projection of each PLS model
the calculation of the predicted categories for each observation.

In rchemo, all these steps can be performed with a single function.

Xlist: list of training X-data (n, p).
Xnames: names of the X-matrices
Xscaling: vector of Xlist length. X variable scaling among “none” (mean-centering only), “pareto” (mean-centering and pareto scaling), “sd” (mean-centering and unit variance scaling). If “pareto” or “sd”, uncorrected standard deviation is used.
Y: Training Y-data (n, 1).
Yscaling: Y variable scaling among “none” (mean-centering only), “pareto” (mean-centering and pareto scaling), “sd” (mean-centering and unit variance scaling). If “pareto” or “sd”, uncorrected standard deviation is used.
weights Weights (n, 1) to apply to the training observations. Internally, weights are “normalized” to sum to 1. Default to NULL (weights are set to 1/n).
newXlist: list of new X-data (m, p) to consider.
newXnames: names of the newX-matrices
method: method to apply among “soplsrda”,“soplslda”,“soplsqda”
prior: for soplslda or soplsqda models : The prior probabilities of the classes. Possible values are “unif” (default; probabilities are set equal for all the classes) or “prop” (probabilities are set equal to the observed proportions of the classes in y).
step: step of the analysis among “nlvtest” (cross-validation to help determine the optimal number of latent variables), “permutation” (permutation test), “model” (model calculation),“prediction” (prediction of newX-data or X-data if any))
nlv: if step is not “nlvtest”. A vector of same length as the number of blocks defining the number of scores to calculate for each block, or a single number. In this last case, the same number of scores is used for all the blocks.
nlvlist: if step is “nlvtest”. A list of same length as the number of X-blocks. Each component of the list gives the number of PLS components of the corresponding X-block to test.
modeloutput: if step is “model”: outputs among “scores”, “loadings”, “coef” (regression coefficients), “vip” (Variable Importance in Projection; the VIP calculation being based on the proportion of Y-variance explained by the components, as proposed by Mehmood et al (2012, 2020).)
cvmethod: if step is “nlvtest” or “permutation”: “kfolds” for k-folds cross-validation, or “loo” for leave-one-out.
nbrep: if step is “nlvtest” and cvmethod is “kfolds”: An integer, setting the number of CV repetitions. Default value is 30. Must me set to 1 if cvmethod is “loo”
seed: if step is “nlvtest” and cvmethod is “kfolds”, or if step is “permutation: a numeric. Seed used for the repeated resampling
samplingk: A vector of length n. The elements are the values of a qualitative variable used for stratified partition creation. If NULL, the first observation is set in the first fold, the second observation in the second fold, etc…
nfolds: if cvmethod is “kfolds”. An integer, setting the number of partitions to create.Default value is 10.
npermut: if step is “permutation”: An integer, setting the number of Y-Block with permutated responses to create. Default value is 30.
optimisation: if step is “nlvtest”, method for the error optimisation among “global” (the optimal number of latent variables is determined after all the ordred block combination have been computed), or “sequential” (the optimal number of latent variables is determined after each addition of X-block).
criterion: if step is “nlvtest” or “permutation” and method is “soplsrda”, “soplslda” or “soplsqda”: optimisation criterion among “rmse” and “err” (for classification error rate)))
selection: if step is “nlvtest”: a character indicating the selection method to use to choose the optimal combination of components, among “localmin”, “globalmin”,“1std”. If “localmin”: the optimal combination corresponds to the first local minimum of the mean CV rmse or error rate. If “globalmin” : the optimal combination corresponds to the minimum mean CV rmse or error rate. If “1std” (one standard error rule) : it corresponds to the first combination after which the mean cross-validated rmse or error rate does not decrease significantly.

determination by cross-validation of the optimal number of latent variables of the model

In this example,the soplslda method with an uniform prior is performed: a linear discriminant analysis is applied on the PLS scores to obtain the classifications. In the soplsrda method, the classification step is performed according to the predicted value, and consequently, it doesn’t take into account the variance of the scatter plot. And the soplsqda method is mode suited when the scatter plot has an atypical shape.

The cvmethod is to choose according to the number of observations. If this number is small, a leave-one-out cross-validation is more suited, but can lead to optimistic results. In the other cases, it could be better to apply a k-fold cross-validation, usually with 3, 5 or 10 folds. In order to reduce the computational time, in this example, the number of cross- validation repetition is set to 10. But it must be higher (at least 30).

Concerning the criterion, in a discriminant context, it is more logical to choose the classification error rate. However, this can result in an error plot depending on the number of latent variables that is not smooth, so some prefer to use rmsecv instead.

In the example, the selection parameter is set to “localmin”. Normally, the cross-validated error decreases until it reaches the optimal number of latent variables, before rising again. Sometimes it fluctuates before this rise, so choosing “localmin” allows for a more parsimonious model. The “1std” option is also very parsimonious because it only considers a larger number of latent variables if doing so significantly improves the cross-validated error.

The plot of the results (called “Mage plot”) indicates the number of latent variables of each block (in the order of the block involvement in the model), and the corresponding error rates and sum of latent variables.

nlvtestsoplsda <- soplsr_soplsda_allsteps(Xlist = list(GCTOF = GCTOF[,1:dim(GCTOF)[2]], 
                                                       HILICNEG = HILICNEG[,1:dim(HILICNEG)[2]]), 
                                          Xnames = c("GCTOF", "HILICNEG"), 
                                          Xscaling = c("none","pareto","sd")[3], 
                                          Y = sample_metadata[,"Gender",drop=FALSE], 
                                          Yscaling = c("none","pareto","sd")[1], weights = NULL,
                                          #newXlist = NULL, newXnames = NULL,
                                          
                                          method = c("soplsrda","soplslda","soplsqda")[2],
                                          prior = c("unif", "prop")[1],
                                          
                                          step = c("nlvtest","permutation","model","prediction")[1],
                                          # nlv = c(),
                                          nlvlist = list(0:3, 0:3), 
                                          # modeloutput = c("scores","loadings","coef","vip"), 
                                          
                                          cvmethod = c("kfolds","loo")[1], 
                                          nbrep = 10, 
                                          seed = 123, 
                                          samplingk = NULL, 
                                          nfolds = 3, 
                                          # npermut = 30, 
                                          
                                          optimisation = c("global","sequential")[1],
                                          criterion = c("err","rmse")[1], 
                                          selection = c("localmin","globalmin","1std")[1],
                                          
                                          #import = c("R","ChemFlow","W4M")[1],
                                          outputfilename = NULL)


# to plot the results of the cross-validation
plot(nlvtestsoplsda[,"nlvsum"],nlvtestsoplsda[,"mean"], type = "n", xlab = "nlv sum", ylab = "classification error rate", 
     main = "error rates obtained from the different numbers \n of latent variable combinaisons of the 2 datablocks")
text(x=nlvtestsoplsda[,"nlvsum"], y=nlvtestsoplsda[,"mean"], 
     labels=paste0(nlvtestsoplsda[,"Xlist1"],",",nlvtestsoplsda[,"Xlist2"]))

nlvoptsoplsda <- nlvtestsoplsda[which(nlvtestsoplsda[,"optimum"]==1),c("Xlist1","Xlist2")] # to obtain the optimal number of LV.

# Remove unuseful object for the next steps
rm(nlvtestsoplsda)

model validation by permutation test

Usually, the cross-validation parameters are the same than for the determination of the optimal number of latent variables. In this example, as for the determination of the optimal number of latent variables, in order to reduce the computational time, the number of cross- validation repetition and the number of permuted responses are set to 10. But they must be higher (at least 30).

To be valid, all the cross-validated errors obtained on permuted data must be lower than the cross-validated error obtained on the original data.

permutsoplsda <- soplsr_soplsda_allsteps(Xlist = list(GCTOF = GCTOF[,1:dim(GCTOF)[2]], 
                                                      HILICNEG = HILICNEG[,1:dim(HILICNEG)[2]]), 
                                         Xnames = c("GCTOF", "HILICNEG"), 
                                         Xscaling = c("none","pareto","sd")[3], 
                                         Y = sample_metadata[,"Gender",drop=FALSE], 
                                         Yscaling = c("none","pareto","sd")[1], weights = NULL,
                                         #newXlist = NULL, newXnames = NULL,
                                         
                                         method = c("soplsrda","soplslda","soplsqda")[2],
                                         prior = c("unif", "prop")[1],
                                         
                                         step = c("nlvtest","permutation","model","prediction")[2],
                                         nlv = nlvoptsoplsda,
                                         #nlvlist = list(),
                                         #modeloutput = c("scores","loadings","coef","vip"), 
                                         
                                         cvmethod = c("kfolds","loo")[1], 
                                         nbrep = 10, 
                                         seed = 123, 
                                         samplingk = NULL, 
                                         nfolds = 3, 
                                         npermut = 10, 
                                         
                                         # optimisation = c("global","sequential")[1],
                                         criterion = c("err","rmse")[1], 
                                         # selection = c("localmin","globalmin","1std")[1],
                                         
                                         #import = c("R","ChemFlow","W4M")[1],
                                         outputfilename = NULL)

## Warning in predict.Lda(object$fm[[2]], z): mF_214 has/have equal posterior
## probabilities and has/have been classified in the first category in
## alphabetical orderNA has/have equal posterior probabilities and has/have been
## classified in the first category in alphabetical order
## Warning in predict.Lda(object$fm[[2]], z): mF_214 has/have equal posterior
## probabilities and has/have been classified in the first category in
## alphabetical orderNA has/have equal posterior probabilities and has/have been
## classified in the first category in alphabetical order

#plot of the results
plot(permutsoplsda, pch = 16, ylab = "CV classification error rate", xlab = "dyssimilarity Y-Ypermuted")

# Remove unuseful object for the next steps
rm(permutsoplsda)

model analysis, based on the scatter plot, the loading plot, the Variable Importance in Projection

The score plot allows to show the group discrimination. Based on the VIP curve, the threshold to consider that a variable is important can be set.

In our example, latent variables of the 2 matrices are retained by the SO-PLS-DA method to best discriminate the groups. Consequently, we have two loading plots and two VIP curves, and the scores obtained from the two matrices could be concatenated.

modelsoplsda <- soplsr_soplsda_allsteps(Xlist = list(GCTOF = GCTOF[,1:dim(GCTOF)[2]], 
                                                     HILICNEG = HILICNEG[,1:dim(HILICNEG)[2]]), 
                                         Xnames = c("GCTOF", "HILICNEG"), 
                                         Xscaling = c("none","pareto","sd")[3], 
                                         Y = sample_metadata[,"Gender",drop=FALSE], 
                                         Yscaling = c("none","pareto","sd")[1], weights = NULL,
                                         #newXlist = NULL, newXnames = NULL,
                                         
                                         method = c("soplsrda","soplslda","soplsqda")[2],
                                         prior = c("unif", "prop")[1],
                                         
                                         step = c("nlvtest","permutation","model","prediction")[3],
                                         nlv = nlvoptsoplsda,
                                         #nlvlist = list(), 
                                         modeloutput = c("scores","loadings","coef","vip"), 
                                         
                                         # cvmethod = c("kfolds","loo")[1], 
                                         # nbrep = 30, 
                                         # seed = 123, 
                                         # samplingk = NULL, 
                                         # nfolds = 5, 
                                         # npermut = 30, 
                                         # 
                                         # criterion = c("err","rmse")[1], 
                                         # selection = c("localmin","globalmin","1std")[1],
                                         
                                         #import = c("R","ChemFlow","W4M")[1],
                                         outputfilename = NULL)
# score plot

plotxy(do.call("cbind",modelsoplsda$scores)[,1:2], group = sample_metadata[,"Gender"],
       asp = 0, col = 3:4, alpha.f = .8,
       zeroes = TRUE, circle = FALSE, ellipse = FALSE,
       labels = FALSE,
       legend = TRUE, main = "scores - SO PLS DA \n first model (GCTOF variables)", ncol = 1,
       pch=16,
       xlab="lv1 GCTOF",
       ylab="lv2 GCTOF")

plotxy(do.call("cbind",modelsoplsda$scores)[,3:4], group = sample_metadata[,"Gender"],
       asp = 0, col = 3:4, alpha.f = .8,
       zeroes = TRUE, circle = FALSE, ellipse = FALSE,
       labels = FALSE,
       legend = TRUE, main = "scores - SO PLS DA \n 2nd model (HILIC NEG variables)", ncol = 1,
       pch=16,
       xlab="lv1 HILICNEG",
       ylab="lv2 HILICNEG")

# loading plot
plotxy(modelsoplsda$loadings[[1]], pch = 16, xlab="lv1 GCTOF", ylab="lv2 GCTOF",
       main = "loadings - SO PLS DA \n first model (GCTOF variables)")

plotxy(modelsoplsda$loadings[[2]], pch = 16, xlab="lv1 HILIC NEG", ylab="lv2 HILIC NEG",
       main = "loadings - SO PLS DA \n 2nd model (HILIC NEG variables)")

# vip curve
plot(modelsoplsda$vip[[1]][order(modelsoplsda$vip[[1]][,2], decreasing = TRUE),1,drop=FALSE], pch = 16, ylab = "VIP", xlab="GCTOF variables", main = "VIP curve - SO PLS DA \n first model (GCTOF variables)")

plot(modelsoplsda$vip[[2]][order(modelsoplsda$vip[[2]][,2], decreasing = TRUE),1,drop=FALSE], pch = 16, ylab = "VIP", xlab="HILIC NEG variables", main = "VIP curve - SO PLS DA \n 2nd model (HILIC NEG variables)")

# Remove unuseful object for the next steps
rm(modelsoplsda)

predicted values

The prediction step provides a table with:

subjects scores
predicted categories
probabilities to be predicted for each category.

predsoplsda <- soplsr_soplsda_allsteps(Xlist = list(GCTOF = GCTOF[,1:dim(GCTOF)[2]], 
                                                    HILICNEG = HILICNEG[,1:dim(HILICNEG)[2]]), 
                                        Xnames = c("GCTOF", "HILICNEG"), 
                                        Xscaling = c("none","pareto","sd")[3], 
                                        Y = sample_metadata[,"Gender",drop=FALSE], 
                                        Yscaling = c("none","pareto","sd")[1], 
                                        weights = NULL,
                                        newXlist = list(GCTOF = GCTOF[,1:dim(GCTOF)[2]], 
                                                        HILICNEG = HILICNEG[,1:dim(HILICNEG)[2]]),
                                       newXnames =  c("GCTOF", "HILICNEG"),
                                        
                                        method = c("soplsrda","soplslda","soplsqda")[2],
                                        prior = c("unif", "prop")[1],
                                        
                                        step = c("nlvtest","permutation","model","prediction")[4],
                                        nlv = nlvoptsoplsda, 
                                        # nlvlist = list(),
                                        # modeloutput = c("scores","loadings","coef","vip"), 
                                        # 
                                        # cvmethod = c("kfolds","loo")[1], 
                                        # nbrep = 30, 
                                        # seed = 123, 
                                        # samplingk = NULL, 
                                        # nfolds = 5, 
                                        # npermut = 30, 
                                        # 
                                        # criterion = c("err","rmse")[1], 
                                        # selection = c("localmin","globalmin","1std")[1],
                                        # 
                                        #import = c("R","ChemFlow","W4M")[1],
                                        outputfilename = NULL)

predsoplsda

##                lv1         lv2         lv1         lv2 pred.y1
## wM_001  3.11048385 -1.87764285  0.07703043  0.40560800    Male
## wM_002  0.87060911  2.77370914  2.22259032 -0.66682222    Male
## wF_003 -2.07946854 -4.15133913  1.03471994 -2.22329950  Female
## wF_004 -1.39293371 -0.93299524  1.67831788 -2.45446928  Female
## wM_005  3.73347374  2.13941368 -5.29254309 -1.92777007    Male
## wM_006  0.79420753  3.89885886  2.41085087  0.36220058    Male
## wM_007  3.66433724  2.85203300  0.26996045 -0.78772362    Male
## wF_008 -1.34245891 -3.43316001 -1.34020906 -1.30099182  Female
## wF_009 -2.26798161  2.48626585 -2.51749830  2.71652521  Female
## wF_010 -1.13327108 -1.60290864 -1.74383378 -0.20346970  Female
## wM_011  3.67967654 -0.39408434  0.27481603 -2.22186237    Male
## wM_012  3.26335437 -2.31999874 -0.35752365  0.80059801    Male
## wM_013  0.28617419 -1.14537735  1.44888301  1.34292534    Male
## wF_014  0.31653696 -1.86713047 -0.10836807 -1.35806505  Female
## wF_015 -2.05331000 -0.36417140 -1.45994661  0.49581052  Female
## wF_016 -4.37108996  7.03947003  0.48924128 -0.20012545  Female
## wM_017  0.03405322  0.91245594  3.02901373  1.40782450    Male
## wM_018  2.92341303  3.40909537 -0.41843520 -4.54487412    Male
## wM_019  1.41997309  1.63142203  0.12173141  0.22611565    Male
## wM_020  1.14646647 -0.05933422  0.85424291  0.15782799    Male
## wM_021  2.91331290 -0.12790182 -0.80740423 -0.12004078    Male
## wF_022 -4.38236310  1.61506631  2.83359600 -0.31928091  Female
## wF_023 -3.73307346  1.07067921  2.11996141 -5.05835436  Female
## wF_024 -1.79119878 -3.46607164  0.18851712  1.65479451  Female
## wF_025 -1.19432280 -4.48501114 -0.99634312 -0.26788161  Female
## wF_026 -3.40180630 -1.53255007  0.32958956 -1.56531662  Female
## wM_027 -0.31580245  4.32983783  2.45070669  2.64769814    Male
## wM_028  3.22809519 -2.11611622  0.71038364  0.65983284    Male
## wM_029  1.28918990  2.96171395 -0.30720409 -0.29907168    Male
## wF_030 -0.59476696 -2.45510247 -0.99264843 -0.30641279  Female
## wF_031 -1.89409108 -1.71153510 -1.11516255  0.33334948  Female
## wF_032 -2.76739212  0.84571671 -1.31968358  1.15376127  Female
## wM_033  1.18433490  1.02501023  3.10232066  1.24839620    Male
## wM_034  3.53597797 -1.82823743 -0.07881221  0.12013004    Male
## wF_036 -1.01177879 -4.12317919 -0.24680675 -0.94535253  Female
## wF_037 -1.50619042 -4.03770384  0.24479962 -0.83956520  Female
## wF_038 -0.11802562 -2.82055613 -1.21345672 -0.19008816  Female
## wM_039  0.81119312  0.25210333  1.94807822  2.26753765    Male
## wF_040 -0.89750684 -5.50596899  0.59761712  0.56970938  Female
## mM_077  0.32201122  0.11719980  1.00430388  0.62820934    Male
## mM_078  2.82174175 -1.62255957 -0.75339441  0.44230566    Male
## mM_079  1.52374663  3.05135921  0.00363999  0.31691976    Male
## mF_080 -1.83388003 -0.34978173 -0.18411593 -1.04286550  Female
## mF_081 -4.83752048  5.06195942 -0.87061435 -0.59902890  Female
## mF_082 -5.84194116  4.61675751 -0.50142998  1.70096399  Female
## mM_101  4.19265500 -0.33141613 -0.55409465  0.12231407    Male
## mM_103  0.47759999 10.71062129 -1.98085854 -0.25556320    Male
## mF_104 -1.87999495  0.28392773  0.78165744 -0.73089689  Female
## mF_105 -1.57420778 -2.15507349 -4.61599082  1.91167672  Female
## mF_106 -1.30576203 -1.81752239 -1.20014718 -0.42684276  Female
## mM_173  2.02029990  4.50306614  0.59269895  0.43698540    Male
## mM_174  3.66185025 -2.16089236  2.72712243 -2.21065658    Male
## mM_175  1.78528418 -0.82972874  1.16083129  0.84689935    Male
## mF_176 -1.61109937 -0.63015106 -1.42866563 -0.84219515  Female
## mF_177 -2.74312195  1.44198338 -0.37029148  0.51838869  Female
## mF_178 -1.79013055 -1.84912269  2.34012558  1.62072161  Female
## mM_203  3.29639363  1.18897175  0.73049352  0.68540307    Male
## mM_204  3.34374067 -2.22016873  2.58276182  1.76523303    Male
## mM_205  3.11432814  1.98396576 -5.79302280  1.91924356    Male
## mF_206 -1.75425841 -1.67341083  1.23090014  1.69358936  Female
## mF_207 -3.66826684 -0.30655520  0.91166430  1.10380270  Female
## mF_208 -2.56267570 -2.22961926 -1.62267456  0.32346780  Female
## mM_209  2.46644494  3.20068102  0.68571456  0.07981585    Male
## mM_210  2.40821580  2.18490300  0.06169249  0.40696045    Male
## mM_211  2.91903373  1.44113267  0.13797149  0.15769498    Male
## mF_212 -1.52050980  1.98476848 -1.95914905 -2.27232878  Female
## mF_213 -1.55495652 -1.01428806 -0.43492260  0.34890810  Female
## mF_214  0.16894898 -9.46578196 -0.80329472  0.58106680  Female
##        pred.posterior.Female pred.posterior.Male
## wM_001          1.909432e-08        1.000000e+00
## wM_002          7.721731e-08        9.999999e-01
## wF_003          1.000000e+00        2.645330e-12
## wF_004          9.999978e-01        2.206130e-06
## wM_005          6.889095e-07        9.999993e-01
## wM_006          8.358348e-10        1.000000e+00
## wM_007          2.896084e-14        1.000000e+00
## wF_008          1.000000e+00        4.281580e-11
## wF_009          9.999914e-01        8.586268e-06
## wF_010          1.000000e+00        4.194876e-08
## wM_011          8.512437e-10        1.000000e+00
## wM_012          2.598071e-08        1.000000e+00
## wM_013          8.219362e-03        9.917806e-01
## wF_014          9.962255e-01        3.774474e-03
## wF_015          1.000000e+00        9.257643e-09
## wF_016          9.999988e-01        1.160460e-06
## wM_017          6.658316e-06        9.999933e-01
## wM_018          8.437392e-09        1.000000e+00
## wM_019          8.073413e-07        9.999992e-01
## wM_020          5.822417e-05        9.999418e-01
## wM_021          2.762182e-08        1.000000e+00
## wF_022          1.000000e+00        6.621931e-10
## wF_023          1.000000e+00        4.320536e-13
## wF_024          1.000000e+00        1.794883e-08
## wF_025          1.000000e+00        1.584084e-10
## wF_026          1.000000e+00        7.570534e-14
## wM_027          9.267739e-09        1.000000e+00
## wM_028          1.940262e-09        1.000000e+00
## wM_029          6.502148e-07        9.999993e-01
## wF_030          9.999987e-01        1.261775e-06
## wF_031          1.000000e+00        1.969323e-09
## wF_032          1.000000e+00        5.275432e-09
## wM_033          1.773692e-09        1.000000e+00
## wM_034          2.134195e-09        1.000000e+00
## wF_036          1.000000e+00        2.398871e-09
## wF_037          1.000000e+00        3.711564e-10
## wF_038          9.999891e-01        1.087818e-05
## wM_039          3.888605e-07        9.999996e-01
## wF_040          1.000000e+00        2.671013e-08
## mM_077          3.588627e-03        9.964114e-01
## mM_078          5.660816e-07        9.999994e-01
## mM_079          1.493111e-08        1.000000e+00
## mF_080          9.999999e-01        5.783844e-08
## mF_081          1.000000e+00        6.672721e-12
## mF_082          1.000000e+00        3.614085e-13
## mM_101          1.915756e-12        1.000000e+00
## mM_103          1.053338e-10        1.000000e+00
## mF_104          9.999964e-01        3.620401e-06
## mF_105          1.000000e+00        2.301161e-11
## mF_106          1.000000e+00        1.873175e-08
## mM_173          2.709685e-12        1.000000e+00
## mM_174          1.549463e-10        1.000000e+00
## mM_175          5.340681e-07        9.999995e-01
## mF_176          1.000000e+00        9.745647e-09
## mF_177          9.999999e-01        8.112066e-08
## mF_178          9.998437e-01        1.563202e-04
## mM_203          4.160209e-13        1.000000e+00
## mM_204          1.417789e-12        1.000000e+00
## mM_205          2.001426e-07        9.999998e-01
## mF_206          9.999770e-01        2.304964e-05
## mF_207          1.000000e+00        1.314364e-10
## mF_208          1.000000e+00        1.479503e-12
## mM_209          4.067581e-12        1.000000e+00
## mM_210          1.725894e-10        1.000000e+00
## mM_211          3.704410e-11        1.000000e+00
## mF_212          9.999998e-01        1.756915e-07
## mF_213          9.999994e-01        6.128309e-07
## mF_214          1.000000e+00        1.263761e-10

# Remove unuseful object for the next steps
rm(predsoplsda, nlvoptsoplsda)

Reproducibility

This vignette was produced with the following R session configuration.

sessionInfo()

## R version 4.6.1 (2026-06-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 26.04 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rchemo_0.1-4 knitr_1.51  
## 
## loaded via a namespace (and not attached):
##  [1] cli_3.6.6         rlang_1.2.0       xfun_0.59         jsonlite_2.0.0   
##  [5] data.table_1.18.4 buildtools_1.0.0  htmltools_0.5.9   maketools_1.3.2  
##  [9] e1071_1.7-17      sys_3.4.3         sass_0.4.10       rmarkdown_2.31   
## [13] evaluate_1.0.5    jquerylib_0.1.4   MASS_7.3-65       fastmap_1.2.0    
## [17] yaml_2.3.12       lifecycle_1.0.5   FNN_1.1.4.1       compiler_4.6.1   
## [21] digest_0.6.39     signal_1.8-1      R6_2.6.1          class_7.3-23     
## [25] bslib_0.11.0      tools_4.6.1       proxy_0.4-29      cachem_1.1.0