Multi-Block PLS Discriminant Analysis

The Multi-Block PLS Discriminant Analysis method

Metabolomics is a powerful phenotyping tool, generating complex data that need dedicated treatments to enrich knowledge of biological systems. In particular, to investigate relations between experimental factors, phenotypes and metabolism, discriminant statistical analyses are generally performed separately on metabolomic datasets, complemented by associations with metadata. Another relevant strategy is to simultaneously analyse thematic data blocks by a multi-block partial least squares discriminant analysis (MBPLSDA) allowing determining the importance of variables of the different data blocks in discriminating groups of observations, taking into account data structure.

References:

  • Wold, S. (Ed.). (1984). Three PLS algorithms according to SW. In Report from the symposium MULTDAST (multivariate data analysis in science and technology) (pp. 26–30). Umeå, Sweden

  • Westerhuis, J. A., Kourti, T., & Macgregor, J. F. (1998). Analysis of multi block and hierarchical PCA and PLS models. Journal of Chemometrics, 12, 301–321.

A selection of reduced non-targeted metabolomics datasets from an article of Zhang et al. have been used to illustrate the method application. (Zhang, Y.; Barupal, D.K.; Fan, S.; Gao, B.; Zhu, C.; Flenniken, A.M.; McKerlie, C.; Nutter, L.M.J.; Lloyd, K.C.K.; Fiehn, O. Sexual Dimorphism of the Mouse Plasma Metabolome Is Associated with Phenotypes of 30 Gene Knockout Lines. Metabolites 2023, 13, 947. https://doi.org/10.3390/metabo13080947).

In this study, plasma were analyzed by several metabolomics platforms.

In our example, the 6 samples from 5 mutant groups (Dhfr, Gnpda1, Plk1, Sra1, Ulk3) and the 40 controls were retained, with the exception of 2 animals that had missing values in non-targeted metabolomics. Three HILIC-NEG and 2 HILIC-POS variables were then removed due to missing or infinite values. Lastly, we kept only 2 data blocks (GCTOF and HILIC POS) to reduce the computation time. Always in our example, the objective was to highlight metabolites discriminating males and females.

R environment preparation

#install.packages("knitr")
library(knitr)
opts_chunk$set(echo = TRUE)

# To ensure reproducibility
set.seed(12)
pkgs <- c("rchemo")
sapply(pkgs, function(x) {
  if (!requireNamespace(x, quietly = TRUE)) {
    install.packages(x)
  }
})
library(rchemo)  # to load rchemo

Data preprocessing

A demonstration dataset used for this example is imported. It contains 2 metabolomics datasets and 1 sample metadata block, with the groups to be discriminated.

data(Zhang2023, package = "rchemo")

All data blocks should have exactly the same samples (rows). The block dimension can be checked with the following command:

# Check dimension
BlockNames <- names(Zhang2023)
nbrBlocs <- length(BlockNames)
dims <- lapply(X=Zhang2023[BlockNames], FUN=dim)
names(dims) <- BlockNames
dims
## $CHSNEG
## [1]  68 163
## 
## $CHSPOS
## [1]  68 288
## 
## $GCTOF
## [1]  68 108
## 
## $HILICPOS
## [1]  68 133
## 
## $HILICNEG
## [1] 68 44
## 
## $metadata
## [1] 68  2
# Remove unuseful object for the next steps
rm(nbrBlocs, dims)

The identical order of samples in the three blocks should be ensured.

# Check rows names in any order
row_names <- lapply(X=Zhang2023[BlockNames], FUN=rownames)
rns <- do.call(cbind, row_names)
rns.unique <- apply(rns, 1, function(x) length(unique(x)))
if (max(rns.unique) > 1) {
  stop("Rows names are not identical between blocks.")
}

# Check order of samples
check_row_names <- all(sapply(X=row_names, FUN=identical, y = row_names[[1]]))
if (!check_row_names && max(rns.unique) == 1) {
  print("Rows names are not in the same order for all blocks.")
}

# Remove unuseful object for the next steps
rm(row_names, rns, rns.unique, check_row_names)
GCTOF <- Zhang2023$GCTOF
HILICPOS <- Zhang2023$HILICPOS
sample_metadata <- Zhang2023$metadata
rm(Zhang2023)

Data visualization by PCA on separate data blocks

PCA on different datablocks is necessary to verify that there is no outlier, and that the data scaling is suited. In our example, unit variance scaling is suited: For both metabolomic dataset, the scatterplot of loadings is homogeneous, and there are no outliers. With other datasets, particularly those containing more noise, it would have been necessary to apply a log transformation with Pareto scaling.

# GCTOF with unit variance scaling
pcaGCTOF <- pcanipalsna(X = scale(GCTOF[,1:dim(GCTOF)[2]]), nlv = nrow(GCTOF), 
                         gs = TRUE,
                         tol = .Machine$double.eps^0.5, maxit = 200)

## diagram of explained variance
barplot(summary(pcaGCTOF,X = scale(GCTOF[,1:dim(GCTOF)[2]]))$explvar$pvar * 100, names.arg = 1:nrow(GCTOF), main = "diagram of explained variance - PCA GCTOF")

## score plot
plotxy(X= pcaGCTOF$T, group = sample_metadata$Gender, 
       asp = 0, col = 3:4, alpha.f = .8,
       zeroes = TRUE, circle = FALSE, ellipse = FALSE,
       labels = FALSE,
       legend = TRUE, main = "components - PCA GCTOF", ncol = 1,
       pch=16)

## loading plot
plotxy(X= pcaGCTOF$P, group = NULL, 
       asp = 0, col = NULL, alpha.f = .8,
       zeroes = TRUE, circle = FALSE, ellipse = FALSE,
       labels = TRUE,
       legend = FALSE, main = "loadings - PCA GCTOF", ncol = 1,
       cex=0.8) 

#HILICPOS with unit variance scaling
pcaHILICPOS <- pcanipalsna(X = scale(HILICPOS[,1:dim(HILICPOS)[2]]), nlv = nrow(HILICPOS), 
                         gs = TRUE,
                         tol = .Machine$double.eps^0.5, maxit = 200)

## diagram of explained variance
barplot(summary(pcaHILICPOS,X = scale(HILICPOS[,1:dim(HILICPOS)[2]]))$explvar$pvar * 100, names.arg = 1:nrow(HILICPOS), main = "diagram of explained variance - PCA HILIC POS")

## score plot
plotxy(X= pcaHILICPOS$T, group = sample_metadata$Gender, 
       asp = 0, col = 3:4, alpha.f = .8,
       zeroes = TRUE, circle = FALSE, ellipse = FALSE,
       labels = FALSE,
       legend = TRUE, main = "components - PCA HILIC POS", ncol = 1,
       pch=16)

## loading plot
plotxy(X= pcaHILICPOS$P, group = NULL, 
       asp = 0, col = NULL, alpha.f = .8,
       zeroes = TRUE, circle = FALSE, ellipse = FALSE,
       labels = TRUE,
       legend = FALSE, main = "loadings - PCA HILIC POS", ncol = 1,
       cex=0.8)

# Remove unuseful object for the next steps
rm(pcaGCTOF, pcaHILICPOS)

MB-PLS-DA model

In MB-PLS-DA, datablocks are separatly scaled, and weigthed by their Frobenius norm. Then, they are concatenated before a PLS-DA to predict or explain a categorical variable. The main steps to apply this method are:

  • the determination of the optimal number of latent variables of the model, proposed in the package by cross-validation

  • the model validation, proposed by permutation test

  • the model analysis, based on the scatter plot, the loading plot, the Variable Importance in Projection

  • the calculation of the predicted categories for each observation.

In rchemo, all these steps can be performed with a single function. The arguments are:

  • Xlist: list of training X-data (n, p).

  • Xnames: names of the X-matrices

  • Xscaling: vector of Xlist length. X variable scaling among “none” (mean-centering only), “pareto” (mean-centering and pareto scaling), “sd” (mean-centering and unit variance scaling). If “pareto” or “sd”, uncorrected standard deviation is used.

  • Y: Training Y-data (n, 1)

  • Yscaling: Y variable scaling among “none” (mean-centering only), “pareto” (mean-centering and pareto scaling), “sd” (mean-centering and unit variance scaling). If “pareto” or “sd”, uncorrected standard deviation is used.

  • weights Weights (n, 1) to apply to the training observations. Internally, weights are “normalized” to sum to 1. Default to NULL (weights are set to 1/n).

  • newXlist: list of new X-data (m, p) to consider.

  • newXnames: names of the newX-matrices

  • method: method to apply among “mbplsrda”,“mbplslda”,“mbplsqda”

  • prior: for mbplslda or mbplsqda models : The prior probabilities of the classes. Possible values are “unif” (default; probabilities are set equal for all the classes) or “prop” (probabilities are set equal to the observed proportions of the classes in y).

  • step: step of the analysis among “nlvtest” (cross-validation to help determine the optimal number of latent variables), “permutation” (permutation test), “model” (model calculation),“prediction” (prediction of newX-data or X-data if any))

  • nlv: number of latent variables to test if step is “nlvtest”; number of latent variables of the model if step is not “nlvtest”.

  • modeloutput: if step is “model”: outputs among “scores”, “loadings”, “coef” (regression coefficients), “vip” (Variable Importance in Projection; the VIP calculation being based on the proportion of Y-variance explained by the components, as proposed by Mehmood et al (2012, 2020).)

  • cvmethod: if step is “nlvtest” or “permutation”: “kfolds” for k-folds cross-validation, or “loo” for leave-one-out.

  • nbrep: if step is “nlvtest” and cvmethod is “kfolds”: An integer, setting the number of CV repetitions. Default value is 30. Must me set to 1 if cvmethod is “loo”

  • seed: if step is “nlvtest” and cvmethod is “kfolds”, or if step is “permutation: a numeric. Seed used for the repeated resampling

  • samplingk: A vector of length n. The elements are the values of a qualitative variable used for stratified partition creation. If NULL, the first observation is set in the first fold, the second observation in the second fold, etc…

  • nfolds: if cvmethod is “kfolds”. An integer, setting the number of partitions to create.Default value is 10.

  • npermut: if step is “permutation”: An integer, setting the number of Y-Block with permutated responses to create. Default value is 30.

  • criterion: if step is “nlvtest” or “permutation” and method is “mbplsrda”, “mbplslda” or “mbplsqda”: optimisation criterion among “rmse” and “err” (for classification error rate)))

  • selection: if step is “nlvtest”: a character indicating the selection method to use to choose the optimal combination of components, among “localmin”, “globalmin”,“1std”. If “localmin”: the optimal combination corresponds to the first local minimum of the mean CV rmse or error rate. If “globalmin” : the optimal combination corresponds to the minimum mean CV rmse or error rate. If “1std” (one standard error rule) : it corresponds to the first combination after which the mean cross-validated rmse or error rate does not decrease significantly.

determination by cross-validation of the optimal number of latent variables of the model

In this example,the mbplslda method with an uniform prior is performed: a linear discriminant analysis is applied on the PLS scores to obtain the classifications.

In the mbplsrda method, the classification step is performed according to the predicted value, and consequently, it doesn’t take into account the variance of the scatter plot. And the mbplsqda method is mode suited when the scatter plot has an atypical shape.

The cvmethod is to choose according to the number of observations. If this number is small, a leave-one-out cross-validation is more suited, but can lead to optimistic results. In the other cases, it could be better to apply a k-fold cross-validation, usually with 3, 5 or 10 folds. In order to reduce the computational time, in this example, the number of cross- validation repetition is set to 10. But it must be higher (at least 30).

Concerning the criterion, in a discriminant context, it is more logical to choose the classification error rate. However, this can result in an error plot depending on the number of latent variables that is not smooth, so some prefer to use rmsecv instead.

In the example, the selection parameter is set to “localmin”. Normally, the cross-validated error decreases until it reaches the optimal number of latent variables, before rising again. Sometimes it fluctuates before this rise, so choosing “localmin” allows for a more parsimonious model. The “1std” option is also very parsimonious because it only considers a larger number of latent variables if doing so significantly improves the cross-validated error.

nlvtestmbplsda <- mbplsr_mbplsda_allsteps(Xlist = list(GCTOF = GCTOF[,1:dim(GCTOF)[2]], 
                                                       HILICPOS = HILICPOS[,1:dim(HILICPOS)[2]]),
                                          Xnames = c("GCTOF", "HILICPOS"), 
                                          Xscaling = c("none","pareto","sd")[3], 
                                          Y = sample_metadata[,"Gender", drop=FALSE], 
                                          Yscaling = c("none","pareto","sd")[1], 
                                          weights = NULL,
                                          newXlist = NULL, newXnames = NULL,
                                          method = c("mbplsrda","mbplslda","mbplsqda")[2],
                                          prior = c("unif", "prop")[1],
                                          step = c("nlvtest","permutation","model","prediction")[1],
                                          nlv = 4, 
                                          #modeloutput = c("scores","loadings","coef","vip"), 
                                          cvmethod = c("kfolds","loo")[1], 
                                          nbrep = 10, 
                                          seed = 123, 
                                          samplingk = NULL, 
                                          nfolds = 3, 
                                          #npermut = 30, 
                                          
                                          criterion = c("err","rmse")[1], 
                                          selection = c("localmin","globalmin","1std")[1],
                                          
                                          outputfilename = NULL)

nlvtestmbplsda
##   nblv    err_mean     err_sd optimum
## 1    0 0.560294118 0.05162209       0
## 2    1 0.039705882 0.02689386       0
## 3    2 0.008823529 0.01028244       1
## 4    3 0.008823529 0.01028244       0
## 5    4 0.008823529 0.01028244       0
nlvoptmbplsda <- nlvtestmbplsda[nlvtestmbplsda$optimum==1,"nblv"] # to obtain the optimal number of LV.

# to plot the results of the cross-validation
plot(nlvtestmbplsda$nblv, nlvtestmbplsda$err_mean, xlab = "number of LV", ylab = "CV classification error rate", pch = 16, ylim = c(0,0.6))
segments(nlvtestmbplsda$nblv,nlvtestmbplsda$err_mean-nlvtestmbplsda$err_sd,nlvtestmbplsda$nblv,nlvtestmbplsda$err_mean+nlvtestmbplsda$err_sd)
segments(nlvtestmbplsda$nblv-0.1,nlvtestmbplsda$err_mean-nlvtestmbplsda$err_sd,nlvtestmbplsda$nblv+0.1,nlvtestmbplsda$err_mean-nlvtestmbplsda$err_sd)
segments(nlvtestmbplsda$nblv-0.1,nlvtestmbplsda$err_mean+nlvtestmbplsda$err_sd,nlvtestmbplsda$nblv+0.1,nlvtestmbplsda$err_mean+nlvtestmbplsda$err_sd)

# Remove unuseful object for the next steps
rm(nlvtestmbplsda)

model validation by permutation test

Usually, the cross-validation parameters are the same than for the determination of the optimal number of latent variables. In this example, as for the determination of the optimal number of latent variables, in order to reduce the computational time, the number of cross- validation repetition and the number of permuted responses are set to 10. But they must be higher (at least 30). To be valid, all the cross-validated errors obtained on permuted data must be lower than the cross-validated error obtained on the original data.

permutmbplsda <- mbplsr_mbplsda_allsteps(Xlist = list(GCTOF = GCTOF[,1:dim(GCTOF)[2]], 
                                                      HILICPOS = HILICPOS[,1:dim(HILICPOS)[2]]), 
                                         Xnames = c("GCTOF", "HILICPOS"), 
                                         Xscaling = c("none","pareto","sd")[3], 
                                         Y = sample_metadata[,"Gender",drop=FALSE], 
                                         Yscaling = c("none","pareto","sd")[1], weights = NULL,
                                         newXlist = NULL, newXnames = NULL,
                                         
                                         method = c("mbplsrda","mbplslda","mbplsqda")[2],
                                         prior = c("unif", "prop")[1],
                                         
                                         step = c("nlvtest","permutation","model","prediction")[2],
                                         nlv = nlvoptmbplsda, 
                                         modeloutput = c("scores","loadings","coef","vip"), 
                                         
                                         cvmethod = c("kfolds","loo")[1], 
                                         nbrep = 10, 
                                         seed = 123, 
                                         samplingk = NULL, 
                                         nfolds = 3, 
                                         npermut = 10, 
                                         
                                         criterion = c("err","rmse")[1], 
                                         # selection = c("localmin","globalmin","1std")[1],
                                         
                                         import = c("R","ChemFlow","W4M")[1],
                                         outputfilename = NULL)

#plot of the results
plot(permutmbplsda, pch = 16, ylab = "CV classification error rate", xlab = "dyssimilarity Y-Ypermuted")

# Remove unuseful object for the next steps
rm(permutmbplsda)

model analysis, based on the scatter plot, the loading plot, the Variable Importance in Projection

The score plot allows to show the group discrimination. Based on the VIP curve, the threshold to consider that a variable is important can be set.

modelmbplsda <- mbplsr_mbplsda_allsteps(Xlist = list(GCTOF = GCTOF[,1:dim(GCTOF)[2]], 
                                                     HILICPOS = HILICPOS[,1:dim(HILICPOS)[2]]), 
                                        Xnames = c("GCTOF", "HILICPOS"), 
                                        Xscaling = c("none","pareto","sd")[3], 
                                        Y = sample_metadata[,"Gender",drop=FALSE], 
                                        Yscaling = c("none","pareto","sd")[1], 
                                        weights = NULL,
                                        newXlist = NULL, newXnames = NULL,
                                        
                                        method = c("mbplsrda","mbplslda","mbplsqda")[2],
                                        prior = c("unif", "prop")[1],
                                        
                                        step = c("nlvtest","permutation","model","prediction")[3],
                                        nlv = nlvoptmbplsda, 
                                        modeloutput = c("scores","loadings","coef","vip"), 
                                        
                                        cvmethod = c("kfolds","loo")[1], 
                                        # nbrep = 30, 
                                        # seed = 123, 
                                        # samplingk = NULL, 
                                        # nfolds = 5, 
                                        # npermut = 30, 
                                        
                                        # criterion = c("err","rmse")[1], 
                                        # selection = c("localmin","globalmin","1std")[1],
                                        
                                        import = c("R","ChemFlow","W4M")[1],
                                        outputfilename = NULL)

# score plot
plotxy(X= modelmbplsda$scores, group = sample_metadata$Gender, 
       asp = 0, col = 3:4, alpha.f = .8,
       zeroes = TRUE, circle = FALSE, ellipse = FALSE,
       labels = FALSE,
       legend = TRUE, main = "scores - MB PLS DA", ncol = 1,
       pch=16)

# loading plot
plotxy(X=  modelmbplsda$loadings, group = substr(rownames(modelmbplsda$loadings),1,6), 
       asp = 0, col = NULL, alpha.f = .8,
       zeroes = TRUE, circle = FALSE, ellipse = FALSE,
       labels = FALSE,
       legend = TRUE, main = "loadings - MB PLS DA", ncol = 1,
       cex=0.8, pch = 16)

# VIP curve
plot(modelmbplsda$vip[order(modelmbplsda$vip[,nlvoptmbplsda], decreasing = TRUE),nlvoptmbplsda], pch = 16,cex = 0.8,
     col = as.numeric(as.factor(substr(rownames(modelmbplsda$vip[order(modelmbplsda$vip[,nlvoptmbplsda], decreasing = TRUE),nlvoptmbplsda, drop=FALSE]),1,6))), ylab = "VIP value",
     main = "VIP curve MB PLS DA")
legend("topright", legend = c("GCTOF", "HILICPOS"), pch = 16, col = 1:2)

# Remove unuseful object for the next steps
rm(modelmbplsda)

predicted values

The prediction step provides a table with:

  • subjects scores

  • predicted categories

  • probabilities to be predicted for each category.

predmbplsda <- mbplsr_mbplsda_allsteps(Xlist = list(GCTOF = GCTOF[,1:dim(GCTOF)[2]], 
                                                    HILICPOS = HILICPOS[,1:dim(HILICPOS)[2]]), 
                                        Xnames = c("GCTOF", "HILICPOS"), 
                                        Xscaling = c("none","pareto","sd")[3], 
                                        Y = sample_metadata[,"Gender",drop=FALSE], 
                                       Yscaling = c("none","pareto","sd")[1], 
                                       weights = NULL,
                                        newXlist = NULL, newXnames = NULL,
                                        
                                        method = c("mbplsrda","mbplslda","mbplsqda")[2],
                                        prior = c("unif", "prop")[1],
                                        
                                        step = c("nlvtest","permutation","model","prediction")[4],
                                        nlv = nlvoptmbplsda, 
                                        # modeloutput = c("scores","loadings","coef","vip"), 
                                        # 
                                        # cvmethod = c("kfolds","loo")[1], 
                                        # nbrep = 30, 
                                        # seed = 123, 
                                        # samplingk = NULL, 
                                        # nfolds = 5, 
                                        # npermut = 30, 
                                        # 
                                        # criterion = c("err","rmse")[1], 
                                        # selection = c("localmin","globalmin","1std")[1],
                                        
                                        import = c("R","ChemFlow","W4M")[1],
                                        outputfilename = NULL)

predmbplsda
##                lv1           lv2 pred.y1 pred.posterior.Female
## wM_001  0.54558754 -0.3480701786    Male          3.021370e-22
## wM_002  0.27672623  0.5525506059    Male          6.644988e-23
## wF_003 -0.18953500 -0.6066209461  Female          1.000000e+00
## wF_004 -0.25792171 -0.1626404652  Female          1.000000e+00
## wM_005  0.56744443  0.1747950458    Male          1.129001e-31
## wM_006  0.41047657  0.3127838504    Male          6.872642e-26
## wM_007  0.62432444  0.0665651227    Male          7.633840e-33
## wF_008 -0.26793088 -0.5210826748  Female          1.000000e+00
## wF_009 -0.43812743  0.7033443778  Female          1.000000e+00
## wF_010 -0.36240950 -0.1472302702  Female          1.000000e+00
## wM_011  0.68346912 -0.3273525088    Male          1.385751e-29
## wM_012  0.45290904 -0.0350832487    Male          1.633604e-22
## wM_013  0.24736228 -0.1652307834    Male          5.393351e-10
## wF_014 -0.20535172 -0.7695462248  Female          1.000000e+00
## wF_015 -0.25984530 -0.4444141502  Female          1.000000e+00
## wF_016 -0.78621166  1.1800474687  Female          1.000000e+00
## wM_017  0.37249331  0.0913761085    Male          1.947827e-20
## wM_018  0.36660722  0.2772086830    Male          4.295314e-23
## wM_019  0.33774376  0.1574541099    Male          1.011542e-19
## wM_020  0.34257086 -0.0126229122    Male          2.915889e-17
## wM_021  0.42810439  0.1353859046    Male          5.789615e-24
## wF_022 -0.42117471 -0.1965165871  Female          1.000000e+00
## wF_023 -0.43007949 -0.0829182271  Female          1.000000e+00
## wF_024 -0.30443954 -0.3161993062  Female          1.000000e+00
## wF_025 -0.27525802 -0.4767367782  Female          1.000000e+00
## wF_026 -0.42887830 -0.3696893442  Female          1.000000e+00
## wM_027  0.23855883  0.7513316279    Male          3.989847e-24
## wM_028  0.54189604 -0.1226002419    Male          1.206804e-25
## wM_029  0.47860932  0.1794120552    Male          3.126452e-27
## wF_030 -0.30062152 -0.3746063899  Female          1.000000e+00
## wF_031 -0.63819763  0.1374083309  Female          1.000000e+00
## wF_032 -0.51121074  0.2163930345  Female          1.000000e+00
## wM_033  0.43275430 -0.1176136958    Male          3.551174e-20
## wM_034  0.56214830 -0.1994148151    Male          1.878976e-25
## wF_036 -0.34102658 -0.5445495665  Female          1.000000e+00
## wF_037 -0.24651172 -0.7841320956  Female          1.000000e+00
## wF_038 -0.15350386 -0.3413986114  Female          1.000000e+00
## wM_039  0.24214357  0.2142606496    Male          9.143881e-16
## wF_040 -0.15144467 -1.0875471713  Female          1.000000e+00
## mM_077  0.45765804  0.2197480622    Male          8.289178e-27
## mM_078  0.51978319  0.0272823009    Male          6.633571e-27
## mM_079  0.40675803  0.3043368636    Male          1.447116e-25
## mF_080 -0.46775285 -0.0005502785  Female          1.000000e+00
## mF_081 -0.77371976  0.7389887822  Female          1.000000e+00
## mF_082 -0.73247181  0.7521846206  Female          1.000000e+00
## mM_101  0.59862750 -0.2587550942    Male          2.308859e-26
## mM_103 -0.01715897  1.5414920887    Male          1.076581e-23
## mF_104 -0.38675396 -0.1153383479  Female          1.000000e+00
## mF_105 -0.53988268  0.1643913533  Female          1.000000e+00
## mF_106 -0.43159893 -0.0493457028  Female          1.000000e+00
## mM_173  0.28114216  0.5823233714    Male          1.331882e-23
## mM_174  0.56080465 -0.3801265587    Male          1.646418e-22
## mM_175  0.38261063 -0.0586969348    Male          1.452927e-18
## mF_176 -0.48741429  0.0046199111  Female          1.000000e+00
## mF_177 -0.58208729  0.1995090610  Female          1.000000e+00
## mF_178 -0.49321891  0.2434577064  Female          1.000000e+00
## mM_203  0.56597280 -0.0499160055    Male          5.030964e-28
## mM_204  0.47625034 -0.1217012783    Male          2.537079e-22
## mM_205  0.50231341  0.1538464184    Male          4.974237e-28
## mF_206 -0.11621764 -0.3973940955  Female          1.000000e+00
## mF_207 -0.51776158 -0.0145000918  Female          1.000000e+00
## mF_208 -0.47720999 -0.0883721908  Female          1.000000e+00
## mM_209  0.41289099  0.2703443875    Male          2.451390e-25
## mM_210  0.33125672  0.3419254886    Male          2.515514e-22
## mM_211  0.33599218  0.4426191774    Male          3.616769e-24
## mF_212 -0.34935922  0.1392809556  Female          1.000000e+00
## mF_213 -0.41441692 -0.1738904810  Female          1.000000e+00
## mF_214 -0.22728541 -1.0142632710  Female          1.000000e+00
##        pred.posterior.Male
## wM_001        1.000000e+00
## wM_002        1.000000e+00
## wF_003        1.092976e-20
## wF_004        4.207508e-17
## wM_005        1.000000e+00
## wM_006        1.000000e+00
## wM_007        1.000000e+00
## wF_008        2.594243e-23
## wF_009        1.727822e-12
## wF_010        3.611270e-22
## wM_011        1.000000e+00
## wM_012        1.000000e+00
## wM_013        1.000000e+00
## wF_014        4.397232e-24
## wF_015        1.107853e-21
## wF_016        1.334751e-22
## wM_017        1.000000e+00
## wM_018        1.000000e+00
## wM_019        1.000000e+00
## wM_020        1.000000e+00
## wM_021        1.000000e+00
## wF_022        6.116427e-26
## wF_023        1.382545e-24
## wF_024        6.560388e-22
## wF_025        5.583315e-23
## wF_026        4.370119e-29
## wM_027        1.000000e+00
## wM_028        1.000000e+00
## wM_029        1.000000e+00
## wF_030        1.207860e-22
## wF_031        1.163407e-31
## wF_032        5.986934e-24
## wM_033        1.000000e+00
## wM_034        1.000000e+00
## wF_036        2.113745e-27
## wF_037        2.083913e-26
## wF_038        1.226588e-14
## wM_039        1.000000e+00
## wF_040        2.115853e-26
## mM_077        1.000000e+00
## mM_078        1.000000e+00
## mM_079        1.000000e+00
## mF_080        3.433359e-25
## mF_081        5.560755e-29
## mF_082        1.126667e-26
## mM_101        1.000000e+00
## mM_103        1.000000e+00
## mF_104        6.720559e-23
## mF_105        3.108831e-26
## mF_106        3.957580e-24
## mM_173        1.000000e+00
## mM_174        1.000000e+00
## mM_175        1.000000e+00
## mF_176        4.154279e-26
## mF_177        8.049299e-28
## mF_178        1.325162e-22
## mM_203        1.000000e+00
## mM_204        1.000000e+00
## mM_205        1.000000e+00
## mF_206        1.240074e-13
## mF_207        5.911871e-28
## mF_208        4.551203e-27
## mM_209        1.000000e+00
## mM_210        1.000000e+00
## mM_211        1.000000e+00
## mF_212        6.000164e-17
## mF_213        3.089580e-25
## mF_214        4.323746e-29
# Remove unuseful object for the next steps
rm(predmbplsda, nlvoptmbplsda)

Reproducibility

This vignette was produced with the following R session configuration.

sessionInfo()
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rchemo_0.1-4 knitr_1.51  
## 
## loaded via a namespace (and not attached):
##  [1] cli_3.6.6         rlang_1.2.0       xfun_0.58         jsonlite_2.0.0   
##  [5] data.table_1.18.4 buildtools_1.0.0  htmltools_0.5.9   maketools_1.3.2  
##  [9] e1071_1.7-17      sys_3.4.3         sass_0.4.10       rmarkdown_2.31   
## [13] evaluate_1.0.5    jquerylib_0.1.4   MASS_7.3-65       fastmap_1.2.0    
## [17] yaml_2.3.12       lifecycle_1.0.5   FNN_1.1.4.1       compiler_4.6.0   
## [21] digest_0.6.39     signal_1.8-1      R6_2.6.1          class_7.3-23     
## [25] bslib_0.11.0      tools_4.6.0       proxy_0.4-29      cachem_1.1.0