MIMIC SKEW-NORMAL MODEL FOR DIFFERENTIAL ANALYSIS OF ANTIBODYOME DATA: APPLICATION TO MAJOR DEPRESSION
Antibodyome; Differential Analysis; Latent Variables; MIMIC Model; Skew-Normal Distribution; Bayesian Inference; Protein Microarray; Depression; Bioinformatics.
Differential analysis of antibodyome data (antibody reactivity profiles obtained from high-density protein microarrays) lacks dedicated multivariate methods and relies predominantly on univariate tests (𝑡-test, Wilcoxon) with multiple testing correction, an approach that frequently fails to detect biologically relevant but modest signals. This work proposes the MIMIC (Multiple Indicators Multiple Causes) model with a univariate skew-normal extension as an alternative. Unlike conventional methods that test each protein independently, the model aggregates information from multiple reactivities into a single latent variable, so that group differences are tested once on that variable rather than once per protein; this eliminates the need for multiple testing correction and increases the signal-to-noise ratio. The skew-normal extension, in its centered parameterization, allows latent asymmetry to be investigated while preserving the direct interpretation of the regression coefficients. Four estimation methods were developed and compared: Normal MIMIC (baseline), maximum likelihood estimation (MLE) with Laplace approximation, Bayesian estimation via MCMC, and a Hybrid method combining Bayesian inference for the asymmetry parameter 𝜆 with MLE for the remaining parameters. Monte Carlo simulations showed that MLE for 𝜆 is unstable at sample sizes around 𝑛 = 60 (RMSE 3.2–8.2; coverage 50–93%), whereas the Bayesian and Hybrid methods maintain nominal coverage (∼95%). Variable selection follows a two-stage pipeline: a univariate pre-filter (a dimensionality reduction heuristic) followed by a genetic algorithm with cross-validation. In the application to antibodyome data (HuProt microarray) from patients with major depression (𝑛 = 60), univariate analysis with FDR correction failed to identify any significant protein among the 14,887 tested, while the MIMIC model detected a significant difference in the latent variable (𝑝 < 0.001) for both immunoglobulin channels (IgG and IgM).
Credible intervals for 𝜆 contain zero in both channels, indicating that the skew-normal extension does not add information in this specific case. The genetic algorithm, guided exclusively by statistical criteria, converged to proteins with documented neuropsychiatric connections (RPL30, DNASE1L3, PARK7), although the biological plausibility assessment is based on a narrative review and should be complemented by formal functional enrichment analysis. The central contribution is the empirical demonstration that latent variable-based differential analysis recovers patterns in antibodyome data where univariate methods fail.
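As an illustration of why a centered parameterization preserves the interpretation of the regression coefficients, the following minimal sketch simulates a latent variable with a skew-normal disturbance rescaled to mean 0 and variance 1. It is not the authors' implementation: it assumes that 𝜆 is the usual skew-normal shape parameter, and the function names, the coefficient value 0.8, the value 𝜆 = 3, and the group sizes are all hypothetical choices made for the example.

```python
import math
import random


def centered_to_direct(mean, sd, lam):
    """Map a centered (mean, sd, lambda) triple to direct skew-normal (xi, omega)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    mu_z = delta * math.sqrt(2.0 / math.pi)    # mean of the standard SN(0, 1, lambda)
    omega = sd / math.sqrt(1.0 - mu_z * mu_z)  # scale that yields the requested sd
    xi = mean - omega * mu_z                   # location that yields the requested mean
    return xi, omega


def sample_skew_normal(xi, omega, lam, rng):
    """Draw one skew-normal value via the stochastic representation
    Z = delta * |U0| + sqrt(1 - delta^2) * U1, with U0, U1 independent N(0, 1)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    u0, u1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z = delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1
    return xi + omega * z


def simulate_latent(group, gamma, lam, rng):
    """Structural part of a MIMIC-style model: eta_i = gamma * group_i + zeta_i,
    where zeta_i is skew-normal with mean 0 and variance 1 (hypothetical setup)."""
    xi, omega = centered_to_direct(0.0, 1.0, lam)
    return [gamma * g + sample_skew_normal(xi, omega, lam, rng) for g in group]


rng = random.Random(0)
n_per_group = 20000                      # large only to make the averages stable
group = [0] * n_per_group + [1] * n_per_group
eta = simulate_latent(group, gamma=0.8, lam=3.0, rng=rng)

mean0 = sum(eta[:n_per_group]) / n_per_group
mean1 = sum(eta[n_per_group:]) / n_per_group
# The difference in latent means estimates gamma, whatever the value of lambda.
print(round(mean1 - mean0, 2))
```

Because the disturbance has mean zero for every 𝜆, the coefficient on the group indicator remains the difference in latent means; under the direct parameterization the same coefficient would be entangled with a 𝜆-dependent shift in location.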