Multi-Variable Expanded Latent Space Autoencoder for Image Classification Applied to Visual Monitoring of Coral Reefs and Navigation of Unmanned Surface Vehicles
Computer Vision, Image Analysis, Underwater Images, Multi-variable Autoencoder, Expanded Latent Space, Image Classification
We propose the Multi-Variable Expanded Latent Space Autoencoder (MVELSA) for classifying aquatic imagery, including the underwater domain, to support aquatic monitoring and autonomous navigation of Unmanned Surface Vehicles (USVs) in complex, obstacle-ridden scenarios. Our core hypothesis is that MVELSA can identify objects of interest with efficacy and precision comparable or superior to those of traditional convolutional models and Vision Transformers (ViT) by leveraging an expanded latent representation that preserves critical morphological features. To validate this, we use the public AQUA20 benchmark dataset, which consists of 20 underwater classes. Experimental results demonstrate that MVELSA, when used with Principal Component Analysis (PCA) and a Self-Organizing Map (SOM), achieves a macro-averaged F1 score of 0.97, outperforming baseline models in handling highly imbalanced data. Integrated into visual monitoring of coral reefs or into the navigation algorithms of a robotic sailboat, this system can support preservation missions and facilitate autonomous movement across oceans and lagoons with high reliability and minimal human intervention.
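As an illustration of why a macro-averaged F1 score is a sensible metric for highly imbalanced data, the following minimal sketch computes it in plain Python. It is not the evaluation code used for MVELSA; the function name `macro_f1` and the toy labels `"fish"` and `"coral"` are assumptions made for the example and are not taken from AQUA20.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class has no hits at all
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced toy set: 8 majority samples, 2 minority samples.
y_true = ["fish"] * 8 + ["coral"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["fish"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f} macro_f1={score:.2f}")
```

In this toy case the accuracy looks acceptable while the macro-averaged F1 is much lower, because the missed minority class contributes an F1 of zero with the same weight as the majority class.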