From Alarm to Diagnosis: Model Explainability and LLMs for Weakly Supervised Fault Classification in Hydropower Plants
Keywords: Predictive maintenance; weak supervision; failure diagnosis; explainable AI (XAI); SHAP; LIME; large language models; hydroelectric power plants; thrust bearing.
Modern industrial systems log anomalies at scale through alarms, threshold violations, and protection trips, yet rarely provide actionable diagnoses of the underlying failure mechanism. In hydroelectric power plants this gap is particularly critical: the volume of events exceeds the capacity of specialists to perform root-cause analysis, increasing the risk of unplanned outages and reactive decision-making. This thesis investigates the feasibility of transforming binary failure detections into plausible failure-mode hypotheses through an evidence-driven framework under weak supervision. The approach is organized into two macro stages. In the first stage, a supervised classifier is trained exclusively with binary labels derived from operational events. In the second stage, diagnostic refinement is performed from structured evidence: local feature-attribution signals (SHAP/LIME) and categorical percentile contextualization (very low to very high), which are provided to a Large Language Model (LLM) to synthesize diagnostic hypotheses traceable to the evidence. Framework plausibility is first validated on the synthetic AI4I 2020 Predictive Maintenance dataset, showing that local explanations and percentile-based contextualization can support zero-shot/few-shot failure-mode inference, with the best performance obtained when SHAP is combined with percentile contextualization. The framework is then applied to a real industrial case study at the Água Vermelha Hydroelectric Power Plant, focusing on thrust-bearing condition monitoring. Results confirm applicability under real-world constraints and show that cross-unit heterogeneity is the dominant factor in predictive performance, motivating a hybrid deployment strategy that pairs unit-specific models with a combined (all-unit) model as a robust contingency.
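The second-stage evidence construction described above can be illustrated with a minimal sketch. This is not the thesis's actual implementation: the feature names, percentile band boundaries, and evidence format below are hypothetical placeholders, and the per-feature attributions are assumed to have been precomputed (e.g., with a SHAP explainer) rather than derived here.

```python
import numpy as np

# Hypothetical monitoring features; names are illustrative, not the plant's real tags.
FEATURES = ["bearing_temp", "oil_temp", "vibration_rms", "active_power"]
BANDS = ["very low", "low", "medium", "high", "very high"]


def percentile_band(value, history):
    """Map a reading to a categorical band via its percentile in the unit's history."""
    pct = (np.asarray(history) < value).mean() * 100.0
    edges = [10, 30, 70, 90]  # illustrative band boundaries
    return BANDS[int(np.searchsorted(edges, pct, side="right"))]


def build_evidence(sample, attributions, history):
    """Assemble a structured, traceable evidence block to hand to the LLM prompt."""
    lines = []
    for name in FEATURES:
        lines.append(
            f"{name}: value={sample[name]:.1f}, "
            f"context={percentile_band(sample[name], history[name])}, "
            f"attribution={attributions[name]:+.3f}"
        )
    return "\n".join(lines)
```

A usage example: given a sample with an unusually hot bearing, the evidence block would pair the raw value with its band ("very high") and its attribution sign, so the LLM's hypothesis can be checked against each cited item.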
Explainability analysis reveals the dominance of thermal attributes and non-monotonic relationships between variables and predicted risk, reinforcing the need for instance-level explanations and contextualized evidence. LLM-based inference exhibits high consistency when the evidence is clear, and controlled divergence in ambiguous scenarios, which can be leveraged as a trigger for human review. Finally, a Weibull-based reliability analysis complements the operational discussion, providing quantitative support for maintenance planning and for prospective assessment of impacts on asset lifetime and availability.
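The Weibull-based reliability analysis mentioned above rests on the standard two-parameter model, R(t) = exp(-(t/η)^β) with MTTF = η·Γ(1 + 1/β). A minimal sketch of these quantities, with purely illustrative parameter values (the thesis's fitted shape and scale are not reproduced here):

```python
import math


def weibull_reliability(t, beta, eta):
    """Survival probability R(t) = exp(-(t/eta)**beta) for a two-parameter Weibull."""
    return math.exp(-((t / eta) ** beta))


def weibull_mttf(beta, eta):
    """Mean time to failure: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


# Illustrative parameters only: beta > 1 indicates wear-out behavior.
beta, eta = 2.0, 5000.0  # shape, scale (hours)
print(f"R(1000 h) = {weibull_reliability(1000.0, beta, eta):.3f}")
print(f"MTTF = {weibull_mttf(beta, eta):.0f} h")
```

With β = 1 the model reduces to the exponential (constant-hazard) case, which is a quick sanity check on any implementation.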