EDUCATIONAL DATA MINING AND MACHINE LEARNING FOR ANALYSIS AND PREVENTION OF SCHOOL EVASION IN AN UNDERGRADUATE COURSE
Educational Data Mining, Machine Learning, Random Forest, Self-Organizing Maps, SHapley Additive exPlanations
Universities face the challenge of transforming large volumes of student data into actionable insights that improve academic management and reduce dropout rates in higher education. Educational Data Mining (EDM) and Machine Learning (ML) offer a promising approach to identifying the factors that influence academic performance. This dissertation explores data from the Bachelor of Science and Technology (BC&T) program at the Federal University of Rio Grande do Norte (UFRN), focusing on students enrolled between 2014 and 2023. The objective is to develop analytical models capable of identifying intervention strategies that contribute to students' academic development. Through a literature review, suitable ML algorithms for a hybrid approach were identified, combining Random Forest (classification) and Self-Organizing Maps (clustering), with SHapley Additive exPlanations (SHAP) for explainability analysis. The process adapts Knowledge Discovery in Databases (KDD), with stages including data collection, preprocessing, feature mapping, training and testing, and explainability analysis. The expected outcome is a predictive model that identifies a set of explainable characteristics while improving predictive power. Ultimately, the goal is to develop a Minimum Viable Product (MVP) as a proof of concept that demonstrates prediction results, the explainability of the findings, and descriptive and predictive analyses of the patterns influencing student retention in the program.
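As an illustration of the clustering component only, the following minimal sketch trains a small Self-Organizing Map in pure Python. It is not the dissertation's implementation: the two features (a normalized grade average and the fraction of credits completed), the synthetic student groups, the 3x3 grid, and all hyperparameters are assumptions chosen for the example, and the Random Forest and SHAP stages are omitted.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows=3, cols=3, epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rectangular SOM; returns a dict {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            # Learning rate and neighborhood radius shrink linearly over time.
            decay = 1.0 - step / total_steps
            lr = lr0 * decay
            sigma = max(sigma0 * decay, 0.3)
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Pull each node toward x, weighted by grid distance to the BMU.
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def quantization_error(weights, data):
    """Mean Euclidean distance between each sample and its BMU's weights."""
    total = 0.0
    for x in data:
        w = weights[best_matching_unit(weights, x)]
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(w, x)))
    return total / len(data)


# Synthetic, hypothetical data: [normalized grade average, fraction of credits completed].
data_rng = random.Random(42)


def synthetic_group(center, n):
    return [
        [min(1.0, max(0.0, data_rng.gauss(c, 0.05))) for c in center]
        for _ in range(n)
    ]


data = synthetic_group((0.8, 0.9), 20) + synthetic_group((0.2, 0.3), 20)

untrained = train_som(data, epochs=0)  # same seed, so these are the initial weights
weights = train_som(data)
qe_before = quantization_error(untrained, data)
qe_after = quantization_error(weights, data)

print("quantization error before training:", round(qe_before, 4))
print("quantization error after training:", round(qe_after, 4))
for x in data[:3] + data[-3:]:
    print(x, "->", best_matching_unit(weights, x))
```

In a pipeline of the kind described above, the map node assigned to each student could serve as a cluster label to inspect alongside the classifier's predictions; that coupling is an assumption of this sketch rather than a detail stated in the abstract.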