A Data-Oriented Process for Generation of School Dropout Prediction Model
educational data mining, dropout, predictive model, imbalanced classes.
School dropout is an extremely complex problem: it involves not only a variety of perspectives but also several distinct types of dropout behavior. Historically, the most cited school dropout models originated in the field of education; however, the emerging area of Data Science applied to Education can produce new predictive models that generally outperform the most widely used traditional statistical methods. The main objective of this thesis is to propose a Process for generating a Predictive School Dropout Model based on Data Science. To this end, a sequence of steps is defined to model the information flow from the definition of the problem to the generation of useful information for managers and teachers. The steps are: Understanding the Problem, Understanding the Data, Feature Engineering, Feature Selection, Data Balancing, Models, Evaluation and Interpretation. The contribution of the proposal lies in indicating which techniques and algorithms should be used in each phase of knowledge discovery, and in showing that school dropout must be treated as an imbalanced-class problem, which requires appropriate tools and metrics in order to generate a robust and easy-to-interpret prediction model. The proposed process was validated on educational and socioeconomic data of students at the Federal Institute of Rio Grande do Norte (IFRN).
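The claim that an imbalanced-class problem needs appropriate metrics can be illustrated with a minimal sketch. The cohort below is hypothetical (1,000 students, 100 of them dropouts), not IFRN data. The metrics chosen here (recall, F1 and balanced accuracy) are an assumption for illustration, since the abstract does not name the metrics used in the thesis. The sketch evaluates a trivial model that predicts that no student ever drops out.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def metrics(y_true, y_pred, positive=1):
    """Return accuracy plus metrics that stay informative under class imbalance."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical cohort: label 1 = dropout (minority class), label 0 = stays.
y_true = [1] * 100 + [0] * 900

# Trivial baseline that never predicts dropout.
always_stays = [0] * len(y_true)

baseline = metrics(y_true, always_stays)
for name, value in baseline.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the baseline reaches 90% accuracy while identifying none of the dropouts: its recall and F1 are zero, and its balanced accuracy is 0.5. Accuracy alone would therefore make a useless model look strong, which is why evaluation has to focus on the minority class.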