A Method for Educational Data Mining Applied to the School Drop-out.
KDD, EDM, educational data mining, dropout, predictive model, imbalanced classes.
This document is a doctoral thesis proposal whose main objective is to propose a method, based on the Knowledge Discovery in Databases (KDD) methodology, for the educational context, with emphasis on the problem of school dropout. To this end, statistical, data mining, and visualization techniques are investigated and proposed in order to define a method, structured around the KDD phases, that yields a model with better dropout prediction in the educational context. The concern arises because dropout results in a less skilled labor force and fewer chances of social mobility, especially in a country with social inequalities such as Brazil. However, the problem proved to be very complex and context-dependent. Therefore, a method for detecting dropout that is tailored to the school context can help prevent students from leaving the institution, whereas a single general and complex model may compromise prediction quality. In statistical terms, dropout prediction is treated as an imbalanced class problem, so appropriate evaluation tools such as recall and the confusion matrix are needed to obtain models with reliable predictions. To validate the proposed method, educational data from students of the integrated courses of the Instituto Federal do Rio Grande do Norte (IFRN) are used. Preliminary results show that social characterization attributes influence the prediction of school dropout but, unlike school performance, they are not determining factors. However, these two kinds of information, when used together, produce a good predictive model.
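The point about imbalanced classes can be illustrated with a minimal sketch. The cohort below is entirely hypothetical (90 students who stay, 10 who drop out) and is not drawn from the IFRN data; the function names and both sets of predictions are assumptions made only for illustration. It shows why accuracy alone can mislead when dropouts are the minority class, and why recall and the confusion matrix are more informative.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 means dropout."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # dropout correctly flagged
        elif truth == 1 and pred == 0:
            fn += 1  # dropout missed by the model
        elif truth == 0 and pred == 1:
            fp += 1  # false alarm
        else:
            tn += 1  # non-dropout correctly ignored
    return tn, fp, fn, tp


def recall(tn, fp, fn, tp):
    """Share of actual dropouts that the model detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tn, fp, fn, tp):
    """Share of all students classified correctly."""
    total = tn + fp + fn + tp
    return (tp + tn) / total if total else 0.0


# Hypothetical cohort: 90 students stay (0), 10 drop out (1).
y_true = [0] * 90 + [1] * 10

# Trivial baseline (assumed): predict that nobody drops out.
y_majority = [0] * 100

# Second hypothetical model: 5 false alarms, 7 of 10 dropouts detected.
y_model = [0] * 85 + [1] * 5 + [1] * 7 + [0] * 3

cm_majority = confusion_matrix(y_true, y_majority)
cm_model = confusion_matrix(y_true, y_model)

for name, cm in [("majority baseline", cm_majority), ("model", cm_model)]:
    print(name, "(tn, fp, fn, tp) =", cm,
          "accuracy =", round(accuracy(*cm), 2),
          "recall =", round(recall(*cm), 2))
```

In this toy setting the baseline looks strong on accuracy while detecting no dropouts at all, which is the failure mode that recall exposes.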