Unsupervised Learning Applied to Stratification of Premature Births in Brazil
Machine learning, unsupervised learning, SINASC, CADU, premature birth
According to data published by UNICEF in 2016, premature birth is the leading cause of mortality in children up to 5 years of age, accounting for about 17.9% of these deaths worldwide. In Brazil, the scenario is similar: prematurity accounts for 17.1% of these deaths, second only to mortality due to congenital problems. When the data are restricted to the neonatal period, however, problems related to prematurity are the leading cause of infant mortality, representing 15.1% of the total. Studies have shown that many of the causes of prematurity are associated with social, economic, and cultural issues. This work therefore aims to stratify this severe problem by identifying correlations between prematurity and socioeconomic data, in order to guide more effective public policies for reducing mortality from prematurity. The stratification is performed using unsupervised machine learning algorithms. Two-level cluster analysis is applied to two datasets collected by the Federal Government of Brazil: the Information System on Live Births (SINASC) and the Single Registry (CADU). Results show that the prematurity rate per municipality is correlated with socioeconomic conditions.