Promoting Research Relevance: A Natural Language Processing-based Model for Identifying SDG-aligned Scholarly Publications
Keywords: Sustainable Development Goals, Multilabel Classification, Deep Learning, Natural Language Processing, Scientometrics.
In 2015, the United Nations established the 17 Sustainable Development Goals (SDGs) to promote environmental stewardship, economic advancement, and social equity. Scientific research plays a pivotal role in addressing the challenges the SDGs encompass. One existing tool, SciVal, links scientific outputs to the SDGs through expert analyses. To reduce the reliance on specialized expertise and offer a self-contained solution, this work proposes a multi-target classification model based on natural language processing and deep learning, supported by interpretability techniques and good practices for the development and analysis of data streams. The objective is to map academic publications to the SDGs effectively. With the proposed model, researchers, policymakers, and organizations can navigate the extensive landscape of research papers and identify those that align with their specific areas of interest within the SDG framework, so that scholarly research can be connected directly to the global agenda for sustainable development. Over one million scientific publications were used to train and evaluate the model. The corpus comprised publication titles extracted from the Scopus database, accessed via the SciVal tool, and annotated with respect to 16 of the 17 SDGs. To demonstrate the model's efficacy, it was applied to associate publications from the Brazilian Automation Congress (CBA 2020) with the SDGs, thereby measuring the contribution of automation research towards attaining the SDGs. The CBA 2020 results revealed prevalent themes associated with SDGs 7 and 9, relating to clean energy and industrial innovation, respectively.
Given the extensive training data and the broad range of SDGs covered, the model can be applied to map academic output from diverse domains to the SDGs.
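To make the multi-target setting concrete, the following minimal sketch shows how a publication's SDG annotations can be represented as a multi-hot vector, and how per-goal scores can be turned back into a set of SDGs. It is an illustration under stated assumptions, not the model described here: the abstract does not say which of the 17 goals is excluded, so the sketch assumes SDGs 1 to 16; the function names, the 0.5 threshold, and the example scores are hypothetical.

```python
import math

# Assumption: the 16 annotated goals are SDGs 1-16. The abstract does not
# state which goal is left out, so this numbering is illustrative only.
SDG_IDS = list(range(1, 17))


def encode_labels(sdgs):
    """Multi-hot encode a collection of SDG numbers as a 16-element 0/1 vector."""
    wanted = set(sdgs)
    return [1 if goal in wanted else 0 for goal in SDG_IDS]


def sigmoid(x):
    """Logistic function mapping a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_scores(logits, threshold=0.5):
    """Return every SDG whose independent sigmoid probability reaches the threshold.

    Each goal is decided separately, so a publication may receive zero,
    one, or several SDGs. The threshold value is a hypothetical default.
    """
    if len(logits) != len(SDG_IDS):
        raise ValueError("expected one score per SDG")
    return [goal for goal, z in zip(SDG_IDS, logits) if sigmoid(z) >= threshold]


# Hypothetical raw scores for one publication title: strongly negative
# everywhere except the positions for SDG 7 and SDG 9.
logits = [-4.0] * len(SDG_IDS)
logits[SDG_IDS.index(7)] = 2.0
logits[SDG_IDS.index(9)] = 1.0

target = encode_labels([7, 9])
predicted = decode_scores(logits)
```

Deciding each goal independently, rather than picking the single highest score, is what allows one publication to be associated with several SDGs at once.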