Integrating Information Retrieval and Crash Report Grouping Techniques to Improve Bug Localization for Web Systems
Bug localization, Textual similarity, Information retrieval, Crash report grouping, Industrial projects
Bug localization is a challenging and costly task that involves massive amounts of data, especially in large projects. Many bug localization techniques based on natural language information retrieval have been proposed and applied, but only a few have investigated industrial projects. Furthermore, current state-of-the-art approaches depend heavily on the quality of the software artifacts involved, particularly bug reports and source code files. Another approach to bug localization is crash report grouping, which consolidates large numbers of crash reports into clusters to help find the cause of a bug. Our key insight is that combining the ideas behind these two approaches can enable and improve bug localization, especially in industrial projects with realistic software artifacts. In this work, we propose LucyBug, a technique that integrates natural language information retrieval and crash report grouping to locate the buggy files of a reported bug. We perform a study on a large web-based system, focusing on reducing the dependency on high-quality software artifacts. Our best results show that our approach finds the buggy files in 69.20\% of the cases for a top-10 ranking. We also present an alternative that performs worse than the complete method but does not depend on the textual similarity between source code classes and bug reports.