Integrating Information Retrieval Techniques and Crash Report Grouping to Improve Bug Localization for Web Systems
Bug localization, Textual similarity, Information retrieval, Crash report grouping, Industrial projects
Bug localization is a challenging and costly task that involves massive amounts of data, especially in large projects. Many information retrieval techniques based on textual similarity have been proposed and applied to bug localization. However, only a few of them have been evaluated on industrial projects. Furthermore, current state-of-the-art approaches depend on the quality of the software artifacts involved, particularly the bug reports and the source code classes. Another approach to bug localization is crash report grouping, which consolidates a large volume of log information into clusters that help find the causes of a bug. Our key insight is that combining the ideas behind these two approaches can enable and improve bug localization, especially in industrial projects with realistic software artifacts. In this work, we propose LucyBug, a technique that combines textual similarity information retrieval and crash report grouping to locate the buggy classes of a bug report. We perform a study on a large web-based system, focusing on reducing the dependency on high-quality software artifacts. Our best results show that our approach finds the buggy files in 54.71% of the cases within a top-10 ranking. We also present an alternative that performs worse than the full method but does not depend on the textual similarity between source code classes and bug reports.
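To make the general idea concrete, the sketch below shows one possible way to combine a textual similarity score with evidence from a crash report group and to check a top-k hit. It is not LucyBug's actual algorithm, which this abstract does not specify. The function names (`rank_classes`, `hit_at_k`), the weighting parameter `alpha`, the linear combination of the two scores, and the toy data are all assumptions made purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf_vectors(docs):
    """Return one TF-IDF dict per document, using a smoothed IDF."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for counts in tokenized for term in counts)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in counts.items()} for counts in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_classes(bug_report, classes, crash_group, alpha=0.5):
    """Rank class names by alpha * text score + (1 - alpha) * crash score.

    classes: dict mapping class name -> source text (hypothetical input shape)
    crash_group: list of stack traces, each a list of class names
    alpha: assumed weight; the linear combination itself is an assumption
    """
    names = list(classes)
    vectors = tfidf_vectors([bug_report] + [classes[n] for n in names])
    query, class_vecs = vectors[0], vectors[1:]
    scores = {}
    for name, vec in zip(names, class_vecs):
        text_score = cosine(query, vec)
        # Share of traces in the group that mention this class.
        hits = sum(1 for trace in crash_group if name in trace)
        crash_score = hits / len(crash_group) if crash_group else 0.0
        scores[name] = alpha * text_score + (1 - alpha) * crash_score
    return sorted(names, key=lambda n: scores[n], reverse=True)


def hit_at_k(ranking, buggy, k=10):
    """True if any known buggy class appears in the first k ranked entries."""
    return any(name in buggy for name in ranking[:k])


# Toy data, invented for this example only.
classes = {
    "LoginController": "class LoginController handle login request session user password",
    "InvoiceService": "class InvoiceService compute invoice total tax",
    "SessionStore": "class SessionStore save load session timeout",
}
bug_report = "Login fails with session timeout after password change"
crash_group = [["SessionStore", "LoginController"], ["SessionStore"]]

ranking = rank_classes(bug_report, classes, crash_group)
print(ranking)
print(hit_at_k(ranking, {"SessionStore"}, k=1))
```

Setting `alpha=0` in this sketch ranks classes from crash group evidence alone, which loosely mirrors the idea of a variant that does not depend on the textual similarity between source code classes and bug reports.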