Assistente de Busca (Search Assistant): A RAG approach for semantic search in documents from ALERN
Retrieval-Augmented Generation; Large Language Models; Semantic Search.
The unprecedented growth in the creation and storage of unstructured textual documents in public institutions poses challenges for efficient information retrieval and data analysis. This research addresses these challenges by proposing a prototype search assistant based on the Retrieval-Augmented Generation (RAG) approach, applied specifically to documents produced by the Assembleia Legislativa do Estado do Rio Grande do Norte (Alern). The proposed system combines Natural Language Processing (NLP) techniques, vector databases, and Large Language Models (LLMs) to enable semantic search and to generate relevant answers to user queries. The research introduces an architecture that retrieves document fragments based on semantic similarity: user queries are processed and used to search for contextually relevant content, which an LLM then synthesizes into coherent, contextually appropriate responses. Automated evaluations using BERTScore demonstrate the system's effectiveness in retrieving information in response to user input, with precision and recall of 79% and 69%, respectively, values considered satisfactory in text generation scenarios. Because it is built on the RAG approach, the proposed assistant not only reduces the cognitive load of manually analyzing large document collections but also offers a scalable and adaptable solution for continuously evolving datasets. This research helps bridge the gap between the availability of public data and the generation of searchable information, in line with the goals of transparency and accessibility in the legislative environment.
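The retrieval step described above (split documents into fragments, index them as vectors, rank fragments by semantic similarity to the query, and pass the best ones to an LLM as context) can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words `embed` function is a hypothetical stand-in for a real embedding model, `InMemoryVectorIndex` is a hypothetical stand-in for a vector database, and the chunk size, overlap, `k`, prompt wording, and sample fragments are all illustrative assumptions. The final LLM call is omitted.

```python
import math
import re
from collections import Counter


def chunk(document: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split a document into overlapping word windows (assumed fragmenting strategy)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = document.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a term-frequency vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorIndex:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, fragment: str) -> None:
        self._items.append((fragment, embed(fragment)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        """Return the k fragments most similar to the query, best first."""
        q = embed(query)
        scored = [(cosine(q, vec), frag) for frag, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_prompt(query: str, hits: list[tuple[float, str]]) -> str:
    """Assemble the augmented prompt that would be sent to the LLM."""
    context = "\n".join(f"[{i + 1}] {frag}" for i, (_, frag) in enumerate(hits))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Usage with made-up fragments (hypothetical, not Alern data).
index = InMemoryVectorIndex()
fragments = [
    "Bill on transparency of public spending.",
    "Request for a public hearing on health.",
    "Law creating the state accessibility program.",
]
for fragment in fragments:
    index.add(fragment)

query = "transparency of public spending"
hits = index.search(query, k=2)
prompt = build_prompt(query, hits)
print(prompt)
```

Keeping the index and the prompt builder separate mirrors the two stages of RAG: swapping `embed` for a real sentence-embedding model and the index for a vector database would leave `build_prompt` unchanged.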