QUALIFICATION Exam Committee: SAINT CLAIR DA CUNHA LIMA

A MASTER'S QUALIFICATION exam committee has been registered by the program.
STUDENT : SAINT CLAIR DA CUNHA LIMA
DATE: 29/04/2025
TIME: 14:00
LOCATION: meet.google.com/aip-kyxx-zev
TITLE:

Assistente de Busca: A RAG approach for semantic search in documents from ALERN


KEYWORDS:

Retrieval-Augmented Generation; Large Language Models; Semantic Search.


PAGES: 112
BROAD AREA: Exact and Earth Sciences
AREA: Computer Science
SUMMARY:

The unprecedented growth in the creation and persistence of unstructured textual documents in public institutions poses challenges for efficient information retrieval and data analysis. This research addresses these challenges by proposing a prototype of a search assistant based on the Retrieval-Augmented Generation (RAG) approach, applied specifically to documents produced by the Assembleia Legislativa do Estado do Rio Grande do Norte (ALERN). The proposed system combines Natural Language Processing (NLP) techniques, vector databases, and Large Language Models (LLMs) to enable semantic search and the generation of relevant content in answer to user queries. The research introduces an architecture that retrieves document fragments based on semantic similarity. User-provided queries are processed and used to search for contextually relevant content, which an LLM then synthesizes into coherent and contextually appropriate responses. Automated evaluation with BERTScore indicates that the system retrieves information effectively from user input, with precision and recall of 79% and 69%, respectively, values that are satisfactory for text generation scenarios. Because it is built on the RAG approach, the proposed assistant not only reduces the cognitive load of manually analyzing large document collections but also provides a scalable and adaptable solution for continuously evolving datasets. This research helps bridge the gap between the availability of public data and the generation of searchable information, in line with the goals of transparency and accessibility in the legislative environment.
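
The retrieve-then-generate flow described in the summary can be illustrated with a minimal sketch. It is not the author's implementation: the word-count `embed` function is a toy stand-in for a learned embedding model, the in-memory list stands in for a vector database, the LLM call is omitted, and the sample fragments and all function names are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumed stand-in for a learned embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, fragments: list[str], k: int = 2) -> list[str]:
    """Return the k fragments most similar to the query."""
    q = embed(query)
    ranked = sorted(fragments, key=lambda f: cosine(q, embed(f)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the text that would be sent to an LLM (the LLM call itself is omitted)."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\nContext:\n{joined}\nQuestion: {query}"


# Invented placeholder fragments, not taken from ALERN documents.
FRAGMENTS = [
    "The budget committee approved the annual education budget.",
    "A public hearing on water supply was held in March.",
    "The assembly voted on a bill about public transport fares.",
]

if __name__ == "__main__":
    top = retrieve("education budget approval", FRAGMENTS, k=1)
    print(build_prompt("education budget approval", top))
```

In a full system, the ranking step would typically be delegated to a vector database queried with dense embeddings, and the assembled prompt would be passed to an LLM to produce the final answer.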


COMMITTEE MEMBERS:
Chair - 1669545 - DANIEL SABINO AMORIM DE ARAUJO
Internal Member - 2353000 - ELIAS JACOB DE MENEZES NETO
External to the Program - 2668551 - ANDRE MORAIS GURGEL - UFRN
Notice registered on: 06/05/2025 18:13