DEFENSE Committee: JOÃO HELIS JUNIOR DE AZEVEDO BERNARDO

A DOCTORAL DEFENSE committee has been registered by the program.
STUDENT: JOÃO HELIS JUNIOR DE AZEVEDO BERNARDO
DATE: 04/09/2025
TIME: 15:00
LOCATION: Google Meet (remote)
TITLE:

Uncovering the Relationship Between Continuous Integration and Machine Learning Projects


KEYWORDS:

Continuous Integration; Machine Learning; Build Duration; Test Coverage.


PAGES: 248
ABSTRACT:

Continuous Integration (CI) is a cornerstone of modern software development. However, while widely adopted in traditional software projects, applying CI practices to Machine Learning (ML) projects presents distinctive challenges involving not only code testing but also data validation and model evaluation. Therefore, this thesis investigates the differences, challenges, and strategies of CI adoption in ML through four complementary studies, combining quantitative analyses of large-scale open-source repositories with qualitative insights from practitioner surveys. Study 1, analyzing 93 ML and 92 non-ML GitHub projects, shows that ML projects have longer build durations and lower test coverage. Study 2, surveying 155 practitioners from 47 ML projects, identifies eight main differences in CI adoption, with challenges such as test complexity, infrastructure demands, data handling, and dependency management. Study 3, based on responses from 450 practitioners across a diverse set of open-source projects, establishes a baseline for how CI affects pull request (PR) delivery time, finding that CI streamlines review and quality control but does not necessarily accelerate PR delivery. Study 4, analyzing 27 ML and 31 non-ML projects, reveals that ML projects have significantly longer delivery times and PR lifetimes, receive fewer PRs per release, reject a smaller proportion, have higher merge-to-reject ratios, and follow slower release cadences, about one release every eight months versus every four to five months in non-ML projects. Overall, while core CI principles remain relevant, ML projects require tailored practices, such as tracking model performance metrics, prioritizing test execution, and improving dependency management. The findings highlight the need for standardized guidelines to address these challenges and strengthen CI workflows in ML. 
By integrating quantitative data and practitioner insights, this thesis advances the understanding of CI in ML, paving the way for more effective and robust CI strategies in the ML domain.


COMMITTEE MEMBERS:
Chair - 1644456 - UIRA KULESZA
Internal - 1678918 - NELIO ALESSANDRO AZEVEDO CACHO
External to the Program - 2180207 - ITAMIR DE MORAIS BARROCA FILHO - UFRN
External to the Institution - DANIEL ALENCAR DA COSTA - UO
External to the Institution - FILIPE ROSEIRO COGO
External to the Institution - GUSTAVO HENRIQUE LIMA PINTO - UFPA
Notice registered on: 01/09/2025 14:52