QUALIFYING EXAM Committee: ANGELO GUSTAVO SOUZA MARINHO MORAIS DE SALES

A MASTER'S QUALIFYING EXAM committee has been registered by the program.
STUDENT: ANGELO GUSTAVO SOUZA MARINHO MORAIS DE SALES
DATE: 12/02/2026
TIME: 08:00
LOCATION: Online
TITLE:

COMPUTATIONAL LANGUAGE MODELS FOR MARMOSET MONKEY VOCALIZATIONS


KEYWORDS:

Computational bioacoustics; Callithrix; Language models; Deep learning; Transformer; Acoustic embeddings.


PAGES: 58
BROAD AREA: Biological Sciences
AREA: General Biology
ABSTRACT:

The vocal communication of marmosets (Callithrix) stands out for its acoustic sophistication and ontogenetic plasticity, with structural properties that suggest a complex syntax. While bird bioacoustics already employs deep-learning language models, marmoset research still lacks tools capable of modeling the sequential and acoustic complexity of the animals' repertoires. This dissertation investigated the structure of marmoset vocal sequences by developing and comparing computational language models. The study used a dataset of 91,086 vocalizations from 9 marmosets during their first two months of life. The methodology was divided into three phases: (I) establishing a baseline with Markov models of orders 0 to 19; (II) applying deep learning architectures (RNN, LSTM, and Transformer) to categorical syllable labels; and (III) implementing generative models based on acoustic embeddings extracted from spectrograms with a Swin Transformer. Models were evaluated with Kullback-Leibler divergence (D_KL), BLEU score, and syllable-proportion metrics. For discrete symbolic data, the 13th-order Markov model performed best, outperforming the neural networks, which in this setting suffered from mode collapse and excessive repetition. Introducing acoustic embeddings reversed this result: the Transformer architecture fed with rich spectral features achieved the best overall performance, surpassing the stochastic baseline by significantly reducing D_KL while maintaining structural coherence in long sequences (up to 40 syllables). The study concludes that rich acoustic information is indispensable for modeling primate communication and that the proposed hybrid architecture (Swin Transformer + Transformer) is a methodological advance capable of capturing temporal dependencies and bioacoustic nuances that escape traditional approaches.
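
As a rough illustration of the phase (I) baseline and the D_KL comparison of syllable proportions, the following is a minimal sketch only, not the dissertation's implementation. The function names, the `<s>` padding token, the smoothing constant, and the toy syllable sequences are all hypothetical; the real dataset and labeling scheme are not reproduced here.

```python
import math
import random
from collections import Counter, defaultdict

START = "<s>"  # hypothetical padding token marking the start of a sequence


def fit_markov(sequences, order):
    """Count next-syllable frequencies for every context of length `order`."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = [START] * order + list(seq)
        for i in range(order, len(padded)):
            context = tuple(padded[i - order:i])  # empty tuple when order == 0
            counts[context][padded[i]] += 1
    return counts


def sample_sequence(counts, order, length, rng):
    """Generate up to `length` syllables; stop early at an unseen context."""
    history = [START] * order
    out = []
    for _ in range(length):
        context = tuple(history[len(history) - order:])
        options = counts.get(context)
        if not options:
            break
        syllable = rng.choices(list(options), weights=list(options.values()))[0]
        out.append(syllable)
        history.append(syllable)
    return out


def syllable_proportions(sequences):
    """Relative frequency of each syllable type across all sequences."""
    counts = Counter(s for seq in sequences for s in seq)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) in nats, with additive smoothing (an assumption) so that
    syllables missing from one distribution do not yield infinities."""
    keys = sorted(set(p) | set(q))
    ps = [p.get(k, 0.0) + eps for k in keys]
    qs = [q.get(k, 0.0) + eps for k in keys]
    zp, zq = sum(ps), sum(qs)
    return sum((a / zp) * math.log((a / zp) / (b / zq)) for a, b in zip(ps, qs))


# Toy data for illustration only; these labels are not taken from the study.
toy = [
    ["phee", "phee", "trill"],
    ["twitter", "phee", "trill", "phee"],
    ["cry", "phee", "phee", "trill"],
]

rng = random.Random(0)
model = fit_markov(toy, order=1)
generated = [sample_sequence(model, 1, 10, rng) for _ in range(50)]

real_props = syllable_proportions(toy)
gen_props = syllable_proportions(generated)
print("D_KL(real || generated):", kl_divergence(real_props, gen_props))
```

A lower D_KL here means the generated sequences reproduce the syllable proportions of the reference data more closely. Raising `order` lets the model condition on longer contexts, at the cost of sparser counts.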


COMMITTEE MEMBERS:
Internal - 3086031 - DANIEL YASUMASA TAKAHASHI
Notice registered on: 02/02/2026 18:54