Graduate Programs Portal (UFRN)

SIGAA - Sistema Integrado de Gestão de Atividades Acadêmicas (Integrated System for Academic Activities Management)

PPgSC/UFRN
GRADUATE PROGRAM IN SYSTEMS AND COMPUTING
CCET ADMINISTRATION
Phone: (84)3342-2225/115
E-mail: ppgsc@ppgsc.ufrn.br
https://posgraduacao.ufrn.br/ppgsc

QUALIFYING EXAM committee: THALES AGUIAR DE LIMA

A DOCTORAL QUALIFYING EXAM committee has been registered by the program.
STUDENT : THALES AGUIAR DE LIMA
DATE: 07/07/2022
TIME: 09:30
LOCATION: https://shu.zoom.us/my/profmarjory
TITLE: An investigation of accent inclusion in Brazilian Portuguese Speech

KEY WORDS:

Speech biometrics, accent inclusion, Brazilian Portuguese, speech corpus, dataset

PAGES: 56
MAJOR AREA: Exact and Earth Sciences
AREA: Computer Science
SUBAREA: Computing Methodology and Techniques
SPECIALTY: Graphics Processing (Graphics)
SUMMARY:

The use of artificial intelligence is becoming increasingly present in people’s lives, even
if not always noticeable. While the majority of speech technologies have achieved high
accuracy, they fail when tested for accents that deviate from the “standard” of a language.
This becomes more crucial for Brazilian Portuguese, given its lack of resources for properly
developing such systems. The exclusionary behaviour of speech systems and the lack of
resources have inspired the objectives of this work. First, to explore new approaches to Accent
Conversion for this language using a light-weight model called Sparse Anchor-Based
Representation of Speech with Residual Information (SABr+Res), which should convert
from paulista to nordestino. Second, to collect and release the largest speech dataset for
Brazilian Portuguese to date. The dataset leverages the availability of public audio
and of individuals on video platforms. TEDx Talks provide a reliable environment for clean
speech from such speakers; this work therefore collects the data automatically, while
manually annotating the demographic information required for the first objective of this
work and for other possible speech-related tasks. With a current validation of 18.7%,
it has 110 hours of speech from 520 audio recordings and approximately 515 unique speakers. The
dataset already covers 21 of the 27 Brazilian states, making TEDx Talks Brazilian
Accents the most inclusive and representative dataset for the language.

COMMITTEE MEMBERS:
Chair - 2524467 - MARJORY CRISTIANY DA COSTA ABREU
Internal - 2177445 - BRUNO MOTTA DE CARVALHO
External to the Institution - MARCOS ANTONIO SIMPLICIO JUNIOR - USP

News item registered on: 30/06/2022 10:27