Investigating fuzzy methods for multilingual speech recognition
speech recognition, speech-to-text, speaker recognition, signal processing, multilingual speech systems, Portuguese, English, Mandarin.
Speech is a crucial means by which humans interact and communicate. Speech-based technologies are becoming increasingly popular, with applications such as speech interfaces, real-time translation, and low-cost healthcare diagnosis. This work therefore explores an important but under-investigated topic in the field: multilingual speech recognition. We employed three languages: English, Brazilian Portuguese, and Mandarin. To the best of our knowledge, these three languages have not yet been compared. Our objectives are to examine Brazilian Portuguese against the two better-investigated languages by verifying the robustness of speaker recognition in multilingual environments, and to further investigate fuzzy methods for both speaker identification and speech-to-text transcription. We performed a speaker recognition analysis using log-energy, 13 MFCCs, deltas, and double deltas as features with four classifiers. The closed-set, text-independent speaker identification results indicate that the task is reasonably robust in multilingual environments: with an SVM classifier, adding a second language degrades accuracy by 5.45%, and using a three-language dataset degrades it by 5.32%. We then propose a methodology for building a language-independent model using bottleneck features, an Adaptive-Network-based Fuzzy Inference System (ANFIS), and a data-driven universal phoneme set.
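The feature set named above (log-energy and 13 MFCCs, extended with deltas and double deltas) is built by stacking per-frame static coefficients with their first- and second-order time derivatives. The sketch below shows only that stacking step, under stated assumptions: the static frames are taken as already computed (MFCC extraction is not shown); log-energy is assumed to be a separate 14th coefficient rather than a replacement for C0, which gives 42 dimensions per frame; and deltas use the common regression formula with a window of N = 2, with edge frames repeated. The abstract specifies neither the window size nor the C0 handling, and the function names `deltas` and `stack_features` are hypothetical.

```python
from typing import List

Frame = List[float]


def deltas(frames: List[Frame], n_window: int = 2) -> List[Frame]:
    """Regression deltas: d_t = sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2).

    Frames beyond either edge are replaced by the first/last frame
    (edge replication). n_window=2 is an assumed, commonly used value.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, n_window + 1))
    out: List[Frame] = []
    for t in range(num_frames):
        acc = [0.0] * dim
        for n in range(1, n_window + 1):
            nxt = frames[min(t + n, num_frames - 1)]  # clamp at last frame
            prv = frames[max(t - n, 0)]               # clamp at first frame
            for k in range(dim):
                acc[k] += n * (nxt[k] - prv[k])
        out.append([v / denom for v in acc])
    return out


def stack_features(static: List[Frame], n_window: int = 2) -> List[Frame]:
    """Concatenate static, delta, and double-delta coefficients per frame.

    With 14 static values per frame (assumed: log-energy + 13 MFCCs),
    each output frame has 14 * 3 = 42 values.
    """
    d1 = deltas(static, n_window)
    d2 = deltas(d1, n_window)  # double deltas = deltas of the deltas
    return [s + a + b for s, a, b in zip(static, d1, d2)]


if __name__ == "__main__":
    # Hypothetical input: 5 frames of 14 static coefficients each.
    demo = [[0.1 * k + t for k in range(14)] for t in range(5)]
    stacked = stack_features(demo)
    print(len(stacked), len(stacked[0]))
```

For a linear ramp, the interior deltas recover the slope exactly; for constant frames, all deltas are zero. In the pipeline the abstract describes, the stacked vectors would then be passed to the classifiers (for example, the SVM) for closed-set speaker identification.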