QUALIFYING EXAM committee: DIEGO SOARES DOS SANTOS

A MASTER'S QUALIFYING EXAM committee has been registered by the program.
STUDENT: DIEGO SOARES DOS SANTOS
DATE: 28/03/2018
TIME: 14:00
LOCATION: B109
TITLE:

A machine learning distributed platform for big data: a case study applied to the Tax Office of Rio Grande do Norte


KEYWORDS:

Text Mining, Machine Learning, Big Data, Data Warehouse


PAGES: 75
BROAD AREA: Exact and Earth Sciences
AREA: Computer Science
SUBAREA: Computing Systems
ABSTRACT:

The volume of data stored and accessed daily is growing geometrically. About 2.5 quintillion bytes (2.5 billion gigabytes) are generated every day. In addition, 90% of the world's data has been produced in the last two years. Many terms have been used to describe this giant volume of data, whether stored in a structured or unstructured way. Big Data is one of these terms. According to researchers, Big Data is the phenomenon in which data is produced in various formats and stored by a large number of devices and equipment. Considerable effort has gone into providing open-source tools and frameworks that can handle this huge amount of data, or that offer features for doing so. However, since the nature of data is quite diverse, choosing or developing appropriate tools to deal with such data becomes a nontrivial problem. In addition, few tools provide packages or libraries of Machine Learning techniques. This lack of proper packages or libraries makes it difficult to analyze data with very specific characteristics, such as the description of a product (brand name, type, etc.), mainly because this type of attribute is free-form and not validated. For this reason, in certain problem domains, it is necessary to apply Machine Learning techniques to free-text attributes in order to extract standard values from them. The main objective of this work is to propose a distributed machine learning platform for the Tax Department of Rio Grande do Norte that is able to store, access, manipulate and analyze large data volumes. In addition, the characteristics of the data to be manipulated (electronic invoices) require the use of machine learning techniques that can extract data from textual attributes.


COMMITTEE MEMBERS:
Chair - 4351681 - JOAO CARLOS XAVIER JUNIOR
Internal - 2859562 - LEONARDO CESAR TEONACIO BEZERRA
External to the Program - 1363515 - ANDRE MAURICIO CUNHA CAMPOS
External to the Institution - ALBERTO SIGNORETTI - UERN
Notice registered on: 12/03/2018 16:20