The Proposal of an Automated Process of Inclusion of New Instances in Semi-Supervised Learning Algorithms
Machine Learning, Semi-supervised Learning, Self-Training.
Machine learning is a field of artificial intelligence dedicated to the study and
development of computational techniques that obtain knowledge through accumulated
experience. According to the nature of the information provided, machine learning was
initially divided into two types: supervised and unsupervised learning. In supervised learning,
the data used in training have labels, while in unsupervised learning the instances
to be trained have no labels. Over the years, the academic community began studying
a third type of learning, regarded as the middle ground between supervised and
unsupervised learning and known as semi-supervised learning. In this type of learning,
most of the training set labels are unknown, but a small part of the data has known
labels. Semi-supervised learning is attractive because of its potential to use both labeled
and unlabeled data to achieve better performance than supervised learning. This paper
consists of a study in the field of semi-supervised learning and implements changes to
the self-training and co-training algorithms. In the literature, it is common to find
research that changes the structure of these algorithms; however, none of it proposes
varying the rate at which new instances are included in the labeled data set, which is
the main purpose of this work. In order to achieve this goal, three methods are proposed:
FlexCon-G, FlexCon, and FlexCon-C. The main differences between these methods are: 1) the
way they calculate a new value for the minimum confidence rate
for including new patterns and 2) the strategy used to choose the label of each instance. In
order to evaluate the proposed methods, we performed experiments on 30 datasets
with diverse characteristics. The obtained results indicate that the three proposed
methods perform better than the original self-training and co-training methods in most cases.
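
To make the role of the minimum confidence rate concrete, the following is a minimal sketch of a self-training loop in which the inclusion threshold can change between iterations. It is an illustration under stated assumptions, not the FlexCon-G, FlexCon, or FlexCon-C procedure: the toy nearest-centroid classifier, the function names, and the relax_when_stalled rule are all hypothetical placeholders, since the abstract does not give the formulas used by the proposed methods.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Train a toy nearest-centroid classifier: one mean vector per label."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict_with_confidence(centroids, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    scores = {label: math.exp(-math.dist(x, c)) for label, c in centroids.items()}
    total = sum(scores.values())
    label = max(scores, key=scores.get)
    return label, scores[label] / total


def relax_when_stalled(threshold, added, step=0.05, floor=0.5):
    """Placeholder rule (NOT a FlexCon formula): lower the threshold when nothing was added."""
    return max(floor, threshold - step) if added == 0 else threshold


def self_training(X_lab, y_lab, X_unl, threshold=0.9, update_threshold=None, max_iter=20):
    """Self-training with an optional per-iteration threshold update.

    With update_threshold=None the threshold is fixed, as in plain self-training,
    and the loop stops as soon as an iteration adds no instance.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    history = []  # (threshold used, number of instances added) per iteration
    for _ in range(max_iter):
        if not pool:
            break
        centroids = fit_centroids(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            label, confidence = predict_with_confidence(centroids, x)
            if confidence >= threshold:
                # Confident prediction: move the instance into the labeled set.
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        history.append((threshold, added))
        pool = remaining
        if update_threshold is not None:
            threshold = update_threshold(threshold, added)
        elif added == 0:
            break
    return X_lab, y_lab, pool, history


if __name__ == "__main__":
    labeled, labels = [[0.0], [10.0]], ["a", "b"]
    unlabeled = [[1.0], [9.0], [4.8]]
    fixed = self_training(labeled, labels, unlabeled, threshold=0.9)
    flexible = self_training(labeled, labels, unlabeled, threshold=0.9,
                             update_threshold=relax_when_stalled)
    print("left unlabeled (fixed threshold):", fixed[2])
    print("left unlabeled (adjusted threshold):", flexible[2])
```

In this toy run the fixed threshold leaves the ambiguous instance unlabeled, whereas the adjustable threshold eventually admits it. Whether doing so helps or harms accuracy depends on the update rule, which is the design question the proposed methods address.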