A Threshold Flexibility Proposal for Inclusion of New Examples in the Self-Training Semi-Supervised Learning Algorithm
Machine Learning, Semi-supervised Learning, Self-Training.
Machine learning is a field of artificial intelligence dedicated to the study and
development of computational techniques that obtain knowledge through accumulated
experience. According to the nature of the information provided, machine learning can be
divided into two types: supervised and unsupervised learning. In supervised learning, the
data used in training have labels, while in unsupervised learning the training instances
have no labels. Over the years, the academic community began studying a
third type of learning, regarded as a middle ground between supervised and unsupervised
learning and known as semi-supervised learning. In this type of learning,
most of the training set labels are unknown, but a small part of the data has known
labels. Semi-supervised learning is attractive because of its potential to use both labeled
and unlabeled data to achieve better performance than supervised learning. This paper
presents a study in the field of semi-supervised learning and modifies
the self-training algorithm so that the rate at which new observations are included in
the labeled dataset can vary. To achieve this goal, several methods
are proposed, which differ in how they calculate a new value for
the minimum confidence rate required to include new patterns. To evaluate the proposed
methods, we performed experiments on 20 datasets with diversified characteristics.
The results indicate that the three proposed methods perform better than
the self-training method in most cases.
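
To make the idea of a minimum confidence rate concrete, the following is a minimal sketch of self-training with an adjustable inclusion threshold. It is an illustration under stated assumptions, not one of the methods proposed in this paper: the one-dimensional nearest-centroid classifier, the function names, and the rule that lowers the threshold by a fixed `step` whenever no unlabeled instance qualifies are all hypothetical choices made only for this example.

```python
import math


def centroids(X, y):
    """Mean feature value per class (1-D features, for illustration only)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def predict_with_confidence(cents, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    scores = {c: math.exp(-abs(x - m)) for c, m in cents.items()}
    label = max(scores, key=scores.get)
    return label, scores[label] / sum(scores.values())


def self_train(X_lab, y_lab, X_unl, threshold=0.9, step=0.05, min_threshold=0.5):
    """Self-training loop with a hypothetical threshold-relaxation rule."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    while pool:
        cents = centroids(X_lab, y_lab)
        accepted, remaining = [], []
        for x in pool:
            label, conf = predict_with_confidence(cents, x)
            if conf >= threshold:
                accepted.append((x, label))
            else:
                remaining.append(x)
        if not accepted:
            # Assumed rule: relax the threshold instead of stopping at once.
            if threshold - step < min_threshold:
                break
            threshold -= step
            continue
        # Move confidently predicted instances into the labeled set.
        for x, label in accepted:
            X_lab.append(x)
            y_lab.append(label)
        pool = remaining
    return X_lab, y_lab, threshold


if __name__ == "__main__":
    X, y, final_threshold = self_train([0.0, 10.0], ["a", "b"], [1.0, 9.0, 5.2])
    print(list(zip(X, y)), final_threshold)
```

With a fixed threshold, the ambiguous instance at 5.2 would never be labeled. Under the assumed relaxation rule it is included once the threshold has dropped far enough, which is the kind of trade-off between coverage and label reliability that a flexible threshold has to manage.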