Assessing iSklearn for automated machine learning applied to natural language processing
Automated machine learning; algorithm configuration; natural language processing; deep learning; transfer learning.
Automated machine learning (AutoML) has been gaining importance in both academia and industry, and it is proving to be an important approach for enabling non-experts to extract useful information from data. Some AutoML tools are powered by algorithm configurators that have proven to be efficient, among them irace. In this work, we assess iSklearn, the first AutoML tool based on algorithm configuration to use irace as its configurator, specifically in the natural language processing (NLP) domain. To do so, we apply the tool to common NLP datasets and compare it with a benchmark obtained with the scikit-learn library, using both standard ML algorithms and one of the most popular AutoML tools (Auto-sklearn). In addition, we analyze the effects of alternative configurations and how much iSklearn can benefit from transfer learning to get closer to the state of the art in NLP. Preliminary results demonstrate that iSklearn is capable of generating models that are competitive with Auto-sklearn in the NLP field.