A Process for Evaluating Machine Learning Models in Healthcare
Keywords: ML Evaluation; MLOps; Continuous Monitoring; Continual Learning; Feedback Loop.
Several works in the literature address how Machine Learning (ML) can support and advance work in the most varied areas of knowledge, such as games, financial management, computer vision, and biological data analysis, to name a few. Although ML has been an active research subarea of Information Technology for decades, since the 1950s, results remained modest until recently. Fostered by hardware and software advances, in particular the adoption of Graphics Processing Units (GPUs), capable of performing teraflops of small arithmetic operations in parallel, ML began to grow rapidly. In this scenario, it has spread to diverse areas of application, from Marketing to Healthcare, from Autonomous Vehicles to Robotics, from Natural Language Processing to Computer-generated Art. Although often restricted to controlled experiments, results have been outstanding, making ML so popular that it is hard to find an area of human knowledge it has not reached. Within Machine Learning there is a discipline still in its infancy, called Machine Learning Operations (MLOps). MLOps concerns the management of the ML model lifecycle, from conception and experimentation to deployment in production (real-world) environments, including what happens after models face real-world usage scenarios and how to monitor their real-world performance. Once deployed, models are subject to performance decay, such as drift, which has motivated recent studies on continual learning and continuous monitoring of ML model performance. This work focuses on identifying state-of-the-art techniques for evaluating model fitness in real-world usage scenarios, and on establishing a feedback loop that incorporates the ability to account for such changes into ML model lifecycle management.
Finally, the present work aims to apply validation techniques in a case study of ML models applied to Healthcare and to establish a process for evaluating models. The target models were developed as part of the Remote Assistance Platform (PAR) and are currently in effective use in an oncology ICU.