Exploiting Quantization and Pruning in Deep Learning to Identify Pareto-optimal Trade-off for Accuracy and Energy Consumption
Deep Learning, Software Energy, Graphics Processing Unit, Pareto Frontier
The trade-off between accuracy and energy consumption has become a relevant concern in the Deep Neural Network field. In inference mode, energy consumption is strongly related to the network's memory usage. Aware of this problem, recent studies have applied model compression techniques to reduce energy consumption. Quantization and pruning are the main such techniques: they reduce the size of the models and thereby lower energy consumption, but at the cost of reduced network accuracy. In this context, it becomes relevant to know which compression technique yields the largest reduction in energy consumption while causing the least impact on network accuracy. Exploring these limits is not trivial, due to the large volume of data required to build a solution space. Our approach uses a new container-based architecture to manage the collection and storage of GPU data counters, especially power draw measurements. Our research space is the Pareto frontier of accuracy/energy pairs.
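As a minimal sketch of the search-space notion used above, the Pareto frontier of accuracy/energy pairs can be computed by filtering out dominated configurations: a (accuracy, energy) point is dominated when another point has accuracy at least as high and energy at least as low, with at least one strict inequality. The function name `pareto_frontier` and the candidate values below are illustrative assumptions, not measurements from the study.

```python
def pareto_frontier(points):
    """Return the points not dominated by any other point.

    A point (acc, energy) dominates another if its accuracy is at
    least as high AND its energy is at least as low, with at least
    one strict inequality. Higher accuracy and lower energy are better.
    """
    frontier = []
    for i, (acc_i, en_i) in enumerate(points):
        dominated = any(
            acc_j >= acc_i and en_j <= en_i and (acc_j > acc_i or en_j < en_i)
            for j, (acc_j, en_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            frontier.append((acc_i, en_i))
    return frontier


# Hypothetical (accuracy, energy-in-joules) pairs for compressed model variants.
candidates = [(0.92, 120.0), (0.90, 80.0), (0.91, 95.0), (0.85, 85.0), (0.93, 150.0)]
print(sorted(pareto_frontier(candidates)))
# → [(0.9, 80.0), (0.91, 95.0), (0.92, 120.0), (0.93, 150.0)]
```

Here (0.85, 85.0) is excluded because (0.90, 80.0) is both more accurate and cheaper in energy; every remaining point represents a genuine trade-off between the two objectives.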