Machine learning (ML) is on the rise. ML performance rests on three fundamental cornerstones: ML models, ML software, and ML hardware. Machine learning software (frameworks and runtimes) is the glue that holds ML models and ML hardware together, and it is the focus of this research thrust. ML models are written in high-level frameworks like TensorFlow, PyTorch, and MXNet, and executed using high-performance libraries tuned to the characteristics of the underlying hardware. The performance of these frameworks, whether for training or inference, matters a great deal; performance here encompasses not only execution time but also efficiency in terms of power consumption (especially on mobile devices) and resource consumption (e.g., memory on a microcontroller).
We are interested in optimizing ML runtime systems. Optimizations include rethinking how ML should be served to end users so that it meets customers' needs (e.g., using approaches such as model ensembles to reduce cost), as well as tuning the runtimes so that the frameworks are better optimized for the hardware they are targeting.
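One way an ensemble can reduce serving cost is a cascade: a small, cheap model answers when it is confident, and only low-confidence inputs fall through to a larger, more expensive model. The sketch below illustrates the idea with stand-in models and an illustrative confidence threshold; the function names and threshold are hypothetical, not drawn from any specific serving system.

```python
def cheap_model(x):
    # Stand-in for a small, fast classifier: returns (label, confidence).
    # In a real system this would be a distilled or quantized model.
    return ("cat", 0.95) if x % 2 == 0 else ("dog", 0.55)

def expensive_model(x):
    # Stand-in for a large, accurate (and costly) classifier.
    return ("dog", 0.99)

def serve(x, threshold=0.9):
    """Return (label, tier), invoking the expensive model only when the
    cheap model's confidence falls below the threshold."""
    label, conf = cheap_model(x)
    if conf >= threshold:
        return label, "cheap"
    label, _ = expensive_model(x)
    return label, "expensive"

print(serve(2))  # → ('cat', 'cheap')      confident cheap prediction
print(serve(3))  # → ('dog', 'expensive')  falls back to the large model
```

If most traffic is handled by the cheap tier, the average cost per request can drop substantially while accuracy on hard inputs is preserved by the fallback model.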