Air Learning

In recent years, machine learning-based approaches to autonomous robot tasks such as navigation and pick-and-place have gained traction. Thanks to advances in deep learning, it has become possible to train complex policies and architectures that can replace the traditional perception, planning, and control pipelines found in most robot systems. But many open research challenges remain, such as bridging the simulation-to-reality gap, understanding how to improve training, and developing power- and energy-efficient policies so that machines can operate within limited energy budgets.

Our group specializes in addressing many of these concerns, particularly from a systems and engineering perspective. For instance, we develop simulators for exploring end-to-end learning on resource-constrained platforms. We also focus on hardware-software co-design, such as designing energy-efficient policies that are customized and tuned to the characteristics of the underlying resource-constrained hardware. Overall, our goal is to engineer and design these systems as a whole: minimizing training time, understanding the computational bottlenecks, and optimizing the complete system.

Publications

S. Krishnan, B. Boroujerdian, W. Fu, A. Faust, and V. J. Reddi, “Air Learning: An AI Research Platform for Algorithm-Hardware Benchmarking of Autonomous Aerial Robots,” Springer Machine Learning Journal, Special Issue on Reinforcement Learning for Real Life, forthcoming. arXiv Version. Abstract:
We introduce Air Learning, an AI research platform for benchmarking algorithm-hardware performance and energy-efficiency trade-offs. We focus in particular on deep reinforcement learning (RL) interactions in autonomous unmanned aerial vehicles (UAVs). Equipped with a random environment generator, Air Learning exposes a UAV to a diverse set of challenging scenarios. Users can specify a task, train different RL policies, and evaluate their performance and energy efficiency on a variety of hardware platforms. To show how Air Learning can be used, we seed it with Deep Q Networks (DQN) and Proximal Policy Optimization (PPO) to solve a point-to-point obstacle avoidance task in three different environments, generated using our configurable environment generator. We train the two algorithms using curriculum learning and non-curriculum learning. Air Learning assesses the trained policies' performance under a variety of quality-of-flight (QoF) metrics, such as the energy consumed, endurance, and the average trajectory length, on resource-constrained embedded platforms like a Raspberry Pi. We find that the trajectories on an embedded Raspberry Pi are vastly different from those predicted on a high-end desktop system, resulting in up to 79.43% longer trajectories in one of the environments. To understand the source of such differences, we use Air Learning to artificially degrade desktop performance to mimic what happens on a low-end embedded system. QoF metrics with hardware in the loop characterize those differences and expose how the choice of onboard compute affects the aerial robot's performance. We also conduct reliability studies to demonstrate how Air Learning can help understand how sensor failures affect the learned policies. All put together, Air Learning enables a broad class of RL studies on UAVs. More information and code for Air Learning can be found here.
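
To make the quality-of-flight evaluation concrete, here is a minimal Python sketch of how QoF metrics such as trajectory length, energy consumed, and endurance could be computed from a logged flight. The log record format and function name are hypothetical illustrations, not Air Learning's actual interface.

import math

def qof_metrics(trajectory):
    """Compute quality-of-flight (QoF) metrics from a logged flight.

    `trajectory` is a list of (t, x, y, z, power_watts) samples; this
    record format is a stand-in, not Air Learning's actual log format.
    """
    length = 0.0  # total path length, meters
    energy = 0.0  # energy consumed, joules
    for (t0, x0, y0, z0, p0), (t1, x1, y1, z1, p1) in zip(trajectory, trajectory[1:]):
        length += math.dist((x0, y0, z0), (x1, y1, z1))
        energy += 0.5 * (p0 + p1) * (t1 - t0)  # trapezoidal integration of power
    endurance = trajectory[-1][0] - trajectory[0][0]  # total flight time, seconds
    return {"trajectory_length_m": length, "energy_J": energy, "endurance_s": endurance}

Comparing such metrics between a desktop run and an embedded run of the same policy is what surfaces trajectory-length gaps like the 79.43% figure reported above.
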
S. Krishnan, B. Boroujerdian, A. Faust, and V. J. Reddi, “Toward Exploring End-to-End Learning Algorithms for Autonomous Aerial Machines,” Workshop on Algorithms and Architectures for Learning In-the-Loop Systems in Autonomous Flight, co-located with the International Conference on Robotics and Automation (ICRA), 2019. Abstract:

We develop Air Learning, a tool suite for end-to-end closed-loop UAV analysis, equipped with a customized yet randomized environment generator in order to expose the UAV to a diverse set of challenges. We take Deep Q Networks (DQN) as an example deep reinforcement learning algorithm and use curriculum learning to train a point-to-point obstacle avoidance policy. While we determine the best policy based on the success rate, we evaluate it under strict resource constraints on an embedded platform such as a Raspberry Pi 3. Using a hardware-in-the-loop methodology, we quantify the policy's performance with quality-of-flight metrics such as energy consumed, endurance, and the average length of the trajectory. We find that the trajectories produced on the embedded platform are very different from those predicted on the desktop, resulting in up to 26.43% longer trajectories.

Quality-of-flight metrics with hardware in the loop characterize those differences in simulation, thereby exposing how the choice of onboard compute contributes to narrowing or widening the ‘Sim2Real’ gap.
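
The hardware-in-the-loop idea described above can be approximated by padding each policy inference to the latency measured on the target board, so the simulated vehicle evolves as it would under the slower onboard controller. The sketch below assumes a gym-style env.step() interface; the names policy and env and the latency constant are illustrative, not the tool suite's actual API.

import time

# Inference latency measured on the target board (e.g., a Raspberry Pi 3).
# The value here is illustrative, not a measured number.
TARGET_INFERENCE_LATENCY_S = 0.060

def step_with_hil_latency(policy, env, obs):
    """Run one decision step, padded to the embedded platform's latency.

    If desktop inference finishes early, sleep for the remainder so the
    environment advances as it would with slower onboard compute in the loop.
    """
    start = time.monotonic()
    action = policy(obs)  # desktop-speed inference
    elapsed = time.monotonic() - start
    if elapsed < TARGET_INFERENCE_LATENCY_S:
        time.sleep(TARGET_INFERENCE_LATENCY_S - elapsed)
    return env.step(action)  # gym-style step interface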

T.-W. Chin, C.-L. Yu, M. Halpern, H. Genc, S.-L. Tsao, and V. J. Reddi, “Domain-Specific Approximation for Object Detection,” IEEE Micro, vol. 38, no. 1, pp. 31–40, 2018. Publisher's Version. Abstract:

In summary, our contributions are as follows:

• We investigate DSA and characterize the effectiveness of category-awareness.
• We conduct a limit study to understand the benefit of applying approximation in a per-frame manner with category-awareness (category-aware dynamic DSA).
• We present the challenges of harnessing DSA and a proof-of-concept runtime.
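
As a rough illustration of what category-aware dynamic DSA could look like in practice, the sketch below selects an approximation knob (here, input downscaling) for each frame based on the object categories detected in the previous frame, with the least approximation-tolerant category dictating the setting. The category-to-tolerance table and function name are hypothetical, not the paper's actual runtime.

# Per-category tolerance to input downscaling (illustrative values only):
# small or safety-critical categories tolerate less approximation.
SCALE_TOLERANCE = {
    "person": 1.0,         # no downscaling when people are present
    "traffic_light": 1.0,
    "car": 0.75,
    "background": 0.5,     # aggressive approximation for empty scenes
}

def pick_scale(prev_frame_categories):
    """Choose a per-frame input scale from the categories seen last frame."""
    if not prev_frame_categories:
        return SCALE_TOLERANCE["background"]
    # The least approximation-tolerant detected category sets the knob.
    return min(SCALE_TOLERANCE.get(c, 1.0) for c in prev_frame_categories)

# Example: a frame that contained only cars can be processed at 0.75 scale.
assert pick_scale({"car"}) == 0.75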