Unlocking the Secrets of GPU Scheduling: How Virtualization is Revolutionizing Autonomous Driving
"Dive into the inner workings of NVIDIA's GPU scheduling on Drive PX platforms and explore how virtualization can enable real-time performance in autonomous vehicles."
The race to fully autonomous vehicles is fueled by advanced computing platforms, with Graphics Processing Units (GPUs) at the forefront. GPUs provide the massively parallel processing power required for tasks such as real-time object detection, path planning, and sensor fusion. To ensure the safety and reliability of autonomous driving systems, GPU scheduling must provide strong real-time guarantees: safety-relevant tasks must complete within strict time constraints, regardless of other system activity.
Previous research has focused on reverse engineering the GPU ecosystem to understand and control GPU scheduling on NVIDIA platforms. However, this article offers an in-depth look at NVIDIA's standard approach to GPU application scheduling on a Drive PX platform, providing valuable insights into the inner workings of this complex system. Furthermore, we'll explore how a privileged scheduling server can be used to enforce custom scheduling policies in a virtualized environment, opening up new possibilities for real-time GPU performance.
Advanced Driver-Assistance Systems (ADAS) rely heavily on integrated GPUs, which are shared across applications with different timing requirements. We'll examine NVIDIA's GPU scheduling approach for graphics and compute tasks on the Drive PX-2 'AutoCruise' platform. This board features a Tegra Parker SoC with a hexa-core CPU and an integrated GPU (gp10b), an implementation of the Pascal architecture with two Streaming Multiprocessors (SMs) of 128 CUDA cores each.
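As a quick sanity check on any such board, these hardware details can be read through the CUDA runtime. The minimal sketch below assumes the integrated GPU is device index 0; on a Drive PX-2, the gp10b should report two SMs.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Print the properties relevant here: SM count, compute capability,
// and whether the GPU is integrated. Device index 0 is assumed to be
// the integrated gp10b.
int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    std::printf("Device:       %s\n", prop.name);
    std::printf("Compute cap.: %d.%d\n", prop.major, prop.minor);
    std::printf("SM count:     %d\n", prop.multiProcessorCount);
    std::printf("Integrated:   %s\n", prop.integrated ? "yes" : "no");
    return 0;
}
```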
GPU Scheduling: A Deep Dive

The NVIDIA GPU scheduler relies on a hardware controller embedded within the GPU, called the 'Host.' This component dispatches work to the GPU engines (Copy, Compute, Graphics) in a round-robin manner, asynchronously and in parallel with the CPU. The Host scheduler manages channels: independent streams of work belonging to user-space applications. Channels are transparent to programmers, who use APIs (CUDA, OpenGL) to specify GPU workloads. Each channel is governed by a set of scheduling parameters (a toy model of how they shape the runlist follows the list below):
- Timeslice Length: The duration a channel can execute before preemption.
- Interleaving Level: The number of times a channel appears in the runlist.
- Preemption Policy: Determines whether, and at what granularity, a running channel can be preempted.
- Channel Establishment: Channels and their scheduling parameters are fixed at application launch.
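To make these parameters concrete, here is a toy host-side model of runlist construction. It is a hypothetical sketch, not NVIDIA code: the `Channel` struct and `build_runlist` function are illustrative, and they follow the rule stated above that a channel with interleaving level N appears N times in the runlist, each entry bounded by its timeslice.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Illustrative model of the interleaved runlist: a channel with
// interleaving level N gets N entries, and each entry may run for
// at most its timeslice before the Host moves on.
// These types are hypothetical, not part of any NVIDIA API.
struct Channel {
    std::string name;
    int interleave_level;  // number of runlist entries for this channel
    int timeslice_us;      // budget per entry before preemption
};

std::vector<const Channel*> build_runlist(const std::vector<Channel>& chans) {
    std::vector<const Channel*> runlist;
    for (const Channel& c : chans)
        for (int i = 0; i < c.interleave_level; ++i)
            runlist.push_back(&c);
    return runlist;
}

int main() {
    std::vector<Channel> chans = {
        {"camera_pipeline", 3, 500},  // higher interleaving: more entries
        {"visualization",   1, 500},  // lower interleaving: one entry
    };
    // The Host walks the runlist round-robin; here we print one pass.
    for (const Channel* c : build_runlist(chans))
        std::printf("dispatch %-16s (timeslice %d us)\n",
                    c->name.c_str(), c->timeslice_us);
    return 0;
}
```

In one pass over this runlist the high-interleaving channel is dispatched three times for every dispatch of the low-interleaving one, which is how interleaving approximates priorities without an explicit priority field.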
The Future of GPU Virtualization
NVIDIA's GPU virtualization technology enables multiple guests to run and access the GPU engines through a privileged hypervisor guest, the RunList Manager (RLM). Guests interact with the RLM server for channel allocation, scheduling, memory management, and runlist construction. Future work involves modifying the guest-to-RLM communication so that the RLM can intercept command submissions, define software scheduling policies, and enforce them by constructing runlists containing only the scheduled application channels. This makes it possible to test event-based approaches that offer stronger real-time guarantees than NVIDIA's interleaved scheduler. Preliminary results from an Earliest Deadline First with Constant Bandwidth Server (EDF+CBS) prototype show significant improvements in schedulability and Worst-Case Response Time (WCRT).
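To illustrate the direction of that prototype, here is a minimal, hypothetical simulation of EDF+CBS at the granularity an RLM-level scheduler could operate at: each application's channel is wrapped in a Constant Bandwidth Server with budget Q and period P, the server with the earliest absolute deadline runs next, and an exhausted budget is recharged while the deadline is pushed one period forward. Names and numbers are illustrative only, not the RLM implementation.

```cpp
#include <cstdio>
#include <vector>

// Toy EDF+CBS loop: every application channel is wrapped in a Constant
// Bandwidth Server (budget Q per period P). The server with the earliest
// absolute deadline is scheduled next; when its budget runs out, the
// budget is recharged and the deadline is postponed by one period.
struct Server {
    const char* name;
    double Q;         // budget per period (ms)
    double P;         // server period (ms)
    double budget;    // remaining budget (ms)
    double deadline;  // current absolute deadline (ms)
};

// EDF: pick the server with the earliest absolute deadline.
Server* pick_earliest_deadline(std::vector<Server>& servers) {
    Server* best = nullptr;
    for (Server& s : servers)
        if (!best || s.deadline < best->deadline) best = &s;
    return best;
}

int main() {
    std::vector<Server> servers = {
        {"object_detection", 4.0, 10.0, 4.0, 10.0},  // 40% GPU bandwidth
        {"path_planning",    2.0, 20.0, 2.0, 20.0},  // 10% GPU bandwidth
    };
    const double slice = 1.0;  // 1 ms scheduling quantum
    double t = 0.0;
    for (int step = 0; step < 12; ++step, t += slice) {
        Server* s = pick_earliest_deadline(servers);
        std::printf("t=%5.1f ms: run %-16s (deadline %.1f ms)\n",
                    t, s->name, s->deadline);
        s->budget -= slice;
        if (s->budget <= 0.0) {  // CBS replenishment rule
            s->budget = s->Q;
            s->deadline += s->P;
        }
    }
    return 0;
}
```

A useful property of CBS is that an application overrunning its budget only pushes its own deadline further out, delaying itself rather than its neighbors; this temporal isolation is what makes the approach attractive for mixed-criticality ADAS workloads.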