Futuristic autonomous vehicle navigating a virtual cityscape, symbolizing GPU scheduling and virtualization.

Unlocking the Secrets of GPU Scheduling: How Virtualization is Revolutionizing Autonomous Driving

"Dive into the inner workings of NVIDIA's GPU scheduling on Drive PX platforms and explore how virtualization can enable real-time performance in autonomous vehicles."


The race to fully autonomous vehicles is fueled by advanced computing platforms, with Graphics Processing Units (GPUs) at the forefront. These GPUs offer the massively parallel processing power required for complex tasks such as real-time object detection, path planning, and sensor fusion. To ensure the safety and reliability of autonomous driving systems, it's critical to have GPU scheduling approaches that provide strong real-time guarantees. This means ensuring that critical tasks are completed within strict time constraints, regardless of other system activities.

Previous research has focused on reverse engineering the GPU ecosystem to understand and control GPU scheduling on NVIDIA platforms. However, this article offers an in-depth look at NVIDIA's standard approach to GPU application scheduling on a Drive PX platform, providing valuable insights into the inner workings of this complex system. Furthermore, we'll explore how a privileged scheduling server can be used to enforce custom scheduling policies in a virtualized environment, opening up new possibilities for real-time GPU performance.

Advanced Driver-Assistance Systems (ADAS) rely heavily on integrated GPUs, shared across various applications with different timing requirements. We'll examine NVIDIA's GPU scheduling approach for graphics and compute tasks on the Drive PX-2 'AutoCruise' platform. This board features a Tegra Parker SoC with a hexa-core CPU and an integrated GPU (gp10b) based on the Pascal architecture, with two Streaming Multiprocessors (SMs) of 128 CUDA cores each.

GPU Scheduling: A Deep Dive


The NVIDIA GPU scheduler uses a hardware controller embedded within the GPU, called the 'Host.' This component dispatches work to the GPU engines (Copy, Compute, Graphics) in a round-robin manner, asynchronously and in parallel with the CPU. The Host scheduler manages channels, which are independent streams of work belonging to user-space applications. Channels are transparent to programmers, who use APIs (CUDA, OpenGL) to specify GPU workloads.

Workloads consist of sequences of GPU commands inserted into a Command Push Buffer, a memory region written by the CPU and read by the GPU. Channels are linked to the applications' Command Push Buffers. Each channel has a timeslice value for timesharing the GPU. Context switches occur when a channel's work is done or its timeslice expires; the Host then dispatches work from the next channel on a list called the runlist. Each channel is characterized by the following parameters:
  • Timeslice Length: The duration a channel can execute before preemption.
  • Interleaving Level: The number of times a channel appears in the runlist.
  • Preemption Policy: Determines if a channel can be preempted.
  • Channel Establishment: Channels are established when the application launches.
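
To make the channel abstraction concrete, here is a minimal sketch of the state a channel carries. The structure and field names are hypothetical, chosen for illustration, and are not taken from the L4T driver sources:

    /* Hypothetical model of a GPU channel; names are illustrative only. */
    #include <stdbool.h>
    #include <stdint.h>

    /* Command Push Buffer: a ring written by the CPU and read by the GPU. */
    struct push_buffer {
        uint64_t base;   /* GPU-visible address of the buffer        */
        uint32_t put;    /* CPU write pointer (last command inserted) */
        uint32_t get;    /* GPU read pointer (next command to fetch)  */
    };

    /* One channel: an independent stream of work for a user-space application. */
    struct channel {
        struct push_buffer pb;     /* commands submitted by the application  */
        uint32_t timeslice_us;     /* how long it may run before preemption  */
        uint32_t interleave_level; /* how many runlist entries it receives   */
        bool     preemptible;      /* whether it can be preempted at all     */
    };

    /* A channel has pending work when the CPU has written past the point
     * the GPU has already consumed in its Command Push Buffer. */
    static bool channel_has_work(const struct channel *ch)
    {
        return ch->pb.put != ch->pb.get;
    }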
The GPU Host uses a list-based scheduling policy, checking each channel for work as it browses the runlist. Each application has a number of runlist entries proportional to its interleaving level. For each entry, the scheduler checks whether the corresponding Command Push Buffer contains work. If it does, the channel is scheduled until completion or timeslice expiration, at which point it is preempted and later resumed. If it does not, the scheduler skips ahead to the next application's channels. An open-source version of the runlist construction algorithm is available in the NVIDIA kernel driver stack (L4T, Linux for Tegra).
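
The scan itself can be pictured as a simple software model. The following sketch reuses the channel structure above; build_runlist() and run_until() are illustrative stand-ins for the driver's runlist construction and the hardware's execution of a channel, not the actual L4T code:

    /* Build a runlist in which each channel appears as many times as its
     * interleaving level, so higher levels get more scheduling opportunities. */
    static int build_runlist(struct channel *chans, int n,
                             struct channel **runlist, int max_entries)
    {
        int entries = 0;
        for (int i = 0; i < n; i++)
            for (uint32_t k = 0; k < chans[i].interleave_level; k++)
                if (entries < max_entries)
                    runlist[entries++] = &chans[i];
        return entries;
    }

    enum run_result { WORK_DONE, TIMESLICE_EXPIRED };

    /* Stand-in for the hardware: run a channel's pending commands until they
     * finish or its timeslice expires. Here we simply mark the work consumed. */
    static enum run_result run_until(struct channel *ch, uint32_t timeslice_us)
    {
        (void)timeslice_us;
        ch->pb.get = ch->pb.put;
        return WORK_DONE;
    }

    /* The Host scans the runlist round-robin, forever: idle channels are
     * skipped; channels with work run until completion or timeslice expiry,
     * in which case they are preempted and resumed at a later entry. */
    static void host_scan(struct channel **runlist, int entries)
    {
        for (int i = 0; entries > 0; i = (i + 1) % entries) {
            struct channel *ch = runlist[i];
            if (channel_has_work(ch))
                run_until(ch, ch->timeslice_us);
        }
    }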

The Future of GPU Virtualization

NVIDIA's GPU virtualization technology enables multiple guests to run and access the GPU engines via a privileged hypervisor guest, the RunList Manager (RLM). Guests interact with the RLM server for channel allocation, scheduling, memory management, and runlist construction. Future work involves modifying the GPU-to-RLM communication so that the RLM can intercept command submissions, define software scheduling policies, and enforce them by constructing runlists containing the channels of the scheduled applications. This enables testing event-based approaches that offer stronger real-time guarantees than NVIDIA's interleaved scheduler. Preliminary results from an Earliest Deadline First with Constant Bandwidth Server (EDF+CBS) prototype show significant improvements in schedulability and Worst-Case Response Time (WCRT).
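
As a rough illustration of the kind of policy such a server could enforce, the sketch below combines an EDF pick with CBS budget replenishment. All structures and names are hypothetical and are not taken from the prototype mentioned above; budget accounting during execution is omitted for brevity:

    #include <stdbool.h>
    #include <stdint.h>

    /* Per-application scheduling state as an RLM-like server might track it. */
    struct sched_entity {
        bool     has_work;      /* pending command submissions intercepted by the server */
        uint64_t deadline_us;   /* absolute CBS deadline                                  */
        int64_t  budget_us;     /* remaining CBS budget (consumed while running on GPU)   */
        uint64_t max_budget_us; /* CBS maximum budget Q                                   */
        uint64_t period_us;     /* CBS period T                                           */
    };

    /* CBS rule: when the budget is exhausted, replenish it and postpone the
     * deadline by one period, bounding each entity's GPU bandwidth to Q/T. */
    static void cbs_replenish(struct sched_entity *e, uint64_t now_us)
    {
        if (e->budget_us <= 0) {
            e->budget_us   = (int64_t)e->max_budget_us;
            e->deadline_us = now_us + e->period_us;
        }
    }

    /* EDF rule: among entities with pending work, pick the earliest absolute
     * deadline. The server would then build a runlist containing only the
     * channels of the selected application. */
    static struct sched_entity *edf_pick(struct sched_entity *es, int n, uint64_t now_us)
    {
        struct sched_entity *best = NULL;
        for (int i = 0; i < n; i++) {
            cbs_replenish(&es[i], now_us);
            if (es[i].has_work && (!best || es[i].deadline_us < best->deadline_us))
                best = &es[i];
        }
        return best;
    }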
