Speed Up Unreal Engine NNE Inference with NVIDIA TensorRT for RTX Runtime
Neural network techniques are increasingly used in computer graphics to boost image quality, improve performance, and streamline content creation. Approaches like super resolution, denoising, and neural rendering help real-time engines work more efficiently, offering new creative possibilities while keeping performance in mind. Unreal Engine 5 (UE5) has taken several steps in this direction with the introduction of the Neural Network Engine (NNE), an abstraction layer that unifies inference workloads across multiple backends. Developers can use various runtimes on a GPU, or fall back to a CPU depending on the available hardware, for seamless integration of neural network features in real-time graphics workflows.

This blog post covers the new plugin that adds NVIDIA TensorRT for RTX as an NNE runtime option (NNERuntimeTRT) for efficient inferencing on NVIDIA RTX GPUs. To show its benefits, I’ll use a simplified UE project that runs a post-process AI model to highlight gains over other GPU runtimes, such as DirectML. First, let’s briefly discuss the different components involved in the project.

TensorRT for RTX overview

TensorRT for RTX enables users to deploy AI models on RTX GPUs more efficiently. It uses a Just-In-Time (JIT) optimizer within the runtime to generate inference engines tailored to the user’s GPU. This compilation occurs once on the user’s machine and optimizes the model for that specific hardware. As a result, TensorRT for RTX can offer higher throughput than default execution providers. For example, throughput comparisons across various models show improvements when using TensorRT for RTX versus DirectML, as measured on an NVIDIA GeForce RTX 5090 GPU. TensorRT for RTX is only compatible with NVIDIA RTX GPUs, from the Turing generation (compute capability 7.5) up to the NVIDIA Blackwell generation (compute capability 10.0).

Unreal Engine Neural Network Engine overview

NNE supports multiple runtimes for invoking inference tasks on either CPUs or GPUs. Because TensorRT for RTX targets GPUs, this overview focuses on NNE’s GPU runtimes. NNE can run inference on the GPU either:

- Synchronously from the CPU, requiring memory synchronization.
- Asynchronously through the Render Dependency Graph (RDG), aligning with frame rendering.

The synchronous method works well for editors and event-based inference tasks like LLMs, where copying data between host and device is not a concern. In contrast, RDG ties model evaluation to rendering resources, making it ideal for AI post-processing, upscaling, or denoising. The NNE TensorRT for RTX plugin supports both the GPU and RDG methods, offering flexibility for AI applications such as rendering, animation, language, and speech while maintaining strong performance on consumer-grade devices.

The style transfer post-processing sample project

I built a basic UE5 project that applies style transfer models during post-processing to test the NNE TensorRT for RTX plugin. For testing, I set up a simple level using a few basic primitives and a fixed camera to keep the visuals consistent while switching between DirectML and TensorRT, making it easier to compare both results and performance.

Prerequisites

While the project is nearly ready for use, having experience with UE5, post-process materials, and…
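To make the runtime selection described above more concrete, here is a minimal C++ sketch of creating a model instance through NNE's GPU interface, based on the UE 5.3+ NNE API. The runtime name string "NNERuntimeTRT" is taken from the plugin name mentioned in this post, and the helper function and the DirectML fallback are illustrative assumptions, not the plugin's documented usage; the exact registered runtime names can be checked with UE::NNE::GetAllRuntimeNames().

```cpp
// Minimal sketch: select an NNE GPU runtime by name and create a model instance
// from an imported ONNX asset (UNNEModelData).
// The runtime name "NNERuntimeTRT" is an assumption based on the plugin name;
// verify the registered name with UE::NNE::GetAllRuntimeNames().
#include "NNE.h"
#include "NNEModelData.h"
#include "NNERuntimeGPU.h"

TSharedPtr<UE::NNE::IModelInstanceGPU> CreateStyleTransferInstance(TObjectPtr<UNNEModelData> ModelData)
{
	// Prefer TensorRT for RTX; fall back to the DirectML runtime if it is not registered.
	TWeakInterfacePtr<INNERuntimeGPU> Runtime = UE::NNE::GetRuntime<INNERuntimeGPU>(FString(TEXT("NNERuntimeTRT")));
	if (!Runtime.IsValid())
	{
		Runtime = UE::NNE::GetRuntime<INNERuntimeGPU>(FString(TEXT("NNERuntimeORTDml")));
	}
	if (!Runtime.IsValid() || !ModelData)
	{
		return nullptr;
	}

	// Create the model from the asset, then an instance that owns the inference state.
	TSharedPtr<UE::NNE::IModelGPU> Model = Runtime->CreateModelGPU(ModelData);
	if (!Model.IsValid())
	{
		return nullptr;
	}
	return Model->CreateModelInstanceGPU();
}
```

Once an instance exists, the synchronous path typically sets the input tensor shapes once and then runs inference with CPU-visible bindings, while the style transfer project in this post relies on the RDG path instead.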

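For the RDG path used by the post-processing sample, the sketch below shows roughly how an inference pass can be enqueued into the Render Dependency Graph from render-thread code. The function name, buffer names, and the assumption that the input and output tensors already live in FRDGBuffer resources are illustrative; exact types and return values may differ between engine versions.

```cpp
// Minimal sketch: enqueue NNE inference into the Render Dependency Graph so it is
// scheduled with the rest of the frame. Names and buffer setup are illustrative.
#include "NNERuntimeRDG.h"
#include "RenderGraphBuilder.h"

void AddStyleTransferPass(
	FRDGBuilder& GraphBuilder,
	UE::NNE::IModelInstanceRDG& ModelInstance,
	FRDGBufferRef InputTensorBuffer,   // scene color packed into a tensor-shaped buffer
	FRDGBufferRef OutputTensorBuffer)  // stylized result, composited later in post-processing
{
	// Bind RDG buffers directly; no host/device copies are needed on this path.
	TArray<UE::NNE::FTensorBindingRDG> Inputs;
	Inputs.Add({ InputTensorBuffer });

	TArray<UE::NNE::FTensorBindingRDG> Outputs;
	Outputs.Add({ OutputTensorBuffer });

	// Records the inference work into the graph; execution happens when the RDG runs.
	ModelInstance.EnqueueRDG(GraphBuilder, Inputs, Outputs);
}
```

In practice, the instance's input tensor shapes would be set once at setup time, and a pass like this would be registered from a render-thread hook (for example, a scene view extension) so the model evaluates every frame alongside the rest of the post-processing chain.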
