DLSS 3 vs DLSS 2 vs Native

When NVIDIA unveiled the GeForce RTX 4000 series graphics cards as the big announcement of the GTC 2022 GeForce Beyond special broadcast, it was immediately clear that DLSS 3 played a key role in achieving the unprecedented performance leap (2x-4x) claimed by NVIDIA.

Almost all benchmarks shared by the manufacturer included the new DLSS 3 technology, and the few that did not showed performance gains over the GeForce RTX 3000 series that were more in line with what we’ve come to expect from next-generation graphics cards.

Now that the GeForce RTX 4090, the flagship GPU (at least until the inevitable Ti model) and the first card of the brand new Ada Lovelace architecture to launch, has been in the hands of reviewers for a while, we’ve been able to verify how much DLSS 3 improves performance. But first, let’s take a look at what’s under the hood.

The new GeForce RTX graphics cards feature fourth-generation Tensor Cores, which include a new 8-Bit Floating Point (FP8) Tensor Engine, increasing throughput by up to 5x to an estimated 1.32 Tensor-petaFLOPS on the RTX 4090.

However, with DLSS 3, NVIDIA goes a step further than DLSS Super Resolution. There is now a new DLSS Frame Generation convolutional autoencoder that itself generates a full frame based on optical flow fields calculated with the Optical Flow Accelerator.

Optical flow accelerators have been available in NVIDIA GPUs since the Turing architecture. However, as previously explained by VP of Applied Deep Learning Research Bryan Catanzaro, the new graphics cards are equipped with a significantly faster and more advanced version of the OFA, which is why DLSS 3 is currently exclusive to GeForce RTX 4000 graphics cards.

The generated frame is inserted between frames reconstructed with DLSS Super Resolution. As such, NVIDIA claims that out of every two frames, only one-eighth of the pixels are rendered conventionally, while the rest are reconstructed by Super Resolution and Frame Generation, delivering massive improvements in frame rate.
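The one-eighth figure can be reproduced with simple arithmetic if we assume Super Resolution is running in Performance mode, which renders at half the output width and height. That preset and the per-axis scale factors below are assumptions for illustration and are not stated in this article; the function name is hypothetical as well.

```python
# Rough arithmetic behind the "one-eighth of the pixels" claim.
# Assumption: per-axis render scales for the DLSS presets (not given in the article).
RENDER_SCALE = {
    "quality": 2 / 3,
    "balanced": 0.58,
    "performance": 0.5,
    "ultra_performance": 1 / 3,
}


def rendered_pixel_fraction(preset: str, frame_generation: bool = True) -> float:
    """Fraction of displayed pixels that are rendered conventionally."""
    scale = RENDER_SCALE[preset]
    per_frame = scale * scale  # Super Resolution renders scale^2 of each frame
    # With Frame Generation, every other displayed frame is generated, not rendered.
    frames_rendered = 0.5 if frame_generation else 1.0
    return per_frame * frames_rendered


print(rendered_pixel_fraction("performance"))  # 0.125, i.e. one-eighth
```

Under the same assumptions, the Quality preset would put the conventionally rendered share at roughly two-ninths of the displayed pixels.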

To offset the added latency caused by Frame Generation, NVIDIA has integrated its latency-reducing Reflex technology into DLSS 3 to keep the game responsive.

Our own Hassan was able to test the GeForce RTX 4090 with all of the DLSS 3 compatible games NVIDIA shared with reviewers. He chose the Quality preset (at 4K resolution, of course) because he felt the new graphics card would run most games fast enough that it wouldn’t make sense to lower the base rendering resolution by dropping to a lower DLSS preset.

First up is CD Projekt RED’s Cyberpunk 2077, the last game to use the studio’s in-house REDengine before the move to Unreal Engine 5. Please note that the Cyberpunk 2077 build did not include the upcoming Ray Tracing Overdrive Mode, which was also announced during the GeForce Beyond broadcast. Overdrive mode adds advanced, taxing ray tracing techniques such as RTX Direct Illumination, full-resolution reflections, and multi-bounce indirect lighting. NVIDIA estimates it will reduce performance by about 51 FPS at 4K with DLSS 3, although DLSS 3 may also absorb the hit better than DLSS 2.

However, with the current game, DLSS 3 only improved the average FPS by 16.1% and the 1st-percentile frame rate by 15.3% over DLSS 2.

Next is one of the first games to be publicly released with DLSS 3 support, Asobo Studio’s A Plague Tale: Requiem (due next week; look out for our review soon). Powered by Asobo’s in-house engine, A Plague Tale: Requiem includes updated technology that can support a much larger number of rats than the original game, as well as improved dynamic lighting. The final version will also include some form of ray tracing, but the tested build did not.

In this case, DLSS 3 offers a 29% improvement over DLSS 2 in average FPS and a 39.1% improvement in the 1st-percentile frame rate. The boost will likely be greater once ray tracing is enabled.

Powered by the EGO Engine 4.0, Codemasters’ F1 22 is by far the least taxing of all the games tested and delivers the highest frame rates, even with the ray tracing option enabled.

As such, in this year’s edition of the officially licensed Formula 1 game, DLSS 3 can only further increase the average FPS by 20.5% and the minimum FPS by 22.4%.

The real power of DLSS 3 can be seen in Microsoft Flight Simulator. While DLSS 2 couldn’t improve CPU-bound games in any meaningful way, Frame Generation, the most important new component of DLSS 3, is completely independent of CPU bottlenecks because the extra frames are produced on the GPU without the CPU having to simulate or submit them.

As such, there is a massive 106% increase in average FPS and an even greater 115% improvement in minimum FPS over the DLSS 2 implementation.
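Why Frame Generation helps most when the CPU is the limit can be illustrated with a toy model. It assumes the rendered frame rate is capped by the slower of CPU and GPU, and that Frame Generation adds one generated frame per rendered frame at a small GPU cost. The function name, the `overhead` parameter, and the example numbers are all hypothetical; none of them are measurements from these tests.

```python
def displayed_fps(cpu_fps: float, gpu_fps: float,
                  frame_generation: bool = False,
                  overhead: float = 0.0) -> float:
    """Toy estimate of the displayed frame rate.

    cpu_fps / gpu_fps: frame rates each side could sustain on its own.
    overhead: hypothetical fraction of GPU throughput spent generating frames.
    """
    if not frame_generation:
        return min(cpu_fps, gpu_fps)
    # Generation costs some GPU time but no CPU time in this model.
    rendered = min(cpu_fps, gpu_fps * (1.0 - overhead))
    # One generated frame is inserted for every rendered frame.
    return 2.0 * rendered


# CPU-bound case: the CPU can only feed 60 FPS, the GPU could do 200.
print(displayed_fps(60, 200))                                        # 60
print(displayed_fps(60, 200, frame_generation=True, overhead=0.1))   # 120.0

# GPU-bound case: generation competes with rendering, so the gain is smaller.
print(displayed_fps(200, 100, frame_generation=True, overhead=0.1))
```

In this simplified model a CPU-bound game roughly doubles its displayed frame rate, which is in the same range as the 106% average gain reported for Microsoft Flight Simulator, while a GPU-bound game gains less than 2x.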

The last DLSS 3 test provided by NVIDIA was the wonderful Unity Enemies tech demo, originally showcased at GDC 2022. In this case we couldn’t make a direct comparison with DLSS 2 because it wasn’t available as an option in the demo. Compared to native rendering, DLSS 3 offers an average FPS increase of 235% and a boost of 319% in the 1st-percentile frame rate.
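Because all of these gains are quoted as percentage increases, it can help to convert them to multipliers: a 235% increase means 3.35 times the baseline, not 2.35 times. The helper below is only an illustration; the function name is hypothetical, and the inputs are the percentages reported in this article.

```python
def gain_to_multiplier(percent_increase: float) -> float:
    """Convert a percentage increase into a multiple of the baseline."""
    return 1.0 + percent_increase / 100.0


# (average FPS gain %, low-end FPS gain %) as reported in this article.
# All are relative to DLSS 2 except Enemies, which is relative to native rendering.
reported = {
    "Cyberpunk 2077": (16.1, 15.3),
    "A Plague Tale: Requiem": (29.0, 39.1),
    "F1 22": (20.5, 22.4),
    "Microsoft Flight Simulator": (106.0, 115.0),
    "Enemies (vs native)": (235.0, 319.0),
}

for game, (avg_gain, low_gain) in reported.items():
    print(f"{game}: {gain_to_multiplier(avg_gain):.2f}x average, "
          f"{gain_to_multiplier(low_gain):.2f}x low-end")
```

Keep in mind that the Enemies figures use a different baseline from the other four, so they are not directly comparable.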

Overview

As NVIDIA pointed out during the technology presentation, DLSS 3 can really improve performance during CPU-bound scenarios like Microsoft Flight Simulator and in the most advanced ray-traced games. As such, its true potential will be unlocked with tomorrow’s games.

When tested in titles already running at very high frame rates, the boost compared to regular DLSS 2 is more limited (at least when using the Quality preset; I think the Performance and Ultra Performance presets could widen the gap). That’s mainly because the RTX 4090 is a beast in its own right, delivering significant performance gains over the top-of-the-line cards of the previous generation, even when using DLSS 2 or native rendering. If you’ve ever wanted to play games at 4K and 144+ FPS with all graphics settings maxed out, the RTX 4090 and DLSS 3 can easily deliver that.

As noted in Digital Foundry’s first hands-on with the technology, the Frame Generation component can sometimes introduce artifacts. However, those are very hard to notice during normal gameplay. It’s also possible that the Frame Generation algorithm will be improved over time to reduce these glitches, just as NVIDIA did with DLSS Super Resolution.

Last but not least, I have to admit that I was most impressed with the latency measurements. During press presentations, NVIDIA engineers had hinted that the lowest latency would be obtained by a combination of DLSS 2 and Reflex instead of DLSS 3 due to the Frame Generation component. However, the data show that DLSS 3 came out on top in all cases, sometimes with a significant difference from DLSS 2 + Reflex. More testing will be needed, but it seems RTX 4000 series owners have no reason to turn off Frame Generation.
