What about the performance of the GeForce RTX 4090 and the quality of DLSS 3.0?

Is the performance final, or is Nvidia keeping some headroom?

Among other things, Nvidia published the performance graph below, which you could see in our article after the official event. The chart has drawn criticism for its lack of clarity: it mixes DLSS and non-DLSS performance, released and unreleased games (or unreleased builds of them), and so on. Nvidia's statement that DLSS and raytracing were used where possible does not tell the reader where and what was actually used, since for unreleased games/builds it is not clear whether DLSS or raytracing could even be turned on in the version Nvidia tested.

It's safe to say that neither was used in the first three games: Resident Evil Village, Assassin's Creed Valhalla, and The Division 2. So there we are definitely evaluating the increase in raw hardware performance, not the effect of the new DLSS 3.0, which is capable of, among other things, supplementing every rendered frame with one "made up" by artificial intelligence.

In this trio of games, the GeForce RTX 4090 averages 57% more performance than the GeForce RTX 3090 Ti (which would equate to roughly 72-73% more performance than the GeForce RTX 3090). This raises the question of why a chip with a 2.7x larger transistor budget (76.3 billion versus 28.3 billion), combined with a roughly 50% higher clock frequency (2520 MHz versus 1695 MHz) and a 16x larger L2 cache (96 MB versus 6 MB), is only about 73% faster.
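The arithmetic behind these figures can be checked with a short sketch. The 1.10x ratio between the RTX 3090 Ti and the RTX 3090 is an assumption on our part (roughly what rasterization reviews reported), not a number from Nvidia's chart:

```python
# Sketch of the generational-uplift arithmetic discussed above.
# Assumption (not from Nvidia's chart): RTX 3090 Ti ≈ 1.10x the RTX 3090
# in rasterization, roughly consistent with published reviews.

uplift_vs_3090ti = 1.57        # RTX 4090 vs RTX 3090 Ti, from Nvidia's chart
ratio_3090ti_vs_3090 = 1.10    # assumed RTX 3090 Ti vs RTX 3090

uplift_vs_3090 = uplift_vs_3090ti * ratio_3090ti_vs_3090
print(f"RTX 4090 vs RTX 3090: +{(uplift_vs_3090 - 1) * 100:.0f}%")  # ≈ +73%

# Spec ratios quoted in the text:
transistors = 76.3 / 28.3      # ≈ 2.70x transistor budget
clock = 2520 / 1695            # ≈ 1.49x boost clock
l2_cache = 96 / 6              # 16x L2 cache
print(f"transistors {transistors:.2f}x, clock {clock:.2f}x, L2 {l2_cache:.0f}x")
```

Multiplying the two ratios gives 1.57 × 1.10 ≈ 1.73, which is where the 72-73% figure over the RTX 3090 comes from.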

There are essentially three kinds of opinions or explanations. The first rests on the hypothesis that Nvidia is keeping headroom and, despite the official announcement, does not want to show all its cards. It would therefore have deliberately chosen games where the generational uplift is below average, so that high performance figures do not push AMD to finalize performance-enhancing specifications (e.g. higher clocks, higher TDP, more active stream processors on cut-down models).

The second is essentially similar and rests on reports that the GeForce RTX 4090 can be switched from the default 450W TDP mode to a 600W TDP mode, where it achieves significantly higher performance. In this reading, the published results were measured at 450W TDP, and the surprise is supposed to be the performance at 600W.

The third expects no surprises and assumes that performance really is what the graph shows. The transistor budget would in that case have gone mainly into improved RT cores (raytracing) and Tensor cores (AI acceleration). Even then, however, given the increase in arithmetic and texturing performance, the overall uplift in game FPS should be higher; this is explained by some limiting factor. There is no consensus on what that limit would be: the candidates put forward are data throughput, TDP (which, however, would lend considerable weight to hypothesis number two), or simply some unknown architectural element.

Whatever the reason, it would mean that the ratio of game FPS per unit of theoretical performance (arithmetic, texturing) has dropped. Here the question arises whether the situation should not be the opposite given Shader Execution Reordering, which is supposed to increase the efficiency with which the arithmetic units are used. However, there are two facts to keep in mind:

First, the essence of this technology is also used by Intel on its Arc graphics chips (under the name Thread Sorting), and it cannot be said that this alone is enough to achieve efficiency or performance above the level of current competing products that do not support Shader Execution Reordering (Thread Sorting). Furthermore, everything indicates that Shader Execution Reordering (like Thread Sorting) is only supported at the RT-core level; in other words, it applies only to raytracing.

The whole third explanation would then mean that Nvidia has relegated classic rasterization not to the second track but to a "very secondary" one, and basically does not care what performance its GPUs achieve in games without raytracing / DLSS 3.0. Even this possibility, though surprising, need not be unrealistic, because there is a precedent for it: the release of the Ampere compute GPU A100.

At that time, Nvidia practically stopped pursuing standard (vector) computing performance, and most of the transistors went into increasing performance and format support for matrix operations (Tensor cores, AI), which Nvidia considered more important. It should be added that, with only a minimal improvement in double-precision and ordinary compute workloads, Nvidia left part of the market open to AMD, which thus got room to return to the compute-accelerator market. It probably goes without saying that with its Instinct accelerators, AMD seized the opportunity quickly.

DLSS 3.0 quality

DLSS 3.0 brings one major innovation: in addition to the classic procedure of gaining FPS by rendering internally at a lower resolution and upscaling (reconstructing) frames to the target resolution, it adds the generation of entire frames via AI ("Frame Generation"). For each genuinely rendered (and upscaled) frame, another frame is generated ("imagined") using AI and motion vectors. From the sketchy data so far, it seems this will not automatically mean a doubling of FPS compared to DLSS 2.0, since the process of generating new frames itself costs some (probably non-negligible) performance.

In any case, as Nvidia's own official video showed, frames generated with DLSS 3.0 are of significantly worse quality than frames that were rendered and (only) upscaled. If you step through the video frame by frame, you can easily tell which frames were rendered and which were made up. Even in the thumbnails, you can see that in every second frame moving objects are out of focus, doubled, blurred, and so on:

Left: only rendered frames; right: with DLSS 3.0 Frame Generation.
Top right: odd (rendered) frame; bottom right: even (generated) frame.
For full resolution, open in a new tab/window and click.

The result is thus reminiscent of the motion-interpolation features once promoted by TV manufacturers, where the TV's processor was supposed to reconstruct a smoother 60FPS video, matching the screen's refresh rate, from 24/25/30FPS material. The results were often of dubious visual quality. In the demo above, the main problem is that sharp and blurry frames alternate, so a visually disturbing result at lower FPS cannot be ruled out. We can only hope that Nvidia manages to fine-tune these shortcomings by the October release. For now, however, the possibility cannot be ruled out that Nvidia considers this the final state and is not yet able to achieve anything better.
