Desktop Renoir has unexpectedly low memory access latencies

When AMD released the first generation of Ryzen (the 1000 series) in 2017, there was room for improvement, especially in memory access latency (and partly in L3 and L2 cache latency as well). On the top model, the Ryzen 7 1800X, latency exceeded 100 ns with the cheapest DDR4-2133 memory, was around 95 ns with somewhat better DDR4-2400, and around 85 ns with better DDR4-2667. With the second generation, the Ryzen 7 2700X, AMD worked on latency, and with DDR4-2667 it dropped from ~85 ns to ~75 ns, which was quite noticeable in some types of workloads. This value could no longer be criticized much, because Intel did not offer a lower one on its processors of the time with a similar number of cores (HEDT segment).

With the advent of the chiplet architecture, AMD further improved the memory controller, but these improvements were partially offset by a slight increase in latency caused by the interface between the processor chiplet and the central chiplet. The Ryzen 7 3800X (staying with eight-core models for clarity) reached 70-80 ns, depending on the memory used. With the higher-quality, faster memory that the new controller in the central chiplet supported, it was possible to get down to about 65-70 ns.

Part of the compensation for the chiplet design's latency was an increase in L3 cache capacity to as much as 32 MB for an eight-core configuration. When the first reports appeared that the 7nm Renoir APU would be equipped with 8 MB, i.e. a 4× smaller cache, there were concerns that this would have a significant negative impact on processor performance. These were dispelled partly by the first tests of the mobile version and partly by the argument that the L3 cache capacity of chiplet Ryzens, on average, compensates for the latency of the design rather than yielding extra performance. Certainly, there will be workloads better suited to a processor with 32 MB of L3 cache and higher latency, but there will also be workloads better suited to a processor with 8 MB of L3 cache and lower latency. The question was how much lower the latency would actually be.

The answer comes from the first test of a 35W sample of the Ryzen 7 4700GE. It does not yet have its final clock speeds set, but that does not matter. More importantly, it answers the question posed above: it achieves a latency of 47.6 ns with DDR4-4333, while a chiplet Ryzen reaches about 76 ns with memory that fast.

47.6 ns is an excellent result (roughly half the latency of the first generation of Ryzen), and with values like this it is not so surprising that AMD did not want to enlarge the die with a higher-capacity L3 cache.
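To put the figures quoted above side by side, here is a minimal Python sketch. The latency values are the ones given in this article; the 4.0 GHz core clock used to convert nanoseconds into core cycles is purely an assumption for illustration (the tested sample does not have final clocks), and the helper names `ratio_to_renoir` and `stall_cycles` are made up for this example.

```python
# Latency figures quoted in the article, in nanoseconds.
latency_ns = {
    # The article says "over 100 ns"; 100 is used here as a lower bound.
    "Ryzen 7 1800X, DDR4-2133": 100.0,
    "Ryzen 7 2700X, DDR4-2667": 75.0,
    "chiplet Ryzen, DDR4-4333": 76.0,
    "Ryzen 7 4700GE, DDR4-4333": 47.6,
}

# Hypothetical core clock, NOT a figure from the article.
ASSUMED_CLOCK_GHZ = 4.0


def ratio_to_renoir(name: str) -> float:
    """How many times higher a configuration's latency is than the 4700GE's."""
    return latency_ns[name] / latency_ns["Ryzen 7 4700GE, DDR4-4333"]


def stall_cycles(ns: float, clock_ghz: float = ASSUMED_CLOCK_GHZ) -> float:
    """Core cycles that elapse during one memory access of the given latency."""
    return ns * clock_ghz  # 1 GHz = 1 cycle per nanosecond


for name, ns in latency_ns.items():
    print(
        f"{name}: {ns:.1f} ns, "
        f"{ratio_to_renoir(name):.2f}x the 4700GE, "
        f"~{stall_cycles(ns):.0f} cycles at {ASSUMED_CLOCK_GHZ} GHz"
    )
```

Under that assumed clock, the difference between about 76 ns and 47.6 ns is on the order of a hundred core cycles per access that misses the caches, which is why a smaller L3 can be an acceptable trade.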

As for further development, we know that AMD plans L3 cache optimizations for the third generation of Zen. Currently, an eight-core configuration has two L3 cache partitions, one for each group of four cores. In Zen 3 these partitions will be merged, which should in theory bring several improvements. The first is greater immunity to the whims of the Windows scheduler, which tends to shuffle a single-threaded task between different physical cores, so that the contents of the L3 cache have to be moved over to the core to which Windows has moved the workload. With a single L3 for all cores, these moves will not be as painful (in terms of lost performance). The second advantage should be more efficient use of a single large cache. For example, if one application runs on two cores but needs the same L3 cache capacity to run efficiently as another application that runs in parallel on eight cores, the Zen 3 configuration will be more suitable than that of Zen 2. The third advantage should be in games. The situation where one core needs to access the contents of the L3 cache belonging to another core is one of the last major reasons Intel retains a performance advantage of a few percent in games (the other is a slightly higher boost clock on a single core, or on a small number of cores, in games that do not use more).
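The first two arguments can be illustrated with a small toy model in Python. It is only a sketch under loud assumptions: the scheduler is assumed to move a thread to a uniformly random other core (the real Windows scheduler does not behave this simply), the 32 MB total for the split configuration is the figure quoted earlier in this article, and using the same 32 MB for a merged cache is an assumption, not a confirmed Zen 3 specification. The function names are hypothetical.

```python
from fractions import Fraction


def cross_partition_probability(cores: int, partitions: int) -> Fraction:
    """Chance that a thread moved to a uniformly random *other* core
    ends up under a different L3 partition (toy model, see assumptions)."""
    if partitions < 1 or cores % partitions != 0:
        raise ValueError("cores must divide evenly into partitions")
    other_cores = cores - 1
    if other_cores == 0:
        return Fraction(0)
    cores_outside_own_partition = cores - cores // partitions
    return Fraction(cores_outside_own_partition, other_cores)


def l3_reachable_mb(total_l3_mb: float, partitions: int) -> float:
    """L3 capacity directly reachable by cores within one partition."""
    return total_l3_mb / partitions


# Eight cores, two L3 partitions of four cores each (as described for Zen 2).
print(cross_partition_probability(8, 2), l3_reachable_mb(32, 2))
# Eight cores, one merged L3 (as announced for Zen 3; 32 MB total is assumed).
print(cross_partition_probability(8, 1), l3_reachable_mb(32, 1))
```

In this model, a migrated thread lands under the other partition in 4 of 7 cases with the split layout and never with the merged one, and an application confined to two cores in one partition can directly reach 16 MB instead of the full 32 MB.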