The delay may be the least of the Aurora supercomputer's problems

Intel originally promised unified memory over the CXL interface, but plans have since changed.

We wrote yesterday about how Argonne National Laboratory is bridging the gap until the long-delayed Aurora arrives with the Polaris supercomputer. The latter will not be completed this year either, but it is expected to be operational in 2022. Meanwhile, this year's Hot Chips event has produced some interesting new details about the Ponte Vecchio accelerator, one of Aurora's key building blocks.

Originally, one of Aurora's main goals was to ensure good scalability at 1 EFLOPS performance by letting each chip access data stored in another chip's memory in a memory-coherent manner, eliminating explicit memory-copy management on the program side. Such a design is far easier to program, which is no small advantage at this computing scale, since managing memory copies at the software level is anything but trivial.

In the case of Ponte Vecchio, Intel designated the Xe Link, based on the CXL standard, for this purpose, and since Sapphire Rapids supports CXL 1.1, in theory everything was in place for the chips within a single node to access each other's memory with hardware-implemented memory coherence.

At its recent Sapphire Rapids presentation, Intel again pointed out that the processor supports CXL 1.1, but at this year's Hot Chips event, in connection with Ponte Vecchio, the answer to a question (according to the AnandTech report) was that the Xe Link has nothing to do with CXL. This is extremely strange given the slide above, as the company used to advertise exactly the opposite. In addition, in response to another question, it also emerged that the accelerators connect to the Sapphire Rapids CPU via a PCI Express 5.0 interface, which is not memory-coherent. Based on the new information, hardware-implemented memory coherence will therefore only exist between the accelerators, via a modified Xe Link.

Of course, a unified memory image between the CPUs and the accelerators can still be provided at the software level, but this requires significant extra work on the program-code side, and the method is far less efficient than a hardware implementation; it was no accident that Intel invented CXL in the first place.
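The difference between the two programming models can be illustrated with a short CUDA sketch (CUDA is used here because Polaris's A100 GPUs expose both models; Aurora's oneAPI stack is analogous but not shown). The explicit-copy path is what programmers fall back to when no coherent memory image exists, while `cudaMallocManaged` gives a single pointer valid on both CPU and GPU, with the runtime migrating pages behind the scenes:

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Trivial kernel: multiply every element by a factor.
__global__ void scale(float *v, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Explicit-copy model: the program manages every transfer itself.
    float *host = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) host[i] = 1.0f;
    float *dev;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);   // CPU -> GPU
    scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0f);
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);   // GPU -> CPU

    // Unified-memory model: one pointer valid on both CPU and GPU;
    // the runtime migrates pages, so no explicit copies are written.
    float *uni;
    cudaMallocManaged(&uni, bytes);
    for (int i = 0; i < n; ++i) uni[i] = 1.0f;              // written on CPU
    scale<<<(n + 255) / 256, 256>>>(uni, n, 2.0f);          // read on GPU
    cudaDeviceSynchronize();                                // before CPU reads

    printf("explicit: %f, unified: %f\n", host[0], uni[0]);
    cudaFree(dev); cudaFree(uni); free(host);
    return 0;
}
```

The managed-memory version still pays migration costs in software, which is exactly why a hardware-coherent interconnect such as CXL or NVLink scales better: the copies (and synchronization points) disappear from both the code and the critical path.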

It should be noted that Polaris, intended as a kind of test bed, has a similar structure: its AMD EPYC processors connect to the NVIDIA A100 GPUs via a PCI Express interface, while the accelerators themselves are linked by NVLink, which provides hardware-implemented memory coherence among the GPUs, but not between the CPUs and the GPUs. Polaris is in a more favorable position in that it is nowhere near 1 EFLOPS of double-precision performance, so this design does not cause serious problems there; for Aurora, however, scaling performance will no longer be easy.

Source: Hírek és cikkek – PROHARDVER!
