After the Instinct MI25, MI50 / MI60, MI100 and MI200 comes the Instinct MI300. Although the name suggests no change in concept and gives the impression of a simple next generation, this series brings several changes that put it in a different category from all of its predecessors.
With the MI300, the Instinct series stops being just a compute / GPGPU accelerator and becomes a complete solution that integrates the processor, the accelerator and system memory in a single package. It is therefore more than an APU / SoC: the MI300 can be described as a SiP (System in Package).
The MI300A, as the variant for the El Capitan supercomputer is called, consists of four main chiplets manufactured on TSMC's 5nm process. One integrates 24 Zen 4 cores, and the other three are accelerators based on the CDNA 3 architecture.
Even eight HBM3 stacks with a combined throughput of over 6.5 TB/s, together with the cache integrated in the chiplets (the MI250X had 16 MB of L2 cache), would not be enough to keep these chiplets fed. That is why the 6nm substrate carrying the chiplets integrates both the 4th-generation Infinity Fabric, which interconnects the chiplets with each other and with other packages, and a large-capacity Infinity Cache.
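To put the memory figures in perspective, the short Python sketch below divides the quoted aggregate HBM3 bandwidth by the number of stacks. It assumes the bandwidth is split evenly across the eight stacks, which is a simplification made for illustration rather than an AMD specification; since the article says "over 6.5 TB/s", the result is a lower bound.

```python
# Aggregate HBM3 figures quoted in the article. Splitting them evenly
# across stacks is an assumption made for this illustration.
TOTAL_BANDWIDTH_TBPS = 6.5  # "over 6.5 TB/s", so treat as a lower bound
HBM3_STACKS = 8


def per_stack_bandwidth(total_tbps: float, stacks: int) -> float:
    """Return bandwidth per HBM3 stack in TB/s, assuming an even split."""
    if stacks <= 0:
        raise ValueError("stacks must be positive")
    return total_tbps / stacks


if __name__ == "__main__":
    bw = per_stack_bandwidth(TOTAL_BANDWIDTH_TBPS, HBM3_STACKS)
    print(f"At least ~{bw:.2f} TB/s per HBM3 stack")
```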
AMD has not yet disclosed the exact capacity (presumably it wants to save something for the Hot Chips conference in August), but I think it is safe to say that it will be at least an order of magnitude larger than the 16 MB cache of the MI250X.
The entire package contains silicon with a total of 146 billion transistors, almost double the count of the Nvidia Hopper H100 and almost 50% more than Intel's Ponte Vecchio (Xe-HPC / Max Series 1550) GPU. The solution is built on a unified memory architecture, so the 128 GB of integrated HBM3 is fully shared by the CPU and GPGPU cores (there is no data duplication or copying between system memory and accelerator memory). Compared to the MI250X, AMD expects the MI300 to deliver up to 8x higher AI performance and up to 5x better energy efficiency.
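The transistor comparison can be checked with simple ratios. In this Python sketch the MI300 figure comes from the article, while the H100 (80 billion) and Ponte Vecchio ("over 100 billion") counts are the vendors' published figures, added here only for comparison. Because Intel gives only a lower bound, the Ponte Vecchio ratio should be read as an upper bound.

```python
# Transistor counts in billions. The MI300 figure comes from the article;
# H100 (80 billion) and Ponte Vecchio ("over 100 billion") are the vendors'
# published figures, added here only for comparison.
TRANSISTORS_BN = {
    "MI300": 146,
    "H100": 80,
    "Ponte Vecchio": 100,  # Intel quotes "over 100 billion", so the ratio is an upper bound
}


def ratio(chip: str, rival: str) -> float:
    """How many times more transistors `chip` has than `rival`."""
    return TRANSISTORS_BN[chip] / TRANSISTORS_BN[rival]


for rival in ("H100", "Ponte Vecchio"):
    print(f"MI300 vs {rival}: {ratio('MI300', rival):.2f}x")
```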
In terms of energy efficiency, the El Capitan supercomputer was originally designed for a power draw of around 40 MW, but it is now expected to draw only around 30 MW at full real-world load.
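The difference between the two power figures is easy to quantify. The Python sketch below computes the reduction in absolute and relative terms, plus a hypothetical upper bound on yearly energy savings that assumes the machine runs at full load around the clock all year, which real supercomputers do not.

```python
ORIGINAL_DESIGN_MW = 40.0   # original design target from the article
EXPECTED_MW = 30.0          # current expectation from the article
HOURS_PER_YEAR = 24 * 365   # 8760 h, ignoring leap years

saving_mw = ORIGINAL_DESIGN_MW - EXPECTED_MW
saving_pct = 100 * saving_mw / ORIGINAL_DESIGN_MW
# Hypothetical upper bound: assumes continuous full load for a whole year.
saving_mwh_per_year = saving_mw * HOURS_PER_YEAR

print(f"{saving_mw:.0f} MW lower draw ({saving_pct:.0f}%), "
      f"up to {saving_mwh_per_year:,.0f} MWh per year at continuous full load")
```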
AMD expects the jump in performance and efficiency delivered by the MI300 to let the company exceed its target known as 30×25. This refers to a commitment made in 2021, when the company set a goal of a thirtyfold increase in the energy efficiency of HPC systems by 2025.
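A thirtyfold target compounds over several product generations. The Python sketch below computes the constant yearly improvement factor that would multiply out to 30x; the five-year window ending in 2025 is an assumption of this sketch, and real gains arrive in uneven steps with each new product rather than smoothly every year.

```python
def required_annual_factor(total_factor: float, years: int) -> float:
    """Yearly improvement that compounds to `total_factor` over `years` years."""
    if years <= 0:
        raise ValueError("years must be positive")
    return total_factor ** (1 / years)


if __name__ == "__main__":
    # Assumed window: five years ending in 2025 (an assumption of this sketch).
    factor = required_annual_factor(30, 5)
    print(f"~{factor:.2f}x better efficiency every year")
```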
The National Nuclear Security Administration (NNSA), which will operate El Capitan, intends to keep running its older Sierra supercomputer alongside it. For that reason, the power available to the entire computing facility was increased from 45 to 85 MW, and the cooling capacity was expanded from 10,000 to 28,000 tons. The cooling system itself is rated at 15 MW of power.
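If the cooling figures are read as refrigeration tons, the customary unit for chiller capacity (this reading is an assumption about the source), they can be converted to megawatts of heat removal. One ton of refrigeration is 12,000 BTU/h, or about 3.517 kW. The Python sketch below applies that conversion to both figures and also prints the increase in facility power.

```python
# 1 ton of refrigeration = 12,000 BTU/h ~ 3.516853 kW of heat removal.
KW_PER_REFRIGERATION_TON = 3.516853


def tons_to_mw(tons: float) -> float:
    """Convert cooling capacity in refrigeration tons to megawatts of heat."""
    return tons * KW_PER_REFRIGERATION_TON / 1000


for label, tons in (("before", 10_000), ("after", 28_000)):
    print(f"Cooling {label}: {tons:,} tons ~ {tons_to_mw(tons):.1f} MW of heat removal")

POWER_BEFORE_MW, POWER_AFTER_MW = 45, 85  # facility power figures from the article
print(f"Facility power: +{POWER_AFTER_MW - POWER_BEFORE_MW} MW")
```

Under this reading, the upgraded cooling plant can remove roughly as much heat as the expanded 85 MW power supply can deliver, which is consistent with running both machines side by side.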
Source: Diit.cz.
*The article has been translated based on the content of Diit.cz.