Intel reveals Ponte Vecchio performance figures

For now, of course, these are mostly just numbers, but they are important data all the same.

Intel has been developing the design codenamed Ponte Vecchio for quite some time. Its structure was explained at the beginning of the year, and the cache capacities were already known. At the Hot Chips 34 event the company has now revealed more details about the system, including concrete figures for the expected computing performance.

According to Intel, the fully equipped Ponte Vecchio delivers 52 TFLOPS at both single and double precision. For XMX operations, the figures are 419 TFLOPS with TF32, 839 TFLOPS with BF16 and FP16, and 1678 TOPS with the Int8 data type.
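To put the quoted figures in proportion, the short Python sketch below divides each XMX number by the 52 TFLOPS vector figure. The variable and function names are made up for this illustration, and the only inputs are the numbers quoted above; nothing here is measured.

```python
# Peak figures as quoted in the article (TFLOPS, or TOPS for Int8).
VECTOR_PEAK = 52.0  # single and double precision
XMX_PEAK = {"TF32": 419.0, "BF16/FP16": 839.0, "Int8": 1678.0}


def ratio_to_vector(xmx_peak, vector_peak=VECTOR_PEAK):
    """Return how many times higher an XMX figure is than the vector figure."""
    return xmx_peak / vector_peak


# Ratio of every XMX data type to the vector figure, rounded to one decimal.
ratios = {name: round(ratio_to_vector(peak), 1) for name, peak in XMX_PEAK.items()}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio}x the {VECTOR_PEAK:g} TFLOPS vector figure")
```

The ratios come out at roughly 8x, 16x and 32x, so each step to a narrower data type about doubles the quoted throughput.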


The upcoming accelerator is special in that it can operate in both SIMT and SIMD modes. GPGPUs typically use the SIMT model, because they process so much data that arranging it by hand into ideal vectors is extremely difficult. Intel has of course tried the explicit SIMD approach before, with Larrabee, but the situation is better this time: even where SIMD mode is not usable, Ponte Vecchio offers SIMT as a fallback. Regardless, the company strongly emphasizes that code written for the CPU will be easier to port to SIMD mode. That is true in itself; the question is whether such code will scale, or whether it is better to rewrite it for the SIMT model.
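The difference between the two programming models can be sketched in plain Python. This is only a conceptual illustration, not Intel's API and not real GPU code: the function names and the lane width of 8 are assumptions chosen for the example. In the SIMT style the programmer writes the body for a single work-item and a runtime applies it to every index. In the SIMD style the programmer arranges the data into fixed-width vectors and handles the leftover elements.

```python
LANES = 8  # hypothetical vector width, chosen only for this illustration


def saxpy_simt(a, x, y):
    """SIMT style: the kernel body is written for ONE work-item."""
    def kernel(i):
        return a * x[i] + y[i]

    # Stand-in for the runtime, which launches the kernel once per index.
    return [kernel(i) for i in range(len(x))]


def saxpy_simd(a, x, y, lanes=LANES):
    """SIMD style: the programmer forms the vectors explicitly."""
    n = len(x)
    full = n - n % lanes  # number of elements covered by whole vectors
    out = []
    for start in range(0, full, lanes):
        xv = x[start:start + lanes]  # one vector's worth of x
        yv = y[start:start + lanes]  # one vector's worth of y
        out.extend(a * xi + yi for xi, yi in zip(xv, yv))
    # The tail that does not fill a whole vector is handled separately.
    for i in range(full, n):
        out.append(a * x[i] + y[i])
    return out


x = list(range(19))
y = [2 * v for v in x]
print(saxpy_simt(3, x, y) == saxpy_simd(3, x, y))
```

Both functions compute the same result. The extra bookkeeping in the SIMD version (vector width, tail handling) is the manual work the paragraph above refers to, and it grows quickly once the data access pattern is less regular than in this example.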

To start with, Intel will offer a toolkit called the DPC++ Compatibility Tool, whose most important element is SYCLomatic. It automatically converts CUDA code to SYCL, and roughly 90-95% of the code can be expected to convert cleanly; the remaining parts may be problematic and have to be finished and optimized by hand. Intel helps here by providing developers with automatically generated information in the files about what still needs to be done, which helps get the most performance out of the port. The concept is very similar to AMD's HIPify tool, with one advantage: the output is not HIP code but SYCL, which is an industry standard and can therefore be carried over to a much wider range of hardware than HIP.

Adherence to industry standards will in fact be Intel's main weapon on the software front. The importance of porting CUDA code obviously needs no explanation, but unlike AMD, Intel did not create its own, nominally more open platform. Wherever possible it builds on standards instead, and that in itself will make an impression on the market.

Source: Hírek és cikkek – PROHARDVER! by
