How PCIe Will Change HPC and AI System Design

By Curtis Elgin, Solution Engineer

February 15, 2021

Intel® has been the go-to for high-performance computing (HPC) and artificial intelligence (AI) processors for years. Lately though, the second generation AMD® EPYC™ has been getting a lot of attention from end users.

There are some key differences between Intel and AMD systems that affect how HPC and AI systems are designed, and those design choices have a huge impact on how workloads are processed. The biggest differences relate to peripheral component interconnect express (PCIe), because the newer PCIe implementation opens up so many more options for HPC and AI designs. That’s why PCIe can be a game-changer for end users.

Per AMD-provided benchmarks, the AMD EPYC has 25% better performance than Intel. It also costs one quarter the price per processor while having more cores. But in my opinion, the most critical difference between the two processors is AMD’s PCIe support.

First, AMD platforms offer more PCIe lanes, and therefore more PCIe slots, which can really help in building a balanced system. This gives you much more flexibility in terms of architectural options. You can connect more equipment and even reduce the need for additional switches in some cases. But it’s not just that more PCIe lanes are better. This PCIe generation is also faster, and that changes *what* you can connect to those lanes.

These new AMD processors use fourth-generation PCIe technology, which supports a transfer rate of 16 GT/s per lane, double the 8 GT/s of PCIe 3.0. That means you can add lots of high-bandwidth devices that solution architects like me used to only dream about.
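To make the 16 GT/s figure concrete, here is a minimal sketch that converts a per-lane transfer rate into usable bandwidth in one direction. The 8 GT/s rate for PCIe 3.0 and the 128b/130b line encoding are taken from the PCIe specifications, not from this article, and the function names are my own. Real devices will see somewhat less because of packet and protocol overhead.

```python
# Rough per-direction PCIe bandwidth from the raw transfer rate.
# Assumption: PCIe 3.0 and 4.0 both use 128b/130b encoding,
# i.e. 128 payload bits for every 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def lane_gbytes_per_s(gt_per_s: float) -> float:
    """Approximate usable GB/s for one lane, one direction."""
    return gt_per_s * ENCODING_EFFICIENCY / 8  # 8 bits per byte


def link_gbytes_per_s(gt_per_s: float, lanes: int) -> float:
    """Approximate usable GB/s for a link of the given width, one direction."""
    return lane_gbytes_per_s(gt_per_s) * lanes


if __name__ == "__main__":
    for name, rate in (("PCIe 3.0", 8.0), ("PCIe 4.0", 16.0)):
        x4 = link_gbytes_per_s(rate, 4)
        x16 = link_gbytes_per_s(rate, 16)
        print(f"{name}: x4 = {x4:.2f} GB/s, x16 = {x16:.2f} GB/s")
```

Under these assumptions an x16 slot goes from roughly 15.75 GB/s to roughly 31.5 GB/s per direction, and an x4 NVMe link from about 3.9 GB/s to about 7.9 GB/s.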

For some vendors, the added flexibility doesn’t help because the supply chain, vendor agreements, or similar issues preclude designers from pushing for custom-tailored, potentially one-of-a-kind architectures for each client.

But, for firms that are vendor agnostic and have access to the latest technology, the latest generation of PCIe really opens up new options. That means we can improve effectiveness and efficiency of the designs, even for larger systems that must be more generic so all users can access some HPC resources.

The impact of the fast PCIe technology available today is spread over several areas. The most important, in my opinion, are:

The ability to use more x16 devices (such as graphics processing units (GPUs) and network cards) at full speed – which means data can be transferred at a faster rate
The ability to use higher bandwidth network cards – which means more data can be transferred per second
The ability to run non-volatile memory express (NVMe) storage, which was already incredibly fast, even faster with PCIe Gen 4 – in some cases with twice the speed and throughput

The impact of these new, speed-enabled capabilities becomes clear when you look at ever-evolving tech stacks. In the storage space, for example, you now have composable infrastructure technology from companies like Liqid that allows resources to be dynamically reconfigured as needed. But you can’t really take advantage of all these exciting capabilities with older, slower peripheral component interconnects.

The Large Hadron Collider (LHC) at CERN works with amazing quantities of data, and CERN has publicly stated that it gets much higher I/O and memory bandwidth, more than a terabit per second of data, with its AMD-based system. If CERN gets that kind of performance, other end users will be in great shape. Plus, more PCIe lanes mean more NVMe drives running at native speed, versus storage interfaces running at switched speeds (which adds latency and bottleneck points). Full utilization will make a huge difference in stored data access and processing.
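The point about lanes and native-speed NVMe comes down to a lane budget. The sketch below is one hedged way to work it out. The `Device` class, the helper names, the x4 width per NVMe drive, and the 128-lane and 48-lane platforms are all illustrative assumptions of mine, not figures from this article or from any specific product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    name: str
    lanes: int  # link width each device wants, e.g. 16 for a GPU
    count: int  # how many of these devices are in the design


def lanes_left(cpu_lanes: int, devices: List[Device]) -> int:
    """Lanes remaining after every listed device gets its full link width."""
    used = sum(d.lanes * d.count for d in devices)
    return cpu_lanes - used


def direct_attached_nvme(cpu_lanes: int, devices: List[Device],
                         lanes_per_drive: int = 4) -> int:
    """How many NVMe drives can attach straight to the CPU, with no switch."""
    spare = lanes_left(cpu_lanes, devices)
    return max(spare, 0) // lanes_per_drive


if __name__ == "__main__":
    # Hypothetical design: four x16 GPUs and two x16 network cards.
    fixed = [Device("GPU", 16, 4), Device("NIC", 16, 2)]
    for platform_lanes in (128, 48):  # illustrative lane counts only
        drives = direct_attached_nvme(platform_lanes, fixed)
        print(f"{platform_lanes} lanes: {drives} NVMe drives at native speed")
```

In this made-up design the wider platform still has room for eight direct-attached drives, while the narrower one has none left over, which is where switches (and their added latency) would come in.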

The impact of PCIe 4.0 on the compute and GPU stacks is equally impressive. The data sets and complexity of calculations being asked of modern CPUs and GPUs continue to increase. Networking demands have also increased as we push larger and larger data sets. PCIe 4.0 lets us bring in these bigger data pipes by doubling our bandwidth.
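A quick way to see why the doubling matters for networking is to check whether a network card's line rate fits in an x16 slot. This is a rough sketch under my own assumptions: the function names are hypothetical, the 100 and 200 Gb/s card speeds are example inputs, and only 128b/130b encoding is accounted for, so real headroom is smaller once packet overhead is included.

```python
def slot_gbits_per_s(gt_per_s: float, lanes: int = 16) -> float:
    """Approximate usable Gb/s for a slot, one direction (128b/130b encoding)."""
    return gt_per_s * lanes * 128 / 130


def nic_fits(nic_gbits_per_s: float, gt_per_s: float, lanes: int = 16) -> bool:
    """True if the card's line rate does not exceed the slot's usable rate."""
    return nic_gbits_per_s <= slot_gbits_per_s(gt_per_s, lanes)


if __name__ == "__main__":
    for nic in (100, 200):  # example network card speeds in Gb/s
        for name, rate in (("PCIe 3.0", 8.0), ("PCIe 4.0", 16.0)):
            verdict = "fits" if nic_fits(nic, rate) else "is bottlenecked"
            print(f"{nic} Gb/s card in a {name} x16 slot {verdict}")
```

With these numbers a 100 Gb/s card fits in either generation, but a 200 Gb/s card only runs at full speed in a PCIe 4.0 x16 slot.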

Beyond being the first to make PCIe 4.0 available, AMD is also being used in 21 of the top 50 supercomputers because it works well with powerful GPUs from companies like NVIDIA®. Today’s GPUs are so powerful that it’s technically possible to use a single, specifically architected server to do HPC and AI. (That won’t work for everyone – because you put your research at risk with only one node and, for AI, the more nodes you have, the more training you can do in a shorter period of time – but the raw power is certainly improving.)

The dramatic expansion of options from fourth generation PCIe is even showing itself in a separate push by AMD to optimize for GPUs (via the open source ROCm platform for HPC/Hyperscale-class GPU computing). They have even provided tools to convert CUDA code.

But the decision to change from Intel to AMD to take advantage of PCIe should not be made without real consideration. Even at a basic level, code is compiled differently on AMD than on Intel. Running virtual machines cannot be live-migrated between Intel and AMD hosts, though containerized workloads can be moved, so there is no seamless transition path. In addition, not every end user will get the benefits of PCIe 4.0 or GPUs from these AMD systems.

There is also another issue. You hear a lot about the lack of experienced system administrators and how leading universities and even national labs are trying to address it. But end users need to realize that there is a similar experience gap among HPC and AI system designers: few have used, or even been trained on, the latest technologies that could be used now that we have more and faster PCIe. That makes it worth finding a vendor who can take advantage of PCIe 4.0 and help evaluate its applicability to each specific system design.

As exciting as PCIe 4.0 is, future generations of PCIe are certain to be even faster and provide even greater options, regardless of who brings them to market first. Most exciting of all, because the PCIe 4.0 specification took longer than expected to be finalized and so much work has already been done on the PCIe 5.0 specification, that new technology should start to roll out soon. And, because each generation doubles the transfer rate, PCIe 5.0 is expected to run at 32 GT/s, or roughly 8 GB/s per lane (bi-directional), for a total of about 128 GB/s across 16 lanes. PCIe 6.0, when it arrives, will likely run at 64 GT/s, or about 256 GB/s (bi-directional) across 16 lanes. To paraphrase Dr. Seuss, oh the places we’ll be able to go!
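For anyone who wants to check the generational arithmetic, here is a small sketch that reproduces the per-lane and x16 figures quoted above. It assumes those figures are raw, bi-directional totals (transfer rate divided by 8 bits per byte, times two directions) with no encoding or protocol overhead subtracted. The 8 GT/s rate for PCIe 3.0 and the names used are my additions.

```python
# Raw transfer rate per lane, in GT/s, for each PCIe generation.
RAW_GT_PER_S = {"3.0": 8, "4.0": 16, "5.0": 32, "6.0": 64}


def raw_bidirectional_gbytes(gt_per_s: float, lanes: int) -> float:
    """Raw GB/s summed over both directions, ignoring all overhead."""
    per_direction = gt_per_s * lanes / 8  # 8 bits per byte
    return per_direction * 2


if __name__ == "__main__":
    for gen, rate in RAW_GT_PER_S.items():
        x1 = raw_bidirectional_gbytes(rate, 1)
        x16 = raw_bidirectional_gbytes(rate, 16)
        print(f"PCIe {gen}: {rate} GT/s, x1 = {x1:.0f} GB/s, x16 = {x16:.0f} GB/s")
```

Each row doubles the one before it, so PCIe 5.0 lands at 8 GB/s per lane and 128 GB/s for x16, and PCIe 6.0 at 256 GB/s for x16.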

About Silicon Mechanics

Silicon Mechanics, Inc. is one of the world’s largest private providers of high-performance computing (HPC), artificial intelligence (AI), and enterprise storage solutions. Since 2001, Silicon Mechanics’ clients have relied on its custom-tailored open-source systems and professional services expertise to overcome the world’s most complex computing challenges. With thousands of clients across the aerospace and defense, education/research, financial services, government, life sciences/healthcare, and oil and gas sectors, Silicon Mechanics solutions always come with “Expert Included”℠.
