Infrastructure Requirements for AI Inference vs Training

June 24, 2022

Investing in deep learning (DL) is a major decision that requires understanding of each phase of the process, especially if you’re considering AI at the edge. Below are practical tips to help you make a more informed decision about DL technology and the composition of your AI cluster.

For the purposes of this article, let’s define the terms we’ll be using:

Neural Network: Artificial neural networks are computing systems inspired by the organic neural networks found in human and other animal brains, where nodes (artificial neurons) are connected (artificial synapses) to work together.

Training: Learning a new capability from existing data

Inference: Applying this capability to new data (usually via an application or service)

How does AI deep learning work?

To train an artificial neural network with deep learning, your team must curate huge quantities of data into a designated structure, then feed that training dataset into a DL framework.

Once the model is trained, it has learned which inputs lead to which logical conclusions. It can now use this capability when it is exposed to novel data, making inferences about the new data that allow action.

For example, after seeing 50,000 images of dogs with solid-color coats, a trained model that is shown an image of a multicolored dog should be able to infer that this image is also of a dog and not something else, like a car or a bicycle. The app or service using the inference model then uses that result in some way.
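The split between the two phases can be sketched in a few lines. This is a toy illustration only: it uses a single artificial neuron (a perceptron) rather than a deep network, and the three features and their values are hypothetical stand-ins for what a real model would learn from images.

```python
# Toy sketch: one artificial neuron, not a deep network.
# Hypothetical features per example: [has_fur, has_wheels, coat_is_multicolored]

def infer(weights, bias, features):
    """Inference: apply already-learned weights to one input."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0  # 1 = dog, 0 = not a dog


def train(samples, labels, epochs=20, learning_rate=0.1):
    """Training: repeatedly adjust weights to reduce errors on labeled data."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = label - infer(weights, bias, features)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Training data contains only solid-color dogs (third feature is always 0).
training_samples = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
training_labels = [1, 1, 0, 0]

weights, bias = train(training_samples, training_labels)

# Novel input: a multicolored dog, which never appeared during training.
multicolored_dog = [1, 0, 1]
print("dog" if infer(weights, bias, multicolored_dog) else "not a dog")
```

Note that `train` loops over the whole dataset many times, while `infer` is a single cheap pass over one input. That asymmetry is the root of the infrastructure differences discussed in the rest of this article.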

However, the infrastructure needed to achieve training versus inference is different in some critical ways.

What are key elements to look for in DL training infrastructure?

It is crucial to get as much raw compute power and as many nodes as you can afford. Think multi-core processors and GPUs. Why? The most critical issue our clients face today is getting accurately trained AI models. The more nodes and the more mathematical accuracy you can build into your cluster, the faster and more accurate your training will be.

Training often requires incremental addition of new data sets that remain clean and well-structured. That means these resources cannot be shared with other workloads in the datacenter. Focus on optimization for this workload and you’ll have better performance and more accurate training than if you try to build a general compute cluster on the assumption that it can take on other jobs in its free time.

Huge training datasets require massive networking and storage capabilities to hold and transfer the data, especially if your data is image-based or heterogeneous. Plan ahead for adequate networking and storage capacity, not just for strong computing.
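A rough back-of-the-envelope calculation can make this planning concrete. The sketch below estimates how large a dataset is and how long one full pass over it takes on a given network link. Every number in it (item count, average item size, link speeds, and the 80% usable-bandwidth factor) is a hypothetical assumption, not a measurement.

```python
def dataset_size_bytes(num_items, avg_item_bytes):
    """Total dataset size, assuming a uniform average item size."""
    return num_items * avg_item_bytes


def transfer_seconds(size_bytes, link_gbps, efficiency=0.8):
    """Time to move size_bytes over a link rated in gigabits per second.

    efficiency is an assumed fraction of the line rate that is usable
    after protocol overhead; real values depend on your network.
    """
    usable_bytes_per_second = link_gbps * 1e9 / 8 * efficiency
    return size_bytes / usable_bytes_per_second


# Hypothetical image dataset: 10 million images at about 500 KB each (5 TB).
size = dataset_size_bytes(num_items=10_000_000, avg_item_bytes=500_000)

for gbps in (10, 100):
    hours = transfer_seconds(size, gbps) / 3600
    print(f"{gbps} Gb/s link: about {hours:.2f} hours per full pass")
```

Because training usually reads the dataset many times, the per-pass figure is multiplied by the number of passes unless the data can be cached close to the compute nodes.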

The greatest challenge in designing hardware for neural network training is scaling. Doubling the amount of training data doesn’t simply mean doubling the resources used to process it. Compute, storage, and networking requirements can grow much faster than the data itself.

What are key elements to look for in DL inference infrastructure?

Inference clusters should be optimized for performance. Think simpler hardware with less power than the training cluster but with the lowest latency possible.

Throughput is critical to inference. The process requires high I/O bandwidth and enough memory to hold both the required trained model(s) and the input data without having to make calls back to the storage components of the cluster.
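As a first sanity check on memory, you can estimate whether a model and a batch of input data fit on a device at all. The sketch below is a simplification: it counts only the model weights and the input batch, ignoring intermediate activations and runtime overhead, and the parameter count, batch size, device size, and 20% headroom are all hypothetical.

```python
def model_bytes(num_parameters, bytes_per_parameter):
    """Approximate memory for model weights only."""
    return num_parameters * bytes_per_parameter


def fits_in_memory(num_parameters, bytes_per_parameter,
                   batch_input_bytes, device_memory_bytes, headroom=0.2):
    """True if weights plus one input batch fit, leaving assumed headroom.

    Simplification: activations and framework overhead are not counted.
    """
    needed = model_bytes(num_parameters, bytes_per_parameter) + batch_input_bytes
    return needed <= device_memory_bytes * (1 - headroom)


# Hypothetical model with 1 billion parameters on a device with 4 GiB of memory.
params = 1_000_000_000
batch = 100_000_000          # about 100 MB of input data per batch
device = 4 * 1024**3         # 4 GiB

print("16-bit weights fit:", fits_in_memory(params, 2, batch, device))
print("32-bit weights fit:", fits_in_memory(params, 4, batch, device))
```

If the check fails, the model or its inputs would have to be fetched from storage during inference, which is exactly the latency cost this section warns against.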

Datacenter resource requirements for a single inference instance are typically not as great as those for training. This is because the amount of data or number of users an inference platform can support is limited by the performance of the platform and the application requirements. Think of speech recognition software, which can only operate when there is one clear input stream. More than one input stream renders the application inoperable. It’s the same with inference input streams.

What are special considerations for inference on the edge?

Edge-based computers are significantly less powerful than the massive compute resources available in data centers and the cloud. But that’s OK, because inference requires much less processing power than training.

If you have hundreds or thousands of instances of the neural network model to support, though, remember that each of these multiple incoming data sources needs sufficient resources to process the data.
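A simple capacity estimate shows how quickly per-stream requirements add up. The sketch below assumes each incoming data source needs a fixed number of inferences per second and each edge device sustains a fixed rate; both rates and the stream count are hypothetical.

```python
import math


def streams_per_device(device_inferences_per_sec, stream_inferences_per_sec):
    """How many whole input streams one device can serve."""
    return math.floor(device_inferences_per_sec / stream_inferences_per_sec)


def devices_needed(num_streams, device_inferences_per_sec,
                   stream_inferences_per_sec):
    """Minimum number of devices so that every stream is fully served."""
    per_device = streams_per_device(device_inferences_per_sec,
                                    stream_inferences_per_sec)
    if per_device < 1:
        raise ValueError("one device cannot keep up with a single stream")
    return math.ceil(num_streams / per_device)


# Hypothetical deployment: 1,000 camera streams at 30 inferences per second
# each, on edge devices that each sustain 200 inferences per second.
print(streams_per_device(200, 30))    # streams one device can handle
print(devices_needed(1000, 200, 30))  # devices required in total
```

The `ValueError` case matters in practice: if a single stream already exceeds what one device can process, adding more devices does not help, and a faster device is needed.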

Normally, you want your storage and memory as close to the processor as possible, to reduce latency. When you have edge devices, though, the memory is sometimes nowhere near the processing and storage components of the system. This means you need a device that supports GPU or FPGA compute and storage at the edge, access to a high-performance, low-latency network, or both.

You could also use a hybrid model, where the edge device gathers data but sends it to the cloud, where the inference model is applied to the new data. So long as the inherent latency of moving data to the cloud is acceptable (it is not in some real-time applications, such as self-driving cars), this could work for you.
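One way to frame the edge-versus-hybrid decision is as a latency budget. The sketch below compares the time to run inference on the edge device with the time to send data to the cloud and get a result back. All of the millisecond figures are hypothetical placeholders that you would replace with your own measurements.

```python
def cloud_latency_ms(network_round_trip_ms, cloud_inference_ms):
    """End-to-end latency when the edge device forwards data to the cloud."""
    return network_round_trip_ms + cloud_inference_ms


def viable_placements(budget_ms, edge_inference_ms,
                      network_round_trip_ms, cloud_inference_ms):
    """Report which placements meet the application's latency budget."""
    cloud_total = cloud_latency_ms(network_round_trip_ms, cloud_inference_ms)
    return {
        "edge": edge_inference_ms <= budget_ms,
        "cloud": cloud_total <= budget_ms,
    }


# Hypothetical real-time application with a 50 ms budget.
print(viable_placements(budget_ms=50, edge_inference_ms=30,
                        network_round_trip_ms=80, cloud_inference_ms=5))

# Hypothetical batch-style application that tolerates 500 ms.
print(viable_placements(budget_ms=500, edge_inference_ms=30,
                        network_round_trip_ms=80, cloud_inference_ms=5))
```

With these assumed numbers, the real-time case rules out the cloud because the network round trip alone exceeds the budget, while the more tolerant case leaves both options open.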

Are there unique reference architectures for DL training and inference?

To support these unique needs, Silicon Mechanics offers several unique reference architectures clients can use to start their DL deployments:

Take a more in-depth look at DL technology with these additional resources.


About Silicon Mechanics

Silicon Mechanics, Inc. is one of the world’s largest private providers of high-performance computing (HPC), artificial intelligence (AI), and enterprise storage solutions. Since 2001, Silicon Mechanics’ clients have relied on its custom-tailored open-source systems and professional services expertise to overcome the world’s most complex computing challenges. With thousands of clients across the aerospace and defense, education/research, financial services, government, life sciences/healthcare, and oil and gas sectors, Silicon Mechanics solutions always come with “Expert Included” SM. Learn more at www.siliconmechanics.com.
