While self-driving cars and robots tend to dominate the conversation around artificial intelligence and machine learning (AI/ML), the real focus should be on the advances in computing automation and data algorithms that make them possible. Thanks to these advances, AI/ML have taken on broader and more diverse roles that extend well beyond the technical space.
AI/ML has long served highly technical functions across various industries, and it has also seeped into everyday life. Financial companies use machine learning algorithms to score credit and detect fraud, AI assists radiologists by scanning images and flagging cancerous growths, and motor vehicles increasingly use data to alert drivers to maintenance issues. On a daily basis, virtual assistants like Siri and Cortana make it easy to play music and set appointments, streaming services offer suggestions based on a user’s history, and social media pulls data from a user’s digital footprint to tailor its advertising.
Regardless of scope or task, the use cases above all require a high level of parallel computing power, coupled with a high-performance, low-latency architecture that enables data to be processed in parallel and in real time across the compute cluster. The “training” phase of machine learning is critical and can take an excessively long time, especially as training data sets grow exponentially to enable deep learning for AI.
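To see how tightly training time and storage throughput are linked, a back-of-the-envelope calculation helps. Every number below is an illustrative assumption, not a measurement from any particular system:

```python
# Back-of-the-envelope sketch: how long training takes, and what sustained read
# bandwidth the storage must deliver to keep the GPUs busy. All figures are assumed.
samples         = 100_000_000      # training examples in the data set
sample_bytes    = 150 * 1024       # ~150 KB per example (e.g. a compressed image)
epochs          = 90               # full passes over the data set
gpus            = 16               # GPUs in the cluster
samples_per_gpu = 2_000            # examples one GPU can process per second (assumed)

cluster_rate = gpus * samples_per_gpu                 # examples/second the cluster consumes
train_hours  = samples * epochs / cluster_rate / 3600
read_gbps    = cluster_rate * sample_bytes / 1e9      # GB/s the storage must sustain

print(f"training time ~ {train_hours:.1f} hours")
print(f"required sustained read bandwidth ~ {read_gbps:.1f} GB/s")
```

With these assumptions the run already takes roughly three days, and doubling the data set doubles both the training time and the read bandwidth the storage has to sustain.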
Since storage performance is a vital aspect of AI/ML application performance, the next step is to identify the ideal storage platform. Non-Volatile Memory Express (NVMe) based storage systems have gained traction as the storage of choice for delivering the best throughput and latency. Shared NVMe storage systems unlock the performance of NVMe and offer a strong alternative to using local NVMe SSDs inside GPU nodes.
The Increased Use of GPUs
GPUs were originally created for high-performance image rendering and are very efficient at manipulating computer graphics and processing images. Their highly parallel structure makes them much more efficient than general-purpose CPUs for algorithms that process large blocks of data in parallel. For this reason, GPUs have found strong adoption in AI/ML use cases: they allow for a high degree of parallel computing, and current AI-focused applications have been optimized to run on GPU-based computing clusters.
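A quick way to see this difference in practice, assuming a PyTorch installation and a CUDA-capable GPU, is to time the same large matrix multiplication on the CPU and on the GPU; the matrix sizes here are arbitrary:

```python
# Sketch: the same data-parallel operation on general-purpose CPU cores vs. a GPU.
import time
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

t0 = time.perf_counter()
torch.mm(a, b)                        # runs on a handful of general-purpose CPU cores
cpu_s = time.perf_counter() - t0

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.mm(a_gpu, b_gpu)            # warm-up so one-time CUDA setup isn't timed
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    torch.mm(a_gpu, b_gpu)            # same work spread across thousands of GPU cores
    torch.cuda.synchronize()          # the kernel is asynchronous; wait before stopping the clock
    gpu_s = time.perf_counter() - t0
    print(f"CPU: {cpu_s:.3f}s  GPU: {gpu_s:.4f}s")
```

The exact speedup depends on the hardware, but the gap is typically an order of magnitude or more, which is why deep learning frameworks push this kind of work onto GPUs.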
With the powerful compute performance of GPUs, the bottleneck moves to other areas of the AI/ML architecture. For example, the volume of data needed to feed machine learning demands massive parallel read access to shared files from the storage subsystem across all nodes in the GPU cluster. This creates a performance challenge that shared NVMe storage systems are ideally suited to address.
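To picture the read pattern this creates, here is a hedged sketch (PyTorch assumed; the mount path, node count, and worker counts are placeholders) in which every GPU node pulls its own shard of the same shared dataset with several reader processes at once:

```python
# Sketch of the parallel read pattern: each node reads a disjoint shard of a shared
# dataset, with multiple reader workers per node hitting the shared volume concurrently.
import os
import torch
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.distributed import DistributedSampler

class SharedDataset(Dataset):
    """Hypothetical dataset whose files live on a shared NVMe volume."""
    def __init__(self, root="/mnt/shared/train", count=1_000_000):
        self.root, self.count = root, count
    def __len__(self):
        return self.count
    def __getitem__(self, idx):
        # A real loader would open and decode a file under self.root here;
        # a dummy tensor stands in so the sketch runs anywhere.
        return torch.randn(3, 224, 224)

world_size = 8                                    # e.g. 8 GPU nodes (assumption)
rank = int(os.environ.get("RANK", 0))             # this node's rank, normally set by the launcher

dataset = SharedDataset()
sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank, shuffle=True)

# 16 reader workers on each of 8 nodes = 128 processes reading the shared volume at once.
loader = DataLoader(dataset, batch_size=256, sampler=sampler, num_workers=16, pin_memory=True)

for epoch in range(1):
    sampler.set_epoch(epoch)                      # reshuffle the per-node shards each epoch
    for batch in loader:
        pass                                      # hand each batch to the training step
```

Multiply the per-node readers by the number of nodes and the storage subsystem ends up servicing hundreds of concurrent file reads, every epoch, for the life of the training job.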
Turning to Shared NVMe Storage
One of the benefits of shared NVMe storage is the ability to build even deeper neural networks, thanks to the inherent high performance of shared storage, opening the door to future models that cannot be achieved today with non-shared NVMe storage solutions.
Today, there are storage solutions that offer patented architectures built from the ground up to leverage NVMe. The key to performance and scalability is the separation of control and data path operations between the storage controller software and the host-side agents. The storage controller software provides centralized control and management, while the agents manage data path operations with direct access to shared storage volumes.
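As a conceptual sketch of that split (the class and method names here are hypothetical, not a real product API), the controller answers infrequent control-path questions such as where a volume's data lives, while the host-side agent caches that layout and performs the data-path I/O directly:

```python
# Conceptual sketch only: centralized control path vs. host-side data path.
from dataclasses import dataclass

@dataclass
class Extent:
    device: str        # e.g. an NVMe-oF namespace the host can reach directly
    offset: int
    length: int

class ControlPlane:
    """Centralized management: owns the volume-to-device layout, not the I/O itself."""
    def __init__(self):
        self._layout = {"vol1": [Extent("nvme-target-0/ns1", 0, 1 << 30)]}
    def resolve(self, volume: str) -> list[Extent]:
        return self._layout[volume]               # infrequent control-path call

class HostAgent:
    """Runs on the GPU node: caches the layout, then issues reads without the controller."""
    def __init__(self, control: ControlPlane, volume: str):
        self._extents = control.resolve(volume)   # one control-path round trip
    def read(self, offset: int, length: int) -> bytes:
        ext = self._extents[0]                    # trivially pick the first extent for the sketch
        # A real agent would issue the I/O over RDMA/NVMe-oF here; zeros stand in.
        return bytes(min(length, ext.length - offset))

agent = HostAgent(ControlPlane(), "vol1")
data = agent.read(0, 4096)                        # the data path never touches the controller
```

Because the controller is consulted only when layouts change, it never sits in the I/O path and therefore never becomes the bottleneck as the cluster grows.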
While AI/ML workloads run exclusively on the GPUs within the cluster, that doesn’t mean CPUs have been eliminated from GPU clusters entirely. The operating system and drivers still rely on the CPUs, but while machine learning training is in progress, the CPU sits relatively idle. This provides the perfect opportunity for an NVMe-based storage architecture to leverage that idle CPU capacity for a high-performance distributed storage approach.
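As a rough illustration of the idea (not the vendor's actual mechanism), a host-side storage agent could be confined to a handful of cores the training job leaves idle; on Linux this is a single CPU-affinity call. The core IDs below are assumptions, and `sleep` stands in for the real agent binary:

```python
# Illustrative only (Linux): bind a storage-agent process to spare CPU cores on a GPU node
# so its data-path work rides on capacity the training job isn't using.
import os
import subprocess

SPARE_CORES = {8, 9, 10, 11}                      # cores the training job barely touches (assumed)

agent = subprocess.Popen(["sleep", "600"])        # placeholder for the storage-agent process
os.sched_setaffinity(agent.pid, SPARE_CORES)      # restrict the agent to the spare cores only

print("agent pid", agent.pid, "bound to cores", sorted(os.sched_getaffinity(agent.pid)))
```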
With the NVMe protocol supporting vastly more concurrent connections per SSD than legacy interfaces, the storage agents use RDMA to give each GPU node a direct connection to the drives. This approach enables the agents to perform up to 90 percent of the data path operations between the GPU nodes and storage, reducing latency to be on par with local SSDs.
In this scenario, running the NVMe-based storage agent on the idle CPU cores of the GPU nodes enables the NVMe-based storage to deliver 10x better performance than competing all-flash solutions, while leveraging compute resources that are already installed and available to use.