A guest post by Scott Delandy
GenAI is a new paradigm of artificial intelligence that leverages unstructured data to create and deploy innovative AI models and applications. Unstructured data types used to build AI models include emails, documents, web pages, social media posts, multimedia content, sensor data, and rich media. Effectively managing these different data types poses storage challenges such as performance and resource sizing, scalability for both large and small datasets, and the flexibility to adapt to changing requirements. Whether you are starting a GenAI pilot project or ramping models into production, choosing the right storage architecture to build your AI factory is a critical strategic decision.
PowerScale’s proven scale-out architecture enables GenAI developers to store, access, and manage their unstructured data with ease and efficiency. It can support a wide range of GenAI use cases, such as content generation and augmentation, image generation, chatbot agents, and code generation. Let’s dig into how and why PowerScale’s architecture can anchor a storage strategy that helps deliver your GenAI models faster, easier, and more cost-effectively.
PowerScale Architecture Overview
Let’s start with some basics around PowerScale and OneFS. PowerScale is built on a scale-out architecture that has proven to be a powerful solution for managing unstructured data in a distributed environment. It consists of three layers that work together to provide fast, flexible, and scalable file access.
These are:
Client Access Layer – This layer is a key component of the network file system that enables efficient and versatile access to unstructured data from various clients and workloads. It supports high-speed Ethernet connectivity with multiple protocols, such as NFS, SMB, and HDFS, to simplify and unify file access across different workloads. This includes support for NVIDIA GPUDirect Storage over high-speed Ethernet and RDMA via NFSoRDMA, which allows direct data transfer between GPU memory and storage devices for GenAI applications (see the mount sketch after this list). The client access layer also implements intelligent load-balancing policies based on IP ports, CPU utilization, network bandwidth, and other factors to optimize the performance and availability of client access. Moreover, it provides multitenancy controls to ensure security and service levels for different tenants and users.
OneFS File Presentation Layer – This layer combines the advantages of a distributed environment with the simplicity of a single namespace. It allows users to access data from any location in the cluster, without worrying about where the data is physically stored. OneFS also integrates volume management, data protection, and tiering capabilities, making it easy to manage large amounts of data across different storage types. OneFS ensures high availability and reliability by eliminating any single point of failure, supporting any-to-any failover, and providing multiple levels of redundancy. OneFS also enables non-disruptive operations, such as upgrades, expansions, and migrations. With OneFS, users can enjoy a smart and efficient file system that meets their diverse needs.
PowerScale Compute and Storage Cluster Layer – This layer provides the nodes and internode networking elements for a file cluster solution that can scale and deliver high availability. It allows you to customize your file clusters according to your performance and scale needs. You can start with a small and affordable cluster that can handle capacity and computational tasks such as training, tuning, and inferencing. You can also easily scale out and auto-balance your cluster from 3 nodes (<50TB) to 252 nodes (186PB) without any extra administration work. The nodes are designed for easy lifecycle management, allowing you to perform upgrades, migrations, node additions, and tech refreshes without disrupting cluster operations.
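To make the client access layer concrete, here is a minimal sketch of how a Linux client might mount a PowerScale export, first over standard TCP and then over RDMA for GPUDirect-class workloads. The zone name and export path are hypothetical; the options assume NFSv3 and the standard NFS-over-RDMA port (20049).

```python
# A minimal sketch of mounting a PowerScale NFS export from Linux.
# Run as root; the zone name and export path below are hypothetical.
import os
import subprocess

ZONE = "genai.cluster.example.com"   # hypothetical cluster zone name
EXPORT = "/ifs/data/genai"           # hypothetical export under OneFS /ifs

for mnt in ("/mnt/powerscale", "/mnt/powerscale-rdma"):
    os.makedirs(mnt, exist_ok=True)

# Standard NFSv3 mount over TCP.
subprocess.run(
    ["mount", "-t", "nfs", "-o", "vers=3,proto=tcp",
     f"{ZONE}:{EXPORT}", "/mnt/powerscale"],
    check=True,
)

# The same export over RDMA (NFSoRDMA) for GPUDirect-class workloads.
subprocess.run(
    ["mount", "-t", "nfs", "-o", "vers=3,proto=rdma,port=20049",
     f"{ZONE}:{EXPORT}", "/mnt/powerscale-rdma"],
    check=True,
)
```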
Each of these layers is essential for deploying GenAI applications, as together they enable high-performance data ingestion, processing, and analysis in a flexible, scalable, and “always on” manner.
PowerScale Core Capabilities to Enable GenAI Workloads
With the latest enhancements to PowerScale nodes and OneFS software, developers can speed up the AI lifecycle from data preparation to model inference. New PowerScale storage systems, powered by the latest Dell PowerEdge servers, give GenAI models more performance for streaming reads and writes, further enhancing AI model training and fine-tuning for faster, more accurate data-driven decisions. In addition to the new high-performance and high-density nodes, here are some other core capabilities that can better enable your GenAI workloads.
PowerScale with GPUDirect for Ultra-High Performance – GPUDirect Storage is a technology that allows faster data access for GPUs by creating a direct path between GPU memory and storage. This eliminates the need for data to pass through the CPU, which reduces latency and increases bandwidth. The solution stack combines GPUDirect-enabled servers, NFS over RDMA, and PowerScale. Let’s take a deeper look at each of these.
GPUDirect-enabled servers communicate directly with PowerScale and are designed for high-bandwidth use cases such as AI training and HPC. NFS over RDMA enables zero-copy networking, meaning data from storage is transferred directly to the client without being copied into CPU memory or OS data buffers. PowerScale’s all-flash direct data access delivers high throughput and bandwidth for single-connection and read-intensive workloads. The combined stack not only improves bandwidth and throughput by 2-8 times, it also reduces CPU utilization on both the cluster and the client, freeing up processing power overhead.
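As an illustration of what GPUDirect Storage looks like from application code, here is a minimal Python sketch using KvikIO, NVIDIA’s Python bindings for the cuFile (GDS) API. The file path is hypothetical and assumes a PowerScale export mounted over NFSoRDMA on a GDS-configured client.

```python
# A minimal GPUDirect Storage read sketch using NVIDIA's KvikIO bindings.
# The mount path is hypothetical; GDS must be configured on the client.
import cupy as cp
import kvikio

# Allocate the destination buffer directly in GPU memory (256 MiB).
gpu_buf = cp.empty(256 * 1024 * 1024, dtype=cp.uint8)

# Read a training shard straight into GPU memory, bypassing the CPU
# bounce buffer entirely.
with kvikio.CuFile("/mnt/powerscale-rdma/train/shard-000.bin", "r") as f:
    nbytes = f.read(gpu_buf)

print(f"Read {nbytes} bytes directly into GPU memory")
```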
PowerScale Client Driver for High-Throughput Ethernet Support – The optional client driver is a software tool that enhances the performance of NFS clients accessing PowerScale storage clusters over high-speed Ethernet networks. It enables clients to use multiple TCP connections to different PowerScale nodes simultaneously, instead of being limited to a single node per NFS mount point. This way, clients can leverage the distributed architecture of PowerScale and achieve higher throughput for their I/O operations.
The driver works by creating a logical server that consists of a group of IP addresses, each corresponding to a PowerScale node. The client can then mount the logical server as a single NFS mount point and send I/O requests to any of the IP addresses in the group. The driver also handles failover and load balancing among the nodes, ensuring reliability and efficiency. The key benefit is that it enables NFS clients to perform I/O operations against multiple nodes in the cluster through a single NFS mount point, giving clients higher throughput and better load balancing. This means better single-mount performance for applications that need to access multiple files simultaneously. It can also increase the bandwidth of a single NFS mount by distributing read and write operations across multiple streams, reducing latency and contention. And it can improve performance for heavily utilized NICs by spreading network traffic that would otherwise converge on a single PowerScale node, avoiding bottlenecks and congestion. A conceptual sketch of this spreading follows.
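The driver does all of this transparently beneath one mount point; the sketch below is only a conceptual stand-in for the idea, spreading reads across several hypothetical per-node mounts in round-robin order rather than letting them all land on one node.

```python
# Conceptual illustration only: round-robin reads across per-node mounts.
# The real client driver hides this behind a single NFS mount point.
from itertools import cycle

# Hypothetical mount points, one per PowerScale node IP.
node_mounts = cycle(["/mnt/ps-node1", "/mnt/ps-node2", "/mnt/ps-node3"])

def read_shard(relative_path: str) -> bytes:
    """Read one file via the next node in round-robin order."""
    mount = next(node_mounts)
    with open(f"{mount}/{relative_path}", "rb") as f:
        return f.read()

# Each call lands on a different node, so concurrent readers spread their
# traffic across the cluster instead of saturating a single node's NIC.
data = read_shard("train/shard-000.bin")
```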
Support for NFS over TCP via the client driver can provide a cost-effective alternative to NFS over RDMA for many performance requirements. NFS over TCP uses standard IP switches and does not require special hardware or software. PowerScale is also the first Ethernet storage solution validated on NVIDIA DGX SuperPOD, a powerful AI platform that requires high-speed single-client access.
PowerScale Scale Out to Scale Up and Down – PowerScale is designed to grow with your needs. You can start with a small and affordable cluster (<50TB) and expand it easily by adding more nodes. This way, you can build a GenAI environment that can scale to multiple PBs with different node types and generations. GenAI tasks such as training and inferencing require significant computational power and GPU resources, but in many cases relatively little storage space. PowerScale provides ultra-high performance when combined with powerful GPUs to handle high-bandwidth streaming reads, which speed up model training and checkpointing. These performance capabilities reduce GPU idle time and increase utilization. To optimize the performance and cost of GenAI, it is important to right-size resources based on the needs of each task, as the sizing sketch below illustrates. This way, users avoid wasting money and resources by over-provisioning storage or under-provisioning performance.
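As a back-of-envelope example of right-sizing, the sketch below estimates the write bandwidth needed to checkpoint a model within a target pause window. Every number in it is an illustrative assumption, not a PowerScale specification.

```python
# Back-of-envelope sizing: write bandwidth needed to checkpoint a model
# within a target window so GPUs are not left idle. All values are
# illustrative assumptions.
params = 7e9                 # model parameters (assumed 7B)
bytes_per_param = 2 + 12     # bf16 weights + fp32 Adam state (assumed)
checkpoint_gb = params * bytes_per_param / 1e9

window_s = 60                # acceptable checkpoint pause (assumed)
needed_gbps = checkpoint_gb / window_s

print(f"Checkpoint size: {checkpoint_gb:,.0f} GB")
print(f"Write bandwidth: {needed_gbps:.1f} GB/s to finish in {window_s}s")
```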
PowerScale nodes are designed to work together seamlessly, regardless of their type or configuration. This means you can scale performance and capacity linearly across nodes as you need, without compromising efficiency or reliability, while ensuring that each node has its own resources to deliver consistent and predictable performance service levels. Dell testing of various models indicates that most GenAI workloads can run on a single three-node cluster, which is easy to set up and manage.
PowerScale Flexibility to Support Storage Tiers – Depending on your storage needs and budget, PowerScale lets you choose from several types of nodes. All Flash nodes are ideal for high-speed applications that require low latency and high IOPS. Hybrid nodes combine flash and HDD drives to balance speed and cost but can still be attractive for high-bandwidth streaming and sequential access needs. Archive nodes use HDD drives and provide large capacity at a low cost per GB. To maintain service levels for AI models, it is important to have consistent and predictable performance across PowerScale node resources. This can be achieved with intelligent load-balancing policies that consider factors such as IP ports, CPU utilization, and network bandwidth. These policies optimize how clients and tenants access node resources and avoid bottlenecks and overloads, as the sketch below illustrates.
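PowerScale implements these policies through its SmartConnect DNS service: each lookup of the cluster’s zone name returns one node IP, chosen by the configured balancing policy. The short sketch below, with a hypothetical zone name, shows how repeated lookups naturally spread clients across nodes.

```python
# Conceptual sketch: a SmartConnect-style DNS zone hands out a different
# node IP per lookup, steering clients according to the balancing policy.
import socket

zone = "genai.cluster.example.com"  # hypothetical SmartConnect zone name

seen = set()
for _ in range(8):
    seen.add(socket.gethostbyname(zone))  # one node IP per query

print("Node IPs handed out:", sorted(seen))
```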
All nodes support inline data reduction, which can lower effective storage capacity costs by eliminating duplicate or redundant data. However, inline data reduction may not be effective for all types of data, such as encrypted, compressed, or non-deduplicable data. In these cases, hybrid arrays with HDD drives may be more economical than all-flash drives, especially for streaming workloads that still need high front-end streaming bandwidth.
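A quick way to see why data reduction is data-dependent: compressing repetitive text versus random bytes (a stand-in for encrypted or already-compressed data) with Python’s standard zlib module.

```python
# Why inline compression helps some data and not others: repetitive text
# shrinks dramatically, while random bytes (approximating encrypted or
# pre-compressed data) barely shrink at all.
import os
import zlib

text = b"GenAI training log line. " * 4000   # highly repetitive
random_like = os.urandom(len(text))          # stands in for encrypted data

for label, blob in (("text", text), ("encrypted-like", random_like)):
    ratio = len(zlib.compress(blob)) / len(blob)
    print(f"{label:15s} compresses to {ratio:.0%} of original size")
```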
PowerScale’s All Flash, Hybrid, and Archive options help you optimize costs and meet your specific use cases, including performance needs, dataset size and type, and cost considerations. Smart tiering automatically moves data to optimize your storage resources for various use cases, deployment models, and cost points, and lets you access your data whenever and wherever you need it.
PowerScale for GenAI Workloads Summary
If you are looking for a reliable and scalable platform to build your AI factory, consider PowerScale and OneFS as the foundation. They offer the flexibility to start with a small and affordable cluster and grow it as your needs increase, without compromising performance or efficiency. You can also benefit from GPUDirect technology, which allows direct communication between GPUs and storage, reducing latency and CPU overhead. Moreover, PowerScale and OneFS provide low-latency NFSoRDMA, which enables fast data ingestion, pre-processing, and AI training. You can also extend your cluster to the cloud and leverage cloud-native services for your AI workloads.
Bottom line, architecture matters. With PowerScale and OneFS, you can accelerate your AI journey and achieve better outcomes.