A guest post by Nomuka Luehr
Dell and NVIDIA have been collaborating for years to develop innovative solutions for artificial intelligence (AI) computing. They have now created a full-stack solution that allows enterprises to build and deploy generative AI models at scale. This joint project builds on their previous work and further strengthens their leadership in the AI computing market.
Let’s Explore the Dell and NVIDIA Solution Architecture!
High Level Solution Architecture
The architecture is designed with modularity and scalability in mind, with components that can be mixed and matched to meet specific project requirements and accommodate different AI workflows.
Check out the high-level view of the solution architecture below, with emphasis on the software stack, from the infrastructure layer up through the AI application software layer. We’ll go into each layer in more detail in the following sections.
What primary workflows does the generative AI solution address?
The generative AI solution architecture addresses three primary workflows:
- Large Model Training
- Large Model Fine-tuning, P-tuning, and/or Transfer Learning
- Large Model Inferencing
As AI workflows have specific compute, storage, network, and software requirements, a modular solution design, where components can be scaled independently based on customer needs, is a must. To make it even more flexible, certain modules in the solution are optional or can be replaced by components an organization already has in its AI infrastructure, such as a preferred MLOps, data prep, or data module. Here is a list of the functional modules in the solution architecture:
- Training Module: AI-optimized servers for training with NVIDIA GPUs, powered by the XE9680, XE8640, or R760XA with H100 NVL
- Inferencing Module: AI-optimized servers for inferencing with NVIDIA GPUs, powered by the XE9680 or R760XA with H100, L40, or L4
- InfiniBand Module: very low-latency, high-bandwidth GPU-to-GPU communication, powered by the QM9700
- Ethernet Module: high-throughput, high-bandwidth communication between the other modules in the solution, powered by the PowerSwitch Z9432F
- Management Module: cluster management plus a head node for Bright/Omnia, powered by the R660
- MLOps and Data Prep Module: runs MLOps software, databases, and other CPU-based data preparation tasks, powered by the R660
- Data Module: high-throughput scale-out NAS powered by PowerScale, plus high-throughput scale-out object storage powered by ECS and ObjectScale
Is the solution easily scalable?
Yes! The solution architecture allows for flexible scaling of functional modules based on use cases and capacity needs. For large model training, a minimum unit consists of eight Dell XE9680 servers with sixty-four NVIDIA H100 GPUs, which can train a 175B parameter model in 112 days. By using six copies of this unit, the same model can be trained in 19 days.
Similarly, the InfiniBand module can be scaled up from supporting twenty-four to forty-eight XE9680 servers by doubling it in a fat tree architecture. The Data module employs scale-out storage solutions that can linearly scale to meet performance and capacity requirements as the number of servers and GPUs in the Training and Inference modules increases.
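To make the scaling math concrete, here is a minimal illustrative sketch (not taken from the white paper) that estimates training time under an idealized assumption of perfect linear scaling across scalable units. The 112-day baseline for a 175B-parameter model on one eight-server unit comes from the figures above; the `scaling_efficiency` knob is an assumption to remind you that real clusters scale at less than 100%.

```python
# Illustrative only: estimate training time assuming ideal linear scaling
# across identical scalable units (8x XE9680, 64x H100 per unit).
# Real-world scaling efficiency is below 100%, so treat these as best-case estimates.

BASELINE_DAYS = 112      # 175B-parameter model on one scalable unit (figure quoted above)
GPUS_PER_UNIT = 64       # 8 servers x 8 H100 GPUs

def estimated_training_days(num_units: int, scaling_efficiency: float = 1.0) -> float:
    """Estimated days to train the same model on `num_units` scalable units."""
    if num_units < 1 or not (0 < scaling_efficiency <= 1):
        raise ValueError("num_units must be >= 1 and 0 < scaling_efficiency <= 1")
    return BASELINE_DAYS / (num_units * scaling_efficiency)

if __name__ == "__main__":
    for units in (1, 2, 6):
        print(f"{units} unit(s) ({units * GPUS_PER_UNIT} GPUs): "
              f"~{estimated_training_days(units):.0f} days")
    # 6 units -> ~19 days under ideal scaling, matching the figure quoted above.
```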
Is it secure?
Dell Technologies incorporates intrinsic security measures into its approach, integrating security throughout the development lifecycle. Security controls, features, and solutions are continuously updated to address evolving threats, and a Silicon Root of Trust provides a strong foundation. The PowerEdge Cyber Resilient Platform includes various security features, ranging from access control to data encryption to supply chain assurance. These features, such as Live BIOS scanning, UEFI Secure Boot Customization, and RSA SecurID MFA, leverage intelligence and automation to proactively protect against threats and accommodate expanding usage models.
What infrastructure components are important to consider for AI?
- Generative AI models demand substantial computational power, especially during training, often utilizing powerful GPUs for accelerated processing.
- Specialized accelerators like GPUs enable the parallel processing and matrix multiplication these models require.
- Storage is crucial due to the large size of generative AI models; distributed storage and processing systems like Hadoop or Spark are commonly used during training, while network-attached storage or cloud-based solutions are preferred for inferencing larger models.
- Networking plays a vital role in distributed training, where data exchange and model updates occur between nodes, benefitting from high-speed solutions like InfiniBand or RDMA to minimize latency.
By considering these requirements, businesses can effectively build and deploy fast, efficient, and accurate generative AI models.
So, which infrastructure components does Dell have to offer?
Dell Technologies offers acceleration-optimized servers and an extensive portfolio of NVIDIA GPUs for generative AI solutions.
The PowerEdge servers are designed for AI workloads, featuring improvements such as a focus on acceleration, thoughtful thermal design, and multi-vector cooling.
- The Dell PowerEdge XE9680 is a purpose-designed server for demanding AI, machine learning, and deep learning workloads. It features eight NVIDIA H100 NVLink GPUs and NVIDIA AI software, making it the industry’s first server with this configuration. The XE9680 maximizes AI throughput, enabling breakthroughs in natural language processing, recommender systems, and data analytics.
- The PowerEdge XE8640 is a 4U server with four NVIDIA H100 Tensor Core GPUs and NVIDIA NVLink technology, along with upcoming Intel Xeon Scalable processors. It helps businesses develop, train, and deploy machine learning models for analysis.
- The Dell PowerEdge R760XA is a dual-socket 2U server optimized for PCIe GPUs, enabling acceleration of AI training, inferencing, analytics, virtualization, and rendering applications. It delivers outstanding performance using Intel CPUs and supports various GPU accelerators.
The PowerScale storage solutions, including the F900 and F600 models, provide high-performance, scalable, and cost-effective file storage for demanding AI workloads. Dell also offers object-based storage products like ECS and ObjectScale, which deliver performance at scale for AI workloads.
The PowerSwitch networking technology, including the Z9432F-ON 100/400GbE fixed switch, offers high-density ports and functionality to meet data center demands.
What about software?
IT management is crucial for running complex multi-node systems, especially for generative AI workloads.
- Dell’s OpenManage Enterprise simplifies IT management, offering server lifecycle management capabilities that save time and effort while providing real-time efficiencies and cost savings. It supports up to 8,000 devices, manages Dell servers, monitors Dell networking and storage infrastructure, and even integrates with third-party products. OpenManage Enterprise enhances security, efficiency, and time to value through predictive analysis, added insights, extended control, and intelligent automation. It features full-lifecycle configuration management, extensible plug-in architecture, and streamlined remote management capabilities.
- Dell OpenManage Power Manager, integrated with the iDRAC (integrated Dell Remote Access Controller), enables maximizing data center uptime, controlling energy usage, and monitoring and budgeting server power based on consumption and workload needs. Power Manager helps mitigate operational risks and ensures efficient power consumption for demanding generative AI workloads.
- Dell CloudIQ is a cloud-based proactive monitoring and predictive analytics application that combines human and machine intelligence. It integrates data from OpenManage consoles to monitor the health, capacity, performance, and cybersecurity of Dell infrastructure across multiple locations. With CloudIQ, users can easily manage and monitor their infrastructure, simplify monitoring across data centers and edge locations, and ensure critical workloads receive the necessary capacity and performance. This allows IT teams to spend more time on innovation and value-added projects.
How about NVIDIA?
This section describes the NVIDIA hardware acceleration and AI software components used in the generative AI solution architecture.
NVIDIA hardware components
- The NVIDIA H100 Tensor Core GPU offers exceptional performance and scalability for various workloads. It supports up to 256 GPUs connected via the NVIDIA NVLink Switch System and incorporates breakthrough innovations in the NVIDIA Hopper architecture, delivering industry-leading conversational AI and speeding up large language models by 30X compared to the previous generation.
- The NVIDIA H100 NVL GPU with NVLink is specifically designed for deploying massive language models at scale. It provides up to 12x faster inference performance for models like GPT-3 compared to the previous A100 generation.
- The NVIDIA A100 Tensor Core GPU powers high-performance data centers for AI, data analytics, and HPC applications. It offers up to 20X higher performance than the previous Volta generation and supports scaling up or partitioning into isolated GPU instances with Multi-Instance GPU (MIG) for dynamic workload adjustment. The A100 also provides high memory capacity and the world’s fastest memory bandwidth for faster time to solution.
- The NVIDIA L40 GPU Accelerator is a powerful graphics solution based on the Ada Lovelace Architecture. It supports hardware-accelerated ray tracing, AI features, advanced shading, and simulations for various graphics and compute use cases, delivering high-level performance with 48GB of memory.
- The NVIDIA L4 Tensor Core GPU delivers universal acceleration and energy efficiency for video, AI, virtualized desktop, and graphics applications. It is optimized for inference at scale and offers significantly higher performance compared to CPU solutions in AI video, generative AI, and graphics applications. The L4 is versatile, energy-efficient, and ideal for global deployments, including edge locations.
- NVIDIA NVLink is a fast interconnect that enables high-speed communication between GPUs in multi-GPU systems, with the fourth generation providing higher bandwidth and improved scalability. NVSwitch, built on NVLink technology, enhances communication for compute-intensive workloads, enabling high-speed collective operations with reduced latency (a quick peer-access check is sketched after this list).
NVIDIA AI software components
NVIDIA enterprise software solutions provide IT admins, data scientists, architects, and designers with tools to manage and optimize accelerated systems.
- NVIDIA AI Enterprise is a cloud-native suite of AI software that accelerates the data science pipeline and streamlines the development and deployment of production AI applications. It offers a vast library of full-stack software, including workflows, frameworks, pretrained models, and infrastructure optimization capabilities. NVIDIA AI Enterprise supports generative AI, computer vision, speech AI, and more, automating essential processes and providing rapid insights from data. It includes NVIDIA NeMo, a framework for building and deploying generative AI models with billions of parameters, optimized for large-scale language and image applications.
- NVIDIA Base Command Manager is a cluster manager for AI infrastructure that enables seamless operationalization of AI development at scale, with features such as provisioning, job scheduling, and system monitoring. It integrates with various HPC workload managers and container technologies, providing extensive support and a robust health management framework.
My workload is unique; what design considerations should I keep in mind?
Dell and NVIDIA have collaborated to design scalable system configurations based on the requirements of different workflows and customer performance needs. The hardware and software choices depend on the specific workflow and data types (LLMs, text data, image, video, audio). However, certain considerations related to performance, memory, network, and storage are common across these workflows.
For Large Model Inferencing:
- Large models, typically defined as those with more than 10 billion parameters, have a significant memory footprint.
- Communication between GPUs is crucial when the model is split across them. NVIDIA Triton Inference Server supports multi-GPU deployment through its FasterTransformer backend (see the client sketch after this list).
- For models above 40 billion parameters, the XE9680 configuration is recommended. For smaller models, the R760XA with H100 NVL delivers good performance.
- The Z9432F networking solution supports high-concurrency needs and scales linearly up to 32 nodes.
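For illustration, here is a minimal client-side sketch showing how an application might send a batch of token IDs to a model served by Triton Inference Server over HTTP with the `tritonclient` package. The model name (`gpt_model`) and the tensor names (`input_ids`, `output_ids`) are hypothetical placeholders, not values from the reference design; the real names come from the deployed model's Triton configuration.

```python
# Illustrative Triton HTTP client sketch. The model name ("gpt_model") and the
# input/output tensor names ("input_ids", "output_ids") are placeholders; the
# actual names are defined by the deployed model's Triton config (config.pbtxt).
import numpy as np
import tritonclient.http as httpclient

def generate(token_ids: np.ndarray, url: str = "localhost:8000") -> np.ndarray:
    client = httpclient.InferenceServerClient(url=url)

    # Triton expects explicitly named and typed input tensors.
    inp = httpclient.InferInput("input_ids", list(token_ids.shape), "INT32")
    inp.set_data_from_numpy(token_ids.astype(np.int32))

    out = httpclient.InferRequestedOutput("output_ids")
    result = client.infer(model_name="gpt_model", inputs=[inp], outputs=[out])
    return result.as_numpy("output_ids")

if __name__ == "__main__":
    batch = np.array([[101, 2054, 2003, 7953, 102]], dtype=np.int32)  # toy token IDs
    print(generate(batch))
```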
For Large Model Fine-tuning:
- Fine-tuning, P-tuning, and transfer learning with pre-trained large models require substantial information exchange between GPUs on different nodes. An InfiniBand (IB) module is necessary for optimized performance and throughput, along with an 8-way GPU configuration with all-to-all NVLink connections.
- P-tuning uses a small trainable prompt model in front of the large language model (LLM), which itself stays frozen (a conceptual sketch follows this list).
- For models smaller than 40 billion parameters, the XE8640 configuration may suffice, while larger models benefit from the XE9680.
- A high-performance data module may be needed for certain prompt engineering techniques that require a large dataset.
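To make the P-tuning idea more tangible, here is a conceptual PyTorch sketch of the underlying prompt-tuning pattern: a small block of trainable "virtual token" embeddings is learned while the pretrained model stays frozen. This is an illustrative sketch, assuming a base model that accepts input embeddings directly; the reference design would use NVIDIA NeMo's built-in p-tuning support rather than hand-rolled code like this.

```python
# Conceptual sketch of prompt/P-tuning: only a small matrix of virtual-token
# embeddings is trained; the large pretrained model is frozen. Assumes the base
# model can be called directly on embeddings, which is an illustrative simplification.
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    def __init__(self, base_model: nn.Module, embed_dim: int, num_virtual_tokens: int = 20):
        super().__init__()
        self.base_model = base_model
        for p in self.base_model.parameters():   # freeze the pretrained LLM
            p.requires_grad = False
        # The only trainable parameters: the virtual-token (soft prompt) embeddings.
        self.soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) embeddings from the frozen model's embedding layer
        batch = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.base_model(torch.cat([prompt, input_embeds], dim=1))
```

In training, only the soft prompt would be passed to the optimizer, for example `torch.optim.Adam([wrapper.soft_prompt], lr=1e-3)`, which is why this approach needs far less compute than full fine-tuning.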
For Large Model Training:
- Training generative AI models or LLMs requires extensive compute resources. Splitting the model across multiple GPUs is necessary due to its large memory footprint.
- An XE9680 configuration with eight NVIDIA GPUs connected via NVLink and NVSwitch is beneficial given the model size, parallelism techniques, and dataset requirements.
- An IB module is essential for high performance and throughput during information exchange between GPUs on different nodes.
- A fat-tree network topology with additional IB modules may be needed as the cluster expands.
- Scaling PowerScale appropriately is required to meet I/O performance requirements.
- Checkpointing is a standard technique in large model training, with checkpoint size depending on the model’s size and parallelism dimensions (a minimal sketch follows this list).
- The PowerScale F600 is a prime storage choice here, providing high-throughput performance for checkpointing.
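To ground the checkpointing point, the sketch below shows the basic pattern of periodically writing model and optimizer state to shared storage. In real multi-node training, frameworks such as NeMo/Megatron typically write one shard per parallel rank, so aggregate checkpoint size and write throughput grow with model size and parallelism, which is what drives the storage sizing above. The mount path and naming scheme here are illustrative assumptions, not values from the design.

```python
# Illustrative checkpointing sketch: periodically persist training state to shared storage.
# In real large-model training, each data/tensor/pipeline-parallel rank usually writes
# its own shard; the directory below is a placeholder for a shared (e.g., NFS) mount.
import os
import torch

def save_checkpoint(step: int, model: torch.nn.Module, optimizer: torch.optim.Optimizer,
                    ckpt_dir: str = "/mnt/powerscale/checkpoints") -> str:
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, f"step_{step:08d}.pt")
    torch.save(
        {
            "step": step,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )
    return path

def load_checkpoint(path: str, model: torch.nn.Module, optimizer: torch.optim.Optimizer) -> int:
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["step"]
```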
Final thoughts
Ultimately, Project Helix is a collaborative effort between Dell and NVIDIA that aims to bring the benefits of generative AI to the enterprise. It combines Dell’s infrastructure and software with NVIDIA’s award-winning software stack and accelerator technology to deliver a comprehensive solution. The key objectives of Project Helix are as follows:
- Full-Stack Solution: Project Helix offers end-to-end generative AI solutions utilizing Dell’s infrastructure and software, along with the latest NVIDIA accelerators, AI software, and expertise.
- On-Premises Deployment: Enterprises can leverage purpose-built generative AI solutions on-premises to address specific business challenges.
- Lifecycle Support: Project Helix assists enterprises throughout the entire generative AI lifecycle, including infrastructure provisioning, large model development and training, fine-tuning of pre-trained models, multi-site model deployment, and large model inferencing.
- Trust and Security: The project prioritizes the trust, security, and privacy of sensitive and proprietary company data, ensuring compliance with government regulations.
By leveraging Project Helix, organizations can automate complex processes, enhance customer interactions, and unlock new possibilities through improved machine intelligence. Dell and NVIDIA are at the forefront of driving innovation in the enterprise AI landscape with this collaboration.
What services are available for me to get started?
Dell Technologies offers a comprehensive range of services to support businesses in their AI solutions and data center needs.
- Consulting Services provide expert guidance throughout the data analytics journey, enabling companies to leverage their data capital and implement advanced techniques like AI, ML, and DL.
- Deployment Services streamline the process of bringing new IT investments online efficiently and reliably.
- Support Services leverage AI and DL technologies to proactively detect and prevent issues, maximizing productivity and uptime.
- Managed Services help reduce the complexity and cost of IT management, allowing resources to focus on innovation and transformation.
- Residency Services provide expert support for IT transformation and ensure peak performance of IT infrastructure.
Want to learn more?
It’s been fun exploring different aspects of generative AI and LLMs together! Get to know the solution in full detail through the official white paper:
Check Out the Entire Generative AI 101 Blog Series: