A guest post by Anuraj PD
Red Hat OpenShift AI is a flexible, scalable MLOps platform with tools to build, deploy, and manage AI-enabled applications. OpenShift AI enables data acquisition and preparation, model training and fine-tuning, model serving, model monitoring, and hardware acceleration.

Dell ObjectScale is high-performance, containerized object storage built for the toughest applications and workloads, including generative AI and analytics. Please read more about Dell ObjectScale here. Dell PowerFlex, a software-defined infrastructure, gives customers a solid foundation for modernizing their IT infrastructure. Please read more about Dell PowerFlex here.

In this demo we are building a GenAI digital assistant that answers user questions about Dell ObjectScale, using the Llama 2 large language model on Red Hat OpenShift AI, integrated with Dell PowerFlex and Dell ObjectScale. Dell APEX Cloud Platform is the first fully integrated application delivery platform purpose-built for Red Hat OpenShift, and it can simplify and accelerate AI adoption. Read more about Dell APEX Cloud Platform for OpenShift here. The Dell Validated Design for implementing a digital assistant with Red Hat OpenShift AI on Dell APEX Cloud Platform is available here.
Create an S3 bucket in ObjectScale for the OpenShift Image Registry.
Integrate the OpenShift Image Registry with ObjectScale storage.
Integrate Dell PowerFlex with the OpenShift cluster using Dell Container Storage Modules (CSM).
Install the Dell CSM Operator.
Create the Dell PowerFlex CSM resource.
Create a storage class to provision PVCs from PowerFlex.
OpenShift AI depends on the Operators listed below, and they must be present in the OpenShift cluster before the installation of OpenShift AI begins. In this demo we are using OpenShift AI's automated configuration of these operators, so we are not required to create any custom resources for them; we only need to make sure they are installed.
- Red Hat OpenShift Service Mesh Operator
- Red Hat OpenShift Elasticsearch Operator
- Red Hat OpenShift Distributed Tracing Platform Operator
- Kiali Operator
- Red Hat OpenShift Serverless Operator
- Red Hat OpenShift Pipelines Operator
- NVIDIA GPU Operator
- Node Feature Discovery Operator
In this demo the OpenShift nodes we are using don't have GPUs installed, so we will skip the GPU Operator installation. If the nodes have GPUs and the GPU Operator is installed on OpenShift, we can assign GPUs to the Pods.
Create a DataScienceCluster custom resource.
Verify DSCInitialization resource.
After OpenShift AI is successfully installed, we can launch the OpenShift AI console directly from the OpenShift console.
Create a new Data Science Project in OpenShift AI.
Configure cluster storage in the Data Science Project. The cluster storage will be provisioned as a PVC from PowerFlex and used as persistent storage for our notebook; configure the capacity accordingly.
Verify the PVC is created from PowerFlex.
Create an S3 bucket in ObjectScale. This S3 bucket will be added as a data connection in the Data Science Project, and the models will be deployed from it.
Integrate the Data Science Project with the S3 bucket from ObjectScale.
Create a Workbench. Once the workbench is created and in a running state, we can access the notebook from the link provided in the UI.
Download the Llama 2 large language model from Hugging Face. We are deploying the model on OpenShift AI using the composite Caikit-TGIS runtime, which is based on Caikit and Text Generation Inference Server (TGIS). To use this runtime, we must convert our model from the Hugging Face format to the Caikit format. Use the caikit-nlp module to convert the model, as in the sketch below.
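Here is a minimal sketch of the conversion, assuming caikit-nlp's TextGeneration module; the module path, API, and local directories are assumptions that may vary with the caikit-nlp version:

```python
# Minimal HF-to-Caikit conversion sketch (assumes caikit-nlp's
# TextGeneration module; the exact API may differ between versions).
from caikit_nlp.modules.text_generation import TextGeneration

hf_model_path = "Llama-2-7b-chat-hf"          # local Hugging Face checkout (placeholder)
caikit_model_path = "llama-2-7b-chat-caikit"  # output directory in Caikit format (placeholder)

# bootstrap() loads the Hugging Face checkpoint; save() writes it out
# in the layout the Caikit-TGIS serving runtime expects.
model = TextGeneration.bootstrap(hf_model_path)
model.save(caikit_model_path)
```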
Upload the converted model to the ObjectScale S3 bucket, as in the example below.
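A hedged boto3 sketch of the upload; the ObjectScale endpoint, credentials, bucket name, and key prefix are placeholders for your environment:

```python
# Upload the converted model directory to the ObjectScale S3 bucket,
# preserving relative paths as object keys (all names are placeholders).
import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectscale.example.com",  # ObjectScale S3 endpoint (assumed)
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

local_dir = "llama-2-7b-chat-caikit"  # converted model from the previous step
bucket = "models"                     # hypothetical bucket name
prefix = "llama-2-7b-chat"            # key prefix the model will be deployed from

for root, _, files in os.walk(local_dir):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.join(prefix, os.path.relpath(path, local_dir))
        s3.upload_file(path, bucket, key)
```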
Verify the model was successfully uploaded to the ObjectScale S3 bucket using S3 Browser.
Deploy the model from the ObjectScale S3 bucket. For deploying large language models (LLMs), Red Hat OpenShift AI includes a single model serving platform that is based on the KServe component. The single model serving platform consists of the following components: KServe, Red Hat OpenShift Serverless, and Red Hat OpenShift Service Mesh.
Verify the model is successfully deployed. We can also get the inference endpoint from the UI.
Use the inference endpoint to send a query to the model, for example:
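This sketch assumes the Caikit-TGIS runtime's REST text-generation endpoint; the URL comes from the UI in the previous step, and the model_id and response field are assumptions to verify against your deployment:

```python
# Send a question to the deployed model over the REST inference endpoint.
import requests

inference_url = "https://<inference-endpoint>/api/v1/task/text-generation"  # from the OpenShift AI UI
payload = {
    "model_id": "llama-2-7b-chat",          # the name the model was deployed under (assumed)
    "inputs": "What is Dell ObjectScale?",  # the user question
}

# Add verify=False or a CA bundle if the route uses a self-signed certificate.
resp = requests.post(inference_url, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["generated_text"])  # response field per the Caikit text-generation schema
```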
We can observe that the model's response is not accurate; in other words, the model lacks our domain-specific information. We therefore need to ground our large language model in our domain knowledge – knowledge about Dell ObjectScale.
Retrieval-Augmented Generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models by bringing in facts from an external knowledge base. By using RAG, we can bring our domain knowledge to the large language model. Vector embeddings are a way to convert objects such as words, sentences, documents, and images into numerical representations that capture their meaning and relationships. A vector database is a database specifically designed to handle vector embeddings, making it easier to search and query data objects. There are a variety of vector database options available; in our demo we are using Redis Enterprise as our vector database. The LangChain Python library is used to implement RAG on top of the Llama 2 large language model, and the Gradio Python library is used to build the digital assistant UI.
We need to embed the domain-specific information into a vector database, and a vector database search of the query is performed to retrieve the relevant domain-specific information before the query is sent to the model. The information retrieved from the vector database search is used to enhance the prompt. This way the model gets all the relevant domain-specific information included in the prompt and can generate accurate and reliable responses grounded in our domain knowledge. The domain information can come from a variety of sources, including documentation, websites, data lakes, etc. In our case we are embedding the Dell ObjectScale Storage PDF documents. The documents will be uploaded to an S3 bucket for embedding; during embedding, the documents are pulled from the S3 bucket, and the embeddings are created and stored in the vector database.
Install Redis Enterprise Operator.
Create Redis Enterprise Cluster.
Verify the Redis Pods and PVCs are created successfully. Because we specified the PowerFlex storage class, all the required PVCs are provisioned from PowerFlex storage.
Create database in Redis Enterprise Cluster.
Open the Redis Enterprise UI and verify the database.
Create an S3 bucket in ObjectScale to upload the documents for creating embeddings.
Upload the ObjectScale documents to the S3 bucket using S3 Browser.
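The bucket creation and upload can also be done programmatically; a short boto3 sketch with placeholder endpoint, credentials, and names:

```python
# Create the documents bucket and upload the ObjectScale PDFs
# (a programmatic alternative to the S3 Browser GUI; names are placeholders).
import glob
import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectscale.example.com",  # ObjectScale S3 endpoint (assumed)
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

s3.create_bucket(Bucket="objectscale-docs")  # hypothetical bucket name

for pdf in glob.glob("docs/*.pdf"):  # local folder holding the ObjectScale PDFs (assumed)
    s3.upload_file(pdf, "objectscale-docs", os.path.basename(pdf))
```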
Create vector embeddings of the ObjectScale documents from the S3 bucket and store the embeddings in the Redis database; see the sketch below.
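A sketch of the embedding step with LangChain and the Redis vector store, assuming the PDFs have already been pulled down locally from the S3 bucket and that an open-source sentence-transformers model produces the embeddings (both are assumptions; adjust for your setup):

```python
# Load the ObjectScale PDFs, split them into chunks, embed the chunks,
# and store the vectors in the Redis Enterprise database.
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Redis

# PDFs previously downloaded from the ObjectScale S3 bucket (assumed local dir)
docs = PyPDFDirectoryLoader("objectscale-docs/").load()

# Overlapping chunks keep enough context in each retrieved passage
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# The embedding model choice is an assumption; any embedding model works here
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

# Create the vector index in Redis and store the embeddings
vectorstore = Redis.from_documents(
    chunks,
    embeddings,
    redis_url="redis://default:<password>@<redis-db-host>:<port>",  # from the Redis UI
    index_name="objectscale-docs",
)
```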
Send the user query to the model, including the relevant domain-specific information retrieved from the vector database search in the prompt as context, as in the sketch below.
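A sketch of that retrieve-then-generate flow, reusing the vectorstore and inference endpoint placeholders from the earlier sketches:

```python
# Retrieve the most relevant chunks from Redis and fold them into the prompt.
import requests

question = "How does Dell ObjectScale protect data?"  # example user query

# vectorstore was created in the embedding sketch above
context = "\n\n".join(
    doc.page_content for doc in vectorstore.similarity_search(question, k=4)
)

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}\n"
)

# Same placeholder endpoint and model_id as in the earlier inference example
resp = requests.post(
    "https://<inference-endpoint>/api/v1/task/text-generation",
    json={"model_id": "llama-2-7b-chat", "inputs": prompt},
    timeout=120,
)
print(resp.json()["generated_text"])
```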
Create the user interface for the digital assistant using Gradio, for example:
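A minimal Gradio sketch wiring a chat UI to the RAG flow above; gr.ChatInterface and the rag_answer wrapper are one way to do it, not necessarily the exact implementation used in the demo:

```python
# Simple chat UI for the digital assistant; rag_answer wraps the
# retrieve-then-generate steps from the previous sketch.
import gradio as gr
import requests

def rag_answer(message, history):
    # vectorstore comes from the embedding sketch above
    context = "\n\n".join(
        doc.page_content for doc in vectorstore.similarity_search(message, k=4)
    )
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {message}\n"
    )
    resp = requests.post(
        "https://<inference-endpoint>/api/v1/task/text-generation",  # placeholder endpoint
        json={"model_id": "llama-2-7b-chat", "inputs": prompt},
        timeout=120,
    )
    return resp.json()["generated_text"]

demo = gr.ChatInterface(rag_answer, title="Dell ObjectScale Digital Assistant")
demo.launch(server_name="0.0.0.0", server_port=7860)  # exposed via the OpenShift route
```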
Create a service and route in OpenShift to expose the Digital Assistant.
GenAI Digital Assistant responding to questions about Dell ObjectScale.
Below, you can see a video of how it all looks: