
How I built an AI platform (Riva Speech Services) with Red Hat OpenShift, NVIDIA GPUs, and Dell PowerFlex


A post by Kailas Goliwadekar

Recently, I have been working on AI/ML with PowerFlex as software-defined storage (SDS).

My objective was to build a cloud-native artificial intelligence platform made up of a Red Hat OpenShift cluster, NVIDIA GPUs, and PowerFlex. I started by building a PowerFlex 4.0 platform with four SDS nodes. On separate PowerEdge nodes, I built an OpenShift bare-metal cluster with 3 master nodes and 4 worker nodes; the entire OpenShift deployment was carried out with the Assisted Installer.

Next, I deployed the PowerFlex CSI driver on the OpenShift worker nodes, which enables pods to connect to PowerFlex storage.
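To illustrate what the CSI driver provides, here is a rough sketch of a StorageClass and PersistentVolumeClaim backed by PowerFlex. The class name, storage pool, and system ID are placeholders for illustration, not the exact values from my cluster; the provisioner name follows Dell's CSI driver for PowerFlex.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: powerflex-sc
provisioner: csi-vxflexos.dellemc.com     # Dell CSI driver for PowerFlex
parameters:
  storagepool: "pool1"                    # placeholder: your PowerFlex storage pool
  systemID: "<systemID>"                  # placeholder: your PowerFlex system ID
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: powerflex-sc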

The logical architecture of OpenShift on PowerFlex is shown in the figure below.


To run the speech recognition services from NVIDIA, I first had to install the GPU Operator from the OpenShift console. The NVIDIA GPU Operator makes the underlying GPUs of a compute node available to containerized workloads.
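Concretely, once the operator is fully configured (see the ClusterPolicy step below), any workload can claim a GPU through the standard nvidia.com/gpu extended resource. A minimal smoke-test pod looks like this; the CUDA image tag is an assumption, and any CUDA-enabled image would do:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.0.0-base-ubi8   # assumption: any CUDA base image works
      command: ["nvidia-smi"]                       # prints the GPU visible inside the container
      resources:
        limits:
          nvidia.com/gpu: 1                         # request one GPU from the operator's device plugin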

A prerequisite for running the GPU operator is the Node Feature Discovery (NFD) Operator, which detects hardware features and system configuration at a node level. After installing the NFD Operator and creating a NodeFeatureDiscovery instance, we can start with installing the NVIDIA GPU Operator and creating an instance of ClusterPolicy.
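Creating the ClusterPolicy instance from the console amounts to applying a custom resource like the following; an empty spec simply accepts the operator's defaults:

apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec: {}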

To deploy the Riva API, I performed the following steps on my OpenShift cluster.

export NGC_CLI_API_KEY=<your NGC API key>

export VERSION_TAG="2.11.0"

helm fetch https://helm.ngc.nvidia.com/nvidia/riva/charts/riva-api-${VERSION_TAG}.tgz --username='$oauthtoken' --password=$NGC_CLI_API_KEY

tar -xvzf riva-api-${VERSION_TAG}.tgz

In the riva-api folder, I set riva.speechServices.asr, nlp, and tts in values.yaml to true or false as needed. I also changed service.type from LoadBalancer to ClusterIP, which exposes the service only to other services within the cluster.
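For reference, the relevant part of the riva-api values.yaml after my edits looks roughly like this:

riva:
  speechServices:
    asr: true        # deploy automatic speech recognition models
    nlp: true        # deploy natural language processing models
    tts: true        # deploy text-to-speech models
service:
  type: ClusterIP    # changed from LoadBalancer; reachable only inside the cluster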

Enable the cluster to run containers that need NVIDIA GPUs by installing the NVIDIA device plugin:

helm repo add nvdp https://nvidia.github.io/k8s-device-plugin

helm repo update

helm install --generate-name --set failOnInitError=false nvdp/nvidia-device-plugin

[root@ocp411-admin Samples]# oc get pods
NAME                                    READY   STATUS    RESTARTS   AGE
nvidia-device-plugin-1683609115-2rh8w   1/1     Running   0          3d6h
nvidia-device-plugin-1683609115-9x9tg   1/1     Running   0          3d6h
nvidia-device-plugin-1683609115-gm642   1/1     Running   0          3d6h
nvidia-device-plugin-1683609115-gpmtm   1/1     Running   0          3d6h

Install the Riva Helm Chart. You can explicitly override variables from the values.yaml file, such as the riva.speechServices.[asr,nlp,tts] settings.

helm install riva-api riva-api/ \
  --set ngcCredentials.password=`echo -n $NGC_CLI_API_KEY | base64 -w0` \
  --set modelRepoGenerator.modelDeployKey=`echo -n tlt_encode | base64 -w0` \
  --set riva.speechServices.asr=true \
  --set riva.speechServices.nlp=true \
  --set riva.speechServices.tts=true

The Helm chart runs two containers in order: a riva-model-init container that downloads and deploys the models, followed by a riva-speech-api container to start the speech service API. Depending on the number of models, the initial model deployment could take an hour or more. To monitor the deployment, use kubectl to describe the riva-api pod and to watch the container logs.

export pod=`kubectl get pods | cut -d " " -f 1 | grep riva-api`

kubectl describe pod $pod

kubectl logs -f $pod -c riva-model-init

kubectl logs -f $pod -c riva-speech-api

Now that the Riva service is running, the cluster needs a mechanism to route requests into Riva, so deploy the open source Traefik edge router.

helm repo add traefik https://helm.traefik.io/traefik

helm repo update

helm fetch traefik/traefik

tar -zxvf traefik-*.tgz

Modify the traefik/values.yaml file. Change service.type from LoadBalancer to ClusterIP, which exposes the service on a cluster-internal IP only.
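As with the riva-api chart, the edit amounts to just this stanza:

service:
  type: ClusterIP    # changed from LoadBalancer

Now deploy the modified Traefik Helm chart.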

helm install traefik traefik/

An IngressRoute enables the Traefik load balancer to recognize incoming requests and distribute them across multiple riva-api services. When you deployed the traefik Helm chart above, Kubernetes automatically created a local DNS entry for that service: traefik.default.svc.cluster.local. The IngressRoute definition below matches requests for this host name and directs them to the riva-api service. Create the following riva-ingress.yaml file:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: riva-ingressroute
spec:
  entryPoints:
    - web
  routes:
    - match: "Host(`traefik.default.svc.cluster.local`)"
      kind: Rule
      services:
        - name: riva-api
          port: 50051
          scheme: h2c

Deploy the IngressRoute.

kubectl apply -f riva-ingress.yaml

The Riva service is now able to serve gRPC requests from within the cluster at the address traefik.default.svc.cluster.local.
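Before moving on to the sample clients, it doesn't hurt to sanity-check that the route exists and the host name resolves in-cluster. The throwaway busybox pod below is just a convenient way to run nslookup:

kubectl get ingressroute riva-ingressroute

kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup traefik.default.svc.cluster.local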

Riva provides a container with a set of pre-built sample clients to test the Riva services.

Create a client-deployment.yaml file that defines the deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: riva-client
  labels:
    app: "rivaasrclient"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "rivaasrclient"
  template:
    metadata:
      labels:
        app: "rivaasrclient"
    spec:
      nodeSelector:
        # EKS-specific selector carried over from NVIDIA's example;
        # remove or adapt it for an OpenShift bare-metal cluster
        eks.amazonaws.com/nodegroup: cpu-linux-clients
      imagePullSecrets:
        - name: imagepullsecret
      containers:
        - name: riva-client
          image: "nvcr.io/nvidia/riva/riva-speech-client:2.11.0"
          command: ["/bin/bash"]
          args: ["-c", "while true; do sleep 5; done"]

Deploy the client service.

kubectl apply -f client-deployment.yaml

export cpod=`kubectl get pods | cut -d " " -f 1 | grep riva-client`

kubectl exec --stdin --tty $cpod /bin/bash

[root@ocp411-admin Riva]# oc get all
NAME                                        READY   STATUS    RESTARTS   AGE
pod/nvidia-device-plugin-1683609115-2rh8w   1/1     Running   0          3d7h
pod/nvidia-device-plugin-1683609115-9x9tg   1/1     Running   0          3d7h
pod/nvidia-device-plugin-1683609115-gm642   1/1     Running   0          3d7h
pod/nvidia-device-plugin-1683609115-gpmtm   1/1     Running   0          3d7h
pod/riva-client-668dd7594b-cr68q            1/1     Running   0          2d7h
pod/riva-riva-api-7d5b75687b-4t6kn          1/1     Running   0          2d6h
pod/traefik-6fbf57555d-xw82v                1/1     Running   0          2d8h

NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP                            PORT(S)                                AGE
service/kubernetes      ClusterIP      172.30.0.1       <none>                                 443/TCP                                4d6h
service/openshift       ExternalName   <none>           kubernetes.default.svc.cluster.local   <none>                                 4d6h
service/riva-riva-api   ClusterIP      172.30.6.226     <none>                                 8000/TCP,8001/TCP,8002/TCP,50051/TCP   2d6h
service/traefik         ClusterIP      172.30.246.217   <none>                                 80/TCP,443/TCP                         2d8h

NAME                                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/nvidia-device-plugin-1683609115    4         4         4       4            4           <none>          3d7h

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/riva-client     1/1     1            1           2d7h
deployment.apps/riva-riva-api   1/1     1            1           2d6h
deployment.apps/traefik         1/1     1            1           2d8h

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/riva-client-668dd7594b     1         1         1       2d7h
replicaset.apps/riva-riva-api-7d5b75687b   1         1         1       2d6h
replicaset.apps/traefik-6fbf57555d         1         1         1       2d8h

[root@ocp411-admin Riva]#

Let's jump to the demo. First, log in to the riva-client pod and run the Riva ASR and Riva TTS tests.

[root@ocp411-admin Riva]# kubectl exec --stdin --tty $cpod /bin/bash

root@riva-client-668dd7594b-cr68q:/opt/riva# riva_streaming_asr_client \
> --audio_file=wav/en-US_sample.wav \
> --automatic_punctuation=true \
> --riva_uri=traefik.default.svc.cluster.local:80

I0512 13:00:14.664886 47228 riva_streaming_asr_client.cc:150] Using Insecure Server Credentials

Loading eval dataset…

filename: /opt/riva/wav/en-US_sample.wav

Done loading 1 files

what

what

what is

what is

what is

what is now

what is natural

what is natural

what is natural language

what is natural language

what is natural language

what is natural language

what is natural language Processing

what is natural language Processing

what is natural language Processing

what is natural language Processing

what is natural language Processing

what is language Processing

what is language Processing

What is Natural Language Processing?

———————————————————–

File: /opt/riva/wav/en-US_sample.wav

Final transcripts:

0 : What is Natural Language Processing?

Timestamps:

Word          Start (ms)   End (ms)
What          840          880
is            1160         1200
Natural       1800         2080
Language      2200         2520
Processing?   2720         3200

Audio processed: 4 sec.

———————————————————–

Not printing latency statistics because the client is run without the --simulate_realtime option and/or the number of requests sent is not equal to number of requests received. To get latency statistics, run with --simulate_realtime and set the --chunk_duration_ms to be the same as the server chunk duration

Run time: 0.1486 sec.

Total audio processed: 4.152 sec.

Throughput: 27.9407 RTFX
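The TTS side can be exercised from the same pod with the bundled riva_tts_client. A sketch of the invocation follows; the flag names follow the Riva 2.x sample clients, and the voice name is an assumption that depends on which TTS models were deployed:

riva_tts_client \
  --text="Hello, this is a test of Riva text to speech." \
  --voice_name=English-US.Female-1 \
  --audio_file=/opt/riva/output.wav \
  --riva_uri=traefik.default.svc.cluster.local:80

The synthesized audio lands in /opt/riva/output.wav inside the pod, from where it can be copied out with kubectl cp.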

I have published a short video of all the other demos for the Riva services. Check it out!



