A post by Kailas Goliwadekar
Recently, I have been working on AI/ML with PowerFlex as the software-defined storage (SDS) layer.
My objective was to build a cloud-native artificial intelligence platform made up of a Red Hat OpenShift cluster, NVIDIA GPUs, and PowerFlex. I started by building a PowerFlex 4.0 platform with four SDS nodes. On separate PowerEdge nodes, I deployed an OpenShift bare-metal cluster with three master nodes and four worker nodes; the entire OpenShift deployment was carried out with the Assisted Installer.
I then deployed the PowerFlex CSI driver on the OpenShift worker nodes, which enables pods to consume PowerFlex storage.
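To illustrate how pods consume PowerFlex storage through the CSI driver, here is a minimal sketch of a StorageClass and PersistentVolumeClaim. The provisioner name assumes the Dell PowerFlex (VxFlex OS) CSI driver, and the storage pool name is a placeholder; check your CSI driver installation for the exact values.

# Illustrative StorageClass backed by the PowerFlex CSI driver (values are placeholders)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: powerflex-sc
provisioner: csi-vxflexos.dellemc.com   # assumed PowerFlex CSI provisioner name
parameters:
  storagepool: "pool1"                  # placeholder storage pool
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# A PVC that a pod can mount to get PowerFlex-backed storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: powerflex-sc
  resources:
    requests:
      storage: 100Gi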
The logical architecture of OpenShift on PowerFlex is shown in the figure below.
To run NVIDIA's speech recognition services, I first had to install the GPU Operator from the OpenShift console. The NVIDIA GPU Operator makes the GPUs of a compute node available to containerized workloads.
A prerequisite for running the GPU Operator is the Node Feature Discovery (NFD) Operator, which detects hardware features and system configuration at the node level. After installing the NFD Operator and creating a NodeFeatureDiscovery instance, we can install the NVIDIA GPU Operator and create a ClusterPolicy instance.
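For reference, here is a minimal sketch of the ClusterPolicy custom resource. When the GPU Operator is installed from the OpenShift console, the instance is normally created with sensible defaults, so treat the fields below as illustrative rather than the exact manifest used here.

# Minimal illustrative ClusterPolicy for the NVIDIA GPU Operator
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  driver:
    enabled: true        # deploy the NVIDIA driver as a containerized DaemonSet
  toolkit:
    enabled: true        # deploy the NVIDIA container toolkit
  devicePlugin:
    enabled: true        # advertise GPUs as schedulable resources
  dcgmExporter:
    enabled: true        # export GPU metrics for monitoring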
To deploy the Riva API, I performed the following steps on my OpenShift cluster.
export NGC_CLI_API_KEY=<your NGC API key>
export VERSION_TAG="2.11.0"
helm fetch https://helm.ngc.nvidia.com/nvidia/riva/charts/riva-api-${VERSION_TAG}.tgz --username='$oauthtoken' --password=$NGC_CLI_API_KEY
tar -xvzf riva-api-${VERSION_TAG}.tgz
In the riva-api folder, I set riva.speechServices.asr, nlp, and tts to true or false as needed in values.yaml. I also changed service.type from LoadBalancer to ClusterIP, which exposes the service only to other services within the cluster.
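The relevant fragment of the chart's values.yaml then looks roughly like this (other keys omitted):

# riva-api/values.yaml (excerpt) - enable only the speech services you need
riva:
  speechServices:
    asr: true
    nlp: true
    tts: true

service:
  type: ClusterIP   # expose the Riva API only inside the cluster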
Enable the cluster to run containers that need NVIDIA GPUs by installing the NVIDIA device plugin:
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install --generate-name --set failOnInitError=false nvdp/nvidia-device-plugin
[root@ocp411-admin Samples]# oc get pods
NAME READY STATUS RESTARTS AGE
nvidia-device-plugin-1683609115-2rh8w 1/1 Running 0 3d6h
nvidia-device-plugin-1683609115-9x9tg 1/1 Running 0 3d6h
nvidia-device-plugin-1683609115-gm642 1/1 Running 0 3d6h
nvidia-device-plugin-1683609115-gpmtm 1/1 Running 0 3d6h
Install the Riva Helm Chart. You can explicitly override variables from the values.yaml file, such as the riva.speechServices.[asr,nlp,tts] settings.
helm install riva-api riva-api/ \
    --set ngcCredentials.password=`echo -n $NGC_CLI_API_KEY | base64 -w0` \
    --set modelRepoGenerator.modelDeployKey=`echo -n tlt_encode | base64 -w0` \
    --set riva.speechServices.asr=true \
    --set riva.speechServices.nlp=true \
    --set riva.speechServices.tts=true
The Helm chart runs two containers in order: a riva-model-init container that downloads and deploys the models, followed by a riva-speech-api container to start the speech service API. Depending on the number of models, the initial model deployment could take an hour or more. To monitor the deployment, use kubectl to describe the riva-api pod and to watch the container logs.
export pod=`kubectl get pods | cut -d " " -f 1 | grep riva-api`
kubectl describe pod $pod
kubectl logs -f $pod -c riva-model-init
kubectl logs -f $pod -c riva-speech-api
Now that the Riva service is running, the cluster needs a mechanism to route requests into Riva, so deploy the open-source Traefik edge router.
helm repo add traefik https://helm.traefik.io/traefik
helm repo update
helm fetch traefik/traefik
tar -zxvf traefik-*.tgz
Modify the traefik/values.yaml file, changing service.type from LoadBalancer to ClusterIP. This exposes the service on a cluster-internal IP.
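The change in values.yaml is just this excerpt:

# traefik/values.yaml (excerpt) - expose Traefik on a cluster-internal IP only
service:
  type: ClusterIP

Now deploy the modified Traefik Helm chart.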
helm install traefik traefik/
An IngressRoute enables the Traefik load balancer to recognize incoming requests and distribute them across multiple riva-api services. When you deployed the traefik Helm chart above, Kubernetes automatically created a local DNS entry for that service: traefik.default.svc.cluster.local. The IngressRoute definition below matches these DNS entries and directs requests to the riva-api service. Create the following riva-ingress.yaml file:
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: riva-ingressroute
spec:
  entryPoints:
    - web
  routes:
    - match: "Host(`traefik.default.svc.cluster.local`)"
      kind: Rule
      services:
        - name: riva-api
          port: 50051
          scheme: h2c
Deploy the IngressRoute.
kubectl apply -f riva-ingress.yaml
The Riva service is now able to serve gRPC requests from within the cluster at the address traefik.default.svc.cluster.local.
Riva provides a container with a set of pre-built sample clients to test the Riva services.
Create the client-deployment.yaml file that defines the deployment and contains the following:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: riva-client
  labels:
    app: "rivaasrclient"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "rivaasrclient"
  template:
    metadata:
      labels:
        app: "rivaasrclient"
    spec:
      nodeSelector:
        eks.amazonaws.com/nodegroup: cpu-linux-clients
      imagePullSecrets:
        - name: imagepullsecret
      containers:
        - name: riva-client
          image: "nvcr.io/nvidia/riva/riva-speech-client:2.11.0"
          command: ["/bin/bash"]
          args: ["-c", "while true; do sleep 5; done"]
Deploy the client service.
kubectl apply -f client-deployment.yaml
export cpod=`kubectl get pods | cut -d " " -f 1 | grep riva-client`
kubectl exec --stdin --tty $cpod /bin/bash
[root@ocp411-admin Riva]# oc get all
NAME READY STATUS RESTARTS AGE
pod/nvidia-device-plugin-1683609115-2rh8w 1/1 Running 0 3d7h
pod/nvidia-device-plugin-1683609115-9x9tg 1/1 Running 0 3d7h
pod/nvidia-device-plugin-1683609115-gm642 1/1 Running 0 3d7h
pod/nvidia-device-plugin-1683609115-gpmtm 1/1 Running 0 3d7h
pod/riva-client-668dd7594b-cr68q 1/1 Running 0 2d7h
pod/riva-riva-api-7d5b75687b-4t6kn 1/1 Running 0 2d6h
pod/traefik-6fbf57555d-xw82v 1/1 Running 0 2d8h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 172.30.0.1 <none> 443/TCP 4d6h
service/openshift ExternalName <none> kubernetes.default.svc.cluster.local <none> 4d6h
service/riva-riva-api ClusterIP 172.30.6.226 <none> 8000/TCP,8001/TCP,8002/TCP,50051/TCP 2d6h
service/traefik ClusterIP 172.30.246.217 <none> 80/TCP,443/TCP 2d8h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/nvidia-device-plugin-1683609115 4 4 4 4 4 <none> 3d7h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/riva-client 1/1 1 1 2d7h
deployment.apps/riva-riva-api 1/1 1 1 2d6h
deployment.apps/traefik 1/1 1 1 2d8h
NAME DESIRED CURRENT READY AGE
replicaset.apps/riva-client-668dd7594b 1 1 1 2d7h
replicaset.apps/riva-riva-api-7d5b75687b 1 1 1 2d6h
replicaset.apps/traefik-6fbf57555d 1 1 1 2d8h
[root@ocp411-admin Riva]#
Let's jump to the demo. First, log in to the riva-client pod and run the Riva ASR and Riva TTS tests.
[root@ocp411-admin Riva]# kubectl exec --stdin --tty $cpod /bin/bash
root@riva-client-668dd7594b-cr68q:/opt/riva# riva_streaming_asr_client \
> --audio_file=wav/en-US_sample.wav \
> --automatic_punctuation=true \
> --riva_uri=traefik.default.svc.cluster.local:80
I0512 13:00:14.664886 47228 riva_streaming_asr_client.cc:150] Using Insecure Server Credentials
Loading eval dataset…
filename: /opt/riva/wav/en-US_sample.wav
Done loading 1 files
what
what
what is
what is
what is
what is now
what is natural
what is natural
what is natural language
what is natural language
what is natural language
what is natural language
what is natural language Processing
what is natural language Processing
what is natural language Processing
what is natural language Processing
what is natural language Processing
what is language Processing
what is language Processing
What is Natural Language Processing?
———————————————————–
File: /opt/riva/wav/en-US_sample.wav
Final transcripts:
0 : What is Natural Language Processing?
Timestamps:
Word Start (ms) End (ms)
What 840 880
is 1160 1200
Natural 1800 2080
Language 2200 2520
Processing? 2720 3200
Audio processed: 4 sec.
———————————————————–
Not printing latency statistics because the client is run without the --simulate_realtime option and/or the number of requests sent is not equal to number of requests received. To get latency statistics, run with --simulate_realtime and set the --chunk_duration_ms to be the same as the server chunk duration
Run time: 0.1486 sec.
Total audio processed: 4.152 sec.
Throughput: 27.9407 RTFX
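A Riva TTS test can be run from the same pod with the riva_tts_client sample shipped in the container. The flags and voice name below are assumptions based on the Riva sample clients and may differ between releases, so treat this as a sketch rather than the exact command used here.

# Run from inside the riva-client pod; voice name and output path are placeholders
riva_tts_client \
   --text="Hello, this is a test of Riva text to speech." \
   --voice_name=English-US.Female-1 \
   --audio_file=/opt/riva/output.wav \
   --riva_uri=traefik.default.svc.cluster.local:80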
I have published a short video of all the other demos of the Riva services. Check it out!