About Customer
The customer is a large multinational medical diagnostic equipment company.
The Requirement
The customer is a leading multinational manufacturer of medical scanning and diagnostic equipment, with a development center in Bengaluru. The customer used vGPU-enabled VMs running ML workloads for test/dev activities.
Provisioning was a time-consuming manual process: an IT engineer needed roughly half a day to provision and deliver a VM with the required prerequisites. The customer wanted to automate this process and deliver it as a blueprint to its business groups.
Challenges
The solution required a custom vRealize Automation blueprint integrated with vRealize Orchestrator workflows for deploying vGPU-enabled VMs. In this context, the key challenges to overcome were:
- Special and expensive hardware requirement
- vGPU Domain Knowledge
- Multiple hardware models for Nvidia Grid GPU cards
- Each physical host was configured with 4 Nvidia Grid vGPU cards
- Each Nvidia Grid card can be configured with different vGPU profiles
- T-shirt sizing of vGPU-enabled VMs for CPU and memory
- Limited native API support for querying vGPU information and capacity utilization
- DRS clusters in vSphere 6.7 do not support vGPU-enabled VMs
- The customer wanted to maximize capacity by consolidating similar vGPU profiles on a single Grid card on a single physical host
- The customer wanted initial-placement load balancing of vGPU-enabled VMs across the cluster
- Solution must support future releases of Nvidia Grid vGPU cards
Solution
The solution was to design a custom vRealize Orchestrator workflow triggered via a vRealize Automation blueprint. Based on the CPU, memory size, vGPU profile, and number of instances selected, the workflow would select a suitable host for VM placement. On every subsequent deployment, the workflow would ensure that placement stays load balanced across the cluster and that the appropriate vGPU profile is allocated to the VM.
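The placement logic described above can be sketched roughly as follows. This is a minimal illustration in Python, not the customer's vRealize Orchestrator workflow: the `Card` and `Host` classes, the `select_host` and `place` functions, and the rule that a card runs one profile at a time are all assumptions made for the example. The sketch prefers a card that already runs the requested profile (consolidation) and, among eligible hosts, the one with the fewest vGPU VMs (initial-placement load balancing).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Card:
    profile: Optional[str] = None  # profile currently on the card, None if empty
    used: int = 0                  # vGPU VMs currently placed on this card


@dataclass
class Host:
    name: str
    free_cpu: int
    free_mem_gb: int
    cards: List[Card] = field(default_factory=list)

    def vm_count(self) -> int:
        return sum(card.used for card in self.cards)


def pick_card(host: Host, profile: str, slots: int) -> Optional[int]:
    """Return a card index on this host, or None if no card fits."""
    # First choice: a card already running this profile with a free slot.
    for i, card in enumerate(host.cards):
        if card.profile == profile and card.used < slots:
            return i
    # Second choice: an empty card that can take on the profile.
    for i, card in enumerate(host.cards):
        if card.profile is None:
            return i
    return None


def select_host(hosts: List[Host], profile: str, cpu: int, mem_gb: int,
                slots: int) -> Optional[Tuple[Host, int]]:
    """Pick the least-loaded host that has CPU, memory, and a usable card.

    `slots` is the assumed number of VMs of this profile that fit on one card.
    """
    candidates = []
    for host in hosts:
        if host.free_cpu < cpu or host.free_mem_gb < mem_gb:
            continue
        idx = pick_card(host, profile, slots)
        if idx is not None:
            candidates.append((host.vm_count(), host.name, host, idx))
    if not candidates:
        return None
    # Fewest vGPU VMs wins; the host name breaks ties deterministically.
    _, _, host, idx = min(candidates, key=lambda c: (c[0], c[1]))
    return host, idx


def place(host: Host, idx: int, profile: str, cpu: int, mem_gb: int) -> None:
    """Record the placement so the next request sees updated capacity."""
    card = host.cards[idx]
    card.profile = profile
    card.used += 1
    host.free_cpu -= cpu
    host.free_mem_gb -= mem_gb
```

For a multi-instance request, calling `select_host` followed by `place` once per instance would spread the VMs across hosts while still filling partially used cards of the same profile before opening an empty one.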
Benefits
All the identified challenges were addressed by the solution. The customer appreciated the solution for its ease of use and management. All configuration management was achieved via JSON-based configuration files.
This automated solution helped reduce the deployment time from 4 hours to 1 hour. More importantly, the entire activity can now be requested and triggered by a business user without the IT team's involvement. Workload tracking has also become easier, which helps in reclaiming unused capacity.
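To illustrate what JSON-based configuration might look like for this kind of workflow, here is a small hypothetical sketch in Python. The schema, the key names (`tshirt_sizes`, `card_models`, `framebuffer_gb`, `profiles`), the card and profile names, and all values are invented for the example and are not the customer's actual files. It also assumes that the number of VMs per card is the card framebuffer divided by the profile framebuffer.

```python
import json

# Hypothetical configuration: T-shirt sizes plus per-card-model vGPU profiles.
# Each profile maps to the framebuffer (in GB) it reserves on the card.
CONFIG_JSON = """
{
  "tshirt_sizes": {
    "small":  {"cpu": 4, "memory_gb": 16},
    "medium": {"cpu": 8, "memory_gb": 32}
  },
  "card_models": {
    "example-card-16gb": {
      "framebuffer_gb": 16,
      "profiles": {"example-4q": 4, "example-8q": 8}
    }
  }
}
"""


def load_config(text: str) -> dict:
    """Parse the JSON text and check that the expected top-level keys exist."""
    cfg = json.loads(text)
    for key in ("tshirt_sizes", "card_models"):
        if key not in cfg:
            raise ValueError(f"missing configuration section: {key}")
    return cfg


def resolve_request(cfg: dict, size: str, card_model: str, profile: str) -> dict:
    """Translate a catalog request into concrete sizing and per-card capacity."""
    sizing = cfg["tshirt_sizes"][size]
    card = cfg["card_models"][card_model]
    profile_fb_gb = card["profiles"][profile]
    return {
        "cpu": sizing["cpu"],
        "memory_gb": sizing["memory_gb"],
        "profile": profile,
        # Assumption: slots per card = card framebuffer // profile framebuffer.
        "slots_per_card": card["framebuffer_gb"] // profile_fb_gb,
    }
```

With a layout along these lines, supporting a new card model or profile would mean adding an entry to the file rather than changing workflow code.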