Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon North America in Salt Lake City from November 12 - 15, 2024. Connect with our current graduated, incubating, and sandbox projects as the community gathers to further the education and advancement of cloud native computing. Learn more at kubecon.io
Keynote: Accelerating AI Workloads with GPUs in Kubernetes - Kevin Klues, Distinguished Engineer & Sanjay Chatterjee, Engineering Manager, NVIDIA
As AI and machine learning become ubiquitous, GPU acceleration is essential for model training and inference at scale. However, effectively leveraging GPUs in Kubernetes brings challenges around efficiency, configuration, extensibility, and scalability.
This talk provides an overview of the capabilities needed to address these challenges, enabling seamless support for next-generation AI applications on Kubernetes.
- GPU resource-sharing mechanisms such as MPS (Multi-Process Service), time-slicing, MIG (Multi-Instance GPU), and GPU virtualization
- Flexible accelerator configuration using the traditional device plugin and the upcoming Dynamic Resource Allocation (DRA) feature
- Advanced scheduling and resource-management techniques, including gang scheduling, topology awareness, fault tolerance, and more
- Key learnings (and areas of improvement) necessary to scale multi-node AI/ML jobs in large production clusters
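As a concrete point of reference for the "traditional device plugin" path mentioned above, here is a minimal sketch (not taken from the talk) that builds a Pod manifest requesting whole GPUs through the `nvidia.com/gpu` extended resource advertised by the NVIDIA device plugin. The pod name, container name, and image are hypothetical placeholders, and the sketch assumes the device plugin is already installed on the cluster. The JSON it prints can be passed to `kubectl apply -f -`.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that requests whole GPUs via the
    nvidia.com/gpu extended resource exposed by the NVIDIA device plugin."""
    if gpus < 1:
        raise ValueError("gpus must be >= 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    # Hypothetical container name; any valid name works.
                    "name": "cuda-app",
                    "image": image,
                    # Extended resources are set in limits; if requests are
                    # omitted, Kubernetes sets them equal to the limits.
                    # The quantity is written as a string, as kubectl does.
                    "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image; substitute your own.
    manifest = gpu_pod_manifest("gpu-demo", "example.com/cuda-sample:latest")
    print(json.dumps(manifest, indent=2))
```

With the device plugin, each request maps to whole GPUs; sharing a physical GPU among several pods (time-slicing, MPS, MIG) is configured separately, and DRA replaces this fixed counter with richer, structured resource claims.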
Some of these capabilities are supported today; others are not yet. By addressing the remaining gaps, Kubernetes is poised to become the go-to platform for accelerated AI/ML in the cloud, mirroring Linux's dominance in the datacenter.