Keynote: Accelerating AI Workloads with GPUs in Kubernetes - Kevin Klues & Sanjay Chatterjee

2,471 views

CNCF [Cloud Native Computing Foundation]

1 month ago

Keynote: Accelerating AI Workloads with GPUs in Kubernetes - Kevin Klues, Distinguished Engineer & Sanjay Chatterjee, Engineering Manager, NVIDIA
As AI and machine learning become ubiquitous, GPU acceleration is essential for model training and inference at scale. However, effectively leveraging GPUs in Kubernetes brings challenges around efficiency, configuration, extensibility, and scalability.
This talk provides an overview of the capabilities needed to address these challenges, enabling seamless support for next-generation AI applications on Kubernetes.
- GPU resource-sharing mechanisms such as MPS (Multi-Process Service), time-slicing, MIG (Multi-Instance GPU), and GPU virtualization
- Flexible accelerator configuration using the traditional device plugin and the upcoming Dynamic Resource Allocation (DRA) feature
- Advanced scheduling and resource management techniques, including gang scheduling, topology awareness, fault tolerance, and more
- Key learnings (and areas of improvement) necessary to scale multi-node AI/ML jobs in large production clusters
Some of these capabilities are already supported today and some of them are not. By addressing the remaining challenges, Kubernetes is poised to emerge as the go-to platform for accelerated AI/ML in the cloud, mirroring Linux's pervasive dominance in the datacenter.
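As a rough illustration of the gang scheduling idea mentioned above, the sketch below places all workers of a multi-node job or none of them. It is not taken from the talk and does not use any Kubernetes API: the function name `gang_schedule`, the node names, and the GPU counts are all hypothetical, and the greedy largest-first placement is only one simple heuristic, so it may reject a gang that a smarter placement could fit.

```python
from typing import Dict, List, Optional


def gang_schedule(free_gpus: Dict[str, int], workers: List[int]) -> Optional[Dict[int, str]]:
    """Place every worker of a job or none of them (all-or-nothing).

    free_gpus maps a (hypothetical) node name to its count of unallocated whole GPUs.
    workers lists how many GPUs each worker of one job requests.
    Returns {worker index: node name}, or None if the gang does not fit.
    """
    remaining = dict(free_gpus)  # work on a copy so a failed attempt changes nothing
    placement: Dict[int, str] = {}
    # Greedy heuristic: try the largest requests first, nodes in name order.
    for idx in sorted(range(len(workers)), key=lambda i: -workers[i]):
        need = workers[idx]
        node = next((n for n in sorted(remaining) if remaining[n] >= need), None)
        if node is None:
            return None  # one worker cannot be placed, so the whole gang is rejected
        remaining[node] -= need
        placement[idx] = node
    return placement


nodes = {"node-a": 4, "node-b": 2}      # assumed cluster state, for illustration only
print(gang_schedule(nodes, [2, 2, 2]))  # three 2-GPU workers fit across both nodes
print(gang_schedule(nodes, [4, 4]))     # the second 4-GPU worker cannot fit, so nothing is placed
```

The point of the all-or-nothing rule is that a partially placed training job would hold GPUs while waiting for the rest of its workers, which wastes capacity and can deadlock two jobs against each other.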

COMMENTS: 1
@luchen3414, 2 hours ago
A perfect overview of GPU with Kubernetes today. Thank you, Kevin and Sanjay.