Keynote: Accelerating AI Workloads with GPUs in Kubernetes - Kevin Klues & Sanjay Chatterjee

2,471 views

CNCF [Cloud Native Computing Foundation]

1 month ago

Keynote: Accelerating AI Workloads with GPUs in Kubernetes - Kevin Klues, Distinguished Engineer & Sanjay Chatterjee, Engineering Manager, NVIDIA
As AI and machine learning become ubiquitous, GPU acceleration is essential for model training and inference at scale. However, effectively leveraging GPUs in Kubernetes brings challenges around efficiency, configuration, extensibility, and scalability.
This talk provides an overview of the capabilities needed to address these challenges, enabling seamless support for next-generation AI applications on Kubernetes.
- GPU resource-sharing mechanisms such as MPS (Multi-Process Service), time-slicing, MIG (Multi-Instance GPU), and GPU virtualization
- Flexible accelerator configuration using the traditional device plugin and the upcoming Dynamic Resource Allocation (DRA) feature
- Advanced scheduling and resource management techniques, including gang scheduling, topology awareness, fault tolerance, and more
- Key lessons (and areas for improvement) needed to scale multi-node AI/ML jobs in large production clusters
Some of these capabilities are supported today; others are not. By addressing the remaining challenges, Kubernetes is poised to emerge as the go-to platform for accelerated AI/ML in the cloud, mirroring Linux's pervasive dominance in the datacenter.
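
For readers unfamiliar with the "traditional device plugin" path mentioned above, here is a minimal sketch (not taken from the talk) of how a workload typically asks for a GPU: the container lists an extended resource in its limits. `nvidia.com/gpu` is the resource name advertised by the NVIDIA device plugin. The helper name `gpu_pod_manifest`, the pod name, and the image are hypothetical placeholders chosen for illustration.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that requests whole GPUs via the device plugin.

    Hypothetical helper for illustration only; it does not talk to a cluster.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": image,  # placeholder image, assumed for the example
                    # Extended resources are requested under limits and must be
                    # whole integers; the scheduler places the Pod on a node
                    # that advertises enough of them.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest("gpu-demo", "example.com/cuda-app:latest", gpus=1)
    # kubectl accepts JSON as well as YAML, so this output can be piped
    # to `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

This count-based request cannot express how a GPU should be shared or configured. That limitation is what the description's mention of Dynamic Resource Allocation (DRA) addresses.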

COMMENTS: 1

@luchen3414 2 hours ago

A perfect overview of GPU with Kubernetes today. Thank you, Kevin and Sanjay.
