Deep-dive into the AI Hardware of ChatGPT

310,895 views

High Yield

1 day ago

With our special offer you can get 2 years of NordPass with 1 month free for a personal account: www.nordpass.com/highyieldnor...
Or use code highyieldnordpass at checkout.
Business accounts (must register with a biz domain) can get a free 3-month trial of NordPass: www.nordpass.com/highyieldbus... with code highyieldbusiness in the form.
....
What hardware was used to train ChatGPT and what does it take to keep it running? In this video we will take a look at the AI hardware behind ChatGPT and figure out how Microsoft & OpenAI use machine learning and Nvidia GPUs to create advanced neural networks.
Support me on Patreon: www.patreon.com/user?u=46978634
Follow me on Twitter: / highyieldyt
Links
The Google research paper that changed everything: arxiv.org/abs/1706.03762
The OpenAI research paper confirming Nvidia V100 GPUs: arxiv.org/abs/2005.14165
0:00 Intro
0:28 AI Training & Inference
2:22 Microsoft & OpenAI Supercomputer
4:08 NordPass
5:57 Nvidia Volta & GPT-3
9:05 Nvidia Ampere & ChatGPT
13:23 GPT-3 & ChatGPT Training Hardware
14:41 Cost of running ChatGPT / Inference Hardware
16:06 Nvidia Hopper / Next-gen AI Hardware
17:58 How Hardware dictates the Future of AI

COMMENTS: 400
@HighYield
@HighYield 1 year ago
With our special offer you can get 2 years of NordPass with 1 month free for a personal account: www.nordpass.com/highyieldnordpass Or, use code highyieldnordpass at the checkout. Business accounts (must register with a biz domain) can get a free 3-month trial of NordPass at www.nordpass.com/highyieldbusiness with code highyieldbusiness in the form.
@fitybux4664
@fitybux4664 1 year ago
Are we all speaking to the same ChatGPT? What if OpenAI trained sub-models with fewer parameters for users who don't ask complicated questions, or only for a certain subject? Then they could run inference more cheaply by using a smaller model. Maybe this could be detected after the first question or two that ChatGPT is asked? If I had a bill of a million dollars a day, these sorts of optimizations would definitely make sense!
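A minimal sketch of the model-routing idea in the comment above, in Python. Both model names and the complexity heuristic are made-up placeholders; none of this reflects OpenAI's actual serving stack.

```python
# Toy router: send "easy" prompts to a cheaper model, hard ones to the big
# one. Both model names and the heuristic are made-up placeholders.

def estimate_complexity(prompt: str) -> float:
    """Crude proxy: longer prompts with more long words count as harder."""
    words = prompt.split()
    long_words = sum(1 for w in words if len(w) > 8)
    return len(words) * 0.01 + long_words * 0.2

def route(prompt: str, threshold: float = 0.5) -> str:
    """Pick a model tier based on estimated complexity."""
    if estimate_complexity(prompt) < threshold:
        return "small-model"   # cheaper inference
    return "large-model"       # full-size model

print(route("What is 2+2?"))  # -> small-model
print(route("Explain transformer architectures and their "
            "computational complexity"))  # -> large-model
```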
@GermanMythbuster
@GermanMythbuster 1 year ago
Who the F*** listens to 2 min. of ads?! Skipped that crap. Anything more than 20 sec. and nobody cares any more. I don't know if Nord had this stupid idea or you, to make the ad so long, but whoever it was, it is a f***ing stupid idea!
@akselrasmussen3386
@akselrasmussen3386 1 year ago
I prefer Avira PWM
@memerified
@memerified 1 year ago
Ju
@wiredmind
@wiredmind 1 year ago
I think in a few years, AI accelerator cards will be the next video cards. A race to the top for the most powerful accelerator to be able to train and run AI locally on our own PCs, bypassing the need to pay for filtered models from large companies. Once people can run this kind of thing independently, that's when things will start getting _really_ exciting.
@mishanya1162
@mishanya1162 1 year ago
It's not about running AI locally. Today's GPUs (40 series) mostly gain FPS from AI upscaling, frame generation and so on. So whoever makes better AI and software will win.
@mrmaniac9905
@mrmaniac9905 1 year ago
Training a model is a very involved process. I could see consumer cards that can run pre-trained sets, but the average consumer will not be training on their own.
@Amipotsophspond
@Amipotsophspond 1 year ago
A breakthrough is coming that will enable black boxes to be part of the networks. This will make it possible to build networks in small parts that plug into big networks, rather than training it all at once and discarding it for the next unrelated network. It will not be a service forever; that's just to trick WEF/CCP-China money into giving them start-up funding.
@ralphclark
@ralphclark 1 year ago
This will require someone to create an open source ontology that you can use to preload your AI as a starting point. Training one completely from bare metal with only your own input will be beyond everyone except large corporations with deep pockets.
@Tomas81623
@Tomas81623 1 year ago
I don't think people at home will really have a need to run large models, as they would still be very expensive and offer little advantage other than privacy. On the other hand, I can definitely see businesses having a need for them, multiple even.
@matthewhayes7671
@matthewhayes7671 8 months ago
I'm a newer subscriber, working my way back through your recent videos. I just want to tell you that I think this is the best tech channel on UKposts right now, hands down. You are a wonderful teacher, you take frequent breaks to summarize key points, you provide ample context and visual aids, and when you do make personal guesses or offer your own opinions, it's always done in a transparent and logical manner. Thank you so much, and keep up the amazing work. I'll be here for every upload going forward.
@guyharris8007
@guyharris8007 1 year ago
Dev here... gotta say I love it. Thoroughly enjoyable, thank you for your time!
@kaystephan2610
@kaystephan2610 1 year ago
I find the increase in compute performance incredible. As shown at 7:38, the GV100 from the beginning of 2018 had 125 TFLOPS of FP16 Tensor core compute performance. The current generation of enterprise AI accelerators from NVIDIA is the H100, which provides up to 1979 TFLOPS of FP16 Tensor core compute performance. The FP32 and FP64 Tensor core performance has also obviously increased massively. Within ~5 years the raw compute performance of Tensor cores has increased by around 16x. What previously required 10,000 GPUs could now be done with ~632.
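The arithmetic in this comment checks out against the cited figures (125 TFLOPS FP16 Tensor for GV100, 1979 TFLOPS for the H100 SXM, the latter a with-sparsity marketing number):

```python
# Quick check of the scaling math using the comment's own numbers.
v100_fp16_tensor = 125    # TFLOPS, GV100 (2018)
h100_fp16_tensor = 1979   # TFLOPS, H100 SXM (with-sparsity figure)

speedup = h100_fp16_tensor / v100_fp16_tensor
print(f"per-GPU speedup: ~{speedup:.1f}x")        # ~15.8x
gpus_needed_now = 10_000 / speedup
print(f"GPUs for the same work today: ~{gpus_needed_now:.0f}")  # ~632
```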
@klaudialustig3259
@klaudialustig3259 1 year ago
Great video! I'd like to add one thing: in the segment starting at 16:06 where you talk about the Nvidia Hopper H100, in the context of neural networks the most important number to compare against the previous A100 is the memory. As far as I know, as long as there is *some* kind of matrix multiplication acceleration, it doesn't matter much how fast it is; memory bandwidth becomes the major bottleneck again. I looked it up and found the number of 3TB/s, which would be 50% higher than the A100 80GB version. I wonder where the number of 4.9TB/s shown in the video at 18:50 comes from; it seems unrealistically high to me. Nvidia's marketing does not like to admit this. They like to instead compare other numbers, where they can claim some 10x or 20x or 30x improvement.
@klaudialustig3259
@klaudialustig3259 1 year ago
They call that 4.9TB/s "total external bandwidth" and I think they get it by adding the 3TB/s HBM3 memory bandwidth, plus 0.9TB/s NVLink bandwidth, plus something else? Also, I have seen Nvidia claim that the H100 has 2x higher memory bandwidth than the A100. Note that this is only when comparing it to the A100 40GB version, not the 80GB version.
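One hedged way to test that guess, using publicly quoted per-link figures (~3.35 TB/s HBM3 on the SXM part, ~0.9 TB/s NVLink, ~0.128 TB/s PCIe Gen5). The exact mix behind Nvidia's 4.9 TB/s figure isn't documented here, and this reconstruction doesn't quite reach it:

```python
# Hypothetical reconstruction of the "total external bandwidth" claim.
hbm3 = 3.35    # TB/s, H100 SXM memory bandwidth (3.0 TB/s on other SKUs)
nvlink = 0.9   # TB/s, NVLink 4
pcie5 = 0.128  # TB/s, PCIe Gen5 x16

total = hbm3 + nvlink + pcie5
print(f"~{total:.2f} TB/s")  # ~4.38 TB/s -- close to, but short of, 4.9
```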
@JohnDlugosz
@JohnDlugosz 1 year ago
I recall the Intel roadmap showing that the next big thing is being able to put or use memory resources in different places. PCIe will be so fast that you don't have to put all the RAM on the accelerator card. You'll be able to use system RAM or special RAM cards, and thus easily expand the RAM as needed.
@Fractal_32
@Fractal_32 1 year ago
I just saw your post on the community page, I wish UKposts would have notified me when the video was posted instead of pushing the post without the video. I cannot wait to see what is talked about in the video! Edit: This was great, I’m definitely sharing it with some friends, keep up the great work!
@garyb7193
@garyb7193 1 year ago
Great video! Hopefully it will put things into perspective: Nvidia, Intel, and AMD's world does not revolve around graphics card sales and squeezing the most performance out of Cyberpunk. Hundreds of millions of dollars are at stake in areas much more lucrative than $500 CPUs or $800 video cards. They must meet the demands of all their various customers, as well as investors and stockholders too. Thanks!
@marka5968
@marka5968 1 year ago
This is the single whale > millions of peons attitude that is in video games now. Video games are not designed for fun but for grind and feeding the whales. Apparently video cards are going to be designed that way as well. I don't know how tens of millions of customers are a lower priority than some big whale, for a tech that hasn't made a red cent yet and costs millions per day to run. Certainly the data center makes Nvidia more money than gamers buying cards, but I don't know how that all works out. Nvidia is no. 8 among the most valuable companies in the world, and I don't see how selling GPUs to gamers once every 3-4 years makes that much revenue. These numbers don't seem to make sense in my mind.
@garyb7193
@garyb7193 1 year ago
@@marka5968 Okay?!
@jimatperfromix2759
@jimatperfromix2759 1 year ago
In its last quarterly results, although AMD is improving market share in its consumer divisions (CPUs and gamer GPUs), it took a slight loss on consumer products. Partly that's the recession coming in for an ugly landing. The good thing for consumers, though, is that AMD is using its massive profits in servers and AI (plus some new profits in the embedded area via its recent purchase of Xilinx) to support its addiction to making good hardware for the computer/gamer retail market. By the way, one of its next-gen laptop APU models not only contains an integrated GPU that rivals the low end of the discrete GPU market, but also contains a built-in AI engine (thanks to the Xilinx people). So you can get its highest-end laptop CPU/APU chip (meant for the big boys like Lenovo/HP/Dell/Asus/Acer et al. to integrate into a gamer laptop along with a discrete GPU from AMD or NVidia, or even Intel), or its second-from-the-top series of laptop CPU/APU (the one described above that already has a pretty darn good integrated GPU plus an AI engine; think: compete with Apple M1/M2), or one of a number of slower series of CPU/APU (meant for more economical laptops, mostly faster versions of older chips redone on faster silicon to fill that market segment at a cheaper cost). Think of the top two tiers of laptops built on the new AMD laptop chips as each being about 1/10,000th of the machine they trained ChatGPT on - sort of. By the way, did I mention you can do AI and machine learning on your laptop, starting about *next month*?
@snailone6358
@snailone6358 1 year ago
$800 video cards? We've been past that point for a while now.
@emkey6823
@emkey6823 1 year ago
...and we were told BTC was consuming too much of that CO2 power, and because of it the cards got so expensive. I think the bad people in cHARGE made those models using AI for their profits, which worked out pretty well for them. Let's keep our eye on that and give the AI they created some good vibes individually.
@marka5968
@marka5968 1 year ago
Great and very informative video, sir. I remember watching your very early videos and thought they were a bit meh. But this is absolutely world-class stuff, and I'm happy to listen to such a sharp and insightful mind.
@HighYield
@HighYield 1 year ago
Thanks, comments like this really mean a lot to me!
@SteveAbrahall
@SteveAbrahall 1 year ago
Thanks for the tech background on what it's running on from a hardware angle. The interesting thing, I think, is when someone comes up with a hunk of code that saves billions of hours of computational power - that disruptive type of thing from a software angle. It is an amazing time to be alive. Thanks for all your hard work, and an interesting vid!
@og_jakey
@og_jakey 1 year ago
Fantastic presentation. Appreciate your pragmatic and reasonable research, impressive work. Thank you!
@KiraSlith
@KiraSlith 1 year ago
Been beating my head against the Cost vs Scale issue of building my own AI compute rig for training models at home, and this gave me a better idea of what kind of hardware I'll need long-term by looking at what the current bleeding edge looks like. Thanks for doing all the research work on this one!
@Transcend_Naija
@Transcend_Naija 8 months ago
Hello, how did it go?
@KiraSlith
@KiraSlith 8 months ago
@@Transcend_Naija I ended up just starting cheapish for my personal rig: a T7820 with a pair of 2080 Tis on an NVLink bridge. I couldn't justify the V100s just yet, and the P40s I was using at work lacked the grunt for particularly large LLMs (they work fine for large-scale object recognition though).
@newmonengineering
@newmonengineering 1 year ago
I have been an OpenAI beta member for 3 years now. It has only become better over the years. I wonder what it will look like in 5 years.
@EmaManfred
@EmaManfred 1 year ago
Good job here, sir! Would you mind doing a quick breakdown of a model like BlueWillow that also utilizes diffusion?
@kanubeenderman
@kanubeenderman 1 year ago
MS will for sure use its Azure-based cloud system for hosting ChatGPT, so that they can load-balance the demand, scale out to more VMs and instances if needed to meet demand, and increase resources on any individual instance if needed. That would be the best use of that setup and provide the best user experience. So basically, the hardware specifics would be whatever servers are running in the 'farms'. I doubt they will have separate and specific hardware set aside just for ChatGPT, as it would run like any other service out there.
@alexcrisara4902
@alexcrisara4902 1 year ago
Great video! Curious what you use to generate the graphics / screenshot animations for your videos?
@novadea1643
@novadea1643 1 year ago
Logically the inference costs should scale pretty linearly with the number of users, since it's pretty much a fixed amount of computation and data transfer per query. Or can you elaborate on why the requirements would scale exponentially, as you state at 15:40?
@HighYield
@HighYield 1 year ago
The most commented question :D I meant that if the number of users increases exponentially, so does the amount of inference computation. I now realize it wasn't very clear. Plus, scaling is a big problem for AI, but that's not what I meant.
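To make that concrete: inference cost is linear in query volume, so exponential user growth implies exponential hardware demand even though per-user cost stays flat. A toy illustration with purely made-up per-query numbers:

```python
# Inference cost is linear in query volume, so exponential user growth
# means exponential hardware spend. All numbers below are illustrative.
cost_per_query = 0.0036          # USD, made-up ballpark
queries_per_user_per_day = 5

for users in (1_000_000, 10_000_000, 100_000_000):  # 10x growth steps
    daily_cost = users * queries_per_user_per_day * cost_per_query
    print(f"{users:>11,} users -> ${daily_cost:>9,.0f}/day")
```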
@AgentSmith911
@AgentSmith911 1 year ago
10:14 is so funny because "what hardware are you running on?" is one of the first questions I asked that bot 😀
@vmooreg
@vmooreg 1 year ago
Thank you for this! I’ve been looking around for this content. Great work!!👍🏼
@theminer49erz
@theminer49erz 1 year ago
Fantastic!! First of all, I could be wrong, but I don't remember you having an in-video sponsor before. Either way, that is awesome!! I'm glad you are getting the recognition you deserve!! You must have done a lot of work to get these numbers and configurations. Very interesting stuff! I am looking forward to AI splitting off from GPUs too, especially with the demand for them going up as investment in AI grows. I, as I'm sure many others are as well, am kinda sick of having to pay, or consider paying, a lot more for a gaming GPU because the higher demand is in non-gaming sectors that are saturated with capital to spend on them. Plus I'm sure dedicated designs will do a much better job. The design, at least in regards to Nvidia, is quite annoying too. Tensor cores, for example, were mainly put there for AI and mining use; the marketing of them for upscaling, and the cost added for a gamer to use them, is kinda ridiculous. If you have a lower-end card with them, where you would benefit from the upscaling, you could probably instead buy a card without them that wouldn't need to upscale. It seems to me that their existence is almost the cause of their need in that use case. I don't know how much of the cost of the card is just for them, but I imagine it's probably around 20-30% maybe?? IDK, just thinking "aloud". Anyway, thanks again for the hard work and please let us know when you get a Patreon account!! I would be proud to sponsor you as well!! Cheers!!
@brodriguez11000
@brodriguez11000 1 year ago
" I, as I'm sure many others are as well, am kinda sick of having to pay or consider paying a lot more for a gaming GPU because the higher demand is in non gaming sectors that are saturated with capital to spend on them." Blame cryptocurrency for that. Otherwise those non-gaming sectors are what's keeping the lights on and driving the R&D that gamers enjoy the fruits of.
@backToFreedom
@backToFreedom 1 year ago
Thank you very much for bringing this kind of information. Even ChatGPT is unaware of the hardware it's running on!
@Tomjones12345
@Tomjones12345 1 year ago
I was wondering how far off running inference on a local machine is. Or could a more focused model (one language, specific subjects/sites) run on today's hardware?
@frizzel4
@frizzel4 1 year ago
Congrats on the sponsorship!! Been watching since you had 1k subs
@hendrikw4104
@hendrikw4104 1 year ago
There are interesting approaches like LLaMA, which focus on inference efficiency over training efficiency. These could also help to bring down inference costs to a reasonable level.
@alb.1911
@alb.1911 1 year ago
Do you have any idea why they went back to Intel CPUs for the NVIDIA DGX H100 hardware?
@markvietti
@markvietti 1 year ago
Could you do a video on memory cooling? It seems most of the video card manufacturers don't cool the memory; some do. Why is that?
@petevenuti7355
@petevenuti7355 1 year ago
Out of curiosity, and considering my current hardware: how many orders of magnitude slower would a neural network of this magnitude run on a simple CPU and virtual memory?
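No one answered this in the thread, but a rough compute-only estimate is possible, assuming ~312 dense FP16 Tensor TFLOPS per A100 and, generously, ~1 TFLOP for a desktop CPU; swapping through virtual memory would add further orders of magnitude on top:

```python
import math

# Very rough compute-only comparison; virtual-memory swapping would make
# the real gap far larger. Both throughput numbers are ballpark assumptions.
a100_tflops = 312          # dense FP16 Tensor throughput per A100
cpu_tflops = 1             # optimistic figure for a desktop CPU
gpus_in_cluster = 10_000   # GPT-3-class training cluster

ratio = (a100_tflops * gpus_in_cluster) / cpu_tflops
print(f"roughly 10^{math.log10(ratio):.1f} slower on a single CPU")
```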
@Embassy_of_Jupiter
@Embassy_of_Jupiter 1 year ago
Kind of mind blowing that we can already run something very similar on a MacBook. The progress in AI is insane and it hasn't even started to self-improve, it's just humans that are that fast.
@tee6942
@tee6942 1 year ago
What valuable information 👌🏻 Thank you for sharing, and keep up the good work.
@Noobificado
@Noobificado 1 year ago
Some time around 1994, the era of search engines started. And now the era of free access to general-purpose artificial intelligence is becoming a reality in front of our eyes. What a time to be alive.
@josephalley
@josephalley 1 year ago
Great to see this video as well. I loved your M2 chip video breakdowns ages ago.
@Speak_Out_and_Remove_All_Doubt
@Speak_Out_and_Remove_All_Doubt 1 year ago
A super interesting video, really well explained too; thanks for all your hard work. It's always impressive how Nvidia seems to always be playing the long game with its hardware development, and as you mention, I can't wait to see what Jim Keller comes up with at Tenstorrent, because I can't think of a job he's had where he hasn't changed the face of computing with what he helped develop. I just wish Intel had backed him more and done whatever was needed to keep him a little longer; we would maybe be in a very different Intel landscape right now.
@AjitMD
@AjitMD 1 year ago
Jim Keller does not stay at a company for very long. Once he creates a new product, he moves on. Hopefully he gets paid well for all his contributions.
@Speak_Out_and_Remove_All_Doubt
@Speak_Out_and_Remove_All_Doubt 1 year ago
@@AjitMD I think, more accurately, he stays until he's achieved what he set out to achieve and then wants a fresh challenge. He didn't get to do that at Intel: he was essentially forced out, or put into a position he was not comfortable with, so he chose to leave. But he still had huge amounts of unfinished work left at Intel, plus becoming CEO, or at least head of the CPU division, would have been that fresh new challenge for him.
@zerodefcts
@zerodefcts 1 year ago
I remember when I was growing up, I thought to myself... geez... it would have been great to live in the past, as there were so many undiscovered things that I could have figured out. Now grown up, having worked in AI for the past 7 years and looking at this very point in time, I can't help but reflect on that moment... geez... there is just so much opportunity for new discovery.
@HighYield
@HighYield 1 year ago
I'm honestly excited to see what's coming next. If we use it to improve our lives, it will be amazing.
@StrumVogel
@StrumVogel 1 year ago
We have 8 of those at the Apple data center I worked at. Nvidia cheaped out on the CMOS bracket, and it'll always crack. You have to send the whole board in for warranty work to fix it.
@dorinxtg
@dorinxtg 1 year ago
Thanks for the video. I was looking at the images you created with the GPU specs, and I'm not sure if your numbers are correct. Just for comparison, I checked the numbers in the TechPowerUp GPU DB. If we look at the GH100, for example, you mention 1000 TFLOPS (FP16) and 50 TFLOPS (FP32). On the TechPowerUp GPU DB, for an H100 (I checked both the SXM5 and PCIe versions) the numbers are totally different: 267 TFLOPS (FP16) and ~67 TFLOPS (FP32).
@HighYield
@HighYield 1 year ago
TechPowerUp doesn’t show the Tensor core FLOPS. If you look up the H100 specs at Nvidia you can see the full performance.
@dorinxtg
@dorinxtg 1 year ago
@@HighYield I see. Ok, thanks ;)
@n8works
@n8works 1 year ago
15:30 You say that the inference hardware must scale exponentially, but that must be hyperbole, right? At the very most it's 1-to-1, and I'm sure there are creative ways to multiplex. The interesting thing to see would be transactions/sec for a single cluster instance.
@HighYield
@HighYield 1 year ago
I meant in relation to its users: if users increase exponentially, so do the hardware requirements. Since you are about the fifth comment on this, I realize I should have said it differently. A point that could play into that question is scaling, but that's not what I was talking about.
@n8works
@n8works 1 year ago
@@HighYield Ahh, I understand what you were saying, yes. It scales with users in some way; the more users, the more hardware, to some degree.
@senju2024
@senju2024 1 year ago
Very, very good video. SUBSCRIBE. Reason: you did not talk about hype; you explain tech concepts based on AI. I knew about the Nvidia hardware running ChatGPT but not the details. Thank you.
@boronat1
@boronat1 1 year ago
Wondering if we could run software that uses your GPU to contribute power to an AI network, like we do with crypto mining?
@OEFarredondo
@OEFarredondo 1 year ago
Mad love bro. Thanks for the vid
@stefanbuscaylet
@stefanbuscaylet 1 year ago
Does anyone have any references on how big the storage required for this was? Was it a zillion SSDs or was it all stored on HDDs?
@dougchampion8084
@dougchampion8084 1 year ago
The training data itself is pretty small in relation to the compute required to process it. Text is tiny.
@stefanbuscaylet
@stefanbuscaylet 1 year ago
@@dougchampion8084 I feel like that is oversimplifying things. When there are over 10K cores distributed over a large network and the training data is "all of Wikipedia and tons of other data", there has to be quite a bit of disaggregated storage for that, and every node seems to have some local fast NAND SSD storage. As far as I can tell they mostly use the CPUs to orchestrate and feed the data to the GPUs, and the GPUs then feed the data back to the CPUs to be pushed to storage. It would be nice if someone just mapped this all out, along with capacity and bandwidth needs.
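For the data side, the GPT-3 paper linked in the description gives concrete numbers: roughly 45 TB of compressed plaintext from CommonCrawl before filtering and about 570 GB after. A quick sketch of why the corpus itself needs surprisingly little storage:

```python
# Storage for the training text is tiny next to the GPU fleet processing it.
# Figures from the GPT-3 paper (Brown et al. 2020, linked in the description).
raw_commoncrawl_tb = 45   # compressed plaintext before filtering
filtered_gb = 570         # after quality filtering
ssd_tb = 8                # one ordinary datacenter NVMe drive, for scale

print(f"filtered corpus fits on a single SSD: {filtered_gb / 1000 < ssd_tb}")
print(f"raw crawl would need ~{raw_commoncrawl_tb / ssd_tb:.0f} such drives")
```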
@THE-X-Force
@THE-X-Force 1 year ago
Excellent, excellent video! (Edited to ask: at 19:25, "In-Network Compute" of... *_INFINITY_*...? Can anyone explain that to me, please?)
@MemesnShet
@MemesnShet 1 year ago
And to think now anyone can run a GPT-3.5-Turbo-like AI on their local computer without the need for crazy specs is just incredible. Stanford Alpaca and GPT4All are some models that achieve it.
@zahir2942
@zahir2942 1 year ago
Was cool working on these servers
@TheFrenchPlayer
@TheFrenchPlayer 1 year ago
🤨
@prodromosregalides3402
@prodromosregalides3402 1 year ago
15:19 That's not a problem at all. Even if only 10 million out of 100 million users are on OpenAI's servers at any single time, that's 29,000/10,000,000 GPUs, or 0.0029 GPUs per user. Probably less. So instead of them running the model, the end users could, easily, on their own machines. Bloody hell, even small communities of a few thousand people could train their own AIs on their machines, soon to be a much smaller number. There are a few major problems with that, though. They lose much control over their product. They haven't figured out, yet, the details of monetizing these models, so they are restricted to running them on their own servers instead. A third major problem, for Nvidia: it would be forced to return to gaming GPUs their rightful capabilities, which they were stripped of back in 2008. This would mean no lucrative sales of the same hardware (with some tweaks) to corporations, relying instead on massive sales of cheaper units. And last but not least, an end user, gamer or not, would be able to acquire much more compute power with their 1000-3000 dollar purchase. A PC might sport the same CPUs and GPUs, but the difference is the GPUs would be unlocked to their full computing potential. We are talking about many tens to hundreds of teraflops of performance available for the end user to do useful work. And how would the mobile sector compare to this? Because it runs on lower power budgets, there is no way it could compete with fully-fledged PCs. Many would stop buying new smartphones, especially flagship ones; in fact the very thought of spending money on something an order of magnitude less compute-capable would be hugely annoying. Now that I think of it, losing control worries them much more than anything else. And it would not only be control lost on a corporate level, but on a much, much higher level. Right now, top heads at computer companies and possibly state planners must have shit their pants, because of what is seemingly an unavoidable surrender of power from power centers to the citizens. To paraphrase Putin: "Whoever becomes the leader in this sphere will not only forget about ruling this world, but lose the power he/she already has." All top leaders got this whole thing wrong. This is not necessarily a bad thing.
@hxt21
@hxt21 1 year ago
I want to say thank you very much for a really good video with good information.
@jagadeesh_damarasingu
@jagadeesh_damarasingu 1 year ago
Thinking about the huge capital costs involved in setting up an AI hardware farm: is it possible to take advantage of the shared computing power of a public peer network, like we do now with blockchain nodes?
@jagadeesh_damarasingu
@jagadeesh_damarasingu 1 year ago
Also, isn't Intel in the race with NVIDIA and AMD?
@genstian
@genstian 6 months ago
We do run into lots of problems where general AI models aren't good. The future is to make new sub-models that can solve specific tasks, or just add weights to general models, but such a version of ChatGPT would probably require 1000x better hardware.
@memejeff
@memejeff 1 year ago
I asked GPT-3 half a year ago what it was running on. I kept asking more leading questions and got to a point where it said that it used specific mid-range FPGA accelerators that retail between $4,000-6,000. The high-end FPGAs are connected by PCIe and the lower-end ones use high-speed UART. The servers used a lot of K-series GPUs too.
@jordanrodrigues1279
@jordanrodrigues1279 1 year ago
The specs aren't in the training dataset, so there's no way for it to have that information; it's like asking it to give you my passwords. In other words, you just told yourself what you wanted to hear, with extra steps.
@FakhriESurya
@FakhriESurya 6 months ago
It makes me giggle a bit that the "Most Replayed" part is right where the ad ends.
@HighYield
@HighYield 6 months ago
That's normal, most people skip over the ad or try to jump to a point right after it. On the chart it looks like a hole :D
@BGTech1
@BGTech1 1 year ago
Great video, I was wondering about this.
@karlkastor
@karlkastor 1 year ago
15:50 Now with GPT-3.5 Turbo they have decreased the cost 10 times, but likely not with new hardware; rather with an improved, discretized and/or pruned model.
@HighYield
@HighYield 1 year ago
That sounds super interesting! Do you have any further links for me to read up on this?
@miroslawkaras7710
@miroslawkaras7710 1 year ago
Could quantum computers be used for AI training?
@elonwong
@elonwong 1 year ago
From what I understood from ChatGPT, it's a stripped-down version of GPT-3, where the hardware requirements and model size are massively cut down. Running the model is a lot lighter compared to GPT-3. It even said ChatGPT could be run on a high-end PC.
@HighYield
@HighYield 1 year ago
That's what I gathered too. IMHO it's also using a lot fewer parameters, but since there is nothing official I'm rather careful with my claims.
@legion1791
@legion1791 1 year ago
Cool, that's exactly what I wanted to know!
@shrapnel95
@shrapnel95 1 year ago
I find this video funny in that I've asked ChatGPT what kind of hardware it runs on; I never got a straightforward answer and it kept running me around in loops lol
@HighYield
@HighYield 1 year ago
I've noticed exactly the same thing, that's where I got the idea for this video from!
@glenyoung1809
@glenyoung1809 1 year ago
I wonder how fast ChatGPT would have trained on a Cerebras CS-2 system with their Wafer scale 2 architecture?
@user-zk4xq3mn2q
@user-zk4xq3mn2q 5 months ago
Really good video, and lots of effort! Thanks man!
@legion1791
@legion1791 1 year ago
I would love to have a local and unlocked offline chatGPT
@endike
@endike 1 year ago
me too :)
@TMinusRecords
@TMinusRecords 1 year ago
15:39 Exponentially? How? Why not linearly?
@robinpage2730
@robinpage2730 1 year ago
How powerful would a model be that could be trained on a gaming laptop's GTX 1650 Ti? How about a natural-language compiler that translates English input into executable machine code, like GCC?
@mrpicky1868
@mrpicky1868 1 year ago
What are the most advanced models, like Megatron, actually doing? Anyone know?
@davocc2405
@davocc2405 1 year ago
I can see a rise in private clouds, particularly within government, at least on a national level. The utilisation of the system itself may give away sensitive information to other nations, or even corporations, that may have competing self-interests, so a few of these systems may pop up in the UK, Australia, several in the US and probably Canada to start with (presuming each European nation may have one or two as well). Whoever develops a homogenised and consistent build for such a system will suddenly be in demand, with competing urgency.
@zyxwvutsrqponmlkh
@zyxwvutsrqponmlkh 1 year ago
For general AI we really need to be perpetually training during inference. If you want this stuff to run cheaply, open source it; folks have gotten LLaMA to run on an RPi.
@theminer49erz
@theminer49erz 1 year ago
Yay!! Happy day! Been looking forward to this! Thanks!
@drmonkeys852
@drmonkeys852 1 year ago
My friend is actually already training the smallest version of GPT on 2 A100s for his project in our ML course.
@HighYield
@HighYield 1 year ago
That's really interesting, I wonder how much time the training takes. And having access to 2x A100 GPUs is also nice!
@drmonkeys852
@drmonkeys852 1 year ago
@@HighYield Yeah, it's from our uni. We still have to pay for time on it unfortunately, but it's pretty cheap still. He estimates it'll cost around $30, which is not bad.
@gopro3365
@gopro3365 1 year ago
@@drmonkeys852 $30 for how many hours?
@simonlyons5681
@simonlyons5681 1 year ago
I am interested to know the hardware requirements to run inference for a single user. What do you think?
@HighYield
@HighYield 1 year ago
I think it's VRAM-bound. So you might still need a full Nvidia DGX/HGX A100 server, not because of the raw computing power, but because of the VRAM capacity. Maybe 4x A100 GPUs would work too, depending on how much smaller ChatGPT is compared to GPT-3. It's really hard to say since we don't have official numbers.
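The VRAM reasoning can be made concrete with the usual rule of thumb of two bytes per parameter for FP16 weights, ignoring activation and serving overhead. The 175B figure is GPT-3's published size; any smaller ChatGPT size is pure speculation:

```python
import math

def min_gpus(params_billion: float, gpu_vram_gb: int = 80) -> int:
    """Weights-only FP16 footprint: 2 bytes per parameter, no overhead."""
    weights_gb = params_billion * 2
    return math.ceil(weights_gb / gpu_vram_gb)

print(min_gpus(175))  # GPT-3 sized: 350 GB of weights -> at least 5x 80GB A100
print(min_gpus(20))   # a hypothetical much smaller model fits on one GPU
```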
@nannesoar
@nannesoar 1 year ago
This is the type of video I'm thankful to be watching.
@HighYield
@HighYield 1 year ago
I’m thankful you are watching :)
@omer8519
@omer8519 1 year ago
Anyone else get chills when they heard the name Megatron?
@olafschermann1592
@olafschermann1592 1 year ago
Great research and presentation
@gab882
@gab882 1 year ago
It would be both amazing and scary if AI neural networks ran on quantum computers or other advanced computers in the future.
@Alexander_l322
@Alexander_l322 1 year ago
I literally saw this on South Park yesterday and now it’s recommended to me on UKposts
@virajsheth8417
@virajsheth8417 1 year ago
Really insightful video. Really appreciate it.
@DSTechMedia
@DSTechMedia 1 year ago
AMD made a smart move in acquiring Xilinx, and it mostly went unnoticed at the time. But it could pay off heavily in the long run.
@ZweiBein
@ZweiBein 1 year ago
Good and informative video, thanks a lot!
@garydunken7934
@garydunken7934 1 year ago
Nice one. Well presented.
@timokreuzer1820
@timokreuzer1820 1 year ago
Question: if INT8 tensor cores are useful for AI, why doesn't everyone use them? Shouldn't an INT8 mul-add operation require many fewer transistors than a 16-bit FP mul-add? I know FP multiplications are "cheap" compared to same-size integer ones, but the addition adds quite some overhead.
@0xEmmy
@0xEmmy 1 year ago
int8 is fast with the right hardware, but this comes at the cost of precision. A 16-bit number can have up to 65,536 different values, while 8 bits only gets 256 different values. Some applications need more precision than others, and while many AI applications can handle int8, we can't assume that everything can. If your AI outputs a 32-bit image, for instance, at least some of the AI needs to run in 32 bits (which, if it's even possible, is highly wasteful on 8-bit hardware). Further, there's a very big difference between float and int. A float handles really large and small numbers just fine, while ints get stuck rounding to 0 or overflowing. If you're multiplying two really large or really small numbers, this becomes a major problem fast. If you need more than 3 decimal digits, you can't use int8. Even moving up to 16 bits, you only get 5 decimal digits, and 32 bits only gets you 9; to go further, you have to use floats (e.g. IEEE 754 standard 32-bit floats cover a dynamic range of roughly 10^±38). Adds aren't quite as complex as multiplies, so I wouldn't worry about them too much. One also needs to actually have the right hardware. If your GPU is designed for 16 bits but you only use 8, you're not gonna double performance; you're gonna end up ignoring half your GPU. 8-bit hardware is readily available in some contexts (especially internet services), but if you're writing code that runs entirely offline on existing devices, you're not gonna ignore half the machine unless you have a genuine reason. And if you need the same hardware to do multiple things, this gets even more complicated. If you're a price-conscious end user, you probably don't want to pay for extra hardware unless you personally use it. If you're an investor, you don't want to buy specialized equipment unless you're absolutely sure it's the right tool for the job.
@sa1t938
@sa1t938 1 year ago
int8 and int4 aren't good enough for training (at least right now). Training involves nudging all the values slightly in the right direction over and over and over again, and you can't do nudges as precise as you need with int4 and int8. However, that's just for training. During inference you aren't changing any of the values, so you don't need the same precision; hence you can get away with int8. Until recently, int4 would just fall apart most of the time, but there's a new quantization technique called GPTQ which gives int4 almost the same performance as fp16. As to why less precision isn't used for all inference, some things need the extra precision. For example, Stable Diffusion kind of falls apart at int8 from what I've heard. I'm actually training a Stable Diffusion model myself, and I had to use fp32 while training because fp16 just ended up not being enough. Language models usually do fine with less precision. As to why not all language models are being run at int4 or int8: probably because "if it ain't broke, don't fix it".
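The precision trade-off discussed in this thread is easy to demonstrate numerically. A minimal simulation of symmetric int8 quantization of a weight vector; a toy round-trip, not GPTQ or any particular library's scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=4096).astype(np.float32)  # typical small weights

# Symmetric int8 quantization: scale into [-127, 127], round, dequantize.
scale = np.abs(w).max() / 127.0
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dq = w_q.astype(np.float32) * scale

err = np.abs(w - w_dq)
print(f"max abs error: {err.max():.2e}, mean abs error: {err.mean():.2e}")
# Tolerable for inference, but far coarser than the tiny weight nudges
# training applies -- which is why int8 suits inference, not training.
```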
@willemvdk4886
@willemvdk4886 1 year ago
Of course the hardware and infrastructure behind this application is interesting, but what I find even more interesting is how this is done in software. How are all these GPUs clustered? How is the workload actually divided and balanced? How do they maximize performance during training? And how in the world is the same model used by thousands of GPUs to serve inference for many, many users simultaneously? That's mind-boggling to me, actually.
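On the clustering question: the standard building block for training is data parallelism, with tensor and pipeline parallelism layered on top for models this large. A minimal conceptual sketch of one data-parallel step using PyTorch's collective API; this illustrates the idea only and says nothing about Microsoft's actual stack:

```python
import os
import torch
import torch.distributed as dist

def train_step(model, batch, loss_fn, opt):
    """One data-parallel step: compute local gradients, then average them."""
    opt.zero_grad()
    loss = loss_fn(model(batch["x"]), batch["y"])
    loss.backward()
    world = dist.get_world_size()
    for p in model.parameters():
        dist.all_reduce(p.grad)  # sum gradients from every rank...
        p.grad /= world          # ...then average them
    opt.step()

if __name__ == "__main__":
    # Single-process demo setup; real clusters launch one rank per GPU.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)
    model = torch.nn.Linear(8, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    batch = {"x": torch.randn(4, 8), "y": torch.randn(4, 1)}
    train_step(model, batch, torch.nn.functional.mse_loss, opt)
    print("one data-parallel step done")
    dist.destroy_process_group()
```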
@lucasew
@lucasew 1 year ago
Fun fact: the DGX A100 has a configuration with a quad-socket EPYC 7742, 8 A100 GPUs and 2TB of RAM. Source: I know someone who works with one. He said it works nicely for Blender renders too, but the focus is tensor number-crunching using PyTorch.
@HighYield
@HighYield 1 year ago
All I could find is this dual socket config: images.nvidia.com/aem-dam/Solutions/Data-Center/nvidia-dgx-a100-80gb-datasheet.pdf But a quad one would be much nicer :D
@LukaszStafiniak
@LukaszStafiniak 1 year ago
Hardware requirements increase linearly for inference, not exponentially.
@MultiNeurons
@MultiNeurons 1 year ago
Yes, it's very interesting, thank you.
@TheGabe92
@TheGabe92 1 year ago
Interesting conclusion, great video!
@HighYield
@HighYield 1 year ago
I'm usually quite resistant to hype, but AI really has the potential to fundamentally change how we work. It's gonna be an interesting ride for sure!
@MrEnyecz
@MrEnyecz 1 year ago
Why does the HW requirement for inference grow exponentially with the number of users? That should be linear, shouldn't it?
@HighYield
@HighYield 1 year ago
Of course you are right; it was just a term I used in conjunction with the exponential user growth of ChatGPT.
@lolilollolilol7773
@lolilollolilol7773 1 year ago
AI progress is far more software-bound than hardware-bound. The deep learning algorithms are incapable of logical reasoning, and thus of knowing whether a proposition is true or not. That's the real breakthrough that needs to happen. Once deep learning gains this capability, we will really be confronted with superintelligence, with all the massive consequences that we are not really ready to face.
@MM-24
@MM-24 1 year ago
Any pricing analysis to go with this?
@HighYield
@HighYield 1 year ago
That's currently outside of my capabilities, as such large projects are priced very differently from off-the-shelf hardware.
@ethanroland6770
@ethanroland6770 1 year ago
Curious about a GPT-4 follow-up!
@sa1t938
@sa1t938 1 year ago
Something important to note is that OpenAI works with Microsoft and likely doesn't pay all that much for the GPUs. Microsoft OWNS the hardware, so their cost per day is just electricity, employees, and rent; the servers were already paid in full when the datacenter was built. It's up to Microsoft how much they want to charge OpenAI (who they are working with, and who just supplied Microsoft with Bing Chat, which made Bing really popular), so I'm guessing Microsoft gives them a huge discount, or even charges nothing.
@knurlgnar24
@knurlgnar24 1 year ago
If you own a shirt, is the shirt free because you already own it? If you own a car and let your friend drive it, is it free because you already owned a car? This hardware is extremely expensive; it depreciates, requires maintenance, floor space, etc. Economics 101. Ain't nothin' free.
@sa1t938
@sa1t938 1 year ago
@@knurlgnar24 Did you read my comment? I literally mentioned all of those costs, and I said they are the only thing Microsoft is ACTUALLY paying. Microsoft chooses how much they want to charge, so they could charge a business partner like OpenAI almost nothing, or just foot the bill for maintenance costs instead. If they did either of those, they would already have made their money back with how popular Bing is because of Bing Chat.
@sa1t938
@sa1t938 1 year ago
@@knurlgnar24 And to your analogy: is a shirt free if you already own it? The answer is yes. You can use the shirt at no cost, minus the maintenance of washing it. You can also give that shirt to a friend for free. The shirt wasn't free, but you paid for it up front and now it's free every time you use it.
@outcast6187
@outcast6187 1 year ago
Great video 👍. Have you considered using some kind of AI narrator tech trained on your own voice but with clearer English enhancements? Even the UKposts subtitles were getting your English wrong, calling the Volta GPUs "Water GPUs"... 😛
@HighYield
@HighYield 1 year ago
I think I might add actual subs for longer videos. I have done it in the past, but it's kinda tedious. Some people have no problem understanding my German accent, others can't understand me at all :p
@outcast6187
@outcast6187 1 year ago
@@HighYield UKposts has started rolling out "multi-audio tracks" so the user can select different versions of English (including an AI-generated narrated version) as well as completely different languages. Maybe it is available for your channel?
@albayrakserkan
@albayrakserkan 1 year ago
Great video, looking forward to AMD MI300.
@vincentyang8393
@vincentyang8393 1 year ago
Great talk! Thanks.
@johannes523
@johannes523 1 year ago
Very interesting! I was wondering about Tom Scott's statement on the curve, and I think your take on it is very accurate 👍🏻
@HighYield
@HighYield 1 year ago
I really feel like the moment you don't look at what AI does, but how it's "created", you get a much clearer picture. Ofc I might be completely wrong in my assumption :p
@oxide9717
@oxide9717 1 year ago
Let's Gooo, 🔥🔥🔥
@GraveUypo
@GraveUypo 1 year ago
The best part is they've already made very similar AIs that are open source and that you can run on your own computer, which is way preferable to handing over data to OpenAI and paying by the character, which is kinda absurd. And they used ChatGPT to train them, lol.
@LucaDarioButzberger
@LucaDarioButzberger 1 year ago
The Megatron AI has a memory bandwidth of nearly 8 PB/s. Kind of insane.
@paulmichaelfreedman8334
@paulmichaelfreedman8334 1 year ago
I've been in IT for over 30 years, but 8 PB/s is like 80,000 parallel 100-gig network connections, or 8,000,000 1-Gbit connections. Yup, mind just blew. Finally.
@utubekullanicisi
@utubekullanicisi 1 year ago
@@paulmichaelfreedman8334 It's petabytes/second, not petabits. You need to multiply 1,000,000 Gbit/s by 8 to reach a petabyte/second, and by 8 again to reach 8 petabytes. So, 64,000,000 1-Gbit/s connections in parallel.
@paulmichaelfreedman8334
@paulmichaelfreedman8334 1 year ago
@@utubekullanicisi Another order of magnitude higher.
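The unit confusion in this exchange disappears once the bytes-to-bits conversion is written out explicitly:

```python
# 8 PB/s expressed in 1 Gbit/s links: mind the bytes-to-bits factor of 8.
pb_per_s = 8
bits_per_s = pb_per_s * 1e15 * 8        # petabytes/s -> bits/s
gbit_links = bits_per_s / 1e9           # number of 1 Gbit/s links
print(f"{gbit_links:,.0f} x 1 Gbit/s links")  # 64,000,000
```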
@SRWeaverPoetry
@SRWeaverPoetry 1 year ago
I run this stuff locally, but I don't run mine based on training data. The analogy I use: an AI can know how to ride a bike, but if it has no bike, it isn't riding anywhere. A bot needs both the brains and the tools the body uses.
@maniacos2801
@maniacos2801 1 year ago
What we need are locally run AI models. Optimisations will have to be made for this, but it is a huge flaw that this type of high-speed interactive knowledge is in the hands of a few multi-billion-dollar global players. And we all know by now, "Open"AI is anything but open. This is what scares me most about this whole development. In the early days of the internet, everyone could run a server at home, or people could get together and run dedicated hardware in some co-location. With AI this is impossible, because no one can afford the hardware requirements. If chat AI is the new internet, we need public access to the technology; otherwise only a few will be in control of such huge power to decide what information should be available and what should be filtered or even altered.
@yogsothoth00
@yogsothoth00 1 year ago
Power efficiency comes into play as well. Will AMD be able to compete with slower hardware? Only if they can beat Nvidia on the overall value proposition.
@JazevoAudiosurf
@JazevoAudiosurf 1 year ago
I think it's a very simple equation: more layers, and thus more params, lead to better abstraction and deeper understanding. The brain would not be so huge if it wasn't necessary. We need to scale transformers up until they reach a couple trillion params, and for that we need stuff like the H100 and whatever they announce at GTC next month. Transformers are probably enough to solve language; combined with CoT, as papers have shown, they will surpass humans.
@1marcelfilms
@1marcelfilms 1 year ago
The box asks for more ram.
The box asks for another gpu.
The box asks for internet access.
@HighYield
@HighYield 1 year ago
What's in the box? WHAT'S IN THE BOX???
@paulchatel2215
@paulchatel2215 1 year ago
You don't need that much power to train ChatGPT. You can't compare the full training of an LLM (GPT-3) with an instruct fine-tune (ChatGPT). Remember that Stanford trained Vicuna, which has performance similar to ChatGPT 3, for only $300, by instruct-fine-tuning the LLM LLaMA. And other open-source chatbots have been trained on single-GPU setups. So it's unlikely that OpenAI needed a full datacenter to train ChatGPT; the data collection was the hardest part here. Maybe they did use one, but then the training would have lasted less than a second; it seems useless to use 4000+ GPUs.
@seraphin01
@seraphin01 1 year ago
Great video, thank you. I've been trying to ask ChatGPT about its hardware, but obviously you don't get the answer haha. Those who think we're already reaching the peak of AI right now are grossly mistaken. The teraflops we're talking about for the new architecture will sound ridiculously weak in a few years' time, just like a top-end GPU from 2010 wouldn't even be good enough for a cheap smartphone nowadays. And with hardware focus turning to AI, it's just gonna improve exponentially for a while. Although, like the guys at OpenAI stated, don't expect ChatGPT-4 to be Skynet-level AI; the results might look like minor improvements at first glance, but the cost, reliability, speed and accuracy of those models will improve a lot before going to the next phase, which is actual artificial INTELLIGENCE. By 2030 the world won't be the same as it is now, that's a given IMO, and most people are not ready for it.