COMMENTS
@skrillexdj79 2 hours ago
Is this the same for the Tesla K80, for the cable to power it in the server?
@gurshansingh6869 1 day ago
Are NVIDIA GPUs better than AMD for data science? I just rejected a 7900 GRE for a 4070, so I hope they are.
@LetsChess1 1 day ago
So I've got the same 4028GR-TRT. The RTX 3090s you got in there won't fit in the server; however, the 3090 Founders Edition can fit in the case. It takes a little bit of forcing, but they will fit since the power comes out the side with that weird adapter. That way you don't have to run a whole bunch of crazy risers all over the place.
@tech6546 2 days ago
Awesome, thanks! But isn't the lack of raw FLOPS a big deal for training? I think a lot of modern consumer cards have much better FP16 per dollar... Also, how would that scale? Since Kaggle gives 2x T4 for free, which is 32GB of VRAM, it doesn't seem like this rig is a big step up unless you are training for more than 25 hrs at a time... Can you suggest the best price/performance setups for a $5-10k budget? Thanks
@tech6546 2 days ago
Oh nvm, I saw your new video, it's exactly what I've been looking for!
@parthyadav8590 3 days ago
Can we set this up to work for either password or SSH key login with pam_duo? If a user wants to log in using an SSH key, or if a user wants to log in using a password, both should work and prompt for a Duo push.
@ydgameplaysyt 4 days ago
tokens/s?
@GPTshop_ai 5 days ago
If you want the best there is, check out our Nvidia GH200 (and GB200) systems.
@fernandotavarezfondeur3724 5 days ago
Hi TheDataDaddi, I really enjoyed your video! I’m planning to build a GPU server with a budget of $1,500-$2,000, mainly to rent out for AI processing and generate some income. Do you think this budget is realistic for my goals? Any recommendations for someone new to hardware, like me, would be incredibly helpful. Also, I'm from the Dominican Republic and curious about any advice you might have specific to my location. Is there another way to contact you for more detailed guidance? Thanks!
@TheDataDaddi 5 days ago
Hi there. Thanks so much for the comment! Really glad you are enjoying the content. This is actually something I have been looking into myself! I think it is possible on that budget although you may have to be a bit patient and wait for some good deals. Feel free to reach out to me. My contact info should be in my UKposts bio. I would love the chance to get to know your situation a bit better and offer you more detailed guidance!
@fedorshin2097 5 days ago
Did you use Clore ai servers for rent?
@TheDataDaddi 5 days ago
I have not. I actually just heard about this project from your comment. I love the idea. This is something I knew had to exist. In my opinion, there is a natural opportunity for P2P hardware renting via blockchain technology especially as hardware prices and the cost of the cloud keeps increasing. Nice to see this already exists. Thanks so much for the comment! I will be doing a deep dive into this soon.
@JvP000 6 days ago
Hi, How did you get the R720 server with all those specs for just $200? I can't even see any options available on the linked site with 256GB RAM, let alone anything near $200. What system would be an alternative option if it wasn't on sale / special time discount? Thanks
@TheDataDaddi 5 days ago
Hi there. Thanks for the comment! Unfortunately, the price of hardware continues to go up pretty much across the board, so my best advice here would be to be patient and try to find deals from certified refurbishers like the following:
savemyserver.com/deal-of-the-week/
pcserverandparts.com/refurbished-servers/
www.servermonkey.com/default
www.theserverstore.com/
eBay is also a great resource for quality refurbed and used hardware.
Any 2U+ server that has at least 2 x16-lane PCIE slots should work. You will need to double check that the server will support the GPU form factor you are interested in. Some other ones you could consider:
Dell PowerEdge R730/R740/R750
HP ProLiant DL380p Gen8/Gen9/Gen10
Supermicro SuperServer 6029GP-TRT
ASUS ESC4000 G4
Gigabyte MG50-G20 2U
Here is a link to a build I just speced out for another viewer. Hopefully this will help you get an idea of what you can expect more recently.
docs.google.com/spreadsheets/d/11HJaPbho8wVKwholTBx-rka8hO5l-1Uh/edit?usp=sharing&ouid=105510157555294255385&rtpof=true&sd=true
@smoofwah3552 6 days ago
Hm, I'm still watching, but I'm looking to build a deep learning rig to host Llama 3 400 dense. How many 3090s can I put together, and how do I learn to do that?
@TheDataDaddi 5 days ago
Hi there! Thanks so much for the comment. I am not really sure it is possible for you to host Llama 3 400 dense on 1 machine (without quantization). By my calculation, you would need close to 1TB of VRAM to hold the model. This would need to be split across many GPUs. Even if you used many H100 or A100 GPUs, you would likely not be able to host them in the same machine. It would take something like 12 or more of either. I do not know of any single servers that could support this.

In this particular case, the Supermicro 4028GR-TRT shown in this video could theoretically handle up to 8 RTX 3090s rigged externally. That is only going to give you about 192 GB of VRAM. You might be able to get away with hosting the full 70B model without quantization with that amount of VRAM. However, for the 400 dense you are still a long way off.

To be able to host a model of that size, a much more practical and realistic way to do it would be to set up a distributed computing cluster with many GPUs on several different servers. In a distributed computing setup, each server would handle a portion of the model, allowing you to distribute the computational load and memory requirements across several nodes. This approach not only makes it feasible to host large models like Llama 3 400 dense, but it also enhances overall performance through parallel processing. To implement this, you might try utilizing frameworks that support distributed deep learning, such as TensorFlow with tf.distribute.Strategy or PyTorch with torch.distributed. These frameworks are designed to help manage the distribution of data and computations, ensuring that the workload is evenly spread and that the nodes synchronize effectively. A rough sketch of what the PyTorch side of that looks like is below.
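For anyone curious, here is a minimal, hedged sketch of that multi-GPU/multi-node plumbing using torch.distributed with FSDP, which shards parameters across all participating GPUs. The stand-in model, launch flags, and sizes are assumptions for illustration, not a tested Llama 3 400 recipe:

import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    # torchrun supplies RANK, LOCAL_RANK, and WORLD_SIZE for every process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Tiny stand-in model; a real LLM would be built/loaded here instead
    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 4096),
    ).cuda(local_rank)

    # FSDP shards parameters and optimizer state across all ranks, so per-GPU
    # memory drops roughly in proportion to the number of GPUs participating
    model = FSDP(model)

    x = torch.randn(8, 4096, device=local_rank)
    model(x).sum().backward()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

You would launch the same script on every server with something like torchrun --nnodes=<number of servers> --nproc-per-node=<GPUs per server> --rdzv-endpoint=<head node>:29500 train.py. For inference-only serving, engines such as vLLM or DeepSpeed handle this kind of sharding for you.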
@emiribrahimbegovic813 6 days ago
how much electricity do you consume per month to run this setup?
@TheDataDaddi 5 days ago
Hi there! Thanks for the comment. I am averaging about 206.4 kWh/month with this setup. It costs me on average about $27.08 per month to run with electricity costs in my area. I am assuming 30 days per month here, and I would estimate I run both GPUs solid about half of the month, for context.
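If you want to estimate this for your own rates, it is just average draw x hours x price per kWh. The wattage and rate below are assumptions that roughly back out the numbers above, not measured values:

# Back-of-the-envelope power cost estimate (all inputs are assumed; adjust for your setup)
avg_draw_watts = 287          # assumed average draw of the whole server
hours_per_month = 24 * 30
price_per_kwh = 0.131         # assumed local electricity rate in $/kWh

kwh_per_month = avg_draw_watts * hours_per_month / 1000
monthly_cost = kwh_per_month * price_per_kwh
print(f"{kwh_per_month:.1f} kWh/month -> ${monthly_cost:.2f}/month")
# ~206.6 kWh/month -> ~$27.07/month, in line with the figures quoted above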
@bindiberry6280 7 days ago
Can we emulate GPU with USB?!!
@TheDataDaddi 5 days ago
Hi there! Thanks so much for the comment. I am sorry though. I am not sure I understand what is being asked here. Could you give a little more context?
@johnnymurray7331 7 days ago
#hello - great work! I was able to help myself by tweaking this a bit. I added dynamic state and 4 temp control zones for <45, <50, <55, >55. Made it loop every 20 seconds and made the racadm commit stateful to hold off on unnecessary changes. This can now run as a system service. :)

import subprocess
import os
from datetime import datetime
import time

# Initialize the state variable
state = "unknown"

def get_timestamp():
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")

def get_gpu_temperatures():
    result = subprocess.run(['nvidia-smi', '--query-gpu=temperature.gpu', '--format=csv,noheader'],
                            stdout=subprocess.PIPE)
    output = result.stdout.decode('utf-8')
    return [int(temp.strip()) for temp in output.split('\n') if temp.strip()]

def adjust_fan_speed(temperatures):
    global state  # Use the global state variable
    # Define temperature thresholds and corresponding fan speed offsets
    thresholds = [(45, 0), (50, 1), (55, 2)]
    for temp_threshold, fan_speed_offset in thresholds:
        if any(temp < temp_threshold for temp in temperatures):
            if state != str(fan_speed_offset):
                print(get_timestamp() + f": GPU temperatures are below {temp_threshold}°C. Setting fan speed offset to {fan_speed_offset}.")
                os.system(f"/usr/local/bin/racadm -r 192.168.1.70 -u root -p calvin set system.thermalsettings.FanSpeedOffset {fan_speed_offset}")
                state = str(fan_speed_offset)  # Update the state
            break  # Exit the loop after adjusting fan speed

    # Additional condition for temperatures over 55°C
    if temperatures and all(temp >= 55 for temp in temperatures):
        fan_speed_offset = 3
        if state != str(fan_speed_offset):
            print(get_timestamp() + f": GPU temperatures are over 55°C. Setting fan speed offset to {fan_speed_offset}.")
            os.system(f"/usr/local/bin/racadm -r 192.168.1.70 -u root -p calvin set system.thermalsettings.FanSpeedOffset {fan_speed_offset}")
            state = str(fan_speed_offset)  # Update the state

def main():
    prev_temperatures = []  # Store previous temperatures
    while True:
        temperatures = get_gpu_temperatures()
        print(get_timestamp() + f": Current GPU temperatures: {temperatures}°C")
        # Only commit a change via racadm when the temperatures have actually changed
        if temperatures != prev_temperatures:
            adjust_fan_speed(temperatures)
            prev_temperatures = temperatures
        time.sleep(20)  # Poll every 20 seconds

if __name__ == '__main__':
    main()
@TheDataDaddi 5 days ago
Hi there. Thanks so much for the comment! This is great. Even better than the solution I came up with. Thank you so much for sharing this with us! In fact, if you have this in a GitHub repo and wouldn't mind sending me the link, I will add it to the video description so others can easily find it. Really appreciate it again!
@SyamsQbattar 8 days ago
Which one is better, the RTX 3060 12GB or the 3060 Ti?
@TheDataDaddi 5 days ago
Hi there. Thanks so much for the comment! It really depends on your use case. If you plan on trying to expand to larger models like some of the diffusion-related models, or want to use larger batch sizes, then 12 GB of VRAM would be helpful. However, for most conventional deep learning models 8GB should be fine, and you will get faster training and inference speeds. Personally, though, I would probably go with the 3060 with 12GB. I normally default to the GPU with the higher VRAM even if the performance is slightly worse. I would rather be able to load the model and just have training and/or inference be slower than get out-of-memory errors and not be able to load a model I want to work with.
@asganaway 9 days ago
Wait, is the KDE GUI not occupying any RAM on any GPU @41:07? When I was working on a similarly sized workstation it was annoying to see Gnome be a little too hungry..
P.S. Just new on your channel and subscribed, good job. Waiting for the benchmarking, in particular how the P40 will perform with half precision; my experience with that generation is that maybe it's not convenient compared to double, but I may be wrong.
@TheDataDaddi 5 days ago
Hi there! Thanks so much for the comment and the subscription! Really appreciate that. To my knowledge, the GUI rendering is handled by the Aspeed AST2400 BMC (Baseboard Management Controller), which is the motherboard's built-in graphics card. Yeah, I have read that Gnome can be like that. It was one of the reasons I went with KDE Plasma instead. Yep, I am working on finishing up a preliminary benchmark suite. As soon as I get it finished, I will do a video on the P40 vs P100 and open source the project via a GitHub repo. Thanks again!
@asganaway 5 days ago
@@TheDataDaddi Yeah, the embedded graphics controller, I suspected that. Mine doesn't have it, but I have to say with that kind of setup it is very convenient to have; at least you will not experience the frustration of a GPU not fitting the whole model because someone is using Firefox logged in :D
Keep us posted on the benchmark project; if I can, I'll run it myself on the hardware I have.
@I_hate_HANDLES 9 days ago
Great video, but the sound is too low.
@TheDataDaddi 5 days ago
Hi there. Thanks so much for the feedback! I will be sure to fix this in future videos.
@fubleach2 9 days ago
Really good video but had to play at 1.75X 😅
@TheDataDaddi 5 days ago
Hi there. Thanks so much for the feedback! In the future, I will work on keeping things more concise.
@Kamology_ 10 days ago
$100 for a 2060 Super is WILD lol. If anyone finds one for that price send it my way I’ll buy them all
@TheDataDaddi 5 days ago
Hey there. Thanks so much for the comment! Yeah, the prices in the video are probably a bit outdated at this point unfortunately, and I do not have the time to keep them updated. It is insane how much the price of hardware (GPUs in particular) has skyrocketed in such a short period of time. I am actually working on building a website right now that monitors the price of all the GPUs in the referenced spreadsheet in real time to help people make more informed decisions with real current GPU prices. Hopefully, I can have this built in the next few months. Stay tuned for updates here.
@Kamology_ 5 days ago
@@TheDataDaddi oh that would be very cool and useful. Yeah, 3090s in particular have gone way way up. I was thinking about grabbing one for $500-$600 a few months ago, checked the price and was like woah. Nvm. I should try to scoop some P40s before it’s too late for those too lol
@novantha1 10 days ago
You know, I'd love to do a proper machine learning server, but there's a couple of things I was a little bit scared of. I've heard horror stories about people who purchased servers, only to find out that certain components (CPU is a big one) are "vendorlocked", or that the server needs to be booted first from a VGA port and accessed with a specific sign in key to access the BIOS and so on... To say nothing of the fact that certain GPUs will have "HPE" "Lenovo" and so on in the name in the listing, which leads me to believe there might be certain GPUs which are vendorlocked as well. It just seems like there's a lot of major issues you can run into that are absolute showstoppers, and it really does frighten me a bit to be caught out by things of this nature, because I haven't personally set up a server build before. Have you run into any issues like this?
@TheDataDaddi 5 days ago
Hi there. Thanks so much for your comment! Interesting. I have never encountered any of these issues. One caveat here is that I have always bought refurbed servers, so they may have been unlocked prior to being resold. That part I am unfamiliar with. I have also never encountered a vendor-locked CPU, but again I have only ever bought used server CPUs. If you plan on going the used/refurbed route, I would say there is a low probability of this being an issue. Also, to further mitigate risk, I would advise buying from a platform like eBay that offers a 30-day return policy. This should give you enough time to get all the kinks worked out, and if there is something wrong you can always send it back.

As for the GPUs, they may have certain manufacturer names in the title/description for a couple of reasons. 1) The GPU might be manufactured by that particular vendor. For example, GEFORCE and MSI both make a version of RTX GPUs: same chip, similar if not the same performance, often a similar form factor. 2) The GPU could be manufactured to fit in a particular server type natively (like maybe Lenovo or Cisco or similar). This might mean it is harder to fit into a different server, but it might still work. This is one reason why AI hardware is so difficult: there is so much nuance, competition, and a huge lack of clear documentation.

Anyway, all of that to say: I would not worry too much about those things. I have built and worked on a good number of servers at this point, and I have never run into any of those issues (knock on wood). If you would like some help getting a build together, feel free to reach out to me. You can find all my contact info in my UKposts bio.
@SanjaySharma-ov1kf 11 days ago
@TheDataDaddi, are you using an 8-pin GPU splitter to connect the 3090 card? Is it an 8-pin female to dual 8-pin male connector?
@TheDataDaddi 5 days ago
Hi there! Thanks for the comment. For the 3090s, it takes 2 male 8-pin connectors in addition to the PCIE slot to power the GPU. I have two PSUs like you would use for a regular consumer-grade motherboard. I am using the 8-pin connectors from the PSUs to power the 3090s. I have one 500W PSU per 3090. For a more in-depth explanation, check out the video I have on setting the 3090s up. ukposts.info/have/v-deo/sqKUf319aGtnw3U.html
@user-cq1be3bk3h 13 days ago
Wow! How many eons' worth of time have you put into this? There are not very many people who would go so far out of their way for others like this. This is a great work of art, thank you. So, for LLMs like Mixtral I can just use a P40? Yay
@TheDataDaddi 5 days ago
Hi there. Thank you so much for the comment! This video took quite a while for me to put together, I think a couple of weeks if I remember correctly. I was doing all the research for myself, so I figured that I could save others some time and frustration by sharing my results. I am glad to hear that it is appreciated! Yes! After looking at the specs on Hugging Face, it looks like you can run inference with even the largest quantized Mistral 7B available with a max VRAM requirement of 10.20GB. Unfortunately though, if you wanted to use a non-quantized version you may run into memory issues. huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF
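For anyone wondering what actually running one of those GGUF files looks like, here is a minimal sketch with llama-cpp-python. The file name and the layer-offload setting are placeholder assumptions, not the exact setup from the video:

# Sketch: running a quantized Mistral 7B GGUF on a 24GB card with llama-cpp-python.
# The model path/filename and n_gpu_layers value are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-v0.1.Q5_K_M.gguf",  # a quant downloaded from the repo above
    n_gpu_layers=-1,   # offload all layers to the GPU; lower this if VRAM is tight
    n_ctx=4096,        # context window
)

out = llm("Explain what VRAM is in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])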
@willdwyer6782 14 days ago
Thanks for the long winded answer to a simple question. That's 40 minutes of my life that I won't get back.
@TheDataDaddi 14 days ago
Hi there. Thank you for the comment! I am really sorry that you feel like I wasted your time. Honest feedback is always welcome. I will do my best to keep my videos more concise in the future. I do appreciate you watching the video nonetheless.
@user-cq1be3bk3h 13 days ago
Hi, Xi, Ji !
@djmccullough9233 15 days ago
Hey, I couldn't find a Twitter for you so I thought I should ask you here. Have you thought about those mining GPUs? There are modified P102-100 cards with 10 GB of VRAM being sold for $50-60 bucks. By my estimate they're slightly gimped 1080 Tis with no graphics output and 1 GB less VRAM. Would that price point be worth the amount of CUDA performance it could add to a system for Stable Diffusion or an LLM? I ordered 3 of them to test this with. Figure $150 isn't that big a buy-in to give local LLM acceleration to 3 computers (and if I'm wasting money, at least it's not gonna break the bank). The 1080 Ti is still selling for over 300 bucks on eBay, and you could get six of these lil' bastards for that price.
@TheDataDaddi 14 days ago
Hi there! So, I actually just recently made a Twitter. You should be able to find me @TheDataDaddi. Please let me know if you cannot. So, I just looked up what you are talking about. www.ebay.com/itm/126353539011 I think this would actually be a great way to go if it actually works lol. It says in the description "This is a mining only GPU, it does not have display outputs, and can not be used for gaming or any other purpose." However, I do not see why it would not be able to be used for AI/ML/DL applications. In fact, I am going to try this myself. For $50 bucks, you really can't beat it if it works! Also, just one other thing to point out: at 10GB VRAM it will probably only be suitable for inference for LLMs and Diffusion/Stable Diffusion models.
@djmccullough9233 15 days ago
I'd love to know your opinion of the modified Nvidia P102-100s with 10 GB of VRAM being sold for about $50-$60 on eBay, since they have no display outputs. They are basically 1080 Tis with a bit of performance nerfing. They have no display outputs, but seem like they'd be ideal to plop into a system with an existing AMD GPU just to give CUDA acceleration. Or perhaps multiple cards?
@TheDataDaddi 14 days ago
I responded to you on the other comment, but in case you see this one first. Short answer. I think this would be a great way to go provided they work!
@SanjaySharma-ov1kf 15 days ago
This looks very impressive. Do we need that much RAM and storage for AI model training? And can we install an Nvidia 3080 GPU internally?
@TheDataDaddi 14 days ago
Hi there! Thanks for the question. So, you definitely do not need this much RAM or storage. I just have some particularly heavy workloads, and having excess storage and RAM makes my life much easier. I would say try to get a little bit more than what you think you will need for your intended use case. Also, I would spend a bit extra on more memory-dense modules. This way you leave yourself open slots for expansion in the future if needed. As for the 3080, it is going to depend a lot on the form factor of the particular GPU variant (ASUS, Gigabyte, etc.), but overall I would say it is a safe bet to assume that it will not fit. Part of the issue is that the 8-pin power connectors are on the top of the GPU, so there is no space to plug anything in between the top of the GPU and the lid. You could get creative and buy a 90-degree riser so the GPU would fit sideways, or cut a hole in the lid. However, without any modification you can be pretty sure it will not fit.
@SanjaySharma-ov1kf 14 days ago
@@TheDataDaddi Thank you for the quick reply. I have Nvidia Founders Edition GPUs, which are smaller compared to other brands. Else I will have to purchase an Nvidia P40 GPU; not sure about the performance of the P40 as compared to the 3080. Also, is it fine if I go with only 2 RAM modules, like 64 GB x 2, or do I need to install a minimum of 4 RAM modules per memory bank for it to work?
@TheDataDaddi 14 days ago
@@SanjaySharma-ov1kf The Founders Editions may actually work. Just be mindful that you will have enough room to actually connect the power cables. The 3080 will be about 2.5 times as performant as the P40 on paper, so it would certainly be worth using it if it is physically possible. Another route you could go is making an external rig for the 3080s. I have a video on that if you are interested in going that route. ukposts.info/have/v-deo/sqKUf319aGtnw3U.html The 4028GR-TRT supports dual Intel Xeon E5-2600 v3 or v4 series processors, which typically support quad-channel memory configurations. Each processor has four memory channels, and ideally, you should populate one DIMM per channel to maximize memory bandwidth. Using only two DIMMs might limit the memory bandwidth because only two of the eight channels (four per processor) would be utilized. However, it would still function correctly.
@SanjaySharma-ov1kf 14 days ago
@@TheDataDaddi Thank you, I will try out the 3080 and 3070 GPUs which I already have; if not, I will buy a P40. I do not prefer to have an external GPU as it will be a little risky with kids at home, even though the server will be in the garage. For RAM, I will go with 8 modules to get bandwidth efficiency. I am placing the orders on eBay, hopefully will get it by the coming weekend :) ..Appreciate your help and sharing of information, keep up the good work.
@TheDataDaddi 14 days ago
@@SanjaySharma-ov1kf Yeah, if you already have a 3080 and 3070, definitely try those first. Please let me know if you can make them work internally. Great! I think this is what I would do personally. If you are patient you can find some pretty good deals on eBay for relatively cheap RAM modules. Of course! I am always happy to share what knowledge I have gained. Cheers and good luck!
@pixelslayertv7140 16 days ago
What would you recommend: one RTX 4070 Ti Super or two RTX 4060 Ti 16GB for LLM inferencing? I know there are a lot of things to consider.
@TheDataDaddi 14 days ago
Hi there. Thanks so much for the question! So, in terms of theoretical performance the 4070 ti super is about twice as performant. Since you are just interested in inferencing, I would say that in this case having the better performance of the 4070 will benefit you more.
@pixelslayertv7140 14 days ago
@@TheDataDaddi Thanks a lot for your answer. I think it's worth the better performance, since I don't get any performance benefit from having two RTX 4060 Tis besides more memory. In case I do some training at some point I can still rent some hardware. In my case I want to have a local LLM for work where I cannot use anything connected to the internet. So I would use it primarily for inference, and if I need to optimize the model this should only occur once or twice (hopefully).
@TheDataDaddi 14 days ago
@@pixelslayertv7140 Sure! Yeah unfortunately since the memory pools for both 4060s would be separate you really don't get much benefit even having more total VRAM. You may be able to get away with fine tuning some of the smaller open source LLMs especially if you look into quantization. However, my gut tells me you will have a hard time doing much beyond that with 16GB VRAM. Like you said though you could always rent for the few times you do need access to more VRAM.
@user-uh8po2sx6y 16 days ago
Great video! I am really confused on which package & model of GPU & CPU to buy. I have seen this recommended package:
AMD Ryzen 7 7800X3D Processor
Deepcool AK400 DIGITAL - AFTERSHOCK Edition
Gigabyte B650M Gaming Wifi
Gigabyte RTX 4090 Windforce V2 - 24GB
32GB Team T-Force Delta RGB 6000mhz (16x2)
2TB Lexar NQ710 Gen4 SSD
850W Deepcool 80+ Gold ATX3.0 (ZC850D)
AX Wireless + Bluetooth Included
Is this package powerful & fast enough to run the majority of AI applications & video rendering? Does the number of cores and threads in the CPU affect the performance & speed of AI workloads? Should I buy an Intel or AMD processor? I would like to hear your recommendation 😍
@TheDataDaddi 14 days ago
Hi there! Thanks so much for your question! The package seems capable, but here are a few enhancements I'd recommend for running AI applications and video rendering smoothly:
1) Increase RAM to at least 64 GB: This helps tremendously with data preprocessing and multitasking, especially when handling large datasets typical in AI tasks and video projects.
2) Opt for a CPU with 12-16 cores: For intensive tasks like model training and video rendering, especially if done simultaneously, a 16-core CPU would prevent bottlenecks. This ensures the CPU can efficiently supply the GPU with data and handle video tasks without slowdowns.
Regarding your other questions:
Impact of CPU cores on AI: Yes, the number of cores significantly affects AI performance because the CPU prepares and feeds data to the GPU. More cores mean smoother data handling and less waiting time for the GPU.
Intel vs AMD: Both brands offer competitive performance for AI workloads. I recommend choosing based on the best price-to-performance ratio you can find, often influenced by sales. Check out CPUBenchmark for current pricing and performance comparisons. cpu_value_available.html
I hope this helps! Let me know if you have any more questions.
@TazzSmk 17 days ago
For AI/ML performance, Tensor cores really help speed things up, so recent Nvidia cards are much faster. That said, speed is less of a concern than a sufficient amount of VRAM, so a 24GB Pascal P40 can yield better value than, let's say, an 11GB GTX 1080 Ti of a similar age.
@TheDataDaddi 14 days ago
Hi there. Thanks so much for the comment. Yep, these are my thoughts exactly. Tensor cores are certainly wonderful to have, but if you don't have enough VRAM to load the model then the number of tensor cores you have is irrelevant.
@optiondrone5468 17 days ago
Man, this was an excellent video, both the build and the software setup. Now please make a video showing us how you use this system for ML, both setting up your venv and your ML coding setup, and doing some ML training to stress test your system. How about trying some SDXL image upscaling workflow?
@TheDataDaddi 14 days ago
Hi there. Thanks so much for the comment! Very glad to hear that the content was helpful to you. Okay, great feedback. I will try to do a video soon on my workflow and how I manage multiple projects. I will also be making some videos soon to benchmark the system and GPUs, so stay tuned for that. I am not familiar with the SDXL image upscaling workflow, but I can certainly take a look when I have some time and see if I can make one on that as well. Thanks so much again for the suggestions! Really appreciate it.
@optiondrone5468 14 days ago
@@TheDataDaddi Thanks mate, your content is top notch, keep up the good work.
@TheDataDaddi 14 days ago
@@optiondrone5468 Appreciate the kind words! Cheers
@starlordhero1607 18 days ago
00:11 Choose a GPU with a newer architecture for better performance
02:28 Choose NVIDIA GPUs with active support and sufficient VRAM for future scalability.
06:51 Key considerations for choosing an NVIDIA GPU for deep learning
09:08 Consider driver support for deep learning framework compatibility.
13:10 Factors to consider when choosing an NVIDIA GPU for deep learning
15:12 Understanding the key GPU metrics is crucial for making the right choice.
19:46 Choosing a GPU based on performance, memory, and bandwidth criteria.
22:00 GeForce RTX 2060 Super and GeForce RTX 4060 Ti 8GB are the best bang-for-your-buck GPUs.
26:27 Comparison of NVIDIA GPU models for deep learning in 2023
28:45 GeForce RTX 4060 Ti 16GB has the best raw performance
33:18 Choosing NVIDIA GPUs for deep learning in 2023
35:36 Best bang for your buck: P100 and P40 GPUs
39:22 P100 and P40 are recommended for deep learning
Crafted by Merlin AI.
@TheDataDaddi 14 days ago
Thanks so much for adding these! I appreciate it.
@hisredrighthand5212 19 days ago
Hi, I've been looking at a refurbished Dell PowerEdge (R720) after one of your previous videos. If I have a budget of $2000+ and I can get 2 x RTX 4060 Ti (16 GB) for $900, would your PowerEdge R720 build make sense with the P40s replaced by RTX 4060 Tis? Thank you so much for your help.
@TheDataDaddi 14 days ago
Hi there! Thanks so much for the question. You can certainly go that route. The GPUs themselves should be compatible with the server. However, I would strongly caution you to pay attention to the form factor of whatever flavor of RTX 4060 that you buy. It is likely that it will not fit inside the server chassis. Even if it does fit you may run into issues due to the position of the 8 pin power connectors. Even if it will not fit inside, you can still go this route though you will just have to make an external rig for the GPUs. I have a video on this as well. ukposts.info/have/v-deo/sqKUf319aGtnw3U.html Hope this helps! Please let me know if you have any other questions
@user-uh8po2sx6y 19 days ago
Is there a big difference in performance and speed in AI tasks like Stable Diffusion etc. between the RTX 4080 Super and the RTX 4090? Which one should I buy, as I seldom play games, or should I wait for the 5090 at the end of the year? I am not a video editor and do not hold any job related to designing or editing, just a casual home user.
@TheDataDaddi 19 days ago
Hi there! Thanks so much for the question. At this current moment, for Stable Diffusion related tasks, I would go with the 4080. 16GB of VRAM should be enough to comfortably handle pretty much all Stable Diffusion tasks at this point (to my knowledge), and its performance is 60% of what you get with the 4090 for less than half the price. All that said, if you are not pressed for time, I would probably wait to see what happens to the market when the new 5000 series GPUs come out. It could bring the prices down for the 3000 and 4000 series GPUs as people dump their older GPUs in favor of the latest and greatest. This approach is always a gamble though, so if you prefer a safe bet I would look for good deals on a 4080 and not worry too much about what will happen down the road.
@rohithkyla7595 19 days ago
Great video! I'm looking to replicate this :) Quick question about the PCIE lanes: it looks like all v3 and v4 gen Xeons support 40 lanes each, so 80 total. How will 8 x16 (= 128 lanes) fit into the 80 lanes supported by the CPUs?
@TheDataDaddi 19 days ago
Hey there. Thanks so much for the comment. Great question! Yes, it's absolutely true that each CPU only supports 40 PCIE lanes. However, the 4028GR-TRT employs something called PLX technology. These are basically PCIE switches that manage and distribute the 80 physically available lanes to provide higher connectivity, effectively multiplying the number of PCIE lanes coming from the CPUs and thereby increasing the bandwidth available to each GPU. It also provides intelligent data routing between the CPUs and GPUs. This allows for dynamic optimization of PCIE lane usage. Exactly how this works I am honestly not sure. I should probably research this more. All that to say, it should be theoretically possible with this technology to support 8 GPUs at the full x16-lane bandwidth. However, while it may certainly increase the bandwidth available to each GPU, I highly doubt that under full load with 8 GPUs you would get the full x16-lane bandwidth for each. This would actually be a really interesting thing to test! One day when I get 8 of the same GPUs I will definitely have to try it. lol. Anyway, I hope this answers your question. I wish I could give you a more concrete answer, but this is the best I can do for now.
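One quick way to check what link each card has actually negotiated is a standard nvidia-smi query; this is just a generic sketch, not something specific to the video:

# Sketch: report the current PCIe generation and link width for every GPU
import subprocess

result = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=name,pcie.link.gen.current,pcie.link.width.current",
     "--format=csv,noheader"],
    capture_output=True, text=True,
)
print(result.stdout)
# Example (illustrative) output: "Tesla P40, 3, 16", meaning PCIe gen3 x16.
# Behind a PLX switch each card can still negotiate x16; contention under full
# load shows up as shared upstream bandwidth rather than a narrower link.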
@rohithkyla7595 19 days ago
@@TheDataDaddi thanks a lot for your reply! Will look further into this :) I've ordered one of these servers, looking forward to getting it!
@rohithkyla7595 12 days ago
@@TheDataDaddi Hi mate, I've just received my 4028GR. I currently have 4 GPUs total (2 P100s, 1 P40, and an RTX 3080); the 2 P100s are in the Supermicro, and the P40 is in a separate desktop. In terms of splitting the 2 P100s across the 2 CPUs on the Supermicro, would you recommend having both on 1 CPU or splitting them between the 2 CPUs?
@SanjaySharma-ov1kf 5 days ago
@@rohithkyla7595 Hey, I have ordered the same server (4028GR-TRT) and have 3060 and 3080 GPUs, but for some reason the server does not recognize the GPUs. I have noticed that the power adapter is a bit of a tight fit for these GPU cards. Did you install a 3080 in the 4028GR-TRT server? Do we need to change anything in the BIOS, or did you use a different power cable for the 3080 or 3060 GPU card? I am stuck without GPU detection on this server.
@rohithkyla7595 5 days ago
@@SanjaySharma-ov1kf Hey, I am currently only running the P40/P100 GPUs, which only need one 8-pin connector, which comes with the 4028GR-TRT. I do however have the 3080, but it has power adapters at the top which make closing the server's lid difficult. So, I'm currently not using the 3080 in the server. Regarding your issue, it could just be power related; I think a PCIE riser + outside power should let you know if it's the server's fault.
@autkarsh8830 20 days ago
sweet video!
@TheDataDaddi 19 days ago
Hi there. Thanks for the comment. So glad you enjoyed it!
@louietownsend2457 20 days ago
*Promosm*
@mannfamilyMI 20 days ago
Would love to see inference speed of something like Llama on the P100 and P40 cards. I have dual 3090s so I'm familiar with that, but I'm looking to go 8x to gain more VRAM, and don't want the complexity of consumer cards. Have you considered the AMD MI100 card by chance?
@TheDataDaddi 19 days ago
I am working on some benchmarks now. I will try to get some quick ones for some of the open source LLMs because everyone seems to be most interested in those at the moment. Stay tuned, and I will try my best to get them out as quickly as I can. I have not personally gone the AMD route yet, but it's something I plan on experimenting with down the road. It's funny you mention the MI100 card though. I was actually talking to a viewer the other day and he was telling me about his experiences with AMD and using the MI100. To summarize his experience: "AMD is not worth it if you value your time, but once it's working it is fairly decent and a good alternative to Nvidia." If you are interested in this route, please reach out to me. You can find my contact info in my UKposts bio, and I can try to put you in touch with the other viewer.
@mannfamilyMI 20 days ago
Hi, did you run any inference of an LLM with the P40 and P100 cards? Could you share your experience with that in a video? Do you have plans to try and build a small cluster and spread training or inference across nodes?
@TheDataDaddi 19 days ago
Hi there! Thanks so much for the comment. As of this moment, I have actually not used any of my GPUs for working with LLMs. I definitely plan on doing this as soon as I get a chance. My plan is to make a video series devoted to exactly what you need to work with LLMs as a home lab or small business. Eventually, I would love to try this! I just do not really have the resources at the moment to turn all of my current servers into a cluster. So, for the moment I will be using my SM 4028GR-TRT as my main DL machine. I do plan on doing that eventually, and I will be sure to share the journey.
@sushmaanilsingh4489 21 days ago
4:02 But sir, LLMs like ChatGPT are trained on multiple GPUs only, right?
@TheDataDaddi 19 days ago
Hi there! Thanks for the comment. For LLMs like GPT-3, the resources required to train or even pre-train such models from scratch are beyond the reach of most individuals and many companies. This is due to the immense computational power and data handling capacities needed. For example, models like GPT-3 were basically trained on most of the internet (Common Crawl, WebText2, etc.), tons of full books, a snapshot of Wikipedia at that time, and more. Training was performed with literally thousands of high-end GPUs across various datacenters. Also, ChatGPT and other similar models are proprietary, so the lack of detailed specifics about their training processes and architecture makes it hard to know exactly what it would take to train something like GPT-3 (or another similar model) from scratch.

What is accessible for most people and organizations is running inference using these models. "Inference" refers to the process of using a pre-trained model to make predictions, or generate text in the case of LLMs. The feasibility of running inference smoothly depends largely on the amount of VRAM available, as larger models require more memory to operate efficiently. For instance, smaller versions of LLMs might run on a single GPU with 12 GB of VRAM, while more extensive models might require a GPU setup with significantly more memory.

For those with more robust computing setups, such as advanced home labs or small to medium-sized enterprises, fine-tuning an LLM might be within reach. Fine-tuning involves adjusting a pre-trained model on a new dataset or for a specific task, which typically requires fewer resources than full-scale training from scratch. This process allows users to tailor the model's responses to better fit particular contexts or industry-specific needs without the prohibitive cost of training a new model from the ground up. The following Reddit thread is pretty useful in providing more details here: www.reddit.com/r/MachineLearning/comments/15uzeld/d_estimating_hardware_for_finetuning_llm/

For fine-tuning, a setup with one or more high-end GPUs, such as the NVIDIA A100 or V100 (or RTX 3090, as I advocate for), would generally suffice. This allows for modification of large LLMs using varied sizes of data, making it a viable option for enhancing model performance on specialized tasks. In summary, while training large-scale LLMs from scratch is out of reach for most, leveraging these models through inference or fine-tuning them for specific applications is quite feasible with the right hardware setup. This opens up opportunities for a wide range of applications, from personalized AI assistants to sophisticated data analysis tools, even for smaller organizations or dedicated individuals with the appropriate resources.
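To make the fine-tuning path concrete, a parameter-efficient (LoRA) setup with Hugging Face transformers and peft looks roughly like the sketch below. The model id and every hyperparameter are placeholder assumptions, not a tested recipe:

# Sketch: LoRA fine-tuning of a smaller open-source LLM, the realistic option on
# one or two 24GB-class GPUs. Names and hyperparameters are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"             # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",                          # spread layers across available GPUs
)

lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)         # only the small adapter weights train
model.print_trainable_parameters()

# From here, a standard transformers Trainer (or trl's SFTTrainer) run on your own
# dataset updates just the LoRA adapters, keeping VRAM needs far below full training.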
@SuccessDynamics 21 days ago
Thank you very much for the idea 💡 My R730 and 2x P40s are UP and running. I see I need liquid cooling, because it's very noisy 😂
@TheDataDaddi 19 days ago
Hi there. Thanks for the comment. Liquid cooling is certainly an option, and depending on the environment where the server lives this may be necessary. I have a video specifically on cooling the GPUs. I would advise you to watch it before you go that route. I created a Python script to adjust the fans based on GPU temperature. It seems to work pretty well to keep the GPUs from throttling, and it might save you some money and time installing liquid cooling. ukposts.info/have/v-deo/ioWIZ3ewbKikpI0.html Let me know if you have any questions!
@many151000 22 days ago
Friend, I have a problem when connecting a Tesla M10 to my R720 server: when I put in the power cable, the power supplies start pulsing amber. Do you know why that is...
@TheDataDaddi 19 days ago
Hi there. Thanks so much for the question. Is the server able to boot when you have the M10 in?
@user-cx6rg6mr7d 23 days ago
Do you get 48 GB of VRAM when using 2x P40? Thank you.
@TheDataDaddi 19 days ago
Hi there. This is a great question. Unfortunately, when you use 2x P40 the memory pool is not shared, so you just have 2 separate 24GB memory pools. This is still useful because it allows you to parallelize data processing across multiple GPUs. You can also split part of a particular model across GPUs and process data sequentially across GPUs. However, you would not be able to load, say, a 40GB model directly onto the GPUs.
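As a toy illustration of that "split a model across GPUs" idea (the layer sizes here are arbitrary, and in practice LLM libraries usually do this for you, e.g. via a device_map option), a naive model-parallel split in PyTorch looks like this:

# Toy sketch: splitting one model across two 24GB P40s (cuda:0 and cuda:1).
# The two memory pools stay separate; each half of the model lives on one card.
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        return self.part2(x.to("cuda:1"))   # activations hop between GPUs over PCIe

model = SplitModel()
out = model(torch.randn(8, 4096))
print(out.shape, out.device)                # torch.Size([8, 4096]) cuda:1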
@user-cx6rg6mr7d 17 days ago
@@TheDataDaddi thank you
@ShakeShifter84 23 days ago
Would have loved to see the top 2 of each category put in a list together and compared in the spreadsheet. Great video
@TheDataDaddi 19 days ago
Hi there. Thanks so much for the comment! I am actually in the process right now of creating my own website: thedatadaddi.com. One of the first things I am going to put on the website is a real-time GPU price-to-performance dashboard. I will certainly add in the ability to make this kind of comparison. Please stay tuned for progress here.
@MattJonesYT 24 days ago
When you get it running I will be very interested to see where the bottleneck is and whether it's the CPU or the GPUs, because the older Xeons seem to be a bottleneck.
@TheDataDaddi 19 days ago
I will eventually make a whole series devoted to LLMs and best setups just for that. In that series, I will definitely report back on where the bounds are. Unfortunately, my time is just incredibly limited these days. I will try to get this info out as soon as I can.
@MattJonesYT 24 days ago
I did a similar build with dual E5-2697 v3 and it's CPU bound with just one P40 when using aphrodite. I'm not sure if switching to llama-cpp will fix that but so far I'm thinking a more modern CPU would have been better.
@TheDataDaddi 19 days ago
Hi there! Thanks so much for the comment. I have not used the setup much yet for LLMs, so I will report back as I get a chance to test in that area. Unfortunately, to my knowledge the v4s are the newest CPUs that this server will support, so if you wanted something newer you would likely have to go a different route. So far, though, in my applications I have not experienced CPU bottlenecking, even when using all my GPUs.
@soulTlMAThE 26 days ago
My God... I was FINALLY able to install those bloody drivers on an ESXi VM with Ubuntu Server. Yeah I know, I'm a noob, but to anyone else having the same problem:
1. To "really" enable passthrough of a GPU with 16GB or more of VRAM you need to add 2 settings in the advanced tab when creating a new VM:
pciPassthru.use64bitMMIO="TRUE"
pciPassthru.64bitMMIOSizeGB=32
Those are crucial for older versions of ESXi (mine is 7.0 U3).
2. Change the firmware in VM options to "EFI".
3. Check the "Reserve all guest memory (All locked)" box.
4. Install Ubuntu Server LTS.
5. Finally, follow the @TheDataDaddi instructions in this vid to install the drivers.
You'd be surprised how long it can take for someone who has no idea about Linux and virtual machines to install those f*****s XD. Enjoy the VM with installed and working drivers!
@ALEFA-ID 27 days ago
Do we need NVIDIA SLI to fine-tune an LLM with multiple GPUs?
@TheDataDaddi 25 days ago
Hi there. Thanks so much for the comment. First and foremost, NVIDIA SLI is different from NVLink. SLI is primarily designed for linking two or more GPUs together to produce a single output for graphically intensive applications (gaming in particular). It is not really designed for AI/ML/DL, and is not really used for this purpose to my knowledge. Also, the 3090 does not use SLI; it uses NVLink. As for NVLink, you do not necessarily need it. It does not make the total memory pool for each GPU any different, but it is certainly a nice-to-have. It will significantly speed up most operations, as it allows communication directly between the GPUs at hundreds of GB/s. So, it will not prevent you from working with LLMs, but it will make things much faster when dealing with them, if that makes sense.
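If you ever want to confirm whether two cards in a box are actually bridged, the standard nvidia-smi topology matrix shows it; this is a generic sketch, not something specific to the video:

# Sketch: print the GPU interconnect topology. "NV#" entries between two GPUs
# indicate an NVLink bridge; "PHB"/"PIX"/"SYS" mean they only talk over PCIe.
import subprocess

print(subprocess.run(["nvidia-smi", "topo", "-m"],
                     capture_output=True, text=True).stdout)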
@fullstackmatt 28 days ago
Hi, excellent video thanks! Is it possible to map existing downloaded blocks to the container so they don't have to be downloaded again?
@TheDataDaddi 25 days ago
Hi there. Thanks so much for the comment! So, when you get BTC Core set up, it has an initial discovery phase called IBD (Initial Block Download). This actually goes through and grabs a copy of every block up to the current block in the chain and stores it on your machine. By default you should have a local mapped copy of each block on chain. If you stop the container and start it again after a while, BTC Core will only download the new blocks that you do not already have locally. Hope this makes sense! Please let me know if you have any other questions.
@johnireoluwababalola629 28 days ago
The effort you put into this video is just mind-blowing. I subscribed immediately 😁
@TheDataDaddi 28 days ago
Hi there. So glad you enjoyed this video and thank you very much for subscribing! I will do my best to continue to make great content for you.
@ultraplexplextor 29 days ago
Working system, super stable: Blender v4.1, Flamenco v3.4, four NVIDIA GRID K2 GPUs, and two Dell R720 servers. ukposts.info/have/v-deo/qXqFa2l4nmyrsKs.html