I love how you take nn and explain it to us not with the already built-in functions in PyTorch but by how things work, and then give us the equivalent of it in PyTorch
@mnbvzxcv1 · 11 hours ago
GOD
@PopescuAlexandruCristian · 13 hours ago
This is the best backpropagation explanation I have ever gotten. Congratulations! This is great!
@AndresNamm · 15 hours ago
In this lecture at around 18:50 Andrej says that you get white when the statement is true and black when the statement is false. This is actually the opposite. The dead neuron problem occurs when you get completely black neurons under the imshow. In those cases the gradient does not change for the upstream parameters. Does anybody agree?
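For reference, a minimal sketch of that plot (the tensor names and the 32x100 shape are assumed here, not taken from the notebook): with a boolean array and matplotlib's 'gray' colormap, imshow draws True as white and False as black, so white cells mark saturated tanh units and an all-white column would be a candidate dead neuron.

import torch
import matplotlib.pyplot as plt

# Toy pre-activations with a deliberately large scale so many tanh units saturate.
hpreact = torch.randn(32, 100) * 3.0
h = torch.tanh(hpreact)

# Boolean image: True (white under the 'gray' colormap) = |tanh| > 0.99 (saturated),
# False (black) = the unit is in its active region and gradients still flow.
plt.figure(figsize=(12, 4))
plt.imshow((h.abs() > 0.99).numpy(), cmap='gray', interpolation='nearest')
plt.xlabel('neuron')
plt.ylabel('example in batch')
plt.show()

# A "dead" neuron would be a column that is white (saturated) for every example.
dead = (h.abs() > 0.99).all(dim=0)
print('candidate dead neurons:', dead.nonzero().flatten().tolist())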
@Alley00Cat · 17 hours ago
The legend that introduced me to DL all those years ago continues to awe us. Bravo
@dead_d20dice67 · 19 hours ago
Hi! Can you make a video explaining the new Kolmogorov-Arnold Network (KAN)? It seems to be a new revolution in NNs 🤔
@drtech6521 · 19 hours ago
This was very effective, knowledgeable and overwhelming at the same time 😂
@alexanderchernikov7497 · 1 day ago
Brilliant video!
@jianjielu9835 · 1 day ago
extremely simple and interesting! Thank you!
@matthewhendricks648 · 1 day ago
does anyone else hear that crazy loud ringing noise?
@herashak · 1 day ago
thank you! audio is very low though
@percevil8050 · 1 day ago
My tiny brain just blew up!
@GlitchiPitch · 1 day ago
hi everyone, if I have losses like this (train loss 0.6417, val loss 6.7432) but the output is not bad, is that normal?))
@hongwuhuai · 2 days ago
Hi Andrej, thanks a ton for the free education, I am enjoying learning through the lessons! A quick observation: the outputs of the explicit counting method and the NN method seem different in the end (counting-table sampling -> NN sampling): koneraisah. -> kondlaisah., andhumizarie. -> anchthizarie. Can you comment? My intuition is telling me that the NN is just an approximation of the counting table, never exact. Not sure if this is the right way of reasoning. Thanks!
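A small sketch of that intuition (the tensors below are made-up stand-ins; N and W only follow the lecture's naming): the counting model's table and the softmax of the trained NN's weights converge toward each other but are never exactly equal, and the smoothing/regularization strengths also differ, so sampling with the same seed can diverge after a few characters.

import torch

torch.manual_seed(0)

# Stand-ins: N plays the role of the 27x27 bigram count table, W the weights of
# the one-layer NN after finite training ("almost" the log-counts, but not quite).
N = torch.randint(0, 50, (27, 27))
W = (N + 1).float().log() + 0.05 * torch.randn(27, 27)

# Counting model: normalized (smoothed) counts.
P_counts = (N + 1).float()
P_counts /= P_counts.sum(1, keepdim=True)

# NN model: softmax of the logits. A one-hot input just selects row i of W,
# so row i of softmax(W) is the NN's P(next char | char i).
P_nn = W.softmax(dim=1)

print('max abs difference between the tables:', (P_counts - P_nn).abs().max().item())

# Individual draws with the same seed may still agree, but over a whole name
# the small differences compound and the samples drift apart.
g = torch.Generator().manual_seed(2147483647)
ix_counts = torch.multinomial(P_counts[0], num_samples=1, generator=g).item()
g = torch.Generator().manual_seed(2147483647)
ix_nn = torch.multinomial(P_nn[0], num_samples=1, generator=g).item()
print('first sampled index:', ix_counts, 'vs', ix_nn)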
@FireFly969 · 2 days ago
Thank you so much Mr Andrej Karpathy. I watched and practiced a PyTorch course of like 52 hours and it was awesome, but after watching your video it seems that I was learning more how to build a neural network than how a neural network works. With your video I know exactly how it works, and I am planning to watch all of this playlist and read almost all of your blog posts ❤ thank you and have a nice day.
@warwicknexus160 · 2 days ago
Brilliant
@oleksandrasaskia · 2 days ago
Thank you so much!!! For democratizing education and this technology for all of us! AMAZING! Much much love!
@monocles.IcedPeaksOfFire · 2 days ago
❤ > It's (a) pleasure
@akhilphilnat · 2 days ago
fine tuning next please
@user-cb3pf6qf2z · 2 days ago
Is it possible to just calculate the gradients once and then you know them? Not resetting and recalculating. What am I missing?
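One reason (a tiny sketch, not the notebook's code): the gradient is a function of the current parameter values (and of the batch), so after every update the old gradient is stale and has to be recomputed; .grad is also reset because backward() accumulates into it.

import torch

w = torch.tensor([2.0], requires_grad=True)
x, y = torch.tensor([3.0]), torch.tensor([0.0])

for step in range(3):
    loss = ((w * x - y) ** 2).mean()   # simple squared error
    w.grad = None                      # reset, otherwise backward() adds onto the old gradient
    loss.backward()
    print(f'step {step}: w={w.item():.3f} grad={w.grad.item():.3f}')
    with torch.no_grad():
        w -= 0.01 * w.grad             # after this update the old gradient no longer applies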
@lielbn0 · 2 days ago
Thanks!! I have never understood better how a neural network works!
@Clammer999 · 2 days ago
One of the best under-the-hood looks at LLMs. Love the clarity and patience Andrej imparts, considering he’s such a legend in AI.
@anthonyjackson7644 · 2 days ago
Was I the only one invested in the leaf story 😭
@mikezhao1838 · 3 days ago
Hi! Not sure what happened, but I can't join Discord, it says "unable to accept invite". I also have an issue getting the "m" from the multinomial distribution: I get 3 instead of 13 if I use num_samples=1. If I use num_samples>1 I get 13, but I get "mi." if num_samples=2. People online suggest using torch 1.13.1, but I don't have that old version on my macOS.
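For what it's worth, a minimal sketch of the sampling call (the probability row below is made up; the lecture samples from a row of the bigram table): with a fixed generator seed, torch.multinomial is deterministic for a given PyTorch version, but asking for a different num_samples consumes the random stream differently, and PyTorch does not guarantee identical random draws across versions or platforms, which may explain getting a different character than in the video.

import torch

torch.manual_seed(0)
p = torch.rand(27)
p = p / p.sum()   # stand-in probability row

# Same seed, different num_samples: the generator state is consumed differently,
# so the indices drawn are not guaranteed to match between the two calls.
g = torch.Generator().manual_seed(2147483647)
print(torch.multinomial(p, num_samples=1, replacement=True, generator=g))

g = torch.Generator().manual_seed(2147483647)
print(torch.multinomial(p, num_samples=2, replacement=True, generator=g))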
@AsmaKhan-lk3wb · 3 days ago
Wow!!! This is extremely helpful and well-made. Thank you so much!
@AdityaAVG · 3 days ago
This guy has become my favorite tutor.
@ezekwu77 · 3 days ago
I love this learning resource and the simplicity of the tutorial style. Thanks to Andrej Karpathy.
@switchwithSagar · 4 days ago
In the case of the simple bigram model @32:38 we are sampling only one character; however, while calculating the loss we consider the character with the highest probability. The character sampled is unlikely to be the same as the character with the highest probability in the row unless we sample a large number of characters from the multinomial distribution. So, my question is: does the loss function reflect the correct loss? Can anyone help me understand this?
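One way to see why the loss is still well defined (a minimal sketch; the table and index pairs below are made up, only the names follow the lecture): the negative log likelihood looks up the probability the model assigns to the actual next character from the training data. Sampling with torch.multinomial only happens at generation time and never enters the loss, so the sampled character and the highest-probability character don't need to agree.

import torch

torch.manual_seed(0)
P = torch.rand(27, 27)
P = P / P.sum(1, keepdim=True)   # stand-in bigram probability table

xs = torch.tensor([0, 5, 13])    # current-character indices (made-up example)
ys = torch.tensor([5, 13, 0])    # actual next-character indices from the data

# Average negative log probability assigned to the *actual* next characters.
nll = -P[xs, ys].log().mean()
print(f'negative log likelihood: {nll.item():.4f}')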
@meow-mi333 · 4 days ago
Thanks, I really needed this level of detail to understand what’s going on. ❤
@MagicBoterham · 4 days ago
1:25:23 Why is the last layer made "less confident like we saw" and where did we see this?
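Earlier in the video the initial loss is compared against the uniform baseline; a rough sketch of the idea (the scale factors and shapes here are illustrative, not the exact values used): shrinking the last layer's weights at init keeps the logits near zero, so the softmax is close to uniform ("less confident") and the starting loss is near -log(1/27) ≈ 3.29 instead of something much larger.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab = 27
h = torch.randn(32, 100)                     # stand-in hidden activations
targets = torch.randint(0, vocab, (32,))     # stand-in labels

for scale in [1.0, 0.1, 0.01]:
    W2 = torch.randn(100, vocab) * scale     # smaller scale -> smaller logits
    b2 = torch.zeros(vocab)
    logits = h @ W2 + b2
    loss = F.cross_entropy(logits, targets)
    print(f'W2 scale {scale:>4}: initial loss {loss.item():.3f}')

print('uniform baseline -log(1/27) =', -torch.tensor(1 / 27).log().item())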
@marcelomenezes3796 · 4 days ago
This is the best intro-to-LLM video ever.
@MrManlify · 4 days ago
How were you able to run the loop without adding a requires_grad command in the "implementing the training loop, overfitting one batch" section of the video? For me it only worked when I changed the lines to:

g = torch.Generator().manual_seed(2147483647)  # for reproducibility
C = torch.randn((27, 2), generator=g, requires_grad=True)
W1 = torch.randn((6, 100), generator=g, requires_grad=True)
b1 = torch.randn(100, generator=g, requires_grad=True)
W2 = torch.randn((100, 27), generator=g, requires_grad=True)
b2 = torch.randn(27, generator=g, requires_grad=True)
parameters = [C, W1, b1, W2, b2]
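A likely explanation (a sketch only, reconstructing the notebook's pattern from memory): the parameters are created without requires_grad and the flag is then switched on in place for the whole list, which is equivalent to passing requires_grad=True in each constructor.

import torch

g = torch.Generator().manual_seed(2147483647)
C  = torch.randn((27, 2), generator=g)
W1 = torch.randn((6, 100), generator=g)
b1 = torch.randn(100, generator=g)
W2 = torch.randn((100, 27), generator=g)
b2 = torch.randn(27, generator=g)
parameters = [C, W1, b1, W2, b2]

# Flip the flag afterwards; the training loop can then call loss.backward()
# and read p.grad exactly as if requires_grad=True had been passed above.
for p in parameters:
    p.requires_grad = True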
@AlexTang99 · 4 days ago
This is the most amazing video on neural network mathematics knowledge I've ever seen; thank you very much, Andrej!
@adirmashiach4639 · 4 days ago
Something you didn't explain - 51:30 - if we want L to go up we simply need to increase the variables in the direction of the gradient? How is that so if some gradients are negative?
@yourxylitol · 4 days ago
First question: yes. Second question: because that's the definition of a gradient -> if the gradient is negative, then making the data (that variable's value) smaller will increase the loss, so stepping each variable in the direction of its own gradient (up when positive, down when negative) pushes the loss up.
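A tiny numerical check of that reply (made-up values, in the spirit of the micrograd example): one gradient is negative, and stepping each variable in the direction of its own gradient still pushes L up.

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(-3.0, requires_grad=True)
L = a * b                  # dL/da = b = -3 (negative), dL/db = a = 2 (positive)
L.backward()

lr = 0.01
with torch.no_grad():
    a2 = a + lr * a.grad   # negative gradient -> a is nudged *down*
    b2 = b + lr * b.grad   # positive gradient -> b is nudged up

print('L before:', (a * b).item(), 'L after:', (a2 * b2).item())  # -6.0 -> about -5.87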
@howardbaek5413 · 5 days ago
This is the single best explanation of backpropagation in code that I've seen so far. Thanks Andrej.
@ThefirstrobloxCEO989 · 5 days ago
Thanks a lot for the insight and demonstration. I really look forward to more videos from you, Andrej!
@debdeepsanyal9030 · 5 days ago
Just a random fun fact: with gen = torch.Generator().manual_seed(2147483647), the bigram-generated name I got was `c e x z e .`, amazing.
@mehulchopra1517 · 6 days ago
Thanks a ton for this Andrej! Explained and presented in such simple and relatable terms. Gives confidence to get into the weeds now.
@wangcwy · 6 days ago
The best ML tutorial video I have watched this year. I really like the detailed examples, and how these difficult concepts are explained in a simple manner. What a treat for me to watch and learn!
@hotshot-te9xw · 6 days ago
What methods of alignment do you personally feel are extremely promising for ensuring future AGI doesn't kill us all?
@a000000j · 6 days ago
One of the best explanations of LLMs....
@ced1401 · 6 days ago
Thank you very much
@quentinquarantino8261 · 6 days ago
Is this the real, one and only Andrej Karpathy? Or is this a deep fake?
@soumilbinhani8803 · 6 days ago
Hello sir, it would be great if you could make a video on how exactly these tokens are converted into embedding vectors, e.g. how to make word2vec. Thank you
@soblueskyzll · 7 days ago
I am following along exactly (I believe) to calculate all the gradients, but beginning from dhpreact the results show "exact: False, approximate: True", with a maxdiff on the order of 1e-9 to 1e-10. Is it just some numerical issue, or did I do something wrong? Has anyone had the same issue?
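That pattern usually just means floating-point round-off rather than a mistake (a small sketch; the exact/approximate check below only mirrors the lecture's comparison, with toy values): two mathematically equivalent expressions need not be bitwise identical in float32, so the exact check fails while torch.allclose passes with a ~1e-9 maxdiff.

import torch

torch.manual_seed(0)
x = torch.randn(1000)

a = (x * 0.1).sum()   # one order of operations
b = x.sum() * 0.1     # an algebraically equal one

exact = bool(a == b)
approx = torch.allclose(a, b)
print(f'exact: {exact} | approximate: {approx} | maxdiff: {(a - b).abs().item():.3e}')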
@Kevin.Kawchak · 7 days ago
Thank you for the discussion
@josephmathew4667 · 7 days ago
Thank you so much Andrej. As many have already commented, this was by far one of the best lectures I have ever listened to.
@chadlinden6912 · 7 days ago
Learning the math is really interesting; it helps to build a mental image of a plane or block of vectors shifting while training. I'd be curious to know whether, in the history/evolution of ML and AI, hardware drove the intense matrix-math-based software solutions to AI, or whether improving hardware made this math practical.