COMMENTS
@Mahi-sf8ck 7 hours ago
Such a superstar!!!!!!
@Ali-lt1kb 9 hours ago
Thank you so much Andrej!
@FireFly969 10 hours ago
I love how you take NNs and explain them to us, not through the already built-in functions in PyTorch, but by how things work, then giving us the equivalent of it in PyTorch.
@mnbvzxcv1 11 hours ago
GOD
@PopescuAlexandruCristian 13 hours ago
This is the best backpropagation explanation I have ever gotten. Congratulations! This is great!
@AndresNamm 15 hours ago
In this lecture, at around 18:50, Andrej says that you get white when the statement is true and black when the statement is false. This is actually the opposite. The dead neuron problem occurs when you get completely black neurons under the imshow. In those cases the gradient does not change for the upstream parameters. Does anybody agree?
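For context, a minimal sketch (not from the video; the variable name h for the tanh activations is assumed) of the kind of visualization being discussed. With a boolean image and a gray colormap, imshow renders True as white and False as black:

import torch
import matplotlib.pyplot as plt

# Hypothetical tanh activations for one batch (batch_size x hidden_units).
h = torch.tanh(torch.randn(32, 200) * 3)

# Boolean image: True (white) marks saturated neurons with |h| > 0.99, where the
# local gradient (1 - h**2) is near zero, so very little gradient flows through them.
plt.figure(figsize=(16, 4))
plt.imshow(h.abs() > 0.99, cmap='gray', interpolation='nearest')
plt.show()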
@Alley00Cat 17 hours ago
The legend that introduced me to DL all those years ago continues to awe us. Bravo
@dead_d20dice67 19 hours ago
Hi! Can you make a video explaining the new Kolmogorov-Arnold Network (KAN)? It seems to be a new revolution in NNs 🤔
@drtech6521 19 hours ago
This was very effective, knowledgeable and overwhelming at the same time 😂
@alexanderchernikov7497 1 day ago
Brilliant video!
@jianjielu9835 1 day ago
extremely simple and interesting! Thank you!
@matthewhendricks648 1 day ago
Does anyone else hear that crazy loud ringing noise?
@herashak 1 day ago
Thank you! The audio is very low, though.
@percevil8050 1 day ago
My tiny brain just blew up!
@GlitchiPitch 1 day ago
Hi everyone, if I have losses like this (train loss 0.6417, val loss 6.7432) but the output is not bad, is it normal?))
@hongwuhuai 2 days ago
Hi Andrej, thanks a ton for the free education, I am enjoying learning through the lessons! A quick observation: the outputs of the explicit counting method and the NN method seem different in the end: counting table sampling -> nn sampling: koneraisah. -> kondlaisah. andhumizarie. -> anchthizarie. Can you comment? My intuition is telling me that the NN is just an approximation of the counting table, never exact. Not sure if this is the right way of reasoning. Thanks!
@FireFly969 2 days ago
Thank you so much Mr Andrej Karpathy, I watched and practiced a PyTorch course of about 52 hours and it was awesome, but after watching your video, it seems I was learning more how to build a neural network than how a neural network works. With your video I know exactly how it works, and I am planning to watch all of this playlist and almost all of your blog posts ❤ Thank you and have a nice day.
@warwicknexus160 2 days ago
Brilliant
@oleksandrasaskia 2 days ago
Thank you so much for democratizing education and this technology for all of us! AMAZING! Much, much love!
@monocles.IcedPeaksOfFire 2 days ago
❤ > It's (a) pleasure
@akhilphilnat 2 days ago
Fine-tuning next, please.
@user-cb3pf6qf2z 2 days ago
Is it possible to just calculate the gradients once and then you know them? Not resetting and recalculating. What am I missing?
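A tiny illustration, not from the lecture, of why the gradients cannot be computed just once: they depend on the current parameter values, so after every update the old gradient is stale and has to be cleared and recomputed.

import torch

w = torch.tensor(3.0, requires_grad=True)

for step in range(3):
    loss = w ** 2            # d(loss)/dw = 2*w, which changes whenever w changes
    w.grad = None            # clear the stale gradient from the previous step
    loss.backward()          # recompute the gradient at the current value of w
    with torch.no_grad():
        w -= 0.1 * w.grad    # update; the next step needs a fresh gradient at the new w
    print(step, w.item(), w.grad.item())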
@lielbn0 2 days ago
Thanks!! I have never understood better how a neural network works!
@Clammer999 2 days ago
One of the best under-the-hood looks at LLMs. Love the clarity and patience Andrej imparts, considering he’s such a legend in AI.
@anthonyjackson7644 2 days ago
Was I the only one invested in the leaf story 😭
@mikezhao1838 3 days ago
Hi, not sure what is happening, but I can't join the Discord; it says "unable to accept invite". I also have an issue getting the "m" from the multinomial distribution: I get 3 instead of 13 if I use num_samples=1; if I use num_samples>1, I get 13, but I get "mi." if num_samples=2. People online suggest using torch 1.13.1, but I don't have this old version on my macOS.
@AsmaKhan-lk3wb 3 days ago
Wow!!! This is extremely helpful and well-made. Thank you so much!
@AdityaAVG 3 days ago
This guy has become my favorite tutor.
@ezekwu77 3 days ago
I love this learning resource and the simplicity of the tutorial style. Thanks to Andrej Karpathy.
@switchwithSagar 4 days ago
In the case of the simple bigram model @32:38 we are sampling only one character; however, while calculating the loss we consider the character with the highest probability. The character sampled is unlikely to be the same as the character with the highest probability in the row unless we sample a large number of characters from the multinomial distribution. So, my question is: does the loss function reflect the correct loss? Can anyone help me understand this?
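For what it's worth, a small sketch (made-up numbers, not taken from the video) of how the bigram loss in the lecture is computed: it scores the probability the model assigned to the character that actually follows in the training data, independent of what gets sampled or of the argmax.

import torch

# probs[i] is the model's distribution over the next character given character i.
probs = torch.softmax(torch.randn(27, 27), dim=1)

xs = torch.tensor([0, 5, 13])   # hypothetical current-character indices
ys = torch.tensor([5, 13, 0])   # the characters that actually follow in the data

nll = -probs[xs, ys].log().mean()   # average negative log likelihood
print(nll.item())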
@meow-mi333 4 days ago
Thanks, I really need this level of detail to understand what’s going on. ❤
@MagicBoterham 4 days ago
1:25:23 Why is the last layer made "less confident like we saw" and where did we see this?
@marcelomenezes3796 4 days ago
This is the best intro video on LLMs ever.
@MrManlify 4 days ago
How were you able to run the loop without adding a requires_grad command in the "implementing the training loop, overfitting one batch" section of the video? For me it only worked when I changed the lines to:
g = torch.Generator().manual_seed(2147483647)  # for reproducibility
C = torch.randn((27, 2), generator=g, requires_grad=True)
W1 = torch.randn((6, 100), generator=g, requires_grad=True)
b1 = torch.randn(100, generator=g, requires_grad=True)
W2 = torch.randn((100, 27), generator=g, requires_grad=True)
b2 = torch.randn(27, generator=g, requires_grad=True)
parameters = [C, W1, b1, W2, b2]
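If memory serves, the video creates the tensors without requires_grad and then switches it on for all of them in a loop right after building the parameters list, which is why the later training loop works; a sketch of that approach, assuming the same variable names:

import torch

g = torch.Generator().manual_seed(2147483647)
C = torch.randn((27, 2), generator=g)
W1 = torch.randn((6, 100), generator=g)
b1 = torch.randn(100, generator=g)
W2 = torch.randn((100, 27), generator=g)
b2 = torch.randn(27, generator=g)

parameters = [C, W1, b1, W2, b2]
for p in parameters:
    p.requires_grad = True   # enable gradient tracking after the tensors are created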
@AlexTang99 4 days ago
This is the most amazing video on neural network mathematics knowledge I've ever seen; thank you very much, Andrej!
@adirmashiach4639 4 days ago
Something you didn't explain - 51:30 - if we want L to go up, we simply need to increase the variables in the direction of the gradient? How come that is so if some gradients are negative?
@yourxylitol 4 days ago
First question: yes. Second question: because that's the definition of a gradient -> if the gradient is negative, that means that if you make the variable smaller, the loss will increase.
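A tiny numerical check, not from the lecture, that nudging every variable in the direction of its own gradient (adding a small multiple of the gradient, sign included) pushes L up even when some gradients are negative:

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(-3.0, requires_grad=True)
L = a * b                       # L = -6, dL/da = b = -3, dL/db = a = 2
L.backward()

with torch.no_grad():
    a2 = a + 0.01 * a.grad      # a moves down, because its gradient is negative
    b2 = b + 0.01 * b.grad      # b moves up, because its gradient is positive
print(L.item(), (a2 * b2).item())   # new L (about -5.87) is higher than old L (-6.0)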
@howardbaek5413 5 days ago
This is the single best explanation of backpropagation in code that I've seen so far. Thanks Andrej.
@ThefirstrobloxCEO989 5 days ago
Thanks a lot for the insight and demonstration. I really look forward to more videos from you, Andrej!
@debdeepsanyal9030 5 days ago
Just a random fun fact: with gen = torch.Generator().manual_seed(2147483647), the bigram-generated name I got was `c e x z e .`, amazing.
@mehulchopra1517 6 days ago
Thanks a ton for this Andrej! Explained and presented in such simple and relatable terms. Gives confidence to get into the weeds now.
@wangcwy 6 days ago
The best ML tutorial video I have watched this year. I really like the detailed examples and how these difficult concepts are explained in a simple manner. What a treat for me to watch and learn!
@hotshot-te9xw 6 days ago
What methods of alignment do you personally feel are extremely promising for ensuring future AGI doesn't kill us all?
@a000000j 6 days ago
One of the best explanations of LLMs...
@ced1401 6 days ago
Thank you very much
@quentinquarantino8261 6 days ago
Is this the real, one and only Andrej Karpathy? Or is this a deep fake?
@soumilbinhani8803 6 days ago
Hello sir, it would be great if you could make a video on how exactly these tokens are converted into embedding vectors, e.g., how to make word2vec. Thank you
@soblueskyzll 7 days ago
I am following along exactly (I believe) to calculate all the gradients, but beginning from dhpreact, the results show "exact: False, approximate: True", with maxdiff on the order of 1e-9 to 1e-10. Is it just a numerical issue, or did I do something wrong? Has anyone had the same issue?
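In case it helps, a small illustration (not from the video) of why a manually derived gradient can come out "exact: False, approximate: True": floating-point addition is not associative, so two mathematically identical computations can disagree in the last few bits.

import torch

x = torch.randn(1000, dtype=torch.float32)
a = x.sum()
b = x.flip(0).sum()            # the same numbers summed in the opposite order
print((a == b).item())         # frequently False: the last bits differ
print(torch.allclose(a, b))    # True: the difference is tiny numerical noise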
@Kevin.Kawchak 7 days ago
Thank you for the discussion
@josephmathew4667 7 days ago
Thank you so much Andrej. As many have already commented, this was by far one of the best lectures I have ever listened to.
@chadlinden6912 7 days ago
Learning the math is really interesting; it helps to build a mental image of a plane or block of vectors shifting during training. I'd be curious to know whether, in the history/evolution of ML and AI, hardware drove the intense matrix-math-based software approaches to AI, or whether improving hardware made this math possible.