Building makemore Part 5: Building a WaveNet

152,256 views

Andrej Karpathy

1 day ago

We take the 2-layer MLP from previous video and make it deeper with a tree-like structure, arriving at a convolutional neural network architecture similar to the WaveNet (2016) from DeepMind. In the WaveNet paper, the same hierarchical architecture is implemented more efficiently using causal dilated convolutions (not yet covered). Along the way we get a better sense of torch.nn and what it is and how it works under the hood, and what a typical deep learning development process looks like (a lot of reading of documentation, keeping track of multidimensional tensor shapes, moving between jupyter notebooks and repository code, ...).
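For readers skimming this description: below is a minimal sketch of the kind of tree-like model the video builds toward, written with plain torch.nn modules. The layer sizes are illustrative and the BatchNorm layers from the lecture are omitted for brevity; see the notebook linked below for the exact code.

import torch
import torch.nn as nn

class FlattenConsecutive(nn.Module):
    # Merge every n consecutive time steps into the channel dimension:
    # (B, T, C) -> (B, T//n, C*n). This creates the tree-like fusion of
    # characters: pairs, then pairs of pairs, and so on.
    def __init__(self, n):
        super().__init__()
        self.n = n
    def forward(self, x):
        B, T, C = x.shape
        x = x.view(B, T // self.n, C * self.n)
        return x.squeeze(1) if x.shape[1] == 1 else x

vocab_size, n_embd, n_hidden, block_size = 27, 24, 128, 8  # illustrative sizes
model = nn.Sequential(
    nn.Embedding(vocab_size, n_embd),
    FlattenConsecutive(2), nn.Linear(2 * n_embd, n_hidden, bias=False), nn.Tanh(),
    FlattenConsecutive(2), nn.Linear(2 * n_hidden, n_hidden, bias=False), nn.Tanh(),
    FlattenConsecutive(2), nn.Linear(2 * n_hidden, n_hidden, bias=False), nn.Tanh(),
    nn.Linear(n_hidden, vocab_size),
)

x = torch.randint(0, vocab_size, (32, block_size))  # a batch of 8-character contexts
print(model(x).shape)  # torch.Size([32, 27]) -- logits for the next character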
Links:
- makemore on github: github.com/karpathy/makemore
- jupyter notebook I built in this video: github.com/karpathy/nn-zero-t...
- colab notebook: colab.research.google.com/dri...
- my website: karpathy.ai
- my twitter: / karpathy
- our Discord channel: / discord
Supplementary links:
- WaveNet 2016 from DeepMind arxiv.org/abs/1609.03499
- Bengio et al. 2003 MLP LM www.jmlr.org/papers/volume3/b...
Chapters:
intro
00:00:00 intro
00:01:40 starter code walkthrough
00:06:56 let’s fix the learning rate plot
00:09:16 pytorchifying our code: layers, containers, torch.nn, fun bugs
implementing wavenet
00:17:11 overview: WaveNet
00:19:33 dataset bump the context size to 8
00:19:55 re-running baseline code on block_size 8
00:21:36 implementing WaveNet
00:37:41 training the WaveNet: first pass
00:38:50 fixing batchnorm1d bug
00:45:21 re-training WaveNet with bug fix
00:46:07 scaling up our WaveNet
conclusions
00:46:58 experimental harness
00:47:44 WaveNet but with “dilated causal convolutions”
00:51:34 torch.nn
00:52:28 the development process of building deep neural nets
00:54:17 going forward
00:55:26 improve on my loss! how far can we improve a WaveNet on this data?

COMMENTS: 176
@khisrowhashimi · 1 year ago
I love how we are all so stressed and worried that Andrej might grow apathetic to his YouTube channel, so everyone wants to be extra supportive 😆 Really shows how awesome of a communicator he is.
@TL-fe9si · 1 year ago
I was literally thinking about that when I saw this comment
@jordankuzmanovik5297 · 7 months ago
Unfortunately he did it :(
@isaac10231 · 7 months ago
@jordankuzmanovik5297 Hopefully he comes back.
@crayc3 · 1 year ago
A notification for a new Andrej video guide feels like a new season of Game of Thrones just dropped at this point.
@nervoushero1391 · 1 year ago
As an independent deep learning undergrad student, your videos help me a lot. Thank you Andrej, never stop this series.
@anrilombard1121 · 1 year ago
We're on the same road!
@tanguyrenaudie1261 · 1 year ago
Love the series as well! Coding through all of it. Would love to get together with people to replicate deep learning papers, like Andrej does here, to learn faster and not by myself.
@raghavravishankar6262 · 1 year ago
@tanguyrenaudie1261 I'm in the same boat as well. Do you have a Discord or something where we can talk further?
@raghavravishankar6262 · 1 year ago
@Anri Lombard @ Nervous Hero
@Katatonya · 22 days ago
@raghavravishankar6262 Andrej does have a server; we could meet there and then start our own. My handle is vady. (with a dot) if anyone wants to add me, or ping me in Andrej's server.
@PollPoII · 1 year ago
This series is the most interesting resource for DL I've come across, being a junior ML engineer myself. To be able to watch such a knowledgeable domain expert as Andrej explaining everything in the most understandable ways is a real privilege. A million thanks for your time and effort; looking forward to the next one and hopefully many more.
@GlennGasner · 11 months ago
I really, really appreciate you putting in the work to create these lectures. I hope you can really feel the weight of the nearly hundred thousand humans who pushed through 12 hours of lectures on this because you've made it accessible. And that's just through now. These videos are such an incredible gift. Half of the views are me, because I needed to watch each one so many times in order to understand what's happening, because I started from so little. Also, it's super weird how different you are from other YouTubers and yet how likable you become as a human during this series. You are doing this right, and I appreciate it.
@timelapseguys4042 · 1 year ago
Andrej, thanks a lot for the video! Please don't stop the series. It's an honor to learn from you.
@maestbobo · 1 year ago
Best resource by far for this content. Please keep making more of these; I feel I'm learning a huge amount from each video.
@Zaphod42Beeblebrox · 1 year ago
I experimented a bit with the MLP with 1 hidden layer and managed to scale it up to your fancy hierarchical model. :) Here is what I got:
MLP (105k parameters): block_size = 10, emb_dim = 18, n_hidden = 500, lr = 0.1 (same learning rate decay as in the video), epochs = 200000, mini_batch = 32, lambd = 1 (added L2 regularization), seed = 42
Training error: 1.7801, Dev error: 1.9884, Test error: 1.9863 (I checked the test error only because I was worried that somehow I had overfitted the dev set)
Some examples generated from the model that I kinda liked: Angelise, Fantumrise, Bowin, Xian, Jaydan
@oklm2109 · 1 year ago
What's the formula to calculate the number of parameters of an MLP model?
@amgad_hasan · 11 months ago
@oklm2109 You just add up the trainable parameters of every layer. If the model contains only fully connected layers (aka Linear in PyTorch or Dense in TF), the number of parameters for each layer is:
n_weights = n_in * n_hidden_units
n_biases = n_hidden_units
n_params = n_weights + n_biases = (n_in + 1) * n_hidden_units
where n_in is the number of inputs to the layer (think of it as the number of outputs/hidden units of the previous layer). This formula is valid for Linear layers; other types of layers may have a different formula.
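For example, here is a quick way to sanity-check that formula in PyTorch (the layer sizes here are made up just for illustration):

import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(80, 200),  # (80 + 1) * 200 = 16,200 parameters
    nn.Tanh(),           # activations have no parameters
    nn.Linear(200, 27),  # (200 + 1) * 27 = 5,427 parameters
)
print(sum(p.numel() for p in mlp.parameters()))  # 21627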
@glebzarin2619 · 8 months ago
I'd say it is slightly unfair to compare models with different block sizes, because the block size influences not only the number of parameters but also the amount of information given as input.
@kshitijbanerjee6927 · 10 months ago
Hey Andrej! I hope you continue and give us the RNN, GRU & Transformer lectures as well! The ChatGPT one is great, but I feel like we missed the story in the middle and jumped ahead because of ChatGPT.
@SupeHero00 · 10 months ago
The ChatGPT lecture is the Transformer lecture. And regarding RNNs, I don't see why anyone would still use them...
@kshitijbanerjee6927 · 10 months ago
Transformers, yes. But it's not like anyone will build bigrams either; it's about learning concepts like BPTT from the roots.
@SupeHero00 · 10 months ago
@kshitijbanerjee6927 Bigrams and MLPs help you understand Transformers (which are the SOTA). Anyway, IMO it would be a waste of time creating a lecture on RNNs, but if the majority want it, then maybe he should do it. I don't care.
@kshitijbanerjee6927 · 10 months ago
Fully disagree that it's not useful. I think the concepts of how they came up with unrolling and BPTT, and the gates used to solve long-term memory problems, are invaluable for appreciating and understanding why Transformers are such a big deal.
@attilakun7850 · 4 months ago
@@SupeHero00 RNNs are coming back due to SSMs like Mamba.
@brittaruiters6309 · 1 year ago
I love this series so much :) it has profoundly deepened my understanding of neural networks and especially backpropagation. Thank you
@mipmap256 · 1 year ago
Can't wait for part 6! So clear and I can follow step by step. Thanks so much
@hintzod · 1 year ago
Thank you so much for these videos. I really enjoy these deep dives, things make so much more sense when you're hand coding all the functions and running through examples. It's less of a black box and more intuitive. I hope this comment will encourage you to keep this going!
@ephemer · 1 year ago
Thanks so much for this series, I feel like this is the most important skill I might ever learn and it’s never been more accessible than in your lectures. Thank you!
@aanchalagarwal6886 · 10 months ago
Thank you Andrej for creating this series. It has been very helpful. I just hope you get the time to continue with it.
@sakthigeek2458 · 1 month ago
Learned a lot of practical tips and theoretical knowledge of why we do what we do and also the history of how Deep Learning evolved. Thanks a lot for this series. Requesting you to continue the series.
@panagiotistseles1118 · 5 months ago
Totally amazed by the amount of good work you put in. You've helped a lot of people Andrej. Keep up the good work
@stanislawcronberg3271 · 1 year ago
My favorite way to start a Monday morning is to wake up to a new lecture in Andrej's masterclass :)
@sunderrajan6172 · 1 year ago
Beautifully explained as always - thanks. It shows how much passion you have to come up with these awesome videos. We are all blessed!
@Leon-yp9yw · 1 year ago
I was worried I was going to have to wait a couple of months for the next video as I finished part 4 just last week. Can't wait to get into this one, thanks a lot for this series Andrej
@nikitaandriievskyi3448 · 1 year ago
I just found your youtube channel, and this is just amazing, please do not stop doing these videos, they are incredible
@eustin · 1 year ago
Yes! I've been telling everyone about these videos. I've been checking every day whether you posted the next video. Thank you.
@stracci_5698 · 1 year ago
This is truly the best DL content out there. Most courses just focus on the theory but lack deep understanding.
@rajeshparekh · 3 months ago
Thank you so much for creating this video lecture series. Your passion for this topic comes through so vividly in your lectures. I learned so much from every lecture and especially appreciated how the lectures started from the foundational concepts and built up to the state-of-the-art techniques. Thank you!
@1knmd · 1 year ago
Every time a new video comes out it's like Christmas for me! Please don't stop doing this; best ML content out there.
@tecknowledger · 1 year ago
Thanks again Andrej! Love these videos! Dream come true to watch and learn these! Thanks for all you do to help people! Your helpfulness ripples throughout the world! Thanks again! lol
@NarendraBME · 3 months ago
So far THE BEST lecture series I have come across on YouTube. Alongside learning neural networks in this series, I have learned more PyTorch than by watching a 26-hour PyTorch video series from another YouTuber.
@cktse_jp · 2 months ago
Just wanna say thank you for sharing your experience -- love this from-scratch series starting from first principles!
@brianwhite9137 · 1 year ago
Very grateful for these. An early endearing moment was in the Spelled-Out Intro when you took a moment to find the missing parentheses for 'print.'
@flwi · 1 year ago
Great series! I really enjoy the progress and good explanations.
@yanazarov · 1 year ago
Absolutely awesome stuff Andrej. Thank you for doing this.
@timandersen8030 · 1 year ago
Thank you, Andrej! Looking forward to the rest of the series!
@timowidyanvolta · 9 months ago
Please continue, I really like this series. You are an awesome teacher!
@WarrenLacefield · 1 year ago
Enjoying these videos so much, both to refresh most of what I've forgotten about Python and to begin playing with PyTorch. The last time I did this stuff myself was with C# and CNTK. Now going back to rebuild and rerun old models and data (much faster, and even "better" results). Thank you.
@thanikhurshid7403 · 1 year ago
Andrej you are the absolute greatest. Keep making your videos. Anxiously waiting to implement Transformers with you
@AndrewOrtman · 1 year ago
When I saw the mean() trick at ~8:50 I let out an audible gasp! That was such a neat trick; going to use that one in the future.
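For anyone who wants to replay it, the trick (as I remember it) is just reshaping the per-step loss log into rows and averaging each row; a self-contained sketch, with synthetic noise standing in for the lecture's lossi list:

import torch
import matplotlib.pyplot as plt

# synthetic stand-in for the list of per-step training losses (lossi in the lecture)
lossi = (2.5 - 0.5 * torch.linspace(0, 1, 200_000) + 0.3 * torch.randn(200_000)).tolist()

# group every 1000 consecutive steps into a row, then average each row:
# 200,000 noisy points become 200 smooth ones
smoothed = torch.tensor(lossi).view(-1, 1000).mean(1)
plt.plot(smoothed)
plt.show()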
@VasudevaK · 1 year ago
Sir, it's a pleasure to learn from you! Thank you so much. I will meet you one day in person, just to thank you.
@AlienLogic775 · 1 year ago
Thanks so much Andrej! Hope to see a Part 6
@aurelienmontmejat1077 · 1 year ago
This is the best deep learning course I've followed! Even better than the one on Coursera. Thanks!
@ShinShanIV · 1 year ago
Thank you so much Andrej for the series, it helps me a lot. You are one of the reasons I was able to get into ML and build a career there. I admire your teaching skills! I didn't get why the sequence dim has to be part of the batch dimension, and I didn't hear Andrej talk about it explicitly, so here is my reasoning: the sequence dimension is an additional batch dimension because the output before batch norm is created by a linear layer with (32, 4, 20) @ (20, 68) + (68), which performs the matrix multiplication only along the last dimension (.., .., 20) and in parallel over the first two. So the matrix multiplication is performed 32 * 4 times with (20) @ (20, 68). Thus, it's the same as a (128, 20) @ (20, 68) calculation, where 32 * 4 = 128 is the batch dimension. So the sequence dimension is effectively treated as a batch dimension in the linear layer and must be treated that way in batch norm too. (Would be great if someone could confirm.)
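A small shape check of that reasoning (shapes taken from the comment above; just an illustration, not the lecture's exact code):

import torch

B, T, C_in, C_out = 32, 4, 20, 68
x = torch.randn(B, T, C_in)
W = torch.randn(C_in, C_out)
b = torch.randn(C_out)

# A linear layer only contracts the last dimension; (B, T) both act as batch dims.
y1 = x @ W + b                                           # shape (32, 4, 68)
y2 = (x.reshape(B * T, C_in) @ W + b).view(B, T, C_out)  # same numbers via an explicit (128, 20) matmul
print(torch.allclose(y1, y2))                            # True

# So batch norm on this 3D activation should average over dims (0, 1),
# keeping one mean/variance per channel:
print(y1.mean(dim=(0, 1), keepdim=True).shape)           # torch.Size([1, 1, 68])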
@ThemeParkTeslaCamping360 · 1 year ago
Incredible video, this helps a lot. Thank you for the videos; I especially loved your Stanford videos on machine learning from scratch, showing how you do it without any libraries like TensorFlow and PyTorch. Keep going and thank you for helping hungry learners like me!!! Cheers 🥂
@ishaanrajpal273 · 1 year ago
The best way for me to learn is to learn from one of the most experienced people in the field. Thanks for everything Andrej.
@art4eigen93 · 9 months ago
Please continue this series, Sir Andrej. You are the savior!
@milankordic · 1 year ago
Was looking forward to this one. Thanks, Andrej!
@EsdrasSoutoCosta · 1 year ago
Awesome! Well explained, and it's clear what's being done. Please keep making these fantastic videos!!!
@creatureOfnature1 · 1 year ago
Much appreciated, Andrej. Your tutorials are a gem!
@kaushik333ify · 8 months ago
Thank you so much for these lectures ! Can you please make a video on the “experimental harness” you mention at the end of the video? It would be super helpful and informative.
@kimiochang · 1 year ago
Finally completed this one. As always, thank you Andrej for your generosity! Next I will practice through all five parts again and learn how to accelerate the training process by using GPUs.
@kaenovama · 1 year ago
Thank you! Love the series! Helped me a lot with my learning experience with PyTorch
@meisherenow · 4 months ago
How cool is it that anyone with an internet connection has access to such a great teacher? (answer: very)
@4mb127 · 1 year ago
Thanks for continuing this fantastic series.
@ERRORfred2458 · 10 months ago
Andrej, thanks for all you do for us. You're the best.
@ayogheswaran9270 · 1 year ago
@Andrej thank you for making this. Please continue making such videos; it really helps beginners like me. If possible, could you please make a series on how actual development and production is done?
@Leo-sy4vu · 1 year ago
Thank you so much for the series. I recently started it and it's the best thing on all of YouTube. Keep it up!
@mellyb.1347 · 1 month ago
Loved this series. Would you please be willing to continue it so we get to work through the rest of CNN, RNN, and LSTM? Thanks!
@michaelmuller136 · 3 months ago
That was a great playlist, easy to understand and very helpful. Thank you very much!!
@utkarshsingh1663 · 1 year ago
Thanks Andrej, this course is awesome for building a foundation.
@polloramirez · 1 year ago
Great content, Andrej! Keep them coming!
@kemalware4912 · 1 year ago
Deliberate errors in just the right spots... Your lectures are great.
@veeramahendranathreddygang1086 · 1 year ago
Thank you Sir. Have been waiting for this.
@fatihveyselnurcin · 1 year ago
Thank you Andrej, hope to see you again soon
@mobkiller111 · 1 year ago
Thanks for the content & explanations Andrej and have a great time in Kyoto :)
@kindoblue · 1 year ago
Every video another solid pure gold bar
@thehazarika · 1 year ago
This is philanthropy! I love you man!
@pablofernandez2671 · 1 year ago
Andrej, we all love you. You're amazing!
@wholenutsanddonuts5741 · 1 year ago
Can't wait for this next step in the process!
@Abhishekkumar-qj6hb · 10 months ago
So I finished this lecture series. I was expecting RNN/LSTM/GRU, but they weren't there; still, I learnt a lot throughout and can definitely continue on my own. Thanks Andrej.
@aidanbraski · 3 months ago
great video, been learning a ton from you recently. thank you andrej!
@BlockDesignz · 1 year ago
Please keep these coming!
@vivekpandit7417 · 1 year ago
Been waiting for a while. Thank youuu!!
@duonga.nguyen7826 · 1 year ago
Keep up your great work!
@enchanted_swiftie · 9 months ago
The sentence that Andrej said at 49:26 made me realize something, something very deep. 🔥
@ivaninkorea · 2 months ago
Awesome series!
@repostcussion · 1 year ago
Amazing video! I'm absolutely loving the series and following along in my own notebooks :) I'm curious about the first-layer embedding and what kinds of alternatives there are. More information could be given by increasing the size of the embedding to the size of the vocab to make it a one-hot. I imagine there should be more alternatives beyond this, maybe something that can use the int32 char ints directly?
@DanteNoguez · 1 year ago
Thanks, Andrej, you're awesome!
@nickgannon7466 · 1 year ago
You're crushing it, thanks a bunch.
@Jack-vv7zb · 1 month ago
I love it when you say bye and then pop back up 😂😂😂😂
@reubenthomas1033 · 1 year ago
Awesome content!!
@fajarsuharyanto8871 · 1 year ago
I rarely finish an entire episode. Hey Andrej 👌
@lotfullahandishmand4973 · 1 year ago
Dear Andrej, your work is amazing; we are here to share and have a beautiful world all together, and you are doing that. If you could make a video about convolutional NNs, top ImageNet architectures, or anything deep related to vision, that would be great. Thank you!
@CarlosReyes-ku6ub · 1 year ago
Awesome, thank you so much
@joekharris · 1 year ago
I'm learning so much. I really appreciate the lucidity and simplicity of your approach. I do have a question. Why not initialize running_mean and running_var to None and then set them on the first batch? That would seem to be a better approach than to start them at zero and would be consistent with making them exponentially weighted moving averages - which they are except for the initialization at 0.0.
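For reference, a minimal sketch of the two initialization strategies being compared (an illustration of the idea, not the notebook's exact code):

import torch

class BatchNorm1d:
    # Minimal batchnorm sketch, focusing only on the running statistics.
    def __init__(self, dim, momentum=0.1, eps=1e-5, lazy_init=False):
        self.momentum, self.eps = momentum, eps
        self.gamma, self.beta = torch.ones(dim), torch.zeros(dim)
        # fixed-value init (as in the lecture) vs. the lazy init from the comment above
        self.running_mean = None if lazy_init else torch.zeros(dim)
        self.running_var = None if lazy_init else torch.ones(dim)

    def __call__(self, x):
        mean, var = x.mean(0, keepdim=True), x.var(0, keepdim=True)
        with torch.no_grad():
            if self.running_mean is None:
                # lazy variant: the first batch seeds the running statistics
                self.running_mean, self.running_var = mean, var
            else:
                # exponentially weighted moving average update
                self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
                self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        return self.gamma * (x - mean) / torch.sqrt(var + self.eps) + self.beta

bn = BatchNorm1d(68, lazy_init=True)
print(bn(torch.randn(32, 68)).shape)  # torch.Size([32, 68])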
@venkateshmunagala205 · 1 year ago
The AI devil is back. Thanks for the video @Andrej Karpathy.
@dimitristaufer · 1 year ago
Hi Andrej, thank you for taking the time to create these videos. In this video, for the first time, I'm having difficulties understanding what the model is actually learning. I've watched it twice and tried to understand the WaveNet paper, but that isn't really helping. Given an example input “emma“, the following character is supposed to be “.“, why is it beneficial to create a hidden layer to process “em“, “ma“, and then “emma“? Are we essentially encoding that given a 4 character word, IF the first two characters are “em“ it is likely that the 5th character is “.“, no matter what the third and fourth characters are? In other words, this implementation would probably assign a higher probability that “.“ is the fifth character after an unseen name, e.g. “emli“, simply because it starts with the bigram “em“? Thanks in advance, Dimitri.
@arielfayol7198 · 11 months ago
Please don't stop the series😢
@8eck · 1 year ago
Finally finished all the lectures, and I realized that I have a poor understanding of the math, and of dimensionality and the operations over it. Anyways, thank you for helping out with the rest of the concepts and practices; I now understand much better how backprop works, what it is doing, and what for.
@Ali-lm7uw · 1 year ago
Jon Krohn has a full playlist on algebra and calculus to go through before starting machine learning.
@Joker1531993 · 1 year ago
I am subscribing, Andrej, just to support someone from our country, Slovakia. Even though I don't understand anything from the video >D
@alekseizinchenko1171 · 1 year ago
Just in time ❤
@philipwoods6720 · 1 year ago
SO EXCITED TO SEE THIS POSTED LEEEEETS GOOOOOOOO
@amirkonjkav5374 · 1 year ago
Thanks for your videos. Is it possible to talk about NLP, especially about the background of it?
@aisolutions834 · 1 year ago
Hi Andrej, great content! Would you please go over the Transformer paper and its implementation?
@shouryamann7830 · 10 months ago
I've been using this step learning-rate schedule and consistently getting slightly better training and validation losses. For this one I got a 1.98 val loss: lr = 0.1 if i < 100000 else (0.01 if i < 150000 else 0.001)
@netanelmad · 6 months ago
Thank you very much.
@MrKonstantiniesta · 7 months ago
Hi, what is the next video to watch in the series Andrej announced at the end?
@Erosis · 1 year ago
NumPy / torch / TF tensor reshaping always feels like hand-wavy magic.
@simonkotchou9644 · 1 year ago
Thanks so much
@redthunder6183 · 8 months ago
With the batchnorm bug at around 46:00, why does it still work? If the batch norm is producing the wrong shape, why is there no error? Also, why does the network still learn almost perfectly when the batch norm is normalizing over the wrong dimension???
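One way to see why no error is raised (a shape-only sketch, not the notebook's code): the buggy statistics come out as (1, T, C) instead of (1, 1, C), but broadcasting still produces an output of the original shape, so the layer silently normalizes with separate statistics per position rather than per channel.

import torch

x = torch.randn(32, 4, 68)                 # (batch, positions, channels)
mean_buggy = x.mean(0, keepdim=True)       # (1, 4, 68) -- stats per (position, channel)
mean_fixed = x.mean((0, 1), keepdim=True)  # (1, 1, 68) -- stats per channel only
print((x - mean_buggy).shape, (x - mean_fixed).shape)  # both torch.Size([32, 4, 68])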
@nova2577 · 1 year ago
Could you also do some video related to wave2vec, as well as GPT series? Much appreciated!! Started to follow your online video lectures when you were at Stanford.
@colehoward5144 · 1 year ago
Great video! In your next video, would you be able to add a section where you show how to matrix-multiply n-dimensional tensors? I am a little confused about what the output/shape should be for something like (6, 3, 9, 9) @ (3, 9, 3).
@milosz7 · 7 months ago
Multiplying matrices with these shapes is not possible.
@colehoward5144 · 7 months ago
@milosz7 Yeah, it doesn't look like it at first, but they are compatible. It results in output shape (6, 3, 9, 3).
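A quick way to check this in PyTorch (just an illustration of matmul's broadcasting rules, not code from the video):

import torch

# the last two dims are matrix-multiplied: (9, 9) @ (9, 3) -> (9, 3);
# the leading batch dims are broadcast against each other: (6, 3) vs (3,) -> (6, 3)
a = torch.randn(6, 3, 9, 9)
b = torch.randn(3, 9, 3)
print((a @ b).shape)  # torch.Size([6, 3, 9, 3])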
@user-co6pu8zv3v · 1 year ago
Thank you, Andrej
@jackfrost7734 · 1 year ago
@AndrejKarpathy are you planning to introduce the topic of uncertainty estimation for NN models?