Building makemore Part 5: Building a WaveNet

152,256 views

Andrej Karpathy

1 day ago

We take the 2-layer MLP from previous video and make it deeper with a tree-like structure, arriving at a convolutional neural network architecture similar to the WaveNet (2016) from DeepMind. In the WaveNet paper, the same hierarchical architecture is implemented more efficiently using causal dilated convolutions (not yet covered). Along the way we get a better sense of torch.nn and what it is and how it works under the hood, and what a typical deep learning development process looks like (a lot of reading of documentation, keeping track of multidimensional tensor shapes, moving between jupyter notebooks and repository code, ...).
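For readers skimming this description: below is a minimal sketch of the kind of tree-like model the video builds toward, written with plain torch.nn modules. The layer sizes are illustrative and the BatchNorm layers from the lecture are omitted for brevity; see the notebook linked below for the exact code.

import torch
import torch.nn as nn

class FlattenConsecutive(nn.Module):
    # Merge every n consecutive time steps into the channel dimension:
    # (B, T, C) -> (B, T//n, C*n). This creates the tree-like fusion of
    # characters: pairs, then pairs of pairs, and so on.
    def __init__(self, n):
        super().__init__()
        self.n = n
    def forward(self, x):
        B, T, C = x.shape
        x = x.view(B, T // self.n, C * self.n)
        return x.squeeze(1) if x.shape[1] == 1 else x

vocab_size, n_embd, n_hidden, block_size = 27, 24, 128, 8  # illustrative sizes
model = nn.Sequential(
    nn.Embedding(vocab_size, n_embd),
    FlattenConsecutive(2), nn.Linear(2 * n_embd, n_hidden, bias=False), nn.Tanh(),
    FlattenConsecutive(2), nn.Linear(2 * n_hidden, n_hidden, bias=False), nn.Tanh(),
    FlattenConsecutive(2), nn.Linear(2 * n_hidden, n_hidden, bias=False), nn.Tanh(),
    nn.Linear(n_hidden, vocab_size),
)

x = torch.randint(0, vocab_size, (32, block_size))  # a batch of 8-character contexts
print(model(x).shape)  # torch.Size([32, 27]) -- logits for the next character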
Links:
- makemore on github: github.com/karpathy/makemore
- jupyter notebook I built in this video: github.com/karpathy/nn-zero-t...
- colab notebook: colab.research.google.com/dri...
- my website: karpathy.ai
- my twitter: / karpathy
- our Discord channel: / discord
Supplementary links:
- WaveNet 2016 from DeepMind arxiv.org/abs/1609.03499
- Bengio et al. 2003 MLP LM www.jmlr.org/papers/volume3/b...
Chapters:
intro
00:00:00 intro
00:01:40 starter code walkthrough
00:06:56 let’s fix the learning rate plot
00:09:16 pytorchifying our code: layers, containers, torch.nn, fun bugs
implementing wavenet
00:17:11 overview: WaveNet
00:19:33 dataset bump the context size to 8
00:19:55 re-running baseline code on block_size 8
00:21:36 implementing WaveNet
00:37:41 training the WaveNet: first pass
00:38:50 fixing batchnorm1d bug
00:45:21 re-training WaveNet with bug fix
00:46:07 scaling up our WaveNet
conclusions
00:46:58 experimental harness
00:47:44 WaveNet but with “dilated causal convolutions”
00:51:34 torch.nn
00:52:28 the development process of building deep neural nets
00:54:17 going forward
00:55:26 improve on my loss! how far can we improve a WaveNet on this data?

COMMENTS: 176
@khisrowhashimi · 1 year ago
I love how we are all so stressed and worried that Andrej might grow apathetic to his YouTube channel, so everyone wants to be extra supportive 😆 Really shows how awesome of a communicator he is.
@TL-fe9si · 1 year ago
I was literally thinking about that when I saw this comment
@jordankuzmanovik5297 · 7 months ago
Unfortunately he did it :(
@isaac10231 · 7 months ago
@jordankuzmanovik5297 Hopefully he comes back.
@crayc3 · 1 year ago
A notification for a new Andrej video guide feels like a new season of Game of Thrones just dropped at this point.
@nervoushero1391 · 1 year ago
As an independent deep learning undergrad student, your videos help me a lot. Thank you Andrej, never stop this series.
@anrilombard1121 · 1 year ago
We're on the same road!
@tanguyrenaudie1261 · 1 year ago
Love the series as well! Coding through all of it. Would love to get together with people to replicate deep learning papers, like Andrej does here, to learn faster and not by myself.
@raghavravishankar6262 · 1 year ago
@tanguyrenaudie1261 I'm in the same boat as well. Do you have a Discord or something where we can talk further?
@raghavravishankar6262 · 1 year ago
@Anri Lombard @ Nervous Hero
@Katatonya · 22 days ago
@raghavravishankar6262 Andrej does have a server; we could meet there and then start our own. My handle is vady. (with a dot) if anyone wants to add me, or ping me in Andrej's server.
@PollPoII · 1 year ago
This series is the most interesting resource for DL I've come across, being a junior ML engineer myself. To be able to watch such a knowledgeable domain expert as Andrej explaining everything in the most understandable ways is a real privilege. A million thanks for your time and effort; looking forward to the next one and hopefully many more.
@GlennGasner · 11 months ago
I really, really appreciate you putting in the work to create these lectures. I hope you can really feel the weight of the nearly hundred thousand humans who pushed through 12 hours of lectures on this because you've made it accessible. And that's just through now. These videos are such an incredible gift. Half of the views are me, because I needed to watch each one so many times in order to understand what's happening, because I started from so little. Also, it's super weird how different you are from other YouTubers and yet how likable you become as a human during this series. You are doing this right, and I appreciate it.
@timelapseguys4042 · 1 year ago
Andrej, thanks a lot for the video! Please don't stop the series. It's an honor to learn from you.
@maestbobo · 1 year ago
Best resource by far for this content. Please keep making more of these; I feel I'm learning a huge amount from each video.
@Zaphod42Beeblebrox · 1 year ago
I experimented a bit with the MLP with 1 hidden layer and managed to scale it up to your fancy hierarchical model. :) Here is what I got:
MLP (105k parameters): block_size = 10, emb_dim = 18, n_hidden = 500, lr = 0.1 (same learning rate decay as in the video), epochs = 200000, mini_batch = 32, lambd = 1 (added L2 regularization), seed = 42
Training error: 1.7801, Dev error: 1.9884, Test error: 1.9863 (I checked the test error only because I was worried that somehow I had overfitted the dev set)
Some examples generated from the model that I kinda liked: Angelise, Fantumrise, Bowin, Xian, Jaydan
@oklm2109 · 1 year ago
What's the formula to calculate the number of parameters of an MLP model?
@amgad_hasan · 11 months ago
@oklm2109 You just add up the trainable parameters of every layer. If the model contains only fully connected layers (aka Linear in PyTorch or Dense in TF), the number of parameters for each layer is:
n_weights = n_in * n_hidden_units
n_biases = n_hidden_units
n_params = n_weights + n_biases = (n_in + 1) * n_hidden_units
where n_in is the number of inputs to the layer (think of it as the number of outputs/hidden units of the previous layer). This formula is valid for Linear layers; other types of layers may have a different formula.
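For example, here is a quick way to sanity-check that formula in PyTorch (the layer sizes here are made up just for illustration):

import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(80, 200),  # (80 + 1) * 200 = 16,200 parameters
    nn.Tanh(),           # activations have no parameters
    nn.Linear(200, 27),  # (200 + 1) * 27 = 5,427 parameters
)
print(sum(p.numel() for p in mlp.parameters()))  # 21627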
@glebzarin2619 · 8 months ago
I'd say it is slightly unfair to compare models with different block sizes, because the block size influences not only the number of parameters but also the amount of information given as input.
@kshitijbanerjee6927 · 10 months ago
Hey Andrej! I hope you continue and give us the RNN, GRU & Transformer lectures as well! The ChatGPT one is great, but I feel like we missed the story in the middle and jumped ahead because of ChatGPT.
@SupeHero00 · 10 months ago
The ChatGPT lecture is the Transformer lecture. And regarding RNNs, I don't see why anyone would still use them...
@kshitijbanerjee6927 · 10 months ago
Transformers, yes. But it's not like anyone will build bigrams either; it's about learning concepts like BPTT from the roots.
@SupeHero00 · 10 months ago
@kshitijbanerjee6927 Bigrams and MLPs help you understand Transformers (which are the SOTA). Anyway, IMO it would be a waste of time creating a lecture on RNNs, but if the majority want it, then maybe he should do it. I don't care.
@kshitijbanerjee6927 · 10 months ago
Fully disagree that it's not useful. I think the concepts of how they came up with unrolling and BPTT, and the gates used to solve long-term memory problems, are invaluable for appreciating and understanding why Transformers are such a big deal.
@attilakun7850 · 4 months ago
@@SupeHero00 RNNs are coming back due to SSMs like Mamba.
@brittaruiters6309 · 1 year ago
I love this series so much :) it has profoundly deepened my understanding of neural networks and especially backpropagation. Thank you
@mipmap256 · 1 year ago
Can't wait for part 6! So clear and I can follow step by step. Thanks so much
@hintzod · 1 year ago
Thank you so much for these videos. I really enjoy these deep dives, things make so much more sense when you're hand coding all the functions and running through examples. It's less of a black box and more intuitive. I hope this comment will encourage you to keep this going!
@ephemer · 1 year ago
Thanks so much for this series, I feel like this is the most important skill I might ever learn and it’s never been more accessible than in your lectures. Thank you!
@aanchalagarwal6886 · 10 months ago
Thank you Andrej for creating this series. It has been very helpful. I just hope you get the time to continue with it.
@sakthigeek2458 · 1 month ago
Learned a lot of practical tips and theoretical knowledge of why we do what we do and also the history of how Deep Learning evolved. Thanks a lot for this series. Requesting you to continue the series.
@panagiotistseles1118 · 5 months ago
Totally amazed by the amount of good work you put in. You've helped a lot of people Andrej. Keep up the good work
@stanislawcronberg3271 · 1 year ago
My favorite way to start a Monday morning is to wake up to a new lecture in Andrej's masterclass :)
@sunderrajan6172 · 1 year ago
Beautifully explained as always - thanks. It shows how much passion you have to come up with these awesome videos. We are all blessed!
@Leon-yp9yw · 1 year ago
I was worried I was going to have to wait a couple of months for the next video as I finished part 4 just last week. Can't wait to get into this one, thanks a lot for this series Andrej
@nikitaandriievskyi3448 · 1 year ago
I just found your youtube channel, and this is just amazing, please do not stop doing these videos, they are incredible
@eustin · 1 year ago
Yes! I've been telling everyone about these videos. I've been checking every day whether you posted the next video. Thank you.
@stracci_5698 · 1 year ago
This is truly the best DL content out there. Most courses just focus on the theory but lack deep understanding.
@rajeshparekh · 3 months ago
Thank you so much for creating this video lecture series. Your passion for this topic comes through so vividly in your lectures. I learned so much from every lecture and especially appreciated how the lectures started from the foundational concepts and built up to the state-of-the-art techniques. Thank you!
@1knmd · 1 year ago
Every time a new video comes out it's like Christmas for me! Please don't stop doing this; best ML content out there.
@tecknowledger · 1 year ago
Thanks again Andrej! Love these videos! Dream come true to watch and learn these! Thanks for all you do to help people! Your helpfulness ripples throughout the world! Thanks again! lol
@NarendraBME · 3 months ago
So far THE BEST lecture series I have come across on YouTube. Alongside learning neural networks in this series, I have learned more PyTorch than by watching a 26-hour PyTorch video series from another YouTuber.
@cktse_jp · 2 months ago
Just wanna say thank you for sharing your experience -- love this from-scratch series starting from first principles!
@brianwhite9137 · 1 year ago
Very grateful for these. An early endearing moment was in the Spelled-Out Intro when you took a moment to find the missing parentheses for 'print.'
@flwi · 1 year ago
Great series! I really enjoy the progress and good explanations.
@yanazarov · 1 year ago
Absolutely awesome stuff Andrej. Thank you for doing this.
@timandersen8030 · 1 year ago
Thank you, Andrej! Looking forward to the rest of the series!
@timowidyanvolta · 9 months ago
Please continue, I really like this series. You are an awesome teacher!
@WarrenLacefield · 1 year ago
Enjoying these videos so much, both to refresh most of what I've forgotten about Python and to begin playing with PyTorch. The last time I did this stuff myself was with C# and CNTK. Now going back to rebuild and rerun old models and data (much faster, and even "better" results). Thank you.
@thanikhurshid7403 · 1 year ago
Andrej you are the absolute greatest. Keep making your videos. Anxiously waiting to implement Transformers with you
@AndrewOrtman · 1 year ago
When I saw the mean() trick at ~8:50 I let out an audible gasp! That was such a neat trick; going to use that one in the future.
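For anyone who wants to replay it, the trick (as I remember it) is just reshaping the per-step loss log into rows and averaging each row; a self-contained sketch, with synthetic noise standing in for the lecture's lossi list:

import torch
import matplotlib.pyplot as plt

# synthetic stand-in for the list of per-step training losses (lossi in the lecture)
lossi = (2.5 - 0.5 * torch.linspace(0, 1, 200_000) + 0.3 * torch.randn(200_000)).tolist()

# group every 1000 consecutive steps into a row, then average each row:
# 200,000 noisy points become 200 smooth ones
smoothed = torch.tensor(lossi).view(-1, 1000).mean(1)
plt.plot(smoothed)
plt.show()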
@VasudevaK · 1 year ago
Sir, it's a pleasure to learn from you! Thank you so much. I will meet you one day in person, just to thank you.
@AlienLogic775 · 1 year ago
Thanks so much Andrej! Hope to see a Part 6
@aurelienmontmejat1077 · 1 year ago
This is the best deep learning course I've followed! Even better than the one on Coursera. Thanks!
@ShinShanIV · 1 year ago
Thank you so much Andrej for the series, it helps me a lot. You are one of the reasons I was able to get into ML and build a career there. I admire your teaching skills! I didn't get why the sequence dim has to be part of the batch dimension, and I didn't hear Andrej talk about it explicitly, so here is my reasoning: the sequence dimension is an additional batch dimension because the output before batch norm is created by a linear layer with (32, 4, 20) @ (20, 68) + (68), which performs the matrix multiplication only along the last dimension (.., .., 20) and in parallel over the first two. So the matrix multiplication is performed 32 * 4 times with (20) @ (20, 68). Thus, it's the same as a (128, 20) @ (20, 68) calculation, where 32 * 4 = 128 is the batch dimension. So the sequence dimension is effectively treated as a batch dimension in the linear layer and must be treated that way in batch norm too. (Would be great if someone could confirm.)
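A small shape check of that reasoning (shapes taken from the comment above; just an illustration, not the lecture's exact code):

import torch

B, T, C_in, C_out = 32, 4, 20, 68
x = torch.randn(B, T, C_in)
W = torch.randn(C_in, C_out)
b = torch.randn(C_out)

# A linear layer only contracts the last dimension; (B, T) both act as batch dims.
y1 = x @ W + b                                           # shape (32, 4, 68)
y2 = (x.reshape(B * T, C_in) @ W + b).view(B, T, C_out)  # same numbers via an explicit (128, 20) matmul
print(torch.allclose(y1, y2))                            # True

# So batch norm on this 3D activation should average over dims (0, 1),
# keeping one mean/variance per channel:
print(y1.mean(dim=(0, 1), keepdim=True).shape)           # torch.Size([1, 1, 68])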
@ThemeParkTeslaCamping360 · 1 year ago
Incredible video, this helps a lot. Thank you for the videos; I especially loved your Stanford videos on machine learning from scratch, showing how you do it without any libraries like TensorFlow and PyTorch. Keep going and thank you for helping hungry learners like me!!! Cheers 🥂
@ishaanrajpal273 · 1 year ago
The best way for me to learn is to learn from one of the most experienced people in the field. Thanks for everything Andrej.
@art4eigen93 · 9 months ago
Please continue this series, Sir Andrej. You are the savior!
@milankordic · 1 year ago
Was looking forward to this one. Thanks, Andrej!
@EsdrasSoutoCosta · 1 year ago
Awesome! Well explained, and it's clear what's being done. Please keep making these fantastic videos!!!
@creatureOfnature1 · 1 year ago
Much appreciated, Andrej. Your tutorials are a gem!
@kaushik333ify · 8 months ago
Thank you so much for these lectures ! Can you please make a video on the “experimental harness” you mention at the end of the video? It would be super helpful and informative.
@kimiochang · 1 year ago
Finally completed this one. As always, thank you Andrej for your generosity! Next I will practice through all five parts again and learn how to accelerate the training process by using GPUs.
@kaenovama · 1 year ago
Thank you! Love the series! Helped me a lot with my learning experience with PyTorch
@meisherenow · 4 months ago
How cool is it that anyone with an internet connection has access to such a great teacher? (answer: very)
@4mb127 · 1 year ago
Thanks for continuing this fantastic series.
@ERRORfred2458 · 10 months ago
Andrej, thanks for all you do for us. You're the best.
@ayogheswaran9270 · 1 year ago
@Andrej thank you for making this. Please continue making such videos; it really helps beginners like me. If possible, could you please make a series on how actual development and production is done?
@Leo-sy4vu · 1 year ago
Thank you so much for the series. I recently started it and it's the best thing on all of YouTube. Keep it up!
@mellyb.1347 · 1 month ago
Loved this series. Would you please be willing to continue it so we get to work through the rest of CNN, RNN, and LSTM? Thanks!
@michaelmuller136 · 3 months ago
That was a great playlist, easy to understand and very helpful. Thank you very much!!
@utkarshsingh1663 · 1 year ago
Thanks Andrej, this course is awesome for building a foundation.
@polloramirez · 1 year ago
Great content, Andrej! Keep them coming!
@kemalware4912 · 1 year ago
Deliberate errors in just the right spots... Your lectures are great.
@veeramahendranathreddygang1086 · 1 year ago
Thank you Sir. Have been waiting for this.
@fatihveyselnurcin · 1 year ago
Thank you Andrej, hope to see you again soon
@mobkiller111 · 1 year ago
Thanks for the content & explanations Andrej and have a great time in Kyoto :)
@kindoblue · 1 year ago
Every video another solid pure gold bar
@thehazarika · 1 year ago
This is philanthropy! I love you man!
@pablofernandez2671 · 1 year ago
Andrej, we all love you. You're amazing!
@wholenutsanddonuts5741 · 1 year ago
Can't wait for this next step in the process!
@Abhishekkumar-qj6hb · 10 months ago
So I finished this lecture series. I was expecting RNN/LSTM/GRU, but they weren't there; still, I learnt a lot throughout and can definitely continue on my own. Thanks Andrej.
@aidanbraski · 3 months ago
great video, been learning a ton from you recently. thank you andrej!
@BlockDesignz · 1 year ago
Please keep these coming!
@vivekpandit7417 · 1 year ago
Been waiting for a while. Thank youuu!!
@duonga.nguyen7826 · 1 year ago
Keep up your great work!
@enchanted_swiftie · 9 months ago
The sentence that Andrej said at 49:26 made me realize something, something very deep. 🔥
@ivaninkorea · 2 months ago
Awesome series!
@repostcussion · 1 year ago
Amazing video! I'm absolutely loving the series and following along in my own notebooks :) I'm curious about the first-layer embedding and what kinds of alternatives there are. More information could be given by increasing the size of the embedding to the size of the vocab to make it a one-hot. I imagine there should be more alternatives beyond this, maybe something that can use the int32 char ints directly?
@DanteNoguez · 1 year ago
Thanks, Andrej, you're awesome!
@nickgannon7466 · 1 year ago
You're crushing it, thanks a bunch.
@Jack-vv7zb · 1 month ago
I love it when you say bye and then pop back up 😂😂😂😂
@reubenthomas1033 · 1 year ago
Awesome content!!
@fajarsuharyanto8871 · 1 year ago
I rarely finish an entire episode. Hey Andrej 👌
@lotfullahandishmand4973 · 1 year ago
Dear Andrej, your work is amazing; we are here to share and have a beautiful world all together, and you are doing that. If you could make a video about convolutional NNs, top ImageNet architectures, or anything deep related to vision, that would be great. Thank you!
@CarlosReyes-ku6ub · 1 year ago
Awesome, thank you so much
@joekharris · 1 year ago
I'm learning so much. I really appreciate the lucidity and simplicity of your approach. I do have a question. Why not initialize running_mean and running_var to None and then set them on the first batch? That would seem to be a better approach than to start them at zero and would be consistent with making them exponentially weighted moving averages - which they are except for the initialization at 0.0.
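For reference, a minimal sketch of the two initialization strategies being compared (an illustration of the idea, not the notebook's exact code):

import torch

class BatchNorm1d:
    # Minimal batchnorm sketch, focusing only on the running statistics.
    def __init__(self, dim, momentum=0.1, eps=1e-5, lazy_init=False):
        self.momentum, self.eps = momentum, eps
        self.gamma, self.beta = torch.ones(dim), torch.zeros(dim)
        # fixed-value init (as in the lecture) vs. the lazy init from the comment above
        self.running_mean = None if lazy_init else torch.zeros(dim)
        self.running_var = None if lazy_init else torch.ones(dim)

    def __call__(self, x):
        mean, var = x.mean(0, keepdim=True), x.var(0, keepdim=True)
        with torch.no_grad():
            if self.running_mean is None:
                # lazy variant: the first batch seeds the running statistics
                self.running_mean, self.running_var = mean, var
            else:
                # exponentially weighted moving average update
                self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
                self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        return self.gamma * (x - mean) / torch.sqrt(var + self.eps) + self.beta

bn = BatchNorm1d(68, lazy_init=True)
print(bn(torch.randn(32, 68)).shape)  # torch.Size([32, 68])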
@venkateshmunagala205 · 1 year ago
The AI devil is back. Thanks for the video @Andrej Karpathy.
@dimitristaufer · 1 year ago
Hi Andrej, thank you for taking the time to create these videos. In this video, for the first time, I'm having difficulties understanding what the model is actually learning. I've watched it twice and tried to understand the WaveNet paper, but that isn't really helping. Given an example input “emma“, the following character is supposed to be “.“, why is it beneficial to create a hidden layer to process “em“, “ma“, and then “emma“? Are we essentially encoding that given a 4 character word, IF the first two characters are “em“ it is likely that the 5th character is “.“, no matter what the third and fourth characters are? In other words, this implementation would probably assign a higher probability that “.“ is the fifth character after an unseen name, e.g. “emli“, simply because it starts with the bigram “em“? Thanks in advance, Dimitri.
@arielfayol7198 · 11 months ago
Please don't stop the series😢
@8eck · 1 year ago
Finally finished all the lectures, and I realized that I have a poor understanding of the math, and of dimensionality and the operations over it. Anyways, thank you for helping out with the rest of the concepts and practices; I now understand much better how backprop works, what it is doing, and what for.
@Ali-lm7uw · 1 year ago
Jon Krohn has a full playlist on algebra and calculus to go through before starting machine learning.
@Joker1531993 · 1 year ago
I am subscribing, Andrej, just to support someone from our country, Slovakia. Even though I don't understand anything from the video >D
@alekseizinchenko1171 · 1 year ago
Just in time ❤
@philipwoods6720 · 1 year ago
SO EXCITED TO SEE THIS POSTED LEEEEETS GOOOOOOOO
@amirkonjkav5374 · 1 year ago
Thanks for your videos. Is it possible to talk about NLP, especially about the background of it?
@aisolutions834 · 1 year ago
Hi Andrej, great content! Would you please go over the Transformer paper and its implementation?
@shouryamann7830 · 10 months ago
I've been using this step learning-rate schedule and consistently getting slightly better training and validation losses. For this one I got a 1.98 val loss: lr = 0.1 if i < 100000 else (0.01 if i < 150000 else 0.001)
@netanelmad · 6 months ago
Thank you very much.
@MrKonstantiniesta · 7 months ago
Hi, what is the next video to watch in the series Andrej announced at the end?
@Erosis · 1 year ago
NumPy / torch / TF tensor reshaping always feels like hand-wavy magic.
@simonkotchou9644 · 1 year ago
Thanks so much
@redthunder6183 · 8 months ago
With the batchnorm bug at around 46:00, why does it still work? If the batch norm is producing the wrong shape, why is there no error? Also, why does the network still learn almost perfectly when the batch norm is normalizing over the wrong dimension???
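One way to see why no error is raised (a shape-only sketch, not the notebook's code): the buggy statistics come out as (1, T, C) instead of (1, 1, C), but broadcasting still produces an output of the original shape, so the layer silently normalizes with separate statistics per position rather than per channel.

import torch

x = torch.randn(32, 4, 68)                 # (batch, positions, channels)
mean_buggy = x.mean(0, keepdim=True)       # (1, 4, 68) -- stats per (position, channel)
mean_fixed = x.mean((0, 1), keepdim=True)  # (1, 1, 68) -- stats per channel only
print((x - mean_buggy).shape, (x - mean_fixed).shape)  # both torch.Size([32, 4, 68])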
@nova2577 · 1 year ago
Could you also do some video related to wave2vec, as well as GPT series? Much appreciated!! Started to follow your online video lectures when you were at Stanford.
@colehoward5144 · 1 year ago
Great video! In your next video, would you be able to add a section where you show how to matrix-multiply n-dimensional tensors? I am a little confused about what the output/shape should be for something like (6, 3, 9, 9) @ (3, 9, 3).
@milosz7 · 7 months ago
Multiplying matrices with these shapes is not possible.
@colehoward5144 · 7 months ago
@milosz7 Yeah, it doesn't look like it at first, but they are compatible. It results in output shape (6, 3, 9, 3).
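A quick way to check this in PyTorch (just an illustration of matmul's broadcasting rules, not code from the video):

import torch

# the last two dims are matrix-multiplied: (9, 9) @ (9, 3) -> (9, 3);
# the leading batch dims are broadcast against each other: (6, 3) vs (3,) -> (6, 3)
a = torch.randn(6, 3, 9, 9)
b = torch.randn(3, 9, 3)
print((a @ b).shape)  # torch.Size([6, 3, 9, 3])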
@user-co6pu8zv3v · 1 year ago
Thank you, Andrej
@jackfrost7734 · 1 year ago
@AndrejKarpathy are you planning to introduce the topic of uncertainty estimation for NN models?