Attention is all you need (Transformer) - Model explanation (including math), Inference and Training

Views: 296,489

Umar Jamil

1 day ago

A complete explanation of all the layers of a Transformer Model: Multi-Head Self-Attention, Positional Encoding, including all the matrix multiplications and a complete description of the training and inference process.
Paper: Attention is all you need - arxiv.org/abs/1706.03762
Slides PDF: github.com/hkproj/transformer...
Chapters
00:00 - Intro
01:10 - RNN and their problems
08:04 - Transformer Model
09:02 - Maths background and notations
12:20 - Encoder (overview)
12:31 - Input Embeddings
15:04 - Positional Encoding
20:08 - Single Head Self-Attention
28:30 - Multi-Head Attention
35:39 - Query, Key, Value
37:55 - Layer Normalization
40:13 - Decoder (overview)
42:24 - Masked Multi-Head Attention
44:59 - Training
52:09 - Inference

COMMENTS: 538
@umarjamilai 11 months ago
Slides' PDF: github.com/hkproj/transformer-from-scratch-notes
@bhaskartripathi 4 months ago
I am not able to download the pdf file. My friends also tried. Will it be possible to put it on a downloadable link please? your content is too good and needs to be read again and again.
@mahek6110 3 months ago
@bhaskartripathi It's getting downloaded.
@hackie321 14 hours ago
The best Transformer explanation on internet till now and I have seen almost all of it. Kudos! You are a true teacher. I dare to compare you with Andrew NG. Please become a professor and not a corporate slave.
@kerrykilian9127 4 days ago
best explanation of the paper on the whole internet
@gabrielnsionu8583 5 months ago
This is arguably the best explanation of multi-head attention on the internet, hands down. Very thorough, and most important to folks like me who use the attention mechanism as the underpinning of a novel neural architecture to be applied to deep reinforcement learning. Sir, please never stop making this type of video.
@umarjamilai 5 months ago
You're welcome! 🤓
@csikel22 5 months ago
I couldn't agree more. Best video on transformers I have seen so far. It doesn't get clearer than this. It would be very interesting to give some insight into why this whole thing works and what other variations and alternative architectures exist.
@rkbshiva 5 months ago
@umarjamilai bro you're a legend!!!!
@pablofe123 1 month ago
There are still a couple of things that are not explained well in the video. Are the Q, K and V matrices the same matrix? And where do the parameter matrices Wq, Wk and Wv come from? Besides that, excellent video.
@peregudovoleg 22 days ago
@pablofe123 21:25 "QKV are the same matrices". As for the W matrices, he only says that they are "parameter matrices", and parameters are something we learn during the training process.
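For readers following this exchange, here is a minimal sketch of how the two points might fit together: the same input matrix plays the role of Q, K and V, and the Wq, Wk, Wv matrices are parameters. This is an illustration, not the code from the video. The sizes, the random initialization, and the helper names (`matmul`, `softmax_rows`, `self_attention`, `random_matrix`) are all assumptions made for this example; in a real model the W matrices would be updated by training rather than left random.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Apply a numerically stable softmax to every row."""
    result = []
    for row in m:
        largest = max(row)
        exps = [math.exp(v - largest) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention: the same input x feeds Q, K and V."""
    q = matmul(x, w_q)  # queries
    k = matmul(x, w_k)  # keys
    v = matmul(x, w_v)  # values
    d_k = len(q[0])
    # Scaled dot-product scores, one row per query position.
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)
    return matmul(weights, v), weights

def random_matrix(rows, cols):
    """Hypothetical initializer: parameters start random and would be trained."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
seq_len, d_model = 3, 4  # assumed toy sizes
x = random_matrix(seq_len, d_model)  # stands in for embeddings plus positions
w_q = random_matrix(d_model, d_model)
w_k = random_matrix(d_model, d_model)
w_v = random_matrix(d_model, d_model)

output, weights = self_attention(x, w_q, w_k, w_v)
print(len(output), len(output[0]))  # one output row per input token
```

Each row of `weights` sums to 1, so every output row is a weighted average of the value rows.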
@DembaDiop-om3gv 4 months ago
The best explanation of "Attention is all you need" from my point of view, guys "This explanation is all you need". Thank you very much
@tariqkhan1518 19 days ago
TBH The best Explanation of Attention in whole Internet.
@KunalTiwariBCI 7 days ago
Bro, legit the best explanation I have ever seen so far.
@sushantpenshanwar8038 6 months ago
You did the best job of describing the complicated details in a fluid manner. Sat, watched and took notes in one sitting. Hands down best one so far.
@utkarshashinde9167 1 month ago
I cannot tell you how grateful I am for this explanation provided by you .............. nowhere I find this detailed and easy-to-understand description, a go-to video for every interview preparing students
@Udayanverma 6 months ago
I understood it much more deeply with your explanation. The rest of the world scares people with diagrams and tables without explaining the practical implementation. Thank you, dear!
@calewang3713 7 months ago
Oh Man, you deserve a Turing Award.....
@_seeker423 2 months ago
The clearest explanation of a very important breakthrough paper that I have seen on YouTube. Thank you!
@_seeker423 2 months ago
One thing that I felt was missing is the logical explanation of what is the role of value vector (V).
@ajithshenoy5566 6 months ago
Bless you, Umar. One of the finest tutorials out there. Please don't ever stop. We're willing to support you in every way possible.
@abc-by1kb 9 months ago
Such a great video! Explained all the key concepts so clearly and precisely while giving very nice intuition!
@AvinashKumar-pb2op 16 days ago
Best Explanation Ever Existed in the whole Universe !!
@andreicristea997 7 months ago
Finally the fancy "black box" called transformer became more understandable for me. Really interested in the other content you are making. Thanks for the explanation.
@AIVidya 5 months ago
One of the best transformer videos I have encountered so far.
@NJCLM 3 months ago
This video is surely among the top 3 of the 50 videos that I watched to understand this subject. We are very grateful to you. Keep up the energy, the YouTube numbers will follow!
@marsupilami125 2 months ago
Can you tell me the other 2?🙏
@AbhinavSharma-dc3kv 1 month ago
the best explanation for attention architecture. kudos to you sir!
@silasnginyo7744 5 months ago
So far the best laid out presentation of Transformers I have ever walked through
@abhilashbalachandran7160 7 months ago
super useful. I really loved how you explain this with linear algebra. Very insightful. actually easier to understand than a lot of lectures at universities
@Patrick-wn6uj 1 month ago
This is the most important channel I have come across on YouTube. Keep creating these long-form videos, you are saving our lives in a huge way.
@brunogatti383 1 month ago
Best video for attention mechanism hands down
@sergewilsonmendy9051 10 months ago
Thank you man, this is the best transformer video I've seen. Well explained and very detailed.
@jdbrinton 5 months ago
the clearest description I've found to-date. bravo!
@albert4392 9 months ago
I really appreciate your talent for presenting knowledge. Nice explanation, thank you so much!
@vrvlbl 3 months ago
Amazing explanation. I struggled too long to understand the architecture until I landed on your video. Way to go!!
@Nereus22 5 months ago
This is really a great video, exactly what I was searching for! Everything that you mentioned was explained in detail (others skip a lot).
@lethnis9307 1 month ago
Finally, after a lot of articles and videos, I found a video I could understand. Thank you, sir. I am not strong in math, but I think I understood a lot with this explanation.
@haoming3430 1 month ago
Your video is very helpful and easy to follow. I have to say this is the best tutorial about transformer I've seen.
@tipu461 9 months ago
I really appreciate your efforts to make it understandable for us 👍. Thanks a lot.
@channel8048 10 months ago
This is very clear! Better than anything I have read up till now. Grazie!
@ltbd78 1 month ago
You are incredible. Please continue making these type of tutorials.
@ameyadesai6382 6 months ago
The best explanation on this paper, can't wait to see the other videos on this topic.
@ishaanjoshi6959 4 months ago
The best explanation of attention based mechanism I found online , thank you so much Umar for making this video.
@SagarVibhute 5 months ago
Kudos on the commendable work, and simplified explanation! I appreciate that you are also trying to explain the intuition behind each step and not just math. I'll view and re-view this a few times to understand more with successive passes. Thank you!
@dalilabdouraman3557 4 months ago
Definitely the best explanation of multi-head attention in the transformer... just awesome.
@hamzaomari7052 1 month ago
This is the best explanation. It took me 4 hours to take notes and revise, going through it with you word by word, with intuitions, and now I feel that I truly understand the transformer architecture and the mathematical intuition behind every detail. That is something you cannot find in any other video. Thank you so much, sir, this is very instructive and helpful.
@rkjellbe 6 months ago
Thank you, Umar. This was very helpful and I feel I have a much better understanding of the process now. Great work!
@JohnSmith-he5xg 7 months ago
The best overview I've seen. Great job!
@debjyotimukherjee8275 1 month ago
Excellent video gave a complete description with a great explanation. Looking forward to more such amazing content!
@aurelagbodoyetin3321 5 months ago
This is a masterclass. Thank you for your work
@profyao 1 month ago
Absolutely the best explanation for multi-head attention so far!
@1tahirrauf 8 months ago
Umar! You nailed it. Please make more videos. It was truly helpful. Thank you.
@mculabs 4 months ago
Probably the best explanation of the paper and the encoder and decoder sub layers. Kudos!!
@cristinaballesteros93 2 months ago
I have watched a lot of videos about transformers, and this is by far the best one. I finally understand how they work. Thank you so much!
@keithchua1723 2 months ago
Spent days trying to understand this and I wished I had come across this video first because now I understand everything fully. Immediately subscribed, keep it up!!
@yuk-hoiyiu7023 3 months ago
The only video that explains the difference between training and inference in the Transformer model!
@zeeshanmehdi3994 2 months ago
can't thank you enough, this is the best explanation of transformers i could find after trying for days to understand it. Thank you ❤
@70152136 4 months ago
Your presentation skills are simply amazing!!! Best video on transformers I've seen so far.
@juwanyirenda3457 5 months ago
Excellent exposition! Thank you Umar for the great work.
@saima6759 2 months ago
transformer model never got so clear to me! thank you Umar!
@lyte69 6 months ago
Thank you for your great explanation and effort, this was very informative and honestly there are no problems with the video, it's only a preference for me if there was some code alongside each part explained so it's even better understood, but I want you to know that this was a huge help thank you again. ❤
@anirudhjoshi1607 6 months ago
This is the clearest explanation on this paper I have ever heard. Always had doubts about Multi-Head attention and now finally I can visualise this 100%. Thanks a lot Umar Jamil.
@ddstar 3 months ago
Excellent. You answered a lot of questions I had about where the weights come from and how they were updated
@madhuvamsi7055 7 months ago
You've definitely earned a lifelong subscriber bro! Great video.
@brothachris 10 months ago
Excellent tutorial! Please keep up the great work.
@NazerkeSafina 1 month ago
This is brilliant. Thank you Umar for your hard work. Please keep new videos coming. You are helping immensely. May you live long and happy and healthy
@megatroneata9911 3 months ago
After watching this video and the stable diffusion video, I can say for sure that you are an amazing teacher. Extremely digestible content and easy to follow along.
@saravanannatarajan6515 2 months ago
One of the best videos I have seen on this topic. Thanks a lot for making it easy for us. Great effort, hats off!
@priyanjaligoel4294 3 months ago
omg! I love it. Finally so many answers to my questions. I had a very abstract version of the process in my head before but now it's much clearer. Thank you so much!
@shuchenwu170 2 months ago
This tutorial translates complex and terse structures into intuitions. A masterpiece of tutorials!
@TheFitsome 5 months ago
I've seen a TON of videos and articles on transformers, enough to say "This is Number 1"
@sedthh 9 months ago
Thank you, this was really helpful! One minor correction: the LayerNorm does not normalize to a 0-1 range rather it standardizes to 0 mean with unit variance.
@umarjamilai 8 months ago
You're right! Thanks for pointing out.
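To make the correction above concrete, here is a small sketch of layer normalization applied to one token's feature vector. It is a simplified illustration, not the implementation from the video or from any library: `gamma` and `beta` are scalars here, whereas real implementations typically learn one value per feature, and the function name `layer_norm` and the sample vector are assumptions for this example.

```python
import math

def layer_norm(features, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize one token's features to zero mean and unit variance.

    gamma and beta are assumed scalar scale and shift parameters;
    eps avoids division by zero when the variance is tiny.
    """
    mean = sum(features) / len(features)
    variance = sum((v - mean) ** 2 for v in features) / len(features)
    scale = math.sqrt(variance + eps)
    return [gamma * (v - mean) / scale + beta for v in features]

token = [1.0, 2.0, 3.0, 4.0]  # hypothetical feature vector for one token
normalized = layer_norm(token)
print(normalized)
```

The result has mean 0 and variance close to 1, and it contains values below 0 and above 1, which is why "normalizes to a 0-1 range" is not an accurate description.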
@BritskNguyen 2 months ago
this is the best lecture on transformer one can get, period.
@richeek10 2 months ago
Such a nice explanation with a soothing voice. Thanks so much!
@tgyawali 5 months ago
Thank you, so much for putting together such a detailed video. This helps technical people who do not have a lot of experience in research but have some background in machine learning to understand this very important and historic paper in AI.
@nirajdesai 1 month ago
Brilliant explanation of basics - thanks for putting this video together!
@jeffrey5602 7 months ago
This is pure gold. Thank you so much for your efforts
@danielvillalba4457 4 months ago
Lots of new insights about transformers technology, every document and video provides more details, great video sir!
@sujeethav9885 13 days ago
This is just perfect! A wholesome video on Transformers!
@ankitkacchap 26 days ago
Awesome explanation, even our professor doesn't explain it like you did. Thanks to the YouTube recommendation, and special thanks to you.
@sudzam 1 month ago
What a wonderful video with clear explanation! Thanks for making this and sharing with the community.
@gauravmalik3911 3 months ago
Detailed explanation, did great work on explaining difficult topic by dividing in chunks, I don't think any part is missed in explanation. Best Explanation
@skc909887u 7 months ago
This is the best explanation for an engineer, for sure. Love this.
@vincetran6321 8 months ago
Best explanation of transformer ive come across! Thanks so much :)
@atrijpaul4009 4 months ago
Best explanation of Attention throughout YouTube!!!!! Thank you sir for making this video and helping us.
@koko-wf8vz 6 months ago
Thank you so much for this video, hands down the best in-depth video I have seen. I love the graphical explanations, they help a math noob visualize matrices :) much love
@rajkrishnamurthy8474 7 months ago
Love it Umar. This is the best explanation of the paper. Thank you very much.
@Zineb-ru8bp 5 months ago
I was struggling trying to understand Transformers but you make it easy for me. Thank you so much
@steffenw7429 8 months ago
Great video and explanations! Many thanks 🙏
@shakibyazdani9276 4 months ago
best video on transformers I've seen so far
@nadyaabdel5559 3 months ago
Amazing explanation. First time every bit is super clear. Thank you.
@abdulmajid8731 4 months ago
It would be harsh not to rate this at the top. Absolutely the best explanation so far around the 'world'. Thanks Umar for your efforts. Keep up the good work.
@huseyngorbani6544 10 months ago
This video is hands down the best explanation I've come across so far! The level of detail provided is fantastic, but if there's one aspect I'd love to delve deeper into, it's the normalization part. It would be incredibly helpful if you could expand on that topic a bit more. Furthermore, I'm quite curious about the process of weight learning. With so many weights involved, such as those for Q, K, V, and the fully connected layer, as well as the weights in the decoder part, understanding how they are learned would be immensely valuable. If you have any recommended resources or links that explain this aspect, I would greatly appreciate it. Thanks again for the amazing content!
@umarjamilai 9 months ago
Hi @huseyngorbani6544, the process of learning the weights is determined exclusively by the back-propagation algorithm. Since it's a fundamental algorithm in machine learning, I will make a video on how it works and how to write an autograd system from scratch, so that anyone, even with little maths background, can understand it. As you know, making videos, especially when it's not your source of income, is very difficult. I try to make high-quality content for free, not only for my own personal pleasure in teaching, but especially to help others struggling to enter this magical world called AI. Have faith and I'll try to satisfy everyone's requests. Have a wonderful day with your family, friends, pets (and VS Code)!
@huseyngorbani6544 9 months ago
@umarjamilai Oh understood. Thank you.
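As a rough illustration of the answer above, the sketch below learns a single weight by gradient descent, with the gradient written out by hand using the chain rule. This is only the simplest possible case and not the autograd system the author mentions; back-propagation automates the same chain-rule computation across all the layers and weights of a network. The model `pred = w * x`, the data, the learning rate, and the function name `train` are assumptions made for this example.

```python
def train(xs, ys, learning_rate=0.05, steps=200):
    """Fit pred = w * x to the data by minimizing mean squared error."""
    w = 0.0  # the single learnable weight, starting from an arbitrary value
    for _ in range(steps):
        grad = 0.0
        for x, y in zip(xs, ys):
            pred = w * x
            # loss = (pred - y)^2, so by the chain rule
            # d(loss)/dw = 2 * (pred - y) * d(pred)/dw = 2 * (pred - y) * x
            grad += 2.0 * (pred - y) * x
        grad /= len(xs)
        # Move the weight a small step against the gradient.
        w -= learning_rate * grad
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2 * x, so w should approach 2
w = train(xs, ys)
print(w)
```

The weights for Q, K, V and the feed-forward layers are learned the same way in principle: compute the loss, propagate gradients back to each weight, and take a small step.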
@smartwakeAI 8 months ago
@umarjamilai Thanks for being such a genuine human being. Being extraordinarily smart and remaining humble at the same time is a difficult challenge that most highly intelligent people seem to fail. I am fairly new to AI and I loved your video! Thanks for making those videos! They are super helpful!
@alexanderlevakin9001 7 months ago
@umarjamilai you make it to help us and it works, thank you.
@bsuhaib 8 months ago
This is called decoding a transformer. What I really liked was explaining each chunk. That was really helpful for this topic and surely taught me the approach to decode any problem. Jazaakallah ul Khair
@ciliamadani3046 2 months ago
The best explanation I have ever watched, thank you
@srikanthvoleti5942 3 months ago
Superb video, the best explanation, I have been trying to understand transformers for a long time and this definitely helped me a lot
@ActualCode0 5 months ago
I like how u used examples and drew out the matrices to show what was going on in the attention block. It rly helped me understand the concept better
@oleksandrasaskia 2 months ago
Thank you SO MUCH for your humane, empathic explanation! This means a lot! Keep it up!
@adrianovr9735 1 month ago
Best explanation of Transformer, HANDS DOWN
@xue8888 11 months ago
Thank you man, you are amazing. Keep it up ❤ good luck, I have fingers crossed for your success
@jawadhaidar3931 7 months ago
Top-notch explanation, Thank you!
@noeloc 4 months ago
Great work, thanks for putting this together!!
@Stephanfreund 4 months ago
Awesome explanation for those who seek to truly understand the fundamentals of the most important paper of this decade
@wilsonzheng1198 10 months ago
Amazing explanation. Thank you!
@hugopristauz538 7 months ago
good job - your single stepping (with remarking) is really helpful
@user-pz5nn2kg2j 4 months ago
The clearest video explaining the Transformer that I have ever seen. Thanks very much for your efforts. I really appreciate your method of explaining every step with a concrete example and explicitly giving the shapes of every matrix involved. The shapes of the matrices in each step were the most confusing part for me in understanding Transformer models, and you made it so clear. Thanks a lot, Umar.
@umarjamilai 4 months ago
You're welcome! You can get in touch on LinkedIn.
@baabakasadi5440 2 months ago
Thanks for the beautiful explanation.
@bornabiljan1294 10 months ago
Excellent video! Thank you for making it.