Visual Guide to Transformer Neural Networks - (Episode 2) Multi-Head & Self-Attention

156,706 views

Hedu AI by Batool Haider

1 day ago

Visual Guide to Transformer Neural Networks (Series) - Step by Step Intuitive Explanation
Episode 0 - [OPTIONAL] The Neuroscience of “Attention”
Episode 1 - Position Embeddings
Episode 2 - Multi-Head & Self-Attention
Episode 3 - Decoder’s Masked Attention
This video series explains the math, as well as the intuition, behind the Transformer Neural Networks first introduced in the “Attention Is All You Need” paper.
--------------------------------------------------------------
References and Other Great Resources
--------------------------------------------------------------
Attention is All You Need
arxiv.org/abs/1706.03762
Jay Alammar - The Illustrated Transformer
jalammar.github.io/illustrated...
The A.I Hacker - Illustrated Guide to Transformers Neural Networks: A step by step explanation
jalammar.github.io/illustrated...
Amirhossein Kazemnejad Blog Post - Transformer Architecture: The Positional Encoding
kazemnejad.com/blog/transform...
Yannic Kilcher YouTube Video - Attention is All You Need
www.youtube.com/watch?v=iDulh...

COMMENTS: 606
@HeduAI · 3 years ago
*CORRECTIONS* A big shoutout to the following awesome viewers for these 2 corrections:
1. @Henry Wang and @Holger Urbanek - At (10:28), "dk" is actually the hidden dimension of the Key matrix and not the sequence length. In the original paper (Attention Is All You Need), it is taken to be 512.
2. @JU PING NG - The result of the concatenation at (14:58) is supposed to be 7 x 9 instead of 21 x 3 (that is, the concatenation of the z matrices happens horizontally, not vertically). With this we can apply nn.Linear(9, 5) to get the final 7 x 5 shape.
Here are the timestamps associated with the concepts covered in this video:
0:00 - Recaps of Part 0 and 1
0:56 - Difference between Simple and Self-Attention
3:11 - Multi-Head Attention Layer - Query, Key and Value matrices
11:44 - Intuition for Multi-Head Attention Layer with Examples
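The corrected shapes are easy to check numerically. Below is a minimal NumPy sketch (an illustration, not the video's code): the 7 x 3 per-head outputs and 3 heads follow the example discussed above, and a random 9 x 5 matrix stands in for the nn.Linear(9, 5) projection mentioned in the correction.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_v, n_heads, d_out = 7, 3, 3, 5

# One 7 x 3 output matrix z per attention head
z_per_head = [rng.standard_normal((seq_len, d_v)) for _ in range(n_heads)]

# Concatenate horizontally: 7 x 9 (not vertically, which would give 21 x 3)
z_concat = np.concatenate(z_per_head, axis=1)
assert z_concat.shape == (seq_len, d_v * n_heads)  # (7, 9)

# Final linear projection, playing the role of nn.Linear(9, 5)
W_o = rng.standard_normal((d_v * n_heads, d_out))
output = z_concat @ W_o
assert output.shape == (seq_len, d_out)  # (7, 5)
```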
@amortalbeing · 2 years ago
Where's the first video?
@HeduAI · 1 year ago
@amortalbeing Episode 0 can be found here - ukposts.info/have/v-deo/bGiYdoaDbpd5q40.html
@amortalbeing · 1 year ago
@HeduAI thanks a lot, really appreciate it :)
@omkiranmalepati1645 · 1 year ago
Awesome... So dk value is 3?
@jasonwheeler2986 · 1 year ago
@omkiranmalepati1645 d_k = embedding dimensions // number of heads
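As a side note on that formula: with the original paper's numbers (d_model = 512 split across 8 heads), it gives the per-head dimension used there. A quick check:

```python
# Per-head key/query dimension when the embedding is split across heads,
# using the numbers from "Attention Is All You Need"
d_model = 512   # embedding (model) dimension
n_heads = 8     # number of attention heads
d_k = d_model // n_heads
print(d_k)  # 64
```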
@thegigasurgeon · 1 year ago
Need to say this out loud: I saw Yannic Kilcher's video, read tons of material on the internet, went through at least 7 playlists, and this is the first time I really understood the inner mechanism of Q, K and V vectors in transformers. You did a great job here.
@HeduAI · 1 year ago
This made my day :,)
@afsalmuhammed4239 · 9 months ago
True
@exciton007 · 7 months ago
Very intuitive explanation!
@EducationPersonal · 6 months ago
Totally agree with this comment
@VitorMach · 5 months ago
Yes, no other video actually explains what the actual inputs for these are
@nitroknocker14 · 2 years ago
All 3 parts have been the best presentation I've ever seen of Transformers. Your step-by-step visualizations have filled in so many gaps left by other videos and blog posts. Thank you very much for creating this series.
@HeduAI · 2 years ago
This comment made my day :,) Thanks!
@bryanbaek75 · 2 years ago
Me, too!
@lessw2020 · 2 years ago
Definitely agree. These videos really crystallize a lot of knowledge, thanks for making this series!
@devstuff2576 · 1 year ago
@HeduAI absolutely awesome. You are the best.
@nurjafri · 3 years ago
Damn. This is exactly what a developer coming from another background needs. Simple analogies for rapid understanding. Thanks a ton. Keep uploading, please!
@Xeneon341 · 3 years ago
Agreed, very well done. You do a very good job of explaining difficult concepts to a non-industry developer (fyi, I'm an accountant) without assuming a lot of prior knowledge. I look forward to your next video on masked decoders!!!
@HeduAI · 3 years ago
@Xeneon341 Oh nice! Glad you enjoyed these videos! :)
@ML-ok9nf · 6 months ago
Absolutely underrated, hands down one of the best explanations I've found on the internet
@chaitanyachhibba255 · 3 years ago
Were you the one who wrote transformers in the first place? Because no one explained it like you did. This is undoubtedly the best info I have seen. I hope you keep posting more videos. Thanks a lot.
@HeduAI · 3 years ago
This comment made my day! :) Thank you.
@malekkamoua5968 · 2 years ago
I've been stuck for so long trying to get the Transformer Neural Networks, and this is by far the best explanation! The examples are so fun, making it easier to comprehend. Thank you so much for your effort!
@HeduAI · 8 months ago
Cheers!
@HuyLe-nn5ft · 8 months ago
The important detail that sets you apart from the other videos and websites is that not only did you provide the model's architecture with numerous formulas, but you also demonstrated them in vectors and matrices, successfully walking us through each complicated and trivial concept. You really did a good job!
@forresthu6204 · 2 years ago
Self-attention is a villain that has struck me for a long time. Your presentation has helped me to better understand this genius idea.
@MGMG-li6lt · 3 years ago
Finally! You delivered me from long nights of searching for good explanations about transformers! It was awesome! I can't wait to see part 3 and beyond!
@HeduAI · 3 years ago
Thanks for this great feedback!
@HeduAI · 3 years ago
“Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D ukposts.info/have/v-deo/n3pqn5eBqntomZ8.html
@rohtashbeniwal9202 · 11 months ago
This channel needs more love (the way she explains is out of the box). I can say this because I have 4 years of experience in data science; she did a lot of hard work to get so much clarity in concepts (love from India).
@HeduAI · 11 months ago
Thank you Rohtash! You made my day! :) धन्यवाद
@adscript4713 · 14 days ago
As someone NOT in the field reading the Attention paper, after having watched DOZENS of videos on the topic, this is the FIRST explanation that laid it out in an intuitive manner without leaving anything out. I don't know your background, but you are definitely a great teacher. Thank you.
@HeduAI · 12 days ago
So glad to hear this :)
@sebastiangarciaacosta5468 · 3 years ago
The best explanation I've ever seen of such a powerful architecture. I'm glad to have found this joy after searching for positional encoding details while implementing a Transformer from scratch today. Valar Morghulis!
@HeduAI · 3 years ago
Valar Dohaeris my friend ;)
@rohanvaidya3238 · 3 years ago
Best explanation ever on Transformers !!!
@EducationPersonal · 6 months ago
This is one of the best Transformer videos on YouTube. I hope YouTube always recommends this Value (V), aka video, as a first Key (K), aka Video Title, when someone uses the Query (Q) as "Transformer"!! 😄
@HeduAI · 6 months ago
😄
@jackziad · 3 years ago
Your videos are so good at getting complex ideas across in an intuitive way. You are like the 3Blue1Brown equivalent for AI. Keep it up and keep producing high-quality video content, at your own pace of course 😋
@HeduAI · 3 years ago
3Blue1Brown is one of my favorite channels! Therefore, you comparing these videos to that channel is one of the best compliments ever. Thank you! :)
@rishiraj8225 · 9 months ago
@HeduAI Yes, this is an awesome explanation, comparable to 3Blue1Brown. Make more!
@alankarmisra · 6 months ago
3 days, 16 different videos, and your video "just made sense". You just earned a subscriber and a life-long well-wisher.
@jonathanlarkin1112 · 3 years ago
Excellent series. Looking forward to Part 3!
@HeduAI · 3 years ago
“Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D ukposts.info/have/v-deo/n3pqn5eBqntomZ8.html
@shubheshswain5480 · 3 years ago
I went through many videos from Coursera, YouTube, and some online blogs, but none explained the Query, Key, and Value concepts so clearly. You made my day.
@HeduAI · 3 years ago
Glad to hear this Shubhesh :)
@binhle9475 · 1 year ago
Your attention to detail and information structuring are just exceptional. The Avatar and GoT references on top were hilarious and made things perfect. You literally made a story out of complex deep learning concepts. This is just brilliant. You have such a beautiful mind (if you get the reference :D). Please consider making more videos like this; such a gift is truly precious. May the force be always with you. 🤘
@fernandonoronha5035 · 2 years ago
I don't have words to describe how much these videos saved me, thank you!
@jackskellingtron · 2 years ago
This is the most intuitive explanation of transformers that I've seen. Thank you hedu! I'm in awe. Liked & subbed.
@HeduAI · 1 year ago
So glad to know this! :)
@oliverhu1025 · 11 months ago
Probably the best explanation of transformers I've found online. I read the paper, watched Yannic's video, some paper-reading videos and a few others, and the intuition was still missing. This connects the dots; keep up the great work!
@maryamkhademi · 2 years ago
Thank you for putting so much effort into the visualization and awesome narration of this series. These are by far the best videos to explain transformers. You should do more of these videos. You certainly have a gift!
@HeduAI · 1 year ago
Thank you for watching! Yep! Back on it :) Would love to hear which topics/models/algorithms you most want to see on this channel. Will try to cover them in the upcoming videos.
@markpadley890 · 3 years ago
Outstanding explanation, well delivered both verbally and with the graphics. I look forward to the next in this series.
@HeduAI · 3 years ago
“Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D ukposts.info/have/v-deo/n3pqn5eBqntomZ8.html
@lmxnothere · 3 years ago
Clearest explanation I have seen so far! Keep up the good work 💪
@HeduAI · 3 years ago
Glad to hear that :) 💪
@sowmendas812 · 1 year ago
This is literally the best explanation of self-attention I have seen anywhere! Really loved the videos!
@ariasardari8588 · 2 years ago
Your ability to convey concepts is quite impressive! Probably the best tutorial video I've ever seen. From now on, every time I open YouTube, I first check if you have a new video. It was fantastic! I greatly appreciate it.
@HeduAI · 1 year ago
Thanks a lot Aria! Really means a lot :)
@Andrew6James · 3 years ago
Wow. Amazing explanation! You have a gift for explaining quite complex material succinctly.
@HeduAI · 1 year ago
Thanks Andrew! Cheers! :D
@raunakdey3004 · 11 months ago
Really love coming back to your videos for a recap on multi-headed attention and transformers! Sometimes I need to make my own specialized attention layers for the dataset in question, and sometimes it just helps to listen to you talk about transformers and attention! Really intuitive, and it helps me break out of whatever weird loop of algorithm design I might have gotten myself stuck in. So thank you so, so much :D
@Scaryder92 · 1 year ago
Amazing video; showing how the attention matrix is created and what values it assumes is really awesome. Thanks!
@mirkitdss · 3 years ago
This is awesome. Thank you very much for taking the time to make the most amazing explanation video!!! Love your series!!!! :)
@HeduAI · 3 years ago
Comments such as these encourage me to continue making such videos. Thank you! :)
@Ariel-px7hz · 1 year ago
Such a fantastic, detailed yet digestible explanation. As others have said in the comments, other explanations leave so many gaps. Thank you for this gem!
@adithyakaravadi8170 · 1 year ago
You are so good; thank you for breaking down a seemingly scary topic for all of us. The original paper requires a lot of background to understand clearly, and not everyone has it. I personally felt lost. Such videos help a lot!
@andybrice2711 · 4 days ago
This really is an excellent explanation. I had some sense that self-attention layers acted like a table of relationships between tokens, but only now do I have a better sense of how the Query, Key, and Value mechanism actually works.
@rushabhpatel007 · 2 years ago
This is by far the best explanation I have seen about attention in NLP. Thank you! :))
@HeduAI · 1 year ago
You are very welcome! :)
@nicholasabad8361 · 1 year ago
By far the best explanation of Multi-Head Attention I've ever seen on YouTube! Thanks!
@HeduAI · 1 year ago
Glad to hear this :)
@prashantgaigavale5973 · 3 years ago
Excellent explanation. Thank you!!
@HeduAI · 3 years ago
Thank you for watching :)
@aaryannakhat1842 · 2 years ago
Spectacular explanation! This channel is sooo underrated!
@jamesshady5483 · 1 year ago
This explanation is incredible and better than 99% of what I found on the Internet. Thank you!
@leticiabomfim8570 · 2 years ago
It was exactly what I was looking for! This is an amazing presentation! Thank you very much!
@HeduAI · 1 year ago
Thanks Leticia! Cheers! :)
@hewas321 · 1 year ago
No way. This video is insane!! The most accurate and excellent explanation of the self-attention mechanism. Subscribed to your channel!
@freaknextdoor9040 · 3 years ago
Hands down, this series is the best one explaining the essence of transformers I have found online!! Thanks a lot, you are awesome!!!!
@HeduAI · 3 years ago
Cheers! 🙌
@skramturbo8499 · 1 year ago
I really like the fact that you ask questions within the video. In fact, those are the same questions one has when first reading about transformers. Keep up the awesome work!
@wireghost897 · 9 months ago
Finally a video on transformers that actually makes sense. Not a single lecture video from any of the reputed universities managed to cover the topic with such brilliant clarity.
@ghostvillage1 · 1 year ago
Hands down the best series I've found on the web about transformers. Thank you
@ja100o · 1 year ago
I'm currently reading a book about transformers and was scratching my head over the reason for the multi-headed attention architecture. Thank you so much for the clearest explanation yet, which finally gave me this satisfying 💡 moment
@henrylouis5143 · 2 years ago
Brilliant presentation; it's none other than the best I've seen. Great appreciation for your work!!! Crystal-clear organization.
@HeduAI · 1 year ago
Thanks Henry! Glad you liked it :)
@marcosmartinez9241 · 3 years ago
This is the best series of videos where I could finally find a good explanation of the Transformer network. Thanks a lot!!
@HeduAI · 3 years ago
Cheers! 🙌
@aritamrayul4307 · 2 months ago
Oh, why did I only get to know this channel now? This channel is criminally underrated!!
@geetanshkalra8340 · 1 year ago
This is by far the best video to understand Attention Networks. Awesome work !!
@chenlim2165 · 11 months ago
Bravo! After watching dozens of other explainer videos, I can finally grasp the reason for multi-headed attention. Excellent video. Please make more!
@neomcbraida6298 · 2 years ago
Thanks a lot; this explains both the fundamental and more complex parts of self-attention in a concise way. Very helpful.
@HeduAI · 1 year ago
Thanks Neo! Glad you liked it :)
@noorhassanwazir8133 · 2 years ago
Lovely and systematic method of explanation. I have never seen such a teacher to date. Love and respect.
@HeduAI · 1 year ago
Thanks Noor
@gowthamkrishna6283 · 2 years ago
Wow!! The best transformers series ever. Thanks a ton for making these
@chinbryan9544 · 2 years ago
The best explanation I have seen. So many aha moments. THANK YOU VERY MUCH!
@HeduAI · 2 years ago
Cheers 🙌
@artukikemty · 11 months ago
Thanks for posting; by far this is the most didactic Transformer presentation I've ever seen. AMAZING!
@SuilujChannel · 1 year ago
Thanks for these great videos! The visualizations and extra explanations of details are perfect!
@selimcanbayrak3278 · 2 years ago
Thank you for the clearest and best explanation I have found on the web!
@HeduAI · 1 year ago
Thanks Selim! Cheers! :D
@benakhovan184 · 2 years ago
Amazing video. I tried to understand this for a couple of days, and this was the final touch of explanation I needed.
@HeduAI · 1 year ago
Thanks Ben! Cheers!
@laalbujhakkar · 25 days ago
Amazing explanation! Best on YouTube! Totally underrated! I feel fortunate to have found it. Thank you! :) 💐👏👏
@jikin91 · 10 months ago
I've read and watched tons of material to understand the role of Q, K, V in transformers... this is hands down the BEST I have seen. Also, this is the first time I have placed a comment on any YouTube video in my 15 years of watching. Your video made me do this! Please produce more awesome content 👏
@HeduAI · 10 months ago
You truly made my day! :) Thank you!
@williamqh · 2 years ago
Very clear and detailed explanation. Thank you so much for clearing the fog in my head about the query, key and value concept!
@HeduAI · 2 years ago
You are very welcome :)
@aeny · 3 years ago
Amazing job.. excited for the 3rd part :-)
@HeduAI · 3 years ago
“Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D ukposts.info/have/v-deo/n3pqn5eBqntomZ8.html
@iliyasbektas9189 · 2 years ago
This was simply amazing! Great work! Finally, I am starting to understand transformers!
@HeduAI · 1 year ago
Glad to know, Iliyas! Cheers! :D
@darkcrafteur165 · 1 year ago
I never post, but right now I need to thank you. I really don't believe a better way exists to understand self-attention than watching your video. Thank you!
@cracksomeface · 1 year ago
I'm a grad student currently applying NLP - this is literally the best explanation of self-attention I have ever seen. Thank you so much for a great vid!
@mk_upo · 3 years ago
Keep up the good work; clearly explained. I knew the explanation of score normalization by expected variance, but when I saw you bring up the cosine similarity formula, I realized that the normalization can also be justified by the division by the product of norms.
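To spell out that parallel (a rough NumPy illustration, not code from the video): cosine similarity divides a dot product by the product of the vector norms, while scaled dot-product attention divides by sqrt(d_k); for vectors with unit-variance entries the norm concentrates near sqrt(d_k), so the two normalizations play a similar role.

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 64  # per-head dimension, as in the original paper
q = rng.standard_normal(d_k)
k = rng.standard_normal(d_k)

raw = q @ k
scaled = raw / np.sqrt(d_k)                              # attention's score normalization
cosine = raw / (np.linalg.norm(q) * np.linalg.norm(k))   # cosine similarity

# ||q|| and ||k|| concentrate around sqrt(d_k) = 8 for unit-variance entries,
# so dividing once by sqrt(d_k) roughly tracks dividing by each norm's scale.
print(scaled, cosine)
```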
@HeduAI · 3 years ago
Glad you found it useful! :)
@krishnakumarprathipati7186 · 3 years ago
The MOST useful and THE BEST video ever on Multi-Head Attention... Thanks a lot for your work!
@HeduAI · 3 years ago
So glad you liked it! :)
@marsgrins · 27 days ago
This is the best. Thank you sooooo much Batool for helping me understand this!!!
@HeduAI · 26 days ago
You are very welcome :)
@m2editz816 · 3 years ago
Best explanation ever. Hope you will continue your amazing work. Thank you very much!
@HeduAI · 3 years ago
Thanks for these awesome words!
@jeremyhofmann7034 · 2 years ago
I've watched dozens of these and read as many articles, and none have been able to explain in detail what self-attention is doing as well as this one. Finally I get it! Great work.
@HeduAI · 1 year ago
I feel so glad upon reading your comment! :) Mission served.
@adityaghosh8601 · 2 years ago
Blown away by your explanation. You are a great teacher.
@cihankatar7310 · 1 year ago
This is the best explanation of the transformer architecture, with a lot of basic analogies! Thanks a lot!
@PratikChatse · 1 year ago
Amazing!! Loved the explanation! Subscribed
@danyailmateen1673 · 2 years ago
This is by far the best explanation on the topic.. thank you so much.. really, really appreciate the effort you put into it..
@HeduAI · 1 year ago
Glad to hear that Danyail! Cheers!
@salmaabdelmagid7908 · 2 years ago
Such a wonderful and clear explanation! Thank you for sharing :)
@HeduAI · 1 year ago
Thanks Salma! Glad you liked it! :)
@AllOne95 · 3 years ago
Really clear and great! Loved the examples!
@HeduAI · 3 years ago
Glad you found it useful :)
@chaitanyaparmar888 · 3 years ago
Amazing videos! Very clear, concise and fun!
@HeduAI · 3 years ago
Glad to hear :)
@madhusharath · 2 years ago
Wow! Truly wow! Your ability to explain complex stuff in layman's terms + the references to well-known series/anime show how in-depth your understanding actually is!
@HeduAI · 2 years ago
Your comment made my day :)
@suttonmattp · 1 year ago
Honestly, I understood this far better from this 15-minute video than from the 90-minute university lecture I went to on the subject. Really excellent explanation.
@abhishektyagi154 · 2 years ago
Thank you. You have an amazing talent for simplifying complex things. Please make more videos.
@HeduAI · 1 year ago
You are very welcome Abhishek! :)
@jboyce007 · 4 months ago
If only I had seen your videos earlier. As everyone in the comments says, these are THE BEST videos on the subject matter found anywhere! Thank you so very much for helping us all!
@HeduAI · 4 months ago
Cheers! :)
@yassine20909 · 1 year ago
This is great work, thank you. Keep uploading. 👏
@madhu1987ful · 1 year ago
Wow. Just wow!! This video needs to be in the top position when searching for content on transformers and their explanation.
@HeduAI · 1 year ago
So glad to see this feedback! :)
@minruihu · 1 year ago
It is impressive; you explain such complicated topics in a vivid and easy way!!!
@nali9527 · 3 years ago
The best explanation I have ever seen!!! Thank you so much. Hope you can publish more videos.
@HeduAI · 1 year ago
Yep! On it :) What would you like to see covered in future videos?
@nali9527 · 1 year ago
@HeduAI Anything related to math and deep learning; maybe variational inference, or time series prediction. I really love all of your videos, which have helped me a lot in my work. Thanks again for your best videos 😃
@jadermcs · 3 years ago
I looked at many videos, and yours is the first to explain the details of the "correlation" matrix at 9:58, which is the hardest part of understanding transformers; none of the other videos explained this. Thanks a lot! I wish you the best!
@HeduAI · 3 years ago
Glad to know you found it useful :)
@bochengxiao1352 · 2 years ago
Thank you so much! It's the best Transformer video ever! Really hoping for more on other models.
@HeduAI · 1 year ago
Glad to hear that! :) Do let me know if there are certain models that you would like to see covered in future videos.
@stache53 · 2 years ago
Extremely good explanation. Thank you!
@HeduAI · 2 years ago
You are very welcome! :)
@dominikburkert2824 · 3 years ago
Best transformer explanation on YouTube!
@HeduAI · 3 years ago
So glad to hear this! :D
@pulkitsrivastava1316 · 2 years ago
This is by far the best video I have come across that explains the small details of the Transformer. A video worth watching!! Best part: the explanation of the query-key-value concept and multi-head attention with an example from computer vision. Eagerly waiting for videos on other topics of AI and ML.
@HeduAI · 1 year ago
Thanks Pulkit! :) Do let me know if there are certain models/algorithms that you would like to see covered in future videos.
@MANUSHEORANBEE · 3 years ago
The best explanation for this topic. So much detail for the most crucial part; just loved that. I also liked your concept of referring to web series/movies/memes while explaining the concept. It sort of balances the knowledge and entertainment parts. Great job, waiting for the 3rd part!!!!
@HeduAI · 3 years ago
Thanks for this awesome feedback!
@HeduAI · 3 years ago
“Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D ukposts.info/have/v-deo/n3pqn5eBqntomZ8.html
@OpenDiaryNKM · 1 year ago
I wish I could understand something this deeply so I could explain it to others in detail, just like you did.
@VADemon · 1 year ago
Excellent examples and explanation. Don't shy away from using more examples of things that you love; this love shows and will translate to better work overall. Cheers!
@RafidAslam · 1 month ago
Thank you so much! This is by far the clearest explanation that I've ever seen on this topic
@xiny4833 · 3 years ago
Very clearly explained, thank you so much
@HeduAI · 3 years ago
You are very welcome!
@jasonpeloquin9950 · 10 months ago
Hands down the best explanation of the use of Query, Key and Value matrices. Great video with an easy example to understand.
@alirezamogharabi8733 · 2 years ago
Great explanation and visualization, thanks a lot. Please keep making such helpful videos.
@nizamphoenix · 6 months ago
Having been a professional in this field for ~5 years, I can say this is by far the best explanation of attention. Amused as to why this doesn't pop up at the top of YT's recommendations for attention. Probably YT's attention needs some attention to fix its Q, K, Vs.
@HeduAI · 6 months ago
You made my day :)