Neural Network Learns to Play Snake using Deep Reinforcement Learning

24,996 views

Jack of Some

4 years ago

Can an AI play Snake well by only looking at it? In this video I use two separate deep reinforcement learning algorithms to try to answer that question. The video was heavily inspired by and is a spiritual continuation of CodeBullet's excellent "A.I. Learns to play Snake using Deep Q Learning" ( • A.I. Learns to play Sn... )
Twitter: / safijari
Patreon: / jackofsome
SOME OF MY OTHER VIDEOS:
○ Snake Programming Stream: • Chill Music and Coding...
○ Deep RL Stream: • How to Solve a Basic R...
○ Vanilla Q Learning Stream: • How to Solve a Basic R...
○ Explaining RL to a baby: • Baby Learns about Arti...
○ 5 Common Python Mistakes: • 5 Things You're Doing ...
○ Making Python fast: • Can VSCode be a reason...
VIDEOS MENTIONED:
Alex Patrenko's Snake AI: • Advantage Actor-Critic...
OpenAI Hide and Seek: • Multi-Agent Hide and Seek
AlphaGo: • AlphaGo Official Trailer
OpenAI Five (Dota): • OpenAI Five
#deeplearning #machinelearning #ai

COMMENTS: 53
@swordriffraff-green6319 4 years ago
Really enjoyed the brief overviews of each of the algorithms used - would watch more of this type of video!
@JackofSome 4 years ago
Thank you for your kind words. I'm going to be doing more of these. So far this has fared much worse than my other work but I think it has potential.
@shahzebafroze4093 4 years ago
This is awesome!! Looking forward to more videos over the summer!
@martinsosmucnieks8515 3 years ago
Really great video. I'm sad I didn't find this channel earlier!
@bobingstern4448 3 years ago
Fantastic stuff dude, keep it up!
@tan-uz4oe 4 years ago
Nice video! Clean and easy to follow explanation :)
@lucamehl3109 4 years ago
Awesome video! Subbed!
@NightsAndDays 4 years ago
wow, great video!
@NikoKun 19 days ago
I took my own weird route to playing around with this idea. First, I wrote a more classic rules-based bot to play snake, with its own recursive function to check future choices for dead ends. It's not perfect, but it can often play well enough to fill half the available space with snake before dying to the harder-to-avoid dead ends. I then used that bot to record 10,000 of its highest-scoring games frame by frame, a couple million frames in total, also recording each action it took per frame. Then I fed all that data into a basic neural network and ran a few hundred training epochs. So far I've gotten the neural network to play the game alright, but only about as well as my bot. heh
@andrewsimon6058 3 years ago
I have to say the music in this video fits perfectly.
@ozmandunn 4 years ago
Here from Lex Fridman Podcast Discord!
@a7744hsc 2 years ago
Great video, many useful thoughts in it! Could you share more details about your model? E.g. how did you design the reward system?
@nickmoen6017 1 year ago
+1 for approaching food, -1 for moving away from food, +10 for eating food, -100 for dying. It's a CNN. Calculate the deltas for each episode and then update the weights after all 4 snakes die.
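A minimal sketch of the reward scheme described in that comment (the values come from the comment; the video's actual code is not public, and the `(row, col)` grid positions for head and food are assumptions):

```python
def manhattan(a, b):
    """Grid distance between two (row, col) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def step_reward(prev_head, new_head, food, ate_food, died):
    """Shaped reward: +1 for moving toward food, -1 for moving away,
    +10 for eating food, -100 for dying (per the comment above)."""
    if died:
        return -100
    if ate_food:
        return 10
    return 1 if manhattan(new_head, food) < manhattan(prev_head, food) else -1
```

The death and food terms dominate the shaping terms by an order of magnitude, so the approach/retreat signal mostly just speeds up early learning.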
@elliotg8403 4 years ago
Is there source code for this somewhere? Would be super helpful to have a look at it
@cobuslouw8319 3 years ago
I have a deep q-learning snake on my channel. It managed to get to a score of 107 on a 14x14 grid. Awesome video! Also interested to see how PPO performs.
@JackofSome 3 years ago
That is really, really nice. I watched the video and it looks great. Do you have a repo I could look at?
@cobuslouw8319 3 years ago
@@JackofSome I don't have a nice repo to share right now... I'm busy finishing my master's degree at the moment, but I'm going to put everything in a repo as soon as I'm done. It's a distributed deep Q-learning algorithm with prioritized experience replay and n-step updates. The other videos on my channel were trained using the same codebase.
@KrzysztofDerecki 4 years ago
If I understand you correctly, you are feeding your learning algorithm the full environment data. That's why you run out of resources so quickly. In my opinion a general snake agent should not depend on board size. To start, you could try using a moving 16x16 window around the snake's head as the environment input, which works on any board size. Later you can experiment with additional environment inputs like the current snake length, distances to each board edge, etc. I may be wrong, but maybe it's a good hint :) Nice video!
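The moving-window idea above could be sketched like this (hypothetical code, not from the video: `board` is a 2-D grid array and `head` the snake's head position, with out-of-board cells padded as walls so the agent can see the edge):

```python
import numpy as np

def head_window(board, head, size=16, pad_value=-1):
    """Crop a fixed size x size observation centered on the snake's head.

    Works for a board of any size; cells outside the board are filled
    with pad_value so walls look the same everywhere.
    """
    half = size // 2
    padded = np.pad(board, half, constant_values=pad_value)
    r, c = head[0] + half, head[1] + half  # head position in padded coords
    return padded[r - half:r + half, c - half:c + half]
```

Because the observation shape is fixed at 16x16 regardless of the board, the same network could in principle be trained or evaluated on boards of different sizes.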
@sirynka 4 years ago
Also, does it make sense to feed an actual image to the neural network instead of an array filled with [-1, 0, 1] values corresponding to the states of the cells on the board?
@JackofSome 4 years ago
The way I set up my network, it would immediately scale the image down to the representation you're describing. The overhead here is learning one additional convolutional layer, which isn't that bad, and it keeps the problem, err... "pure", for lack of a better term (since the aim was to learn from pixels).
@McMurchie 2 years ago
What really, really gets on my tits is how all reinforcement learning videos are about the trivial stuff, trying to make it seem hard ('oooh, how do we train the network?'), when the overwhelmingly hard part is how to get the pixel data from the screen, how to format it, how to build the game, etc.
@JackofSome 2 years ago
That's ... that's the easy stuff to me. I did a silent stream prior to these where I built all of that in under an hour. There are lots of tutorials out there on how to make games and capture screen data; it really doesn't belong in a discussion of RL algorithms. That said, _some_ aspects are covered in my RL streams.
@-mwolf 1 year ago
@@JackofSome Could you share the code you used for this video?
@MrCmon113 4 years ago
Damn, I wanted to do something like that for my Bachelor's thesis. If your agent can actually learn to play a board of any size in theory, I may go for something else.
@JackofSome 4 years ago
The video is supposed to inspire, not discourage 😅. There are a lot of cool things you can do with Snake with many different approaches. I encourage you to still go down this route. If you'd like advice, come to the machine learning discord discord.gg/yHh5UwJ
@MrCmon113 4 years ago
@@JackofSome It's just that this is the first solution to snake with RL I've seen. ^^ However, I wrote this before watching the entire video. It doesn't seem like everything is said and done about snake after all. I'll check out that discord tomorrow. Thanks. : D
@revimfadli4666 1 year ago
I mean, there are many possible ways you could develop this: using new architectures (such as modifications to LSTM/GRU), or hindsight experience replay without knowing the goal state, etc.
@valentinpopescu98 3 years ago
Can you explain how you calculated the complexity of the problem at 9:05? As I'm thinking about it, at 20x20 resolution there are 400 cells, and an action is needed for each state. Each cell can be in one of 3 states (0 - no food/no body, 1 - food, 2 - part of the body), giving 3^400 possible boards, so something like 400 * (3^400)?
@jeffreyhsiao7938 1 year ago
I have a similar question.
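For reference, the rough upper bound implied by the question above can be computed directly with Python's arbitrary-precision integers. It is a vast overcount, since most of those 3^400 boards are not reachable snake states, but it conveys why tabular methods are hopeless here:

```python
# Upper bound on 20x20 board configurations if each cell independently
# takes one of 3 values (empty / food / body), per the comment above.
n_cells = 20 * 20
upper_bound = 3 ** n_cells

# How astronomically large is that? Count its decimal digits.
print(len(str(upper_bound)))  # a number with ~191 digits
```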
@agentkoko3988 3 years ago
Would love to watch more ❤️ ... Can we feed a real game as the environment? Like, the program takes it frame by frame as input and we define the set of actions, something like that?
@JackofSome 3 years ago
Yes. OpenAI did that with Dota and DeepMind did it for StarCraft. Both are very impressive projects, though both had a bit more information about the game than just the frames and inputs (DeepMind's agent does play by just looking at the image, though). There are also projects where people have played Mario or the Chrome dinosaur game. The data requirements make all these tasks really daunting though (e.g. OpenAI trained on 5000 CPUs for 8 months...)
@agentkoko3988 3 years ago
@@JackofSome Yes. Something we could do? For example, we feed the game frame by frame, the AI analyses each frame to recognise the text, and then performs actions. Actions that reduce the distance between the text and the agent get rewards, and since more information gives better results, the agent knows its own position in the frame/game and, by analysing every frame, can get the position of the text. That example was in general terms. Now something you could relate to: a Mario-style game. There is a demon, and our agent/player and the demon are both standing still. The agent performs actions, and the distance is reduced only when the right action is performed; after a series of right actions the distance is greatly reduced, the agent fires, and the demon gets killed. The goal is to kill the demon. The rewards I have thought of are: a reward when the distance gets reduced; a reward for the time the agent stays alive (since if it only ever reduces the distance it will collide with the demon and die); and a reward for achieving the goal, i.e. killing the demon. NOTE: the demon has a shape with "demon" written on it, so the AI recognises the text from the frames and knows there is a demon. Might have missed something... Hope you understand it, and I hope we could do something with this. Thank you ❤️
@tk421allday 1 year ago
Any chance we could see the code for this?
@lored6811 4 years ago
Beautiful video, are you working in the industry?
@JackofSome 4 years ago
Thank you so much for watching. I do deep learning and computer vision professionally. Teaching myself deep RL for fun and also because I think it'll be important soon.
@sabrimahmoud383 2 years ago
The link to the code, please!
@bobhanger3370 2 years ago
Bruhh where's the coooode? I'm doing literally the same project and hoping it takes < a month
@luissoares2467 1 year ago
Can you share the source code?
@marlhex6280 1 year ago
I've never watched that game in its final form 😂😂😂
@imkukis5949 3 years ago
Did you create the game yourself? And then applied AI to it??
@JackofSome 3 years ago
Yes
@sohampatil6539 3 years ago
Why was the input to the neural network 4 frames of 84 by 84? Why 84?
@sohampatil6539 3 years ago
Also, for different sizes of the snake grid, would you change the complexity of the model?
@JackofSome 3 years ago
I wasn't changing the complexity, but I don't think that should matter. The model I was using was probably more complex than it needed to be. 84 comes from the image size used in the original Atari paper; no real reason to pick that number other than that.
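For context, the 4x84x84 input discussed above follows the Atari DQN convention: frames are converted to 84x84 grayscale and the last 4 are stacked so the network can infer motion (velocity and direction) from a single input. A minimal, dependency-free sketch (the nearest-neighbour resize is a stand-in for proper image resampling, and the video's actual preprocessing is not shown):

```python
import numpy as np
from collections import deque

def resize_84(gray):
    """Crude nearest-neighbour resize of a 2-D grayscale array to 84x84."""
    h, w = gray.shape
    rows = np.arange(84) * h // 84
    cols = np.arange(84) * w // 84
    return gray[rows][:, cols]

class FrameStack:
    """Keeps the last k preprocessed frames; returns a (k, 84, 84) tensor."""

    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def push(self, frame):
        f = resize_84(frame)
        while len(self.frames) < self.frames.maxlen:
            self.frames.append(f)  # repeat the first frame to fill the stack
        self.frames.append(f)
        return np.stack(self.frames)  # network input, shape (k, 84, 84)
```

Stacking is what lets a feed-forward CNN distinguish a snake moving left from one moving right, since a single frame is ambiguous.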
@mamo987 2 years ago
pls more v nice
@arcadesoft294 4 years ago
If you need it to be trained on a powerful computer, I think you should contact sentdex. He released a video where he said he might accept projects from fans to run on his new $30k PC.
@JackofSome 4 years ago
I did. Haven't heard back.
@los4776 1 year ago
HER might have helped
@v.gedace1519 3 years ago
04:15 - Take a look here for the details about Hamiltonian cycles and how to solve __any(*)__ snake game perfectly, independently of the playfield size! ((*) Hamiltonian cycles aren't possible for playfields with odd width and odd height; see the repository for why.) Video: ukposts.info/have/v-deo/jXmQfWyqgY6Sq6s.html (sorry, the video has no sound, no animation, etc.; it just shows the two parts of my solution in action). Repository: github.com/UweR70/Hamiltonian-Cylce-Snake - contains deep but easy-to-understand explanations.
@JackofSome 3 years ago
I'm familiar with the Hamiltonian cycle, and the video I mentioned at the start also used a similar solution. This video isn't about solving snake; it's about learning how to solve snake where the input is the game image.
@v.gedace1519 3 years ago
@@JackofSome The comment was meant as a starting point for detailed background info for your viewers.
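As a concrete illustration of the Hamiltonian-cycle approach discussed in this thread (a sketch under the usual construction, not code from either repository): on a grid with an even number of rows, a closed tour visiting every cell can be built from a boustrophedon sweep plus a return column, and a snake that simply follows the cycle can never run into itself:

```python
def hamiltonian_cycle(h, w):
    """Closed tour of an h x w grid (h must be even, w >= 2),
    returned as an ordered list of (row, col) cells."""
    assert h % 2 == 0 and w >= 2
    path = [(0, c) for c in range(w)]              # rightward along the top row
    for r in range(1, h):                           # snake through cols 1..w-1
        cols = range(w - 1, 0, -1) if r % 2 == 1 else range(1, w)
        path += [(r, c) for c in cols]
    path += [(r, 0) for r in range(h - 1, 0, -1)]   # back up column 0 to start
    return path
```

Following the cycle is perfectly safe but slow, which is why practical Hamiltonian-cycle snakes add shortcut heuristics on top.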
@simonstrandgaard5503 4 years ago
Nice snake you have there. Interesting to see which approaches you are exploring. I'm working on a snake AI myself and am experimenting with obstacles. ukposts.info/have/v-deo/jGWpmq-qrnuEtoU.html repo here: github.com/neoneye/SwiftSnakeEngine
@JackofSome 4 years ago
Hey, that looks fantastic. I don't have a Mac or iPhone unfortunately, otherwise I would have tried it out. What method are you using for the autonomous behavior?