I wonder what will happen when you try this on newer games, like the 3DS versions of Pokémon.
I had problems installing the requirements:
1. error: subprocess-exited-with-error
2. error: metadata-generation-failed
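Those two errors usually mean a package in requirements.txt failed to build locally, most often because of a Python version mismatch or stale build tooling. A minimal sanity check before retrying (the supported version range below is an assumption, not from the video; check the repo's README):

```python
# Sanity-check the interpreter before reinstalling. The 3.8-3.11 bounds are an
# assumption; use whatever range the project's README actually lists.
import sys

lo, hi = (3, 8), (3, 11)
if not (lo <= sys.version_info[:2] <= hi):
    print(f"Python {sys.version.split()[0]} may be unsupported; "
          f"builds of pinned packages often fail outside {lo}-{hi}.")

# If the version looks fine, upgrading the build tooling and retrying
# often clears both errors:
#   python -m pip install --upgrade pip setuptools wheel
#   python -m pip install -r requirements.txt
```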
I want to know the button-press sequence to capture a Pokémon first try in the game; I think it would be cool to show my friends…
I'm very impressed, great work and edit
having to go back to the start city… I remember never getting past the second city in the game as a kid because of that! That special case where you have to go back is probably something a lot of humans struggle with today. Pokémon HeartGold has a similar issue where at one point you gotta go back to the starting point of the game; I literally had to search a playthrough of the game online to learn that I had to do that… tbh, this AI is probably more clever than I am in terms of Pokémon, especially for the older Pokémon games. I get sooooooo lost if I don't use a map of the whole game while playing along with a guide…
Very cool. So fun fact regarding Squirtle as the starter choice, I went through all the major encounters and did a +/- breakdown for the starters.
The results were:
Squirtle +46
Charmander +25 (or -4 if you don't use the TM for Dig/EQ on it)
Bulbasaur -16
Bulbasaur does have an edge at the very beginning against Brock and Misty, but falls off after that due to a horrible move pool in Gen 1.
next can we train it to play a nuzlocke?
God please just teach it how to hold down the d-pad… poor little guy
Brooooo please start a Patreon and keep going. Solve the beginning of the game, beat the game, optimize for speed run =D
turns out the real AI were the friends we made along the way
I love how the AI just deduced that the best way to not lose a match is to not let the game progress to a game-over state. That's some "the only winning move is not to play" energy.
Any insight on how the first AI figured out how to talk to the Mart owner in Viridian and then take the parcel back to Oak in Pallet? If it needs to explore new areas, how was that possible with the reward system?
I wonder if you could make a Map AI and a Battle AI and have them swap out, or some such.
Could maybe even let you super fine-tune them, like rewarding the Battle AI for getting the "It's Super Effective!" text, and a lower but still positive tick for a "not very effective…"
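A minimal sketch of what that Battle-AI reward tick could look like, assuming a hypothetical `read_dialog()` helper that returns the text currently on screen (the real project reads state from emulator memory, so treat this purely as illustration):

```python
def battle_text_reward(read_dialog) -> float:
    """Reward shaping from the on-screen battle message, per the idea above."""
    text = read_dialog().lower()
    if "super effective" in text:
        return 1.0   # big tick for exploiting type matchups
    if "not very effective" in text:
        return 0.1   # lower but still positive tick, as suggested
    return 0.0
```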
9:30 Traumatizing A. I., oh geez…
I never thought I would see so many Pokémon at once on a screen 0:37
awesome video
The patience and dedication you put in this project is more than I have in my whole life combined
22:26 would a good incentive to encourage it to re-explore this area be that, if it has explored for X amount of time without finding anything new, it then can get a new reward for revisiting previous locations?
I feel like this would mirror what a human player would do where, after exhausting other options, they would re-explore previous options.
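As a sketch, that stall-then-revisit idea could be a small wrapper around the exploration reward. Everything here (the names, the thresholds, the coordinate tuple) is an assumption for illustration, not the video's actual code:

```python
class ReexplorationBonus:
    """Pays a small bonus for revisiting old areas once exploration stalls."""

    def __init__(self, stall_threshold=2000, revisit_gap=500, bonus=0.05):
        self.last_visit = {}        # coord -> step index of the last visit
        self.steps_since_new = 0    # steps since the last novel coordinate
        self.step_idx = 0
        self.stall_threshold = stall_threshold
        self.revisit_gap = revisit_gap
        self.bonus = bonus

    def step(self, coord) -> float:
        self.step_idx += 1
        reward = 0.0
        if coord not in self.last_visit:
            self.steps_since_new = 0    # novel tile; novelty reward handled elsewhere
        else:
            self.steps_since_new += 1
            stale = self.step_idx - self.last_visit[coord] > self.revisit_gap
            # Only once progress has stalled does revisiting start to pay,
            # and only for tiles the agent hasn't touched in a while.
            if self.steps_since_new > self.stall_threshold and stale:
                reward = self.bonus
        self.last_visit[coord] = self.step_idx
        return reward
```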
Upload more videos
I love how it gets salty when losing a pokemon at first and just shuts down completely
Awesome video, I know for a fact this vid is front and center on ur LinkedIn, resume, and probably any project portfolios you have lmfao
Please keep going until one beats the game
Batch upload every playthrough to YouTube and publish. Then just ask for summary notes from the reviewers… Maybe send them a muffin basket…
Or have another A.I. do it, but what do I know? I'm just a volunteer gas-station washroom-attendant.
This video is phenomenal.
Grinding levels by defeating low-level Pokémon is a pain in the ass.
Dude that was awesome
this is so well done
AI only using offensive moves is accurate with a first playthrough of any Pokemon game!🤣🤣
I think it could be reasonable to let the AI get a small reward based on the effectiveness of its moves (the HP lost by the opponent, or something similar)
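A damage-based term like that could be a few lines, assuming access to emulator memory reads. 0xCFE6/0xCFE7 is the commonly cited address pair for the enemy Pokémon's current HP in Pokémon Red (two bytes, big-endian), but verify it against your own RAM map before trusting it:

```python
ENEMY_HP_ADDR = 0xCFE6  # assumed address of the enemy's current HP (high byte)

def damage_reward(read_byte, prev_hp, scale=0.01):
    """Return (reward, new_hp): a small positive reward per HP of damage dealt."""
    hp = (read_byte(ENEMY_HP_ADDR) << 8) | read_byte(ENEMY_HP_ADDR + 1)
    dealt = max(0, prev_hp - hp)    # ignore enemy healing / fresh encounters
    return scale * dealt, hp
```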
This is a masterpiece of a video
4:06 how to explain bad programming
Ants
I love AI so much, this is probably one of my favorite applications and showcase of an AI program. Insanely entertaining and interesting!
Cant wait to see the next video! Great stuff
The pokemon theme in the background through the second half of the video with no breaks is makin me go insane.
I was surprised, once you made the change to assign points for level increases, that they didn’t start banking Pokémon and fast-leveling low-level ones. Perhaps there’s a limitation due to the cost of Poké Balls? I’m curious what would happen if rewards for faster times at checkpoints were implemented.
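One way to express "faster checkpoint times score higher" is a milestone reward that decays with the step count at which the checkpoint is reached; the shape and constants below are illustrative only, and checkpoint detection (badges, map IDs, events) is left abstract:

```python
def checkpoint_reward(steps_taken, base=10.0, half_life=5000):
    """Exponentially decaying milestone reward: full value for an instant
    arrival, half of it after `half_life` steps, and so on."""
    return base * 0.5 ** (steps_taken / half_life)

# e.g. reaching a badge at step 5000 yields 5.0; at step 10000, only 2.5
```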
How is this your only video?? It was so great!!
9:30 explains, in perfect AI detail, how trauma can come from not caring enough to make the pattern better
This is really cool, but I think there could be some micro-rewards to improve its combat skills, such as rewarding it for using super effective attacks and punishing it for very ineffective ones (a minimal reward, of course, but that could still roughly teach it to use specific attacks on specific enemies it sees)
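Rather than scanning dialog text, that micro-reward could also look the matchup up directly. The chart below is a tiny excerpt of the Gen 1 type chart, just enough to show the shape of the idea:

```python
TYPE_CHART = {                    # (attacking type, defending type) -> multiplier
    ("WATER", "FIRE"): 2.0,
    ("FIRE", "WATER"): 0.5,
    ("ELECTRIC", "GROUND"): 0.0,  # immunity
    # ... rest of the Gen 1 chart omitted
}

def effectiveness_reward(move_type, target_type, bonus=0.2, penalty=-0.1):
    """Small shaping reward for smart matchups, small penalty for poor ones."""
    mult = TYPE_CHART.get((move_type, target_type), 1.0)
    if mult > 1.0:
        return bonus    # "It's super effective!"
    if mult < 1.0:
        return penalty  # "It's not very effective..." or no effect at all
    return 0.0
```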
14:24 I feel like the human side of this comparison is overgeneralized. First, it's arbitrary where you draw the line between, for example, "junk" food and "real" food, as in food which, when you eat it, doesn't lead to a so-called "misaligned result". The machine learning program is very straightforward about the result, since there's only one simple objective here: gaining levels. And the +5 is bigger than the +1 it gains from leveling up its Pokémon, therefore the objective loses its originally intended purpose.

With humans, it's way harder to determine where to draw this line, since we as real creatures are more complex. If buying junk food is a misaligned result where the proxy objective doesn't serve the original objective, is it also a misaligned result in other cases where we do things that are not necessary, or that can potentially be counterproductive to our "survival" objective if done irresponsibly? The thing is that all things can eventually be harmful to our survival if we do them too much, even those we started getting rewarded for via dopamine because certain amounts of them benefit us. Therefore, I think the act of buying junk food is not a misaligned objective on its own, but doing it too much is. In the same way, you can't just buy the same three healthy meals day after day: sure, they have the nutrients you need, but they also lack others that are necessary for you.

There's also the aspect of how even things like buying junk food can in other ways be beneficial to survival, such as in cases where people are completely mentally exhausted and don't have the willpower to cook at home. Plus, in this case, the dopamine it "wrongfully" sends could increase the person's determination to carry on, despite technically changing things for the worse; that is also something that is not the case with the program.

So yes, on a superficial level, you could see humans as acting similarly to this machine learning program, but rarely can you clearly define where they went wrong, since besides a "misaligned proxy objective" there are many other factors at play. If people could get out of whatever bad situation they're stuck in by fixing just one misaligned reward structure, such as not buying junk food anymore, the same way you could tell the program "don't buy the Magikarp, it's useless", then humanity would definitely be better off.