I wonder what will happen when you try this on newer games, like the 3DS versions of Pokémon.
I had problems installing the requirements:
1. error: subprocess-exited-with-error
2. error: metadata-generation-failed
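Those two errors usually mean a package in requirements.txt failed to build locally, most often because of a Python version mismatch or stale build tooling. A minimal sanity check before retrying (the supported version range below is an assumption, not from the video; check the repo's README):

```python
# Sanity-check the interpreter before reinstalling. The 3.8-3.11 bounds are an
# assumption; use whatever range the project's README actually lists.
import sys

lo, hi = (3, 8), (3, 11)
if not (lo <= sys.version_info[:2] <= hi):
    print(f"Python {sys.version.split()[0]} may be unsupported; "
          f"builds of pinned packages often fail outside {lo}-{hi}.")

# If the version looks fine, upgrading the build tooling and retrying
# often clears both errors:
#   python -m pip install --upgrade pip setuptools wheel
#   python -m pip install -r requirements.txt
```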
I want to know the button-press sequence to capture a Pokémon first try in the game; I think it would be cool to show my friends…
I'm very impressed, great work and edit
having to go back to the start city… I remember never getting past the second city in the game as a kid because of that! That special case where you have to go back is probably something a lot of humans struggle with today. Pokémon HeartGold has a similar issue where at one point you gotta go back to the starting point of the game; I literally had to search a playthrough of the game online to learn that I had to do that… tbh, this AI is probably more clever than I am in terms of Pokémon, especially for the older Pokémon games. I get sooooooo lost if I don't use a map of the whole game while playing along with a guide…
Very cool. So fun fact regarding Squirtle as the starter choice, I went through all the major encounters and did a +/- breakdown for the starters.
The results were:
Squirtle +46
Charmander +25 (or -4 if you don't use the TM for Dig/EQ on it)
Bulbasaur -16
Bulbasaur does have an edge at the very beginning against Brock and Misty, but falls off after that due to a horrible move pool in Gen 1.
next can we train it to play a nuzlocke?
God please just teach it how to hold down the d-pad… poor little guy
Brooooo please start a Patreon and keep going. Solve the beginning of the game, beat the game, optimize for speed run =D
turns out the real AI were the friends we made along the way
I love how the AI just deduced that the best way to not lose a match is to not let the game progress to a game-over state. That's some "the only winning move is not to play" energy.
Any insight on how the first AI figured out how to talk to the Mart owner in Viridian and then take the parcel back to Oak in Pallet? If it needs to explore new areas, how was that possible with the reward system?
I wonder if you could make a Map AI and a Battle AI and have them swap out, or some such.
Could maybe even let you super fine-tune them, like rewarding the Battle AI for getting the "It's Super Effective!" text, and a lower but still positive tick for a "not very effective…"
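A minimal sketch of what that Battle-AI reward tick could look like, assuming a hypothetical `read_dialog()` helper that returns the text currently on screen (the real project reads state from emulator memory, so treat this purely as illustration):

```python
def battle_text_reward(read_dialog) -> float:
    """Reward shaping from the on-screen battle message, per the idea above."""
    text = read_dialog().lower()
    if "super effective" in text:
        return 1.0   # big tick for exploiting type matchups
    if "not very effective" in text:
        return 0.1   # lower but still positive tick, as suggested
    return 0.0
```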
9:30 Traumatizing A. I., oh geez…
I never thought I would see so many Pokémon at once on a screen 0:37
awesome video
The patience and dedication you put in this project is more than I have in my whole life combined
22:26 would a good incentive to encourage it to re-explore this area be that, if it has explored for X amount of time without finding anything new, it then can get a new reward for revisiting previous locations?
I feel like this would mirror what a human player would do where, after exhausting other options, they would re-explore previous options.
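As a sketch, that stall-then-revisit idea could be a small wrapper around the exploration reward. Everything here (the names, the thresholds, the coordinate tuple) is an assumption for illustration, not the video's actual code:

```python
class ReexplorationBonus:
    """Pays a small bonus for revisiting old areas once exploration stalls."""

    def __init__(self, stall_threshold=2000, revisit_gap=500, bonus=0.05):
        self.last_visit = {}        # coord -> step index of the last visit
        self.steps_since_new = 0    # steps since the last novel coordinate
        self.step_idx = 0
        self.stall_threshold = stall_threshold
        self.revisit_gap = revisit_gap
        self.bonus = bonus

    def step(self, coord) -> float:
        self.step_idx += 1
        reward = 0.0
        if coord not in self.last_visit:
            self.steps_since_new = 0    # novel tile; novelty reward handled elsewhere
        else:
            self.steps_since_new += 1
            stale = self.step_idx - self.last_visit[coord] > self.revisit_gap
            # Only once progress has stalled does revisiting start to pay,
            # and only for tiles the agent hasn't touched in a while.
            if self.steps_since_new > self.stall_threshold and stale:
                reward = self.bonus
        self.last_visit[coord] = self.step_idx
        return reward
```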
Upload more videos
I love how it gets salty when losing a pokemon at first and just shuts down completely
Awesome video, I know for a fact this vid is front and center on ur LinkedIn, resume, and probably any project portfolios you have lmfao
Please keep going until one beats the game
Batch upload every playthrough to YouTube and publish. Then just ask for summary notes from the reviewers… Maybe send them a muffin basket…
Or have another A.I. do it, but what do I know? I'm just a volunteer gas-station washroom-attendant.
This video is phenomenal.
Grinding levels by defeating low-level Pokémon is a pain in the ass.
Dude that was awesome
this is so well done
AI only using offensive moves is accurate with a first playthrough of any Pokemon game!🤣🤣
I think it could be reasonable to let the AI get a small reward based on the effectiveness of its moves (the HP lost by the opponent, or something similar)
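A damage-based term like that could be a few lines, assuming access to emulator memory reads. 0xCFE6/0xCFE7 is the commonly cited address pair for the enemy Pokémon's current HP in Pokémon Red (two bytes, big-endian), but verify it against your own RAM map before trusting it:

```python
ENEMY_HP_ADDR = 0xCFE6  # assumed address of the enemy's current HP (high byte)

def damage_reward(read_byte, prev_hp, scale=0.01):
    """Return (reward, new_hp): a small positive reward per HP of damage dealt."""
    hp = (read_byte(ENEMY_HP_ADDR) << 8) | read_byte(ENEMY_HP_ADDR + 1)
    dealt = max(0, prev_hp - hp)    # ignore enemy healing / fresh encounters
    return scale * dealt, hp
```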
This is a masterpiece of a video
4:06 how to explain bad programming
Ants
I love AI so much, this is probably one of my favorite applications and showcase of an AI program. Insanely entertaining and interesting!
Cant wait to see the next video! Great stuff
The pokemon theme in the background through the second half of the video with no breaks is makin me go insane.
I was surprised, once you made the change to assign points for level increases, that they didn’t start banking Pokémon and fast-leveling low-level ones. Perhaps there’s a limitation due to the cost of Poké Balls? I’m curious what would happen if rewards for faster times at checkpoints were implemented.
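One way to express "faster checkpoint times score higher" is a milestone reward that decays with the step count at which the checkpoint is reached; the shape and constants below are illustrative only, and checkpoint detection (badges, map IDs, events) is left abstract:

```python
def checkpoint_reward(steps_taken, base=10.0, half_life=5000):
    """Exponentially decaying milestone reward: full value for an instant
    arrival, half of it after `half_life` steps, and so on."""
    return base * 0.5 ** (steps_taken / half_life)

# e.g. reaching a badge at step 5000 yields 5.0; at step 10000, only 2.5
```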
How is this your only video?? It was so great!!
9:30 explains, in perfect AI detail, how trauma can come from not caring enough to make the pattern better
This is really cool, but I think there could be some micro-rewards to improve its combat skills, such as rewarding it for using super effective attacks and punishing it for very ineffective ones (a minimal reward, of course, but that could still roughly teach it to use specific attacks on specific enemies it sees)
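Rather than scanning dialog text, that micro-reward could also look the matchup up directly. The chart below is a tiny excerpt of the Gen 1 type chart, just enough to show the shape of the idea:

```python
TYPE_CHART = {                    # (attacking type, defending type) -> multiplier
    ("WATER", "FIRE"): 2.0,
    ("FIRE", "WATER"): 0.5,
    ("ELECTRIC", "GROUND"): 0.0,  # immunity
    # ... rest of the Gen 1 chart omitted
}

def effectiveness_reward(move_type, target_type, bonus=0.2, penalty=-0.1):
    """Small shaping reward for smart matchups, small penalty for poor ones."""
    mult = TYPE_CHART.get((move_type, target_type), 1.0)
    if mult > 1.0:
        return bonus    # "It's super effective!"
    if mult < 1.0:
        return penalty  # "It's not very effective..." or no effect at all
    return 0.0
```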
14:24 I feel like the human side of this comparison is overgeneralized. First, it's arbitrary where you draw the line between, for example, "junk" food and "real" food, as in food which, when you eat it, doesn't lead to a so-called "misaligned result". The machine learning program is very straightforward about the result, since there's only one simple objective here: gaining levels. And the +5 is bigger than the +1 it gains from leveling up its Pokémon, therefore the objective loses its originally intended purpose.

With humans, it's way harder to determine where to draw this line, since we as real creatures are more complex. If buying junk food is a misaligned result where the proxy objective doesn't serve the original objective, is it also a misaligned result in other cases where we do things that are not necessary, or that can potentially be counterproductive to our "survival" objective if done irresponsibly? The thing is that all things can eventually be harmful to our survival if we do them too much, even those we started getting rewarded for via dopamine because certain amounts of them benefit us. Therefore, I think the act of buying junk food is not a misaligned objective on its own, but doing it too much is. In the same way, you can't just buy the same three healthy meals day after day: sure, they have the nutrients you need, but they also lack others that are necessary for you.

There's also the aspect of how even things like buying junk food can in other ways be beneficial to survival, such as in cases where people are completely mentally exhausted and don't have the willpower to cook at home. Plus, in this case, the dopamine it "wrongfully" sends could increase the person's determination to carry on, despite technically changing things for the worse; that is also something that is not the case with the program.

So yes, on a superficial level, you could see humans as acting similarly to this machine learning program, but rarely can you clearly define where they went wrong, since besides a "misaligned proxy objective" there are many other factors at play. If people could get out of whatever bad situation they're stuck in by fixing just one misaligned reward structure, such as not buying junk food anymore, the same way you could tell the program "don't buy the Magikarp, it's useless", then humanity would definitely be better off.