Completed Dec. 2017 as a solo side project
This project is a simple example of applying deep reinforcement learning to a video game. The model is trained to move a cart left and right in order to keep the pole balanced on it upright. The agent learns to play only through its own experience of playing the game. The approach is related to, though far simpler than, the reinforcement learning used in AlphaGo, which is famous for beating the best Go players in the world.
One of the coolest aspects of Q-learning is that the agent learns to play the game through its own exploration instead of from human-labeled data. A drawback of supervised learning on human-curated data is that the model can, at best, learn to mimic that data and do the job as well as the humans who curated it. Humans do a near-perfect job on some problems, but most of the time we are not perfect. An agent trained on its own play can actually become better than humans and devise strategies that have never been used before. Training through self-play is a key part of how AlphaGo reached a superhuman level at Go.
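
To make the idea of learning from experience concrete, here is a minimal sketch of the Q-learning update, Q(s, a) ← Q(s, a) + α (r + γ max Q(s', ·) − Q(s, a)), combined with epsilon-greedy exploration. It is not the code from this project: it assumes a lookup table in place of the neural network that deep Q-learning would use, and it swaps the cart-pole game for a hypothetical five-cell corridor in which the agent is rewarded for reaching the right-hand end. The names `q_update`, `choose_action`, and `train_corridor`, and the values of `alpha`, `gamma`, and `epsilon`, are illustrative assumptions.

```python
import random


def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q[next_state])
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    best = max(q[state])
    # Break ties at random so the untrained agent does not always pick action 0.
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])


def train_corridor(n_states=5, episodes=500, epsilon=0.2, seed=0):
    """Hypothetical toy game: walk a corridor, reward 1.0 at the right end."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            action = choose_action(q, state, epsilon, rng)
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == n_states - 1
            reward = 1.0 if done else 0.0
            q_update(q, state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    for state, (left, right) in enumerate(train_corridor()[:-1]):
        print(f"state {state}: left={left:.3f} right={right:.3f}")
```

No one ever tells the agent that moving right is correct. The preference for "right" in every cell emerges only from the rewards it collects while exploring, which is the same principle the cart-pole agent relies on, with a neural network standing in for the table.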