Completed Oct. 2017 as a solo side project
I'm a huge Game of Thrones/A Song of Ice and Fire fan. During season 7 of Game of Thrones I decided to apply some of my deep learning knowledge to the ASOIAF books in order to generate some text, which I dubbed "book 6." I posted my finished project on HackerNews and it went viral from there. This is probably the most famous I will ever be: millions of people around the world saw my project. As a result of my GOT AI, I got to do several interviews with major news publishers, be a guest host on NVIDIA’s AI Podcast, and was invited to New York to consult for Moody’s.
The stack for this project was pretty simple since nothing needed to be deployed in a production environment. I used an AWS GPU instance to host a Jupyter web server, which let me work on the project in my browser with the speed and parallelization of a GPU. TensorFlow is the deep learning library that I used to build and train my model, which was a three-layer LSTM.
My first time using machine learning to generate text was back in college when I trained a Monte Carlo Simulation to write poems. Fast forward a few years... In 2017 I was spending a lot of my time practicing and applying deep learning, which exposed me to RNNs. As I went further down the rabbit hole I discovered that LSTM cells were the state-of-the-art approach for text generation at the time.
When I first ran the ASOIAF text through a standard LSTM cell, the results were OK, but not great. I messed around with the hyperparameters some, but that only gave me marginal improvements. The first big leap in quality came from cleaning the text better: tagging special characters, removing cruft text at the start and end of chapters, etc.
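To make the cleaning step concrete, here is a minimal sketch of what that kind of preprocessing could look like. It is not the code from the project: the function name `clean_chapter`, the `<NEWLINE>` tag, and the assumption that the cruft is a fixed number of lines at the top and bottom of each chapter are all hypothetical choices made for illustration.

```python
import re

# Hypothetical tag for paragraph breaks; the tags used in the actual
# project are not documented in this write-up.
NEWLINE_TAG = "<NEWLINE>"


def clean_chapter(raw, header_lines=1, footer_lines=0):
    """Drop assumed header/footer lines and tag special characters."""
    lines = raw.strip().splitlines()
    # Assumption: cruft is a fixed number of lines at the start and end.
    body = lines[header_lines:len(lines) - footer_lines]
    text = "\n".join(body).strip()
    # Normalize curly quotes so the model sees one symbol per meaning.
    for curly, plain in (("\u201c", '"'), ("\u201d", '"'),
                         ("\u2018", "'"), ("\u2019", "'")):
        text = text.replace(curly, plain)
    # Replace each run of line breaks with a single explicit tag.
    text = re.sub(r"\n+", " " + NEWLINE_TAG + " ", text)
    # Collapse any repeated spaces or tabs left over.
    return re.sub(r"[ \t]+", " ", text).strip()


sample = "JON\n\nThe wind was cold.\n\u201cWinter is coming,\u201d he said.\n"
print(clean_chapter(sample))
```

Tagging line breaks explicitly, rather than deleting them, gives the model a chance to learn where paragraphs end.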
After these changes I started to get text that was intelligible, though many grammar mistakes still existed. I noticed that the training loss was quickly leveling off at around the same value as the test loss, which is a sign of under-fitting. So I did some research on how to properly scale LSTM cells up and settled on a three-layer architecture.
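The loss comparison described above can be turned into a rough check. The sketch below is only a heuristic written for illustration, not something taken from the project: the function names, the window size, and both tolerance values are assumptions. It labels a run as under-fitting when the training loss has stopped improving while sitting close to the test loss, which is the pattern that suggests the model needs more capacity.

```python
def tail_mean(values, window):
    """Average of the last `window` entries."""
    tail = values[-window:]
    return sum(tail) / len(tail)


def diagnose_fit(train_losses, test_losses, window=3,
                 gap_tol=0.05, plateau_tol=0.01):
    """Heuristic label for a training run based on its loss curves.

    All thresholds are illustrative assumptions, not tuned values.
    """
    if len(train_losses) < 2 * window or len(test_losses) < window:
        raise ValueError("need at least 2 * window training loss values")
    train_now = tail_mean(train_losses, window)
    test_now = tail_mean(test_losses, window)
    # Training loss over the window just before the most recent one.
    train_before = tail_mean(train_losses[:-window], window)
    rel_gap = (test_now - train_now) / test_now
    rel_improvement = (train_before - train_now) / train_before
    if rel_gap > gap_tol:
        return "overfitting"    # test loss clearly above training loss
    if rel_improvement < plateau_tol:
        return "underfitting"   # flat training loss, no gap to test loss
    return "improving"          # still learning, keep training


train = [3.0, 2.0, 1.61, 1.60, 1.60, 1.60, 1.60, 1.60]
test = [3.1, 2.1, 1.65, 1.62, 1.62, 1.62, 1.62, 1.62]
print(diagnose_fit(train, test))
```

A flat training loss with no gap is only suggestive: it could also mean the model has reached the best loss the data allows, so a label like this is a prompt to try a larger model rather than proof that one is needed.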
The last step of the process was tuning the hyperparameters, which I did by training several instances of the model on AWS GPUs at once and comparing results. Over the course of two weeks (this model took a little over 24 hours to train) I found a set of hyperparameter values that trained a model which gave consistently good results.
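Comparing several hyperparameter settings can be sketched as a small grid search. Everything specific here is an assumption made for illustration: the parameter names and values in `search_space` are invented, `train_and_evaluate` stands in for a full training run that returns a test loss, and the loop runs one configuration at a time rather than in parallel across GPU instances.

```python
import itertools


def grid(search_space):
    """Yield every combination of values in the search space as a dict."""
    keys = sorted(search_space)
    for values in itertools.product(*(search_space[k] for k in keys)):
        yield dict(zip(keys, values))


def pick_best(search_space, train_and_evaluate):
    """Return (loss, config) for the configuration with the lowest loss."""
    results = [(train_and_evaluate(cfg), cfg) for cfg in grid(search_space)]
    return min(results, key=lambda result: result[0])


# Hypothetical search space; these are not the values used in the project.
search_space = {
    "learning_rate": [0.001, 0.01],
    "hidden_size": [256, 512],
}


def fake_train_and_evaluate(cfg):
    """Placeholder score standing in for a real 24-hour training run."""
    return abs(cfg["learning_rate"] - 0.001) + abs(cfg["hidden_size"] - 512) / 1000


best_loss, best_cfg = pick_best(search_space, fake_train_and_evaluate)
print(best_loss, best_cfg)
```

With runs that take about a day each, the grid has to stay small, which is why launching the configurations at the same time on separate instances matters more than the search strategy itself.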
That's when the fun began. I spent an entire night generating 500-character "chapters" and picking the best ones to publish. It was a fun night, enjoyed alongside a few beers.
I have several hypotheses on how I could improve my text generation quality, but I haven't had the time to try and implement them. The first experiment I would like to try is training the model first on a large body of generic text, and then retraining on the ASOIAF books. Even though the ASOIAF books are long (over a million words), that's still not a ton of text to train a three-layer LSTM on. And you can see this in the quality of the text: sometimes it still makes grammar errors that a model like this should be able to learn to avoid.
My second hypothesis is much more experimental and probably wouldn't work, but who knows. One of the problems models like this still struggle with is digesting context. They are great at generating sequences of words that read well when looked at in groups of a few sentences. What the model still struggles with is understanding the context of the plot. For example, there are an infinite number of ways an author could kill a character. Authors almost never say "and then X died." They describe the scene and we are able to infer from context that the character is dead (like if he doesn't have a head anymore). I'd be interested in trying to accumulate a trove of manually tagged phrases and training a model that could somehow interpret context. I envision the output of this model feeding into the LSTM somehow. This is a crazy idea and obviously pretty abstract, but someday I hope I get the time to do this experiment.
When I set out to create my Game of Thrones AI, I didn't intend for it to go viral. I thought at best it would be a funny thing to show my friends and put on my resume. I posted my finished project on HackerNews and it took off from there. My Game of Thrones AI made it to the top of the HackerNews feed and the ASOIAF reddit thread.
Within a few hours a freelance journalist called for an interview, and his article was initially picked up by Vice and published on Motherboard. This is when my project went global. Media outlets all over the world reposted this interview, including the New York Post, Huffington Post, IFL Science, and dozens more.
These repostings made my social feeds blow up with messages and requests. I got to write a blog post that was published by Udacity, do more interviews with global media outlets like Tencent, Discovery Channel, and Shenzhen, and record a podcast for NVIDIA's AI Podcast. I was also offered the opportunity to be flown to New York by Moody's to show them how the AI works.