Day 5 with GPT-2

Getting back into the groove of writing and working on things really just took a real, fun challenge to kickstart. Having a set of real work to complete always makes things a little bit easier and clearer. Instead of thinking about what is possible, you end up thinking about the path to getting things done. Being focused on in-flight work has been a nice change of direction. Maybe I underestimated how much a good challenge would improve my quarantine experience. Things have been a little weird since the quarantine started in March, and on Monday it will be June. That is something to consider in a moment of reflection.

I have been working both on my Windows 10 Corsair Cube and in the Google Colab environment to really understand the GPT-2 model. My interest has been pretty high the last couple of days. I started out working locally in Windows, and after that became frustrating I switched over to GCP hardware via Google Colab. One of the benefits of switching is that instead of trying to share a series of commands and some notes on what happened, I can work out of a series of Jupyter notebooks. They are easy to share, easy to download, and, most importantly, easy to create from scratch. The other major benefit of working in Google Colab is that I can dump everything and reset the environment. Being able to share notebooks with other people matters too, because it lets me look at and understand the methods other people are using.

After working in Google Colab for a while, I found that the inactivity timeouts made me sad. I'm not the fastest Python coder in the world. I frequently try things and move along very quickly in short bursts, followed by longer periods of inactivity while I research an error, think about what to do next, or wonder what went wrong. Alternatively, I might be happy that something went right for long enough that a timeout occurs. At that point, the Colab environment's connection to the underlying cloud hardware drops, and everything has to be restarted from the beginning. That is not a big deal unless you are in the middle of training something and have not saved proper checkpoints to preserve your efforts. I ended up subscribing to Google's Colab Pro, which apparently offers faster GPUs, longer runtimes (fewer idle timeouts), and more memory. At the moment, the subscription costs $9.99 a month, and that seems reasonable to me based on my experiences so far this week.
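Here is a minimal sketch of the kind of checkpointing that protects against a dropped runtime. This is plain Python with a stand-in training step and a hypothetical JSON checkpoint file. It is not GPT-2 training code, and the `session_limit` parameter is made up purely to simulate a disconnect. In Colab you would probably point the checkpoint path at a mounted Google Drive folder (for example, after calling `drive.mount` from `google.colab`), so that the file outlives the runtime.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Atomically write state: dump to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    # os.replace swaps the file in one step, so an interrupted write
    # never leaves a half-written checkpoint behind.
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the saved state if a checkpoint exists, otherwise the default."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return default


def train(path, total_steps, save_every=100, session_limit=None):
    """Resume from the last checkpoint and run until total_steps.

    session_limit is a hypothetical knob that simulates the runtime dropping
    after that many steps; any work done since the last save is lost.
    """
    state = load_checkpoint(path, {"step": 0, "running_loss": 0.0})
    steps_this_session = 0
    while state["step"] < total_steps:
        if session_limit is not None and steps_this_session >= session_limit:
            return state  # simulated disconnect: no final save
        # Stand-in for a real training step (assumption, not GPT-2 code).
        state["running_loss"] += 1.0 / (state["step"] + 1)
        state["step"] += 1
        steps_this_session += 1
        if state["step"] % save_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

With `save_every=100`, suppose the runtime drops at step 150. Rerunning `train` with the same path resumes from step 100 instead of starting over at zero. Only the 50 steps since the last save are repeated.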

Anyway, I was actively digging into the GPT-2 model and making good progress in Google Colab when, on May 28, the OpenAI team announced another model called GPT-3, along with a corresponding paper, "Language Models are Few-Shot Learners." That one is different and has proven a little harder to work with at the moment, since the release consists of the paper and a GitHub repository of data and samples rather than the trained model weights. I'm slowly working on a Jupyter notebook version.

Git: https://github.com/openai/gpt-3
PDF: https://arxiv.org/pdf/2005.14165.pdf
