My training data set for my big GPT-2 adventure is everything published on my weblog, which spans about 20 years of content. The local copy of the original Microsoft Word document with all the formatting was 217,918 kilobytes, whereas the plain text version dropped all the way down to 3,958 kilobytes. I did manually open the text version to make sure the content was still readable and structured.
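Opening a four-megabyte text file by hand works, but the same sanity check is only a few lines of Python. This is just a sketch, assuming the export is saved as nlindahl.txt, the same file I feed to encode.py below:

import os

path = "nlindahl.txt"  # plain text export of the weblog archive
print(f"{path}: {os.path.getsize(path) / 1024:,.0f} KB")

# Print the first few lines to confirm the Word-to-text conversion kept readable content.
with open(path, encoding="utf-8") as f:
    for _ in range(5):
        print(f.readline().rstrip())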
The first problem was easily solved; it related to a missing module named “numpy”:
PS F:\GPT-2\gpt-2-finetuning> python encode.py nlindahl.txt nlindahl.npz
Traceback (most recent call last):
  File "encode.py", line 7, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
PS F:\GPT-2\gpt-2-finetuning>
Resolving that required a simple “pip install numpy” in PowerShell.
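Before rerunning the script, a one-liner confirms the module is now visible to the same Python that PowerShell launches; the import check is my own habit, not part of the gpt-2-finetuning instructions:

PS F:\GPT-2\gpt-2-finetuning> pip install numpy
PS F:\GPT-2\gpt-2-finetuning> python -c "import numpy; print(numpy.__version__)"

That got me all the way to line 10 of encode.py, where this new error occurred: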
PS F:\GPT-2\gpt-2-finetuning> python encode.py nlindahl.txt nlindahl.npz
Traceback (most recent call last):
  File "encode.py", line 10, in <module>
    from load_dataset import load_dataset
  File "F:\GPT-2\gpt-2-finetuning\load_dataset.py", line 4, in <module>
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'
Solving this one seemed to require a similar method in PowerShell, “pip install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.8.0-py3-none-any.whl”, a command that includes a specific URL telling pip where to get TensorFlow.
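In hindsight, the “mac/cpu” in that URL points at a macOS build of TensorFlow, so I would not expect it to work under Windows. My guess at a cleaner route, and it is only a guess, would be pinning an old 1.x release from PyPI, since the GPT-2 code predates TensorFlow 2:

PS F:\GPT-2\gpt-2-finetuning> pip install tensorflow==1.15.0
PS F:\GPT-2\gpt-2-finetuning> python -c "import tensorflow as tf; print(tf.__version__)"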
I gave up on that path and went a different route…
https://github.com/openai/gpt-2/blob/master/DEVELOPERS.md
and
https://colab.research.google.com/github/ilopezfr/gpt-2/blob/master/gpt-2-playground_.ipynb
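For reference, the DEVELOPERS.md route boils down to something like the steps below; the model name passed to the download script (124M here) is my assumption, since the available sizes have changed over time:

PS F:\GPT-2> git clone https://github.com/openai/gpt-2.git
PS F:\GPT-2> cd gpt-2
PS F:\GPT-2\gpt-2> pip install -r requirements.txt
PS F:\GPT-2\gpt-2> python download_model.py 124M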