Writing a GPT from Scratch in C++ (TinyGPT)

August 14, 2023

I’ve been learning about neural networks recently and came across the blog post "GPT in 60 Lines of NumPy". It showed me that the structure of GPT-2 is not that complicated: the post implements it in Python in only 60 lines of code (excluding comments). For a beginner, though, why not implement it from scratch in C++ to gain a deeper understanding of the details? That is how this project, TinyGPT, came about.

Code Structure

The Python project that accompanies the blog post above is picoGPT, and this project, TinyGPT, can basically be regarded as its C++ version. It consists mainly of three classes, Tensor, Model and Tokenizer:

Tensor: mimics the main interface of the NumPy ndarray (at least as much as this project needs). This part was fairly time-consuming, since some operations on high-dimensional arrays, such as broadcast, reduce, split and stack, are quite tricky;
Model: the GPT-2 model implementation, written largely by following picoGPT's gpt2_pico.py;
Tokenizer: the BPE tokenizer implementation, whose logic is almost identical to GPT-2's official encoder.py, with the regular-expression matching implemented using Google's open-source library re2;

Apart from the regular-expression library re2 mentioned above, the only other dependencies are the JSON parsing library json11 and Intel MKL, which accelerates matrix multiplication.

Results

The project uses the GPT-2 124M model by default, and its output is quite fluent:

[DEBUG] TIMER TinyGPT::Model::loadModelGPT2: cost: 582 ms
[DEBUG] TIMER TinyGPT::Encoder::getEncoder: cost: 116 ms
INPUT:Alan Turing theorized that computers would one day become
GPT:the most powerful machines on the planet.
INPUT:What color do plants usually have?
GPT:The color of plants is determined by the number of leaves and stems. The number of leaves and stems is determined by the number of leaves and stems.

When it comes to performance, there are currently no optimizations other than the Intel MKL-accelerated matrix multiplication, and performance is not much different from the Python version.

Build and Run

GitHub repository: https://github.com/keith2018/TinyGPT

This project has been verified to compile and run on Windows, macOS and Linux. Note that the MinGW environment is currently not supported on Windows (if you use CLion, you need to change the default toolchain to Visual Studio).

1. Clone

git clone --recurse-submodules https://github.com/keith2018/TinyGPT.git

2. Install Intel MKL

Download and install Intel MKL (listed as the "Intel®-Optimized Math Library for Numerical Computing on CPUs & GPUs") from Intel's official website.

3. Download the GPT-2 model file

A Python script for this is already included in the project, so the model can be downloaded with a single command:

python3 tools/download_gpt2_model.py

Once the download completes, the model file model_file.data (~474MB) will be placed in the assets/gpt2 directory.

4. Build and Run

mkdir build
cmake -B ./build -DCMAKE_BUILD_TYPE=Release
cmake --build ./build --config Release
cd app/bin/Release
./TinyGPT_demo

Some previous projects: