PewDiePie details a DIY AI project he claims rivaled ChatGPT on a coding benchmark

PewDiePie revealed that he spent months fine-tuning his own AI model, and claimed that it briefly outperformed ChatGPT on a coding benchmark.

In a new YouTube video, the creator explained that the project began as a personal challenge to better understand machine learning. Rather than building a model from scratch, he fine-tuned existing large language models using custom datasets and coding-focused benchmarks, with the goal of improving the performance of his AI coding agents.

According to PewDiePie, the model initially scored just 8% on the coding benchmark, then improved gradually through retraining and adjustments to the data format. After he introduced inference data and refined the dataset, one iteration reached 19.6%, which he said briefly surpassed ChatGPT's score at the time.

Then he discovered benchmark contamination: some of the training data overlapped with the benchmark questions, which invalidated the results and forced him to restart the process.
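The article does not say how PewDiePie detected the overlap. One common heuristic, sketched here purely as an illustration, is to flag any benchmark item that shares a long run of words (an n-gram) with the training data. The function names `ngrams` and `contaminated` and the default `n=8` are hypothetical choices, not his actual method.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase, whitespace-tokenized word n-grams in text."""
    tokens = text.lower().split()
    # Texts shorter than n tokens produce an empty set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contaminated(train_docs: list[str], benchmark_items: list[str], n: int = 8) -> list[int]:
    """Return indices of benchmark items sharing at least one n-gram with the training data.

    Hypothetical sketch: real contamination checks may normalize code,
    use different n, or use fuzzy matching instead of exact overlap.
    """
    train_grams: set[tuple[str, ...]] = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    return [i for i, item in enumerate(benchmark_items) if ngrams(item, n) & train_grams]


if __name__ == "__main__":
    train = ["def add(a, b): return a + b  # sums two numbers"]
    bench = ["def add(a, b): return a + b", "def mul(x, y): return x * y"]
    # The first benchmark item appears verbatim in the training text, so it is flagged.
    print(contaminated(train, bench, n=4))
```

Any flagged item would then be removed from the training set (or the benchmark) before retraining, since a score earned on questions the model has already seen says little about its real ability.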

After retraining on a more coding-specific base model, PewDiePie said performance improved significantly, reaching 36%, and then 39.1% after further post-training adjustments.

The project had its setbacks. He described repeated system crashes, overheating, and even a GPU failure during training, and explained that he had to make significant changes to his hardware setup to handle the computational load.

Despite these hurdles, PewDiePie emphasized that the experiment was primarily about learning through trial and error. He acknowledged that strong results on a single benchmark won't necessarily translate into broader AI capabilities, and said further testing is needed before he considers a public release.

He also noted that newer models, including Qwen 3, now score higher on the same benchmark, so continued development would be needed for his model to stay competitive.

For now, the YouTuber describes the experiment as a deep dive into AI development, one that shows creators outside traditional tech circles can explore this rapidly evolving field.