Build your own harness racing machine learning betting model with AI

Written by Brett Sturman

Much has been made of the long-standing gap between commercial betting syndicates and computer-assisted wagering (CAW) operations on one side and the typical retail harness player on the other. On the one hand, you have the average harness handicapper working from a race program whose format hasn’t changed in 100 years. On the other, you have an army of groups with seemingly unlimited resources, data scientists, and access to their own data.

The gap between the two groups has always seemed impossible to bridge, but times are changing rapidly and dramatically.

The latest versions of AI tools have helped level the playing field in ways that were unimaginable until recently. I think anyone with knowledge of the sport (if you’re reading this, that’s you), combined with curiosity and a basic understanding of data, can now build things that previously took teams of developers months or years.

The most impressive AI tool I’ve ever seen is Anthropic’s Claude. The extent to which it can take the domain knowledge of an aspiring handicapper and turn it into an actionable predictive model is truly remarkable. Even a year ago, building professional-grade models with AI tools would have been a daunting task for non-developers. Now almost everything is automated: converting data into readable formats, writing end-to-end code in any programming language, applying the right type of predictive techniques, and building the diagnostics and validation needed for troubleshooting.

Here’s a real walkthrough of what that looks like.

This example uses TrackMaster as the data provider. Instead of visiting the TrackMaster website and purchasing the standard program as a PDF, you download the data file in XML format. The XML contains all the data behind a standard race program, plus additional data that is not necessarily visible on the program page. Seeing this level of data gives a sense of the advantages the large groups have had over the years, and even this only scratches the surface.

With Claude or any AI, you can upload the data file, have it parse the thousands of lines of markup, tell it what data to extract, and essentially convert it into a standard Excel file. I ran this the other day with the file for The Meadowlands card from last Friday (March 13th), and the resulting raw data came out clean and readable.
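As a rough idea of what the generated code for this step might look like, here is a minimal Python sketch that flattens an XML file into spreadsheet-style rows. The tag and attribute names (`card`, `race`, `horse`, `pastline`, `class_rating`, and so on) and all of the sample values are invented for illustration; they are not TrackMaster’s actual schema, so the real field names would need to be taken from the file itself.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data. Tag names, attribute names, and values are
# made up for this sketch and do NOT reflect the real TrackMaster layout.
SAMPLE_XML = """
<card track="EXAMPLE">
  <race number="1">
    <horse name="Horse A">
      <pastline date="2000-01-07" class_label="LABEL X" class_rating="78" speed_rating="81"/>
      <pastline date="2000-01-01" class_label="LABEL X" class_rating="75" speed_rating="79"/>
    </horse>
    <horse name="Horse B">
      <pastline date="2000-01-06" class_label="LABEL Y" class_rating="80" speed_rating="83"/>
    </horse>
  </race>
</card>
"""


def flatten_past_lines(xml_text):
    """Return one dict per past-performance line found in the XML."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for race in root.iter("race"):
        for horse in race.iter("horse"):
            for line in horse.iter("pastline"):
                rows.append({
                    "race": int(race.get("number")),
                    "horse": horse.get("name"),
                    "line_date": line.get("date"),
                    "class_label": line.get("class_label"),
                    "class_rating": int(line.get("class_rating")),
                    "speed_rating": int(line.get("speed_rating")),
                })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten_past_lines(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Writing CSV rather than a native Excel workbook keeps the sketch within the Python standard library; the resulting file opens in Excel either way.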

This particular file contained over 1,600 rows of data, each row reflecting a separate past performance line, and the field I found most useful was the class rating. Every line is assigned a class rating, which also appears in the standard TrackMaster program next to the speed rating, but having it in a format the model can read gives you many more ways to analyze the numbers.

Depending on the quality of the horses in a particular race, a MADC 1 race can rate exactly the same as a TM 71 field. But some MADC 1 races rate lower, some TM 71 races rate higher, and vice versa; it all depends on which specific horses are in those races. Which $10,000 race rated as the higher-quality one? What about a specific N/W $2,500 L5 race or a random New Jersey SDF race? This kind of information takes the guesswork out of it and allows the model to treat these races differently based on their ratings, even though they appear to be the same class at first glance.
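To make the idea concrete, here is a small Python sketch that groups races by their class label and shows how far the ratings spread within each label. The race IDs and rating numbers are made up for illustration, and `rating_range_by_label` and `class_change` are hypothetical helper names, not anything from TrackMaster.

```python
from collections import defaultdict

# Made-up example rows: one entry per race, with the condition printed in
# the program and an illustrative numeric class rating for that field.
races = [
    {"race_id": "A", "class_label": "MADC 1", "class_rating": 74},
    {"race_id": "B", "class_label": "MADC 1", "class_rating": 70},
    {"race_id": "C", "class_label": "TM 71", "class_rating": 74},
    {"race_id": "D", "class_label": "TM 71", "class_rating": 77},
]


def rating_range_by_label(race_rows):
    """Map each class label to the (lowest, highest) rating seen under it."""
    ratings = defaultdict(list)
    for row in race_rows:
        ratings[row["class_label"]].append(row["class_rating"])
    return {label: (min(vals), max(vals)) for label, vals in ratings.items()}


def class_change(previous_rating, today_rating):
    """Positive means a tougher field today than last time; negative means easier."""
    return today_rating - previous_rating


print(rating_range_by_label(races))
# Same label on paper, but race B rates four points below race A.
print(class_change(previous_rating=74, today_rating=70))
```

A model can then use the numeric difference between fields instead of treating every race with the same label as equal.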

From there, use the data as you see fit. You can tell the AI to include not only class ratings, but also speed ratings, track and post position statistics, driver and trainer trends, and go from there. How you weight recent races, how you treat races where a horse broke stride, whether you incorporate pace projections, what custom calculations you build, how you handle barn changes, and how you account for the way trainers typically do with horses shortly after a qualifier are all up to you. There are no restrictions. All you do is tell the AI what you’re thinking, and it writes the programming code to do it.
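Two of those choices, weighting recent races more heavily and treating break lines differently, could be sketched in Python as follows. The decay factor of 0.7 and the break weight of 0.25 are arbitrary assumptions chosen for illustration, not recommended values, and `weighted_speed` is a hypothetical function name.

```python
def weighted_speed(lines, decay=0.7, break_weight=0.25):
    """Recency-weighted average speed rating.

    `lines` is ordered most recent first; each item is a tuple of
    (speed_rating, broke_stride). Each older line counts `decay` times as
    much as the one after it, and lines where the horse broke stride are
    further scaled down by `break_weight`. Returns None if there are no lines.
    """
    total = 0.0
    weight_sum = 0.0
    for index, (speed_rating, broke_stride) in enumerate(lines):
        weight = decay ** index
        if broke_stride:
            weight *= break_weight
        total += weight * speed_rating
        weight_sum += weight
    return total / weight_sum if weight_sum else None


# Illustrative past lines, most recent first: (speed rating, broke stride?)
clean_lines = [(60, False), (80, False)]
with_break = [(60, True), (80, False)]

print(weighted_speed(clean_lines))
# The poor most recent line counts for less when it was a break.
print(weighted_speed(with_break))
```

Down-weighting a break line rather than discarding it is only one option; excluding those lines entirely or flagging them as a separate feature are equally valid choices for the handicapper to make.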

One big caveat is that models can only do so much with raw data, especially in harness racing. In our sister sport, thoroughbred racing, these models are much more prevalent, and well-constructed models can be built from vast numbers of data points. Harness racing, in my opinion, has more “feel” to it and therefore more volatility.

In thoroughbred racing, any horse that is fast enough will get its chance to be put into the race and win. In harness racing, it’s much harder to model a driver who has shown early speed five races in a row with the same horse but randomly decides to take back to ninth tonight at odds of 3-1, or a horse who sits fourth and gets shuffled, or a horse who lets two horses go in front and thus ends up further back than planned. The human factor goes on and on.

This is purely speculation, but because of that human element, I believe the best commercial operations using models in harness racing must incorporate first-hand driver and trainer information into their calculations. It is too important a factor to leave out, and it may remain a unique advantage that these groups have. However, if you are closer to the physical horse than I am and have unique information, feel free to incorporate it into your own model.

All of this is to say that the technology has never before been where it is today, at a point where users can build their own models. When a model is built on sound principles and layered with your own knowledge and judgment about relationships within the sport, real value is created.

Even though big data has given large-scale operations a huge advantage for years, this is the first time that regular players can begin to close the gap.
