The Elo rating system has become famous in a few contexts. Perhaps most famously, it has been the basis of chess ratings since the 1960s. Additionally, the website 538 has successfully used modifications of it for most of their well-known sports ratings. Less publicly, many video game developers use variations of the Elo system behind the scenes in their matchmaking systems. If you're reading this article, I'll assume you have some familiarity with the system. Why is it used in so many contexts? I would argue because of its computational scaling, versatility, and simplicity. There are, however, some drawbacks. In this article, we will address a key one while maintaining the advantages listed above.
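As a quick refresher, the classic update nudges each rating toward the observed result, weighted by how surprising that result was. Here is a minimal sketch; the function names and the K-factor of 20 are my own illustrative choices, not a standard:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a: float, r_b: float, score_a: float, k: float = 20.0) -> tuple[float, float]:
    """Update both ratings after one game; score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - e_a))
    return r_a_new, r_b_new

# Example: a 1500-rated player upsets a 1700-rated player.
print(update_elo(1500, 1700, 1.0))  # the winner gains ~15 points; the loser drops the same
```

Notice that the whole update is a couple of arithmetic operations per game, which is exactly the computational scaling that makes Elo so attractive at scale.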
While Large Language Models are currently getting all of the attention (pun intended), other exciting models are being developed separately for very different use cases. Symbolic regression is well suited to discovering closed-form analytical rules rather than attacking a deep learning task like classifying an image or translating an audio recording. If you wanted to rediscover Newton's law of cooling, for example, you could build a resource-intensive dense neural network. This would do well with enough data, but it would not generalize to situations it hadn't seen. Symbolic regression, however, is the right tool for the task: it can find the exact formula with limited data, and therefore not only generalizes but also saves quite a bit of computation. One of my favorite papers of all time, by Cranmer et al., goes into this further and even develops a previously undiscovered equation for dark matter overdensity.
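To make that concrete, here is a sketch of what rediscovering Newton's law of cooling might look like with PySR, the symbolic regression library associated with Cranmer et al. The synthetic data, the constants, and all search parameters below are my own illustrative choices:

```python
import numpy as np
from pysr import PySRRegressor

# Synthetic data from Newton's law of cooling: T(t) = T_env + (T_0 - T_env) * exp(-k * t)
T_env, T_0, k = 20.0, 90.0, 0.35
t = np.linspace(0, 10, 200).reshape(-1, 1)
T = T_env + (T_0 - T_env) * np.exp(-k * t.ravel())

model = PySRRegressor(
    niterations=40,                     # search budget; small because the target is simple
    binary_operators=["+", "-", "*"],
    unary_operators=["exp"],
)
model.fit(t, T)
print(model)  # best candidates; ideally something equivalent to 20 + 70*exp(-0.35*t)
```

With only 200 points and a tiny operator set, the search can recover the exact exponential form, which is what lets it extrapolate where a dense network would merely interpolate.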
Classic Elo ratings treat every rating as equally certain. This is usually a poor assumption for a large-scale ratings system. Simply put, newcomers to a ratings system should almost always be modeled with greater variance than those who have been around for a while. Likewise, players that the ratings system hasn't seen for a long period of time should have the uncertainty around their ratings grow.
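A common lightweight way to encode this intuition, shown as a heuristic sketch below rather than the approach this article develops, is to scale the K-factor by how uncertain we are about a player. All names and constants here are my own assumptions:

```python
def k_factor(games_played: int, days_inactive: int = 0,
             k_max: float = 40.0, k_min: float = 10.0) -> float:
    """Heuristic K-factor: high for new players, decaying toward k_min with
    experience, and inflated again after long inactivity (capped at k_max)."""
    k = max(k_min, k_max - games_played)        # uncertainty shrinks as games accumulate
    k += min(k_max - k, days_inactive / 30.0)   # uncertainty grows again during absence
    return k

print(k_factor(games_played=2))                       # new player: ~38
print(k_factor(games_played=100, days_inactive=365))  # returning veteran: ~22
```

Step-schedule variants of this idea are already used in practice (FIDE, for instance, assigns higher K-factors to new players), but a single scalar K is still a crude stand-in for a real variance estimate, and that gap is what the rest of this article is about.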