The Elo rating system is something you've probably already heard of. It's a set of rules for measuring player skill in competitive games - especially two-player games. The idea is that every player is assigned a number, called their "Elo rating". Every time you win a game, your rating goes up. Every time you lose a game, your rating goes down. Over time, your rating is supposed to converge to a value that somehow represents your true level of skill at the game. And given two players' ratings, the Elo system comes with a formula to predict the probability that one beats the other. The system is named after physicist Arpad Elo, who originally developed it for the United States Chess Federation in the late 1950s. Nowadays, the system is used for games other than chess, including online video games.

So there's a lot of information out there about what the Elo system is - how to update a player's ratings, and how to calculate probabilities from them. But what you might be left wondering is why the system works. In other words - just where do these formulas come from? Before diving into the theory behind them, we'd better start by recapping exactly what the rules of the Elo rating system are. For simplicity, we'll assume a two-player game in which the only possible outcomes are a win or a loss - no draws. Of course, Elo is applied to games with draws all the time - including chess - but we'll ignore that in this video.

To start with, every player is assigned a rating of 1500. Before we can talk about how these ratings are updated over time, we need to discuss the probability formula. If two players have ratings R1 and R2, the system predicts that the probability that player one beats player two in a game is given by this arcane formula. You can verify for yourself that if you swap R1 and R2 to get the probability of player two winning, the result is algebraically equivalent to taking 1 minus this formula, as it should be.

When two players play a game, we update the winner's rating by adding the following to it: a positive constant k, multiplied by the probability that the winner would in fact have lost. We update the loser's rating in a similar way, subtracting from it k times the probability that they would have won. These probabilities are the estimated probabilities from the previous probability formula, based on the players' current ratings. Notice that the two probabilities are actually the same thing: the probability that the winner would have lost, and the probability that the loser would have won, are both just ways of saying the probability that the opposite outcome happened. So in fact, the amount of points we give to the winner is the same as the amount we take from the loser, which means, if you like, you can think of this as the winner taking a certain amount of Elo from the loser. This means that the sum total of Elo in the entire playerbase, as long as no new players join and no players leave, remains constant, as if it's a kind of currency the players are trading with one another, with every new player joining the community injecting an extra 1500 points into the economy. This further implies that the average Elo across the playerbase is always exactly 1500. Remember this fact for later. There is absolutely no reason why a player's Elo rating couldn't be negative, although some implementations might artificially clamp your Elo at 0, just because the organizers think negative ratings look unseemly.
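Written out explicitly (the video shows these on screen), the two formulas just described are the standard ones. Here E denotes the predicted probability that the eventual winner would win, so 1 - E is the probability that the winner would have lost:

```latex
% Predicted probability that player one (rating R_1) beats player two (rating R_2)
P(\text{1 beats 2}) = \frac{1}{1 + 10^{(R_2 - R_1)/400}}

% Update after a game; E is the winner's predicted win probability
R_{\text{winner}} \leftarrow R_{\text{winner}} + k\,(1 - E), \qquad
R_{\text{loser}} \leftarrow R_{\text{loser}} - k\,(1 - E)
```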
Because this probability is between 0 and 1, the entire Elo update is between 0 and k. k is a constant chosen by the person or organization putting the rating system in place, usually between about 10 and 50. When it's higher, ratings swing more violently: player ratings are corrected more quickly if they're too high or too low, but oscillate more around their correct value. Finally, notice that the Elo update amount is larger the more unexpected the outcome of the game is. If a grandmaster beats a novice, their ratings barely budge. If a novice beats a grandmaster, their ratings move by the maximum possible amount, k. So the entire system is contained in just two formulas, the second of which relies on the first. The constants 10, 400 and the initial rating of 1500 are arbitrary, and were chosen when the system was put in place for chess to make players' new ratings come out around the same values they had in the old rating system used before Elo.

That's it for the what - what about the why? Again, suppose we have a two-player game in which drawing is impossible. We're going to imagine that every player has a secret, positive number inside them, which we'll call their strength. When two players with strengths A and B play a game, the probability of each player winning is that player's strength, divided by the sum of the strengths. If this model seems a little arbitrary, notice that this formula actually means something very simple. You can imagine that each player has a line segment, of length equal to their strength. We put these together into a single big segment, and pick a random point from the big segment. Whoever's side it lands on is the winner. If the model still seems a little arbitrary, well... it is! You're not missing anything; there's really no good justification for why this should be an accurate model of human skill at any real game. It's just mathematically very simple. The earliest description of a model like this that I'm aware of is a paper from 1929 by Zermelo, the Z in ZFC. The paper happens to be about exactly the same thing we're talking about here - modeling the skill of chess players - although Zermelo doesn't end up inventing quite what we now know as the Elo rating system. In any case, this is the mathematical microcosm in which we'll create our rating system, and we'll be calling it the Zermelo model.

The key to thinking efficiently about this model is the notion of odds. Odds are simply an alternate way of representing a probability. Whereas the probability of an event is the ratio of the number of ways the event might happen to the total number of things that might happen, the odds of an event are the ratio of how many ways it might happen to how many ways it might not happen. We don't lose any information by representing a probability as odds, or vice versa; we can switch back and forth between the two using the formulas shown below. When discussing Zermelo's model, odds are just a little more convenient than probability. If two players have strengths A and B, then the probability of the first player winning is A divided by A + B, but the odds of the first player winning are just A divided by B. Where odds really shine is when we involve more than two players. Suppose three players have strengths A, B and C. Then the odds of player one beating player two are A/B. The odds of player two beating player three are B/C. And the odds of player one beating player three are A/C, which happens to equal the product A/B times B/C.
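Restating the last two paragraphs in symbols, with omega for odds:

```latex
% Converting between a probability p and the corresponding odds \omega
\omega = \frac{p}{1 - p}, \qquad p = \frac{\omega}{1 + \omega}

% Zermelo model, for players with strengths A and B:
P(\text{1 beats 2}) = \frac{A}{A + B}, \qquad
\omega(\text{1 beats 2}) = \frac{A/(A+B)}{B/(A+B)} = \frac{A}{B}
```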
In other words,

Odds(P1 beats P3) = Odds(P1 beats P2) × Odds(P2 beats P3).

The odds that player 1 beats player 3 equal the odds that player 1 beats player 2, times the odds that player 2 beats player 3. We'll call this property transitivity of odds. What we have just done is prove that Zermelo's model possesses the transitivity of odds property. We can go further than this, and show the converse: any model with the transitivity of odds property must be equivalent to Zermelo's model. By "model", we just mean some assignment of a probability pij to each pair of players i and j, with of course the extra requirement that pji equals 1 - pij. Now, suppose such an assignment of probabilities has the transitivity of odds property. In other words, given players i, j and k, the odds that i beats j, times the odds that j beats k, equal the odds that i beats k. Here I'm using the Greek letter omega to represent odds. To show that this model is equivalent to Zermelo's, we have to come up with a positive strength Si for each player, such that for any two players i and j, the odds that i beats j are equal to Si/Sj. I claim that one such choice is to simply set Si equal to the odds of player i beating player zero, where "player zero" is some fixed, but arbitrarily chosen, player. You might have noticed that when we apply this formula to obtain the strength of player zero themselves, we end up with the meaningless odds of player zero beating themselves. To get around this, we'll just set player zero's strength to one. Now, consider two players i and j. By transitivity of odds, the odds of i beating j are just the odds of i beating zero, times the odds of zero beating j. A general property of odds is that the odds of an event are equal to one divided by the odds of that event's complement, so we can rewrite the odds of zero beating j as one over the odds of j beating zero. That makes the odds of i beating j equal to Si/Sj, exactly as required. Q.E.D. I've glossed over a few small details in this proof, which you can find addressed in the supplementary material to this video.

Zermelo's model has this intrinsic multiplicative structure, which we call odds transitivity. We can transform this multiplicative structure into an additive one by taking a logarithm. If instead of talking about the strengths of the players, we talk about their log-strengths, then the difference between the log-strengths of two players is equal to the log of the ratio of their strengths, which, by the definition of the Zermelo model, equals the log-odds of the first player winning. Since the log-odds are ultimately enough to deduce the probability of that player's victory, this means that the difference between two players' log-strengths in principle encodes all of the information about how much more skilled one player is than the other. This makes the log-strengths a more convenient way of expressing player skill, since you can meaningfully compare them by how far apart they are, rather than by taking a ratio.

Let's derive the formula for the probability of one player's victory given the log-strengths of both players. The difference of the log-strengths equals the log-odds of player one winning, which means we can get the odds by exponentiating. Now, apply the formula for probability in terms of odds. This should look familiar. If we use a base 10 logarithm instead of a natural logarithm, and if for no particular reason we decide all the log-strengths would look prettier if we multiplied them by 400 first, calling the resulting values R1 and R2, then we obtain the familiar probability formula from the Elo system.
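Spelled out, with the ratings defined as R1 = 400 log10(S1) and R2 = 400 log10(S2), the derivation goes:

```latex
% Log-odds of player one winning, in terms of the ratings
\log_{10} \omega = \log_{10} \frac{S_1}{S_2} = \frac{R_1 - R_2}{400}
\;\;\Longrightarrow\;\;
\omega = 10^{(R_1 - R_2)/400}

% Converting the odds back to a probability recovers the Elo formula
P(\text{1 beats 2}) = \frac{\omega}{1 + \omega} = \frac{1}{1 + 10^{(R_2 - R_1)/400}}
```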
I'll call this formula the logistic formula, because the 1-over-1-plus-exponential form on the right-hand side is called a logistic function. This is the source of the mysterious Elo probability formula. What we have shown is that in the Zermelo model, there exist "correct" ratings such that, when those ratings are assigned to each player, the Elo system's logistic formula produces the correct probabilities. Those correct ratings are simply the Zermelo model's player strengths, taken to a base 10 logarithm, and then multiplied by 400. Of course, those constants are arbitrary, and can be changed so long as you also update the probability formula accordingly. It is also easy to show the converse: if there exist ratings that can be assigned to each player, such that the probability of any player winning against another is given by the logistic formula, then the whole situation is equivalent to a Zermelo model. All you have to do is divide the ratings by 400 and exponentiate to recover equivalent player strengths. In summary, the Zermelo model, the logistic formula, and odds transitivity are all equivalent assumptions about a two-player game.

So in a Zermelo model, we know that the Elo system is at least capable of producing correct results, in that a set of ratings exists that will make the probability formula correct. What remains to be seen is whether, and why, the update algorithm of the Elo system produces those correct ratings. An actual proof of convergence would be well outside the scope of this video, which really only aims to explain the theoretical model underlying the Elo system. But now that we understand that model, one thing we can do is simulate it. When the Elo system is applied to a real game, like chess, the question of whether the Elo ratings converge to "correct" ratings is meaningless, since we can't observe the true ratings of the players, if such a thing even makes sense. But if we run an Elo-ranked tournament in our own simulated Zermelo universe, we don't have this problem.

We'll generate a population of players, and assign them random positive strengths. When two players play a game, we'll randomly decide the outcome using the Zermelo model formula. We can plot each player as a dot on a graph. The x coordinate indicates the player's true rating: the base 10 logarithm of the player's strength, multiplied by 400. We'll know what this value is, because we're generating the strengths, but the Elo update algorithm can't see it. The y coordinate indicates their current Elo rating, as estimated by the algorithm. All players start at an Elo rating of 1500. What we should see is that as we play more and more games, all of these dots line up on the line y = x. A sketch of this simulation in code appears below.

There's just one problem. If you inspect the logistic formula more closely, you can see that the probability it predicts depends only on the difference R2 - R1. This means that if we were to add the same number to each rating, the formula would predict the same probability. And that means that if we take the "true" ratings of all the players, and add the same constant to them, the resulting shifted ratings predict the same, correct win probabilities. This reflects an underlying ambiguity in the model: if we multiply all the players' strengths by some positive constant, the odds of any player winning also remain the same.
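Setting that ambiguity aside for a moment, here is a minimal sketch of the simulation just described. The population size, the value of k, and the distribution the true ratings are drawn from are all illustrative choices of mine; the video doesn't specify them:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 200          # number of players (illustrative choice)
K = 32           # Elo step size (illustrative choice)
ROUNDS = 2000    # rounds of play; each round randomly pairs up all players

# True ratings are 400 * log10(strength). The generating distribution is an
# assumption for this sketch: sample true ratings around 1500, then derive
# the secret strengths from them.
true_ratings = rng.normal(1500, 200, size=N)
strengths = 10 ** (true_ratings / 400)

elo = np.full(N, 1500.0)  # every player starts at 1500

def predict(r1, r2):
    """Logistic formula: predicted probability that rating r1 beats rating r2."""
    return 1 / (1 + 10 ** ((r2 - r1) / 400))

for _ in range(ROUNDS):
    players = rng.permutation(N)
    for i, j in zip(players[::2], players[1::2]):
        # Zermelo model: i beats j with probability A / (A + B)
        i_wins = rng.random() < strengths[i] / (strengths[i] + strengths[j])
        winner, loser = (i, j) if i_wins else (j, i)
        # The winner takes K * P(winner would have lost) from the loser
        delta = K * predict(elo[loser], elo[winner])
        elo[winner] += delta
        elo[loser] -= delta

# Plotting the dots (true_ratings[i], elo[i]) should show them lining up
# on a line of slope one, whose intercept is discussed next.
```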
The ambiguity is a problem, because the update algorithm can only observe the players' true ratings through the random outcomes of games, and so any change to the true ratings that leaves the win probabilities constant is invisible to it. So the best we can hope for, even if the Elo algorithm works perfectly, is for the Elo ratings to converge not to the true ratings themselves, but to the true ratings plus some additive constant. If we call that additive constant C, then the estimated ratings won't actually line up on the line y = x, but rather on the line y = x + C. So if we want to be able to recognize visually that the algorithm is working, we'd better figure out in advance where that line is. This will involve predicting in advance what exactly C will turn out to be. Is that even possible? Since the simulation involves randomness, could C be different every time? Happily, no. If the estimated ratings end up converging to the true ratings plus C, then we can find C as the difference between the average estimated rating at the end of the simulation and the average true rating. Since the true ratings are being generated by us, we know the average true rating. And as we mentioned earlier, the average Elo rating of a population of players is constant, and always equal to the initial rating, 1500. Therefore, we can accurately predict on which line of slope one the dots in our simulation should line up if the Elo algorithm works, and run our simulation. In these simulations, at each time step, all players are simply randomly paired up with one another to play a single game. We can also vary the step-size parameter k in the Elo update formula, and observe that higher step sizes result in faster convergence, but greater oscillation around the correct ratings.

Throughout this video, we have described a certain rating system, and referred to it as the Elo rating system. That's because when most people say "Elo rating system" today, this is the system they're talking about: based on the core logistic probability formula, and equivalent to assuming the game follows the Zermelo model. Arpad Elo himself wrote one book about his rating system, published in 1978. Read this book, and you'll find that the system it describes is entirely different from the one referred to as "the Elo rating system" in modern sources. Elo's actual rating system is equivalent to assuming that when two players play, they each generate a random number, independently of one another. Whoever generates the larger number wins. These numbers are drawn from normal distributions. While the variance of these normal distributions is assumed to be the same for all players, the means of the distributions are taken to be different for each player. Players with higher means will tend to generate larger numbers, and win more often. These means are taken to be the players' ratings that we are trying to estimate.

Let's derive the probability that player one beats player two in this alternate model. Player one generates a number X1 from a normal distribution having their rating R1 as its mean. Player two does the same, generating a number X2. The standard deviations of these two normal distributions are taken to be some constant, which we'll call sigma. Player one wins when X1 is greater than X2. This happens precisely when the difference X1 - X2 is positive.
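In symbols, the setup is:

```latex
% Each player's performance is an independent normal sample centered on
% their rating, with a shared standard deviation \sigma
X_1 \sim \mathcal{N}(R_1, \sigma^2), \qquad X_2 \sim \mathcal{N}(R_2, \sigma^2)

P(\text{1 beats 2}) = P(X_1 > X_2) = P(X_1 - X_2 > 0)
```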
Due to properties of the normal distribution, this difference, X1 minus X2, also follows a normal distribution, whose mean is equal to the difference in ratings, R1 - R2, and whose variance is the sum of the variances, in other words, two sigma squared. Therefore, the probability that player one wins is the probability that this normally distributed random variable is greater than zero. To find this probability, we need to integrate the probability density function of a normal distribution. We won't go through that integral today; the result is Φ((R1 - R2) / (σ√2)), where Φ is the cumulative distribution function of the standard normal distribution. Suffice it to say that this is not the same formula as the logistic probability formula used in what we now call the "Elo system". Elo's historical model, and the model which we have been calling the "Zermelo model", are not equivalent to one another.

This causes confusion to this day. Many explanations of the Elo system on the internet still claim that the model underlying it is this normal-variable comparison system. So to be clear: if you ever see an explanation which, on the one hand, says something like "the core assumption of the Elo system is that player performance is normally distributed", and, on the other hand, gives the logistic formula for probability, then the explanation is wrong. These two statements describe entirely different models of a two-player game, and entirely different rating systems.

In terms of priority, the three main papers on the rating system described in this video are Zermelo 1929, Good 1955, and Bradley & Terry 1952. Zermelo's paper is explicitly about chess, and introduces exactly the model we described here, with secret positive strengths held by each player. Good's paper cites Zermelo's and uses the same system, although Good prefers to base it on the equivalent assumption of odds transitivity instead of explicitly talking about strengths. Bradley & Terry use explicit strengths like Zermelo, and seem to have come up with the model independently, but their work is not about chess or games at all; only the mathematics happens to be similar. I've been calling this model the Zermelo model throughout this video because his paper is the oldest one that I'm aware of, and because it was explicitly about chess, but "Bradley-Terry model" is the most common name you'll see in the literature. The system that Elo spends the majority of his book discussing, using a normal distribution, is usually called the Thurstone model, or Thurstonian model, as it dates back to some papers by Louis Thurstone. The name "Elo" is here to stay at this point. But whatever you call the world's favorite logistic-flavored rating system, hopefully now you can appreciate it as something a little deeper than just some arbitrary rules for updating numbers.