Tuesday, July 14, 2009

A grading system for comparing tournament structures (part 2)

This article is the second in a series on designing a tournament:
Part 2 - A Grading System (this article)
Part 3 - Calculation of Win Percentages

A Grading System
At the end of the previous article I set out what I think are the boundaries of tournament systems: single elimination at one end and the league at the other. In this section I want to come up with a meaningful method of comparing disparate tournament structures. The comparison covers five things: their ability to rank players, their desirability from a player's and a spectator's perspective, the resources they use, their inherent tournament bias, and their individual matchup bias. Some of these (like resources) are known quantities that can be compared mathematically, but others may stray into subjective territory. Where no hard number is available, a rating system will grade each tournament on a scale of A to E against the other tournament systems, where A is the best for that particular section and E is the worst. Let's start with some easy ones first:

Resources:
Limited locations eventually show up as extra time. Even so, there are benefits in listing the 2 resources separately, because organizers with ample amounts of one resource can then choose appropriately. The first value is the minimum time to complete the tournament. Later I would like to compare single matches against best-of-3 matches, so the match length is used as the discrete unit of measure. For example, an 8 man single elimination tournament has a minimum time of 3 match lengths, whereas an 8 man league has a minimum time of 7 match lengths (calculations formally defined later on).
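The formal calculations come later in the series. Until then, here is a minimal Python sketch of the two minimum-time figures. It assumes a single round robin league and enough locations for every match in a round to be played at once. The function names are hypothetical.

```python
def single_elimination_min_time(n: int) -> int:
    """Minimum match lengths for a single elimination bracket of n players.

    Each round halves the field (byes fill any gaps), so this is ceil(log2(n)),
    computed exactly with integer bit_length() to avoid floating-point issues.
    """
    return (n - 1).bit_length()


def league_min_time(n: int) -> int:
    """Minimum match lengths for a single round robin league of n players.

    Assumption: with an even field every player plays once per round, giving
    n - 1 rounds; an odd field needs n rounds because one player has a bye
    each round.
    """
    return n - 1 if n % 2 == 0 else n


print(single_elimination_min_time(8))  # 3
print(league_min_time(8))              # 7
```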

The number of matches in a tournament doesn't neatly capture the problem of limited locations, but it does indicate the size of the problem. An 8 man single elimination tournament is completed in 7 matches, but an 8 man league takes 28.
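Both match counts follow from simple counting arguments. In single elimination every match knocks out exactly one player, and a single round robin plays every pair once. The sketch below uses hypothetical function names:

```python
def single_elimination_matches(n: int) -> int:
    # Every match eliminates exactly one player, and all but the champion are eliminated.
    return n - 1


def league_matches(n: int) -> int:
    # A single round robin plays every pair of players once: n choose 2.
    return n * (n - 1) // 2


print(single_elimination_matches(8))  # 7
print(league_matches(8))              # 28
```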

Another way of showing the time cost of limited locations is to recalculate the total time if only half the locations are available for a tournament of that size. For an 8 man tournament, single elimination goes from 3 to 4 match lengths, whereas a league goes from 7 to 14 match lengths. [Maybe represent as a proportional increase?]
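The following Python sketch reproduces the half-locations figures and prints the proportional increase alongside them. It makes three assumptions. First, rounds are played one after another. Second, each location hosts one match per match length. Third, the "full" number of locations is enough to play every first-round match at once (n / 2 for an 8 man field). All names are hypothetical.

```python
def single_elimination_rounds(n: int) -> list[int]:
    """Matches per round for a bracket of n players (assumes n is a power of 2)."""
    rounds = []
    while n > 1:
        rounds.append(n // 2)
        n //= 2
    return rounds


def league_rounds(n: int) -> list[int]:
    """Matches per round for an even-sized single round robin."""
    return [n // 2] * (n - 1)


def time_with_locations(matches_per_round: list[int], locations: int) -> int:
    """Match lengths needed when each round is split across the available locations."""
    # Ceiling division: a round of m matches needs ceil(m / locations) match lengths.
    return sum(-(-m // locations) for m in matches_per_round)


n = 8
full = n // 2    # assumption: enough locations for the whole first round
half = full // 2
for name, rounds in [("single elimination", single_elimination_rounds(n)),
                     ("league", league_rounds(n))]:
    before = time_with_locations(rounds, full)
    after = time_with_locations(rounds, half)
    print(f"{name}: {before} -> {after} match lengths ({after / before:.2f}x)")
# single elimination: 3 -> 4 match lengths (1.33x)
# league: 7 -> 14 match lengths (2.00x)
```

Expressed proportionally, halving the locations costs single elimination about a third more time, but it doubles the time for a league.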

Tournament ranking ability:
A simple count of the unique ranks given to participants should suffice. A 16 man single elimination tournament provides only 5 ranks (1st, 2nd, =3rd, =5th, =9th), whereas a 16 man league ranks all 16.
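The ranks a bracket can hand out follow a simple pattern. Players knocked out in a round where m players remain all share position m / 2 + 1. The following is a small Python sketch with hypothetical names. It assumes a power-of-2 field and, for the league, no players tied on points.

```python
def single_elimination_ranks(n: int) -> list[int]:
    """Distinct finishing positions in a bracket of n players (n a power of 2).

    The champion is 1st; players knocked out in a round where `remaining`
    players are left share position remaining // 2 + 1.
    """
    ranks = [1]
    remaining = n
    while remaining > 1:
        ranks.append(remaining // 2 + 1)
        remaining //= 2
    return sorted(ranks)


def league_ranks(n: int) -> list[int]:
    # Assumption: no ties on points, so every player gets a separate rank.
    return list(range(1, n + 1))


print(single_elimination_ranks(16))     # [1, 2, 3, 5, 9]
print(len(single_elimination_ranks(16)), len(league_ranks(16)))  # 5 16
```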

Inherent Tournament Bias:
Now it starts to get a bit tricky. A couple of years ago I developed an application to measure the inherent bias of a tournament structure against a competitor being ranked at his true skill. An extreme example is the best player meeting the 2nd best player in the first round of a 64 man single elimination tournament: the person who should have come 2nd will instead come =33rd. The program takes a template of a tournament system and runs thousands of trials using random seeding, with competitors playing at their true skill, to obtain an average bias per player. For example, a 64 player single elimination tournament has an average bias of ~0.59 levels after 100,000 trials.

[updated 15/7/09]
The unit chosen is a level of a single elimination tournament. This means that being misranked from 2nd to =3rd counts the same as being misranked from =17th to =33rd. This feels about right, as accurately placing the top players has more perceived bearing on the bias of the tournament.

Using a base 2 logarithm we can formalize the meaning of a 'level': log2(16) = 4, log2(32) = 5, log2(64) = 6, etc. So someone whose true skill ranks them 32nd would receive an expected value of 5, but if they finish 16th they receive an actual value of 4. The difference (1) is the same as finishing one 'level' higher. So the formula for player bias is:
Player finishing bias = abs(log2(expectedPosition) - log2(finalPosition))
Because the formula doesn't rely on the actual levels of a tournament, the average tournament bias can compare totally different tournament structures. All it needs is where players were supposed to finish and where they actually finished.
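The original application isn't shown here. As a stand-in, the following is a hypothetical Python sketch of the same idea, limited to single elimination. It uses random seeding, the better player always wins (playing at true skill), and it averages the player finishing bias formula above over all players and trials. It assumes a power-of-2 field. It also assumes that tied players take the best position in their group (=33rd counts as 33). The post doesn't state a tie convention, so this sketch's output need not match the ~0.59 quoted above.

```python
import math
import random


def player_bias(expected_position: int, final_position: int) -> float:
    """Player finishing bias = abs(log2(expectedPosition) - log2(finalPosition))."""
    return abs(math.log2(expected_position) - math.log2(final_position))


def single_elimination_positions(n: int, rng: random.Random) -> dict[int, int]:
    """Play one randomly seeded bracket; players are labelled 1..n by true skill (1 = best).

    Assumption: tied players get the best position of their group (=33rd -> 33).
    """
    assert n > 1 and n & (n - 1) == 0, "sketch assumes a power-of-2 field"
    remaining = list(range(1, n + 1))
    rng.shuffle(remaining)  # random seeding
    positions = {}
    while len(remaining) > 1:
        losers_position = len(remaining) // 2 + 1
        winners = []
        for a, b in zip(remaining[0::2], remaining[1::2]):
            winners.append(min(a, b))              # true skill: better player always wins
            positions[max(a, b)] = losers_position
        remaining = winners
    positions[remaining[0]] = 1
    return positions


def average_tournament_bias(n: int, trials: int, seed: int = 0) -> float:
    """Mean per-player bias (in levels) over many randomly seeded trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        positions = single_elimination_positions(n, rng)
        total += sum(player_bias(skill, pos) for skill, pos in positions.items()) / n
    return total / trials


print(player_bias(32, 16))                    # 1.0
print(average_tournament_bias(64, 1000, seed=1))
```

Because `player_bias` only needs an expected and an actual position, the same averaging loop could wrap any other structure's simulation.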

Individual Matchup Bias
Although this bias can be estimated once the specific situation is known, these factors largely affect all systems equally. I might leave this section as a special notes category to highlight specific independent matchup issues (such as robustness to home/away bias) until there's a standard way of presenting these types of bias across all structures.

Player satisfaction
This grade will need to be tempered with some subjectivity, but there are 2 areas that can be measured: the average number of games played and the closeness of the competitors' skill levels. The average number of games played indicates how many rounds a player can expect to stay in the tournament. To normalize the result, it is expressed as a ratio to the minimum number of rounds of the tournament. Leagues would score 1.00, as players participate in every round (barring finals; leagues with finals will be analyzed separately), whereas an 8 man single elimination tournament gives ~0.37.
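The following is a minimal Python sketch of the games-played ratio. It assumes the ratio means the mean number of games per player divided by the minimum number of rounds, and its parameter names are hypothetical. Under that reading, an 8 man single elimination bracket gives 14 player-games / 8 players = 1.75 games, and 1.75 / 3 ≈ 0.58 rather than ~0.37. The quoted figure equals (1.75 - 1) / (3 - 1) = 0.375, so it may come from a normalization that discounts each player's first game.

```python
def games_ratio(total_matches: int, n: int, min_rounds: int) -> float:
    """Mean games per player, normalized by the tournament's minimum number of rounds.

    Assumption: this is the intended definition of the games-played ratio.
    """
    games_per_player = 2 * total_matches / n  # every match involves 2 players
    return games_per_player / min_rounds


print(games_ratio(total_matches=28, n=8, min_rounds=7))  # league: 1.0
print(games_ratio(total_matches=7, n=8, min_rounds=3))   # single elimination: ~0.583
```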

Closeness in competitor skill should indicate both a greater potential for a close game and a greater potential for learning in a competitive environment. I'd like to collect some solid evidence (or even lots of circumstantial evidence) that this is the case, but it feels right from what I've observed. Maybe it should be a cutoff rather than proportional? I'll need to adjust the program to output this result anyway, so I'm open to suggestions.

Spectator satisfaction
Not sure what I can do in terms of measuring entertaining play, but close matches can draw on the closeness-in-competitor-skill grade, possibly with emphasis on the final games [logarithmic?]. Rewarding a high degree of skill would favor tournaments that give 1st and 2nd the maximum opportunity to play each other, with a logarithmic dropoff after that. [Very close to tournament bias?]

In the next couple of articles I'll look at the more common tournament structures and see how they stack up. I'm sure I'll be back on this page at some point to tweak the grading process to fit the criteria more aptly. [should wikify it?]
