Microsoft [through TrueSkill] makes another good point here that ONLY winning and losing can be allowed to affect these stats. You can't adjust the matchmaking stat by "experience points" or even by any skill-based stats such as headshots, number of kills, time to finish a lap in racing, etc. All those stats can be gamed, and you will end up trying to get more headshots or something instead of winning. Any formula that equates number of headshots (or any other stat besides wins/losses) with how likely you are to win or lose introduces a layer of imperfect simulation. If we want to know how likely you are to beat someone, we should only consider your wins and losses, and not any in-game stats.
When looking at a mathematical ranking system this makes sense, but it doesn't make sense if you are purely interested in efficient matchmaking. Here's an example:
Starcraft launches, and all players begin with no wins, losses, or stats. Legionnaire (a pro player) is drawn against a scrub in his first round and wins comfortably; the scrub hadn't even managed to get out of his starting area. Analysing the statistics after the game also indicates a cakewalk, even to casual observers: quick game time, aggressive economy expansion vs a staggered economy graph, little waste vs high average waste, and so on.
Another game is played between two average players who happen to be closely matched. The game includes a number of pushes, each player's expansion being demolished, and culminates in a pitched battle at the 40-minute mark. The statistics show a variety of interesting observations for astute players, but a big dip in army count for the loser is probably the only giveaway as to who won.
In TrueSkill, both winners of these two games would be on 1-0 and be equally ranked. For their next match, they would be as likely to be matched together as Legionnaire would be to face another starting pro player. With a pool of 50,000 starting players (first-day numbers), it would take ~10 matches before the top ten would even play each other, and many more before the win/loss records settled into the 50/50 pattern that indicates you're playing at your level.
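To make the point concrete, here is a minimal Elo-style sketch (not Microsoft's actual TrueSkill mathematics, which tracks a mean and uncertainty per player) showing that a win/loss-only system rates both round-one winners identically, no matter how lopsided each game was:

```python
# Simplified Elo-style update: an assumption for illustration,
# standing in for any win/loss-only rating system.

def expected_score(r_a, r_b):
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_winner, r_loser, k=32):
    """Return new (winner, loser) ratings after a single game."""
    e_w = expected_score(r_winner, r_loser)
    return r_winner + k * (1.0 - e_w), r_loser - k * (1.0 - e_w)

# Everyone starts at 1500, so Legionnaire's stomp and the 40-minute
# nail-biter produce exactly the same rating change for the winners.
legionnaire, scrub = update(1500, 1500)
avg_winner, avg_loser = update(1500, 1500)
print(legionnaire == avg_winner)  # True: the blowout and the epic look identical
```

The system has no input through which the margin of victory could flow, so the information is simply discarded.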
If a human were in charge of looking at all the matches played and tasked with suggesting evenly matched pairings for the next round, they could easily break the whole group into, say, 10 pools of varying skill level by looking solely at the stats. They might even pick up on the fact that the second match documented looked to be pretty even and would be worth an immediate rematch, for added context to the players. Repeating this process only four times should have people within the top 10 getting matched up together and right into those super-fun, closely matched games.
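The human's pooling step above can be sketched mechanically. The stat names and weights here are illustrative assumptions (not real Starcraft telemetry fields): a composite "dominance" score orders the round's players, and slicing the ordered list gives the pools.

```python
# Hedged sketch: bucket players into skill pools using a composite score
# built from hypothetical post-game stats. Stat names and weights are
# invented for illustration.

def dominance_score(stats):
    # Bigger economy leads, less waste and shorter wins suggest a stronger player.
    return (stats["econ_lead"] * 2.0
            - stats["avg_waste"]
            - stats["game_minutes"] * 0.5)

def pool_players(players, n_pools=10):
    """Sort by score and slice into n_pools roughly equal pools, best first."""
    ranked = sorted(players, key=lambda p: dominance_score(p["stats"]), reverse=True)
    size = max(1, len(ranked) // n_pools)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

players = [
    {"name": "Legionnaire", "stats": {"econ_lead": 9, "avg_waste": 1, "game_minutes": 8}},
    {"name": "scrub",       "stats": {"econ_lead": 0, "avg_waste": 8, "game_minutes": 8}},
    {"name": "avg_winner",  "stats": {"econ_lead": 2, "avg_waste": 4, "game_minutes": 40}},
    {"name": "avg_loser",   "stats": {"econ_lead": 1, "avg_waste": 5, "game_minutes": 40}},
]
pools = pool_players(players, n_pools=4)
print([p["name"] for p in pools[0]])  # the top pool
```

A real system would need far more care in choosing and weighting stats, but even this crude version separates the pro's blowout from the close average game in one round rather than ten.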
It would seem that additional statistics can be useful in terms of matchmaking, so why the intolerance from mathematical systems such as TrueSkill? To me it comes down to a couple of issues: differing goals of ranking vs matchmaking, ignorance of bias inherent in matches, and the need for objectivity.
Ranking vs Matchmaking
To me the main issue is that TrueSkill is both a ranking and a matchmaking system. If you replaced "matchmaking" with "ranking" in the above quote I'd agree wholeheartedly; you DO need a very rigorous way of ranking people. If you brought in anything other than wins/losses, you would give an advantage to those who play a certain way (finish quicker, headshot more, etc.). As an aside, this would also reduce the potential for innovation and metagames.
A good ranking system requires an independent, unbiased method of delineating between players of different true skill (the player's inherent skill that ideally would be on display in each match, but can be clouded by a number of factors) so that each player feels as though their attributed rank is in line with their perceived rank. There needs to be justification for your rank, so that if someone questions why another player is higher or lower than them, there is evidence available to alleviate their concerns. Any biases in the ranking process detract from the validity of the entire system.
Matchmaking, on the other hand, has a goal of maximising entertainment for the competitors (and spectators). The premise is that closely ranked players are more likely to play epic battles with high engagement and player satisfaction. I have long supported this notion and have built it into a number of leagues and tournaments; however, it's not the ONLY thing. Sometimes players enjoy playing above their rank for that underdog-win feeling. Players may get more out of playing with or against friends. Players may enjoy certain matchups, or certain maps.
In one of the grand finals for Australian Warcraft 3 I had an interesting conversation with another tournament organiser. Two of the favourites for the title, who were long-standing rivals, had been drawn in separate pools, but were to meet in the semi-finals of single elimination rather than the final. The pool allocation was seeded as much as possible through prior matches in the tournament, but as these players were from different states, there was no justification for placing them over and above other state winners. The other organiser just wanted to see the best match in the final and wanted the pools reorganised. This is a classic case of conflict between ranking (through merit) and matchmaking (for entertainment).
Matchmaking for entertainment becomes a bit wishy-washy. How do you know that someone is going to enjoy the match you just created? For a start, you could give players the opportunity to select certain traits before the game starts (initial team, etc.) that they KNOW they want, but you could also let players rate their enjoyment of a game after it has finished (or after watching a replay, for spectators). This needn't be an exhaustive analysis of the game; a score out of 10 or even a thumbs up / thumbs down would suffice, and it would give rise to a new method of optimisation: matchmaking games that maximise player/spectator satisfaction. Could it be gamed? Sure, but the eventual losers would be the players themselves. The more accurate information you can give as to why a game was fun, the quicker a system could deliver games that increase your entertainment.
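A satisfaction-optimising matchmaker could be sketched as follows. Everything here is an assumption for illustration: the feedback store is a single rating-gap bucket, whereas a real system would learn from much richer signals (maps, matchups, friends, playstyles).

```python
# Sketch: pick the pairing with the highest predicted satisfaction,
# using hypothetical thumbs-up rates observed per rating-gap bucket.

from itertools import combinations

# Assumed feedback: mean thumbs-up rate seen for each tier gap.
satisfaction_by_gap = {0: 0.9, 1: 0.7, 2: 0.4, 3: 0.2}

def predicted_satisfaction(a, b):
    gap = min(abs(a["tier"] - b["tier"]), 3)
    return satisfaction_by_gap[gap]

def best_pairing(players):
    """Greedy: return the pair of players with highest predicted satisfaction."""
    return max(combinations(players, 2), key=lambda pair: predicted_satisfaction(*pair))

lobby = [{"name": "A", "tier": 5}, {"name": "B", "tier": 2}, {"name": "C", "tier": 5}]
pair = best_pairing(lobby)
print(sorted(p["name"] for p in pair))  # A and C: same tier, highest predicted fun
```

The key design point is the feedback loop: the thumbs-up table is updated after every game, so the optimisation target is player-reported enjoyment rather than a proxy like rating gap alone.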
Objective vs Subjective matchmaking
Putting the idea of matchmaking for entertainment to one side, let's revisit the initial premise: do you only need wins/losses to adequately matchmake? Yes, if you want to maintain objectivity. The example of using a human to view the game and statistics to separate people more efficiently requires having faith in a 'gut feeling'. A human can look at a short, one-sided game and see that the opponent was outclassed, but it's not just because of the game's length. A human can see that a better player can outmanoeuvre troops to maintain a winning advantage, but that's not a hard and fast rule for winning either. There are, in fact, many opportunities throughout a game for pro players to demonstrate their skill, and these build up to give a general feeling of confidence that one player is superior to another.
When building back-propagated neural networks in my uni days, we would continually use problems like these. You can't quite put your finger on the hard-and-fast rules that led you to a decision on who is better, but you're pretty confident in giving a judgement either way. With the volume of statistics available after Starcraft matches, it should be achievable to devise an AI that delivered, say, 7 levels of superiority (totally pwnd, much better, better, same, worse, much worse, totally outclassed). After many games I'd hope the system would be delivering 'same' players all the time, but the ability to recognise a large discrepancy in skill can help with the initial setup, as well as with tracking large changes in behaviour (rapid improvement in play, returning from a long absence, etc.). Would it deliver better matchmaking? Yes. It would also be objective, but not justifiable. Without justification you'd have a hard time convincing others that the system is rigorous enough to provide a rank, but it might fly if rank and matchmaking were separate beasts.
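Even without a trained network, the output side of such a classifier is easy to picture. This is a crude stand-in, not the learned model: it maps a composite stat delta onto the seven superiority labels, with invented cut points that a real model would learn from labelled replays.

```python
# Hypothetical 7-level superiority classifier. Thresholds are assumptions;
# a trained network would learn them from replay data.

LABELS = ["totally outclassed", "much worse", "worse", "same",
          "better", "much better", "totally pwnd"]

def superiority(stat_delta):
    """stat_delta > 0 means player one looked stronger. Returns one of 7 labels."""
    thresholds = [-6, -3, -1, 1, 3, 6]  # assumed cut points on the delta scale
    for i, t in enumerate(thresholds):
        if stat_delta < t:
            return LABELS[i]
    return LABELS[-1]

print(superiority(0))  # "same"
print(superiority(8))  # "totally pwnd"
```

In a mature ladder, most games should land in the middle band; outputs at either extreme are the signal worth acting on (re-seeding a new account, or a returning player whose rank has gone stale).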
Another issue that a pure win/loss ratio overlooks is the biases inherent in the games played. If, for example, Terrans are heavily favoured in a TvZ matchup, then this should be reflected in the prediction of who is more likely to win. There are three different biases common to tournaments and ranked games: the player's playstyle (race choice and preference for certain strategies), the map bias (some maps are more suited to certain races or playstyles, or better known than others) and the individual head-to-head bias (where someone just seems to have the wood over someone else). In tournaments you also have a tournament bias, where some players perform better or worse depending on the importance they place on a specific tournament (home-ground advantage for international events). These biases are once again somewhat subjective, but could be deduced from large samples of games played. They are also a moving target, as the biases could change through the metagame, direct patching, or sustained effort from players to eliminate weak points in their game.
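Folding these biases into a win prediction can be sketched with a simple logistic model. The functional form and every bias weight below are assumptions for illustration; real values would have to be estimated from those large game samples.

```python
# Sketch: win probability from rating gap plus additive bias terms
# (matchup, map, head-to-head). All weights are invented.

import math

def win_probability(rating_diff, matchup_bias=0.0, map_bias=0.0, h2h_bias=0.0):
    """P(player one wins) under an assumed logistic model."""
    x = rating_diff / 400.0 + matchup_bias + map_bias + h2h_bias
    return 1.0 / (1.0 + math.exp(-x))

# Equal ratings, but an assumed Terran edge in TvZ on this map shifts the prediction.
even = win_probability(0)
tvz = win_probability(0, matchup_bias=0.3, map_bias=0.1)
print(round(even, 2), round(tvz, 2))  # 0.5 vs 0.6
```

Because the biases drift with patches and the metagame, the terms would need to be re-estimated continually rather than fitted once.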
Biases do not have to mean that the game needs fixing; biases can add strategic elements to play (zerg rush will always be faster, but weaker). It does mean that matchmaking needs to consider any biases if the goal is to ultimately produce epic, engaging games that are desirable for players and spectators. Placing a higher ranked player in a weaker position could still achieve this.
Wow, this went a lot longer than I meant, and it feels like it needs a wrap-up. To me, matchmaking is about providing entertainment to the players and spectators. Players of close rank can LEAD to entertaining games, but I don't believe a rank based solely on wins and losses provides the most efficient method of matchmaking.