Improving Swiss Teams events
#1
Posted 2007-April-20, 08:42
This hypothesis was tested using a series of Monte Carlo simulations. A computer program generated 128 bridge teams with known strength. These teams competed against one another in a Swiss Teams type event. At the conclusion of the event, the sample statistic (the ranking produced by the Swiss Teams event) was compared with the population statistic (the objective/known ranking of the team strength). We consider event formats in which the sample statistic closely mirrors the population statistic superior to formats in which the sample statistic deviates significantly from the population statistic.
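A minimal sketch of this kind of simulation, not the authors' actual program. It assumes normally distributed team strengths in IMPs/board (as one of the authors states later in the thread); the per-board noise level `board_sd`, the linear VP scale in `to_vp`, and the simplified pairing rule are all hypothetical choices made for illustration only.

```python
import random

def play_match(s_a, s_b, boards, rng, board_sd=6.0):
    """IMP margin for team A over `boards` boards.
    board_sd is a hypothetical per-board noise level (an assumption)."""
    mean = (s_a - s_b) * boards
    sd = board_sd * boards ** 0.5
    return rng.gauss(mean, sd)

def to_vp(margin):
    """Hypothetical linear 20-VP scale, clamped to [0, 20]; not a real VP table."""
    return min(20.0, max(0.0, 10.0 + margin / 3.0))

def simulate_swiss(n_teams=128, rounds=9, boards=20, seed=None):
    """Run one simulated event; return the finishing place of the strongest team.
    n_teams is assumed to be even."""
    rng = random.Random(seed)
    strength = [rng.gauss(0.0, 1.0) for _ in range(n_teams)]  # known strengths
    vp = [0.0] * n_teams
    for _ in range(rounds):
        # Simplified Swiss pairing: sort by current VPs and pair neighbours.
        # Ties are broken at random; rematches are NOT prevented in this sketch.
        order = sorted(range(n_teams), key=lambda t: (-vp[t], rng.random()))
        for a, b in zip(order[0::2], order[1::2]):
            vp_a = to_vp(play_match(strength[a], strength[b], boards, rng))
            vp[a] += vp_a
            vp[b] += 20.0 - vp_a
    final = sorted(range(n_teams), key=lambda t: -vp[t])
    strongest = max(range(n_teams), key=lambda t: strength[t])
    return final.index(strongest) + 1

def top8_rate(trials=200, **kwargs):
    """Fraction of simulated events in which the strongest team finishes top 8."""
    hits = sum(simulate_swiss(seed=i, **kwargs) <= 8 for i in range(trials))
    return hits / trials
```

Calling `top8_rate(rounds=9)` and `top8_rate(rounds=12)` would give a rough feel for how accuracy changes with event length under these assumptions; the numbers will not match the figures quoted in this post, since the noise model and VP scale here are invented.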
Monte Carlo simulations can be used to test a variety of different hypotheses. For example, are tournaments with a large number of short rounds more accurate than tournaments with a small number of long rounds? (Not too surprisingly, the answer depends on the fixed cost associated with the break between rounds.) Alternatively, is there a relationship between the number of teams entering a tournament and the number of rounds necessary to accurately identify the winner? Our most striking result involved using a Strength of Schedule adjustment to the normal Swiss Team scoring system. We determined that a Strength of Schedule adjustment allows tournament organizers to significantly improve the efficiency of their events. Hypothetically, an event organizer could reduce the time required to stage an event without compromising the accuracy. Alternatively, an organizer could hold the length of an event constant and significantly improve the accuracy of the event.
Strength of Schedule adjustments can be implemented in a variety of ways. For the purpose of this study, we used a very simple SoS adjustment.
1. Run a normal Swiss Teams event
2. Calculate the total number of Victory Points won by each team
3. For each team i, sum all of the Victory Points won by each team j that
team i played against, excluding the head-to-head match between team i
and team j.
4. The team's final rank is determined by adding the Victory Points that
Team "i" won in head-to-head competition and some fraction of the total
VPs won by all the teams that team "i" competed against. (This fraction
is a function of the number of rounds in the tournament)
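The four steps above can be sketched roughly as follows. This is an illustration, not the code used in the study: the function name, the input layout (one tuple per match), and the `fraction` parameter are assumptions. The post says only that the fraction depends on the number of rounds, so it is left for the caller to supply.

```python
from collections import defaultdict

def sos_adjusted_scores(matches, fraction):
    """matches: list of (team_a, team_b, vp_a, vp_b), one tuple per match played.
    fraction: weight on opponents' VPs (hypothetical; supplied by the caller)."""
    total_vp = defaultdict(float)   # step 2: total VPs won by each team
    opponents = defaultdict(list)   # team -> [(opponent, VPs opponent won vs team)]
    for a, b, vp_a, vp_b in matches:
        total_vp[a] += vp_a
        total_vp[b] += vp_b
        opponents[a].append((b, vp_b))
        opponents[b].append((a, vp_a))

    adjusted = {}
    for team, opps in opponents.items():
        # step 3: opponents' VPs, excluding what each won head-to-head vs `team`
        opp_sum = sum(total_vp[o] - vp_vs_team for o, vp_vs_team in opps)
        # step 4: own VPs plus a fraction of the opponents' VPs
        adjusted[team] = total_vp[team] + fraction * opp_sum
    return adjusted
```

Sorting the returned dictionary by value, highest first, would give the final ranking under this adjustment.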
We certainly don't claim that the SoS adjustment just described is by any means an optimal implementation. However, even this very crude implementation has a dramatic impact on the accuracy of the event.
Consider the following tournament format:
* 128 teams competing in a Swiss format
* The event consists of "N" 20 board rounds
* The primary statistic used to measure the accuracy of the event is
the percentage chance that the strongest team will land in any of the
top eight places at the close of the event. (We used other metrics
including the Spearman rank coefficient and how many of the top eight
teams placed in the top eight slots. Results were consistent
across metrics)
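The three accuracy metrics mentioned above could be computed along these lines. This is a sketch under simple assumptions: rankings are given as dictionaries mapping team to rank (1 = best), there are no ties, and the function names are made up for illustration.

```python
def spearman_rho(true_rank, final_rank):
    """Spearman rank coefficient for two tie-free rankings (team -> rank)."""
    n = len(true_rank)
    d2 = sum((true_rank[t] - final_rank[t]) ** 2 for t in true_rank)
    return 1 - 6 * d2 / (n * (n * n - 1))

def top_k_overlap(true_rank, final_rank, k=8):
    """How many of the k strongest teams finished in the top k places."""
    return sum(1 for t in true_rank if true_rank[t] <= k and final_rank[t] <= k)

def strongest_in_top_k(true_rank, final_rank, k=8):
    """True if the strongest team finished in any of the top k places."""
    strongest = min(true_rank, key=true_rank.get)
    return final_rank[strongest] <= k
```

Averaging `strongest_in_top_k` over many simulated events gives the "percentage chance" used as the primary statistic.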
With no SoS adjustment, tournament organizers need to run twelve 20 board rounds to have a 95% chance that the strongest team will place in any one of the top eight slots. If we add an SoS adjustment, tournament organizers can run nine 20 board rounds while still achieving a 94.9% chance that the strongest team will place in any of the top eight places. Tournament organizers can reduce the length of the tournament by 25% without impacting the integrity of the results. (In comparison, if the Tournament Organizers were to run an event with nine 20 board rounds without any SoS adjustment, the accuracy of the event would drop from 95% to 92.3%)
At this point in time, the primary value of this study is to identify the fact that significant improvements can be made to the traditional Swiss Teams type format. Over time, we hope that it will be possible to make more concrete recommendations regarding the best implementation for an SoS correction as well as an executable that could be used to optimize event formats based on time constraints.
Steve Willner was responsible for the original insight that an SoS correction would have an impact on the accuracy of the Swiss Team format. All of the coding and simulation work (read this as the "real" work) was done by Alex Ogan and Gerben Dirksen.
#2
Posted 2007-April-20, 09:16
What assumptions did you make on the distribution of strengths? (Without that, your figure of 95% by itself is meaningless, of course.)
#3
Posted 2007-April-20, 09:31
I have a question (Arend you too).
Wouldn't a smaller field combined with more variable skill levels and shorter rounds, like a sectional Sunday Swiss, require a much stronger correction factor to be implemented so that it would reward both the strong teams that stay at or near "par" and the weaker teams that are "above par"?
#4
Posted 2007-April-20, 09:32
A couple of comments: my subjective experience suggests that there is one very significant factor, present in most swiss formats, not covered by your approach: randomness of the boards.
Take two equally skilled strong teams, who have drawn two equally inferior teams to play a 7 board match. One team has a match with 2 slams, 4 games, and difficult bidding and play problems. They clobber the opps by 56 imps. The other team, playing in an even more superior fashion, have to cope with a passout, 4 hands on which the auction goes 1N 3N on 29 hcp and so on... they eke out a victory by 6 imps.
I appreciate that over a large field with numerous matches, these issues fade to some degree, but in real life we rarely have huge fields outside of National events.
In practice, this is a self-correcting problem if it arises early in the event: the team that drew the flat boards will play teams with fewer VPs than the team that blitzed, and so, on average, will play inferior teams until it 'catches up' by blitzing these weaker teams. But your SoS approach undermines this catching up ability by discounting these wins against weaker teams. So an early flat match (or two) will substantially handicap a good team compared to its peers who have wild hands early.
The solution is, of course, to play duplicated boards but that is a logistical nightmare...given that entirely new boards must be put in play each round.
There is also a luck of the draw issue. Last week, on the Sunday Swiss, my team struggled early but finished strong: going into the last match we had a mathematical shot at winning, and ended 3rd. My strong suspicion is that we would have been demoted on the SOS analysis, because we never faced either of the teams that finished ahead of us.... nor did we play the 4th or 5th place teams.... .
While the SOS might afford a more reliable indicator of who was playing well that day, I can assure you that winning the 3rd highest number of VPs and then being awarded 5th place while never having a chance to play the teams that finished ahead of us while winning fewer VPs would rankle.... we would, I am sure, have felt that this wasn't fair... and that is nothing to the way we would have felt had our mathematical shot at winning it all come through....
Let's look at this in terms of a late-stage matchup. Three contending teams, all with the same number of VPs. Let's say they are leading.
Team 1 plays team 2 and battles to a draw. Team 3 gets to play a team several notches lower (in the late stages, teams often play other teams quite distant in the standings due to conflicts arising from earlier schedules).
If team 3 wins, but does so narrowly (perhaps due to the nature of the hands), it will end up 3rd, not first... because SoS factors reduce the impact of its win compared to the ties achieved by teams 1 and 2 when they played off.
Or consider that both teams 1 and 3 blitzed: now they are tied but team 1 wins the event on SoS factors: even tho the final match assignment was random, and team 3 played a perfect match. In both scenarios, team 3 would feel ripped off... not a situation that tournament promoters should encourage.
#5
Posted 2007-April-20, 09:56
mikeh, on Apr 20 2007, 04:32 PM, said:
All English Swiss events - even many locally organised ones - use duplicated boards.
Come on guys, join the 21st century
#6
Posted 2007-April-20, 10:01
cherdano, on Apr 20 2007, 10:16 AM, said:
What assumptions did you make on the distribution of strengths? (Without that, your figure of 95% by itself is meaningless, of course.)
Our view is that to be accepted by players, the scoring system must be transparent/simple. It's not clear that people would even accept something as simple as this -- I think that more complicated systems like Gerben's don't stand a chance.
We used normally distributed team strengths with standard deviation 1, with units of IMPs/Board.
#7
Posted 2007-April-20, 10:02
Or put it another way: most Swiss Teams events are played for fun. The really top teams events in any country are generally not Swiss, they are round-robin followed by a KO, round-robin in groups followed by a KO, straight KO, or double-elimination KO (or possibly have some form of repechage). People who play in Swiss Teams events are more interested in enjoying themselves, and winning masterpoints against fairly equal teams, than they are in having the best mathematical chance of the best team winning. Anything that dilutes that pleasure - and I assure you that adjusting your VPs at the end of the event will dilute the pleasure - would be unpopular.
I preferred the idea of different match lengths.
#8
Posted 2007-April-20, 10:21
#9
Posted 2007-April-20, 10:22
FrancesHinden, on Apr 20 2007, 05:56 PM, said:
mikeh, on Apr 20 2007, 04:32 PM, said:
All English Swiss events - even many locally organised ones - use duplicated boards.
Come on guys, join the 21st century
All Norwegian Swiss events too. And all other events. Only at club-level you might come across non-duplicated boards. In fact about half of our clubs (I guess) use duplicated boards.
And that's not even 21st century Frances, it's late 20th century.
Harald
#10
Posted 2007-April-20, 10:24
FrancesHinden said:
Or put it another way: most Swiss Teams events are played for fun. The really top teams events in any country are generally not Swiss, they are round-robin followed by a KO, round-robin in groups followed by a KO, straight KO, or double-elimination KO (or possibly have some form of repechage). People who play in Swiss Teams events are more interested in enjoying themselves, and winning masterpoints against fairly equal teams, than they are in having the best mathematical chance of the best team winning. Anything that dilutes that pleasure - and I assure you that adjusting your VPs at the end of the event will dilute the pleasure - would be unpopular.
Yep, I agree with Frances. It's interesting to look at "accuracy" from a mathematical point of view, but if I was actually going to play in an event I would prefer it to be scored by straight VPs.
(Though, SoS is a good tie-break for teams finishing on equal numbers of VPs. I think it may already be used for that purpose in some cases.)
#11
Posted 2007-April-20, 10:49
#13
Posted 2007-April-20, 11:47
Gerben42, on Apr 20 2007, 12:36 PM, said:
No, exactly what I said. Read the original post.
"We consider event formats in which the sample statistic [swiss team results] closely mirrors the population statistic [skill level or ability of the teams] superior to formats in which {this is not the case}."
The goal was a format in which the best teams win as often as possible. I am saying that I don't consider accuracy to be a superior format, in fact I consider too much accuracy much less appealing than the status quo.
I am not interested in accuracy as you state it either. The elements of luck (including regarding who you draw), randomness, and simplicity of the scoring system are all important. No one will play in something where they can't easily understand the scoring system.
It may be interesting to study, but going down this road is a huge mistake that would thankfully never happen. Anyone ever heard of the BCS?
#14
Posted 2007-April-20, 12:11
I like the ideas of duplicated boards in swiss, but security would need to be heightened.
#15
Posted 2007-April-20, 12:21
However, take an event like national team trials. There is a strong desire for the best team to win, in order to represent the country well and give them a chance at the Bermuda Bowl or Olympiad. It would be undesirable to have a highly random event used to select the national team, as the odds of inferior players "getting lucky" would be too high. On the other hand, simply selecting the team based on who some committee votes to be "best" is subject to a lot of arbitrariness as well, as players who are better known or better liked will often be selected (determining "how good" a particular pair or individual might be is extremely subjective in bridge, and it's easy to point to seemingly knowledgeable individuals with wildly different opinions on this). In any case it creates the perception that no matter how good a pair might be, it's hard to break in because of the strong status quo in the selection process.
It seems desirable for a team trials type event to have a format where the team is selected by actually playing bridge (rather than by committee) while simultaneously minimizing the chance of an upset due to luck. This particular issue seems to crop up a lot for the US junior team trials, since (for whatever reason) these trials are held over only two days instead of a week. It seems that every cycle a new process is chosen for this selection.
As for swiss teams, while I understand Mike's point, wouldn't it be frustrating to be in first place with a round to go, having played virtually all the top teams and won, then play a tight match with the second place team ending in a draw, only to have the third place team pass you because they got an easy opponent in the last round (and never had to play any of the other teams in the top five)? I'd think this also would leave a bitter taste in the mouth of some competitors.
a.k.a. Appeal Without Merit
#16
Posted 2007-April-20, 12:25
jdonn, on Apr 20 2007, 06:47 PM, said:
errmm... only in the context of being the British Cohort Study, or the British Computer Society. I doubt either of those is what you meant.
#17
Posted 2007-April-20, 12:34
hrothgar, on Apr 20 2007, 03:42 PM, said:
* The event consists of "N" 20 board rounds
* The primary statistic used to measure the accuracy of the event is
the percentage chance that the strongest team will land in any of the
top eight places at the close of the event. (We used other metrics
including the Spearman rank coefficient and how many of the top eight
teams placed in the top eight slots. Results were consistent
across metrics)
Consider this tournament format:
* 128 teams competing in a multiple teams event, scored as total IMPs
* The event consists of 1 board rounds, 2 board rounds, or 3 board rounds organised so that it's as close to an all-play-all as you can manage given constraints on the number of boards (you could even play a combination of 1-board, 2-board and 3-board rounds to make sure it's an all-play-all; the teams you play more or fewer boards against picked at random)
How do the results of this format compare to a Swiss?
The thing is, I'm not convinced by the 'intuitive' feeling that a Swiss is inherently more accurate. Yes, you get the teams in contention playing more boards against each other (which is good), but you also waste a load of boards during mismatches. I've played Swiss events where I've won 8-board matches by 60+ imps - I don't think we gained any useful information by playing so many boards against that one particular team rather than playing 2 boards against each of 4 teams.
I've seen it claimed that it's better to play an n-round 8-board match Swiss (say) than an all-play-all 2-boards a round. I don't know if that's true or not, and I'd be interested in finding out. I suspect that it may depend a bit on the distribution of strengths of teams present. With a very low variance, the all-play-all I'm sure is better.
#18
Posted 2007-April-20, 12:35
jdonn, on Apr 20 2007, 08:47 PM, said:
Gerben42, on Apr 20 2007, 12:36 PM, said:
No, exactly what I said. Read the original post.
"We consider event formats in which the sample statistic [swiss team results] closely mirrors the population statistic [skill level or ability of the teams] superior to formats in which {this is not the case}."
The goal was a format in which the best teams win as often as possible. I am saying that I don't consider accuracy to be a superior format, in fact I consider too much accuracy much less appealing than the status quo.
I am not interested in accuracy as you state it either. The elements of luck (including regarding who you draw), randomness, and simplicity of the scoring system are all important. No one will play in something where they can't easily understand the scoring system.
It may be interesting to study, but going down this road is a huge mistake that would thankfully never happen. Anyone ever heard of the BCS?
Few quick comments here
1. I think that most people would agree that tournaments need to contain elements of both luck and skill. If an outcome is deterministic and pre-ordained then there is no reason to hold a contest. Correspondingly, if there is no element of skill involved we might as well simply cut cards to determine a winner.
2. Intelligent people can differ regarding where one should draw the line between luck and skill. However, I would argue that regardless of where one chooses to draw the line, it's desirable to be able to accurately describe one's design choice. In some ways, the value of this experiment has less to do with recommending any one specific format than being able to describe the various trade-offs that are inherent in the choice of conditions of contest. If one doesn't have appropriate vocabulary and methodology, one is reduced to blind platitudes about tradition....
3. From my own perspective, I prefer a tournament format that favors skill over luck. Consider some of the statistics generated here: if we run a tournament with twelve 20 board rounds, there is still a 5% chance that the strongest team won't place higher than 9th. This tournament requires close to 4 days to run, yet the rub of the green still plays an enormous role. The figures for a more traditional tournament with eight seven board rounds are horrific.
4. Simplicity of the scoring system was an explicit design criterion. I suspect that we could have (easily) arrived at some much more accurate SoS adjustments at the expense of adding significant complexity. The metric that we suggest is extremely simple.
#19
Posted 2007-April-20, 12:45
hrothgar, on Apr 20 2007, 01:35 PM, said:
It's simple to you. It's simple to me. Do you think it's simple to my grandmother, or even my mother?
If you believe so, I'll let you be the one to try and make her understand why she scored the most victory points and didn't win.
#20
Posted 2007-April-20, 12:46
FrancesHinden, on Apr 20 2007, 09:34 PM, said:
hrothgar, on Apr 20 2007, 03:42 PM, said:
* The event consists of "N" 20 board rounds
* The primary statistic used to measure the accuracy of the event is
the percentage chance that the strongest team will land in any of the
top eight places at the close of the event. (We used other metrics
including the Spearman rank coefficient and how many of the top eight
teams placed in the top eight slots. Results were consistent
across metrics)
Consider this tournament format:
* 128 teams competing in a multiple teams event, scored as total IMPs
* The event consists of 1 board rounds, 2 board rounds, or 3 board rounds organised so that it's as close to an all-play-all as you can manage given constraints on the number of boards (you could even play a combination of 1-board, 2-board and 3-board rounds to make sure it's an all-play-all; the teams you play more or fewer boards against picked at random)
How do the results of this format compare to a Swiss?
The thing is, I'm not convinced by the 'intuitive' feeling that a Swiss is inherently more accurate. Yes, you get the teams in contention playing more boards against each other (which is good), but you also waste a load of boards during mismatches. I've played Swiss events where I've won 8-board matches by 60+ imps - I don't think we gained any useful information by playing so many boards against that one particular team rather than playing 2 boards against each of 4 teams.
I've seen it claimed that it's better to play an n-round 8-board match Swiss (say) than an all-play-all 2-boards a round. I don't know if that's true or not, and I'd be interested in finding out. I suspect that it may depend a bit on the distribution of strengths of teams present. With a very low variance, the all-play-all I'm sure is better.
Hi Frances:
I was wondering much the same thing (it's possible that my inspiration was a bit different).
One of the basic results that arose quite early in the study had to do with how to make the most efficient use of a fixed amount of time. We found that tournaments that used a relatively large number of short rounds produced more accurate results than tournaments with a relatively small number of long rounds. The main countervailing force was the fixed cost associated with round breaks. The more time that people spend stretching their legs/drinking/smoking/peeing/what have you between rounds, the fewer rounds you want to have.
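To make the fixed-cost point concrete, here is a back-of-the-envelope sketch. The 8 minutes per board and 15 minutes per break are made-up illustrative defaults, not figures from the study, and `event_minutes` is a hypothetical helper.

```python
def event_minutes(rounds, boards_per_round, minutes_per_board=8.0, break_minutes=15.0):
    """Total running time for an event, assuming a break between rounds
    (but none after the final round). All defaults are illustrative guesses."""
    playing = rounds * boards_per_round * minutes_per_board
    breaks = (rounds - 1) * break_minutes
    return playing + breaks

# The same 60 boards, split two ways:
few_long = event_minutes(rounds=3, boards_per_round=20)     # 510.0 minutes
many_short = event_minutes(rounds=10, boards_per_round=6)   # 615.0 minutes
```

Under these assumed numbers the ten-round version costs 105 extra minutes for the same number of boards, which is the price to weigh against any gain in accuracy from the extra rounds.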
It occurred to me that a formal movement might be a better way to handle the whole situation. I suspect that a BAM type movement might be structurally more efficient. It's easier to force people through some kind of structured movement than to run a barometer...
However, it's unclear whether the advantages of having people play all of the other pairs outweigh the (considerable) advantage that a Swiss / Danish teams event uses a Barometer type system.

