A Tale of Two Bootstraps OR How to significantly improve ACBL masterpoint allocations in a way that most everyone will hate…

#1 hrothgar

  • Group: Advanced Members
  • Posts: 15,372
  • Joined: 2003-February-13
  • Gender:Male
  • Location:Natick, MA
  • Interests:Travel
    Cooking
    Brewing
    Hiking

Posted 2018-August-09, 08:31

Team Gawrys won the 2018 Spingold by a convincing 33 IMPs over 60 boards. While I am happy to join everyone else in congratulating Gawrys on their victory, I cannot help but note that I am only about 77% sure that they “should” have won.

There’s a lot of luck intrinsic to the game of bridge. Tournament organizers do as much as they can to minimize the role that luck plays in the game through creative ideas like “duplicate” boards or “team” matches. Even so, I think that everyone reading this recognizes that some days the bridge gods smile on you; other days they don’t.

Here’s a practical example: Suppose that you are playing a strong club system. You may very well expect that your board results will be better when you open one of your limited major-suit openings than when you are forced to open a strong club. If you’re lucky and the card gods deal you a lot of 1M openings, you might expect to score better than normal. Alternatively, if you get dealt way more strong club openings, you might expect your score to suffer.

For kicks and giggles, I decided to use a statistical technique called a bootstrap to analyze a couple of matches from the 2018 Spingold. I chose the 60-board final that Gawrys played against Rosenthal because it had a fairly large spread between the two teams (Gawrys won by over 0.5 IMPs per board). I also chose the semi-final match between Rosenthal and Gupta, which was very close (Rosenthal won by 2 IMPs).

I used a bootstrap to run one million virtual matches with the same statistical properties as the two matches in question and analyzed the results. (I’ll discuss bootstraps at the close of this posting.) I wanted to count the number of times each side “won” the match, the average margin of victory, and the standard deviation. For anyone who cares, the bootstrap took about 5 seconds to run on my Mac.

Gawrys versus Rosenthal

Gawrys win: 761,673 matches
Tie: 6,882 matches
Rosenthal win: 231,445 matches
Mean result: +31.94997 IMPs to Gawrys (Vugraph records seem to be off by 1 IMP in R1)
Standard deviation: 44.1962 IMPs

<Big takeaway: even with a “convincing” win like the one we saw in the final, we’d expect Rosenthal to win about 23% of the time and tie about 0.7% of the time>

Gupta versus Rosenthal

Rosenthal win: 513,564 matches
Tie: 7,646 matches
Gupta win: 478,790 matches
Mean result: +2.028112 IMPs to Rosenthal
Standard deviation: 50.98339 IMPs

<Big takeaway, for all intents and purposes, that Rosenthal “win” was the result of a coin toss>

FWIW, here are a couple of conclusions that I draw from this analysis:

1. I’d argue that masterpoint allocations should be weighted by our certainty that the correct team won the match. For example, in the case of the Spingold final, Gawrys should receive ~77% of the first-place award plus ~23% of the second-place award; Rosenthal should get the converse.
2. The Gupta / Rosenthal match did not run long enough. For KO-type formats, we should insist on a statistically significant margin of victory.

Background information on the Bootstrap.

A bootstrap is a statistical technique that uses sampling with replacement to construct new datasets that are not identical to the original but have the same expected moments.

In this example, I entered the board results for the Rosenthal – Gupta match. I sampled with replacement 60 times from the set of board results, creating a new dataset that consists of nothing but board results from the original match and has the same length as the original. I repeated this process a million times and calculated summary statistics.


# Enter match results (board-by-board IMP swings)
# foo: Rosenthal - Gupta semi-final, from Rosenthal's perspective (sums to +2)
foo = c(-7,0,0,6,-1,-3,-1,8,12,4,2,-5,-13,1,-7,0,-13,
        -13,-1,0,6,1,-5,13,8,-2,5,8,12,-1,0,7,1,-9,0,0,
        -1,-13,0,12,-13,0,0,0,13,-5,1,3,0,0,-1,-1,0,2,-15,10,
        0,-2,-1,0)

# foo2: Gawrys - Rosenthal final, from Gawrys's perspective (sums to +32)
foo2 = c(11,0,-6,7,2,1,1,-3,0,7,0,12,-1,0,0,10,-1,-7,4,11,
         10,0,-11,0,0,4,-7,-13,0,0,0,0,0,7,-12,0,5,-1,13,
         -10,-3,-5,5,-7,0,0,-4,10,-3,0,0,0,4,-5,0,0,0,0,0,7)

# Check data entry: the final entry of the cumulative sum should match the
# actual match margin
cumsum(foo)

# Bootstrap: resample the 60 board results with replacement one million times
# and record the total margin of each virtual match
my_data = matrix(0, 1000000)

for (i in 1:1000000)
{
  boot = sample(foo, size = length(foo), replace = TRUE)
  bar = cumsum(boot)
  my_data[i] = bar[60]
}

mean(my_data)                  # average margin (positive = Rosenthal)
sd(my_data)                    # standard deviation of the margin
length(my_data[my_data > 0])   # virtual matches won by Rosenthal
length(my_data[my_data == 0])  # ties
length(my_data[my_data < 0])   # virtual matches won by Gupta
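
Presumably the Gawrys – Rosenthal numbers above come from repeating the same loop on foo2. A minimal sketch that handles both matches, using a small wrapper (boot_match is illustrative, not part of the code above):

# Illustrative wrapper: bootstrap one match's board swings and summarize
# wins / ties / losses for the team whose swings are entered as positive.
boot_match = function(swings, n_sims = 1000000)
{
  totals = replicate(n_sims, sum(sample(swings, size = length(swings), replace = TRUE)))
  c(win = sum(totals > 0), tie = sum(totals == 0), loss = sum(totals < 0),
    mean = mean(totals), sd = sd(totals))
}

boot_match(foo)    # Rosenthal - Gupta semi-final
boot_match(foo2)   # Gawrys - Rosenthal final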
Alderaan delenda est

#2 barmar

  • Group: Admin
  • Posts: 21,398
  • Joined: 2004-August-21
  • Gender:Male

Posted 2018-August-09, 09:12

hrothgar, on 2018-August-09, 08:31, said:

1. I’d argue that masterpoint allocations should be weighted by our certainty that the correct team won the match.

Is this a variation on handicapping? It doesn't alter the raw scores like real handicapping does, but it alters the payoff based on similar criteria.

Is there any game or sport that tries to do something like that in high-level play? The closest things I can think of are the payoffs in horse racing and the "spread" in (American) football. But these are only used in betting, not awards to the participants. Statistical analysis is a natural fit there (indeed, much of the early research on statistics and probability was done by gamblers).

#3 Cyberyeti

  • Group: Advanced Members
  • Posts: 13,898
  • Joined: 2009-July-13
  • Location:England

Posted 2018-August-09, 11:43

You could do something much simpler than this, which I suspect would achieve similar results without the statistical rigour: scale the masterpoint awards by the number of IMPs the match was won by.

This would at least be understandable to the general population, and it would mean that if you won, you had a chance of making up the masterpoints you lost this way in the next round, even if you lost the next match.

While I can't think of a sport that does this, darts and some other sports, if the game is really close, play on until somebody is 2 clear and only then have a sudden-death leg, which would be the equivalent of saying that the margin was not statistically significant.
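
One purely illustrative way such an IMP-margin scaling might look in R (the 30-IMP cap and the 50/50 split at a tie are arbitrary choices for the sketch, not part of the suggestion above):

# Illustrative only: split a fixed award between winner and loser as a
# linear function of the IMP margin, capped at some maximum margin.
split_award = function(total_mp, margin, cap = 30)
{
  winner_share = 0.5 + min(abs(margin), cap) / (2 * cap)  # 50% at a tie, 100% at the cap
  c(winner = total_mp * winner_share,
    loser  = total_mp * (1 - winner_share))
}

split_award(100, 33)   # Spingold final: winner takes the full award
split_award(100, 2)    # semi-final: roughly a 53/47 split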

#4 awm

  • Group: Advanced Members
  • Posts: 8,306
  • Joined: 2005-February-09
  • Gender:Male
  • Location:Zurich, Switzerland

Posted 2018-August-09, 12:53

This seems to assume that the results of the various boards are independent. But we all know that's not the case -- teams that are behind tend to take higher-variance actions in order to catch up, and teams with a big lead may try to play a safer game.

In any case, it seems like you're arguing that instead of awarding X points for a win, we should award X*"apparent win probability" with the rest going to the loser. But over a long time where you play hundreds or thousands of events, wouldn't your expected point total come to the same thing (but with a lot less complexity)?
Adam W. Meyerson
a.k.a. Appeal Without Merit

#5 hrothgar

Posted 2018-August-09, 14:21

awm, on 2018-August-09, 12:53, said:

This seems to assume that the results of the various boards are independent. But we all know that's not the case -- teams that are behind tend to take higher-variance actions in order to catch up, and teams with a big lead may try to play a safer game.

In any case, it seems like you're arguing that instead of awarding X points for a win, we should award X*"apparent win probability" with the rest going to the loser. But over a long time where you play hundreds or thousands of events, wouldn't your expected point total come to the same thing (but with a lot less complexity)?


Adam, lot of good points here. Let me (try to) address them...

1. There are ways to adjust the methods if you have a strong belief that the data is non-stationary.

For example, I could construct a separate bootstrap for each of the 4 segments that comprise the match, and then add them together to regenerate the match itself.
This would adjust for the higher variance results in the later rounds.

(I considered doing this to begin with and would recommend doing so if this were done for real. It's a bit more complicated than I wanted to get into when introducing the idea. Perhaps I was mistaken; a rough sketch of the segment-by-segment version appears at the end of this post.)

2. I agree that over very large numbers of events, things even out.

It's unclear whether a team advancing deep in the Spingold happens frequently enough for this to be relied on.
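
For what it's worth, here is a rough sketch of the segment-by-segment bootstrap mentioned in point 1, assuming the 60 boards split evenly into 4 segments of 15 (the segment boundaries are an assumption, not taken from the actual match records):

# Block bootstrap sketch: resample within each 15-board segment separately,
# then add the four segment totals to regenerate one virtual match.
segments = split(foo, rep(1:4, each = 15))

boot_one_match = function(segs)
{
  sum(sapply(segs, function(s) sum(sample(s, size = length(s), replace = TRUE))))
}

my_data2 = replicate(1000000, boot_one_match(segments))

mean(my_data2)
sd(my_data2)
sum(my_data2 > 0)   # virtual matches won by Rosenthal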
Alderaan delenda est

#6 steve2005

  • Group: Advanced Members
  • Posts: 3,148
  • Joined: 2010-April-22
  • Gender:Male
  • Location:Hamilton, Canada
  • Interests:Bridge duh!

Posted 2018-August-09, 16:12

Sounds like a lot of work that won't make much difference, most of the time.
And who is going to audit the bootstrap calculations?

Sarcasm is a state of mind

#7 barmar

Posted 2018-August-10, 09:33

Similar to what Adam pointed out, it seems like there can be all sorts of feedback effects from something like this, and maybe some gamesmanship. E.g. if you're playing a closely-matched team, so you're not going to get many points if you win, are you going to try as hard?


But I question the premise of this. Suppose I form a team of unknowns and we win the Spingold. Do we really deserve more masterpoints than anyone else, just because this result was so unlikely? Beating Meckwell takes the same amount of work whether you're already champions or not.

Or maybe I'm understanding this wrong. Is the claim that my team couldn't have won through skill, because our history shows that we're not that skilled, so it must have been a series of lucky results? And we don't deserve as many points just for getting lucky. If that's the case, why bother entering? We often hear about "Cinderella" teams that make it further than expected in major events; it seems like this premise extends the metaphor, implying that they really do have a Fairy Godmother responsible for the success, and the points should go to her instead of the team. :)

#8 hrothgar

Posted 2018-August-10, 11:49

barmar, on 2018-August-10, 09:33, said:


But I question the premise of this. Suppose I form a team of unknowns and we win the Spingold. Do we really deserve more masterpoints than anyone else, just because this result was so unlikely? Beating Meckwell takes the same amount of work whether you're already champions or not.



You fundamentally misunderstand what I am suggesting.
Please try again
Alderaan delenda est

#9 hrothgar

Posted 2018-August-10, 11:51

steve2005, on 2018-August-09, 16:12, said:

And who is going to audit the bootstrap calculations?


Anyone who wants...

Look at the bleeding code.
Anyone could write this, and it's not like the information is secret...
Alderaan delenda est

#10 PrecisionL

  • Group: Full Members
  • Posts: 941
  • Joined: 2004-March-25
  • Gender:Male
  • Location:Knoxville, TN, USA
  • Interests:Diamond LM (6700+ MP)
    God
    Family
    Counseling
    Bridge

Posted 2018-August-11, 09:09

hrothgar, on 2018-August-09, 08:31, said:

Team Gawrys won the 2018 Spingold by a convincing 33 IMPs over 60 boards.

Here's a practical example: Suppose that you are playing a strong club system. You may very well expect that your board results will be better when you open one of your limited major-suit openings than when you are forced to open a strong club. If you're lucky and the card gods deal you a lot of 1M openings, you might expect to score better than normal. Alternatively, if you get dealt way more strong club openings, you might expect your score to suffer.



My expectation playing a strong club system is to expect better board results no matter what we open, and to expect worse results when we play defense a lot.
Ultra Relay: see Daniel's web page: https://bridgewithda...19/07/Ultra.pdf
C3: Copious Canape Club is still my favorite system. (Ultra upgraded, PM for notes)

Santa Fe Precision published 8/19. TOP3 published 11/20. Magic experiment (Science Modernized) with Lenzo. 2020: Jan Eric Larsson's Cottontail . 2020. BFUN (Bridge For the UNbalanced) 2021: Weiss Simplified (Canape & Relay). 2022: Canary Modernized, 2023-4: KOK Canape.

#11 barmar

Posted 2018-August-12, 20:46

hrothgar, on 2018-August-10, 11:49, said:

You fundamentally misunderstand what I am suggesting.
Please try again

I think that's quite likely. Maybe you could dumb it down for those of us not well versed in statistics.

#12 hrothgar

Posted 2018-August-13, 01:41

barmar, on 2018-August-12, 20:46, said:

I think that's quite likely. Maybe you could dumb it down for those of us not well versed in statistics.


Assume for the moment that we're playing a KO.

  • Treat each board as if it were a sample drawn from a distribution.
  • Each time you play a round, use the set of scores from that round to assess the statistical certainty that you have correctly identified the winner*.
  • Weight the allocation of MP between the two teams based on the degree of certainty


This has nothing to do with who is an underdog or who is expected to win a priori.


* I suspect that some of you are already saying, "We know who the winner is. It's whoever was ahead at the end of board #60."

Here's my reply:

Think of all those matches where the lead was swinging back and forth erratically leading up to the final board.
Suddenly, board 60 is complete and Team Foo is crowned as "THE VICTOR".
However, if the tournament had only run for 59 boards, Team Bar would have won, and if we let things run for 62 boards it's highly likely that Team Bar would be back in the lead.

(The key problem here is that the event isn't being run for long enough to conclusively identify a winner)

As such, I am suggesting that the masterpoints that get awarded for this match should also be divided.

We should add together the number of masterpoints that Team Foo would receive and the number that Team Bar would receive, and assign a convex combination based on our degree of certainty regarding the result.
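
To make that concrete, here is a minimal sketch of the convex combination using the bootstrap counts from post #1 (the 100/60 masterpoint awards are placeholder values, and counting ties as half a win for each side is an assumption, not something specified above):

# Split the first- and second-place awards as a convex combination, weighted
# by the bootstrapped certainty that the team that finished ahead "really" won.
# Ties are counted as half a win for each side (an assumption).
split_by_certainty = function(mp_first, mp_second, wins, ties, losses)
{
  p = (wins + 0.5 * ties) / (wins + ties + losses)
  c(leader  = p * mp_first + (1 - p) * mp_second,
    trailer = (1 - p) * mp_first + p * mp_second)
}

# Spingold final, placeholder awards of 100 MP for first and 60 MP for second:
split_by_certainty(100, 60, wins = 761673, ties = 6882, losses = 231445)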
Alderaan delenda est

#13 hrothgar

Posted 2018-August-13, 01:43

PrecisionL, on 2018-August-11, 09:09, said:

My expectation playing a strong club system is to expect better board results no matter what we open, and to expect worse results when we play defense a lot.


Even if you expect all your openings to have a positive expected value, do you really expect to do every bit as well after a strong club opening as after a 1M opening?
Alderaan delenda est

#14 barmar

Posted 2018-August-14, 09:02

hrothgar, on 2018-August-13, 01:41, said:

(The key problem here is that the event isn't being run for long enough to conclusively identify a winner)

If the teams are really evenly matched, is there really any length that will provide a decisive winner? One could easily imagine the lead swinging back and forth forever.

Lots of competitions end with photo finishes. Basketball games sometimes end with a player sinking the winning basket at the buzzer; if the game were a minute or two longer, the other team could have overcome this. Football games have been decided by a Hail Mary pass at the last minute. Games end when they end, and whoever is in the lead then is the winner. No one considers these close wins to be lesser wins.

You're basically indicting all win-loss scoring as not necessarily proving which contestant is "better" -- close wins are more likely to be due to luck. In bridge we use Victory Points in Swiss Teams to make the margin of victory significant (although doing this with short matches is probably statistically wrong, since one swing board can have unwarranted significance).

#15 PrecisionL

Posted 2019-June-11, 19:07

hrothgar, on 2018-August-13, 01:43, said:

Even if you expect all your openings to have a positive expected value, do you really expect to do every bit as well after a strong club opening as after a 1M opening?


YES.
Ultra Relay: see Daniel's web page: https://bridgewithda...19/07/Ultra.pdf
C3: Copious Canape Club is still my favorite system. (Ultra upgraded, PM for notes)

Santa Fe Precision published 8/19. TOP3 published 11/20. Magic experiment (Science Modernized) with Lenzo. 2020: Jan Eric Larsson's Cottontail . 2020. BFUN (Bridge For the UNbalanced) 2021: Weiss Simplified (Canape & Relay). 2022: Canary Modernized, 2023-4: KOK Canape.
