BBO Discussion Forums: Improving Hand Evaluation Part 3

Improving Hand Evaluation Part 3: simple yet accurate

#21 mikestar

  • PipPipPipPipPip
  • Group: Full Members
  • Posts: 913
  • Joined: 2003-August-18
  • Location:California, USA

Posted 2004-June-04, 18:02

Zar,

If I read your equations correctly, you are miscounting TSP points. Tysen counts 1-3-5 for shortness and 1 for each card over 4 in any suit.

For example, you have TSP counting a 5-5-3-0 shape as 5 while Tysen counts it as 7.
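To make the correction concrete, here is a minimal Python sketch of the TSP distribution count as described above (1-3-5 for shortness, plus 1 for each card over four in any suit). The function names are hypothetical, and scoring every short suit separately is an assumption, not something Tysen's post spells out.

```python
# Hypothetical helpers illustrating the TSP distribution count as described
# in this post; not Tysen's own code.
SHORTNESS = {0: 5, 1: 3, 2: 1}  # void, singleton, doubleton


def shortness_points(shape):
    """1-3-5 points for every short suit in a four-suit shape."""
    return sum(SHORTNESS.get(n, 0) for n in shape)


def tsp_distribution(shape):
    """Shortness points plus 1 for each card over 4 in any suit."""
    if len(shape) != 4 or sum(shape) != 13:
        raise ValueError("shape must be four suit lengths totalling 13")
    length = sum(max(0, n - 4) for n in shape)
    return shortness_points(shape) + length


print(shortness_points((5, 5, 3, 0)))  # shortness only (the undercount): 5
print(tsp_distribution((5, 5, 3, 0)))  # with length points added: 7
```

The gap between the two calls is exactly the two length points for the two five-card suits.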

If I am misreading your equations, please ignore this, but I think I am correct. I really doubt that a method that uses the same high-card count as Zar and never differs by more than 1 point in its distribution count (after normalizing the 8-point difference in the 4-3-3-3 valuation) will be that radically off. In particular, it has fewer discrepancies from Zar than Goren 1-3-5 does, yet it is rated worse.

#22 tysen2k

  • PipPipPipPip
  • Group: Full Members
  • Posts: 406
  • Joined: 2004-March-25

Posted 2004-June-04, 18:21

Zar, on Jun 4 2004, 05:02 PM, said:

if (  TSN    // HCP + CTRL
+ 2*( max( 0,  (L[0][fitCol] +  L[1][fitCol] -8) )  ) // FIT points
+ dN123  + cN123 + dS123  +  cS123  // 1-3-5 for N and S
> 53 )  TSNfit++;  // check for Grand

I don't know how to interpret this either. Don't forget about the 1 point for having 2+ honors in the same suit. That's usually about 4 points on most game hands and even more for slammish hands.

Once again Zar's tests are really only testing aggressiveness, not accuracy. If I said to bid a grand every time I have 0+ points, I'd score perfect on Zar's test.

If you're forgetting the 2+ honors rule, no wonder the TSP hands are falling short of slam so often.
A bit of blatant self-pimping - I've got a new poker book that's getting good reviews.

#23 Zar

  • PipPipPipPip
  • Group: Full Members
  • Posts: 153
  • Joined: 2004-April-03

Posted 2004-June-04, 20:50

*** mikestar wrote: "and 1 for each card over 4 in any suit ...
<

You are correct - I missed this one, so here is the new calc:

if ( TSN // HCP + CTRL
+ 2*( max( 0, L[0][fitCol] + L[1][fitCol] -8) ) // FIT points
+ max( 0, getAbcd("N", "a") -4) // Karpin Points a N
+ max( 0, getAbcd("N", "b") -4) // Karpin Points b N
+ max( 0, getAbcd("S", "a") -4) // Karpin Points a S
+ max( 0, getAbcd("S", "b") -4) // Karpin Points b S
+ dN123 + cN123 + dS123 + cS123 // 1-3-5 for N and S
> 53 ) TSNfit++; // check for Grand

and TSN indeed went above Goren 5-3-1 as you predicted due to the HCP + CTRL.
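A runnable Python sketch of the revised check above may help others verify it. The thread never defines `getAbcd`, `dN123` or `cN123`, so the code assumes that `a` and `b` are each hand's two longest suit lengths and that `dN123 + cN123` are 1-3-5 shortness points on the two shortest suits (MickyB questions exactly this point further down the thread). As Tysen notes, this formula still omits TSP's extra point for 2+ honors in a suit. All names are hypothetical.

```python
# Hypothetical reconstruction of Zar's revised "Fit TSN" grand-slam test.
# Assumptions: shapes are four suit lengths; "a"/"b" are the two longest
# suits; dN123/cN123 are 1-3-5 points for the two shortest suits.
SHORTNESS = {0: 5, 1: 3, 2: 1}
GRAND_THRESHOLD = 53  # the posted code counts a grand when the total is > 53


def abcd(shape):
    """Suit lengths sorted so that a >= b >= c >= d."""
    return sorted(shape, reverse=True)


def karpin(shape):
    """Length points: 1 per card over 4 in the two longest suits."""
    a, b, _, _ = abcd(shape)
    return max(0, a - 4) + max(0, b - 4)


def short_135(shape):
    """Assumed meaning of dX123 + cX123: 1-3-5 on the two shortest suits."""
    _, _, c, d = abcd(shape)
    return SHORTNESS.get(c, 0) + SHORTNESS.get(d, 0)


def tsn_fit_total(hcp_ctrl, fit_n, fit_s, shape_n, shape_s):
    """hcp_ctrl: combined HCP + controls; fit_n/fit_s: trump lengths."""
    return (hcp_ctrl
            + 2 * max(0, fit_n + fit_s - 8)              # FIT points
            + karpin(shape_n) + karpin(shape_s)          # length points
            + short_135(shape_n) + short_135(shape_s))   # 1-3-5 shortness


def bids_grand(*args):
    return tsn_fit_total(*args) > GRAND_THRESHOLD


print(tsn_fit_total(45, 6, 4, (6, 4, 3, 0), (4, 4, 3, 2)))  # 57
```

Adding a 2+ honors term to `tsn_fit_total` would be the natural next step toward matching TSP as Tysen defines it.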

================Overall Results ============================

GOREN 3-2-1 (HCP + 3-2-1 > 36) got 1427 contracts
The WTC (number of tricks > 12) got 1543 contracts
GOREN 5-3-1 (HCP + 5-3-1 > 36) got 2913 contracts
Fit TSN Points (fit points > 53) got 3616 contracts
Basic Zar Points (no fit points > 66) got 3753 contracts
Fit +3 Zar Points (+3 extra trump > 66) got 5729 contracts

So still this combination of HCP + CTRL + FIT + Karpin + 1-3-5 is worse than BOTH the basic Zar Points and the Fit Zar Points.

How come we cross-post "magic" results with no explanation showing:

HCP
HCP+321
HCP+531
Zar
BUMRAP+321
BUMRAP+531
TSP
Binky


What kind of "calculation" was made to "suddenly" put the combo-method WAY above when it manifests 3600 against 5700 on the Standard GIB boards?

And the "score" is 0.21 vs. 0.08, almost 3 TIMES better when it is almost 2 times worse? What's the "magic"?

ZAR

#24 inquiry

  • PipPipPipPipPipPipPipPipPipPip
  • Group: Admin
  • Posts: 14,566
  • Joined: 2003-February-13
  • Gender:Male
  • Location:Amelia Island, FL
  • Interests:Bridge, what else?

Posted 2004-June-04, 21:10

Hi Zar,

The similarities between Zar and TSP in base counting are very close. For instance, the AKQJ scale is identical. And if you look at Zar points for distribution versus TSP points for distribution, they are essentially identical apart from the base count of 8 points that the ZAR scale gives even a 4333 distribution. The majority of distributions come out 9 points apart, but there are plenty of 8-point differences too. For some of the wilder distributions (like 13-0-0-0), the difference gets smaller.

There are other similarities, like 5 points (in either system) per level. So if you stop and think about it, ZAR's 52 points for game corresponds to TSP's 34 for game: simply subtract the 9-point base from each partner's hand (52 - 9 - 9 = 34). So, from an initial-count standpoint, it seems TSP and ZAR can be converted into each other by adding or subtracting 9 points per hand. It is not that simple, because TSP adds +1 for each suit with two or more honors and subtracts big points for singleton honors (I think this is one of the flaws).

Let's examine four hands from Zar's web document:

Page 6 Axxxx Kxx KJx Ax Zar=26, TSP = 18 (count DKJx as 6 pts) 26-9 = 17, close
Page 7 Qx AKxxx Jxxxx x Zar=27, TSP = 20 (count AKxxx as 12pts), 27-9 = 18, off by 2
Page 8 AQxx Jx Axxx xxx Zar = 25, TSP = 17, 25 - 9 = 16
Page 11 KJxxx AKx xx Txx Zar = 27, TSP = 17, 27 - 9 = 16

As you can see, the "point" for combined honors tends to make the distributional correction off by a point or two.

But let me show you where I think the flaw in TSP is; this hand from Challenge the Champs, August 1980, will do nicely for this purpose.


West by TSP = 14 hcp, 5 cp, 1 pt for S-honors, 2pts for 6 spades, 2pts for two doubletons = 24 pts.
East by TSP = 10 hcp, 3 cp, 1 pt for H-honors, 3 pts for seven hearts, 5 pts for the void, 1 pt for short spades = 23 pts. Once hearts are raised, EAST gets 4 more points for long hearts (more than 8 between the two hands), bringing his total to 27. 27 + 24 = 51, enough for 6H.

By ZAR, West is 32 and EAST is 31. Once the heart fit is found, West gets +1 for the heart J, and East gets +6 for the void and two extra hearts. 32 + 31 + 1 + 6 = 70, more than enough for a grand slam.

So if we subtract 18 from 70 we get 52, which is basically what the TSP count showed. How come TSP is not as accurate as ZAR here? I think it has to do with the way fit points are calculated: Zar got 7 fit points, TSP got 4. If we add 3 more points to the TSP score (51 + 3 = 54), it would just scrape together enough for the grand. It also seems to me that TSP sometimes adds fit points for trumps beyond an eight-card fit (at two points each) that do not contribute to trick-taking power at all.

So I think two of the flaws are obvious (discounting singleton honors outright, and incorrect fit adjustments). I will look to see if I can find the third.

Ben
--Ben--

#25 MickyB

  • PipPipPipPipPipPipPip
  • Group: Advanced Members
  • Posts: 3,290
  • Joined: 2004-May-03
  • Gender:Male
  • Location:London, England

Posted 2004-June-05, 03:34

Game is 39 points in TSP, not 34; that gap of 5 points is closed up by the combined-honours adjustment.
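The game thresholds quoted so far can be turned into a points-to-tricks mapping. This Python sketch assumes, per the thread, that a major-suit game (10 tricks) needs 52 combined Zar points or 39 combined TSP points and that every further 5 points is worth one more trick; extending that step above game and capping at 13 tricks are simplifying assumptions of the sketch.

```python
# Assumed thresholds from the thread: 52 combined Zar points or 39 combined
# TSP points for a major-suit game (10 tricks), then 5 points per trick.
GAME_THRESHOLD = {"zar": 52, "tsp": 39}
GAME_TRICKS = 10
POINTS_PER_TRICK = 5


def predicted_tricks(points, scale):
    """Tricks implied by a combined point count on the given scale."""
    extra = (points - GAME_THRESHOLD[scale]) // POINTS_PER_TRICK
    return min(13, GAME_TRICKS + extra)


print(predicted_tricks(51, "tsp"))  # Ben's hand by TSP: 12
print(predicted_tricks(54, "tsp"))  # with 3 more fit points: 13
print(predicted_tricks(70, "zar"))  # Ben's hand by Zar: 13
```

On these assumptions, Ben's 51 TSP points for the Challenge the Champs hand map to a small slam and his 70 Zar points to a grand, which is the gap being discussed.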

What's wrong with subtracting points for singleton honours? It seems reasonable to me.

About that hand: if you switch the minor-suit holdings around, then you can't go past the 5 level. I think I am correct in saying that, when calculating their respective evaluation scales, Tysen assumes that the hands are bid to the level recommended by their point count, regardless of missing two first-round controls (two quick losers in a suit), and Zar assumes that you manage to check for controls and stay out of slam without them. Thus you would expect ZAR to be more aggressive in the slam zone; is that correct?

They then both carry out their simulations based on the same idea. So you would expect Tysen to show that TSP is better, and Zar to show that ZAR is better!

Assuming I haven't misunderstood something so far, the question arises - which of these methods is more useful at the table, the one that tells you when you are likely to have slam as long as you have the controls required, or the one that tells you when you are likely to have slam regardless of controls? Sometimes you won't be able to check for controls. Sometimes you will go down at the 5 level after a slam investigation. (Zar, if the ZAR points tell you to bid slam, but you are missing two fast tricks, do you assume the hand is then played in 5M or 4M?) TSP, on the other hand, will sometimes underbid on hands where you have got the controls, because of the hands used in creating the evaluation method that didn't have controls. So you would expect the optimum to be somewhere between the two methods.

Sorry if I've got something wrong early on and continued to base the rest of this on something completely wrong :unsure:

#26 MickyB

Posted 2004-June-05, 04:51

Zar, on Jun 4 2004, 09:50 PM, said:

You are correct - I missed this one, so here is the new calc:

if ( TSN // HCP + CTRL
+ 2*( max( 0, L[0][fitCol] + L[1][fitCol] -8) ) // FIT points
+ max( 0, getAbcd("N", "a") -4) // Karpin Points a N
+ max( 0, getAbcd("N", "b") -4) // Karpin Points b N
+ max( 0, getAbcd("S", "a") -4) // Karpin Points a S
+ max( 0, getAbcd("S", "b") -4) // Karpin Points b S
+ dN123 + cN123 + dS123 + cS123 // 1-3-5 for N and S
> 53 ) TSNfit++; // check for Grand

---------------------------------------------------

How come we cross-post "magic" results with no explanation showing:

HCP
HCP+321
HCP+531
Zar
BUMRAP+321
BUMRAP+531
TSP
Binky


What kind of "calculation" was made to "suddenly" put the combo-method WAY above when it manifests 3600 against 5700 on the Standard GIB boards?

And the "score" is 0.21 vs. 0.08, almost 3 TIMES better when it is almost 2 times worse? What's the "magic"?

ZAR

Could you explain that calculation please? Does 'dN123+cN123' really equate to 1-3-5 evaluation? Have you included TSP's addition of one point for having two honours in a suit?

Tysen's results did have an explanation. He said:

ERROR is the average # of tricks there is in difference between how many tricks we think we can take and how many we actually take.

SCORE is an estimation of the IMPs/board we expect to gain against a team that uses a simple HCP+321 evaluation method. It’s a measure of how much payoff there is for using a better evaluation system.

What has this got to do with magic? What was sudden about it? It is his extension of BUMRAP+531, which he has always claimed to be superior to Zar. You are comparing Zar and TSP using different methods, and his method is more sound. Please check that your calculation of TSP is correct, then rerun your simulation on all of GIB's boards, seeing how many games are correctly bid, how many are missed, how many games are correctly stayed out of and how many part-score hands are overbid.

#27 hrothgar

  • PipPipPipPipPipPipPipPipPipPipPip
  • Group: Advanced Members
  • Posts: 15,396
  • Joined: 2003-February-13
  • Gender:Male
  • Location:Natick, MA
  • Interests:Travel
    Cooking
    Brewing
    Hiking

Posted 2004-June-05, 05:54

Zar, on Jun 5 2004, 05:50 AM, said:

What kind of "calculation" was made to "suddenly" put the combo-method WAY above when it manifests 3600 against 5700 on the Standard GIB boards?

And the "score" is 0.21 vs. 0.08, almost 3 TIMES better when it is almost 2 times worse? What's the "magic"?

ZAR

Zar, the "magic" is nothing more than basic statistics.

I very much admire your enthusiasm for your point count method and all of the effort that you are making to promote it. However, to be perfectly blunt, I have enormous difficulty taking your work seriously because of your repeated failures to apply, or apparently even understand, basic statistical analysis.

I strongly suggest that you spend some time learning how Tysen is measuring the accuracy of hand evaluation methods.

It's useless to argue about the relative accuracy of hand evaluation systems until we are able to agree on how this should be measured.
Alderaan delenda est

#28 inquiry

Posted 2004-June-05, 06:53

To Mike,

Neither ZAR nor Tysen corrects the data for being off two quick tricks. Having looked at ZAR's hands, I see he is "point-and-shoot": if the total is so many Zar points, then he assumes the contract is at that level. Same for Tysen. This is easy enough for them to program, I guess. In the real world, however, we will BID THE HANDS, and we will evaluate whether slam is a good idea based not only upon "point count" (whatever point count system we use), but also on quick losers. Since I am not a computer programmer, I have to look at the hands, and one thing I usually check is whether I am off two cashable aces; if so, I assume none of the systems would end up in a grand slam or small slam (subtract the number of aces the opponents hold from 7, and don't bid beyond that level... :-) )
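The manual ace check can be sketched as a cap on whatever level the point count suggests. This Python sketch assumes the limit is 7 minus the aces the opponents hold, the reading that keeps a hand that is off two cashable aces out of both slams; the function name is hypothetical, and it ignores kings, voids and everything else a real auction would weigh.

```python
def capped_level(predicted_tricks, opponents_aces):
    """Highest level to bid: the point-count level, capped at
    7 minus the number of aces the opponents hold (assumed reading)."""
    level = predicted_tricks - 6  # e.g. 12 tricks -> the 6 level
    return min(level, 7 - opponents_aces)


print(capped_level(13, 2))  # off two aces: stop at the 5 level -> 5
print(capped_level(12, 0))  # no aces missing: small slam stands -> 6
```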

As far as discounting singleton honors goes, let me paraphrase ZAR: which of the following hands would you rather have?

A) xxxxx xxxx K AQJ, or

B) Kxxxx AQJx x xxx

You would probably answer B, but there are plenty of hands your partner could hold where hand B would be worthless and hand A would be golden. For instance, which hand above would you like to have if your partner's hand was:

Partner: A void AQxxxx Kxxxxx

To Richard,

The "magic" here is that Tysen presents a lot of statistics without presenting the hands. He did publish a small subset of hands on one web page. While I am more than willing to believe the "ERROR" data, I find the "score" component, to be frank, totally unbelievable. This is based both on my own, albeit limited, research, looking manually at probably a few hundred hands (including, for instance, this year's Cavendish hands; I will need to post more of those), and on Zar's data published on his website. The difference between ZAR's data and Tysen's is that Zar's is publicly available: anyone can look at it, and anyone can confirm Zar's conclusions (well, anyone who can program computers, or who has a lot of time on their hands to wade through them manually).

If you stop and think about the GIB database for even a second and compare Goren to Zar, you will also see that ZAR is much better than Goren at IMPs. In the data Zar just posted, his method bids 36% more games and 415% more grand slams than Goren. The slam level was also much better, but I don't have the numbers in front of me.

But here is what ZAR means by "magic". His database of hands is publicly available, and his evaluation criteria are shown. I will admit that Tysen has published a fraction of his hands, and the data he did publish shows he is able to analyze the hands for points and tricks in the manner he says he does. But the "score" part is not there. I think this is what ZAR means by magic: no set of hands. No doubt Zar would be "happy" to run his metrics on Tysen's database if it were available. One knows that Tysen could run his on Zar's if he wanted to, because Zar's is available. But Tysen's latest evaluation method now approaches Zar's in many ways (he counts controls and HCP the same way, he counts both long and short suits, etc.), so there is less and less difference between them.

Ben

#29 mikestar

Posted 2004-June-05, 09:45

A suggested methodology for evaluating counting methods.

1. Use an agreed upon database--GIB seems like a good choice.

2. For each hand determine the number of tricks double dummy in the optimum denomination.

3. For each hand determine the point count for the hand and which contract would be bid based on the method's target counts.

4. The hands are divided into classes: partscore, game, small slam, grand slam. These should be subdivided into suit and NT classes for separate statistics.

5. Count a hand as a success for the method if it predicts the proper level, and as a failure if it does not. Thus a hand on which 12 tricks are taken must be predicted as a small slam; a prediction of partscore, game or grand slam is a failure. (This is Zar's method, I believe.)

6. Also translate the point count into a predicted number of tricks for the hand and compute the error--the difference between this prediction and the actual number of tricks for the hand. (This is Tysen's method, I'm quite certain).
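Steps 4 to 6 can be sketched in a few lines of Python. The board format `(predicted_tricks, dd_tricks, denomination)` is hypothetical, with the double-dummy result assumed to be computed already; the game thresholds (9 tricks in NT, 10 in a major, 11 in a minor) are standard bridge scoring, and taking the error as an absolute difference is an assumption.

```python
# Hypothetical board records: (predicted_tricks, dd_tricks, denomination),
# with denomination one of "NT", "major", "minor".
GAME_TRICKS = {"NT": 9, "major": 10, "minor": 11}


def level_class(tricks, denom):
    """Step 4: classify a trick count as partscore, game or slam."""
    if tricks >= 13:
        return "grand slam"
    if tricks == 12:
        return "small slam"
    if tricks >= GAME_TRICKS[denom]:
        return "game"
    return "partscore"


def evaluate(boards):
    """Return (level success rate, mean trick error) for one method."""
    hits = 0
    total_error = 0
    for predicted, actual, denom in boards:
        # Step 5: success only if the predicted level class is right.
        if level_class(predicted, denom) == level_class(actual, denom):
            hits += 1
        # Step 6: trick error between prediction and double-dummy result.
        total_error += abs(predicted - actual)
    return hits / len(boards), total_error / len(boards)


boards = [(12, 12, "major"), (10, 11, "major"), (13, 12, "NT"), (8, 9, "NT")]
print(evaluate(boards))  # (0.5, 0.75)
```

Calling `evaluate` separately on the suit boards and the NT boards gives the split statistics step 4 asks for.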


Now we have a comparison that can be given a fair degree of trust.

A question for Zar:

You display how many making grand slams (for example) are bid by each method, but where are the figures for each method where a grand slam would be bid but goes down? This is critically important data.

If (hypothetically; I don't have any reason to believe it is true or false) TSP bids fewer making grands than Zar but also stays out of more grands that go down, this would be quite important in evaluating the relative merits of the methods.
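The figures asked for here amount to a two-by-two table per slam level. A minimal Python sketch for grand slams, assuming each board is a hypothetical `(combined_points, dd_tricks)` pair and that a method bids a grand when its total exceeds its threshold (66 for Zar Points in the results posted earlier in the thread):

```python
from collections import Counter


def grand_slam_table(boards, threshold):
    """Count all four outcomes of the 'bid a grand?' decision."""
    tally = Counter()
    for points, tricks in boards:
        bid = points > threshold
        made = tricks == 13
        tally[(bid, made)] += 1
    return {
        "bid and made": tally[(True, True)],
        "bid and went down": tally[(True, False)],
        "made but not bid": tally[(False, True)],
        "correctly stayed out": tally[(False, False)],
    }


boards = [(70, 13), (68, 12), (60, 13), (55, 11)]
print(grand_slam_table(boards, 66))
```

Reporting only the "bid and made" count, as the earlier results do, hides the "bid and went down" count that tells you what those extra grands cost.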

It may well be that Zar is superior in staying out as you have asserted, but please show me the numbers.

#30 Zar

Posted 2004-June-05, 12:51

*** hrothgar wrote: “I very much admire your enthusiasm for your point count method and all of the effort that you are making to promote it.”
<

I am not promoting anything – I just reply to questions.

I have started 0 threads out of the 15 or so discussing different aspects of Zar Points here on the BBO forum. Neither have I started any thread on any of the other forums where Zar Points are discussed.

> Zar, the "magic" is nothing more than basic statistics.
<

So you are the one who is going to explain to us (since there are no other volunteers) the “statistics” by which Zar Points “score” almost 3 times worse: 0.08 vs. 0.21. MickyB started, and I thought we would finally have something, but ...

So go ahead – you have my undivided attention:

ZAR

#31 hrothgar

Posted 2004-June-05, 15:30

Zar, on Jun 5 2004, 09:51 PM, said:

So go ahead – you have my undivided attention:

ZAR

I will make the same suggestion that I have several other times.

Start with your database of hands.

Using Zar Points ONLY, making no manual adjustments for hands where you are missing two aces or what not, sort the hands into buckets based on the predicted number of tricks that should be taken.

Bucket 1 = hands where you predict that you take 13 tricks
Bucket 2 = hands where you predict that you will take 12 tricks
Bucket 3 = hands where you predict that you will take 11 tricks
...

Next, use the double dummy solver to determine the number of tricks that should actually be taken and provide summary statistics.

For each bucket, provide the mean and the standard error.
[If you prefer, provide the mean and the standard deviation]

Now, replicate this same procedure for each of the hand evaluation systems that you are measuring.

----------

If you prefer, you could invert this technique.

Sort the hands based on the number of tricks that the double dummy engine is able to take and then calculate the Zar point total for each hand.

Once again, report the mean and the standard error.
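Both bucketing procedures come down to grouping one number by another and reporting the mean and standard error per group. In this minimal Python sketch (names hypothetical), pass `(predicted_tricks, dd_tricks)` pairs for the first technique or `(dd_tricks, point_total)` pairs for the inverted one. The standard error is taken as the sample standard deviation divided by the square root of the bucket size.

```python
import math
from collections import defaultdict


def bucket_stats(pairs):
    """Group (key, value) pairs by key; return {key: (n, mean, std_error)}."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)

    stats = {}
    for key in sorted(buckets):
        values = buckets[key]
        n = len(values)
        mean = sum(values) / n
        if n > 1:
            # Sample variance (n - 1 denominator), then SE = sd / sqrt(n).
            var = sum((v - mean) ** 2 for v in values) / (n - 1)
            std_error = math.sqrt(var / n)
        else:
            std_error = float("nan")  # undefined for a single hand
        stats[key] = (n, mean, std_error)
    return stats


# Technique 1: bucket by predicted tricks, summarise double-dummy tricks.
pairs = [(12, 12), (12, 11), (12, 13), (13, 13)]
for tricks, (n, mean, se) in bucket_stats(pairs).items():
    print(tricks, n, round(mean, 2), se)
```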

-----------

There are arguments in favor of either method:

Tysen used the first method. If you duplicated it using your own database, you should be able to replicate his results. If not, then we know that there is some difference in implementation.

On the other hand, it's unclear whether people can produce tables that state:

If you hold a combined BUM-RAP count of X, then we expect you to take Y tricks. Indeed, technique 2 is a mechanism to produce just such a table.

#32 Zar

Posted 2004-June-05, 16:06

*** hrothgar wrote: "I will make the same suggestion that I have several other times. Start with your database of hands."
<

It is sound advice, but I am afraid you are answering a question that was not asked. The kind request was to explain the "STATISTICAL METHOD" that was used to support the claim, which basically says:

"Here is a method that is 3 times better than anything known to man"

showing that "indeed" it has a "score of 0.21" against a "score of 0.08", whatever that means.

So, can we have SOME KIND of explanation of the way this "achievement" was "scored"? That is the question that everyone in "the Statistical Camp" :-) tends to avoid.

Or is it enough for you that someone starts a thread saying "Here is a method which is STATISTICALLY 3 times better than anything known to man," and you jump in head-first just because you are a "statistical man" too? :-)

ZAR

#33 hrothgar

Posted 2004-June-05, 16:53

Zar, on Jun 6 2004, 01:06 AM, said:

So, can we have SOME KIND of explanation about the way this "achievement" was "scorred"? That was the question that anyone in "the Statistical Camp" :-) tends to avoid.

Or is it enough for you someone to start a thread sayin "Here is method which is STATISTICALLY 3 times better than anyone known to man" and you jump head-first just because you are a "statistical man" too? :-)

ZAR

Zar, I'm not responsible for posting that set of statistics.
I didn't do the analysis that produced that set of statistics.
I am not going to defend that set of statistics.

The issue that I am raising is one of methodology. The metrics that you are using to evaluate the Zar points aren't valid. As I have stated before, I don't care how "aggressive" a hand evaluation system is. However, I am very interested in how accurate Zar points are. In particular, I want to understand how accurate Zar points are in comparison to BUM-RAP, "Work" HCP, etc.

Tysen identified a statistically valid method to evaluate how accurate different bidding systems are and has reported his results. Furthermore, he has correctly identified some short-comings in the analytical techniques that you are using to evaluate different systems.

From my perspective, the most useful thing that you could do to promote your evaluation system would be to switch over and start using a more accurate set of metrics. Replicate Tysen's methods with your own database and see whether your results match his own.

Until you start using statistically valid techniques to measure relative performance you are wasting enormous amounts of time/effort.

#34 MickyB

Posted 2004-June-05, 17:09

Zar, on Jun 5 2004, 05:06 PM, said:

So, can we have SOME KIND of explanation about the way this "achievement" was "scorred"? That was the question that anyone in "the Statistical Camp" :-) tends to avoid.

Or is it enough for you someone to start a thread sayin "Here is method which is STATISTICALLY 3 times better than anyone known to man" and you jump head-first just because you are a "statistical man" too? :-)

ZAR

Zar,

Tysen's method was, for each evaluation method:

Compare the predicted number of tricks with the actual number of tricks on each hand. The difference between these two is the error.

Take the mean of all these errors.

This number worked out at 1.07 for HCP+321, 1.05 for Zar, and 1.02 for TSP. In other words, on average, Zar is 0.02 tricks more accurate than HCP+321, and TSP is 0.05 tricks more accurate. Hence the amount of improvement gained from switching from HCP+321 to TSP is 2.5 times as much as the improvement gained from switching from HCP+321 to Zar. The 0.08 and 0.21 are irrelevant really; they were calculated from the 0.02 and 0.05.
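The arithmetic described above can be reproduced directly. This Python sketch assumes ERROR is the mean absolute difference in tricks (the thread does not say whether absolute or squared differences are used) and plugs in the three averages quoted above.

```python
def mean_error(predicted, actual):
    """Mean absolute difference, in tricks, between prediction and result."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Average errors as quoted in the thread.
errors = {"HCP+321": 1.07, "Zar": 1.05, "TSP": 1.02}
gain_zar = errors["HCP+321"] - errors["Zar"]  # about 0.02 tricks per hand
gain_tsp = errors["HCP+321"] - errors["TSP"]  # about 0.05 tricks per hand
print(round(gain_tsp / gain_zar, 2))          # 2.5
```

The ratio compares improvements over HCP+321, not the raw error figures, which differ from each other by only a few percent.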

While just claiming that one method is better than another doesn't make it true, which are we more likely to believe - a claim based on sound methods or flawed methods? It is quite worrying that you do not consider yourself a "Statistical Man", as creating and comparing evaluation systems is totally based on Statistics!

#35 hrothgar

Posted 2004-June-05, 17:10

Zar, on Jun 6 2004, 01:06 AM, said:

showing that "indeed" its has a "score of 0.21" against a "score of 0.08" whatever that means.

So, can we have SOME KIND of explanation about the way this "achievement" was "scorred"? That was the question that anyone in "the Statistical Camp" :-) tends to avoid.

Or is it enough for you someone to start a thread sayin "Here is method which is STATISTICALLY 3 times better than anyone known to man" and you jump head-first just because you are a "statistical man" too? :-)

ZAR

As I noted earlier, I have no way to evaluate whether or not the statistics that tysen produced are accurate. With this said and done, I found it relatively easy to read Tysen and understand what "Score" measures.

Read Tysen's original post and note the following quote:

>SCORE is an estimation of the IMPs/board we expect to gain against a
>team that uses a simple HCP+321 evaluation method. It’s a measure
>of how much payoff there is for using a better evaluation system.

Please note that I have never talked to Tysen about any of this, so I might get this wrong; however, I suspect that Tysen did something like the following:

Take a set of X hands.

Use the total HCPs to assign an appropriate contract.
Use a double dummy engine to calculate the number of tricks that can be taken.
Score the hand.

Next, perform the same analysis using a second metric.
Once again, use this metric to assign a contract. Compare this contract to the number of tricks taken by the double dummy engine and score the hand.

NOW, compare the two scores and calculate the number of IMPs won/lost.

Repeat for X hands and then calculate the average.
The "Score" metric is the expected gain/loss per board.

I'll note in passing that HCP scores a 0.0 against HCP, which is exactly what this methodology would require.
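The guessed procedure can be sketched in Python, assuming each board has already been scored for both contracts (scoring a contract from the double-dummy result is left out). The IMP conversion follows the standard IMP scale; the function names are hypothetical, and this is not presented as Tysen's actual code.

```python
from bisect import bisect_left

# Upper bounds of the standard IMP scale: 0-10 -> 0 IMPs, 20-40 -> 1, ...,
# 3500-3990 -> 23; a difference of 4000 or more is 24 IMPs.
IMP_BOUNDS = [10, 40, 80, 120, 160, 210, 260, 310, 360, 420, 490, 590,
              740, 890, 1090, 1290, 1490, 1740, 1990, 2240, 2490, 2990,
              3490, 3990]


def to_imps(diff):
    """Convert a signed point difference to signed IMPs."""
    imps = bisect_left(IMP_BOUNDS, abs(diff))
    return imps if diff >= 0 else -imps


def score_per_board(method_scores, baseline_scores):
    """Average IMPs gained per board by the method over the baseline."""
    gains = [to_imps(m - b) for m, b in zip(method_scores, baseline_scores)]
    return sum(gains) / len(gains)


# Hypothetical results on three boards, in total points for our side.
ours = [620, -50, 1430]
theirs = [170, 140, 680]
print(score_per_board(ours, theirs))  # (10 - 5 + 13) / 3 = 6.0
print(score_per_board(ours, ours))    # a method against itself: 0.0
```

The second call shows why a method compared against itself must come out at exactly 0.0.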

#36 Zar

Posted 2004-June-05, 19:49

*** mikeb wrote: “It is quite worrying that you do not consider yourself a "Statistical Man", as creating and comparing evaluation systems is totally based on Statistics!”

Thanx for the lesson :-) People learn every day :-)

*** hrothgar wrote: “As I noted earlier, I have no way to evaluate whether or not the statistics that tysen produced are accurate.”
<

You are not alone here, that’s the point.

NOBODY knows anything, yet “that’s the thing!” ... It’s “statistics” we are talking about here, not blah-blah-blah ... Real science ... Don’t you dare to think – it’s whatever I say :-) I say it’s 0.21 vs. 0.08 – almost three times better, period. No more discussions :-)

And what is really amazing, nobody cares – the important thing is the claim.


BTW, I just finished the statistical analysis: it showed that Goren has 0.23, so we are back to square one. “4-3-2-1, let’s play bridge for fun” :-)

ZAR

#37 hrothgar

Posted 2004-June-05, 20:23

Zar, on Jun 6 2004, 04:49 AM, said:

*** hrothgar wrote: “As I noted earlier, I have no way to evaluate whether or not the statistics that tysen produced are accurate.”
<

You are not alone here, that’s the point.

BTW, I just finished the statistical analysis – it showed that Goren has 0.23- so we are back in square 1. “4-3-2-1, let’s play bridge for fun” :-)

ZAR

What do you mean by "Goren has .23" ???
Are you talking about the Error term, the score or what?

Regarding the accuracy of Tysen's statistics:
Unless people demonstrate otherwise, I tend to trust them. In this case, I trust that Tysen calculated the statistics properly.

If I had doubts regarding the accuracy of his statistics, I would perform the same set of calculations using my own data and seek to verify his numbers.

If I were unable to reconcile his figures, I would then attempt to clarify methodology.

I don't understand why this notion is so complicated.

#38 inquiry

Posted 2004-June-07, 09:39

hrothgar, on Jun 5 2004, 09:23 PM, said:

What do you mean by "Goren has .23"?
Are you talking about the error term, the score or what?

Regarding the accuracy of Tysen's statistics:
Unless people demonstrate otherwise, I tend to trust them. In this case, I trust that Tysen calculated the statistics properly.

If I had doubts regarding the accuracy of his statistics, I would perform the same set of calculations using my own data and seek to verify his numbers.

If I were unable to reconcile his figures, I would then attempt to clarify the methodology.

I don't understand why this notion is so complicated.

He means he tested it and Goren came out 0.23 IMPs per board better than the other systems. He used statistics to prove it, and now he wants to throw out ZAR points and other systems as inaccurate.

You accepted Tysen's 0.28, etc., so why are you now questioning Zar's 0.23? Do you find Tysen's 0.00 for Goren and Zar's 0.23 at odds? Maybe one of them is wrong? Maybe both? Why do you accept either when you are a programmer and could test this yourself in a short period of time?

I don't accept either of these. Clearly Zar is just making a point, so the Goren 0.23 is a joke. Tysen is more serious, and I have no doubt he thinks his evaluation is correct. Experience, however, clearly shows me that ZAR points are much better than Goren. I have looked at a lot of hands, and this is easy to confirm, so I seriously doubt the small difference Tysen shows between them.

Second, I have begun evaluating Tysen's TSP points, and I find them very similar to ZAR points in many ways. The differences are rather mild, but they are there. Still, I find it equally unlikely that there will be as large a difference between ZAR and TSP as shown. Furthermore, I think that if there is a difference, ZAR will score better (but here I have only a few hands to compare, as I do this the old-fashioned way, which will be unacceptable to all sides).

Richard, you are a computer programmer and a bridge player. You have the expertise to test this stuff yourself. Why not give it a go and report back?

Ben
--Ben--

#39 tysen2k

  • Group: Full Members
  • Posts: 406
  • Joined: 2004-March-25

Posted 2004-June-07, 11:33

A flurry of activity over the weekend that I wasn't able to participate in. :blink:

Let me just highlight and comment on a few things on the last few posts.

Zar keeps pointing out "0.24 vs. 0.08" and saying that I'm claiming TSP is 3x better than Zar. I've never said such a thing in my life. As I have explained many times before, this was the predicted number of IMPs of improvement over HCP+321. All I'm saying is that TSP scores 0.16 IMPs per board better than Zar when each is compared to a team playing HCP+321. This is a far cry from a 3x better system. These evaluators are very, very similar. Since the two methods bid the same thing over 90% of the time, who could claim such a vast difference? This is one reason I am suspicious of Zar's tests: they produce such different results from practically identical evaluators.
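A quick arithmetic check of the point above, as a minimal sketch: both figures (0.24 and 0.08 IMPs per board, as quoted in this post) are gains over the same HCP+321 baseline, so subtracting them cancels the baseline, while their ratio depends on which baseline happened to be chosen. The function name and the 0.10 shift are hypothetical, and the shifted case assumes gains simply add, which IMP scoring only approximates.

```python
def head_to_head_gap(gain_a: float, gain_b: float) -> float:
    """Difference between two gains measured against the same baseline.

    The baseline's own score cancels out when subtracting.
    """
    return gain_a - gain_b


tsp_gain = 0.24  # TSP vs. HCP+321, IMPs per board (figure from the post)
zar_gain = 0.08  # Zar vs. HCP+321, IMPs per board (figure from the post)

gap = head_to_head_gap(tsp_gain, zar_gain)
ratio = tsp_gain / zar_gain
print(f"TSP over Zar: {gap:.2f} IMPs/board")  # TSP over Zar: 0.16 IMPs/board
print(f"ratio of gains: {ratio:.1f}")          # ratio of gains: 3.0

# Hypothetical: score both against a baseline 0.10 IMPs/board weaker,
# assuming the gains simply add.
shift = 0.10
print(round(head_to_head_gap(tsp_gain + shift, zar_gain + shift), 2))  # 0.16 (unchanged)
print(round((tsp_gain + shift) / (zar_gain + shift), 2))               # 1.89 (ratio moves)
```

The gap stays at 0.16 whichever baseline is used, while the "3x" ratio shrinks as soon as the baseline changes, which is why the ratio says little about how much better one evaluator is.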

And again, I'm echoing the fact (as others are pointing out too) that Zar's tests are really just picking up aggressiveness, not accuracy. I bet this is why Zar sees such a difference between our evaluators. I've said many times that if my system said to bid a grand on 0+ points, I'd score perfectly on Zar's tests. Zar has never replied to this. The point is that I could be wrong about the number of TSP points needed to bid a small slam or a grand. One of the strengths of my tests is that they look only at the accuracy of the system, not at the accuracy of the "steps."
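To illustrate why a test that only rewards reaching makeable slams measures aggressiveness, here is a toy model. The deals, the threshold of 32 and both scoring functions are hypothetical illustrations, not Zar's or Tysen's actual test code. Each deal is (combined points, tricks available); the "reach-only" score counts makeable slams that were bid and ignores failed ones, while the "accuracy" score also counts slams bid that go down.

```python
from typing import Callable, List, Tuple

# Hypothetical toy deals: (combined evaluator points, tricks the partnership can take).
Deal = Tuple[int, int]
DEALS: List[Deal] = [(26, 10), (28, 11), (30, 11), (31, 12), (33, 12), (34, 11), (36, 13)]


def bids_small_slam(threshold: int) -> Callable[[int], bool]:
    """Bidding rule: bid slam whenever combined points reach `threshold`."""
    return lambda points: points >= threshold


def reach_only_score(rule: Callable[[int], bool], deals: List[Deal]) -> float:
    """Share of makeable slams that were bid; failed slams cost nothing."""
    makeable = [pts for pts, tricks in deals if tricks >= 12]
    return sum(rule(pts) for pts in makeable) / len(makeable)


def accuracy_score(rule: Callable[[int], bool], deals: List[Deal]) -> float:
    """Share of deals where the slam decision was right (bid exactly when 12+ tricks make)."""
    correct = sum(rule(pts) == (tricks >= 12) for pts, tricks in deals)
    return correct / len(deals)


always_bid = bids_small_slam(0)   # "bid slam on 0+ points"
sensible = bids_small_slam(32)    # hypothetical threshold

print(round(reach_only_score(always_bid, DEALS), 2), round(reach_only_score(sensible, DEALS), 2))  # 1.0 0.67
print(round(accuracy_score(always_bid, DEALS), 2), round(accuracy_score(sensible, DEALS), 2))      # 0.43 0.71
```

Under the reach-only score the bid-everything rule looks perfect; once failed slams count against you, the threshold rule wins. Note that the accuracy score here still depends on which threshold is picked, which is the "steps" issue mentioned above.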

About the fact that TSP doesn't add as much for a fit: TSP was designed to require as little "post-adjustment" as possible. Since the bidding sometimes won't tell you everything about partner's hand, it's an attempt to adjust before the bidding starts. I try to be more accurate initially so that you won't have to change as much later.

As those who have read my rgb posts know, my main interest is not really in finding the perfect evaluator but in studying how valuation changes during the bidding. How does our evaluation change when partner opens at the one level? How does it change again when RHO overcalls at the two level? These questions are actually very complicated and not easy to put into rules. Let me give you an example:

In my original TSP article at the top of this thread, I hinted that adding 2 points for each trump over 8 was very simplified, since the real answer is complicated. I've been finding in my studies that the values of honors change a lot depending on how distributional partner (and the opponents) are. For example, if partner shows a 5+ card suit, he is much more likely to be unbalanced than an "unknown" hand is. Our shape becomes more important and our high cards lose importance. Everyone "knows" this, but we don't really have a quantitative feel for how much of an adjustment to make. If I wanted a more accurate evaluator after partner opens 1♠, I would actually subtract 1/3 of all TSP points outside of spades and then add a constant of 4 points. Weak hands become stronger and strong hands weaker: the high cards outside of trumps lose value. However, if I'm going to do this, I'll likely have to lower the requirements for my slams by a few points, since it's going to be harder to have two strong hands together. I could do this now at the table (thirds are easy to round off), but there's more.

The amount by which our high cards change depends on how distributional the other 3 hands are. If partner has a balanced hand, our high cards are now worth more, not less. Say partner is balanced with 4 spades: our valuation with 5 spades is going to be different than if partner were unbalanced with 4 spades. So the value of the extra trump depends not only on our shape but on partner's shape as well (and the opponents' too!). No system takes this into consideration yet. I'm working on it. So you can see that the 2 points for an extra trump is just a placeholder for now.
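The two rules of thumb in the paragraph above can be written out as a minimal sketch. The fit rule (2 points per combined trump beyond 8) matches the `2*max(0, ...-8)` term in the test code Zar posted earlier in this thread. The post-1♠ adjustment assumes the hand's TSP total has already been split into a spade part and an outside part; how shortness and length points are assigned to suits is not specified in the post. Function names are hypothetical, and neither function models the partner-shape effects described above.

```python
from fractions import Fraction


def fit_points(our_trumps: int, partner_trumps: int) -> int:
    """Simplified TSP fit rule: 2 points for each combined trump beyond 8."""
    return 2 * max(0, our_trumps + partner_trumps - 8)


def adjusted_tsp_after_1s(tsp_in_spades: int, tsp_outside_spades: int) -> Fraction:
    """Rule of thumb after partner opens 1 spade: keep the spade points,
    subtract 1/3 of the TSP points outside spades, then add a constant 4.
    Fraction keeps the thirds exact until you choose to round."""
    return tsp_in_spades + tsp_outside_spades - Fraction(tsp_outside_spades, 3) + 4


print(fit_points(5, 4))                    # 2  (9 combined trumps)
print(adjusted_tsp_after_1s(3, 6))         # 11 (was 9: weak hand gains)
print(adjusted_tsp_after_1s(3, 18))        # 19 (was 21: strong hand loses)
print(adjusted_tsp_after_1s(3, 12))        # 15 (unchanged at 12 outside points)
print(round(adjusted_tsp_after_1s(3, 7)))  # 12 (35/3 rounded off)
```

The break-even point falls at 12 TSP points outside spades: below that the adjustment adds value, above it the adjustment takes value away, which is the "weak hands become stronger and strong hands weaker" effect.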

Tysen

#40 inquiry


Posted 2004-June-07, 11:54

If you make it worth 0 points with no shortness, 1 point with a doubleton, 2 points with a singleton and 3 points with a void, TSP will almost (not quite, but almost) be Zar points. Maybe you will discover this relationship in a few days.
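Ben's suggestion amounts to swapping the TSP shortness schedule. The sketch below is based on mikestar's description of TSP earlier in this thread (1-3-5 for doubleton/singleton/void, plus 1 point for each card over four in any suit) and compares that distribution count with the 0-1-2-3 schedule proposed here for a few shapes. It deliberately leaves out high cards, the 2+ honors bonus and fit points, and it does not compute Zar points; the names are hypothetical.

```python
from typing import Dict, Sequence

# Points for a doubleton, singleton and void, keyed by suit length.
TSP_SHORTNESS: Dict[int, int] = {2: 1, 1: 3, 0: 5}       # 1-3-5, as described for TSP in this thread
PROPOSED_SHORTNESS: Dict[int, int] = {2: 1, 1: 2, 0: 3}  # the 0-1-2-3 schedule suggested above


def distribution_points(shape: Sequence[int], shortness: Dict[int, int]) -> int:
    """Shortness points per short suit plus 1 point per card beyond four in any suit."""
    if len(shape) != 4 or sum(shape) != 13:
        raise ValueError("shape must list four suit lengths totalling 13")
    short = sum(shortness.get(length, 0) for length in shape)
    long_ = sum(max(0, length - 4) for length in shape)
    return short + long_


for shape in [(4, 3, 3, 3), (5, 4, 2, 2), (5, 5, 2, 1), (5, 5, 3, 0), (6, 5, 1, 1)]:
    tsp = distribution_points(shape, TSP_SHORTNESS)
    proposed = distribution_points(shape, PROPOSED_SHORTNESS)
    print(shape, tsp, proposed)
```

For 5-5-3-0 the first schedule gives 7, matching mikestar's count for TSP, while the proposed schedule gives 5. The two agree on hands whose only shortness is doubletons and diverge more as singletons and voids appear.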

Zar, I found two of the critical flaws, but for the life of me I can't find the other one. Help me out. A private message is OK if you want to leave it as a puzzle for everyone else... hehehehe

Ben
