Evaluating ZAR points: a simulation
#1
Posted 2005-February-19, 13:23
I'm just programming a simulation to test ZAR points. I'm using a double-dummy solver to check the results. If you think I should modify something, speak up now.
Here is what I'll do:
Dealer is always N. If the North hand is a ZAR opening and South holds a response, I'll analyse the hand; otherwise it will be counted as a dropout.
To analyse it, I look for the longest fit available. If it is 8+ cards, the hands will be reevaluated using the fit information. The sum of both players' points is used to find the proper level. Then the double-dummy solver will try to make the contract. If the contract is made, I increment the good results; otherwise I increment the bad ones.
Sections are levels 2,3,4 and 5.
There is an extra section for 6/7, which is satisfied when 11 tricks can be made. 11 is enough, because I expect any decent partnership to check for missing keycards before bidding a slam. To do this, the 5 level must be safe.
If the best fit has only 7 cards, the contract will be:
2 some suit, if the level is 2.
NT if the level is 3+. Since 3NT needs the same points as game in a major, I map (n NT) to the (n+1) level. So if the level says 3, the contract will be 2NT.
Every contract is played by south!
(1) Some strange settings are due to the fact that I don't want to write a bidding engine. People have invested much more brain and time in that than I'm willing to put into this project. And I'm not convinced by most of the results.
(2) If I implemented a bidding engine, the results would depend on the given bidding system.
(3) EW cards or possible bids are not taken into account.
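The contract-selection rules described above could be sketched roughly like this. It is only an illustration under stated assumptions: the names `choose_contract` and `tricks_needed` are made up here, the post does not say what happens with a 7-card fit at level 1 (the sketch simply keeps the suit), and the fit re-evaluation and the double-dummy solver are left out.

```python
def choose_contract(level, fit_length, fit_suit):
    """Return (contract_level, strain) for a level derived from the point sum."""
    if fit_length >= 8:
        # Real fit: play the fit suit at the indicated level.
        return level, fit_suit
    if level >= 3:
        # Only a 7-card fit: play NT one level lower (level 3 -> 2NT).
        return level - 1, "NT"
    # Level 2 with a 7-card fit: "2 some suit".
    # Level 1 is not covered by the post; keeping the suit is an assumption.
    return level, fit_suit


def tricks_needed(contract_level):
    """Tricks required to count as 'good'; the 6/7 section is satisfied by 11."""
    return min(contract_level + 6, 11)


print(choose_contract(3, 7, "H"))   # misfit at level 3
print(choose_contract(4, 9, "S"))   # 9-card spade fit at level 4
print(tricks_needed(6))             # slam section
```

Every contract would then be handed to the solver with South as declarer, and the result compared against `tricks_needed`.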
#2
Posted 2005-February-19, 14:10
If you're willing to go to this much trouble, then I strongly suggest that you look at some of the earlier posts in which Tysen and I suggested various methodologies for testing the accuracy of different hand evaluation metrics. This is a complex subject, and if you design an inappropriate test then there is a very real possibility that you'll waste a lot of time...
As I noted in the past, I'd recommend an approach like the following:
1. Generate 1000 hands using any one of a variety of Dealer programs
2. Define a set of 13 buckets. Each bucket defines the maximum number of tricks that can be taken on a double dummy basis.
3. Sort the hands into buckets
4. For each hand in a given bucket, Let X = the sum of the Zar points for Declarer and Dummy
5. Calculate the Mean and Standard Deviation of X for each bucket
The relative accuracy of different metrics can be determined from these two statistics, so a "complete" analysis would need to compare Zar Points to an alternative schema like Bum Rap.
If you prefer, you could invert this entire procedure. Your initial buckets would measure the combined Zar Points of the two hands. You could then calculate the average number of tricks taken for two hands with X combined Zar points.
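The bucketing steps above might look something like the following sketch. The input format (a list of `(dd_tricks, combined_points)` pairs), the function name `bucket_stats` and the sample numbers are all assumptions for illustration; real input would come from a dealer program plus a double-dummy solver.

```python
from collections import defaultdict
from statistics import mean, pstdev


def bucket_stats(results):
    """Group combined points by double-dummy tricks; return mean and SD per bucket."""
    buckets = defaultdict(list)
    for tricks, points in results:
        buckets[tricks].append(points)
    return {t: (mean(p), pstdev(p)) for t, p in sorted(buckets.items())}


# Made-up sample data, only to show the shape of the output.
sample = [(9, 48), (10, 52), (10, 54), (11, 58)]
for tricks, (avg, sd) in bucket_stats(sample).items():
    print(f"{tricks} tricks: mean={avg:.1f} sd={sd:.2f}")
```

For the inverted procedure, swapping the two fields of each pair gives the average number of tricks per point total instead.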
#3 Guest_Jlall_*
Posted 2005-February-19, 16:03
#4
Posted 2005-February-19, 17:17
#5
Posted 2005-February-19, 19:23
I have made 2 test runs so far. One counting K, KQ, Qx, Jxx, QJ with their full load.
There were more bads than goods, as expected. In the second run I put all those to 0 (except KQ = 3+1).
This is the result of the second run.
Dropped: 207
Level: 1 Good: 1 Bad: 0
Level: 2 Good: 13 Bad: 3
Level: 3 Good: 22 Bad: 10
Level: 4 Good: 20 Bad: 3
Level: 5 Good: 17 Bad: 3
Level: 6 Good: 6 Bad: 0
Level: 7 Good: 2 Bad: 0
There are problems with non-fit hands: most of the bad 3-level contracts are misfit NTs. Although I treat the misfit NTs as one level lower, they still go down.
Up to now I only use HCP + ControlPoints + 2*longest suit + 2nd longest suit - shortest suit.
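For reference, that basic count could be written as the sketch below. The hand representation (a dict of suit to rank string) and the name `zar_points` are assumptions for illustration; the values A=4, K=3, Q=2, J=1 for HCP and A=2, K=1 for control points are the usual ones, and none of the honor downgrades or fit adjustments discussed in this thread are included.

```python
HCP = {"A": 4, "K": 3, "Q": 2, "J": 1}
CONTROLS = {"A": 2, "K": 1}


def zar_points(hand):
    """hand: dict mapping suit -> string of ranks, e.g. {"S": "AKJ52", ...}."""
    cards = "".join(hand.values())
    hcp = sum(HCP.get(c, 0) for c in cards)
    controls = sum(CONTROLS.get(c, 0) for c in cards)
    lengths = sorted((len(ranks) for ranks in hand.values()), reverse=True)
    # 2 * longest + second longest - shortest
    distribution = 2 * lengths[0] + lengths[1] - lengths[3]
    return hcp + controls + distribution


# Illustrative hand: 17 HCP, 6 controls, 5-3-3-2 shape.
example = {"S": "AKJ52", "H": "KQ3", "D": "74", "C": "A86"}
print(zar_points(example))
```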
I'm going to implement the following extras:
+1 if 15+ HCP are concentrated in 3 suits, or +1 if 12+ HCP are in 2 suits
KQ, QJ, K, Q, J each -1 for unsafe honors
For the fit reevaluation I intend to implement:
+1 for each trump honor (incl. T) with a maximum of 2 (both sides ?)
I'll look for the combined shortest suit and downgrade honors by one
Since I have no bidding taking place, I'm still thinking about the second suit.
So I'm not sure if and how I will implement the extra points for the second suit.
Additionally, I don't know how many trumps were "promised", because I counted the combined length, and I must decide when to add the 3 HC for additional trump length.
Since it's middle of the night here, I'll take a break now.
#6
Posted 2005-February-20, 06:54
Jlall, on Feb 19 2005, 10:03 PM, said:
Maybe so, but it can analyse a thousand boards much faster than I could.
You will usually not play that well, but on the other hand you won't face perfect defence either.
#7
Posted 2005-February-21, 05:30
Here's my list:
-4 K
-3 QJ
-2 AQ, AJ, KJ, Q, Qx
-1 A, AKJ, KQJ, Jx, Jxx
Upgrades:
11-14 HCP with more than 11 in 2 suits +1
15+ HCP with more than 15 in 3 suits +1
So I think I have the pre-bidding evaluation done.
Anyone interested can get CSV files, readable with Excel or Open Office, containing a list of deals, the Zar points for each hand, the selected fit and the number of tricks the double dummy solver made.
#8
Posted 2005-February-21, 05:53
Is it: "If I add the Zar points of the hands and select a contract, the contract will make" ?
I'm very interested in these results, if you could send me the files I'd be very grateful.
Email: gerben AT t-online DOT de
To save time you might want to use the deals from the GIB Double Dummy library (see the GIB research page).
Gerben
#9
Posted 2005-February-21, 07:56
Gerben47, on Feb 21 2005, 11:53 AM, said:
Is it: "If I add the Zar points of the hands and select a contract, the contract will make" ?
This is one of the questions, the others are:
How good is the prediction between 3/4M?
As we know, vul @ imps you start gaining if your game/down ratio is better than 38%.
How good is the prediction for 3m? If it is accurate, 5m may be a good defence.
Weak Zar openings need controls to open, are they worth 2 defence tricks?
If i find time again, i'll try with other evaluation methods, too.
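The 38% figure can be checked with a little arithmetic. The sketch below assumes the simplest comparison (vulnerable, bidding 4M versus stopping in 3M, with either 10 or 9 tricks available, no doubles) and uses the standard IMP scale; the function names `imps` and `break_even` are made up for this example.

```python
from bisect import bisect_right

# Lower bound of each IMP step on the standard scale (20-40 = 1 IMP, ...).
IMP_THRESHOLDS = [20, 50, 90, 130, 170, 220, 270, 320, 370, 430, 500, 600,
                  750, 900, 1100, 1300, 1500, 1750, 2000, 2250, 2500, 3000,
                  3500, 4000]


def imps(diff):
    """Convert a score difference to IMPs."""
    return bisect_right(IMP_THRESHOLDS, abs(diff))


def break_even(gain_imps, loss_imps):
    """Success rate at which bidding game neither gains nor loses on average."""
    return loss_imps / (gain_imps + loss_imps)


gain = imps(620 - 170)      # 4M making vs 3M with an overtrick
loss = imps(140 - (-100))   # 3M making vs 4M one down
print(gain, loss, break_even(gain, loss))
```

With a gain of 10 IMPs and a loss of 6, the break-even rate is 6/16 = 37.5%, which matches the roughly 38% quoted above.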
#10
Posted 2005-February-21, 07:56
Jlall, on Feb 20 2005, 01:03 AM, said:
I'd be very interested to know what this assertion is based on?
"Everyone" knows that double dummy analyzers do not provide a perfect approximation of single dummy play, let alone the behaviour of "fallible" wetware systems like the human brain.
With this said and done, double dummy solvers are orders of magnitude faster than alternative approaches, and there is an awful lot to be said for substituting brute force and massive numbers of repetitions for elegance. As an analogy, consider the way that high end pharmaceutical scales are now developed. The circuits built into high end scales are actually quite inaccurate. The scales themselves achieve their accuracy by weighing a sample tens of thousands of times and then averaging the results. Since the "noise" is randomly distributed, it will cancel itself out.
From my perspective, a similar approach is more than appropriate in measuring the accuracy of hand evaluation systems.
It should be noted that there can be problems with this approach. Most notably, if the double dummy analyzer introduces systemic bias, there could be problems. For example, assume that the double dummy analyzer was biased in favor of declarer and that this bias was a function of the algorithm being evaluated... In this case it would be extremely difficult to differentiate the two error sources.
To date, I've never seen a good analysis that suggests that double dummy analyzers introduce systemic bias. I'd be interested in seeing anything to the contrary.
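A toy version of the scale analogy, just to illustrate the point: random noise averages out over many repetitions, while a systematic bias does not. Everything here (the function name `averaged_reading`, the Gaussian noise model, the numbers) is an assumption made for the illustration and says nothing about any real solver.

```python
import random
from statistics import mean


def averaged_reading(true_value, noise_sd, n, bias=0.0, seed=0):
    """Average n noisy readings of true_value, optionally with a constant bias."""
    rng = random.Random(seed)
    readings = [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]
    return mean(readings)


# Noisy but unbiased: the average lands close to the true value.
print(averaged_reading(10.0, 0.5, 10_000))
# Same noise plus a constant bias: the average stays off by about the bias.
print(averaged_reading(10.0, 0.5, 10_000, bias=0.3))
```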
#11
Posted 2005-February-21, 08:47
Dropped: 492 = no opening at N or S
Misfit: 224 = no 8+ Fit (might be source of bad results)
Level: 1 Good: 73 Bad: 65
Level: 2 Good: 153 Bad: 125 42-46 ZAR
Level: 3 Good: 283 Bad: 168 47-51 ZAR
Level: 4 Good: 247 Bad: 130 52-56 ZAR
Level: 5 Good: 134 Bad: 52 57-61 ZAR
Level: 6 Good: 40 Bad: 28 62-66 ZAR
Level: 7 Good: 5 Bad: 5 67+ ZAR
NT contracts are shifted one level e.g.: 3NT = 52-56.
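The point ranges in the table map to levels in steps of five, which could be coded like this. It is a sketch only: the name `level_for` is made up, and treating everything below 42 as level 1 is an assumption, since the table gives no range for that row.

```python
def level_for(zar_sum, notrump=False):
    """Map a combined Zar total to a contract level using the ranges above."""
    if zar_sum < 42:
        level = 1                      # assumption: no range given for level 1
    else:
        level = min(2 + (zar_sum - 42) // 5, 7)
    if notrump:
        level = max(level - 1, 1)      # NT shifted one level: 52-56 -> 3NT
    return level


print(level_for(53))                   # suit contract
print(level_for(53, notrump=True))     # NT contract
```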
#12
Posted 2005-February-21, 08:52
hrothgar, on Feb 21 2005, 01:56 PM, said:
I think I have seen a statistical analysis of World Championship hands that showed that declarers there would on average get more tricks than they should on a double dummy basis. The deviation was something like a third or half a trick.
Sounds plausible to me, given how many tricks are lost on the opening lead alone.
Arend
#13
Posted 2005-February-21, 10:50
My guess is that this pro-declarer bias at higher levels helps bring DD results closer to table results. DD may be a bit unfair to Zarpoints at the partscore level--when the strength is fairly equally divided, DD info will be useful to both sides and that will be a gain for the defense vs. table results.
By the way, Zar points could be quite useful for suit contracts while being worthless for NT (compare the LTC) so the NT results will be of limited utility.
#14
Posted 2005-February-21, 10:51
cherdano, on Feb 21 2005, 05:52 PM, said:
hrothgar, on Feb 21 2005, 01:56 PM, said:
I think I have seen a statistical analysis of World Championship hands that showed that declarers there would on average get more tricks than they should on a double dummy basis. The deviation was something like a third or half a trick.
Sounds plausible to me, given how many tricks are lost on the opening lead alone.
Arend
Thanks for the data point. One additional "quick" comment:
It's still unclear to what extent any such bias would impact the analysis in question.
Assume for the moment that Single Dummy play is .3456 tricks "better" than double dummy play. Furthermore, assume that this bias is the same regardless of the relative strength of the hands in question.
In this case, the bias would adjust the mean number of tricks taken but would NOT affect the relative variance. And, since the accuracy of the hand evaluation technique depends on the variance, this really doesn't affect the methodology...
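A quick numerical check of that argument, using made-up trick counts: adding the same constant to every result moves the mean by that constant and leaves the variance unchanged.

```python
from statistics import mean, pvariance

# Made-up double-dummy trick counts for hands in one point range.
dd_tricks = [8, 9, 9, 10, 10, 10, 11, 12]
shift = 0.3456  # the assumed constant single-dummy advantage from the post

table_tricks = [t + shift for t in dd_tricks]

print(mean(dd_tricks), mean(table_tricks))            # means differ by the shift
print(pvariance(dd_tricks), pvariance(table_tricks))  # variances agree (up to rounding)
```

If the bias instead varied with hand strength, the variances would no longer match, which is the case the previous post warns about.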
#15
Posted 2005-February-22, 13:37
Also about the accuracy of DD data compared to real world declarers. Peter Cheung did an extensive study of 383,000 okbridge hands (25 million plays) and found that on average there is only 0.1 tricks difference. A DD declarer has the advantage in slam contracts, but the DD defenders have the advantage at partscores. Around game, DD is very accurate.
Tysen
#16
Posted 2005-February-22, 15:34
those are interesting links.
But there is something about wheels: some have spokes, some have rims, and if they don't match in form or size, you need to get your own.
hotShot