BBO Discussion Forums: Rating Players - BBO Discussion Forums

Rating Players Basic theory

#81 User is offline   dwar0123 

  • PipPipPipPipPip
  • Group: Full Members
  • Posts: 749
  • Joined: 2011-September-23
  • Gender:Male
  • Location:Bellevue, WA

Posted 2012-May-09, 21:39

View Postjdgalt, on 2012-May-09, 19:49, said:

I like the idea of a rating system, but I see several problems that it would need to overcome. Just offhand:

(1) Suppose two very unequal partners pair up. Assume ratings something like ACBL masterpoints: Alice with 1200 and Bob with 50 partner up against Charlie and Doug with 500 each. If Alice and Bob win, does it mean that only Bob gains rating points since Alice was better than their opponents? Or do we count them both as their average of 625 points, so that neither gains anything?

(2) How to deal with a pair that used to be much better than they are now. I like the way WBF masterpoints decay over time; something like that might be called for.

(3) How to deal with players who avoid joining the league so that their points aren't counted. I have run up against some very good players in this category at clubs. In the US you could do the same thing by joining one of ACBL/ABA and playing at the other.

(4) For that matter, a person could have multiple 'nyms on BBO. I'm sure this is against the rules but I'm not at all sure it can be caught. Even if there are 5 BBO login names on the same PC, maybe they're a family or housemates.

My feeling is that rating systems are a good idea for clubs (though (1) through (3), at least, need to be dealt with) but online games should not award points of any kind, unless they're the online forum's own private points, because it's impossible to police adequately. Only noticing cheaters if they consistently get "too good" results will catch only the stupidly greedy.

1. Many rating systems are used across the different games I play, and this problem is essentially a solved one. There are multiple solutions, but the basic idea is very different from the point system you see with the ACBL. Unlike the ACBL, where the rating only goes up over time, each player has a rating that goes up when they do better than expected and down when they do worse than expected. The expectation for a pair comes from comparing the average of the two partners' ratings with the average of their opponents' ratings. If you want more details on this, you can google Lehman rating for a bridge version.
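To make the "better or worse than expected" idea concrete, here is a minimal sketch of a generic Elo-style update for pairs. It is not the Lehman rating formula; the logistic curve, the 400-point scale and the K value of 16 are assumptions borrowed from chess-style Elo, and the function names are made up for illustration. The numbers from the question are reused as if they were ratings, which masterpoints are not.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Elo-style logistic expectation (0..1) for side A against side B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_pair(pair, opponents, score, k=16.0):
    """Return new ratings for both partners in `pair`.

    score is 1.0 for a win, 0.5 for a tie, 0.0 for a loss.
    The expectation is computed from the two pair averages, and both
    partners receive the same adjustment (an assumption of this sketch).
    """
    pair_avg = sum(pair) / len(pair)
    opp_avg = sum(opponents) / len(opponents)
    delta = k * (score - expected_score(pair_avg, opp_avg))
    return tuple(r + delta for r in pair)


# Alice (1200) and Bob (50) average 625 against Charlie and Doug (500 each).
new_alice, new_bob = update_pair((1200.0, 50.0), (500.0, 500.0), score=1.0)
print(new_alice - 1200.0, new_bob - 50.0)  # both gain the same small amount
```

With these assumed constants the favoured pair was expected to score roughly 0.67, so a win earns each partner only about 5 points, while a loss would cost each of them about twice that.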

Most complaints against ratings rest on the negative impact they can have on some players, who become defensive of their rating. They get much angrier about bad results and sometimes even refuse to play with or against others because of a perceived imbalance in ratings.

There are some other problems: separate groups are not compared accurately when there isn't much crossover play, and ratings tend to intimidate poorer players into not playing.

2. Good rating systems will always lag a little behind a player whose skill is changing, but they generally catch up.

3. Most good rating systems work off results. If the results are not produced within the system, then the question is moot. If they play in the system, then they will have results and be rated; it doesn't matter whether they join something else or not.

4. Then each login would have their own rating. I am almost 100% sure it is in fact ok to have multiple logins and I know several high profile people that do. They have a public persona and a hidden one when they want to be left alone. I am indifferent to the practice as long as they are not cheating by having multiple logins at the same time while playing.

As for your final point, that brings to mind an interesting idea: what if BBO could enable private clubs to introduce a rating system? That still leaves the main bridge club, which the vast majority of people are steered towards, free of it, while filling the need that is constantly being asked for. It might also allow different private clubs to experiment with different rating methods, to their own detriment or benefit.

As for cheating, shrug, some people cheat, both in person and online. Not much you can do about it but catch it when you can and go on with your life. No reason to let them win by spoiling it for everyone by deciding we can't do anything because some people abuse it.
0

#82 User is offline   Cthulhu D 

  • PipPipPipPipPip
  • Group: Full Members
  • Posts: 794
  • Joined: 2011-November-21
  • Gender:Not Telling
  • Location:Australia
  • Interests:Overbidding

Posted 2012-May-09, 22:56

These are solved problems - the basic solution is that 'accumulation' style ratings are bad, and modified ELO systems are good. Bridge's Masterpoint system is total toss because it's accumulation based. So, starting with ELO: http://en.wikipedia....o_rating_system

1) You have a variety of options, but rating gain is different from the average score. Typically in the modified ELOs used in team games a weighted average is used for the team's ELO, and players who are significantly variant from the team average have a much lower k-factor and thus a much smaller change of rating.
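A rough sketch of that idea, under stated assumptions: the team rating is a (possibly weighted) average, and each player's K shrinks the further their rating sits from that average. The shrink formula, the base K of 32 and the 400-point spread are invented for illustration and do not come from any particular game's system.

```python
def team_rating(ratings, weights=None):
    """Weighted average of the team's ratings (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(ratings)
    return sum(r * w for r, w in zip(ratings, weights)) / sum(weights)


def player_k(rating, team_avg, base_k=32.0, spread=400.0):
    """Hypothetical rule: halve K for every `spread` points of distance
    between the player and the team average (base_k / (1 + d / spread))."""
    distance = abs(rating - team_avg)
    return base_k / (1.0 + distance / spread)


def update_team(team, opponents, score):
    """Return new ratings; outliers move less than players near the average."""
    team_avg = team_rating(team)
    opp_avg = team_rating(opponents)
    expected = 1.0 / (1.0 + 10 ** ((opp_avg - team_avg) / 400.0))
    return [r + player_k(r, team_avg) * (score - expected) for r in team]


# Three-player example: the 400-rated outlier gains less from the same win.
before = [1000.0, 1000.0, 400.0]
after = update_team(before, [800.0, 800.0, 800.0], score=1.0)
gains = [a - b for a, b in zip(after, before)]
print(gains)
```

Note that with an unweighted average of exactly two players, both partners are always the same distance from the mean, so for bridge pairs the damping only bites unevenly if the average is weighted.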

2) Rating decay is tricky. It's generally a good idea if the player has not played for a long time. If they have played a lot in the interim their rating will reflect their performance accurately. Assuming regular play though it is not required.

3) Not relevant. Everyone has joined BBO, and their ELO can be calculated whether they keep it secret or not.

4) Doesn't matter - you just have to be careful when people are playing new accounts. The two ways to deal with the problem are:

Wait until the new account has played a lot of games (say 50 or 100 hands), then calculate its rating once (not changing anyone else's), then repeat the process starting from the basis that its rating after 100 hands was its rating at the start, except this time you do change everyone else's rating as well. Alternatively, you just make people's ratings change fast when they are new, and give people who are playing against a new account a slower rate of change.

Both processes handle the 'Rodwell and Meckstroth create new accounts and play Fantunes' case. In both options, Fantunes will have minimal or no rating change and Meckwell's will skyrocket to the correct level.
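The second option (fast-moving new accounts, slow-moving opponents of new accounts) could be sketched roughly as below. This is a simplified one-against-one illustration rather than a pairs system; the 100-hand threshold comes from the figure mentioned above, while the starting rating of 1500 and the 4x / quarter K multipliers are arbitrary assumptions.

```python
from dataclasses import dataclass

PROVISIONAL_HANDS = 100  # threshold taken from the "50 or 100 hands" suggestion


@dataclass
class Account:
    rating: float = 1500.0  # assumed starting rating
    hands_played: int = 0


def is_provisional(account: Account) -> bool:
    return account.hands_played < PROVISIONAL_HANDS


def k_factor(player: Account, opponent: Account, base_k: float = 16.0) -> float:
    """New accounts move quickly; established players facing them barely move."""
    if is_provisional(player):
        return base_k * 4
    if is_provisional(opponent):
        return base_k / 4
    return base_k


def play(a: Account, b: Account, score_a: float) -> None:
    """Apply one result (score_a: 1.0 win, 0.5 tie, 0.0 loss for a)."""
    expected_a = 1.0 / (1.0 + 10 ** ((b.rating - a.rating) / 400.0))
    k_a, k_b = k_factor(a, b), k_factor(b, a)
    a.rating += k_a * (score_a - expected_a)
    b.rating += k_b * (expected_a - score_a)
    a.hands_played += 1
    b.hands_played += 1


newcomer = Account()
veteran = Account(rating=2000.0, hands_played=5000)
play(newcomer, veteran, score_a=1.0)
print(newcomer.rating - 1500.0, 2000.0 - veteran.rating)
```

In this sketch an upset win moves the unknown account a long way up while costing the established player very little, which is the behaviour described for the new-account case.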

Basically ELO is good and has solved 95% of problems. The only real issue is how you calculate the ELO for the partnership, which is not trivial. Alternatively you can use a Glicko rating system: http://en.wikipedia....o_rating_system which is well proven for 2, 3 and 5 a side games in competitive gaming.
0

#83 User is online   Antrax 

  • PipPipPipPipPipPip
  • Group: Advanced Members
  • Posts: 2,263
  • Joined: 2011-March-15
  • Gender:Male

Posted 2012-May-09, 23:30

Quote

Basically ELO is good and has solved 95% of problems.
Ever played at a chess server? They use ELO ratings and there's a lot of gaming the system going on.
0

#84 User is offline   Cthulhu D 

  • PipPipPipPipPip
  • Group: Full Members
  • Posts: 794
  • Joined: 2011-November-21
  • Gender:Not Telling
  • Location:Australia
  • Interests:Overbidding

Posted 2012-May-09, 23:41

View PostAntrax, on 2012-May-09, 23:30, said:

Ever played at a chess server? They use ELO ratings and there's a lot of gaming the system going on.


Yes - and people avoid playing in order to protect their ratings, too (the one advantage MPs have). However, this is solved too - only rate ranked matches, and restrict the formation of ranked matches to an automated matchmaker. It may not be desirable to do this, however.
0

#85 User is offline   zenko 

  • PipPipPipPip
  • Group: Full Members
  • Posts: 165
  • Joined: 2006-April-26

Posted 2012-May-10, 13:55

I think Mycroft is spot on.

The only way to do it is to go the Wonderlic route: a test of 10-20 questions that will, if properly devised, give an accurate enough picture of your skill. Nobody really needs to know whether he is precisely in the 90th or 92nd percentile.

It's quite easy to create that kind of test(s); in tennis you can often tell how good somebody is just by the way he bounces the ball, and the same goes for bridge.
0

#86 User is offline   dwar0123 

  • PipPipPipPipPip
  • Group: Full Members
  • Posts: 749
  • Joined: 2011-September-23
  • Gender:Male
  • Location:Bellevue, WA

Posted 2012-May-10, 16:01

View Postzenko, on 2012-May-10, 13:55, said:

I think Mycroft is spot on.

The only way to do it is to go the Wonderlic route: a test of 10-20 questions that will, if properly devised, give an accurate enough picture of your skill. Nobody really needs to know whether he is precisely in the 90th or 92nd percentile.

It's quite easy to create that kind of test(s); in tennis you can often tell how good somebody is just by the way he bounces the ball, and the same goes for bridge.

http://lmgtfy.com/?q...ll+test+answers

Creating an automated test that is substantially immune to cheating in an online environment is not easy at all. It would be trivial to create a website that can give a perfect result on an online test unless the test is both randomized and timed. A highly randomized test that covers the entire swath of bridge skill from beginner to world class and still fits into a timed format would be incredibly difficult to construct. Even in such a case it would still be possible to get someone else to take the test, or just create multiple logins until you randomly get a better result than you would otherwise deserve.

With that said, in an online environment, people cheat, and it doesn't pay to give up on good ideas just because some people will find ways to abuse them. However, as this idea is aimed primarily at solving a problem of people abusing the existing self-reporting system, it really wouldn't be worth creating a simple test, because the same people would trivially get around it.

Creating the randomized timed test would greatly reduce the occurrence, as circumventing it would actually be a bit tedious, but such a test is anything but simple to create and hardly worth it to solve this problem, though it might be fun for its own sake.
0

#87 User is offline   Cthulhu D 

  • PipPipPipPipPip
  • Group: Full Members
  • Posts: 794
  • Joined: 2011-November-21
  • Gender:Not Telling
  • Location:Australia
  • Interests:Overbidding

Posted 2012-May-10, 18:57

View Postzenko, on 2012-May-10, 13:55, said:

I think Mycroft is spot on.


Why? The inherent thesis - that you cannot rate individuals in pairs - is upside down.

For example: http://research.micr...PS2006_0688.pdf

This has been robustly tested for pickup teams environments, but is not as accurate (less perfectly predicts draws), for pre-arranged teams, because the whole may very well be greater than the sum of the parts. That said, it's still quite accurate. Another advantage of implementing a matchmaking approach like this on BBO is that you are more likely to be matched with a partner of equivalent skill against equivalent opponents.
0

#88 User is offline   mycroft 

  • Secretary Bird
  • PipPipPipPipPipPipPip
  • Group: Advanced Members
  • Posts: 3,719
  • Joined: 2003-July-12
  • Gender:Male
  • Location:Calgary, Canada

Posted 2012-May-11, 10:05

And that differentiates from what I said - that my skill playing with my regular partner is much higher than my skill playing with an equivalent-strength pickup, and that it would be easy to game the system, how?

If one plays pickup all the time, then your skill level will be accurate for a pickup game. I don't - the frustration playing pickup (especially as the pickup pool gets poisoned, as the better partners (note, not the better players) find regulars and select out of the pool, but the bad partners stay in) drives me out of the game. I'd rather not even play in games where I'm playing against pickups because of that frustration. So, my skill level will be skewed, if you do use my rating as a pickup decision marker.

And that paper you cited, while fascinating, did say that one of the major failings was that people would protect their rating by, among other things, "not playing", "carefully choosing their opponents", and "cheating". Really. Can't imagine that happening.
0

#89 User is offline   zenko 

  • PipPipPipPip
  • Group: Full Members
  • Posts: 165
  • Joined: 2006-April-26

Posted 2012-May-11, 11:14

But why would you want to cheat on the test, to not disturb your own delusion? The point of the whole exercise is to give you fair and objective feedback on where you stand in comparison with other players, not to make you look cool. After all, as Hamman correctly noted: "we all play bad, some of us just a bit less lousy than the rest", or something like that. If cheating is on your mind you will always find a way to game any system.
0

#90 User is offline   barmar 

  • PipPipPipPipPipPipPipPipPipPip
  • Group: Admin
  • Posts: 11,663
  • Joined: 2004-August-21
  • Gender:Male

Posted 2012-May-11, 11:59

View Postzenko, on 2012-May-11, 11:14, said:

But why would you want to cheat on the test, to not disturb your own delusion?

Isn't that why people cheat on BBO in general? They feel the need to stroke their egos, and a high rating will do that just as much as a high finish on the leaderboard of a tourney.

#91 User is offline   Cthulhu D 

  • PipPipPipPipPip
  • Group: Full Members
  • Posts: 794
  • Joined: 2011-November-21
  • Gender:Not Telling
  • Location:Australia
  • Interests:Overbidding

Posted 2012-May-12, 10:35

View Postmycroft, on 2012-May-11, 10:05, said:

And that differentiates from what I said - that my skill playing with my regular partner is much higher than my skill playing with an equivalent-strength pickup, and that it would be easy to game the system, how?

If one plays pickup all the time, then your skill level will be accurate for a pickup game. I don't - the frustration playing pickup (especially as the pickup pool gets poisoned, as the better partners (note, not the better players) find regulars and select out of the pool, but the bad partners stay in) drives me out of the game. I'd rather not even play in games where I'm playing against pickups because of that frustration. So, my skill level will be skewed, if you do use my rating as a pickup decision marker.

And that paper you cited, while fascinating, did say that one of the major failings was that people would protect their rating by, among other things, "not playing", "carefully choosing their opponents", and "cheating". Really. Can't imagine that happening.


I'm not sure we are on the same page as to the objective of a rating system. My view is that the only reason to rate players is to facilitate a match-made game that is a good match, where 'good' means 'approximately even chance of either side winning.' With that said, then your objection is irrelevant. If you solely play with your partner and that boosts your rating artificially, that's good! It means you will be matchmade against pairs like yourself of equivalent skill, or stronger pickup players.

As for your second objection - while you have quoted the paper, you missed the relevant part of the sentence. Players are motivated by the skill display, not the rating in and of itself. If the objective of the rating system is to improve the quality of games, why do we need to display the rating? If the 'take me to a table' function worked like a real matchmaking service, profile listed skill is irrelevant, the system will match you with and against players of equal level.

Heck, you could have the matchmaking service separate individuals from pre-arranged pairs, but not separating them works for Halo, so why not here? If it's a serious concern though, it is easily fixed. Just assign players two ratings, one for individual and one for pre-arranged pairs.

If skill display is a must-have feature (I don't think it is, but obviously YMMV), there are a number of implementations in the wild to draw inspiration from. Take a leaf from PGR3's TrueSkill implementation, which always displays your skill as the floor of your estimated strength range, so your rating practically never decreases (because while you may lose a race, the result narrows the uncertainty band around your skill, which means your displayed skill may still go up), or just band displayed skill really widely (Intermediate, Advanced, Expert, with no other classifications), or both.
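To illustrate the "floor of the estimated range" display, here is a small sketch. It assumes the skill estimate is held as a mean and an uncertainty (mu, sigma), displays mu minus three sigma, and then maps that number onto wide bands. It does not implement the actual TrueSkill update; the before/after values and the band cutoffs are hypothetical.

```python
def displayed_skill(mu: float, sigma: float, k: float = 3.0) -> float:
    """Conservative display value: the floor of the estimated range."""
    return max(0.0, mu - k * sigma)


def band(display: float, cutoffs=((30.0, "Expert"), (15.0, "Advanced"))) -> str:
    """Map the display value onto a few wide bands (cutoffs are made up)."""
    for threshold, label in cutoffs:
        if display >= threshold:
            return label
    return "Intermediate"


# Hypothetical estimates before and after a loss: the mean drops a little,
# but the uncertainty shrinks more, so the displayed floor still rises.
before = displayed_skill(mu=25.0, sigma=8.0)
after = displayed_skill(mu=24.0, sigma=6.0)
print(before, after, band(after))
```

The point of the example is only that the displayed number and the underlying estimate can move in opposite directions after a single result.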
0

#92 User is offline   zenko 

  • PipPipPipPip
  • Group: Full Members
  • Posts: 165
  • Joined: 2006-April-26

Posted 2012-May-14, 08:29

Rating players can help make the BBO experience much better when it comes to random partnerships, which most of us avoid because we do not get matched up with players of an adequate skill level. For that purpose we would like to focus on their playing technique and knowledge of commonly used system(s). The rating system can be completely discreet, so no hurt feelings.

To put it simply, if there were a way to be paired up with a player of my ability (or close) I would play significantly more often than I do now, and I am sure many share this sentiment. It is clearly in BBO's best interest to do something about it.
0

#93 User is offline   Zelandakh 

  • PipPipPipPipPipPipPipPip
  • Group: Advanced Members
  • Posts: 6,341
  • Joined: 2006-May-18
  • Gender:Male

Posted 2012-May-14, 08:57

View Postzenko, on 2012-May-14, 08:29, said:

It is clearly in BBO's best interest to do something about it.

Sure about that? Would you play more often if a rating system said you were not of an "adequate skill level" to be playing with your current partners? Perhaps it is in BBO's interest to stick with what has been a very successful formula and not to upset too many. Other sites have tried rating systems and not been anywhere near as successful. As I have said before, I would prefer a finer gradation for intermediates within the self-rating system but I can well understand the management not wanting to change anything at all which is not an obvious upgrade.
(-: Zel :-)
0

#94 User is offline   Cthulhu D 

  • PipPipPipPipPip
  • Group: Full Members
  • Posts: 794
  • Joined: 2011-November-21
  • Gender:Not Telling
  • Location:Australia
  • Interests:Overbidding

Posted 2012-May-14, 18:41

View PostZelandakh, on 2012-May-14, 08:57, said:

Sure about that? Would you play more often if a rating system said you were not of an "adequate skill level" to be playing with your current partners? Perhaps it is in BBO's interest to stick with what has been a very successful formula and not to upset too many. Other sites have tried rating systems and not been anywhere near as successful. As I have said before, I would prefer a finer gradation for intermediates within the self-rating system but I can well understand the management not wanting to change anything at all which is not an obvious upgrade.


Why does any rating system have to say that? Or do that? PGR will let me party with anyone I choose regardless of skill. DOTA 2 doesn't try and tell me I shouldn't play with Kilthix and Gladius because they are much better than me. It just tries to get a good 5v5 together.

I do not understand why putting a matchmaking system behind the 'take me to a table' and 'take me to a table I have a partner' buttons would be anything other than an obvious upgrade. Implementing a matchmaker requires ratings. Ratings do not require skill display or any other controls.
0

#95 User is offline   zenko 

  • PipPipPipPip
  • Group: Full Members
  • Posts: 165
  • Joined: 2006-April-26

Posted 2012-May-14, 20:23

Exactly,
If I play with my friends, that's precisely why I play with them; I could not care less what their rating is. The rating comes into play only with random partners. I would of course love to play only with players better than me, but I will take anybody who is worse than me, as long as it is not by too much. I am never in the mood for teaching strangers how to play, especially for free.

I truly believe that people rarely overstate their skill level on purpose. Bridge is one of those games where it is hard to grasp how much you do NOT know until you become quite good. I cannot really blame somebody who has managed to execute their first intentional squeeze for feeling like an expert after that (it was long ago, so I can't quite remember it, but I surely felt like I was up there, one step below the Bermuda Bowl champs).

This is how it should work: when I click on "take me to the next available seat", the program should seat me across from the player with the "rating" (or whatever you want to call it) closest to mine. If a new table is forming, all 4 players should be selected the same way, as close as possible to each other. Also you can have a tool to set a range of pards I am interested in playing with, which I can limit from both sides (or just one), say a 20 percentile range around mine. Also, tagging somebody as a friend overrides the feature, so it can be used by the table host to steer away inadequate opponents. Ratings should not be visible to anybody and should completely reset fairly often, say every 50-100 boards; that would make them a bit less reliable, but that's a small price to pay to keep everything friendly.
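The seating rule described above can be sketched in a few lines. This is only an illustration: the data shapes (table ids, name and rating tuples), the `max_gap` parameter and the "smallest spread among four adjacent ratings" rule for new tables are assumptions, and nothing here reflects how BBO actually assigns seats.

```python
def closest_seat(my_rating, open_seats, max_gap=None):
    """Pick the open seat whose waiting partner is closest in rating.

    open_seats: list of (table_id, partner_rating).
    max_gap: optional limit on the acceptable rating difference.
    Returns a table_id, or None if nothing is within range.
    """
    candidates = [
        (abs(rating - my_rating), table_id)
        for table_id, rating in open_seats
        if max_gap is None or abs(rating - my_rating) <= max_gap
    ]
    return min(candidates)[1] if candidates else None


def form_table(waiting):
    """Choose 4 waiting players with the smallest rating spread.

    waiting: list of (name, rating). After sorting by rating, the best
    group of four is always a run of four adjacent players.
    """
    if len(waiting) < 4:
        return None
    ordered = sorted(waiting, key=lambda player: player[1])
    start = min(
        range(len(ordered) - 3),
        key=lambda i: ordered[i + 3][1] - ordered[i][1],
    )
    return [name for name, _ in ordered[start:start + 4]]


seats = [("t1", 1900), ("t2", 1450), ("t3", 1200)]
print(closest_seat(1500, seats))              # t2
print(closest_seat(1500, seats, max_gap=20))  # None

queue = [("a", 1000), ("b", 1500), ("c", 1520), ("d", 1490), ("e", 1550), ("f", 2200)]
print(form_table(queue))                      # ['d', 'b', 'c', 'e']
```

A tight `max_gap` gives closer matches at the cost of sometimes finding no seat at all, so a real matchmaker would have to trade match quality against waiting time.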
0

#96 User is offline   Cthulhu D 

  • PipPipPipPipPip
  • Group: Full Members
  • Posts: 794
  • Joined: 2011-November-21
  • Gender:Not Telling
  • Location:Australia
  • Interests:Overbidding

Posted 2012-May-14, 21:25

View Postzenko, on 2012-May-14, 20:23, said:

Also you can have a tool to set a range of pards I am interested in playing with, which I can limit from both sides (or just one), say a 20 percentile range around mine. Also, tagging somebody as a friend overrides the feature, so it can be used by the table host to steer away inadequate opponents. Ratings should not be visible to anybody and should completely reset fairly often, say every 50-100 boards; that would make them a bit less reliable, but that's a small price to pay to keep everything friendly.


This is bad - the matchmaker needs to control +/- skill differentials permitted to optimize queuing times. Also, if your rating system is functional why reset it ever? I agree masking of the actual rating is fine.

If you have a bizarre compulsion to include a displayed rating, avoid using the underlying ELO or TrueSkill, band them up and display the lowest grade in the uncertainty range. Do not use real world terms either. 'Intermediate' 'Advanced' etc are just going to get on people's nerves.

If you held a gun to my head and demanded displayed ratings, I'd use the Starcraft 2 model, and probably with fewer bands. Starcraft is VERY clever in that it separates the displayed rating from the matchmaking rating(!).

So basically the player base is sorted into buckets:

    Grandmaster: top 200 players
    Master: 2%
    Diamond: 18%
    Platinum: 20%
    Gold: 20%
    Silver: 20%
    Bronze: 20%


Within each bucket, players are grouped into divisions (of 100 players), and you get 'Masterpoints' (actually Blizzard points) when you play games. This number never decreases, so you generally climb steadily up your division rank. Plus, instead of being rated against the totally meaningless pool of all BBOers, you actually see how you are doing against a pool of roughly equal players. Also you get to be in more top 10 lists.

But this is totally separate from the MatchMaker rating. It's just that if your matchmaking rating creeps into the range of values expected for another bucket and stays there, Blizzard will give you a bump.
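As a rough illustration of how the buckets could be derived from the hidden matchmaking rating, here is a sketch that maps a player's rank (0 = best) onto the percentages listed above. It assumes the Grandmaster slots are carved out of the very top and that the percentages are cumulative over the whole population; how Blizzard actually handles the boundaries, or the "stays there" delay before a bump, is not modelled.

```python
GRANDMASTER_SLOTS = 200
# (league, percent of the population), best first, as listed above.
LEAGUES = [
    ("Master", 2),
    ("Diamond", 18),
    ("Platinum", 20),
    ("Gold", 20),
    ("Silver", 20),
    ("Bronze", 20),
]


def league_for_rank(rank: int, population: int) -> str:
    """Return the displayed league for a player ranked `rank` (0 = best)
    by hidden matchmaking rating among `population` players."""
    if rank < GRANDMASTER_SLOTS:
        return "Grandmaster"
    cumulative = 0
    for name, percent in LEAGUES:
        cumulative += percent
        # Integer comparison avoids floating-point trouble at the boundaries.
        if rank * 100 < cumulative * population:
            return name
    return LEAGUES[-1][0]


for rank in (0, 200, 2_000, 20_000, 99_999):
    print(rank, league_for_rank(rank, population=100_000))
```

With a small player pool the fixed 200 Grandmaster slots would swallow a large share of the population, so that number would need scaling down.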

If BBO was going to do this, chop out all tiers under gold and fold them into 'gold' and let people assign themselves a Beginner rating for their first 50 matches (probably 300 boards) if and only if they want to.
0

#97 User is offline   Zelandakh 

  • PipPipPipPipPipPipPipPip
  • Group: Advanced Members
  • Posts: 6,341
  • Joined: 2006-May-18
  • Gender:Male

Posted 2012-May-15, 00:58

View Postzenko, on 2012-May-14, 08:29, said:

because we do not get matched up with players of an adequate skill level.

To put it simply, if there were a way to be paired up with a player of my ability (or close) I would play significantly more often than I do now



View PostCthulhu D, on 2012-May-14, 18:41, said:

Why does any rating system have to say that? Or do that?

I was responding to the above points in the previous post. The poster claimed they were not matched up with players of an "adequate skill level" and would play more if they were. I made the point that a rating system might say that they were actually not of an "adequate skill level" for the current partners. Then they would perhaps not play more. I was also making the point that, contrary to the part of the text that I quoted, it is quite likely not to be in BBO's best interest at all to "do something about it".
(-: Zel :-)
0

#98 User is offline   Cthulhu D 

  • PipPipPipPipPip
  • Group: Full Members
  • Posts: 794
  • Joined: 2011-November-21
  • Gender:Not Telling
  • Location:Australia
  • Interests:Overbidding

Posted 2012-May-15, 02:57

View PostZelandakh, on 2012-May-15, 00:58, said:

I was responding to the above points in the previous post. The poster claimed they were not matched up with players of an "adequate skill level" and would play more if they were. I made the point that a rating system might say that they were actually not of an "adequate skill level" for the current partners. Then they would perhaps not play more. I was also making the point that, contrary to the part of the text that I quoted, it is quite likely not to be in BBO's best interest at all to "do something about it".


I strongly disagree with your assertion here that a completely random match-up of 4 players around a table is likely to be more enjoyable than 4 match made players for any given player.
0

#99 User is offline   Zelandakh 

  • PipPipPipPipPipPipPipPip
  • Group: Advanced Members
  • Posts: 6,341
  • Joined: 2006-May-18
  • Gender:Male

Posted 2012-May-15, 04:38

View PostCthulhu D, on 2012-May-15, 02:57, said:

I strongly disagree with your assertion here that a completely random match-up of 4 players around a table is likely to be more enjoyable than 4 match made players for any given player.

Could you quote the specific part where I asserted that? I find it extremely insulting when people tell me what I said (or thought) by writing (or saying) something completely different.
(-: Zel :-)
0

#100 User is offline   Cthulhu D 

  • PipPipPipPipPip
  • Group: Full Members
  • Posts: 794
  • Joined: 2011-November-21
  • Gender:Not Telling
  • Location:Australia
  • Interests:Overbidding

Posted 2012-May-15, 05:32

View PostZelandakh, on 2012-May-15, 04:38, said:

Could you quote the specific part where I asserted that? I find it extremely insulting when people tell me what I said (or thought) by writing (or saying) something completely different.


In post #97.

You state

Quote

The poster claimed they were not matched up with players of an "adequate skill level" and would play more if they were. I made the point that a rating system might say that they were actually not of an "adequate skill level" for the current partners. Then they would perhaps not play more. I was also making the point that, contrary to the part of the text that I quoted, it is quite likely not to be in BBO's best interest at all to "do something about it".


There are a number of possible interpretations - you pick a very weird way of saying things with phrases like 'the rating system will say' (why will it say anything?), so it's quite hard to work out what you mean, but this is my guess:

A) He will be blocked from playing with pre-arranged partners because the skill differential within the partnership is not addressed. I initially thought this was what you meant, but you corrected me in post #97. So I'm left with

B) He has an overinflated perception of his own skill - Zenko self rates as 'expert' but the hypothetical is he's actually intermediate. In this scenario he will be matched with lower rated partners against equivalent opposition than in the current random scenario. Your hypothesis is presumably then that he will play less because he's exposed to bad players sitting opposite, rather than being able to repeatedly leave matches until he gets an 'expert partner.' Note that this scenario is flawed. While Zenko may enjoy being able to partner with better partners, the 'real expert' version of Zenko is going to be frustrated playing with an intermediate who claims he is an expert. This is a net neutral position - if people don't enjoy playing with players rated worse than them, as you hypothesise here, someone is getting screwed. Additionally, because the ratings are self applied, it's functionally random. I self rate as beginner and am pretty damn bad, but I'm better than some of the advanced pards I've played with (e.g. the guy who hectored me for opening a balanced 19 count 1C playing a 15-17 NT).

So anyway, analysing the hypothesis:

  • The base case is the current system of pressing the 'take me to a table' button. You get 3 random players + you.
  • The proposed scenario is a matchmaker where you get 4 players of approximately equivalent skill levels.


You state that he will play less when in a match-made game rather than a random game. As the 'cost' of playing wouldn't change between the two scenarios (same time investment, BBO is free), the only reason for him to play less is that he would derive less utility from playing. The only utility you can derive from playing a random match made game on BBO is personal enjoyment. Therefore the only logical interpretation I can reach of your statement is that matchmaking = reduced enjoyment.

If you meant something else, I do not understand the point you are trying to make.
0
