Mig Greengard's ChessNinja.com

Ruso-Sino Relations


Hey, you got your egg roll in my borscht! Hey, you got borscht on my egg roll! That must mean it's time for the annual match between chess powerhouses China and Russia. Host city Nizhny Novgorod has been a hotbed of Russian political opposition activity and counter-activity in the past year, btw.

It's ten rounds, Scheveningen style. It's really two matches, five boards for men and five for women. They combine the scores, however. ChessBase has the clearest breakdown and photos but it's a few days old. TWIC has the latest score and games (27.5-22.5 China after today's fifth round). The official site has a bunch of funny writing.

It seems like in every major team event China plays, they bring a few new underrated young female talents who are then replaced by yet others a year or two later and are never heard from again. Their women would be favored in a 20-board match against the rest of the world. The men are hardly slouches, of course. You'll remember that they won the silver in the 2006 Olympiad while the Russian team failed to medal. The Russians don't have their top-rated players in action; most of them are playing in the Mexico City world championship in a few weeks. The team of Inarkiev, Timofeev, Jakovenko, Tomashevsky, and Alekseev isn't exactly cold leftovers though. China brought its A-Team of Bu Xiangzhi, Ni Hua, Wang Yue, Wang Hao, and Zhang Pengxiang. The Russian men are leading by a point after going 3.5-1.5 in the fifth round. They won last year by a three-point margin. The Chinese women lead 15.5-9.5. And Hou Yifan isn't even playing.

Russian champ Alekseev is out to a 4/5 start to lead the individual scores. (Zhao Xue is also 4/5 on the women's side.) That includes a very sharp counterattacking effort against Ni Hua's Maroczy Bind in the fifth round. For balance, Ni Hua's 15.e6!? against Tomashevsky in the first round deserves an eyeball. Alekseev beat Wang Yue in the crazy Slav line we saw recently in Berkes-Harikrishna and Bluvshtein-Miton. Both of those games followed the computer recommendations on the 12th move (0-0 or Ke2) but Alekseev instead sacrificed a full rook for a dominating knight, menacing passed pawn, and the bishop pair. It quickly paid off. In hindsight, Black should have left the exchange alone and taken the knight on move 14. A nice piece of analysis by Alekseev.

54 Comments

It looks like they put up the A-team of Russian youth to play against Chinese youth (China's youth being their A-team overall, having no strong male players over age 25, whilst the Russian youth make that country's B-team.)

No Grischuk, Moro, Svidler, or Kramnik.

Isn't it remarkable how effortlessly the Chinese government is able to motivate its citizens to excel in virtually any sport that takes the politburo's fancy?
In the old cultural revolution days, ping pong was about the only game they played. Now they have gold medals in almost everything.
There is a lot to be said for fascist dictatorships, no? "Lost with Black? Off with his head, fellow workers!"

Incidentally, on the women's side it's the Chinese A-team versus Russia's A-team.

I'm sure it also helps that China's population is greater than North America's and Europe's combined.

You guys aren't being fair. The big difference maker is the motivation. The Chinese always take these types of events very seriously and generally perform much better at team events than they do at anything else. The government makes sure the players are focused and prepared, both through direct involvement and by creating a mentality that says representing one's country is important. These guys are going to go for the wins. These guys are going to treat this like one of the most important events of the year. Russia, on the other hand... A while back, Russia put an All-Star hockey team together, which outshone the competition on talent level, but did so-so result-wise. They put an all-star team in the last Chess Olympiad and finished medal-less. The soccer national team traditionally underperforms. This event is scheduled for a period of time when the four best players in the country will be unavailable and others will be serving as seconds for them.

I believe the true difference is the women. I don't believe they're underrated because of the many international events they play in, but for some reason they completely dominate in team events.

The men do well, but not as well as the women, even though here the Russian ladies might even slightly outrate them.


The average rating in events like these is misleading. My guess is you will score more points facing several players that slightly outrate you and one who is far below you than facing five players who are slightly below you, even if the average is the same. (For example, 5 2620s vs 4 2640s and one 2540.) Based on ratings alone, the Russians should have an advantage.

I don't understand statements (at chessbase.com) like "the Chinese team is leading by 5 points at the half mark..." Since when do they mix the results of, let's say, Alekseev with Korbut's results? What's next, Italy is not football world champion anymore because their women's team underperformed? Let's get real: these are 2 events. There is the men's event, where Russia's team B (youth) is a little better than the ambitious China A team, and the women's event, where China A is thrashing Russia A (even if Kosteniuk played for Russia the results would be the same, because the difference is big). The men's match is interesting in the absence of the "Mexicans".

Yuriy, your guess is absolutely wrong :)

Dany, the format they are playing, combining the scores, was agreed to. It isn't unfair at all.

kgd, agreed by whom? Most of the sites (especially the Russian sites) are treating the event as two separate matches. Even at the Olympics (or world teams or European teams) they might play in the same hall, but they don't mix the men's and women's results. That is ridiculous.

Agreed by the players (they DID show up) and by whoever is organizing the match.

Come on, pj, who would you rather face, five players within 20 points of your rating, or four within 20 points of your rating and one 80 points below it?

Yuri Kleyner's speculation needs empirical-evidence backing. Otherwise we must accept the null hypothesis of a linear rating-scoring correspondence.

YK,
The difference is that I blindly believe in mathematics, while you are fighting against it, almost as a hobby :) Average rating is average rating, regardless of how it is formed; statistically I am expected to perform the same against the same rating. You too, by the way :)

That's not mathematics, that's putting blind faith in one number (average rating) and ignoring everything else. There is a world of difference between playing people in your own class and a game against a player 100 points below you. In a long match, sure, maybe the ratings will average out, but in a short 2-game match, you have to round the score to the nearest half-point and I am afraid it doesn't round favorably for the guy with the 100-point gap. (That's without even asking whether chess ratings can correctly predict a player's strength to within 20 points; my guess is, considering how much ratings vary from one listing to the next, they can't.)

SH, I am pretty sure what you describe is not a valid null hypothesis or the right one for this scenario...it should be that the performance score is the same regardless of opposition rating average, no?

What puzzles me is that Chess as we know it in the West is hardly known at all in the general population of China; they all play Chinese Chess the way we here in the West all play (Western) Chess and not Chinese Chess. How did the Chinese get so good at (Western) Chess? The two games are similar, but there are substantial differences as well. What is happening to (Western) Chess these days is comparable to what would happen if Western males all of a sudden made inroads into the top 100 of Chinese Chess and Western females all of a sudden made inroads into the top 20 of Chinese Chess.

I am speechless, well almost.
So in one game it rounds against weakest player but in a match it averages out?
You really need to read how ELO works.

It's a pretty pointless argument since it's impossible to put a number on a human; at best it's an educated guess.

How many players in the Russia-China match will perform exactly at their ratings? I'd be surprised if even 1 did.

Take a simple rating calculating method and follow this scenario:

Two players, one of whom is expected to have a 2/20 score against the other in a 20-game match.
According to their ratings, he would also be expected to score a 0.2/2 in a 2-game match.
But it's impossible to score a 0.2, because you can only score in increments of 0.5 points. Your most likely outcome is 0/2. And if you are even weaker and your true strength is say 0.5/20, then you scoring a 0/2 and having a performance lower than your rating is almost a certainty.
A significantly weaker player therefore in a short match would most likely end up with a rating that underestimates his true strength. That's what I mean by the gap not rounding favorably for the weaker player.
By the same token, if you are expected to score anywhere from 8/20 to 9.5/20, then in a short match, you would most likely end up with a score of 1/2 and a rating that overestimates your true strength.

Fired, you are right, of course. My original point was that the number of 100-point gaps will probably end up more significant in a match like this than average rating.

So in 2 games against -100 opposition you are going to perform above your rating, but when you play 2*10=20 games you are going to perform equal to your rating.
How?

The 100 is just a number I threw out there. I don't know what the probabilities are for that particular gap. It's usually impossible to get a rating that accurately reflects your score, since your score changes in increments of 0.5 (which is a very large rating gap for a 2-game match).
Because of this if you only play 2 games against an opposition that your rating says you should score a 2/20 against, you will definitely end up not with 0.2/2 but, most likely a 0/2.

Ok Yuriy, you feel that something is wrong in what you have written? :))

Do you feel that there is? Want to share it?

Jeff Sonas claims the Elo-system is actually slightly unfair to the higher-rated player when two players with a 100 point difference play each other:
http://www.chessbase.com/newsdetail.asp?newsid=562

Kleyner doesn't quite understand the concept of expected score.

Anyway, regarding Sonas' finding, it makes more sense because in a pairing of closely rated players (say 50 pts or less), the weaker player doesn't have a tendency to play for a draw (to gain pts, to hold off the stronger player).

As the rating gap increases, so does the tendency of the lower-rated player to seek a draw. (Versus the actual score if they played "blind", without knowing the other person's rating.)


Sorry, I'm a mathematician and this confusion about expected score and the like brings out the math professor in me.

Okay, let's assume that two players A and B are playing a 2-game match and the rating difference is such that A is expected to get 2/20 in a 20-game match. (Let's also disregard Sonas's finding and assume that the Elo system works like it's supposed to.) Yuriy is absolutely correct that A will most likely score 0/2 and will thus most likely lose rating points. However, this does not mean that A has a negative expected rating change from this 2-game match. While performing better than 0.2/2 is less likely than performing under 0.2/2, when the former happens A will gain more points than he or she will lose when the latter happens. The rating formula is designed so that, if we had, say, a million people of A's rating play 2-game matches with a million people of B's rating, the more frequent smaller losses incurred by the A side would balance out with their less frequent larger gains and thus the average rating of all those A players would remain (approximately) the same.

Just for reference, "expected value" is given by VALUE * PROBABILITY_OF_THAT_VALUE summed over all possible values. For example, I'll make up some numbers and say that, in a single game, A beating B has probability 5%, A drawing B has probability 10%, and A losing to B has probability 85%. Let's say that A gains 20 points for beating B, gains 10 points for drawing, and loses 2 for losing. Then the expected rating change of A after playing B is 20 * 0.05 + 10 * 0.1 - 2 * 0.85 = 0. (Well, actually it's 0.3 and not 0, but I wanted to choose 20, 10, and 2 to have nice numbers instead of choosing something nasty that would come out right. Anyway, the point of this example is just to show what an expected value is and how to calculate it.)

Hope this helps and that I didn't just confuse things more.
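For anyone who wants to check the arithmetic, here is a minimal Python sketch of the expected-value sum described above. The probabilities and rating changes are the made-up numbers from the comment, not real Elo figures, and the names `outcomes` and `expected_value` are just for illustration.

```python
# Expected value = sum of (value * probability) over all possible outcomes.
# Each row is (label, probability, rating change), using the made-up
# numbers from the comment above rather than real Elo figures.
outcomes = [
    ("win", 0.05, 20),
    ("draw", 0.10, 10),
    ("loss", 0.85, -2),
]


def expected_value(rows):
    """Sum probability * value over (label, probability, value) rows."""
    return sum(prob * value for _, prob, value in rows)


expected_change = expected_value(outcomes)
print(f"expected rating change: {expected_change:+.2f}")
```

With these numbers the sum comes out to about +0.3 rather than 0, which matches the correction in the parenthetical.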

LazyNinja, exactly.
Yuriy says that it is better to play against 4 people who are +20 against you and one is -100 rather than 5 people that are equal to you. In your terms, he just weights the game against -100 as a sure win (100%), because win is the most likely outcome.

Yuriy,

Exactly. Most times you will end up with 0/2, i.e. underperforming a bit, but in the case you actually end up with 1/2 (not unlikely, just 20% probable) then you will end up overperforming A LOT. Therefore your expected performance is again the same.

I will ask you to play with me the following game:

Throw a die with 100 possible results. If it doesn't come up with the number (say) 13, I give you 1£. If it does, you give me, say, 1,000,000£.

We will play only 2 times this game.
According to you, since you almost always win, in a 2-game match you will actually win 2£. So why not try? (Of course you won't, because your expected score is negative: you lose only 1% of the time, but when you lose you lose much more.)
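The gap between "most likely outcome" and "expected outcome" in this game can be put into numbers. The sketch below assumes a fair 100-sided die and independent rounds; the variable names are only illustrative.

```python
# The die game above: a 100-sided die, played twice.
# Assumes a fair die and independent rounds.
p_lose = 1 / 100          # the die shows 13
win_amount = 1            # pounds gained on any other number
loss_amount = 1_000_000   # pounds paid when 13 comes up
rounds = 2

# Most likely outcome: winning both rounds.
p_win_every_round = (1 - p_lose) ** rounds

# Expected outcome: each result weighted by its probability.
expected_total = rounds * ((1 - p_lose) * win_amount - p_lose * loss_amount)

print(f"chance of winning both rounds: {p_win_every_round:.4f}")
print(f"expected total: {expected_total:,.2f} pounds")
```

Winning both rounds happens about 98% of the time, yet the expected total is a loss of nearly 20,000 pounds, because the rare loss is so large.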

No, I understood all this but we were talking about the probability of an individual 2-game match being an under-performance or an over-performance of a "vastly lower-ranked player" rating. If we take a 18/20 player and put him in a 2-game match against a 2/20 player, the 2/20 player's RATING may improve vastly if he scores above his expectations, but it will still be only a .5 gain in points for his team score-wise. And this gain is rather unlikely (because chances of him scoring a 0/2 are far greater than of him scoring a 0.5/2). In derida's terms, if he wins, I give him $2.

playjunior, the under-20 vs the under-100 is a different argument, maybe this scenario (which is far more drastic) will help explain what I am getting at.

Take 2 groups of players:
1:4 players ranked 2700 and 1 ranked 1200
2:5 players ranked 2400

Now, the average is the same, 2400. But do you expect a 2700 to do the same against both groups? A 2400? A 1200? My guess is 2700 scores a +2 against the first and better than that against the second, a 2400 scores negative against the first and even against the second, and the 1200 scores 1 point against the first and no points against the second.
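The two groups above can be compared with a rough calculation. This sketch assumes the standard logistic Elo expectation, 1 / (1 + 10^((opponent - rating) / 400)), summed game by game; it is not FIDE's published table and it ignores any cap on large rating gaps. The function and variable names are hypothetical.

```python
def expected_score(rating, opponent):
    """Expected score for one game under the logistic Elo formula (assumed)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def expected_total(rating, opponents):
    """Sum the per-game expectations against a list of opponents."""
    return sum(expected_score(rating, opp) for opp in opponents)


group_1 = [2700, 2700, 2700, 2700, 1200]  # four 2700s and one 1200
group_2 = [2400] * 5                      # five 2400s

# Both groups average 2400, but the per-game sums differ.
for rating in (2700, 2400, 1200):
    versus_1 = expected_total(rating, group_1)
    versus_2 = expected_total(rating, group_2)
    print(f"{rating}: {versus_1:.2f}/5 vs group 1, {versus_2:.2f}/5 vs group 2")
```

Under these assumptions the 2700 is expected to score roughly 3 out of 5 against the first group and a bit over 4 against the second, so the same average opposition rating does not give the same expected score.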

playjunior, you are misinterpreting. All Yuriy is saying is that the probability of overperforming, i.e., achieving a positive rating gain of any amount, is larger for the higher ranked player when the number of games is small. This is absolutely correct. As the number of games increases, the variance of either player's net rating change is reduced as it tends towards the mean (which should be 0 assuming ELO works correctly and the games can be considered as independent events).

An 1800 has almost zero chance of scoring on a 2400, so pushing it down to 1200 means basically "giving" 600 for nothing.

So comparing those of altogether different "ballparks" is strawman.


Cynical Gripe said:
"All Yuriy is saying is that the probability of overperforming, i.e., achieving a positive rating gain of any amount, is larger for the higher ranked player when the number of games is small."

This is not his original thesis: Yuri having stated plainly: "Based on ratings alone, the Russians should have an advantage."

That is false, as LazyNinja explained, for rating predictions assume you're sampling from an infinite number of events.

Cynical Gripe, Yuriy:
LazyNinja, me and derida explained everything in the clearest possible way. Probability of rating gain SHOULD be weighted with the amount of gain. See derida's example with lottery once more to see the difference.
A higher rated player playing a lower rated player is not expected to play better than his rating. He is expected to perform EQUAL to his rating.

playjunior,

lol, c'mon dude. LazyNinja was exactly right, and I never said otherwise. Was my statement incorrect? I don't think so. Everyone understands expectation, even kids in grade school.

SH,

"An 1800 has almost zero chance of scoring on a 2400, so pushing it down to 1200 means basically "giving" 600 for nothing.

"So comparing those of altogether different "ballparks" is strawman."

Don't you mean he has almost zero chance but that the gain is so high in case he does score that it outweighs the smaller loss in most likely scenarios? If not, you are proving my other point.

The very idea is what happens when you play somebody in a different ballpark rating-wise. If you go SOLELY by average opposition rating, a 2700, a 2400 and a 1200 should each perform as well against Group 1 as they do against Group 2 above. Clearly, they wouldn't.

"This is not his original thesis: Yuri having stated plainly: "Based on ratings alone, the Russians should have an advantage.""

Read the entire post. If you think that what I meant was "BASED ON AVERAGE RATING Russians should have a higher expected performance," then you are wrong. I was not at all talking about what the elo should be, I was talking about what the most likely outcomes are based on sense and how much chess changes when facing a far stronger player--not blind ratings numbers. Originally I even counted the number of times Russian GMs would face an opposition 50 points below versus the number of times Chinese GMs would--that part I edited out later, unfortunately :(

Actually, looking over the list of players it does look like Russians have a higher rating on average, so I wasn't even wrong in that sense :)

playjunior,

"LazyNinja, me and derida explained everything in the clearest possible way. Probability of rating gain SHOULD be weighted with the amount of gain."

Yes, but we are not talking about ratings gain or expected rating performance. In a 2-game match between the two, an underperformance by the considerably weaker player is more likely than an underperformance by the considerably stronger player. LazyNinja agrees with me on this. Do you disagree?

My earlier quote of YK refers to the claim regarding expected score wrt rating variance.

Let's try different numbers: two players a True 300 and a True 1300 (not underrated). Expected score of either against 2400 is 0.00%.

This Threshold Effect occurs beyond some point (400pts? 600pts?) where direct comparison isn't possible with Elo (only through results against third parties).

That is why averaging across players doesn't work through a Threshold. (All of the participants in the male competition lie within ~100pts.)

With thresholds it's better to sum the expected scores *per matchup*.


Actual rating systems incorporate the concept that SH just described. When rating a tournament, instead of computing each player's expected score based on opponents' average rating, an expected score is computed for each game and those are summed. This ensures that no one ends up gaining points due to a loss or losing points due to a win, which otherwise could result from what SH calls the "threshold effect" above. (When a 2400 beats an 1100, the downside impact of the 1100 on the average rating of all opponents would outweigh the positive impact of the +1 score. Likewise, if opponents' ratings were averaged, the 1100 would come out of the tournament better rating-wise than if he hadn't played and lost to the 2400.)

My memory is that the USCF and FIDE systems handle this differently, but I'm not sure.
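To make the 2400-versus-1100 case concrete, here is a small sketch comparing the two ways of computing the expected score. It assumes the logistic Elo formula, a hypothetical K-factor of 10, and an invented five-game result; it does not reproduce the actual USCF or FIDE procedures, and it ignores any cap on rating differences.

```python
def expected_score(rating, opponent):
    """Expected score for one game under the logistic Elo formula (assumed)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


K = 10  # hypothetical K-factor, chosen only for illustration
player = 2400
opponents = [2400, 2400, 2400, 2400, 1100]
results = [0.5, 0.5, 0.5, 0.5, 1.0]  # four draws, then a win over the 1100

score = sum(results)

# Method 1: one expected score per game, then summed.
per_game_expected = sum(expected_score(player, opp) for opp in opponents)

# Method 2: a single expected score against the average opponent rating.
average_opponent = sum(opponents) / len(opponents)
averaged_expected = len(opponents) * expected_score(player, average_opponent)

change_per_game = K * (score - per_game_expected)
change_averaged = K * (score - averaged_expected)

print(f"per-game method:  {change_per_game:+.2f}")
print(f"averaging method: {change_averaged:+.2f}")
```

With these inputs the per-game method leaves the 2400 essentially unchanged, while the averaging method costs him rating points for a tournament in which he drew his peers and won the extra game.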

Yuriy, now you are into some other discussion. Your initial point was:
Quote:
"The average rating in events like these is misleading. My guess is you will score more points facing several players that slightly outrate you and one who is far below you than facing five players who are slightly below you, even if the average is the same. (For example, 5 2620s vs 4 2640s and one 2540.) Based on ratings alone, the Russians should have an advantage."
End Quote.
And
Quote:
"Come on, pj, who would you rather face, five players within 20 points of your rating, or four within 20 points of your rating and one 80 points below it?"
End Quote.

As for 2 game match, consider such an example:
Probability of winning of weaker player is 19%, draw is 60%, win of stronger player is 21%.
They play two games. I guess the probability of the weaker player overperforming is greater than that of the stronger player overperforming (because 1/2 is an overperformance for the weaker player, and is the most likely result by a margin). Indeed you can build examples where the opposite is true, but this is not what we were arguing at the beginning.
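That guess can be checked by enumerating the two-game match. The sketch below uses the win, draw and loss probabilities from the example above and assumes the two games are independent; the helper name `match_distribution` is made up for illustration.

```python
from itertools import product

# Per-game results for the weaker player, with the probabilities from the
# example above: win 19%, draw 60%, loss 21%.
game = {1.0: 0.19, 0.5: 0.60, 0.0: 0.21}


def match_distribution(per_game, games=2):
    """Probability of each total score, assuming independent games."""
    dist = {}
    for combo in product(per_game.items(), repeat=games):
        total = sum(points for points, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        dist[total] = dist.get(total, 0.0) + prob
    return dist


dist = match_distribution(game)
expected = sum(score * p for score, p in dist.items())

# The weaker player overperforms by scoring above expectation; the stronger
# player overperforms when the weaker player scores below it.
p_weaker_over = sum(p for score, p in dist.items() if score > expected)
p_stronger_over = sum(p for score, p in dist.items() if score < expected)

print(f"expected score of weaker player: {expected:.2f}/2")
print(f"weaker player overperforms:   {p_weaker_over:.4f}")
print(f"stronger player overperforms: {p_stronger_over:.4f}")
```

Under these numbers the weaker player's expected score is 0.98/2, and he beats it about 70% of the time, mostly by scoring exactly 1/2.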

*sigh*

What could have been an interesting (for me, anyway) debate about how chess performance results differ once you are playing people from a different skill level and what aspects of ratings are most useful in predicting results, instead turned into a meaningless discussion over whether I at any point said or meant that expected elo rating performance is based on any aspect of opposition rating, but its average (I believe that it's not, largely). Part of the fault lies with me--I wish I had phrased my original "thesis" more clearly to explain what I did not mean, but ultimately I have to give a big who cares to a prolonged debate over what something I wrote three days ago means. If you really think I meant something else and am now trying to cover it up, fine. I don't have much desire (or energy) left to argue otherwise. However, there is a difference between "expected elo rating performance" and "expected performance," because the two words in the latter combination have a definite meaning in the English language that has little to do with elo ratings and everything to do with what kind of results you expect from somebody and are based on a variety of factors, many of which have nothing to do with rating. For example, according to my calculations and FIDE ratings calculator, Kramnik's expected performance in Mexico would be a +2, but when I say that I expect Kramnik to perform well in Mexico, finishing with a +3 or so, it does not at all mean that that's what I think his expected elo rating performance is, such are simply my expectations of Kramnik based on the man's experience, understanding of chess and results against other 2700-plus players in the past year. So when I say that "based on ratings somebody has an advantage" or "somebody would be expected to lose points against certain type of competition" it does not necessarily refer to what somebody's performance should be based on the FIDE rating calculator.

SH, that's an interesting example of a situation where the average rating of opposition is not alone sufficient to figure out expected result (even expected elo rating performance result). Can you tell me more about what the actual numbers are that FIDE uses for its thresholds?

Yuriy - I think it's 400 points.

So Yuriy what you write is "best before" three days only?

Also,
After putting some opinion on performance, you face some arguments, get proven wrong practically in all points. First you were arguing actively with examples and so on, but in the end you say it was all meaningless.
I think it's somewhat childish.

lol@kleyner
cant face the music?
ur laughable arguments(!) were completely pulverized... now what? *sigh*?

Yuri,

Live and learn that most people who post here and hypothesize/argue ad-nauseum with you on insignificant points, in an attempt to feel or seem intelligent themselves, offer very little original thought whatsoever, insomuch as offering no basis for constructive critical thinking.

It would make more sense for you to 'tell' them the sky is not, in fact, blue, but *another* color that you cannot describe without using the term 'blue' so they can visually understand what you are talking about.

[Sigh]

Back to chess.

What I write does not have an expiration date. However, my desire to attempt to explain what I wrote does: roughly 3 days :) At which point I assume the person I am talking to either doesn't want to understand what I am saying or can't. In which case I let him have the satisfaction of thinking I've been "proven wrong practically in all points" (or is it "in practically all points"?) Even though they aren't my points at all, and SH is in fact arguing my side of the original disagreement in his most recent posts.

Looking back, though, my example of rounding to the nearest score is not a particularly good one when the gap is 100 points. According to Sonas, in such a situation the predicted elo outcome in a 2-game match is 1.3-0.7, which means that while it still favors the 2600 (most likely outcome: 1.5-0.5), it doesn't quite fall into the vastly outranked category that I was talking about, which for a 2-game match would be a predicted outcome of 1.75-0.25 or greater.

...ahem... my arguments, *if read correctly* do not support any of YK's hypotheses in the least. The opposite.


SH, according to one side it was: "Average rating is average rating, regardless how it is formed." Your posts, *if I read them correctly*, suggest that not even elo believes this to be the case and hence introduces the concept of a threshold.

I said this:
1. Expected scores are evaluated *per matchup*.
2. Averaging across players (which is what YK did) doesn't work because of thresholds.
3. *Within thresholds* you must assume expectations hold *unless empirical evidence demonstrates otherwise*. And we do have that empirical evidence: Jeff Sonas', which contradicts YK's original hypothesis.

At no time did I make the argument "average rating is average rating no matter what".

You didn't make that argument. Playjunior did, which is what got me started making examples of situations where that's not the case: 2-game match between two vastly differently rated opponents, two groups of players (2700 & 1200) vs (all 2400) whose composition is vastly different, but average is the same, etc.

I also have a question for you about thresholds, but I want to phrase it well, so I will type it up later.

So even making examples and being proven wrong on all of them doesn't convince you?
Is there a rational procedure for proving something to you, Yuriy? Does it need to be done in less than 3 days because afterwards you lose interest (in your own posts?).


"3. *Within thresholds* you must assume expectations hold *unless empirical evidence demonstrates otherwise*."

Isn't it the other way around? The expectation model is true, because the empirical evidence supports it?

Do I understand correctly that thresholds only apply when dealing with a group? Is there a threshold for how low an opponent you can face and still have your result count in rating calculation?

playjunior, I think I understand YOUR standard for proving something. Once you say over and over again that something has been proven wrong, then it's been proven wrong. You will forgive me if I don't respond to your next post if it's in the same vein--other people's arguments while less "rational" are a bit more interesting.

FIDE seems to use a regression-line model which best fits *overall* data.

*Localized* irregularities can arise, as Sonas detected.

If I understand Sonas's paper correctly (and it is fairly complex) it's more than just localized irregularities--in fact there is a model which best fits overall data. His main idea seems to be that the ratings are spread too far apart--that is, a 2500 would overperform against a 2600, because the 2500 is really a 2520.


    About this Entry

    This page contains a single entry by Mig published on August 24, 2007 5:34 PM.
