Cook’s consensus: standing on its last legs
A bird reserve hires a fresh enthusiast and puts him to do a census. The amateur knows there are 3 kinds of birds in the park. He accompanies an experienced watcher. The watcher counts 6 magpies, 4 ravens and 2 starlings. The new hire gets 6 magpies, 3 ravens and 3 starlings. Great job, right?
No, and here’s how. The new person was not good at identification. He mistook every bird for everything else. He got his total the same as the expert but by chance.
If one looks just at aggregates, one can be fooled into thinking the agreement between birders to be an impressive 92%. In truth, the match is abysmal: 25%. Interestingly this won’t come out unless the raw data is examined.
Suppose, that instead of three kinds of birds there were seven, and that there are a thousand of them instead of twelve. This is the exact situation with the Cook consensus paper.
The Cook paper attempts validation by comparing own ratings with ratings from papers’ authors (see table 4 in paper). In characteristic fashion Cook’s group report only that authors found the same 97% as they did. Except this agreement is solely of the totals – an entirely meaningless figure
Turn back to the bird example. The new person is sufficiently wrong (in 9 of 12 instances) that one cannot be sure even the matches with the expert (3 of 12) aren’t by chance. You can get all birds wrong and yet match 100% with the expert. The per-observation concordance rate is what determines validity.