I posed this puzzle a few days ago, partly as an exercise in experimental design.
What is different between the two test conditions?
One thing that is different is that the bottle shape is visible in the first condition, but not visible in the second. Even though we carefully covered the label to conceal its identity, we were naive to the fact that the bottle shape can carry information about the region from which the wine comes.
The following picture shows the variety of shapes.
From the left, bottles of this shape usually come from Bordeaux; the next bottle shape from Burgundy; the next from Rhône; the next from Champagne; and the next from Côtes de Provence. In our dataset then, the bottle shape is confounded with its region. We thus inadvertently trained our experts not to recognize the region from which the wine comes based on the contents of the bottles, but instead on the shape of its container. Hence, when Team Småg can no longer see the bottle, they must guess.
I came up with this example to show how easy it can be to believe that one’s learning experiment is valid, and that the figure of merit reflects real-world performance, but that is actually invalid due to an independent variable of which the experimenter is unaware. This fact is directly discussed as limitations by Ava Chase in her article, “Music descriminations by carp (Cyprinus carpio)“:
Even a convincing demonstration of categorization can fail to identify the stimulus features that exert control at any given time, especially if the stimuli are complex. In particular, there can be uncertainty as to whether classification behavior had been under the stimulus control of the features in terms of which the experimenter had defined the categories or whether the subjects had discovered an effective discriminant of which the experimenter was unaware. The diversity of S+/S- pairings presumably rules out the possibility that the fish could have been relying on only a single discriminant, such as timbre, but a constant concern is the possible existence of a simple attribute that would have allowed the subjects merely to discriminate instead of categorizing. (emphasis mine)