I’m happy to be speaking at the 6th Swedish Workshops on Data Science at Umeå University, Nov. 20-21 2018.
Abstract: A “horse” is a system that is not actually addressing the problem it appears to be solving. The inspiration for the metaphor is the real-life example of Clever Hans, a horse that appeared to have great skill in mathematics but had actually learned to respond to a prosaic cue confounded with the correct answer. Similarly, a model created through the statistical treatment of a large dataset and wielded by a data scientist can also appear successful for solving a complex problem, but actually not be. In this talk, I take a critical look at past applications of data science — exemplifying contemporary practices — and identify where issues arise that affect the validity of conclusions. I argue that the onus is on the data scientist to not stop at describing how well a model performs on a given dataset (no matter how big it may be), but to go further and explain what they with their models are actually doing. I provide some examples of how researchers have identified and tamed “horses” in my research domain, music informatics.