Horses in Umeå!

I’m happy to be speaking at the 6th Swedish Workshop on Data Science at Umeå University, Nov. 20-21, 2018.

Title: Be a responsible data scientist: Identify and tame your “horses”

Abstract: A “horse” is a system that is not actually addressing the problem it appears to be solving. The inspiration for the metaphor is the real-life example of Clever Hans, a horse that appeared to have great skill in mathematics but had actually learned to respond to a prosaic cue confounded with the correct answer. Similarly, a model created through the statistical treatment of a large dataset and wielded by a data scientist can appear to solve a complex problem without actually doing so. In this talk, I take a critical look at past applications of data science that exemplify contemporary practices, and identify where issues arise that affect the validity of conclusions. I argue that the onus is on data scientists not to stop at describing how well a model performs on a given dataset (no matter how big it may be), but to go further and explain what they and their models are actually doing. I provide some examples of how researchers have identified and tamed “horses” in my research domain, music informatics.
