I’m pleased to advertise my new PhD course (7.5 credits) called “Critical Perspectives on Data Science and Machine Learning”, which will be offered by Electrical Engineering and Computer Science at KTH this winter (Nov–Dec 2019). This course prepares students to reflect critically upon developments in the disciplines of data science and machine learning, within both the commercial and academic spheres. The course can be taken by PhD students with sufficient experience in statistics, data science, and/or machine learning and artificial intelligence.
Upon successful completion of this course, the student will be able to:
- describe and explain problems and pitfalls when interpreting standard experiments performed in these disciplines
- interpret existing work based on fundamental principles (e.g., no free lunch, the bias–variance tradeoff, information theory)
- identify weaknesses and limitations of an existing work, and assess the claims made from the evidence presented
- analyse the reproducibility and replicability of an existing work, and propose improvements
- think broadly about the ethical implications of specific applications of machine learning and data science.
The course is delivered mainly through the presentation of a series of articles (new and “classic”) that reflect upon research in data science, machine learning, and related disciplines such as applied statistics. Student groups will select and present papers, and help lead discussion of the topic.
The scheduled topics are roughly as follows, each with several example readings:
- Introduction, e.g., a review of the aims and fundamental theory of machine learning (generalisation, supervised/unsupervised, etc.) and data science (inferring patterns, etc.). Required reading: Chapters 1 and 2 in: T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2nd ed., 2009; D. Donoho, “50 years of data science,” in Keynote of John W. Tukey 100th Birthday Celebration at Princeton University, 2015.
- Questions of Ethics, e.g., what is good and bad, and how do we know, etc.? Key point: These technologies are not ethically neutral. Readings: A. Holzapfel, B. L. Sturm, and M. Coeckelbergh, “Ethical dimensions of music information retrieval technology,” Trans. Int. Soc. Music Information Retrieval, vol. 1, no. 1, pp. 44–55, 2018; J. Bryson and A. Winfield, “Standardizing ethical design for artificial intelligence and autonomous systems,” Computer, vol. 50, pp. 116–119, May 2017; D. J. Hand, “Aspects of data ethics in a changing world: Where are we now?,” Big Data, vol. 6, no. 3, pp. 176–190, 2018.
- Questions of Performance, e.g., how does one measure success, learning, etc.? Key point: These measures may not be as objective, relevant, reliable, or meaningful as they first appear. Readings: F. Provost, T. Fawcett, and R. Kohavi, “The case against accuracy estimation for comparing induction algorithms,” in Proc. Int. Conf. Machine Learning, pp. 43–48, 1998; E. Law, “The problem of accuracy as an evaluation criterion,” in Proc. Int. Conf. Machine Learning: Workshop on Evaluation Methods for Machine Learning, 2008; B. L. Sturm, “Classification accuracy is not enough: On the evaluation of music genre recognition systems,” J. Intell. Info. Systems, vol. 41, no. 3, pp. 371–406, 2013; F. M.-Plumed, R. B. C. Prudêncio, A. M.-Usó, and J. H.-Orallo, “Making sense of item response theory in machine learning,” in Proc. ECAI, 2016.
- Questions of Learning, e.g., what is it learning? When is that important? Key point: “Learning” is a suitcase word that must be unpacked. Readings: O. Pfungst, Clever Hans (The horse of Mr. Von Osten): A contribution to experimental animal and human psychology. New York: Henry Holt, 1911; D. J. Hand, “Classifier technology and the illusion of progress,” Statistical Science, vol. 21, no. 1, pp. 1–15, 2006; R. Holte, “Very simple classification rules perform well on most commonly used datasets,” Machine Learning, vol. 11, pp. 63–91, 1993; E. Keogh and J. Lin, “Clustering of time series subsequences is meaningless: Implications for past and future research,” in Knowledge and Information Systems, Springer-Verlag, 2004; E. R. Dougherty and L. A. Dalton, “Scientific knowledge is possible with small-sample classification,” EURASIP J. Bioinformatics and Systems Biology, vol. 2013:10, 2013.
- Questions of Data, e.g., what problems can arise (such as bias), how should data be collected, etc.? Key point: Data collection has major impacts that limit conclusions. Readings: S. Tolan, “Fair and unbiased algorithmic decision making: Current state and future challenges,” JRC Technical Reports, JRC Digital Economy Working Paper 2018-10, vol. arxiv:1901.04730, 2018; T. Bolukbasi, K.-W. Chang, J. Zou, V. Saligrama, and A. Kalai, “Man is to computer programmer as woman is to homemaker? Debiasing word embeddings,” in NeurIPS, pp. 4356–4364, 2016; B. L. Sturm, “The state of the art ten years after a state of the art: Future research in music information retrieval,” J. New Music Research, vol. 43, no. 2, pp. 147–172, 2014; M. J. Eugster, F. Leisch, and C. Strobl, “(Psycho-)analysis of benchmark experiments: A formal framework for investigating the relationship between data sets and learning algorithms,” Computational Statistics & Data Analysis, vol. 71, no. 0, pp. 986–1000, 2014.
- Questions of Statistics, e.g., there’s a zoo of statistical tests we can use. Key point: Do not reach for something just because it is convenient. Readings: A. W. Kimball, “Errors of the third kind in statistical consulting,” J. American Statistical Assoc., vol. 52, pp. 133–142, June 1957; D. J. Hand, “Deconstructing statistical questions,” J. Royal Statist. Soc. A (Statistics in Society), vol. 157, no. 3, pp. 317–356, 1994; C. Drummond and N. Japkowicz, “Warning: Statistical benchmarking is addictive. Kicking the habit in machine learning,” J. Experimental Theoretical Artificial Intell., vol. 22, pp. 67–80, 2010; S. Goodman, “A dirty dozen: Twelve p-value misconceptions,” Seminars in Hematology, vol. 45, pp. 135–140, 2008.
- Questions of Experimental Design, e.g., looking at the design of machine learning experiments. Key point: The design is an essential component for making valid conclusions, and it takes a lot of thought and effort to do it well. Readings: E. Alpaydin, Introduction to Machine Learning, ch. Design and Analysis of Machine Learning Experiments, pp. 475–515. MIT Press, 2010; A. Chase, “Music discriminations by carp ‘Cyprinus carpio’,” Animal Learning & Behavior, vol. 29, no. 4, pp. 336–353, 2001; C. Dwork, V. Feldman, M. Hardt, T. Pitassi, O. Reingold, and A. Roth, “The reusable holdout: Preserving validity in adaptive data analysis,” Science, vol. 349, no. 6248, pp. 636–638, 2015; T. Hothorn, F. Leisch, A. Zeileis, and K. Hornik, “The design and analysis of benchmark experiments,” Journal of Computational and Graphical Statistics, vol. 14, no. 3, pp. 675–699, 2005.
- Questions of Sanity, e.g., unreasonable sensitivity to irrelevant changes to an input. Key point: Do not be persuaded that good performance means sane models. Readings: C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in Proc. ICLR, 2014; B. L. Sturm, “A simple method to determine if a music information retrieval system is a ‘horse’,” IEEE Trans. Multimedia, vol. 16, no. 6, pp. 1636–1644, 2014; I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in Proc. ICLR, 2015; A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in Proc. CVPR, pp. 427–436, 2015.
- Questions of Sabotage, e.g., how machine learning research can eat itself, dirty data activism. Key point: Your carefully constructed algorithms present opportunities for unintended uses, exploitation, etc. Readings: N. Collins, “Composing to subvert content retrieval engines,” ARRAY ICMA Online Journal, 2007; N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma, “Adversarial classification,” KDD, pp. 99–108, 2004; J. Su, D. V. Vargas, and K. Sakurai, “One pixel attack for fooling deep neural networks,” arXiv, vol. 1710.08864, 2017; T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer, “Adversarial patch,” arXiv, vol. 1712.09665, 2017.
- Questions of Interpretability, e.g., how do we understand why a system behaves the way it does, etc.? Key point: Tracing cause to effect in these models can be extremely difficult, but it is important, even necessary, in many applications. Readings: B. Kim, R. Khanna, and O. O. Koyejo, “Examples are not enough, learn to criticize! Criticism for interpretability,” in Proc. NIPS, 2016; Z. Lipton, “The mythos of model interpretability,” in Proc. ICML Workshop on Human Interpretability in Machine Learning, 2016; M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should I trust you?’: Explaining the predictions of any classifier,” in Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, 2016; L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, and L. Kagal, “Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning,” ArXiv e-prints, May 2018; B. L. Sturm, “What do these 5,599,881 parameters mean? An analysis of a specific LSTM music transcription model, starting with the 70,281 parameters of its softmax layer,” in Proc. Music Metacreation workshop of ICCC, 2018.
- Questions of Methodology, e.g., how do we do this kind of research, etc.? Key point: The health of the discipline is the responsibility of its researchers. Readings: E. R. Dougherty, “On the epistemological crisis in genomics,” Current Genomics, vol. 9, no. 2, pp. 69–79, 2008; C. Drummond, “Making evaluation robust but robust to what?,” tech. rep., AAAI Press Technical Report WS-07-05, 2007; C. Drummond, “Finding a balance between anarchy and orthodoxy,” in Proc. Int. Conf. Machine Learning: Workshop on Evaluation Methods for Machine Learning III, 2008; Z. C. Lipton and J. Steinhardt, “Troubling trends in machine learning scholarship,” in Proc. ICML, 2018; N. Lavesson and P. Davidsson, “Approve: Application-oriented validation and evaluation of supervised learners,” in Intelligent Systems (IS), 2010 5th IEEE International Conference, pp. 150–155, July 2010; J. Lin and E. Keogh, “Finding or not finding rules in time series,” in Applications of Artificial Intelligence in Finance and Economics (T. B. Fomby and R. C. Hill, eds.), vol. 19 of Advances in Econometrics, pp. 175–201, Emerald Group Publishing Limited, 2004.
- Questions of Application, e.g., (mis)applications of data science and machine learning. Key point: Beware the hype, beware the dangers. Readings: M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim, “Do we need hundreds of classifiers to solve real world classification problems?,” Journal of Machine Learning Research, vol. 15, pp. 3133–3181, 2014; B. L. Sturm, O. Ben-Tal, U. Monaghan, N. Collins, D. Herremans, E. Chew, G. Hadjeres, E. Deruty, and F. Pachet, “Machine learning research that matters for music creation: A case study,” J. New Music Research, vol. 48, no. 1, pp. 36–55, 2018; Z. Wallmark, “Big data and musicology: New methods, new questions,” tech. rep., American Musicological Society National Meeting, Pittsburgh, PA, Nov. 2013; K. L. Wagstaff, “Machine learning that matters,” in Proc. Int. Conf. Machine Learning, pp. 529–536, 2012.