On Wednesday, May 4, our department hosted Dr. Rolf Bardeli of the Competence Center NetMedia at the Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS) in St. Augustin, Germany. He gave a great talk summarizing his impressive work in speech and audio processing over the past eight years. I first became aware of his work when I reviewed his paper, “Similarity search in animal sound databases,” IEEE Trans. Multimedia, vol. 11, pp. 68-76, Jan. 2009. In his talk, Rolf discussed four main areas, all centered on making the content of massive amounts of data searchable: the “controlled domains” of music search and retrieval, and of content extraction and annotation alignment for news broadcast archives; and the “wild domains” of bioacoustic archives, and of humanistic archives for speech and language studies. The latter are “in the wild” because in bioacoustics the recording conditions are difficult, and when aligning text to speech recordings of dying languages there is not enough data to build accurate models for speech recognition.
What I like about this work is that it takes a formalized approach to similarity search in terms of group theory. In only a few lines, it makes clear what it means to compare two pieces of music and the conditions under which they are considered similar, and it also yields an efficient search strategy for large databases of documents.
This latter bit is extremely important for creating and deploying useful applications.
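To give a flavor of how matching "up to a group of transformations" can lead to fast search, here is a toy sketch of my own. It is not Rolf's code, and the details are my assumptions: documents are sets of (time, pitch) events with integer times, the group is simply the integer time shifts, a query matches when some shifted copy of it is contained in a document, and the names `build_index` and `search` are made up for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Inverted index: map each feature to the (doc_id, time) positions where it occurs."""
    index = defaultdict(set)
    for doc_id, events in docs.items():
        for time, feature in events:
            index[feature].add((doc_id, time))
    return index


def search(index, query):
    """Return every (doc_id, shift) for which the query, moved by `shift`, lies inside the document."""
    hits = None
    for q_time, feature in query:
        # Each query event proposes the shifts that would align it with an indexed occurrence.
        shifted = {(doc_id, time - q_time) for doc_id, time in index.get(feature, ())}
        # A shift is a match only if every query event proposes it, so intersect.
        hits = shifted if hits is None else hits & shifted
        if not hits:
            return set()
    return hits or set()


# Toy data (assumed): events are (time step, MIDI pitch).
docs = {
    "tune_a": [(0, 60), (1, 62), (2, 64), (3, 60)],
    "tune_b": [(0, 64), (2, 60), (3, 62), (4, 64)],
}
index = build_index(docs)
query = [(0, 60), (1, 62), (2, 64)]
print(search(index, query))
```

In this sketch the search never scans the documents themselves: it only intersects the shifted inverted lists of the features that appear in the query, so the cost depends on the lengths of those lists rather than on the size of the whole database.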
Rolf showed four applications that are, or soon will be, public: searching animal sound archives, making the content of video news and other videos searchable, finding and comparing the content of speeches by particular politicians (an extremely useful tool for what the staff at The Daily Show with Jon Stewart do manually), and software for the automatic annotation of field study videos.
Rolf has made his slides available here.