Hello, and welcome to Papers of the Day (Po’D): Finding or Not Finding Rules in Time Series Edition. I found today’s papers by accident when I was searching for texts to use in my forthcoming class, “Artificial Intelligence Programming.” In our tubular library I found the once-opened book, “Advances in Econometrics, vol. 19 (2004): Applications of Artificial Intelligence in Finance and Economics.” I thought this would provide some keen examples, since kids these days are interested in making money. (I have written one lab for the students where they apply linear prediction to modeling stock prices.) In this book, however, I found the paper: J. Lin and E. Keogh, “Finding or Not Finding Rules in Time Series.” I was hooked from one of its first sentences: “In this work we make a surprising claim. Clustering of time series subsequences is meaningless!” What follows is what I call, “time series analysis cage fighting at its best.”

In the interest of extreme brevity, here is my one line description of the work in this paper, in my own words:

You want to know what’s meaningless? Clustering subsequences of time series, that’s what.

The basic idea of subsequence time series (STS) clustering is simple enough. We have some streaming time series (stock prices, say) that we wish to index or classify, or from which we want to predict future values and discover rules. So we make an assumption about the duration of the phenomenon of interest, and slide a little window over the streaming time series to generate subsequences. To get some structure out of all these subsequences, we cluster them hierarchically, or by k-means, with respect to some defined similarity measure and an assumed number of clusters. Then we use the cluster centers, or the hierarchy, to discover rules and patterns, detect anomalies, index the series, or classify new subsequences. If anything sounds reasonable, that does. And it inspired work documented in dozens of papers, if not a hundred, claiming interesting results and whatnot. Now it turns out, as the authors show here, that STS clustering (whether by k-means, hierarchical, or anything else) will produce essentially the same results independent of the input data. Hence their aptly applied label: "meaningless."
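The whole recipe fits in a few lines. Here is a minimal sketch of STS clustering as described above (a one-sample slide and a bare-bones k-means of my own; a real experiment would use a proper library implementation), run on a random walk standing in for stock prices:

```python
import numpy as np

def sliding_subsequences(series, w):
    """Extract all length-w subsequences with a one-sample slide."""
    n = len(series) - w + 1
    return np.array([series[i:i + w] for i in range(n)])

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means (Euclidean) returning the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each subsequence to its nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster empties.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers

# A random walk stands in for the streaming time series.
rng = np.random.default_rng(1)
walk = np.cumsum(rng.standard_normal(1000))
subseqs = sliding_subsequences(walk, w=32)
centers = kmeans(subseqs, k=4)
print(centers.shape)  # (4, 32)
```

The punchline of the paper is that the four length-32 centers this produces tell you essentially nothing about the input: run it on real data and on noise and you get the same smooth, sine-like shapes.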

Above I reproduce a graph from the paper, which looks like a pleasant place to live. The z-axis is in units of meaninglessness, with zero being barely meaningless and 1 being pretty darn meaningless. They quantify this by comparing the clusters found in successive runs of the k-means algorithm on the real data with the clusters found using STS of random data (in this case a random walk). The authors test over several numbers of clusters and subsequence sizes (100 tests for each), and from the results we see that STS clustering has a much higher level of meaninglessness than whole clustering, where entire non-overlapping sequences are clustered instead of sliding windows. This means that if someone pays you to run STS clustering on each new dataset they measure, you could just keep reusing any random dataset. The authors show the same result for hierarchical clustering, and for other datasets and algorithms. AND they repeat the experiment using two datasets that are extremely different from one another, after which they claim, "In our view, this experiment sounds the death knell for clustering of STS time series. If we cannot easily differentiate between the clusters of these two extremely different time series, then how could we possibly find meaningful clusters in any data."
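The comparison behind that measure can be sketched roughly like this (my own simplification; the paper's exact cluster-set distance and normalization differ in detail). The idea: if the run-to-run scatter of cluster centers on the real data is as large as the scatter between real-data runs and random-walk runs, the ratio approaches 1 and the clustering is meaningless:

```python
import numpy as np
from itertools import combinations

def cluster_distance(A, B):
    """Distance between two sets of cluster centers: for each center
    in A, the squared Euclidean distance to its nearest center in B,
    summed over A."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def meaninglessness(runs_real, runs_rand):
    """Ratio of the average run-to-run distance on the real data to the
    average distance between real-data runs and random-data runs.
    Near 0: stable, data-dependent clusters. Near 1: meaningless."""
    within = np.mean([cluster_distance(a, b)
                      for a, b in combinations(runs_real, 2)])
    between = np.mean([cluster_distance(a, b)
                       for a in runs_real for b in runs_rand])
    return within / between

# Toy check: perfectly repeatable centers, far from the random ones.
A = np.array([[0.0, 0.0], [1.0, 1.0]])
B = np.array([[1.0, 1.0], [0.0, 0.0]])
print(cluster_distance(A, B))            # 0.0 (same set, reordered)
print(meaninglessness([A, A, A], [A + 5.0]))  # 0.0 (very meaningful)
```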

So what happens when you take several different time series and find the best STS cluster center for each? The figure below tells the story: you get no differentiability between any of them! The authors go on to strike more blows by proving that the only time series for which STS clustering will produce meaningful results are stupid time series that no one really cares about: there must be k approximate repetitions of a length-w pattern, the weighted mean of these patterns must sum to a horizontal line, and each of the k repetitions must be self-similar in the same way (trivial matches). Not content with tolling the death knell of STS clustering, the authors then pick the most-cited paper from the STS clustering crowd and replicate its results using many realizations of random-walk data instead of stock prices.
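The "horizontal line" condition is easy to see numerically. Here is a sketch (my own construction, not the paper's) of exactly such a degenerate series: k exact repetitions of a length-w sine pattern. Because the pattern is w-periodic, the pointwise mean over all sliding windows comes out (nearly) flat:

```python
import numpy as np

w, k = 64, 8
t = np.arange(k * w)
# k exact repetitions of a length-w pattern: one sine cycle per window.
series = np.sin(2 * np.pi * t / w)

windows = np.array([series[i:i + w] for i in range(len(series) - w + 1)])
# Peak-to-peak spread of the mean window: essentially a horizontal line.
print(np.ptp(windows.mean(axis=0)))  # ~0
```

This is the same averaging effect that washes out structure in the general case; the difference is that here the repetitions line up so perfectly that the clusters are at least stable, if uninteresting.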

Other papers related to this work by the authors are the following.

- E. Keogh and J. Lin, "Clustering of Time Series Subsequences is Meaningless: Implications for Past and Future Research," Knowledge and Information Systems (KAIS), Springer-Verlag, 2004.
- E. Keogh, J. Lin and W. Truppel, “Clustering of Time Series Subsequences is Meaningless: Implications for Past and Future Research,” in Proc. IEEE Int. Conf. Data Mining, pp.115-122, Melbourne, FL., Nov. 2003.
- E. Keogh, J. Lin and W. Truppel, “(Not) Finding Rules in Time Series: A Surprising Result with Implications for Previous and Future Research,” in Proc. Int. Conf. Artificial Intelligence, Las Vegas, NV. June 2003.

Also see A Briefly Annotated Timeline of Similarity Search in Time Series for some of the work that these papers call into question.

Eamonn Keogh has previously roundly criticized work in the field of time series similarity search (when the field was nearly a decade old) for being biased, inconclusive, and in general a distraction from progress: E. Keogh and S. Kasetty, "On the need for time series data mining benchmarks: a survey and empirical demonstration," in KDD '02: Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pp. 102-111, New York, NY, USA, 2002.

Now, I wonder, what effect does transforming the subsequences have on these results? For instance, what if we take the Fourier transform of each subsequence and cluster on the magnitudes of its first few coefficients, à la Agrawal et al. 1993?
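Concretely, the transform I have in mind looks like this (a sketch in the spirit of the Agrawal et al. F-index; the coefficient count is an arbitrary choice of mine, and whether this rescues STS clustering is exactly the open question):

```python
import numpy as np

def dft_features(subseq, n_coeffs=4):
    """Magnitudes of the first few DFT coefficients, skipping the DC
    term. Discarding phase makes the features invariant to where the
    pattern sits within the window."""
    spectrum = np.fft.rfft(subseq)
    return np.abs(spectrum[1:1 + n_coeffs])

rng = np.random.default_rng(0)
window = np.cumsum(rng.standard_normal(64))
print(dft_features(window).shape)  # (4,)
```

One would then feed these short feature vectors, rather than the raw subsequences, to k-means and rerun the meaninglessness comparison.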

Nobody I know is 'making' money. Quite a few, however, are interested in transferring money from OPM to their pots :-)

Igor.


Nice reference, especially the part about the clusters degrading into basis functions.

We can look at fixed-length subsequences as a specific case (with a lag of one sample) of delay coordinates, a technique used to characterize the nonlinear dynamics of time series:

http://www.mpipks-dresden.mpg.de/~tisean/TISEAN_2.1/docs/chaospaper/node6.html

Anyway, if the embedding parameters are good and we have a sequence that has some kind of determinism, then we end up with some kind of trajectory in state-space. There may be some centers of mass… but the whole thing is connected, hence no natural clusters.
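For the record, the delay-coordinate map mentioned above is a one-liner, and with lag 1 it reproduces exactly the sliding-window subsequence matrix (a minimal sketch; function and parameter names are mine):

```python
import numpy as np

def delay_embed(x, dim, lag):
    """Map a scalar series into dim-dimensional delay coordinates:
    row i is (x[i], x[i+lag], ..., x[i+(dim-1)*lag])."""
    n = len(x) - (dim - 1) * lag
    return np.array([x[i:i + dim * lag:lag] for i in range(n)])

t = np.arange(0.0, 20.0, 0.1)
# A sine series traces a closed loop (connected trajectory) in this
# state space -- centers of mass, perhaps, but no natural clusters.
states = delay_embed(np.sin(t), dim=3, lag=5)
print(states.shape)  # (190, 3)
```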

Being more selective on which subsequences are represented (the approach in the paper) seems a reasonable way to confront the problem.
