Here is a story I rather like to tell in front of a cozy fire place that shows that when the disciplines of computer science and signal processing meet, the interaction is not always complete.

In the paper, R. Agrawal, C. Faloutsos, and A. Swami, “Efficient similarity search in sequence databases,” in Proc. Int. Conf. Foundations Data Org. Algo., (Chicago, IL), pp. 69-84, Oct. 1993, the authors propose a method of

similarity search in time series of the same size (“whole matching” instead of “subsequence matching”).

They use the discrete Fourier transform (DFT) to transform each time series,

and approximate its shape by keeping the first 3-5 complex DFT coefficients,

which they justify by the observation that many real world signals have power spectral densities that drop off like \(f^{-\alpha}\), with \(\alpha > 1\).

They define the similarity between two time series as the Euclidean distance between their frequency-domain features.

And the Parseval equality guarantees that by comparing subsets of elements of the DFT instead of the entire transform, we can only make two time-series appear more similar, not less. (Thus, no false dismissals.)

This approach reduces the amount of computation in comparing signals,

and turns “exact” search to “similarity” search by looking at shapes instead of details.

Then, four years later, the paper, D. Rafiei and A. Mendelzon, “Efficient retrieval of similar time sequences using DFT,” in Proc. Int. Conf. Found. Data Org., (Kobe, Japan), pp. 249-257, Nov. 1998, the authors show:

“… an improvement of the known DFT-based indexing technique for fast retrieval of similar time sequences. We use the last few Fourier coefficients in the distance computation without storing them in the index since every coefficient at the end is the complex conjugate of a coefficient at the beginning and as strong as its counterpart. … We show analytically that this observation can accelerate the search time of the index by more than a factor of two.”

Well, that bit about the conjugate symmetric transform is nearly right. By assuming the time-series are all real valued, then we don’t have to search over the larger domain of analytic signals. It is amazing that it took about five years to come to that discovery; but hey, maybe in 1993 the DFT was still very exotic and not generally well-understood.

### Like this:

Like Loading...

*Related*