Hello, and welcome to Paper of the Day (Po’D): Single-channel and multi-channel sinusoidal audio coding using compressed sensing Edition. Today’s paper presents the first audio coding system using compressed sensing (CS): A. Griffin, T. Hirvonen, C. Tzagkarakis, A. Mouchtaris, and P. Tsakalides, “Single-channel and multi-channel sinusoidal audio coding using compressed sensing,” IEEE Trans. Audio, Speech, Lang. Process., vol. 19, pp. 1382-1395, July 2011. Also see its subreferences:
- A. Griffin and P. Tsakalides, “Compressed sensing of audio signals using multiple sensors,” in Proc. European Signal Process. Conf., (Lausanne, Switzerland), Aug. 2008.
- A. Griffin, T. Hirvonen, A. Mouchtaris, and P. Tsakalides, “Encoding the sinusoidal model of an audio signal using compressed sensing,” in Proc. IEEE Int. Conf. Multimedia Expo, (Cancun, Mexico), June 2009.
- A. Griffin, C. Tzagkarakis, T. Hirvonen, A. Mouchtaris, and P. Tsakalides, “Exploiting the sparsity of the sinusoidal model using compressed sensing for audio coding,” in Proc. SPARS’09, (St. Malo, France), Apr. 2009.
- A. Griffin, T. Hirvonen, A. Mouchtaris, and P. Tsakalides, “Multichannel audio coding using sinusoidal modelling and compressed sensing,” in Proc. European Signal Process. Conf., (Aalborg, Denmark), Aug. 2010.
Griffin et al. present the first CS coding system for
the stationary sinusoidal parts of sampled audio signals.
First, their system finds a sparse representation of a frame of multichannel audio (20 ms with 50% overlap)
using a dictionary of windowed sinusoids and psychoacoustic-adaptive matching pursuit; see R. Heusdens and S. van de Par, “Rate-distortion optimal sinusoidal modeling of audio and speech using psychoacoustic matching pursuits,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., (Orlando, FL), pp. 1809-1812, May 2002; and S. van de Par, A. Kohlrausch, R. Heusdens, J. Jensen, and S. H. Jensen, “A perceptual model for sinusoidal audio coding based on spectral integration,” EURASIP J. Applied Signal Process., no. 9, pp. 1292-1304, 2005 (relevant Po’Ds here).
Then their system forms a whitened 25-sparse time-domain approximation,
and then senses the synthesized sinusoidal signal by a sensing matrix of
randomly-selected rows of the identity matrix.
Since the sparsity is known, I see this as functioning like a
lowpass filter that prevents aliasing before uniform time-domain sampling.
Their system then uniformly quantizes these compressive measurements,
and transmits them with side information for error correction and whitening.
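As a concrete picture of the sensing and quantization steps, here is a minimal sketch in Python (my own illustration, not the authors’ code; the 44.1 kHz sampling rate, the measurement count, the example sinusoids, and the quantizer step size are all my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 44100
N = int(0.02 * fs)        # 20 ms frame: 882 samples at 44.1 kHz (assumed rate)
M = 200                   # number of compressive measurements (assumed)

# Stand-in for the synthesized sparse sinusoidal frame:
# a few windowed sinusoids with assumed amplitudes and frequencies.
t = np.arange(N) / fs
window = np.hanning(N)
x = window * sum(a * np.cos(2 * np.pi * f * t)
                 for a, f in [(1.0, 440.0), (0.5, 1230.0), (0.25, 2090.0)])

# Sensing with randomly selected rows of the identity matrix is just
# random subsampling in the time domain:
idx = np.sort(rng.choice(N, size=M, replace=False))
y = x[idx]                # the M compressive measurements

# Uniform quantization of the measurements (step size assumed):
step = 2.0 ** -8
y_q = step * np.round(y / step)
```

Note that no dense sensing matrix is ever formed: multiplying by randomly selected identity rows amounts to keeping M of the N time-domain samples.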
In reconstruction, the system uses variants of SL0 (Po’D here),
with an error correction routine in case recovery is not successful.
The authors report listening tests showing that reconstruction is often quite good.
This work by Griffin et al. presents several interesting questions for further investigation.
First, since the system whitens the signal component,
the amplitudes of the sparse representation become nearly constant.
Within CS, such sparse signals are known to be
the least favorable for recovery,
i.e., they require more measurements for a given sparsity (see my technical report, for instance).
Even though the signals are exactly sparse,
SL0 does not perform very well for such signals (see again my technical report, for instance).
Instead, BP or AMP can provide much better recovery guarantees.
Alternatively, by retaining the dynamic range of the elements,
one might be able to use a low-complexity greedy recovery approach,
e.g., OMP, to significantly boost
the likelihood of good recovery (see again my technical report, for instance).
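To make the greedy alternative concrete, here is a minimal OMP sketch (again my own illustration; the dimensions, sparsity, Gaussian sensing matrix, and decaying-amplitude profile are assumptions chosen to mimic a sparse representation with a large dynamic range):

```python
import numpy as np

def omp(A, y, k):
    """Plain orthogonal matching pursuit: k rounds of greedy atom
    selection, each followed by least-squares refitting on the support."""
    residual = y.copy()
    support = []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # Refit all selected atoms jointly and update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(1)
N, M, k = 256, 80, 8
# Sparse vector with decaying amplitudes (large dynamic range),
# the regime in which greedy recovery tends to do well:
s = np.zeros(N)
pos = rng.choice(N, size=k, replace=False)
s[pos] = 2.0 ** -np.arange(k)
A = rng.standard_normal((M, N)) / np.sqrt(M)  # Gaussian sensing matrix
y = A @ s
s_hat = omp(A, y, k)
```

On a whitened, nearly constant-amplitude support, by contrast, the greedy selection step has no dominant atom to latch onto, which is exactly why the dynamic range is worth keeping.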
Second, the benefits of applying CS in this context are unclear.
In the coding of a sparse representation (SR) of an audio signal (see the results in S. van de Par, A. Kohlrausch, R. Heusdens, J. Jensen, and S. H. Jensen, “A perceptual model for sinusoidal audio coding based on spectral integration,” EURASIP J. Applied Signal Process., no. 9, pp. 1292-1304, 2005),
which is essentially the first part of the system proposed by Griffin et al.,
we clearly see a significant benefit in coding the SR of the signal itself,
and with excellent results shown by listening tests —
even when the residual error is not added back, as done in the tests by Griffin et al.
Thus, it is left to answer what benefits, if any,
CS provides to compressing and coding acoustic signals.
Or is it?
In V. K. Goyal, A. K. Fletcher, and S. Rangan, “Compressive sampling and lossy compression,” IEEE Signal Process. Mag., vol. 25, pp. 48-56, Mar. 2008, the authors show that “compressive sampling combined with ordinary quantization is a bad compression technique.” Indeed, compressive sampling of audio introduces extra lossy compression, since quantization of the measurements makes the probability of successful recovery fall quickly no matter how many measurements are taken.
It seems to me then that the CS within the audio codec presented by Griffin et al. is an unnecessary and dangerous component since one can just code the SR itself and achieve the same or better bit rates without teasing the demons of recovery.