# Maybe the fourth time will be the charm

If you are coming from Nuit Blanche, or even if you aren’t, I want to make clear that my aims with this post are not to complain about peer review, or to claim I have been wronged, or to poke at what I really believe is good work from nine anonymous and competent reviewers. Peer review is an awesome privilege; and I try to take it as seriously as I can when I review. With the ICASSP deadline more than a few months away, I have time to put more thought into the next revision, and to address all the comments that have been raised.

Just to be clear on my overall intentions, I use my blog as a hypertext record of my research and ideas, a public account to the Danish tax payer, and a personal experiment in presenting in near real-time how my research unfolds. (When I collaborate with others on work though, I do not discuss it here unless we all agree it is fine to do… which is why it has been quiet here for a while.) Research is rough, and yet so much fun. It is not as clean as the final journal paper appears — which took me time to appreciate during my PhD.

What follows is a synopsis of the submission and review process of particular work of mine I have been trying to publish for over a year. Since I will submit it again, I am reviewing all of the helpful comments by the nine reviewers so that I make sure the fourth time is the charm!

First, in March 31, 2011, I submitted it to IEEE Signal Processing Letters. My submission was a summary of the outcomes of several experiment I performed in which I measure signal recovery performance by eight different algorithms from compressive measurements of sparse vectors distributed in seven different ways. I only considered noiseless measurements, and all my experiments were run with an ambient dimension of 400. I tested a range of problem sparsities and indeterminaces, and looked at transitions from high to low probability of recovery. The “big” result, I assumed, is that things change dramatically based on how the sparse vector is distributed. All of a sudden, my world was changed when I saw error constrained $$\ell_1$$ minimization is sometimes not the best. It can be significantly beaten by something as simple as a greedy approach. (I also proposed an ensemble approach, where the best solution is selected from those produced by several algorithms. I wanted to see what how much better all the results could be.) On June 4, 2011, my submission was rejected.

In general, the reviews were good:

Reviewer 1:

Recommendation: R – Reject (Paper Is Not Of Sufficient Quality Or Novelty To Be Published In This Transactions)

It is a really nice paper. However, I suggest the author submits this
paper to IEEE Transactions on Image Processing as a Correspondense.
Because of the page limits of the Signal Processing Letters, the author
has removed most of the simulation results from the paper. For example
there are no figures regarding the TST or the thresholding algorithms
and the author just claims several things about those algorithms and the
reader is not able to evaluate the correctness of these statements. So,
again I suggest the author submits a longer version of this paper to IEEE Trans. on Sig. Proc.

Reviewer 2:

Recommendation: R – Reject (Paper Is Not Of Sufficient Quality Or Novelty To Be Published In This Transactions)

The author considers the sparse linear inverse problem under the prism
of a class of sparse approximation algorithms. The aim of the paper is
to study how the signal’s underlying distribution affects the
performance of each algorithm in this class; in particular, the author
briefly describes 7 sparse recovery algorithms, well-known and
well-studied in the literature. The author provides some experimental
results along with some comments on the behaviour of two algorithms, OMP
and BP.

Motivation:
According to the paper, the motivation of this work is to better
understand how prior information of the signal can help the recovery
procedure by picking the most robust algorithm from an “ensemble” of
methods.

1. Novelty: The contribution of the paper can be considered relatively
small. The author, based on well-defined algorithms, compare the
recovery performance of various signal distributions only
experimentally. The comments that accompany the experimental results are
obvious from the figures and no further analysis on the subject is
provided.

2. According to the text, the author describes 7 different algorithms
but gives experimental results for only two of them: OMP and BP. To
support the premise that different algorithms have varying performance
depending on the distribution of the signal, a more complete set of
figures and data-tables should have been provided.

3. A “multi-model” system is quite obvious that outperforms any of the
algorithms under consideration alone, thus this idea cannot be
considered as novelty. In addition, as the author states, this model has
an huge computational overhead since a series of (potentially
computationally expensive) algorithms are employed for the recovery of a
single signal.

Conclusion:
As a conclusion, this paper considers the well-known observation that
prior signal information affects the performance of the recovery
algorithm used; but, apart from giving some illustrative examples, it
does not provide any analysis about the reasons beneath this behavior.
This work is slightly novel and needs more experimental data and a more
elaborate analysis of the results.

I completely agreed with all these reviewer comments (and now that I
look at them again, I agree even more). However, the dependence between signal distribution and recovery algorithm performance does not appear so well-known to me. I still see many papers just testing a proposed algorithm with a single distribution. Maybe it is well-known among the experts; but certainly not new people coming into the field.

So, I set out to expand the
experimental work, and to add a small bit of analysis. (At least an
amount with which I was comfortable.) I rewrote the entire document,
expanded the number of algorithms I test to 15, included many figures,
and saw the paper grow to 31 pages. On July 15, 2011, I submitted this version
for consideration of a special issue in the EURASIP Journal on Advances
in Signal Processing devoted to compressed sensing. But on November 24,
2011, it was rejected.

In general, the reviews were not as good as before. I seemed to have derevised the paper. :/

Reviewer 1

The paper attempts to compare the signal recovery
performance of 15 existing techniques. The codes used in this paper are
the novelty of the work is very minor.

The presentation of the paper has also been quite poor, with unusual
English presentation style for a technical paper, and many grammatical
errors.

What is more important is that no explicit conclusions have been reached
after the comparisons. Some observations, such as “I do not know at
this time what causes this behavior”, are not acceptable. This makes the
comparisons less convincing throughout the paper. The author needs go
through these observations carefully to make sure the comparisons are
correctly done. Also, how does the choice of different parameters affect
the results.

Some figures, are difficult to read, for example, Figure 6.

Symbols are not always defined such as delta in equation (20), and lamda has been used twice with different meanings.

Reviewer 2

The author proposes a large numerical analysis of recovery of sparse vectors from random measurements. Fifteen
algorithms, seven distributions are tested and two ways to measure the
quality of the results are studied. As far as I know such a work has not
been done yet.

Beyond several weaknesses, the article suffers from a critical flaw.

The transition phase of Basis Pursuit for Gaussian or USE matrices is
theoretically known for years and studied especially by Donoho. It was
proved that there exists a limit ratio s/m depending on m/N ensuring
recovery of all (or most) sparse vectors. The existence of such a limit
ratio s/m depending on m/N ensuring recovery (more precisely it is an
asymptotic result) is not obvious but it is proved for BP. Author
generalizes this result to all other algorithms. There is no reason to
think that such a limit ratio is a function of m/N. It may depend on m
and N but not only on the ratio. Author makes the assumption is true and
since only one value of N is used there is no way to know if it is
correct or not. Even 2 or 3 values of N were tried, it would be not
sufficient to guarantee the correctness of this strong assumption. I
would be impressed if author can prove the correctness of this
assumption for the 14 other algorithms.

The fact that this question does not appear in the paper is surprising
and problematic. For this reason, I do not recommend this paper for
publication.

(I would be impressed too if I could prove the correctness of that assumption. :)

Reviewer 3

of different “compressed sensing” algorithms according to the
probabilistic distribution of the coefficients. Such a comparison is an
interesting work, but the actual version of the paper suffers of a lack
of precision in the presentation.

Major Compulsory Revisions

The author should write the math of the studied model more clearly. The
generated vectors are “true” sparse vectors. Then, entries of a vector
are distributed according various distributions. Consequently, the
resulting distribution is a mixture of a Bernoulli and a continuous pdf.
The author should clearly write this distribution. And then, it appears
more clearly that one of the studied distribution is the classic
Bernoulli-Gaussian model.

Once the model is written, choosing a distribution is at end choosing a
“prior” in a Bayesian setting. Even if such a point of view can be
discussed, a link with the Bayesian interpretation would be welcome.

The author compares experimentally how an algorithm is able to recover a
sparse vector, depending of its underlying distribution. Still from a
Bayesian point of view, it would be very interesting to compare these
results with a classical MCMC with the “true” prior, as reference. Then,
this would give an interesting answer of how the algorithms are able to
approximate the different prior in comparison with the Bayesian
reference. I know that demands a lot of work (and mainly computational
time. . . ) but the answer would be very interesting.

The author never explains why he choose all this family of distribution.
Some model are common in signal processing (such as the Bernoulli-
Gaussian model). Can the author gives a word on the different priors?

Description of some algorithm missed important information. For instance:

Iterative soft thresholding algorithm solves the Basis Pursuit Denoising
problem. And, for a sufficient small parameter $$t_k$$, it solves the Basis
Pursuit. This is never said. Consequently, ISTA can then be described in Section D Recovery by Convex Relaxation. Furthermore, the
choice of the parameters in ISTA is driven by the theory: we must choose
$$k < 2/\|\Phi\|^2$$ in order to ensure the convergence. This
condition is necessary (and I have myself observed the non convergence
if this condition is not satisfied).

AMP only works with random matrices. It is the case in this paper, but
someone looking for an algorithm in order to solve a general BP problem
can misunderstood this. The author must stress this condition.

In ALPS, the author says that $$\mu_k$$ is chosen according to FISTA.
However, FISTA gives this $$\mu_k$$ after a theoritical studied of the
speed of convergence. Then, there is no reason to chose the same μk in
this algorithm. I would prefer that the author uses the “true” FISTA,
and presents it as an acceleration of ISTA. Then FISTA and ISTA solves
the same BP(DN) problem, but FISTA is in practice much faster than ISTA.
The experiments made by the author can however show a difference of
robustness of this two algorithms.

I insist: (F)ISTA should appear in Section D, with GPSR.

Smoothed l0 is not at all convex, and then should not appears in Section D Recovery by convex relaxation.

Why the author choose to do “Debiasing” after convergence of the algorithm?

In addition, why performing hard thresholding after debiasing?

The paper is an experimental paper. The author has chosen to studied a
“true” sparse model. However, “true” sparse vectors do not really exist
in real application (even if they are “compressible”). Then I expect a
word from the author on how the results can be interpreted for real
application? From this point of view, the (Rl2) error seems more
appropriate, and additional figures with the evolution of this error
with respect to s/m could be informative (as Fig. 9 and 10 with the
probability of recovery).

The author should explain more precisely what is the empirical phase transition for people who are not familiar with.

Indeed, with most of these comments, I agree! I think I overextended myself by adding so many algorithms to the mix that I couldn’t fairly treat the ones that require some finesse. So I cut out much of the paper at the last minute to make the submission deadline of EUSIPCO 2012. I just wanted to keep the bare bones moral of this story: simulate your proposed CS recovery algorithm with several sparse vector distributions. I submitted this paper on March 6, 2012; and on May 29, 2012, I received a third rejection.

In general, the reviews were better than before. I seemed to have brought the paper back from its derevision.

Reviewer 1

I agree with the opening tenant of the author that CS recovery is often a function of the distribution of the coefficients and that much work does not provide an exhaustive comparison between methods.

This is old ground well known in the community. When discussing robustness to signal pdf there should be some mention of minimax type performance (which L1 roughly exhibits). Otherwise if looking at “tuned” algorithms then Bayes based techniques should have been considered (e.g. Bayes optimal AMP).

The results in the paper are not novel and provide very limited insight into the real issues of sparse recovery and compressed sensing.

Reviewer 2

The paper provides a detailed empirical evaluation of several CS algorithms and studies their performance under different non-zero-coefficient distributions.

I found figures 2 and 3 somewhat difficult to read, maybe zooming in on the y axis would help. For example, in fig2(a) I can’t see which curve is OMP and which is IHT. Maybe use different line styles, you should have 4 available and using these together with different grey levels should improve readability.

Reviewer 3

– This is an extensive study of the performance of CS reconstruction algorithms for various coefficient distributions.
– The gathered data are definitely useful for CS researchers or people interested in using CS for a specific application (although a noisy model would have been more relevant there).
– There is a good variety of distributions.
– The main interest of the paper is perhaps that it stresses how much assessing the performance of an algorithm on 1 or 2 distributions is insufficient. Unfortunately, this is not uncommon in the CS literature.
– The paper is generally well-organised and well-written.

– It would have been interesting to see some results in larger dimensions. N=400 is extremely small compared to e.g. applications in image processing. I understand that the computational cost could be prohibitive though, especially with iid Gaussian matrices and non-optimised implementations of the algorithms (CVX for l1 seems a bit of a performance killer).
– typo in 2.2: The uniform bimodal distribution is apparently in [-4,-3]U[3, 4]
– For following work (in the conclusion), it could be interesting to also consider “compressible” distributions (e.g. generalized Gaussian distributions with appropriate parameter values) instead of only sparse signals.

– It is very difficult to get any insight from the results or generalize to other distributions.
– There are few conclusions beyond “it (sometimes significantly) depends on the distribution” (and perhaps “forget about ROMP”).
– These results may only be useful for people doing research on CS algorithms.
– For CS applications, the performance may be very different since signals are rarely exactly sparse and measurements usually contain some noise. This could be stressed in the paper, as people trying to apply CS might be tempted to pick the algorithm that performs the best in this paper and be sorely disappointed…

Reviewer 4

The author empirically compares 11 CS recovery algorithms as a function of signal distributions. The conclusion is that a recovery algorithm should be carefully chosen to match the distribution of the underlying sparse signals.

The theme of this paper is a numerical comparison of different algorihtms. I don’t see much novelty. The conclusion is not significant in the sense that it has been a common sense in the literature.

So, what to do? Talking with several practitioners about the difficulty I have had in publishing this work, they have all acknowledged that the fact distribution affects recovery results is a “folk knowledge;” but is important to make known to new people.

Thus, I am writing an ICASSP paper discussing “cautionary tales” of testing compressed sensing recovery. For instance, I will show experimentally that three things can have a significant affect on the results of a simulation: 1) the criterion for exact recovery; 2) the distribution of the sensed sparse signal; 3) the stopping conditions of algorithms. These might be common sense, but I still see a lot of work that does not properly take them all into account, or worse, explicitly state the criteria used. I am thinking this would make the paper more acceptable, and useful in general.

## 3 thoughts on “Maybe the fourth time will be the charm”

1. Anonymous says:

Poor boy, I didn’t read your paper, but the following one might be interesting for you!
“Compressible distributions for high-dimensional statistics” by Gribonval, et. al.

Like

2. Thank you anonymous commenter! I am aware of that interesting work, and just heard Davies talk about it in London. It is a nice contradition that Laplace distributed signals are not really compressible signals. Anyhow, I discussed my empirical work while visiting one of the authors last year, and he encouraged me to refine and submit it again … which I will continue to do. Just have to find the right slant. :)

Like

3. Hi, Bob,
I feel sorry for your papers. I do like these papers and learn a lot from them. I hope you are lucky in your future submission.
Below are my several concerns:
1. A reviewer said that your claimed some points but didn’t show the figures. How about presenting these missed figures in complementary material? Many journals allow authors to submit complementary material, even some conferences.
2. The reviewers of my recent two papers asked me to give some experiments on real-world data. And in recent conferences, when I talked with people in the field, I strongly feel that people start to realize/doubt how good some famous compressed sensing algorithms are in real-world applications. In this sense, I feel maybe you want to add some experiments on real-world datasets (at least in case your future reviewers ask so). But if you use images, I do not suggest to use the images in Matlab or the simple medical images used in the compressed sensing literature. Images with natural views (such as those used in sparse coding) are more representative for image compression (they are not sparse enough in transformed domains, which is consistent with what a reviewer commented, and may satisfy the reviewer).
3. For the figures, I suggest you use color lines with different line styles, instead of gray/black lines. This makes the figures more easy to see.
4. According to your goal in this study, is it needed to consider some Bayesian algorithms?

Like