Greedy Methods, Dynamic Range, and Coherence, pt. 2

From my post yesterday I have received several helpful comments:

  1. Igor at Nuit Blanche provided some analysis (Thanks Igor!) of my sampling setup with respect to the Donoho-Tanner phase transition. We see from the image below that with an “oversampling ratio” of \(\delta = 50/250 = 0.2\) (number of measurements over dictionary size), and an “undersampling ratio” of \(\rho = 13/50 = 0.26\) (sparsity over number of measurements), recovery by BP will not happen in most cases. (Under the solid blue curve live sparse vectors that can be recovered by BP from random projections onto sensing matrices that satisfy some restricted isometry property… I think.)

  2. A colleague has highlighted the connection between this phase transition and the “thresholding effect” observed in estimation in parametric models, i.e., the complete breakdown of an estimator when some assumption in a model is slightly broken.
  3. A commenter (Alejandro from the Colorado School of Mines — hey Alejandro, I will be visiting my folks in Evergreen next week. Maybe we can get a cup of coffee in Golden?) has suggested that OMP can in fact correct itself through its reprojection.
    This is something that I must explore since I am astonished by it.


Yesterday, I also promised some figures showing the distribution of successful and failed sparse signal recovery by OMP.
From Phil’s comments about dynamic range, and Tropp’s work on coherence on recovery success using OMP, I wanted to see how these two variables are related for failed and successful recoveries depending on the distributions of the sparse vectors.
To quantify “dynamic range” of a sparse vector, I take and sort its non-zero magnitude values from high to low. Then I divide by the maximum value so it begins with 1.
Then I find the slope of the best line in a least squares sense, which I equate with the dynamic range. Thus, a sparse vector with consistent large differences between its non-zero elements will have a large dynamic range.

To quantify the “confusion coherence” (as I call it), I take from Tropp and compute:
$$\max_{\vd \in \MD\backslash\MH(m)} \left | \left| [\MH^H(m)\MH(m)]^{-1}\MH^H(m) \vd \right |\right|_1$$
where \(\MH(m)\) are the actual atoms present in the measurement, and \(\MD\backslash\MH(m)\) are the rest of the atoms in the dictionary.
(Actually, I need to make sure that \(\MH(m)\) is always full rank. Done.)
When the confusion coherence is small, then the atoms present in the measurement are quite distinguishable from the rest of the atoms; and vice versa when it is large.
For each of the 13-sparse vectors of length 250 dimensions, and measurements of 50 dimensions created using some iid standard normal sensing matrix with unit norm columns, I plot these joint values for each successful and unsuccessful recovery by OMP.

Below we see the relationships for sparse vectors with non-zero elements distributed standard Normal. OMP is quite successful here, but what do you know! I can’t see any distinction.

StdNorm.jpg
Now, below are the results where the sparse vector is distributed as a sum of two unit-variance Gaussians, one with mean +3, one with mean -3. OMP is not so successful in this scenario, with near 30% recovery rate; but what do you know! I can’t see any distinction.

Norm+3.jpg

Finally, below are the results where the sparse vector is distributed as uniform over \([−2, −1] \cup [1, 2]\). Here OMP is bordering on 10% success; but what do you know!!!! I can’t see any fracking distinction.

Unif12.jpg
Why why why? I have tried many permutations of the “confusion coherence”, and still I cannot get any good separation between the two clods. Does coherence not play any role in the failings or successes of OMP?
This lack of separation is so excruciating that today I am introducing the “coherence confusion adjustment” defined as
$$\Xi := \begin{cases}
-1, & \textrm{OMP success} \\
+1, & \textrm{OMP fails}
\end{cases}$$
and thus I redefine the “confusion coherence” as
$$\Xi + \max_{\vd \in \MD\backslash\MH(m)} \left | \left| [\MH^H(m)\MH(m)]^{-1}\MH^H(m) \vd \right |\right|_1.$$
Norm+3joke.jpg
YES! That makes me feel much better. I call this technique, “informed sparse recovery.”

Advertisements

One thought on “Greedy Methods, Dynamic Range, and Coherence, pt. 2

  1. Bob,
    FWIW, wrt the threshold alluded to, you and your colleague might be interested in Donoho and Tanner’s paper
    Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing
    or
    # Breakdown Point of Model Selection when the Number of Variables Exceeds the Number of Observations, Victoria Stodden, David L. Donoho
    # Model Selection with Many More Variables than Observations, presentation by Victoria Stodden
    featured in: https://sites.google.com/site/igorcarron2/donohotannerphasetransition
    Igor.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s