NMF and Multiplicative Updates

Consider that, given the non-negative vector \(\vu \succeq 0\) and non-negative matrix \(\MPsi \succeq 0\), we want to build the model
$$
\vu \approx \MPsi \vx
$$
with the restriction of a non-negative solution \(\vx \succeq 0\).
Thus, we might want to solve
$$
\min_\vx \frac{1}{2} \|\vu - \MPsi \vx \|_2^2 \; \textrm{subject to} \; \vx \succeq 0
$$
using the Euclidean cost function,
or instead, with the generalized Kullback-Leibler divergence (KLD),
$$
\min_\vx \vu^T \log(\textrm{diag}(\MPsi\vx)^{-1}\vu) - \mathbf{1}^T(\vu - \MPsi\vx) \; \textrm{subject to} \; \vx \succeq 0
$$
where \(\log\vs\) applies the logarithm to each component of \(\vs\).
Finding such solutions is one half of non-negative matrix factorization (NMF), in which \(\MPsi\) and \(\vx\) are estimated in alternation.
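For concreteness, here is a minimal NumPy sketch of the two costs (the function names, and the small \(\epsilon\) guarding the logarithm, are my own choices):

```python
import numpy as np

def euclidean_cost(u, Psi, x):
    """(1/2) * ||u - Psi x||_2^2."""
    r = u - Psi @ x
    return 0.5 * np.dot(r, r)

def generalized_kld(u, Psi, x, eps=1e-16):
    """Generalized KLD: sum_i u_i*log(u_i/(Psi x)_i) - u_i + (Psi x)_i."""
    v = Psi @ x
    return np.sum(u * np.log((u + eps) / (v + eps)) - u + v)
```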


To solve these problems, Lee and Seung (“Algorithms for non-negative matrix factorization,” in Advances in Neural Information Processing Systems, 2000) propose multiplicative updates as an alternative to slower additive approaches such as gradient descent or Newton methods.
They show that a solution to the problem posed with the Euclidean cost can be found iteratively by initializing \(\vx\) with random non-negative values,
and then iterating
$$
\vx \leftarrow \vx.*\MPsi^T\vu./(\MPsi^T\hat\vu + \epsilon)
$$
where \(\hat\vu := \MPsi\vx\), \(\epsilon > 0\) is set small to avoid division by zero,
and where we use the MATLAB notation for point-wise multiplication and division.
This update rule follows from gradient descent on the cost: the gradient with respect to \(\vx\) is \(\MPsi^T(\hat\vu - \vu)\), and choosing the element-wise learning rate \(\vx./(\MPsi^T\hat\vu)\) turns the additive update into the multiplicative one above.
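As a sanity check, here is how that Euclidean update might look in NumPy (a sketch only; the function name, the random initialization, and the fixed iteration count are my assumptions):

```python
import numpy as np

def mult_update_euclidean(u, Psi, n_iter=100, eps=1e-16, rng=None):
    """Multiplicative updates for min_x 0.5*||u - Psi x||_2^2 with x >= 0."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.random(Psi.shape[1])             # random non-negative start
    Ptu = Psi.T @ u                          # Psi^T u is fixed across iterations
    for _ in range(n_iter):
        u_hat = Psi @ x                      # current approximation of u
        x = x * Ptu / (Psi.T @ u_hat + eps)  # x <- x .* Psi^T u ./ (Psi^T u_hat + eps)
    return x
```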
For the problem posed with the KLD cost, a solution can likewise be found iteratively by initializing \(\vx\) with random non-negative values,
and then iterating
$$
\vx \leftarrow \vx.*\MPsi^T(\vu./\hat\vu)./\MPsi^T\mathbf{1}.
$$
Lee and Seung prove in their paper, via a majorization (auxiliary function) argument, that
the Euclidean distance and KLD are nonincreasing under these update rules (no matter whether \(\MPsi\) is skinny or fat),
so the cost converges, though at best to a local minimum.
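A corresponding NumPy sketch for the KLD rule (same caveats as above; I also add a small \(\epsilon\) in the divisions to keep them safe):

```python
import numpy as np

def mult_update_kld(u, Psi, n_iter=100, eps=1e-16, rng=None):
    """Multiplicative updates for the generalized KLD cost with x >= 0."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.random(Psi.shape[1])
    col_sums = Psi.sum(axis=0)                # Psi^T 1, the column sums of Psi
    for _ in range(n_iter):
        u_hat = Psi @ x + eps                 # guard the division u ./ u_hat
        x = x * (Psi.T @ (u / u_hat)) / (col_sums + eps)
    return x
```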

Now what if we desired a solution that had more sparsity than that already motivated by NMF?
Thus, we might want to solve
$$
\min_\vx \frac{1}{2} \|\vu - \MPsi \vx \|_2^2 + \lambda \|\vx\|_1 \; \textrm{subject to} \; \vx \succeq 0
$$
using the Euclidean cost function,
or instead, with the KLD,
$$
\min_\vx \vu^T \log(\vu./\MPsi\vx) - \mathbf{1}^T(\vu - \MPsi\vx) + \lambda \|\vx\|_1 \; \textrm{subject to} \; \vx \succeq 0
$$
Cichocki et al. (A. Cichocki, S.-i. Amari, R. Zdunek, R. Kompass, G. Hori, and Z. He, “Extended SMART algorithms for non-negative matrix factorization,” Lecture Notes in Artificial Intelligence, vol. 4029, pp. 548-562, 2006) show that the multiplicative update rule for the sparsity-regularized problem with the Euclidean cost is
$$
\vx \leftarrow \vx.*\max(\MPsi^T\vu - \lambda\mathbf{1}, \epsilon)./(\MPsi^T\hat\vu + \epsilon)
$$
where \(\epsilon\) (typically around \(10^{-16}\)) is set so that division by zero is avoided
and the solution remains positive.
Here \(\max(\vs,\epsilon)\) is applied to each element of the vector \(\vs\).
Cichocki et al. say that typically \(\lambda \in [0.01,0.5]\).
For the KLD, Cichocki et al. show that the update rule is
$$
\vx \leftarrow \vx.*\max( \MPsi^T(\vu./\hat\vu) - \lambda \mathbf{1}, \epsilon)./\MPsi^T\mathbf{1}
$$
Cichocki et al. say that typically \(\lambda \in [0.001,0.005]\).
Both of these are initialized with non-negative random \(\vx\).
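Both regularized rules fit in one short sketch (again NumPy, with my own function and parameter names; setting \(\lambda = 0\) essentially recovers the unregularized updates):

```python
import numpy as np

def mult_update_sparse(u, Psi, lam, cost="euclidean", n_iter=100,
                       eps=1e-16, rng=None):
    """Sparsity-regularized multiplicative updates (after Cichocki et al.)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.random(Psi.shape[1])
    col_sums = Psi.sum(axis=0)                # Psi^T 1
    for _ in range(n_iter):
        u_hat = Psi @ x
        if cost == "euclidean":
            num = np.maximum(Psi.T @ u - lam, eps)
            den = Psi.T @ u_hat + eps
        else:                                 # generalized KLD
            num = np.maximum(Psi.T @ (u / (u_hat + eps)) - lam, eps)
            den = col_sums
        x = x * num / den
    return x
```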

I have adapted my code for testing compressive sensing recovery algorithms
to test these multiplicative updates.
Taking an ambient dimension \(N=400\) (number of atoms in dictionary, or length of \(\vx\)),
I test a range of problem indeterminacies \(m/N\) (where \(m\) is the number of measurements, or length of \(\vu\)), and sparsities \(s/m\), where \(s\) is the number of non-zero positive elements.
Each sparse vector is distributed Bernoulli (all non-zero elements are 1),
and the elements of each dictionary are sampled independently and identically from
a uniform distribution in \([0,1]\).
I normalize all columns to have unit \(\ell_1\) norm (not unit \(\ell_2\) norm).
Then I test 10 signals for each sparsity and indeterminacy problem,
and find the proportion of signals recovered after no more than 100 multiplicative updates. (I exit the updates if \(\|\vu - \MPsi\vx\| < 10^{-5} \|\vu\|\).)
Before I test for success (exact support recovery, no more and no less)
I debias a solution by a least-squares projection onto the span of the at most \(\min(N,m)\) atoms with the largest magnitudes.
I add no noise to these simulations.
I just want to see what the differences are between the two costs.
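To make the setup concrete, here is a sketch of one trial (it reuses the mult_update_sparse sketch above; the names, the success threshold, and the omission of the early-exit test are my own simplifications):

```python
import numpy as np

def recovery_trial(N=400, m=200, s=20, lam=0.0, rng=None):
    """One trial of the support-recovery experiment described in the text."""
    rng = np.random.default_rng() if rng is None else rng
    # dictionary: iid uniform [0,1] entries, columns scaled to unit l1 norm
    Psi = rng.random((m, N))
    Psi /= Psi.sum(axis=0)
    # Bernoulli sparse vector: s non-zero entries, all equal to 1
    support = rng.choice(N, size=s, replace=False)
    x_true = np.zeros(N)
    x_true[support] = 1.0
    u = Psi @ x_true
    # at most 100 multiplicative updates (the real code also exits early
    # once ||u - Psi x|| < 1e-5 ||u||)
    x = mult_update_sparse(u, Psi, lam, cost="euclidean", n_iter=100, rng=rng)
    # debias by least squares on the min(N, m) largest-magnitude atoms
    idx = np.argsort(np.abs(x))[-min(N, m):]
    x_db = np.zeros(N)
    x_db[idx] = np.linalg.lstsq(Psi[:, idx], u, rcond=None)[0]
    # success = exact support recovery, no more and no less
    # (the 1e-3 threshold for "non-zero" is my own assumption)
    return set(np.flatnonzero(np.abs(x_db) > 1e-3)) == set(support)
```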

Below we see that for the Euclidean cost and \(\lambda = 0\), the phase transition is quite low. When our system of equations becomes overdetermined, the multiplicative update always finds the correct solution, often with only a few iterations.

[Figure: phase transition for the Euclidean cost, \(\lambda = 0\) (bernoulli_Fro_phasevsdistSupp.png)]
For the KLD and \(\lambda = 0\), on the other hand,
we are finding many more solutions to the underdetermined system than
with the Euclidean cost. Again, with an overdetermined system, we find the correct solution every time.

[Figure: phase transition for the KLD, \(\lambda = 0\) (bernoulli_KLD_phasevsdistSupp.png)]
Now it is time to crank up the regularization parameter and see what happens.
With \(\lambda = 0.5\), the constrained problem involving the Euclidean cost performs much worse than before! That is quite strange to me.

[Figure: phase transition for the Euclidean cost, \(\lambda = 0.5\) (bernoulli_Fro_phasevsdistSupp_l.png)]
And with \(\lambda = 0.005\), the constrained problem involving the KLD does not change that much.

[Figure: phase transition for the KLD, \(\lambda = 0.005\) (bernoulli_KLD_phasevsdistSupp_l.png)]
So it appears that if your non-negative problem is overdetermined, and you wish to find a sparse solution, then multiplicative updates with either the Euclidean cost or the KLD provide a quick solution.
On the other hand, for underdetermined problems of particular sparsity and indeterminacy, multiplicative updates with the KLD can provide solutions.
And we needn’t even consider a regularization parameter!
But now I wonder about the speed of convergence for underdetermined problems.
