Tuesday, August 23, 2011

Avoiding the overflow problem in softmax and similar expressions


While underflow is a common problem in probabilistic machine learning models like HMMs, overflow problems are common in neural-network-like models.
In the case of overflow in an expression of the type

\[ p(i) = \frac{f(i)}{\sum_j f(j)}, \]

it may happen that either f(i) or the denominator overflows.
A possible solution is to put the expression in a softmax-style form,

\[ p(i) = \frac{e^{m(i)}}{\sum_j e^{m(j)}}, \]
and use a trick for this expression, which says that you can rewrite it as

\[ p(i) = \frac{e^{m(i) - K}}{\sum_j e^{m(j) - K}}, \]

where K can be any constant: multiplying both numerator and denominator by e^{-K} leaves the ratio unchanged. With a suitable K this will make sure that the exponentials don't blow up and cause overflow.

Now that we know this, all we have to notice is that we can write

\[ f(i) = e^{\log f(i)}, \qquad \text{i.e.} \qquad m(i) = \log f(i), \]

and substitute it into the initial expression to put it in the softmax form and use the trick.

Still, we need to select K. Although I don't know of any sophisticated method, what works for me is to set K = max(m(i)), or equivalently K = max(log f(i)). This makes the largest exponent zero, so the biggest m(i) cannot overflow. Of course, this might cause the smaller m(i)'s to underflow to zero, but their contribution to the sum would have been negligible anyway.
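Here is a minimal sketch of the whole recipe in Python/NumPy (the function name and example values are mine, not from any particular source):

```python
import numpy as np

def stable_softmax(m):
    """Compute e^{m(i)} / sum_j e^{m(j)} without overflow.

    m holds the log-numerators, m(i) = log f(i). Shifting by
    K = max(m) makes the largest exponent zero, so np.exp never
    overflows; tiny terms underflow to zero, which is harmless since
    their contribution to the sum is negligible anyway.
    """
    K = np.max(m)
    e = np.exp(m - K)
    return e / np.sum(e)

m = np.array([1000.0, 1001.0, 1002.0])  # naive np.exp(m) would overflow
print(stable_softmax(m))                # [0.09003057 0.24472847 0.66524096]
```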

One place where this worked for me was while implementing a Discriminative RBM (DRBM) as in [1], where you can see that p(y|x) has exactly the form discussed here.
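For concreteness, here is a hedged sketch of that case, assuming the usual form of the DRBM conditional, p(y|x) ∝ exp( d_y + Σ_j softplus(c_j + U_{jy} + (Wx)_j) ); all variable names and shapes are my own choices for illustration:

```python
import numpy as np

def drbm_p_y_given_x(x, W, c, d, U):
    """p(y|x) for a DRBM, computed with the K = max trick.

    Assumed shapes (mine, for illustration): x (D,), W (H, D),
    c (H,), d (Y,), U (H, Y).
    """
    act = c + W @ x                       # hidden pre-activations, (H,)
    # softplus(z) = log(1 + e^z), computed stably via logaddexp
    m = d + np.logaddexp(0.0, act[:, None] + U).sum(axis=0)  # (Y,)
    e = np.exp(m - m.max())               # shift by K = max(m)
    return e / e.sum()
```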

Wednesday, April 13, 2011

Notes on graph cuts

Graph cuts are among the most popular techniques in computer vision for doing inference in MRFs (Markov random fields).
The most famous papers are:

Fast approximate energy minimization via graph cuts - Boykov, Y., Veksler, O. and Zabih, R.

and the follow-up

What energy functions can be minimized via graph cuts? - Kolmogorov, V. and Zabih, R.

Here I will comment on some points regarding the first of these papers.

A graph cut is a partition of the nodes of a graph into two zones, one containing the source and one containing the sink.
The minimum cut and the maximum flow are equivalent (Ford–Fulkerson theorem).
CV people cast their problems as graphs and use graph-cut algorithms to solve them.
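The min-cut/max-flow equivalence is easy to check on a toy graph; here is a sketch using networkx (the graph and capacities are made up):

```python
import networkx as nx

# A tiny directed graph with a source "s" and a sink "t".
G = nx.DiGraph()
G.add_edge("s", "a", capacity=3.0)
G.add_edge("s", "b", capacity=1.0)
G.add_edge("a", "t", capacity=2.0)
G.add_edge("b", "t", capacity=3.0)

flow_value, _ = nx.maximum_flow(G, "s", "t")
cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")

print(flow_value, cut_value)    # 3.0 3.0 -- equal, as Ford-Fulkerson says
print(source_side, sink_side)   # the two zones of the partition
```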

There are two popular move-making algorithms: alpha-beta swap and alpha-expansion.
In a computer vision problem where the pixels (nodes) take binary labels, these algorithms give exact results. If the edge cost is V = |f_p - f_q|, i.e., the absolute value of the label difference (labels are numbers), exact minimization is also possible.

There are approximate algorithms for semi-metric and metric edge costs.

Typical edge costs are (see the sketch below):
Truncated L2 distance, V(a, b) = min(K, ||a - b||) <-- makes sense for continuous "labels", e.g. intensity values.
Potts interaction penalty, V(f_p, f_q) = delta(f_p != f_q) <-- for class "labels", when a distance between labels is hard to define.
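A minimal sketch of these two costs in Python (the function names and truncation constant are mine):

```python
import numpy as np

def truncated_l2(a, b, trunc=5.0):
    """Truncated L2 distance min(K, ||a - b||), for continuous labels
    such as intensity values; `trunc` plays the role of K."""
    return min(trunc, float(np.linalg.norm(np.atleast_1d(a) - np.atleast_1d(b))))

def potts(fp, fq):
    """Potts penalty: 1 if the labels differ, 0 otherwise. Suits class
    labels with no natural distance between them."""
    return float(fp != fq)

print(truncated_l2(10.0, 12.0))   # 2.0
print(truncated_l2(10.0, 100.0))  # 5.0, the truncation kicks in
print(potts("sky", "grass"))      # 1.0
```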


Moves: operations on the labeling of the graph.
alpha-beta swap := some nodes labeled alpha become beta and some nodes labeled beta become alpha; all other nodes, whose labels are neither alpha nor beta, keep their label.

Requires semi-metric edge costs (V's).

alpha-expansion := some nodes change from their current label to the alpha label, and all nodes already labeled alpha stay the same.

Requires metric edge costs.
If a local minimum is found with respect to all possible alpha-expansions, its energy is at most a constant factor away from that of the globally optimal solution (so we know how far we can be from the best solution; see the toy sketch below).
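Here is a toy sketch of the alpha-expansion outer loop on a tiny 1-D problem. For illustration, the best expansion move is found by brute force over all subsets of nodes; in the real algorithm this step is a minimum cut on the graph G_a. All names here are mine:

```python
import itertools

def energy(labels, data_cost, V):
    """Standard MRF energy: data terms plus pairwise smoothness terms."""
    d = sum(data_cost(p, l) for p, l in enumerate(labels))
    s = sum(V(labels[p], labels[p + 1]) for p in range(len(labels) - 1))
    return d + s

def best_alpha_expansion(labels, alpha, data_cost, V):
    """Brute-force stand-in for the min-cut step: try every subset of
    nodes switching to alpha and keep the lowest-energy labeling."""
    best, best_e = list(labels), energy(labels, data_cost, V)
    for mask in itertools.product([False, True], repeat=len(labels)):
        cand = [alpha if m else l for m, l in zip(mask, labels)]
        e = energy(cand, data_cost, V)
        if e < best_e:
            best, best_e = cand, e
    return best

def alpha_expansion(labels, label_set, data_cost, V):
    """Cycle over the labels, applying the best expansion for each,
    until no move lowers the energy."""
    improved = True
    while improved:
        improved = False
        for alpha in label_set:
            new = best_alpha_expansion(labels, alpha, data_cost, V)
            if energy(new, data_cost, V) < energy(labels, data_cost, V):
                labels, improved = new, True
    return labels

# Noisy 1-D binary signal with a Potts smoothness term.
obs = [0, 0, 1, 0, 1, 1]
data_cost = lambda p, l: (obs[p] - l) ** 2
potts = lambda a, b: 0.6 * (a != b)
print(alpha_expansion(list(obs), [0, 1], data_cost, potts))  # [0, 0, 0, 0, 1, 1]
```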

The trick in both alpha-beta swaps and alpha-expansions is to build a graph G_ab or G_a whose minimum cut corresponds to the best alpha-beta swap or alpha-expansion move.