Replicated Focusing Belief Propagation

The Replicated Focusing Belief Propagation (rFBP) is an entropy-maximization based algorithm operating as the underlying learning rule of feed-forward binary neural networks. Here, the entropy is defined as:

${S(\vec{w},\beta,\gamma) = \frac{1}{N} log \bigg ( \sum_{\vec{w}'} e^{-\beta E(\vec{w}')} e^{\gamma \vec{w}' \cdot \vec{w}} \bigg)}$ ,

where ${\vec{w}}$ is the whole weights set, N is its size and ${\beta}$ is the standard Boltzmann term. Further, ${E(\vec{w})}$ is the energy of such weights set, which is equal to the number of wrong predictions on the training set produced by ${\vec{w}}$ . When ${\beta \to inf}$ , only those ${\vec{w}'}$ with null energy, i.e. perfect solutions, sum up in the entropy. At the time being, the rFBP only works with ${\beta \to inf}$ .

The realization of the rFBP is equivalent to evolve a spin-glass system composed of several interacting replicas with the Belief Propagation algorithm. The density distribution of the system is modelled by:

${P(\vec{w} | \beta,y,\gamma) = \frac{e^{yN S(\vec{w},\beta,\gamma)}}{Z(\beta,y,\gamma)}}$ ,

where Z is the partition function.

Such spin-glass model depends necessarily on two parameters: y and ${\gamma}$ . The former is a temperature-alike related variable, similar to the one usually exploited by Gradient Descend approaches, but it can be also interpreted as the number of interacting replicas of the system. The latter is the penalization term associated to the distance between two weights sets. Indeed, the term ${e^{\gamma \vec{w}' \cdot \vec{w}}}$ in the entropy is larger, when ${\vec{w}}$ and ${\vec{w}'}$ are closer.

The Belief Propagation algorithm needs to be to adjusted by adding incoming extra messages for all weights, in order to involve the interacting replicas of the system. This extra term is represented by:

${\hat{m}^{t_1}_{\star \to \w_i} = tanh \big[ (y-1) artanh ( m^{t_0}_{\w_i \to \star} tanh \gamma ) \big] tanh \gamma}$ ,

where ${\w_i}$ and ${\star}$ stand respectively for the i-th weight and a representation of all i-th replicas.

The rFBP is therefore an adjusted Belief Propagation algorithm, whose general procedure can be summarized as follows:

set ${\beta \to inf}$
select protocols for y and ${\gamma}$ ;
set first values of y and ${\gamma}$ and run the adjusted-BP method until convergence ( ${ < \epsilon}$ ) or up to a limited-number of iterations;
step to the next pair values of y and ${\gamma}$ with respect to the chosen protocols and re-run the adjusted-BP method;
keep it going until a solution is reached or protocols end.

The rFBP algorithm focuses the replicated system to fall step by step into weights sets extremely closed to many perfect solutions ( ${\vec{w}}$ such that ${E(\vec{w})=0}$ ), which ables them to well generalize out of the training set [Baldassi101103].