

Algorithm Optimization

The rFBP algorithm is a learning algorithm developed to describe the learning process of a binary neural network framework. The model is based on a spin-glass distribution of neurons arranged on a fully connected neural network architecture. In this way each neuron is identified by a spin, so each weight entry can only assume binary values (-1 and 1). The learning rule which controls the weight updates is given by the Belief Propagation method.
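
As a schematic reminder, written in a standard factor-graph notation used here only for illustration, the sum-product Belief Propagation messages exchanged between a variable node $i$ (a weight) and a factor node $a$ (a training pattern) read

$$
\nu_{i\to a}(\sigma_i) \propto \prod_{b\in\partial i\setminus a} \hat{\nu}_{b\to i}(\sigma_i),
\qquad
\hat{\nu}_{a\to i}(\sigma_i) \propto \sum_{\{\sigma_j \,:\, j\in\partial a\setminus i\}} \psi_a\left(\{\sigma_j\}_{j\in\partial a}\right) \prod_{j\in\partial a\setminus i} \nu_{j\to a}(\sigma_j),
$$

where $\psi_a$ is the factor enforcing the correct classification of pattern $a$ and $\partial i$, $\partial a$ denote the node neighbourhoods. Since every $\sigma_i$ is binary, each message can be stored as a single magnetization value, which is the quantity handled by the MagP64 and MagT64 representations discussed below.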

A first implementation of the algorithm was proposed in the original paper [BaldassiE7655] jointly with an open-source Github repository. The original version of the code was written in the Julia language and, despite being a quite efficient implementation, the Julia language remains difficult and far from many users. To broaden the scope and use of the method, a C++ implementation was developed together with a Cython wrap for Python users. The C++ language guarantees better computational performances than the Julia implementation, while the Python version enlarges its usability. This implementation is optimized for parallel computing and is endowed with a custom C++ library called Scorer (see Appendix D for further details), which is able to compute a large number of statistical measurements based on a hierarchical graph scheme. With this optimized implementation we aim to encourage researchers to approach these alternative algorithms and to use them more frequently in real contexts.

Like the Julia implementation, the C++ one provides the entire rFBP framework in a single library callable via a command line interface. The library makes extensive use of template syntax to perform dynamic specialization of the methods between the two magnetization versions of the algorithm. The main object categories needed by the algorithm are wrapped in handy C++ objects which are easy to use also from the Python interface. A further optimization is given by the reduction of the number of available functions: in the original implementation a large number of small functions are used to perform a single complex computation step, enlarging the call stack; in the C++ implementation the main functions are rewritten minimizing the call stack, to ease the vectorization of the code.
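
As an illustration of the resulting interface, the following sketch shows how the wrapped objects could be called from Python. The class and argument names used here (`ReplicatedFocusingBeliefPropagation`, `Mag`, `max_iter`, `nth`) are assumptions inspired by the public repository and may not match the released API exactly.

```python
# Indicative sketch of the Python interface exposed by the Cython wrap.
# Class/argument names are assumptions and may differ from the released API.
import numpy as np
from ReplicatedFocusingBeliefPropagation import ReplicatedFocusingBeliefPropagation as rFBP
from ReplicatedFocusingBeliefPropagation import Mag

# toy binary dataset: M samples of N binary (+/- 1) variables
M, N = 20, 101
X = np.random.choice([-1, 1], size=(M, N))
y = np.random.choice([-1, 1], size=(M,))

# the magnetization argument selects between the MagP64 and MagT64 specializations
model = rFBP(mag=Mag.magP64, max_iter=128, nth=4)  # nth: number of threads (assumed name)
model.fit(X, y)
predictions = model.predict(X)
```

In this way the choice between the two magnetization specializations reduces to a single constructor argument, while the heavy computation stays in the compiled C++ core.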

The full rFBP library is released under the MIT license and is open-source on Github [ReplicatedFocusingBeliefPropagation]. The on-line repository also provides a full list of installation instructions, which can be performed via CMake or Makefile. The continuous integration of the project is guaranteed on every operating system using Travis CI and Appveyor CI, which test more than 15 different C++ compilers and environments.

The Python wrap also guarantees a good integration with the other common Machine Learning tools provided by the scikit-learn Python package; in this way we can use the rFBP algorithm as an equivalent alternative in other pipelines. Like other Machine Learning algorithms, the rFBP one depends on many parameters, i.e. its hyper-parameters, which have to be tuned according to the given problem. The Python wrap of the library was written in accordance with the scikit-optimize Python package to allow an easy hyper-parameter optimization using the already implemented classical methods.
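
As a hedged example of this integration, the snippet below sketches how the estimator could be plugged into a standard scikit-learn pipeline and tuned with scikit-optimize; the rFBP constructor arguments (`max_iter`, `damping`) are assumed names and may not match the released API exactly.

```python
# Indicative sketch: rFBP inside a scikit-learn pipeline, tuned with scikit-optimize.
# The rFBP hyper-parameter names below are assumptions.
import numpy as np
from sklearn.pipeline import Pipeline
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from ReplicatedFocusingBeliefPropagation import ReplicatedFocusingBeliefPropagation as rFBP

X = np.random.choice([-1, 1], size=(50, 101))
y = np.random.choice([-1, 1], size=(50,))

# the estimator exposes the usual fit/predict interface, so it can be a pipeline step
pipe = Pipeline([('clf', rFBP())])

# Bayesian optimization over hypothetical hyper-parameter ranges
search = BayesSearchCV(
    pipe,
    {
        'clf__max_iter': Integer(100, 1000),
        'clf__damping': Real(0.1, 0.9),
    },
    n_iter=16,
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```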

Figure 1: Comparison of time performances between the original `Julia` implementation and our `Cython` one of the `rFBP` algorithm, varying the input dimension sizes (number of samples, `M`, and number of variables, `N`). For each input configuration 100 runs of both algorithms were performed and the results were normalized by the `Julia` implementation. In this case we fixed the magnetization to **MagP64**.

Figure 2: Comparison of time performances between the original `Julia` implementation and our `Cython` one of the `rFBP` algorithm, varying the input dimension sizes (number of samples, `M`, and number of variables, `N`). For each input configuration 100 runs of both algorithms were performed and the results were normalized by the `Julia` implementation. In this case we fixed the magnetization to **MagT64**.

We first tested the computational efficiency of our implementation against the original Julia one. The tests were performed comparing our Cython version of the code (and thus with a slight overhead given by the Python interpreter) against the Julia implementation taken as reference. Varying the dimension sizes (number of samples, M, and number of variables, N), we measured the time efficiency over 100 runs of both algorithms. We divided our simulations according to the two possible types of magnetization (MagP64 and MagT64, as described in the original implementation) and the obtained results are shown in Fig. [1, 2], respectively.
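
A simplified sketch of the timing procedure on the Python side is reported below; the dataset generation and the constructor arguments are assumptions, not the exact benchmark code.

```python
# Simplified sketch of the timing benchmark (Python/Cython side only).
# Constructor arguments and dataset generation are assumptions.
import time
import numpy as np
from ReplicatedFocusingBeliefPropagation import ReplicatedFocusingBeliefPropagation as rFBP
from ReplicatedFocusingBeliefPropagation import Mag

def benchmark(M, N, n_runs=100):
    # random binary patterns of shape (M, N) with binary labels
    X = np.random.choice([-1, 1], size=(M, N))
    y = np.random.choice([-1, 1], size=(M,))
    times = []
    for _ in range(n_runs):
        model = rFBP(mag=Mag.magP64)  # magnetization fixed as in Fig. 1
        tic = time.perf_counter()
        model.fit(X, y)
        times.append(time.perf_counter() - tic)
    return np.mean(times), np.std(times)

for M, N in [(10, 101), (100, 101), (100, 1001)]:
    print((M, N), benchmark(M, N))
```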

As can be seen from the two simulations, our implementation always overcomes the time performances of the original one, taken as reference in the plots. However, we cannot guarantee a perfect parallel execution of our version: even with multi-threading support, the scalability of our implementation does not follow a linear trend with the number of available cores. In our simulations, in fact, we used 32 cores against the single-thread execution of the Julia implementation, but we gained only a 4x and 2x speedup for MagT64 and MagP64, respectively. The network training is a sequential process by definition and thus it is hard to obtain a relevant speedup using a parallel implementation. In this case the limitation is probably related to a sub-optimal parallelization strategy, which leads to a non-efficient scalability of our algorithm version. However, the improvements performed on the code allow us to use this algorithm with bigger dataset sizes.
