

Time performance

As described in the previous sections, DNetPRO is a combinatorial algorithm, and therefore its implementation requires particular care to optimize the computational performance as much as possible. The theoretical optimization strategies described so far must be validated by quantitative measurements.

The time evaluation compares the Cython (C++ wrap) implementation against the pure Python (naive) implementation shown in the previous snippet. The timings were collected with the `timeit` Python module, which makes it easy to repeat multiple runs of a given algorithm [1]. In our simulations we monitored the three main parameters related to the algorithm efficiency: the number of samples, the number of variables and, since we provide a parallel multi-threading implementation, the number of threads. For each combination of parameters we performed 30 runs of both algorithms and extracted the minimum execution time. The tests were performed on a classical bioinformatics server (128 GB of RAM and 2 Intel E5-2620 CPUs, with 8 cores each). The results are shown in the following figures: in each plot, two parameters are kept fixed and we evaluate the remaining one.
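The measurement protocol can be sketched as follows; this is our illustration, not the thesis code, and both the synthetic data and the `run_algorithm` stand-in are hypothetical placeholders for the real Cython-wrapped entry point.

```python
import timeit

import numpy as np

# Hypothetical synthetic dataset: `n_samples` observations of `n_features` variables.
n_samples, n_features = 100, 90
rng = np.random.default_rng(42)
data = rng.standard_normal((n_samples, n_features))

def run_algorithm():
    # Stand-in workload for the call under test (e.g. the Cython-wrapped
    # DNetPRO entry point); replace it with the real function.
    np.corrcoef(data, rowvar=False)

# repeat=30 independent measurements of a single execution each;
# the minimum over the repetitions is the estimate least contaminated
# by interference from other processes on the machine.
times = timeit.repeat(run_algorithm, repeat=30, number=1)
print(f'best of 30 runs: {min(times):.6f} s')
```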

Fig. 1: Execution time of the `DNetPRO` algorithm as a function of the number of variables (number of samples and number of threads kept fixed), comparing the pure-`Python` (orange) and `Cython` (blue, `C++` wrap) implementations.

Fig. 2: Execution time of the `DNetPRO` algorithm as a function of the number of samples (number of variables and number of threads kept fixed), comparing the pure-`Python` (orange) and `Cython` (blue, `C++` wrap) implementations.

Fig. 3: Execution time of the `DNetPRO` algorithm as a function of the number of threads (number of variables and number of samples kept fixed), comparing the pure-`Python` (orange) and `Cython` (blue, `C++` wrap) implementations.

In all our simulations the efficiency of the (optimized) Cython version is clearly visible, with a gap between the two implementations that reaches more than 10^4 seconds. It is equally important to highlight how the two codes scale with the various parameters. While the performance of both implementations scales quite well with the number of features (Fig. 1), the trend is different when varying the number of samples (Fig. 2): the Cython curve saturates almost immediately, while the computational time of the Python implementation keeps growing. This behavior reflects the optimizations introduced in the Cython version, which allow the DNetPRO algorithm to be applied to larger datasets without losing performance.

An opposite behavior is found when monitoring the number of threads (Fig. 3): the Python version scales quite well with the number of threads [2], while the Cython trend is more unstable. This instability is probably due to a non-optimal schedule of the parallel section: the work is not equally distributed among the available threads, which penalizes the code efficiency by creating a bottleneck on the slowest thread. The above results were computed with 90 features, so the parallel section distributes 180 (2 × N) iterations among the available threads: when the number of iterations is a multiple of the number of threads (12, 20 and 30 in our case), the time performance is maximized. Despite this, the computational efficiency of the Cython implementation remains so much better than the Python one that its usage is indisputable.
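This divisibility argument can be made concrete with a quick back-of-the-envelope script (our sketch, not thesis code): assuming a static schedule that splits the loop into contiguous chunks, the largest chunk bounds the parallel runtime, so thread counts that divide 180 evenly avoid a straggler thread.

```python
import math

ITERATIONS = 180  # size of the parallel loop for the 90-feature test above

for threads in (8, 12, 16, 20, 24, 30):
    # Under a static schedule each thread gets a contiguous chunk of
    # iterations; the largest chunk determines the parallel runtime.
    largest_chunk = math.ceil(ITERATIONS / threads)
    ideal_chunk = ITERATIONS / threads  # perfectly balanced share
    imbalance = largest_chunk / ideal_chunk
    print(f'{threads:2d} threads: largest chunk = {largest_chunk:2d}, '
          f'imbalance factor = {imbalance:.2f}')
```

For 12, 20 and 30 threads the imbalance factor is exactly 1, matching the thread counts at which the measured times are best.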


  1. We stress that we could use the `timeit` Python module only because we provide a Cython wrap of our DNetPRO implementation. We also highlight that, albeit minimal, the Python superstructure penalizes the computational performance: the best results are obtained with the pure C++ version of the code.

  2. The optimal result would be linear scalability with the number of threads, but such efficiency is always difficult to reach. A reasonably good result is, therefore, a progressive decrease of the execution time as the number of threads increases.
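     A minimal formal statement of this notion (our notation, not from the thesis): denoting by $T(1)$ the single-thread execution time and by $T(t)$ the time measured on $t$ threads, the speedup is

     $$ S(t) = \frac{T(1)}{T(t)} , $$

     and linear scalability corresponds to $S(t) = t$. In practice $S(t) < t$, because of scheduling overhead and the residual serial sections of the code.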