Time performance
As described in the previous sections, the DNetPRO is a combinatorial algorithm, so its implementation requires particular care to optimize the computational performance as much as possible.
The optimization strategies described so far have to be validated by quantitative measurements.
The time evaluation compared the Cython (C++ wrap) implementation against the pure (naive) Python implementation shown in the previous snippet. Timings were collected with the timing Python package, with which we can easily run multiple repetitions of a given algorithm [1].
In our simulations we monitored the three main parameters related to the algorithm's efficiency: the number of samples, the number of variables and, since we provide a parallel multi-threading implementation, the number of threads. For each combination of parameters we performed 30 runs of both implementations and extracted the minimum execution time.
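The measurement protocol (many repetitions, keeping the minimum) can be sketched with the standard timeit module, used here only as a stand-in for the package mentioned above; `best_of` and the toy workloads are hypothetical names for illustration:

```python
import timeit

def best_of(func, n_runs=30):
    """Run `func` n_runs times and keep the minimum wall time.

    The minimum is the most stable estimate: background system load
    can only slow a run down, never speed it up.
    """
    return min(timeit.Timer(func).repeat(repeat=n_runs, number=1))

# Toy stand-ins for a naive pure-Python loop vs a C-level routine.
data = list(range(100_000))
t_naive = best_of(lambda: sum(x * x for x in data))  # interpreted loop
t_fast = best_of(lambda: sum(data))                  # builtin, C speed
print(f"naive: {t_naive:.6f}s  fast: {t_fast:.6f}s")
```

The same scheme extends to a grid of (samples, variables, threads) combinations by wrapping each configuration in its own zero-argument callable.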
The tests were performed on a classical bioinformatics server (128 GB of RAM and two Xeon E5-2620 CPUs with 8 cores each).
The obtained results are shown in the following figures. In each plot two parameters are fixed and the remaining one is varied.
In all our simulations the efficiency of the (optimized) Cython version is clearly visible, with a gap between the two implementations that exceeds 10^4 seconds.
On the other hand, it is important to highlight how the two codes scale with the various parameters.
While both implementations scale quite well with the number of features (Fig. 1), the trend differs when varying the number of samples (Fig. 2): the Cython curve starts to saturate almost immediately, while the computational time of the Python implementation keeps growing.
This behavior confirms the effect of the optimizations in the Cython version, which allow the DNetPRO algorithm to be applied to larger datasets without losing performance.
The opposite behavior is found when monitoring the number of threads (Fig. 3): the Python version scales quite well with the number of threads [2], while the Cython trend is more unstable.
This behavior is probably due to non-optimal scheduling in the parallel section: the work is not equally distributed among the available threads, which penalizes efficiency by creating a bottleneck at the slowest thread.
The above results were computed with 90 features, so the parallel section distributes the 180 (N × N) iterations among the available threads: when the number of iterations is a multiple of the number of threads (12, 20 and 30 in our case), the time performance is maximized.
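The divisibility argument can be checked directly. The sketch below models a static (block) OpenMP-style schedule; `static_chunks` is a hypothetical name for illustration, not the actual DNetPRO scheduler:

```python
def static_chunks(n_iter, n_threads):
    """Iterations assigned to each thread under a static block schedule."""
    base, extra = divmod(n_iter, n_threads)
    # The first `extra` threads receive one leftover iteration each.
    return [base + (1 if t < extra else 0) for t in range(n_threads)]

# 12, 20 and 30 threads split 180 iterations evenly; 16 does not,
# so one thread gets an extra iteration and bounds the runtime.
for n_threads in (12, 16, 20, 30):
    chunks = static_chunks(180, n_threads)
    imbalance = max(chunks) - min(chunks)
    print(n_threads, max(chunks), imbalance)
```

Since the parallel runtime is bounded by the most loaded thread, any nonzero imbalance translates directly into idle time for the other threads.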
Despite this, the computational efficiency of the Cython implementation is so much better than that of the Python one that its use is indisputable.
[1] We stress that we can use the timing Python package only because we provide a Cython wrap of our DNetPRO algorithm implementation. We also highlight that, albeit minimal, the Python superstructure penalizes the computational performance: the best results are obtained with the pure C++ version of the code.

[2] The optimal result would be a linear scalability with the number of threads, but this efficiency is always difficult to reach. A reasonably good result is therefore a progressive decrease of the execution time as the number of threads increases.