mRNA data
We applied both training procedures (see Fig. 1) to the mRNA dataset. The results, shown as distributions of AUC (area under the curve) scores, are reported in Fig. 1(a) for the best signatures obtained with procedure A (corresponding to the validation approach used in Yuan2014), and in Fig. 1(b) for the full cross-validation procedure B.
As expected, performance decreases with the introduction of the second cross-validation step, but the values remain quite stable, showing the robustness of the extracted signatures. We remark that the validation procedure used in the reference paper by Yuan et al. resembles our approach without the second validation step.
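The contrast between the two schemes can be summarized with a minimal sketch. Everything below is illustrative: a correlation-based filter and a Gaussian naive Bayes classifier stand in for the actual DNetPRO extraction step, and the function select_signature, the candidate signature sizes, and the split fractions are assumptions for the example, not the method itself.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB  # simple stand-in classifier

def select_signature(X, y, k):
    """Hypothetical selection step: keep the k genes whose expression is
    most correlated (in absolute value) with the class label."""
    corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.argsort(corr)[-k:]

def auc_round(X, y, rng, second_validation):
    """One train/validation(/test) round; returns the AUC each scheme reports."""
    X_tr, X_rest, y_tr, y_rest = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=rng)
    X_va, X_te, y_va, y_te = train_test_split(
        X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=rng)

    def auc_on(genes, X_eval, y_eval):
        clf = GaussianNB().fit(X_tr[:, genes], y_tr)
        return roc_auc_score(y_eval, clf.predict_proba(X_eval[:, genes])[:, 1])

    # candidate signatures of different sizes, all extracted on the training set
    candidates = [select_signature(X_tr, y_tr, k) for k in (50, 100, 200)]
    best = max(candidates, key=lambda g: auc_on(g, X_va, y_va))
    if not second_validation:
        # procedure A: the validation AUC of the best signature is reported,
        # so the same samples both choose and grade the signature
        return auc_on(best, X_va, y_va)
    # procedure B: the chosen signature is re-scored on a held-out test set
    # never touched during extraction or selection
    return auc_on(best, X_te, y_te)

# e.g. the AUC distribution for procedure B over repeated splittings:
# rng = np.random.RandomState(42)
# aucs_B = [auc_round(X, y, rng, second_validation=True) for _ in range(100)]
```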
Even with the more conservative approach involving the further cross-validation step, all results are comparable to (LUSC) or better than (KIRC, GBM) those reported in Yuan2014, with the exception of the OV dataset. The size of the extracted signatures is quite stable, remaining below 500 genes in every pipeline execution.
To test the robustness of our method, since each cross-validation round may generate a different signature, we measured the overlap of the genes belonging to the mRNA signatures over 100 simulations with different training/test data splittings. We observed an average overlap ranging from 40% to 60%, with a smaller core of genes found across all 100 cross-validation iterations.
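A minimal sketch of this stability measure, assuming each simulation yields a set of gene identifiers; the Jaccard normalization used here for the pairwise overlap is one reasonable choice among several, not necessarily the one used above:

```python
from itertools import combinations

def signature_stability(signatures):
    """Mean pairwise overlap and the core of genes shared by every run.

    `signatures` is a list of sets of gene identifiers, one per
    training/test splitting (a hypothetical container for the 100 runs).
    """
    # pairwise overlap measured as the Jaccard index |a & b| / |a | b|
    overlaps = [len(a & b) / len(a | b) for a, b in combinations(signatures, 2)]
    # genes appearing in every single signature across all runs
    core = set.intersection(*signatures)
    return sum(overlaps) / len(overlaps), core

# mean_overlap, core_genes = signature_stability(signatures_from_100_runs)
```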
In this application the DNetPRO algorithm offers several advantages: easy scalability on parallel architectures, simple interpretation of the signatures, which is valuable in a biomedical context, and significant robustness in a highly noisy environment such as genomic measurements.