PhDthesis

PhD thesis in Applied Physics

Author: N. Curti
Project: PhDthesis
Documentation: PhDthesis docs
Build status: Linux (Travis CI); Windows (missing)
Supervisor: Prof. D. Remondini
Co-Supervisors: Prof. G. Castellani, Prof. A. Bazzani

Implementation and optimization of algorithms in Biomedical Big Data Analytics

Big Data Analytics poses many challenges to the research community, which has to handle several computational problems related to the vast amount of data. Increasing interest is focused on biomedical data, with the aim of achieving so-called "personalized medicine", in which therapy plans are designed around the specific genotype and phenotype of the individual patient; algorithm optimization plays a key role for this purpose. In this work we discuss several topics related to Biomedical Big Data Analytics, with special attention to numerical issues and the algorithmic solutions related to them. We introduce a novel feature selection algorithm tailored to omics datasets, proving its efficiency on synthetic and real high-throughput genomic datasets. The proposed algorithm is a supervised signature identification method based on a bottom-up combinatorial approach that exploits the discriminant power of all variable pairs. We tested our algorithm against other state-of-the-art models: it outperforms or matches their results.
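
The pair-based selection idea can be illustrated with a minimal sketch. It assumes a samples-by-features matrix `X` and binary labels `y`, scores every feature pair by the leave-one-out accuracy of a nearest-centroid classifier (a simple stand-in scorer chosen here for illustration, not necessarily the classifier used in the thesis), keeps the `top_k` best pairs and links them into a graph whose connected components are returned as candidate signatures. The function names `loo_pair_accuracy`, `rank_pairs` and `pair_signatures` and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def loo_pair_accuracy(X, y, i, j):
    """Leave-one-out accuracy of a nearest-centroid classifier that only
    sees features i and j (stand-in scorer, assumed for illustration)."""
    n = len(X)
    correct = 0
    for k in range(n):
        # Per-class centroids computed without the held-out sample k
        sums, counts = {}, {}
        for s in range(n):
            if s == k:
                continue
            sx, sy = sums.get(y[s], (0.0, 0.0))
            sums[y[s]] = (sx + X[s][i], sy + X[s][j])
            counts[y[s]] = counts.get(y[s], 0) + 1
        centroids = {c: (sx / counts[c], sy / counts[c])
                     for c, (sx, sy) in sums.items()}
        # Predict the class whose centroid is closest to the held-out point
        point = (X[k][i], X[k][j])
        pred = min(centroids, key=lambda c: dist(point, centroids[c]))
        correct += int(pred == y[k])
    return correct / n


def rank_pairs(X, y):
    """Score every feature pair; best accuracy first, ties by pair index."""
    n_features = len(X[0])
    scores = [((i, j), loo_pair_accuracy(X, y, i, j))
              for i, j in combinations(range(n_features), 2)]
    return sorted(scores, key=lambda t: (-t[1], t[0]))


def pair_signatures(X, y, top_k):
    """Link the top_k pairs into a graph and return its connected
    components as candidate signatures (sorted feature indices)."""
    adj = {}
    for (i, j), _ in rank_pairs(X, y)[:top_k]:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        # Depth-first search collecting one connected component
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        components.append(sorted(comp))
    return components


# Toy data: features 0 and 1 separate the classes, 2 and 3 are noise
X = [[0.0, 0.1, 5.0, 3.0],
     [0.2, 0.0, 1.0, 7.0],
     [0.1, 0.2, 4.0, 2.0],
     [1.0, 0.9, 2.0, 6.0],
     [0.9, 1.1, 6.0, 1.0],
     [1.1, 1.0, 3.0, 5.0]]
y = [0, 0, 0, 1, 1, 1]
print(rank_pairs(X, y)[0])        # ((0, 1), 1.0)
print(pair_signatures(X, y, 1))   # [[0, 1]]
```

Because the number of pairs grows as p(p-1)/2 for p features, this exhaustive scoring step dominates the cost on omics data with thousands of variables, which is where algorithm optimization matters most.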

We also implement and optimize different types of deep learning models, testing their efficiency on biomedical image processing tasks. Three customized frameworks for the development of deep learning neural network models are discussed and used to describe the numerical improvements proposed for the various topics. In the first implementation we optimize two Super Resolution models and show their results on NMR images, proving their efficiency in generalization tasks without retraining. The second optimization involves a state-of-the-art Object Detection neural network architecture, obtaining a significant speedup in computational performance. We also highlight how Super Resolution models are able to overcome object detection issues and increase detection performance. In the third application we discuss the femur head segmentation problem on CT images: a semi-automated pipeline for image annotation is proposed, and a deep learning neural network model is trained on these images.

The last section of this work describes the implementation of a novel biomedical database obtained by harmonizing multiple data sources that provide network-like relationships between biomedical entities. The data involved in this project, related to diseases, symptoms and other biological relations, were mined using web-scraping methods, and a novel natural language processing pipeline was designed to maximize the overlap between the different data sources. We describe the key steps that led us to this network-of-networks database and discuss its potential applications to biomedical research.
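
As a toy illustration of why name harmonization increases the overlap between sources, the sketch below applies a few rule-based normalizations (accent folding, lower-casing, removal of possessives and punctuation, order-insensitive tokens) and compares the Jaccard overlap of two hypothetical disease lists before and after normalization. The rules, the stop-word list, the `normalize`/`jaccard` helpers and the example names are illustrative assumptions, not the pipeline developed in the thesis.

```python
import re
import unicodedata

STOPWORDS = {"of", "the", "and"}  # hypothetical stop-word list


def normalize(term):
    """Map a raw entity name to a canonical key (illustrative rules)."""
    # Fold accents (e.g. "ö" -> "o") and lower-case the text
    text = unicodedata.normalize("NFKD", term)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    text = re.sub(r"'s\b", "", text)          # drop possessives: "crohn's" -> "crohn"
    text = re.sub(r"[^a-z0-9]+", " ", text)   # punctuation and hyphens -> spaces
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(sorted(tokens))           # ignore word order


def jaccard(a, b):
    """Jaccard overlap |A & B| / |A | B| of two sets of names."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two hypothetical sources naming the same diseases differently
source_a = ["Alzheimer's Disease", "Diabetes Mellitus, Type 2",
            "Crohn Disease", "Sjögren Syndrome"]
source_b = ["alzheimer's disease", "type 2 diabetes mellitus",
            "Crohn's disease", "Sjogren's Syndrome"]

print(jaccard(set(source_a), set(source_b)))  # 0.0
print(jaccard({normalize(t) for t in source_a},
              {normalize(t) for t in source_b}))  # 1.0
```

Sorting tokens trades precision for recall: it merges names that differ only in word order, which can occasionally join entities that are actually distinct, so a real harmonization pipeline needs further checks.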

License

The Implementation and optimization of algorithms in Biomedical Big Data Analytics document is licensed under the MIT "Expat" License.

Acknowledgment

The authors acknowledge EU IMI2 - HARMONY Healthcare Alliance for Resourceful Medicines Offensive against Neoplasms in HematologY n. 116026, EU COMPARE COllaborative Management Platform for detection and Analyses of (Re-) emerging and foodborne outbreaks in Europe n. 643476 and EU VEO - Versatile Emerging infectious disease Observatory n. 874735 for their support of the biomedical analyses. Special thanks go to INFN Gruppo V AIM - Artificial Intelligence in Medicine, Progetto FILO-BLU Bando Lazio POR-FESR 2014-2020 LIFE2020 and EU ETN-ITN ImforFuture - Innovative training in methods for future data n. 721815 for their support of the machine learning and deep learning analyses shown in this thesis.

Citation

Please cite Implementation and optimization of algorithms in Biomedical Big Data Analytics if you use it in your research.

@misc{PhDthesis,
  author = {Nico Curti},
  title = {Implementation and optimization of algorithms in Biomedical Big Data Analytics},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://nico-curti2.gitbook.io/phd-thesis/}},
}