Baseline Acute Myeloid Leukemia Prognosis Models using Transcriptomic and Clinical Profiles by Studying the Impacts of Dimensionality Reductions and Gene Signatures on Cox-Proportional Hazard

Abstract

Extracting gene markers to evaluate risk in cancer can refine diagnosis and lead to adapted therapies and better survival. Such survival analyses can be performed with Machine Learning (ML) algorithms such as the Cox Proportional Hazards (CPH) model applied to gene expression (GE) RNA-Seq data. However, optimally tuning the CPH on genome-wide GE data is challenging and has been poorly assessed so far. In this work, we interrogate an Acute Myeloid Leukemia (AML) dataset (Leucegene) to identify the key factors that degrade CPH performance and to characterize its sensitivity to them, with the aim of improving the system. We compare projection and selection data reduction techniques, mainly PCA and the LSC17 gene signature, in combination with the CPH in an ML framework. Results reveal that the CPH performs better with a combination of clinical and gene expression features. We determine that projections perform better than selections when clinical information is absent. We ascertain that the CPH is affected by overfitting and that this overfitting is linked to the number and the content of input covariates. We show that PCA links clinical features via its ability to learn from the input data directly, and that it generalizes better than LSC17 on Leucegene. We postulate that projections are preferable to selections on harder tasks such as assessing risk in the intermediate subset of Leucegene. We extrapolate that these findings apply in the more general context of risk detection via machine learning in cancer. We see that higher-capacity models such as CPH-DNN systems can be improved via survival-derived projections and can combat overfitting through heavy regularization.
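
To make the two quantities behind a CPH evaluation concrete, the following is a minimal sketch, not the authors' implementation. It assumes each patient already has a scalar risk score (for example, a linear combination of PCA components or LSC17 genes), and it computes the negative Cox partial log-likelihood (with Breslow handling of ties) and Harrell's concordance index. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import math


def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood (Breslow handling of ties).

    risk  : linear predictor per patient (assumed precomputed)
    time  : follow-up time per patient
    event : 1 if the event was observed, 0 if censored
    """
    nll = 0.0
    for i, (t_i, e_i) in enumerate(zip(time, event)):
        if not e_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: every patient still under observation at time t_i.
        denom = sum(math.exp(r) for r, t in zip(risk, time) if t >= t_i)
        nll -= risk[i] - math.log(denom)
    return nll


def concordance_index(risk, time, event):
    """Harrell's C: share of comparable pairs ranked in the right order."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # Comparable pair: i has an observed event strictly before time[j].
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one is censored.
toy_time = [2, 4, 6, 8]
toy_event = [1, 1, 0, 1]
toy_risk = [2.0, 1.0, 0.5, 0.0]  # higher score = earlier expected event

toy_nll = cox_neg_log_partial_likelihood(toy_risk, toy_time, toy_event)
toy_c = concordance_index(toy_risk, toy_time, toy_event)
```

In a train/test setting, a large gap between the concordance index on training patients and on held-out patients would be one way to observe the overfitting discussed above; the exact protocol used in the study is not specified here.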

Léonard Sauvé
PhD student in Bioinformatics

Doctoral student in Bioinformatics | Development of automatic risk assessment systems in acute myeloid leukemia based on gene expression data

Sébastien Lemieux
Principal Investigator

Principal Investigator, Functional and Structural Bioinformatics Research Unit, IRIC | Scientific direction of the Bioinformatics platform | Associate Professor, Department of Biochemistry and Molecular Medicine, Université de Montréal