EPCY: Evaluation of Predictive CapabilitY for ranking biomarker gene candidates at ISMB
Finding biomarker gene candidates is the entry point for identifying links between gene expression levels and features of interest in RNA-sequencing patient data. Features can be defined as cancer subtypes, or as the presence of prognostic mutations or genome rearrangements. Expression biomarkers are commonly identified using Differentially Expressed Gene (DEG) analysis, which compares a test group (presenting the feature of interest) to a control group. It has become standard to use significant DEGs as input to integrative analyses, based on a priori knowledge, to find a subset of DEGs linked with sample features. We posit that current issues with biomarker identification derive from a misalignment between the DEG procedure and the objective pursued by biomarker identification, namely sample classification based on some criterion. We propose a more direct approach that evaluates the expression of each gene based strictly on its individual predictive capability to accurately classify samples. Using two patient cohorts and four features, we show that the resulting ranking returns a more informative set of biomarker gene candidates than DEGs selected by DESeq, edgeR or limma.
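
To make the idea of ranking genes by individual predictive capability concrete, the following is a minimal sketch and not the EPCY implementation. It assumes a binary feature (1 = test group, 0 = control group), a genes-by-samples expression matrix, a leave-one-out nearest-group-mean classifier built on one gene at a time, and the Matthews correlation coefficient (MCC) as the predictive score. All function names are hypothetical and chosen for illustration only.

```python
import numpy as np


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # By convention, return 0 when the score is undefined.
    return 0.0 if denom == 0 else float(tp * tn - fp * fn) / denom


def gene_predictive_score(expr, labels):
    """Score one gene by leave-one-out classification of every sample.

    Assumption: each held-out sample is assigned to the group whose mean
    expression (computed without that sample) is closest.
    """
    n = len(expr)
    preds = np.empty(n, dtype=int)
    for i in range(n):
        keep = np.arange(n) != i  # leave sample i out
        mean_test = expr[keep & (labels == 1)].mean()
        mean_ctrl = expr[keep & (labels == 0)].mean()
        preds[i] = int(abs(expr[i] - mean_test) < abs(expr[i] - mean_ctrl))
    return mcc(labels, preds)


def rank_genes(matrix, labels):
    """Rank genes (rows of `matrix`) by decreasing predictive score."""
    scores = np.array([gene_predictive_score(row, labels) for row in matrix])
    order = np.argsort(-scores, kind="stable")
    return order, scores
```

In this sketch, a gene whose expression cleanly separates the two groups obtains a score close to 1, while a gene that carries no information about the feature scores near or below 0. The ranking therefore reflects classification performance directly rather than a p-value from a differential expression test.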