Affiliation: | aBiointelligence Laboratory, School of Computer Science and Engineering, Seoul National University, Seoul 151-744, Republic of Korea bGenoProt Co. Ltd., 2FL Saeseoul Bldg., 94-1, Guro 6-dong, Guro-gu, Seoul 152-841, Republic of Korea |
Abstract: | Conventional clinical decision support systems are generally based on a single classifier or a simple combination of these models, showing moderate performance. In this paper, we propose a classifier ensemble-based method for supporting the diagnosis of cardiovascular disease (CVD) based on aptamer chips. This AptaCDSS-E system overcomes conventional performance limitations by utilizing ensembles of different classifiers. Recent surveys show that CVD is one of the leading causes of death and that significant life savings can be achieved if precise diagnosis can be made. For CVD diagnosis, our system combines a set of four different classifiers with ensembles. Support vector machines and neural networks are adopted as base classifiers. Decision trees and Bayesian networks are also adopted to augment the system. Four aptamer-based biochip data sets including CVD data containing 66 samples were used to train and test the system. Three other supplementary data sets are used to alleviate data insufficiency. We investigated the effectiveness of the ensemble-based system with several different aggregation approaches by comparing the results with single classifier-based models. The prediction performance of the AptaCDSS-E system was assessed with a cross-validation test. The experimental results show that our system achieves high diagnosis accuracy (>94%) and comparably small prediction difference intervals (<6%), proving its usefulness in the clinical decision process of disease diagnosis. Additionally, 10 possible biomarkers are found for further investigation. |