
Genome-Wide Association Study of Cardiovascular and Cerebrovascular Diseases Based on Multi-Step Screening
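Cite this article: HU Yishen, ZHU Muchun, YIN Peng. Genome-wide association study of cardiovascular and cerebrovascular diseases based on multi-step screening[J]. Journal of Integration Technology (集成技术), 2019, 8(5): 72-85.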
Authors: HU Yishen, ZHU Muchun, YIN Peng
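Affiliations: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055; Shenzhen University, Shenzhen 518061; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055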
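Funding: National Natural Science Foundation of China, Young Scientists Fund (11801542); Shenzhen Science and Technology Innovation Commission Discipline Layout Project (JCYJ20180703145002040)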
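Abstract: Genome-wide association studies (GWAS) are an effective means of studying the genetic effects underlying complex diseases and traits. Existing association analyses rely mainly on marginal statistical tests, which ignore correlations between features and suffer from unstable threshold selection. Taking cardio-cerebrovascular disease as the study subject, this paper proposes a new GWAS method based on multi-step screening. The method consists of two steps: first, the Gini index is used for initial feature screening to obtain a candidate subset of single-nucleotide polymorphisms (SNPs); then, random-forest-based recursive cluster elimination (RF-RCE) is applied to this candidate subset to identify the associated SNPs. Experimental results show that multi-step screening outperforms single-step feature selection, and that the SNP subset selected by Gini-index pre-screening followed by RF-RCE is more strongly associated with the disease.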

Keywords: cardio-cerebrovascular disease; feature selection; single-nucleotide polymorphism; multi-step screening
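The abstract names the Gini index as the first-stage filter but does not give its exact form. The following is a minimal sketch under stated assumptions: genotypes are coded 0/1/2 (minor-allele counts), labels are binary case/control, and each SNP is scored by the decrease in Gini impurity obtained by splitting the samples on genotype, keeping the top `k`. The function names, the genotype coding, and the cutoff `k` are hypothetical illustrations; the paper may instead use, for example, a random-forest Gini importance.

```python
import numpy as np


def gini_impurity(labels: np.ndarray) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a vector of class labels."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(1.0 - np.sum(p ** 2))


def gini_gain(genotypes: np.ndarray, labels: np.ndarray) -> float:
    """Impurity decrease when samples are split by genotype (assumed 0/1/2 coding)."""
    n = labels.size
    weighted = 0.0
    for g in np.unique(genotypes):
        mask = genotypes == g
        weighted += mask.sum() / n * gini_impurity(labels[mask])
    return gini_impurity(labels) - weighted


def gini_prescreen(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical first stage: indices of the k SNP columns with the largest Gini gain."""
    gains = np.array([gini_gain(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(gains)[::-1][:k]  # sorted by descending gain


# Toy data (not from the paper): 200 samples x 50 SNPs, SNP 7 made perfectly informative.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.integers(0, 3, size=(200, 50))
X[:, 7] = np.where(y == 1, 2, 0)
candidates = gini_prescreen(X, y, k=5)
print(candidates)
```

Ranking by impurity decrease keeps a fixed number of candidates instead of applying a p-value cutoff per SNP, although the choice of `k` itself still has to be tuned.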

Genome-Wide Association Study of Cardiovascular and Cerebrovascular Diseases Based on Multi-Step Screening
Authors: HU Yishen, ZHU Muchun and YIN Peng
Abstract: A genome-wide association study (GWAS) is an effective way to identify genetic variants associated with complex diseases or traits. The marginal statistical test is the most common GWAS method, but it ignores correlations between features and depends on an unstable choice of threshold. In this paper, we propose a new GWAS method for cardio-cerebrovascular disease based on a multi-step screening model. The method consists of two steps: first, the Gini index is used to select a candidate subset of single-nucleotide polymorphisms (SNPs); then, random forest recursive cluster elimination (RF-RCE) identifies the associated SNPs within this first-step candidate set. Experimental results show that multi-step feature selection outperforms single-step feature selection, and that the selected SNPs are better suited to predicting cardio-cerebrovascular disease.
Keywords: cardio-cerebrovascular disease; feature selection; single-nucleotide polymorphism; multi-step selection
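The second stage can be sketched with the general recursive cluster elimination idea, using a random forest as the classifier: cluster the candidate SNPs, score each cluster by the cross-validated accuracy of a random forest trained on that cluster's SNPs alone, discard the lowest-scoring clusters, and repeat. This is a minimal scikit-learn sketch, not the paper's implementation; `rf_rce` and its parameters (`n_clusters`, `drop_frac`, `min_features`, `cv`, `seed`) are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def rf_rce(X, y, n_clusters=8, drop_frac=0.3, min_features=5, cv=3, seed=0):
    """Hypothetical RF-RCE sketch; returns indices of the surviving SNP columns."""
    remaining = np.arange(X.shape[1])
    while remaining.size > min_features:
        k = min(n_clusters, remaining.size)
        # Cluster the SNPs themselves: each KMeans point is one SNP's genotype vector.
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(
            X[:, remaining].T.astype(float)
        )
        # Score each cluster by cross-validated RF accuracy using only its SNPs.
        scores = {}
        for c in np.unique(labels):
            cols = remaining[labels == c]
            rf = RandomForestClassifier(n_estimators=50, random_state=seed)
            scores[c] = cross_val_score(rf, X[:, cols], y, cv=cv).mean()
        # Eliminate the lowest-scoring fraction of clusters (always at least one).
        ranked = sorted(scores, key=scores.get)
        n_drop = max(1, int(drop_frac * len(ranked)))
        keep = remaining[~np.isin(labels, ranked[:n_drop])]
        if keep.size < min_features:  # stop rather than fall below the target size
            break
        remaining = keep
    return remaining


# Toy data (not from the paper): 200 samples x 30 SNPs, SNP 0 made perfectly informative.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
X = rng.integers(0, 3, size=(200, 30))
X[:, 0] = 2 * y
selected = rf_rce(X, y)
print(sorted(selected.tolist()))
```

Because SNPs are eliminated cluster by cluster rather than one at a time, correlated SNPs are kept or dropped together, which is one way to take into account the feature correlation that marginal tests ignore.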
This article is indexed in databases including Wanfang Data.