Early diagnosis and prediction of Parkinson's disease based on clustering medical text data

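Cite this article:	ZHANG Xiaobo, YANG Yan, LI Tianrui, LU Fan, PENG Lilan. Early diagnosis and prediction of Parkinson's disease based on clustering medical text data[J]. Journal of Computer Applications, 2020, 40(10): 3088-3094.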

Author names:	ZHANG Xiaobo, YANG Yan, LI Tianrui, LU Fan, PENG Lilan

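Author affiliations:	1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756; 2. Institute of Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756; 3. National Engineering Laboratory of Integrated Transportation Big Data Application Technology (Southwest Jiaotong University), Chengdu 611756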

Funding:	National Natural Science Foundation of China (61976247); Sichuan Province Key Research and Development Program (20ZDYF2837).

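Abstract:	To address the early intelligent diagnosis of Parkinson's disease (PD), which occurs mostly in the elderly, clustering techniques based on textual medical test data are proposed for analyzing and predicting PD. First, the original dataset is preprocessed to obtain effective feature information, and principal component analysis (PCA) is used to reduce the original features to eight feature spaces of different dimensionality. Then, five classical clustering models and three clustering ensemble methods are each applied to the data in the eight spaces. Finally, four clustering performance metrics are used to assess the prediction of PD patients with dopamine abnormality, healthy controls, and PD patients with scans without evidence of dopamine deficiency (SWEDD). Simulation results show that the clustering accuracy of the Gaussian mixture model (GMM) reaches 89.12% at a PCA feature dimension of 30, that of spectral clustering (SC) reaches 61.41% at a dimension of 70, and that of the meta-clustering algorithm (MCLA) reaches 59.62% at a dimension of 80. Comparative experiments show that, among the five classical clustering methods, GMM clusters best when the PCA feature dimension is below 40, and that, among the three clustering ensemble methods, MCLA performs well across the different feature dimensions. These results provide technical and theoretical support for the early intelligent assisted diagnosis of PD.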
Keywords:	Parkinson's disease; medical text data; principal component analysis; clustering; clustering ensemble
Received:	2020-03-26
Revised:	2020-05-29
Early diagnosis and prediction of Parkinson's disease based on clustering medical text data

ZHANG Xiaobo, YANG Yan, LI Tianrui, LU Fan, PENG Lilan. Early diagnosis and prediction of Parkinson's disease based on clustering medical text data[J]. Journal of Computer Applications, 2020, 40(10): 3088-3094.

Authors:	ZHANG Xiaobo, YANG Yan, LI Tianrui, LU Fan, PENG Lilan

Affiliation:	1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; 2. Institute of Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; 3. National Engineering Laboratory of Integrated Transportation Big Data Application Technology (Southwest Jiaotong University), Chengdu, Sichuan 611756, China

Abstract:	To address the problem of early intelligent diagnosis of Parkinson's Disease (PD), which is more common in the elderly, clustering techniques based on medical test text data were proposed for the analysis and prediction of PD. Firstly, the original dataset was pre-processed to obtain effective feature information, and the features were reduced by Principal Component Analysis (PCA) to eight feature spaces of different dimensionality. Then, five classical clustering models and three clustering ensemble methods were each used to cluster the data in the eight spaces. Finally, four clustering performance indexes were used to assess the prediction of PD subjects with dopamine deficiency, healthy controls, and Scans Without Evidence of Dopamine Deficiency (SWEDD) PD subjects. The simulation results show that the clustering accuracy of the Gaussian Mixture Model (GMM) reaches 89.12% when the PCA feature dimension is 30, that of Spectral Clustering (SC) reaches 61.41% when the PCA feature dimension is 70, and that of the Meta-CLustering Algorithm (MCLA) reaches 59.62% when the PCA feature dimension is 80. The comparative experimental results show that GMM has the best clustering effect among the five classical clustering methods when the PCA feature dimension is less than 40, and that MCLA performs well among the three clustering ensemble methods across the different feature dimensions. These results provide technical and theoretical support for the early intelligent auxiliary diagnosis of PD.

Keywords:	Parkinson's Disease (PD); medical text data; Principal Component Analysis (PCA); clustering; clustering ensemble
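
The pipeline summarized in the abstract (PCA reduction to several dimensions, clustering, then scoring by clustering accuracy) can be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation: the data are synthetic blobs standing in for the preprocessed medical features, the diagonal covariance setting and all variable names are assumptions, and clustering accuracy is computed here by the common best-match mapping of clusters to classes, which the abstract does not spell out. The numbers it prints are unrelated to the results reported above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to class labels."""
    classes = np.unique(y_true)
    clusters = np.unique(y_pred)
    # Contingency table: rows are clusters, columns are true classes.
    counts = np.zeros((clusters.size, classes.size), dtype=int)
    for i, c in enumerate(clusters):
        for j, k in enumerate(classes):
            counts[i, j] = np.sum((y_pred == c) & (y_true == k))
    # Hungarian algorithm: pick the assignment that maximizes matched samples.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / y_true.size


# Hypothetical stand-in data: 3 groups (playing the role of PD, healthy
# control, and SWEDD) in a 100-dimensional feature space.
X, y = make_blobs(n_samples=600, n_features=100, centers=3,
                  cluster_std=5.0, random_state=0)

results = {}
for n_dims in (30, 70, 80):  # three of the PCA dimensions named in the abstract
    X_reduced = PCA(n_components=n_dims, random_state=0).fit_transform(X)
    # Diagonal covariances are an assumption made here for numerical stability.
    gmm = GaussianMixture(n_components=3, covariance_type="diag", random_state=0)
    labels = gmm.fit_predict(X_reduced)
    results[n_dims] = clustering_accuracy(y, labels)
    print(f"PCA dims={n_dims}: clustering accuracy={results[n_dims]:.4f}")
```

Swapping `GaussianMixture` for another scikit-learn clusterer such as `SpectralClustering` leaves the rest of the loop unchanged, which is one way the different classical models could be compared at each PCA dimension.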
