Similar Literature
17 similar records found (search time: 171 ms)
1.
Based on a large volume of measured data, the C4.5 data mining algorithm was applied to the processing of type 2 diabetes data, and several effective rules were established; on test data, the average correct recognition rate for diabetic and non-diabetic cases reached 97%. Importantly, the rules obtained are largely consistent with current medical understanding. Their construction offers a new method for explaining the interrelationships among diabetes risk factors and the degree of their influence, and lays a foundation for research on early warning, intervention, and control of type 2 diabetes.

2.
Application of data mining to type 2 diabetes data processing   Cited by: 5 (self-citations: 0, other citations: 5)
To explore the onset patterns of type 2 diabetes from a large volume of measured data and to find an effective data processing method, data mining techniques were introduced into type 2 diabetes data processing to derive a decision classification tree, which was then compared against medical knowledge. Using 11,400 measured records, the C4.5 algorithm produced a classification tree whose correct recognition rate in experiments was 80.90% for the diabetic group and 92.05% for the non-diabetic group. The resulting decision tree agrees well with the high-risk factors currently recognized in medicine, and also yields a critical blood glucose value of 5.85. Introducing data mining provides a new method for type 2 diabetes data processing and a new approach to early warning, intervention, and effective control.
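The C4.5-style workflow these abstracts describe can be sketched as follows. Note the hedges: scikit-learn has no C4.5 implementation, so an entropy-criterion CART tree stands in for it, and the data below is synthetic rather than the papers' survey records; the feature names (glucose, BMI, age) and the label rule are illustrative assumptions, not the papers' variables.

```python
# Sketch of decision-tree classification for diabetes status.
# C4.5 itself is not in scikit-learn; an entropy-based CART tree
# approximates its information-gain splitting. All data is synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features: fasting glucose (mmol/L), BMI, age
glucose = rng.normal(5.5, 1.0, n)
bmi = rng.normal(24, 4, n)
age = rng.integers(20, 80, n).astype(float)
X = np.column_stack([glucose, bmi, age])
# Synthetic label: "diabetic" when glucose (plus a small BMI term
# and noise) exceeds a threshold -- a stand-in for the real outcome
y = (glucose + 0.02 * bmi + rng.normal(0, 0.3, n) > 6.0).astype(int)

clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X, y)
acc = clf.score(X, y)
print(f"training accuracy: {acc:.2f}")
# Inspect the root split; with glucose dominating the label, the learned
# threshold plays the role of a cut-off like the papers' 5.85 value
print(f"root split: feature {clf.tree_.feature[0]} "
      f"at {clf.tree_.threshold[0]:.2f}")
```

On real survey data one would of course hold out a test set, as the papers do when reporting their 80.90%/92.05% recognition rates.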

3.
Application of knowledge discovery to type 2 diabetes data processing   Cited by: 2 (self-citations: 0, other citations: 2)
Objective: To introduce knowledge discovery theory, for the first time, into the processing of data on factors related to the onset of type 2 diabetes, and to identify valid, potentially useful, and understandable onset patterns from a large volume of measured data. Methods: Given the characteristics of type 2 diabetes data, the C4.5 data mining algorithm was used to build a decision-tree classifier on 17,072 valid records from a cluster-sampled cross-sectional health survey. Results: The trained model yields a decision classification tree for diabetes status that intuitively shows the relative influence of onset-related factors at different levels. In tests, the correct recognition rate was 92.05% for non-diabetic cases and 80.90% for diabetic cases, and a classification cut-off blood glucose value of 5.85 was obtained. Conclusion: The decision tree agrees with the currently recognized high-risk factors, indicating that the C4.5 algorithm is suitable for analyzing data on factors related to the onset of type 2 diabetes. It offers a new method for type 2 diabetes data processing, with broad application prospects in macro-level disease control.

4.
This paper studies the application of data mining techniques to test-run data from a certain aero-engine model. Rough set theory was adopted in view of the data's characteristics; data discretization was studied, and classification rules were formed through knowledge reduction. The results show that the derived rules are correct and provide valuable decision information for test-run work.

5.
For continuous-data-dissemination sensor networks, RPDR, a dynamic routing algorithm based on application rules and probability, is proposed. Using node state information, the algorithm interacts with the application rules to periodically construct a breadth-first data-gathering tree that forms the dynamic routing path. The tree is built in two phases: initial tree construction and tree repair. First, node states are fed as input parameters to the application rules, which use predefined formulas to compute each node's probability of becoming a tree node in the current round, forming the initial tree; then, in the repair phase, some ordinary nodes are added as routing nodes to repair the initial tree and complete its connected coverage. Simulation results show that, compared with the TinyOS beaconing algorithm, the routing algorithm under the application rules designed in this paper achieves a higher data delivery rate, shorter delay, and lower average energy consumption, prolonging network lifetime.
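The two-phase construction described above can be loosely sketched. Everything below is a hypothetical stand-in for RPDR's actual rules: the grid topology, the election rule (probability equal to residual energy), and the repair strategy (promoting a covered leaf adjacent to an uncovered node) are illustrative assumptions, not the paper's formulas.

```python
# Sketch of a probabilistically elected BFS data-gathering tree with repair.
# Phase 1 elects relay nodes by a rule; phase 2 promotes ordinary nodes
# until the tree's connected coverage is complete. Topology: a 5x6 grid.
import random
from collections import deque

random.seed(7)
ROWS, COLS = 5, 6
n = ROWS * COLS
SINK = 0

def nbrs(i):
    r, c = divmod(i, COLS)
    out = []
    if r > 0: out.append(i - COLS)
    if r < ROWS - 1: out.append(i + COLS)
    if c > 0: out.append(i - 1)
    if c < COLS - 1: out.append(i + 1)
    return out

energy = [random.uniform(0.3, 1.0) for _ in range(n)]

# Phase 1: each node becomes a candidate relay with a probability given
# by the "application rule" (here, simply its residual energy)
relays = {SINK} | {i for i in range(1, n) if random.random() < energy[i]}

def build_tree(relays):
    """BFS from the sink; only relay nodes forward, others join as leaves."""
    parent = {SINK: None}
    q = deque([SINK])
    while q:
        u = q.popleft()
        for v in nbrs(u):
            if v not in parent:
                parent[v] = u
                if v in relays:          # leaves do not forward
                    q.append(v)
    return parent

parent = build_tree(relays)

# Phase 2 (repair): promote a covered leaf that borders an uncovered node,
# then rebuild, until every node has a path to the sink
while len(parent) < n:
    u = next(u for u in parent for v in nbrs(u) if v not in parent)
    relays.add(u)
    parent = build_tree(relays)

print(f"tree covers {len(parent)}/{n} nodes using {len(relays)} relays")
```

Because the grid is connected, a covered/uncovered boundary edge always exists until coverage is complete, so the repair loop terminates.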

6.
Xie Yongfang, Hu Zhikun, Gui Weihua. Control Engineering of China, 2006, 13(5): 442-444, 448
Numerical data accurately reflect the real world but are difficult to interpret. To mine easily understood knowledge from numerical data, a fast fuzzy-rule mining method is proposed. The method extracts a zero-order Sugeno fuzzy rule from numerical data and uses a heuristic to convert the rule's numerical consequent into two linguistic variables with confidence degrees; a storage structure for the rule base is also given. Finally, an example demonstrates that this fast fuzzy-rule mining method avoids complex numerical computation and effectively approximates nonlinear functions.
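The zero-order Sugeno rule form mentioned here is easy to illustrate: each rule pairs a membership function with a constant consequent, and the output is the membership-weighted average of those constants. The sketch below only shows inference; the paper's rule-mining and linguistic-conversion steps are not reproduced, and the rule base (Gaussian memberships sampled from sin(x)) is an illustrative assumption.

```python
# Minimal zero-order Sugeno inference: rules "IF x is A_i THEN y = c_i",
# combined as a membership-weighted average (no defuzzification needed).
import math

def gauss(x, c, s):
    """Gaussian membership centred at c with width s."""
    return math.exp(-((x - c) ** 2) / (2 * s ** 2))

# Hypothetical rule base approximating y = sin(x) on [0, pi]:
# centres spaced over the domain, constants sampled from the target
centres = [i * math.pi / 4 for i in range(5)]
rules = [(c, math.sin(c)) for c in centres]   # (membership centre, constant)

def sugeno0(x, sigma=0.5):
    w = [gauss(x, c, sigma) for c, _ in rules]
    return sum(wi * yi for wi, (_, yi) in zip(w, rules)) / sum(w)

x = 1.0
print(f"approx sin({x}) = {sugeno0(x):.3f} (true {math.sin(x):.3f})")
```

Because the consequents are crisp numbers, no defuzzification step is needed, which is what makes this rule type fast to evaluate.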

7.
Software, 2019, (2): 23-26
Using diabetic patients' medical records as an example, the association rule problem based on the Apriori algorithm is described in detail. The basic idea of using Apriori to find frequent itemsets for association rules is discussed, and the algorithm's execution process is illustrated with an example.
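Apriori's join-and-prune idea is compact enough to sketch directly. The "symptom" transactions below are invented for illustration, not the patient records the paper uses: candidate k-itemsets are joined from frequent (k-1)-itemsets, then pruned by minimum support.

```python
# Compact Apriori sketch on toy transactions (illustrative items only).
transactions = [
    {"high_glucose", "obesity", "hypertension"},
    {"high_glucose", "obesity"},
    {"obesity", "hypertension"},
    {"high_glucose", "obesity", "family_history"},
    {"high_glucose", "family_history"},
]
MIN_SUP = 3  # absolute minimum support count

def support(itemset):
    return sum(itemset <= t for t in transactions)

# L1: frequent single items
items = sorted({i for t in transactions for i in t})
freq = [{frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUP}]

# Lk: join frequent (k-1)-itemsets, prune by support, until none survive
while freq[-1]:
    prev = freq[-1]
    candidates = {a | b for a in prev for b in prev if len(a | b) == len(a) + 1}
    freq.append({c for c in candidates if support(c) >= MIN_SUP})

frequent = [s for level in freq for s in level]
for s in sorted(frequent, key=len):
    print(sorted(s), support(s))
```

From the frequent itemsets, rules such as {high_glucose} => {obesity} would then be scored by confidence (support of the union divided by support of the antecedent).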

8.
Universities accumulate a large amount of admission data in each year's enrollment, and valuable knowledge about student sources is hidden in these data. A matrix equivalence-class association rule mining algorithm is applied to mine association rules from admission data. Experiments show that the mined rules can support decisions such as allocating enrollment plans and arranging recruitment publicity, and thus have wide practical value.

9.
China has the largest diabetic population in the world, and the number of patients continues to grow rapidly, making diabetes a major public health problem in China. The diabetes health-management dialogue system this paper concerns serves diabetic patients by answering their everyday diabetes-related questions, yet data for training such dialogue models has been lacking. This paper therefore builds Diachat, the first Chinese dialogue dataset for diabetes health management with a complete annotation scheme, to support research on health-management dialogue systems. Diachat collects 693 dialogues between diabetic patients and doctors from online chat platforms, totaling 4,686 sentences with 6,594 dialogue-act annotations. It represents intent through dialogue acts and defines 15 act labels. It also defines 6 domains covering the corpus: Problem, Diet, Behavior, Sport, Treatment, and Profile. To support building a complete dialogue system, Diachat constructs dialogue states for both the user side and the system side, and a dialogue goal for each dialogue. Based on the Diachat dataset, this work further…

10.
To address the need for objectivity, standardization, and quantification in traditional Chinese tongue diagnosis, and the fact that most existing research on diabetic tongue diagnosis does not incorporate syndrome differentiation, this work takes the tongue-image color features of type 2 diabetes with qi-yin deficiency and blood stasis syndrome as its research object. Guided by complex adaptive systems (CAS) theory, tongue color features are extracted with the RGB color model, treating the tongue as a complex system, and tongue color is studied digitally through quantitative color indices. The experiments compare the results against valid data from clinical diagnoses of this syndrome, providing an objective basis for tongue-color-based diagnosis of diabetes.
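The RGB quantification step might look like the following. This is illustrative only: the paper's CAS-guided protocol and its specific indices are not reproduced, and the "tongue region" here is a synthetic reddish patch rather than a segmented clinical image.

```python
# Illustrative RGB colour quantification over a (synthetic) tongue region:
# per-channel means and a simple red-ratio index as colour features.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 64x64 RGB patch standing in for a segmented tongue region:
# a reddish hue with mild noise, clipped to valid 8-bit range
patch = np.clip(rng.normal([170, 90, 100], 12, size=(64, 64, 3)), 0, 255)

mean_r, mean_g, mean_b = patch.reshape(-1, 3).mean(axis=0)
total = mean_r + mean_g + mean_b
indices = {
    "mean_RGB": (mean_r, mean_g, mean_b),
    "r_ratio": mean_r / total,   # one simple quantitative colour index
}
print({k: np.round(v, 3) for k, v in indices.items()})
```

In practice such indices would be computed per patient and compared across syndrome groups, which is the kind of comparison the abstract describes.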

11.
Associative classification has been shown to provide interesting results when used to classify data. With the increasing complexity of new databases, retrieving valuable information and classifying incoming data is becoming a compelling issue. The evidential database is a new type of database that represents imprecision and uncertainty. In this respect, extracting pertinent information such as frequent patterns and association rules is a task of paramount importance. In this work, we tackle the problem of extracting pertinent information from an evidential database. A new data mining approach, denoted EDMA, is introduced that extracts frequent patterns, overcoming the limits of pioneering works in the literature. A new classifier based on evidential association rules is then introduced. The obtained association rules, as well as their respective confidence values, are studied and weighted with respect to their relevance. The proposed methods are thoroughly tested on several synthetic evidential databases and show performance improvements.

12.
A fuzzy if-then rule whose consequent part is a real number is referred to as a simplified fuzzy rule. Since no defuzzification is required for this rule type, it has been widely used in function approximation problems. Furthermore, data mining can be used to discover useful information by exploring and analyzing data. Therefore, this paper proposes a fuzzy data mining approach to discover simplified fuzzy if-then rules from numerical data in order to approximate an unknown mapping from input to output. Since several pre-specified parameters for deriving fuzzy rules are not easily specified, they are automatically determined by a genetic algorithm with binary chromosomes. To evaluate the performance of the proposed method, computer simulations are performed on various numerical data sets, showing that the fitting ability and generalization ability of the proposed method are comparable to those of known fuzzy rule-based methods.
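The binary-chromosome GA idea can be illustrated on a toy version of the problem. Everything below is an assumption for illustration, not the paper's setup: a single chromosome encodes one pre-specified parameter (a Gaussian membership width sigma on [0.1, 2.0]), and fitness is how well that width reproduces samples of a hidden target curve.

```python
# Toy binary-chromosome GA: decode 10 bits to a real parameter, evolve
# with elitism, truncation selection, one-point crossover, bit-flip mutation.
import math
import random

random.seed(3)
BITS, POP, GENS = 10, 20, 30
TARGET_SIGMA = 0.7
xs = [i / 10 for i in range(-20, 21)]
target = [math.exp(-x * x / (2 * TARGET_SIGMA ** 2)) for x in xs]

def decode(bits):
    v = int("".join(map(str, bits)), 2) / (2 ** BITS - 1)
    return 0.1 + 1.9 * v                      # sigma in [0.1, 2.0]

def fitness(bits):
    s = decode(bits)
    return -sum((math.exp(-x * x / (2 * s * s)) - t) ** 2
                for x, t in zip(xs, target))  # negative squared error

pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
history = []
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    nxt = pop[:2]                             # elitism: keep the best two
    while len(nxt) < POP:
        a, b = random.sample(pop[:10], 2)     # truncation selection
        cut = random.randrange(1, BITS)
        child = a[:cut] + b[cut:]             # one-point crossover
        if random.random() < 0.1:             # bit-flip mutation
            i = random.randrange(BITS)
            child[i] ^= 1
        nxt.append(child)
    pop = nxt

best = max(pop, key=fitness)
print(f"found sigma = {decode(best):.3f} (target {TARGET_SIGMA})")
```

In the paper's setting the chromosome would encode several rule-derivation parameters at once, but the decode/evaluate/select loop has the same shape.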

13.
The aim of this review is to provide an overview of proteomic studies in animal models of diabetes and to give some insight into the different methods available today in the rapidly developing field of proteomics. A summary of 31 papers published between 1997 and 2007 is presented. For instance, proteomics has been used to study the development of both type 1 and type 2 diabetes, diabetic complications in tissues such as the heart, kidney, and retina, and changes after treatment with anti-diabetic drugs such as peroxisome proliferator-activated receptor agonists. Together, these studies give a good overview of a number of experimental approaches. Proteomics holds the promise of providing major contributions to the field of diabetes research. However, to achieve this, a number of issues need to be resolved. Appropriate data representation to facilitate data comparison, exchange, and verification is required, as well as improved statistical assessment of proteomic experiments. In addition, it is important to follow up the results with functional studies in order to draw biologically relevant conclusions. The potential of proteomics to dissect complex human disorders is now beginning to be realized. In the future, this will yield important new information concerning diabetes.

14.
Liang Kaiqiang, Lu Jukang. Computer Engineering and Design, 2007, 28(13): 3033-3035, 3229
Association rule mining is one of the core tasks in data mining, and improvements to association rule algorithms have made considerable progress in recent years. The concept lattice, a formal tool derived from binary relations, unifies the intension and extension of concepts and is well suited to discovering potential relationships in data; rule extraction is therefore a major application area of concept lattices and greatly improves mining efficiency. However, without the guidance of domain knowledge, some of the mined rules are meaningless or fail to meet users' needs, so domain knowledge must be introduced into rule extraction. Since a domain ontology is a clear and structured representation of domain knowledge, this paper proposes using a domain ontology to adjust the generated concept lattice and thereby guide rule extraction, mining high-level and cross-level association rules that satisfy users' needs.

15.
Discussion of evidence-theory combination formulas and some modifications   Cited by: 1 (self-citations: 0, other citations: 1)
As an uncertain reasoning method, Dempster-Shafer (D-S) evidence theory has been widely used in data fusion and target recognition. However, the D-S combination formula has shortcomings that limit the theory's application. Yager therefore improved the combination formula, but the improved formula introduces new problems. References [2], [3], and [4] made further improvements to Yager's formula. This paper compares the above combination formulas and makes some modifications to the formula of [4] so that it satisfies associativity, improving computational efficiency.
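Dempster's rule and Yager's modification, the starting points of this discussion, can be sketched for two mass functions on a common frame. The masses below are toy values, not drawn from the cited papers; Dempster's rule normalizes out the conflict mass K, while Yager's variant reassigns K to the whole frame (total ignorance).

```python
# Dempster's combination rule vs. Yager's variant for two mass functions.
from itertools import product

FRAME = frozenset({"a", "b", "c"})

m1 = {frozenset({"a"}): 0.6, frozenset({"b"}): 0.3, FRAME: 0.1}
m2 = {frozenset({"a"}): 0.5, frozenset({"c"}): 0.4, FRAME: 0.1}

def combine(m1, m2, rule="dempster"):
    raw, conflict = {}, 0.0
    for (A, x), (B, y) in product(m1.items(), m2.items()):
        C = A & B
        if C:
            raw[C] = raw.get(C, 0.0) + x * y
        else:
            conflict += x * y          # mass on contradictory intersections
    if rule == "dempster":             # normalise out the conflict K
        return {A: v / (1 - conflict) for A, v in raw.items()}, conflict
    # Yager: assign the conflict mass to total ignorance (the frame)
    raw[FRAME] = raw.get(FRAME, 0.0) + conflict
    return raw, conflict

md, k = combine(m1, m2)
my, _ = combine(m1, m2, rule="yager")
print(f"conflict K = {k:.2f}")
print("Dempster m({a}) =", round(md[frozenset({"a"})], 3))
print("Yager    m(frame) =", round(my[FRAME], 3))
```

With highly conflicting evidence (K close to 1), Dempster normalization produces the counter-intuitive results the abstract alludes to, which is what motivated Yager's and the subsequent modifications.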

16.
Introduction: An important quality of association rules is novelty. However, evaluating rule novelty is AI-hard and has been a serious challenge for most data mining systems. Objective: In this paper, we introduce functional novelty, a new non-pairwise approach to evaluating rule novelty. A functionally novel rule is interesting as it suggests previously unknown relations between user hypotheses. Methods: We developed a novel domain-driven KDD framework for discovering functionally novel association rules. Association rules were mined from cardiovascular data sets. At post-processing, domain-knowledge-compliant rules were discovered by applying semantic filtering based on the UMLS ontology. Their knowledge-compliance scores were computed against medical knowledge in the PubMed literature. A cardiologist explored possible relationships between several pairs of unknown hypotheses. The functional novelty of each rule was computed based on its likelihood of mediating these relationships. Results: Highly interesting rules were successfully discovered. For instance, a common rule such as diabetes mellitus → coronary arteriosclerosis was functionally novel, as it mediated a rare association between von Willebrand factor and intracardiac thrombus. Conclusion: The proposed post-mining domain-driven rule evaluation technique and measures proved useful for identifying candidate functionally novel rules, with the results validated by a cardiologist.

17.
Wikidata has been increasingly adopted by many communities for a wide variety of applications, which demand high-quality knowledge to deliver successful results. In this paper, we develop a framework to detect and analyze low-quality statements in Wikidata by shedding light on the current practices exercised by the community. We explore three indicators of data quality in Wikidata, based on: (1) community consensus on the currently recorded knowledge, assuming that statements that have been removed and not added back are implicitly agreed to be of low quality; (2) statements that have been deprecated; and (3) constraint violations in the data. We combine these indicators to detect low-quality statements, revealing challenges with duplicate entities, missing triples, violated type rules, and taxonomic distinctions. Our findings complement ongoing efforts by the Wikidata community to improve data quality, aiming to make it easier for users and editors to find and correct mistakes.
