Similar literature
 Found 20 similar articles (search time: 109 ms)
1.
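A rule extraction method based on rough set theory   Total citations: 3 (self-citations: 1; citations by others: 2)
Rule extraction is an important step in building intelligent information systems, and also a difficult one. To address rule extraction in information systems, a rough-set-based method is proposed, and the attribute reduction and attribute value reduction problems involved in rule extraction are studied. Discernibility vectors are built from the indiscernibility relation of rough set theory, and the core attributes and attribute significance are obtained through an addition rule on the discernibility vectors. Then, starting from the core attributes and using attribute significance as heuristic information, an attribute reduct of the information table is obtained. On this basis, using the correspondence between condition attributes and decision attributes, redundant attribute values are deleted from each rule in the information table to complete attribute value reduction and finally achieve rule extraction. Numerical examples and experiments show that the algorithm is effective and feasible.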
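The notion of core attributes used in the abstract above can be illustrated with a minimal sketch. It uses the standard positive-region definition from rough set theory, not the paper's discernibility-vector addition rule, whose details the abstract does not give. The function names and the toy decision table are hypothetical.

```python
from collections import defaultdict

def positive_region(rows, attrs, decision):
    """Indices of objects whose indiscernibility class under `attrs`
    maps to a single decision value (standard rough-set definition)."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    pos = set()
    for members in classes.values():
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos

def core_attributes(rows, attrs, decision):
    """Attributes whose removal changes the positive region."""
    full = positive_region(rows, attrs, decision)
    return [a for a in attrs
            if positive_region(rows, [b for b in attrs if b != a], decision) != full]

# Hypothetical decision table: condition attributes a, b, c; decision d.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]
core = core_attributes(rows, ["a", "b", "c"], "d")
```

In this toy table only attribute b is needed to separate the decision values, so it is the only core attribute.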

2.
Work in inductive learning has mostly been concentrated on classifying. However, there are many applications in which it is desirable to order rather than to classify instances. For modelling ordering problems, we generalize the notion of information tables to ordered information tables by adding order relations in attribute values. Then we propose a data analysis model by analyzing the dependency of attributes to describe the properties of ordered information tables. The problem of mining ordering rules is formulated as finding association between orderings of attribute values and the overall ordering of objects. An ordering rule may state that "if the value of an object x on an attribute a is ordered ahead of the value of another object y on the same attribute, then x is ordered ahead of y". For mining ordering rules, we first transform an ordered information table into a binary information table, and then apply any standard machine learning and data mining algorithms. As an illustration, we analyze in detail Maclean's universities ranking for the year 2000.
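The transformation from an ordered information table to a binary information table can be sketched as follows. This is only an illustration under stated assumptions: "ordered ahead of" is taken to mean a strictly larger numeric value, and the function name, attribute names, and data are hypothetical rather than taken from the paper.

```python
from itertools import permutations

def to_binary_table(table, attributes):
    """Map each ordered pair of distinct objects (x, y) to a row of 0/1 flags.

    A flag is 1 when x is ordered ahead of y on that attribute. Here "ahead"
    is assumed to mean a strictly larger value; a real ordered information
    table would carry its own order relation per attribute.
    """
    rows = {}
    for x, y in permutations(table, 2):
        rows[(x, y)] = {a: int(table[x][a] > table[y][a]) for a in attributes}
    return rows

# Hypothetical data: two condition attributes and an overall ranking score.
table = {
    "u1": {"quality": 3, "reputation": 2, "overall": 3},
    "u2": {"quality": 2, "reputation": 3, "overall": 2},
    "u3": {"quality": 1, "reputation": 1, "overall": 1},
}
binary = to_binary_table(table, ["quality", "reputation", "overall"])
```

Each row of `binary` describes one object pair, so a standard rule learner could be run on it with the "overall" column as the class label.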

3.
Incremental training has been used for genetic algorithm (GA)‐based classifiers in a dynamic environment where training samples or new attributes/classes become available over time. In this article, ordered incremental genetic algorithms (OIGAs) are proposed to address the incremental training of input attributes for classifiers. Rather than learning input attributes in batch as with normal GAs, OIGAs learn input attributes one after another. The resulting classification rule sets are also evolved incrementally to accommodate the new attributes. Furthermore, attributes are arranged in different orders by evaluating their individual discriminating ability. By experimenting with different attribute orders, different approaches of OIGAs are evaluated using four benchmark classification data sets. Their performance is also compared with normal GAs. The simulation results show that OIGAs can achieve generally better performance than normal GAs. The order of attributes does have an effect on the final classifier performance where OIGA training with a descending order of attributes performs the best. © 2004 Wiley Periodicals, Inc. Int J Int Syst 19: 1239–1256, 2004.

4.
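Statistical basis for decision table analysis   Total citations: 2 (self-citations: 0; citations by others: 2)
A nonparametric statistical test method for reducing the condition attributes of a decision table is given. First, the contingency table corresponding to the decision table is constructed and a significance test of the association between each condition attribute and the decision attribute is performed; at a given significance level, whether the association is significant determines whether the attribute is redundant with respect to the decision, which yields an attribute reduct. Next, the Lambda coefficient is used to measure the association of the attributes that are significantly related to the decision attribute, indicating the proportion by which prediction error is reduced when condition attributes are used to predict the decision attribute. First-level rules of the decision table are then obtained from the contingency tables. Experiments on a medical case decision table show that the method is simple and effective.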
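The proportional reduction in prediction error described in the abstract above can be sketched as follows. The sketch assumes that the Lambda coefficient is Goodman and Kruskal's lambda, which the abstract does not state explicitly. The function name and the counts are hypothetical.

```python
def goodman_kruskal_lambda(table):
    """Goodman-Kruskal lambda for predicting the decision from the condition.

    table[i][j] is the number of objects with condition value i and
    decision value j (a contingency table given as a list of rows).
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Errors when always guessing the most frequent decision value.
    errors_without = n - max(col_totals)
    # Errors when guessing the most frequent decision within each condition value.
    errors_with = n - sum(max(row) for row in table)
    if errors_without == 0:
        return 0.0  # convention chosen here: no error to reduce
    return (errors_without - errors_with) / errors_without

# Hypothetical contingency table: 2 condition values x 2 decision values.
lam = goodman_kruskal_lambda([[30, 10], [5, 25]])
```

For these counts, guessing without the condition attribute makes 35 errors and guessing with it makes 15, so lambda is 20/35, roughly 0.57.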

5.
The granularity of an information system has an incumbent effect on the efficacy of the analysis from many machine learning algorithms. An information system contains a universe of objects characterized and categorized by condition and decision attributes. To manage the concomitant granularity, a level of continuous value discretization (CVD) is often undertaken. In the case of the rough set theory (RST) methodology for object classification, the granularity contributes to the grouping of objects into condition classes with the same condition attribute values. This article exposits the effect of a level of CVD on the subsequent condition classes constructed, with the introduction of the condition class space, the domain within which the condition classes exist. This domain elucidates the association of the condition classes to the related decision outcomes, reflecting the inexactness incumbent when a level of CVD is undertaken. A series of measures is defined that quantify this association. Throughout this study and without loss of generality, the findings are made through the RST methodology. This further offers a novel exposition of the relationship between all the condition attributes and the RST‐related reducts (subsets of condition attributes). © 2006 Wiley Periodicals, Inc. Int J Int Syst 21: 173–191, 2006.

6.
Y.Y. Yao, Information Sciences, 2006, 176(23): 3431-3452
An approximate retrieval model is proposed based on the notion of neighborhood systems. The knowledge used in the model consists of an information table, in which each object is represented by its values on a finite set of attributes, and neighborhood systems on attribute values, which provide semantic similarity or closeness of different values. An information table can be used for exact retrieval. With the introduction of neighborhood systems to information tables, one is able to perform approximate retrieval. Operations on neighborhood systems are introduced based on power algebras. An ordering relation representing the information of a neighborhood system is suggested and examined. Approximate retrieval is carried out by the relaxation of the original query using neighborhood systems, and the combination of intermediate results using neighborhood system operations. The final retrieval results are presented according to the proposed ordering relation. In contrast to many existing systems, a main advantage of the proposed model is that the retrieval results are a non-linear ordering of objects.

7.
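An incremental attribute reduction algorithm based on rough set theory   Total citations: 3 (self-citations: 1; citations by others: 2)
In incremental learning, when the objects and decision attributes of an information system stay fixed while condition attributes keep being added, the usual way to obtain a reduct of the system is to recompute over all data in the decision table, which is clearly inefficient and unnecessary. Based on rough set theory, definitions of the relative discernibility matrix and the absolute discernibility matrix are given, and a new incremental attribute reduction algorithm is proposed. Examples show that the algorithm yields the same attribute reduct as the traditional algorithm, while lowering the time complexity and generally giving a classification quality better than the original; the attribute reduction therefore has some practical value.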

8.
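Building an enterprise Intranet mainly involves two important aspects: organizing and representing large amounts of information, and controlling the security of information access. All kinds of information are abstracted into objects represented by basic attributes and extended attributes. This object-based representation greatly improves the flexibility and generality of Intranet information organization. In addition, a personnel management model based on departments, roles, and users is used, and access control on object attributes is graded by security level, providing field-level control for information security in relational databases.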

9.
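A method for handling massive incomplete decision tables is proposed. With attribute significance based on mutual information as heuristic information, a genetic algorithm is used to reduce the condition attributes of the incomplete original decision table, producing a decision table that still contains missing values, called the optimized decision table. Using the information in the original decision table itself, consistent decision rules are extracted from the optimized decision table through attribute extension, without computing the missing values. On 8 UCI data sets the method gives better experimental results than the EMAV method, and it is an effective way to extract rules from massive incomplete decision tables.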

10.
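Patent information extraction is the foundation of patent analysis, and recognizing and extracting attributes and attribute values is the key problem it must solve. At present, there is little research on the simultaneous extraction of attributes and attribute values in Chinese patent information extraction. Using Chinese patent abstracts as the experimental corpus and drawing on statistical learning, this paper proposes an extraction method based on conditional random fields. The method treats attributes and attribute values as named entities and trains a conditional random field model on the corpus to extract them; mined association rules are then used to match attributes with attribute values. The precision, recall, and F-measure in the experiments are 80.8%, 81.2%, and 81.0%, respectively, indicating that the method can efficiently extract attributes and attribute values simultaneously. In addition, based on the extraction results, the paper analyzes patents and compares patents of the same kind, which demonstrates the practical value of the method.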

11.
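Attribute reduction of incomplete information systems based on partition closeness degree   Total citations: 1 (self-citations: 0; citations by others: 1)
By introducing the partition closeness degree, the significance of attributes in an incomplete information system is defined. Two new attribute reduction algorithms based on the partition closeness degree are proposed, one for incomplete information tables and one for incomplete decision tables; both have time complexity O(m^2 n^2). Examples show that the two algorithms obtain, respectively, a reduct of an incomplete information table and a relative reduct of an incomplete decision table.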

12.
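Current attribute reduction of incomplete information systems from the information view is analyzed. Several previously proposed information entropies share a shortcoming: the system's classification ability weakens as attributes are added. To address this, a definition of the conditional distribution information quantity of an incomplete decision table is given, based on how the tolerance classes determined by the condition attributes are distributed over the partition induced by the decision attribute. A new attribute significance is also defined and used as heuristic information to design an attribute reduction algorithm. Experiments show that the algorithm is feasible for attribute reduction of incomplete decision tables.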

13.
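Attribute reduction is one of the core problems of knowledge acquisition in rough set theory. For real information systems in which attribute values are uncertain, the centralized ordering relation is extended using grey system theory, and a preference relation that tends toward a standard value is established for grey information systems. With the dominance degree under the centralized ordering relation as heuristic information, a measure of attribute significance is given. On this basis, an attribute reduction algorithm suited to information systems whose attribute values are continuous grey numbers is proposed, a practical procedure for the reduction is given, and the feasibility of the algorithm is verified with an example.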

14.
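An attribute reduction for incomplete information systems   Total citations: 1 (self-citations: 1; citations by others: 0)
Condition attributes differ in importance. By introducing a difference degree, the significance of attributes in an incomplete information system is defined, and an attribute reduction algorithm based on weighted connection degree is proposed. An example shows that the algorithm can obtain the minimal relative reduct of an incomplete decision table.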

15.
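A missing data imputation method for a seafood safety early-warning system   Total citations: 1 (self-citations: 0; citations by others: 1)
To address missing data in a seafood safety early-warning system, a missing data imputation method is proposed. Many imputation methods based on rough sets exist, but many of them do not take into account the number of missing attributes of each object. The proposed method divides an information table with missing data into a complete part and an incomplete part and processes each separately; when filling in missing data, it considers both attribute significance and the number of missing attributes. An information table without missing data is output directly. Experimental results show that the method can be used for missing data imputation in the seafood safety early-warning system.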

16.
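Yao Hongliang (姚宏亮), Wang Xiufang (王秀芳), Wang Hao (王浩), Computer Science (《计算机科学》), 2012, 39(2): 250-254, 272
By studying the relationship between rough sets and graph theory, the concept of a weighted multiple complete multipartite graph with sets as weights is proposed, the adjacency matrix of such a graph is defined, and a mapping between weighted complete multipartite graphs and decision tables is obtained. A graph-theoretic form of rough-set decision table information systems and a graph-theoretic method for attribute reduction in them are given, and the algorithm is optimized using graph theory. A necessary and sufficient condition under which a set of attributes in a decision table information system cannot be reduced is obtained. Furthermore, a computation method based on attribute confidence and a way of handling multiple decision attributes are proposed. Programming experiments show that the method effectively reduces time and space complexity.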

17.
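The shortcomings of the HORAFA and HORAFA-A algorithms are analyzed, and a heuristic algorithm for obtaining an optimal reduct is given. The algorithm takes the core attributes as the initial reduct set and uses attribute frequency as heuristic information to select the necessary attributes to add to the reduct set. It applies not only to consistent decision table systems but also to inconsistent ones. It also improves the backward elimination method so that redundant condition attributes can be removed more quickly. Experiments show that the algorithm is correct and more efficient than HORAFA-A.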

18.
Data classification is a well‐organized operation in the field of data mining. This article presents an application of the k‐nearest neighbor classification technique for mining a fuzzy database. We consider a data set in which attribute values have certain similarities in nature and analyze the observations for the domain of each attribute, on the basis of fuzzy similarity relations. The proposed technique is general and the presented case study demonstrates the suitability of using this fuzzy approach for mining fuzzy databases, especially when the database contains various levels of abstraction. © 2004 Wiley Periodicals, Inc. Int J Int Syst 19: 1277–1290, 2004.

19.
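An access control scheme for Intranet database security   Total citations: 1 (self-citations: 0; citations by others: 1)
Building an Intranet mainly involves two aspects: organizing and representing large amounts of data, and controlling the security of database access. All kinds of information are abstracted into objects represented by basic attributes and extended attributes. This object-based data representation greatly improves the flexibility and generality of Intranet data organization. In addition, a personnel management model based on departments, roles, and users is used to grade access control on object attributes by security level, providing field-level control for data security in relational databases.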

20.
Estimating the selectivity of multidimensional range queries over real valued attributes has significant applications in data exploration and database query optimization. In this paper, we consider the following problem: given a table of d attributes whose domain is the real numbers and a query that specifies a range in each dimension, find a good approximation of the number of records in the table that satisfy the query. The simplest approach to tackle this problem is to assume that the attributes are independent. More accurate estimators try to capture the joint data distribution of the attributes. In databases, such estimators include the construction of multidimensional histograms, random sampling, or the wavelet transform. In statistics, kernel estimation techniques are being used. Many traditional approaches assume that attribute values come from discrete, finite domains, where different values have high frequencies. However, for many novel applications (as in temporal, spatial, and multimedia databases) attribute values come from the infinite domain of real numbers. Consequently, each value appears very infrequently, a characteristic that affects the behavior and effectiveness of the estimator. Moreover, real-life data exhibit attribute correlations that also affect the estimator. We present a new histogram technique that is designed to approximate the density of multidimensional datasets with real attributes. Our technique defines buckets of variable size and allows the buckets to overlap. The size of the cells is based on the local density of the data. The use of overlapping buckets allows a more compact approximation of the data distribution. We also show how to generalize kernel density estimators and how to apply them to the multidimensional query approximation problem. Finally, we compare the accuracy of the proposed techniques with existing techniques using real and synthetic datasets. 
The experimental results show that the proposed techniques behave more accurately in high dimensionalities than previous approaches.
Received: 30 January 2001, Accepted: 9 June 2003, Published online: 4 March 2004. Edited by: Y. Ioannidis.
Dimitrios Gunopulos: Supported by NSF ITR-0220148, NSF IIS-9907477 CAREER Award, NSF IIS-9984729, and NRDRP. George Kollios: Supported by NSF IIS-0133825 CAREER Award. Vassilis J. Tsotras: Supported by NSF IIS-9907477 and the US Dept. of Defense.
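The "simplest approach" named in this abstract, assuming that the attributes are independent, can be sketched as follows. This is not the paper's overlapping-bucket histogram or its kernel estimator. It is only the baseline, built here from one equi-width histogram per dimension with values assumed uniform inside each bucket. All names and the sample data are hypothetical.

```python
def range_fraction(values, lo, hi, buckets=10):
    """Estimate the fraction of values in [lo, hi] from an equi-width histogram."""
    vmin, vmax = min(values), max(values)
    width = (vmax - vmin) / buckets or 1.0  # fall back when all values are equal
    counts = [0] * buckets
    for v in values:
        counts[min(int((v - vmin) / width), buckets - 1)] += 1
    covered = 0.0
    for b, c in enumerate(counts):
        b_lo, b_hi = vmin + b * width, vmin + (b + 1) * width
        # Assume values are spread uniformly inside each bucket.
        overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
        covered += c * overlap / width
    return covered / len(values)

def estimate_count(records, query):
    """Independence assumption: multiply the per-dimension fractions."""
    estimate = float(len(records))
    for d, (lo, hi) in enumerate(query):
        estimate *= range_fraction([r[d] for r in records], lo, hi)
    return estimate

# Hypothetical 2-D data on a regular grid, where independence actually holds.
records = [(float(i), float(j)) for i in range(10) for j in range(10)]
estimate = estimate_count(records, [(0.0, 4.5), (0.0, 4.5)])
```

On correlated data the product of per-dimension fractions can be far from the true count, which is the weakness that joint-distribution estimators such as multidimensional histograms aim to address.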
