Similar literature
 20 similar documents found
1.
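A rough set method for partner selection in virtual enterprises   Total citations: 6, self-citations: 0, citations by others: 6
周庆敏, 殷晨波. 《控制与决策》 (Control and Decision), 2005, 20(9): 1047-1051
Rough set theory is applied to partner selection for virtual enterprises, and a rough-set-based model and method for partner selection are proposed. The method builds a decision system from the sample data of the potential partner enterprises, treating the evaluation criteria for partner selection as attributes, and mines from it the important attributes that reflect the essential relationships among the criteria, together with knowledge rules for partner selection. These rules describe well the essential relationships among attributes that the limited sample reflects, and they can be used to select partners effectively among the other samples in the partner-selection database. An application example shows that the method is correct and effective.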
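The abstract does not give the computation, but the usual rough set measure behind "important attributes" is the dependency degree: the share of objects whose condition-attribute equivalence class maps to a single decision. The following is a minimal sketch under that assumption; the attribute names and the five candidate records are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, cond_attrs, decision):
    """Fraction of rows in the positive region.

    A row is in the positive region when every row sharing its values
    on cond_attrs also shares its decision value.
    """
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, cond_attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


# Hypothetical candidate-partner records (illustration only).
candidates = [
    {"cost": "low", "quality": "high", "delivery": "fast", "select": "yes"},
    {"cost": "low", "quality": "high", "delivery": "slow", "select": "yes"},
    {"cost": "high", "quality": "high", "delivery": "fast", "select": "no"},
    {"cost": "high", "quality": "low", "delivery": "slow", "select": "no"},
    {"cost": "low", "quality": "low", "delivery": "fast", "select": "no"},
]

full = dependency(candidates, ["cost", "quality"], "select")
cost_only = dependency(candidates, ["cost"], "select")
delivery_only = dependency(candidates, ["delivery"], "select")
```

In this toy table, cost and quality together determine the decision for every candidate, while delivery alone determines it for none, which is the kind of contrast an attribute-importance ranking relies on.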

2.
In medical information systems, the data that describe patient health records are often time stamped. These data are subject to complexities such as missing values, observations at irregular time intervals and large attribute sets. Because of these complexities, mining clinical time-series data remains a challenging area of research. This paper proposes a bio-statistical mining framework, named statistical tolerance rough set induced decision tree (STRiD), which handles these complexities and builds an effective classification model. The constructed model is used in developing a clinical decision support system (CDSS) to assist the physician in clinical diagnosis. The STRiD framework provides three functionalities: temporal pre-processing, attribute selection and classification. In temporal pre-processing, an enhanced fuzzy-inference based double exponential smoothing method is presented to impute the missing values and to derive the temporal patterns for each attribute. In attribute selection, relevant attributes are selected using the tolerance rough set. A classification model is constructed with the selected attributes using a temporal pattern induced decision tree classifier. For experimentation, this work uses clinical time-series datasets of hepatitis and thrombosis patients. The constructed classification model demonstrates the effectiveness of the proposed framework, with a classification accuracy of 91.5% for hepatitis and 90.65% for thrombosis.
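To make the imputation step concrete, here is a minimal sketch of plain double exponential smoothing (Holt's linear method) used to fill gaps. It is not the paper's enhanced fuzzy-inference variant, which the abstract does not specify. The function name `holt_impute`, the smoothing constants, and the assumptions that observations are evenly spaced and that the first two values are present are all choices made for this illustration.

```python
def holt_impute(series, alpha=0.5, beta=0.5):
    """Fill None entries with one-step-ahead Holt forecasts.

    Assumes evenly spaced observations and that the first two entries
    are observed, so level and trend can be initialised from them.
    """
    if len(series) < 2 or series[0] is None or series[1] is None:
        raise ValueError("first two observations must be present")
    level = series[1]
    trend = series[1] - series[0]
    filled = [series[0], series[1]]
    for x in series[2:]:
        forecast = level + trend            # prediction for this time step
        value = forecast if x is None else x
        filled.append(value)
        # Standard Holt updates for level and trend.
        new_level = alpha * value + (1 - alpha) * forecast
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return filled


# A hypothetical lab measurement with one missing reading.
readings = holt_impute([1, 2, None, 4, 5])
```

On a perfectly linear series the forecast reproduces the missing point exactly; on noisy clinical data the result depends on `alpha` and `beta`, which would need tuning.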

3.
Packaging is one of the back-end processes in integrated circuit (IC) manufacturing; it is highly capital-intensive and involves complex processes. Unlike the front-end process that fabricates wafers, the back-end process is rarely uniform. Because of the complexity of the process and the increasing variety of products, the packaging foundry occasionally receives complaints, which can be categorized into classes depending on the loss. We apply rough set theory to discover the important attributes leading to complaints and to induce decision rules from the data of a Taiwanese IC packaging foundry that ranks among the largest in the world. The data contain 454 records, and each record includes 11 condition attributes and one decision attribute characterizing the class. We first obtain an important set of attributes that ensures a high quality of classification, and then we generate rules for each class of complaints. The strongest rules obtained relate to two attributes, number of pins and wire bonding, which are important technological factors in the packaging process. These rules were presented to the foundry’s staff, who believe that they are potentially applicable in the future to prevent complaints.

4.
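Decision trees are an effective data mining method for classification. In decision tree construction algorithms, the relative core of rough set theory has been applied to the problem of selecting attributes for multivariate tests. Taking into account the strengths and weaknesses of both decision tree techniques and rough sets, the two are combined: the number of attributes contained in each node is first limited, and then the attribute relevance degree and the De Mantaras distance function are used to select relevant attribute combinations as the attribute selection criterion, giving a new construction algorithm. The advantages of this algorithm are that it effectively reduces the height of the tree and improves the readability of the classification rules.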
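For readers unfamiliar with the De Mantaras distance, the sketch below computes its normalized form between the partition induced by an attribute and the partition induced by the class, d = (H(A|C) + H(C|A)) / H(A, C), which equals 2 - (H(A) + H(C)) / H(A, C). How the paper combines this distance with the attribute relevance degree is not stated in the abstract, so only the distance itself is shown; the function names are chosen for this illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mantaras_distance(attr_values, class_labels):
    """Normalized De Mantaras distance between two partitions.

    0 means the attribute partition matches the class partition;
    values near 1 mean the attribute carries little class information.
    """
    joint = entropy(list(zip(attr_values, class_labels)))
    if joint == 0:
        return 0.0  # both partitions are trivial (a single block each)
    return 2.0 - (entropy(attr_values) + entropy(class_labels)) / joint


# An attribute that reproduces the class exactly, and one independent of it.
d_same = mantaras_distance([0, 0, 1, 1], ["n", "n", "y", "y"])
d_indep = mantaras_distance([0, 0, 1, 1], ["n", "y", "n", "y"])
```

A split criterion built on this distance prefers the attribute with the smallest value.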

5.
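To address the shortcomings of current attribute-importance-based methods for decomposing the attribute set of a decision table, a new decomposition method based on decision classification is proposed. The relationships between the quality of approximate classification, attribute importance and decision classification are analyzed. Using rough set theory, the relationship between condition attributes and decision attributes is considered from the viewpoint of improving the correctness of decision classification in the sub-decision tables; a measure for selecting condition attributes for decision table decomposition is proposed, and attribute set decomposition is applied to the decision table.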

6.
Medical data usually contain a high degree of uncertainty, which leads to problems such as diagnosis disparity. With health care expenditures rising continually, controlling them by improving the diagnosis process is an urgent task that poses a great challenge. Rough sets have been shown to be a useful technique for dealing with uncertain data. This paper applies rough sets to identify the set of significant symptoms causing diseases and to induce decision rules using the data of a Taiwanese otolaryngology clinic. The data are limited to rhinology and throat logy and contain 657 records, each including 12 condition attributes and one decision attribute. The clinic’s physician agrees with the set of significant symptoms identified and favors the reduct containing one critical symptom. To generate decision rules, we use the LEM2 and Explore algorithms and discuss these rules in terms of the number of rules and the implications of certain rules. Our results show that LEM2 generates fewer certain rules and that LEM2’s proportions of approximate rules are higher. We also observe that LEM2’s certain rules form a more sensible pattern that Explore lacks. Discovery of this pattern is considered potentially helpful in improving medical diagnosis.

7.
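A decision rule acquisition algorithm based on classification consistency   Total citations: 3, self-citations: 3, citations by others: 3
代建华, 潘云鹤. 《控制与决策》 (Control and Decision), 2004, 19(10): 1086-1090
A rule acquisition algorithm based on classification consistency is proposed. It is a specialization-direction method: starting from the empty set, it measures the importance of attributes by the classification consistency of subsets of condition attributes and adds important attributes step by step; once the selected attribute subset classifies correctly, decision rules are obtained. The algorithm includes a rule reduction procedure that simplifies the acquired rules and strengthens their generalization ability. Experimental results show that the rules obtained by the proposed algorithm are more concise and efficient.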
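The forward search described above can be sketched as a greedy loop. This is an illustration under assumptions, not the paper's algorithm: classification consistency is taken here to be the fraction of objects whose condition values map to a single decision, ties are broken by attribute order, and the paper's rule reduction procedure is omitted. The patient table and all names are hypothetical.

```python
from collections import defaultdict


def consistency(rows, attrs, decision):
    """Fraction of rows whose values on attrs determine the decision."""
    decisions = defaultdict(set)
    sizes = defaultdict(int)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        decisions[key].add(row[decision])
        sizes[key] += 1
    return sum(sizes[k] for k, d in decisions.items() if len(d) == 1) / len(rows)


def greedy_select(rows, cond_attrs, decision):
    """Start from the empty set and add the attribute that raises consistency most."""
    selected, remaining = [], list(cond_attrs)
    target = consistency(rows, cond_attrs, decision)
    while remaining and consistency(rows, selected, decision) < target:
        best = max(remaining, key=lambda a: consistency(rows, selected + [a], decision))
        selected.append(best)
        remaining.remove(best)
    return selected


def extract_rules(rows, attrs, decision):
    """One (conditions, decision) rule per consistent block of rows."""
    decisions = defaultdict(set)
    for row in rows:
        decisions[tuple(row[a] for a in attrs)].add(row[decision])
    return [(dict(zip(attrs, key)), next(iter(d)))
            for key, d in decisions.items() if len(d) == 1]


# Hypothetical decision table (illustration only).
patients = [
    {"fever": "yes", "cough": "yes", "age": "young", "flu": "yes"},
    {"fever": "yes", "cough": "no", "age": "young", "flu": "no"},
    {"fever": "no", "cough": "yes", "age": "old", "flu": "no"},
    {"fever": "no", "cough": "no", "age": "young", "flu": "no"},
]

chosen = greedy_select(patients, ["fever", "cough", "age"], "flu")
rules = extract_rules(patients, chosen, "flu")
```

Here the search stops after two attributes because they already classify every row consistently, so the third attribute never enters a rule.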

8.
Since rule induction methods generate rules whose lengths are the shortest for discrimination between given classes, they tend to generate rules too short for medical experts. Thus, these rules are difficult for the experts to interpret from the viewpoint of domain knowledge. In this paper, the characteristics of experts' rules are closely examined and a new approach to generate diagnostic rules is introduced. The proposed method focuses on the hierarchical structure of differential diagnosis and consists of the following three procedures. First, the characterization of decision attributes (given classes) is extracted from databases and the classes are classified into several generalized groups with respect to the characterization. Then, two kinds of sub-rules, classification rules for each generalized group and rules for each class within each group, are induced. Finally, those two parts are integrated into one rule for each decision attribute. The proposed method was evaluated on a medical database, the experimental results of which show that induced rules correctly represent experts' decision processes.

9.
We present a method to learn maximal generalized decision rules from databases by integrating discretization, generalization and rough set feature selection. Our method reduces the data horizontally and vertically. In the first phase, discretization and generalization are integrated and the numeric attributes are discretized into a few intervals. The primitive values of symbolic attributes are replaced by high level concepts and some obvious superfluous or irrelevant symbolic attributes are also eliminated. Horizontal reduction is accomplished by merging identical tuples after the substitution of an attribute value by its higher level value in a pre-defined concept hierarchy for symbolic attributes, or the discretization of continuous (or numeric) attributes. This phase greatly decreases the number of tuples in the database. In the second phase, a novel context-sensitive feature merit measure is used to rank the features, and a subset of relevant attributes is chosen based on rough set theory and the merit values of the features. A reduced table is obtained by removing those attributes which are not in the relevant attributes subset, and the data set is further reduced vertically without destroying the interdependence relationships between classes and the attributes. Then rough set-based value reduction is further performed on the reduced table and all redundant condition values are dropped. Finally, tuples in the reduced table are transformed into a set of maximal generalized decision rules. The experimental results on UCI data sets and a real market database demonstrate that our method can dramatically reduce the feature space and improve learning accuracy.
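The first phase (horizontal reduction) can be illustrated with a short sketch: numeric values are discretized, symbolic values are lifted one level in a concept hierarchy, and tuples that become identical are merged with a count. The hierarchy, the interval boundaries and the customer records below are hypothetical; the abstract does not describe the paper's actual hierarchies or discretization method.

```python
from collections import Counter

# Hypothetical concept hierarchy for a symbolic attribute (city -> region).
CITY_TO_REGION = {
    "Vancouver": "West",
    "Victoria": "West",
    "Toronto": "East",
    "Ottawa": "East",
}


def discretize_age(age):
    """Map a numeric age onto one of three illustrative intervals."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "senior"


def horizontal_reduce(rows):
    """Generalize each (city, age, label) tuple and merge identical results.

    Returns a Counter mapping each generalized tuple to the number of
    original tuples it covers.
    """
    return Counter(
        (CITY_TO_REGION[city], discretize_age(age), label)
        for city, age, label in rows
    )


customers = [
    ("Vancouver", 25, "buy"),
    ("Victoria", 28, "buy"),
    ("Toronto", 45, "skip"),
    ("Ottawa", 50, "skip"),
    ("Toronto", 22, "buy"),
]

reduced = horizontal_reduce(customers)
```

Five original tuples collapse into three generalized ones here; keeping the counts preserves how much support each generalized tuple has when rules are later derived from it.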

10.
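丁春荣, 李龙澍. 《微机发展》 (Microcomputer Development), 2007, 17(11): 110-113
Decision trees are a common classification method in data mining tasks. When constructing a decision tree, the criterion for choosing the splitting attribute directly affects classification performance, and traditional decision tree algorithms are usually based on information-theoretic measures. Based on rough set theory, a decision tree rule extraction algorithm is proposed that uses attribute significance and dependency as the attribute selection criteria. The algorithm extracts explicit classification rules, yields a simpler structure than the traditional ID3 algorithm, and improves classification efficiency.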

11.
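A classification mining method based on BP neural networks and rough set theory   Total citations: 1, self-citations: 0, citations by others: 1
Classification is an important topic in data mining. To reconcile decision classification, a data mining method based on rough set theory and a BP neural network is proposed. In this method, rough sets are first used to remove redundant attributes from the decision table, a BP neural network is then used to filter noise, and finally rough sets generate a rule set from the reduced decision table. The method not only avoids the complexity of extracting rules from a trained neural network but also effectively improves classification accuracy.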

12.
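Decision trees are a commonly used classification method in data mining. To deal with the inconsistent data caused by noise in the problem of university graduate employment, this paper proposes a decision tree model based on variable precision rough sets and applies it to the analysis of student employment data. The method uses the classification quality measure of variable precision rough sets as the information function to select condition attributes as tree nodes, splitting the data set top-down until a termination condition is met. It takes full account of the dependency and redundancy among attributes, and it allows a degree of inconsistency among the classes of the instances assigned to the positive region while the tree is built. Experiments show that the algorithm handles inconsistent data sets effectively and classifies the employment data correctly and reasonably, yielding several valuable conclusions for decision analysis. The algorithm greatly improves the generalization ability of the decision rules and simplifies the tree structure.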
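The tolerance for inconsistency mentioned above can be sketched as follows. In this illustration an equivalence class joins the positive region when its majority decision covers at least a fraction `beta` of the class, and classification quality is the share of objects in that region. The threshold convention (a precision level in (0.5, 1]) is one of several used in the variable precision rough set literature and is an assumption here, as are the function name and the toy employment records.

```python
from collections import Counter, defaultdict


def vprs_quality(rows, cond_attrs, decision, beta=0.8):
    """Classification quality under a majority-inclusion threshold beta.

    With beta = 1.0 this reduces to the classical rough set dependency
    degree, where only fully consistent classes count.
    """
    blocks = defaultdict(Counter)
    for row in rows:
        blocks[tuple(row[a] for a in cond_attrs)][row[decision]] += 1
    positive = 0
    for counts in blocks.values():
        size = sum(counts.values())
        if max(counts.values()) / size >= beta:
            positive += size
    return positive / len(rows)


# Hypothetical, noisy employment records (illustration only).
students = (
    [{"major": "CS", "employed": "yes"}] * 4
    + [{"major": "CS", "employed": "no"}]
    + [{"major": "Art", "employed": "yes"}] * 3
    + [{"major": "Art", "employed": "no"}] * 2
)

strict = vprs_quality(students, ["major"], "employed", beta=1.0)
tolerant = vprs_quality(students, ["major"], "employed", beta=0.75)
loose = vprs_quality(students, ["major"], "employed", beta=0.55)
```

With the strict threshold no class qualifies because each contains a minority label; relaxing `beta` admits first the 80%-consistent class and then both, which is how a tree built on this measure can keep splitting on noisy data.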

13.
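纪霞, 李龙澍. 《控制与决策》 (Control and Decision), 2013, 28(12): 1837-1842
A rule extraction algorithm for incomplete decision tables based on attribute discernibility degree is proposed; it is a specialization-direction method. Starting from the empty set, it repeatedly selects the currently most important condition attribute to classify the object set, extracting certain rules from the consistent blocks whose generalized decision value is unique and uncertain rules from the other consistent blocks. An attribute necessity check is then applied to remove redundant attributes from each rule. Finally, a rule reduction procedure simplifies the acquired rules and strengthens their generalization ability. Experimental results show that the proposed algorithm is more efficient and that the rules obtained are concise and effective.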

14.
We present a data mining method which integrates discretization, generalization and rough set feature selection. Our method reduces the data horizontally and vertically. In the first phase, discretization and generalization are integrated. Numeric attributes are discretized into a few intervals. The primitive values of symbolic attributes are replaced by high level concepts and some obvious superfluous or irrelevant symbolic attributes are also eliminated. The horizontal reduction is done by merging identical tuples after substituting an attribute value by its higher level value in a pre-defined concept hierarchy for symbolic attributes, or the discretization of continuous (or numeric) attributes. This phase greatly decreases the number of tuples we consider further in the database(s). In the second phase, a novel context-sensitive feature merit measure is used to rank features, and a subset of relevant attributes is chosen based on rough set theory and the merit values of the features. A reduced table is obtained by removing those attributes which are not in the relevant attributes subset, and the data set is further reduced vertically without changing the interdependence relationships between the classes and the attributes. Finally, the tuples in the reduced relation are transformed into different knowledge rules based on different knowledge discovery algorithms. Based on these principles, a prototype knowledge discovery system, DBROUGH-II, has been constructed by integrating discretization, generalization, rough set feature selection and a variety of data mining algorithms. Tests on a telecommunication customer data warehouse demonstrate that different kinds of knowledge rules, such as characteristic rules, discriminant rules, maximal generalized classification rules, and data evolution regularities, can be discovered efficiently and effectively.

15.
Rough Sets Theory is often applied to the task of classification and prediction, in which objects are assigned to some pre-defined decision classes. When the classes are preference-ordered, the process of classification is referred to as sorting. To deal with the specificity of sorting problems, an extension of the Classic Rough Sets Approach, called the Dominance-based Rough Sets Approach, was introduced. The final result of the analysis is a set of decision rules induced from what is called rough approximations of decision classes. The main role of the induced decision rules is to discover regularities in the analyzed data set, but the same rules, when combined with a particular classification method, may also be used to classify/sort new objects (i.e. to assign the objects to appropriate classes). There exist many different rule induction strategies, including induction of an exhaustive set of rules. This strategy produces the most comprehensive knowledge base on the analyzed data set, but it requires a considerable amount of computing time, as the complexity of the process is exponential. In this paper we present a shortcut that allows classifying new objects without generating the rules. The presented approach bears some resemblance to the idea of lazy learning.

16.
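Three-way decisions are cost-sensitive; by introducing a deferred decision, they make classification more reasonable when information is incomplete. Considering the problem of optimal decision making in decision information systems with mixed attributes, a neighborhood relation is defined on a mixed-attribute information system and a decision-theoretic rough set model based on that neighborhood relation is constructed. The model is then applied to clinical diagnostic decisions for gout, classifying gout data by repeated iterative learning. Comparison with the SVM (Support Vector Machine), RF (Random Forest) and LR (Logistic Regression) classification algorithms demonstrates the superiority of the method. From the classification results, the internal relationships among factors are discovered and classification rules are obtained; the correlation of gout with liver function, kidney function, blood lipids and blood glucose is examined, providing knowledge support and decision support for research into the causes of gout and for its diagnosis and treatment.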

17.
18.
Feature selection is an important topic in machine learning based diagnosis; it aims to remove irrelevant features so that diagnostic systems reach good performance. The behaviour of diagnostic models can be sensitive to the number of features, and significant features can represent the problem better than the entire set. Consequently, algorithms to identify these features are valuable contributions. This work deals with the feature selection problem through attribute clustering. The proposed algorithm is inspired by existing approaches, where the relative dependency between attributes is used to calculate dissimilarity values. The centroids of the created clusters are selected as representative attributes. The selection algorithm uses a random process for proposing centroid candidates; in this way, the exploration inherent in random search is included. A hierarchical procedure is proposed for implementing this algorithm. In each level of the hierarchy, the entire set of available attributes is split into disjoint sets and the selection process is applied to each subset. Once the significant attributes are proposed for each subset, a new set of available attributes is created and the selection process runs again in the next level. The hierarchical implementation aims to refine the search space in each level on a reduced set of selected attributes, while also reducing computation time. The approach is tested with real data collected from a test bed; results show that the diagnosis precision using a Random Forest based classifier is over 98% with only 12% of the attributes from the available set.

19.
Classifiability-based omnivariate decision trees   Total citations: 1, self-citations: 0, citations by others: 1
Top-down induction of decision trees is a simple and powerful method of pattern classification. In a decision tree, each node partitions the available patterns into two or more sets. New nodes are created to handle each of the resulting partitions and the process continues. A node is considered terminal if it satisfies some stopping criteria (for example, purity, i.e., all patterns at the node are from a single class). Decision trees may be univariate, linear multivariate, or nonlinear multivariate depending on whether a single attribute, a linear function of all the attributes, or a nonlinear function of all the attributes is used for the partitioning at each node of the decision tree. Though nonlinear multivariate decision trees are the most powerful, they are more susceptible to the risks of overfitting. In this paper, we propose to perform model selection at each decision node to build omnivariate decision trees. The model selection is done using a novel classifiability measure that captures the possible sources of misclassification with relative ease and is able to accurately reflect the complexity of the subproblem at each node. The proposed approach is fast and does not suffer from as high a computational burden as that incurred by typical model selection algorithms. Empirical results over 26 data sets indicate that our approach is faster and achieves better classification accuracy compared to statistical model selection algorithms.

20.
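王利民, 姜汉民. 《控制与决策》 (Control and Decision), 2019, 34(6): 1234-1240
When the classical K-dependence Bayesian classifier (KDB) orders attributes, it considers only the direct correlation between the class variable and the decision attributes and ignores the conditional correlation between the two given the decision attributes. To address this problem, and building on the KDB structure with the principle of fully expressing the dependency information among attributes, the proposed model strengthens the dependencies among attributes and improves how the decision attributes express the classification decision. It uses the conditional mutual information between the class variable and the decision attributes to optimize the attribute order, incorporates an attribute reduction strategy to remove redundant attributes and lower the overfitting risk caused by a complex model structure, and selects the best attributes by a greedy search strategy to build the model structure. Experimental results on data sets from the UCI machine learning repository show that, compared with KDB, the model has better classification accuracy and notable robustness.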
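The quantity at the centre of this abstract, conditional mutual information, can be estimated from counts as sketched below. The sketch computes I(X; Y | Z) in bits for three aligned sequences of discrete values; how the paper plugs this into the attribute ordering is not specified in the abstract, so the function is shown on its own, and its name and the toy sequences are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X; Y | Z) in bits from three aligned sequences.

    Uses I = sum over (x, y, z) of p(x, y, z) * log2( p(x, y, z) p(z) / (p(x, z) p(y, z)) ).
    """
    n = len(xs)
    xyz = Counter(zip(xs, ys, zs))
    xz = Counter(zip(xs, zs))
    yz = Counter(zip(ys, zs))
    z = Counter(zs)
    total = 0.0
    for (x, y, zv), c in xyz.items():
        # The sample size n cancels inside the logarithm, so raw counts suffice.
        total += (c / n) * log2(c * z[zv] / (xz[(x, zv)] * yz[(y, zv)]))
    return total


# XOR example: X and Y are each uninformative about the other alone,
# but once Z is known, Y determines X completely.
x = [0, 0, 1, 1]
z = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x, z)]
cmi_xor = conditional_mutual_information(x, y, z)

# Independent X and Y under a constant Z carry no conditional information.
cmi_indep = conditional_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0])
```

The XOR case shows why conditioning matters: a ranking based only on pairwise mutual information would score this pair at zero and miss the dependency entirely.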
