Similar literature
 Found 20 similar documents (search time: 765 ms)
1.
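Based on the layout characteristics of bill images, this paper selects table lines, background information, and texture information as the main features, and uses rough set theory for feature attribute reduction, rule extraction, and rule reduction. It also proposes a comprehensive decision classification method based on multiple rule sets, which exploits their classification and rejection properties to classify bill images effectively.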

2.
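This paper studies attribute reduction algorithms based on rough set theory. It proposes a new heuristic reduction algorithm: a bidirectional-selection reduction algorithm based on weighted average and frequency. A worked example verifies the feasibility and effectiveness of the algorithm.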

3.
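Research on a multi-dimensional optimized retrieval algorithm for case-based reasoning   Total citations: 1, self-citations: 0, citations by others: 1
Case retrieval is the central step of a case-based reasoning system, and retrieval quality determines the quality of the whole system. By combining a genetic algorithm (GA) with the analytic hierarchy process (AHP), case retrieval is optimized in three respects: the case base, attribute reduction, and weight determination. Exploiting the strength of genetic algorithms in search optimization, a two-dimensional encoding is combined with weights to form a three-dimensional optimization, and weights are learned from experience and an intermediate weight table, which raises the retrieval hit rate. The model is applied experimentally to a tourism-based multi-strategy data mining system, and the results show a clear improvement in the case retrieval hit rate.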

4.
We are witnessing the era of big data computing, in which computing resources are becoming the main bottleneck in dealing with large datasets. For high-dimensional data, where each view of the data is of high dimensionality, feature selection is necessary to further improve clustering and classification results. In this paper, we propose a new feature selection method, the Incremental Filtering Feature Selection (IF2S) algorithm, and a new clustering algorithm, the Temporal Interval based Fuzzy Minimal Clustering (TIFMC) algorithm, which employ the Fuzzy Rough Set for selecting an optimal subset of features and for effective grouping of large volumes of data, respectively. An extensive experimental comparison of the proposed method with other methods is carried out using four different classifiers. The proposed algorithms yield promising results in feature selection, clustering, and classification accuracy in the field of biomedical data mining.

5.
Feature selection based on mutual information and rough set theory   Total citations: 2, self-citations: 0, citations by others: 2
朱颢东  李红婵 《计算机工程》2011,37(15):181-183
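To address the limited precision of the mutual information method, rough sets are introduced and an attribute reduction algorithm based on relation product theory is given; on this basis, a feature selection method suited to massive text data sets is proposed. The method uses mutual information for initial feature selection and then applies the proposed attribute reduction algorithm to eliminate redundancy, yielding a more representative feature subset. Experimental results show that the method obtains feature subsets that have low redundancy and are more representative.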
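
The mutual-information pre-selection step described in this abstract can be illustrated with a minimal sketch. It assumes discrete feature values and class labels; the function names `mutual_information` and `preselect` are hypothetical, and the sketch does not reproduce the paper's relation-product reduction step.

```python
from collections import Counter
from math import log2


def mutual_information(feature, labels):
    """Return I(X; Y) in bits for two equal-length sequences of discrete values."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * log2(count * n / (px[x] * py[y]))
    return mi


def preselect(features, labels, k):
    """Return the names of the k features with the highest mutual information.

    features: dict mapping a feature name to its column of discrete values.
    """
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A feature that copies the class label scores highest, while a feature independent of the label scores zero, so `preselect` keeps the former first.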

6.
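Rough set theory is an effective information-processing tool, and attribute reduction of decision tables is a core topic in rough set research. Drawing on rough set theory, this paper proposes a decision table attribute reduction algorithm based on inclusion degree. Compared with existing decision table attribute reduction algorithms, it has lower complexity and better usability. Finally, reduction experiments on examples from the UCI machine learning repository show that it achieves fairly satisfactory results.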

7.
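A new attribute reduction method based on attribute correlation   Total citations: 7, self-citations: 0, citations by others: 7
This paper gives a new definition of attribute correlation based on rough set theory and, building on it, a new attribute reduction method based on attribute correlation. The algorithm not only filters out irrelevant attributes from the attribute set but also effectively finds the redundant attributes in it, thereby yielding a satisfactory attribute reduction. Tests on UCI machine learning data sets also confirm the effectiveness of the algorithm.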

8.
Feature selection for ensembles has been shown to be an effective strategy for ensemble creation due to its ability to produce good subsets of features, which make the classifiers of the ensemble disagree on difficult cases. In this paper we present an ensemble feature selection approach based on a hierarchical multi-objective genetic algorithm. The underpinning paradigm is “overproduce and choose”. The algorithm operates at two levels. First, it performs feature selection in order to generate a set of classifiers, and then it chooses the best team of classifiers. To show its robustness, the method is evaluated in two different contexts: supervised and unsupervised feature selection. In the former, we considered the problem of handwritten digit recognition and used three different feature sets and multi-layer perceptron neural networks as classifiers. In the latter, we considered the problem of handwritten month word recognition and used three different feature sets and hidden Markov models as classifiers. Experiments and comparisons with classical methods, such as Bagging and Boosting, demonstrated that the proposed methodology brings compelling improvements when classifiers have to work with very low error rates. Comparisons were made by considering recognition rates only.

9.
王希雷  王磊 《计算机工程》2006,32(24):204-205
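Rough set theory is used to extract features of the Chinese characters, letters, digits, and short dashes on license plates, and these features are then used for template matching. The feature selection algorithm in this paper, based on the rough set discernibility matrix, has time complexity O(mn²), which overturns the earlier view that feature selection algorithms based on the discernibility matrix have time complexity no lower than O(m²n²) (where m is the number of features/attributes in the data set and n is the number of samples). Experimental results in license plate recognition are given.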
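
For readers unfamiliar with the discernibility matrix mentioned in this abstract, the following is a minimal sketch of the textbook construction for a decision table. It is not the paper's algorithm; the function names and the tuple-based data layout are assumptions made for illustration.

```python
from itertools import combinations


def discernibility_matrix(samples, decisions):
    """Build the discernibility matrix of a decision table.

    samples: list of tuples of attribute values (one tuple per sample).
    decisions: list of class labels, aligned with samples.
    Returns {(i, j): set of attribute indices on which samples i and j differ},
    restricted to pairs whose decisions differ.
    """
    matrix = {}
    for i, j in combinations(range(len(samples)), 2):
        if decisions[i] != decisions[j]:
            diff = {a for a, (u, v) in enumerate(zip(samples[i], samples[j])) if u != v}
            if diff:  # an empty entry would mean an inconsistent pair
                matrix[(i, j)] = diff
    return matrix


def core_attributes(matrix):
    """Attributes that occur as singleton entries form the core."""
    return {next(iter(entry)) for entry in matrix.values() if len(entry) == 1}
```

This naive construction compares every pair of samples on every attribute, so for n samples and m attributes it performs on the order of mn² comparisons.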

10.
朱颢东  钟勇 《计算机工程》2010,36(19):39-41
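Feature subsets chosen by traditional feature selection methods contain redundancy and are not very representative. To address this, a feature selection method based on rough sets and the pansystems equivalence operator is proposed. Initial features are extracted using document frequency based on minimum term frequency, rough sets are extended through the pansystems equivalence operator, and an attribute reduction algorithm is given to eliminate redundancy, yielding a more representative feature subset. Experimental results show that the method achieves high precision and recall.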

11.
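A survey of integrated case-based reasoning   Total citations: 2, self-citations: 0, citations by others: 2
This paper summarizes and analyzes recent progress in research on integrated case-based reasoning. Regarding the integration of case-based reasoning with intelligent methods, it analyzes the reasons for the integration, categorizes the integration models in use, and summarizes the advantages of the integration. Regarding the integration of case-based reasoning with intelligent techniques, it analyzes the reasons for the integration, describes how intelligent techniques are applied in case representation, case base construction, case retrieval, and case adaptation, and points out the advantages of the integration. Finally, it looks ahead to future trends in integrated case-based reasoning.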

12.
The current research presents a methodology for classification based on Mahalanobis Distance (MD) and Association Mining using Rough Set Theory (RST). MD has been used in the Mahalanobis Taguchi System (MTS) to develop a classification scheme for systems having dichotomous states or categories. In MTS, selection of important features or variables to improve classification accuracy is done using Signal-to-Noise (S/N) ratios and Orthogonal Arrays (OAs). OAs have been reviewed for their limitations in handling a large number of variables. Secondly, a penalty for over-fitting, or regularization, is not included in the feature selection process for the MTS classifier. Besides, there is scope to enhance the utility of MTS to a classification-cum-causality analysis method by adding comprehensive information about the underlying process that generated the data. This paper proposes to select variables based on maximization of the degree-of-dependency between a Subset of System Variables (SSV) and the system classes or categories (R). Degree-of-dependency, which reflects goodness-of-model and hence goodness of the SSV, is measured by the conditional probability of system states on the subset of variables. Moreover, a suitable regularization factor equivalent to the L0 norm is introduced in an optimization problem which jointly maximizes goodness-of-model and the effect of regularization. Dependency between SSVs and R is modeled via the equivalent sets of Rough Set Theory. Two new variants of the MTS classifier are developed, and their classification accuracy is evaluated on test datasets from five case studies. The proposed variants of MTS are observed to perform better than existing MTS methods and other classification techniques found in the literature.
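
As background for the degree-of-dependency idea in this abstract, here is a minimal sketch of the classical rough set dependency degree, the fraction of objects whose equivalence class under the chosen variables maps to a single decision value. This is the standard Pawlak definition, which may differ from the conditional-probability measure the paper uses; the function name and the dict-based rows are assumptions.

```python
from collections import defaultdict


def dependency_degree(rows, cond, dec):
    """Return |POS_cond(dec)| / |U| for a decision table.

    rows: list of dicts mapping attribute names to discrete values.
    cond: list of condition attribute names (the candidate variable subset).
    dec: name of the decision attribute.
    """
    # Group decision values by equivalence class of the condition attributes.
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in cond)].append(row[dec])
    # A class lies in the positive region when all its decisions agree.
    positive = sum(len(vals) for vals in classes.values() if len(set(vals)) == 1)
    return positive / len(rows)
```

A value of 1.0 means the chosen variables determine the class for every object; lower values indicate that some equivalence classes mix several decision values.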

13.
陈泽华  谢刚  谢珺  谢克明 《计算机科学》2011,38(2):222-224,228
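The same problem differs in algorithmic difficulty under different knowledge representations. Rough set theory defines knowledge as the ability to classify objects and provides a set of methods for representing and processing knowledge based on an algebraic system. Under the algebraic representation, however, the nature of knowledge and its operations are not very intuitive and are hard to understand. Professor Miao Duoqian (苗夺谦) of Tongji University established the relationship between knowledge and information, on that basis gave information representations of the concepts and operations of rough set theory, and proved the equivalence of knowledge reduction under the algebraic and information representations. This paper goes further, expressing knowledge and its operations in granular matrix form, and then proves the equivalence of knowledge reduction under the algebraic, information, and granular matrix representations.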

14.
In contingency management of a complex system, identification of error conditions, or fault diagnosis, is a very important stage. It determines the methods and techniques to be applied in the following stages of contingency management. In this paper, Rough Set Theory is used as a new fault-diagnosis tool to identify valve faults in a multi-cylinder diesel engine. This method overcomes the shortcoming of conventional methods, in which each fault diagnosis method for diesel engines can provide only one corresponding fault category. Analysis of the final reducts generated using Rough Set Theory shows that the new method is effective for valve fault diagnosis and is a powerful new tool that can be applied in contingency management.

15.
A genetic algorithm-based method for feature subset selection   Total citations: 5, self-citations: 2, citations by others: 3
As a commonly used technique in data preprocessing, feature selection selects a subset of informative attributes or variables to build models describing data. By removing redundant, irrelevant, or noisy features, feature selection can improve the predictive accuracy and the comprehensibility of the predictors or classifiers. Many feature selection algorithms with different selection criteria have been introduced by researchers. However, no single criterion has been found to be best for all applications. In this paper, we propose a framework based on a genetic algorithm (GA) for feature subset selection that combines various existing feature selection methods. The advantages of this approach include the ability to accommodate multiple feature selection criteria and to find small subsets of features that perform well for a particular inductive learning algorithm of interest to build the classifier. We conducted experiments using three data sets and three existing feature selection methods. The experimental results demonstrate that our approach is a robust and effective way to find subsets of features with higher classification accuracy and/or smaller size compared to each individual feature selection algorithm.

16.
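Uncertainty measure of rough fuzzy sets   Total citations: 8, self-citations: 1, citations by others: 7
Rough set theory is a mathematical theory for handling imprecise, uncertain, and vague information effectively, and in recent years it has been widely applied in machine learning, data mining, and intelligent data analysis. Combining knowledge roughness with information entropy, this paper gives an uncertainty measure for rough fuzzy sets (RF sets).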

17.
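Research on a supplier selection decision support system based on case-based reasoning   Total citations: 10, self-citations: 1, citations by others: 10
After introducing the basic principles of case-based reasoning, this paper analyzes the working principle, framework, and functions of a supplier selection decision support system based on case-based reasoning. It focuses on several key steps in such a system and, using a worked example, presents a case-based method for supplier selection and evaluation, which verifies the feasibility and effectiveness of applying case-based reasoning in supplier selection decision support systems and provides a system model for enterprise supplier selection decisions.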

18.
Imbalanced data is a common problem in classification. This phenomenon is growing in importance since it appears in most real domains. It has special relevance to highly imbalanced data-sets (when the ratio between classes is high). Many techniques have been developed to tackle the problem of imbalanced training sets in supervised learning. Such techniques have been divided into two large groups: those at the algorithm level and those at the data level. Data level groups that have been emphasized are those that try to balance the training sets by reducing the larger class through the elimination of samples or increasing the smaller one by constructing new samples, known as undersampling and oversampling, respectively. This paper proposes a new hybrid method for preprocessing imbalanced data-sets through the construction of new samples, using the Synthetic Minority Oversampling Technique together with the application of an editing technique based on the Rough Set Theory and the lower approximation of a subset. The proposed method has been validated by an experimental study showing good results using C4.5 as the learning algorithm.
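
The lower approximation that this abstract relies on can be sketched as follows. The sketch assumes discrete attribute values and the plain indiscernibility relation (objects with identical descriptions are indistinguishable); the paper's editing step may use a different relation for continuous data, and the function name is hypothetical.

```python
from collections import defaultdict


def approximations(objects, target):
    """Return the (lower, upper) approximations of a set of objects.

    objects: dict mapping an object id to its tuple of attribute values.
    target: set of object ids, for example the minority class.
    """
    # Partition the universe into blocks of objects with identical descriptions.
    blocks = defaultdict(set)
    for oid, description in objects.items():
        blocks[description].add(oid)
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:  # block lies entirely inside the target
            lower |= block
        if block & target:   # block overlaps the target
            upper |= block
    return lower, upper
```

An editing step in this spirit would keep a synthetic sample only if it falls in the lower approximation of its class, that is, only if no indistinguishable object belongs to another class.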

19.
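Combining rough sets with pansystems theory has become an emerging research field. Based on theories such as the panweighted field/network in pansystems theory, the basic concepts of rough set theory are summarized and extended, a pansystems extension of rough set theory is studied, and a pansystems extension model of rough sets is constructed and explained through examples, opening a new path for further improving and extending rough sets.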

20.
A Case-Based Explanation System for Black-Box Systems   Total citations: 4, self-citations: 0, citations by others: 4
Most users of machine-learning products are reluctant to use them without any sense of the underlying logic that has led to the system’s predictions. Unfortunately, many of these systems lack any transparency in the way they operate and are deemed to be black boxes. In this paper we present a Case-Based Reasoning (CBR) solution for providing supporting explanations of black-box systems. This CBR solution has two key facets: it uses local information to assess the importance of each feature and, using this, it selects cases from the data used to build the black-box system for use in explanation. The retrieval mechanism takes advantage of the derived feature importance information to help select cases that are a better reflection of the black-box solution and thus more convincing explanations.
