Similar Documents
20 similar documents retrieved
1.
Attribute selection is one of the important problems encountered in pattern recognition, machine learning, data mining, and bioinformatics. It refers to the problem of selecting those input attributes or features that are most effective for predicting the sample categories. In this regard, rough set theory has been shown to be successful for selecting relevant and nonredundant attributes from a given data set. However, classical rough sets are unable to handle real-valued noisy features. This problem can be addressed by fuzzy-rough sets, which are a generalization of classical rough sets. A feature selection method based on fuzzy-rough sets is presented here that maximizes both the relevance and the significance of the selected features. This paper also presents different feature evaluation criteria such as dependency, relevance, redundancy, and significance for the attribute selection task using fuzzy-rough sets. The performance of different rough set models is compared with that of some existing feature evaluation indices based on the predictive accuracy of the nearest neighbor rule, support vector machine, and decision tree. The effectiveness of the fuzzy-rough set based attribute selection method, along with a comparison with existing feature evaluation indices and different rough set models, is demonstrated on a set of benchmark and microarray gene expression data sets.
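The dependency-based selection described in this abstract can be sketched compactly. The code below is only a simplified illustration, not the authors' implementation: it assumes min-max normalized attributes, the fuzzy similarity relation 1 - |a(x) - a(z)|, the min t-norm, and a greedy forward search on the fuzzy-rough dependency degree; all function names are hypothetical.

```python
import numpy as np

def fuzzy_dependency(X, y, features):
    """Fuzzy-rough dependency of the class on a feature subset (simplified sketch).

    X: (n, d) array, min-max normalized to [0, 1]; y: (n,) class labels.
    Similarity under a subset is the min (t-norm) of per-attribute similarities
    1 - |a(x) - a(z)|; the lower-approximation membership of each object is
    inf_z max(1 - sim(x, z), [y_z == y_x]).
    """
    features = list(features)
    if not features:
        return 0.0
    sub = X[:, features]
    sim = (1.0 - np.abs(sub[:, None, :] - sub[None, :, :])).min(axis=2)
    same_class = (y[:, None] == y[None, :]).astype(float)
    lower = np.maximum(1.0 - sim, same_class).min(axis=1)
    return float(lower.sum() / len(y))

def greedy_fuzzy_rough_selection(X, y, eps=1e-6):
    """Forward search: repeatedly add the attribute with the largest dependency gain."""
    selected, remaining = [], set(range(X.shape[1]))
    best = 0.0
    target = fuzzy_dependency(X, y, range(X.shape[1]))   # dependency of the full attribute set
    while remaining and best < target - eps:
        gains = {a: fuzzy_dependency(X, y, selected + [a]) for a in remaining}
        a_star = max(gains, key=gains.get)
        if gains[a_star] <= best + eps:                  # no attribute improves dependency
            break
        selected.append(a_star)
        remaining.remove(a_star)
        best = gains[a_star]
    return selected
```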

2.
Data categorization using decision trellises
We introduce a probabilistic graphical model for supervised learning on databases with categorical attributes. The proposed belief network contains hidden variables that play a role similar to nodes in decision trees, and each of their states corresponds either to a class label or to a single attribute test. As a major difference with respect to decision trees, the selection of the attribute to be tested is probabilistic. Thus, the model can be used to assess the probability that a tuple belongs to some class, given the predictive attributes. Unfolding the network along the hidden-states dimension yields a trellis structure with a signal flow similar to second-order connectionist networks. The network encodes context-specific probabilistic independencies to reduce parametric complexity. We present a custom-tailored inference algorithm and derive a learning procedure based on the expectation-maximization algorithm. We propose decision trellises as an alternative to decision trees in the context of tuple categorization in databases, which is an important step for building data mining systems. Preliminary experiments on standard machine learning databases are reported, comparing the classification accuracy of decision trellises and decision trees induced by C4.5. In particular, we show that the proposed model can offer significant advantages for sparse databases in which many predictive attributes are missing.

3.
姚晟  汪杰  徐风  陈菊 《计算机应用》2018,38(1):97-103
Existing attribute reduction algorithms are ill-suited to incomplete data in which numerical and symbolic attributes coexist; to address this, an extended incomplete neighborhood rough set model is proposed. First, the distance between missing attribute values is defined by taking the probability distribution of attribute values into account, which makes incomplete data with mixed attributes measurable. Second, a neighborhood mixed entropy is defined to evaluate the quality of an attribute reduction, the relevant property theorems are analyzed and proved, and an attribute reduction algorithm for incomplete neighborhood rough sets based on neighborhood mixed entropy is constructed. Finally, experiments are conducted on seven data sets selected from the UCI repository, comparing the proposed algorithm with attribute reduction based on dependency (ARD), attribute reduction based on neighborhood conditional entropy (ARCE), and attribute reduction based on neighborhood combination measure (ARNCM). Theoretical analysis and experimental results show that, compared with ARD, ARCE, and ARNCM, the proposed algorithm reduces the number of selected attributes by about 1, 7, and 0 respectively, and improves classification accuracy by about 2.5, 2.1, and 0.8 percentage points respectively. The proposed algorithm thus obtains a smaller attribute reduct while achieving higher classification accuracy.
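As an illustration of how such a distance for incomplete mixed data might be computed, the sketch below gives one possible reading, not the model from the paper: a missing numeric value contributes the expected distance under the attribute's empirical distribution, categorical mismatches count 1, a missing categorical value contributes the empirical disagreement probability, and numeric attributes are assumed min-max normalized; all names are hypothetical.

```python
import numpy as np

def attr_distance(a, b, obs, numeric):
    """Distance between two values of one attribute; None marks a missing value.
    obs is the 1-D array of observed (non-missing) values of this attribute;
    numeric attributes are assumed min-max normalized to [0, 1]."""
    if numeric:
        if a is None and b is None:
            return float(np.abs(obs[:, None] - obs[None, :]).mean())  # E|A - A'|
        if a is None or b is None:
            known = a if b is None else b
            return float(np.abs(obs - known).mean())                  # E|A - known|
        return abs(a - b)
    # categorical: 0/1 mismatch, missing values handled via disagreement probabilities
    if a is None and b is None:
        freqs = np.array([np.mean(obs == v) for v in np.unique(obs)])
        return float(1.0 - np.sum(freqs ** 2))        # P(two random draws differ)
    if a is None or b is None:
        known = a if b is None else b
        return float(1.0 - np.mean(obs == known))      # P(random draw != known value)
    return 0.0 if a == b else 1.0

def mixed_distance(x, z, values, is_numeric):
    """Euclidean combination of per-attribute distances for two incomplete records."""
    return float(np.sqrt(sum(
        attr_distance(x[j], z[j], values[j], is_numeric[j]) ** 2
        for j in range(len(x)))))
```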

4.
Neural-network feature selector
Feature selection is an integral part of most learning algorithms. Because irrelevant and redundant attributes are often present, selecting only the relevant attributes of the data allows a machine learning method to achieve higher predictive accuracy. In this paper, we propose the use of a three-layer feedforward neural network to select those input attributes that are most useful for discriminating classes in a given set of input patterns. A network pruning algorithm is the foundation of the proposed algorithm. By adding a penalty term to the error function of the network, redundant network connections can be distinguished from relevant ones by their small weights once the network training process has been completed. A simple criterion for removing an attribute based on the accuracy rate of the network is developed. The network is retrained after removal of an attribute, and the selection process is repeated until no attribute meets the criterion for removal. Our experimental results suggest that the proposed method works very well on a wide variety of classification problems.
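A rough sketch of the removal loop is given below, using scikit-learn's MLPClassifier as a stand-in for the three-layer network (its alpha L2 regularizer substitutes for the paper's penalty term, so this is not the authors' pruning algorithm): each remaining attribute is tentatively dropped, the network is retrained, and the removal is kept only if validation accuracy stays within an assumed tolerance.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def nn_feature_selector(X, y, hidden=10, alpha=1e-3, tol=0.01, seed=0):
    """Backward elimination with a penalized feedforward network (illustrative sketch)."""
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=seed)
    kept = list(range(X.shape[1]))

    def accuracy(cols):
        net = MLPClassifier(hidden_layer_sizes=(hidden,), alpha=alpha,
                            max_iter=2000, random_state=seed)
        net.fit(X_tr[:, cols], y_tr)
        return net.score(X_va[:, cols], y_va)

    base = accuracy(kept)
    improved = True
    while improved and len(kept) > 1:
        improved = False
        # try removing each remaining attribute; drop the one that hurts accuracy least
        trials = {j: accuracy([c for c in kept if c != j]) for j in kept}
        j_best = max(trials, key=trials.get)
        if trials[j_best] >= base - tol:          # removal criterion: accuracy preserved
            kept.remove(j_best)
            base = trials[j_best]
            improved = True
    return kept
```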

5.
Attribute reduction can be defined as the process of determining a minimal subset of attributes from an original set of attributes. This paper proposes a new attribute reduction method that is based on a record-to-record travel algorithm for solving rough set attribute reduction problems. The algorithm has a single parameter, called DEVIATION, which, once pre-tuned, plays a pivotal role in controlling the acceptance of worse solutions. In this paper, we focus on a fuzzy-based record-to-record travel algorithm for attribute reduction (FuzzyRRTAR). This algorithm employs an intelligent fuzzy logic controller mechanism to control the value of DEVIATION, which is dynamically changed throughout the search process. The proposed method was tested on standard benchmark data sets. The results show that FuzzyRRTAR is efficient in solving attribute reduction problems when compared with other meta-heuristic approaches.
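The record-to-record travel acceptance rule itself is easy to sketch. The loop below is a generic RRT search for attribute reduction, not FuzzyRRTAR (the fuzzy controller is omitted and DEVIATION stays fixed); the cost function shown, subset size plus a penalty when rough-set dependency falls below that of the full attribute set, is an assumption for illustration.

```python
import random

def rrt_attribute_reduction(n_attrs, dependency, deviation=1.0, iters=5000, seed=0):
    """Record-to-record travel for attribute reduction (illustrative sketch).

    dependency(mask) -> rough-set dependency in [0, 1] of the attribute subset
    encoded by the boolean mask; supplied by the caller.
    """
    rng = random.Random(seed)
    full_dep = dependency([True] * n_attrs)

    def cost(mask):
        # minimize subset size, heavily penalizing any loss of dependency
        dep = dependency(mask)
        return sum(mask) + (n_attrs + 1) * max(0.0, full_dep - dep)

    current = [True] * n_attrs
    best, best_cost = current[:], cost(current)
    for _ in range(iters):
        cand = current[:]
        cand[rng.randrange(n_attrs)] ^= True          # flip one attribute in or out
        if not any(cand):
            continue
        c = cost(cand)
        if c <= best_cost + deviation:                # RRT acceptance rule
            current = cand
            if c < best_cost:
                best, best_cost = cand[:], c
    return [i for i, keep in enumerate(best) if keep]
```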

6.
Attribute reduction effectively eliminates information redundancy and is widely used in artificial intelligence and machine learning. A counterexample shows that the classical discernibility-matrix-based attribute reduction method may fail to produce a true reduct and can still leave redundancy. Therefore, a discernibility-matrix attribute reduction method that combines attribute selection and deletion algorithms is proposed, which effectively solves this problem. Validation on standard UCI data sets shows that the new method further reduces the number of attributes compared with the classical method, highlighting its practicality and effectiveness.

7.
How to select and preprocess training samples is a crucial problem in earthquake prediction expert systems. After analyzing the characteristics and shortcomings of previous methods, this paper proposes an anomaly-driven sample construction method and uses an attribute reduction method based on an RBF neural network to process the training samples. The anomaly-driven sample construction method conveniently and systematically selects the attributes of the training samples according to the frequency with which anomalous attributes occur, and selects training samples according to the proportion of missing attributes in each sample. The RBF-neural-network-based attribute reduction method exploits the properties of the RBF (Radial Basis Function) network to quantify how much each attribute dimension influences the outcome, and thus removes attributes with little influence on the result. Experiments show that selecting and processing earthquake prediction samples with the proposed methods noticeably improves prediction accuracy.

8.
Business operation performance is related to corporate profitability and directly affects investment choices in the stock market. This paper proposes a hybrid method, which combines the ordered weighted averaging (OWA) operator and rough set theory after an attribute selection procedure, to deal with multi-attribute forecasting problems concerning the revenue growth rate of the electronic industry. In the attribute selection step, the four most important attributes among 12 attributes collected from the related literature are determined via five attribute selection methods as the input of the subsequent procedure of the proposed method. The OWA operator can adjust the weight of an attribute based on the situation of a decision-maker and aggregate different attribute values into a single aggregated value for each instance; the single aggregated values are then utilized to generate classification rules by rough set theory for forecasting operation performance. To verify the proposed method, this research collects the financial data of 629 electronic firms publicly listed in the TSE (Taiwan Stock Exchange) and OTC (Over-the-Counter) market in 2004 and 2005 to forecast the revenue growth rate. The results show that the proposed method outperforms the listed methods.
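The OWA aggregation step can be illustrated in a few lines: the weights are applied to the attribute values after sorting them in descending order, so the decision-maker's attitude is encoded in the weight vector rather than tied to particular attributes. The sketch below is a generic OWA operator with illustrative weights, not the paper's full hybrid pipeline.

```python
import numpy as np

def owa(values, weights):
    """Ordered weighted averaging: weights are applied to the values
    after sorting them in descending order, not to fixed attributes."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0) and values.shape == weights.shape
    return float(np.dot(np.sort(values)[::-1], weights))

# example: aggregate four normalized attribute values of one firm
scores = [0.8, 0.4, 0.9, 0.6]
w_optimistic = [0.4, 0.3, 0.2, 0.1]     # emphasizes the largest values
w_neutral = [0.25, 0.25, 0.25, 0.25]    # plain average
print(owa(scores, w_optimistic), owa(scores, w_neutral))
```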

9.
王利民  姜汉民 《控制与决策》2019,34(6):1234-1240
When the classical k-dependence Bayesian classifier (KDB) orders attributes, it considers only the direct correlation between the class variable and the predictive attributes, ignoring their conditional correlation given other predictive attributes. To address this, building on the KDB structure and guided by the principle of fully expressing the dependency information among attributes, the proposed model strengthens the dependency relations among attributes and enhances the decision-making expressiveness of the predictive attributes for classification: the attribute order is optimized using the conditional mutual information between the class variable and the predictive attributes, an attribute reduction strategy is incorporated to remove redundant attributes and reduce the over-fitting risk caused by a complex model structure, and a greedy search strategy is used to select the optimal attributes and build the model structure. Experimental results on data sets from the UCI machine learning repository show that, compared with KDB, the model achieves better classification accuracy and outstanding robustness.

10.
Incremental training has been used for genetic algorithm (GA)-based classifiers in a dynamic environment where training samples or new attributes/classes become available over time. In this article, ordered incremental genetic algorithms (OIGAs) are proposed to address the incremental training of input attributes for classifiers. Rather than learning input attributes in batch as with normal GAs, OIGAs learn input attributes one after another. The resulting classification rule sets are also evolved incrementally to accommodate the new attributes. Furthermore, attributes are arranged in different orders by evaluating their individual discriminating ability. By experimenting with different attribute orders, different approaches of OIGAs are evaluated using four benchmark classification data sets. Their performance is also compared with normal GAs. The simulation results show that OIGAs can achieve generally better performance than normal GAs. The order of attributes does have an effect on the final classifier performance, with OIGA training on a descending order of attributes performing best. © 2004 Wiley Periodicals, Inc. Int J Int Syst 19: 1239–1256, 2004.

11.
Incremental learning has been widely addressed in the machine learning literature to cope with learning tasks where the learning environment is ever changing or training samples become available over time. However, most research work explores incremental learning with statistical algorithms or neural networks, rather than evolutionary algorithms. The work in this paper employs genetic algorithms (GAs) as basic learning algorithms for incremental learning within one or more classifier agents in a multiagent environment. Four new approaches with different initialization schemes are proposed. They keep the old solutions and use an "integration" operation to integrate them with new elements to accommodate new attributes, while biased mutation and crossover operations are adopted to further evolve a reinforced solution. The simulation results on benchmark classification data sets show that the proposed approaches can deal with the arrival of new input attributes and integrate them with the original input space. It is also shown that the proposed approaches can be successfully used for incremental learning and improve classification rates as compared to the retraining GA. Possible applications for continuous incremental training and feature selection are also discussed.

12.
This paper focuses on ensemble methods for Fuzzy Rule-Based Classification Systems (FRBCS), in which the decisions of different classifiers are combined to form the final classification model. The proposed methods reduce the complexity of the FRBCS and the number of generated rules. We are particularly interested in ensemble methods that cluster the attributes into subgroups and treat each subgroup separately. Our work is an extension of a previous ensemble method called SIFRA. That method uses frequent itemset mining to deduce groups of related attributes by analyzing their simultaneous appearances in the databases. Its drawback is that it forms the groups of attributes by searching for dependencies between the attributes independently of the class information. Since we deal with supervised learning problems, it is worthwhile to consider the class attribute when forming the attribute subgroups. In this paper, we propose two new supervised attribute regrouping methods that take into account not only the dependencies between the attributes but also the information about the class labels. The results obtained with various benchmark data sets show good accuracy of the built classification model.

13.
康猛  蒙祖强 《计算机应用》2022,42(2):449-456
Traditional discernibility-matrix-based attribute reduction methods have the advantage of being intuitive and easy to understand, but their time and space complexity is high; when the data are large or there are many conditional attributes, a reduct cannot be obtained quickly. To solve this problem, a conditional discernibility capability is constructed on the basis of the discernibility relation to perform attribute selection, and an attribute reduction algorithm based on conditional discernibility capability is proposed. Furthermore, to speed up the computation of attribute significance and improve reduction efficiency, the stability of frequencies stated by the law of large numbers is exploited by ...

14.
Dimensionality reduction has been applied in many different areas, among them the analysis of gene expression data obtained with the microarray approach. The data involved in this problem are challenging for machine learning algorithms due to the small number of samples and the high number of attributes. This paper proposes a preprocessing phase for microarray data by means of attribute selection and a random projection method. Experimental results are promising and show that the use of these methods improves the performance of classification algorithms.
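A minimal version of such a preprocessing pipeline can be sketched with scikit-learn; the particular filter (ANOVA F-score), the projection dimensionality, and the classifier below are assumptions, not the paper's exact setup.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.random_projection import GaussianRandomProjection
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# attribute selection followed by random projection, then a simple classifier
model = make_pipeline(
    SelectKBest(f_classif, k=500),                # keep the 500 most relevant genes
    GaussianRandomProjection(n_components=50,     # project into a 50-dimensional space
                             random_state=0),
    KNeighborsClassifier(n_neighbors=3),
)

# X: (samples, genes) expression matrix, y: class labels
# scores = cross_val_score(model, X, y, cv=5)
```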

15.
We conduct a large-scale comparative study on linearly combining superparent-one-dependence estimators (SPODEs), a popular family of seminaive Bayesian classifiers. Altogether, 16 model selection and weighing schemes, 58 benchmark data sets, and various statistical tests are employed. This paper's main contributions are threefold. First, it formally presents each scheme's definition, rationale, and time complexity and hence can serve as a comprehensive reference for researchers interested in ensemble learning. Second, it offers bias-variance analysis for each scheme's classification error performance. Third, it identifies effective schemes that meet various needs in practice. This leads to accurate and fast classification algorithms which have an immediate and significant impact on real-world applications. Another important feature of our study is using a variety of statistical tests to evaluate multiple learning methods across multiple data sets.

16.
Kernel-based methods have been widely investigated in the soft-computing community. However, they focus mainly on numeric data. In this paper, we propose a novel method for kernel learning on categorical data and show how the method can be used to derive effective classifiers for linear classification. Based on kernel density estimation for categorical attributes, three popular classification methods, i.e., naive Bayes, nearest neighbor, and prototype-based classification, are effectively extended to classify categorical data. We also propose two data-driven approaches to the bandwidth selection problem, one aimed at minimizing the mean squared error of the kernel estimate and the other devoted to optimizing the attribute weights. Theoretical analysis indicates that, as in the numeric case, kernel learning of categorical attributes is able to make the classes more separable, resulting in outstanding performance of the new classifiers on various real-world data sets.
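One standard kernel for categorical attributes is the Aitchison-Aitken kernel. The sketch below uses it in a naive-Bayes-style class-conditional density estimate as a simplified illustration of kernel learning on categorical data; the fixed bandwidth lam replaces the paper's data-driven bandwidth selection, and all names are hypothetical.

```python
import numpy as np

def aitchison_aitken(x, xi, n_categories, lam):
    """K(x, xi) = 1 - lam if x == xi, else lam / (c - 1), with lam in [0, (c-1)/c]."""
    return np.where(x == xi, 1.0 - lam, lam / (n_categories - 1))

def class_conditional_density(sample, X_class, n_categories, lam=0.1):
    """Product over attributes of kernel density estimates built from the
    training rows of one class (naive independence assumption)."""
    dens = 1.0
    for j in range(len(sample)):
        k = aitchison_aitken(sample[j], X_class[:, j], n_categories[j], lam)
        dens *= k.mean()
    return dens

def predict(sample, X, y, n_categories, lam=0.1):
    """Pick the class with the largest prior-weighted kernel density."""
    scores = {c: (np.mean(y == c) *
                  class_conditional_density(sample, X[y == c], n_categories, lam))
              for c in np.unique(y)}
    return max(scores, key=scores.get)
```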

17.
王荣  陈纯 《计算机应用与软件》2007,24(11):98-99,113
Data mining is an advanced process of extracting implicit, user-oriented information from massive amounts of data. Attribute selection is a very important research direction in data mining, and its quality has a great influence on mining performance and results. A new attribute selection algorithm based on information gain and the chi-square test is proposed; it has been applied in a customer churn prediction model and achieved quite good results.
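A minimal reconstruction of an information-gain-plus-chi-square filter with scikit-learn is sketched below; the way the two criteria are combined (intersection of the two top-k lists) and the value of k are assumptions, not the paper's exact rule.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, chi2

def select_by_ig_and_chi2(X, y, k=10):
    """Keep attributes ranked in the top k by both information gain
    (estimated via mutual information) and the chi-square statistic.
    X must be non-negative (e.g. counts or min-max scaled) for chi2."""
    ig = mutual_info_classif(X, y, random_state=0)
    chi_stats, _pvals = chi2(X, y)
    top_ig = set(np.argsort(ig)[::-1][:k])
    top_chi = set(np.argsort(chi_stats)[::-1][:k])
    return sorted(top_ig & top_chi)
```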

18.
Attribute reduction in an information system is an important step in rough set knowledge discovery. This work is devoted to feature selection in an information system and to the removal of redundant attributes. Starting from attribute significance, the new algorithm adopts an iterative feature selection criterion so that the candidate set of feature attributes keeps shrinking, yielding a reduct of the information system. Experiments show that the method is feasible and effective.
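The significance-driven iteration follows the classical rough-set pattern: the significance of an attribute is the increase in the dependency degree (share of objects in the positive region) obtained by adding it to the current subset. The sketch below is a generic version of that pattern, not the paper's exact algorithm.

```python
from collections import defaultdict

def dependency(rows, labels, attrs):
    """gamma(attrs): share of objects lying in a consistent equivalence class,
    i.e. a class (w.r.t. attrs) whose objects all carry the same decision label."""
    size, labs = defaultdict(int), defaultdict(set)
    for row, lab in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        size[key] += 1
        labs[key].add(lab)
    return sum(size[k] for k in size if len(labs[k]) == 1) / len(rows)

def significance_reduct(rows, labels, n_attrs):
    """Forward selection by attribute significance sig(a) = gamma(B + {a}) - gamma(B)."""
    full = dependency(rows, labels, range(n_attrs))
    selected, remaining = [], set(range(n_attrs))
    while remaining and dependency(rows, labels, selected) < full:
        a_star = max(remaining,
                     key=lambda a: dependency(rows, labels, selected + [a]))
        selected.append(a_star)
        remaining.remove(a_star)
    return selected
```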

19.
This paper studies methods for scoring attribute significance in an information system. By introducing a sensitivity coefficient to build a neural network model, an attribute significance scoring algorithm is proposed in which the conditional attributes and the decision attribute of the information system are used to construct a radial basis function (RBF) neural network. After training and learning, the relationships among the attributes are considered comprehensively, the topology of the RBF network is adjusted dynamically, and the significance of each attribute is scored. A case study on red-seeded watermelon trait data, used as both sample data and test data, verifies the effectiveness of the method.

20.
A compact and accurate model for classification
We describe and evaluate an information-theoretic algorithm for data-driven induction of classification models based on a minimal subset of the available features. The relationship between the input (predictive) features and the target (classification) attribute is modeled by a tree-like structure termed an information network (IN). Unlike other decision-tree models, the information network uses the same input attribute across all the nodes of a given layer (level). The input attributes are selected incrementally by the algorithm to maximize a global decrease in the conditional entropy of the target attribute. We use a prepruning approach: when no attribute causes a statistically significant decrease in the entropy, the network construction is stopped. The algorithm is shown empirically to produce much more compact models than other methods of decision-tree learning while preserving nearly the same level of classification accuracy.
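The prepruning rule, stop when no attribute yields a statistically significant decrease in the conditional entropy of the target, can be approximated with a likelihood-ratio (G) test. The sketch below is a simplified, single-attribute illustration with an assumed significance level, not the IN construction algorithm itself.

```python
import numpy as np
from scipy.stats import chi2

def mutual_information_nats(x, y):
    """I(X; Y) for two discrete arrays, in nats."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(x == a) * np.mean(y == b)))
    return mi

def significant_entropy_decrease(x, target, alpha=0.05):
    """G-test: the decrease in H(target) caused by attribute x, i.e. I(x; target),
    is significant if G = 2 * n * I exceeds the chi-square critical value with
    (|x| - 1) * (|target| - 1) degrees of freedom."""
    n = len(target)
    g = 2.0 * n * mutual_information_nats(x, target)
    df = (len(np.unique(x)) - 1) * (len(np.unique(target)) - 1)
    if df == 0:                       # constant attribute or constant target
        return False
    return g > chi2.ppf(1.0 - alpha, df)
```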
