Similar literature
 20 similar documents found (search time: 15 ms)
1.
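To address the combinatorial explosion of attribute combinations in reduction, the rough set (RS) attribute core is introduced into the antibody encoding as an immune vaccine carrying prior information, and the population is vaccinated probabilistically. The classification approximation quality of an attribute set serves as the antibody fitness. A clustering competition mechanism is introduced into the immune clonal selection process to improve the diversity of the antibody population and its affinity maturation, thereby balancing the search for multiple attribute reducts against the search for a minimal reduct. Experimental results show that this rough set attribute reduction method is fast and effective for multidimensional condition attribute sets.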

2.
Neighborhood rough set based heterogeneous feature subset selection   Total citations: 6, self-citations: 0, citations by others: 6
Feature subset selection is viewed as an important preprocessing step for pattern recognition, machine learning and data mining. Most research focuses on homogeneous feature selection, namely, on either numerical or categorical features. In this paper, we introduce a neighborhood rough set model to deal with the problem of heterogeneous feature subset selection. As the classical rough set model can only be used to evaluate categorical features, we generalize this model with neighborhood relations and introduce a neighborhood rough set model. The proposed model degenerates to the classical one if the neighborhood size is set to zero. The neighborhood model is used to reduce numerical and categorical features by assigning different thresholds to different kinds of attributes. In this model the sizes of the neighborhood lower and upper approximations of decisions reflect the discriminating capability of feature subsets. The size of the lower approximation is computed as the dependency between decision and condition attributes. We use the neighborhood dependency to evaluate the significance of a subset of heterogeneous features and construct forward feature subset selection algorithms. The proposed algorithms are compared with some classical techniques. Experimental results show that the neighborhood model based method is more flexible in dealing with heterogeneous data.
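
The neighborhood dependency and the forward selection loop described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's algorithm: it assumes purely numerical features, a single Euclidean threshold `delta` for all of them, and a simple "stop when the dependency no longer increases" rule. The function names `neighborhood_dependency` and `forward_select` are hypothetical.

```python
from math import dist

def neighborhood_dependency(rows, labels, features, delta):
    """Fraction of samples whose delta-neighborhood contains a single class."""
    consistent = 0
    for i in range(len(rows)):
        point_i = tuple(rows[i][f] for f in features)
        pure = True
        for j in range(len(rows)):
            point_j = tuple(rows[j][f] for f in features)
            # A neighbor with a different label makes sample i inconsistent.
            if dist(point_i, point_j) <= delta and labels[j] != labels[i]:
                pure = False
                break
        if pure:
            consistent += 1
    return consistent / len(rows)

def forward_select(rows, labels, delta):
    """Greedy forward selection (assumed stopping rule: no gain in dependency)."""
    selected, best = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining:
        score, feat = max(
            ((neighborhood_dependency(rows, labels, selected + [f], delta), f)
             for f in remaining),
            key=lambda pair: pair[0],
        )
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best

# Toy data: feature 0 separates the classes, feature 1 does not.
rows = [(0.0, 0.5), (0.1, 0.9), (1.0, 0.5), (1.1, 0.9)]
labels = [0, 0, 1, 1]
print(forward_select(rows, labels, delta=0.2))
```

With `delta = 0` only exact duplicates count as neighbors, which mirrors the abstract's remark that the model reduces to the classical one for a zero-size neighborhood.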

3.
Medical datasets are often characterized by a large number of disease measurements and a relatively small number of patient records. Not all of these measurements (features) are important; some are irrelevant or noisy. Such features may be especially harmful in the case of relatively small training sets, where irrelevancy and redundancy are harder to evaluate. On the other hand, this extreme number of features raises the problem of the memory needed to represent the dataset. Feature Selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and to remove the redundant features. Thus, the learning model receives a concise structure, built by using only the selected prominent features, without forfeiting predictive accuracy. Therefore, FS is nowadays an essential part of knowledge discovery. In this study, new supervised feature selection methods based on hybridization of Particle Swarm Optimization (PSO), PSO based Relative Reduct (PSO-RR) and PSO based Quick Reduct (PSO-QR) are presented for disease diagnosis. The experimental results on several standard medical datasets demonstrate the efficiency of the proposed technique as well as enhancements over existing feature selection techniques.

4.
Soft fuzzy rough sets for robust feature evaluation and selection   Total citations: 2, self-citations: 0, citations by others: 2
The fuzzy dependency function proposed in the fuzzy rough set model is widely employed in feature evaluation and attribute reduction. In this paper, we show that this function is not robust to noisy information. As datasets in real-world applications are usually contaminated by noise, robustness of data analysis models is very important in practice. In this work, we develop a new model of fuzzy rough sets, called soft fuzzy rough sets, which can reduce the influence of noise. We discuss the properties of the model and construct a new dependence function from the model. Then we use the function to evaluate and select features. The presented experimental results show the effectiveness of the new model.

5.
Solving the feature selection problem is considered an important issue when addressing data from real applications that contain a large number of features. However, not all of these features are important; therefore, the redundant features must be removed because they affect the accuracy of the data representation and introduce time complexity into the analysis of these data. For these reasons, the feature selection problem is considered an NP-complete nonlinearly constrained optimization problem. The rough set (RS) and neighborhood rough set (NRS) are the most powerful methods used to solve the feature selection problem; however, both approaches suffer from high time complexity. To avoid these limitations, we combined the RS and NRS with a new metaheuristic algorithm called the runner-root algorithm (RRA). The spirit of the RRA originated from real-life plants called running plants, which have roots and runners that spread the plants in search of minerals and water resources through their root and runner development. To validate the proposed algorithm, several UCI Machine Learning Repository datasets are used to compute the performance of our algorithm employing two effective classifiers, the random forest and the K-nearest neighbor, in addition to some other measures for the performance evaluation. The experimental results illustrate that the proposed algorithm is superior to the state-of-the-art metaheuristic algorithms in terms of the performance measures. Additionally, the NRS increases the performance of the proposed method more than the RS as an objective function.

6.
This paper investigates feature selection based on rough sets for dimensionality reduction in Case-Based Reasoning classifiers. In order to be useful, Case-Based Reasoning systems should be able to manage imprecise, uncertain and redundant data to retrieve the most relevant information in a potentially overwhelming quantity of data. Rough Set Theory has been shown to be an effective tool for data mining and for uncertainty management. This paper has two central contributions: (1) it develops three strategies for feature selection, and (2) it proposes several measures for estimating attribute relevance based on Rough Set Theory. Although we concentrate on Case-Based Reasoning classifiers, the proposals are general enough to be applicable to a wide range of learning algorithms. We applied these proposals to twenty data sets from the UCI repository and examined the impact of feature selection on classification performance. Our evaluation shows that all three proposals benefit the basic Case-Based Reasoning system. They also prove robust in comparison to well-known feature selection strategies.

7.
Feature selection is about finding useful (relevant) features to describe an application domain. Selecting a relevant and sufficient set of features to effectively represent and index the given dataset is an important task in solving classification and clustering problems intelligently. This task is, however, quite difficult to carry out since it usually needs a very time-consuming search to find the desired features. This paper proposes a bit-based feature selection method to find the smallest feature set to represent the indexes of a given dataset. The proposed approach originates from bitmap indexing and rough set techniques. It consists of two phases. In the first phase, the given dataset is transformed into a bitmap indexing matrix with some additional data information. In the second phase, a relevant and sufficient set of features is selected and used to represent the classification indexes of the given dataset. After these features are selected, they can be judged by domain experts, and the final feature set of the given dataset is thus proposed. Finally, the experimental results on different data sets show the efficiency and accuracy of the proposed approach.

8.
The use of feature selection can improve the accuracy, efficiency, applicability and understandability of a learning process. For this reason, many methods of automatic feature selection have been developed. Some of these methods are based on searching for the features that allow the data set to be considered consistent. In a search problem we usually evaluate the search states; in the case of feature selection we measure the possible feature sets. This paper reviews the state of the art of consistency based feature selection methods, identifying the measures used for feature sets. An in-depth study of these measures is conducted, including the definition of a new measure necessary for completeness. After that, we perform an empirical evaluation of the measures, comparing them with the highly reputed wrapper approach. Consistency measures achieve similar results to those of the wrapper approach with much better efficiency.
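
As an illustration of what a consistency measure on a feature set can look like, here is a rough sketch of one commonly used measure, the inconsistency rate. The abstract does not say which measures the paper covers, so this choice is an assumption, and the name `inconsistency_rate` is hypothetical. Samples that agree on the selected features are grouped together, and every sample outside the majority class of its group counts as inconsistent.

```python
from collections import Counter, defaultdict

def inconsistency_rate(rows, labels, features):
    """Share of samples that disagree with the majority class of their group."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        # Samples with identical values on the selected features share a group.
        key = tuple(row[f] for f in features)
        groups[key][label] += 1
    inconsistent = sum(
        sum(counts.values()) - max(counts.values()) for counts in groups.values()
    )
    return inconsistent / len(rows)

rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = [0, 1, 1, 1]
print(inconsistency_rate(rows, labels, [0]))     # feature 0 alone
print(inconsistency_rate(rows, labels, [0, 1]))  # both features
```

A feature set with rate 0 makes the data consistent in the sense used above; a search procedure would look for a small feature set that keeps this rate at or below a chosen threshold.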

9.
Dubois and Prade (1990) [1] introduced the notion of fuzzy rough sets as a fuzzy generalization of rough sets, which were originally proposed by Pawlak (1982) [8]. Later, Radzikowska and Kerre introduced the so-called (I,T)-fuzzy rough sets, where I is an implication and T is a triangular norm. In the present paper, by using a pair of implications (I,J), we define the so-called (I,J)-fuzzy rough sets, which generalize the concept of fuzzy rough sets in the sense of Radzikowska and Kerre, and that of Mi and Zhang. Basic properties of (I,J)-fuzzy rough sets are investigated in detail.
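
For orientation, the (I,T)-style approximations mentioned in this abstract can be sketched as below, using the usual definitions: the lower approximation takes the infimum over y of I(R(x,y), A(y)), and the upper approximation takes the supremum over y of T(R(x,y), A(y)). The Łukasiewicz implication and the minimum t-norm are choices made here for illustration only, and the sketch does not attempt the paper's (I,J) generalization. All function names are hypothetical.

```python
def lukasiewicz_implication(a, b):
    """I(a, b) = min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)

def minimum_tnorm(a, b):
    """T(a, b) = min(a, b)."""
    return min(a, b)

def fuzzy_rough_approximations(R, A, implication=lukasiewicz_implication,
                               tnorm=minimum_tnorm):
    """Lower and upper approximations of fuzzy set A under fuzzy relation R."""
    n = len(A)
    lower = [min(implication(R[x][y], A[y]) for y in range(n)) for x in range(n)]
    upper = [max(tnorm(R[x][y], A[y]) for y in range(n)) for x in range(n)]
    return lower, upper

# Crisp special case: objects 0 and 1 are indiscernible, object 2 stands alone.
R = [[1.0, 1.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
A = [1.0, 0.0, 1.0]
lower, upper = fuzzy_rough_approximations(R, A)
print(lower, upper)
```

When R is a crisp equivalence relation and A a crisp set, as in the toy data, the result coincides with Pawlak's lower and upper approximations.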

10.
Feature selection plays a vital role in many areas of pattern recognition and data mining. Efficient computation of feature selection is important for improving classification performance. In rough set theory, many feature selection algorithms have been proposed to process static incomplete data. However, feature values in an incomplete data set may vary dynamically in real-world applications. For such dynamic incomplete data, a classic (non-incremental) approach to feature selection is usually computationally time-consuming. To overcome this disadvantage, we propose an incremental approach to feature selection, which can accelerate the feature selection process in dynamic incomplete data. We first compute the new positive region incrementally when the feature values of an object set vary dynamically. Based on the calculated positive region, two efficient incremental feature selection algorithms are developed, one for a single object and one for multiple objects with varying feature values. Then we conduct a series of experiments with 12 UCI real data sets to evaluate the efficiency and effectiveness of our proposed algorithms. The experimental results show that the proposed algorithms compare favorably with existing non-incremental methods.

11.
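Databases usually contain many redundant features; finding the important ones is called feature extraction. This paper proposes a heuristic feature selection algorithm based on attribute significance. The algorithm uses attribute significance as the iteration criterion to obtain a minimal reduct of the attribute set.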
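
A heuristic of this kind might look roughly like the sketch below. The abstract does not define attribute significance, so the sketch assumes the common dependency-based definition: the significance of an attribute is the gain in the size of the positive region when it is added. The names `dependency` and `greedy_reduct` are hypothetical, and a greedy search of this form is not guaranteed to return a minimal reduct.

```python
from collections import defaultdict

def dependency(rows, decisions, attrs):
    """gamma_B(D): share of objects whose equivalence class is decision-pure."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    positive = sum(
        len(members) for members in classes.values()
        if len({decisions[i] for i in members}) == 1
    )
    return positive / len(rows)

def greedy_reduct(rows, decisions):
    """Add the most significant attribute until the full dependency is reached."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, decisions, all_attrs)
    reduct = []
    while dependency(rows, decisions, reduct) < target:
        candidates = [a for a in all_attrs if a not in reduct]
        # Assumed significance: dependency after adding the candidate attribute.
        best = max(candidates, key=lambda a: dependency(rows, decisions, reduct + [a]))
        reduct.append(best)
    return reduct

rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
decisions = ["n", "n", "y", "y"]
print(greedy_reduct(rows, decisions))
```

In the toy table the first attribute alone determines the decision, so the loop stops after one step.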

12.
This paper presents a new extension of fuzzy sets: R-fuzzy sets. The membership of an element of an R-fuzzy set is represented as a rough set. This new extension facilitates the representation of an uncertain fuzzy membership with a rough approximation. Based on our definition of R-fuzzy sets and their operations, the relationships between R-fuzzy sets and other fuzzy sets are discussed and some examples are provided.

13.
The aim of this paper is to provide an efficient input feature selection algorithm for the modeling of systems, based on a modified definition of fuzzy-rough sets. Some of the critical issues concerning the complexity and convergence of the feature selection algorithm are discussed in detail. Based on some natural properties of fuzzy t-norm and t-conorm operators, the concept of fuzzy-rough sets on a compact computational domain is put forward, which is then utilized to construct an improved Fuzzy-Rough Feature Selection algorithm. Various mathematical properties of this new definition of fuzzy-rough sets are discussed from a pattern classification viewpoint. A speedup factor as high as 622 has been achieved with the proposed algorithm compared to the recently proposed FRSAR, with improved model performance on the selected set of features.

14.
Applying rough sets to market timing decisions   Total citations: 1, self-citations: 0, citations by others: 1
A lot of research has been done to predict economic development. The problem studied here is stock prediction for investors. More specifically, the stock market's movements will be analyzed and predicted. We wish to retrieve knowledge that could guide investors on when to buy and sell. Through a detailed case study on trading the S&P 500 index, rough sets are shown to be an applicable and effective tool to achieve this goal. Some problems concerning time series transformation, indicator selection, and trading system building in a real implementation are also discussed.

15.
Bayesian networks provide the means for representing probabilistic conditional independence. Conditional independence is also widely studied beyond probability theory, with links to, e.g., multi-valued dependencies in databases and, at a higher level of abstraction, semi-graphoid models. The rough set framework for data analysis is related to conditional independence via the notion of a decision reduct, considered within the wider domain of feature selection. Given that the probabilistic version of decision reducts is equivalent to data-based Markov boundaries, studies have also been conducted for other criteria of rough-set-based feature selection, e.g. those corresponding to multi-valued dependencies. In this paper, we investigate the degrees of approximate conditional dependence, a topic related to well-known notions such as conditional mutual information and polymatroid functions, although many practically useful approximate conditional independence models are unmanageable within the information-theoretic framework. The main contribution of the paper lies in extending the means for understanding the degrees of approximate conditional dependence, with appropriately generalized semi-graphoid properties formulated and the mathematical soundness of the Bayesian network-like representation of approximate conditional independence statements thoroughly proved. As an additional contribution, we provide a case study of an approximate conditional independence model that would not be manageable without the above-mentioned extensions.

16.
Efficient attribute reduction in large, incomplete decision systems is a challenging problem; existing approaches have time complexities no less than $O(|C|^2|U|^2)$. This paper derives some important properties of incomplete information systems, then constructs a positive region-based algorithm to solve the attribute reduction problem with a time complexity no more than $O(|C|^2|U|\log|U|)$. Furthermore, our approach does not change the size of the original incomplete system. Numerical experiments show that the proposed approach is indeed efficient, and therefore of practical value to many real-world problems. The proposed algorithm can be applied to both consistent and inconsistent incomplete decision systems.

17.
MGRS: A multi-granulation rough set   Total citations: 4, self-citations: 0, citations by others: 4
The original rough set model, developed by Pawlak, is mainly concerned with the approximation of sets described by a single binary relation on the universe. From the viewpoint of granular computing, the classical rough set theory is established through a single granulation. This paper extends Pawlak’s rough set model to a multi-granulation rough set model (MGRS), where the set approximations are defined by using multiple equivalence relations on the universe. A number of important properties of MGRS are obtained. It is shown that some of the properties of Pawlak’s rough set theory are special instances of those of MGRS. Moreover, several important measures, such as the accuracy measure α, the quality of approximation γ and the precision of approximation π, are presented, which are re-interpreted in terms of a classic measure based on sets, the Marczewski-Steinhaus metric and the inclusion degree measure. A concept of approximation reduct is introduced to describe the smallest attribute subset that preserves the lower approximation and upper approximation of all decision classes in MGRS as well. Finally, we discuss how to extract decision rules using MGRS. Unlike the decision rules (“AND” rules) from Pawlak’s rough set model, the form of decision rules in MGRS is “OR”. Several pivotal algorithms are also designed, which are helpful for applying this theory to practical issues. The multi-granulation rough set model provides an effective approach for problem solving in the context of multiple granulations.
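
To make the "OR" character of multi-granulation approximations concrete, here is a small sketch. It assumes the optimistic reading, in which an object belongs to the lower approximation of X if its equivalence class under at least one granulation is contained in X, and to the upper approximation if its class under every granulation intersects X. The abstract does not spell these definitions out, so treat them as an assumption; the function names are hypothetical.

```python
def equivalence_class(rows, attrs, i):
    """Indices of objects indiscernible from object i on the given attributes."""
    key = tuple(rows[i][a] for a in attrs)
    return {j for j, row in enumerate(rows) if tuple(row[a] for a in attrs) == key}

def mgrs_approximations(rows, granulations, target):
    """Multi-granulation lower and upper approximations (assumed optimistic form)."""
    universe = range(len(rows))
    lower = {
        x for x in universe
        if any(equivalence_class(rows, g, x) <= target for g in granulations)
    }
    upper = {
        x for x in universe
        if all(equivalence_class(rows, g, x) & target for g in granulations)
    }
    return lower, upper

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
target = {0, 1, 2}
lower, upper = mgrs_approximations(rows, [[0], [1]], target)
print(lower, upper)
```

In the toy table, granulation `[0]` alone yields the lower approximation {0, 1}; adding granulation `[1]` brings in object 2, because its class {0, 2} under the second relation is contained in the target.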

18.
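A rough set theory based algorithm for mining the simplest decision rules   Total citations: 1, self-citations: 2, citations by others: 1
Qian Jin, Meng Xiangping, Liu Dayou, Ye Feiyue. 《控制与决策》 (Control and Decision), 2007, 22(12): 1368-1372
This paper studies the discernibility matrix in rough set theory, extends the class feature matrix, and proposes a rough set theory based algorithm for mining the simplest decision rules. The algorithm partitions the original decision table into several equivalent sub-decision tables according to the decision attribute, then mines the simplest decision rules from each class feature matrix with the help of core attributes and an attribute frequency function. Compared with the discernibility matrix, the class feature matrix effectively reduces storage space and time complexity and strengthens the generalization ability of the rules. Experimental results show that the rules obtained by the proposed algorithm are more concise and efficient.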
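
For readers unfamiliar with the baseline that this abstract compares against, the classical discernibility matrix can be sketched as follows. This is not the paper's class feature matrix, whose construction the abstract does not describe. For every pair of objects with different decisions, the matrix records the condition attributes on which they differ; attributes that appear as singleton entries form the core. The function names are hypothetical.

```python
from itertools import combinations

def discernibility_matrix(rows, decisions):
    """Map each pair of objects with different decisions to the attributes that differ."""
    entries = {}
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            entries[(i, j)] = {
                a for a in range(len(rows[i])) if rows[i][a] != rows[j][a]
            }
    return entries

def core_attributes(entries):
    """Attributes that are the only way to discern some pair of objects."""
    return {next(iter(attrs)) for attrs in entries.values() if len(attrs) == 1}

rows = [(0, 0), (0, 1), (1, 1)]
decisions = ["a", "b", "b"]
matrix = discernibility_matrix(rows, decisions)
print(matrix, core_attributes(matrix))
```

The matrix stores one entry per pair of objects, so its size grows quadratically with the number of objects, which is the storage cost the abstract refers to.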

19.
Given a large set of potential features, it is usually necessary to find a small subset with which to classify. The task of finding an optimal feature set is inherently combinatorial and therefore suboptimal algorithms are typically used to find feature sets. If feature selection is based directly on classification error, then a feature-selection algorithm must base its decision on error estimates. This paper addresses the impact of error estimation on feature selection using two performance measures: comparison of the true error of the optimal feature set with the true error of the feature set found by a feature-selection algorithm, and the number of features among the truly optimal feature set that appear in the feature set found by the algorithm. The study considers seven error estimators applied to three standard suboptimal feature-selection algorithms and exhaustive search, and it considers three different feature-label model distributions. It draws two conclusions for the cases considered: (1) depending on the sample size and the classification rule, feature-selection algorithms can produce feature sets whose corresponding classifiers possess errors far in excess of the classifier corresponding to the optimal feature set; and (2) for small samples, differences in performances among the feature-selection algorithms are less significant than performance differences among the error estimators used to implement the algorithms. Moreover, keeping in mind that results depend on the particular classifier-distribution pair, for the error estimators considered in this study, bootstrap and bolstered resubstitution usually outperform cross-validation, and bolstered resubstitution usually performs as well as or better than bootstrap.

20.
Traditional rough set theory is mainly used to extract rules from and reduce attributes in databases in which attributes are characterized by partitions, while the covering rough set theory, a generalization of traditional rough set theory, does the same yet characterizes attributes by covers. In this paper, we propose a way to reduce the attributes of covering decision systems, which are databases characterized by covers. First, we define consistent and inconsistent covering decision systems and their attribute reductions. Then, we state the sufficient and the necessary conditions for reduction. Finally, we use a discernibility matrix to design algorithms that compute all the reducts of consistent and inconsistent covering decision systems. Numerical tests on four public data sets show that the proposed attribute reductions of covering decision systems accomplish better classification performance than those of traditional rough sets.
