首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Neighborhood rough set based heterogeneous feature subset selection   总被引:6,自引:0,他引:6  
Feature subset selection is viewed as an important preprocessing step for pattern recognition, machine learning and data mining. Most of researches are focused on dealing with homogeneous feature selection, namely, numerical or categorical features. In this paper, we introduce a neighborhood rough set model to deal with the problem of heterogeneous feature subset selection. As the classical rough set model can just be used to evaluate categorical features, we generalize this model with neighborhood relations and introduce a neighborhood rough set model. The proposed model will degrade to the classical one if we specify the size of neighborhood zero. The neighborhood model is used to reduce numerical and categorical features by assigning different thresholds for different kinds of attributes. In this model the sizes of the neighborhood lower and upper approximations of decisions reflect the discriminating capability of feature subsets. The size of lower approximation is computed as the dependency between decision and condition attributes. We use the neighborhood dependency to evaluate the significance of a subset of heterogeneous features and construct forward feature subset selection algorithms. The proposed algorithms are compared with some classical techniques. Experimental results show that the neighborhood model based method is more flexible to deal with heterogeneous data.  相似文献   

2.
Medical datasets are often classified by a large number of disease measurements and a relatively small number of patient records. All these measurements (features) are not important or irrelevant/noisy. These features may be especially harmful in the case of relatively small training sets, where this irrelevancy and redundancy is harder to evaluate. On the other hand, this extreme number of features carries the problem of memory usage in order to represent the dataset. Feature Selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and to remove the redundant features. Thus, the learning model receives a concise structure without forfeiting the predictive accuracy built by using only the selected prominent features. Therefore, nowadays, FS is an essential part of knowledge discovery. In this study, new supervised feature selection methods based on hybridization of Particle Swarm Optimization (PSO), PSO based Relative Reduct (PSO-RR) and PSO based Quick Reduct (PSO-QR) are presented for the diseases diagnosis. The experimental result on several standard medical datasets proves the efficiency of the proposed technique as well as enhancements over the existing feature selection techniques.  相似文献   

3.
Soft fuzzy rough sets for robust feature evaluation and selection   总被引:2,自引:0,他引:2  
The fuzzy dependency function proposed in the fuzzy rough set model is widely employed in feature evaluation and attribute reduction. It is shown that this function is not robust to noisy information in this paper. As datasets in real-world applications are usually contaminated by noise, robustness of data analysis models is very important in practice. In this work, we develop a new model of fuzzy rough sets, called soft fuzzy rough sets, which can reduce the influence of noise. We discuss the properties of the model and construct a new dependence function from the model. Then we use the function to evaluate and select features. The presented experimental results show the effectiveness of the new model.  相似文献   

4.
This paper investigates feature selection based on rough sets for dimensionality reduction in Case-Based Reasoning classifiers. In order to be useful, Case-Based Reasoning systems should be able to manage imprecise, uncertain and redundant data to retrieve the most relevant information in a potentially overwhelming quantity of data. Rough Set Theory has been shown to be an effective tool for data mining and for uncertainty management. This paper has two central contributions: (1) it develops three strategies for feature selection, and (2) it proposes several measures for estimating attribute relevance based on Rough Set Theory. Although we concentrate on Case-Based Reasoning classifiers, the proposals are general enough to be applicable to a wide range of learning algorithms. We applied these proposals on twenty data sets from the UCI repository and examined the impact of feature selection over classification performance. Our evaluation shows that all three proposals benefit the basic Case-Based Reasoning system. They also present robustness in comparison to well-known feature selection strategies.  相似文献   

5.
Feature selection is about finding useful (relevant) features to describe an application domain. Selecting relevant and enough features to effectively represent and index the given dataset is an important task to solve the classification and clustering problems intelligently. This task is, however, quite difficult to carry out since it usually needs a very time-consuming search to get the features desired. This paper proposes a bit-based feature selection method to find the smallest feature set to represent the indexes of a given dataset. The proposed approach originates from the bitmap indexing and rough set techniques. It consists of two-phases. In the first phase, the given dataset is transformed into a bitmap indexing matrix with some additional data information. In the second phase, a set of relevant and enough features are selected and used to represent the classification indexes of the given dataset. After the relevant and enough features are selected, they can be judged by the domain expertise and the final feature set of the given dataset is thus proposed. Finally, the experimental results on different data sets also show the efficiency and accuracy of the proposed approach.  相似文献   

6.
The use of feature selection can improve accuracy, efficiency, applicability and understandability of a learning process. For this reason, many methods of automatic feature selection have been developed. Some of these methods are based on the search of the features that allows the data set to be considered consistent. In a search problem we usually evaluate the search states, in the case of feature selection we measure the possible feature sets. This paper reviews the state of the art of consistency based feature selection methods, identifying the measures used for feature sets. An in-deep study of these measures is conducted, including the definition of a new measure necessary for completeness. After that, we perform an empirical evaluation of the measures comparing them with the highly reputed wrapper approach. Consistency measures achieve similar results to those of the wrapper approach with much better efficiency.  相似文献   

7.
Dubois and Prade (1990) [1] introduced the notion of fuzzy rough sets as a fuzzy generalization of rough sets, which was originally proposed by Pawlak (1982) [8]. Later, Radzikowska and Kerre introduced the so-called (I,T)-fuzzy rough sets, where I is an implication and T is a triangular norm. In the present paper, by using a pair of implications (I,J), we define the so-called (I,J)-fuzzy rough sets, which generalize the concept of fuzzy rough sets in the sense of Radzikowska and Kerre, and that of Mi and Zhang. Basic properties of (I,J)-fuzzy rough sets are investigated in detail.  相似文献   

8.
Feature selection plays a vital role in many areas of pattern recognition and data mining. The effective computation of feature selection is important for improving the classification performance. In rough set theory, many feature selection algorithms have been proposed to process static incomplete data. However, feature values in an incomplete data set may vary dynamically in real-world applications. For such dynamic incomplete data, a classic (non-incremental) approach of feature selection is usually computationally time-consuming. To overcome this disadvantage, we propose an incremental approach for feature selection, which can accelerate the feature selection process in dynamic incomplete data. We firstly employ an incremental manner to compute the new positive region when feature values with respect to an object set vary dynamically. Based on the calculated positive region, two efficient incremental feature selection algorithms are developed respectively for single object and multiple objects with varying feature values. Then we conduct a series of experiments with 12 UCI real data sets to evaluate the efficiency and effectiveness of our proposed algorithms. The experimental results show that the proposed algorithms compare favorably with that of applying the existing non-incremental methods.  相似文献   

9.
数据库通常包含很多冗余特征,找出重要特征叫做特征提取。本文提出一种基于属性重要度的启发式特征选取算法。该算法以属性重要度为迭代准则得到属性集合的最小约简。  相似文献   

10.
This paper presents a new extension of fuzzy sets: R-fuzzy sets. The membership of an element of a R-fuzzy set is represented as a rough set. This new extension facilitates the representation of an uncertain fuzzy membership with a rough approximation. Based on our definition of R-fuzzy sets and their operations, the relationships between R-fuzzy sets and other fuzzy sets are discussed and some examples are provided.  相似文献   

11.
Bayesian networks provide the means for representing probabilistic conditional independence. Conditional independence is widely considered also beyond the theory of probability, with linkages to, e.g. the database multi-valued dependencies, and at a higher abstraction level of semi-graphoid models. The rough set framework for data analysis is related to the topics of conditional independence via the notion of a decision reduct, to be considered within a wider domain of the feature selection. Given probabilistic version of decision reducts equivalent to the data-based Markov boundaries, the studies were also conducted for other criteria of the rough-set-based feature selection, e.g. those corresponding to the multi-valued dependencies. In this paper, we investigate the degrees of approximate conditional dependence, which could be a topic corresponding to the well-known notions such as conditional mutual information and polymatroid functions, however, with many practically useful approximate conditional independence models unmanageable within the information theoretic framework. The major paper’s contribution lays in extending the means for understanding the degrees of approximate conditional dependence, with appropriately generalized semi-graphoid properties formulated and with the mathematical soundness of the Bayesian network-like representation of the approximate conditional independence statements thoroughly proved. As an additional contribution, we provide a case study of the approximate conditional independence model, which would not be manageable without the above-mentioned extensions.  相似文献   

12.
Applying rough sets to market timing decisions   总被引:1,自引:0,他引:1  
A lot of research has been done to predict economic development. The problem studied here is about the stock prediction for use of investors. More specifically, the stock market's movements will be analyzed and predicted. We wish to retrieve knowledge that could guide investors on when to buy and sell.Through a detailed case study on trading S&P 500 index, rough sets is shown to be an applicable and effective tool to achieve this goal. Some problems concerning time series transformation, indicator selection, trading system building in real implementation are also discussed.  相似文献   

13.
MGRS: A multi-granulation rough set   总被引:4,自引:0,他引:4  
The original rough set model was developed by Pawlak, which is mainly concerned with the approximation of sets described by a single binary relation on the universe. In the view of granular computing, the classical rough set theory is established through a single granulation. This paper extends Pawlak’s rough set model to a multi-granulation rough set model (MGRS), where the set approximations are defined by using multi equivalence relations on the universe. A number of important properties of MGRS are obtained. It is shown that some of the properties of Pawlak’s rough set theory are special instances of those of MGRS.Moreover, several important measures, such as accuracy measureα, quality of approximationγ and precision of approximationπ, are presented, which are re-interpreted in terms of a classic measure based on sets, the Marczewski-Steinhaus metric and the inclusion degree measure. A concept of approximation reduct is introduced to describe the smallest attribute subset that preserves the lower approximation and upper approximation of all decision classes in MGRS as well. Finally, we discuss how to extract decision rules using MGRS. Unlike the decision rules (“AND” rules) from Pawlak’s rough set model, the form of decision rules in MGRS is “OR”. Several pivotal algorithms are also designed, which are helpful for applying this theory to practical issues. The multi-granulation rough set model provides an effective approach for problem solving in the context of multi granulations.  相似文献   

14.
Efficient attribute reduction in large, incomplete decision systems is a challenging problem; existing approaches have time complexities no less than O(∣C2U2). This paper derives some important properties of incomplete information systems, then constructs a positive region-based algorithm to solve the attribute reduction problem with a time complexity no more than O(∣C2U∣log∣U∣). Furthermore, our approach does not change the size of the original incomplete system. Numerical experiments show that the proposed approach is indeed efficient, and therefore of practical value to many real-world problems. The proposed algorithm can be applied to both consistent and inconsistent incomplete decision systems.  相似文献   

15.
Given a large set of potential features, it is usually necessary to find a small subset with which to classify. The task of finding an optimal feature set is inherently combinatoric and therefore suboptimal algorithms are typically used to find feature sets. If feature selection is based directly on classification error, then a feature-selection algorithm must base its decision on error estimates. This paper addresses the impact of error estimation on feature selection using two performance measures: comparison of the true error of the optimal feature set with the true error of the feature set found by a feature-selection algorithm, and the number of features among the truly optimal feature set that appear in the feature set found by the algorithm. The study considers seven error estimators applied to three standard suboptimal feature-selection algorithms and exhaustive search, and it considers three different feature-label model distributions. It draws two conclusions for the cases considered: (1) depending on the sample size and the classification rule, feature-selection algorithms can produce feature sets whose corresponding classifiers possess errors far in excess of the classifier corresponding to the optimal feature set; and (2) for small samples, differences in performances among the feature-selection algorithms are less significant than performance differences among the error estimators used to implement the algorithms. Moreover, keeping in mind that results depend on the particular classifier-distribution pair, for the error estimators considered in this study, bootstrap and bolstered resubstitution usually outperform cross-validation, and bolstered resubstitution usually performs as well as or better than bootstrap.  相似文献   

16.
Traditional rough set theory is mainly used to extract rules from and reduce attributes in databases in which attributes are characterized by partitions, while the covering rough set theory, a generalization of traditional rough set theory, does the same yet characterizes attributes by covers. In this paper, we propose a way to reduce the attributes of covering decision systems, which are databases characterized by covers. First, we define consistent and inconsistent covering decision systems and their attribute reductions. Then, we state the sufficient and the necessary conditions for reduction. Finally, we use a discernibility matrix to design algorithms that compute all the reducts of consistent and inconsistent covering decision systems. Numerical tests on four public data sets show that the proposed attribute reductions of covering decision systems accomplish better classification performance than those of traditional rough sets.  相似文献   

17.
Vanessa  Michel  Jrme 《Neurocomputing》2009,72(16-18):3580
The classification of functional or high-dimensional data requires to select a reduced subset of features among the initial set, both to help fighting the curse of dimensionality and to help interpreting the problem and the model. The mutual information criterion may be used in that context, but it suffers from the difficulty of its estimation through a finite set of samples. Efficient estimators are not designed specifically to be applied in a classification context, and thus suffer from further drawbacks and difficulties. This paper presents an estimator of mutual information that is specifically designed for classification tasks, including multi-class ones. It is combined to a recently published stopping criterion in a traditional forward feature selection procedure. Experiments on both traditional benchmarks and on an industrial functional classification problem show the added value of this estimator.  相似文献   

18.
Attribute selection with fuzzy decision reducts   总被引:2,自引:0,他引:2  
Rough set theory provides a methodology for data analysis based on the approximation of concepts in information systems. It revolves around the notion of discernibility: the ability to distinguish between objects, based on their attribute values. It allows to infer data dependencies that are useful in the fields of feature selection and decision model construction. In many cases, however, it is more natural, and more effective, to consider a gradual notion of discernibility. Therefore, within the context of fuzzy rough set theory, we present a generalization of the classical rough set framework for data-based attribute selection and reduction using fuzzy tolerance relations. The paper unifies existing work in this direction, and introduces the concept of fuzzy decision reducts, dependent on an increasing attribute subset measure. Experimental results demonstrate the potential of fuzzy decision reducts to discover shorter attribute subsets, leading to decision models with a better coverage and with comparable, or even higher accuracy.  相似文献   

19.
Soft sets and soft rough sets   总被引:4,自引:0,他引:4  
In this study, we establish an interesting connection between two mathematical approaches to vagueness: rough sets and soft sets. Soft set theory is utilized, for the first time, to generalize Pawlak’s rough set model. Based on the novel granulation structures called soft approximation spaces, soft rough approximations and soft rough sets are introduced. Basic properties of soft rough approximations are presented and supported by some illustrative examples. We also define new types of soft sets such as full soft sets, intersection complete soft sets and partition soft sets. The notion of soft rough equal relations is proposed and related properties are examined. We also show that Pawlak’s rough set model can be viewed as a special case of the soft rough sets, and these two notions will coincide provided that the underlying soft set in the soft approximation space is a partition soft set. Moreover, an example containing a comparative analysis between rough sets and soft rough sets is given.  相似文献   

20.
Attribute reduction is considered as an important preprocessing step for pattern recognition, machine learning, and data mining. This paper provides a systematic study on attribute reduction with rough sets based on general binary relations. We define a relation information system, a consistent relation decision system, and a relation decision system and their attribute reductions. Furthermore, we present a judgment theorem and a discernibility matrix associated with attribute reduction in each type of system; based on the discernibility matrix, we can compute all the reducts. Finally, the experimental results with UCI data sets show that the proposed reduction methods are an effective technique to deal with complex data sets.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号