Found 20 similar documents; search took 15 ms.
1.
Feature selection is viewed as an important preprocessing step for pattern recognition, machine learning and data mining. Traditional hill-climbing search approaches to feature selection have difficulty finding optimal reducts, while current stochastic search strategies, such as GA, ACO and PSO, provide a more robust solution but at the expense of increased computational effort. It is therefore necessary to investigate fast and effective search algorithms. Rough set theory provides a mathematical tool to discover data dependencies and reduce the number of features contained in a dataset by purely structural methods. In this paper, we define a structure called the power set tree (PS-tree), an ordered tree representing the power set, in which each possible reduct is mapped to a node. We then present a rough set approach to feature selection based on the PS-tree, give two kinds of pruning rules for it, and propose two novel PS-tree-based feature selection algorithms. Experimental results demonstrate that our algorithms are effective and efficient.
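The quality measure that such reduct searches typically optimize is the classical rough-set dependency degree γ(B, D) = |POS_B(D)| / |U|. As background only (a minimal sketch on a toy decision table, not the paper's own code):

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group object indices into indiscernibility classes on `attrs`."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, cond_attrs, dec_attr):
    """gamma(B, D) = |POS_B(D)| / |U|: the fraction of objects whose
    B-indiscernibility class is consistent on the decision attribute."""
    pos = 0
    for block in partition(rows, cond_attrs):
        if len({rows[i][dec_attr] for i in block}) == 1:
            pos += len(block)  # block lies inside a lower approximation
    return pos / len(rows)

# toy decision table: columns 0-2 are condition attributes, column 3 the decision
table = [
    (0, 1, 0, 'y'),
    (0, 1, 1, 'y'),
    (1, 0, 0, 'n'),
    (1, 0, 1, 'n'),
    (1, 1, 0, 'y'),
]
print(dependency(table, [0, 1], 3))  # attributes {0, 1} discern all classes -> 1.0
```

A subset B is a reduct candidate when its dependency equals that of the full attribute set; tree search and pruning decide which subsets get evaluated.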
2.
In data analysis, feature selection is an effective method of information-preserving data reduction. Rough set theory provides a mathematical tool for discovering all possible feature subsets. This paper proposes a new rough-set-based heuristic function, the weighted mean support heuristic. Its advantage is that it considers the overall quality of the set of possible rules; that is, for all decision classes, it takes into account the weighted mean support of the rules. Finally, an example shows that the method is effective.
3.
Feature subset selection is viewed as an important preprocessing step for pattern recognition, machine learning and data mining. Most research has focused on homogeneous feature selection, i.e., either numerical or categorical features. In this paper, we introduce a neighborhood rough set model to deal with heterogeneous feature subset selection. As the classical rough set model can only evaluate categorical features, we generalize it with neighborhood relations to obtain a neighborhood rough set model, which degrades to the classical one when the neighborhood size is set to zero. The neighborhood model reduces numerical and categorical features by assigning different thresholds to different kinds of attributes. In this model, the sizes of the neighborhood lower and upper approximations of the decision reflect the discriminating capability of feature subsets; the dependency between decision and condition attributes is computed from the size of the lower approximation. We use this neighborhood dependency to evaluate the significance of a subset of heterogeneous features and construct forward feature subset selection algorithms. The proposed algorithms are compared with some classical techniques. Experimental results show that the neighborhood-model-based method is more flexible in dealing with heterogeneous data.
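The neighborhood dependency described above can be sketched directly: an object belongs to the positive region when its entire δ-neighborhood shares its decision class. A minimal illustration on toy numerical data (not the authors' implementation):

```python
import math

def neighbors(data, i, delta):
    """Indices of objects within Euclidean distance delta of object i."""
    return [j for j, xj in enumerate(data) if math.dist(data[i], xj) <= delta]

def neighborhood_dependency(data, labels, delta):
    """Neighborhood dependency: the fraction of objects whose whole
    delta-neighborhood shares their decision class (the size of the
    neighborhood lower approximation divided by |U|). With delta = 0,
    this reduces to the classical categorical case."""
    pos = sum(1 for i in range(len(data))
              if all(labels[j] == labels[i] for j in neighbors(data, i, delta)))
    return pos / len(data)

X = [(0.0, 0.1), (0.1, 0.0), (1.0, 1.0), (0.9, 1.1)]
y = ['a', 'a', 'b', 'b']
print(neighborhood_dependency(X, y, 0.3))  # classes well separated -> 1.0
```

A forward selection loop would greedily add the feature whose inclusion raises this dependency the most, stopping when no addition improves it.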
4.
Si-Yuan Jing 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2014,18(7):1373-1382
Rough set theory has been proven to be an effective tool for feature subset selection. Current research usually employs hill-climbing as the search strategy, but such methods are often unable to find the optimal feature subset, since no heuristic can guarantee optimality. For this reason, many researchers have studied stochastic methods. Since previous combinations of genetic algorithms and rough set theory have not shown competitive performance compared with other stochastic methods, we propose a hybrid genetic algorithm for feature subset selection, called HGARSTAR. Unlike previous work, HGARSTAR embeds a novel local search operation based on rough set theory to fine-tune the search, aiming to enhance the GA's intensification ability. Moreover, all candidates (i.e. feature subsets) generated in the evolutionary process are forced to include the core features, which accelerates convergence. To verify the proposed algorithm, experiments are performed on standard UCI datasets. Experimental results demonstrate the efficiency of our algorithm.
5.
X. Z. Gao, X. Wang, T. Jokinen, S. J. Ovaska, A. Arkkio, K. Zenger 《Neural computing & applications》2012,21(5):1071-1083
The harmony search (HS) method is a popular meta-heuristic optimization algorithm that has been extensively employed to handle various engineering problems. However, it sometimes fails to offer satisfactory convergence performance under certain circumstances. In this paper, we propose and study a hybrid HS approach, HS–PBIL, which merges HS with population-based incremental learning (PBIL). Numerical simulations demonstrate that HS–PBIL outperforms the regular HS method on nonlinear function optimization and a practical wind generator optimization problem.
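The PBIL half of such a hybrid maintains a probability vector over solution bits and nudges it toward the best candidate of each generation. A minimal sketch of that standard learning rule (parameter names are illustrative, not the paper's settings):

```python
import random

def pbil_update(prob, best, rate=0.1):
    """PBIL learning rule: shift each bit's probability toward the
    corresponding bit of the best candidate of this generation."""
    return [p + rate * (b - p) for p, b in zip(prob, best)]

def sample(prob):
    """Draw a candidate bit string from the probability vector."""
    return [1 if random.random() < p else 0 for p in prob]

prob = [0.5, 0.5, 0.5]
prob = pbil_update(prob, [1, 0, 1])  # probabilities move toward 1, 0, 1
print(prob)
```

In the hybrid, candidates sampled this way would seed or perturb the harmony memory, combining PBIL's distribution model with HS's improvisation step.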
6.
Rough set theory is an effective method for feature selection that can preserve the meaning of the features. The essence of the rough set approach to feature selection is to find a subset of the original features. Since finding a minimal feature subset is an NP-hard problem, it is necessary to investigate effective and efficient heuristic algorithms. Ant colony optimization (ACO) has been successfully applied to many difficult combinatorial problems such as quadratic assignment, traveling salesman, and scheduling. It is particularly attractive for feature selection, since no heuristic information can guide the search to the optimal minimal subset every time; ants, however, can discover good feature combinations as they traverse the graph. In this paper, we propose a new rough set approach to feature selection based on ACO that adopts mutual-information-based feature significance as heuristic information, and we give a novel feature selection algorithm. Jensen and Shen proposed an ACO-based feature selection approach that starts from a random feature; our approach starts from the feature core, which shrinks the complete graph to a smaller one. To verify the efficiency of our algorithm, experiments are carried out on standard UCI datasets. The results demonstrate that our algorithm provides an efficient solution for finding a minimal feature subset.
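The two ingredients named above, mutual-information-based feature significance as the heuristic and the standard ACO random-proportional transition rule, can be sketched as follows (a simplified illustration, not the paper's algorithm):

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Discrete mutual information I(X;Y): the heuristic desirability
    of a feature with respect to the decision attribute."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def ant_step(candidates, tau, eta, alpha=1.0, beta=2.0):
    """Standard ACO random-proportional rule: choose the next feature
    with probability proportional to pheromone**alpha * heuristic**beta."""
    weights = [tau[f] ** alpha * eta[f] ** beta for f in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

# a feature column identical to the decision carries maximal information
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # log(2) ≈ 0.693
```

Starting each ant from the feature core means the graph nodes are only the non-core features, which is where the smaller search graph comes from.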
7.
To address the problem that, for the large-scale unsupervised datasets common in "big data", feature selection is too slow for practical requirements, this paper proposes a fast attribute selection algorithm based on the classical incremental algorithm for absolute reducts in rough set theory. First, the large-scale dataset is treated as a randomly arriving sequence of objects, and the candidate reduct is initialized to the empty set. Then, objects are repeatedly drawn at random without replacement from the dataset; on each draw, the algorithm checks whether the current candidate reduct can discern the drawn object from all objects in the current object set that should be discerned, and the object is then added to the current object set. If the object cannot be discerned, suitable attributes are added to the candidate reduct. Finally, if no indiscernible object is found for I consecutive draws, the candidate reduct is taken as a reduct of the large-scale dataset. Experiments on five unsupervised large-scale datasets show that the resulting reduct can discern more than 95% of the object pairs, and that computing it takes less than 1% of the time required by the discernibility-matrix-based algorithm and the incremental reduction algorithm. In a text topic mining experiment, the topics mined from the reduced dataset were essentially the same as those mined from the original dataset. Both sets of experiments show that the method can perform attribute selection on large-scale datasets effectively and quickly.
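The randomized incremental procedure described above can be sketched as follows (names and the attribute-choice rule are illustrative; the paper's actual heuristic for picking which attribute to add is not reproduced here):

```python
import random

def fast_random_reduct(objects, num_attrs, stop_after=100):
    """Sketch of a randomized incremental reduct search. Objects are
    tuples of attribute values; in the unsupervised (absolute-reduct)
    setting, any two objects that differ anywhere should be discerned."""
    candidate, seen, streak = set(), [], 0
    pool = list(objects)
    random.shuffle(pool)                      # random arrival order
    for obj in pool:
        failed = False
        for prev in seen:
            if prev != obj and all(obj[a] == prev[a] for a in candidate):
                # candidate fails: add one attribute on which they differ
                diff = next(a for a in range(num_attrs) if obj[a] != prev[a])
                candidate.add(diff)
                failed = True
        seen.append(obj)
        streak = 0 if failed else streak + 1
        if streak >= stop_after:              # I consecutive successes
            break
    return candidate

objs = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
print(fast_random_reduct(objs, 3, stop_after=10))
```

The early-stopping threshold (`stop_after`, the paper's I) trades completeness for speed, which is why the resulting reduct discerns most but not necessarily all object pairs.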
8.
A novel correlation-based memetic framework (MA-C), a combination of a genetic algorithm (GA) and local search (LS) using correlation-based filter ranking, is proposed in this paper. The local filter method fine-tunes the population of GA solutions by adding or deleting features based on the Symmetrical Uncertainty (SU) measure. The focus is on filter methods that can assess the goodness or ranking of individual features. An empirical study of MA-C on several commonly used large-scale gene expression datasets indicates that it outperforms recent methods in the literature in terms of classification accuracy, selected feature set size and efficiency. We also investigate the balance between local and genetic search to maximize the search quality and efficiency of MA-C.
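Symmetrical Uncertainty, the ranking measure used by the MA-C local filter, normalizes mutual information by the two entropies: SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)). A minimal sketch for discrete features (toy data, not the paper's code):

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy of a discrete sample, in nats."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())

def symmetrical_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X;Y) / (H(X) + H(Y)): mutual information
    normalized into [0, 1], symmetric in its two arguments."""
    hx, hy = entropy(xs), entropy(ys)
    hxy = entropy(list(zip(xs, ys)))
    mi = hx + hy - hxy                    # I(X;Y) via the joint entropy
    return 2 * mi / (hx + hy) if hx + hy else 0.0

print(symmetrical_uncertainty([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
```

Because SU is normalized, features of different cardinalities can be ranked on a common scale, which is what makes it usable for adding or deleting single features during local search.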
9.
Applied Intelligence - With increases in feature dimensions and the emergence of hierarchical class structures, hierarchical feature selection has become an important data preprocessing step in...
10.
An ant colony feature selection method based on fuzzy rough set information entropy (total citations: 1; self-citations: 0; citations by others: 1)
Most heuristic algorithms proposed for high-dimensional feature selection are prone to getting trapped in local optima and cannot search the whole feature space effectively. To improve parallel search over the feature domain, this paper improves the ant colony model's search strategy, pheromone update, and state transition rules based on the information entropy principle of fuzzy rough sets, and proposes an ant colony feature selection method. Experiments on UCI data verify that the algorithm achieves better selection results than traditional feature selection algorithms and is effective.
11.
Feature selection plays a vital role in many areas of pattern recognition and data mining, and its effective computation is important for improving classification performance. In rough set theory, many feature selection algorithms have been proposed to process static incomplete data. However, feature values in an incomplete data set may vary dynamically in real-world applications, and for such dynamic incomplete data, a classic (non-incremental) approach to feature selection is usually computationally time-consuming. To overcome this disadvantage, we propose an incremental approach that accelerates feature selection in dynamic incomplete data. We first compute the new positive region incrementally when the feature values of an object set vary dynamically. Based on the calculated positive region, two efficient incremental feature selection algorithms are developed, for a single object and for multiple objects with varying feature values, respectively. We then conduct a series of experiments on 12 real UCI data sets to evaluate the efficiency and effectiveness of the proposed algorithms. The experimental results show that the proposed algorithms compare favorably with the existing non-incremental methods.
12.
Md. Monirul Kabir 《Neurocomputing》2011,74(17):2914-2928
This paper presents a new hybrid genetic algorithm (HGA) for feature selection (FS), called HGAFS. The vital aspect of this algorithm is the selection of a salient feature subset of reduced size. HGAFS incorporates a new local search operation, devised and embedded in the HGA, to fine-tune the search during the FS process. The local search technique works on the basis of the distinct and informative nature of the input features, computed from their correlation information. The aim is to guide the search process so that newly generated offspring can be adjusted using the less correlated (distinct) features covering both the general and special characteristics of a given dataset. Thus, HGAFS reduces the redundancy of information among the selected features. HGAFS also emphasizes selecting a small subset of salient features using a subset size determination scheme. We tested HGAFS on 11 real-world classification datasets with dimensions varying from 8 to 7129 and compared its performance with ten other well-known FS algorithms. HGAFS consistently performs better at selecting subsets of salient features, resulting in better classification accuracy.
13.
14.
15.
Recently, many methods have been proposed for microarray data analysis. One of the challenges in microarray applications is to select a proper number of the most relevant genes. In this paper, we propose a novel hybrid method for feature selection in microarray data analysis. The method first uses a genetic algorithm with dynamic parameter setting (GADP) to generate a number of gene subsets and to rank the genes according to their occurrence frequencies in those subsets. It then uses the χ2-test for homogeneity to select a proper number of the top-ranked genes. We use a support vector machine (SVM) to verify the efficiency of the selected genes. Six different microarray datasets are used to compare the performance of GADP with existing methods. The experimental results show that GADP is better than the existing methods in terms of both the number of selected genes and the prediction accuracy.
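The frequency-ranking step described above is simple to state in code: count how often each gene appears across the subsets the GA produced, then rank by count. A minimal sketch (toy gene names, not the paper's data):

```python
from collections import Counter

def rank_genes(subsets):
    """Rank genes by their occurrence frequency across the feature
    subsets produced by the GA runs (most frequent first)."""
    counts = Counter(g for s in subsets for g in s)
    return [g for g, _ in counts.most_common()]

subsets = [['g1', 'g2'], ['g1', 'g3'], ['g1', 'g2', 'g4']]
print(rank_genes(subsets))  # ['g1', 'g2', 'g3', 'g4']
```

The χ2-test for homogeneity is then applied to this ranked list to decide how many of the top genes to keep before handing them to the SVM.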
16.
17.
Medical datasets often contain a large number of disease measurements and a relatively small number of patient records. Not all of these measurements (features) are important; some are irrelevant or noisy. Such features can be especially harmful with relatively small training sets, where irrelevancy and redundancy are harder to evaluate, and the extreme number of features also raises the memory cost of representing the dataset. Feature Selection (FS) addresses this by finding a subset of prominent features that improves predictive accuracy and removes redundant features; the learning model then receives a concise structure, built using only the selected prominent features, without forfeiting predictive accuracy. FS is therefore an essential part of knowledge discovery. In this study, new supervised feature selection methods based on hybridization with Particle Swarm Optimization (PSO), PSO-based Relative Reduct (PSO-RR) and PSO-based Quick Reduct (PSO-QR), are presented for disease diagnosis. Experimental results on several standard medical datasets demonstrate the efficiency of the proposed techniques and their improvements over existing feature selection techniques.
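PSO variants like these search the space of feature subsets encoded as bit strings. A minimal sketch of one standard binary-PSO update step (sigmoid-thresholded bit flip; the parameters are conventional defaults, not the paper's settings):

```python
import math
import random

def bpso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One binary-PSO update: the usual velocity rule followed by a
    sigmoid-thresholded bit flip, so each particle encodes a feature
    subset as a bit string."""
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        vi = (w * vi
              + c1 * random.random() * (pi - xi)
              + c2 * random.random() * (gi - xi))
        s = 1.0 / (1.0 + math.exp(-vi))       # sigmoid of the velocity
        new_x.append(1 if random.random() < s else 0)
        new_v.append(vi)
    return new_x, new_v

x, v = bpso_step([0, 1, 0, 1], [0.0] * 4, pbest=[1, 1, 0, 0], gbest=[1, 0, 1, 0])
print(x)  # a new candidate bit string (feature subset)
```

In a rough-set hybrid such as PSO-QR, each particle's bit string would be scored with a reduct-quality measure (e.g. dependency degree), and that score drives the personal-best and global-best updates.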
18.
19.
Rough-set-based feature extraction for fault diagnosis (total citations: 11; self-citations: 3; citations by others: 11)
Feature extraction is crucial for accurate and reliable fault diagnosis. In practice, however, the classification boundaries of fault diagnosis data samples are often uncertain, as is the relationship between faults and symptoms. Rough set theory is a new mathematical tool for handling vague and uncertain problems. This paper introduces rough set theory into feature extraction for fault diagnosis and proposes a rough-set-based feature extraction method, which is validated on two fault diagnosis examples. The results show that, while effectively preserving the fault classification results, the method can extract the features that best reflect the faults, laying a foundation for further application of rough sets to fault diagnosis.
20.
Pattern Analysis and Applications - Multi-label feature selection has been essential in many big data applications and plays a significant role in processing high-dimensional data. However, the...