Similar Documents
20 similar documents found (search time: 15 ms)
1.

A Software Product Line (SPL) customizes software by combining existing features of the software into multiple variants. The main challenge is selecting valid features subject to the constraints of the feature model. To address this challenge, a hybrid approach is proposed to optimize the feature selection problem in software product lines. The hybrid approach, 'Hyper-PSOBBO', combines Particle Swarm Optimization (PSO), Biogeography-Based Optimization (BBO) and hyper-heuristic algorithms. The proposed algorithm is compared with the Bird Swarm Algorithm (BSA), PSO, BBO, Firefly, the Genetic Algorithm (GA) and a hyper-heuristic. All of these algorithms are evaluated on a set of 10 feature models ranging in size from 100 to 5000 features. A detailed empirical analysis of performance is carried out on these feature models. The results indicate that the proposed method outperforms the other state-of-the-art algorithms.
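To make the hyper-heuristic idea concrete, here is a minimal sketch in which a top-level controller picks between a PSO-style move and a BBO-style migration based on accumulated credit. The toy fitness, operator probabilities and credit rule are illustrative assumptions, not the paper's exact Hyper-PSOBBO; a real SPL fitness would check feature-model constraints.

```python
# Hyper-heuristic sketch: choose between PSO- and BBO-style operators
# for binary feature selection, rewarding whichever operator improves.
import random

N_FEATURES, POP, GENS = 20, 30, 50

def fitness(mask):
    # Toy objective: reward "good" features (even indices), penalize size.
    return sum(m for i, m in enumerate(mask) if i % 2 == 0) - 0.1 * sum(mask)

def pso_move(mask, best):
    # PSO-like: copy each bit from the global best with some probability.
    return [b if random.random() < 0.5 else m for m, b in zip(mask, best)]

def bbo_move(mask, pop):
    # BBO-like migration: immigrate bits from a random fitter solution.
    donor = max(random.sample(pop, 3), key=fitness)
    return [d if random.random() < 0.3 else m for m, d in zip(mask, donor)]

pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(POP)]
scores = {"pso": 1.0, "bbo": 1.0}           # hyper-heuristic credit
for _ in range(GENS):
    best = max(pop, key=fitness)
    # Pick the low-level heuristic proportionally to its past credit.
    op = random.choices(["pso", "bbo"], weights=[scores["pso"], scores["bbo"]])[0]
    new_pop = []
    for mask in pop:
        cand = pso_move(mask, best) if op == "pso" else bbo_move(mask, pop)
        if fitness(cand) > fitness(mask):
            scores[op] += 0.1               # reward the operator that helped
            new_pop.append(cand)
        else:
            new_pop.append(mask)
    pop = new_pop
print(max(pop, key=fitness), fitness(max(pop, key=fitness)))
```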


2.
A new local search based hybrid genetic algorithm for feature selection   (cited by 2: 0 self-citations, 2 external)
This paper presents a new hybrid genetic algorithm (HGA) for feature selection (FS), called HGAFS. The vital aspect of this algorithm is the selection of a salient feature subset of reduced size. HGAFS incorporates a new local search operation, devised and embedded in the HGA, to fine-tune the search during FS. The local search technique works on the basis of the distinct and informative nature of the input features, computed from their correlation information. The aim is to guide the search process so that newly generated offspring are adjusted toward the less correlated (distinct) features that capture both the general and special characteristics of a given dataset. Thus, HGAFS reduces the redundancy of information among the selected features. In addition, HGAFS emphasizes selecting a small subset of salient features using a subset size determination scheme. We have tested HGAFS on 11 real-world classification datasets with dimensions varying from 8 to 7129. The performance of HGAFS has been compared with the results of ten other well-known FS algorithms. HGAFS consistently performs better at selecting subsets of salient features, resulting in better classification accuracy.
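A hedged sketch of the correlation-guided local search idea: after crossover and mutation, swap the selected feature that is most correlated with the rest of the subset for a less correlated (more "distinct") unselected one. The function names and the swap criterion are illustrative, not HGAFS's exact operator.

```python
# Correlation-guided local search: replace the most redundant selected
# feature with the least redundant unselected candidate.
import numpy as np

def mean_abs_corr(X, idx, subset):
    others = [j for j in subset if j != idx]
    if not others:
        return 0.0
    return np.mean([abs(np.corrcoef(X[:, idx], X[:, j])[0, 1]) for j in others])

def local_search(X, subset, n_features):
    subset = list(subset)
    # Most redundant selected feature = highest mean correlation to subset.
    worst = max(subset, key=lambda i: mean_abs_corr(X, i, subset))
    candidates = [i for i in range(n_features) if i not in subset]
    # Least redundant replacement among the unselected features.
    best_new = min(candidates, key=lambda i: mean_abs_corr(X, i, subset))
    if mean_abs_corr(X, best_new, subset) < mean_abs_corr(X, worst, subset):
        subset[subset.index(worst)] = best_new
    return subset

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)   # feature 1 duplicates 0
print(local_search(X, [0, 1, 2], X.shape[1]))      # one of the duplicated pair is swapped out
```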

3.
Feature selection is an important filtering method for data analysis, pattern classification, data mining, and so on. Feature selection reduces the number of features by removing irrelevant and redundant data. In this paper, we propose a hybrid filter-wrapper feature subset selection algorithm called maximum Spearman minimum covariance cuckoo search (MSMCCS). First, based on Spearman correlation and covariance, a filter algorithm is proposed, called maximum Spearman minimum covariance (MSMC). Second, three parameters are introduced in MSMC to adjust the weights of correlation and redundancy, improving the relevance of feature subsets and reducing their redundancy. Third, in the improved cuckoo search algorithm, a weighted combination strategy is used to select candidate feature subsets, a crossover-mutation concept is used to adjust them, and finally the filtered features are refined into optimal feature subsets. MSMCCS thus combines the efficiency of filters with the greater accuracy of wrappers. Experimental results on eight common datasets from the University of California at Irvine Machine Learning Repository show that MSMCCS achieves better classification accuracy than seven wrapper methods, one filter method, and two hybrid methods. Furthermore, the proposed algorithm achieves preferable performance on the Wilcoxon signed-rank test and the sensitivity-specificity test.
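The filter component can be illustrated with a simple score of the "maximum Spearman, minimum covariance" form: relevance as the absolute Spearman correlation with the class, redundancy as the mean absolute covariance with already-selected features. The weights alpha and beta below stand in for the paper's tuning parameters; the greedy loop and data are illustrative.

```python
# MSMC-style filter score with a greedy forward-selection loop.
import numpy as np
from scipy.stats import spearmanr

def msmc_score(X, y, candidate, selected, alpha=1.0, beta=0.5):
    relevance = abs(spearmanr(X[:, candidate], y)[0])
    if selected:
        redundancy = np.mean([abs(np.cov(X[:, candidate], X[:, j])[0, 1])
                              for j in selected])
    else:
        redundancy = 0.0
    return alpha * relevance - beta * redundancy

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 10))
y = (X[:, 3] + 0.1 * rng.normal(size=150) > 0).astype(int)
selected = []
for _ in range(3):
    rest = [j for j in range(X.shape[1]) if j not in selected]
    selected.append(max(rest, key=lambda j: msmc_score(X, y, j, selected)))
print(selected)   # feature 3 should be picked first
```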

4.
Accurate software estimation, including cost estimation, quality estimation and risk analysis, is a major issue in software project management. In this paper, we present a soft computing framework to tackle this challenging problem. We first use a preprocessing neuro-fuzzy inference system to handle the dependencies among contributing factors and decouple their effects into individual contributions. We then use a neuro-fuzzy bank to calibrate the parameters of the contributing factors. To extend the framework to fields that lack an appropriate algorithmic model of their own, we propose a default algorithmic model that can be replaced when a better model is available. One feature of this framework is that its architecture is inherently independent of the choice of algorithmic model and of the nature of the estimation problem. By integrating neural networks, fuzzy logic and algorithmic models into one scheme, the framework offers learning ability, the capability to integrate both expert knowledge and project data, good interpretability, and robustness to imprecise and uncertain inputs. Validation using industry project data shows that the framework produces good results when used to predict software cost.
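The "default algorithmic model" slot can be illustrated with a COCOMO-family cost equation, effort = A * size^B * product of contributing-factor multipliers. The coefficients and factor values below are assumptions chosen only to show the pluggable-model idea, not the paper's calibrated values.

```python
# Illustrative default algorithmic model of the COCOMO family.
import math

def default_cost_model(kloc, multipliers, a=2.94, b=1.0997):
    # effort (person-months) = A * size^B * product(effort multipliers)
    return a * (kloc ** b) * math.prod(multipliers)

# Two hypothetical calibrated contributing factors: team experience
# (0.9, effort-reducing) and product complexity (1.15, effort-increasing).
print(round(default_cost_model(10.0, [0.9, 1.15]), 1), "person-months")
```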

5.
Lin Fan, Zeng Wenhua, Yang Lvqing, Wang Yue, Lin Shufu, Zeng Jiasong. Neural Computing & Applications, 2017, 28(7): 1863-1876

Major cloud computing service providers typically offer cross-regional services spanning multiple Internet data centers, with selection strategies constrained by service-level-agreement (SLA) risk. However, the traditional quality of service (QoS)-aware Web service selection approach cannot ensure the real-time behavior and reliability of service selection. We propose a cloud computing system risk assessment method based on cloud theory, generating five property clouds by collecting the risk value and four risk indicators from each virtual machine. A backward cloud generator integrates these five clouds into one cloud according to a weight matrix, so that the predicted risk value is transformed into a quantified risk level. We then test Web service selection experiments using the risk assessment level as the main QoS constraint, comparing against the LRU and MAIS methods. The results show that the Web service selection approach based on cloud-theory risk assessment is faster and more efficient, with a higher success rate.
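A rough sketch of the backward cloud generator that turns sampled risk values into cloud digital characteristics (Ex, En, He), followed by a weighted aggregation of several indicator clouds. The backward-generator formulas are the standard ones (without certainty degrees); the aggregation rule and the weight vector are assumptions, not the paper's exact weight-matrix formula.

```python
# Backward cloud generator plus a simple weighted cloud aggregation.
import numpy as np

def backward_cloud(samples):
    # Standard backward cloud generator: expectation, entropy, hyper-entropy.
    x = np.asarray(samples, dtype=float)
    ex = x.mean()
    en = np.sqrt(np.pi / 2.0) * np.mean(np.abs(x - ex))
    he = np.sqrt(max(x.var(ddof=1) - en ** 2, 0.0))
    return ex, en, he

rng = np.random.default_rng(2)
# Risk value + four risk indicators sampled from a virtual machine.
indicators = [rng.normal(loc=m, scale=0.1, size=200)
              for m in (0.3, 0.5, 0.4, 0.6, 0.2)]
clouds = [backward_cloud(s) for s in indicators]
weights = np.array([0.4, 0.2, 0.15, 0.15, 0.1])   # illustrative weight row
ex = sum(w * c[0] for w, c in zip(weights, clouds))
en = np.sqrt(sum((w * c[1]) ** 2 for w, c in zip(weights, clouds)))
he = sum(w * c[2] for w, c in zip(weights, clouds))
print(f"aggregated cloud: Ex={ex:.3f}, En={en:.3f}, He={he:.3f}")
```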


6.
This paper presents a novel approach to feature selection based on the analysis of class regions generated by a fuzzy classifier. A measure for feature evaluation is proposed, defined as the exception ratio. The exception ratio represents the degree of overlap between class regions, in other words, the degree to which exceptions exist inside the fuzzy rules generated by the fuzzy classifier. It is shown that, for a given set of features, the subset with the lowest sum of exception ratios tends to contain the most relevant features, compared to other subsets of the same size. An algorithm is then proposed that eliminates irrelevant features: given a set of remaining features, it eliminates the feature whose removal minimizes the sum of the exception ratios. A terminating criterion is also given, under which the algorithm stops when the next elimination would cause a significant increase in the sum of the exception ratios. Experiments show that the proposed algorithm performs well at eliminating irrelevant features while limiting the increase in recognition error rates on unknown data for the classifiers in use.
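The elimination loop and its stopping rule can be sketched generically. The exception_ratio function below is a stand-in: the paper computes class-region overlap from a fuzzy classifier, whereas this sketch approximates "overlap" with a nearest-neighbor disagreement rate; the threshold is also an assumption.

```python
# Backward elimination driven by an overlap-style score with a
# "significant increase" stopping criterion.
import numpy as np

def exception_ratio(X, y, features):
    # Fraction of points whose nearest neighbor (in the kept features)
    # has a different class label -- a crude overlap proxy.
    Xs = X[:, features]
    d = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    return float(np.mean(y[d.argmin(1)] != y))

def eliminate(X, y, threshold=0.05):
    kept = list(range(X.shape[1]))
    score = exception_ratio(X, y, kept)
    while len(kept) > 1:
        # Try removing each remaining feature; keep the best elimination.
        trials = [(exception_ratio(X, y, [f for f in kept if f != r]), r)
                  for r in kept]
        best_score, best_r = min(trials)
        if best_score - score > threshold:   # significant increase: stop
            break
        kept.remove(best_r)
        score = best_score
    return kept

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 5))
y = (X[:, 0] > 0).astype(int)                # only feature 0 matters
print(eliminate(X, y))                       # should retain feature 0
```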

7.
Partner selection is an active research topic in agile manufacturing and supply chain management. In this paper, the problem is formulated as a 0-1 integer program with a non-analytical objective function, and the solution space is reduced by identifying inefficient candidates. Using a fuzzy rule quantification method, a fuzzy-logic-based decision-making approach for project scheduling is proposed, and a fuzzy decision embedded genetic algorithm is developed. We compare the algorithm with traditional methods; the results show that the suggested approach can quickly reach optimal solutions for large problems with high probability. The approach was applied to the partner selection problem of a coal-fired power station construction project, with satisfactory results.

8.
Besides optimizing classifier predictive performance and addressing the curse of dimensionality, feature selection techniques help keep a classification model as simple as possible. In this paper, we present a wrapper feature selection approach based on the Bat Algorithm (BA) and the Optimum-Path Forest (OPF) classifier, in which feature selection is modeled as a binary optimization problem, guided by BA and using OPF accuracy over a validation set as the fitness function to be maximized. Moreover, we present a methodology to better estimate the quality of the reduced feature set. Experiments conducted on six public datasets demonstrate that the proposed approach yields statistically significantly more compact feature sets and, in some cases, improves classification effectiveness.
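A hedged sketch of a binary wrapper in this spirit: bat-style velocity updates pushed through a sigmoid to flip feature bits, with validation accuracy as fitness. We substitute k-NN for the Optimum-Path Forest classifier, since OPF is not available in scikit-learn; the frequency range, transfer function and loop sizes are illustrative.

```python
# Binary bat-style wrapper with a k-NN fitness (OPF stand-in).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xva, ytr, yva = train_test_split(X, y, random_state=0)

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = KNeighborsClassifier().fit(Xtr[:, mask], ytr)
    return clf.score(Xva[:, mask], yva)

rng = np.random.default_rng(0)
n_bats, n_feat, iters = 12, X.shape[1], 20
pos = rng.random((n_bats, n_feat)) > 0.5
vel = np.zeros((n_bats, n_feat))
best = pos[np.argmax([fitness(p) for p in pos])].copy()
for _ in range(iters):
    for i in range(n_bats):
        freq = rng.random()                         # bat frequency in [0, 1]
        vel[i] += freq * (pos[i].astype(float) - best.astype(float))
        prob = 1.0 / (1.0 + np.exp(-vel[i]))        # sigmoid transfer
        cand = rng.random(n_feat) < prob
        if fitness(cand) >= fitness(pos[i]):
            pos[i] = cand
            if fitness(cand) > fitness(best):
                best = cand.copy()
print(best.sum(), "features selected, accuracy", round(fitness(best), 3))
```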

9.
We focus on a hybrid approach to feature selection. We begin our analysis with a filter model that exploits the geometrical information contained in the minimum spanning tree (MST) built on the learning set, using a statistical test of relative certainty gain within a forward selection algorithm. In the second part of the paper, we show that the MST can be replaced by the 1-nearest-neighbor graph without compromising the statistical framework. This leads to a feature selection algorithm belonging to a new category of hybrid (filter-wrapper) models. Experimental results on readily available synthetic and natural domains are presented and discussed.

10.
Software defect prediction aims to find potential defects based on historical data and software features. Software features reflect the characteristics of software modules; however, some features may be highly relevant to the class (defective or non-defective), while others are redundant or irrelevant. To fully measure the correlation between the features and the class, we present a feature selection approach based on a similarity measure (SM) for software defect prediction. First, feature weights are updated according to the similarity of samples in different classes. Second, a feature ranking list is generated by sorting the feature weights in descending order, and feature subsets are drawn from the ranking list in sequence. Finally, each feature subset is evaluated with a k-nearest neighbor (KNN) model, with classification performance measured by the area under curve (AUC) metric. Experiments conducted on 11 National Aeronautics and Space Administration (NASA) datasets show that our approach performs better than, or comparably to, the compared feature selection approaches in terms of classification performance.
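The rank-then-evaluate pipeline can be sketched end to end: score features with a similarity-based weight (here a Relief-style update, as a stand-in for the paper's SM), rank them, then score nested subsets with k-NN and AUC. The dataset and subset sizes are illustrative.

```python
# Similarity-weighted ranking followed by k-NN + AUC subset evaluation.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X = (X - X.mean(0)) / X.std(0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Relief-style weights: reward features that differ across classes more
# than within a class, using each sample's nearest hit and nearest miss.
w = np.zeros(X.shape[1])
for i in range(len(Xtr)):
    d = np.abs(Xtr - Xtr[i]).sum(1)
    d[i] = np.inf
    hit = np.argmin(np.where(ytr == ytr[i], d, np.inf))
    miss = np.argmin(np.where(ytr != ytr[i], d, np.inf))
    w += np.abs(Xtr[i] - Xtr[miss]) - np.abs(Xtr[i] - Xtr[hit])

ranking = np.argsort(-w)                       # descending weight
for k in (1, 3, 5, 10):
    feats = ranking[:k]
    clf = KNeighborsClassifier().fit(Xtr[:, feats], ytr)
    auc = roc_auc_score(yte, clf.predict_proba(Xte[:, feats])[:, 1])
    print(f"top {k:2d} features: AUC = {auc:.3f}")
```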

11.
After surveying existing feature selection procedures based on the Karhunen-Loeve (K-L) expansion, the paper describes a new K-L technique that overcomes some of the limitations of the earlier procedures. The new method takes into account information on both the class variances and the class means, placing particular emphasis on the classification potential of the latter. The results of a series of experiments on the classification of real vector-electrocardiogram data and artificially generated data demonstrate the advantages of the new method. They suggest that it is particularly useful for pattern recognition when combined with classification procedures based on discriminant functions obtained by recursive least squares analysis.

12.
13.
Axiomatic approach to feature subset selection based on relevance   (cited by 7: 0 self-citations, 7 external)
Relevance has traditionally been linked with feature subset selection, but this link has not been formalized. In this paper, we propose two axioms for feature subset selection, a sufficiency axiom and a necessity axiom, on the basis of which the link is formalized: the expected feature subset is the one that maximizes relevance. Finding the expected feature subset turns out to be NP-hard, so we devise a heuristic algorithm with polynomial time complexity to find it. The experimental results show that the algorithm finds good feature subsets which, when presented to C4.5, result in better prediction accuracy.

14.
This paper presents an informatics framework that applies feature-based engineering concepts to cost estimation, supported by data mining algorithms. The purpose of this research is to provide a practical procedure for more accurate cost estimation using the manufacturing process data commonly available in ERP systems. The proposed method combines linear regression and data-mining techniques, leverages the unique strengths of both, and creates a mechanism to discover cost features. The final estimation function takes the user's confidence in each member technique into consideration, so that the method can be phased in gradually as data mining capability is built up. A case study demonstrates the proposed framework and compares the results of empirical cost prediction and data mining. The results indicate that the combined method is flexible and promising for estimating the costs of the example welding features. In a comparison between the empirical prediction and five data mining algorithms, the ANN algorithm proves the most accurate for welding operations.
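A minimal sketch of the combination idea: blend an empirical linear regression with a data-mining model (an MLP standing in for the ANN) through a user-set confidence weight. The synthetic data, feature names and alpha value are illustrative assumptions.

```python
# Confidence-weighted blend of linear regression and an MLP cost model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.uniform(1, 10, size=(200, 3))        # e.g. weld length, thickness, passes
cost = 5 * X[:, 0] + 2 * X[:, 1] ** 1.5 + rng.normal(0, 1, 200)

lr = LinearRegression().fit(X, cost)
ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                   random_state=0).fit(X, cost)

def combined_estimate(x, alpha=0.6):
    # alpha = user's confidence in the data-mining model vs. the regression.
    x = np.atleast_2d(x)
    return alpha * ann.predict(x)[0] + (1 - alpha) * lr.predict(x)[0]

print(round(combined_estimate([5.0, 3.0, 2.0]), 2))
```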

15.
A number of software cost estimation methods have been presented in the literature over the past decades. Analogy-based estimation (ABE), which is essentially a case-based reasoning (CBR) approach, is one of the most popular. To improve the performance of ABE, many previous studies proposed approaches that optimize the weights of the project features (feature weighting) in its similarity function. However, ABE is still criticized for low prediction accuracy, a large memory requirement, and expensive computation. To alleviate these drawbacks, we propose a project selection technique for ABE (PSABE), which reduces the whole project base to a small subset consisting only of representative projects. Moreover, PSABE is combined with feature weighting to form FWPSABE for a further improvement of ABE. The proposed methods are validated on four datasets (two real-world and two artificial) and compared with conventional ABE, feature-weighted ABE (FWABE), and machine learning methods. The promising results indicate that project selection can significantly improve analogy-based models for software cost estimation.
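A sketch of the PSABE idea: shrink the historical project base to a few representative projects, then estimate by analogy as the mean effort of the nearest representatives. Picking the project closest to each k-means centre is our stand-in for the paper's selection technique; the data and cluster count are illustrative.

```python
# Project selection (cluster medoids) + analogy-based estimation.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
features = rng.uniform(0, 1, size=(100, 4))       # normalized project features
effort = 100 * features[:, 0] + 50 * features[:, 1] + rng.normal(0, 5, 100)

# Representatives: the project closest to each cluster centre.
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(features)
reps = np.array([np.argmin(((features - c) ** 2).sum(1))
                 for c in km.cluster_centers_])

def estimate(new_project, k=3):
    d = ((features[reps] - new_project) ** 2).sum(1)
    nearest = reps[np.argsort(d)[:k]]
    return effort[nearest].mean()                  # analogy: mean of k nearest

print(round(estimate(np.array([0.7, 0.4, 0.5, 0.2])), 1))
```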

16.
Software cost estimation is one of the most crucial activities in the software development process, and many estimation methods have been proposed over the past decades. Case-based reasoning (CBR) is one such technique, and feature selection is an important preprocessing stage for it. Most existing feature selection methods for case-based reasoning are 'wrappers', which usually achieve high fitting accuracy at the cost of high computational complexity and poor explanation of the selected features. In this study, mutual information based feature selection for CBR (MICBR) is proposed. This approach hybridizes the 'wrapper' and 'filter' mechanisms; filters are feature selectors with much lower complexity than wrappers, and the features they select are more likely to generalize to other conditions. MICBR is compared with popular feature selectors and with published work. The results show that MICBR is an effective feature selector for case-based reasoning, overcoming some of the limitations and computational costs of other feature selection techniques in the field.
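A hedged sketch of a filter+wrapper hybrid in the MICBR spirit: rank features by mutual information with effort (the filter), then greedily keep only those that improve a simple analogy estimator (k-NN) on a validation split (a light wrapper pass). The synthetic data and the greedy rule are illustrative, not the paper's exact procedure.

```python
# MI filter ranking followed by a light wrapper pass over the ranking.
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, size=(200, 8))
effort = 80 * X[:, 0] + 40 * X[:, 2] + rng.normal(0, 4, 200)
Xtr, Xva, ytr, yva = train_test_split(X, effort, random_state=0)

ranking = np.argsort(-mutual_info_regression(Xtr, ytr, random_state=0))

kept, best_err = [], np.inf
for f in ranking:                     # wrapper pass over the MI ranking
    trial = kept + [int(f)]
    est = KNeighborsRegressor(n_neighbors=3).fit(Xtr[:, trial], ytr)
    err = mean_absolute_error(yva, est.predict(Xva[:, trial]))
    if err < best_err:
        kept, best_err = trial, err
print("kept features:", kept, "MAE:", round(best_err, 2))
```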

17.
In computer-aided medical systems, many practical classification applications face a massive growth in the collection and storage of data, especially in areas such as predicting medical test efficiency, classifying tumors and detecting cancers. Data with known class labels (labeled data) can be limited, while unlabeled data (with unknown class labels) are more readily available. Semi-supervised learning provides methods for exploiting unlabeled data in addition to labeled data to improve classification performance. In this paper, we consider the problem of using a large amount of unlabeled data to improve the efficiency of feature selection in high-dimensional datasets when only a small set of labeled examples is available. We propose a new semi-supervised feature evaluation method called Optimized co-Forest for Feature Selection (OFFS), which combines ideas from co-forest with the embedded selection principle of Random Forest based on permutation of the out-of-bag set. We provide empirical results on several medical and biological benchmark datasets, indicating an overall significant improvement of OFFS over four other filter, wrapper and embedded feature selection approaches in the semi-supervised setting. Our method proves its ability to select important features, improving the performance of the hypothesis learned from a small amount of labeled samples by exploiting unlabeled ones.
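The embedded ingredient OFFS borrows from Random Forest can be shown in isolation: permutation-based feature importance. Here it is approximated with scikit-learn's permutation_importance on held-out data, since per-tree out-of-bag access is not directly exposed; the semi-supervised co-forest loop is omitted.

```python
# Random-Forest permutation importance (out-of-bag-style) in isolation.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            random_state=0).fit(Xtr, ytr)
imp = permutation_importance(rf, Xte, yte, n_repeats=10, random_state=0)
top = np.argsort(-imp.importances_mean)[:5]
print("OOB accuracy:", round(rf.oob_score_, 3), "top features:", top)
```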

18.
Feature selection is viewed as an important preprocessing step for pattern recognition, machine learning and data mining. Traditional hill-climbing search approaches to feature selection have difficulty finding optimal reducts, while current stochastic search strategies, such as GA, ACO and PSO, provide more robust solutions but at the expense of increased computational effort. It is therefore necessary to investigate fast and effective search algorithms. Rough set theory provides a mathematical tool to discover data dependencies and reduce the number of features in a dataset by purely structural methods. In this paper, we define a structure called the power set tree (PS-tree), an ordered tree representing the power set, in which each possible reduct is mapped to a node of the tree. We then present a rough set approach to feature selection based on the PS-tree, give two kinds of pruning rules for it, and propose two novel feature selection algorithms built on it. Experimental results demonstrate that our algorithms are effective and efficient.
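A toy sketch of rough-set reduct search by pruned power-set enumeration. The dependency function is the standard rough-set degree of dependency (the share of rows whose attribute-value combination maps to a single class); the smallest-first enumeration with early exit is a simplification of the PS-tree and its pruning rules.

```python
# Rough-set dependency and minimal-reduct search over the power set.
from itertools import combinations
from collections import defaultdict

def dependency(rows, labels, attrs):
    groups = defaultdict(set)
    for row, lab in zip(rows, labels):
        groups[tuple(row[a] for a in attrs)].add(lab)
    consistent = sum(1 for row, lab in zip(rows, labels)
                     if len(groups[tuple(row[a] for a in attrs)]) == 1)
    return consistent / len(rows)

rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (0, 0, 0)]
labels = [0, 0, 1, 1, 1]
n = len(rows[0])
full = dependency(rows, labels, range(n))
# Enumerate subsets smallest-first; the first subsets matching the full
# dependency are minimal reducts, so larger sizes are pruned away.
for size in range(1, n + 1):
    reducts = [s for s in combinations(range(n), size)
               if dependency(rows, labels, s) == full]
    if reducts:
        print("minimal reducts:", reducts)
        break
```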

19.
Rough set theory is one of the effective methods for feature selection that can preserve the meaning of the features; the essence of the rough set approach to feature selection is to find a subset of the original features. Since finding a minimal feature subset is an NP-hard problem, it is necessary to investigate effective and efficient heuristic algorithms. Ant colony optimization (ACO) has been applied successfully to many difficult combinatorial problems such as quadratic assignment, traveling salesman and scheduling. It is particularly attractive for feature selection because no heuristic information is available that can guide the search to the optimal minimal subset every time; instead, ants discover good feature combinations as they traverse the graph. In this paper, we propose a new rough set approach to feature selection based on ACO that adopts mutual-information-based feature significance as heuristic information, and we give a novel feature selection algorithm. Whereas the ACO-based approach of Jensen and Shen starts from a random feature, our approach starts from the feature core, which shrinks the complete graph to a smaller one. To verify the efficiency of our algorithm, experiments are carried out on standard UCI datasets. The results demonstrate that our algorithm provides an efficient solution for finding a minimal feature subset.
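A compact ACO sketch for feature selection: ants extend subsets feature by feature with probability proportional to pheromone times a heuristic (mutual information with the class, as in the abstract), and pheromone is reinforced on the best subset found. The quality surrogate, parameters and fixed subset size are illustrative assumptions; the paper's core-based start and rough-set evaluation are omitted.

```python
# ACO-style feature selection with a mutual-information heuristic.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
n = X.shape[1]
mi = mutual_info_classif(X, y, random_state=0) + 1e-9   # heuristic eta
tau = np.ones(n)                                        # pheromone
rng = np.random.default_rng(7)

def subset_quality(feats):
    # Cheap filter surrogate: summed MI, penalized by subset size.
    return mi[feats].sum() - 0.02 * len(feats)

best, best_q = None, -np.inf
for _ in range(30):                       # iterations
    for _ant in range(10):                # ants per iteration
        feats, avail = [], list(range(n))
        for _step in range(8):            # bounded subset size
            p = tau[avail] * mi[avail]
            f = rng.choice(avail, p=p / p.sum())
            feats.append(int(f)); avail.remove(int(f))
        q = subset_quality(feats)
        if q > best_q:
            best, best_q = feats, q
    tau *= 0.9                            # evaporation
    tau[best] += best_q                   # reinforce the best subset
print(sorted(best))
```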

20.
Medical datasets are often characterized by a large number of disease measurements and a relatively small number of patient records. Not all of these measurements (features) are important; some are irrelevant or noisy. Such features can be especially harmful with relatively small training sets, where their irrelevancy and redundancy are harder to evaluate, and their sheer number also raises the memory cost of representing the dataset. Feature Selection (FS) addresses this by finding a subset of prominent features that improves predictive accuracy and removes the redundant ones: the learning model obtains a concise structure, built using only the selected prominent features, without forfeiting predictive accuracy. FS is therefore an essential part of knowledge discovery. In this study, new supervised feature selection methods based on hybridizations of Particle Swarm Optimization (PSO), PSO-based Relative Reduct (PSO-RR) and PSO-based Quick Reduct (PSO-QR) are presented for disease diagnosis. Experimental results on several standard medical datasets demonstrate the efficiency of the proposed technique and its improvements over existing feature selection techniques.
