首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
《Pattern recognition letters》1999,20(11-13):1149-1156
Nearest neighbor classifiers demand significant computational resources (time and memory). Editing of the reference set and feature selection are two different approaches to this problem. Here we encode the two approaches within the same genetic algorithm (GA) and simultaneously select features and reference cases. Two data sets were used: the SATIMAGE data and a generated data set. The GA was found to be an expedient solution compared to editing followed by feature selection, feature selection followed by editing, and the individual results from feature selection and editing.  相似文献   

2.
黄宇扬  董明刚  敬超 《计算机应用》2018,38(11):3112-3118
针对传统的实例选择算法会误删训练集中非噪声样本、算法效率低的不足,提出了一种面向K最近邻(KNN)的遗传实例选择算法。该算法采用基于决策树和遗传算法的二阶段筛选机制,先使用决策树确定噪声样本存在的范围;再使用遗传算法在该范围内精确删除噪声样本,可有效地降低误删率并提高效率,采用基于最近邻规则的验证集选择策略,进一步提高了遗传算法实例选择的准确度;最后引进基于均方误差(MSE)的分类精度惩罚函数来计算遗传算法中个体的适应度,提高有效性和稳定性。在20个数据集上,该方法相较于基于预分类的KNN (PRKNN)、基于协同进化的实例特征选择算法(IFS-CoCo)、K最近邻(KNN),在分类精度上的提升分别为0.07~26.9个百分点、0.03~11.8个百分点、0.2~12.64个百分点,在AUC和Kappa的上的提升分别为0.25~18.32个百分点、1.27~23.29个百分点、0.04~12.82个百分点。实验结果表明,该方法相较于当前实例选择算法在分类精度和分类效率上均具有优势。  相似文献   

3.
Similarity searching often reduces to finding the k nearest neighbors to a query object. Finding the k nearest neighbors is achieved by applying either a depth- first or a best-first algorithm to the search hierarchy containing the data. These algorithms are generally applicable to any index based on hierarchical clustering. The idea is that the data is partitioned into clusters which are aggregated to form other clusters, with the total aggregation being represented as a tree. These algorithms have traditionally used a lower bound corresponding to the minimum distance at which a nearest neighbor can be found (termed MinDist) to prune the search process by avoiding the processing of some of the clusters as well as individual objects when they can be shown to be farther from the query object q than all of the current k nearest neighbors of q. An alternative pruning technique that uses an upper bound corresponding to the maximum possible distance at which a nearest neighbor is guaranteed to be found (termed MaxNearestDist) is described. The MaxNearestDist upper bound is adapted to enable its use for finding the k nearest neighbors instead of just the nearest neighbor (i.e., k=1) as in its previous uses. Both the depth-first and best-first k-nearest neighbor algorithms are modified to use MaxNearestDist, which is shown to enhance both algorithms by overcoming their shortcomings. In particular, for the depth-first algorithm, the number of clusters in the search hierarchy that must be examined is not increased thereby potentially lowering its execution time, while for the best-first algorithm, the number of clusters in the search hierarchy that must be retained in the priority queue used to control the ordering of processing of the clusters is also not increased, thereby potentially lowering its storage requirements.  相似文献   

4.
周靖  刘晋胜 《计算机应用》2011,31(7):1785-1788
特征参数分类泛化性差及分类计算量大影响着K近邻(KNN)的分类性能。提出了一种降维条件下基于联合熵的改进KNN算法,其具体思路是,通过计算任意两个条件属性下对应的特征参数的联合熵衡量数据特征针对分类影响程度的大小,建立特征分类特性与具体分类过程的内在联系,并给出根据特征联合熵集约简条件属性的方法。理论分析与仿真实验表明,与经典KNN等算法相比,提出的算法具有更高的分类性能。  相似文献   

5.
Cooperative coevolution is a successful trend of evolutionary computation which allows us to define partitions of the domain of a given problem, or to integrate several related techniques into one, by the use of evolutionary algorithms. It is possible to apply it to the development of advanced classification methods, which integrate several machine learning techniques into a single proposal. A novel approach integrating instance selection, instance weighting, and feature weighting into the framework of a coevolutionary model is presented in this paper. We compare it with a wide range of evolutionary and nonevolutionary related methods, in order to show the benefits of the employment of coevolution to apply the techniques considered simultaneously. The results obtained, contrasted through nonparametric statistical tests, show that our proposal outperforms other methods in the comparison, thus becoming a suitable tool in the task of enhancing the nearest neighbor classifier.  相似文献   

6.
为了有效提高文本分类的效率,提出了一种基于语义相似的改进KNN算法.该算法结合了特征词的语义和文本的特征位串,由于考虑到文本向量中同义的关联特征词对文本相似度的贡献,有效地提高了文本分类的准确率和召回率;而基于文本特征位串进行的位计算方法,能从大量的训练文本集中筛选出可能的相似文本,较好地克服了KNN算法计算量大的问题.算法的分析与实验表明,改进的算法明显提高了KNN的计算效率,同时也提高了分类的准确率和召回率.  相似文献   

7.
Features selection is the process of choosing the relevant subset of features from the high-dimensional dataset to enhance the performance of the classifier. Much research has been carried out in the present world for the process of feature selection. Algorithms such as Naïve Bayes (NB), decision tree, and genetic algorithm are applied to the high-dimensional dataset to select the relevant features and also to increase the computational speed. The proposed model presents a solution for selection of features using ensemble classifier algorithms. The proposed algorithm is the combination of minimum redundancy and maximum relevance (mRMR) and forest optimization algorithm (FOA). Ensemble-based algorithms such as support vector machine (SVM), K-nearest neighbor (KNN), and NB is further used to enhance the performance of the classifier algorithm. The mRMR-FOA is used to select the relevant features from the various datasets and 21% to 24% improvement is recorded in the feature selection. The ensemble classifier algorithms further improves the performance of the algorithm and provides accuracy of 96%.  相似文献   

8.
The problem addressed in this letter concerns the multiclassifier generation by a random subspace method (RSM). In the RSM, the classifiers are constructed in random subspaces of the data feature space. In this letter, we propose an evolved feature weighting approach: in each subspace, the features are multiplied by a weight factor for minimizing the error rate in the training set. An efficient method based on particle swarm optimization (PSO) is here proposed for finding a set of weights for each feature in each subspace. The performance improvement with respect to the state-of-the-art approaches is validated through experiments with several benchmark data sets.  相似文献   

9.
Simultaneous feature selection and clustering using mixture models   总被引:6,自引:0,他引:6  
Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.  相似文献   

10.
The process of placing a separating hyperplane for data classification is normally disconnected from the process of selecting the features to use. An approach for feature selection that is conceptually simple but computationally explosive is to simply apply the hyperplane placement process to all possible subsets of features, selecting the smallest set of features that provides reasonable classification accuracy. Two ways to speed this process are (i) use a faster filtering criterion instead of a complete hyperplane placement, and (ii) use a greedy forward or backwards sequential selection method. This paper introduces a new filtering criterion that is very fast: maximizing the drop in the sum of infeasibilities in a linear-programming transformation of the problem. It also shows how the linear programming transformation can be applied to reduce the number of features after a separating hyperplane has already been placed while maintaining the separation that was originally induced by the hyperplane. Finally, a new and highly effective integrated method that simultaneously selects features while placing the separating hyperplane is introduced.  相似文献   

11.
属性加权的朴素贝叶斯集成分类器   总被引:2,自引:1,他引:1  
为提高朴素贝叶斯分类器的分类精度和泛化能力,提出了基于属性相关性的加权贝叶斯集成方法(WEBNC)。根据每个条件属性与决策属性的相关度对其赋以相应的权值,然后用AdaBoost训练属性加权后的BNC。该分类方法在16个UCI标准数据集上进行了测试,并与BNC、贝叶斯网和由AdaBoost训练出的BNC进行比较,实验结果表明,该分类器具有更高的分类精度与泛化能力。  相似文献   

12.
针对多峰函数的全局优化问题,提出了混合禁忌搜索的全局优化方法,通过实例验证了所提出的算法的可行性、有效性并且收敛速度较快.  相似文献   

13.
In this paper a new framework based on multiobjective optimization (MOO), namely FeaClusMOO, is proposed which is capable of identifying the correct partitioning as well as the most relevant set of features from a data set. A newly developed multiobjective simulated annealing based optimization technique namely archived multiobjective simulated annealing (AMOSA) is used as the background strategy for optimization. Here features and cluster centers are encoded in the form of a string. As the objective functions, two internal cluster validity indices measuring the goodness of the obtained partitioning using Euclidean distance and point symmetry based distance, respectively, and a count on the number of features are utilized. These three objectives are optimized simultaneously using AMOSA in order to detect the appropriate subset of features, appropriate number of clusters as well as the appropriate partitioning. Points are allocated to different clusters using a point symmetry based distance. Mutation changes the feature combination as well as the set of cluster centers. Since AMOSA, like any other MOO technique, provides a set of solutions on the final Pareto front, a technique based on the concept of semi-supervised classification is developed to select a solution from the given set. The effectiveness of the proposed FeaClustMOO in comparison with other clustering techniques like its Euclidean distance based version where Euclidean distance is used for cluster assignment, a genetic algorithm based automatic clustering technique (VGAPS-clustering) using point symmetry based distance with all the features, K-means clustering technique with all features is shown for seven higher dimensional data sets obtained from real-life.  相似文献   

14.
We introduce an embedded method that simultaneously selects relevant features during classifier construction by penalizing each feature’s use in the dual formulation of support vector machines (SVM). This approach called kernel-penalized SVM (KP-SVM) optimizes the shape of an anisotropic RBF Kernel eliminating features that have low relevance for the classifier. Additionally, KP-SVM employs an explicit stopping condition, avoiding the elimination of features that would negatively affect the classifier’s performance. We performed experiments on four real-world benchmark problems comparing our approach with well-known feature selection techniques. KP-SVM outperformed the alternative approaches and determined consistently fewer relevant features.  相似文献   

15.
Hand gesture recognition provides an alternative way to many devices for human computer interaction. In this work, we have developed a classifier fusion based dynamic free-air hand gesture recognition system to identify the isolated gestures. Different users gesticulate at different speed for the same gesture. Hence, when comparing different samples of the same gesture, variations due to difference in gesturing speed should not contribute to the dissimilarity score. Thus, we have introduced a two-level speed normalization procedure using DTW and Euclidean distance-based techniques. Three features such as ‘orientation between consecutive points’, ‘speed’ and ‘orientation between first and every trajectory points’ were used for the speed normalization. Moreover, in feature extraction stage, 44 features were selected from the existing literatures. Use of total feature set could lead to overfitting, information redundancy and may increase the computational complexity due to higher dimension. Thus, we have tried to overcome this difficulty by selecting optimal set of features using analysis of variance and incremental feature selection techniques. The performance of the system was evaluated using this optimal set of features for different individual classifiers such as ANN, SVM, k-NN and Naïve Bayes. Finally, the decisions of the individual classifiers were combined using classifier fusion model. Based on the experimental results it may be concluded that classifier fusion provides satisfactory results compared to other individual classifiers. An accuracy of 94.78 % was achieved using the classifier fusion technique as compared to baseline CRF (85.07 %) and HCRF (89.91 %) models.  相似文献   

16.
This study investigates stock market indices prediction that is an interesting and important research in the areas of investment and applications, as it can get more profits and returns at lower risk rate with effective exchange strategies. To realize accurate prediction, various methods have been tried, among which the machine learning methods have drawn attention and been developed. In this paper, we propose a basic hybridized framework of the feature weighted support vector machine as well as feature weighted K-nearest neighbor to effectively predict stock market indices. We first establish a detailed theory of feature weighted SVM for the data classification assigning different weights for different features with respect to the classification importance. Then, to get the weights, we estimate the importance of each feature by computing the information gain. Lastly, we use feature weighted K-nearest neighbor to predict future stock market indices by computing k weighted nearest neighbors from the historical dataset. Experiment results on two well known Chinese stock market indices like Shanghai and Shenzhen stock exchange indices are finally presented to test the performance of our established model. With our proposed model, it can achieve a better prediction capability to Shanghai Stock Exchange Composite Index and Shenzhen Stock Exchange Component Index in the short, medium and long term respectively. The proposed algorithm can also be adapted to other stock market indices prediction.  相似文献   

17.
The capacitated clustering problem (CCP) is the problem in which a given set of weighted objects is to be partitioned into clusters so that the total weight of objects in each cluster is less than a given value (cluster ‘capacity’). The objective is to minimize the total scatter of objects from the ‘centre’ of the cluster to which they have been allocated. A simple constructive heuristic, a R-interchange generation mechanism, a hybrid simulated annealing (SA) and tabu search (TS) algorithm which has computationally desirable features using a new non-monotonic cooling schedule, are developed. A classification of the existing SA cooling schedules is presented. The effects on the final solution quality of the initial solutions, the cooling schedule parameters and the neighbourhood search strategies are investigated. Computational results on randomly generated problems with size ranging from 50 to 100 customers indicate that the hybrid SA/TS algorithm out-performs previous simulated annealing algorithms, a simple tabu search and local descent algorithms.  相似文献   

18.
视觉显著性度量是图像显著区域提取中的一个关键问题,现有的方法主要根据图像的底层视觉特征,构造相应的显著图。不同的特征对视觉显著性的贡献是不同的,为此提出一种能够自动进行特征选择和加权的图像显著区域检测方法。提取图像的亮度、颜色和方向等特征,构造相应的特征显著图。提出一种新的特征融合策略,动态计算各特征显著图的权值,整合得到最终的显著图,检测出图像中的显著区域。在多幅自然图像上进行实验,实验结果表明,该方法在运算速度和检测效果方面都取得了不错的效果。  相似文献   

19.
为了追求节能减排与净利润最大化,建立一种置换流水车间订单接受与调度模型。禁忌搜索是一类启发式全局搜索算法,传统禁忌搜索对初始解依赖较大,没有对考虑能效的置换流水车间调度问题进行更深入的优化。鉴于问题的复杂性,提出了一种节能混合禁忌搜索算法,结合了NEH构造启发式算法的优势,并在该算法中设计了订单接受与拒绝编码方式、能耗调整与交货期配置策略。最后采用大量随机实例对性能进行分析。实验结果表明,通过上述改进,改善了算法的全局搜索能力与解决复杂模型的寻优能力,节能混合禁忌搜索较单一算法而言性能更优,可以有效增加企业总净利润,降低能源消耗。  相似文献   

20.
Many recent image retrieval methods are based on the “bag-of-words” (BoW) model with some additional spatial consistency checking. This paper proposes a more accurate similarity measurement that takes into account spatial layout of visual words in an offline manner. The similarity measurement is embedded in the standard pipeline of the BoW model, and improves two features of the model: i) latent visual words are added to a query based on spatial co-occurrence, to improve query recall; and ii) weights of reliable visual words are increased to improve the precision. The combination of these methods leads to a more accurate measurement of image similarity. This is similar in concept to the combination of query expansion and spatial verification, but does not require query time processing, which is too expensive to apply to full list of ranked results. Experimental results demonstrate the effectiveness of our proposed method on three public datasets.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号