首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 93 毫秒
1.
Feature weighting is an aspect of increasing importance in clustering because data are becoming more and more complex. In this paper, we propose new feature weighting methods based on genetic algorithms. These methods use the cost function defined in LKM as a fitness function. We present new methods based on Darwinian, Lamarckian, and Baldwinian evolution. For each one of them, we describe evolutionary and coevolutionary versions. We compare classical hill-climbing optimization with these six genetic algorithms on different datasets. The results show that the proposed methods, except Darwinian methods, are always better than the LKM algorithm.  相似文献   

2.
爬山法是一种局部搜索能力相当好的算法,主要是因为它是通过个体的优劣信息来引导搜索的。而传统的遗传算法作为一种全局搜索算法,在搜索过程中却没有考虑个体间的信息,而仅依靠个体适应度来引导搜索,使得算法的收敛性受到限制。将定向爬山机制应用于遗传算法,提出了一种基于定向爬山的遗传算法(OHCGA)。该算法结合了爬山法与遗传算法的优点,通过比较个体的优劣,使用定向爬山操作引导算法向更优秀的解区域进行搜索。实验结果表明,与传统遗传算法(TGA)相比,OHCGA较大地提高了算法的收敛速度和搜索最优解的能力。  相似文献   

3.
针对计算机辅助诊断(CAD)技术在乳腺癌疾病诊断准确率的优化问题,提出了一种基于随机森林模型下Gini指标特征加权的支持向量机方法(RFG-SVM)。该方法利用了随机森林模型下的Gini指数衡量各个特征对分类结果的重要性,构造具有加权特征向量核函数的支持向量机,并在乳腺癌疾病诊断方面加以应用。经理论分析和实验数据验证,相比于传统的支持向量机(SVM),该方法提升了分类预测的性能,其结果与最新的方法相比也具有一定的竞争力,而且在医疗诊断应用方面更具优势。  相似文献   

4.
翟俊海    刘博  张素芳 《智能系统学报》2017,12(3):397-404
特征选择是指从初始特征全集中,依据既定规则筛选出特征子集的过程,是数据挖掘的重要预处理步骤。通过剔除冗余属性,以达到降低算法复杂度和提高算法性能的目的。针对离散值特征选择问题,提出了一种将粗糙集相对分类信息熵和粒子群算法相结合的特征选择方法,依托粒子群算法,以相对分类信息熵作为适应度函数,并与其他基于进化算法的特征选择方法进行了实验比较,实验结果表明本文提出的方法具有一定的优势。  相似文献   

5.
Many lazy learning algorithms are derivatives of the k-nearest neighbor (k-NN) classifier, which uses a distance function to generate predictions from stored instances. Several studies have shown that k-NN's performance is highly sensitive to the definition of its distance function. Many k-NN variants have been proposed to reduce this sensitivity by parameterizing the distance function with feature weights. However, these variants have not been categorized nor empirically compared. This paper reviews a class of weight-setting methods for lazy learning algorithms. We introduce a framework for distinguishing these methods and empirically compare them. We observed four trends from our experiments and conducted further studies to highlight them. Our results suggest that methods which use performance feedback to assign weight settings demonstrated three advantages over other methods: they require less pre-processing, perform better in the presence of interacting features, and generally require less training data to learn good settings. We also found that continuous weighting methods tend to outperform feature selection algorithms for tasks where some features are useful but less important than others.  相似文献   

6.
K-modes算法中原有的分类变量间距离度量方法无法体现属性值之间差异,对此提出了一种基于朴素贝叶斯分类器中间运算结果的距离度量。该度量构建代表分类变量的特征向量并计算向量间的欧氏距离作为变量间的距离。将提出的距离度量代入K-modes聚类算法并在多个UCI公共数据集上与其他度量方法进行比较,实验结果表明该距离度量更加有效。  相似文献   

7.
The Nearest Neighbor rule is one of the most successful classifiers in machine learning. However, it is very sensitive to noisy, redundant and irrelevant features, which may cause its performance to deteriorate. Feature weighting methods try to overcome this problem by incorporating weights into the similarity function to increase or reduce the importance of each feature, according to how they behave in the classification task. This paper proposes a new feature weighting classifier, in which the computation of the weights is based on a novel idea combining imputation methods – used to estimate a new distribution of values for each feature based on the rest of the data – and the Kolmogorov–Smirnov nonparametric statistical test to measure the changes between the original and imputed distribution of values. This proposal is compared with classic and recent feature weighting methods. The experimental results show that our feature weighting scheme is very resilient to the choice of imputation method and is an effective way of improving the performance of the Nearest Neighbor classifier, outperforming the rest of the classifiers considered in the comparisons.  相似文献   

8.
In the rapidly evolving field of genomics, many clustering and classification methods have been developed and employed to explore patterns in gene expression data. Biologists face the choice of which clustering algorithm(s) to use and how to interpret different results from various clustering algorithms. No clear objective criteria have been developed to assess the agreement and compare the results from different clustering methods. We describe two generally applicable objective measures to quantify agreement between different clustering methods. These two measures are referred to as the local agreement measure, which is defined for each gene/subject, and the global agreement measure, which is defined for the whole gene expression experiment. The agreement measures are based on a probabilistic weighting scheme applied to the number of concordant and discordant pairs from two clustering methods. In the comparison and assessment process, newly-developed concepts are implemented under the framework of reliability of a cluster. The algorithms are illustrated by simulations and then applied to a yeast sporulation gene expression microarray data. Analysis of the sporulation data identified ∼5% (23 of 477) genes which were not consistently clustered using a neural net algorithm and K-means or pam. The two agreement measures provide objective criteria to conclude whether or not two clustering methods agree with each other. Using the local agreement measure, genes of unknown function which cluster consistently can more confidently be assigned functions based on co-regulation.  相似文献   

9.
Feature selection is viewed as an important preprocessing step for pattern recognition, machine learning and data mining. Traditional hill-climbing search approaches to feature selection have difficulties to find optimal reducts. And the current stochastic search strategies, such as GA, ACO and PSO, provide a more robust solution but at the expense of increased computational effort. It is necessary to investigate fast and effective search algorithms. Rough set theory provides a mathematical tool to discover data dependencies and reduce the number of features contained in a dataset by purely structural methods. In this paper, we define a structure called power set tree (PS-tree), which is an order tree representing the power set, and each possible reduct is mapped to a node of the tree. Then, we present a rough set approach to feature selection based on PS-tree. Two kinds of pruning rules for PS-tree are given. And two novel feature selection algorithms based on PS-tree are also given. Experiment results demonstrate that our algorithms are effective and efficient.  相似文献   

10.
林梦雷  刘景华  王晨曦  林耀进 《计算机科学》2017,44(10):289-295, 317
在多标记学习中,特征选择是解决多标记数据高维性的有效手段。每个标记对样本的可分性程度不同,这可能会为多标记学习提供一定的信息。基于这一假设,提出了一种基于标记权重的多标记特征选择算法。该算法首先利用样本在整个特征空间的分类间隔对标记进行加权,然后将特征在整个标记集合下对样本的可区分性作为特征权重,以此衡量特征对标记集合的重要性。最后,根据特征权重对特征进行降序排列,从而得到一组新的特征排序。在6个多标记数据集和4个评价指标上的实验结果表明,所提算法优于一些当前流行的多标记特征选择算法。  相似文献   

11.
In this paper, we introduce new algorithms that perform clustering and feature weighting simultaneously and in an unsupervised manner. The proposed algorithms are computationally and implementationally simple, and learn a different set of feature weights for each identified cluster. The cluster dependent feature weights offer two advantages. First, they guide the clustering process to partition the data set into more meaningful clusters. Second, they can be used in the subsequent steps of a learning system to improve its learning behavior. An extension of the algorithm to deal with an unknown number of clusters is also proposed. The extension is based on competitive agglomeration, whereby the number of clusters is over-specified, and adjacent clusters are allowed to compete for data points in a manner that causes clusters which lose in the competition to gradually become depleted and vanish. We illustrate the performance of the proposed approach by using it to segment color images, and to build a nearest prototype classifier.  相似文献   

12.
There is substantial research into genetic algorithms that are used to group large numbers of objects into mutually exclusive subsets based upon some fitness function. However, nearly all methods involve degeneracy to some degree.We introduce a new representation for grouping genetic algorithms, the restricted growth function genetic algorithm, that effectively removes all degeneracy, resulting in a more efficient search. A new crossover operator is also described that exploits a measure of similarity between chromosomes in a population. Using several synthetic datasets, we compare the performance of our representation and crossover with another well known state-of-the-art GA method, a strawman optimisation method and a well-established statistical clustering algorithm, with encouraging results.  相似文献   

13.
孙林  赵婧  徐久成  王欣雅 《计算机应用》2022,42(5):1355-1366
针对经典的帝王蝶优化(MBO)算法不能很好地处理连续型数据,以及粗糙集模型对于大规模、高维复杂的数据处理能力不足等问题,提出了基于邻域粗糙集(NRS)和MBO的特征选择算法。首先,将局部扰动和群体划分策略与MBO算法结合,并构建传输机制以形成一种二进制MBO(BMBO)算法;其次,引入突变算子增强算法的探索能力,设计了基于突变算子的BMBO(BMBOM)算法;然后,基于NRS的邻域度构造适应度函数,并对初始化的特征子集的适应度值进行评估并排序;最后,使用BMBOM算法通过不断迭代搜索出最优特征子集,并设计了一种元启发式特征选择算法。在基准函数上评估BMBOM算法的优化性能,并在UCI数据集上评价所提出的特征选择算法的分类能力。实验结果表明,在5个基准函数上,BMBOM算法的最优值、最差值、平均值以及标准差明显优于MBO和粒子群优化(PSO)算法;在UCI数据集上,与基于粗糙集的优化特征选择算法、结合粗糙集与优化算法的特征选择算法、结合NRS与优化算法的特征选择算法、基于二进制灰狼优化的特征选择算法相比,所提特征选择算法在分类精度、所选特征数和适应度值这3个指标上表现良好,能够选择特征数少且分类精度高的最优特征子集。  相似文献   

14.
为了提高大数据挖掘的效率及准确度,文中将稀疏表示和特征加权运用于大数据处理过程中。首先,采用求解线性方程稀疏解的方式对大数据进行特征分类,在稀疏解的求解过程中利用向量的范数将此过程转化为最优化目标函数的求解。在完成特征分类后进行特征提取以降低数据维度,最后充分结合数据在类中的分布情况进行有效加权来实现大数据挖掘。实验结果表明,相比于常见的特征提取和特征加权算法,提出的算法在查全率和查准率方面均呈现出明显优势。  相似文献   

15.
一种改进的基于特征赋权的K均值聚类算法   总被引:2,自引:0,他引:2  
聚类分析是数据挖掘及机器学习领域内的重点问题之一。近年来,为了提高聚类质量,借鉴和引入了分类领域特征选择及特征赋权思想,提出了一些基于特征赋权的聚类算法。在这些研究基础上,本文提出了一种基于密度的初始中心点选择算法,并借鉴文[1]所提出的特征赋权方法,给出了一种改进的基于特征赋权的K均值算法。实验表明该算法能较为稳定地得到较高质量的聚类结果。  相似文献   

16.
面对海量数据的管理和分析,文本自动分类技术必不可少。特征权重计算是文本分类过程的基础,一个好的特征权重算法能够明显提升文本分类的性能。本文对比了多种不同的特征权重算法,并针对前人算法的不足,提出了基于文档类密度的特征权重算法(tf-idcd)。该算法不仅包括传统的词频度量,还提出了一个新的概念,文档类密度,它通过计算类内包含特征的文档数和类内总文档数的比值来度量。最后,本文在两个中文常见数据集上对五种算法进行实验对比。实验结果显示,本文提出的算法相比较其他特征权重算法在F1宏平均和F1微平均上都有较大的提升。  相似文献   

17.
Evolutionary computation is a research field dealing with black-box and complex optimization problems whose fitness landscapes are usually unknown in advance. It is difficult to select an appropriate evolutionary algorithm and parameters for a given problem due to the black-box setting although many evolutionary algorithms have been developed. In this context, several landscape features have been proposed and their usefulness examined for understanding the problem. In this paper, we propose a novel feature vector by focusing on the local landscape in order to characterize the fitness landscape. The proposed landscape features are a vector form and composed of a histogram of quantized local landscape features. We introduce two implementation methods of this concept, called the bag of local landscape patterns (BoLLP) and the bag of evolvability (BoEvo). The BoLLP uses the fitness pattern of the neighbors of a certain candidate solution, and the BoEvo uses the number of better candidate solutions in the neighbors as the local landscape features. Furthermore, the hierarchical versions of the BoLLP and the BoEvo, concatenated feature vectors with different sample sizes, are considered to capture the landscape characteristic with various resolutions. We extract the proposed landscape feature vectors from well-known continuous optimization benchmark functions and the BBOB benchmark function set to investigate their properties; the visualization of the proposed landscape features, clustering and running time prediction experiments are conducted. Then the effectiveness of the proposed landscape features for the fitness landscape analysis is discussed based on the experimental results.  相似文献   

18.
随着各类生物智能演化算法的日益成熟,基于演化技术及其混合算法的特征选择方法不断涌现。针对高维小样本安全数据的特征选择问题,将文化基因算法(Memetic Algorithm,MA)与最小二乘支持向量机(Least Squares Support Vector Machine,LS-SVM)进行结合,设计了一种封装式(Wrapper)特征选择方法(MA-LSSVM)。该方法利用最小二乘支持向量机易于求解的特点来构造分类器,以分类的准确率作为文化基因算法寻优过程中适应度函数的主要成分。实验表明,MA-LSSVM可以较高效地、稳定地获取对分类贡献较大的特征,降低了数据维度,提高了分类效率。  相似文献   

19.
Feature weighting is a vital step in machine learning tasks that is used to approximate the optimal degree of influence of individual features. Because the salience of a feature can be changed by different queries, the majority of existing methods are not sensitive enough to describe the effectiveness of features. We suggest dynamic weights, which are dynamically sensitive to the effectiveness of features. In order to achieve this, we propose a differentiable feature weighting function that dynamically assigns proper weights for each feature, based on the distinct feature values of the query and the instance. The proposed weighting function, which is an extension of our previous work, is suitable for both single-modal and multi-modal weighting problems, and, hence, is referred to as a General Weighting Function. The number of parameters of the proposed weighting function is fewer compared to the ordinary weighting methods. To show the performance of the General Weighting Function, we proposed a classification algorithm based on the notion of dynamic weights, which is optimized for one nearest neighbor algorithm. The experimental results show that the proposed method outperforms the ordinary feature weighting methods.  相似文献   

20.
This paper proposes three feature selection algorithms with feature weight scheme and dynamic dimension reduction for the text document clustering problem. Text document clustering is a new trend in text mining; in this process, text documents are separated into several coherent clusters according to carefully selected informative features by using proper evaluation function, which usually depends on term frequency. Informative features in each document are selected using feature selection methods. Genetic algorithm (GA), harmony search (HS) algorithm, and particle swarm optimization (PSO) algorithm are the most successful feature selection methods established using a novel weighting scheme, namely, length feature weight (LFW), which depends on term frequency and appearance of features in other documents. A new dynamic dimension reduction (DDR) method is also provided to reduce the number of features used in clustering and thus improve the performance of the algorithms. Finally, k-mean, which is a popular clustering method, is used to cluster the set of text documents based on the terms (or features) obtained by dynamic reduction. Seven text mining benchmark text datasets of different sizes and complexities are evaluated. Analysis with k-mean shows that particle swarm optimization with length feature weight and dynamic reduction produces the optimal outcomes for almost all datasets tested. This paper provides new alternatives for text mining community to cluster text documents by using cohesive and informative features.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号