Similar Articles
 20 similar articles found (search time: 0 ms)
1.
Neural Computing and Applications - Ant colony optimization (ACO) is a well-explored meta-heuristic algorithm, among whose many applications feature selection (FS) is an important one. Most...

2.
A technique for feature selection in multiclass problems
One of the main phases in the development of a system for the classification of remote sensing images is the definition of an effective set of features to be given as input to the classifier. In particular, it is often useful to reduce the number of features available, while saving the possibility to discriminate among the different land-cover classes to be recognized. This paper addresses this topic with reference to applications that involve more than two land-cover classes (multiclass problems). Several criteria proposed in the remote sensing literature are considered and compared with one another and with the criterion presented by the authors. Such a criterion, unlike those usually adopted for multiclass problems, is related to an upper bound to the error probability of the Bayes classifier. As the objective of feature selection is generally to identify a reduced set of features that minimize the errors of the classifier, the aforementioned property is very important because it allows one to select features by taking into account their effects on classification errors. Experiments on two remote sensing datasets are described and discussed. These experiments confirm the effectiveness of the proposed criterion, which performs slightly better than all the others considered in the paper. In addition, the results obtained provide useful information about the behaviour of different classical criteria when applied in multiclass cases.

3.
Traditional spectral feature selection algorithms consider only the importance of individual features. To address this, the statistical correlation between features is introduced into conventional spectral analysis, yielding a correlation-aware spectral feature selection model. The Laplacian Score is first used to identify the single most informative feature as the initial selected feature; a new objective function measuring the discriminative power of a feature group is then designed, and a forward greedy search evaluates the candidate features in turn, adding to the selected set the candidate that minimizes the objective. The algorithm thus accounts for both feature importance and inter-feature correlation. Experiments with two different classifiers on eight UCI datasets show that the algorithm improves the classification performance of the selected feature subset and needs fewer features to reach a given level of accuracy.
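The abstract does not give the exact form of the feature-group objective, so the sketch below only illustrates the overall recipe: rank features by Laplacian Score (smaller is better), seed the selection with the top-ranked feature, and grow the set greedily with a hypothetical surrogate objective that trades a candidate's own score against its mean absolute correlation with the features already chosen. A minimal NumPy sketch, not the paper's algorithm:

```python
import numpy as np

def laplacian_score(X, sigma=1.0):
    """Laplacian Score for each feature (smaller = more important).
    Uses a dense RBF affinity graph; fine for small demonstration data."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    D = W.sum(axis=1)
    L = np.diag(D) - W
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        f_tilde = f - (f @ D) / D.sum()          # remove the trivial component
        denom = f_tilde @ (D * f_tilde)
        scores[r] = (f_tilde @ L @ f_tilde) / (denom + 1e-12)
    return scores

def greedy_spectral_select(X, k, alpha=0.5):
    """Start from the best Laplacian-Score feature, then greedily add the
    candidate minimising a surrogate objective: its own Laplacian score plus a
    redundancy penalty (mean |corr| with the already selected features).
    The penalty is a stand-in for the paper's feature-group criterion."""
    ls = laplacian_score(X)
    selected = [int(np.argmin(ls))]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    while len(selected) < k:
        best, best_val = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            val = ls[j] + alpha * corr[j, selected].mean()
            if val < best_val:
                best, best_val = j, val
        selected.append(best)
    return selected
```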

4.
Taking bearing fault diagnosis as the application background, and building on the idea that a low-dimensional projection can capture certain characteristics of the original high-dimensional data, a projection-based feature selection method is proposed. A genetic algorithm searches for the projection direction that best reflects the class structure of the samples, and the features that contribute little to the projection values are then discarded. This avoids the heavy computation of traditional feature selection methods in high-dimensional spaces and effectively sidesteps the "curse of dimensionality". Simulation results show that the method substantially reduces the dimensionality of the sample data without degrading the class separability of the projection values, completing feature selection while improving classification efficiency and accuracy.
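A rough illustration of the projection idea rather than the paper's implementation: a plain real-coded genetic algorithm (roulette selection, blend crossover, Gaussian mutation, all hypothetical choices) evolves a unit direction that maximises a one-dimensional Fisher criterion, and the features with the largest absolute weights in the best direction are kept.

```python
import numpy as np

def fisher_score_1d(z, y):
    """Between-class over within-class variance of a 1-D projection."""
    classes = np.unique(y)
    overall = z.mean()
    sb = sum((z[y == c].mean() - overall) ** 2 * (y == c).sum() for c in classes)
    sw = sum(((z[y == c] - z[y == c].mean()) ** 2).sum() for c in classes)
    return sb / (sw + 1e-12)

def ga_projection_select(X, y, keep_ratio=0.5, pop=40, gens=100, rng=None):
    """Evolve a projection direction that maximises class separability of the
    projected samples, then keep the features with the largest |weights|."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    P = rng.normal(size=(pop, d))
    P /= np.linalg.norm(P, axis=1, keepdims=True)
    for _ in range(gens):
        fit = np.array([fisher_score_1d(X @ w, y) for w in P])
        new = [P[fit.argmax()].copy()]                    # elitism
        p = (fit + 1e-12) / (fit + 1e-12).sum()
        while len(new) < pop:
            i, j = rng.choice(pop, 2, p=p)                # roulette selection
            a = rng.random()
            child = a * P[i] + (1 - a) * P[j]             # blend crossover
            child += rng.normal(scale=0.1, size=d)        # Gaussian mutation
            child /= np.linalg.norm(child) + 1e-12
            new.append(child)
        P = np.array(new)
    best = P[np.argmax([fisher_score_1d(X @ w, y) for w in P])]
    k = max(1, int(keep_ratio * d))
    return np.argsort(-np.abs(best))[:k]                  # indices of kept features
```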

5.
Classical partial least squares (PLS) considers only the importance of individual features, and features may additionally be redundant and multicollinear. To address this, the statistical correlation between features is introduced into conventional PLS analysis, giving a correlation-aware PLS model. Features are first scored by their correlation to pre-select a candidate feature group, which is then fed into the PLS model for training to judge whether the group is acceptable. A forward greedy search evaluates the remaining candidates in turn and adds to the selected set the candidate that minimizes the objective function. Experiments on datasets for the antitussive and antiasthmatic effects of the principal herbs in the Maxing Shigan decoction, as well as on UCI datasets, show that the method finds good feature groups.
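A hedged sketch of the correlation-prefiltered, forward-greedy PLS idea using scikit-learn's PLSRegression; the pre-selection size, number of PLS components, and cross-validated MSE objective are illustrative choices, not necessarily the paper's.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

def pls_forward_select(X, y, k, pre_keep=30, n_components=2, cv=5):
    """Pre-filter features by absolute correlation with the response, then
    forward-greedily grow the subset, keeping at each step the candidate that
    gives the lowest cross-validated PLS regression error."""
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    candidates = list(np.argsort(-corr)[:pre_keep])       # pre-selected feature group
    selected = []
    while len(selected) < k and candidates:
        best_j, best_err = None, np.inf
        for j in candidates:
            cols = selected + [j]
            ncomp = min(n_components, len(cols))          # PLS needs n_components <= n_features
            pls = PLSRegression(n_components=ncomp)
            err = -cross_val_score(pls, X[:, cols], y, cv=cv,
                                   scoring="neg_mean_squared_error").mean()
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
        candidates.remove(best_j)
    return selected
```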

6.
Feature selection is a common data preprocessing technique in data mining and machine learning. For the unsupervised setting, a measure of average feature correlation is defined, and on top of it a feature-clustering-based feature selection method, FSFC, is proposed. A clustering algorithm searches for clusters in different subspaces so that strongly dependent (i.e., redundant) features fall into the same cluster; a representative subset is then drawn from each cluster to form the final feature subset, thereby removing both irrelevant and redundant features. Experimental results on UCI datasets show that FSFC achieves feature reduction and classification performance comparable to several classical supervised feature selection methods.
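A minimal sketch of the cluster-then-pick-representatives idea, assuming scikit-learn >= 1.2 (where AgglomerativeClustering names its distance argument metric); 1 - |Pearson correlation| stands in for the paper's average-correlation measure, and the representative of each cluster is simply the feature most correlated on average with its cluster mates.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_representative_select(X, n_clusters):
    """Group features by similarity (1 - |Pearson correlation| as distance),
    then keep one representative per cluster: the feature with the highest
    average correlation to the rest of its cluster. Redundant (highly
    correlated) features land in the same cluster and only one survives."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    dist = 1.0 - corr
    labels = AgglomerativeClustering(n_clusters=n_clusters, metric="precomputed",
                                     linkage="average").fit_predict(dist)
    selected = []
    for c in range(n_clusters):
        members = np.where(labels == c)[0]
        if len(members) == 1:
            selected.append(int(members[0]))
            continue
        avg_corr = corr[np.ix_(members, members)].mean(axis=1)
        selected.append(int(members[avg_corr.argmax()]))
    return sorted(selected)
```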

7.
In cases where there are larger numbers of features available than should be used for a given classification task, current practice is to arbitrarily pick the number of features to be used and then to use a feature selection algorithm to determine the specific feature subset to be used. An algorithm is presented that predicts the best feature dimensionality, taking into account the number of training samples. It is demonstrated that rather small training set sizes are still practical using these techniques. Several experiments are presented to assess the algorithm's performance, and a binary tree classification procedure with two examples that utilize the algorithm is shown to demonstrate its usefulness.

8.
Feature selection, which reduces the dimensionality of the data and removes redundant features, is one of the key problems in machine learning. Existing semi-supervised feature selection methods generally use a graph model to extract the clustering structure of the dataset, but the extracted structure lacks clear boundaries, which hurts selection quality. A semi-supervised feature selection method based on sparse graph representation is therefore proposed: a joint learning model of clustering structure and feature selection is built, an ℓ1-norm constraint on the graph model yields a clear clustering structure, and an ℓ2,1 norm is introduced to resist noise and improve selection accuracy. Comparative experiments against several popular feature selection methods confirm the effectiveness of the approach.
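Only the ℓ2,1 ingredient is easy to show compactly. The sketch below uses scikit-learn's MultiTaskLasso (whose mixed-norm penalty is an ℓ2,1-type regulariser) against one-hot class indicators and ranks features by the ℓ2 norm of their coefficient rows; the sparse-graph construction and the semi-supervised joint model of the abstract are not reproduced.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso
from sklearn.preprocessing import LabelBinarizer

def l21_feature_ranking(X, y, alpha=0.01):
    """Fit a multi-output linear model with a row-sparse (l2,1-type) penalty
    against one-hot class indicators and score each feature by the l2 norm of
    its coefficient row; whole rows shrink to zero together, which is exactly
    the feature-pruning effect of the l2,1 regulariser."""
    Y = LabelBinarizer().fit_transform(y)
    if Y.shape[1] == 1:                       # binary case: expand to two columns
        Y = np.hstack([1 - Y, Y])
    model = MultiTaskLasso(alpha=alpha).fit(X, Y)
    scores = np.linalg.norm(model.coef_, axis=0)   # one score per feature
    return np.argsort(-scores)                     # features, most relevant first
```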

9.

For datasets containing a large number of features, feature selection has become an active research topic: removing irrelevant and redundant features can substantially improve classification accuracy. Building on an analysis of the existing literature, a feature selection algorithm based on attribute relationships (NCMIPV) is proposed to obtain an optimized feature subset, and its performance is evaluated on UCI datasets. Experimental results show that, compared with the original feature set, the algorithm effectively reduces the dimensionality of the feature space with relatively short running time; its classification error rate is comparable to that of other algorithms and is markedly better in some settings.


10.
A CFN-based feature selection and term weighting algorithm
Selecting features with a combination of TF and DF and weighting them with TF-IDF is a standard approach in text classification. When the training set is small, however, this kind of feature selection filters out low-frequency words with strong discriminative power and directly distorts the weights of the feature terms. This paper proposes a feature selection and weighting algorithm based on the Chinese FrameNet (CFN). Experiments show that the algorithm raises classification accuracy to 67.3%, a large improvement over the traditional approach, demonstrating that it can meet the accuracy requirements of text classification with small training sets.
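For context, the traditional baseline that the abstract starts from can be written in a few lines of scikit-learn; the CFN-based enhancement itself is not reproduced here, and the documents are assumed to be pre-tokenised (whitespace-separated), since CountVectorizer does not segment Chinese text.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

def tf_df_tfidf(docs, min_df=2, max_features=5000):
    """Traditional TF/DF selection plus TF-IDF weighting: drop terms that
    appear in fewer than `min_df` documents, keep at most `max_features`
    terms by frequency, then re-weight the counts with TF-IDF."""
    counter = CountVectorizer(min_df=min_df, max_features=max_features)
    counts = counter.fit_transform(docs)                 # TF/DF-based term selection
    weights = TfidfTransformer().fit_transform(counts)   # TF-IDF weighting
    return weights, counter.get_feature_names_out()
```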

11.
To tackle the "curse of dimensionality" that high-dimensional data causes in classification, a new attribute selection algorithm combining kernel functions with sparse learning is proposed. Specifically, each attribute is first mapped into a kernel space with a kernel function, and linear attribute selection is performed in that high-dimensional kernel space, which amounts to nonlinear attribute selection in the original low-dimensional space. The attributes mapped into the kernel space are then sparsely reconstructed, giving a sparse representation of the original dataset; an ℓ1-norm-based attribute scoring and selection mechanism picks the optimal attribute subset; finally, the data with the selected attributes are used in classification experiments. Results on public datasets show that the algorithm performs attribute selection well, improving classification accuracy by about 3% over the compared algorithms.
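A loose illustration of "linear selection in kernel space equals nonlinear selection in the original space": each feature is expanded with an RBF random-feature map (RBFSampler, a kernel approximation rather than the paper's exact mapping or sparse reconstruction), an ℓ1-penalised linear model is fitted on the concatenation, and each original feature is scored by the coefficient mass of its block. The class label is treated as a numeric target for simplicity.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import Lasso

def kernel_sparse_select(X, y, k, n_components=20, alpha=0.01, random_state=0):
    """Expand each single feature into an RBF random-feature block, fit a
    Lasso on the concatenated blocks, and score every original feature by the
    total |coefficient| mass of its own block."""
    blocks, spans, start = [], [], 0
    for j in range(X.shape[1]):
        Z = RBFSampler(n_components=n_components,
                       random_state=random_state + j).fit_transform(X[:, [j]])
        blocks.append(Z)
        spans.append((start, start + n_components))
        start += n_components
    Z_all = np.hstack(blocks)
    coef = Lasso(alpha=alpha, max_iter=10000).fit(Z_all, y).coef_
    scores = np.array([np.abs(coef[a:b]).sum() for a, b in spans])
    return np.argsort(-scores)[:k]
```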

12.
13.
《传感器与微系统》2019,(1):152-154
Since traditional clustering algorithms cannot cope with large-scale data, an incremental clustering algorithm based on cluster features is proposed by combining the ideas of incremental algorithms and cluster features. In the initial clustering stage, the distance-based K-means algorithm is used to obtain the feature of each cluster; increments are then handled using these cluster features together with the idea of K-nearest neighbours (KNN). The method is validated on real datasets from the UC Irvine (UCI) machine learning repository. Experimental results show that the clustering accuracy of the proposed incremental method is clearly higher than that of ordinary K-means and the original incremental K-means algorithm.
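A minimal sketch of the cluster-feature bookkeeping: ordinary K-means produces the initial clusters, after which only each cluster's centroid and point count are kept, and new points update the nearest centroid incrementally. The abstract's KNN-based assignment of increments is simplified here to nearest-centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

class IncrementalKMeans:
    """Initial K-means clustering; afterwards each cluster is kept only as a
    'cluster feature' (centroid + point count) and new points are absorbed
    one at a time with an incremental mean update."""
    def __init__(self, n_clusters, random_state=0):
        self.n_clusters = n_clusters
        self.random_state = random_state

    def fit_initial(self, X):
        km = KMeans(n_clusters=self.n_clusters, n_init=10,
                    random_state=self.random_state).fit(X)
        self.centroids_ = km.cluster_centers_.copy()
        self.counts_ = np.bincount(km.labels_, minlength=self.n_clusters).astype(float)
        return self

    def partial_add(self, x):
        d = np.linalg.norm(self.centroids_ - x, axis=1)
        c = int(d.argmin())                      # assign to the nearest cluster feature
        self.counts_[c] += 1
        self.centroids_[c] += (x - self.centroids_[c]) / self.counts_[c]
        return c
```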

14.
15.
Unsupervised feature selection is an important problem, especially for high-dimensional data. However, until now, it has been scarcely studied and the existing algorithms cannot provide satisfying performance. Thus, in this paper, we propose a new unsupervised feature selection algorithm using similarity-based feature clustering, Feature Selection-based Feature Clustering (FSFC). FSFC removes redundant features according to the results of feature clustering based on feature similarity. First, it clusters the features according to their similarity; a new feature clustering algorithm is proposed, which overcomes the shortcomings of K-means. Second, it selects a representative feature from each cluster, which contains the most interesting information of the features in the cluster. The efficiency and effectiveness of FSFC are tested on real-world data sets and compared with two representative unsupervised feature selection algorithms, Feature Selection Using Similarity (FSUS) and Multi-Cluster-based Feature Selection (MCFS), in terms of runtime, feature compression ratio, and the clustering results of K-means. The results show that FSFC can not only reduce the feature space in less time, but also significantly improve the clustering performance of K-means.

16.
To obtain an optimal feature subset for multi-dimensional datasets, a wrapper feature selection algorithm based on feature clustering is proposed. In the initial stage, three-way decision theory dynamically partitions the original feature set into several feature subspaces, and the features inside each subspace are clustered by a feature clustering algorithm. A representative feature is picked from each feature cluster, the remaining features are sorted in descending order by neighbourhood mutual information and considered iteratively, and a wrapper evaluates whether each feature should be selected, yielding an optimal feature subset with the lowest classification error rate. Experiments on UCI datasets show that, compared with other feature selection algorithms, the method effectively improves classification accuracy on the libSVM, J48, Naive Bayes, and KNN classifiers.
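A simplified wrapper in the same spirit, with ordinary mutual information standing in for neighbourhood mutual information and a KNN classifier as the wrapped learner; the three-way-decision partition of the feature space and the feature clustering stage are not reproduced.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def wrapper_mi_select(X, y, cv=5):
    """Rank features by mutual information with the class, then walk the
    ranking and keep a feature only if adding it improves the cross-validated
    accuracy of the wrapped classifier."""
    order = np.argsort(-mutual_info_classif(X, y, random_state=0))
    clf = KNeighborsClassifier(n_neighbors=5)
    selected, best_acc = [], 0.0
    for j in order:
        cols = selected + [int(j)]
        acc = cross_val_score(clf, X[:, cols], y, cv=cv).mean()
        if acc > best_acc:
            selected, best_acc = cols, acc    # keep the feature, update the baseline
    return selected, best_acc
```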

17.
杜政霖  李云 《计算机应用》2017,37(3):866-870
For the new application scenario in which historical data and streaming features coexist, an online feature selection algorithm based on group feature selection and streaming features is proposed. In the group feature selection stage on the historical data, the idea of clustering ensembles is introduced to compensate for the weaknesses of any single clustering algorithm: k-means is run several times to obtain a cluster ensemble, which is then integrated with a hierarchical clustering algorithm to produce the final result. In the online feature selection stage on the streaming features, the feature groups produced by group construction are updated by examining the correlations among features, and the feature subset is finally obtained through group transformation. Experimental results show that the proposed algorithm copes effectively with online feature selection in this new scenario and achieves good classification performance.
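The clustering-ensemble step can be sketched as follows: several k-means runs accumulate a co-association matrix (how often two objects land in the same cluster), and hierarchical clustering of 1 - co-association gives the consensus partition. Pass X.T to group features, as in the paper's group-construction stage; the streaming part of the method is not shown. Assumes scikit-learn >= 1.2 for the metric argument.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

def cluster_ensemble(X, n_clusters, n_runs=20, random_state=0):
    """Run k-means several times, build a co-association matrix over the rows
    of X, and integrate the ensemble with hierarchical clustering on the
    co-association distances to get a consensus labelling."""
    n = X.shape[0]
    co = np.zeros((n, n))
    for r in range(n_runs):
        labels = KMeans(n_clusters=n_clusters, n_init=1,
                        random_state=random_state + r).fit_predict(X)
        co += (labels[:, None] == labels[None, :]).astype(float)
    co /= n_runs
    consensus = AgglomerativeClustering(n_clusters=n_clusters, metric="precomputed",
                                        linkage="average").fit_predict(1.0 - co)
    return consensus
```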

18.
19.
A Spatial Distance Join (SDJ) based feature selection method (SDJ-FS) is developed to extend the concept of Correlation Fractal Dimension (CFD) to handle both feature relevance and redundancy jointly for supervised feature selection problems. The Pair-count Exponents (PCEs) for the SDJ between different classes and that of the entire dataset (i.e., the CFD of the dataset) are proposed respectively as feature relevance and redundancy measures. For the SDJ-FS method, an efficient divide-count approach with a backward-elimination property is designed for the calculation of the SDJ based feature quality (relevance and redundancy) measures. Extensive evaluations on both synthetic and benchmark datasets demonstrate the capability of SDJ-FS to identify feature subsets of high relevance and low redundancy, along with the favorable performance of SDJ-FS over other reference feature selection methods (including those based on CFD). The success of SDJ-FS shows that SDJ provides a good framework for extending CFD to supervised feature selection problems and offers a new viewpoint for feature selection research.

20.
Feature selection removes irrelevant features from the original dataset and selects a good feature subset, which avoids the curse of dimensionality and improves the performance of learning algorithms. The "dynamic change of selected features and class" (DCSF) algorithm considers only the dynamically changing information between the selected features and the class during selection and ignores the interaction between candidate and selected features. To fix this, a dynamic-relevance-based feature selection (DRFS) algorithm is proposed: conditional mutual information measures the conditional relevance between the selected features and the class, and interaction information measures the synergy between candidate and selected features, so that relevant features are retained and redundant ones removed, giving a high-quality feature subset. Simulation experiments show that, compared with existing algorithms, the proposed algorithm effectively improves the classification accuracy of feature selection.
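A hedged sketch of selection driven by conditional mutual information on integer-coded discrete (or pre-discretised) data; the CMIM-style criterion min over selected s of I(f; y | s) used below is a stand-in for DRFS's exact combination of conditional mutual information and interaction information.

```python
import numpy as np

def _entropy(*cols):
    """Shannon entropy (nats) of the joint distribution of discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

def mi(f, y):
    return _entropy(f) + _entropy(y) - _entropy(f, y)

def cond_mi(f, y, s):
    """I(f; y | s) = H(f,s) + H(y,s) - H(f,y,s) - H(s) for discrete variables."""
    return _entropy(f, s) + _entropy(y, s) - _entropy(f, y, s) - _entropy(s)

def greedy_cmi_select(X, y, k):
    """First pick maximises I(f; y); every later pick maximises
    min_{s in selected} I(f; y | s), a conditional-relevance surrogate."""
    d = X.shape[1]
    selected = [int(np.argmax([mi(X[:, j], y) for j in range(d)]))]
    while len(selected) < k:
        best, best_val = None, -np.inf
        for j in range(d):
            if j in selected:
                continue
            val = min(cond_mi(X[:, j], y, X[:, s]) for s in selected)
            if val > best_val:
                best, best_val = j, val
        selected.append(best)
    return selected
```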

