Similar Articles
Found 20 similar articles (search time: 134 ms)
1.
To address the problem that existing symbol-rate estimation methods for phase-shift keying (PSK) signals degrade or even fail under impulsive noise, a new correntropy-based symbol-rate estimation method is proposed within the cyclostationarity framework. The cyclic correntropy function of the signal is constructed, and its form for binary phase-shift keying (BPSK) signals is derived theoretically. The method estimates the symbol rate by detecting the discrete spectral lines of the PSK signal's cyclic correntropy function. It requires no prior knowledge of the noise, and the delay slice of the cyclic correntropy function can be computed directly with the FFT, making it simple to implement. Simulation results show that the proposed method effectively suppresses impulsive noise and delivers good PSK symbol-rate estimation performance, especially in strongly impulsive noise environments.
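As a rough illustration of the spectral-line idea described above, the following numpy sketch builds a noiseless baseband BPSK sequence, computes a Gaussian-kernel correntropy delay slice, and reads the symbol rate off its FFT. It is not the authors' implementation; the kernel width sigma, the delay tau, and the signal parameters are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noiseless baseband BPSK: 200 random symbols, 10 samples per symbol,
# so the normalized symbol rate is 0.1 cycles/sample.
sps = 10
symbols = rng.choice([-1.0, 1.0], size=200)
x = np.repeat(symbols, sps)

# Delay slice of the Gaussian-kernel correntropy sequence at lag tau:
# c[n] = exp(-(x[n] - x[n+tau])^2 / (2*sigma^2)).  For a cyclostationary
# input its spectrum shows discrete lines at multiples of the symbol rate.
tau, sigma = 1, 1.0
c = np.exp(-(x - np.roll(x, -tau)) ** 2 / (2 * sigma ** 2))

spec = np.abs(np.fft.rfft(c - c.mean()))   # remove DC before the FFT
freqs = np.fft.rfftfreq(c.size)

# Estimate the symbol rate as the lowest strong spectral line.
lines = np.flatnonzero(spec > 0.5 * spec.max())
rate = freqs[lines.min()]
print(rate)
```

With rectangular pulses the transitions form a comb with the symbol period, so the lowest strong line sits at the symbol rate (here 0.1 cycles/sample).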

2.
Web image annotation based on feature selection with enhanced sparsity
史彩娟, 阮秋琦. Journal of Software, 2015, 26(7): 1800-1811
Facing the explosive growth of web images, web image annotation has become a hot research topic in recent years, and sparse feature selection plays an important role in improving both the efficiency and the performance of web image annotation. This paper proposes a feature selection algorithm with enhanced sparsity, SFSLS (semi-supervised sparse feature selection based on the l2,1/2 matrix norm with shared subspace learning), for web image annotation. SFSLS uses the l2,1/2 matrix norm to select the sparsest and most discriminative features, and exploits the correlations between different features through shared subspace learning. In addition, graph-Laplacian-based semi-supervised learning allows SFSLS to make use of both labeled and unlabeled data. An efficient iterative algorithm is designed to optimize the objective function. SFSLS is compared with other sparse feature selection algorithms on two large-scale web image databases; the results show that it is better suited to large-scale web image annotation.
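The role of the l2,1/2 matrix norm can be illustrated independently of SFSLS: the row norms of a feature-by-label weight matrix W rank the features, and the l2,1/2 regularizer, here taken as the usual l2,p definition with p = 1/2, drives whole rows to zero. A minimal sketch with a made-up W:

```python
import numpy as np

# Toy weight matrix W (features x labels) as a row-sparse selection
# method might produce: rows with large l2-norms mark informative features.
W = np.array([[0.9, -0.8],
              [0.0,  0.1],
              [0.5,  0.6],
              [0.0,  0.0]])

row_norms = np.linalg.norm(W, axis=1)   # per-feature scores
ranking = np.argsort(-row_norms)        # features ordered by importance

# l2,1/2 regularizer: ||W||_{2,1/2} = (sum_i ||W_i||_2^{1/2})^2.
# Penalizing it encourages entire rows of W to vanish (feature selection).
l2_half = np.sum(np.sqrt(row_norms)) ** 2
print(ranking, l2_half)
```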

3.
We address the feature subset selection problem for classification tasks. We examine the performance of two hybrid strategies that directly search on a ranked list of features and compare them with two widely used algorithms, the fast correlation based filter (FCBF) and sequential forward selection (SFS). The proposed hybrid approaches provide the possibility of efficiently applying any subset evaluator, with a wrapper model included, to large and high-dimensional domains. The experiments performed show that our two strategies are competitive and can select a small subset of features without degrading the classification error or the advantages of the strategies under study.
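Sequential forward selection (SFS), one of the two baselines above, can be sketched in a few lines; the nearest-centroid evaluator below is an arbitrary stand-in for whatever subset evaluator or wrapper model is plugged in.

```python
import numpy as np

def sfs(X, y, k, score):
    """Greedy sequential forward selection: grow the subset one feature
    at a time, always adding the feature that maximizes `score`."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best = max(remaining, key=lambda j: score(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

def centroid_acc(Xs, y):
    # Simple evaluator: training accuracy of a nearest-centroid classifier.
    c0, c1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
    pred = (np.linalg.norm(Xs - c1, axis=1) <
            np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return (pred == y).mean()

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 120)
X = rng.normal(size=(120, 6))
X[:, 2] += 3 * y          # feature 2 is strongly informative
X[:, 4] += 1 * y          # feature 4 is weakly informative

subset = sfs(X, y, 2, centroid_acc)
print(subset)             # the strongly informative feature is picked first
```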

4.
This paper introduces concepts and algorithms of feature selection, surveys existing feature selection algorithms for classification and clustering, groups and compares different algorithms with a categorizing framework based on search strategies, evaluation criteria, and data mining tasks, reveals unattempted combinations, and provides guidelines in selecting feature selection algorithms. With the categorizing framework, we continue our efforts toward building an integrated system for intelligent feature selection. A unifying platform is proposed as an intermediate step. An illustrative example is presented to show how existing feature selection algorithms can be integrated into a meta algorithm that can take advantage of individual algorithms. An added advantage of doing so is to help a user employ a suitable algorithm without knowing details of each algorithm. Some real-world applications are included to demonstrate the use of feature selection in data mining. We conclude this work by identifying trends and challenges of feature selection research and development.

5.
In this paper we propose a feature selection method for symbolic interval data based on similarity margin. In this method, classes are parameterized by an interval prototype based on an appropriate learning process. A similarity measure is defined in order to estimate the similarity between the interval feature value and each class prototype. A similarity margin concept is then introduced. The heuristic search is avoided by optimizing an objective function to evaluate the importance (weight) of each interval feature in a similarity margin framework. The experimental results show that the proposed method selects meaningful features for interval data. In particular, the method we propose yields a significant improvement on the classification of three real-world datasets.

6.
Feature selection is the basic pre-processing task of eliminating irrelevant or redundant features through investigating complicated interactions among features in a feature set. Due to its critical role in classification and computational time, it has attracted researchers' attention for the last five decades. However, it still remains a challenge. This paper proposes a binary artificial bee colony (ABC) algorithm for feature selection problems, developed by integrating evolutionary-based similarity search mechanisms into an existing binary ABC variant. The performance of the proposed algorithm is analyzed by comparing it with well-known variants of the particle swarm optimization (PSO) and ABC algorithms, including standard binary PSO, new velocity based binary PSO, quantum inspired binary PSO, discrete ABC, modification rate based ABC, angle modulated ABC, and genetic algorithms, on 10 benchmark datasets. The results show that the proposed algorithm can obtain higher classification performance in both training and test sets, and can eliminate irrelevant and redundant features more effectively than the other approaches. Note that all the algorithms used in this paper, except for standard binary PSO and GA, are employed for the first time in feature selection.

7.
Solving large-scale feature selection problems usually faces two major challenges: first, ground-truth labels are scarce, making it hard to guide the selection; second, the search space is huge, making it hard to find satisfactory high-quality solutions. To address this, a novel self-supervised data-driven particle swarm optimization algorithm for large-scale feature selection is proposed. First, a self-supervised data-driven feature selection framework is introduced that does not rely on ground-truth labels. Second, a search strategy based on discrete region encoding is proposed to help the algorithm find better solutions in the large-scale search space. Third, building on this framework and strategy, a self-supervised data-driven particle swarm optimization algorithm is developed to solve the problem. Experimental results on large-scale feature datasets show that the proposed algorithm performs comparably to mainstream supervised algorithms and selects features more efficiently than state-of-the-art unsupervised algorithms.

8.

In machine learning, searching for the optimal feature subset of the original dataset is a challenging and prominent task. Metaheuristic algorithms are used to find the relevant, important features that enhance classification accuracy and save resource time, and most of them have shown excellent performance in solving feature selection problems. A recently developed metaheuristic, the gaining-sharing knowledge-based optimization algorithm (GSK), is considered here for finding the optimal feature subset. GSK was proposed over a continuous search space; therefore, a total of eight S-shaped and V-shaped transfer functions are employed to carry the problem into a binary search space. Additionally, a population reduction scheme is combined with the transfer functions to enhance the performance of the proposed approaches: by updating the population size in every iteration, it explores the search space efficiently and deletes the worst solutions. The proposed approaches are tested on twenty-one benchmark datasets from the UCI repository. The obtained results are compared with state-of-the-art metaheuristic algorithms, including the binary differential evolution algorithm, binary particle swarm optimization, the binary bat algorithm, the binary grey wolf optimizer, the binary ant lion optimizer, the binary dragonfly algorithm, and the binary salp swarm algorithm. Among the eight transfer functions, the V4 transfer function with population reduction on the binary GSK algorithm outperforms the other optimizers in terms of accuracy, fitness values, and the minimal number of features. To investigate the results statistically, two non-parametric statistical tests are conducted, which confirm the superiority of the proposed approach.

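The abstract does not spell out the eight transfer functions; the sketch below shows one common S-shaped choice (a sigmoid, read as the probability that a bit becomes 1) and one common V-shaped choice (|tanh|, read as the probability that a bit flips), which is how such functions typically carry a continuous GSK- or PSO-style position vector into a binary search space. The specific functions and values are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

def s_shaped(x):
    # Classic sigmoid: probability that the bit becomes 1.
    return 1.0 / (1.0 + np.exp(-x))

def v_shaped(x):
    # V-shaped: probability that the bit *flips* its current value.
    return np.abs(np.tanh(x))

# Map a continuous position vector to a binary feature mask.
position = np.array([-2.0, 0.0, 3.0, -0.5])
bits_s = (rng.random(position.size) < s_shaped(position)).astype(int)

current = np.array([1, 0, 1, 0])
flip = rng.random(position.size) < v_shaped(position)
bits_v = np.where(flip, 1 - current, current)
print(bits_s, bits_v)
```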

9.
To address the high dimensionality of morphological features extracted by software from raw pathology images, together with the small sample sizes typical of the medical domain, a ReliefF-HEPSO feature selection algorithm for head-and-neck cancer pathology images is proposed. The algorithm builds a multi-level dimensionality reduction framework. First, based on the correlation between features and class labels, the ReliefF algorithm assigns different feature weights, achieving a preliminary reduction. Then, an evolutionary neural strategy (ENS) is used to enrich the population diversity of binary particle swarm optimization (BPSO), yielding a hybrid binary evolutionary particle swarm optimization algorithm (HEPSO) that automatically searches the candidate feature subsets for the best one. Comparative experiments against seven feature selection algorithms show that the proposed algorithm screens out highly relevant morphological features of pathology images more effectively, achieves fast dimensionality reduction, and attains higher classification performance with fewer features.
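The full ReliefF uses k nearest hits and misses per class; the basic binary Relief rule it generalizes is compact enough to sketch (feature scaling is omitted for brevity, and the dataset is synthetic):

```python
import numpy as np

def relief(X, y, n_iter, rng):
    """Basic binary Relief: reward features that differ on the nearest
    miss and penalize features that differ on the nearest hit."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(1)     # L1 distance to sample i
        dist[i] = np.inf                   # exclude the sample itself
        same = y == y[i]
        hit = np.where(same, dist, np.inf).argmin()
        miss = np.where(~same, dist, np.inf).argmin()
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 150)
X = rng.normal(size=(150, 5))
X[:, 0] += 3.0 * y        # only feature 0 separates the classes

weights = relief(X, y, 200, rng)
print(weights)            # feature 0 gets the largest weight
```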

10.
To improve the accuracy of corporate financial distress prediction, a parallel model, PHGSA-KELM, based on a kernel extreme learning machine (KELM) optimized by a gravitational search algorithm is proposed. Considering that feature selection and parameter optimization are equally important to the KELM model, an improved hybrid gravitational search algorithm (HGSA) is proposed to perform feature selection and KELM parameter optimization simultaneously. A linearly weighted multi-objective function that jointly accounts for classification accuracy and the size of the feature subset improves the classification performance, and multi-threaded parallelism on a multi-core platform further improves computational efficiency. Experimental results on real datasets show that the proposed model selects a small feature subset, identifies the features closely related to corporate financial distress, achieves high classification accuracy, and greatly improves computational efficiency, making it an effective early-warning model for corporate financial distress.
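The paper's exact weighting is not given here, but a linearly weighted objective trading classification accuracy against subset size typically looks like the following (alpha = 0.9 and the example numbers are illustrative assumptions):

```python
def fitness(accuracy, n_selected, n_total, alpha=0.9):
    """Linearly weighted objective: favor high accuracy and small
    feature subsets; alpha controls the trade-off."""
    return alpha * accuracy + (1 - alpha) * (1 - n_selected / n_total)

# A 92%-accurate model using 5 of 30 features beats a 93%-accurate
# model that needs 25 of 30 features under this weighting.
f_small = fitness(0.92, 5, 30)
f_large = fitness(0.93, 25, 30)
print(f_small > f_large)
```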

11.
孙林, 赵婧, 徐久成, 王欣雅. Journal of Computer Applications, 2022, 42(5): 1355-1366
To address the facts that the classical monarch butterfly optimization (MBO) algorithm does not handle continuous data well and that rough set models are insufficient for large-scale, high-dimensional, complex data, a feature selection algorithm based on neighborhood rough sets (NRS) and MBO is proposed. First, local perturbation and population-partitioning strategies are combined with the MBO algorithm, and a transfer mechanism is constructed to form a binary MBO (BMBO) algorithm. Second, a mutation operator is introduced to strengthen the algorithm's exploration ability, yielding a mutation-operator-based BMBO (BMBOM) algorithm. Then, a fitness function is constructed from the NRS neighborhood dependency degree, and the fitness values of the initialized feature subsets are evaluated and ranked. Finally, the BMBOM algorithm iteratively searches for the optimal feature subset, giving a metaheuristic feature selection algorithm. The optimization performance of BMBOM is evaluated on benchmark functions, and the classification ability of the proposed feature selection algorithm is evaluated on UCI datasets. Experimental results show that, on five benchmark functions, BMBOM clearly outperforms MBO and particle swarm optimization (PSO) in best value, worst value, mean, and standard deviation; on the UCI datasets, compared with rough-set-based optimized feature selection algorithms, feature selection algorithms combining rough sets with optimizers, feature selection algorithms combining NRS with optimizers, and a binary grey wolf optimization based feature selection algorithm, the proposed algorithm performs well on classification accuracy, number of selected features, and fitness value, selecting optimal feature subsets with few features and high classification accuracy.
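The NRS neighborhood dependency degree that drives the fitness function can be sketched directly: a sample belongs to the positive region when its delta-neighborhood is label-pure. A naive O(n²) illustration (the data and delta values are arbitrary):

```python
import numpy as np

def neighborhood_dependency(X, y, delta):
    """NRS dependency degree: fraction of samples whose delta-neighborhood
    (Euclidean) is label-pure, i.e. the size of the positive region."""
    n = X.shape[0]
    pos = 0
    for i in range(n):
        nbrs = np.linalg.norm(X - X[i], axis=1) <= delta
        if np.all(y[nbrs] == y[i]):   # neighborhood consistent with label
            pos += 1
    return pos / n

# Two well-separated classes: full dependency at a suitable delta,
# zero dependency when delta is so large that neighborhoods mix classes.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(neighborhood_dependency(X, y, 0.3))   # → 1.0
print(neighborhood_dependency(X, y, 6.0))   # → 0.0
```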

12.
Feature selection aims to reduce the dimensionality of the data and eliminate redundant features, and is one of the key problems in machine learning. Existing semi-supervised feature selection methods generally extract the cluster structure of the dataset with a graph model, but the extracted cluster structure lacks clear boundaries, which hurts the effect of feature selection. This paper therefore proposes a semi-supervised feature selection method based on sparse graph representation, building a joint learning model for cluster structure and feature selection: an l1-norm constraint on the graph model yields a clear cluster structure, and an l2,1 norm is introduced to resist noise interference and improve the accuracy of feature selection. To verify its effectiveness, the method is compared with several popular feature selection methods; the experimental results demonstrate its effectiveness.

13.
A genetic algorithm-based method for feature subset selection
As a commonly used technique in data preprocessing, feature selection selects a subset of informative attributes or variables to build models describing data. By removing redundant and irrelevant or noise features, feature selection can improve the predictive accuracy and the comprehensibility of the predictors or classifiers. Many feature selection algorithms with different selection criteria have been introduced by researchers. However, no single criterion is best for all applications. In this paper, we propose a framework based on a genetic algorithm (GA) for feature subset selection that combines various existing feature selection methods. The advantages of this approach include the ability to accommodate multiple feature selection criteria and to find small subsets of features that perform well for a particular inductive learning algorithm of interest to build the classifier. We conducted experiments using three data sets and three existing feature selection methods. The experimental results demonstrate that our approach is a robust and effective approach to find subsets of features with higher classification accuracy and/or smaller size compared to each individual feature selection algorithm.
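A minimal GA over feature bitmasks, in the spirit of the framework above, can be sketched as follows; the least-squares evaluator and all parameters are arbitrary stand-ins for the paper's combined selection criteria.

```python
import numpy as np

rng = np.random.default_rng(0)

def ga_select(X, y, score, pop=20, gens=30, p_mut=0.05):
    """Tiny generational GA over feature bitmasks: tournament selection,
    uniform crossover, bit-flip mutation, and one elite survivor."""
    d = X.shape[1]
    P = rng.random((pop, d)) < 0.5
    def fit(mask):
        return score(X[:, mask], y) if mask.any() else -np.inf
    for _ in range(gens):
        f = np.array([fit(m) for m in P])
        nxt = [P[f.argmax()].copy()]                 # elitism
        while len(nxt) < pop:
            a, b = rng.integers(pop, size=2)
            p1 = P[a] if f[a] >= f[b] else P[b]      # tournament pick 1
            a, b = rng.integers(pop, size=2)
            p2 = P[a] if f[a] >= f[b] else P[b]      # tournament pick 2
            cross = rng.random(d) < 0.5              # uniform crossover
            child = np.where(cross, p1, p2)
            child ^= rng.random(d) < p_mut           # bit-flip mutation
            nxt.append(child)
        P = np.array(nxt)
    f = np.array([fit(m) for m in P])
    return P[f.argmax()]

def score(Xs, y):
    # Negative squared error of a least-squares fit, penalized by size.
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return -np.mean((Xs @ beta - y) ** 2) - 0.01 * Xs.shape[1]

X = rng.normal(size=(100, 8))
y = X[:, 1] - 2 * X[:, 5]          # only features 1 and 5 matter
best = ga_select(X, y, score)
print(np.flatnonzero(best))        # should contain features 1 and 5
```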

14.
Objective: Principal component analysis algorithms based on the minimum mean-square-error criterion, such as 2DPCA-L2 (two-dimensional PCA with L2-norm) and 2DPCA-L1 (two-dimensional PCA with L1-norm), are sensitive to outliers and suffer low recognition rates. Combining the maximum correntropy criterion from information theory, this paper proposes a 2DPCA based on the maximum correntropy criterion (2DPCA-MCC). Method: 2DPCA-MCC takes maximum correntropy as its objective function and solves the correntropy problem with half-quadratic optimization, lowering the contribution of outliers to the objective evaluation and thereby improving the robustness and recognition accuracy of the algorithm. Results: Comparing 2DPCA-MCC with 2DPCA-L2 and 2DPCA-L1 on the ORL face database shows that its recognition rate improves on the two-dimensional PCA algorithms by at least nearly 10% and by up to nearly 30%. Conclusion: A maximum-correntropy 2DPCA algorithm is proposed that solves the nonlinear optimization problem via half-quadratic optimization; experimental results show that it handles the outlier problem well and markedly improves recognition accuracy, making it suitable for outlier problems in face recognition.

15.
Since given classification data often contains redundant, useless or misleading features, feature selection is an important pre-processing step for solving classification problems. This problem is often solved by applying evolutionary algorithms to decrease the dimensional number of features involved. Removing irrelevant features in the feature space and identifying relevant features correctly is the primary objective, which can increase classification accuracy. In this paper, a novel QBGSA–K-NN hybrid system which hybridizes the quantum-inspired binary gravitational search algorithm (QBGSA) with the K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV) is proposed. The main aim of this system is to improve classification accuracy with an appropriate feature subset in binary problems. We evaluate the proposed hybrid system on several UCI machine learning benchmark examples. The experimental results show that the proposed method is able to select the discriminating input features correctly and achieve high classification accuracy which is comparable to or better than well-known similar classifier systems.
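The K-NN-with-LOOCV evaluator at the core of such wrappers is easy to sketch for K = 1 (the QBGSA search itself is omitted; the synthetic data is an assumption for the demo):

```python
import numpy as np

def loocv_1nn(X, y):
    """Leave-one-out accuracy of a 1-NN classifier: each sample is
    labeled by its nearest *other* sample."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)        # a sample may not vote for itself
    return (y[D.argmin(1)] == y).mean()

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 80)
X = rng.normal(size=(80, 4))
X[:, 0] += 4 * y                       # only feature 0 is informative

acc_feat = loocv_1nn(X[:, [0]], y)     # informative subset
acc_noise = loocv_1nn(X[:, 1:], y)     # pure-noise subset
print(acc_feat, acc_noise)             # the informative subset scores higher
```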

16.
Advances in computer technologies have enabled corporations to accumulate data at an unprecedented speed. Large-scale business data might contain billions of observations and thousands of features, which easily brings their scale to the level of terabytes. Most traditional feature selection algorithms are designed and implemented for a centralized computing architecture, and their usability significantly deteriorates when data size exceeds tens of gigabytes. High-performance distributed computing frameworks and protocols, such as the Message Passing Interface (MPI) and MapReduce, have been proposed to facilitate software development on grid infrastructures, enabling analysts to process large-scale problems efficiently. This paper presents a novel large-scale feature selection algorithm that is based on variance analysis. The algorithm selects features by evaluating their abilities to explain data variance. It supports both supervised and unsupervised feature selection and can be readily implemented in most distributed computing environments. The algorithm was implemented as a SAS High-Performance Analytics procedure, which can read data in distributed form and perform parallel feature selection in both symmetric multiprocessing mode (SMP) and massively parallel processing mode (MPP). Experimental results demonstrated the superior performance of the proposed method for large-scale feature selection.

17.
Reducing the dimensionality of the data has been a challenging task in data mining and machine learning applications. In these applications, the existence of irrelevant and redundant features negatively affects the efficiency and effectiveness of different learning algorithms. Feature selection is one of the dimension reduction techniques, which has been used to allow a better understanding of data and improve the performance of other learning tasks. Although the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels is still a challenging task. This paper proposes a novel method for unsupervised feature selection, which efficiently selects features in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection that measures the reconstruction error of the data matrix based on the selected subset of features. The paper then presents a novel algorithm for greedily minimizing the reconstruction error based on the features selected so far. The greedy algorithm is based on an efficient recursive formula for calculating the reconstruction error. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison with the state-of-the-art methods for unsupervised feature selection.
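The reconstruction-error criterion can be sketched naively by recomputing a least-squares projection for every candidate column; the paper's contribution is precisely a recursive formula that avoids this recomputation, so the sketch below shows only the criterion, not the efficient algorithm.

```python
import numpy as np

def greedy_unsupervised(A, k):
    """Pick k columns of A that best reconstruct the whole matrix:
    at each step add the column minimizing ||A - A_S pinv(A_S) A||_F^2.
    (Naive recomputation; the paper uses a recursive formula instead.)"""
    selected = []
    for _ in range(k):
        best, best_err = None, np.inf
        for j in range(A.shape[1]):
            if j in selected:
                continue
            S = A[:, selected + [j]]
            err = np.linalg.norm(A - S @ np.linalg.pinv(S) @ A) ** 2
            if err < best_err:
                best, best_err = j, err
        selected.append(best)
    return selected, best_err

rng = np.random.default_rng(0)
B = rng.normal(size=(60, 2))           # two latent factors
A = np.column_stack([B[:, 0], B[:, 1], B[:, 0] + B[:, 1],
                     2 * B[:, 0], 0.5 * B[:, 1]])
cols, err = greedy_unsupervised(A, 2)
print(cols, err)   # two independent columns reconstruct rank-2 A almost exactly
```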

18.
郭娜, 刘聪, 李彩虹, 陆婷, 闻立杰, 曾庆田. Journal of Software, 2024, 35(3): 1341-1356
Predicting the remaining time of a business process is valuable for preventing and intervening in business anomalies. Existing remaining-time prediction methods achieve higher accuracy through deep learning, but most deep models are structurally complex and cannot explain their predictions, i.e., they suffer from the interpretability problem. Moreover, besides the key activity attribute, remaining-time prediction selects several other attributes as input features according to domain knowledge; the lack of a general feature selection method affects both prediction accuracy and model interpretability. To address these problems, a remaining-time prediction framework based on an explainable feature-based hierarchical model (EFH model) is proposed. Specifically, a feature self-selection strategy is first proposed: priority-based backward feature deletion and feature-importance-based forward feature selection yield the attributes that positively affect the prediction task as model inputs. Then an explainable hierarchical model architecture is proposed: different features are added layer by layer, the prediction of each layer is obtained, and the intrinsic relation between feature values and predictions is thereby explained. The proposed method is instantiated with the LightGBM (light gradient boosting machine) and LSTM (long short-term memory) algorithms; the framework is general and not tied to these algorithms. Finally, the method is compared with state-of-the-art methods on eight real event logs. Experimental results show that the proposed method selects effective features, improves prediction accuracy, and explains the predictions.

19.
A scalable feature selection algorithm for classification problems
张巍, 邹翔, 吴晓如. Chinese Journal of Computers, 2005, 28(7): 1223-1229
Feature selection is an important problem in classification for data mining. This paper derives a new measure, SCD, of the correlation between a feature and the class: the CV coefficient describing the class distribution over a feature's value sequence. Using this measure, a linear, scalable feature selection algorithm, StaFSOS, is given, and the SCD measure is proved to satisfy the monotonicity required by branch-and-bound when the number of classes is 2; a complete form of StaFSOS, BBStaFS, is also presented. On 12 standard datasets, the feature set produced by StaFSOS is almost identical to the target set, with higher efficiency than other algorithms; on one more dataset, BBStaFS obtains the exact result. In a test on real data with 1000 samples and 20 features, StaFSOS runs in half the time of GRSR, currently among the faster algorithms, and the resulting feature set is accurate and effective.

20.
A new improved forward floating selection (IFFS) algorithm for selecting a subset of features is presented. Our proposed algorithm improves the state-of-the-art sequential forward floating selection algorithm. The improvement is to add an additional search step called “replacing the weak feature” to check whether removing any feature in the currently selected feature subset and adding a new one at each sequential step can improve the current feature subset. Our method provides the optimal or quasi-optimal (close to optimal) solutions for many selected subsets and requires significantly less computational load than optimal feature selection algorithms. Our experimental results for four different databases demonstrate that our algorithm consistently selects better subsets than other suboptimal feature selection algorithms do, especially when the original number of features of the database is large.
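The added "replacing the weak feature" step can be sketched in isolation with a toy scoring table (the scores are hypothetical, chosen so that a forward search stuck at {0, 1} is repaired by a single swap):

```python
def replace_weak_feature(selected, all_feats, score):
    """IFFS-style check: try swapping each selected feature for each
    unselected one; keep the best swap if it improves the score."""
    best_set, best_score = list(selected), score(selected)
    for out in selected:
        for cand in all_feats:
            if cand in selected:
                continue
            trial = [f for f in selected if f != out] + [cand]
            s = score(trial)
            if s > best_score:
                best_set, best_score = trial, s
    return best_set, best_score

def score(S):
    # Hypothetical evaluator: {0, 2} is the true optimum, {0, 1} is a
    # local optimum a plain forward search might stop at.
    table = {frozenset({0, 1}): 0.80, frozenset({0, 2}): 0.95}
    return table.get(frozenset(S), 0.5)

subset, s = replace_weak_feature([0, 1], range(4), score)
print(sorted(subset), s)   # → [0, 2] 0.95
```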


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号