Similar Literature
20 similar articles found; search time: 296 ms
1.
刘兆赓  李占山  王丽  王涛  于海鸿 《软件学报》2020,31(5):1511-1524
Feature selection is an important data preprocessing method that not only mitigates the curse of dimensionality but also improves the generalization ability of learning algorithms. A wide range of methods has been applied to the feature selection problem; among them, feature selection algorithms based on evolutionary computation have attracted growing attention in recent years and achieved some success. Recent results show that feature selection using the forest optimization algorithm (FSFOA) offers good classification performance and dimension reduction capability. However, the randomness of its initialization stage and the manually set parameters of its global seeding stage limit its accuracy and dimension reduction, and the algorithm is inherently weak at handling high-dimensional data. This work introduces an initialization strategy based on the information gain ratio, generates the global seeding parameter automatically by borrowing the temperature-control function of simulated annealing, and defines a fitness function that incorporates the dimension reduction rate; a greedy algorithm applied to the resulting high-quality forest yields EFSFOA (enhanced feature selection using forest optimization algorithm). For high-dimensional data, an ensemble feature selection framework suited to EFSFOA is further constructed so that it can handle high-dimensional feature selection effectively. Comparative experiments verify that EFSFOA clearly improves on FSFOA in both classification accuracy and dimension reduction rate, scaling to data of up to 100,000 dimensions, and that EFSFOA remains highly competitive against efficient evolutionary-computation-based feature selection methods proposed in recent years.
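The abstract names two concrete ingredients: a fitness function mixing accuracy with the dimension reduction rate, and a simulated-annealing-style schedule for the global seeding parameter. Below is a minimal sketch of those two pieces; the weight alpha, the schedule constants, and the flip fraction are illustrative assumptions, not the paper's values.

```python
def fitness(accuracy, n_selected, n_total, alpha=0.9):
    # Hypothetical fitness mixing classification accuracy with the
    # dimension reduction rate, as EFSFOA's fitness is described to do;
    # alpha is an assumed trade-off weight, not the paper's value.
    reduction_rate = 1.0 - n_selected / n_total
    return alpha * accuracy + (1 - alpha) * reduction_rate

def global_seeding_temperature(iteration, t0=1.0, decay=0.95):
    # Simulated-annealing-style cooling schedule used to generate the
    # global seeding parameter automatically instead of fixing it by hand.
    return t0 * decay ** iteration

# Example: the number of features flipped during global seeding shrinks
# as the "temperature" cools over iterations (0.1 flip fraction assumed).
n_total = 1000
for it in range(5):
    n_flip = max(1, int(global_seeding_temperature(it) * 0.1 * n_total))
    print(f"iteration {it}: flip {n_flip} of {n_total} features")
```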

2.
High-dimensional data are ubiquitous in the real world, yet they often contain large amounts of redundancy and noise, which prevents many traditional clustering algorithms from performing well on them. In practice, the cluster structure of high-dimensional data is often embedded in a lower-dimensional subspace, so dimensionality reduction becomes a key technique for uncovering it. Among the many reduction methods, graph-based approaches are a research hotspot, but most of them suffer from two problems: (1) an adjacency graph must be computed or learned, which is computationally expensive; and (2) the reduction process ignores how the reduced data will be used. To address both problems, this paper proposes MEDR, a fast unsupervised dimensionality reduction algorithm based on maximum entropy. MEDR combines linear projection with a maximum entropy clustering model and uses an efficient iterative optimization algorithm to find the latent optimal cluster structure of high-dimensional data embedded in a low-dimensional subspace. MEDR requires no adjacency graph as input and has time complexity linear in the number of samples. Experiments on real data sets show that, compared with traditional reduction methods, MEDR finds projection matrices that map high-dimensional data into low-dimensional subspaces in a way that better supports subsequent clustering.

3.
Dimensionality reduction with PCA after deep-learning image feature extraction   Total citations: 1 (self: 0, others: 1)
Deep learning is a machine learning method in wide use across today's artificial intelligence field. Its heavy dependence on data sharply increases the number of dimensions to be processed, which strongly affects both computational efficiency and classification performance. Taking data dimensionality reduction as the research goal, this paper analyzes the various reduction methods used in deep learning. On that basis, the Caltech 101 image data set is used as the experimental subject: image features are extracted with the VGG-16 deep convolutional neural network, and principal component analysis (PCA) serves as the example method for reducing the high-dimensional image features. In the experimental stage, Euclidean distance is used as the similarity measure to check accuracy after reduction. The experiments show that after extracting the 4096-dimensional features of VGG-16's fc3 layer, PCA can reduce the data to 64 dimensions while still retaining most of the feature information.
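A minimal sketch of the described pipeline, assuming Keras's pre-trained VGG-16. Keras names the network's two 4096-dimensional fully connected layers 'fc1' and 'fc2'; 'fc2' is used here as a stand-in for the layer the paper calls fc3, and the random `images` array is a placeholder for Caltech 101 data.

```python
import numpy as np
from sklearn.decomposition import PCA
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet", include_top=True)
# Cut the network at the second 4096-d fully connected activation.
extractor = Model(inputs=base.input, outputs=base.get_layer("fc2").output)

# Placeholder batch standing in for Caltech 101 images (n >= 64 so that
# PCA can produce 64 components).
images = np.random.rand(100, 224, 224, 3).astype("float32") * 255
features = extractor.predict(preprocess_input(images))   # shape (100, 4096)

reduced = PCA(n_components=64).fit_transform(features)   # shape (100, 64)

# Euclidean distance in the 64-d space as the similarity measure.
print(np.linalg.norm(reduced[0] - reduced[1]))
```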

4.
In this paper, we propose a novel supervised dimension reduction algorithm based on the K-nearest neighbor (KNN) classifier. The proposed algorithm reduces the dimension of the data in order to improve the accuracy of KNN classification. This heuristic algorithm finds independent dimensions that decrease the Euclidean distance between a sample and its K nearest within-class neighbors and increase the Euclidean distance between that sample and its M nearest between-class neighbors. It is a linear dimension reduction algorithm that produces a mapping matrix for projecting data into a low-dimensional space. The dimension reduction step is followed by a KNN classifier, so the method is applicable to high-dimensional multiclass classification. Experiments with artificial data such as Helix and Twin-peaks show the algorithm's ability to support data visualization. The algorithm is compared with state-of-the-art algorithms on the classification of eight multiclass data sets from the UCI collection, and simulation results show that it outperforms the existing algorithms. Visual place classification is an important problem for intelligent mobile robots: it not only involves high-dimensional data but also requires solving a multiclass classification problem, and a proper dimension reduction method is usually needed to decrease the computation and memory complexity of algorithms in large environments, so our method is well suited to it. We extract color histograms of omnidirectional camera images as primary features, reduce the features to a low-dimensional space, and apply a KNN classifier. Results of experiments on five real data sets showed the superiority of the proposed algorithm over the others.
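The paper's own mapping-matrix heuristic is not publicly available, but scikit-learn's Neighborhood Components Analysis is a standard supervised linear reduction with the same goal (improving KNN accuracy), so it serves as a hedged illustration of the reduce-then-classify pipeline; the digits data set and the dimension counts are assumptions.

```python
# Related standard technique, not the paper's algorithm: learn a linear
# projection that improves KNN accuracy, then classify in the low space.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("nca", NeighborhoodComponentsAnalysis(n_components=16, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
pipe.fit(X_tr, y_tr)
print("KNN accuracy after supervised linear reduction:", pipe.score(X_te, y_te))
```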

5.
To address the singularity problem that can arise when the dimensionality of the input data is large, and to improve both computational efficiency and robustness, this paper proposes B2DLPP-L1, a block-based two-dimensional locality preserving projection algorithm built on the L1 norm. To avoid the singularity problem, the traditional locality preserving projection algorithm first uses principal component analysis to project high-dimensional data into a subspace, which loses part of the useful information in the data. B2DLPP-L1 instead takes two-dimensional data directly as input, avoiding the information loss that vectorized input can cause; it further partitions the 2D input into blocks, treats the blocks as new inputs, and then applies L1-norm-based two-dimensional locality preserving projection to reduce their dimensionality. In principle, B2DLPP-L1 reduces dimensionality effectively: it preserves the useful information in high-dimensional data, lowers computational complexity, and improves running efficiency, while also overcoming the drop in classification accuracy caused by outliers, which improves robustness. Experiments on several face databases show that, in the presence of outliers, the algorithm achieves higher classification accuracy with a nearest-neighbor classifier while requiring less classification time.

6.
A fast reduct algorithm for high-dimensional data sets based on the neighborhood rough set model   Total citations: 1 (self: 0, others: 1)
刘遵仁  吴耿锋 《计算机科学》2012,39(10):268-271
Following the idea of particle swarm optimization, this paper presents SPRA, an algorithm for computing reducts of high-dimensional neighborhood decision tables. By applying intrinsic dimension estimators such as MLE and taking the estimated dimension as SPRA's initialization parameter, a fast reduct algorithm for high-dimensional data sets, QSPRA, is obtained. The algorithm is validated on five UCI benchmark data sets, and the results show that it is effective and feasible. The effects of population size and iteration count on the results are analyzed in detail. The experiments also show that kernel-based heuristic-addition algorithms are no longer suitable for high-dimensional data sets.
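The MLE intrinsic-dimension estimator mentioned above is commonly attributed to Levina and Bickel; a compact sketch of it follows, with the neighborhood size k and the Swiss roll test data chosen purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import NearestNeighbors

def mle_intrinsic_dimension(X, k=10):
    # Distances to the k nearest neighbors of every point (excluding self).
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    dist = dist[:, 1:]  # drop the zero distance to the point itself
    # Per-point MLE: inverse mean log-ratio of the k-th to the j-th distance.
    logratio = np.log(dist[:, -1:] / dist[:, :-1])
    m_hat = (k - 1) / logratio.sum(axis=1)
    return m_hat.mean()

X, _ = make_swiss_roll(n_samples=2000, random_state=0)  # 3-d data, 2-d manifold
print(round(mle_intrinsic_dimension(X), 2))             # close to 2
```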

7.
Traditional dimensionality reduction algorithms fall into linear methods and manifold learning methods, but in practice it is hard to know in advance which kind a task needs. This paper designs a combined dimensionality reduction algorithm that guarantees a linear-reduction lower bound equal to principal component analysis while, on the manifold learning side, revealing the manifold structure of the data. By constructing a Markov transition matrix over the high-dimensional data so that more similar nodes have higher transition probabilities, the mapping from the high-dimensional data down to a low-dimensional manifold is discovered. Experimental results show that on synthetic…
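A hedged sketch of the Markov-transition-matrix construction the abstract describes, in the style of diffusion maps: Gaussian affinities give similar points higher transition probabilities, and the leading non-trivial eigenvectors supply low-dimensional coordinates. The kernel width and the eigenvector-based embedding are assumptions, since the abstract is truncated.

```python
import numpy as np
from scipy.spatial.distance import cdist

def markov_embedding(X, n_components=2, sigma=1.0):
    # Gaussian affinities: more similar points get larger weights ...
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))
    # ... and row-normalizing turns them into transition probabilities.
    P = W / W.sum(axis=1, keepdims=True)
    # The leading eigenvector of P is trivial (constant), so the
    # embedding uses the next n_components eigenvectors.
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    return vecs.real[:, order[1:n_components + 1]]

X = np.random.rand(200, 10)
print(markov_embedding(X).shape)  # (200, 2)
```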

8.
Algorithms on streaming data have attracted increasing attention in the past decade. Among them, dimensionality reduction algorithms are of particular interest because of the demands of real tasks. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two of the most widely used dimensionality reduction approaches. However, PCA is not optimal for general classification problems because it is unsupervised and ignores label information that is valuable for classification. On the other hand, the performance of LDA degrades when the available low-dimensional space is limited and when the singularity problem arises. Recently, the Maximum Margin Criterion (MMC) was proposed to overcome the shortcomings of PCA and LDA. Nevertheless, the original MMC algorithm does not fit the streaming data model and therefore cannot handle large-scale high-dimensional data sets, so an effective, efficient and scalable approach is needed. In this paper, we propose a supervised incremental dimensionality reduction algorithm, and an extension of it, that infer adaptive low-dimensional spaces by optimizing the maximum margin criterion. Experimental results on a synthetic dataset and real datasets demonstrate the superior performance of the proposed algorithm on streaming data.

9.
The rapid development of information technology has fundamentally changed the information field: feature representations keep expanding in scope and meaning, and high-dimensional features appear in large numbers. These high-dimensional features may contain many irrelevant and redundant features, creating a curse of dimensionality and placing higher demands on classification and recognition algorithms that rely on how features cluster and scatter in feature space; feature selection algorithms are needed to reduce the feature dimension and remove the interference of data noise. Targeting the curse of dimensionality introduced by high-dimensional feature vectors, and focusing on the concrete application of target classification and recognition, this paper starts from the standard sequential floating forward selection (SFFS) algorithm, optimizes the number of cross-validation repetitions, and proposes an improved feature selection algorithm. Simulation experiments show that when recognition is performed with a Bayesian classifier, the improved algorithm clearly speeds up feature selection while preserving classification accuracy, and maintains a comparatively more convergent and stable confidence interval with good accuracy.
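A sketch of the baseline the paper starts from: standard SFFS with cross-validation, scored by a Gaussian naive Bayes classifier, via mlxtend. The number of selected features and folds are assumptions, and the paper's optimization of cross-validation repetitions is not reproduced.

```python
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
sffs = SequentialFeatureSelector(
    GaussianNB(),
    k_features=10,
    forward=True,
    floating=True,   # "floating" adds SFFS's conditional exclusion step
    scoring="accuracy",
    cv=5,
)
sffs = sffs.fit(X, y)
print(sffs.k_feature_idx_, round(sffs.k_score_, 3))
```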

10.
“Kernel logistic PLS” (KL-PLS) is a new tool for supervised nonlinear dimensionality reduction and binary classification. The principles of KL-PLS are based on both PLS latent variable construction and learning with kernels. The KL-PLS algorithm can be seen as a supervised dimensionality reduction (the complexity control step) followed by classification based on logistic regression. The algorithm is applied to 11 benchmark data sets for binary classification and to three medical problems. In all cases, KL-PLS proved competitive with other state-of-the-art classification methods such as support vector machines. Moreover, because it performs successive regressions and logistic regressions on only a small number of uncorrelated variables, KL-PLS can handle high-dimensional data. The proposed approach is simple and easy to implement. It provides efficient complexity control through dimensionality reduction and allows visual inspection of the data segmentation.
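A hedged sketch of the KL-PLS structure: a kernel matrix feeds PLS latent-variable construction (the complexity control step), and logistic regression classifies the few uncorrelated latent variables. The RBF kernel, its gamma, and the component count are assumptions, not the paper's settings.

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

K_tr = rbf_kernel(X_tr, X_tr, gamma=1e-4)   # kernel between training points
K_te = rbf_kernel(X_te, X_tr, gamma=1e-4)   # test rows against training points

pls = PLSRegression(n_components=5)          # complexity control step
T_tr = pls.fit(K_tr, y_tr).transform(K_tr)   # few uncorrelated latent variables
T_te = pls.transform(K_te)

clf = LogisticRegression().fit(T_tr, y_tr)   # classification step
print("test accuracy:", round(clf.score(T_te, y_te), 3))
```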

11.
When text classification represents text features with the vector space model (VSM), the feature vectors tend to be high-dimensional and sparse. To simplify the original text feature vectors effectively, this paper proposes a dimensionality reduction method that uses particle swarm optimization (PSO) to optimize independent component analysis (ICA), and applies it to text classification. The algorithm takes negentropy as the fitness function of the particle swarm and adaptively updates the demixing matrix using non-Gaussianity as the independence criterion. Experimental results show that, compared with traditional feature reduction methods, the method overcomes the difficulty of reducing high-dimensional text feature vectors and markedly improves the efficiency and accuracy of text classification.
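scikit-learn does not implement PSO-optimized ICA, so standard FastICA with its negentropy-based 'logcosh' contrast stands in here to show the shape of the ICA-on-text pipeline; the corpus, vocabulary size, and component count are assumptions.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import FastICA
from sklearn.feature_extraction.text import TfidfVectorizer

texts = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
X = TfidfVectorizer(max_features=2000).fit_transform(texts.data).toarray()

# 'logcosh' is FastICA's negentropy-approximation contrast function.
ica = FastICA(n_components=50, fun="logcosh", random_state=0)
X_low = ica.fit_transform(X)   # 2000-d sparse features -> 50 components
print(X_low.shape)
```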

12.
宋欣  叶世伟 《计算机工程》2008,34(8):205-207
Reducing the dimensionality of high-dimensional nonlinear data is essential if computers are to analyze highly complex data sources. From a topological point of view, dimensionality reduction amounts to uncovering the low-dimensional linear or nonlinear manifold embedded in the high-dimensional data. Building on manifold learning algorithms based on the local-embedding idea, this paper proposes a method that estimates gradient values directly, thereby minimizing the local linear approximation error and achieving dimensionality reduction of high-dimensional nonlinear data. Tests on samples from the Swiss roll surface show good reduction results.
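The paper's direct gradient-estimation variant is not publicly available; the following reproduces its experimental setting with standard locally linear embedding on Swiss roll samples, as a baseline sketch.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, color = make_swiss_roll(n_samples=1500, random_state=0)
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
X2 = lle.fit_transform(X)   # unroll the 3-d Swiss roll into 2 dimensions
print(X2.shape)             # (1500, 2)
```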

13.
Gene expression data are expected to be a significant aid in the development of efficient cancer diagnosis and classification platforms. However, gene expression data are high-dimensional, the number of samples is small in comparison to the dimensions of the data, and the data are inherently noisy. Therefore, to improve the accuracy of classifiers, we are better off reducing the dimensionality of the data. Two families of methods have previously been proposed for this: feature selection and dimensionality reduction. Feature selection is a feedback method that incorporates the classifier algorithm in the feature selection process. Dimensionality reduction refers to algorithms and techniques that create new attributes as combinations of the original attributes in order to reduce the dimensionality of a data set. In this article, we compared feature selection methods and dimensionality reduction methods and verified the effectiveness of both types. For feature selection we used one previously known method and three proposed methods, and for dimensionality reduction we used one previously known method and one proposed method. In an experiment on a benchmark data set, we confirmed the effectiveness of our proposed method for each type of dimensionality reduction.
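A hedged sketch contrasting the two families the article compares: univariate feature selection, which keeps original attributes, versus PCA, which creates new attributes as combinations of the originals. The classifier, data set, and feature counts are stand-ins, not the article's benchmark.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)

# Feature selection keeps 10 of the original attributes ...
select = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10),
                       LinearSVC(dual=False))
# ... while dimensionality reduction builds 10 new attributes as
# combinations of all original ones.
reduce_ = make_pipeline(StandardScaler(), PCA(n_components=10),
                        LinearSVC(dual=False))

print("feature selection:       ", cross_val_score(select, X, y, cv=5).mean())
print("dimensionality reduction:", cross_val_score(reduce_, X, y, cv=5).mean())
```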

14.
Geochemical anomaly recognition is an important basis for mineralization prediction. In essence, it is a classification problem on imbalanced data, and its main difficulty is handling high-dimensional data; manifold learning achieves the needed reduction through nonlinear dimensionality reduction. This paper proposes an anomaly recognition algorithm based on manifold learning: the dimensionality is reduced by manifold learning, and the AdaCost technique is incorporated to improve classification performance on imbalanced data. Simulation experiments on data from a tin-copper polymetallic deposit show that the algorithm delineates regional geochemical anomalies more accurately, providing a new approach to mineralization prediction and assessment.

15.
To address the drop in clustering accuracy that follows when principal component analysis (PCA) reduces high-dimensional data, this paper introduces a new notion of attribute space and, by combining attribute space with information entropy, builds a reduction criterion based on feature similarity, yielding a new reduction algorithm, EN-PCA. Because the reduced features are linear combinations of the original features, interpretability suffers and the input is not flexible enough; to address this, a sparse principal component algorithm based on ridge regression (ESPCA) is proposed. ESPCA takes the principal component reduction result as its input and obtains sparse results without iteration, increasing flexibility and solution speed. Finally, on the reduced data, the initialization, selection, crossover, and mutation operations of the genetic algorithm are improved to tackle its slow clustering convergence, giving a new clustering algorithm, GKA++. Experimental analysis shows that EN-PCA performs stably and that GKA++ performs well in clustering validity and efficiency.
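An illustrative sketch of the sparse-reduction-then-cluster pipeline: scikit-learn's SparsePCA (lasso-based) stands in for the ridge-regression-based ESPCA, and plain KMeans stands in for the improved genetic clustering GKA++; all settings are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import SparsePCA
from sklearn.metrics import adjusted_rand_score

X, y = load_digits(return_X_y=True)

spca = SparsePCA(n_components=10, alpha=1.0, random_state=0)
X_sparse = spca.fit_transform(X)
# Sparse loadings tie each component to few original attributes, which
# is what restores interpretability after reduction.
print("fraction of zero loadings:", (spca.components_ == 0).mean())

labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_sparse)
print("ARI vs. true labels:", round(adjusted_rand_score(y, labels), 3))
```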

16.
Support vector machines (SVM) have achieved great success in multi-class classification. However, as the dimension increases, irrelevant or redundant features may degrade the generalization performance of SVM classifiers, which makes dimensionality reduction (DR) indispensable for high-dimensional data. At present, most DR algorithms reduce all data points to the same dimension for multi-class datasets, or search for the local latent dimension of each class, but they neglect the fact that different class pairs also have different local latent dimensions. In this paper, we propose an adaptive class pairwise dimensionality reduction algorithm (ACPDR) to improve the generalization performance of multi-class SVM classifiers. In the proposed algorithm, different class pairs are reduced to different dimensions, and a tabu strategy is adopted to adaptively select a suitable embedding dimension. Five popular DR algorithms are employed in our experiment, and the numerical results on benchmark multi-class datasets show that, compared with traditional DR algorithms, the proposed ACPDR can improve the generalization performance of multi-class SVM classifiers; they also verify that it is reasonable to consider that different class pairs have different local dimensions.
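A minimal sketch of the class-pairwise idea under stated assumptions: each class pair gets its own reduction (PCA with a fixed per-pair dimension in place of the paper's tabu-search selection), a binary SVM is trained per pair, and prediction is one-vs-one voting.

```python
import numpy as np
from itertools import combinations
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
classes = np.unique(y_tr)

models = {}
for a, b in combinations(classes, 2):
    mask = (y_tr == a) | (y_tr == b)
    pca = PCA(n_components=8).fit(X_tr[mask])       # pair-specific subspace
    svm = SVC().fit(pca.transform(X_tr[mask]), y_tr[mask])
    models[(a, b)] = (pca, svm)

# One-vs-one voting over all pairwise classifiers.
votes = np.zeros((len(X_te), classes.max() + 1))
for (a, b), (pca, svm) in models.items():
    pred = svm.predict(pca.transform(X_te))
    for cls in (a, b):
        votes[:, cls] += (pred == cls)
print("accuracy:", (votes.argmax(axis=1) == y_te).mean())
```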

17.
Dimensionality reduction of rotor fault data sets by fusing global and local discriminant information   Total citations: 1 (self: 0, others: 1)
Traditional dimensionality reduction methods cannot preserve global feature information and local discriminant information at the same time. To address this, a reduction method for rotor fault data sets is proposed that combines kernel principal component analysis (KPCA) with orthogonal locality sensitive discriminant analysis (OLSDA). The KPCA stage first reduces the correlations in the data set and removes redundant attributes, preserving the global nonlinear information of the original data to the greatest extent; the OLSDA stage then fully mines the local manifold structure of the data, extracting low-dimensional essential features with high discriminative power. A feature of the method is that the simultaneous orthogonalization avoids distorting the local subspace structure. Three-dimensional plots display the low-dimensional results intuitively, and the recognition rate of a K-nearest neighbor (KNN) classifier on the low-dimensional feature subset, together with the between-class scatter Sb and within-class scatter Sw from cluster analysis, serve as indicators of the reduction quality. Experiments show that the method extracts global and local discriminant information comprehensively, makes the fault classes more clearly separated, and correspondingly improves recognition accuracy markedly. The work provides a theoretical reference for visualizing and classifying high-dimensional, nonlinear mechanical fault data sets.
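scikit-learn provides no OLSDA, so the sketch below keeps the two-stage layout with KPCA for the global nonlinear stage and plain LDA as a hedged stand-in for the local discriminant stage, scored by a KNN classifier as in the abstract; the wine data set and all hyperparameters are assumptions.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pipe = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=8, kernel="rbf", gamma=0.05),  # global stage
    LinearDiscriminantAnalysis(n_components=2),           # discriminant stage
    KNeighborsClassifier(n_neighbors=5),
)
pipe.fit(X_tr, y_tr)
print("KNN recognition rate:", round(pipe.score(X_te, y_te), 3))
```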

18.

One of the most efficient ways to understand complex data is to visualize them in two- or three-dimensional space. As meaningful data are likely to be high-dimensional, visualizing them requires dimensional reduction algorithms, whose objective is to map high-dimensional data into a low-dimensional space while preserving some of their underlying structure. For labeled data, the low-dimensional representations should embed their classifiability so that their class structure becomes visible. It is also beneficial if an algorithm can classify labeled input while at the same time performing dimensional reduction, visually offering information about the data's structure that gives a rationale behind the classification. However, most currently available dimensional reduction methods are not equipped with classification features, while most classification algorithms lack transparency in rationalizing their decisions. In this paper, the restricted radial basis function network (rRBF), a recently proposed supervised neural network with a low-dimensional internal representation, is utilized for visualizing high-dimensional data while also performing classification. The primary focus of this paper is to empirically explain the classifiability and visual transparency of the rRBF.

19.
The isometric mapping algorithm (ISOMAP) is a classic nonlinear manifold dimensionality reduction algorithm: it reduces dimensionality while preserving, as far as possible, the correspondence between geodesic distances in the high-dimensional data and spatial distances in the low-dimensional data. ISOMAP is easily affected by noise, however, so the reduced data may fail to preserve the high-dimensional topology. To address this problem, an isometric mapping algorithm based on optimal density directions (ODD-ISOMAP) is proposed. By screening the natural neighbors of the data, the algorithm determines, for each…
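ODD-ISOMAP itself is not publicly available; for reference, the baseline it modifies, standard ISOMAP, looks as follows in scikit-learn, here on a noisy S-curve to echo the noise-sensitivity discussion.

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

X, _ = make_s_curve(n_samples=1000, noise=0.05, random_state=0)
iso = Isomap(n_neighbors=10, n_components=2)  # geodesic-preserving embedding
X2 = iso.fit_transform(X)
print(X2.shape)  # (1000, 2)
```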

20.
To address the large feature dimensionality and low detection efficiency in malicious Android application detection, this paper combines the strong feature extraction and dimension reduction ability of convolutional neural networks (CNN) with the advantage that the catboost algorithm can produce good classification results without extensive training data, building a hybrid CNN-catboost Android malware detection model. Static features of Android applications, including permissions, API packages, components, intents, hardware features, and OpCode features, are obtained by reverse engineering and mapped into feature vectors. In the feature processing layer, convolution kernels perform local perception on the features to strengthen the signal; max pooling then downsamples the processed features, reducing dimensionality while keeping the feature properties unchanged. The processed features serve as the input vector of the catboost classification layer, and the global search ability of a genetic algorithm is used to tune the catboost model's parameters, further improving classification accuracy. The trained model is tested on data sets of both known and unknown types of Android applications. Experimental results show that tuning the CNN-catboost model takes little time and that the model performs well in prediction accuracy and detection efficiency.
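A minimal sketch of the hybrid layout, assuming random placeholder data in place of the reverse-engineered static features: a small Conv1D plus max-pooling stack performs the local perception and downsampling, and CatBoost classifies the pooled output. The feature width and filter counts are invented, the convolutional stage is left untrained for brevity, and the genetic-algorithm parameter tuning is omitted.

```python
import numpy as np
from catboost import CatBoostClassifier
from tensorflow.keras import layers, models

n_samples, n_features = 1000, 512           # stand-ins for the real data set
X = np.random.rand(n_samples, n_features, 1).astype("float32")
y = np.random.randint(0, 2, n_samples)      # 0 = benign, 1 = malicious

# Feature processing layer: local perception + downsampling, no classifier head.
extractor = models.Sequential([
    layers.Input(shape=(n_features, 1)),
    layers.Conv1D(16, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(pool_size=4),
    layers.Flatten(),
])
X_cnn = extractor.predict(X)                # reduced feature vectors

# catboost classification layer on the CNN-processed features.
clf = CatBoostClassifier(iterations=200, verbose=False)
clf.fit(X_cnn, y)
print("train accuracy:", clf.score(X_cnn, y))
```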
