Similar Literature
20 similar documents found (search time: 46 ms)
1.
Gaussian fields (GF) have recently received considerable attention for dimension reduction and semi-supervised classification. In this paper we show how the GF framework can be used for semi-supervised regression on high-dimensional data. We propose an active learning strategy based on entropy minimization and a maximum likelihood model selection method. Furthermore, we show how a recent generalization of the LLE algorithm for correspondence learning can be cast into the GF framework, which obviates the need to choose a representation dimensionality.
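As a concrete illustration of the GF machinery this abstract builds on, here is a minimal sketch of the harmonic-function solution on a Gaussian-field graph, which predicts real-valued targets for unlabeled points. The RBF affinity, bandwidth, and toy data are assumptions for the example; the paper's entropy-based active learning and model selection are not shown.

```python
import numpy as np

def gaussian_field_regression(X, y_labeled, labeled_idx, sigma=1.0):
    """Harmonic-function solution on a Gaussian-field graph.

    Predicts real-valued targets for unlabeled points by minimizing the
    graph energy sum_ij w_ij (f_i - f_j)^2 with f fixed on labeled nodes.
    """
    n = X.shape[0]
    # Dense RBF affinity graph (a k-NN graph is common for larger n).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W                      # graph Laplacian
    u = np.setdiff1d(np.arange(n), labeled_idx)    # unlabeled indices
    # Harmonic solution: f_u = -L_uu^{-1} L_ul f_l
    f_u = np.linalg.solve(L[np.ix_(u, u)],
                          -L[np.ix_(u, labeled_idx)] @ y_labeled)
    f = np.empty(n)
    f[labeled_idx] = y_labeled
    f[u] = f_u
    return f

# Toy usage: recover a smooth function from 5 labels on 100 points.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
lab = np.arange(0, 100, 20)
f = gaussian_field_regression(X, np.sin(X[lab, 0]), lab, sigma=0.5)
```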

2.
Dimension reduction methods are often applied in machine learning and data mining problems. Linear subspace methods are the most commonly used, such as principal component analysis (PCA), Fisher's linear discriminant analysis (FDA), and common spatial patterns (CSP). In this paper, we describe a novel feature extraction method for binary classification problems. Instead of finding linear subspaces, our method finds lower-dimensional affine subspaces satisfying a generalization of the Fukunaga–Koontz transform (FKT). The proposed method has a closed-form solution and can therefore be solved very efficiently. Under a normality assumption, our method can be seen as finding an optimal truncated spectrum of the Kullback–Leibler divergence, and we show that FDA and CSP are special cases of our proposed method. Experiments on simulated data show that our method performs better than PCA and FDA on data distributed on two cylinders, even when one lies inside the other. We also show that, on several real data sets, our method provides statistically significant improvements in test set accuracy over FDA, CSP and FKT. The proposed method can therefore serve as another preliminary data exploration tool for machine learning and data mining problems.
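The abstract does not reproduce the transform itself, so for background here is a compact numpy sketch of the classical (linear) Fukunaga–Koontz transform that the paper generalizes to affine subspaces; the toy data and the eps regularizer are assumptions.

```python
import numpy as np

def fkt(X1, X2, eps=1e-8):
    """Classical Fukunaga-Koontz transform for two classes.

    Whitens S1 + S2, then eigendecomposes the whitened class-1 covariance.
    Eigenvalues lie in [0, 1]; directions near 1 favor class 1, near 0
    favor class 2, so both ends of the spectrum are discriminative.
    """
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    d, V = np.linalg.eigh(S1 + S2)
    P = V / np.sqrt(d + eps)             # whitening transform for S1 + S2
    lam, U = np.linalg.eigh(P.T @ S1 @ P)
    W = P @ U                            # columns = FKT directions
    return lam, W

# Keep the k most discriminative directions from each end of the spectrum.
rng = np.random.default_rng(1)
lam, W = fkt(rng.normal(size=(200, 5)), rng.normal(scale=2.0, size=(200, 5)))
k = 2
W_sel = np.hstack([W[:, :k], W[:, -k:]])
```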

3.
Nonlinear canonical correlation discriminant analysis based on the kernel trick   (Cited by: 4; self-citations: 0; citations by others: 4)
Canonical correlation discriminant analysis applies classical canonical correlation analysis to discrimination problems and is an important class of feature extraction algorithms, but by its nature it can only extract linear features from the data. The kernel trick from statistical learning theory can generalize such linear feature extraction algorithms to nonlinear ones. This paper studies how to apply this principle to canonical correlation discriminant analysis, proposes a nonlinear canonical correlation discriminant analysis based on the kernel trick, and gives an adaptive learning algorithm for solving the resulting problem. Numerical experiments show that the kernel-based nonlinear method is more effective than classical canonical correlation discriminant analysis. In addition, the paper proves theoretically that the proposed method is equivalent to kernel Fisher discriminant analysis.
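The abstract gives no algorithmic detail, so below is a standard regularized kernel CCA solver in its symmetric generalized-eigenproblem form, which is one common way to realize kernelized canonical correlation; using a one-hot label kernel as the second view (as in the usage lines) yields a discriminant variant, consistent with the equivalence to kernel Fisher discriminant analysis noted above. The linear kernel and regularization constant are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def center_kernel(K):
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H

def kernel_cca(Kx, Ky, reg=1e-2, n_comp=2):
    """Regularized kernel CCA via a symmetric generalized eigenproblem."""
    n = Kx.shape[0]
    Kx, Ky = center_kernel(Kx), center_kernel(Ky)
    Z = np.zeros((n, n))
    A = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])
    Rx = (Kx + reg * np.eye(n)) @ (Kx + reg * np.eye(n))
    Ry = (Ky + reg * np.eye(n)) @ (Ky + reg * np.eye(n))
    B = np.block([[Rx, Z], [Z, Ry]])
    w, v = eigh(A, B)                      # solves A v = w B v
    order = np.argsort(w)[::-1][:n_comp]   # largest correlations first
    return w[order], v[:n, order], v[n:, order]

# Discriminant use: second view = one-hot labels, so Ky = Y @ Y.T.
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 4))
y = rng.integers(0, 2, 60)
Kx = X @ X.T                               # linear kernel for brevity
Y = np.eye(2)[y]
rho, alpha, beta = kernel_cca(Kx, Y @ Y.T)
```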

4.
Dimensionality reduction is an important and challenging task in machine learning and data mining. Feature selection and feature extraction are two commonly used techniques for decreasing the dimensionality of data and increasing the efficiency of learning algorithms. In particular, feature selection in the absence of class labels, namely unsupervised feature selection, is challenging and interesting. In this paper, we propose a new unsupervised feature selection criterion developed from the viewpoint of subspace learning, which is treated as a matrix factorization problem. The advantages of this work are four-fold. First, building on matrix factorization, a unified framework is established for feature selection, feature extraction and clustering. Second, an iterative update algorithm is provided via matrix factorization, which is an efficient technique for dealing with high-dimensional data. Third, an effective method for feature selection on numeric data is put forward that does not rely on a discretization step. Fourth, the new criterion provides a sound foundation for embedding kernel tricks into feature selection, and an algorithm based on kernel methods is also proposed. The algorithms are compared with four state-of-the-art feature selection methods on six publicly available datasets. Experimental results demonstrate that, in terms of clustering results, the two proposed algorithms outperform the others on almost all datasets considered.

5.
Dealing with high-dimensional data has always been a major problem in many pattern recognition and machine learning applications. The trace ratio criterion is applicable to many dimensionality reduction methods, as it directly reflects the Euclidean distances between data points within and between classes. In this paper, we analyze the trace ratio problem and propose a new efficient algorithm to find the optimal solution. Based on the proposed algorithm, we derive an orthogonally constrained semi-supervised learning framework. The new algorithm incorporates unlabeled data into the training procedure so that it preserves both the discriminative structure and the geometrical structure embedded in the original dataset. Under such a framework, many existing semi-supervised dimensionality reduction methods, such as SDA, Lap-LDA, SSDR and SSMMC, can be improved, and a corresponding kernel framework can be formulated for handling nonlinear problems. Theoretical analysis indicates that there are certain relationships between the linear and nonlinear methods. Finally, extensive simulations on synthetic and real-world datasets are presented to show the effectiveness of our algorithms. The results demonstrate that the proposed algorithm significantly outperforms other state-of-the-art algorithms.
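For reference, the widely used iterative solver for the trace ratio problem max_W tr(W^T A W)/tr(W^T B W) subject to W^T W = I looks roughly as follows; A and B are generic stand-ins for whatever between-class and within-class scatter matrices a particular method builds, not the paper's exact construction.

```python
import numpy as np

def trace_ratio(A, B, d, iters=50, tol=1e-10):
    """Iterative solver for max_W tr(W'AW)/tr(W'BW), with W'W = I.

    Alternates: W <- top-d eigenvectors of (A - lam*B), then
    lam <- tr(W'AW)/tr(W'BW). lam increases monotonically to the optimum.
    """
    lam = 0.0
    for _ in range(iters):
        w, V = np.linalg.eigh(A - lam * B)
        W = V[:, np.argsort(w)[::-1][:d]]      # top-d eigenvectors
        new_lam = np.trace(W.T @ A @ W) / np.trace(W.T @ B @ W)
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return W, lam

# Toy scatter matrices (in LDA, A and B would be S_b and S_w).
rng = np.random.default_rng(3)
M = rng.normal(size=(6, 6)); A = M @ M.T
N = rng.normal(size=(6, 6)); B = N @ N.T + np.eye(6)
W, lam = trace_ratio(A, B, d=2)
```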

6.
The emergence and growth of high-dimensional streaming big data pose many challenges to traditional machine learning and data mining algorithms. Exploiting the streaming arrival pattern of such data, this paper first builds an adaptive incremental feature extraction model. Then, for noisy environments, it builds an incremental manifold learning model based on feature space calibration to address the small-sample problem. Finally, it constructs a regularized optimization framework for manifold learning to address the dimensionality reduction error introduced during feature extraction from high-dimensional data streams, and obtains the final optimal solution. Experimental results show that the proposed framework satisfies three evaluation criteria for manifold learning algorithms: stability, improvement, and a learning curve that quickly rises to a relatively stable level, thereby achieving efficient learning on high-dimensional data streams.
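The abstract does not spell out the adaptive incremental algorithm; as a baseline illustration of incremental feature extraction on chunk-wise arriving data, here is scikit-learn's IncrementalPCA on a simulated stream (the chunk sizes and dimensions are arbitrary assumptions).

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Simulate a high-dimensional stream arriving in chunks.
rng = np.random.default_rng(4)
ipca = IncrementalPCA(n_components=10)
for _ in range(20):                       # 20 arriving chunks
    chunk = rng.normal(size=(128, 200))   # 128 samples x 200 features
    ipca.partial_fit(chunk)               # update the subspace in place

# Project newly arriving data with the current model.
new_chunk = rng.normal(size=(32, 200))
Z = ipca.transform(new_chunk)             # 32 x 10 reduced features
```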

7.
The behavioral and cognitive deficits of patients with autism are associated with underlying abnormalities in brain function. For the high-dimensional features of resting-state functional magnetic resonance imaging (fMRI), traditional linear feature extraction methods cannot adequately extract the informative content needed for classification. To this end, this paper proposes a novel unsupervised fuzzy feature mapping method for fMRI data and combines it with a multi-view support vector machine to build a classification model for computer-aided diagnosis of autism. The method first uses the rule-antecedent learning of a multi-output TSK fuzzy system to map the original features into a linearly separable high-dimensional space; it then introduces a manifold regularization learning framework and proposes a novel unsupervised fuzzy feature learning method that yields a nonlinear low-dimensional embedding of the output feature vectors; finally, a multi-view SVM is used for classification. Experimental results show that the method effectively extracts important features from resting-state fMRI data and improves model interpretability while maintaining superior and stable classification performance.

8.
Given that high-dimensional data tend to have a low-rank structure and redundant features, this paper proposes an unsupervised hypergraph feature selection algorithm based on feature self-representation. Specifically, the algorithm first exploits self-representation to express each feature sparsely in terms of the other features, using a low-rank assumption to find a low-rank representation of the high-dimensional data; it then builds a hypergraph regularization term to preserve the local structure of the data, and finally uses a sparsity regularization term to perform feature selection. Self-representation determines feature importance; the low-rank representation amounts to subspace learning with the data's global information, while the hypergraph regularizer performs subspace learning with its local structure. The algorithm thus performs subspace learning with both global and local information, making it a feature selection algorithm with embedded subspace learning. Experimental results show that, compared with competing algorithms, it selects features more effectively and achieves good classification performance.
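A heavily simplified sketch of the self-representation idea only: below, a ridge penalty stands in for the paper's low-rank and sparse penalties, and the hypergraph regularizer is omitted; features are then ranked by the row norms of the learned representation matrix.

```python
import numpy as np

def self_representation_scores(X, lam=1.0):
    """Score features by how strongly they help reconstruct the others.

    Solves the ridge self-representation min_Z ||X - X Z||^2 + lam ||Z||^2
    in closed form and ranks feature j by the l2 norm of row j of Z.
    """
    d = X.shape[1]
    G = X.T @ X
    Z = np.linalg.solve(G + lam * np.eye(d), G)
    return np.linalg.norm(Z, axis=1)      # row norms = importance scores

# Select the top-k features on toy data with redundant columns.
rng = np.random.default_rng(5)
base = rng.normal(size=(100, 4))
X = np.hstack([base, base[:, :2] + 0.01 * rng.normal(size=(100, 2))])
scores = self_representation_scores(X)
top_k = np.argsort(scores)[::-1][:4]
```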

9.
The rapid growth of the cloud computing industry has made virtualization technology central to major cloud service providers. To increase profits, providers need to exploit hardware performance as fully as possible while preserving the user experience. By exploiting information such as the priority and importance of I/O requests, researchers have implemented many techniques in the Linux kernel that improve application performance. However, this information is lost when passed from a virtual machine to the host, so this paper proposes an I/O guarantee framework based on service level objectives (SLOs). It first analyzes why information such as I/O request priority is lost and identifies the key problems that must be solved to propagate it. On this basis, the proposed framework extends the Linux kernel, the virtio protocol, and QEMU (KVM's I/O virtualization program) to deliver the SLO information of virtual machine threads to the host, and implements an SLO-aware scheduler on top of it. Finally, experiments verify the feasibility of the framework: the highest-priority thread achieved a throughput of 260 KB/s while the lowest-priority thread got only 10 KB/s, demonstrating that the SLO information propagated by the framework positively guides scheduling on the host.

10.
Despite significant successes achieved in knowledge discovery, traditional machine learning methods may fail to obtain satisfactory performance when dealing with complex data, such as imbalanced, high-dimensional, or noisy data. The reason is that it is difficult for these methods to capture multiple characteristics and the underlying structure of the data. In this context, how to effectively construct an efficient knowledge discovery and mining model has become an important topic in the data mining field. Ensemble learning, as one research hot spot, aims to integrate data fusion, data modeling, and data mining into a unified framework. Specifically, ensemble learning first extracts a set of features with a variety of transformations. Based on these learned features, multiple learning algorithms are used to produce weak predictive results. Finally, ensemble learning fuses the informative knowledge from these results via voting schemes in an adaptive way to achieve knowledge discovery and better predictive performance. In this paper, we review the research progress of the mainstream approaches to ensemble learning and classify them by their characteristics. In addition, we present challenges and possible research directions for each mainstream approach, and we also introduce combinations of ensemble learning with other machine learning hot spots such as deep learning and reinforcement learning.
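A minimal scikit-learn sketch of the extract/predict/fuse pipeline described above: heterogeneous base learners produce separate predictions that soft voting fuses into one. The particular base learners and dataset are illustrative choices, not those of any surveyed method.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Heterogeneous weak learners produce separate predictive results ...
base = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
]
# ... which soft voting fuses into one prediction.
ensemble = VotingClassifier(estimators=base, voting="soft")
print(cross_val_score(ensemble, X, y, cv=5).mean())
```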

11.
A semi-supervised graph-kernel dimensionality reduction method   (Cited by: 1; self-citations: 0; citations by others: 1)
Graph-based data representation and analysis are attracting increasing attention in machine learning. Previous work has mainly focused on defining a kernel function, i.e., a graph kernel, that measures the similarity between graphs; once a graph kernel is defined, a standard support vector machine (SVM) can classify graph data. This paper extends the graph kernel approach: kernel principal component analysis (kPCA) is first used to reduce the dimensionality of the data in the high-dimensional feature space induced by the graph kernel, yielding low-dimensional vector representations of the original graphs, which are then analyzed with conventional machine learning methods. By exploiting supervision in the form of pairwise constraints on the graph data within kPCA, a semi-supervised graph-kernel dimensionality reduction method is obtained. Experimental results on standard graph datasets such as MUTAG and PTC verify the effectiveness of the proposed method.
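A sketch of the unsupervised core of this pipeline: given a precomputed graph-kernel Gram matrix K, kernel PCA yields low-dimensional vector representations for downstream learning. The Gram matrix below is simulated rather than computed by an actual graph kernel, and the pairwise-constraint (semi-supervised) extension is not shown.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Stand-in for a graph-kernel Gram matrix over n graphs (in practice,
# K would come from a graph kernel such as Weisfeiler-Lehman).
rng = np.random.default_rng(6)
F = rng.normal(size=(50, 30))     # hidden "graph features", simulation only
K = F @ F.T                       # symmetric PSD Gram matrix

# kPCA on the precomputed kernel gives vector embeddings of the graphs.
kpca = KernelPCA(n_components=5, kernel="precomputed")
Z = kpca.fit_transform(K)         # 50 x 5 embedding for downstream SVM etc.
```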

12.
方伟  接中冰  陆恒杨  张涛 《控制与决策》2024,39(4):1160-1166
The covering salesman problem (CSP) is a variant of the traveling salesman problem with wide applications in disaster prevention planning and emergency management. Since traditional methods are very time-consuming on problem instances, deep neural networks have recently been proposed for this class of combinatorial optimization problems, with clear advantages in solution speed and generalization. Existing deep-neural-network methods for the CSP, however, produce solutions of low quality, with a large gap to traditional heuristics, especially on large-scale instances. To address this, a new deep reinforcement learning method for the CSP is proposed: an encoder encodes the input features, a new mask strategy constrains the decoder's self-attention-based solution construction, and a multi-start strategy improves training and solution quality. Experimental results show that, compared with existing deep-neural-network solvers, the proposed method further narrows the optimality gap with higher sample efficiency, generalizes better across CSP instances of different sizes and coverage types, and is 10 to 40 times faster than heuristic algorithms.

13.
Dealing with high-dimensional data has always been a major problem in pattern recognition and machine learning research, and linear discriminant analysis (LDA) is one of the most popular methods for dimensionality reduction. However, it is highly sensitive to outliers. To solve this problem, fuzzy memberships can be introduced to enhance the performance of algorithms by reducing the effects of outliers. In this paper, we analyze existing fuzzy strategies and propose a new, effective one based on Markov random walks. The new fuzzy strategy maintains high consistency between local and global discriminative information and preserves the statistical properties of the dataset. Based on the proposed fuzzy strategy, we then derive an efficient fuzzy LDA algorithm by incorporating the fuzzy memberships into learning. Theoretical analysis and extensive simulations show the effectiveness of our algorithm, and the presented results demonstrate that it achieves significantly improved results compared with other existing algorithms.
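A rough sketch of the overall recipe, with one substitution: the fuzzy memberships below come from a simple random-walk label smoothing on a k-NN graph, in the spirit of but not identical to the paper's Markov-random-walk strategy, and are then used to weight the LDA scatter matrices. All graph parameters are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def fuzzy_lda(X, y, n_comp=1, k=10, alpha=0.9, iters=50, reg=1e-6):
    """LDA with fuzzy class memberships from a random walk on a k-NN graph."""
    n, d = X.shape
    c = int(y.max()) + 1
    # k-NN affinity and row-stochastic transition matrix.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / np.median(sq))
    np.fill_diagonal(W, 0.0)
    keep = np.argsort(sq, axis=1)[:, 1:k + 1]
    mask = np.zeros_like(W)
    np.put_along_axis(mask, keep, 1.0, axis=1)
    W = W * np.maximum(mask, mask.T)
    P = W / W.sum(1, keepdims=True)
    # Random-walk smoothing of one-hot labels -> fuzzy memberships U (n x c).
    Y0 = np.eye(c)[y]
    U = Y0.copy()
    for _ in range(iters):
        U = alpha * P @ U + (1 - alpha) * Y0
    U = U / U.sum(1, keepdims=True)
    # Fuzzy between-class and within-class scatter matrices.
    mu = X.mean(0)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for j in range(c):
        w = U[:, j]
        m_j = (w[:, None] * X).sum(0) / w.sum()
        Sb += w.sum() * np.outer(m_j - mu, m_j - mu)
        Xc = X - m_j
        Sw += (w[:, None] * Xc).T @ Xc
    # Leading generalized eigenvectors of Sb v = lam Sw v.
    lam, V = eigh(Sb, Sw + reg * np.eye(d))
    return V[:, np.argsort(lam)[::-1][:n_comp]]

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2.5, 1, (50, 3))])
y = np.repeat([0, 1], 50)
W_proj = fuzzy_lda(X, y, n_comp=1)
```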

14.
Due to its fast learning speed, simplicity of implementation and minimal human intervention, the extreme learning machine has recently received considerable attention, mostly from the machine learning community. Generally, the extreme learning machine and its variants focus on classification and regression problems, and its potential for analyzing censored time-to-event data is yet to be verified. In this study, we present an extreme learning machine ensemble for modeling right-censored survival data by combining the Buckley-James transformation and the random forest framework. Experimental and statistical analysis results show that the proposed model outperforms popular survival models such as random survival forests and Cox proportional hazards models on well-known low-dimensional and high-dimensional benchmark datasets, in terms of both prediction accuracy and time efficiency.
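For orientation, here is the base learner only: a minimal single-hidden-layer extreme learning machine regressor with random hidden weights and a ridge-solved readout. The Buckley-James handling of censoring and the random-forest-style ensembling are not shown.

```python
import numpy as np

class ELMRegressor:
    """Minimal extreme learning machine: random hidden layer, ridge readout."""

    def __init__(self, n_hidden=100, reg=1e-3, seed=0):
        self.n_hidden, self.reg = n_hidden, reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)   # random nonlinear features

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights by ridge regression: the only "training" ELM does.
        self.beta = np.linalg.solve(
            H.T @ H + self.reg * np.eye(self.n_hidden), H.T @ y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

rng = np.random.default_rng(8)
X = rng.uniform(-3, 3, (300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)
model = ELMRegressor(n_hidden=50).fit(X, y)
y_hat = model.predict(X)
```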

15.
Domain adaptation learning (DAL) methods have shown promising results by utilizing labeled samples from the source (or auxiliary) domain(s) to learn a robust classifier for a target domain that has few or even no labeled samples. However, several key issues remain to be addressed in state-of-the-art DAL methods, such as sufficient and effective distribution discrepancy metric learning, effective kernel space learning, and transfer learning from multiple source domains. Aiming at these issues, we propose a unified kernel learning framework for domain adaptation learning, along with an effective extension based on the multiple kernel learning (MKL) schema, regularized by a new minimum distribution distance metric criterion that minimizes both the distribution mean discrepancy and the distribution scatter discrepancy between source and target domains; many existing kernel methods (such as the support vector machine (SVM), ν-SVM, and least-squares SVM) can be readily incorporated. Our framework, referred to as kernel learning for domain adaptation learning (KLDAL), simultaneously learns an optimal kernel space and a robust classifier by minimizing both the structural risk functional and the distribution discrepancy between different domains. Moreover, we extend KLDAL to a multiple kernel learning framework referred to as MKLDAL. Under the KLDAL or MKLDAL framework, we also propose three effective formulations: KLDAL-SVM or MKLDAL-SVM for the SVM, the variant ν-KLDAL-SVM or ν-MKLDAL-SVM for the ν-SVM, and KLDAL-LSSVM or MKLDAL-LSSVM for the least-squares SVM. Comprehensive experiments on real-world data sets verify the superior or comparable effectiveness of the proposed frameworks.
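The "distribution mean discrepancy" half of the proposed criterion corresponds to the standard kernel maximum mean discrepancy (MMD) between source and target samples; a biased RBF-kernel estimate of it looks as follows. The scatter-discrepancy term and the MKL machinery are not shown, and gamma is an assumption.

```python
import numpy as np

def rbf(A, B, gamma):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mmd2(Xs, Xt, gamma=0.5):
    """Biased estimate of the squared kernel MMD between source and target:
    || mean phi(Xs) - mean phi(Xt) ||^2 in the RKHS."""
    return (rbf(Xs, Xs, gamma).mean()
            - 2 * rbf(Xs, Xt, gamma).mean()
            + rbf(Xt, Xt, gamma).mean())

rng = np.random.default_rng(9)
src = rng.normal(0.0, 1.0, (100, 3))
tgt = rng.normal(0.5, 1.2, (120, 3))   # shifted target domain
print(mmd2(src, tgt))                  # > 0 when the domains differ
```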

16.
With the growing number of disclosed vulnerable code samples and the wide application of machine learning methods, machine-learning-based software vulnerability analysis has gradually become a hot research direction in information security. This survey first analyzes existing work and proposes a machine-learning-based framework for software vulnerability mining; it then categorizes and reviews existing work from a program analysis perspective; finally, it compares the research results, analyzes the challenges facing current machine-learning-based vulnerability analysis methods, and outlines future directions.

17.
A new formulation for multiway spectral clustering is proposed. The method corresponds to a weighted kernel principal component analysis (PCA) approach based on primal-dual least-squares support vector machine (LS-SVM) formulations. The formulation allows extension to out-of-sample points, so the proposed clustering model can be trained, validated, and tested. The clustering information is contained in the eigendecomposition of a modified similarity matrix derived from the data; this eigenvalue problem corresponds to the dual solution of a primal optimization problem formulated in a high-dimensional feature space. A model selection criterion called the Balanced Line Fit (BLF) is also proposed. This criterion is based on the out-of-sample extension and exploits the structure of the eigenvectors and the corresponding projections when the clusters are well formed; it can be used to obtain clustering parameters in a learning framework. Experimental results with difficult toy problems and image segmentation show improved performance in terms of generalization to new samples and computation time.
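Not the LS-SVM primal-dual formulation itself, but the closely related standard multiway spectral embedding it reinterprets can be sketched as follows: embed via the top eigenvectors of the degree-normalized similarity matrix, then cluster the rows. The kernel and bandwidth below are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_embed_cluster(K, k):
    """Multiway spectral clustering from a kernel/similarity matrix K:
    top-k eigenvectors of D^{-1/2} K D^{-1/2}, then k-means on the rows."""
    d = K.sum(1)
    Dk = K / np.sqrt(np.outer(d, d))       # normalized similarity
    w, V = np.linalg.eigh(Dk)
    E = V[:, np.argsort(w)[::-1][:k]]      # spectral embedding
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    return KMeans(n_clusters=k, n_init=10).fit_predict(E)

# Two blobs with an RBF similarity.
rng = np.random.default_rng(10)
X = np.vstack([rng.normal(0, .3, (40, 2)), rng.normal(2, .3, (40, 2))])
sq = ((X[:, None] - X[None]) ** 2).sum(-1)
labels = spectral_embed_cluster(np.exp(-sq / 0.5), k=2)
```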

18.
We present a framework for recognition of 3D objects by integrating 2D and 3D sensory data. The major thrust of this work is to efficiently utilize all relevant data, both 2D and 3D, in the early stages of recognition, in order to reduce the computational requirements of the recognition process. To achieve this goal, we formulate the problem as a constraint–satisfaction problem (CSP). Rather than directly solving the CSP, a problem of exponential complexity, we only enforce local consistency in low-order polynomial time. This step of local-consistency enforcement can significantly decrease the computational load on subsequent recognition modules by (1) significantly reducing the uncertainty in the correspondence between scene and model features and (2) eliminating many erroneous model objects and irrelevant scene features. A novel method is presented for efficiently constructing a CSP corresponding to a combination of 2D and 3D scene features. Performance of the proposed framework is demonstrated using simulated and real experiments involving visual (2D) and tactile (3D) data.
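The local-consistency enforcement step corresponds to classic arc consistency; a generic AC-3 sketch on binary constraints is shown below. The actual 2D/3D feature-matching constraints are not reproduced; the toy constraint merely forces distinct assignments.

```python
from collections import deque

def ac3(domains, constraints):
    """AC-3 arc consistency: prune domain values with no support.

    domains: dict var -> set of values.
    constraints: dict (x, y) -> predicate(vx, vy); keys appear in both
    directions. Returns False if some domain is wiped out, else True.
    """
    queue = deque(constraints.keys())
    while queue:
        x, y = queue.popleft()
        pred = constraints[(x, y)]
        # Remove values of x that have no supporting value of y.
        pruned = {vx for vx in domains[x]
                  if not any(pred(vx, vy) for vy in domains[y])}
        if pruned:
            domains[x] -= pruned
            if not domains[x]:
                return False
            # Revisit arcs (z, x) that depend on x's domain.
            queue.extend(arc for arc in constraints if arc[1] == x)
    return True

# Toy model-matching flavor: two scene features must map to distinct
# model features 1..3.
doms = {"s1": {1, 2, 3}, "s2": {1, 2, 3}}
cons = {("s1", "s2"): lambda a, b: a != b,
        ("s2", "s1"): lambda a, b: a != b}
print(ac3(doms, cons), doms)
```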

19.
High-dimensional data visualization is a more complex process than the ordinary dimensionality reduction to two or three dimensions. Therefore, we propose and evaluate a novel four-step visualization approach that is built upon the combination of three components: metric learning, intrinsic dimensionality estimation, and feature extraction. Although many successful applications of dimensionality reduction techniques for visualization are known, we believe that the sophisticated nature of high-dimensional data often needs a combination of several machine learning methods to solve the task. Here, this is provided by a novel framework and experiments with real-world data.

20.
Recent advances in remote sensing techniques allow for the collection of hyperspectral images with enhanced spatial and spectral resolution. In many applications, these images need to be processed and interpreted in real-time, since analysis results need to be obtained almost instantaneously. However, the large amount of data that these images comprise introduces significant processing challenges. This also complicates the analysis performed by traditional machine learning algorithms. To address this issue, dimensionality reduction techniques aim at reducing the complexity of data while retaining the relevant information for the analysis, removing noise and redundant information. In this paper, we present a new real-time method for dimensionality reduction and classification of hyperspectral images. The newly proposed method exploits artificial neural networks, which are used to develop a fast compressor based on the extreme learning machine. The obtained experimental results indicate that the proposed method has the ability to compress and classify high-dimensional images fast enough for practical use in real-time applications.
