Similar Literature
20 similar documents retrieved.
1.
A Text Classification Method Based on Semi-Supervised Locally Linear Embedding   (cited 3 times: 0 self-citations, 3 by others)
To address the shortcomings of the locally linear embedding (LLE) algorithm when applied to unsupervised machine learning, this paper combines the algorithm with semi-supervised ideas and proposes a text classification method based on semi-supervised locally linear embedding. Using the manifold structure of the text data and a small number of labeled samples, the distance matrix in LLE is adjusted in a piecewise fashion, and the adjusted matrix is used for linear reconstruction to achieve dimensionality reduction. To overcome the drawback of using Euclidean distance in semi-supervised LLE, a Gaussian kernel function is used to transform the Euclidean distance, and the new kernel-induced distance replaces it, yielding a kernel-based semi-supervised locally linear embedding algorithm. Finally, simulation experiments verify the effectiveness of the improved algorithm.
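As a rough illustration of two of the ideas above (not the paper's exact procedure), the Python sketch below replaces Euclidean distances with a Gaussian-kernel-induced distance and then adjusts the distance matrix piecewise using a handful of labels; the parameter values sigma and alpha are illustrative assumptions.

import numpy as np

def kernel_distance_matrix(X, sigma=1.0):
    # Kernel-induced distance for the Gaussian kernel:
    # d_K(x, y)^2 = K(x,x) + K(y,y) - 2*K(x,y) = 2 - 2*exp(-||x - y||^2 / (2*sigma^2)).
    sq = np.sum(X**2, axis=1)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-sq_dists / (2.0 * sigma**2))
    return np.sqrt(np.maximum(2.0 - 2.0 * K, 0.0))

def adjust_with_labels(D, y, alpha=0.5):
    # Piecewise adjustment: shrink distances between same-label pairs and
    # stretch distances between differently labeled pairs; y == -1 marks
    # unlabeled samples, which are left untouched.
    D = D.copy()
    dmax = D.max()
    labeled = np.where(y >= 0)[0]
    for i in labeled:
        for j in labeled:
            if i != j:
                if y[i] == y[j]:
                    D[i, j] *= alpha
                else:
                    D[i, j] += (1.0 - alpha) * dmax
    return D

# Toy usage: six documents as dense feature vectors, two of them labeled.
X = np.random.RandomState(0).rand(6, 10)
y = np.array([0, 1, -1, -1, -1, -1])
D = adjust_with_labels(kernel_distance_matrix(X), y)
print(D.shape)  # the adjusted matrix would drive the LLE neighborhood search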

2.
Recent advances in the field of kernel-based machine learning methods allow fast processing of text using string kernels built on suffix arrays. kernlab provides both the infrastructure for kernel methods and a large collection of already implemented algorithms, including an implementation of suffix-array-based string kernels. Together with the text mining infrastructure provided by tm, these packages give R the functionality to process, visualize, and group large collections of text data using kernel methods. The emphasis is on the performance of various types of string kernels at these tasks.
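kernlab and tm are R packages; purely as a hedged, language-agnostic illustration of what a string kernel computes, the Python sketch below implements a naive k-spectrum kernel (a dot product over k-mer counts). kernlab's suffix-array implementation is far more efficient, and nothing here reflects its actual API.

from collections import Counter

def spectrum_features(s: str, k: int = 3) -> Counter:
    # Count all contiguous substrings (k-mers) of length k.
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(s: str, t: str, k: int = 3) -> int:
    fs, ft = spectrum_features(s, k), spectrum_features(t, k)
    # Dot product in the (sparse) k-mer count space.
    return sum(c * ft.get(kmer, 0) for kmer, c in fs.items())

docs = ["the cat sat on the mat", "the dog sat on the log", "kernel methods"]
gram = [[spectrum_kernel(a, b) for b in docs] for a in docs]
print(gram)  # a small Gram matrix that could feed any kernel method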

3.
In machine learning and statistics, kernel density estimators are rarely used on multivariate data because of the difficulty of finding an appropriate kernel bandwidth that avoids overfitting. However, recent advances in information-theoretic learning have revived interest in these models. With this motivation, in this paper we revisit the classical statistical problem of data-driven bandwidth selection by cross-validation maximum likelihood for Gaussian kernels. We find a solution to the optimization problem under both the spherical case and the general case in which a full covariance matrix is considered for the kernel. The fixed-point algorithms proposed in this paper obtain the maximum likelihood bandwidth in a few iterations, without performing an exhaustive bandwidth search, which is infeasible in the multivariate case. The convergence of the proposed methods is proved. A set of classification experiments is performed to demonstrate the usefulness of the obtained models in pattern recognition.
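A minimal sketch of the objective involved, under the spherical-kernel assumption: the leave-one-out cross-validation log-likelihood of a Gaussian kernel density estimator, maximized here by a plain grid search rather than the fixed-point iterations derived in the paper.

import numpy as np

def loo_log_likelihood(X: np.ndarray, h: float) -> float:
    # Sum over i of log( (1/(n-1)) * sum_{j != i} N(x_i; x_j, h^2 I) ).
    n, d = X.shape
    sq = np.sum(X**2, axis=1)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    log_norm = -0.5 * d * np.log(2.0 * np.pi * h**2)
    log_k = log_norm - sq_dists / (2.0 * h**2)
    np.fill_diagonal(log_k, -np.inf)            # leave each point out
    m = log_k.max(axis=1, keepdims=True)        # log-sum-exp over the other points
    loo = m.squeeze() + np.log(np.exp(log_k - m).sum(axis=1)) - np.log(n - 1)
    return float(loo.sum())

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
grid = np.logspace(-1, 1, 40)
best_h = max(grid, key=lambda h: loo_log_likelihood(X, h))
print("ML bandwidth on this toy data:", round(best_h, 3))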

4.
Distance metric learning is important for measuring the similarity (or dissimilarity) of two instances in many pattern recognition algorithms. Although many linear Mahalanobis metric learning methods can be extended to kernelized versions for dealing with nonlinearly structured data, choosing the proper kernel and determining its parameters are still difficult problems. Furthermore, a metric embedded with a single kernel is not suited to problems with multi-view feature representations. In this paper, we address the problem of metric learning with multiple kernel embeddings. By analyzing existing formulations of metric learning with multiple-kernel embedding, we propose a new framework that jointly learns multiple metrics and the corresponding weights. The objective function can be shown to be convex, and it can be converted into a multiple kernel learning support vector machine problem, which can be solved by existing methods. Experiments on single-view and multi-view data show the effectiveness of our method.

5.
In this paper, we propose a novel learning algorithm, named SABC-MKELM, based on the kernel extreme learning machine (KELM) method for single-hidden-layer feedforward networks. In SABC-MKELM, a combination of Gaussian kernels is used as the activation function of the KELM instead of a single fixed kernel, and the kernel parameters and kernel weights are optimized simultaneously by a novel self-adaptive artificial bee colony (SABC) approach. SABC-MKELM generally outperforms six other state-of-the-art approaches, as the SABC search can effectively determine solution-updating strategies and suitable parameters to produce a flexible kernel function. Simulations demonstrate that the proposed algorithm not only self-adaptively determines suitable parameters and solution-updating strategies by learning from previous experience, but also achieves better generalization performance than several related methods, with good stability.
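The kernel extreme learning machine part can be sketched compactly; the snippet below plugs a fixed weighted combination of Gaussian kernels into the standard KELM solution beta = (I/C + K)^{-1} T. The self-adaptive artificial bee colony search that SABC-MKELM uses to tune the kernel widths, kernel weights, and update strategies is not reproduced, and all numeric values are illustrative assumptions.

import numpy as np

def mixed_gaussian_kernel(A, B, sigmas=(0.5, 1.0, 2.0), weights=(0.3, 0.4, 0.3)):
    # Weighted sum of Gaussian kernels at several widths (weights assumed fixed here).
    sq = np.maximum(np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T, 0.0)
    return sum(w * np.exp(-sq / (2.0 * s**2)) for s, w in zip(sigmas, weights))

class KELM:
    def __init__(self, C=10.0):
        self.C = C
    def fit(self, X, T):                      # T: one-hot targets, shape (N, classes)
        self.X = X
        K = mixed_gaussian_kernel(X, X)
        self.beta = np.linalg.solve(np.eye(len(X)) / self.C + K, T)
        return self
    def predict(self, Xnew):
        return mixed_gaussian_kernel(Xnew, self.X) @ self.beta

# Toy usage: two Gaussian blobs, one-hot encoded labels.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(30, 2), rng.randn(30, 2) + 3.0])
T = np.zeros((60, 2)); T[:30, 0] = 1; T[30:, 1] = 1
pred = KELM().fit(X, T).predict(X).argmax(axis=1)
print("training accuracy:", (pred == np.r_[np.zeros(30), np.ones(30)]).mean())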

6.
In the real world all events are connected: a hidden network of dependencies governs the behavior of natural processes. Of all the known data structures, graphs are naturally suited to modeling such information. Learning from graph data is demanding, however, because most operations on graphs are computationally expensive, so exploring fast machine learning techniques for graph data has been an active area of research, and the family of kernel-based approaches has been popular among machine learning researchers. Kernel-based methods pair naturally with learners such as support vector machines and Gaussian processes. In this survey we explore various kernels that operate on graph representations. Starting from the basics of kernel-based learning, we trace the history of graph kernels from their first appearance to a discussion of the current state-of-the-art techniques in practice.
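As a minimal, entry-level example of the kernel family such surveys cover, the sketch below computes a vertex-label histogram kernel, which compares two labeled graphs by the dot product of their node-label counts; walk, subtree, and Weisfeiler-Lehman kernels refine this basic idea.

from collections import Counter

def label_histogram_kernel(labels_g1, labels_g2):
    # labels_gX: iterable of node labels of one graph.
    h1, h2 = Counter(labels_g1), Counter(labels_g2)
    return sum(c * h2.get(lbl, 0) for lbl, c in h1.items())

# Two toy molecular-style graphs described only by their node labels.
g1 = ["C", "C", "O", "H", "H"]
g2 = ["C", "O", "O", "H"]
print(label_histogram_kernel(g1, g2))  # 2*1 (C) + 1*2 (O) + 2*1 (H) = 6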

7.
Multiple Kernel Learning Methods   (cited 56 times: 5 self-citations, 51 by others)
Multiple kernel learning is currently a hot topic in kernel machine learning. Kernel methods are an effective way to tackle nonlinear pattern analysis problems, but in some complex situations a kernel machine built on a single kernel function cannot meet practical requirements such as heterogeneous or irregular data, very large sample sizes, or unevenly distributed samples; combining multiple kernel functions to obtain better results is therefore a natural choice. Based on how multiple kernels are constructed, this paper systematically reviews the construction theory of multiple kernel methods from three perspectives: composite kernels, multi-scale kernels, and infinite kernels. It analyzes the characteristics and shortcomings of representative multiple kernel learning methods, summarizes their respective application domains, and distills directions for further research.
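A hedged sketch of the simplest composite-kernel construction reviewed here: a convex combination K = sum_m mu_m K_m of Gaussian kernels at several scales, passed to an SVM as a precomputed Gram matrix. The weights below are fixed by hand; multiple kernel learning methods differ precisely in how such weights are learned.

import numpy as np
from sklearn.svm import SVC

def rbf(A, B, sigma):
    sq = np.maximum(np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T, 0.0)
    return np.exp(-sq / (2.0 * sigma**2))

def composite_kernel(A, B, sigmas=(0.5, 1.0, 4.0), mu=(0.2, 0.5, 0.3)):
    # Convex combination: mu >= 0 and sum(mu) == 1 keeps the result positive definite.
    assert abs(sum(mu) - 1.0) < 1e-9 and all(m >= 0 for m in mu)
    return sum(m * rbf(A, B, s) for m, s in zip(mu, sigmas))

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(40, 2), rng.randn(40, 2) + 2.5])
y = np.r_[np.zeros(40), np.ones(40)]
clf = SVC(kernel="precomputed").fit(composite_kernel(X, X), y)
print("training accuracy:", clf.score(composite_kernel(X, X), y))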

8.
Kernel methods have been widely applied in machine learning to solve complex nonlinear problems. Kernel selection is one of the key issues in kernel methods, since it is vital for improving generalization performance. Traditionally, kernels are restricted to be positive definite, which partially limits their applicability. In many real applications, such as gene identification and object recognition, indefinite kernels frequently emerge and can achieve better performance. However, compared to positive definite ones, indefinite kernels are more complicated because the subsequent optimization problems are non-convex, which renders most existing kernel algorithms inapplicable. Some indefinite kernel methods have been proposed based on the dual of the support vector machine (SVM); they mostly focus on how to make the non-convex optimization convex by approximating indefinite kernels with positive definite ones. In fact, a duality gap usually exists in the SVM with indefinite kernels, so these algorithms do not truly solve the indefinite kernel problem itself. In this paper, we present a novel framework for indefinite kernel learning derived directly from the primal of the SVM, which establishes several new models not only for a single indefinite kernel but also for multiple indefinite kernel scenarios. Several algorithms are developed to handle the non-convex optimization problems in these models. We further provide a constructive approach for kernel selection in the algorithms using the theory of similarity functions. Experiments on real-world datasets demonstrate the superiority of our models.

9.
The transfer functions in the hidden layer of radial basis function neural networks (RBFNN) are Gaussian functions that respond locally around the kernel centers. In most existing research, the local spatial response of a sample is computed inaccurately because every kernel is shaped as a hypersphere and the kernel parameters are set by experience, so the fine structure of the local space is ignored during feature extraction. In addition, it is difficult to obtain strong feature extraction ability at low computational cost. This paper therefore develops a multi-scale RBF kernel learning algorithm and proposes a new multi-layer RBF neural network model. For the samples of each class, the expectation-maximization (EM) algorithm is used to obtain multi-layer nested sub-distribution models with different local response ranges, which serve as multi-scale kernels in the network. The prior probability of each sub-distribution is used as the connection weight between the multi-scale kernels, and feature extraction is implemented by multi-layer kernel subspace embedding. The multi-scale kernel learning model can efficiently and accurately describe the fine structure of the samples and is robust, to a certain extent, to the choice of the number of kernels. Taking the prior probability of each kernel as its weight makes the feature extraction process satisfy the Bayes rule, which enhances the interpretability of feature extraction in the network. The paper also proves theoretically that the proposed neural network is a generalization of the original RBFNN. Experimental results show that the proposed method performs better than several state-of-the-art algorithms.

10.
Gaussian Process Classification with a Semi-Supervised Kernel   (cited 1 time: 0 self-citations, 1 by others)
A semi-supervised algorithm for learning Gaussian process classifiers is proposed, which feeds information from unlabeled data into the classifier through a non-parametric semi-supervised kernel. The algorithm consists of the following steps: 1) a kernel matrix that combines information from both labeled and unlabeled data is obtained via the spectral decomposition of the graph Laplacian; 2) a convex optimization method is used to learn the optimal weights of the kernel matrix's eigenvectors, building the non-parametric semi-supervised kernel; 3) the semi-supervised kernel is integrated into a Gaussian process model to construct the proposed semi-supervised learning algorithm. The main feature of the algorithm is that it applies a non-parametric semi-supervised kernel built from the entire data set to a Gaussian process model, which has an explicit probabilistic formulation, can conveniently model the uncertainty between data points, and can handle complex inference problems. Experimental results show that the algorithm is more reliable than other methods.
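A rough sketch of step 1) and a simplified stand-in for step 2): build a kNN graph over the labeled and unlabeled points, take the spectral decomposition of its graph Laplacian, and form a non-parametric kernel K = sum_i mu_i v_i v_i^T from the smoothest eigenvectors. The paper learns the weights mu_i by convex optimization; the decreasing heuristic below is purely an assumption for illustration.

import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_spectral_kernel(X, n_neighbors=5, n_eig=10, eps=1e-3):
    W = kneighbors_graph(X, n_neighbors, mode="connectivity", include_self=False)
    W = 0.5 * (W + W.T).toarray()                 # symmetrize the kNN graph
    d = W.sum(axis=1)
    L = np.diag(d) - W                            # unnormalized graph Laplacian
    eigval, eigvec = np.linalg.eigh(L)            # eigenvalues in ascending order
    mu = 1.0 / (eigval[:n_eig] + eps)             # heuristic: smooth directions get large weight
    V = eigvec[:, :n_eig]
    return (V * mu) @ V.T                         # K = sum_i mu_i v_i v_i^T

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(30, 2), rng.randn(30, 2) + 4.0])   # labeled + unlabeled pool
K = laplacian_spectral_kernel(X)
print(K.shape)   # this Gram matrix would replace the GP covariance over the data set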

11.
Kernels are functions designed to capture resemblance between data, and they are used in a wide range of machine learning techniques, including support vector machines (SVMs). In their standard versions, commonly used kernels such as the Gaussian kernel show reasonably good performance in many classification and recognition tasks in computer vision, bioinformatics, and text processing. In the particular task of object recognition, the main deficiency of standard kernels such as the convolution kernel lies in their failure to capture the right geometric structure of objects while remaining invariant. We focus in this paper on object recognition using a new type of kernel referred to as "context dependent." Objects, seen as constellations of interest points, are matched by minimizing an energy function mixing 1) a fidelity term which measures the quality of feature matching, 2) a neighborhood criterion which captures the object geometry, and 3) a regularization term. We show that the fixed point of this energy is a context-dependent kernel which is also positive definite. Experiments on object recognition show that when plugging our kernel into SVMs, we clearly outperform SVMs with context-free kernels.

12.
Many common machine learning methods, such as support vector machines or Gaussian process inference, make use of positive definite kernels, reproducing kernel Hilbert spaces, Gaussian processes, and regularization operators. In this work these objects are presented in a general, unifying framework and their interrelations are highlighted. With this in mind we then show how linear stochastic differential equation models can be incorporated naturally into the kernel framework; conversely, many kernel machines can be interpreted in terms of differential equations. We focus especially on ordinary differential equations, also known as dynamical systems, and show that standard kernel inference algorithms are equivalent to Kalman filter methods based on such models. In order not to cloud qualitative insights with heavy mathematical machinery, we restrict ourselves to finite domains, implying that differential equations are treated via their corresponding finite difference equations.

13.
Kernel methods provide high performance in a variety of machine learning tasks. However, their success depends heavily on selecting the right kernel function and properly setting its parameters. Several sets of kernel functions based on orthogonal polynomials have been proposed recently. Besides their good performance in terms of error rate, these kernel functions have only one parameter, chosen from a small set of integers, which greatly facilitates kernel selection. Two sets of orthogonal polynomial kernel functions, namely the triangularly modified Chebyshev kernels and the triangularly modified Legendre kernels, are proposed in this study. Furthermore, we compare the construction methods of several orthogonal polynomial kernels and highlight the similarities and differences among them. Experiments on 32 data sets are performed to illustrate and compare these kernel functions in classification and regression scenarios. In general, the orthogonal polynomial kernels differ in accuracy, and most of them can match commonly used kernels such as the polynomial kernel, the Gaussian kernel, and the wavelet kernel. Compared with these universal kernels, each orthogonal polynomial kernel has a single, easily optimized parameter, and they store statistically significantly fewer support vectors in support vector classification. The newly presented kernels obtain better generalization performance in both classification and regression tasks.

14.
Multi-scale kernel methods are currently a hot topic in kernel machine learning. The learning of multi-scale kernels usually suffers from drawbacks such as naively averaging the component kernels, long iterative training times, and empirically chosen combination coefficients. Based on a kernel-target alignment criterion, this paper proposes an adaptive sequential learning algorithm for multi-scale kernel methods that determines the multi-kernel weighting coefficients automatically and quickly. Experiments show that the method achieves better regression accuracy and classification rates than single-kernel support vector machines, with more stable function fitting and classification, which demonstrates the general applicability of the algorithm.
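A hedged sketch of the core idea rather than the paper's exact sequential scheme: score each Gaussian kernel scale by its kernel-target alignment A(K, yy^T) = <K, yy^T>_F / (||K||_F ||yy^T||_F) and use the normalized alignments as the multi-kernel weighting coefficients.

import numpy as np

def rbf(X, sigma):
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def kernel_target_alignment(K, y):
    # y is in {-1, +1}; np.linalg.norm gives the Frobenius norm for matrices.
    Y = np.outer(y, y)
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))

def alignment_weights(X, y, sigmas):
    scores = np.array([kernel_target_alignment(rbf(X, s), y) for s in sigmas])
    scores = np.maximum(scores, 0.0)            # drop negatively aligned scales
    return scores / scores.sum()

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(30, 2), rng.randn(30, 2) + 3.0])
y = np.r_[-np.ones(30), np.ones(30)]
sigmas = (0.25, 1.0, 4.0, 16.0)
w = alignment_weights(X, y, sigmas)
K = sum(wi * rbf(X, s) for wi, s in zip(w, sigmas))     # combined multi-scale kernel
print(dict(zip(sigmas, np.round(w, 3))))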

15.
We present new methods for fast Gaussian process (GP) inference in large-scale scenarios, including exact multi-class classification with label regression, hyperparameter optimization, and uncertainty prediction. In contrast to previous approaches, we use a full Gaussian process model without sparse approximation techniques. Our methods are based on exploiting generalized histogram intersection kernels and their fast kernel multiplications. We empirically validate the suitability of our techniques in a wide range of scenarios with tens of thousands of examples. Whereas plain GP models are intractable in these settings due to both memory consumption and computation time, our results show that exact inference can indeed be done efficiently. In consequence, we enable every important piece of the Gaussian process framework (learning, inference, hyperparameter optimization, variance estimation, and online learning) to be used in realistic scenarios with more than a handful of data points.
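A plain (and deliberately slow) sketch of the two ingredients named in the abstract: the histogram intersection kernel K(x, y) = sum_d min(x_d, y_d) and exact GP regression built on it. The paper's contribution is a set of fast kernel-multiplication techniques that avoid the cubic cost of the direct solve below; none of that machinery is reproduced here.

import numpy as np

def hik(A, B):
    # Pairwise histogram intersection: K[i, j] = sum_d min(A[i, d], B[j, d]).
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

def gp_regression_mean(X_train, y_train, X_test, noise=1e-2):
    K = hik(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return hik(X_test, X_train) @ alpha

rng = np.random.RandomState(0)
X = rng.rand(100, 16)                 # e.g. L1-normalized histogram features
X /= X.sum(axis=1, keepdims=True)
y = X[:, 0] - X[:, 1]                 # toy regression target
print(gp_regression_mean(X, y, X[:5]))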

16.
Traditional deep kernel extreme learning machine networks use only the top-layer features for classification, which makes the extracted features incomplete, and the kernel function of the fault-diagnosis classifier is often chosen inappropriately. To address these problems, a fault diagnosis method for marine diesel engines based on multi-layer feature representation and a multiple-kernel extreme learning machine is proposed. A deep extreme learning machine network is used to extract multi-layer features from the fault data; the features extracted at each layer are concatenated into a single feature vector with multiple attributes; and a multiple-kernel extreme learning machine classifier is then used to diagnose the diesel engine faults accurately. Experimental results on standard classification data sets and a simulated marine diesel engine fault data set show that, compared with other extreme learning machine algorithms, the method effectively improves the accuracy and stability of fault diagnosis and generalizes well, making it a more practical tool for diesel engine fault diagnosis.

17.
The kernel method has proved to be an effective machine learning tool in many fields. Support vector machines with different kernel functions may perform differently, as kernels fall into two types, local kernels and global kernels, so a composite kernel, which can bring more stable results and good precision in classification and regression, is a natural choice. To reduce the computational complexity of online modeling with kernel machines, an unbiased least squares support vector regression (LSSVR) model with a composite kernel is proposed. The bias term of the LSSVR is eliminated by modifying the form of the structural risk, which greatly simplifies the calculation of the regression coefficients. At the same time, introducing the composite kernel allows the model to adapt easily to the irregular variation of chaotic time series. To meet real-time requirements, an online learning algorithm based on Cholesky factorization is designed according to the structure of the extended kernel matrix. Experimental results indicate that the unbiased composite-kernel LSSVR is effective and suitable for online time series with both steep and smooth variations, as it tracks the dynamics of the series with good prediction precision, generalization, and stability. The algorithm also saves considerable computation time compared with methods that use matrix inversion, although it is slightly slower than variants that use a single kernel.
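A hedged sketch of the batch version only: least squares support vector regression without the bias term, using a composite kernel that mixes a local (RBF) and a global (polynomial) kernel. The paper's online update via Cholesky factorization of the growing kernel matrix is replaced here by a direct solve, and the mixing weight lam is an assumed value.

import numpy as np

def composite_kernel(A, B, lam=0.7, sigma=1.0, degree=2, c0=1.0):
    sq = np.maximum(np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T, 0.0)
    k_rbf = np.exp(-sq / (2.0 * sigma**2))       # local kernel
    k_poly = (A @ B.T + c0) ** degree            # global kernel
    return lam * k_rbf + (1.0 - lam) * k_poly

def lssvr_fit(X, y, gamma=100.0):
    K = composite_kernel(X, X)
    return np.linalg.solve(K + np.eye(len(X)) / gamma, y)    # alpha, no bias term

def lssvr_predict(alpha, X_train, X_new):
    return composite_kernel(X_new, X_train) @ alpha

# Toy quasi-periodic series: predict x_{t+1} from the previous 4 values.
t = np.arange(300); x = np.sin(0.2 * t) + 0.3 * np.sin(1.7 * t)
X = np.array([x[i:i + 4] for i in range(295)]); y = x[4:299]
alpha = lssvr_fit(X[:200], y[:200])
pred = lssvr_predict(alpha, X[:200], X[200:])
print("test RMSE:", np.sqrt(np.mean((pred - y[200:]) ** 2)))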

18.
The kernel method, especially kernel fusion, is widely used in social networks, computer vision, bioinformatics, and other applications. It deals effectively with nonlinear classification problems: linearly inseparable biological sequence data can be mapped from a low-dimensional to a high-dimensional space for more accurate separation, enabling kernel methods to predict the structure and function of sequences. The kernel method is therefore significant for solving bioinformatics problems. This review explains the various kernels applied in bioinformatics, which can help readers select proper kernels for different tasks. Massive amounts of biological sequence data arise in practical applications, and how to use machine learning to extract knowledge from them and to predict the structure and function of biological sequences has always been a central concern in bioinformatics. The kernel method has gradually become an important learning algorithm that is widely used in gene expression analysis and biological sequence prediction. This review focuses on the requirements of classification tasks on biological sequence data and studies kernel methods and optimization algorithms, including methods for constructing kernel matrices based on the characteristics of biological sequences and the kernel fusion methods available in the multiple kernel learning framework.

19.
In the last few years, applications of the support vector machine (SVM) have increased substantially due to its high generalization performance and its ability to model non-linear relationships. However, whether an SVM behaves well largely depends on the adopted kernel function. The most commonly used kernels include the linear and polynomial inner product functions and the radial basis function (RBF). Since the nature of the data is usually unknown, it is very difficult to make a proper choice among these kernels beforehand. Usually, more than one kernel is tried and the one which gives the best prediction performance is selected, at the cost of a very time-consuming optimization procedure. This paper presents a kernel function based on the Lorentzian function, which is well known in statistics. The presented kernel can properly deal with a large variety of mapping problems due to its flexibility. The applicability, suitability, performance, and robustness of the presented kernel are investigated on a bi-spiral benchmark data set as well as seven data sets from the UCI benchmark repository. The experimental results demonstrate that the presented kernel is robust, has stronger mapping ability than the standard kernel functions, and obtains better generalization performance. In general, the proposed kernel can serve as a generic alternative to the common linear, polynomial, and RBF kernels.
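A hedged sketch of a Lorentzian-shaped kernel, K(x, z) = 1 / (1 + ||x - z||^2 / sigma^2); the exact parameterization used in the paper may differ. It is plugged into an SVM through a precomputed Gram matrix on toy ring-shaped data, in the spirit of the bi-spiral benchmark mentioned above.

import numpy as np
from sklearn.svm import SVC

def lorentzian_kernel(A, B, sigma=1.0):
    sq = np.maximum(np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T, 0.0)
    return 1.0 / (1.0 + sq / sigma**2)

# Toy data: two noisy concentric rings that are not linearly separable.
rng = np.random.RandomState(0)
theta = rng.uniform(0, 2 * np.pi, 200)
r = np.r_[np.ones(100), 2.5 * np.ones(100)] + 0.1 * rng.randn(200)
X = np.c_[r * np.cos(theta), r * np.sin(theta)]
y = np.r_[np.zeros(100), np.ones(100)]
clf = SVC(kernel="precomputed", C=10.0).fit(lorentzian_kernel(X, X), y)
print("training accuracy:", clf.score(lorentzian_kernel(X, X), y))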

20.
For text with special structure, traditional text classification algorithms can no longer meet the requirements, so a text classification algorithm based on the multiple-instance learning framework is proposed. Each text is treated as a bag of instances, with the title and the body regarded as the bag's two instances; a multi-class support vector machine algorithm based on one-class classification is used to map the bags into a high-dimensional feature space; and a Gaussian kernel function is introduced to train the classifier and predict labels for unlabeled texts. Experimental results show that the algorithm achieves higher classification accuracy than traditional machine learning classification algorithms, offering a new perspective for text mining research on texts with special structure.
