首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Kernel machines are widely considered to be powerful tools in various fields of information science. By using a kernel, an unknown target is represented by a function that belongs to a reproducing kernel Hilbert space (RKHS) corresponding to the kernel. The application area is widened by enlarging the RKHS such that it includes a wide class of functions. In this study, we demonstrate a method to perform this by using parameter integration of a parameterized kernel. Some numerical experiments show that the unresolved problem of finding a good parameter can be neglected.  相似文献   

2.
Short-text classification is increasingly used in a wide range of applications. However, it still remains a challenging problem due to the insufficient nature of word occurrences in short-text documents, although some recently developed methods which exploit syntactic or semantic information have enhanced performance in short-text classification. The language-dependency problem, however, caused by the heavy use of grammatical tags and lexical databases, is considered the major drawback of the previous methods when they are applied to applications in diverse languages. In this article, we propose a novel kernel, called language independent semantic (LIS) kernel, which is able to effectively compute the similarity between short-text documents without using grammatical tags and lexical databases. From the experiment results on English and Korean datasets, it is shown that the LIS kernel has better performance than several existing kernels.  相似文献   

3.
核函数的性质及其构造方法   总被引:5,自引:0,他引:5  
王国胜 《计算机科学》2006,33(6):172-174
支持向量机是一项机器学习技术,发展至今近10年了,已经成功地用于模式识别、回归估计以及聚类等,并由此衍生出了核方法。支持向量机由核函数与训练集完全刻画。进一步提高支持向量机性能的关键,是针对给定的问题设计恰当的核函数,这就要求对核函数本身有深刻了解。本文首先分析了核函数的一些重要性质,接着对3类核函数,即平移不变核函数、旋转不变核函数和卷积核,提出了简单实用的判别准则。在此基础上,验证和构造了很多重要核函数。  相似文献   

4.
This paper discusses a method to estimate the expected value of the Gaussian kernel in the presence of incomplete data. We show how, under the general assumption of a missing-at-random mechanism, the expected value of the Gaussian kernel function has a simple closed-form solution. Such a solution depends only on the parameters of the Gamma distribution which is assumed to represent squared distances. Furthermore, we show how the parameters governing the Gamma distribution depend only on the non-central moments of the kernel arguments, via the second-order moments of their squared distance, and can be estimated by making use of any parametric density estimation model of the data distribution. We approximate the data distribution with the maximum likelihood estimate of a Gaussian mixture distribution. The validity of the method is empirically assessed, under a range of conditions, on synthetic and real problems and the results compared to existing methods. For comparison, we consider methods that indirectly estimate a Gaussian kernel function by either estimating squared distances or by imputing missing values and then computing distances. Based on the experimental results, the proposed method consistently proves itself an accurate technique that further extends the use of Gaussian kernels with incomplete data.  相似文献   

5.
This paper presents a new regularized kernel-based approach for the estimation of the second order moments of stationary stochastic processes. The proposed estimator is defined by a Tikhonov-type variational problem. It contains few unknown parameters which can be estimated by cross validation solving a sequence of problems whose computational complexity scales linearly with the number of noisy moments (derived from the samples of the process). The correlation functions are assumed to be summable and the hypothesis space is a reproducing kernel Hilbert space induced by the recently introduced stable spline kernel. In this way, information on the decay to zero of the functions to be reconstructed is incorporated in the estimation process. An application to the identification of transfer functions in the case of white noise as input is also presented. Numerical simulations show that the proposed method compares favorably with respect to standard nonparametric estimation algorithms that exploit an oracle-type tuning of the parameters.  相似文献   

6.
Polynomial Time Learnability of Simple Deterministic Languages   总被引:1,自引:0,他引:1  
Ishizaka  Hiroki 《Machine Learning》1990,5(2):151-164
This paper is concerned with the problem of learning simple deterministic languages. The algorithm described in this paper is based on the theory of model inference given by Shapiro. In our setting, however, nonterminal membership queries, except for the start symbol, are not permitted. Extended equivalence queries are used instead. Nonterminals that are necessary for a correct grammar and their intended models are introduced automatically. We give an algorithm that, for any simple deterministic language L, outputs a grammar G in 2-standard form, such that L = L(G), using membership queries and extended equivalence queries. We also show that the algorithm runs in time polynomial in the length of the longest counterexample and the number of nonterminals in a minimal grammar for L.  相似文献   

7.
We study ordinal embedding relaxations in the realm of parameterized complexity. We prove the existence of a quadratic kernel for the Betweenness problem parameterized above its tight lower bound, which is stated as follows. For a set V of variables and set C of constraints “vi is between vj and vk”, decide whether there is a bijection from V to the set {1,…,|V|} satisfying at least |C|/3+κ of the constraints in C. Our result solves an open problem attributed to Benny Chor in Niedermeier's monograph “Invitation to Fixed-Parameter Algorithms”. The betweenness problem is of interest in molecular biology. An approach developed in this paper can be used to determine parameterized complexity of a number of other optimization problems on permutations parameterized above or below tight bounds.  相似文献   

8.
提出了一种特征加权的核学习方法,其主要为了解决当前核方法在分类任务中对所有数据特征的同等对待的不足。在分类任务中,数据样本的每个特征所起的作用并不是相同的,有些特征对分类任务有促进作用,应该给予更多的关注。提出的算法集成了多核学习的优势,以加权的方式组合不同的核函数,但所需的计算复杂度更低。实验结果证明,提出的算法与支持向量机、多核学习算法相比,分类准确度优于支持向量机和多核学习算法,在计算复杂度上略高于支持向量机,但远远低于多核学习算法。  相似文献   

9.
By plotting the key value against the record position in a sorted file and then fitting a least squares polynomial through the points, a fast retrieval technique is determined. The target key of a search is inserted in the polynomial and the first access from the file is made on the basis of the evaluation. Since the maximum deviation can be determined, an efficient local search can be made. If the maximum deviation is less than half the size of the file, polynomial searching is more efficient than binary searching. While this method can be applied in many cases, it is most useful in disk oriented file systems where the goal is to minimize the number of accesses even at the expense of some additional calculation.  相似文献   

10.
经典线性算法的非线性核形式   总被引:6,自引:1,他引:6  
经典线性算法的非线性核形式是近10年发展起来的一类非线性机器学习技术.它们最显著的特点是利用满足Mercer条件的核函数巧妙地推导出线性算法的非线性形式。并表述为与样本数目有关、与维数无关的优化问题.为了提高数值计算的稳定性、控制算法的推广能力以及改善迭代过程的收敛性。部分算法还采用了正则化技术.在概述核思想与核函数、正则化技术的基础上,系统地介绍了经典线性算法的非线性核形式,同时分析它们的优缺点,井讨论了进一步发展的方向.  相似文献   

11.
In the framework of online object retrieval with learning, we address the problem of graph matching using kernel functions. An image is represented by a graph of regions where the edges represent the spatial relationships. Kernels on graphs are built from kernel on walks in the graph. This paper firstly proposes new kernels on graphs and on walks, which are very efficient for graphs of regions. Secondly we propose fast solutions for exact or approximate computation of these kernels. Thirdly we show results for the retrieval of images containing a specific object with the help of very few examples and counter-examples in the framework of an active retrieval scheme.  相似文献   

12.
13.
The kernel function is the core of the Support Vector Machine (SVM), and its selection directly affects the performance of SVM. There has been no theoretical basis on choosing a kernel function for speech recognition. In order to improve the learning ability and generalization ability of SVM for speech recognition, this paper presents the Optimal Relaxation Factor (ORF) kernel function, which is a set of new SVM kernel functions for speech recognition, and proves that the ORF function is a Mercer kernel function. The experiments show the ORF kernel function's effectiveness on mapping trend, bi-spiral, and speech recognition problems. The paper draws the conclusion that the ORF kernel function performs better than the Radial Basis Function (RBF), the Exponential Radial Basis Function (ERBF) and the Kernel with Moderate Decreasing (KMOD). Furthermore, the results of speech recognition with the ORF kernel function illustrate higher recognition accuracy.  相似文献   

14.
Patents are a type of intellectual property with ownership and monopolistic rights that are publicly accessible published documents, often with illustrations, registered by governments and international organizations. The registration allows people familiar with the domain to understand how to re-create the new and useful invention but restricts the manufacturing unless the owner licenses or enters into a legal agreement to sell ownership of the patent. Patents reward the costly research and development efforts of inventors while spreading new knowledge and accelerating innovation. This research uses artificial intelligence natural language processing, deep learning techniques and machine learning algorithms to extract the essential knowledge of patent documents within a given domain as a means to evaluate their worth and technical advantage. Manual patent abstraction is a time consuming, labor intensive, and subjective process which becomes cost and outcome ineffective as the size of the patent knowledge domain increases. This research develops an intelligent patent summarization methodology using artificial intelligence machine learning approaches to allow patent domains of extremely large sizes to be effectively and objectively summarized, especially for cases where the cost and time requirements of manual summarization is infeasible. The system learns to automatically summarize patent documents with natural language texts for any given technical domain. The machine learning solution identifies technical key terminologies (words, phrases, and sentences) in the context of the semantic relationships among training patents and corresponding summaries as the core of the summarization system. To ensure the high performance of the proposed methodology, ROUGE metrics are used to evaluate precision, recall, accuracy, and consistency of knowledge generated by the summarization system. The Smart machinery technologies domain, under the sub-domains of control intelligence, sensor intelligence and intelligent decision-making provide the case studies for the patent summarization system training. The cases use 1708 training pairs of patents and summaries while testing uses 30 randomly selected patents. The case implementation and verification have shown the summary reports achieve 90% and 84% average precision and recall ratios respectively.  相似文献   

15.
Indefinite kernels have attracted more and more attentions in machine learning due to its wider application scope than usual positive definite kernels. However, the research about indefinite kernel clustering is relatively scarce. Furthermore, existing clustering methods are mainly designed based on positive definite kernels which are incapable in indefinite kernel scenarios. In this paper, we propose a novel indefinite kernel clustering algorithm termed as indefinite kernel maximum margin clustering (IKMMC) based on the state-of-the-art maximum margin clustering (MMC) model. IKMMC tries to find a proxy positive definite kernel to approximate the original indefinite one and thus embeds a new F-norm regularizer in the objective function to measure the diversity of the two kernels, which can be further optimized by an iterative approach. Concretely, at each iteration, given a set of initial class labels, IKMMC firstly transforms the clustering problem into a classification one solved by indefinite kernel support vector machine (IKSVM) with an extra class balance constraint and then the obtained prediction labels will be used as the new input class labels at next iteration until the error rate of prediction is smaller than a prespecified tolerance. Finally, IKMMC utilizes the prediction labels at the last iteration as the expected indices of clusters. Moreover, we further extend IKMMC from binary clustering problems to more complexmulti-class scenarios. Experimental results have shown the superiority of our algorithms.  相似文献   

16.
The kernel method, especially the kernel-fusion method, is widely used in social networks, computer vision, bioinformatics, and other applications. It deals effectively with nonlinear classification problems, which can map linearly inseparable biological sequence data from low to high-dimensional space for more accurate differentiation, enabling the use of kernel methods to predict the structure and function of sequences. Therefore, the kernel method is significant in the solution of bioinformatics problems. Various kernels applied in bioinformatics are explained clearly, which can help readers to select proper kernels to distinguish tasks. Mass biological sequence data occur in practical applications. Research of the use of machine learning methods to obtain knowledge, and how to explore the structure and function of biological methods for theoretical prediction, have always been emphasized in bioinformatics. The kernel method has gradually become an important learning algorithm that is widely used in gene expression and biological sequence prediction. This review focuses on the requirements of classification tasks of biological sequence data. It studies kernel methods and optimization algorithms, including methods of constructing kernel matrices based on the characteristics of biological sequences and kernel fusion methods existing in a multiple kernel learning framework.  相似文献   

17.
Chemoinformatics is a research field concerned with the study of physical or biological molecular properties through computer science?s research fields such as machine learning and graph theory. From this point of view, graph kernels provide a nice framework which allows to naturally combine machine learning and graph theory techniques. Graph kernels based on bags of patterns have proven their efficiency on several problems both in terms of accuracy and computational time. Treelet kernel is a graph kernel based on a bag of small subtrees. We propose in this paper several extensions of this kernel devoted to chemoinformatics problems. These extensions aim to weight each pattern according to its influence, to include the comparison of non-isomorphic patterns, to include stereo information and finally to explicitly encode cyclic information into kernel computation.  相似文献   

18.
多核学习方法   总被引:51,自引:5,他引:51  
多核学习方法是当前核机器学习领域的一个新的热点. 核方法是解决非线性模式分析问题的一种有效方法, 但在一些复杂情形下, 由单个核函数构成的核机器并不能满足诸如数据异构或不规则、样本规模巨大、样本不平坦分布等实际的应用需求, 因此将多个核函数进行组合, 以获得更好的结果是一种必然选择. 本文根据多核的构成, 从合成核、多尺度核、无限核三个角度, 系统综述了多核方法的构造理论, 分析了多核学习典型方法的特点及不足, 总结了各自的应用领域, 并凝炼了其进一步的研究方向.  相似文献   

19.
Dimensionality reduction is an important and challenging task in machine learning and data mining. Feature selection and feature extraction are two commonly used techniques for decreasing dimensionality of the data and increasing efficiency of learning algorithms. Specifically, feature selection realized in the absence of class labels, namely unsupervised feature selection, is challenging and interesting. In this paper, we propose a new unsupervised feature selection criterion developed from the viewpoint of subspace learning, which is treated as a matrix factorization problem. The advantages of this work are four-fold. First, dwelling on the technique of matrix factorization, a unified framework is established for feature selection, feature extraction and clustering. Second, an iterative update algorithm is provided via matrix factorization, which is an efficient technique to deal with high-dimensional data. Third, an effective method for feature selection with numeric data is put forward, instead of drawing support from the discretization process. Fourth, this new criterion provides a sound foundation for embedding kernel tricks into feature selection. With this regard, an algorithm based on kernel methods is also proposed. The algorithms are compared with four state-of-the-art feature selection methods using six publicly available datasets. Experimental results demonstrate that in terms of clustering results, the proposed two algorithms come with better performance than the others for almost all datasets we experimented with here.  相似文献   

20.
Kernelization is a strong and widely-applied technique in parameterized complexity. A kernelization algorithm, or simply a kernel, is a polynomial-time transformation that transforms any given parameterized instance to an equivalent instance of the same problem, with size and parameter bounded by a function of the parameter in the input. A kernel is polynomial if the size and parameter of the output are polynomially-bounded by the parameter of the input.In this paper we develop a framework which allows showing that a wide range of FPT problems do not have polynomial kernels. Our evidence relies on hypothesis made in the classical world (i.e. non-parametric complexity), and revolves around a new type of algorithm for classical decision problems, called a distillation algorithm, which is of independent interest. Using the notion of distillation algorithms, we develop a generic lower-bound engine that allows us to show that a variety of FPT problems, fulfilling certain criteria, cannot have polynomial kernels unless the polynomial hierarchy collapses. These problems include k-Path, k-Cycle, k-Exact Cycle, k-Short Cheap Tour, k-Graph Minor Order Test, k-Cutwidth, k-Search Number, k-Pathwidth, k-Treewidth, k-Branchwidth, and several optimization problems parameterized by treewidth and other structural parameters.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号