Learning state machine-based string edit kernels
Authors:Aurélien Bellet  Marc Bernard  Thierry Murgue  Marc Sebban
Affiliation:a Université de Lyon, F-42023 Saint-Étienne, France
b CNRS, UMR 5516, Laboratoire Hubert Curien, F-42000 Saint-Étienne, France
c Université de Saint-Étienne, Jean-Monnet, F-42000 Saint-Étienne, France
Abstract:Over the past few years, several approaches have been proposed to derive string kernels from probability distributions. For instance, the Fisher kernel uses a generative model M (e.g. a hidden Markov model) and compares two strings according to how they are generated by M. Marginalized kernels, on the other hand, compute the joint similarity between two instances by summing conditional probabilities. In this paper, we adapt this approach to edit distance-based conditional distributions and present a way to learn a new string edit kernel. We show that computing such a kernel in practice between two strings x and x′ built from an alphabet Σ requires (i) learning edit probabilities in the form of the parameters of a stochastic state machine and (ii) calculating an infinite sum over Σ* by resorting to the intersection of probabilistic automata, as done for rational kernels. We show on a handwritten character recognition task that our new kernel outperforms not only state-of-the-art string kernels and string edit kernels but also the standard edit distance used by a neighborhood-based classifier.
Keywords:String kernel  Marginalized kernel  Learned edit distance
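
The two steps named in the abstract can be illustrated with a small Python sketch. It assumes the kernel takes the marginalized form k(x, x′) = Σ_y p(y|x) · p(y|x′), with y ranging over Σ*, and it departs from the paper in two labeled ways: the edit probabilities are hand-picked for a memoryless edit model rather than learned, and the infinite sum is approximated by enumerating strings up to a fixed length rather than computed by intersecting probabilistic automata. All names (`cond_prob`, `truncated_kernel`, `INS`, `DEL`, `SUB`, `STOP`) are hypothetical.

```python
from itertools import product

ALPHABET = "ab"

# Hand-picked (NOT learned) edit probabilities for a memoryless conditional
# edit model. Assumption: for every input symbol, the probabilities of
# insertion + deletion + substitution sum to 1, and once the input is
# consumed, insertion + stop sum to 1.
INS = {b: 0.05 for b in ALPHABET}           # insert b without reading input
STOP = 1.0 - sum(INS.values())              # terminate after the input ends
DEL = {a: 0.10 for a in ALPHABET}           # read a, emit nothing
SUB = {(a, b): (0.70 if a == b else 0.10)   # read a, emit b
       for a in ALPHABET for b in ALPHABET}


def cond_prob(y, x):
    """p(y | x): total probability of all edit scripts turning x into y."""
    n, m = len(x), len(y)
    # alpha[i][j] = probability of having read x[:i] and emitted y[:j]
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if j > 0:
                total += alpha[i][j - 1] * INS[y[j - 1]]
            if i > 0:
                total += alpha[i - 1][j] * DEL[x[i - 1]]
            if i > 0 and j > 0:
                total += alpha[i - 1][j - 1] * SUB[(x[i - 1], y[j - 1])]
            alpha[i][j] = total
    return alpha[n][m] * STOP


def truncated_kernel(x1, x2, max_len=6):
    """Approximate sum over y of p(y|x1) * p(y|x2), for |y| <= max_len only."""
    total = 0.0
    for length in range(max_len + 1):
        for chars in product(ALPHABET, repeat=length):
            y = "".join(chars)
            total += cond_prob(y, x1) * cond_prob(y, x2)
    return total


if __name__ == "__main__":
    print(truncated_kernel("ab", "ab"))
    print(truncated_kernel("ab", "ba"))
```

The enumeration grows exponentially with `max_len`, so it is only practical for tiny alphabets and short strings. That cost is what the automata-based computation mentioned in the abstract is meant to avoid.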
This article is indexed in ScienceDirect and other databases.