
1.

Discriminative classifiers with adaptive kernels for noise robust speech recognition

M.J.F. Gales F. Flego 《Computer Speech and Language》2010,24(4):648-662

Discriminative classifiers are a popular approach to solving classification problems. However, one of the problems with these approaches, in particular kernel-based classifiers such as support vector machines (SVMs), is that they are hard to adapt to mismatches between the training and test data. This paper describes a scheme for overcoming this problem for speech recognition in noise by adapting the kernel rather than the SVM decision boundary. Generative kernels, defined using generative models, are one type of kernel that allows SVMs to handle sequence data. By compensating the parameters of the generative models for each noise condition, noise-specific generative kernels can be obtained. These can be used to train a noise-independent SVM on a range of noise conditions, which can then be used with a test-set noise kernel for classification. The noise-specific kernels used in this paper are based on Vector Taylor Series (VTS) model-based compensation. VTS allows all the model parameters to be compensated and the background noise to be estimated in a maximum likelihood fashion. A brief discussion of VTS, and the optimisation of the mismatch function representing the impact of noise on the clean speech, is also included. Experiments using these VTS-based test-set noise kernels were run on the AURORA 2 continuous digit task. The proposed SVM rescoring scheme yields large gains in performance over the VTS compensated models.

2.

Boosted Bayesian network classifiers (total citations: 2; self-citations: 0; citations by others: 2)

Yushi Jing Vladimir Pavlović James M. Rehg 《Machine Learning》2008,73(2):155-184

The use of Bayesian networks for classification problems has received a significant amount of recent attention. Although computationally efficient, the standard maximum likelihood learning method tends to be suboptimal due to the mismatch between its optimization criterion (data likelihood) and the actual goal of classification (label prediction accuracy). Recent approaches to optimizing classification performance during parameter or structure learning show promise, but lack the favorable computational properties of maximum likelihood learning. In this paper we present boosted Bayesian network classifiers, a framework to combine discriminative data-weighting with generative training of intermediate models. We show that boosted Bayesian network classifiers encompass the basic generative models in isolation, but improve their classification performance when the model structure is suboptimal. We also demonstrate that structure learning is beneficial in the construction of boosted Bayesian network classifiers. On a large suite of benchmark data-sets, this approach outperforms generative graphical models such as naive Bayes and TAN in classification accuracy. Boosted Bayesian network classifiers have comparable or better performance in comparison to other discriminatively trained graphical models including ELR and BNC. Furthermore, boosted Bayesian networks require significantly less training time than the ELR and BNC algorithms.

3.

Improving Semantic Concept Detection Through Optimizing Ranking Function

Sheng Gao Qibin Sun 《Multimedia, IEEE Transactions on》2007,9(7):1430-1442

In this paper, a kernel-based learning algorithm, kernel rank, is presented for improving the performance of semantic concept detection. By designing a classifier optimizing the receiver operating characteristic (ROC) curve using kernel rank, we provide a generic framework to optimize any differentiable ranking function using effective smoothing functions. Kernel rank directly maximizes a 1-D quality measure of ROC, i.e., AUC (area under the ROC curve). It exploits kernel density estimation to model the ranking score distributions and approximate the correct ranking count. The ranking metric is then derived and the learnable parameters are naturally embedded. To address the issues of computation and memory in learning, an efficient implementation is developed based on the gradient descent algorithm. We apply kernel rank with two types of kernel density functions to train the linear discriminant function and the Gaussian mixture model classifiers. From our experiments carried out on the development set for TREC Video Retrieval 2005, we conclude that (1) kernel rank is capable of training any differentiable classifier with various kernels; and (2) the learned ranking function performs better than traditional algorithms based on maximum likelihood or classification error minimization in terms of AUC and average precision (AP).
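
The idea of replacing the correct ranking count with a smooth, differentiable quantity can be illustrated with a minimal sketch. It is not the authors' kernel rank implementation: the function names, the Gaussian-CDF smoothing of each pairwise comparison, and the `bandwidth` parameter are assumptions chosen for illustration.

```python
import math

def exact_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))

def smoothed_auc(pos_scores, neg_scores, bandwidth=0.1):
    """Differentiable AUC surrogate (assumed form, not the paper's metric).

    The 0/1 step on each score difference is replaced by a Gaussian CDF
    whose standard deviation is `bandwidth`.
    """
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            z = (p - n) / (bandwidth * math.sqrt(2.0))
            total += 0.5 * (1.0 + math.erf(z))
    return total / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier scores for three positive and three negative items.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
print(exact_auc(pos, neg))
print(smoothed_auc(pos, neg, bandwidth=0.01))
```

With a small bandwidth the surrogate approaches the exact pairwise count; a larger bandwidth gives a smoother objective that is better suited to gradient descent but is a looser approximation of AUC.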

4.

Online Multiple Kernel Classification

Steven C. H. Hoi Rong Jin Peilin Zhao Tianbao Yang 《Machine Learning》2013,90(2):289-316

Although both online learning and kernel learning have been studied extensively in machine learning, there is limited effort in addressing the intersecting research problems of these two important topics. As an attempt to fill the gap, we address a new research problem, termed Online Multiple Kernel Classification (OMKC), which learns a kernel-based prediction function by selecting a subset of predefined kernel functions in an online learning fashion. OMKC is in general more challenging than typical online learning because both the kernel classifiers and the subset of selected kernels are unknown, and more importantly the solutions to the kernel classifiers and their combination weights are correlated. The proposed algorithms are based on the fusion of two online learning algorithms, i.e., the Perceptron algorithm that learns a classifier for a given kernel, and the Hedge algorithm that combines classifiers by linear weights. We develop stochastic selection strategies that randomly select a subset of kernels for combination and model updating, thus improving the learning efficiency. Our empirical study with 15 data sets shows promising performance of the proposed algorithms for OMKC in both learning efficiency and prediction accuracy.
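
The combination of Perceptron and Hedge described in this abstract can be sketched as follows. This is a simplified, fully deterministic illustration (every kernel is updated and combined at every step) and omits the stochastic selection strategies; the function names, the discount factor `beta`, and the XOR toy stream are assumptions, not details taken from the paper.

```python
import math

def linear(a, b):
    return sum(x * y for x, y in zip(a, b))

def rbf(gamma):
    return lambda a, b: math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def omkc_deterministic(stream, kernels, beta=0.5):
    """One online pass; returns (mistakes of the combined predictor, Hedge weights)."""
    m = len(kernels)
    weights = [1.0] * m                  # one Hedge weight per kernel
    supports = [[] for _ in range(m)]    # (label, example) pairs per kernel perceptron
    mistakes = 0
    for x, y in stream:                  # labels y are +1 or -1
        # Each kernel perceptron scores x from its own support set.
        scores = [sum(ys * k(xs, x) for ys, xs in supports[i])
                  for i, k in enumerate(kernels)]
        votes = [1 if s >= 0 else -1 for s in scores]
        combined = sum(w * v for w, v in zip(weights, votes))
        if (1 if combined >= 0 else -1) != y:
            mistakes += 1
        for i in range(m):
            if votes[i] != y:
                weights[i] *= beta           # Hedge: discount a kernel that erred
                supports[i].append((y, x))   # Perceptron: store the misclassified example
    total = sum(weights)
    return mistakes, [w / total for w in weights]

# Toy XOR stream: a bias-free linear kernel cannot fit it, an RBF kernel can.
xor = [((0.0, 0.0), -1), ((1.0, 1.0), -1), ((0.0, 1.0), 1), ((1.0, 0.0), 1)]
stream = xor * 10
mistakes, weights = omkc_deterministic(stream, [linear, rbf(1.0)])
print(mistakes, weights)
```

On this stream the Hedge weight shifts toward the RBF kernel, because its perceptron stops making mistakes after a few updates while the linear one keeps erring.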

5.

A multi-class classification strategy for Fisher scores: Application to signer independent sign language recognition

Oya Aran Lale Akarun 《Pattern recognition》2010,43(5):1776-1788

Fisher kernels combine the powers of discriminative and generative classifiers by mapping the variable-length sequences to a new fixed-length feature space, called the Fisher score space. The mapping is based on a single generative model and the classifier is intrinsically binary. We propose a multi-class classification strategy that applies a multi-class classification on each Fisher score space and combines the decisions of multi-class classifiers. We experimentally show that the Fisher scores of one class provide discriminative information for the other classes as well. We compare several multi-class classification strategies for Fisher scores generated from the hidden Markov models of sign sequences. The proposed multi-class classification strategy increases the classification accuracy in comparison with the state-of-the-art strategies based on combining binary classifiers. To reduce the computational complexity of the Fisher score extraction and the training phases, we also propose a score space selection method and show that similar or even higher accuracies can be obtained by using only a subset of the score spaces. Based on the proposed score space selection method, a signer adaptation technique is also presented that does not require any re-training.

6.

The evaluation of evidence for exponentially distributed data

《Computational statistics & data analysis》2008,52(12):5682-5693

At present, likelihood ratios for two-level models are determined with the use of a normal kernel estimation procedure when the between-group distribution is thought to be non-normal. An extension is described for a two-level model in which the between-group distribution is very positively skewed and an exponential distribution may be thought to represent a good model. The theoretical likelihood ratio is derived. A likelihood ratio based on a biweight kernel with an adaptation at the boundary is developed. The performance of this kernel is compared alongside those of normal kernels and normal and exponential parametric models. A comparison of performance is made for simulated data where results may be compared with those of theory, using the theoretical model, as the true parameter values for the models are known. There is also a comparison for forensic data, using the concentration of aluminium in glass as an exemplar. Performance is assessed by determining the numbers of occasions on which the likelihood ratios for sets of fragments from the same group are supportive of the proposition that they are from different groups and the numbers of occasions on which the likelihood ratios for sets of fragments from different groups are supportive of the proposition that they are from the same group.

7.

Approximate conditional inference in mixed-effects models with binary data

Woojoo Lee Youngjo Lee 《Computational statistics & data analysis》2010,54(1):173-184

The conditional likelihood approach is a sensible choice for a hierarchical logistic regression model or other generalized regression models with binary data. However, its heavy computational burden limits its use, especially for the related mixed-effects model. A modified profile likelihood is used as an accurate approximation to conditional likelihood, and then the use of two methods for inferences for the hierarchical generalized regression models with mixed effects is proposed. One is based on a hierarchical likelihood and Laplace approximation method, and the other is based on a Markov chain Monte Carlo EM algorithm. The methods are applied to a meta-analysis model for trend estimation and the model for multi-arm trials. A simulation study is conducted to illustrate the performance of the proposed methods.

8.

The evaluation of evidence for exponentially distributed data

C.G.G. Aitken Q. Shen 《Computational statistics & data analysis》2007,51(12):5682-5693

At present, likelihood ratios for two-level models are determined with the use of a normal kernel estimation procedure when the between-group distribution is thought to be non-normal. An extension is described for a two-level model in which the between-group distribution is very positively skewed and an exponential distribution may be thought to represent a good model. The theoretical likelihood ratio is derived. A likelihood ratio based on a biweight kernel with an adaptation at the boundary is developed. The performance of this kernel is compared alongside those of normal kernels and normal and exponential parametric models. A comparison of performance is made for simulated data where results may be compared with those of theory, using the theoretical model, as the true parameter values for the models are known. There is also a comparison for forensic data, using the concentration of aluminium in glass as an exemplar. Performance is assessed by determining the numbers of occasions on which the likelihood ratios for sets of fragments from the same group are supportive of the proposition that they are from different groups and the numbers of occasions on which the likelihood ratios for sets of fragments from different groups are supportive of the proposition that they are from the same group.

9.

Semisupervised learning for a hybrid generative/discriminative classifier based on the maximum entropy principle

Fujino A Ueda N Saito K 《IEEE transactions on pattern analysis and machine intelligence》2008,30(3):424-437

This paper presents a method for designing semi-supervised classifiers trained on labeled and unlabeled samples. We focus on probabilistic semi-supervised classifier design for multi-class and single-labeled classification problems, and propose a hybrid approach that takes advantage of generative and discriminative approaches. In our approach, we first consider a generative model trained by using labeled samples and introduce a bias correction model, where these models belong to the same model family, but have different parameters. Then, we construct a hybrid classifier by combining these models based on the maximum entropy principle. To enable us to apply our hybrid approach to text classification problems, we employed naive Bayes models as the generative and bias correction models. Our experimental results for four text data sets confirmed that the generalization ability of our hybrid classifier was much improved by using a large number of unlabeled samples for training when there were too few labeled samples to obtain good performance. We also confirmed that our hybrid approach significantly outperformed generative and discriminative approaches when the performance of the generative and discriminative approaches was comparable. Moreover, we examined the performance of our hybrid classifier when the labeled and unlabeled data distributions were different.

10.

Kernel Discriminant Analysis for Positive Definite and Indefinite Kernels

Pekalska E. Haasdonk B. 《IEEE transactions on pattern analysis and machine intelligence》2009,31(6):1017-1032

Kernel methods are a class of well-established and successful algorithms for pattern analysis thanks to their mathematical elegance and good performance. Numerous nonlinear extensions of pattern recognition techniques have been proposed so far based on the so-called kernel trick. The objective of this paper is twofold. First, we derive an additional kernel tool that is still missing, namely kernel quadratic discriminant (KQD). We discuss different formulations of KQD based on the regularized kernel Mahalanobis distance in both complete and class-related subspaces. Second, we propose suitable extensions of kernel linear and quadratic discriminants to indefinite kernels. We provide classifiers that are applicable to kernels defined by any symmetric similarity measure. This is important in practice because problem-suited proximity measures often violate the requirement of positive definiteness. As in the traditional case, KQD can be advantageous for data with unequal class spreads in the kernel-induced spaces, which cannot be well separated by a linear discriminant. We illustrate this on artificial and real data for both positive definite and indefinite kernels.

11.

Image representations and feature selection for multimedia database search (total citations: 3; self-citations: 0; citations by others: 3)

Evgeniou T. Pontil M. Papageorgiou C. Poggio T. 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(4):911-920

The success of a multimedia information system depends heavily on the way the data is represented. Although there are "natural" ways to represent numerical data, it is not clear what is a good way to represent multimedia data, such as images, video, or sound. We investigate various image representations where the quality of the representation is judged based on how well a system for searching through an image database can perform, although the same techniques and representations can be used for other types of object detection tasks or multimedia data analysis problems. The system is based on a machine learning method used to develop object detection models from example images that can subsequently be used, for example, to detect (search for) images of a particular object in an image database. As a base classifier for the detection task, we use support vector machines (SVMs), a kernel-based learning method. Within the framework of kernel classifiers, we investigate new image representations/kernels derived from probabilistic models of the class of images considered and present a new feature selection method which can be used to reduce the dimensionality of the image representation without significant losses in terms of the performance of the detection (search) system.

12.

Spatial random tree grammars for modeling hierarchal structure in images with regions of arbitrary shape

Siskind JM Sherman J Pollak I Harper MP Bouman CA 《IEEE transactions on pattern analysis and machine intelligence》2007,29(9):1504-1519

We present a novel probabilistic model for the hierarchical structure of an image and its regions. We call this model spatial random tree grammars (SRTGs). We develop algorithms for the exact computation of likelihood and maximum a posteriori (MAP) estimates and the exact expectation-maximization (EM) updates for model-parameter estimation. We collectively call these algorithms the center-surround algorithm. We use the center-surround algorithm to automatically estimate the maximum likelihood (ML) parameters of SRTGs and classify images based on their likelihood and based on the MAP estimate of the associated hierarchical structure. We apply our method to the task of classifying natural images and demonstrate that the addition of hierarchical structure significantly improves upon the performance of a baseline model that lacks such structure.

13.

Multiple-kernel-learning-based extreme learning machine for classification design

Xiaodong Li Weijie Mao Wei Jiang 《Neural computing & applications》2016,27(1):175-184

The extreme learning machine (ELM) is a new method for using single hidden layer feed-forward networks with a much simpler training method. While conventional kernel-based classifiers are based on a single kernel, in reality, it is often desirable to base classifiers on combinations of multiple kernels. In this paper, we address the issue of multiple-kernel learning (MKL) for ELM by formulating it as a semi-infinite linear program. We further extend this idea by integrating it with techniques of MKL. The kernel function in this ELM formulation no longer needs to be fixed, but can be automatically learned as a combination of multiple kernels. Two formulations of multiple-kernel classifiers are proposed. The first one is based on a convex combination of the given base kernels, while the second one uses a convex combination of the so-called equivalent kernels. Empirically, the second formulation is particularly competitive. Experiments on a large number of both toy and real-world data sets (including a high-magnification sampling rate image data set) show that the resultant classifier is fast and accurate and can also be easily trained by simply changing the linear program.

14.

Type-2 fuzzy logic-based classifier fusion for support vector machines (total citations: 1; self-citations: 0; citations by others: 1)

Xiujuan Chen Yong Li Robert Harrison Yan-Qing Zhang 《Applied Soft Computing》2008,8(3):1222-1231

As a machine-learning tool, support vector machines (SVMs) have been gaining popularity due to their promising performance. However, the generalization abilities of SVMs often rely on whether the selected kernel functions are suitable for real classification data. To lessen the sensitivity of SVM classification to the choice of kernel and improve SVM generalization ability, this paper proposes a fuzzy fusion model to combine multiple SVM classifiers. To better handle uncertainties existing in real classification data and in the membership functions (MFs) in the traditional type-1 fuzzy logic system (FLS), we apply interval type-2 fuzzy sets to construct a type-2 SVM fusion FLS. This type-2 fusion architecture takes into consideration the classification results from individual SVM classifiers and generates the combined classification decision as the output. Besides the distances of data examples to SVM hyperplanes, the type-2 fuzzy SVM fusion system also considers the accuracy information of individual SVMs. Our experiments show that the type-2 based SVM fusion classifiers outperform individual SVM classifiers in most cases. The experiments also show that the type-2 fuzzy logic-based SVM fusion model is better than the type-1 based SVM fusion model in general.

15.

Composite kernel learning (total citations: 2; self-citations: 0; citations by others: 2)

Marie Szafranski Yves Grandvalet Alain Rakotomamonjy 《Machine Learning》2010,79(1-2):73-103

The Support Vector Machine is an acknowledged powerful tool for building classifiers, but it lacks flexibility, in the sense that the kernel is chosen prior to learning. Multiple Kernel Learning makes it possible to learn the kernel from an ensemble of basis kernels, whose combination is optimized in the learning process. Here, we propose Composite Kernel Learning to address the situation where distinct components give rise to a group structure among kernels. Our formulation of the learning problem encompasses several setups, putting more or less emphasis on the group structure. We characterize the convexity of the learning problem, and provide a general wrapper algorithm for computing solutions. Finally, we illustrate the behavior of our method on multi-channel data where groups correspond to channels.

16.

Combining feature spaces for classification

Theodoros Damoulas Mark A. Girolami 《Pattern recognition》2009,42(11):2671-2683

In this paper we offer a variational Bayes approximation to the multinomial probit model for basis expansion and kernel combination. Our model is well-founded within a hierarchical Bayesian framework and is able to instructively combine available sources of information for multinomial classification. The proposed framework enables informative integration of possibly heterogeneous sources in a multitude of ways, from the simple summation of feature expansions to weighted product of kernels, and it is shown to match and in certain cases outperform the well-known ensemble learning approaches of combining individual classifiers. At the same time the approximation considerably reduces the CPU time and resources required with respect to both the ensemble learning methods and the full Markov chain Monte Carlo, Metropolis-Hastings within Gibbs solution of our model. We present our proposed framework together with extensive experimental studies on synthetic and benchmark datasets and also, for the first time, report a comparison between summation and product of individual kernels as possible different methods for constructing the composite kernel matrix.

17.

Learning deep kernels in the space of dot product polynomials

Michele Donini Fabio Aiolli 《Machine Learning》2017,106(9-10):1245-1269

Recent literature has shown the merits of having deep representations in the context of neural networks. An emerging challenge in kernel learning is the definition of similar deep representations. In this paper, we propose a general methodology to define a hierarchy of base kernels with increasing expressiveness and combine them via multiple kernel learning (MKL) with the aim of generating overall deeper kernels. As a leading example, this methodology is applied to learning the kernel in the space of Dot-Product Polynomials (DPPs), that is, positive combinations of homogeneous polynomial kernels (HPKs). We show theoretical properties about the expressiveness of HPKs that make their combination empirically very effective. This can also be seen as learning the coefficients of the Maclaurin expansion of any positive definite dot product kernel, thus making our proposed method generally applicable. We empirically show the merits of our approach by comparing the effectiveness of the kernel generated by our method against baseline kernels (including homogeneous and non-homogeneous polynomials, RBF, etc.) and against another hierarchical approach on several benchmark datasets.
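
A dot-product polynomial can be written out directly as a non-negative combination of homogeneous polynomial kernels. The sketch below is illustrative only: the function names are hypothetical, and the coefficients are fixed by hand to the Maclaurin coefficients of the exponential rather than learned with MKL as in the paper.

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def hpk(x, z, degree):
    """Homogeneous polynomial kernel of the given degree: (x . z) ** degree."""
    return dot(x, z) ** degree

def dpp_kernel(x, z, coeffs):
    """Dot-product polynomial: sum over d of coeffs[d] * (x . z) ** d, coeffs[d] >= 0."""
    if any(c < 0 for c in coeffs):
        raise ValueError("coefficients must be non-negative")
    return sum(c * hpk(x, z, d) for d, c in enumerate(coeffs))

# Hand-picked coefficients 1/d!: the truncated Maclaurin series of exp(x . z).
x, z = [0.5, -0.2], [0.3, 0.4]
coeffs = [1.0 / math.factorial(d) for d in range(15)]
print(dpp_kernel(x, z, coeffs), math.exp(dot(x, z)))
```

The two printed values agree closely, which illustrates the Maclaurin-expansion view mentioned in the abstract: choosing the coefficients selects which dot-product kernel is being approximated.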

18.

Local normalized linear summation kernel for fast and robust recognition

Kazuhiro Hotta 《Pattern recognition》2010,43(3):906-913

Kernel-based methods are effective for object detection and recognition. However, the computational cost when using kernel functions is high, except when using linear kernels. To realize fast and robust recognition, we apply normalized linear kernels to local regions of a recognition target, and the kernel outputs are integrated by summation. This kernel is referred to as a local normalized linear summation kernel. Here, we show that kernel-based methods that employ local normalized linear summation kernels can be computed by a linear kernel of local normalized features. Thus, the computational cost of the kernel is nearly the same as that of a linear kernel and much lower than that of radial basis function (RBF) and polynomial kernels. The effectiveness of the proposed method is evaluated in face detection and recognition problems, and we confirm that our kernel provides higher accuracy with lower computational cost than RBF and polynomial kernels. In addition, our kernel is also robust to partial occlusion and shadows on faces since it is based on the summation of local kernels.
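
The equivalence stated in this abstract, that the summed local normalized kernels equal a single linear kernel on normalized local features, can be checked numerically. The sketch below is a minimal illustration with hypothetical function names; it assumes each image is already split into corresponding, non-zero local region vectors.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def local_summation_kernel(x_regions, y_regions):
    """Sum of normalized linear (cosine) kernels over corresponding local regions.

    Assumes every region vector has non-zero length.
    """
    return sum(
        dot(xr, yr) / (math.sqrt(dot(xr, xr)) * math.sqrt(dot(yr, yr)))
        for xr, yr in zip(x_regions, y_regions)
    )

def local_normalized_features(regions):
    """Scale each region to unit length, then concatenate into one feature vector."""
    features = []
    for region in regions:
        norm = math.sqrt(dot(region, region))
        features.extend(value / norm for value in region)
    return features

# Two toy "images", each split into three local regions of two values.
x = [[1.0, 2.0], [0.0, 3.0], [4.0, 4.0]]
y = [[2.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
k_direct = local_summation_kernel(x, y)
k_linear = dot(local_normalized_features(x), local_normalized_features(y))
print(k_direct, k_linear)
```

Because the normalization is applied once per sample, a linear classifier can be trained and evaluated on the concatenated features, which is where the cost advantage over RBF and polynomial kernels comes from.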

19.

Mean kernels to improve gravimetric geoid determination based on modified Stokes's integration

C. Hirt 《Computers & Geosciences》2011,37(11):1836-1842

Gravimetric geoid computation is often based on modified Stokes's integration, where Stokes's integral is evaluated with some stochastic or deterministic kernel modification. Accurate numerical evaluation of Stokes's integral requires the modified kernel to be integrated across the area of each discretised grid cell (mean kernel). Evaluating the modified kernel at the center of the cell (point kernel) is an approximation, which may result in larger numerical integration errors near the computation point, where the modified kernel exhibits a strongly nonlinear behavior. The present study deals with the computation of whole-of-the-cell mean values of modified kernels, exemplified here with the Featherstone-Evans-Olliver (1998) kernel modification [Featherstone, W.E., Evans, J.D., Olliver, J.G., 1998. A Meissl-modified Vaníček and Kleusberg kernel to reduce the truncation error in gravimetric geoid computations. Journal of Geodesy 72(3), 154-160]. We investigate two approaches (analytical and numerical integration), which are capable of providing accurate mean kernels. The analytical integration approach is based on kernel weighting factors which are used for the conversion of point to mean kernels. For the efficient numerical integration, Gauss-Legendre quadrature is applied. The comparison of mean kernels from both approaches shows a satisfactory mutual agreement at the level of 10⁻⁴ and better, which is considered to be sufficient for practical geoid computation requirements. Closed-loop tests based on the EGM2008 geopotential model demonstrate that using mean instead of point kernels reduces numerical integration errors by ∼65%. The use of mean kernels is recommended in remove-compute-restore geoid determination with the Featherstone-Evans-Olliver (1998) kernel or any other kernel modification under the condition that the kernel changes rapidly across the cells in the neighborhood of the computation point.
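
The difference between a point kernel and a mean kernel can be illustrated in one dimension with a 3-point Gauss-Legendre rule. This is a sketch under strong simplifying assumptions: `toy_kernel` is a hypothetical stand-in that merely grows rapidly near zero, not the Stokes or Featherstone-Evans-Olliver kernel, and the paper works with two-dimensional grid cells.

```python
import math

# 3-point Gauss-Legendre nodes and weights on [-1, 1].
_NODES = (-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0))
_WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)

def point_kernel(kernel, lo, hi):
    """Kernel evaluated at the cell center only."""
    return kernel(0.5 * (lo + hi))

def mean_kernel(kernel, lo, hi):
    """Cell-mean of the kernel over [lo, hi] by 3-point Gauss-Legendre quadrature."""
    mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    # Integral = half * sum(w * f); dividing by the cell width 2 * half leaves 0.5 * sum.
    return 0.5 * sum(w * kernel(mid + half * x) for x, w in zip(_NODES, _WEIGHTS))

def toy_kernel(psi):
    """Hypothetical kernel that changes rapidly near psi = 0 (illustration only)."""
    return 1.0 / psi

lo, hi = 0.1, 0.3
exact_mean = math.log(hi / lo) / (hi - lo)   # analytic mean of 1/psi over the cell
print(point_kernel(toy_kernel, lo, hi), mean_kernel(toy_kernel, lo, hi), exact_mean)
```

For a cell close to the singularity the center value underestimates the cell mean noticeably, while the quadrature-based mean stays close to the analytic value; far from the computation point the two converge.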

20.

Deep Multimodal Fusion: A Hybrid Approach

Mohamed R. Amer Timothy Shields Behjat Siddiquie Amir Tamrakar Ajay Divakaran Sek Chai 《International Journal of Computer Vision》2018,126(2-4):440-456

We propose a novel hybrid model that exploits the strength of discriminative classifiers along with the representation power of generative models. Our focus is on detecting multimodal events in time-varying sequences as well as generating missing data in any of the modalities. Discriminative classifiers have been shown to achieve higher performance than the corresponding generative likelihood-based classifiers. On the other hand, generative models learn a rich informative space which allows for data generation and joint feature representation that discriminative models lack. We propose a new model that jointly optimizes the representation space using a hybrid energy function. We employ a model based on Restricted Boltzmann Machines (RBMs) to learn a shared representation across multiple modalities with time-varying data. Conditional RBMs (CRBMs) are an extension of the RBM model that takes into account short-term temporal phenomena. The hybrid model involves augmenting CRBMs with a discriminative component for classification. For these purposes we propose a novel Multimodal Discriminative CRBMs (MMDCRBMs) model. First, we train the MMDCRBMs model using labeled data by training each modality, followed by training a fusion layer. Second, we exploit the generative capability of MMDCRBMs to activate the trained model so as to generate the lower-level data corresponding to the specific label that closely matches the actual input data. We evaluate our approach on the ChaLearn dataset (audio-mocap), the Tower Game dataset (mocap-mocap), and three multimodal toy datasets. We report classification accuracy, generation accuracy, and localization accuracy and demonstrate its superiority compared to the state-of-the-art methods.