Found 20 similar documents (search time: 15 ms)
1.
This paper presents a principled SVM-based speaker verification system. We propose a new framework and a new sequence kernel that can make use of any Mercer kernel at the frame level. An extension of the sequence kernel based on the Max operator is also proposed. The new system is compared to state-of-the-art GMM and other SVM-based systems found in the literature on the Banca and Polyvar databases. In most cases, the new system outperforms the other systems by a statistically significant margin. Finally, the proposed framework clarifies previous SVM-based systems and suggests interesting directions for future research.
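The abstract above does not give the kernel's formula, so the following is only a minimal sketch of one common way to lift a frame-level Mercer kernel to whole utterances: average the frame kernel over all frame pairs. The function names, the RBF frame kernel, and the symmetrized form of the Max-based variant are assumptions made for illustration, not the authors' definitions.

```python
import math
from typing import Callable, Sequence

Frame = Sequence[float]
FrameKernel = Callable[[Frame, Frame], float]


def rbf_frame_kernel(x: Frame, y: Frame, gamma: float = 0.5) -> float:
    """Frame-level Mercer kernel (RBF chosen here only as an example)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def mean_sequence_kernel(X: Sequence[Frame], Y: Sequence[Frame],
                         frame_kernel: FrameKernel = rbf_frame_kernel) -> float:
    """Average the frame kernel over every pair of frames from X and Y."""
    total = sum(frame_kernel(x, y) for x in X for y in Y)
    return total / (len(X) * len(Y))


def max_sequence_kernel(X: Sequence[Frame], Y: Sequence[Frame],
                        frame_kernel: FrameKernel = rbf_frame_kernel) -> float:
    """Hypothetical Max-based variant: each frame keeps only its best match.

    The two one-sided scores are averaged so the result is symmetric.
    This form is an assumption; it may not be a valid Mercer kernel.
    """
    def one_sided(A: Sequence[Frame], B: Sequence[Frame]) -> float:
        return sum(max(frame_kernel(a, b) for b in B) for a in A) / len(A)

    return 0.5 * (one_sided(X, Y) + one_sided(Y, X))


if __name__ == "__main__":
    utterance_a = [[0.0, 0.0], [1.0, 0.0]]   # two frames, two features each
    utterance_b = [[0.0, 1.0]]               # sequences may differ in length
    print(mean_sequence_kernel(utterance_a, utterance_b))
    print(max_sequence_kernel(utterance_a, utterance_b))
```

Because the mean variant is an average of Mercer kernel values over frame pairs, it accepts sequences of different lengths and stays symmetric, which is what lets an SVM operate on whole utterances rather than single frames.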
2.
Mitchell McLaren Driss Matrouf Robbie Vogt Jean-Francois Bonastre 《Computer Speech and Language》2011,25(2):327-340
This paper presents an extended study on the implementation of support vector machine (SVM) based speaker verification in systems that employ continuous progressive model adaptation using the weight-based factor analysis model. The weight-based factor analysis model compensates for session variations in unsupervised scenarios by incorporating trial confidence measures in the general statistics used in the inter-session variability modelling process. Employing weight-based factor analysis in Gaussian mixture models (GMMs) was recently found to provide significant performance gains to unsupervised classification. Further improvements in performance were found through the integration of SVM-based classification in the system by means of GMM supervectors. This study focuses particularly on the way in which a client is represented in the SVM kernel space using single and multiple target supervectors. Experimental results indicate that training client SVMs using a single target supervector maximises performance while exhibiting a certain robustness to the inclusion of impostor training data in the model. Furthermore, the inclusion of low-scoring target trials in the adaptation process is investigated and found to significantly aid performance.
3.
Three extensions to the Kernel-AdaTron training algorithm for Support Vector Machine classifier learning are presented. These extensions allow the trained classifier to adhere more closely to the constraints imposed by Support Vector Machine theory. The results of these modifications show improvements over the existing Kernel-AdaTron algorithm. A method of parameter optimisation for polynomial kernels is also proposed.
4.
In this study, we investigate an offline-to-online strategy for speaker adaptation of automatic speech recognition systems. These systems are trained using traditional feed-forward and the recently proposed lattice-free maximum mutual information (MMI) time-delay deep neural networks. In this strategy, the test speaker identity is modeled as an iVector that is estimated offline and then used in an online fashion during speech decoding. To ensure the quality of the iVectors, we introduce a speaker enrollment stage that provides enough reliable speech to estimate an accurate and stable offline iVector. Furthermore, different iVector estimation techniques are reviewed and investigated for speaker adaptation in large vocabulary continuous speech recognition (LVCSR) tasks. Experimental results on several real-time speech recognition tasks demonstrate that the proposed strategy not only provides fast decoding but also yields significantly lower word error rates (WERs) than traditional iVector-based speaker adaptation frameworks.
5.
Vocal tract length normalization (VTLN) for standard filterbank-based Mel frequency cepstral coefficient (MFCC) features is usually implemented by warping the center frequencies of the Mel filterbank, and the warping factor is estimated using the maximum likelihood score (MLS) criterion. A linear transform (LT) equivalent for frequency warping (FW) would enable more efficient MLS estimation. We recently proposed a novel LT to perform FW for VTLN and model adaptation with standard MFCC features. In this paper, we present the mathematical derivation of the LT and give a compact formula to calculate it for any FW function. We also show that our LT is closely related to different LTs previously proposed for FW with cepstral features, and these LTs for FW are all shown to be numerically almost identical for the sine-log all-pass transform (SLAPT) warping functions. Our formula for the transformation matrix is, however, computationally simpler and, unlike other previous LT approaches to VTLN with MFCC features, no modification of the standard MFCC feature extraction scheme is required. In VTLN and speaker adaptive modeling (SAM) experiments with the DARPA resource management (RM1) database, the performance of the new LT was comparable to that of regular VTLN implemented by warping the Mel filterbank, when the MLS criterion was used for FW estimation. This demonstrates that the approximations involved do not lead to any performance degradation. Performance comparable to front end VTLN was also obtained with LT adaptation of HMM means in the back end, combined with mean bias and variance adaptation according to the maximum likelihood linear regression (MLLR) framework. The FW methods performed significantly better than standard MLLR for very limited adaptation data (1 utterance), and were equally effective with unsupervised parameter estimation. We also performed speaker adaptive training (SAT) with feature space LT denoted CLTFW. 
Global CLTFW SAT gave results comparable to SAM and VTLN. By estimating multiple CLTFW transforms using a regression tree, and including an additive bias, we obtained significantly improved results compared to VTLN as the amount of adaptation data increased.
6.
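This paper studies the problem of assigning weights to experts in group decision making. In practical decision problems, the information about the object is itself incomplete and uncertain, and people's descriptions of it are fuzzy, so fuzzy clustering analysis is the more suitable approach. A fuzzy kernel clustering method for weighting experts based on judgment matrices is therefore proposed. The method applies fuzzy kernel clustering theory to classify the experts' ranking vectors, and assigns a combined weight to each expert according to the classification results, the consistency of the judgment matrices, and the entropy of the ranking vectors. A numerical example shows that the proposed method is feasible and effective.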
7.
8.
《Ergonomics》2012,55(15):1047-1077
The present study (N=56) investigated spatio-temporal accuracy of horizontal reaching movements controlled visually through a vertical video monitor. Direct vision of the hand was precluded and the direction of hand trajectory, as perceived on the video screen, was varied by changing the angle of the camera. The orientation of the visual scene displayed on the fronto-parallel plane was thus congruent (0° condition) or non-congruent (directional bias of 15°, 30° or 45° counterclockwise) according to the horizontal working space. The goal of this study was to determine whether local learning of a directional bias can be transferred to other locations in the working space, but taking into account the magnitude of the directional bias (15°, 30° or 45°), and the position of the successive objectives (targets at different distances (TDD) or different azimuths (TDA)). Analysis of the spatial accuracy of pointing movements showed that when introducing a directional bias, terminal angular error was linearly related to the amount of angular perturbation (around 30%). Seven trials were, on average, necessary to eliminate this terminal error, whatever the magnitude of the directional bias and the position of the successive targets. When changing the location of the spatial objective, transfer of adaptation was achieved in the TDD condition but remained partial in the TDA condition. Furthermore, initial orientation of the trajectory suggested that some participants used a hand-centred frame of reference whereas others used an external one to specify movement vector. The adaptation process differed as a function of the frame of reference used, but only in the TDA condition. Adaptation for participants using a hand-centred frame of reference was more concerned with changes in the shape of the trajectory, whereas participants using an external frame of reference adapted their movement by up-dating the initial direction of hand trajectory. 
As a whole, these findings suggest that the processes involved in remote visual control of hand movement are complex. As a result, tasks requiring video-controlled manipulation, such as video-controlled surgery, require specific spatial abilities in actors and consequential plasticity of their visuo-motor system, in particular concerning the selection of the frame of reference for action.
9.
10.
《Computer Speech and Language》2007,21(2):231-246
In speaker verification over public telephone networks, utterances can be obtained from different types of handsets. Different handsets may introduce different degrees of distortion to the speech signals. This paper attempts to combine a handset selector with (1) handset-specific transformations, (2) reinforced learning, and (3) stochastic feature transformation to reduce the effect caused by the acoustic distortion. Specifically, during training, the clean speaker models and background models are first transformed by MLLR-based handset-specific transformations using a small amount of distorted speech data. Then reinforced learning is applied to adapt the transformed models to handset-dependent speaker models and handset-dependent background models using stochastically transformed speaker patterns. During a verification session, a GMM-based handset classifier is used to identify the most likely handset used by the claimant; then the corresponding handset-dependent speaker and background model pairs are used for verification. Experimental results based on 150 speakers of the HTIMIT corpus show that environment adaptation based on the combination of MLLR, reinforced learning and feature transformation outperforms CMS, Hnorm, Tnorm, and speaker model synthesis.
11.
Sequential kernel density approximation and its application to real-time visual tracking (cited by: 3; self-citations: 0; citations by others: 3)
Han B Comaniciu D Zhu Y Davis LS 《IEEE transactions on pattern analysis and machine intelligence》2008,30(7):1186-1197
Visual features are commonly modeled with probability density functions in computer vision problems, but current methods such as a mixture of Gaussians and kernel density estimation suffer from either a lack of flexibility, by fixing or limiting the number of Gaussian components in the mixture, or a large memory requirement, by maintaining a non-parametric representation of the density. These problems are aggravated in real-time computer vision applications since density functions are required to be updated as new data becomes available. We present a novel kernel density approximation technique based on the mean-shift mode finding algorithm, and describe an efficient method to sequentially propagate the density modes over time. While the proposed density representation is memory efficient, which is typical for mixture densities, it inherits the flexibility of non-parametric methods by allowing the number of components to be variable. The accuracy and compactness of the sequential kernel density approximation technique are illustrated by both simulations and experiments. Sequential kernel density approximation is applied to on-line target appearance modeling for visual tracking, and its performance is demonstrated on a variety of videos.
12.
Five approaches that can be used to control and simplify the speech recognition task are examined. They entail the use of isolated words, speaker-dependent systems, limited vocabulary size, a tightly constrained grammar, and quiet and controlled environmental conditions. The five components of a speech recognition system are described: a speech capture device, a digital signal processing module, preprocessed signal storage, reference speech patterns, and a pattern-matching algorithm. Current speech recognition systems are reviewed and categorized. Speaker recognition approaches and systems are also discussed.
13.
14.
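The spectrogram is a time-frequency representation of the speech signal and contains rich information. Feeding the spectrogram into a pulse-coupled neural network (PCNN) yields a feature vector for the speech. The traditional speech feature uses the firing counts over 50 PCNN iterations. A new speech feature parameter is proposed that is based on the firing positions of the PCNN neurons. Speaker recognition experiments show that the new feature reflects the characteristics of a speaker's speech signal better than the traditional feature and gives better recognition results.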
15.
AdaBoost is a famous ensemble learning method and has achieved successful applications in many fields. The existing studies illustrate that AdaBoost easily suffe...
16.
Suykens J.A.K. Van Gestel T. Vandewalle J. De Moor B. 《Neural Networks, IEEE Transactions on》2003,14(2):447-450
In this paper, we present a simple and straightforward primal-dual support vector machine formulation to the problem of principal component analysis (PCA) in dual variables. By considering a mapping to a high-dimensional feature space and application of the kernel trick (Mercer theorem), kernel PCA is obtained as introduced by Scholkopf et al. (2002). While least squares support vector machine classifiers have a natural link with the kernel Fisher discriminant analysis (minimizing the within-class scatter around targets +1 and -1), for PCA one can take the interpretation of a one-class modeling problem with zero target value around which one maximizes the variance. The score variables are interpreted as error variables within the problem formulation. In this way, primal-dual constrained optimization problem interpretations of linear and kernel PCA are obtained in a similar style as for least squares support vector machine classifiers.
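As a rough illustration of the kernel PCA computation that the abstract above arrives at, the sketch below centers a Gram matrix and extracts its leading eigenvector by power iteration. It shows only the standard eigenvalue view of kernel PCA, not the primal-dual LS-SVM derivation itself; the RBF kernel, the toy data, and all function names are assumptions chosen for the example.

```python
import math


def rbf(x, y, gamma=1.0):
    """RBF kernel between two feature vectors (example kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def centered_gram(points, kernel=rbf):
    """Gram matrix of the data after centering in feature space."""
    n = len(points)
    K = [[kernel(p, q) for q in points] for p in points]
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    # K is symmetric, so column means equal row means.
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def top_component(Kc, iterations=500):
    """Leading eigenpair of the centered Gram matrix via power iteration."""
    n = len(Kc)
    v = [float(i + 1) for i in range(n)]  # deterministic, not constant
    for _ in range(iterations):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    Kv = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, Kv))
    return eigenvalue, v


if __name__ == "__main__":
    # Two well-separated pairs of points (toy data).
    data = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
    eigenvalue, scores = top_component(centered_gram(data))
    print(eigenvalue, scores)
```

On this toy data the entries of the leading eigenvector take one sign for the first pair of points and the opposite sign for the second, so the first kernel principal component separates the two groups.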
17.
《Advanced Engineering Informatics》2004,18(1):1-8
A multi-layer perceptron network is made adaptive by weight updating using the extended Kalman filter (EKF). When the network is used as a model for a non-linear plant, the model can be adapted on-line with input/output data to capture time-varying system dynamics and consequently used in adaptive control. The paper describes how the EKF algorithm is used to update the network model and gives the implementation procedure. The developed adaptive model is evaluated for on-line modelling and model inversion control of a simulated continuous-stirred tank reactor. The modelling and control results show the effectiveness of model adaptation to system disturbance and a global tracking control.
18.
An overview of kernel alignment and its applications (cited by: 1; self-citations: 0; citations by others: 1)
19.
JinFeng Wang KwongSak Leung KinHong Lee ZhenYuan Wang WenZhong Wang Jun Xu 《International Journal of Intelligent Systems》2012,27(1):48-68
Nonlinear integrals (NIs) are useful integration tools. For classification, NIs can produce a set of virtual values by projecting the original data onto a virtual space. Classical NIs implement the projection along a line with respect to the features. In many cases, however, the linear projection cannot achieve good classification or regression performance because of the limitation of the integrand: the linear function used for the integrand is just one special type of function with respect to the features. In this paper, we propose nonlinear integrals with a polynomial kernel (NIPK). A polynomial function with respect to the features is used as the integrand of the NIs. It enables the projection onto the virtual space to follow different types of curves, so that the virtual values obtained by the NIs are better regularized and have higher separation power for classification. We use a genetic algorithm to learn the fuzzy measures so that a larger solution space can be searched. To test the capability of the NIPK, we apply it to classification on several benchmark datasets and a bioinformatics project. Experiments show an evident improvement in performance for the NIPK compared to classical NIs. © 2011 Wiley Periodicals, Inc.
20.
Research on the RBF-kernel SVM and its applications (cited by: 8; self-citations: 1; citations by others: 8)
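Owing to the favorable properties of its kernel function, the RBF-kernel SVM (RBF-SVM) shows good learning performance in practical applications, but the parameters of the RBF kernel function play a decisive role in the SVM's performance. This paper describes how the performance of the RBF-SVM varies as the parameters change, and introduces the RBF-SVM into an automatic down feather recognition system. Based on the practical requirements of that system and the way RBF-SVM performance varies, it discusses the basis and procedure for selecting the parameters in this system and presents the corresponding curves. The study yields a recognition model suited to the system, which improves its overall recognition rate. It also confirms the good properties of the RBF-SVM and the way its behavior is constrained by its parameters.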
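To make the role of the RBF kernel parameter concrete, here is a small sketch that evaluates the kernel on three points for several parameter values. The abstract does not name its parameter symbol, so the `gamma` form `exp(-gamma * ||x - y||^2)` used here is an assumption (it is equivalent to the `sigma` form with `gamma = 1 / (2 * sigma**2)`), and the sample points are made up.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2); gamma is the width parameter."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram_matrix(points, gamma):
    """Kernel values for every pair of points."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


if __name__ == "__main__":
    points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]  # hypothetical samples
    for gamma in (0.01, 1.0, 100.0):
        rounded = [[round(v, 3) for v in row]
                   for row in gram_matrix(points, gamma)]
        print(gamma, rounded)
```

With a very small `gamma` every pair of points looks almost identical to the kernel (entries near 1), which tends toward underfitting. With a very large `gamma` the Gram matrix approaches the identity, so each training point resembles only itself, which tends toward overfitting. This is why the parameter has to be chosen for the task at hand.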