1.
2.
This paper explores the significance of stereo-based stochastic feature compensation (SFC) methods for robust speaker verification (SV) in mismatched training and test environments. Gaussian mixture model (GMM)-based SFC methods developed in the past have been restricted solely to speech recognition tasks. This paper proposes applying these algorithms in an SV framework for background noise compensation. A priori knowledge about the test environment and the availability of stereo training data are assumed. During the training phase, Mel frequency cepstral coefficient (MFCC) features extracted from a speaker's noisy and clean speech utterances (stereo data) are used to build front-end GMMs. During the evaluation phase, noisy test utterances are transformed on the basis of a minimum mean squared error (MMSE) or maximum likelihood (MLE) estimate, using the target speaker GMMs. Experiments conducted on the NIST-2003-SRE database, with clean speech utterances artificially degraded by different types of additive noise, reveal that the proposed SV systems strictly outperform baseline SV systems in mismatched conditions across all noisy background environments.
3.
This paper explores the robustness of supervector-based speaker modeling approaches for speaker verification (SV) in noisy environments. Speaker modeling is carried out in two different frameworks: (i) the combined Gaussian mixture model-support vector machine (GMM-SVM) method and (ii) the total variability modeling method. In the GMM-SVM method, supervectors obtained by concatenating the means of adapted speaker GMMs are used to train speaker-specific SVMs during the training/enrollment phase of SV. During the evaluation/testing phase, noisy test utterances transformed into supervectors are subjected to SVM-based pattern matching and classification. In the total variability modeling method, large supervectors are reduced to a low-dimensional, channel-robust vector (i-vector) prior to SVM training and subsequent evaluation. Special emphasis is placed on the significance of an utterance partitioning technique for mitigating data imbalance and utterance duration mismatches. An adaptive boosting algorithm is proposed in the total variability modeling framework to enhance the accuracy of the SVM classifiers. Experiments performed on the NIST-SRE-2003 database, with training and test utterances corrupted by additive noise, indicate that the aforementioned modeling methods outperform the standard GMM-universal background model (GMM-UBM) framework for SV. The use of utterance partitioning and adaptive boosting in the speaker modeling frameworks is observed to yield substantial performance improvements under degraded conditions.
4.
Yiying Zhang Zhang D. Xiaoyan Zhu 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2000,30(5):598-602
This correspondence introduces a new text-independent speaker verification method, derived from the basic pattern recognition idea that the discriminating ability of a classifier can be improved by removing the information common to all classes. By looking for the speech characteristics shared by a group of speakers, a global speaker model can be established. Subtracting the score obtained from this model normalizes the conventional likelihood score, which results in a more compact score distribution and lower equal error rates. Several experiments are carried out to demonstrate the effectiveness of the proposed method.
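The score normalization described in this abstract can be illustrated with a minimal sketch. It assumes that both the claimed-speaker model and the global speaker model are diagonal-covariance GMMs and that scores are average per-frame log-likelihoods; the abstract does not state either detail, and the function names `gmm_avg_loglik` and `normalized_score` are hypothetical.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means and variances: (K, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                  # (T, K, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_weighted = log_comp + np.log(weights)                      # (T, K)
    # Numerically stable log-sum-exp over the mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())

def normalized_score(frames, speaker_gmm, global_gmm):
    """Speaker score minus global-model score (assumed log-domain subtraction)."""
    return (gmm_avg_loglik(frames, *speaker_gmm)
            - gmm_avg_loglik(frames, *global_gmm))
```

A claim would then be accepted when `normalized_score` exceeds a threshold tuned on held-out data; the choice of threshold is outside what the abstract describes.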
5.
Multimedia Tools and Applications - Due to the mismatch between training and test conditions, speaker verification in real environments continues to be a challenging problem. An effective way of...
6.
7.
This paper presents a principled SVM-based speaker verification system. We propose a new framework and a new sequence kernel that can make use of any Mercer kernel at the frame level. An extension of the sequence kernel based on the Max operator is also proposed. The new system is compared to state-of-the-art GMM and other SVM-based systems found in the literature on the Banca and Polyvar databases. The new system outperforms the other systems most of the time, and the differences are statistically significant. Finally, the proposed framework clarifies previous SVM-based systems and suggests interesting future research directions.
8.
Dhieb Thameur Boubaker Houcine Njah Sourour Ben Ayed Mounir Alimi Adel M. 《Multimedia Tools and Applications》2022,81(6):7817-7845
Multimedia Tools and Applications - The active modality of handwriting is broadly related to signature verification in the context of biometric user authentication systems. Signature verification...
9.
Achim D. Brucker Burkhart Wolff 《International Journal on Software Tools for Technology Transfer (STTT)》2005,7(3):233-247
We present a method for the security analysis of realistic models over off-the-shelf systems and their configuration by formal, machine-checked proofs. The presentation follows a large case study based on a formal security analysis of a CVS-Server architecture. The analysis is based on an abstract architecture (enforcing a role-based access control), which is refined to an implementation architecture (based on the usual discretionary access control provided by the POSIX environment). Both architectures serve as a skeleton to formulate access control and confidentiality properties. Both the abstract and the implementation architecture are specified in the language Z. Based on a logical embedding of Z into Isabelle/HOL, we provide formal, machine-checked proofs for consistency properties of the specification, for the correctness of the refinement, and for security properties.
10.
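To address the tendency of particle swarm optimization (PSO) to converge prematurely, an improved PSO algorithm is proposed. When the current particles become trapped in a local optimum, the algorithm mutates some of the particles with a certain probability, based on the average particle distance, thereby strengthening the global search capability of the swarm. The improved PSO algorithm is used to train a support vector machine, which is then applied in a speaker recognition system. Experiments show that the improved PSO algorithm improves both convergence speed and recognition accuracy.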
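The mutation step in this abstract can be sketched roughly as follows. The abstract gives no formulas, so everything specific here is an assumption: the average particle distance is taken as the mean Euclidean distance of the particles from the swarm centroid, normalized by the diagonal of the search box; stagnation is declared when that value drops below a hypothetical `threshold`; and mutation is modeled as uniform re-initialization with probability `prob`.

```python
import numpy as np

def average_particle_distance(positions, lower, upper):
    """Mean distance of particles from the swarm centroid, scaled by the box diagonal."""
    diag = np.linalg.norm(np.asarray(upper, float) - np.asarray(lower, float))
    centroid = positions.mean(axis=0)
    return float(np.linalg.norm(positions - centroid, axis=1).mean() / diag)

def mutate_if_stagnant(positions, lower, upper, threshold=0.01, prob=0.2, rng=None):
    """Re-initialize each particle with probability `prob` when the swarm has collapsed.

    The threshold, the probability, and the uniform re-initialization are
    illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    if average_particle_distance(positions, lower, upper) >= threshold:
        return positions.copy()          # swarm is still diverse; leave it alone
    mask = rng.random(len(positions)) < prob
    fresh = rng.uniform(lower, upper, size=positions.shape)
    mutated = positions.copy()
    mutated[mask] = fresh[mask]
    return mutated
```

In a full optimizer this function would be called once per iteration, after the usual velocity and position updates.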
11.
Detection of multiple ellipses in noisy environments is a basic yet challenging task in many vision-related problems. The key difficulty lies in distinguishing the pixels pertaining to each target in the presence of noise. To tackle this issue, we propose a hierarchical approach motivated by the fact that any segment of an ellipse can identify itself in ellipse reconstruction. First, we find all the neat edges without any branches and fit an ellipse to each of them. Second, target candidates are estimated from the neat edges using a proposed grouping strategy. Finally, the targets are detected from the candidates using a proposed selective competitive algorithm that distinguishes the true pixels of each target. A real application of the proposed method is illustrated, in addition to some other demonstrative experiments.
12.
Multimedia Tools and Applications - The main novelty of this work resides in incorporating a Gammatone filter-bank as a substitute for the Mel filter-bank in the extraction pipeline of the Product...
13.
14.
15.
Sundararajan Srinivasan Tao Ma Georgios Lazarou Joseph Picone 《International Journal of Speech Technology》2014,17(1):17-25
Gaussian Mixture Models (GMM) have been the most popular approach in speaker recognition and verification for over two decades. The inefficiencies of this model for signals such as speech are well documented and include an inability to model temporal dependencies that result from nonlinearities in the speech signal. The resulting models are often complex and overdetermined, which leads to a lack of generalization. In this paper, we present a nonlinear mixture autoregressive model (MixAR) that attempts to directly model nonlinearities in the trajectories of the speech features. We apply this model to the problem of speaker verification. Experiments with synthetic data demonstrate the viability of the model. Evaluations on standard speech databases, including TIMIT, NTIMIT, and NIST-2001, demonstrate that MixAR, using only half the number of parameters and only static features, can achieve a lower equal error rate when compared to GMMs, particularly in the presence of previously unseen noise. Performance as a function of the duration of both the training and evaluation utterances is also analyzed.
16.
Flavio J. Reyes-Díaz Gabriel Hernández-Sierra José R. Calvo de Lara 《International Journal of Speech Technology》2017,20(3):475-485
The performance of state-of-the-art speaker verification in uncontrolled environments is affected by different variabilities. Short duration variability is very common in these scenarios and causes speaker verification performance to drop quickly as the duration of the verification utterances decreases. Linear discriminant analysis (LDA) is the most common session variability compensation algorithm; nevertheless, it has some shortcomings when trained with insufficient data. In this paper we introduce two methods for session variability compensation to deal with short-length utterances in i-vector space. The first method incorporates the short duration variability information into the within-class variance estimation process. The second compensates the session and short duration variabilities in two different spaces with LDA algorithms (2S-LDA). First, we analyze the behavior of the within-class and between-class scatters in the first proposed method. Then, both proposed methods are evaluated on telephone sessions from NIST SRE-08 for different durations of the evaluation utterances: full (average 2.5 min), 20, 15, 10 and 5 s. The 2S-LDA method obtains good results across the short-length utterance conditions in the evaluations, with an average relative EER improvement of 1.58% compared to the best baseline (WCCN[LDA]). Finally, we apply the 2S-LDA method to speaker verification in reverberant environments, using different reverberant conditions from the Reverb challenge 2013, obtaining improvements of 8.96% and 23% under matched and mismatched reverberant conditions, respectively.
17.
Novel hybrid DNN approaches for speaker verification in emotional and stressful talking environments
Shahin Ismail Nassif Ali Bou Nemmour Nawel Elnagar Ashraf Alhudhaif Adi Polat Kemal 《Neural computing & applications》2021,33(23):16033-16055
Neural Computing and Applications - In this work, we conducted an empirical comparative study of the performance of text-independent speaker verification in emotional and stressful environments....
18.
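A text-independent speaker verification system based on prosodic features and SVM is proposed. Using wavelet analysis, suprasegmental prosodic features are extracted from the MFCC, F0 and energy contours of the speech signal. The best complementary feature-level fusion of the three kinds of prosodic features is investigated experimentally, yielding the prosodic feature PMFCCFE. GMM mean supervectors of the prosodic features are used as parameters to train the SVM model of each target speaker, so as to discriminate more effectively between target speakers and impostors. Experiments on the NIST06 8side-1side database show that, relative to a baseline GMM-UBM system using short-time cepstral parameters, the GMM-SVM system with suprasegmental prosodic features achieves a relative EER reduction of 57.9% and a relative MinDCF reduction of 41.4%.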
19.
A new speaker verification method with global speaker model and likelihood score normalization
In this paper, a new text-independent speaker verification method, GSMSV, is proposed based on likelihood score normalization. In this method, a global speaker model is established to represent the universal features of speech and to normalize the likelihood score. Statistical analysis demonstrates that this normalization can remove the common factors of speech and make the differences between speakers more prominent. As a result, the equal error rate is decreased significantly, the verification procedure is accelerated, and system adaptability to speaking speed is improved.
20.
This study explores a novel subspace projection-based approach for the analysis of stressed speech. Studies have shown that stress influences the speech production system and results in a large acoustic variation between neutral and stressed speech. This degrades the discrimination capability of an automatic speech recognition system that is trained on neutral speech and tested on stressed speech. An effort is made to reduce the acoustic mismatch by explicitly normalizing the stress-specific attributes. The stress-specific divergences are normalized by exploiting a subspace filtering technique. To accomplish this, an orthogonal projection-based linear relationship between the speech and the stress information is explored to filter an effective speech subspace, which consists of speech information. The speech subspace is constructed from neutral speech data using K-means clustering followed by singular value decomposition. The speech and the stress information are separated by projecting the stressed speech orthogonally onto the effective speech subspace. Experimental results indicate that the bases of an effective subspace comprise the first few eigenvectors corresponding to the highest eigenvalues. To further improve system performance, both the neutral and the stressed speech are projected onto the lower-dimensional subspace. The projections derived using the neutral speech employ heteroscedastic linear discriminant analysis in a maximum likelihood linear transformation-based semi-tied adaptation framework. Consistent improvements are noted for the proposed technique in all the discussed cases.
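The orthogonal projection at the core of this abstract can be sketched in a few lines. The sketch assumes that K-means has already been run on neutral speech features and that its centroids are supplied as a matrix; it then takes the leading right singular vectors of that matrix as the speech subspace basis. The function names and the choice to decompose the centroid matrix directly are assumptions, since the abstract does not give the exact construction.

```python
import numpy as np

def speech_subspace_projector(neutral_centroids, rank):
    """Projection matrix onto the span of the top-`rank` right singular vectors.

    neutral_centroids: (K, D) matrix of K-means centroids from neutral speech
    (assumed to be computed beforehand).
    """
    _, _, vt = np.linalg.svd(neutral_centroids, full_matrices=False)
    basis = vt[:rank].T                  # (D, rank), orthonormal columns
    return basis @ basis.T               # (D, D) orthogonal projector

def split_speech_and_stress(features, projector):
    """Split features into the in-subspace part and the orthogonal residual."""
    speech_part = features @ projector   # projector is symmetric
    stress_part = features - speech_part
    return speech_part, stress_part
```

Because the right singular vectors are eigenvectors of the centroid Gram matrix, keeping the first few corresponds to keeping the directions with the highest eigenvalues, which matches the observation reported in the abstract.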