1.

A New Distance Measure for a Variable‐Sized Acoustic Model Based on MDL Technique

Hoon‐Young Cho Sanghun Kim 《ETRI Journal》2010,32(5):795-800

2.

On the use of different speech representations for speaker modeling

Ke Chen 《IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews》2005,35(3):301-314

Numerous speech representations have been reported to be useful in speaker recognition. However, there is much less agreement on which speech representation provides a perfect representation of speaker-specific information conveyed in a speech signal. Unlike previous work, we propose an alternative approach to speaker modeling by the simultaneous use of different speech representations in an optimal way. Inspired by our previous empirical studies, we present a soft competition scheme on different speech representations to exploit different speech representations in encoding speaker-specific information. On the basis of this soft competition scheme, we present a parametric statistical model, generalized Gaussian mixture model (GGMM), to characterize a speaker identity based on different speech representations. Moreover, we develop an expectation-maximization algorithm for parameter estimation in the GGMM. The proposed speaker modeling approach has been applied to text-independent speaker recognition and comparative results on the KING speech corpus demonstrate its effectiveness.

3.

Speech emotion recognition based on glottal wave and vocal tract features

李永伟 陶建华 李凯 《信号处理》2023,39(4):632-638

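Speech emotion recognition is an indispensable part of natural human-computer interaction and an important component of artificial intelligence. Control of the articulatory organs causes differences in the acoustic features of emotional speech, which are then perceived as different emotions. Traditional speech emotion recognition classifies emotions using only the acoustic or auditory features of the speech signal, ignoring the important role that articulatory features such as the glottal wave and the vocal tract play in emotion perception. In our previous work, we analyzed theoretically the important influence of the glottal wave and vocal tract shape on perceived emotion, but did not apply glottal wave and vocal tract features to speech emotion recognition. This paper therefore revisits, from the perspective of speech production, the feasibility of using glottal wave and vocal tract features for speech emotion recognition, and proposes a recognition method based on glottal wave and vocal tract features under the source-filter model. First, the Liljencrants-Fant and Auto-Regressive eXogenous (ARX-LF) model is used to separate the glottal wave and vocal tract features of emotional speech from the speech signal; the separated features are then fed into a bidirectional gated recurrent unit (BiGRU) for emotion classification. Experiments on the public IEMOCAP emotion dataset show that glottal wave and vocal tract features distinguish emotions effectively and outperform some traditional features. By studying speech emotion recognition through the articulation-related glottal wave and vocal tract, this paper offers a new approach for speech emotion recognition technology.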

4.

Automatic detection of obstructive sleep apnea using speech signals

Goldshtein E Tarasiuk A Zigel Y 《IEEE transactions on bio-medical engineering》2011,58(5):1373-1382

Obstructive sleep apnea (OSA) is a common disorder associated with anatomical abnormalities of the upper airways that affects 5% of the population. Acoustic parameters may be influenced by the vocal tract structure and soft tissue properties. We hypothesize that speech signal properties of OSA patients will differ from those of control subjects not having OSA. Using speech signal processing techniques, we explored acoustic speech features of 93 subjects who were recorded using a text-dependent speech protocol and a digital audio recorder immediately prior to a polysomnography study. Following analysis of the study, subjects were divided into OSA (n=67) and non-OSA (n=26) groups. A Gaussian mixture model-based system was developed to model and classify between the groups; discriminative features such as vocal tract length and linear prediction coefficients were selected using a feature selection technique. Specificity and sensitivity of 83% and 79% were achieved for the male OSA patients and 86% and 84% for the female OSA patients, respectively. We conclude that acoustic features from speech signals during wakefulness can detect OSA patients with good specificity and sensitivity. Such a system can be used as a basis for future development of a tool for OSA screening.
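The abstract above describes a two-class GMM classifier scored by sensitivity and specificity. The following is a minimal sketch of that general idea, assuming scalar features and hand-picked mixture parameters; `GMM_POS`, `GMM_NEG`, and the feature values are hypothetical and are not taken from the paper, whose actual features and trained models are not given here.

```python
import math

def gmm_log_likelihood(x, weights, means, stds):
    """Log-density of scalar x under a 1-D Gaussian mixture (log-sum-exp for stability)."""
    logs = [
        math.log(w) - math.log(s) - 0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - m) / s) ** 2
        for w, m, s in zip(weights, means, stds)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def classify(x, gmm_pos, gmm_neg):
    """Return True when the positive-class mixture assigns x the higher likelihood."""
    return gmm_log_likelihood(x, *gmm_pos) > gmm_log_likelihood(x, *gmm_neg)

def sensitivity_specificity(labels, predictions):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical class models, each given as (weights, means, standard deviations).
GMM_POS = ([0.6, 0.4], [2.0, 3.5], [0.5, 0.8])
GMM_NEG = ([0.5, 0.5], [-1.0, 0.5], [0.7, 0.6])

# Hypothetical scalar features with ground-truth labels (True = positive class).
features = [2.1, 3.0, 0.4, -0.8, 1.9, 0.2]
labels = [True, True, False, False, True, False]

predictions = [classify(x, GMM_POS, GMM_NEG) for x in features]
sens, spec = sensitivity_specificity(labels, predictions)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In practice the mixtures would be fitted to multidimensional training features (for example with EM) rather than written by hand, and the decision threshold on the log-likelihood ratio would be tuned to trade sensitivity against specificity.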

5.

Structured Gaussian mixture model under constraint conditions and voice conversion with non-parallel corpora

车滢霞 俞一彪 《电子学报》2016,44(9):2282-2288

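This paper proposes a structured Gaussian mixture model under constraint conditions and a voice conversion method for non-parallel corpora. A small number of identical syllables are extracted from the original non-parallel corpora of the source and target speakers. During training of the structured Gaussian mixture model, the semantic information and acoustic-feature correspondences carried by these syllables are used to constrain the K-means cluster centers, and the posterior probabilities of speech frames belonging to the model components are corrected during the expectation-maximization (EM) iterations, yielding a Structured Gaussian Mixture Model with Constraint condition (C-SGMM). The Acoustic Universal Structure (AUS) principle is then used to match and align the Gaussian distributions of the source and target speakers' constrained structured Gaussian mixture models, from which a short-time spectral conversion function is derived. Subjective and objective evaluations show that speech converted with this method outperforms the traditional structured-model voice conversion method in spectral distortion, target preference, and speech quality: the average spectral distortion of the converted speech is only 0.52, the correct speaker identification rate reaches 95.25%, and the target preference index (ABX) averages 0.82, bringing performance closer to that of voice conversion based on parallel corpora.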

6.

Phonemic hidden Markov models with continuous mixture output densities for large vocabulary word recognition

Deng L. Kenny P. Lennig M. Gupta V. Seitz F. Mermelstein P. 《Signal Processing, IEEE Transactions on》1991,39(7):1677-1681

The authors demonstrate the effectiveness of phonemic hidden Markov models with Gaussian mixture output densities (mixture HMMs) for speaker-dependent large-vocabulary word recognition. Speech recognition experiments show that for almost any reasonable amount of training data, recognizers using mixture HMMs consistently outperform those employing unimodal Gaussian HMMs. With a sufficiently large training set (e.g. more than 2500 words), use of HMMs with 25-component mixture distributions typically reduces recognition errors by about 40%. It is also found that the mixture HMMs outperform a set of unimodal generalized triphone models having the same number of parameters. Previous attempts to employ mixture HMMs for speech recognition proved discouraging because of the high complexity and computational cost in implementing the Baum-Welch training algorithm. It is shown how mixture HMMs can be implemented very simply in unimodal transition-based frameworks by allowing multiple transitions from one state to another.

7.

Gaussian mixture density modeling, decomposition, and applications (total citations: 6; self-citations: 0; citations by others: 6)

Xinhua Zhuang Yan Huang Palaniappan K. Yunxin Zhao 《IEEE transactions on image processing》1996,5(9):1293-1302

We present a new approach to the modeling and decomposition of Gaussian mixtures by using robust statistical methods. The mixture distribution is viewed as a contaminated Gaussian density. Using this model and the model-fitting (MF) estimator, we propose a recursive algorithm called the Gaussian mixture density decomposition (GMDD) algorithm for successively identifying each Gaussian component in the mixture. The proposed decomposition scheme has advantages that are desirable but lacking in most existing techniques. In the GMDD algorithm the number of components does not need to be specified a priori, the proportion of noisy data in the mixture can be large, the parameter estimation of each component is virtually independent of initialization, and the variability in the shape and size of the component densities in the mixture is taken into account. Gaussian mixture density modeling and decomposition has been widely applied in a variety of disciplines that require signal or waveform characterization for classification and recognition. We apply the proposed GMDD algorithm to the identification and extraction of clusters, and the estimation of unknown probability densities. Probability density estimation by identifying a decomposition using the GMDD algorithm, that is, a superposition of normal distributions, is successfully applied to automated cell classification. Computer experiments using both real data and simulated data demonstrate the validity and power of the GMDD algorithm for various models and different noise assumptions.

8.

High-quality voice conversion system based on GMM statistical parameters and RBF neural network

CHEN Xian-tong ZHANG Ling-hua 《中国邮电高校学报(英文版)》2014,21(5):68-75

A voice conversion (VC) system was designed based on a Gaussian mixture model (GMM) and a radial basis function (RBF) neural network. As a voice conversion model, an RBF network needs large quantities of training data to improve its performance. For a given speech recording, networks trained on different segments of data produce different transformation effects. Since trying segment by segment to obtain the best conversion effect is complex, a conversion method is proposed that uses a GMM to gather statistics before training the RBF network. The speech transformation and representation using adaptive interpolation of weighted spectrum (STRAIGHT) model is used for accurate extraction of the vocal tract spectrum. The GMM is then used to classify the numerous spectral parameters, and the resulting mean parameters are used to train the RBF network. Experiments show that the soft classification ability of the GMM quickly reduces and classifies the training data while preserving the training effect, which in turn lowers the selection complexity. Compared with conventional RBF network training methods, this method makes the transformation of spectral parameters more effective and improves the quality of the converted speech.

9.

Music Analysis Using Hidden Markov Mixture Models

Yuting Qi Paisley J.W. Carin L. 《Signal Processing, IEEE Transactions on》2007,55(11):5209-5224

We develop a hidden Markov mixture model based on a Dirichlet process (DP) prior, for representation of the statistics of sequential data for which a single hidden Markov model (HMM) may not be sufficient. The DP prior has an intrinsic clustering property that encourages parameter sharing, and this naturally reveals the proper number of mixture components. The evaluation of posterior distributions for all model parameters is achieved in two ways: 1) via a rigorous Markov chain Monte Carlo method; and 2) approximately and efficiently via a variational Bayes formulation. Using DP HMM mixture models in a Bayesian setting, we propose a novel scheme for music analysis, highlighting the effectiveness of the DP HMM mixture model. Music is treated as a time-series data sequence and each music piece is represented as a mixture of HMMs. We approximate the similarity of two music pieces by computing the distance between the associated HMM mixtures. Experimental results are presented for synthesized sequential data and from classical music clips. Music similarities computed using DP HMM mixture modeling are compared to those computed from Gaussian mixture modeling, for which the mixture modeling is also performed using DP. The results show that the performance of DP HMM mixture modeling exceeds that of the DP Gaussian mixture modeling.

10.

DTI segmentation using an information theoretic tensor dissimilarity measure

Wang Z Vemuri BC 《IEEE transactions on medical imaging》2005,24(10):1267-1277

In recent years, diffusion tensor imaging (DTI) has become a popular in vivo diagnostic imaging technique in the radiological sciences. For this imaging technique to be more effective, proper image analysis techniques suited for analyzing these high-dimensional data need to be developed. In this paper, we present a novel definition of tensor "distance" grounded in concepts from information theory and incorporate it in the segmentation of DTI. In a DTI, the symmetric positive definite (SPD) diffusion tensor at each voxel can be interpreted as the covariance matrix of a local Gaussian distribution. Thus, a natural measure of dissimilarity between SPD tensors would be the Kullback-Leibler (KL) divergence or its relative. We propose the square root of the J-divergence (symmetrized KL) between two Gaussian distributions corresponding to the diffusion tensors being compared, and this leads to a novel closed-form expression for the "distance" as well as the mean value of a DTI. Unlike the traditional Frobenius norm-based tensor distance, our "distance" is affine invariant, a desirable property in segmentation and many other applications. We then incorporate this new tensor "distance" in a region-based active contour model for DTI segmentation. Synthetic and real data experiments are shown to depict the performance of the proposed model.
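The square root of the J-divergence between two zero-mean Gaussians can be written in closed form, and its affine invariance can be checked numerically. The sketch below is illustrative only: it assumes the J-divergence is defined as the average of the two directed KL divergences (which gives d(T1, T2) = 0.5 * sqrt(tr(T1^-1 T2 + T2^-1 T1) - 2n)), and it uses 2x2 tensors with hand-written helpers to stay dependency-free, whereas diffusion tensors are 3x3. All function names and example matrices are hypothetical.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose2(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def trace2(m):
    return m[0][0] + m[1][1]

def j_distance(t1, t2):
    """Square root of the J-divergence between N(0, t1) and N(0, t2), with n = 2."""
    n = 2
    tr = trace2(matmul2(inv2(t1), t2)) + trace2(matmul2(inv2(t2), t1))
    return 0.5 * math.sqrt(max(tr - 2 * n, 0.0))  # clamp guards against round-off

def frobenius_distance(t1, t2):
    """Frobenius norm of the difference, for comparison."""
    return math.sqrt(sum((t1[i][j] - t2[i][j]) ** 2 for i in range(2) for j in range(2)))

def congruence(a, t):
    """Affine change of coordinates applied to a covariance: A T A^T."""
    return matmul2(matmul2(a, t), transpose2(a))

T1 = [[2.0, 0.5], [0.5, 1.0]]    # hypothetical SPD tensors
T2 = [[1.0, 0.2], [0.2, 3.0]]
A = [[1.5, -0.7], [0.3, 2.0]]    # hypothetical invertible transform

before = j_distance(T1, T2)
after = j_distance(congruence(A, T1), congruence(A, T2))
print(before, after)  # the two values agree up to round-off
print(frobenius_distance(T1, T2),
      frobenius_distance(congruence(A, T1), congruence(A, T2)))  # these differ
```

The invariance follows because tr((A T1 A^T)^-1 (A T2 A^T)) = tr(T1^-1 T2) for any invertible A, while the Frobenius distance changes with the coordinate system.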

11.

Research on the robustness of speaker recognition based on VEMAP

黄文娜 彭亚雄 《电声技术》2016,40(11):44-47

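To mitigate the effect of vocal effort variation on the performance of speaker recognition systems, and based on an analysis of speech signals at different vocal effort levels, this paper proposes using Vocal Effort Maximum A Posteriori (VEMAP) adaptation to update the models of a speaker recognition system based on the Gaussian Mixture Model-Universal Background Model (GMM-UBM). Experiments show that the proposed method reduces the system EER by 88.45% and 85.16% under different vocal effort levels. It effectively resolves the drop in speaker recognition performance caused by the loudness mismatch between training and test speech that vocal effort variation produces, and markedly improves the performance of the speaker recognition system.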

12.

Application of the Gibbs distribution to hidden Markov modeling in speaker independent isolated word recognition

Zhao Y. Atlas L.E. Zhuang X. 《Signal Processing, IEEE Transactions on》1991,39(6):1291-1299

A method of integrating the Gibbs distributions (GDs) into hidden Markov models (HMMs) is presented. The probabilities of the hidden state sequences of HMMs are modeled by GDs in place of the transition probabilities. The GDs offer a general way of modeling neighbor interactions of Markov random fields, of which the Markov chains in HMMs are special cases. An algorithm for estimating the model parameters is developed based on Baum reestimation, and an algorithm for computing the probability terms is developed using a lattice structure. The GD models were used for experiments in speech recognition on the TI speaker-independent, isolated digit database. The observation sequences of the speech signals were modeled by mixture Gaussian autoregressive densities. The energy functions of the GDs were developed using very few parameters and proved adequate in hidden layer modeling. The results of the experiments showed that the GD models performed at least as well as the HMM models.

13.

Relation of signal set choice to the performance of optimal non-Gaussian detectors

Johnson D.H. Orsak G.C. 《Communications, IEEE Transactions on》1993,41(9):1319-1328

The optimal procedure for detecting the presence of discrete-time signals in additive noise can be derived from the likelihood ratio test. When the noise has statistically independent, identically distributed components, the dependence of the detector's performance on signal characteristics can be related to the Kullback-Leibler (KL) distance between the distributions governing the hypotheses. Performance predictions based on the central limit theorem are shown to be poor approximations to the true performance. Performance of the optimal detector has long been known to increase exponentially with increasing KL distance. Symmetric noise amplitude distributions yield a symmetric dependence on the difference between the signals' amplitudes at each time index. Small-signal (locally optimal) detection performance is shown to depend on signal energy, whereas large-signal performance depends on the signal waveform. When a distance measure can be defined, performance depends on a different measure than that used in the detector with one exception (the Gaussian).
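The energy-versus-waveform distinction in this abstract can be illustrated with closed-form KL divergences for a location shift. This is a minimal sketch, not the paper's analysis: it assumes i.i.d. additive noise (so the KL divergence between the two hypotheses is the sum of per-sample terms), and it picks Laplacian noise as one example of a non-Gaussian distribution. The signals and function names are hypothetical.

```python
import math

def kl_gaussian_shift(d, sigma):
    """KL divergence between N(0, sigma^2) and N(d, sigma^2)."""
    return d * d / (2.0 * sigma * sigma)

def kl_laplace_shift(d, b):
    """KL divergence between Laplace(0, b) and Laplace(d, b)."""
    r = abs(d) / b
    return r + math.exp(-r) - 1.0

def total_kl(signal_a, signal_b, per_sample_kl, scale):
    """Sum of per-sample KL terms; valid when the noise samples are i.i.d."""
    return sum(per_sample_kl(a - b, scale) for a, b in zip(signal_a, signal_b))

zero = [0.0, 0.0, 0.0, 0.0]
peaked = [2.0, 0.0, 0.0, 0.0]   # energy 4, concentrated in one sample
flat = [1.0, 1.0, 1.0, 1.0]     # energy 4, spread evenly

# Gaussian noise: the KL distance depends only on the energy of the difference.
print(total_kl(peaked, zero, kl_gaussian_shift, 1.0),
      total_kl(flat, zero, kl_gaussian_shift, 1.0))

# Laplacian noise: equal-energy signals give different KL distances.
print(total_kl(peaked, zero, kl_laplace_shift, 1.0),
      total_kl(flat, zero, kl_laplace_shift, 1.0))
```

For small shifts the Laplacian term behaves like d^2 / (2 b^2), which is again an energy measure, while for large shifts it grows like |d| / b, so the shape of the waveform matters.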

14.

On using non-linear canonical correlation analysis for voice conversion based on Gaussian mixture model (total citations: 1; self-citations: 0; citations by others: 1)

Zhihua Jian Zhen Yang 《电子科学学刊(英文版)》2010,27(1):1-7

A voice conversion algorithm aims to provide a high level of similarity to the target voice with an acceptable level of quality. The main objective of this paper was to build a nonlinear relationship between the parameters for the acoustical features of the source and target speakers using Non-Linear Canonical Correlation Analysis (NLCCA) based on a jointed Gaussian mixture model. Speaker individuality transformation was achieved mainly by altering vocal tract characteristics represented by Line Spectral Frequencies (LSF). T...

15.

Sensor control strategies based on the Gaussian mixture multi-target filter

陈辉 贺忠良 连峰 黎慧波 《电子学报》2019,47(3):521-530

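Based on the random finite set Gaussian Mixture Multi-Target Filter (GM-MTF), this paper proposes several sensor control strategies. First, building on a cubature Kalman Gaussian mixture multi-target nonlinear filter, the overall information gain of the GM-MTF is derived using the Bhattacharyya distance between two Gaussian distributions, and a corresponding sensor control strategy is proposed on this basis. In addition, a joint sampling method for Gaussian particles is designed to sample the predicted Gaussian components of the multi-target filter: a set of weighted particles approximates the multi-target statistics, the particle weights are updated with the ideal measurement set, and the Rényi divergence is then used as the evaluation function, yielding a more adaptable sensor control strategy. Finally, a Gaussian mixture implementation of the evaluation based on the Posterior Expected Number of Targets (PENT) is given. Simulation experiments verify the effectiveness of the proposed algorithms.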
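The Bhattacharyya distance between two Gaussian distributions, which this abstract uses as a building block for information gain, has a standard closed form. The sketch below shows the univariate case only, as an illustration; the multi-target information gain derived in the paper is not reproduced here, and the function name and example values are hypothetical.

```python
import math

def bhattacharyya_gaussian(mu1, var1, mu2, var2):
    """Bhattacharyya distance between N(mu1, var1) and N(mu2, var2) in one dimension.

    D_B = (mu1 - mu2)^2 / (8 * v) + 0.5 * ln(v / sqrt(var1 * var2)),
    where v = (var1 + var2) / 2 is the averaged variance.
    """
    var_avg = 0.5 * (var1 + var2)
    mean_term = (mu1 - mu2) ** 2 / (8.0 * var_avg)
    spread_term = 0.5 * math.log(var_avg / math.sqrt(var1 * var2))
    return mean_term + spread_term

# Hypothetical predicted and updated densities for one Gaussian component.
predicted = (0.0, 4.0)   # (mean, variance) before the measurement update
updated = (0.5, 1.0)     # (mean, variance) after the measurement update
print(bhattacharyya_gaussian(*predicted, *updated))
```

A larger distance between the predicted and updated densities indicates that the measurement changed the estimate more, which is the intuition behind using such a distance to rank candidate sensor actions.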

16.

Application of NCIE in multi-feature selection and SAR target recognition

何洁 李文娟 陈欣 《太赫兹科学与电子信息学报》2023,21(2):183-188

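For target recognition in synthetic aperture radar (SAR) images, nonlinear correlation information entropy (NCIE) is used to select multiple features for classification. Each type of feature extracted from SAR images is modeled probabilistically with a Gaussian mixture model, and the KL divergence is used to evaluate the similarity between different features. NCIE is used to evaluate the correlation of different feature combinations, and the optimal combination is determined by the maximum entropy value. The selected features are then represented and classified with a joint sparse representation model. The proposed method is tested on the MSTAR dataset under standard operating conditions and extended operating conditions, and the results verify its effectiveness.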

17.

Texture image retrieval based on statistical models and KL distance

赵平 尚赵伟 冯兴乐 《微电子学与计算机》2007,24(11):49-52,56

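To further improve texture image retrieval performance, a texture feature extraction algorithm based on statistical models and KL distance is proposed. Based on the characteristics of wavelet decomposition and starting from the wavelet coefficients, the histogram distribution of the wavelet coefficients in each subband is taken as the texture feature. A Gaussian mixture model and a generalized Gaussian model describe the low-frequency and high-frequency information, respectively; the maximum-likelihood estimation rule combines feature extraction with similarity computation, and the KL distance serves as the measure. Compared with the generalized Gaussian model method, the algorithm achieves higher retrieval accuracy. Theoretical analysis and comparative texture image retrieval experiments indicate that its texture feature extraction performance is 5% higher than that of the generalized Gaussian model method.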

18.

Impostor Detection in Speaker Recognition Using Confusion‐Based Confidence Measures

Kyuhong Kim Hoirin Kim Minsoo Hahn 《ETRI Journal》2006,28(6):811-814

In this letter, we introduce confusion‐based confidence measures for detecting an impostor in speaker recognition, which does not require an alternative hypothesis. Most traditional speaker verification methods are based on a hypothesis test, and their performance depends on the robustness of an alternative hypothesis. Compared with the conventional Gaussian mixture model–universal background model (GMM‐UBM) scheme, our confusion‐based measures show better performance in noise‐corrupted speech. The additional computational requirements for our methods are negligible when used to detect or reject impostors.

19.

The subglottic region in articulator synthesizers

K. S. Gorbunov I. S. Makarov 《Journal of Communications Technology and Electronics》2011,56(12):1504-1509

This study is devoted to the simulation of the influence of the subglottic region (trachea, bronchi, and lungs) on the acoustic characteristics of the vocal tract in frequency-time articulatory synthesizers. The proposed model is reduced to the calculation of the short-term Fourier spectrum of the vocal source, taking into account the interaction of the transfer function of the vocal tract and subglottic cavities in the frequency domain, and synthesis of the vocal signal using the method of overlapping and summing. The scheme is tested with the use of the results of measurement of the vocal tract dynamics involving magnetic resonance tomography for a number of acoustic patterns of American English.
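The "overlapping and summing" synthesis step mentioned in this abstract can be sketched in isolation. The example below is an assumption-laden illustration rather than the authors' synthesizer: it windows a test signal with a periodic Hann window at 50% overlap, leaves each frame unmodified (a real synthesizer would apply the vocal tract and subglottic transfer functions to each frame's spectrum at this point), and overlap-adds the frames. The frame length, hop size, and test signal are hypothetical.

```python
import math

def hann_periodic(n):
    """Periodic Hann window; at 50% overlap its shifted copies sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def split_windowed(signal, frame_len, hop):
    """Cut the signal into overlapping frames and apply the window to each."""
    window = hann_periodic(frame_len)
    return [
        [signal[start + i] * window[i] for i in range(frame_len)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]

def overlap_add(frames, hop):
    """Sum the frames back together at their original offsets."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for k, frame in enumerate(frames):
        for i, value in enumerate(frame):
            out[k * hop + i] += value
    return out

signal = [math.sin(0.1 * n) for n in range(64)]
frames = split_windowed(signal, frame_len=16, hop=8)
# Per-frame spectral processing would go here; the frames are left untouched.
rebuilt = overlap_add(frames, hop=8)

# Samples covered by two frames are reconstructed; the first and last half
# frames are covered by only one window and remain attenuated.
max_error = max(abs(rebuilt[n] - signal[n]) for n in range(8, 56))
print(max_error)
```

The reconstruction works because the periodic Hann window satisfies w[i] + w[i + N/2] = 1, so unmodified frames add back to the original signal wherever two frames overlap.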

20.

Quality enhancement of sinusoidal transform vocoders

Chang W.-W. Wang D.-Y. 《Vision, Image and Signal Processing, IEE Proceedings -》1998,145(6):379-383

The authors present quality enhancement of sinusoidal transform coders (STCs) via the development of parametric models. The benefits of the Bark spectrum are explored for use in the design of perceptual coding of the sine-wave amplitudes. According to the results, the proposed approach provides a uniform perceptual fit across the spectrum. To enhance the accuracy of phase representation, noncausal all-pole modelling of the vocal system is also discussed. Experimental results indicate that the use of the newly developed parametric models allows the STC to improve the phase accuracy as well as the synthetic speech quality.
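To make the Bark-spectrum idea concrete, the sketch below maps sine-wave frequencies to the Bark scale and averages their amplitudes per Bark band. It is only an illustration under stated assumptions: the abstract does not say which Bark formula or band grouping the authors use, so the code adopts a commonly used analytic approximation attributed to Zwicker and Terhardt, and the grouping by integer Bark value, the function names, and the sample data are hypothetical.

```python
import math

def hz_to_bark(f_hz):
    """Approximate Bark value for a frequency in Hz (Zwicker-Terhardt style formula)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def bark_band_means(freqs_hz, amps):
    """Average sine-wave amplitudes within each integer Bark band."""
    bands = {}
    for f, a in zip(freqs_hz, amps):
        bands.setdefault(int(hz_to_bark(f)), []).append(a)
    return {band: sum(values) / len(values) for band, values in sorted(bands.items())}

# Hypothetical sine-wave parameters: two low-frequency partials and one near 1 kHz.
freqs = [50.0, 90.0, 1000.0]
amps = [1.0, 3.0, 5.0]
print(hz_to_bark(1000.0))
print(bark_band_means(freqs, amps))
```

Because Bark bands are narrow at low frequencies and wide at high frequencies, a representation that is uniform on this scale spends its parameters roughly in proportion to auditory frequency resolution.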