1.

Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory

Toda T. Black A.W. Tokuda K. 《IEEE transactions on audio, speech, and language processing》2007,15(8):2222-2235

In this paper, we describe a novel spectral conversion method for voice conversion (VC). A Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers. The conventional method converts spectral parameters frame by frame based on the minimum mean square error. Although it is reasonably effective, speech quality deteriorates because of two problems: 1) the frame-based conversion process does not always produce appropriate spectral movements, and 2) the converted spectra are excessively smoothed by statistical modeling. In order to address those problems, we propose a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory. Not only static but also dynamic feature statistics are used for realizing the appropriate converted spectrum sequence. Moreover, the oversmoothing effect is alleviated by considering a global variance feature of the converted spectra. Experimental results indicate that the performance of VC can be dramatically improved by the proposed method in view of both speech quality and conversion accuracy for speaker individuality.
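To illustrate the conventional frame-by-frame baseline that this abstract contrasts with, the following is a minimal sketch of minimum-mean-square-error conversion with a joint GMM, reduced to scalar features for brevity. The function names and the dictionary keys (`weight`, `mu_x`, `mu_y`, `var_x`, `cov_yx`) are assumptions made for this example and do not come from the paper; the trajectory-based method and the global variance term proposed by the authors are not shown.

```python
import math


def gauss(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mmse_convert(x, components):
    """Convert one source frame x to the expected target value E[y | x].

    Each component is a dict with hypothetical keys:
    weight, mu_x, mu_y, var_x (source variance), cov_yx (cross-covariance).
    """
    # Unnormalized posterior of each mixture component given the source frame.
    likes = [c["weight"] * gauss(x, c["mu_x"], c["var_x"]) for c in components]
    total = sum(likes)  # assumes x is not so far out that every term underflows
    y = 0.0
    for like, c in zip(likes, components):
        posterior = like / total
        # Conditional mean of the target given the source for this component.
        cond_mean = c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"])
        y += posterior * cond_mean
    return y
```

Because each frame is converted independently, nothing in this sketch constrains how consecutive outputs move, which is the first of the two problems the abstract names.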

2.

Moving Object Detection with a Gaussian Mixture Model Based on the Wronskian Function

王宝珠胡洋郭志涛刘翠响《计算机应用研究》2016,33(12)

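To address the ghosting artifacts and poor performance of the traditional Gaussian mixture model in moving object detection, this paper proposes a Gaussian mixture background modeling algorithm that combines the Wronskian function with the frame difference method. The improved algorithm uses the determinant of the Wronskian matrix to judge the spatial-domain correlation between neighboring pixels, thereby adding a condition for updating the model parameters and improving the parameter update mechanism. It also exploits the sensitivity of the frame difference method to moving object contours, combining the two detection results with a Boolean OR to complete the object contour. Experimental results show that the improved algorithm suppresses ghosting well and improves detection performance.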

3.

Scribble-based object segmentation with modified gaussian mixture models

Raluca-Diana Şambra-Petre Titus Zaharia 《Pattern Analysis & Applications》2016,19(3):593-609

In this paper, we present an interactive segmentation method, designed to help the user to extract an object of interest from an image. The proposed approach adopts the scribble-based segmentation paradigm. The user interaction consists of specifying a set of lines, corresponding to both foreground and background scribbles. The segmentation process is based on color distributions, estimated with Gaussian mixture models (GMM). We show that such a technique presents some limitations when dealing with compressed images, even for relatively high quality compression factors: in this case, blocking artifacts may degrade the segmentation results. In order to overcome such a drawback, a modified GMM model, which re-shapes the Gaussian mixture based on the eigenvalues of the GMM components, is proposed. The experimental evaluation, carried out on a corpus of various images with different characteristics and textures, demonstrates the superiority of the modified GMM model which is able to appropriately take into account compression artifacts.

4.

Accurate image segmentation using Gaussian mixture model with saliency map

Hui Bi Hui Tang Guanyu Yang Huazhong Shu Jean-Louis Dillenseger 《Pattern Analysis & Applications》2018,21(3):869-878

The Gaussian mixture model (GMM) is a flexible tool for image segmentation and image classification. However, one main limitation of GMM is that it does not consider spatial information. Some authors introduced global spatial information from neighbor pixels into GMM without taking the image content into account. The technique of saliency map, which is based on the human visual system, enhances the image regions with high perceptive information. In this paper, we propose a new model, which incorporates the image content-based spatial information extracted from the saliency map into the conventional GMM. The proposed method has several advantages: it is easy to integrate into the expectation–maximization algorithm for parameter estimation, and it therefore has only a small impact on computational cost. Experiments performed on the public Berkeley database show that the proposed method outperforms the state-of-the-art methods in terms of accuracy and computational time.

5.

Low-complexity source coding using Gaussian mixture models, lattice vector quantization, and recursive coding with application to speech spectrum quantization

《IEEE transactions on audio, speech, and language processing》2006,14(2):524-532

In this paper, we use the Gaussian mixture model (GMM) based multidimensional companding quantization framework to develop two important quantization schemes. In the first scheme, the scalar quantization in the companding framework is replaced by more efficient lattice vector quantization. Low-complexity lattice pruning and quantization schemes are provided for the E_8 Gosset lattice. At moderate to high bit rates, the proposed scheme recovers much of the space-filling loss due to the product vector quantizers (PVQ) employed in earlier work, and thereby provides improved performance with a marginal increase in complexity. In the second scheme, we generalize the compression framework to accommodate recursive coding. In this approach, the joint probability density function (PDF) of the parameter vectors of successive source frames is modeled using a GMM. The conditional density of the parameter vector of the current source frame, based on the quantized values of the parameter vectors of the previous source frames, is used to generate a new codebook for every current source frame. We demonstrate the efficacy of the proposed schemes in the application of speech spectrum quantization. The proposed scheme is shown to provide superior performance with a moderate increase in complexity when compared with conventional one-step linear prediction based compression schemes for both narrow-band and wide-band speech.

6.

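A Gaussian Mixture Segmentation Model Incorporating Neighborhood Effects and Its Simplified Solution (cited by: 1; self-citations: 0; citations by others: 1)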

石雪李玉李晓丽赵泉华《中国图象图形学报》2017,22(12):1758-1768

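Objective: Image segmentation methods based on the Gaussian mixture model (GMM) are sensitive to noise, so a Markov random field (MRF) is used to introduce pixel neighborhood relations into the GMM and improve noise robustness. Because a Gaussian mixture segmentation model that incorporates neighborhood effects has a complex structure, its parameters are hard to estimate, and a globally optimal segmentation is hard to obtain, this paper proposes a Gaussian mixture segmentation model incorporating neighborhood effects together with a simplified solution method. Methods: First, a GMM incorporating neighborhood effects is constructed. To improve the noise robustness of the GMM, an MRF is used to model the prior distribution of the mixture weight coefficients. Then, Bayesian theory is used to build the image segmentation model, i.e., the quality function. Because the quality function has many parameters (weight coefficients, means, and covariances) and a complex structure, solving for the parameters is difficult. The means and covariances in the quality function are therefore defined as functions of the weight coefficients, which simplifies the model structure and makes it easier to solve. Although the quality function then contains only the weight coefficients as parameters, its structure is still complex and closed-form expressions for the parameters are hard to obtain. Finally, the nonlinear conjugate gradient method (CGM) is used to solve for the parameters; it needs only the quality function values and parameter gradients, which reduces the complexity of the solution, converges quickly, and can obtain the globally optimal solution. Results: To verify the proposed segmentation method, segmentation experiments were carried out on synthetic images and high-resolution remote sensing images with the proposed algorithm and with comparison algorithms, and the results were evaluated and analyzed both qualitatively and quantitatively. The experiments show that the proposed method is robust to noise and yields good segmentation results. The parameter estimates show that the proposed algorithm effectively simplifies the model parameters and obtains the globally optimal solution. Conclusion: A Gaussian mixture segmentation model incorporating neighborhood effects and its simplified solution method are proposed. Experimental results show that the algorithm improves noise robustness, effectively simplifies the model parameters, and obtains globally optimal parameter estimates. The algorithm is broadly applicable to noisy high-resolution remote sensing imagery.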

7.

Modeling and evaluating Gaussian mixture model based on motion granularity

Nam Jun Cho Sang Hyoung Lee Il Hong Suh 《Intelligent Service Robotics》2016,9(2):123-139

To model manipulation tasks, we propose a novel method for learning manipulation skills based on the degree of motion granularity. Even though manipulation tasks usually consist of a mixture of fine-grained and coarse-grained movements, to the best of our knowledge, manipulation skills have so far been modeled without considering their motion granularity. To model such a manipulation skill, Gaussian mixture models (GMMs) have been represented using several well-known techniques such as principal component analysis, k-means, Bayesian information criterion, and expectation-maximization (EM) algorithms. However, when a mixture of fine-grained and coarse-grained movements is modeled as such a GMM, the fine-grained movements tend to be poorly represented. To resolve this issue, we measure a continuous degree of motion granularity for every time step of a manipulation task from a GMM. Then, we remodel the GMM by weighting a conventional k-means algorithm with motion granularity. Finally, we also estimate the parameters of the GMM by weighting the conventional EM with motion granularity. To validate our proposed method, we evaluate the GMM estimated using our proposed method by comparing it with those estimated by different GMMs in terms of inference, regression, and generalization using a robot arm that performs two daily tasks, namely decorating a very small area and passing through a narrow tunnel.

8.

Unsupervised Query-by-Example Spoken Term Detection Based on Acoustic Segment Models

李勃昊张连海郑永军《数据采集与处理》2016,31(2):407-414

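This paper proposes an unsupervised query-by-example spoken term detection method based on acoustic segment models. The method first uses a Gaussian mixture model (GMM) to convert the spectral parameters of the training data into posterior probability feature vectors and applies hierarchical clustering to determine the boundaries in the posteriors, yielding acoustic segments. It then clusters and labels the segments with the k-means algorithm to build a posterior-based acoustic segment model. At search time, the model's decoded sequences for the query example and the search documents replace the measurement matrix to reduce search time; query terms are retrieved by dynamic matching based on the minimum edit distance, whose cost function is corrected by a distance matrix of model similarities. Experimental results show that, compared with the GMM and the traditional acoustic segment model, the proposed method performs better and searches significantly faster.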
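The matching step described in this abstract, a minimum edit distance whose substitution cost comes from a model-similarity matrix, can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `sub_cost` dictionary keyed by label pairs, the default substitution cost of 1.0, and the uniform insertion/deletion cost are all hypothetical choices for the example, not details taken from the paper.

```python
def weighted_edit_distance(query, doc, sub_cost, indel_cost=1.0):
    """Edit distance between two label sequences.

    sub_cost maps (query_label, doc_label) to a substitution cost, standing in
    for a model-similarity distance matrix; missing pairs cost 1.0.
    """
    rows, cols = len(query) + 1, len(doc) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    # Turning a prefix into the empty sequence costs one deletion per label.
    for i in range(1, rows):
        dist[i][0] = i * indel_cost
    for j in range(1, cols):
        dist[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = query[i - 1], doc[j - 1]
            step = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            dist[i][j] = min(
                dist[i - 1][j - 1] + step,    # match or substitute
                dist[i - 1][j] + indel_cost,  # delete from query
                dist[i][j - 1] + indel_cost,  # insert into query
            )
    return dist[-1][-1]
```

With a small cost for acoustically similar segment labels, two decoded sequences that differ only in confusable labels stay close, which is the effect the corrected cost function is meant to achieve.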

9.

On robustness of speech based biometric systems against voice conversion attack

《Applied Soft Computing》2015

Voice conversion (VC), which morphs the voice of a source speaker so that it is perceived as spoken by a specified target speaker, can be intentionally used to deceive speaker identification (SID) and speaker verification (SV) systems that use speech biometrics. Voice conversion spoofing attacks that imitate a particular speaker pose a potential threat to these kinds of systems. In this paper, we first present an experimental study to evaluate the robustness of such systems against voice conversion disguise. We use Gaussian mixture model (GMM) based SID systems, GMM with universal background model (GMM-UBM) based SV systems and GMM supervector with support vector machine (GMM-SVM) based SV systems for this. Voice conversion is conducted using three different techniques: a GMM based VC technique, a weighted frequency warping (WFW) based conversion method, and its variation in which energy correction is disabled (WFW⁻). Evaluation is done using intra-gender and cross-gender voice conversions between fifty male and fifty female speakers taken from the TIMIT database. The result is indicated by degradation in the percentage of correct identification (POC) score in SID systems and degradation in equal error rate (EER) in all SV systems. Experimental results show that the GMM-SVM SV systems are more resilient against voice conversion spoofing attacks than GMM-UBM SV systems, and that all SID and SV systems are more vulnerable to GMM based conversion than to WFW and WFW⁻ based conversion. From the results, it can also be said that, in general terms, all SID and SV systems are slightly more robust to voices converted through cross-gender conversion than intra-gender conversion. This work extends the study to find the relationship between the VC objective score and SV system performance on the CMU ARCTIC database, which is a parallel corpus. The results of this experiment suggest an approach to quantifying an objective voice conversion score that can be related to the ability to spoof an SV system.

10.

Vocal Tract Spectrum Conversion Based on a Two-Factor Gaussian Process Dynamic Model

孙新建张雄伟杨吉斌曹铁勇钟新毅《自动化学报》2014,40(6):1198-1207

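The two-factor Gaussian process latent variable model (TF-GPLVM) previously proposed by the authors does not consider the dynamic characteristics of speech when used for voice conversion, and many parameters must be estimated during training. To address this, a hidden Markov model (HMM) is introduced to model the dynamic characteristics of speech, and the HMM hidden states are used to perform a probabilistic soft classification of each speech frame with respect to its semantic content, giving a two-factor Gaussian process dynamic model (TF-GPDM) with higher separation accuracy and lower computational load. Based on this model, a new vocal tract spectrum conversion scheme based on replacing speaker characteristics is designed. Subjective and objective experiments show that, compared with both the traditional statistical mapping and frequency warping conversion methods and with the TF-GPLVM method, the proposed method improves speech quality and conversion similarity and achieves a better balance between the two.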

11.

Compressive Sensing Depth Video Coding Based on Adaptive Bit-Rate Allocation

王康兰旭光李翔伟《模式识别与人工智能》2018,31(4):293-299

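Compressive sensing depth video (CSDV) is obtained by compressing depth video, yet it still contains a large amount of redundant information. This paper therefore proposes a depth video coding method based on a Gaussian mixture model and edge-based bit-rate allocation. Along the temporal direction, compressive sensing compresses eight depth video frames into one CSDV frame. To reduce the computational complexity of quantization, each CSDV frame is partitioned into a series of equally sized, non-overlapping video blocks, and the Canny operator is used to extract the edges of each block. Different numbers of bits are allocated to different blocks according to the percentage of nonzero pixels in each block. In the model, a Gaussian mixture model is used to model the video blocks and to design a product vector quantizer, which is then used to quantize the blocks.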

12.

SVM Speaker Verification Based on a Multiple-Derivative Kernel

许敏强戴蓓蒨刘青松许东星《数据采集与处理》2011,26(5)

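A method based on a multiple-derivative kernel (MDK) that combines the Gaussian mixture model (GMM) and the support vector machine (SVM) is presented and applied to SVM-based text-independent speaker verification. Starting from the GMM probability distribution of a speaker's speech features, the GMM distribution is described by multi-order derivatives, which turns the problem of combining GMM and SVM into one of building SVM speaker models from multi-order derivatives. First, factor-analysis-based mismatch compensation in the parameter domain is applied to the speaker's speech, and a GMM describes the probability distribution of the compensated speech features; next, multi-order derivatives of the GMM are computed; finally, the multiple-derivative kernel is constructed and multiple SVM speaker models are built. Experiments on the NIST'01 2min-1min speaker verification database show that the SVM speaker verification system based on the multiple-derivative kernel outperforms the mismatch-compensated GMM system and also improves considerably on the mismatch-compensated Fisher-kernel SVM system and the mismatch-compensated Kullback-Leibler (KL) distance SVM system.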

13.

Improving GMM–UBM speaker verification using discriminative feedback adaptation

Yi-Hsiang Chao Wei-Ho Tsai Hsin-Min Wang 《Computer Speech and Language》2009,23(3):376-388

The Gaussian mixture model – Universal background model (GMM–UBM) system is one of the predominant approaches for text-independent speaker verification, because both the target speaker model and the impostor model (UBM) have generalization ability to handle “unseen” acoustic patterns. However, since GMM–UBM uses a common anti-model, namely UBM, for all target speakers, it tends to be weak in rejecting impostors’ voices that are similar to the target speaker’s voice. To overcome this limitation, we propose a discriminative feedback adaptation (DFA) framework that reinforces the discriminability between the target speaker model and the anti-model, while preserving the generalization ability of the GMM–UBM approach. This is achieved by adapting the UBM to a target speaker dependent anti-model based on a minimum verification squared-error criterion, rather than estimating the model from scratch by applying the conventional discriminative training schemes. The results of experiments conducted on the NIST2001-SRE database show that DFA substantially improves the performance of the conventional GMM–UBM approach.
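For readers unfamiliar with the conventional GMM–UBM decision that this abstract builds on, the following minimal sketch scores a test utterance by the average log-likelihood ratio between a target speaker model and the UBM. It uses scalar features and represents each model as a list of `(weight, mean, variance)` tuples; these names, the representation, and the default threshold of 0.0 are assumptions for illustration, and the discriminative feedback adaptation itself is not shown.

```python
import math


def gmm_log_likelihood(x, gmm):
    """Log-density of a scalar x under a GMM given as (weight, mean, variance) tuples."""
    terms = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var
        for w, mean, var in gmm
    ]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def llr_score(frames, target_gmm, ubm):
    """Average per-frame log-likelihood ratio: target model versus UBM."""
    total = sum(
        gmm_log_likelihood(x, target_gmm) - gmm_log_likelihood(x, ubm) for x in frames
    )
    return total / len(frames)


def accept(frames, target_gmm, ubm, threshold=0.0):
    """Accept the claimed identity when the score exceeds the threshold."""
    return llr_score(frames, target_gmm, ubm) > threshold
```

Since the same UBM serves as the anti-model for every target speaker, an impostor whose voice resembles the target can still produce a high ratio, which is the weakness the abstract describes.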

14.

Voice Conversion Based on a Hybrid GMM and ANN Model

姚绍芹张玲华《数据采集与处理》2014,29(2):227-231

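To overcome the over-smoothing that arises in voice conversion with the Gaussian mixture model (GMM), and considering that the means of the GMM parameters can represent the spectral envelope shape of the converted features, this paper proposes a voice conversion method based on a hybrid GMM and ANN model, in which an ANN converts the means of the GMM parameters. To obtain a continuous converted spectrum, static and dynamic spectral features are combined to approximate the converted spectrum sequence. Given the importance of the fundamental frequency to voice conversion, the fundamental frequency is also analyzed and converted on top of the spectral conversion. Finally, subjective and objective experiments test the performance of the proposed hybrid-model method; the results show that, compared with the traditional GMM-based voice conversion method, the proposed method yields better converted speech.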

15.

Speaker identification using multi-step clustering algorithm with transformation-based GMM (cited by: 1; self-citations: 0; citations by others: 1)

Limin Xu Zhenmin Tang 《Automatic Control and Computer Sciences》2007,41(4):224-231

To improve the performance of speaker recognition, the embedded linear transformation is used to integrate both transformation and diagonal-covariance Gaussian mixture into a unified framework. In this case, the mixture number of GMM must be fixed in model training. The cluster expectation-maximization (EM) algorithm is a well-known technique in which the mixture number is regarded as an estimated parameter. This paper presents a new model structure that integrates a multi-step cluster algorithm into the estimating process of GMM with the embedded transformation. In the approach, the transformation matrix, the mixture number and model parameters are simultaneously estimated according to a maximum likelihood criterion. The proposed method is demonstrated on a database of three data sessions for text independent speaker identification. The experiments show that this method outperforms the traditional GMM with cluster EM algorithm. This text was submitted by the authors in English.

16.

GMM Speaker Recognition System Based on Line Spectrum Pair (LSP) Parameters

陈俊盛利元《微计算机信息》2010,(4)

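A speaker recognition algorithm based on line spectrum pair (LSP) feature parameters is proposed. It exploits the good dynamic range and filter stability of LSP (Linear Spectrum Pairs) coefficients, as well as their good interpolation and quantization properties, to extract the spectral features implicit in the speech signal. The LSP algorithm is combined with the Gaussian mixture model (GMM). Experiments show that, in a speaker recognition system, LSP gives better recognition results than LPCC as the feature parameter and has good application prospects for low-bit-rate speaker recognition.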

17.

A new a priori SNR estimator based on multiple linear regression technique for speech enhancement

《Digital Signal Processing》2014

We propose a new approach to estimate the a priori signal-to-noise ratio (SNR) based on a multiple linear regression (MLR) technique. In contrast to estimation of the a priori SNR employing the decision-directed (DD) method, which uses the estimated speech spectrum in the previous frame, we propose to find the a priori SNR based on the MLR technique by incorporating regression parameters such as the ratio between the local energy of the noisy speech and its derived minimum along with the a posteriori SNR. In the experimental step, regression coefficients obtained using the MLR are assigned according to various noise types, for which we employ a real-time noise classification scheme based on a Gaussian mixture model (GMM). Evaluations using both objective speech quality measures and subjective listening tests under various ambient noise environments show that the performance of the proposed algorithm is better than that of the conventional methods.
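As background for the comparison in this abstract, the conventional decision-directed (DD) estimate for one frequency bin can be sketched as below. This shows the baseline only, not the proposed MLR estimator; the function and parameter names and the smoothing factor default of 0.98 are assumptions chosen for the example.

```python
def decision_directed_snr(prev_speech_power, noise_power, noisy_power, alpha=0.98):
    """A priori SNR estimate for one frequency bin (decision-directed baseline).

    prev_speech_power: estimated clean-speech power in the previous frame
    noise_power:       estimated noise power in this bin
    noisy_power:       observed noisy-speech power in the current frame
    alpha:             smoothing factor between 0 and 1 (assumed default)
    """
    a_posteriori = noisy_power / noise_power
    # Blend the previous frame's estimate with the current instantaneous SNR,
    # clipping the instantaneous term at zero so the result stays non-negative.
    return alpha * prev_speech_power / noise_power + (1.0 - alpha) * max(
        a_posteriori - 1.0, 0.0
    )
```

The dependence on the previous frame's speech estimate is the property the abstract points to when motivating a regression-based alternative.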

18.

Iterative joint source-channel decoding of speech spectrum parameters over an additive white Gaussian noise channel (cited by: 2; self-citations: 0; citations by others: 2)

《IEEE transactions on audio, speech, and language processing》2006,14(1):152-162

In this paper, we show how the Gaussian mixture modeling framework used to develop efficient source encoding schemes can be further exploited to model source statistics during channel decoding in an iterative framework to develop an effective joint source-channel decoding scheme. The joint probability density function (PDF) of successive source frames is modeled as a Gaussian mixture model (GMM). Based on previous work, the marginal source statistics provided by the GMM are used at the encoder to design a low-complexity memoryless source encoding scheme. The source encoding scheme has the specific advantage of providing good estimates of the probability of occurrence of a given source code-point based on the GMM. The proposed iterative decoding procedure works with any channel code whose decoder can implement the soft-output Viterbi algorithm that uses a priori information (APRI-SOVA) or the BCJR algorithm to provide extrinsic information on each source encoded bit. The source decoder uses the GMM model and the channel decoder output to provide a priori information back to the channel decoder. Decoding is done in an iterative manner by trading extrinsic information between the source and channel decoders. Experimental results showing improved decoding performance are provided in the application of speech spectrum parameter compression and communication.

19.

A real-time CFAR thresholding method for target detection in hyperspectral images

Zhao Huijie Lou Chen Li Na 《Multimedia Tools and Applications》2017,76(13):15155-15171

In order to support immediate decision-making in critical circumstances such as military reconnaissance and disaster rescue, real-time onboard implementation of target detection is greatly desired. In this paper, a real-time thresholding method (RT-THRES) is proposed to obtain the constant false alarm rate (CFAR) thresholds for target detection in real-time circumstances. RT-THRES utilizes a Gaussian mixture model (GMM) to track and fit the distribution of the target detector’s outputs. GMM is an extension of the Gaussian probability density function, which can approximate any distribution smoothly. In this method, GMM is utilized to model the detector’s output, and then the detection threshold is calculated to achieve CFAR detection. The conventional estimation of GMM parameters by Expectation-Maximization (EM) requires all data samples in the dataset to be involved in the procedure, and the parameters must be re-estimated whenever new data samples become available. Thus, GMM is difficult to apply in real-time processing, where newly observed data samples arrive progressively. To improve the applicability of GMM in time-critical circumstances, an optimization strategy is proposed that introduces the Incremental GMM (IGMM), which allows the GMM parameters to be estimated online and incrementally. Experiments on a real hyperspectral image and a synthetic dataset suggest that RT-THRES can track and model the distribution of the detection outputs accurately, which ensures accurate calculation of the CFAR thresholds. Moreover, with the optimization strategy, the computational cost of RT-THRES remains relatively low.
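The threshold calculation this abstract describes can be illustrated with a small sketch: given a 1-D GMM fitted to the detector's outputs, find the threshold whose upper-tail probability equals the desired false alarm rate. The bisection search, the `(weight, mean, std)` tuple layout, and the function names are assumptions for this example; the paper's incremental (IGMM) parameter update is not shown.

```python
import math


def false_alarm_rate(threshold, gmm):
    """Probability that a detector output exceeds the threshold under the GMM.

    gmm is a list of (weight, mean, std) tuples describing the output distribution.
    """
    return sum(
        w * 0.5 * math.erfc((threshold - mean) / (std * math.sqrt(2.0)))
        for w, mean, std in gmm
    )


def cfar_threshold(gmm, pfa, tol=1e-9):
    """Find the threshold whose tail probability equals pfa, by bisection."""
    # Search within ten standard deviations of every component; the tail
    # probability decreases monotonically with the threshold on this range.
    lo = min(mean - 10.0 * std for _, mean, std in gmm)
    hi = max(mean + 10.0 * std for _, mean, std in gmm)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if false_alarm_rate(mid, gmm) > pfa:
            lo = mid  # too many false alarms: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because the tail probability has no closed-form inverse for a mixture, a numerical search of this kind is one simple option; the threshold would need to be recomputed whenever the fitted GMM parameters change.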

20.

SVM Speech Emotion Recognition Based on Two Kinds of GMM-UBM Multidimensional Probability Outputs

黄永明章国宝董飞达飞鹏《计算机应用研究》2011,28(1):98-101

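To address the weak discriminative ability of the GMM in emotion recognition, an algorithm that effectively combines GMM and SVM is proposed: SVM speech emotion recognition based on GMM-UBM multidimensional probability outputs. The method takes two kinds of multidimensional probability outputs of the GMM-UBM model for the emotional feature parameters of an utterance (one with the same dimension as the feature vector, the other with the same dimension as the GMM order) as the feature parameters of the SVM classifier. This both exploits the GMM's ability to characterize the statistical properties of the data and retains the SVM's strong decision capability. Experiments on the Berlin emotional speech database and a Chinese emotional speech corpus show that the method improves the average recognition rate in speech emotion recognition by 1.7% to 3.7% over the standard GMM method.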