
1.

BIC-Based Speaker Segmentation Using Divide-and-Conquer Strategies With Application to Speaker Diarization

《IEEE transactions on audio, speech, and language processing》2010,18(1):141-157

In this paper, we propose three divide-and-conquer approaches for Bayesian information criterion (BIC)-based speaker segmentation. The approaches detect speaker changes by recursively partitioning a large analysis window into two sub-windows and recursively verifying the merging of two adjacent audio segments using $\Delta\mathrm{BIC}$, a widely adopted distance measure between two audio segments. We compare our approaches with three popular distance-based approaches, namely Chen and Gopalakrishnan's window-growing-based approach, Siegler's fixed-size sliding-window approach, and Delacourt and Wellekens's DISTBIC approach, by analyzing computational cost and conducting speaker change detection experiments on two broadcast news data sets. The results show that the proposed approaches are more efficient and achieve higher segmentation accuracy than the compared distance-based approaches. In addition, we apply the segmentation approaches discussed in this paper to the speaker diarization task. The experimental results show that a more effective segmentation approach leads to better diarization accuracy.
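
The sketch below shows ΔBIC-driven divide-and-conquer segmentation in its simplest form. It assumes 1-D features, one Gaussian per segment, a penalty weight `lam` and a minimum segment length `min_len`. All of these names and values are illustrative. Real systems use multivariate MFCC vectors with full covariances, and the paper's three approaches differ in how they partition and verify windows, so this is not the authors' algorithm.

```python
import math
import random
from statistics import pvariance


def delta_bic(x, i, lam=1.0):
    """ΔBIC for splitting the 1-D feature sequence x into x[:i] and x[i:].

    Each part (and the whole) is modelled by one univariate Gaussian (d = 1),
    so the penalty counts d means plus d(d+1)/2 variances. A positive value
    favours two models, i.e. a speaker change at i.
    """
    n, n1, n2 = len(x), i, len(x) - i
    d = 1
    penalty = 0.5 * (d + d * (d + 1) / 2) * math.log(n)
    return (0.5 * n * math.log(pvariance(x))
            - 0.5 * n1 * math.log(pvariance(x[:i]))
            - 0.5 * n2 * math.log(pvariance(x[i:]))
            - lam * penalty)


def best_change_point(x, min_len, lam=1.0):
    """Split index with the largest ΔBIC, or None if no split scores above 0."""
    scores = [(i, delta_bic(x, i, lam)) for i in range(min_len, len(x) - min_len + 1)]
    if not scores:
        return None
    i, score = max(scores, key=lambda t: t[1])
    return i if score > 0 else None


def segment(x, offset=0, min_len=20, lam=1.0):
    """Divide and conquer: split at the best ΔBIC point, then recurse on both halves."""
    i = best_change_point(x, min_len, lam)
    if i is None:
        return []
    return (segment(x[:i], offset, min_len, lam)
            + [offset + i]
            + segment(x[i:], offset + i, min_len, lam))


# Synthetic stream: three "speakers" with different feature means.
rng = random.Random(0)
x = ([rng.gauss(0, 1) for _ in range(100)]
     + [rng.gauss(6, 1) for _ in range(100)]
     + [rng.gauss(0, 1) for _ in range(100)])
print(segment(x))
```

Recursing on both sides of each accepted split is what makes this divide-and-conquer. A window too short to hold two `min_len` segments ends the recursion.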

2.

A Robust and Computationally Efficient Subspace-Based Fundamental Frequency Estimator

J. X. Zhang, M. G. Christensen, S. H. Jensen, M. Moonen 《IEEE transactions on audio, speech, and language processing》2010,18(3):487-497

This paper presents a method for high-resolution fundamental frequency ($F_{0}$) estimation based on subspaces decomposed from a frequency-selective data model, obtained by effectively splitting the signal into a number of subbands. The resulting estimator is termed frequency-selective harmonic MUSIC (F-HMUSIC). The subband-based approach is expected to provide computational savings and robustness. Additionally, a method for automatic subband signal activity detection is proposed. It is based on an information-theoretic criterion and requires no subjective judgment. The F-HMUSIC algorithm exhibits good statistical performance when evaluated with synthetic signals in both white and colored noise, while its evaluation on real-life audio signals shows the algorithm to be competitive with other estimators. Finally, F-HMUSIC is found to be computationally more efficient and more robust than other subspace-based $F_{0}$ estimators, besides being robust against recorded data with inharmonicities.

3.

Computationally Efficient Distributed and Delegated Certification

Arazi B. 《Parallel and Distributed Systems, IEEE Transactions on》2008,19(9):1167-1174

Certification in public key cryptographic applications concerns the involvement of a CA (Certifying Agent) in approving the validity of users' public keys. Distributed certification pertains to the case where several CAs are involved in issuing certificates. This also includes multi-attribute certification, where different CAs approve different attributes of a user. In delegated certification, agents transfer certificate-issuing authority along a hierarchical chain. This paper presents distributed, multi-attribute and delegated certification techniques with low computational complexity. It is shown how the multiplicity aspects of the various applications map onto a multiplied-exponents implementation of the form $\prod_{i=1}^{m} A_i^{b_i} \bmod p$, which is essentially equivalent to a single exponentiation for moderate $m$. A fundamental feature of the presented procedures is that distributed/multi-attribute certification can be implemented by referring to any desired subset of the participating CAs.
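
Simultaneous multi-exponentiation (the Shamir/Straus trick) is one standard way to evaluate $\prod_{i=1}^{m} A_i^{b_i} \bmod p$, and it shows why the cost stays close to a single exponentiation. The abstract does not show the paper's own procedure. Treat the sketch below as an illustration only: the function `multi_exp` is hypothetical and non-negative exponents are assumed.

```python
import math


def multi_exp(bases, exps, p):
    """Compute prod(bases[i] ** exps[i]) mod p in one left-to-right pass.

    All terms share a single chain of squarings. A table of the 2**m subset
    products means each exponent bit costs at most one extra multiplication,
    so for moderate m the cost is close to that of one exponentiation.
    """
    if len(bases) != len(exps):
        raise ValueError("bases and exps must have the same length")
    m = len(bases)
    # table[mask] = product of bases[i] for every bit i set in mask
    table = [1 % p] * (1 << m)
    for mask in range(1, 1 << m):
        low = mask & -mask                      # lowest set bit of mask
        table[mask] = table[mask ^ low] * bases[low.bit_length() - 1] % p
    result = 1 % p
    nbits = max((e.bit_length() for e in exps), default=0)
    for bit in range(nbits - 1, -1, -1):        # most significant bit first
        result = result * result % p            # one shared squaring per bit
        mask = sum(1 << i for i, e in enumerate(exps) if (e >> bit) & 1)
        if mask:
            result = result * table[mask] % p   # at most one multiplication per bit
    return result


p = 2**61 - 1
bases, exps = [3, 5, 7], [12345, 678, 91011]
print(multi_exp(bases, exps, p) == math.prod(pow(a, e, p) for a, e in zip(bases, exps)) % p)  # True
```

The table holds $2^m$ entries. That is why the approach pays off only for moderate $m$.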

4.

Robust Speaker Identification and Verification (citations: 1 total; 0 self-citations, 1 by others)

Jia-Ching Wang, Chung-Hsien Yang, Jhing-Fa Wang, Hsiao-Ping Lee 《Computational Intelligence Magazine, IEEE》2007,2(2):52-59

Acoustic characteristics have played an essential role in biometrics. In this article, we introduce a robust, text-independent speaker identification/verification system. The system is based mainly on a subspace-based enhancement technique and probabilistic support vector machines (SVMs). First, a perceptual filterbank is derived from a psychoacoustic model, and the subspace-based enhancement technique is incorporated into it. We use the prior SNR of each subband within the perceptual filterbank to set the estimator's gain so as to effectively suppress environmental background noise. Then, probabilistic SVMs identify or verify the speaker from the enhanced speech. The superiority of the proposed system is demonstrated on data from twenty speakers taken from the AURORA-2 database with added background noise.
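
Here is one common way to turn a per-subband prior SNR into a suppression gain. The prior SNR $\xi$ is tracked with the decision-directed rule and mapped to the Wiener gain $\xi/(1+\xi)$. This is an assumption for illustration: the paper's subspace-based estimator may use a different gain rule. `alpha`, `xi_floor` and a known, positive noise power per subband are likewise illustrative.

```python
def subband_gains(noisy_power, noise_power, alpha=0.98, xi_floor=1e-3):
    """Per-frame gains for one subband, driven by a prior-SNR estimate.

    noisy_power: noisy signal power of this subband in successive frames.
    noise_power: assumed known (e.g. estimated in pauses), must be > 0.
    """
    gains = []
    prev_clean_power = 0.0
    for y in noisy_power:
        post_snr = y / noise_power
        # decision-directed prior SNR: mix last frame's clean estimate with
        # the instantaneous (posterior - 1) estimate
        xi = alpha * prev_clean_power / noise_power + (1 - alpha) * max(post_snr - 1.0, 0.0)
        xi = max(xi, xi_floor)
        g = xi / (1.0 + xi)             # Wiener gain: low prior SNR -> strong suppression
        gains.append(g)
        prev_clean_power = g * g * y    # clean-power estimate reused in the next frame
    return gains


print(subband_gains([1.0] * 5, noise_power=1.0)[-1])    # noise only: gain near 0
print(subband_gains([100.0] * 5, noise_power=1.0)[-1])  # strong speech: gain near 1
```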

5.

An Improved BIC Speaker Segmentation Algorithm (citations: 1 total; 1 self-citation, 0 by others)


郑继明, 张萍 《Computer Engineering》2010,36(17):240-242

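To address the problem of detecting change points among multiple speakers, an improved BIC speaker segmentation algorithm is proposed. A fixed-window BIC algorithm segments the audio stream. A recursion-based segmentation algorithm and a variable-length-window BIC algorithm then confirm the potential segmentation points. Experimental results show that, compared with other BIC algorithms, this algorithm achieves higher precision, recall and overall performance.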

6.

Robust Speaker Recognition in Noisy Conditions (citations: 2 total; 0 self-citations, 2 by others)

Ji Ming, T. J. Hazen, J. R. Glass, D. A. Reynolds 《IEEE transactions on audio, speech, and language processing》2007,15(5):1711-1723


7.

Robust Speaker Recognition in Multi-Source Environments

张凤仪夏秀渝冉国敬何礼叶于林 《Computer Systems & Applications》2015,24(4):32-37

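Speaker recognition performance drops sharply under interference from multiple sound sources. To address this, a front-end processing method for extracting the target speech is proposed. The method exploits the approximate sparsity of independent speech signals in the time-frequency domain: using the direction (azimuth) of the target speech, it applies nonlinear time-frequency masking to extract the target. A Gaussian mixture model (GMM) speaker recognition system based on Mel-frequency cepstral coefficients (MFCC) was built. Simulation experiments show that the method effectively extracts the target speech and improves the robustness of the speaker recognition system. Under the paper's multi-source interference simulation conditions, the recognition rate improved by about 25% on average.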
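
Direction-based time-frequency masking can be sketched as follows. The sketch assumes two microphones, a far-field target at a known azimuth, and STFT bins that have already been computed. It uses a binary mask as one example of a nonlinear mask. The abstract does not give the paper's actual masking function or array geometry. `SPEED_OF_SOUND`, `mic_distance`, `tol` and the helper `simulated_bin` are illustrative assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def bin_azimuth(x1, x2, freq, mic_distance):
    """Azimuth (radians) of one time-frequency bin from the inter-mic phase difference.

    Assumes a far-field source, two mics mic_distance metres apart, and
    freq below SPEED_OF_SOUND / (2 * mic_distance) so the phase does not wrap.
    """
    tau = -cmath.phase(x2 / x1) / (2 * math.pi * freq)  # inter-mic delay in seconds
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_distance))
    return math.asin(s)


def binary_mask(bins, target_az, mic_distance, tol=math.radians(10)):
    """1 for bins whose estimated azimuth lies within tol of the target, else 0.

    bins is an iterable of (x1, x2, freq) STFT values for the two channels;
    time-frequency sparsity lets each bin be assigned to a single source.
    """
    return [1 if abs(bin_azimuth(x1, x2, f, mic_distance) - target_az) <= tol else 0
            for x1, x2, f in bins]


def simulated_bin(az, freq, mic_distance=0.05):
    """Hypothetical helper: one bin of a single far-field source at azimuth az."""
    tau = mic_distance * math.sin(az) / SPEED_OF_SOUND
    x1 = complex(1.0, 0.5)
    return x1, x1 * cmath.exp(-2j * math.pi * freq * tau), freq


bins = [simulated_bin(math.radians(30), 1000), simulated_bin(math.radians(-40), 1000)]
print(binary_mask(bins, math.radians(30), 0.05))  # [1, 0]
```

Multiplying each mixture bin by its mask value and inverting the STFT would give the extracted target signal.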

8.

Robustness of Voiceprint Recognition to Synthetic Speech

陈联武, 郭武, 戴礼荣 《Pattern Recognition and Artificial Intelligence》2011,24(6):743-747

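With the development of speech synthesis technology based on hidden Markov models, impostors can easily generate synthetic speech that carries a target speaker's characteristics. This poses a major threat to existing voiceprint recognition systems. To address this problem, the paper analyzes, from a statistical standpoint, how natural and synthetic speech differ in the real cepstrum, and proposes a voiceprint recognition system that is robust to synthetic speech. Preliminary experimental results show that, compared with a traditional voiceprint recognition system, the equal error rate on natural speech does not...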
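
The real cepstrum used above is the inverse DFT of the log-magnitude spectrum, $c[n] = \mathrm{IDFT}\{\log|X[k]|\}$. Below is a stdlib-only sketch that computes just this feature. The naive $O(N^2)$ DFT and the small `eps` that avoids $\log 0$ are illustrative choices. The sketch does not reproduce the paper's statistical comparison.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame, eps=1e-12):
    """c[n] = IDFT(log|DFT(frame)|) for a real-valued frame."""
    n = len(frame)
    log_mag = [math.log(abs(v) + eps) for v in dft(frame)]
    # log|X[k]| is real and even for a real frame, so the inverse DFT is real;
    # keep only the real part to drop rounding residue
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


print(real_cepstrum([1.0, 0.5, 0.25, 0.125])[:2])
```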

9.

Robust Speaker Recognition in Noisy Environments (citations: 5 total; 0 self-citations, 5 by others)

白俊梅, 张世磊, 张树武, 徐波 《Journal of Chinese Information Processing》2006,20(1):93-99

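In practical applications, noise or channel interference causes the performance of speaker recognition (SR) to drop sharply. To address this problem, this paper analyzes the strengths and weaknesses of traditional methods and proposes a corresponding system solution. Wiener filtering is used for front-end processing of the speech signal, and MFCC vocal-tract features are combined with fundamental frequency (F0) prosodic features to improve the robustness of the recognition system. Experimental results show that Wiener filtering effectively removes the influence of noise, and that after Wiener filtering the joint F0-MFCC model discriminates better between speakers. Overall, the system's performance in noisy environments improves greatly.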
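
The F0 half of the joint F0-MFCC feature can be illustrated with a plain autocorrelation pitch estimator. This is a sketch under assumptions, not the paper's method, and the Wiener front end is omitted. The search range `fmin`/`fmax`, the helper names, and appending log-F0 to the MFCC vector (`f0_mfcc_vector`, hypothetical) are illustrative choices.

```python
import math


def autocorr_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Pick the lag in [fs/fmax, fs/fmin] with the largest autocorrelation.

    Returns the F0 estimate in Hz, or 0.0 if no lag has positive correlation
    (treated here as an unvoiced frame).
    """
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag if best_lag else 0.0


def f0_mfcc_vector(mfcc, f0):
    """Hypothetical joint feature: MFCCs with log-F0 appended (0.0 when unvoiced)."""
    return list(mfcc) + [math.log(f0) if f0 > 0 else 0.0]


fs = 8000
tone = [math.sin(2 * math.pi * 200 * t / fs) for t in range(400)]  # 50 ms, 200 Hz
print(autocorr_f0(tone, fs))  # 200.0
```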

10.

An Improved Speaker-Based Speech Segmentation Algorithm (citations: 13 total; 1 self-citation, 13 by others)

卢坚, 毛兵, 孙正兴, 张福炎 《Journal of Software》2002,13(2):274-279

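Speech segmentation underlies many speech applications such as speech recognition and spoken document retrieval. An improved speaker-based speech segmentation algorithm is proposed that further refines the combined GLR and BIC algorithm in three ways. (1) An adaptive threshold adjustment algorithm based on the variance of GLR distances improves threshold selection in distance-based speech segmentation under different acoustic features. (2) The concept of BIC measurability is introduced to quantify the range over which BIC applies. (3) The BIC criterion corrects the bias of the non-redundant candidate segmentation points. Experimental results show that the improved algorithm outperforms the original one.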
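
The GLR distance and a variance-based adaptive threshold can be sketched as follows. The sketch assumes 1-D features, two adjacent windows of `win` frames, and a threshold of the mean plus `alpha` standard deviations of the distance curve. These settings are hypothetical. The paper's exact threshold rule and its BIC refinement steps are not reproduced.

```python
import math
import random
from statistics import mean, pstdev, pvariance


def glr_distance(a, b):
    """GLR distance between two 1-D segments, each modelled by one Gaussian."""
    ab = a + b
    return 0.5 * (len(ab) * math.log(pvariance(ab))
                  - len(a) * math.log(pvariance(a))
                  - len(b) * math.log(pvariance(b)))


def glr_candidates(x, win=50, step=5, alpha=1.0):
    """Candidate change points: local maxima of the GLR curve above an adaptive threshold.

    The threshold is mean + alpha * std of the whole distance curve, so it
    adapts to the spread of distances produced by the current features.
    """
    pos = list(range(win, len(x) - win + 1, step))
    d = [glr_distance(x[p - win:p], x[p:p + win]) for p in pos]
    thr = mean(d) + alpha * pstdev(d)
    return [pos[i] for i in range(1, len(d) - 1)
            if d[i] > thr and d[i] >= d[i - 1] and d[i] >= d[i + 1]]


# Synthetic stream with one speaker change at index 200.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(6, 1) for _ in range(200)]
print(glr_candidates(x))
```

In a GLR+BIC pipeline, these candidates would then be checked with a BIC test before being accepted.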

11.

Robust and Efficient Implicit Surface Reconstruction for Point Clouds Based on Convexified Image Segmentation

Jian Liang, Frederick Park, Hongkai Zhao 《Journal of scientific computing》2013,54(2-3):577-602

We present an implicit surface reconstruction algorithm for point clouds. We view implicit surface reconstruction as a three-dimensional binary image segmentation problem. It segments the entire space $\mathbb{R}^3$, or the computational domain, into an interior region and an exterior region, such that the boundary between the two regions fits the data points properly. The key ingredients of an image segmentation formulation are (1) an edge indicator function that gives a sharp indication of the surface location, and (2) an initial image function that provides a good initial guess of the interior and exterior regions. In this work we propose novel ways to build both functions directly from the point cloud data. We then adopt recent convexified image segmentation models and fast computational algorithms to achieve efficient and robust implicit surface reconstruction for point clouds. We test our methods on various data sets that are noisy or non-uniform, or that have holes or open boundaries. We also compare our results with current state-of-the-art point cloud surface reconstruction techniques.

12.

A Robust Model Bootstrapping Algorithm for Online Unsupervised Speaker Retrieval (citations: 2 total; 0 self-citations, 2 by others)

付中华, 张艳宁 《Journal of Software》2007,18(3):608-616

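A multi-feature-space modeling method based on regression tree models performs eigenvoice analysis within each regression class. This largely solves the problem of training speaker models when training data are insufficient. A short-segment clustering strategy further keeps overly short speech segments from degrading bootstrap training. The validation experiments used nearly 8 hours of real recordings of different conversations. The results show that the new method remains highly robust even when the average bootstrap segment is shorter than 5 seconds. It not only improves speaker change detection but also outperforms the usual bootstrapping method.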

13.

A Robust Speaker Recognition Algorithm Based on a Nonparametric Histogram Model

李燕萍, 唐振民, 丁辉, 张燕 《Journal of Data Acquisition and Processing》2010,25(1)

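A nonparametric model is built to characterize each speaker's feature distribution, and the earth mover's distance is used to measure the similarity between distributions. The method makes effective use of limited data to represent speaker identity and directly computes the distance between the speaker's feature distribution and the distribution of the test speech. Unlike traditional vector quantization and Gaussian mixture models, it does not need to compute the total average distortion error or the minimum similarity over all speech frames. The computation is therefore simple and, above all, reduces the system's dependence on the amount of data. In addition, the original speech features are corrected with adaptive histogram equalization, so that features obtained in noisy environments better match the true distribution, which makes them more noise-robust. Experiments show that the proposed method has a clear advantage in short-utterance speaker recognition under noisy conditions.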
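
Both ingredients named above can be sketched briefly. For 1-D histograms over the same bins, the earth mover's distance reduces to the area between the two cumulative distributions. The sketch normalizes each histogram to unit mass first. For feature correction it uses plain rank-based histogram equalization to a standard normal target. That is an assumption: the abstract does not describe the paper's adaptive variant or its target distribution, and real speaker features are multidimensional.

```python
from statistics import NormalDist


def emd_1d(h1, h2, bin_width=1.0):
    """Earth mover's distance between two histograms over the same 1-D bins.

    In one dimension the EMD equals the area between the two normalized CDFs.
    """
    s1, s2 = sum(h1), sum(h2)
    cdf_gap, total = 0.0, 0.0
    for a, b in zip(h1, h2):
        cdf_gap += a / s1 - b / s2   # running difference of the two CDFs
        total += abs(cdf_gap)
    return total * bin_width


def histogram_equalize(values):
    """Rank-based histogram equalization of one feature dimension to N(0, 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0: all mass moves two bins
print(histogram_equalize([10.0, -3.0, 7.0]))
```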

14.

Robust Speaker Verification with Principal Pitch Components

Robert M. Nickel, Sachin P. Oswal, Ananth N. Iyer 《International Journal of Speech Technology》2005,8(4):323-339

We present a new method that improves the accuracy of text-dependent speaker verification systems. The new method exploits a set of novel speech features derived from a principal component analysis of pitch-synchronous voiced speech segments. We use the term principal pitch components (PPCs) or optimal pitch bases (OPBs) to denote the new feature set. Utterance distances computed from these PPC features are only loosely correlated with utterance distances computed from cepstral features. A distance measure that combines cepstral and PPC features provides a discriminative power that cannot be achieved with cepstral features alone. By augmenting the feature space of a cepstral baseline system with PPC features, we achieve a significant reduction of the equal error probability of incorrect customer rejection versus incorrect impostor acceptance. The proposed method delivers robust performance in various noise conditions.
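
The core of the PPC idea, a principal component analysis over pitch-synchronous segments, can be sketched with power iteration. The sketch assumes the voiced segments have already been cut at pitch marks and resampled to a common length. It extracts only the leading component, whereas a real feature set would keep several. The demo data are synthetic, and `principal_component` and `project` are hypothetical names.

```python
import math
import random


def principal_component(segments, iters=200, seed=0):
    """Mean vector and leading principal component of equal-length segments.

    Power iteration on the sample covariance; assumes the segments are not
    all identical (a zero covariance matrix would make the norm zero).
    """
    n, dim = len(segments), len(segments[0])
    mu = [sum(s[j] for s in segments) / n for j in range(dim)]
    centered = [[s[j] - mu[j] for j in range(dim)] for s in segments]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        w = [sum(cov[i][j] * w[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return mu, w


def project(segment, mu, component):
    """One PPC-style feature: projection of the mean-removed segment on a component."""
    return sum((s - m) * c for s, m, c in zip(segment, mu, component))


# Synthetic "pitch periods": one waveform shape at different amplitudes plus noise.
rng = random.Random(1)
shape = [math.sin(2 * math.pi * t / 16) for t in range(16)]
segments = [[a * v + rng.gauss(0, 0.01) for v in shape] for a in (0.5, 0.8, 1.0, 1.2, 1.5)]
mu, pc = principal_component(segments)
print([round(project(s, mu, pc), 3) for s in segments])
```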

15.

Robust Text-Independent Speaker Verification Using Genetic Programming

Peter Day, Asoke K. Nandi 《IEEE transactions on audio, speech, and language processing》2007,15(1):285-295

Robust automatic speaker verification has become increasingly desirable in recent years with the growing trend toward remote security verification procedures for telephone banking, biometric security measures and similar applications. Many approaches have been applied to this problem. Genetic programming, however, offers inherent feature selection and solutions that can be meaningfully analyzed, which makes it well suited to this task. This paper introduces a genetic programming system that evolves programs capable of speaker verification and evaluates its performance on the publicly available TIMIT corpus. We also show the effect of a simulated telephone network on classification results. This highlights the principal advantage of the approach, namely robustness to both additive and convolutive noise.

16.

Latent Prosody Analysis for Robust Speaker Identification

Yuan-Fu Liao, Zi-He Chen, Yau-Tarng Juang 《IEEE transactions on audio, speech, and language processing》2007,15(6):1870-1883

Handsets that are not seen in the training phase (unseen handsets) are significant sources of performance degradation for speaker identification (SID) applications in the telecommunication environment. In this paper, we propose a novel latent prosody analysis (LPA) approach that automatically extracts the most discriminative prosodic cues to assist conventional spectral-feature-based SID. The LPA approach transforms the SID problem into a task resembling full-text document retrieval in three steps: 1) prosodic contour tokenization, 2) latent prosody analysis, and 3) speaker retrieval. Experiments were run on the phonetically balanced, read-speech handset-TIMIT (HTIMIT) database. The proposed method fuses the LPA prosodic-feature-based SID systems with a spectral-feature-based SID that uses maximum-likelihood a priori handset knowledge interpolation (ML-AKI). It outperformed both the pitch and energy Gaussian mixture model (Pitch-GMM) and the bigram of prosodic states (Bigram) counterparts, whether all handsets or only unseen handsets were counted.

17.

Robust Speaker Recognition under Channel Mismatch

冉国敬, 夏秀渝, 张凤仪 《Computer Systems & Applications》2015,24(3):235-240

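Current speaker recognition systems can exceed 90% recognition accuracy in ideal environments, but accuracy drops rapidly in real communication environments. This paper studies robust speaker recognition under channel mismatch. A speaker recognition system based on a Gaussian mixture model (GMM) is built first. Tests and analysis of real communication channels then lead to two improvements. The first builds a generic channel model from measured data. Clean speech is filtered through this generic channel model and then used as training speech for the speaker models. The second compares the characteristics of the measured channel, an ideal low-pass channel, and speech Mel-frequency cepstral coefficients (MFCC). On that basis, it proposes discarding the first and second dimensions of the speech feature parameters. Experimental results show that with this processing the system's recognition rate in the communication environment improves by about 20%. Compared with traditional cepstral mean subtraction (CMS), the recognition rate improves by 9%-12%.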
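
Both feature-level steps mentioned above, the CMS baseline and discarding the leading cepstral dimensions, are easy to state on a matrix of MFCC frames. A time-invariant convolutive channel adds a constant vector to every cepstral frame, which is why subtracting the utterance mean removes it. The sketch assumes each frame is a list of cepstral coefficients whose first two entries are the "first and second dimensions" the abstract refers to. Whether those are c0/c1 or c1/c2 is not stated, so `k` is a parameter. The channel offset in the demo is hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """CMS: subtract the per-dimension mean over the utterance from every frame."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(dim)]
    return [[f[j] - mean[j] for j in range(dim)] for f in frames]


def drop_leading_dims(frames, k=2):
    """Discard the first k cepstral coefficients of every frame."""
    return [list(f[k:]) for f in frames]


frames = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
channel = [0.5, -1.0, 2.0]  # hypothetical constant cepstral offset from a fixed channel
shifted = [[c + o for c, o in zip(f, channel)] for f in frames]
print(cepstral_mean_subtraction(frames) == cepstral_mean_subtraction(shifted))  # True
print(drop_leading_dims(frames))  # [[3.0], [5.0]]
```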

18.

Research on Speech Segmentation Algorithms in Speaker Recognition

何致远, 胡起秀, 徐光祜 《Computer Science》2002,29(Z1):140-143

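In speaker recognition, the voiced frames used for training and recognition are usually selected by frame amplitude or frame energy alone, and precise segmentation of the speech is not strictly required. When little speech data is available for training and recognition, however, the input speech must be segmented precisely to keep the data valid. An example is text-prompted, text-dependent speaker recognition based on isolated words.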

19.

Research on Speech Segmentation Algorithms in Speaker Recognition

何致远, 胡起秀, 徐光祜 《Computer Engineering and Applications》2003,39(6):55-58

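In speaker recognition, variations in speech energy and noise both interfere with extracting valid speech data. To address this, and building on traditional time-domain speech segmentation algorithms [1,3], this paper proposes three precise segmentation algorithms for isolated words and one non-precise segmentation algorithm for continuous speech. Experiments show that the new algorithms largely overcome the effect of speech energy variation on segmentation. When the original speech has a relatively high signal-to-noise ratio (≥10 dB), they can also remove some short-time noise and white noise [2].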
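
A classic time-domain starting point for isolated-word endpoint detection is double-threshold energy detection. It locates the frames above a high threshold, then extends outward while the energy stays above a low threshold. The sketch below is that textbook baseline, not one of the paper's three precise algorithms. Frame length, hop and thresholds are illustrative.

```python
def frame_energies(signal, frame_len=160, hop=80):
    """Short-time energy of each full frame (e.g. 20 ms frames, 10 ms hop at 8 kHz)."""
    return [sum(s * s for s in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]


def endpoints(energies, high, low):
    """Double-threshold endpoint detection for a single isolated word.

    Returns (start_frame, end_frame), or None if no frame exceeds `high`.
    """
    above = [i for i, e in enumerate(energies) if e > high]
    if not above:
        return None
    start, end = above[0], above[-1]
    # extend outward while the energy stays above the lower threshold
    while start > 0 and energies[start - 1] > low:
        start -= 1
    while end < len(energies) - 1 and energies[end + 1] > low:
        end += 1
    return start, end


print(endpoints([0.1, 0.1, 0.5, 2, 5, 6, 3, 0.6, 0.1], high=1.0, low=0.3))  # (2, 7)
```

The high threshold keeps short noise bursts from triggering a detection. The low threshold recovers weak onsets and offsets such as fricatives.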

20.

A Speaker Identification System Based on the MCE Training Algorithm

王成儒, 王金甲 《Computer Engineering》2003,29(13):105-106,114

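A training algorithm for probabilistic neural networks based on the minimum classification error (MCE) criterion is proposed. In a 20-speaker identification task, the system and its MCE learning algorithm achieve a 98.9% identification rate with 5 s of clean speech and 86.2% with 15 s of telephone speech.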
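
The abstract does not give the network details. The sketch below shows only the standard smoothed MCE loss (Juang and Katagiri's formulation) for one training sample, given per-class discriminant scores. How a probabilistic neural network produces those scores is left out. `eta` and `gamma` are illustrative settings.

```python
import math


def mce_loss(scores, correct, eta=2.0, gamma=1.0):
    """Smoothed MCE loss for one sample, given one discriminant score per class.

    d = -g_correct + (1/eta) * log(mean_k exp(eta * g_k)) over competing classes;
    the loss is sigmoid(gamma * d), close to 0 when the correct class wins clearly.
    Requires at least two classes.
    """
    others = [s for k, s in enumerate(scores) if k != correct]
    top = max(others)  # shift for a numerically stable log-mean-exp
    anti = top + math.log(sum(math.exp(eta * (s - top)) for s in others) / len(others)) / eta
    z = gamma * (-scores[correct] + anti)
    if z >= 0:          # numerically stable sigmoid for either sign of z
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


print(mce_loss([1.0, 1.0], correct=0))  # 0.5: a tie sits on the decision boundary
```

Because the sigmoid is smooth, the loss can be minimized by gradient descent with respect to the scores. This is what MCE training exploits in place of the non-differentiable error count.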