
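Speech Mask Estimation Based on the Time-Frequency Correlation of Speech Presence
Cite this article: ZHAN Ge, HUANG Zhao-Qiong, YING Dong-Wen, PAN Jie-Lin, YAN Yong-Hong. Speech mask estimation based on the time-frequency correlation of speech presence [J]. Journal of Software (软件学报), 2016, 27(S2): 64-68.
Authors: ZHAN Ge (战鸽), HUANG Zhao-Qiong (黄兆琼), YING Dong-Wen (应冬文), PAN Jie-Lin (潘接林), YAN Yong-Hong (颜永红)
Affiliation: Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190 (all five authors)
Funding: National Natural Science Foundation of China (11461141004, 91120001, 61271426); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500); National High Technology Research and Development Program of China (863 Program) (2012AA012503); Key Deployment Project of the Chinese Academy of Sciences (KGZD-EW-103-2)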
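Abstract: On the two-dimensional time-frequency grid, the presence or absence of speech at neighboring points is correlated, and a conventional Markov chain cannot adaptively model this two-dimensional time-frequency correlation. Based on the correlation of speech signals in the time-frequency domain, this paper proposes a method that estimates the speech mask with a two-dimensional correlation model. The method divides the log power spectrum of the noisy speech signal in the time-frequency domain into speech and non-speech classes. It describes the time correlation of the speech signal with the time-domain state-transition probability and the forward factor, and the frequency correlation with the frequency-domain state-transition probability and the neighbor factor. The model combines the time and frequency correlations through global statistical optimization. A sequential update scheme is given that updates the model frame by frame and estimates the speech presence probability. Given the current log power spectrum and model parameters, the speech state matrix obtained by maximizing the posterior probability serves as the optimal estimate of the speech mask. The method is compared with several existing online speech mask estimation methods, and the experimental results show its superiority.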

Keywords: speech mask; time-frequency correlation; speech presence probability; neighbor factor; online estimation
Received: 6/1/2015
Revised: 1/5/2016

Speech Mask Estimation Using the Time-Frequency Correlation of Speech Presence
ZHAN Ge, HUANG Zhao-Qiong, YING Dong-Wen, PAN Jie-Lin and YAN Yong-Hong. Speech Mask Estimation Using the Time-Frequency Correlation of Speech Presence [J]. Journal of Software, 2016, 27(S2): 64-68.
Authors: ZHAN Ge, HUANG Zhao-Qiong, YING Dong-Wen, PAN Jie-Lin and YAN Yong-Hong
Affiliation: Institute of Acoustics, The Chinese Academy of Sciences, Beijing 100190, China (all five authors)
Abstract: This paper proposes a method to estimate the spectrographic speech mask based on a two-dimensional (2-D) correlation model. The method is motivated by the fact that the time and frequency correlations of speech presence are interwoven in the time-frequency (TF) domain. A conventional Markov chain cannot model the time and frequency correlations simultaneously in an adaptive way. The 2-D correlation model describes the correlation of speech presence in the TF domain, with speech presence and speech absence taken as the two states of the model. The time correlation is modeled by the time state-transition probability and the forward factor, while the frequency state-transition probability and the corresponding neighbor factor describe the frequency correlation. The time and frequency correlations are incorporated into the model by maximizing the Q-function. A sequential scheme is presented to estimate the parameter set online. Given the observed spectrum and the parameter set, the state matrix that maximizes the posterior probability is regarded as the optimal estimate of the speech mask. The proposed method was compared with several well-established methods, and the experimental results confirmed its superiority.
Keywords: speech mask; time-frequency correlation; speech presence probability; neighbor factor; online estimation
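
The general idea of combining time and frequency correlation in a frame-by-frame estimate can be illustrated with a small sketch. This is not the authors' model: the abstract gives no formulas, so everything below is an assumption made for illustration. The function names (`estimate_mask`, `posterior`, `gaussian_pdf`), the fixed Gaussian class parameters, and the fixed symmetric transition probabilities `stay_time` and `stay_freq` are all hypothetical, and the sketch performs no Q-function maximization or parameter updating. It only shows a two-state (speech / non-speech) classifier on log power values whose prior for each bin mixes the previous frame's posterior (time correlation) with the first-pass posteriors of adjacent bins (frequency correlation).

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior(prior, x, noise, speech):
    """Bayes rule for two classes, each given as a (mean, variance) Gaussian."""
    ps = prior * gaussian_pdf(x, *speech)
    pn = (1.0 - prior) * gaussian_pdf(x, *noise)
    return ps / (ps + pn + 1e-300)  # small constant guards against 0/0


def estimate_mask(log_power, noise=(0.0, 1.0), speech=(6.0, 4.0),
                  stay_time=0.9, stay_freq=0.8):
    """Return per-frame speech presence probabilities and a binary mask.

    log_power: list of frames, each a list of log power values per bin.
    All model parameters are fixed, illustrative values (assumptions),
    not quantities taken from the paper.
    """
    n_bins = len(log_power[0])
    prev = [0.5] * n_bins  # presence probabilities of the previous frame
    probs, mask = [], []
    for frame in log_power:
        # Time correlation: push the previous posterior through a
        # symmetric two-state transition matrix.
        time_prior = [stay_time * p + (1.0 - stay_time) * (1.0 - p) for p in prev]
        first = [posterior(time_prior[k], frame[k], noise, speech)
                 for k in range(n_bins)]
        cur = []
        for k in range(n_bins):
            # Frequency correlation: mean first-pass posterior of adjacent bins.
            nbrs = [first[j] for j in (k - 1, k + 1) if 0 <= j < n_bins]
            nb = sum(nbrs) / len(nbrs) if nbrs else 0.5
            freq_prior = stay_freq * nb + (1.0 - stay_freq) * (1.0 - nb)
            # Combine the two priors as a normalized product.
            s = time_prior[k] * freq_prior
            n = (1.0 - time_prior[k]) * (1.0 - freq_prior)
            cur.append(posterior(s / (s + n), frame[k], noise, speech))
        probs.append(cur)
        mask.append([1 if p > 0.5 else 0 for p in cur])  # per-bin MAP decision
        prev = cur
    return probs, mask
```

Calling `estimate_mask(frames)` on a list of log power frames returns the presence probabilities and the thresholded mask one frame at a time, so it can run online. A faithful implementation would replace the fixed parameters with the sequentially updated parameter set described in the abstract.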