Speech Detection in Non-Stationary Noise Based on the 1/f Process

Chinese keywords:	speech recognition; non-stationary noise; 1/f process

Fan Wang, Fang Zheng, Wenhu Wu. Speech Detection in Non-Stationary Noise Based on the 1/f Process[J]. Journal of Computer Science and Technology, 2002, 17(1): 0-0.

Authors:	Fan Wang, Fang Zheng, Wenhu Wu

Affiliation:	Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, P.R. China

Abstract:	In this paper, an effective and robust active speech detection method is proposed based on the 1/f process technique for signals in non-stationary noisy environments. The Gaussian 1/f process, a mathematical model for statistically self-similar random processes based on fractals, is selected to model both the speech and the background noise. An optimal Bayesian two-class classifier is developed to discriminate between them using their 1/f wavelet coefficients, which have Karhunen-Loeve-type properties. Multiple templates are trained for the speech signal, and the parameters of the background noise can be dynamically adapted at runtime to model the variation of both the speech and the noise. In our experiments, a 10-minute speech recording with different types of noise ranging from 20 dB to 5 dB is tested using this new detection method. Detection accuracy of over 90% is achieved when the average SNR is about 10 dB.

WANG Fan was born in 1974. He received his B.S. degree in computer science and technology from the Department of Computer Science and Technology, Tsinghua University, in 1998. He is currently a Ph.D. candidate and research assistant, majoring in computer applications. His current research interests focus on robust speech recognition and understanding. In 2000, he received the Excellent Student Paper Award at the 2000 International Symposium on Chinese Spoken Language Processing (ISCSLP’2000). He is an ACM member and the chair of the Tsinghua ACM Student Chapter.

ZHENG Fang is currently an associate professor at Tsinghua University and director of the Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems. He received his B.S., M.S. and Ph.D. degrees in computer science and technology from Tsinghua University in 1990, 1992 and 1997, respectively. He has been working on speech recognition and understanding at the Department of Computer Science and Technology, Tsinghua University, since 1988. He has published over 80 technical papers on acoustic/language modeling, isolated/continuous speech recognition, keyword spotting, dictation, language understanding and related topics. He is an IEEE member and a member of the Editorial Committee of the Journal of Chinese Information Processing.

WU Wenhu received his B.S. degree in automation from Tsinghua University in 1961. Since then, he has been with Tsinghua University, where he is currently a full professor in the Department of Computer Science and Technology. His major research interests include speech recognition and language understanding, speech synthesis, and digital processing of speech signals. As a principal or key participant, he has taken part in many major state projects and ‘863’ Hi-Tech projects, and has received several awards.

Keywords:	speech detection; 1/f process; wavelet; robust speech recognition
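
The classification idea summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the Haar wavelet, the number of levels, the regression-based parameter estimate, and all function names and parameter values are assumptions made for illustration. The sketch relies only on the standard property that wavelet coefficients of a Gaussian 1/f process are approximately uncorrelated and zero-mean, with a variance that follows a power law across scales, written here as sigma2 * 2**(gamma * m) at level m (level 1 is the finest).

```python
import numpy as np

def haar_details(frame, levels):
    """Haar detail coefficients per level (index 0 = finest scale)."""
    approx = np.asarray(frame, dtype=float)
    details = []
    for _ in range(levels):
        if len(approx) % 2:              # drop a trailing sample if length is odd
            approx = approx[:-1]
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return details

def log_likelihood(details, sigma2, gamma):
    """Log-likelihood of the coefficients under the assumed 1/f model:
    independent zero-mean Gaussians with variance sigma2 * 2**(gamma * m)."""
    total = 0.0
    for m, d in enumerate(details, start=1):
        var = sigma2 * 2.0 ** (gamma * m)
        total += -0.5 * np.sum(d ** 2 / var + np.log(2.0 * np.pi * var))
    return total

def estimate_params(signal, levels=4):
    """Crude (sigma2, gamma) estimate: fit a line to log2 variance vs. level.
    This stands in for runtime noise adaptation and is an assumption."""
    details = haar_details(signal, levels)
    m = np.arange(1, levels + 1)
    log_var = np.log2([np.mean(d ** 2) for d in details])
    gamma, intercept = np.polyfit(m, log_var, 1)
    return 2.0 ** intercept, gamma

def is_speech(frame, speech_templates, noise_params, levels=4, log_prior_ratio=0.0):
    """Two-class Bayesian decision: best speech template vs. the noise model."""
    details = haar_details(frame, levels)
    speech_ll = max(log_likelihood(details, s2, g) for s2, g in speech_templates)
    noise_ll = log_likelihood(details, *noise_params)
    return bool(speech_ll - noise_ll + log_prior_ratio > 0.0)

# Hypothetical usage with synthetic data (not the paper's experiments).
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.1, 4096)                  # white background noise
noise_params = estimate_params(noise)               # re-estimate when noise changes
templates = [(1.0, 1.0), (0.5, 2.0)]                # made-up speech templates
quiet_frame = rng.normal(0.0, 0.1, 256)
loud_frame = np.cumsum(rng.normal(0.0, 1.0, 256))   # strongly scale-dependent signal
print(is_speech(quiet_frame, templates, noise_params))
print(is_speech(loud_frame, templates, noise_params))
```

Taking the maximum over several speech templates mirrors the abstract's use of multiple trained templates, while calling `estimate_params` again on recent non-speech frames is one simple way to track a changing noise background.
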
This article is indexed in VIP, Wanfang Data, SpringerLink, and other databases.