Voiced/non-voiced speech classification using adaptive thresholding with bivariate EMD |
| |
Authors: | Md. Khademul Islam Molla, Keikichi Hirose, Md. Kamrul Hasan |
| |
Affiliation: | 1. Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan 2. Department of Computer Science and Engineering, The University of Rajshahi, Rajshahi, Bangladesh 3. Department of Electrical and Electronics Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
|
| |
Abstract: | This paper introduces a robust voiced/non-voiced (VnV) speech classification method using bivariate empirical mode decomposition (bEMD). Fractional Gaussian noise (fGn) is employed as the reference signal to derive a data-adaptive threshold for VnV discrimination. The speech signal under analysis and the fGn are combined to form a complex signal, which bEMD decomposes into a finite number of complex-valued intrinsic mode functions (IMFs). The real and imaginary parts of these IMFs represent the IMFs of the observed speech and of the fGn, respectively. The log-energies of both types of IMFs are calculated. The IMF log-energy representations of fGn and of unvoiced speech are similar; hence, the upper confidence limit of the fGn IMF log-energies is used as a data-adaptive threshold for VnV classification. If the subband log-energy of a speech segment exceeds the threshold, the segment is classified as voiced; otherwise, it is classified as unvoiced. The experimental results show that the proposed algorithm outperforms recently reported methods over a wide range of SNRs, without requiring any training data. |
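
The thresholding step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the complex-valued IMFs have already been produced by some bEMD routine (not shown) applied to speech + j·fGn, and it approximates the upper confidence limit as the mean plus `z` standard deviations of the frame-wise fGn log-energies. The function names, the non-overlapping framing, the choice of `subband`, and the value of `z` are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def frame_log_energy(x, frame_len):
    """Log-energy of each non-overlapping frame of a 1-D real signal."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Small constant avoids log(0) on silent frames.
    return np.log(np.sum(frames ** 2, axis=1) + 1e-12)


def classify_vnv(complex_imfs, frame_len, subband=0, z=1.645):
    """Label frames as voiced (True) or non-voiced (False).

    complex_imfs: (K, N) complex array, ASSUMED to be the output of a
    bEMD routine applied to speech + 1j * fGn (that step is not shown).
    subband, z: hypothetical parameters, not taken from the paper.
    """
    speech_imf = complex_imfs[subband].real  # IMF of the observed speech
    noise_imf = complex_imfs[subband].imag   # IMF of the fGn reference

    e_speech = frame_log_energy(speech_imf, frame_len)
    e_noise = frame_log_energy(noise_imf, frame_len)

    # ASSUMPTION: upper confidence limit approximated as mean + z * std
    # of the reference log-energies within the chosen subband.
    threshold = e_noise.mean() + z * e_noise.std()
    return e_speech > threshold, threshold
```

Because the threshold is computed from the reference noise that accompanies each analyzed signal, it adapts to the data without any training set, which is the property the abstract emphasizes.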
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|