
1.

Subsegmental,segmental and suprasegmental processing of linear prediction residual for speaker information

Debadatta Pati, S. R. Mahadeva Prasanna 《International Journal of Speech Technology》 2011, 14(1): 49-64

This work processes the linear prediction (LP) residual in the time domain at three different levels, extracts speaker information at each level, and demonstrates the significance and distinct nature of this information for text-independent speaker recognition. The subsegmental analysis considers the LP residual in blocks of 5 msec with a shift of 2.5 msec. The segmental analysis processes it in blocks of 20 msec with a shift of 2.5 msec. The suprasegmental analysis views it in blocks of 250 msec with a shift of 6.25 msec. Speaker identification and verification studies performed on the NIST-99 and NIST-03 databases show that the segmental analysis provides the best performance, followed by the subsegmental analysis; the suprasegmental analysis gives the lowest performance. However, the evidence from the three levels appears to be different and combines well to provide improved performance, demonstrating that different speaker information is captured at each level. Finally, combining the evidence from all three levels with vocal tract information further improves speaker recognition performance.
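The block sizes and shifts quoted in this abstract can be illustrated with a minimal sketch. It assumes an 8 kHz sampling rate and uses a plain autocorrelation-method LP analysis (Levinson-Durbin); the function names, the sampling rate, and the LP order are illustrative assumptions and are not taken from the paper, which may apply further processing at each level.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence (autocorrelation method)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] such that x[n] is approximated by
    sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]


def lp_residual(x, order):
    """LP residual e[n] = x[n] - sum_k a[k] * x[n-k], with zero initial history."""
    a = levinson_durbin(autocorr(x, order), order)
    res = []
    for n in range(len(x)):
        pred = sum(a[k - 1] * x[n - k] for k in range(1, order + 1) if n - k >= 0)
        res.append(x[n] - pred)
    return res


def frame_signal(x, fs, block_ms, shift_ms):
    """Split x into overlapping blocks of block_ms, advancing by shift_ms."""
    block = int(round(fs * block_ms / 1000.0))
    shift = int(round(fs * shift_ms / 1000.0))
    return [x[s:s + block] for s in range(0, len(x) - block + 1, shift)]


# Block size and shift in msec for each level, as quoted in the abstract.
LEVELS = {
    "subsegmental": (5.0, 2.5),
    "segmental": (20.0, 2.5),
    "suprasegmental": (250.0, 6.25),
}

FS = 8000  # assumed sampling rate in Hz (not stated in the abstract)


def frames_per_level(residual, fs=FS):
    """Frame the same residual at all three levels."""
    return {name: frame_signal(residual, fs, b, s) for name, (b, s) in LEVELS.items()}
```

Under the assumed 8 kHz rate, the three levels correspond to blocks of 40, 160, and 2000 samples with shifts of 20, 20, and 50 samples.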

2.

Speaker verification using excitation source information

Debadatta Pati, S. R. Mahadeva Prasanna 《International Journal of Speech Technology》 2012, 15(2): 241-257

In this work we develop a speaker recognition system based on excitation source information and demonstrate its significance by comparing it with a system based on vocal tract information. The speaker-specific excitation information is extracted by subsegmental, segmental and suprasegmental processing of the LP residual. The speaker-specific information from each level is modeled independently using a Gaussian mixture model-universal background model (GMM-UBM) and then combined at the score level. The significance of the proposed system is demonstrated by speaker verification experiments on the NIST-03 database. Two tests are conducted, a Clean test and a Noisy test. In the Clean test, the test speech signal is used as is for verification. In the Noisy test, the test speech is corrupted by factory noise (9 dB) and then used for verification. Although the proposed source-based system performs worse than the vocal tract system in the Clean test, it performs better in the Noisy test. Finally, in both the clean and noisy cases, the proposed system provides different and robust speaker-specific evidence and thereby helps the vocal tract system further improve the overall performance.
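The idea of modeling each level independently and combining at the score level can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes diagonal-covariance GMMs whose parameters are already available (training and speaker adaptation are omitted), scores a trial by the average log-likelihood ratio against the UBM, and fuses levels with a weighted sum. The function names and the fusion weights are hypothetical.

```python
import math


def gmm_loglik(frame, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        comps.append(ll)
    top = max(comps)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comps))


def llr_score(frames, speaker_gmm, ubm):
    """Average log-likelihood ratio of the claimed speaker model against the UBM.
    Each model is a (weights, means, variances) tuple."""
    total = sum(gmm_loglik(f, *speaker_gmm) - gmm_loglik(f, *ubm) for f in frames)
    return total / len(frames)


def fuse_scores(scores, weights):
    """Score-level fusion as a normalized weighted sum of per-level scores.
    The weights are hypothetical; in practice they would be tuned on held-out data."""
    total = sum(weights[name] for name in scores)
    return sum(weights[name] * scores[name] for name in scores) / total


# Usage with made-up per-level scores and equal weights.
level_scores = {"subsegmental": 1.0, "segmental": 2.0, "suprasegmental": 3.0}
equal_weights = {name: 1.0 for name in level_scores}
fused = fuse_scores(level_scores, equal_weights)
```

A verification decision would then compare the fused score with a threshold chosen on development data.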

3.

Processing of linear prediction residual in spectral and cepstral domains for speaker information

Debadatta Pati, S. R. Mahadeva Prasanna 《International Journal of Speech Technology》 2015, 18(3): 333-350


4.

Optical properties of multilayer graphene nanoplatelet (mGNP)/poly(methyl methacrylate) (PMMA) composite flexible thin films prepared by solvent casting

Sai Makireddi, Sethy Debadatta, Francis Sanal Kumar, M. S. Yogendra, Varghese Francis V., Balasubramaniam Krishnan 《Journal of Materials Science: Materials in Electronics》 2021, 32(22): 26750-26757

Journal of Materials Science: Materials in Electronics - Multi-layer graphene nanoplatelet (mGNP)/poly(methyl methacrylate) (PMMA) nanocomposite flexible thin films were prepared at various GNP...

5.

Implicit excitation source features for robust language identification

Dipanjan Nandi, Debadatta Pati, K. Sreenivasa Rao 《International Journal of Speech Technology》 2015, 18(3): 459-477

This work analyzes the robustness of excitation source features for the language identification (LID) task. The raw samples of the linear prediction (LP) residual signal and its magnitude and phase components are processed at the sub-segmental, segmental and supra-segmental levels to capture robust language-specific phonotactic information. The LID study is carried out on 27 Indian languages from the Indian Institute of Technology Kharagpur Multi Lingual Indian Language Speech Corpus (IITKGP-MLILSC). Gaussian mixture models are used to develop the LID systems from the language-specific excitation source information. The robustness of the excitation source information is evaluated with respect to (i) background noise, (ii) varying amounts of training data and (iii) varying lengths of test samples. Finally, the robustness of the proposed excitation source features is compared with that of well-known spectral features using LID performance on the IITKGP-MLILSC database. Segmental-level excitation source features obtained from the raw samples of the LP residual signal and from its phase component perform better than the vocal tract features at low SNR levels.
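Robustness to background noise is typically tested by mixing a noise recording into clean speech at a chosen signal-to-noise ratio. The sketch below shows one common way to do this, scaling the noise so that the ratio of average signal power to average noise power matches a target SNR in dB. The function names are hypothetical, and the abstract does not state how the noisy test data were actually generated.

```python
import math


def mean_power(x):
    """Average power (mean of squared samples) of a sequence."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(signal, noise, snr_db):
    """Return (noisy_signal, gain), where gain scales the noise so that
    10 * log10(P_signal / P_scaled_noise) equals snr_db."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[:len(signal)]
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    gain = math.sqrt(mean_power(signal) / (p_noise * 10.0 ** (snr_db / 10.0)))
    noisy = [s + gain * n for s, n in zip(signal, noise)]
    return noisy, gain


# Usage with toy sequences: unit-power signal and noise mixed at 20 dB SNR.
clean = [1.0, -1.0] * 50
babble = [1.0, 1.0, -1.0, -1.0] * 25
noisy, gain = add_noise_at_snr(clean, babble, 20.0)
```

Lower target SNR values yield a larger noise gain, which is how a sweep over SNR levels would be produced for a robustness comparison.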