5 results found; search took 15 ms
1.
This work processes the linear prediction (LP) residual in the time domain at three different levels, extracts speaker information
at each level, and demonstrates the significance and distinct nature of that information for text-independent speaker recognition.
The subsegmental analysis processes the LP residual in blocks of 5 ms with a shift of 2.5 ms. The segmental analysis processes it
in blocks of 20 ms with a shift of 2.5 ms. The suprasegmental analysis views it in blocks of 250 ms with a shift of 6.25 ms.
Speaker identification and verification studies on the NIST-99 and NIST-03 databases show that the segmental analysis performs
best, followed by the subsegmental analysis, while the suprasegmental analysis performs worst. However, the evidence from the
three levels appears to be different and combines well to improve performance, indicating that each level captures different
speaker information. Finally, combining the evidence from all three levels with vocal tract information further improves
speaker recognition performance.
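The block sizes and shifts quoted in the abstract above can be illustrated with a minimal framing sketch. It covers only the blocking step, not the LP analysis or any later feature extraction. The 8 kHz sampling rate and the names `LEVELS_MS`, `frame_signal` and `frame_all_levels` are assumptions made for illustration; they are not taken from the paper.

```python
# Sketch: split an (already computed) LP residual into overlapping blocks
# at the three analysis levels named in the abstract.
# Assumption: 8 kHz sampling rate; all names here are hypothetical.

# level name -> (block size in ms, shift in ms), as quoted in the abstract
LEVELS_MS = {
    "subsegmental": (5.0, 2.5),
    "segmental": (20.0, 2.5),
    "suprasegmental": (250.0, 6.25),
}


def frame_signal(signal, sample_rate, block_ms, shift_ms):
    """Return a list of overlapping blocks; incomplete trailing blocks are dropped."""
    block_len = int(round(block_ms * sample_rate / 1000.0))
    shift_len = int(round(shift_ms * sample_rate / 1000.0))
    if block_len <= 0 or shift_len <= 0:
        raise ValueError("block and shift must each span at least one sample")
    last_start = len(signal) - block_len
    return [signal[s:s + block_len] for s in range(0, last_start + 1, shift_len)]


def frame_all_levels(residual, sample_rate=8000):
    """Frame the same residual at every level in LEVELS_MS."""
    return {
        name: frame_signal(residual, sample_rate, block_ms, shift_ms)
        for name, (block_ms, shift_ms) in LEVELS_MS.items()
    }
```

Calling `frame_all_levels(residual)` on a list of residual samples returns one list of blocks per level. At the assumed 8 kHz rate the blocks are 40, 160 and 2000 samples long, with shifts of 20, 20 and 50 samples.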
2.
Debadatta Pati, S. R. Mahadeva Prasanna 《International Journal of Speech Technology》 2012, 15(2): 241-257
In this work we develop a speaker recognition system based on excitation source information and demonstrate its significance
by comparing it with a system based on vocal tract information. The speaker-specific excitation information is extracted by
subsegmental, segmental and suprasegmental processing of the LP residual. The speaker-specific information from each level
is modeled independently using Gaussian mixture model-universal background model (GMM-UBM) modeling and then combined at
the score level. The significance of the proposed system is demonstrated by speaker verification experiments on the NIST-03
database. Two tests are conducted: a Clean test and a Noisy test. In the Clean test, the test speech signal is used as it is for verification. In the Noisy test, the test speech is corrupted by factory noise (9 dB) and then used for verification. In the Clean test, the proposed source-based system still performs worse than the vocal tract based system, but it performs better in the Noisy test. Finally, in both the clean and noisy cases, the proposed system provides different and robust speaker-specific evidence
that helps the vocal tract system to further improve the overall performance.
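The Noisy test described above corrupts the test speech with noise at a stated level. A minimal sketch of one common way to do this follows, assuming the quoted 9 dB is a signal-to-noise ratio computed over the whole utterance. The abstract does not give the mixing procedure, so the functions `mean_power` and `add_noise_at_snr` are hypothetical illustrations, not the authors' method.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB, then add it.

    Assumption: SNR is measured over the full utterance, and `noise` is
    at least as long as `speech` (extra noise samples are discarded).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # SNR_dB = 10 * log10(P_speech / P_noise_scaled)
    target_p_noise = p_speech / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_p_noise / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]
```

For example, `add_noise_at_snr(speech, factory_noise, 9.0)` would return a corrupted copy of `speech`, where `factory_noise` is a placeholder for whatever noise recording is used.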
3.
4.
Sai Makireddi Sethy Debadatta Francis Sanal Kumar M. S. Yogendra Varghese Francis V. Balasubramaniam Krishnan 《Journal of Materials Science: Materials in Electronics》2021,32(22):26750-26757
Multi-layer graphene nanoplatelet (mGNP)/poly(methyl methacrylate) (PMMA) nanocomposite flexible thin films were prepared at various GNP...
5.
Dipanjan Nandi, Debadatta Pati, K. Sreenivasa Rao 《International Journal of Speech Technology》 2015, 18(3): 459-477
In the present work, the robustness of excitation source features is analyzed for the language identification (LID) task. The raw samples of the linear prediction (LP) residual signal and its magnitude and phase components are processed at the sub-segmental, segmental and supra-segmental levels to capture robust language-specific phonotactic information. The LID study is carried out on 27 Indian languages from the Indian Institute of Technology Kharagpur-Multi Lingual Indian Language Speech Corpus (IITKGP-MLILSC). Gaussian mixture models are used to develop the LID systems from the language-specific excitation source information. The robustness of the excitation source information is demonstrated with respect to (i) background noise, (ii) varying amounts of training data and (iii) varying lengths of test samples. Finally, the robustness of the proposed excitation source features is compared with that of the well-known spectral features, using LID performance obtained on the IITKGP-MLILSC database. Segmental-level excitation source features obtained from the raw samples of the LP residual signal and from its phase component perform better than the vocal tract features at low SNR levels.