Multi-Scale Spoken Document Retrieval for Cantonese Broadcast News
Authors: Wai-Kit Lo, Helen M. Meng, P.C. Ching
Affiliation: The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
Abstract: This paper applies a multi-scale paradigm to Chinese spoken document retrieval (SDR) to improve retrieval performance. Multi-scale refers to the use of both words and subwords for retrieval. Words are the basic units of a language that carry lexical meaning, and subword units (such as phonemes, syllables or characters) are the building blocks of words. Retrieval using subword indexing units is better than retrieval using words because subword units are robust to out-of-vocabulary (OOV) words during speech recognition and to ambiguities in word segmentation. Experimental results show that subword bigrams improve retrieval performance over words (~9.56%). Applying multi-scale fusion to SDR aims to combine the lexical information of words with the robustness of subwords. This work presents the first detailed investigation of a Cantonese broadcast news retrieval task using two multi-scale fusion approaches: pre-retrieval fusion and post-retrieval fusion. Multi-scale retrieval using both words and syllable bigrams improves retrieval performance (~1.90%) over retrieval on the composite scales.
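
The two fusion approaches named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the toy overlap score, the equal fusion weight, and the Jyutping-style syllable strings are all hypothetical. It assumes pre-retrieval fusion means merging word and syllable-bigram terms into one index representation, and post-retrieval fusion means linearly combining scores obtained separately at each scale.

```python
from collections import Counter


def syllable_bigrams(syllables):
    """Return overlapping bigrams built from a syllable sequence."""
    return [f"{a}_{b}" for a, b in zip(syllables, syllables[1:])]


def overlap_score(query_terms, doc_terms):
    """Toy similarity (an assumption): size of the multiset intersection."""
    q, d = Counter(query_terms), Counter(doc_terms)
    return sum((q & d).values())


def pre_retrieval_fusion(words, syllables):
    """Merge both scales into a single term list before retrieval."""
    return list(words) + syllable_bigrams(syllables)


def post_retrieval_fusion(word_score, subword_score, weight=0.5):
    """Linearly combine scores computed separately at each scale.

    The weight of 0.5 is a hypothetical default, not a value from the paper.
    """
    return weight * word_score + (1 - weight) * subword_score


# Hypothetical query and document, written as romanized Cantonese syllables.
query_syllables = ["hoeng1", "gong2", "san1", "man4"]
query_words = ["hoeng1gong2", "san1man4"]
doc_syllables = ["hoeng1", "gong2", "din6", "toi4", "san1", "man4"]
doc_words = ["hoeng1gong2", "din6toi4", "san1man4"]

# Post-retrieval fusion: score each scale on its own, then combine.
word_score = overlap_score(query_words, doc_words)
subword_score = overlap_score(
    syllable_bigrams(query_syllables), syllable_bigrams(doc_syllables)
)
fused_after = post_retrieval_fusion(word_score, subword_score)

# Pre-retrieval fusion: score once over the merged term lists.
fused_before = overlap_score(
    pre_retrieval_fusion(query_words, query_syllables),
    pre_retrieval_fusion(doc_words, doc_syllables),
)
```

In this toy setting the word scale contributes lexical matches while the bigram scale still matches when a word is missing from the recognizer vocabulary, which is the motivation the abstract gives for combining the two.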
This article is indexed in SpringerLink and other databases.