首页 | 本学科首页   官方微博 | 高级检索  
     


Discovering salient prosodic cues and their interactions for automatic story segmentation in Mandarin broadcast news
Authors:Lei Xie
Affiliation:(1) School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072, China
Abstract:This paper investigates speech prosody for automatic story segmentation in Mandarin broadcast news. Prosodic cues effectively used in English story segmentation deserve a re-investigation since the lexical tones of Mandarin may complicate the expressions of pitch declination and reset. Our data-oriented study shows that story boundaries cannot be clearly discriminated from utterance boundaries by speaker normalized pitch features due to their large variations across different Mandarin syllable tones. We thus propose to use speaker- and tone-normalized pitch features that can provide clear separations between utterance and story boundaries. Our study also shows that speaker-normalized pause duration is quite effective to separate between story and utterance boundaries, while speaker-normalized speech energy and syllable duration are not effective. Experiments using decision trees for story boundary detection reinforce the difference between English and Chinese, i.e., speaker- and tone-normalized pitch features should be favorably adopted in Mandarin story segmentation. We show that the combination of different prosodic cues can achieve a very high F-measure of 93.04% due to the complementarity between pause, pitch and energy. Analysis of the decision tree uncovered five major heuristics that show how speakers jointly utilize pause duration and pitch to separate speech into stories.
Keywords:Multimedia retrieval  Spoken document retrieval  Speech prosody  Story segmentation  Topic segmentation
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号