Assigning Break Indices for Unrestricted Texts in Mandarin Text to Speech System (汉语文语转换系统中停顿指数的自动标注)

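Cite this article:	ZHAO Yong-zhen, LIU Ting, WANG Zhi-wei, CHEN Hui-peng, SHAO Yan-qiu. Assigning Break Indices for Unrestricted Texts in Mandarin Text to Speech System[J]. Journal of Chinese Information Processing (中文信息学报), 2004, 18(5): 49-56.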

Authors:	赵永贞 (ZHAO Yong-zhen), 刘挺 (LIU Ting), 王志伟 (WANG Zhi-wei), 陈惠鹏 (CHEN Hui-peng), 邵艳秋 (SHAO Yan-qiu)

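Affiliation:	Information Retrieval Laboratory, School of Computer Science, Harbin Institute of Technology; Speech Processing Laboratory, School of Computer Science, Harbin Institute of Technology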

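Abstract:	This paper uses a corpus annotated with break indices based on C-TOBI and explores automatic break index annotation with supervised learning. Three methods are implemented: a basic Markov model, a Markov model that incorporates word-length information, and the word-length Markov model combined with transformation-based error-driven learning. In an open test on 3,000 sentences of real text, with the basic Markov model as the baseline, the results improve with each method, finally reaching an accuracy of 78.6% and reducing the error cost by 14.5%.
Keywords:	computer application; Chinese information processing; text to speech; break indices; Markov model; transformation-based error-driven learning
Article ID:	1003-0077(2004)05-0048-08
Revised:	13 March 2004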
Assigning Break Indices for Unrestricted Texts in Mandarin Text to Speech System

ZHAO Yong-zhen, LIU Ting, WANG Zhi-wei, CHEN Hui-peng, SHAO Yan-qiu. Assigning Break Indices for Unrestricted Texts in Mandarin Text to Speech System[J]. Journal of Chinese Information Processing, 2004, 18(5): 49-56.

Authors:	ZHAO Yong-zhen, LIU Ting, WANG Zhi-wei, CHEN Hui-peng, SHAO Yan-qiu

Affiliation:	Information Retrieval Laboratory, Department of Computer, HIT; Speech Processing Laboratory, Department of Computer, HIT

Abstract:	This paper uses a corpus annotated with break indices based on C-TOBI and applies supervised learning to automatic break index annotation. Three approaches are presented: a basic Markov model, a Markov model that uses word length, and the word-length Markov model combined with transformation-based error-driven learning. All three were implemented and evaluated in open tests on a corpus of 3,000 sentences. Performance improves with each approach; the last achieves the highest accuracy, 78.5%, and reduces the error cost by 14.5% relative to the basic Markov model baseline.

Keywords:	computer application; Chinese information processing; text to speech; break indices; Markov model; transformation-based error-driven learning
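
The abstract names the models but does not give their formulation. As a rough illustration only, the Python sketch below shows one way a first-order Markov model over break indices could use word length as its observation, decoded with the Viterbi algorithm. The label set, the add-one smoothing, the toy sentences, and the function names `train` and `viterbi` are all assumptions made for this sketch, not details taken from the paper. The transformation-based error-driven learning stage is not shown.

```python
import math
from collections import defaultdict


def train(sentences, labels=(0, 1, 2, 3)):
    """Estimate add-one-smoothed log-probabilities from labelled sentences.

    Each sentence is a list of (word, break_index) pairs, where break_index
    is the break assumed to follow the word. The label set is hypothetical.
    """
    trans = defaultdict(lambda: defaultdict(int))  # previous break -> break -> count
    emit = defaultdict(lambda: defaultdict(int))   # break -> word length -> count
    lengths = set()
    for sent in sentences:
        prev = "START"
        for word, brk in sent:
            trans[prev][brk] += 1
            emit[brk][len(word)] += 1
            lengths.add(len(word))
            prev = brk

    def logp(table, ctx, key, n_outcomes):
        # Add-one smoothing so unseen events keep a non-zero probability.
        total = sum(table[ctx].values())
        return math.log((table[ctx][key] + 1) / (total + n_outcomes))

    return {
        "labels": labels,
        "trans": lambda prev, brk: logp(trans, prev, brk, len(labels)),
        # One extra outcome is reserved for word lengths never seen in training.
        "emit": lambda brk, n: logp(emit, brk, n, len(lengths) + 1),
    }


def viterbi(words, model):
    """Return the most probable break-index sequence for a list of words."""
    if not words:
        return []
    labels = model["labels"]
    # best[b] = (log-probability, path) of the best sequence ending in break b
    best = {
        b: (model["trans"]("START", b) + model["emit"](b, len(words[0])), [b])
        for b in labels
    }
    for word in words[1:]:
        nxt = {}
        for b in labels:
            score, path = max(
                ((s + model["trans"](p, b), pth) for p, (s, pth) in best.items()),
                key=lambda t: t[0],
            )
            nxt[b] = (score + model["emit"](b, len(word)), path + [b])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


# Toy, made-up training data: (word, break index after the word).
toy_corpus = [
    [("我们", 1), ("在", 0), ("学校", 3)],
    [("他", 0), ("喜欢", 1), ("音乐", 3)],
]
toy_model = train(toy_corpus)
predicted = viterbi(["她", "喜欢", "学校"], toy_model)
print(predicted)  # one break index per input word
```

Using word length as the only observation keeps the sketch small; a real system would presumably also condition on richer features such as part of speech, which the abstract does not specify.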
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.