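Posterior probability based indexing method for Chinese spoken document retrieval
Cite this article: ZHENG Tie-ran, HAN Ji-qing. Posterior probability based indexing method for Chinese spoken document retrieval[J]. Journal of Harbin Institute of Technology, 2009(8): 97-102.
Authors: ZHENG Tie-ran  HAN Ji-qing
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology
Funding: National Basic Research Program of China (2007CB311100); National High Technology Research and Development Program of China (2006AA01Z197)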
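Abstract: Performing Chinese spoken document retrieval on speech recognition output in syllable lattice form not only avoids the out-of-vocabulary (OOV) word problem; the multi-candidate form of the lattice also effectively compensates for the impact of recognition errors on retrieval performance. Within syllable lattice based Chinese spoken document retrieval, and to address the shortcomings of existing indexing methods, a posterior probability based indexing method is proposed. It improves the vector space model by using syllables and K-step neighbor syllable pairs as index terms and their posterior probabilities in the spoken document as the index term weights. Retrieval experiments show that the proposed method is better suited to syllable lattice based spoken document retrieval, and that each of the improvements achieved its expected effect.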

Keywords: Chinese spoken document retrieval  syllable lattice  K-step neighbor syllable pairs  posterior probability  improved vector space model

Posterior probability based indexing method for Chinese spoken document retrieval
ZHENG Tie-ran, HAN Ji-qing. Posterior probability based indexing method for Chinese spoken document retrieval[J]. Journal of Harbin Institute of Technology, 2009(8): 97-102.
Authors: ZHENG Tie-ran  HAN Ji-qing
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Abstract: Syllable lattice based Chinese spoken document retrieval avoids the out-of-vocabulary (OOV) word problem and compensates for the retrieval performance loss caused by recognition errors. Because existing lattice based retrieval approaches lack an effective indexing method, this paper proposes a posterior probability based indexing method. It uses syllables and K-step neighbor syllable pairs as index terms and takes their posterior probabilities as term weights in an improved vector space model. A series of retrieval experiments shows that the method is better suited to lattice based spoken document retrieval tasks and that each improvement achieves its intended effect.
Keywords: Chinese spoken document retrieval  syllable lattice  K-step neighbor syllable pairs  posterior probability  improved vector space model
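
The indexing idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how posteriors are computed or exactly how K-step neighbor pairs are defined. The sketch assumes the lattice has already been expanded into a few candidate syllable paths with posterior probabilities, treats a K-step pair as two syllables exactly k positions apart on a path (for k = 1..K), and scores with plain cosine similarity. The names `index_terms`, `cosine`, `max_step`, and the toy syllables are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def index_terms(paths, max_step=2):
    """Build a term-weight vector from candidate syllable paths.

    paths: list of (syllable_sequence, posterior) pairs, where the
    posteriors are assumed to be precomputed and to sum to 1.
    Each weight is the expected count of the term under that
    posterior distribution (an assumption of this sketch).
    """
    weights = defaultdict(float)
    for syllables, posterior in paths:
        for i, syl in enumerate(syllables):
            # Single-syllable index term.
            weights[(syl,)] += posterior
            # k-step neighbor pairs: syllables exactly k positions apart.
            for k in range(1, max_step + 1):
                if i + k < len(syllables):
                    weights[(syl, syllables[i + k], k)] += posterior
    return dict(weights)


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy spoken document: two competing recognition hypotheses.
doc_paths = [
    (["ha", "er", "bin"], 0.7),
    (["ha", "er", "bing"], 0.3),
]
doc_vec = index_terms(doc_paths)

# A text query is a single path with posterior 1.
query_vec = index_terms([(["ha", "er", "bin"], 1.0)])
score = cosine(query_vec, doc_vec)
```

In this toy case the syllable "bin" gets weight 0.7 rather than 0 or 1, so a query containing it still matches the document even though the recognizer was unsure, which is the kind of error compensation the abstract attributes to the lattice.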
This article is indexed in CNKI and other databases.