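A Self-Organizing Scheme for Word Segmentation Ambiguity Resolution Based on Unsupervised EM Training
Cite this article: WANG Wei, ZHONG Yi-xin, SUN Jian, YANG Li. A Self-Organizing Scheme for Word Segmentation Ambiguity Resolution Based on Unsupervised EM Training[J]. Journal of Chinese Information Processing, 2001, 15(2): 38-44
Authors: WANG Wei, ZHONG Yi-xin, SUN Jian, YANG Li
Affiliation: Beijing University of Posts and Telecommunications, Intelligence Center 181,
Funding: Supported by the National Natural Science Foundation of China (6998201)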
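Abstract (translated from the Chinese):
This paper presents a scheme for resolving word segmentation ambiguity based on unsupervised training, together with a word segmentation algorithm. Following the idea of EM, all candidate segmentations of each sentence (or those within a bounded range) form the training set, and a new language model is estimated from this training set and an initial language model. The final language model is obtained by repeating this step over multiple iterations. With a statistical segmentation algorithm built on the final language model, segmentation accuracy reaches 85.36% (measured per sentence) on a test set in which every sentence contains at least one ambiguity.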
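
The training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a unigram word model (the paper's English abstract mentions patterns such as bigram pairs), a uniform initial model, and exhaustive enumeration of candidates. The names `candidates`, `em_train`, `vocab`, and `max_len` are hypothetical, and Latin letters stand in for Chinese characters.

```python
from collections import defaultdict
from math import prod


def candidates(sentence, vocab, max_len=4):
    """Enumerate every split of `sentence` into words from `vocab`.

    Single characters are always allowed, so at least one candidate exists.
    """
    if not sentence:
        return [[]]
    result = []
    for end in range(1, min(max_len, len(sentence)) + 1):
        word = sentence[:end]
        if end == 1 or word in vocab:
            for rest in candidates(sentence[end:], vocab, max_len):
                result.append([word] + rest)
    return result


def em_train(sentences, vocab, iterations=10, max_len=4):
    """Estimate unigram word probabilities from unsegmented sentences."""
    # The training set: all segmentation candidates of every sentence.
    cands = [candidates(s, vocab, max_len) for s in sentences]
    words = {w for cs in cands for c in cs for w in c}
    # Assumed initial language model: uniform over all observed words.
    prob = {w: 1.0 / len(words) for w in words}
    for _ in range(iterations):
        counts = defaultdict(float)
        for cs in cands:
            # E-step: weight each candidate by its probability under the
            # current model, normalised within the sentence.
            scores = [prod(prob[w] for w in c) for c in cs]
            total = sum(scores)
            for c, s in zip(cs, scores):
                for w in c:
                    counts[w] += s / total  # fractional count
        # M-step: renormalise the fractional counts into a new model.
        norm = sum(counts.values())
        prob = {w: counts[w] / norm for w in words}
    return prob


# Toy usage: "ab" is seen on its own twice, so it should outweigh "bc".
model = em_train(["abc", "ab", "ab"], vocab={"ab", "bc"})
```

Because no segmented corpus is supplied, the only supervision comes from how often each candidate word is supported across sentences, which is what makes the scheme unsupervised.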

Keywords: EM algorithm; word segmentation ambiguity; unsupervised
Revision date: 2000-05-26

A Self-organized Scheme for Word Segmentation Ambiguity Resolution Based on EM Training Algorithm
WANG Wei, ZHONG Yi-xin, SUN Jian, YANG Li. A Self-organized Scheme for Word Segmentation Ambiguity Resolution Based on EM Training Algorithm[J]. Journal of Chinese Information Processing, 2001, 15(2): 38-44
Authors:WANG Wei  ZHONG Yi-xin  SUN Jian  YANG Li
Abstract:
This paper presents a word segmentation ambiguity resolution scheme based on unsupervised training. Following the idea of EM, a language model is built incrementally by collecting the fractional counts of patterns (such as bigram pairs) from the augmentations of all the segmentation candidates of a sentence. The learned language model is incorporated into a statistical segmenter. Experiments show that this scheme achieves 85.36% accuracy in resolving ambiguity on a test set in which every sentence has at least one ambiguous part (accuracy is measured per sentence).
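
To illustrate how a learned language model might be plugged into a statistical segmenter, the sketch below picks the most probable segmentation by dynamic programming. It is an assumption-laden example rather than the paper's algorithm: it uses a unigram probability table instead of bigram patterns, and the names `segment`, `prob`, `max_len`, and `unk` are hypothetical. Latin letters stand in for Chinese characters.

```python
import math


def segment(sentence, prob, max_len=4, unk=1e-8):
    """Return the highest-probability segmentation under a unigram model.

    `prob` maps words to probabilities. Unknown single characters get the
    small fallback probability `unk`; unknown longer strings are skipped.
    """
    n = len(sentence)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability of sentence[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = sentence[j:i]
            p = prob.get(word, 0.0)
            if p <= 0.0:
                if i - j > 1:
                    continue      # not a usable multi-character word
                p = unk           # fallback for an unseen single character
            score = best[j] + math.log(p)
            if score > best[i]:
                best[i], back[i] = score, j
    # Follow the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(sentence[back[i]:i])
        i = back[i]
    return words[::-1]


# Toy model in which "ab" is much more likely than "bc".
toy_prob = {"a": 0.2, "b": 0.1, "c": 0.1, "ab": 0.5, "bc": 0.1}
result = segment("abc", toy_prob)
```

For the ambiguous string "abc", the candidates "ab | c" and "a | bc" overlap on the middle character; the model's probabilities decide between them, which is the kind of ambiguity the abstract refers to.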
This article is indexed in Wanfang Data (万方数据) and other databases.