
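Prosodic Phrase Segmentation Based on Statistics of Part-of-Speech Features at Boundary Points
Cite this article: NIU Zhengyu (牛正雨), CHAI Peiqi (柴佩琪). Prosodic Phrase Segmentation Based on Statistics of Part-of-Speech Features at Boundary Points [J]. Journal of Chinese Information Processing (中文信息学报), 2001, 15(5): 20-26.
Authors: NIU Zhengyu (牛正雨), CHAI Peiqi (柴佩琪)
Affiliation: Department of Computer Science and Engineering, Tongji University
Abstract: Rule-based text processing systems require a large number of rules to be compiled when the system is built, and their robustness on large-scale real text is hard to guarantee. This paper therefore explores a statistical approach to prosodic phrase segmentation. The text is first automatically segmented into words and tagged with parts of speech. Boundary patterns and probability information for prosodic phrase break points, obtained from a manually annotated corpus, are then used to predict the break points in the text automatically. Finally, rules are applied to correct errors where appropriate. In closed and open tests on one thousand sentences of real text, part-of-speech tagging accuracy was about 95%, the recall of prosodic phrase segmentation was about 60%, and its precision reached 80%.

Keywords: prosodic phrase segmentation; automatic part-of-speech tagging; corpus; statistical methods
Revised manuscript received: January 15, 2001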

A Statistical Approach Based on Boundary POS Feature to Prosodic Phrasing
NIU Zheng-yu, CHAI Pei-qi. A Statistical Approach Based on Boundary POS Feature to Prosodic Phrasing [J]. Journal of Chinese Information Processing, 2001, 15(5): 20-26.
Authors: NIU Zheng-yu, CHAI Pei-qi
Affiliation: Department of Computer Science and Engineering, Tongji University
Abstract: It is often difficult to construct a rule-based parser and adapt it to large-scale real text, so we tried a statistical approach to prosodic phrasing. First, the text is segmented into Chinese words, and the word sequences are then tagged automatically by a POS tagger. Boundary patterns and boundary distribution probabilities, both derived from a hand-annotated corpus, are used to predict phrase breaks. Errors introduced by the statistical method are then corrected by rules. In closed and open tests on about 1000 sentences, POS tagging accuracy is about 95%, the recall of prosodic phrasing is around 60%, and its precision is about 80%.
Keywords: prosodic phrasing; part-of-speech tagging; corpus; statistical approach
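
The pipeline summarized in the abstract (POS-tagged words, boundary statistics from an annotated corpus, break prediction, rule-based correction, then recall and precision) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the POS-bigram boundary pattern, the 0.5 threshold, the minimum-phrase-length rule, and the toy corpus are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import Counter


def train_boundary_model(corpus):
    """Estimate P(break | left POS, right POS) from an annotated corpus.

    corpus: list of sentences, each a list of (word, pos, break_after) triples.
    The POS-bigram pattern is an assumed stand-in for the paper's boundary pattern.
    """
    breaks, totals = Counter(), Counter()
    for sent in corpus:
        for (_, left_pos, brk), (_, right_pos, _) in zip(sent, sent[1:]):
            pattern = (left_pos, right_pos)
            totals[pattern] += 1
            if brk:
                breaks[pattern] += 1
    return {p: breaks[p] / totals[p] for p in totals}


def predict_breaks(tagged, model, threshold=0.5, min_words=2):
    """Return indices i where a phrase break is predicted after word i.

    tagged: list of (word, pos) pairs for one sentence.
    The min_words check is a hypothetical correction rule: it rejects a
    break that would create a phrase shorter than min_words words.
    """
    predicted, last = [], -1
    for i in range(len(tagged) - 1):
        pattern = (tagged[i][1], tagged[i + 1][1])
        if model.get(pattern, 0.0) >= threshold and i - last >= min_words:
            predicted.append(i)
            last = i
    return predicted


def precision_recall(predicted, gold):
    """Precision and recall of predicted break indices against gold indices."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Toy annotated corpus (invented for illustration, not data from the paper).
corpus = [
    [("他", "r", False), ("昨天", "t", True), ("买", "v", False), ("书", "n", False)],
    [("我", "r", False), ("今天", "t", True), ("看", "v", False), ("电影", "n", False)],
    [("他", "r", False), ("今天", "t", False), ("来", "v", False)],
]
model = train_boundary_model(corpus)

sentence = [("她", "r"), ("明天", "t"), ("写", "v"), ("信", "n")]
predicted = predict_breaks(sentence, model)
print(predicted)                         # [1]: break after the second word
print(precision_recall(predicted, [1]))  # (1.0, 1.0)
```

Raising the threshold in a sketch like this trades recall for precision, which is one plausible reading of why the reported precision (about 80%) is higher than the reported recall (about 60%); the abstract itself does not state the cause.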
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.