
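Recognition of Maximal Noun Phrases with a "De-Phrase" ("的"字结构) Core
Cite this article: QIAN Xiao-fei. Recognition of maximal noun phrases with a "De-Phrase" core [J]. Computer Engineering and Applications (计算机工程与应用), 2010, 46(18): 138-141.
Author: QIAN Xiao-fei (钱小飞)
Affiliation: School of Chinese Language and Literature, Communication University of China, Beijing 100024
Abstract: The maximal noun phrase (MNP) with a "De-Phrase" core is a special subclass of Chinese MNPs. Taking automatic recognition of this phrase type as a basis, the paper re-partitions the Chinese MNP recognition task. After examining the structure and distribution of the phrase, it proposes the strategy "identify the right boundary first, then let that result take part in identifying the left boundary", and uses a boundary distribution probability model to handle the left and right boundaries separately. The model is trained on a news corpus of 850,000 characters and evaluated in an open test on a homogeneous corpus of 420,000 characters, achieving 80.63% precision and 75.68% recall.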

Keywords: maximal noun phrase; "De-Phrase" ("的"字结构); recognition; shallow parsing
Received: 2008-12-23
Revised: 2009-03-13

Recognition of MNP with "De-Phrase" core
QIAN Xiao-fei. Recognition of MNP with "De-Phrase" core [J]. Computer Engineering and Applications, 2010, 46(18): 138-141.
Authors: QIAN Xiao-fei
Affiliation: School of Chinese Language and Literature, Communication University of China, Beijing 100024, China
Abstract: The MNP with a "De-Phrase" core is a special subclass of MNP. Identifying this phrase gives a new subdivision of the task of Chinese MNP recognition. The paper first analyzes the distribution and structural features of the phrase, then proposes a strategy of "identify the right boundary first, then identify the left one". It then adopts a "Boundary Distribution Probability" method to recognize the phrase. A news corpus of about 0.85 million Chinese characters is used for training and another of about 0.42 million Chinese characters is used for testing; the experiment achieves 80.63% precision and 75.68% recall.
Keywords: Maximal Noun Phrase (MNP); De-Phrase; identification; shallow parsing
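
The abstract names the strategy but gives no model details, so the following is only a minimal sketch of one way a "right boundary first, then left boundary" recognizer could be organized. Everything in it is an assumption made for illustration and not the paper's method: the toy tag set (`u` for "的"), the use of adjacent tag pairs as boundary contexts, relative-frequency estimates, the threshold, and all function names and training data.

```python
from collections import Counter

def contexts(tags):
    """Tag-pair contexts per position (an assumed feature choice)."""
    padded = ["<s>"] + list(tags) + ["</s>"]
    left = [(padded[i], padded[i + 1]) for i in range(len(tags))]       # (previous tag, tag)
    right = [(padded[i + 1], padded[i + 2]) for i in range(len(tags))]  # (tag, next tag)
    return left, right

def train(examples):
    """examples: list of (tags, span); span is (left, right), inclusive, or None."""
    counts = {"L": Counter(), "R": Counter()}
    totals = {"L": Counter(), "R": Counter()}
    for tags, span in examples:
        left_ctx, right_ctx = contexts(tags)
        for i in range(len(tags)):
            totals["L"][left_ctx[i]] += 1
            totals["R"][right_ctx[i]] += 1
            if span is not None and i == span[0]:
                counts["L"][left_ctx[i]] += 1
            if span is not None and i == span[1]:
                counts["R"][right_ctx[i]] += 1
    return counts, totals

def prob(model, side, ctx):
    """Relative frequency of a boundary in this context (0.0 if unseen)."""
    counts, totals = model
    total = totals[side][ctx]
    return counts[side][ctx] / total if total else 0.0

def recognize(model, tags, de_tag="u", threshold=0.5):
    """Accept right boundaries at "的" tokens first, then search leftwards."""
    left_ctx, right_ctx = contexts(tags)
    spans, start = [], 0
    for r, tag in enumerate(tags):
        if tag != de_tag or prob(model, "R", right_ctx[r]) < threshold:
            continue
        if start >= r:
            continue  # no room for a left boundary
        # The accepted right boundary restricts the left-boundary candidates.
        l = max(range(start, r), key=lambda j: prob(model, "L", left_ctx[j]))
        spans.append((l, r))
        start = r + 1  # later phrases may not overlap this one
    return spans

# Hypothetical toy training data, tags only (r=pronoun, v=verb, u="的", ...).
TOY_EXAMPLES = [
    (["r", "v", "u", "d", "a"], (0, 2)),  # 他 买 的 很 贵
    (["r", "v", "v", "n", "u"], (2, 4)),  # 我 喜欢 卖 菜 的
    (["a", "u", "n", "v", "y"], None),    # 红 的 花 开 了 ("的" followed by a noun head)
]
MODEL = train(TOY_EXAMPLES)
```

Calling `recognize(MODEL, tags)` on a tagged sentence returns a list of inclusive `(left, right)` token spans. Fixing the right boundary first keeps the left-boundary search small, which is the only aspect of the abstract's strategy this sketch tries to mirror.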
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).