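Maximal Noun Phrase Identification in Chinese Patents Based on Markov Logic Networks
Cite this article:蔡东风,赵奇猛,饶 齐,王裴岩. 基于马尔科夫逻辑网的中文专利最大名词短语识别[J]. 中文信息学报, 2016, 30(4): 21-28
Authors:CAI Dongfeng (蔡东风)  ZHAO Qimeng (赵奇猛)  RAO Qi (饶 齐)  WANG Peiyan (王裴岩)
Affiliation:Knowledge Engineering Research Center, Shenyang Aerospace University, Shenyang, Liaoning 110136, China
Funding:National Science and Technology Support Program of the 12th Five-Year Plan (2012BAH14F00); National Natural Science Foundation of China (61073123)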
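Abstract (translated from the Chinese):The lack of annotated corpora and the difficulty of recognizing verb and noun word classes are the main obstacles to identifying maximal noun phrases (MNPs) in Chinese patents. To address these problems, this paper proposes a method for Chinese MNP identification based on Markov logic networks. The method avoids recognizing the open class of noun phrases directly and instead concentrates on the relatively closed class of separators, using sentence-internal features, domain transfer features, and bilingual alignment features to identify MNP boundaries. The results show that bilingual information clearly helps identify MNP boundaries such as verbs, prepositions, and conjunctions. The F-score of MNP identification reaches 83.27%.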

Keywords:maximal noun phrase  Markov logic network  Chinese patent

Chinese Patents Maximal-length Noun Phrases Identification Using Markov Logic
CAI Dongfeng,ZHAO Qimeng,RAO Qi,WANG Peiyan. Chinese Patents Maximal-length Noun Phrases Identification Using Markov Logic[J]. Journal of Chinese Information Processing, 2016, 30(4): 21-28
Authors:CAI Dongfeng  ZHAO Qimeng  RAO Qi  WANG Peiyan
Affiliation:Knowledge Engineering Research Center, Shenyang Aerospace University, Shenyang, Liaoning 110136, China
Abstract:The main obstacles to maximal-length noun phrase (MNP) recognition in Chinese patent literature are the lack of annotated corpora and the difficulty of recognizing verbs and nouns. This paper presents a new Markov logic approach to identifying MNPs in Chinese patents. Instead of recognizing the many kinds of noun phrases directly, the approach focuses on identifying MNP boundary markers. To recognize MNPs in Chinese patents, it employs three categories of features: word features from sentences, transfer features from treebanks, and bilingual features from patent abstracts. The experimental results show that bilingual features notably improve the identification of MNP boundary markers such as verbs, prepositions, and conjunctions. The F-score for MNP identification reaches 83.27%.
Keywords:MNP  MLN  Chinese patent  
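
The abstract describes identifying MNP boundary markers rather than the noun phrases themselves. As a rough illustration of why that is enough, the sketch below takes per-token boundary-marker decisions as given and reads off candidate MNPs as the maximal runs of tokens between markers. The function name `spans_between_markers`, the `is_marker` flags, and the English example sentence are all hypothetical; the abstract does not give the paper's Markov logic formulas, features, or decoding procedure, so none of that is modeled here.

```python
from typing import List, Tuple


def spans_between_markers(tokens: List[str], is_marker: List[bool]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for maximal runs of non-marker tokens.

    `is_marker[i]` is assumed to come from some upstream boundary-marker
    classifier (hypothetical here); True means token i separates phrases.
    """
    if len(tokens) != len(is_marker):
        raise ValueError("tokens and is_marker must have the same length")

    spans: List[Tuple[int, int]] = []
    start = None  # index where the current candidate span began, if any
    for i, flag in enumerate(is_marker):
        if flag:
            # A marker closes any span that is currently open.
            if start is not None:
                spans.append((start, i))
                start = None
        elif start is None:
            # First non-marker token after a marker opens a new span.
            start = i
    if start is not None:
        # Close a span that runs to the end of the sentence.
        spans.append((start, len(tokens)))
    return spans


# Toy example: a verb and a conjunction are flagged as boundary markers.
tokens = ["the", "device", "comprises", "a", "rotating", "shaft", "and", "a", "motor"]
is_marker = [False, False, True, False, False, False, True, False, False]
candidates = [" ".join(tokens[s:e]) for s, e in spans_between_markers(tokens, is_marker)]
print(candidates)
```

Under this assumption the quality of the extracted phrases depends entirely on the marker decisions, which is consistent with the abstract's emphasis on improving the identification of verbs, prepositions, and conjunctions as boundaries.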