Recognition of Tibetan Longest Noun Phrases Based on Syntactic Trees
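Citation: LONG Congjun, LIU Huidan, ZHOU Maoke. Recognition of Tibetan Longest Noun Phrases Based on Syntactic Trees[J]. Journal of Chinese Information Processing, 2019, 33(2): 59-66.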
Authors: LONG Congjun  LIU Huidan  ZHOU Maoke
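Affiliations: 1. Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences, Beijing 100081;
2. Institute of Software, Chinese Academy of Sciences, Beijing 100190;
3. University of Chinese Academy of Social Sciences, Beijing 102488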
Funding: Research project of the State Language Commission (ZDI135-17)
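Abstract: Longest noun phrases carry rich syntactic and semantic information, often correspond to syntactic constituents, and fill particular semantic roles in the sentence. Longest noun phrase recognition plays an important role in natural language processing and is a basis for analyzing and understanding sentence structure and meaning. After reviewing what the longest noun phrase means under different definitions, this paper defines the basic concept of the Tibetan longest noun phrase from the perspective of the syntactic tree. It extracts 6,038 sentences from a syntactic treebank, analyzes the structural types, boundary features, and frequencies of longest noun phrases, and finally recognizes them with a sequence labeling model and a parsing model. The sequence labeling model achieves 87.14% precision, 84.72% recall, and an 85.92% F1 score; the parsing model achieves 85.02% precision, 84.51% recall, and an 84.76% F1 score.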
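As a quick consistency check on these figures: F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R). The short Python sketch below recomputes F1 from the reported precision and recall of both models; the helper name `f1` is our own illustration, not something from the paper.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) in percent, as given in the abstract.
reported = {
    "sequence labeling": (87.14, 84.72, 85.92),
    "parsing": (85.02, 84.51, 84.76),
}

for model, (p, r, f_reported) in reported.items():
    print(f"{model}: computed F1 = {f1(p, r):.2f}, reported = {f_reported:.2f}")
```

Recomputed from the two-decimal inputs, the parsing model gives 84.76 as reported, while the sequence labeling model gives 85.91, one hundredth below the reported 85.92. A last-digit gap of this size is within what rounding of the published precision and recall allows.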

Keywords: Tibetan syntactic tree; longest noun phrase; noun phrase type

Longest Noun Phrases Detection in Tibetan
LONG Congjun,LIU Huidan,ZHOU Maoke.Longest Noun Phrases Detection in Tibetan[J].Journal of Chinese Information Processing,2019,33(2):59-66.
Authors:LONG Congjun  LIU Huidan  ZHOU Maoke
Affiliation:1.Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences, Beijing 100081, China;
2.Institute of Software, Chinese Academy of Sciences, Beijing 100190, China;
3.University of Chinese Academy of Social Sciences, Beijing 102488, China
Abstract: Longest noun phrases carry abundant syntactic and semantic information and in most cases correspond to syntactic constituents. By comparing how different notions of the longest noun phrase are defined, this paper defines the longest noun phrase in Tibetan from the perspective of the syntactic tree. A total of 6,038 sentences are extracted from a Tibetan treebank, and the structural types, boundary features, and frequencies of longest noun phrases are analyzed. Two approaches, a sequence labeling model and a parsing model, are investigated to detect longest noun phrases in Tibetan. Experiments show that the sequence labeling approach performs better, yielding 87.14% precision, 84.72% recall, and an 85.92% F1 score.
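The abstract does not spell out the exact tree-based definition or the tag set used for sequence labeling, so the following is only a minimal sketch under stated assumptions. It treats a longest noun phrase as an NP node that no other NP node dominates in the parse tree, and it assumes a BIO encoding to turn those spans into labels a sequence labeler could be trained on. The tree format (nested `(label, children)` tuples), the functions `longest_nps` and `to_bio`, and the placeholder tokens `w0` to `w4` are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Tuple, Union

# A node is (label, children); a leaf is a plain token string.
Tree = Union[str, Tuple[str, list]]


def longest_nps(tree: Tree, np_label: str = "NP") -> List[Tuple[int, int]]:
    """Return [start, end) token spans of NP nodes with no NP ancestor.

    Assumption: "longest NP" = NP node not dominated by another NP node;
    the paper's exact criteria may differ.
    """
    spans: List[Tuple[int, int]] = []

    def walk(node: Tree, start: int, inside_np: bool) -> int:
        # Returns the token index just past this node.
        if isinstance(node, str):
            return start + 1
        label, children = node
        is_np = label == np_label
        end = start
        for child in children:
            end = walk(child, end, inside_np or is_np)
        if is_np and not inside_np:
            spans.append((start, end))
        return end

    walk(tree, 0, False)
    return sorted(spans)


def to_bio(n_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Encode non-overlapping spans as B/I/O tags (assumed tag scheme)."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B-NP"
        for i in range(start + 1, end):
            tags[i] = "I-NP"
    return tags


# Hypothetical toy tree with placeholder tokens (not real Tibetan data).
toy = ("S", [
    ("NP", [("NP", ["w0", "w1"]), "w2"]),
    ("VP", [("NP", ["w3"]), "w4"]),
])
spans = longest_nps(toy)   # [(0, 3), (3, 4)]
print(to_bio(5, spans))    # ['B-NP', 'I-NP', 'I-NP', 'B-NP', 'O']
```

Because longest NPs are by construction never nested inside one another, their spans cannot overlap, which is what lets a flat BIO tag sequence represent them. Shorter NPs embedded inside them, such as the inner NP over `w0 w1` in the toy tree, are deliberately dropped.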
Keywords: Tibetan syntax tree; longest noun phrase; noun phrase type