Construction of the Tibetan Dependency Treebank Based on Treebank Conversion
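Citation: ZHOU Maoke, LONG Congjun, ZHAO Xiaobing, LI Linxia. Construction of the Tibetan Dependency Treebank Based on Treebank Conversion[J]. Journal of Chinese Information Processing, 2022, 36(7): 77-85+97.
Authors: ZHOU Maoke, LONG Congjun, ZHAO Xiaobing, LI Linxia
Affiliations: 1. School of Chinese Ethnic Minority Languages and Literatures, Minzu University of China, Beijing 100081;
2. National Language Resource Monitoring and Research Center of Minority Languages, Beijing 100081;
3. Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences, Beijing 100081;
4. School of Information Engineering, Minzu University of China, Beijing 100081
Funding: National Language Commission Center Project (ZDI135-98); Graduate Research Practice Project of Minzu University of China (BZKY2022073)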
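Abstract: A Tibetan dependency treebank is an essential foundation for Tibetan syntactic parsing and is valuable both for linguistic research on Tibetan and for Tibetan information processing. To this end, this paper proposes a method for constructing a Tibetan dependency treebank through treebank conversion. The method first expands a previously built Tibetan phrase structure treebank. It then designs conversion rules based on the characteristics of Tibetan phrase structure trees and dependency trees and uses them to produce an initial conversion from phrase structure trees to dependency trees. Finally, the automatic conversion output is checked manually, yielding 22,000 Tibetan dependency trees. To evaluate the conversion quantitatively, the paper samples 5% of the dependency trees in the treebank and checks and counts their dependency relations: the final dependency relation accuracy is 89.36% and the head-word accuracy is 92.09%. In addition, the paper validates the treebank with a neural-network-based dependency parsing model, which achieves a UAS of 83.62% and an LAS of 81.90%. The results show that a semi-automatic treebank conversion method can effectively build a Tibetan dependency treebank.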

Keywords: Tibetan; dependency treebank; treebank conversion
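The abstract does not list the conversion rules themselves. A common way to turn a phrase structure tree into a dependency tree is head percolation: choose a head child in every phrase, pass its lexical head upward, and attach the lexical heads of the other children to it. The sketch below illustrates that general technique only. The phrase labels, POS tags, head table (`HEAD_RULES`), relation table (`RELATIONS`) and helper names are hypothetical placeholders, not the authors' Tibetan rule set, and the toy sentence uses English glosses for a verb-final clause.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    pos: str    # part-of-speech tag (hypothetical tag set)
    word: str
    index: int  # 1-based token position in the sentence


@dataclass
class Node:
    label: str              # phrase label (hypothetical label set)
    children: List["Tree"]


Tree = Union[Leaf, Node]

# Hypothetical head table: phrase label -> (search direction, child labels by priority).
# "right" scans children starting from the last one, which suits head-final phrases.
HEAD_RULES: Dict[str, Tuple[str, List[str]]] = {
    "S": ("right", ["VP", "V"]),
    "VP": ("right", ["V", "VP"]),
    "NP": ("right", ["N", "NP"]),
    "PP": ("right", ["P"]),
}

# Hypothetical relation table: (parent label, dependent child label) -> relation.
RELATIONS: Dict[Tuple[str, str], str] = {
    ("S", "NP"): "subj",
    ("VP", "NP"): "obj",
    ("NP", "N"): "nmod",
}


def label_of(tree: Tree) -> str:
    return tree.pos if isinstance(tree, Leaf) else tree.label


def find_head_child(node: Node) -> int:
    """Index of the head child selected by the head table."""
    direction, priorities = HEAD_RULES.get(node.label, ("right", []))
    order = list(range(len(node.children)))
    if direction == "right":
        order.reverse()
    for wanted in priorities:
        for i in order:
            if label_of(node.children[i]) == wanted:
                return i
    return order[0]  # fallback: first child in the search direction


def convert(tree: Tree, arcs: Dict[int, Tuple[int, str]]) -> int:
    """Record dependent -> (head, relation) arcs and return the lexical head index."""
    if isinstance(tree, Leaf):
        return tree.index
    if not tree.children:
        raise ValueError(f"empty phrase {tree.label}")
    heads = [convert(child, arcs) for child in tree.children]
    h = find_head_child(tree)
    for i, child in enumerate(tree.children):
        if i != h:
            rel = RELATIONS.get((tree.label, label_of(child)), "dep")
            arcs[heads[i]] = (heads[h], rel)
    return heads[h]


def to_dependencies(tree: Tree) -> Dict[int, Tuple[int, str]]:
    arcs: Dict[int, Tuple[int, str]] = {}
    root = convert(tree, arcs)
    arcs[root] = (0, "root")  # head 0 marks the artificial root
    return arcs


# Toy verb-final clause with English glosses: "I.ERG food eat.PST".
example = Node("S", [
    Node("NP", [Leaf("N", "I.ERG", 1)]),
    Node("VP", [Node("NP", [Leaf("N", "food", 2)]), Leaf("V", "eat.PST", 3)]),
])

for dep, (head, rel) in sorted(to_dependencies(example).items()):
    print(dep, head, rel)  # 1 3 subj / 2 3 obj / 3 0 root
```

Each non-head child's lexical head attaches to the head child's lexical head, so every sentence ends up with exactly one root. Practical rule sets usually add special cases (coordination, particles, punctuation), and, as the abstract notes, the automatic output still needs manual checking.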

Construction of the Tibetan Dependency Treebank Based on Treebank Conversion
ZHOU Maoke,LONG Congjun,ZHAO Xiaobing,LI Linxia.Construction of the Tibetan Dependency Treebank Based on Treebank Conversion[J].Journal of Chinese Information Processing,2022,36(7):77-85+97.
Authors: ZHOU Maoke, LONG Congjun, ZHAO Xiaobing, LI Linxia
Affiliation:1.School of Chinese Ethnic Minority Languages and Literatures, Minzu University of China, Beijing 100081, China;
2.National Language Resource Monitoring and Research Center of Minority Languages, Beijing 100081, China;
3.Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences, Beijing 100081, China;
4.School of Information Engineering, Minzu University of China, Beijing 100081, China
Abstract: Constructing a Tibetan dependency treebank is a fundamental task for subsequent technology development. This paper proposes a method for constructing a Tibetan dependency treebank based on treebank conversion. First, the existing Tibetan phrase structure treebank is expanded. Then, treebank conversion rules are designed based on the characteristics of Tibetan phrase structure trees and dependency trees. Finally, the automatic conversion output is proofread manually, yielding 22,000 Tibetan dependency trees. On a sample of 5% of the sentences in the treebank, the final dependency relation accuracy reaches 89.36% and the head-word accuracy reaches 92.09%. A neural-network-based dependency parsing model trained on the treebank achieves 83.62% UAS and 81.90% LAS.
Keywords: Tibetan; dependency treebank; treebank conversion
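UAS (unlabeled attachment score) is the share of tokens whose predicted head is correct; LAS (labeled attachment score) additionally requires the correct relation label. Below is a minimal sketch of corpus-level, micro-averaged scoring, assuming every token is counted (the abstract does not say whether punctuation is excluded) and that each sentence is given as a list of `(head, relation)` pairs in token order. `attachment_scores` is a hypothetical helper name, not part of any tool used in the paper.

```python
from typing import List, Tuple

# One entry per token, in token order: (head index, relation label); head 0 = root.
Arc = Tuple[int, str]
Sentence = List[Arc]


def attachment_scores(gold: List[Sentence], pred: List[Sentence]) -> Tuple[float, float]:
    """Return corpus-level (UAS, LAS), micro-averaged over all tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")
    total = head_hits = labeled_hits = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentence length mismatch")
        for (g_head, g_rel), (p_head, p_rel) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                head_hits += 1
                if g_rel == p_rel:  # LAS needs both head and label to match
                    labeled_hits += 1
    if total == 0:
        raise ValueError("empty corpus")
    return head_hits / total, labeled_hits / total


gold = [[(3, "subj"), (3, "obj"), (0, "root")]]
pred = [[(3, "obj"), (3, "obj"), (0, "root")]]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2%} LAS={las:.2%}")  # prints: UAS=100.00% LAS=66.67%
```

Micro-averaging weights every token equally, so long sentences count more than short ones; a per-sentence (macro) average would give different numbers on the same data.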