Thai sentence segmentation based on Siamese recurrent neural network
Cite this article: XIAN Yan-tuan, ZHANG Zhi-ju, WANG Hong-bin, WEN Yong-hua. Thai sentence segmentation based on Siamese recurrent neural network[J]. Computer Engineering & Science, 2021, 43(12): 2238-2242. DOI: 10.3969/j.issn.1007-130X.2021.12.018
Authors: XIAN Yan-tuan  ZHANG Zhi-ju  WANG Hong-bin  WEN Yong-hua
Affiliation: (1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China; 2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China)
Funding: National Natural Science Foundation of China (61363044, 61462054)
Abstract: Thai text rarely uses punctuation, and there are no explicit delimiters between sentences, so sentence boundaries must be determined from semantics. This creates additional difficulty for natural language processing tasks such as Thai lexical analysis, syntactic analysis, and machine translation. To address Thai sentence boundary detection, this paper proposes an automatic sentence segmentation method based on a Siamese recurrent neural network. Unlike traditional Thai sentence segmentation methods, the proposed method requires no manually defined features. Instead, a single shared recurrent neural network encodes the word sequence before and the word sequence after each candidate sentence boundary, and the two resulting encoding vectors are combined as features to build the Thai sentence segmentation model. Experimental results on the ORCHID Thai corpus show that the proposed method outperforms traditional Thai sentence segmentation methods.

Keywords: Thai language  sentence segmentation  recurrent neural network
Received: 2020-07-28
Revised: 2020-11-04
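
The encoding scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the recurrent cell, the way the two vectors are combined, or the context length, so the plain Elman RNN, the concatenation, the logistic output, the `window` parameter, and all class and function names below are assumptions made for illustration. The weights are random and untrained, so the scores carry no linguistic meaning; the sketch only shows how one shared encoder is applied to both sides of a candidate boundary.

```python
import math
import random


class SiameseBoundaryScorer:
    """Toy Siamese RNN: one shared encoder for both sides of a candidate boundary."""

    def __init__(self, vocab, embed_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.index = {word: i for i, word in enumerate(vocab)}
        self.hidden_dim = hidden_dim
        # One embedding row per known word, plus a final row for unknown words.
        self.embed = matrix(len(vocab) + 1, embed_dim)
        # Shared encoder parameters, used for BOTH contexts (the "Siamese" part).
        self.w_in = matrix(hidden_dim, embed_dim)
        self.w_rec = matrix(hidden_dim, hidden_dim)
        # Assumed classifier: logistic regression over the concatenated encodings.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden_dim)]
        self.b_out = 0.0

    def encode(self, words):
        """Run the shared Elman RNN over a word sequence; return the final hidden state."""
        h = [0.0] * self.hidden_dim
        for word in words:
            x = self.embed[self.index.get(word, len(self.index))]
            h = [
                math.tanh(
                    sum(wi * xi for wi, xi in zip(self.w_in[j], x))
                    + sum(wr * hr for wr, hr in zip(self.w_rec[j], h))
                )
                for j in range(self.hidden_dim)
            ]
        return h

    def boundary_probability(self, left_words, right_words):
        """Combine both encodings (here: concatenation) and squash to (0, 1)."""
        features = self.encode(left_words) + self.encode(right_words)
        logit = sum(w * f for w, f in zip(self.w_out, features)) + self.b_out
        return 1.0 / (1.0 + math.exp(-logit))


def score_candidates(scorer, words, window=3):
    """Score every gap between adjacent words as a candidate sentence boundary."""
    return [
        scorer.boundary_probability(words[max(0, i - window):i], words[i:i + window])
        for i in range(1, len(words))
    ]


if __name__ == "__main__":
    # Placeholder tokens stand in for segmented Thai words.
    tokens = ["w1", "w2", "w3", "w4", "w5"]
    scorer = SiameseBoundaryScorer(vocab=["w1", "w2", "w3", "w4"])
    for gap, p in enumerate(score_candidates(scorer, tokens), start=1):
        print(f"gap after token {gap}: p(boundary) = {p:.3f}")
```

Treating every gap between words as a candidate is a simplification chosen here; a real system might restrict candidates to a smaller set of positions. In a trained model the shared encoder and the classifier would be fitted jointly on labeled boundaries.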
This article is indexed in Wanfang Data and other databases.