Recognition of Maximal-Length Noun Phrase Based on Bilingual Co-Training

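Cite this article:	LI Ye-Gang, HUANG He-Yan, SHI Shu-Min, JIAN Ping, SU Chao. Recognition of maximal-length noun phrase based on bilingual co-training [J]. Journal of Software, 2015, 26(7): 1615-1625.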

Authors:	LI Ye-Gang, HUANG He-Yan, SHI Shu-Min, JIAN Ping, SU Chao

Affiliations:	(1) Beijing Engineering Applications Research Center of High Volume Language Information Processing and Cloud Computing, Beijing Institute of Technology, Beijing 100081, China; (2) School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; (3) College of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255049, China. LI Ye-Gang is listed with (1), (2), and (3); the other four authors are listed with (1) and (2).

Funding:	National Basic Research Program of China (973 Program) (2013CB329300); National Natural Science Foundation of China (61132009, 61201352, 61202244)

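Abstract:	Traditional methods recognize bilingual maximal-length noun phrases with poor consistency across the two languages and generalize weakly across domains. To address these shortcomings, a bilingual maximal-length noun phrase recognition algorithm based on semi-supervised learning is proposed. Exploiting the fact that Chinese and English maximal-length noun phrases translate into each other and that recognition in the two languages is complementary, the algorithm treats the parallel Chinese sentences and English sentences as two different views of a single dataset and performs bilingual co-training on them. During co-training, the agreement rate of the aligned bilingual annotations serves as the estimate of labeling confidence when selecting the newly labeled data to add in each increment. Experimental results show that the algorithm significantly improves the recognition of bilingual maximal-length noun phrases: in cross-domain and same-domain tests, the F-score is 4.52% and 3.08% higher, respectively, than that of the best existing maximal-length noun phrase recognition model.
Keywords:	maximal-length noun phrase; semi-supervised learning; label projection; bilingual co-training; phrase identification
Received:	2014/2/23
Revised:	2014/5/21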
Title:	Recognition of Maximal-Length Noun Phrase Based on Bilingual Co-Training

LI Ye-Gang, HUANG He-Yan, SHI Shu-Min, JIAN Ping, SU Chao. Recognition of Maximal-Length Noun Phrase Based on Bilingual Co-Training [J]. Journal of Software, 2015, 26(7): 1615-1625.

Authors:	LI Ye-Gang, HUANG He-Yan, SHI Shu-Min, JIAN Ping, SU Chao

Affiliations:	(1) Beijing Engineering Applications Research Center of High Volume Language Information Processing and Cloud Computing, Beijing Institute of Technology, Beijing 100081, China; (2) School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; (3) College of Computer Science and Technology, Shandong University of Technology, Zibo 255049, China. LI Ye-Gang is listed with (1), (2), and (3); the other four authors are listed with (1) and (2).

Abstract:	This article addresses the weak cross-domain ability of bilingual maximal-length noun phrase recognition. A bilingual noun phrase recognition algorithm based on semi-supervised learning is proposed. The approach makes full use of both the English features and the Chinese features in a unified framework, treating the corpora of the two languages as different views of one dataset. Instances with the highest confidence scores are selected, merged, and added to the labeled data set to retrain the classifier. Experimental results on the test sets show that the proposed approach is effective: it improves the F-score over the baseline by 4.52% in cross-domain tests and by 3.08% in same-domain tests.

Keywords:	maximal-length noun phrase; semi-supervised learning; label projection; bilingual co-training; phrase identification
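
The co-training procedure summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the span representation, the Dice-style `agreement_rate`, the `threshold` and `batch` parameters, and the `train` callback are all assumptions made here for illustration. The sketch only shows the general idea of tagging both sides of a parallel sentence pair, scoring the pair by how well the two annotations agree under word alignment, and moving the most consistent pairs into the labeled set.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Span = Tuple[int, int]              # (start, end) token offsets, end exclusive
Alignment = List[Tuple[int, int]]   # (zh_index, en_index) word-alignment links
Tagger = Callable[[Sequence[str]], List[Span]]
Pair = Tuple[List[str], List[str], Alignment]  # zh tokens, en tokens, links


def project_span(span: Span, alignment: Alignment) -> Optional[Span]:
    """Project a Chinese span onto the English side through alignment links."""
    targets = [en for zh, en in alignment if span[0] <= zh < span[1]]
    if not targets:
        return None
    return (min(targets), max(targets) + 1)


def agreement_rate(zh_spans: List[Span], en_spans: List[Span],
                   alignment: Alignment) -> float:
    """Assumed confidence score: Dice-style overlap of projected and tagged spans."""
    total = len(zh_spans) + len(en_spans)
    if total == 0:
        return 0.0  # assumption: pairs with no phrases are uninformative
    projected = {project_span(s, alignment) for s in zh_spans}
    matched = sum(1 for s in en_spans if s in projected)
    return 2.0 * matched / total


def co_train(labeled: list, unlabeled: List[Pair],
             train: Callable[[list], Tuple[Tagger, Tagger]],
             rounds: int = 3, threshold: float = 0.8, batch: int = 2) -> list:
    """Grow the labeled set with the sentence pairs whose annotations agree best."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        zh_tagger, en_tagger = train(labeled)  # one tagger per view
        scored = []
        for i, (zh, en, align) in enumerate(pool):
            zs, es = zh_tagger(zh), en_tagger(en)
            scored.append((agreement_rate(zs, es, align), i, zs, es))
        scored.sort(key=lambda item: item[0], reverse=True)
        chosen = [item for item in scored[:batch] if item[0] >= threshold]
        if not chosen:
            break  # nothing confident enough to add
        for _, i, zs, es in chosen:
            labeled.append((*pool[i], zs, es))
        taken = {i for _, i, _, _ in chosen}
        pool = [p for i, p in enumerate(pool) if i not in taken]
    return labeled
```

In this sketch, `train` stands in for whatever supervised learner is used for each language; it receives the current labeled set and returns one tagger per view. A stricter `threshold` adds fewer but more consistent sentence pairs in each round.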
