
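Noise Learnability Analysis of Structured Learning and Its Applications
Cite this article: YU Mo, ZHAO Tie-Jun, HU Peng-Long, ZHENG De-Quan. Noise learnability analysis of structured learning and its applications [J]. Journal of Software (软件学报), 2013, 24(10): 2340-2353.
Authors: YU Mo (于墨), ZHAO Tie-Jun (赵铁军), HU Peng-Long (胡鹏龙), ZHENG De-Quan (郑德权)
Affiliation (all authors): MOE-MS Key Laboratory of Natural Language Processing and Speech, School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Funding: National Natural Science Foundation of China (61173073); National High Technology Research and Development Program of China (863 Program) (2011AA01A207)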
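Abstract: The theory of learnability with noise shows that the performance of supervised learning methods can be severely affected by label noise in the training examples. However, existing theoretical work addresses only binary classification. This paper investigates how noise affects structured learning problems. First, we observe that in structured learning, noise in the annotated data is amplified during training, so the noise rate of the labeled examples as seen during training is higher than their labeling error rate. Traditional noisy learnability theory does not account for this phenomenon and therefore underestimates the complexity of the problem. Starting from this noise amplification phenomenon, we propose a new noise learnability theory for structured learning. On this basis, we introduce the concept of effective training data size, a measure that can be used in practice to describe the data quality of a noisy learning problem, and we further analyze when structured learning models used in real applications fall back to lower-order models under high noise. Experimental results confirm the correctness of the theory and demonstrate its practical value and guidance for cross-lingual projection and co-training.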
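The abstract does not spell out the amplification mechanism. One simple way to see the effect, under an assumption that is ours and not necessarily the paper's model: each token label is wrong independently with probability η, and a model of order k scores factors spanning k+1 consecutive labels (k = 0 for per-token classification, k = 1 for a first-order chain such as a bigram HMM or CRF). A factor is noisy if any of its labels is wrong, so its noise rate is 1 − (1 − η)^(k+1), which exceeds η for every k ≥ 1. The function names and the 0.5 cut-off below are hypothetical; the cut-off borrows the η < 1/2 condition of binary noisy PAC learning only as a rough analogy for when higher-order factors stop being useful.

```python
def factor_noise_rate(eta: float, order: int) -> float:
    """Noise rate of a factor spanning order+1 labels.

    Hypothetical model: each token label is corrupted independently with
    probability eta, and the factor is noisy if at least one label is wrong.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if order < 0:
        raise ValueError("order must be non-negative")
    return 1.0 - (1.0 - eta) ** (order + 1)


def max_usable_order(eta: float, threshold: float = 0.5, max_order: int = 5) -> int:
    """Highest order k <= max_order whose factor noise rate stays below threshold.

    Returns -1 if even order-0 (per-token) factors reach the threshold.
    The 0.5 default is an illustrative analogy to the binary condition eta < 1/2.
    """
    best = -1
    for k in range(max_order + 1):
        if factor_noise_rate(eta, k) >= threshold:
            break  # the rate only grows with k, so higher orders are worse
        best = k
    return best


if __name__ == "__main__":
    for eta in (0.05, 0.2, 0.3):
        rates = [round(factor_noise_rate(eta, k), 3) for k in range(3)]
        print(f"eta={eta}: factor noise by order {rates}, "
              f"max usable order {max_usable_order(eta)}")
```

With η = 0.2, for instance, a first-order factor is already noisy 36% of the time (1 − 0.8²), and in this sketch the cut-off is first crossed at order 3, so the highest order it keeps is 2.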

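Keywords: structured learning; noisy PAC; learnability; part-of-speech tagging; natural language processing; co-training; cross-lingual projection; semi-supervised learning
Received: 2012/6/11
Revised: 2/4/2013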

Theoretical Analysis on Structured Learning with Noisy Data and its Applications
YU Mo, ZHAO Tie-Jun, HU Peng-Long and ZHENG De-Quan. Theoretical Analysis on Structured Learning with Noisy Data and its Applications [J]. Journal of Software, 2013, 24(10): 2340-2353.
Authors: YU Mo, ZHAO Tie-Jun, HU Peng-Long and ZHENG De-Quan
Affiliation (all authors): MOE-MS Key Laboratory of Natural Language Processing and Speech, School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Abstract: As existing, well-studied theories of learning with noisy data indicate, the performance of supervised machine learning can be badly affected by noise in labeled data. However, these theories focus only on two-class classification problems. This paper studies how noisy examples affect structured learning. First, the paper finds that in structured learning problems the noise of labeled data is amplified, leading to a higher noise rate during training than in the labeled data itself. Existing theories do not account for this noise amplification in structured learning and thus underestimate the complexity of the learning problem. This paper provides a new theory of learning from noisy data with structured predictions. Based on the theory, the concept of "effective size of training data" is proposed to describe the quality of noisy training data sets in practice. The paper also analyzes the situations in which structured learning models fall back to lower-order models in applications. Experimental results confirm the correctness of the theory and demonstrate its practical value for cross-lingual projection and co-training.
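The abstract does not give a formula for the "effective size of training data". As a hedged illustration only, the sketch below starts from the classical sample-size bound for learning a finite hypothesis class of size N under random classification noise with rate η < 1/2, m ≥ 2 ln(2N/δ) / (ε²(1 − 2η)²), and reads off an effective size m(1 − 2η)²: the number of noise-free examples for which the same bound (with η = 0) gives the same guarantee. The function names and this particular definition are assumptions, not necessarily the paper's. Plugging in a higher, amplified noise rate (here, an assumed two-label factor rate under independent label flips) instead of the token error rate gives a smaller effective size, which is one way to see why ignoring amplification underestimates problem complexity.

```python
import math


def noisy_pac_sample_size(epsilon: float, delta: float,
                          num_hypotheses: int, eta: float) -> float:
    """Binary noisy-PAC sample size 2*ln(2N/delta) / (epsilon^2 * (1 - 2*eta)^2).

    Returns infinity when eta >= 0.5, where the bound gives no guarantee.
    """
    if eta >= 0.5:
        return math.inf
    return (2.0 * math.log(2 * num_hypotheses / delta)
            / (epsilon ** 2 * (1.0 - 2.0 * eta) ** 2))


def effective_size(m: int, eta: float) -> float:
    """Hypothetical effective training size: m noisy examples count as
    m * (1 - 2*eta)^2 noise-free ones under the bound above."""
    if eta >= 0.5:
        return 0.0
    return m * (1.0 - 2.0 * eta) ** 2


if __name__ == "__main__":
    m = 10_000
    token_error = 0.1
    # Assumed factor-level rate for a first-order (two-label) factor with
    # independent label flips: 1 - (1 - 0.1)**2 = 0.19.
    factor_noise = 1.0 - (1.0 - token_error) ** 2
    print("effective size, token error rate:", round(effective_size(m, token_error)))
    print("effective size, factor noise rate:", round(effective_size(m, factor_noise)))
```

With 10,000 examples and a 10% token error rate, this sketch gives about 6,400 effective examples, but only about 3,844 once the 19% two-label factor rate is used.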
Keywords: structured learning; PAC learning with noise; POS tagging; natural language processing; co-training; cross-lingual projection; semi-supervised learning