
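A Cross-Domain Text Classification Method Based on Multi-Bridge Mapping
Cite this article:Yang Qiqi, Zhang Yuhong, Hu Xuegang. A Cross-Domain Text Classification Method Based on Multi-Bridge Mapping[J]. Application Research of Computers (计算机应用研究), 2018, 35(4).
Authors:Yang Qiqi  Zhang Yuhong  Hu Xuegang
Affiliation:School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui Province (all three authors)
Funding:National Natural Science Foundation of China
Abstract:Cross-domain classification aims to use labeled source-domain information to train an accurate classifier for an unlabeled target domain whose probability distribution differs from that of the source. Most existing work represents features as text topics and uses shared topics to build a mapping between the domain-specific topics of each domain, thereby achieving cross-domain learning. In practice, however, domains can be connected from multiple angles, and a mapping based on a single set of shared topics suffers from incomplete and biased semantic representation, which reduces cross-domain classification accuracy. To address this, a cross-domain classification method based on multi-bridge mapping is proposed. It extracts multiple sets of shared topics together with domain-specific topics, and uses the multiple shared topics as bridges to build multiple mappings between the domain-specific topics, thereby achieving cross-domain classification. Experiments on the 20Newsgroups and Reuters-21578 datasets show that the proposed algorithm achieves higher classification accuracy than comparable algorithms.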

Keywords:cross-domain classification  multi-bridge mapping  topics  text classification
Received:2016/12/12
Revised:2018/2/26

A Cross-Domain Text Classification Approach based on Multi-Bridge Mapping
Yang Qiqi, Zhang Yuhong and Hu Xuegang. A Cross-Domain Text Classification Approach based on Multi-Bridge Mapping[J]. Application Research of Computers, 2018, 35(4).
Authors:Yang Qiqi  Zhang Yuhong and Hu Xuegang
Affiliation:School of Computer Science and Information Engineering, Hefei University of Technology
Abstract:Cross-domain text classification aims to exploit labeled data in a source domain to train an accurate classifier for a target domain whose distribution differs from that of the source domain. To achieve cross-domain learning, many existing works use topics as a new feature representation and build a mapping between domain-specific topics, using the shared topics as a bridge. However, the connections between domains can be multi-angled, and mapping methods based on a single set of shared topics have weaknesses that reduce classification accuracy: for example, the semantic representation is incomplete and biased. Motivated by this, a new approach based on multi-bridge mapping for cross-domain text classification is proposed. It first extracts both multi-layer shared topics and domain-specific topics, and then builds multiple mappings between the domain-specific topics of different domains, using the multi-layer shared topics as bridges. Experimental results on the 20Newsgroups and Reuters-21578 datasets demonstrate the effectiveness of the proposed approach.
Keywords:cross-domain classification  multi-bridge mapping  topics  text classification