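Fast Cross-Domain Classification Learning for Large-Sample Multi-Source Domains and Small Target Domains
Cite this article: Gu Xin, Wang Shitong. Fast cross-domain classification learning for large-sample multi-source domains and small target domains[J]. Journal of Computer Research and Development, 2014, 51(3): 519-535.
Authors: Gu Xin  Wang Shitong
Affiliations: School of Digital Media, Jiangnan University; Jiangsu North Huguang Opto-Electronics Co., Ltd.
Funding: National Natural Science Foundation of China (61170122, 61272210); Natural Science Foundation of Jiangsu Province (BK2011003); Jiangsu Province “333” High-Level Talent Training Project (BRA2011192)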
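Abstract: Traditional cross-domain classification learning generally considers balanced learning from a single source domain to a single target domain, but real-world data are often imbalanced. When such methods are applied to imbalanced classification problems, the bias of the classifier lowers its classification accuracy and noise robustness to varying degrees. To overcome the imbalance between domains, an imbalanced multi-source cross-domain classification algorithm (imbalance multisource classification on cross-domain learning, IMCCL) is proposed. Building on the logistic regression model and the maximum posterior probability (MAP) rule, both of which have proven effective in many experiments, the algorithm constructs one classifier per training (source) domain and combines them to guide the classification of target-domain data. To make full and efficient use of large source-domain samples and to compute quickly on them, a fast version of IMCCL (IMCCL-CDdual) is further proposed by incorporating the CDdual algorithm. Experimental results on text classification and image recognition show that the algorithm achieves high recognition accuracy, fast recognition speed, robustness to interference, and good domain adaptability.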

Keywords: cross-domain  multi-source  logistic regression  posterior probability  classification  imbalance
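The abstract describes IMCCL only at a high level: one logistic regression classifier is trained per source domain, and their posterior probabilities are combined under the maximum-posterior rule to label target-domain data. The following is a minimal sketch of that general pattern, not the authors' algorithm. The function names (`train_logreg`, `posterior`, `combine_and_predict`), the plain batch gradient-descent training, and the weighted averaging of source posteriors are all illustrative assumptions. The paper's actual combination and weighting scheme is not stated here.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def posterior(model, x):
    # P(y = 1 | x) under one source-domain logistic regression classifier.
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lam=0.01, lr=0.5, epochs=500):
    """Hypothetical helper: batch gradient descent on mean log-loss + (lam/2)*||w||^2.

    X is a list of feature vectors, y holds labels in {0, 1}. Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            err = posterior((w, b), xi) - yi  # gradient of the log-loss w.r.t. the logit
            for j in range(d):
                gw[j] += err * xi[j] / n
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def combine_and_predict(models, weights, x):
    # Assumed rule: weighted average of P(y = 1 | x) across source classifiers.
    p1 = sum(wt * posterior(m, x) for m, wt in zip(models, weights)) / sum(weights)
    # Maximum-posterior rule over the two classes {0, 1}.
    return (1 if p1 >= 1.0 - p1 else 0), p1


# Two hypothetical 1-D source domains whose class boundaries are slightly shifted.
source_a = ([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]], [0, 0, 0, 1, 1, 1])
source_b = ([[-1.5], [-1.0], [-0.5], [1.5], [2.0], [2.5]], [0, 0, 0, 1, 1, 1])
models = [train_logreg(X, y) for X, y in (source_a, source_b)]
label, p1 = combine_and_predict(models, [1.0, 1.0], [3.0])
```

Equal weights are used here only for simplicity. A per-source weight, for instance one reflecting how well each source classifier fits the small target domain, is one natural place where a real method would differ.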

Fast Cross-Domain Classification Method for Large Multisources/Small Target Domains
Gu Xin, Wang Shitong. Fast Cross-Domain Classification Method for Large Multisources/Small Target Domains[J]. Journal of Computer Research and Development, 2014, 51(3): 519-535.
Authors:Gu Xin  Wang Shitong
Abstract:Most current cross-domain classifiers are designed for a single source domain and a single target domain, and they basically assume that the two domains are balanced. However, this assumption is often violated in the real world. When these classifiers are applied to imbalanced domains, their classification accuracy and robustness to noise degrade considerably. For example, a Bayesian classifier depends heavily on estimating the sample distributions of the source and target domains, so when a large source domain but only a small target domain is available, its classification accuracy degrades substantially. To address this imbalance and exploit the abundant source-domain data for effective transfer learning between a small target domain and multiple source domains, a novel cross-domain classification method called IMCCL is proposed for “small-target+multisource” datasets. IMCCL is rooted in the logistic regression model and the maximum a posteriori (MAP) rule. Furthermore, IMCCL is integrated with the recent CDdual algorithm to obtain a fast version, IMCCL-CDdual, for “small-target+large-multisource” domains, and this fast method is also analyzed theoretically. Experimental results on artificial and real datasets demonstrate the effectiveness of IMCCL-CDdual in terms of classification accuracy, classification speed, robustness, and domain adaptation.
Keywords:cross-domain  multi-source  logistic regression  posterior probability  classification  imbalance
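The fast variant relies on CDdual, which, to our understanding, is a dual coordinate descent method for L2-regularized logistic regression. The following is a simplified, hypothetical pure-Python illustration of that idea, not the CDdual implementation used in the paper. Each dual variable α_i is kept strictly inside (0, C), the weight vector w = Σ α_i y_i x_i is updated incrementally, and each one-variable subproblem is solved with a bracketed Newton method. The bracketing safeguard, the function names (`dual_cd_logreg`, `predict_proba`), and the default parameters are our own assumptions.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dual_cd_logreg(X, y, C=1.0, epochs=200, inner_iters=50, tol=1e-12):
    """Sketch of dual coordinate descent for L2-regularized logistic regression.

    Primal: min_w 0.5*||w||^2 + C * sum_i log(1 + exp(-y_i * w.x_i)), y_i in {-1, +1}.
    Dual variables alpha_i stay strictly inside (0, C); w = sum_i alpha_i*y_i*x_i.
    """
    n, d = len(X), len(X[0])
    alpha = [1e-3 * C] * n  # any interior starting point is feasible
    w = [0.0] * d
    for xi, yi, ai in zip(X, y, alpha):
        for j in range(d):
            w[j] += ai * yi * xi[j]
    qii = [dot(xi, xi) for xi in X]  # diagonal of Q_ij = y_i*y_j*x_i.x_j
    for _ in range(epochs):
        for i in range(n):
            a, b, old = qii[i], y[i] * dot(w, X[i]), alpha[i]
            # Minimize g(z) = 0.5*a*(z-old)^2 + b*(z-old) + z*log(z) + (C-z)*log(C-z)
            # on (0, C). g'(z) is strictly increasing, so the root can be bracketed.
            lo, hi, z = 0.0, C, old
            for _ in range(inner_iters):
                grad = a * (z - old) + b + math.log(z / (C - z))
                if abs(grad) < tol:
                    break
                if grad > 0:
                    hi = z  # root lies to the left of z
                else:
                    lo = z  # root lies to the right of z
                z_new = z - grad / (a + C / (z * (C - z)))  # Newton step
                z = z_new if lo < z_new < hi else 0.5 * (lo + hi)  # bisection fallback
            alpha[i] = z
            for j in range(d):  # O(d) incremental update of w
                w[j] += (z - old) * y[i] * X[i][j]
    return w, alpha


def predict_proba(w, x):
    # P(y = +1 | x) under the learned primal weights.
    return sigmoid(dot(w, x))


# Toy data; the last feature is a constant 1.0 acting as a bias term.
X = [[1.0, 1.0, 1.0], [2.0, 1.0, 1.0], [-1.0, -1.0, 1.0], [-2.0, -1.0, 1.0]]
y = [1, 1, -1, -1]
w, alpha = dual_cd_logreg(X, y, C=1.0)
preds = [1 if predict_proba(w, x) >= 0.5 else -1 for x in X]
```

Each coordinate step costs O(d) because only Q_ii and the current w are needed, and no n×n matrix is ever formed. This property is what makes dual coordinate descent attractive for large source domains. At the optimum, every α_i equals C·σ(-y_i w·x_i), which gives a convenient convergence check.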
This article has been indexed in CNKI and other databases.