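Multi-instance multi-label learning method based on topic model
Cite this article: YAN Kaobi, LI Zhixin, ZHANG Canlong. Multi-instance multi-label learning method based on topic model[J]. Journal of Computer Applications, 2015, 35(8): 2233-2237. DOI: 10.11772/j.issn.1001-9081.2015.08.2237
Authors: YAN Kaobi, LI Zhixin, ZHANG Canlong
Affiliations: 1. Guangxi Key Laboratory of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin, Guangxi 541004; 2. Guangxi Experiment Center of Information Science, Guilin, Guangxi 541004
Funding: National Natural Science Foundation of China (61165009, 61262005, 61363035, 61365009); National Basic Research Program of China (973 Program) (2012CB326403); Guangxi Natural Science Foundation (2012GXNSFAA053219, 2013GXNSFAA019345, 2014GXNSFAA118368).
Abstract: Most existing multi-instance multi-label (MIML) algorithms do not consider how to better represent the features of objects. To address this, a topic-model-based MIML learning method is proposed that combines the probabilistic latent semantic analysis (PLSA) model with a neural network (NN). The algorithm uses the PLSA model to learn the latent topic distribution of every training sample; this is a feature learning step that yields a better feature representation. The learned latent topic distribution of each sample is then used as input to train the neural network. Given a test sample, its latent topic distribution is learned and fed into the trained neural network to obtain the sample's label set. Experimental results show that, compared with two classical decomposition-based MIML algorithms, the proposed method performs better on two real-world MIML learning tasks.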

Keywords: topic model; feature expression; multi-instance multi-label learning; scene classification; text categorization
Received: 2015-03-27
Revised: 2015-05-30

Multi-instance multi-label learning method based on topic model
YAN Kaobi, LI Zhixin, ZHANG Canlong. Multi-instance multi-label learning method based on topic model[J]. Journal of Computer Applications, 2015, 35(8): 2233-2237. DOI: 10.11772/j.issn.1001-9081.2015.08.2237
Authors: YAN Kaobi, LI Zhixin, ZHANG Canlong
Affiliation: 1. Guangxi Key Laboratory of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin Guangxi 541004, China;
2. Guangxi Experiment Center of Information Science, Guilin Guangxi 541004, China
Abstract: Most current methods for the Multi-Instance Multi-Label (MIML) problem do not consider how to represent the features of objects more effectively. To address this, a new topic-model-based MIML approach that combines the Probabilistic Latent Semantic Analysis (PLSA) model with a Neural Network (NN) is proposed. The algorithm learns the latent topic distribution of all training examples with the PLSA model; this step is a form of feature learning that produces a better feature representation. The latent topic distribution of each training example is then used to train the neural network. Given a test example, the algorithm learns its latent topic distribution and feeds it to the trained neural network to obtain the example's multiple labels. Experimental comparison with two classical decomposition-based algorithms shows that the proposed method achieves superior performance on two real-world MIML tasks.
Keywords: topic model; feature expression; multi-instance multi-label learning; scene classification; text categorization
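
The pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that each bag has already been converted into a word-count vector (for example by quantizing its instances), that PLSA is fitted with plain EM, that a test bag is "folded in" by re-estimating only its topic distribution with the word distributions held fixed, and that the network is a small one-hidden-layer model with sigmoid outputs. All function names, sizes, and hyperparameters here are hypothetical, and the data are random toy values.

```python
import numpy as np

def _em_step(counts, p_z_d, p_w_z, update_words):
    # E-step: posterior P(z|d,w) proportional to P(z|d) * P(w|z),
    # array shape (n_docs, n_topics, n_words).
    joint = p_z_d[:, :, None] * p_w_z[None, :, :]
    post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
    weighted = counts[:, None, :] * post  # n(d,w) * P(z|d,w)
    # M-step: re-estimate P(z|d), and optionally P(w|z).
    new_p_z_d = weighted.sum(axis=2)
    new_p_z_d /= new_p_z_d.sum(axis=1, keepdims=True) + 1e-12
    if update_words:
        p_w_z = weighted.sum(axis=0)
        p_w_z = p_w_z / (p_w_z.sum(axis=1, keepdims=True) + 1e-12)
    return new_p_z_d, p_w_z

def _random_rows(rng, shape):
    # Random row-stochastic matrix used to initialise EM.
    m = rng.random(shape)
    return m / m.sum(axis=1, keepdims=True)

def plsa_fit(counts, n_topics, n_iter=50, seed=0):
    """Learn P(z|d) for training bags and the shared P(w|z)."""
    rng = np.random.default_rng(seed)
    p_z_d = _random_rows(rng, (counts.shape[0], n_topics))
    p_w_z = _random_rows(rng, (n_topics, counts.shape[1]))
    for _ in range(n_iter):
        p_z_d, p_w_z = _em_step(counts, p_z_d, p_w_z, update_words=True)
    return p_z_d, p_w_z

def plsa_fold_in(counts, p_w_z, n_iter=50, seed=0):
    """Learn P(z|d) for unseen bags while keeping P(w|z) fixed."""
    rng = np.random.default_rng(seed)
    p_z_d = _random_rows(rng, (counts.shape[0], p_w_z.shape[0]))
    for _ in range(n_iter):
        p_z_d, _ = _em_step(counts, p_z_d, p_w_z, update_words=False)
    return p_z_d

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_nn(X, Y, n_hidden=8, lr=0.5, n_epochs=500, seed=0):
    """One hidden tanh layer, one sigmoid output per label, gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, (n_hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    for _ in range(n_epochs):
        H = np.tanh(X @ W1 + b1)
        P = sigmoid(H @ W2 + b2)
        d_out = (P - Y) / len(X)               # cross-entropy gradient at the logits
        d_hid = (d_out @ W2.T) * (1 - H ** 2)  # back-propagate through tanh
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2

def predict_labels(params, X, threshold=0.5):
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) >= threshold

# Toy data: 20 training bags and 4 test bags over a 30-word vocabulary, 3 labels.
rng = np.random.default_rng(0)
train_counts = rng.integers(0, 5, size=(20, 30)).astype(float)
train_labels = rng.integers(0, 2, size=(20, 3)).astype(float)
test_counts = rng.integers(0, 5, size=(4, 30)).astype(float)

theta_train, p_w_z = plsa_fit(train_counts, n_topics=5)  # feature learning
params = train_nn(theta_train, train_labels)             # train on topic features
theta_test = plsa_fold_in(test_counts, p_w_z)            # topics of test bags
pred = predict_labels(params, theta_test)                # boolean label sets
```

Each row of `pred` is a boolean vector with one entry per label, so a bag can receive several labels at once. The threshold of 0.5 is an illustrative choice, not a value taken from the paper.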