
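Text transfer learning algorithm based on FCM
Cite this article: Hongze Tian, Ping Gu. Text transfer learning algorithm based on FCM[J]. Application Research of Computers (计算机应用研究), 2018, 35(7).
Authors: Hongze Tian, Ping Gu
Affiliations: College of Computer Science, Chongqing University; College of Computer Science, Chongqing University
Funding: Fundamental Research Funds for the Central Universities (106112013CDJZR180014)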
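Abstract: Traditional machine learning methods assume that training data and test data follow the same distribution. In some real-world applications, however, the training data and test data come from different domains, and traditional algorithms that ignore the difference in distribution may fail. To address this problem, a text transfer learning algorithm based on fuzzy C-means (FCM) is proposed. First, a simple classifier classifies the test samples, and the natural neighbor algorithm is used to build each sample's initial fuzzy membership. Next, FCM iteratively updates the fuzzy memberships and corrects the sample labels. Finally, outlier samples are processed to obtain the final classification result. Experimental results show that the algorithm achieves good accuracy and effectively handles text classification when the training and test data distributions differ.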
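
For reference, the textbook fuzzy C-means formulation that the membership-update step presumably builds on is given below. The symbols ($x_i$ for a sample, $v_j$ for a cluster center, $u_{ij}$ for the membership of sample $i$ in cluster $j$, $m > 1$ for the fuzzifier) are the usual FCM notation, not notation taken from the paper, and the abstract does not say whether the authors modify these updates.

```latex
% Objective minimized by standard FCM, with memberships summing to 1 per sample
J_m = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \, \lVert x_i - v_j \rVert^{2},
\qquad \sum_{j=1}^{c} u_{ij} = 1 \quad \text{for every } i

% Alternating updates: cluster centers, then memberships
v_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} \, x_i}{\sum_{i=1}^{n} u_{ij}^{m}},
\qquad
u_{ij} = \left[ \sum_{k=1}^{c}
  \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{\frac{2}{m-1}}
\right]^{-1}
```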

Keywords: fuzzy C-means  natural neighbor  transfer learning  outliers
Received: 2017/3/8
Revised: 2018/5/25

Text classification algorithm for transfer learning based on FCM
Hongze Tian and Ping Gu. Text classification algorithm for transfer learning based on FCM[J]. Application Research of Computers, 2018, 35(7).
Authors: Hongze Tian and Ping Gu
Affiliation: College of Computer Science, Chongqing University
Abstract: Traditional machine learning methods assume that the training data and test data follow the same distribution. However, in some real-world applications, the training data and test data come from different domains, and traditional learning methods that ignore this shift in data distribution may fail. This paper proposes a text classification algorithm for transfer learning based on Fuzzy C-Means (FCM) to solve this problem. First, it classifies the test data with a simple classifier. Second, it initializes the fuzzy membership degree of each sample using the Natural Nearest Neighbor algorithm. Then, it updates the fuzzy membership degrees with FCM and refines the labels of the test data. Finally, it classifies the outliers in the test data. The algorithm is evaluated on the 20 Newsgroups and SRAA data sets. The results indicate that it improves classification accuracy considerably.
Keywords:Fuzzy C-Means  Natural Nearest Neighbor  Transfer Learning  Outliers
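
The label-refinement idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `init_membership` and `fcm_refine` are hypothetical names, a fixed-k neighbourhood stands in for the Natural Nearest Neighbor search, Euclidean distance on toy 2-D points stands in for whatever text features and distance the paper uses, and the outlier step is omitted.

```python
import numpy as np


def init_membership(X, labels, n_classes, k=3):
    """Soft memberships from label fractions among each sample's k nearest neighbours.

    Assumption: a fixed k replaces the natural neighbour search named in the
    abstract, whose details are not given there.
    """
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)           # a sample is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]   # indices of the k nearest samples
    onehot = np.eye(n_classes)[labels]
    # Average the sample's own label with its neighbours' labels.
    return (onehot + onehot[nbrs].sum(axis=1)) / (k + 1)


def fcm_refine(X, U, m=2.0, n_iter=50, tol=1e-6):
    """Standard fuzzy C-means updates, started from memberships U (n x c)."""
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers: membership-weighted means of the samples.
        centers = (W.T @ X) / (W.sum(axis=0)[:, None] + 1e-12)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Membership update: inverse-distance weights normalized per sample.
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers


# Toy usage: two well-separated blobs, with two deliberately wrong initial labels
# playing the role of the simple classifier's mistakes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
true = np.repeat([0, 1], 20)
noisy = true.copy()
noisy[[0, 25]] = 1 - noisy[[0, 25]]
U, _ = fcm_refine(X, init_membership(X, noisy, n_classes=2))
refined = U.argmax(axis=1)   # refined hard labels
```

Seeding FCM with memberships derived from the initial labels, rather than random ones, keeps each cluster index tied to a class, so the final `argmax` can be read directly as a corrected label.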