
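A Text Classification Algorithm Based on Neighborhood Component Analysis
Cite this article: LIU Cong-shan, LI Xiang-bao, YANG Yu-pu. A Text Classification Algorithm Based on Neighborhood Component Analysis[J]. Computer Engineering, 2012, 38(15): 139-141.
Authors: LIU Cong-shan, LI Xiang-bao, YANG Yu-pu
Affiliation: Key Laboratory of System Control and Information Processing, Ministry of Education, Department of Automation, Shanghai Jiaotong University, Shanghai 200240, China
Funding: National "863" Program project "Key Technologies of Cloud Manufacturing Service Platforms"
Abstract: Building on the Neighborhood Component Analysis (NCA) algorithm, this paper proposes a K-nearest-neighbor component analysis classification algorithm, K-NCA. NCA is used to learn a distance metric on the training set and to reduce its dimensionality. A class skew factor is then defined and the K-nearest-neighbor idea is introduced to estimate the class-conditional probability of each test sample. The class label is decided from this probability, which yields a text classifier. Experimental results show that K-NCA achieves good classification performance.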

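Keywords: neighborhood component analysis; distance metric learning; dimension reduction; K nearest neighbor; text classification
Received: 2011-09-29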
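
For context, the distance metric learning step named in the abstract can be written out using the standard NCA formulation. This is the general form from the NCA literature, not a formula taken from the paper, and the symbols below (A, x_i, C_i, p_ij, f) are notation assumed here for illustration. NCA learns a linear map A by maximizing the expected number of training points that are correctly classified under a stochastic nearest-neighbor rule:

```latex
% Probability that point i picks point j as its neighbor in the projected space
p_{ij} = \frac{\exp\!\left(-\lVert A x_i - A x_j \rVert^2\right)}
              {\sum_{k \neq i} \exp\!\left(-\lVert A x_i - A x_k \rVert^2\right)},
\qquad p_{ii} = 0

% Probability that point i is classified correctly (C_i = points sharing i's label)
p_i = \sum_{j \in C_i} p_{ij}

% Objective maximized over A
f(A) = \sum_i p_i
```

The induced Mahalanobis matrix is M = A^T A. Choosing A with fewer rows than columns gives the dimension reduction mentioned in the abstract as a by-product of the same optimization.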

Text Classification Algorithm Based on Neighborhood Component Analysis
LIU Cong-shan, LI Xiang-bao, YANG Yu-pu. Text Classification Algorithm Based on Neighborhood Component Analysis[J]. Computer Engineering, 2012, 38(15): 139-141.
Authors: LIU Cong-shan, LI Xiang-bao, YANG Yu-pu
Affiliation: Key Laboratory of System Control and Information Processing, Ministry of Education, Department of Automation, Shanghai Jiaotong University, Shanghai 200240, China
Abstract: This paper proposes an algorithm named K-NCA, based on Neighborhood Component Analysis (NCA). It uses NCA to learn a Mahalanobis distance measure and to reduce the dimension of the input dataset. The algorithm defines a class imbalance factor and introduces K Nearest Neighbor (KNN) to estimate the class-conditional probability of each test sample. The sample's class label is decided by this probability. A text classifier is built to implement the algorithm. Experimental results show that the K-NCA algorithm can improve the accuracy of text classification.
Keywords: Neighborhood Component Analysis (NCA); distance metric learning; dimension reduction; K Nearest Neighbor (KNN); text classification
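
The classification step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear map `A` is taken as already learned by NCA, the function name `knca_predict` is hypothetical, and the class imbalance factor is assumed here to be an inverse class frequency weight, since the abstract does not give its definition.

```python
import math
from collections import Counter


def transform(A, x):
    """Project x with the linear map A (each row of A is one output dimension)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knca_predict(A, train_X, train_y, x, k=3):
    """Return (label, class_probabilities) for one test sample x.

    Hypothetical sketch: NCA-style softmax weights restricted to the
    k nearest training points in the projected space.
    """
    z = transform(A, x)
    # Distances are measured after projection, i.e. under the learned metric.
    dists = [(sq_dist(z, transform(A, xi)), yi) for xi, yi in zip(train_X, train_y)]
    nearest = sorted(dists, key=lambda t: t[0])[:k]

    counts = Counter(train_y)
    n = len(train_y)
    # ASSUMPTION: skew factor = inverse class frequency, equal to 1.0 for
    # every class when the training set is balanced.
    skew = {c: n / (len(counts) * cnt) for c, cnt in counts.items()}

    # Subtracting the smallest distance keeps exp() numerically stable and
    # does not change the normalized probabilities.
    d_min = nearest[0][0]
    scores = {c: 0.0 for c in counts}
    for d, y in nearest:
        scores[y] += skew[y] * math.exp(-(d - d_min))

    total = sum(scores.values())
    probs = {c: s / total for c, s in scores.items()}
    return max(probs, key=probs.get), probs


# Toy usage: A keeps only the first feature, reducing 2-D inputs to 1-D.
A = [[1.0, 0.0]]
train_X = [[0.0, 5.0], [0.2, -3.0], [1.0, 9.0], [1.2, 0.0]]
train_y = ["sports", "sports", "finance", "finance"]
label, probs = knca_predict(A, train_X, train_y, [0.1, 100.0], k=3)
```

In the toy data the second feature is noise, so the projection discards it and the test point is assigned by its first feature alone. With real text data the inputs would be document feature vectors and `A` would come from an NCA optimizer.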
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.