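A k-Nearest-Neighbor Multi-label Text Categorization Algorithm Based on Vector Angle

Cite this article:	GUANG Kai, PAN Jin-Gui. A k-Nearest-Neighbor Multi-label Text Categorization Algorithm Based on Vector Angle[J]. Computer Science, 2008, 35(4): 205-206.

Authors:	GUANG Kai, PAN Jin-Gui

Affiliation:	State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093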

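Abstract:	In multi-label learning, a single instance can carry several concept labels. The goal of the learning system is to learn from a training set of multi-label examples so as to predict, as accurately as possible, the set of concept labels of an unseen example. The k-nearest-neighbor (kNN) algorithm has already been applied to multi-label learning: it converts the test instance into a multidimensional vector and determines that instance's label vector from the label vectors of its k nearest neighbors. The traditional kNN algorithm selects neighbors by the spatial distance between vectors, whereas in natural language processing the similarity between texts is commonly expressed by the angle between their text vectors. This paper therefore uses the angle between text vectors as the criterion for selecting the k nearest neighbors and, combining it with the kNN algorithm, proposes a multi-label text learning algorithm. Experiments show that the algorithm performs well in terms of document classification accuracy.
Keywords:	machine learning; multi-label learning; text categorization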
An kNN Algorithm Based on Vector Angle for Multi-label Text Categorization

GUANG Kai,PAN Jin-Gui.An kNN Algorithm Based on Vector Angle for Multi-label Text Categorization[J].Computer Science,2008,35(4):205-206.

Authors:	GUANG Kai, PAN Jin-Gui

Affiliation:	State Key Lab for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing 210093

Abstract:	In multi-label learning, each instance in the training set is associated with a set of labels, and the task is to output, for each unseen instance, a label set whose size is not known a priori. The k-nearest-neighbor (kNN) algorithm has recently been applied to multi-label categorization. In detail, each instance is transformed into a vector, and the label vector of the test instance is determined by its k nearest neighbors, which are chosen by the Euclidean distance between pairs of vectors. In this paper, a multi-label l...

Keywords:	Machine learning; Multi-label learning; Text categorization
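
The idea summarized in the abstract, choosing the k nearest neighbors by the angle between text vectors and then deriving the test document's label set from those neighbors' labels, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the neighbors' label vectors are combined, so the majority-vote rule, the `threshold` parameter, the function names, and the toy term-frequency vectors below are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_labels(
    train_vectors: List[Sequence[float]],
    train_labels: List[Set[str]],
    query: Sequence[float],
    k: int = 3,
    threshold: float = 0.5,
) -> Set[str]:
    """Predict a label set from the k training documents with the smallest angle.

    Assumption (not from the paper): a label is kept when more than
    `threshold` of the k neighbors carry it.
    """
    # A larger cosine means a smaller angle, so sort in descending order.
    ranked = sorted(
        range(len(train_vectors)),
        key=lambda i: cosine_similarity(train_vectors[i], query),
        reverse=True,
    )
    neighbors = ranked[:k]
    if not neighbors:
        return set()

    # Count how many of the selected neighbors carry each label.
    votes: Dict[str, int] = {}
    for i in neighbors:
        for label in train_labels[i]:
            votes[label] = votes.get(label, 0) + 1

    return {
        label
        for label, count in votes.items()
        if count / len(neighbors) > threshold
    }


# Hypothetical toy corpus: four documents as term-frequency vectors.
train_vectors = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [1.0, 1.0, 0.0, 1.0],
]
train_labels = [
    {"sports", "health"},
    {"sports", "health"},
    {"finance"},
    {"sports", "finance"},
]

# The query is twice as long as the first document but points the same way,
# so the angle between them is zero even though they are far apart in space.
query = [4.0, 2.0, 0.0, 0.0]
print(sorted(predict_labels(train_vectors, train_labels, query, k=3)))
# prints ['health', 'sports']
```

Ranking by cosine rather than Euclidean distance makes the neighbor choice insensitive to document length, which is the motivation the abstract gives for using the angle in text categorization.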
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.