An Improved K-means Text Clustering Algorithm Based on the Hadoop Platform

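Cite this article:	PAN Junhui, WANG Hui, ZHANG Qiang, WANG Haochang. An Improved K-means Text Clustering Algorithm Based on the Hadoop Platform[J]. Microcomputer Applications, 2022(1).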

Authors:	PAN Junhui, WANG Hui, ZHANG Qiang, WANG Haochang

Affiliation:	School of Computer and Information Technology, Northeast Petroleum University

Funding:	National Natural Science Foundation of China (61702093); Youth Science Foundation of Northeast Petroleum University (2020QNL-02).

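Abstract:	K-means is one of the most widely used algorithms for text clustering. When clustering texts, it treats every attribute as equally important, whereas in practice attributes influence a text to different degrees, which degrades the clustering result. To address this shortcoming, an improved K-means clustering algorithm that introduces attribute weights is proposed and implemented on the Hadoop platform to better demonstrate its efficiency. Experimental tests show that the improved algorithm achieves higher efficiency and accuracy.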
Keywords:	K-means; text clustering; attribute weight; Hadoop
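The abstract says only that attribute weights are introduced into K-means; it does not give the weighting formula. As a minimal sketch under that gap, the Python below takes the weights as a given input vector and folds them into a weighted squared Euclidean distance used by the assignment step. The distance form, the function names (`weighted_sq_dist`, `assign`, `update`, `weighted_kmeans`) and the toy data are illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def weighted_sq_dist(x: Sequence[float], c: Sequence[float], w: Sequence[float]) -> float:
    """Assumed distance form: sum_j w_j * (x_j - c_j)^2 (not taken from the paper)."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def assign(points: Sequence[Vector], centroids: Sequence[Vector], weights: Sequence[float]) -> List[int]:
    """Label each point with the index of its nearest centroid under the weighted distance."""
    return [
        min(range(len(centroids)), key=lambda k: weighted_sq_dist(p, centroids[k], weights))
        for p in points
    ]


def update(points: Sequence[Vector], labels: Sequence[int], centroids: Sequence[Vector]) -> List[Vector]:
    """Move each centroid to the mean of its members; an empty cluster keeps its old centroid.

    With positive weights the per-attribute mean still minimises the weighted
    squared distance, so the usual K-means update step is unchanged.
    """
    dim = len(points[0])
    new_centroids: List[Vector] = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centroids.append([sum(p[j] for p in members) / len(members) for j in range(dim)])
        else:
            new_centroids.append(list(old))
    return new_centroids


def weighted_kmeans(
    points: Sequence[Vector],
    init_centroids: Sequence[Vector],
    weights: Sequence[float],
    max_iter: int = 100,
) -> Tuple[List[int], List[Vector]]:
    """Lloyd-style K-means; `weights` is a hypothetical, externally supplied weight vector."""
    centroids = [list(c) for c in init_centroids]
    labels = assign(points, centroids, weights)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids, weights)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids
```

On the toy documents `[[0.0, 0.0], [0.2, 3.0], [3.0, 0.0], [3.2, 3.0]]` with initial centroids `[[0.0, 0.0], [0.2, 3.0]]`, equal weights `[1.0, 1.0]` split them along the second attribute (labels `[0, 1, 0, 1]`), while down-weighting that attribute with `[1.0, 0.01]` lets the first attribute dominate and yields `[0, 0, 1, 1]`. This illustrates why unequal attribute influence can change the clusters.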
An Improved Algorithm for Text Clustering Based on Hadoop Platform

PAN Junhui, WANG Hui, ZHANG Qiang, WANG Haochang. An Improved Algorithm for Text Clustering Based on Hadoop Platform[J]. Microcomputer Applications, 2022(1).

Authors:	PAN Junhui, WANG Hui, ZHANG Qiang, WANG Haochang

Affiliation:	School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China

Abstract:	The K-means algorithm is one of the most widely used algorithms for text clustering. It treats every attribute as equally important, whereas in practice different attributes affect a text to different degrees, which degrades the clustering result. To address this shortcoming, an improved K-means clustering algorithm that introduces attribute weights is proposed and implemented on the Hadoop platform to better demonstrate its efficiency. Experimental results show that the improved algorithm increases both efficiency and accuracy.

Keywords:	K-means; text clustering; attribute weights; Hadoop
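The abstract states that the algorithm is implemented on Hadoop but does not describe the job structure. One common way to parallelise a single K-means iteration with MapReduce is sketched below in plain Python: mappers assign each point to its nearest centroid, a combiner and reducer add up the assigned vectors and counts, and a driver divides to get the new centroids. The phases are simulated in-process; no Hadoop API is called. The attribute-weighted distance is an assumed form, and `kmeans_map`, `kmeans_reduce` and `run_iteration` are hypothetical names, not the authors' code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Vector = List[float]
Partial = Tuple[Vector, int]  # (sum of assigned vectors, number of vectors)


def weighted_sq_dist(x: Sequence[float], c: Sequence[float], w: Sequence[float]) -> float:
    # Assumed distance form; the paper's weighting scheme is not given in the abstract.
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def kmeans_map(split: Iterable[Vector], centroids: Sequence[Vector],
               weights: Sequence[float]) -> Iterator[Tuple[int, Partial]]:
    """Map phase: emit (nearest centroid id, (point, 1)) for every point in one input split."""
    for p in split:
        k = min(range(len(centroids)), key=lambda i: weighted_sq_dist(p, centroids[i], weights))
        yield k, (list(p), 1)


def kmeans_reduce(key: int, values: Iterable[Partial]) -> Tuple[int, Partial]:
    """Reduce phase (also valid as a combiner): add up partial sums and counts for one cluster."""
    total: Vector = []
    count = 0
    for vec, n in values:
        if not total:
            total = [0.0] * len(vec)
        total = [a + b for a, b in zip(total, vec)]
        count += n
    return key, (total, count)


def run_iteration(splits: Sequence[Sequence[Vector]], centroids: Sequence[Vector],
                  weights: Sequence[float]) -> List[Vector]:
    """Driver for one iteration: map + local combine per split, shuffle by key, reduce, divide."""
    shuffled: Dict[int, List[Partial]] = defaultdict(list)
    for split in splits:
        local: Dict[int, List[Partial]] = defaultdict(list)
        for k, v in kmeans_map(split, centroids, weights):
            local[k].append(v)
        for k, vs in local.items():  # combiner output: one partial per cluster per split
            shuffled[k].append(kmeans_reduce(k, vs)[1])
    new_centroids = [list(c) for c in centroids]  # empty clusters keep their old centroid
    for k, vs in shuffled.items():
        _, (total, count) = kmeans_reduce(k, vs)
        new_centroids[k] = [t / count for t in total]
    return new_centroids
```

The combine step is the design choice that matters on a cluster: because the reduce function only adds (vector, count) pairs, it can also run on each split, so at most one partial sum per cluster per split is shuffled. A driver would repeat `run_iteration` with the returned centroids until they stop moving.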
This article is indexed in the VIP database, among others.
