
Design and Implementation of the JP Algorithm Based on MapReduce

Cite this article:	CAO Ze-wen, ZHOU Yao. Design and Implementation of JP Algorithm Based on MapReduce[J]. Computer Engineering, 2012, 38(24): 14-16.

Authors:	CAO Ze-wen, ZHOU Yao

Affiliation:	College of Information System and Management, National University of Defense Technology, Changsha 410073, China

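Abstract:	To address the massive scale, high dimensionality, and sparsity that large-scale text clustering has to cope with, this paper proposes a cloud-computing-based solution for clustering massive text collections. Taking the classical Jarvis-Patrick (JP) clustering algorithm as a case study, it parallelizes JP with the MapReduce programming model of a cloud computing platform and validates the result on the Hadoop platform using the corpus provided by Sogou Labs. The experimental results show that parallelizing the JP algorithm is feasible and that, compared with a single-node environment, the parallel algorithm achieves better time performance when processing large-scale text data.
Keywords:	text mining; clustering analysis; text clustering; massive data; cloud computing; parallel data mining
Received:	2012-04-16
Revised:	2012-06-14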
Design and Implementation of JP Algorithm Based on MapReduce

CAO Ze-wen, ZHOU Yao. Design and Implementation of JP Algorithm Based on MapReduce[J]. Computer Engineering, 2012, 38(24): 14-16.

Authors:	CAO Ze-wen, ZHOU Yao

Affiliation:	(College of Information System and Management, National University of Defense Technology, Changsha 410073, China)

Abstract:	This paper analyzes the problems that conventional algorithms commonly face when clustering textual data, namely massive data volume and high-dimensional, sparse feature vectors, and proposes massive text clustering based on cloud computing technology as a feasible solution. The classical Jarvis-Patrick (JP) algorithm is chosen as a case study. It is implemented with the MapReduce programming model and tested on the Hadoop cloud computing platform with the corpus provided by Sogou Labs. Experimental results indicate that the JP algorithm can be parallelized in the MapReduce framework and that the parallelized algorithm can handle massive textual data with better time performance than a single-node environment.

Keywords:	text mining; clustering analysis; text clustering; massive data; cloud computing; parallel data mining
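
The abstract describes recasting Jarvis-Patrick clustering as MapReduce jobs but does not give the job decomposition. The following is a minimal sketch of one plausible decomposition, not the authors' implementation. It assumes the common JP variant in which two documents are linked when each appears in the other's k-nearest-neighbor list and the two lists share at least kmin members, and it assumes cosine similarity over sparse term-weight vectors. The helper `run_mapreduce` is a hypothetical in-memory stand-in for a Hadoop job (map, group by key, reduce); the names `k`, `kmin`, `DOCS`, and `jarvis_patrick` are likewise illustrative.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MapReduce job."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    out = []
    for key in sorted(groups):  # sorted only to keep the sketch deterministic
        out.extend(reducer(key, groups[key]))
    return out


def jarvis_patrick(docs, k, kmin):
    # Job 1: score every document pair, then keep the k most similar per document.
    def sim_mapper(pair):
        i, j = pair
        s = cosine(docs[i], docs[j])
        yield i, (s, j)
        yield j, (s, i)

    def knn_reducer(doc_id, scored):
        top = sorted(scored, key=lambda sj: (-sj[0], sj[1]))[:k]
        yield doc_id, frozenset(j for _, j in top)

    knn = run_mapreduce(combinations(sorted(docs), 2), sim_mapper, knn_reducer)

    # Job 2: a pair key receives two neighbor lists only if the two documents
    # are mutual neighbors; link them when the lists share >= kmin members.
    def pair_mapper(rec):
        i, neighbors = rec
        for j in neighbors:
            yield tuple(sorted((i, j))), neighbors

    def link_reducer(pair, lists):
        if len(lists) == 2 and len(lists[0] & lists[1]) >= kmin:
            yield pair

    edges = run_mapreduce(knn, pair_mapper, link_reducer)

    # Driver step (sequential): clusters are connected components of the links.
    parent = {d: d for d in docs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    clusters = defaultdict(list)
    for d in docs:
        clusters[find(d)].append(d)
    return sorted(sorted(c) for c in clusters.values())


# Toy corpus: two topics with no shared terms.
DOCS = {
    "a": {"cloud": 1, "hadoop": 1},
    "b": {"cloud": 1, "hadoop": 1, "mapreduce": 1},
    "c": {"cloud": 1, "hadoop": 1, "mapreduce": 1, "cluster": 1},
    "d": {"football": 1, "goal": 1},
    "e": {"football": 1, "goal": 1, "match": 1},
    "f": {"football": 1, "goal": 1, "match": 1, "league": 1},
}
clusters = jarvis_patrick(DOCS, k=2, kmin=1)
```

On this toy corpus the two topics come out as two clusters. Scoring all document pairs in the first job is quadratic in the corpus size; a real deployment on a large corpus would likely need an inverted-index or blocking step to prune candidate pairs, which this sketch leaves out.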
This article is indexed in CNKI, Wanfang Data, and other databases.