Research on a Concept-Based Algorithm for Processing Web Page Similarity

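Cite this article:	GUO Chen-juan, LI Zhan-huai. Research on a concept-based algorithm for processing Web page similarity[J]. Journal of Computer Applications (计算机应用), 2006, 26(12): 3030-3032.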

Authors:	GUO Chen-juan (郭晨娟), LI Zhan-huai (李战怀)

Affiliation:	School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710072

Funding:	Northwestern Polytechnical University research and teaching-reform project

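Abstract:	For massive volumes of Web page information, this paper proposes a Web page similarity processing algorithm suited to search engines. Working from concepts abstracted from Web pages, the algorithm builds a similarity processing model on top of an inverted file. The model narrows the set of Web documents for which similarity must be computed, saving substantial time and space, and lays a good foundation for optimizing similarity computation.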
Keywords:	similar Web pages; concept extraction; cluster analysis; duplicate elimination
Article ID:	1001-9081(2006)12-3030-03
Received:	2006-06-21
Revised:	2006-06-21; 2006-08-27
Concept based algorithm of dealing near-replicas of documents on the Web

GUO Chen-juan, LI Zhan-huai. Concept based algorithm of dealing near-replicas of documents on the Web[J]. Journal of Computer Applications, 2006, 26(12): 3030-3032.

Authors:	GUO Chen-juan, LI Zhan-huai

Abstract:	To handle near-replica documents on the Web collected by search engines, a similarity processing algorithm is proposed. Using concepts extracted from Web pages together with an inverted file, the algorithm builds a model that shrinks the set of Web pages that must be compared. This saves a great deal of time and space and provides a good foundation for near-replica detection.

Keywords:	near-replica documents; concept extraction; cluster analysis; near-replica detection
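
The abstract describes narrowing the set of pages to compare by indexing extracted concepts in an inverted file. The abstract does not give the paper's concept extraction method or its similarity model, so the following is only a minimal sketch of the general idea under stated assumptions: each page is already reduced to a set of concept strings, candidates are pages sharing at least `min_shared` concepts, and Jaccard similarity stands in for whatever measure the authors use. All function names, parameters, and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(page_concepts):
    """Map each concept to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, concepts in page_concepts.items():
        for concept in concepts:
            index[concept].add(page_id)
    return index


def candidate_pairs(index, min_shared=2):
    """Return page pairs that share at least `min_shared` concepts.

    Only pages that co-occur in some posting list are ever paired,
    so unrelated pages are never compared.
    """
    shared = defaultdict(int)
    for pages in index.values():
        for a, b in combinations(sorted(pages), 2):
            shared[(a, b)] += 1
    return {pair for pair, count in shared.items() if count >= min_shared}


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed stand-in measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(page_concepts, min_shared=2, threshold=0.5):
    """Score only the candidate pairs and keep those above the threshold."""
    index = build_inverted_index(page_concepts)
    result = []
    for a, b in sorted(candidate_pairs(index, min_shared)):
        score = jaccard(page_concepts[a], page_concepts[b])
        if score >= threshold:
            result.append((a, b, score))
    return result


# Hypothetical sample data: page id -> extracted concepts.
pages = {
    "p1": {"search", "engine", "index", "crawler"},
    "p2": {"search", "engine", "index", "ranking"},
    "p3": {"football", "league", "score"},
    "p4": {"search", "football"},
}
print(near_duplicates(pages))
```

With four pages there are six possible pairs, but only ("p1", "p2") shares two or more concepts, so only that pair is scored. This is the kind of saving the abstract attributes to the inverted-file model, though the actual reduction depends on the concept vocabulary and thresholds chosen.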
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.