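A Clustering Algorithm Based on Search Results
Cite this article: LUO Zhao-hang (罗钊航), LI Xu-wei (李旭伟). A Clustering Algorithm Based on Search Results[J]. 计算机与现代化 (Computer and Modernization), 2012(11): 35-38.
Authors: LUO Zhao-hang (罗钊航), LI Xu-wei (李旭伟)
Affiliation: College of Computer Science, Sichuan University, Chengdu, Sichuan 610065
Abstract: Current search engines return a large number of redundant results and offer no guided classification of them. This paper proposes a density-based clustering algorithm that effectively clusters, optimizes, and classifies search results. The algorithm selects the web pages in the search results whose weight exceeds a given threshold, extracts each page's feature values and candidate keywords, marks the feature range, and then compares page similarity to eliminate redundant pages as far as possible. It also provides classifications based on the pages' candidate keywords, thereby improving the precision of the search results and user satisfaction with them, and making the search more intelligent.

Keywords: density-based clustering algorithm; web page similarity; clustering; redundant web pages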

Optimization of Search Results Based on Clustering Algorithm
LUO Zhao-hang, LI Xu-wei. Optimization of Search Results Based on Clustering Algorithm[J]. Computer and Modernization, 2012(11): 35-38.
Authors: LUO Zhao-hang, LI Xu-wei
Affiliation: College of Computer Science, Sichuan University, Chengdu 610065, China
Abstract: Search engine results currently contain many redundant pages and are not classified. This paper proposes an optimization algorithm for web page search results, based on an improved DBSCAN (density-based spatial clustering of applications with noise) algorithm, that effectively clusters and classifies the results. The algorithm selects the web pages whose search weight exceeds a given threshold, extracts each page's feature values and candidate keywords, and compares page similarity to eliminate as many duplicate and redundant pages as possible. It also provides classifications based on the pages' candidate keywords, thereby improving the precision of the search engine and user satisfaction with it, and making the search more intelligent.
Keywords: DBSCAN algorithm; page similarity; clustering; redundant pages
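
The pipeline outlined in the abstract (weight filtering, feature extraction, similarity comparison, redundancy removal, keyword-based classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe their improvements to DBSCAN, so the code below uses plain DBSCAN. The Jaccard distance over term sets, the field names `weight`, `terms`, and `keywords`, the thresholds, and the rule that the highest-weight page represents its cluster are all assumptions made for illustration.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two term sets (assumed similarity measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def region_query(features, i, eps):
    """Indices of all pages within distance eps of page i (page i included)."""
    return [j for j in range(len(features))
            if jaccard_distance(features[i], features[j]) <= eps]


def dbscan(features, eps, min_pts):
    """Plain DBSCAN over term sets; returns one label per page, -1 for noise."""
    NOISE = -1
    labels = [None] * len(features)  # None means "not visited yet"
    cluster = -1
    for i in range(len(features)):
        if labels[i] is not None:
            continue
        neighbors = region_query(features, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core page becomes a border page
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region_query(features, j, eps)
            if len(reachable) >= min_pts:  # j is a core page, so keep expanding
                queue.extend(reachable)
    return labels


def cluster_results(pages, min_weight=0.5, eps=0.5, min_pts=2):
    """Filter by weight, cluster by similarity, keep one page per cluster.

    Hypothetical page format: {"url", "weight", "terms", "keywords"}.
    """
    kept = [p for p in pages if p["weight"] >= min_weight]
    labels = dbscan([set(p["terms"]) for p in kept], eps, min_pts)

    groups = {}
    for idx, (page, label) in enumerate(zip(kept, labels)):
        key = label if label != -1 else ("noise", idx)  # noise pages stay separate
        groups.setdefault(key, []).append(page)

    results = []
    for members in groups.values():
        members = sorted(members, key=lambda p: p["weight"], reverse=True)
        counts = Counter(k for p in members for k in p["keywords"])
        # Most frequent candidate keyword names the category; ties break alphabetically.
        category = max(sorted(counts), key=counts.get) if counts else None
        results.append({
            "category": category,
            "representative": members[0]["url"],
            "redundant": [p["url"] for p in members[1:]],
        })
    return results
```

In this sketch, `eps` controls how similar two pages must be to count as neighbors and `min_pts` controls how many neighbors make a page a core point; both would need tuning against real search results. Pages labeled as noise are kept as single-page results rather than discarded, which is also an assumption.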
This article is indexed in CNKI, VIP (维普), and other databases.