
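Research on a Search Engine for Weak-Linked Documents (弱链接文档搜索引擎研究)
Cite this article: CHEN Zhe, WEI Yan-jun. Research on Weak-Linked Document in Search Engine [J]. 电脑与微电子技术, 2013(19): 3-7.
Authors: 陈哲 (CHEN Zhe), 魏衍君 (WEI Yan-jun)
Affiliation: Department of Computer, Shangqiu Vocational and Technical College, Shangqiu 476000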
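Abstract: Clustering techniques partition large-scale data, according to the similarity of the data, into clusters that users can grasp quickly, so that users learn sooner what a large collection of documents contains. Clustering has therefore become an indispensable component of search engines and an active research topic. Weak-linked documents on the Web, such as AJAX applications and PowerPoint files, lack sufficient hyperlink information, so searches for such documents return poorly ranked results. To address this problem, the paper presents a search engine framework for weak-linked documents and focuses on a ranking algorithm for weak-linked documents that is based on web page search results. The clustering-based ranking algorithm uses a clustering algorithm to extract query-related topics from high-quality web page search results, determines the importance of each topic from the ranks of the web pages related to it, and computes the ranking score of each weak-linked document from the identified weighted topics. Experimental results show that the algorithm produces good rankings for weak-linked documents.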
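
The three steps named in the abstract (extract topics by clustering web results, weight each topic by the ranks of its pages, score weak-linked documents against the weighted topics) can be illustrated with a minimal Python sketch. The abstract does not specify the clustering algorithm, the weighting formula, or the similarity measure, so everything concrete here is an assumption made for illustration: a greedy single-pass clustering over bag-of-words vectors, a topic weight equal to the sum of 1/rank over its pages, cosine similarity, and the hypothetical names `extract_topics`, `score_document`, and `rank_weak_linked`.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization is enough for this illustration.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_topics(ranked_pages, threshold=0.3):
    """Greedy single-pass clustering of ranked web results into topics.

    ranked_pages is ordered best-first. Each topic keeps a term-frequency
    centroid and a weight; the weight (sum of 1/rank over member pages)
    is an assumed formula, not the one used in the paper.
    """
    topics = []
    for rank, text in enumerate(ranked_pages, start=1):
        vec = Counter(tokenize(text))
        best = max(topics, key=lambda t: cosine(vec, t["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["centroid"].update(vec)   # merge the page into the topic
            best["weight"] += 1.0 / rank   # higher-ranked pages count more
        else:
            topics.append({"centroid": vec, "weight": 1.0 / rank})
    return topics


def score_document(doc_text, topics):
    # Ranking score: topic-weighted sum of similarities to each topic.
    vec = Counter(tokenize(doc_text))
    return sum(t["weight"] * cosine(vec, t["centroid"]) for t in topics)


def rank_weak_linked(docs, ranked_pages):
    # Order weak-linked documents by descending score.
    topics = extract_topics(ranked_pages)
    return sorted(docs, key=lambda d: score_document(d, topics), reverse=True)


if __name__ == "__main__":
    pages = [
        "python clustering tutorial",
        "python clustering examples",
        "cooking pasta recipe",
    ]
    slides = ["pasta recipe slides", "clustering tutorial slides"]
    for doc in rank_weak_linked(slides, pages):
        print(doc)
```

In this toy input the two top-ranked pages merge into one heavily weighted topic, so a slide deck that matches that topic outranks one that matches only the topic of the third-ranked page. No hyperlink information about the slide decks themselves is used, which is the point of borrowing ranking evidence from ordinary web results.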

Keywords: search engine; clustering technology; weak-linked document

Research on Weak-Linked Document in Search Engine
Authors:CHEN Zhe  WEI Yan-jun
Affiliation:(Department of Computer, Shangqiu Vocational and Technical College, Shangqiu 476000)
Abstract: Clustering technology partitions a large number of documents into a small number of clusters according to document similarity. The generated clusters help people understand the documents quickly. Clustering therefore plays an important role in search engines and attracts considerable interest from both industry and academia. Current search engines cannot rank weak-linked documents such as PowerPoint files and AJAX applications well, and therefore return either completely irrelevant results or poorly ranked documents when searching for these files. This paper proposes a novel clustering-based framework for correctly retrieving and ranking weak-linked documents. The experiments show that our approach considerably improves on the result quality of current search engines and of latent semantic indexing.
Keywords:Search Engine  Clustering Technology  Weak-Linked Document
This article has been indexed by 维普 and other databases.