
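Key Technologies of a Hadoop-Based Distributed Search Engine
Citation: WANG Jun-sheng, SHI Yun-mei, ZHANG Yang-sen. Key technologies of a Hadoop-based distributed search engine[J]. Journal of Beijing Institute of Machinery, 2011, 0(4): 53-56, 61
Authors: WANG Jun-sheng  SHI Yun-mei  ZHANG Yang-sen
Affiliation: School of Computer Science, Beijing Information Science and Technology University, Beijing 100101
Funding: National Natural Science Foundation of China (60873013); Key Project (Class B) of the Beijing Natural Science Foundation (KZ200811232019)
Abstract: A Hadoop-based distributed search engine is implemented, with a focus on three key technologies involved in building it: construction of the index table, word segmentation, and preprocessing prior to indexing. Experiments comparing a centralized search engine with the distributed one show that the Hadoop-based distributed search engine has a strong advantage in processing data.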

Keywords: Hadoop  distributed search engine  Map/Reduce  index table  word segmentation
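The abstract names index-table construction as a key step but does not give the authors' code. As an illustration only, the sketch below simulates in plain Python how an inverted index is commonly built in the Map/Reduce style: the map step emits (term, document) pairs, a shuffle groups them by term (the role Hadoop's shuffle/sort plays), and the reduce step turns each group into a posting list with term frequencies. The function names and the in-memory simulation are assumptions; no Hadoop API is used, and the input is assumed to be already segmented into tokens.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical helper names; this is a local simulation, not Hadoop code.

def map_phase(doc_id: str, tokens: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Map: emit one (term, doc_id) pair per token of a document."""
    for term in tokens:
        yield term, doc_id

def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group intermediate values by key, as Hadoop's shuffle/sort step would."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for term, doc_id in pairs:
        groups[term].append(doc_id)
    return groups

def reduce_phase(term: str, doc_ids: List[str]) -> Tuple[str, List[Tuple[str, int]]]:
    """Reduce: turn one term's doc-id list into a posting list of (doc_id, term frequency)."""
    counts: Dict[str, int] = defaultdict(int)
    for d in doc_ids:
        counts[d] += 1
    return term, sorted(counts.items())

def build_index(docs: Dict[str, List[str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Run map -> shuffle -> reduce over pre-segmented documents; keys come out sorted."""
    pairs = (p for doc_id, tokens in docs.items() for p in map_phase(doc_id, tokens))
    grouped = shuffle(pairs)
    return dict(reduce_phase(t, ids) for t, ids in sorted(grouped.items()))

if __name__ == "__main__":
    docs = {
        "d1": ["分布式", "搜索引擎", "Hadoop"],
        "d2": ["Hadoop", "索引", "Hadoop"],
    }
    index = build_index(docs)
    print(index["Hadoop"])  # [('d1', 1), ('d2', 2)]
```

On a real cluster the map and reduce functions would run as separate tasks over HDFS splits, so the grouping step is handled by the framework rather than by a dictionary in memory.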

Key technologies of distributed search engine based on Hadoop
WANG Jun-sheng,SHI Yun-mei,ZHANG Yang-sen. Key technologies of distributed search engine based on Hadoop[J]. Journal of Beijing Institute of Machinery, 2011, 0(4): 53-56,61
Authors:WANG Jun-sheng  SHI Yun-mei  ZHANG Yang-sen
Affiliation: (School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China)
Abstract: To overcome the bottleneck of centralized search engines, more and more researchers are turning to distributed technologies. This paper implements a distributed search engine based on Hadoop and analyses three key points of such an engine: building the index table, word segmentation, and preprocessing before indexing. Finally, experiments comparing a centralized search engine with the distributed search engine show the strength of the Hadoop-based distributed search engine in processing data.
Keywords:distributed search engine  Hadoop  Map/Reduce  index table  segmentation
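Word segmentation matters because Chinese text has no spaces between words, so terms must be identified before they can be indexed. The abstract does not say which segmentation method the authors used; the sketch below shows forward maximum matching, one common dictionary-based technique, purely as an assumed example. The dictionary and the `max_len` parameter are hypothetical.

```python
from typing import List, Set

def fmm_segment(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest dictionary word
    (up to max_len characters), falling back to a single character if none matches."""
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

if __name__ == "__main__":
    vocab = {"分布式", "搜索", "搜索引擎", "引擎", "技术"}  # hypothetical dictionary
    print(fmm_segment("分布式搜索引擎技术", vocab))  # ['分布式', '搜索引擎', '技术']
```

Preferring the longest match keeps "搜索引擎" (search engine) as one index term instead of splitting it into "搜索" and "引擎"; the trade-off is that greedy matching can mis-segment ambiguous strings, which is why production segmenters often add statistical disambiguation.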