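Research and Implementation of a Web Search Engine Based on Lucene
Cite this article: ZHOU Feng-li, LIN Xiao-li. Research and Implementation of a Web Search Engine Based on Lucene[J]. Microcomputer Development, 2012, 0(1): 140-142,160
Authors: ZHOU Feng-li, LIN Xiao-li
Affiliation: Information Engineering Department, Wuhan University of Science and Technology City Institute, Wuhan, Hubei 430083
Funding: Hubei Province Education Science "Eleventh Five-Year Plan" project, approved in 2009 (20098236)
Abstract: The rapid growth of the Internet has driven the continued development of search engines, while their shift toward commercial operation has made their technical details increasingly opaque. This paper studies and analyzes the principles, model, and indexer of the search engine toolkit Lucene and designs a search engine system. The system crawls the pages of Web sites non-recursively, handles the storage and processing of URL links during crawling, and uses multithreading to manage multiple fetch threads, so that pages are fetched concurrently and the system runs more efficiently. Finally, a simple news search engine client is built with JSP. The system runs stably, serves as a basic exploration of how search engines work, and has some practical value.

Keywords: Web crawler; application system; search engine; multithreading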

Research and Implementation of Web Search Engine Based on Lucene
ZHOU Feng-li,LIN Xiao-li. Research and Implementation of Web Search Engine Based on Lucene[J]. Microcomputer Development, 2012, 0(1): 140-142,160
Authors:ZHOU Feng-li  LIN Xiao-li
Affiliation: Information Engineering Department, Wuhan University of Science and Technology City Institute, Wuhan 430083, China
Abstract: Search engines have developed steadily along with the Internet, but their gradual shift to commercial operation has made their technical details more and more hidden. Based on research and analysis of the system structure, model, and indexer of Lucene, this paper implements a search engine system. The system uses a non-recursive approach to crawl Web pages and to store and handle the URL links found during crawling. It manages multiple crawling threads with multithreading, which allows Web pages to be crawled concurrently and improves the system's operating efficiency. Finally, JSP is used to design a simple news search engine client. The system runs stably, is broadly in line with search engine principles, and has some practical significance.
Keywords:Web spider  application system  search engine  multi-threading
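
The abstract does not give the crawler's implementation, so the following is only a minimal sketch of the idea it describes: a non-recursive crawl that keeps discovered URLs in an explicit frontier and hands page fetches to a pool of threads. The class name `FrontierCrawler`, the `fetchLinks` parameter, and the in-memory link map used in `main` are all assumptions made for illustration; a real crawler would fetch pages over HTTP and pass their text to a Lucene indexer, which is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class FrontierCrawler {

    /**
     * Crawls breadth-first from the seed without recursion.
     * fetchLinks is a hypothetical stand-in for "download the page and
     * extract its outgoing links".
     */
    public static Set<String> crawl(String seed,
                                    Function<String, List<String>> fetchLinks,
                                    int threads) throws Exception {
        // URLs already scheduled, so that no page is fetched twice.
        Set<String> visited = ConcurrentHashMap.newKeySet();
        visited.add(seed);

        // The explicit frontier replaces the call stack of a recursive crawler.
        List<String> frontier = new ArrayList<>();
        frontier.add(seed);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            while (!frontier.isEmpty()) {
                // Fetch every URL of the current level concurrently.
                List<Future<List<String>>> pending = new ArrayList<>();
                for (String url : frontier) {
                    Callable<List<String>> task = () -> fetchLinks.apply(url);
                    pending.add(pool.submit(task));
                }
                // Collect the links and keep only unseen ones for the next level.
                List<String> next = new ArrayList<>();
                for (Future<List<String>> result : pending) {
                    for (String link : result.get()) {
                        if (visited.add(link)) {
                            next.add(link);
                        }
                    }
                }
                frontier = next;
            }
        } finally {
            pool.shutdown();
        }
        return visited;
    }

    public static void main(String[] args) throws Exception {
        // Toy link graph standing in for a real site (illustrative data only).
        Map<String, List<String>> site = Map.of(
                "/index", List.of("/news", "/about"),
                "/news", List.of("/news/1", "/index"),
                "/about", List.of("/index"),
                "/news/1", List.of());
        Set<String> pages = crawl("/index",
                url -> site.getOrDefault(url, List.of()), 4);
        System.out.println(pages.size() + " pages reached");
    }
}
```

Processing the frontier one level at a time keeps termination simple: the loop ends when a level yields no unseen URLs. A shared blocking queue consumed by long-running worker threads would be another way to arrange the same idea.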
This article is indexed in VIP (维普) and other databases.