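Design and implementation of a topic-specific search engine for digital libraries*
Cite this article: LIN Qi-dong, CHEN Chuan-bo, ZHENG Le-dan, ZHANG Yi-man. Design and implementation of a topic-specific search engine for digital libraries*[J]. 计算机应用研究 (Application Research of Computers), 2009, 26(8): 2952-2955. DOI: 10.3969/j.issn.1001-3695.2009.08.044
Authors: LIN Qi-dong  CHEN Chuan-bo  ZHENG Le-dan  ZHANG Yi-man (b)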
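Affiliations: 1. Library, Wenzhou University, Wenzhou, Zhejiang 325035
2. College of Software, Huazhong University of Science and Technology, Wuhan 430074
3. Oujiang College, Wenzhou University, Wenzhou, Zhejiang 325035
Funding: Wenzhou University research fund project (2007L029)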
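Abstract: This paper proposes an overall system design for a topic-specific search engine for digital libraries. A preprocessing system selects seed sites of the highest available quality, thus producing the Web topic definition data. Coordinated by the system controller, the topic crawlers synchronously collect the Web resources recommended by the crawlers and perform text classification and topic identification on the downloaded resources. The downloaded Web resources are stored by discipline in the Web topic resource repository, indexed through a global information base, and exposed through a common interface for topic-based retrieval. Drawing on the characteristics of digital libraries, the paper proposes a design for a multithreaded topic crawler and a novel URL topic-relevance pruning algorithm, EPR, providing an important design for implementing a prototype topic-specific search engine for digital libraries. Based on the open-source Lucene platform…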

Keywords: digital library   topic   crawler   search engine   EPR algorithm

Design and implementation of search engine system for digital library
LIN Qi-dong, CHEN Chuan-bo, ZHENG Le-dan, ZHANG Yi-man. Design and implementation of search engine system for digital library[J]. Application Research of Computers, 2009, 26(8): 2952-2955. DOI: 10.3969/j.issn.1001-3695.2009.08.044
Authors: LIN Qi-dong  CHEN Chuan-bo  ZHENG Le-dan  ZHANG Yi-man (b)
Affiliation: 1. a. Library; b. College of Oujiang, Wenzhou University, Wenzhou, Zhejiang 325000, China; 2. College of Software, Huazhong University of Science & Technology, Wuhan 430074, China
Abstract: This paper proposes an overall system design for a topic-specific search engine for digital libraries. A preprocessing system selects high-quality seed sites, producing the Web topic definition data. Coordinated by a system controller, the topic crawlers synchronously collect the Web resources recommended by the crawlers, then classify the text and identify the topic of each downloaded resource, which is stored in a Web topic resource database by discipline. Users can search the topic resources through the index of the global information database. Drawing on the characteristics of digital libraries, the paper presents a design for a multithreaded topic-specific crawler and a novel URL pruning algorithm, EPR, as a basis for implementing a prototype topic-specific search engine for digital libraries. The final system was built by extending the open-source Lucene platform. Experimental results show that the work is effective, especially the EPR algorithm, which is novel and valuable in a real application environment.
Keywords: digital library   topic-specific   crawler   search engine   EPR algorithm
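
The crawl pipeline outlined in the abstract (seed sites, multithreaded topic crawlers, relevance-based URL pruning, a topic resource store) can be illustrated with a minimal Python sketch. This page does not describe how EPR works, so a simple keyword-overlap score stands in for it; the sketch is not the authors' algorithm. `FAKE_WEB`, `TOPIC_TERMS`, `relevance`, and `crawl` are hypothetical names, and an in-memory dictionary replaces real HTTP fetching.

```python
import threading
from queue import PriorityQueue

# Hypothetical in-memory "Web": url -> (page text, outgoing links).
FAKE_WEB = {
    "seed": ("digital library catalog search", ["a", "b"]),
    "a": ("library metadata index retrieval", ["c"]),
    "b": ("football scores and celebrity news", ["d"]),
    "c": ("digital library search engine crawler", []),
    "d": ("library opening hours", []),
}
# Assumed topic definition data: a flat set of topic terms.
TOPIC_TERMS = {"digital", "library", "search", "index", "retrieval", "crawler"}


def relevance(text):
    """Fraction of words that are topic terms (a stand-in score, NOT EPR)."""
    words = text.lower().split()
    return sum(w in TOPIC_TERMS for w in words) / len(words) if words else 0.0


def crawl(seeds, threshold=0.5, workers=4):
    """Crawl FAKE_WEB from the seeds; return {url: score} for on-topic pages."""
    frontier = PriorityQueue()   # lower value = higher priority
    seen = set(seeds)            # URLs already queued
    store = {}                   # stands in for the topic resource repository
    lock = threading.Lock()      # guards seen and store
    for seed in seeds:
        frontier.put((0.0, seed))

    def worker():
        while True:
            _, url = frontier.get()
            try:
                text, links = FAKE_WEB.get(url, ("", []))
                score = relevance(text)
                if score >= threshold:
                    with lock:
                        store[url] = score
                        new_links = [l for l in links if l not in seen]
                        seen.update(new_links)
                    # Links found on more relevant pages are fetched first.
                    for link in new_links:
                        frontier.put((-score, link))
                # Pages below the threshold are pruned: their links are dropped.
            finally:
                frontier.task_done()

    for _ in range(workers):
        threading.Thread(target=worker, daemon=True).start()
    frontier.join()              # returns once every queued URL is processed
    return store


if __name__ == "__main__":
    print(sorted(crawl(["seed"])))
```

In this toy graph the off-topic page `b` is pruned, so its outgoing link `d` is never queued even though `d` mentions the topic. That is the usual trade-off of pruning by the relevance of the linking page.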
This article is indexed in CNKI, Wanfang Data (万方数据), and other databases.