Similar articles
20 similar articles found.
1.
2.
While small-scale search engines in specific domains and languages are increasingly used by Web users, most existing search engine development tools do not support the development of search engines in languages other than English, cannot be integrated with other applications, or rely on proprietary software. A tool that supports search engine creation in multiple languages is thus highly desirable. To study the research issues involved, we review the related literature and suggest criteria for an ideal search tool. We present the design of a toolkit, called SpidersRUs, developed for multilingual search engine creation. The design and implementation of the tool, which consists of a Spider module, an Indexer module, an Index Structure, a Search module, and a Graphical User Interface module, are discussed in detail. A sample user session and a case study on using the tool to develop a Chinese-language medical search engine are also presented. The technical issues involved and the lessons learned in the project are then discussed. This study demonstrates that the proposed architecture makes it feasible to develop search engines easily in different languages, such as Chinese, Spanish, Japanese, and Arabic.
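The Indexer, Index Structure, and Search modules described above can be illustrated with a minimal in-memory inverted index. This is a sketch under our own assumptions, not the SpidersRUs implementation: it lowercases and splits space-delimited scripts (English, Spanish, Arabic) into words and breaks runs of Chinese or Japanese characters into overlapping character bigrams, a common fallback when no word segmenter is available. The names `tokenize` and `InvertedIndex` are hypothetical.

```python
import re
from collections import defaultdict

# Runs of Japanese kana and CJK unified ideographs; everything else is split on \w+.
CJK_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]+")
WORD = re.compile(r"\w+")


def tokenize(text):
    """Lowercased words for space-delimited scripts, overlapping bigrams for CJK runs."""
    tokens = []
    for chunk in CJK_RUN.split(text):
        tokens.extend(w.lower() for w in WORD.findall(chunk))
    for run in CJK_RUN.findall(text):
        if len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


class InvertedIndex:
    """Maps each term to {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query):
        """Conjunctive (AND) query, ranked by summed term frequency, ties broken by doc_id."""
        terms = tokenize(query)
        if not terms:
            return []
        docs = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            docs &= set(self.postings.get(term, {}))
        return sorted(docs, key=lambda d: (-sum(self.postings[t][d] for t in terms), d))


idx = InvertedIndex()
idx.add(1, "Chinese medical search engine")
idx.add(2, "医学搜索引擎")
idx.add(3, "Motor de búsqueda médico")
print(idx.search("搜索引擎"))  # [2]
print(idx.search("búsqueda"))  # [3]
```

Bigram indexing trades a larger index for language independence; a production multilingual engine would more likely plug in a per-language segmenter and stemmer.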

3.
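Web Data Mining Techniques in Search Engines   Total citations: 4, self-citations: 0, citations by others: 4
The World Wide Web contains a vast amount of information and grows more complex as it expands rapidly, making it increasingly difficult for users to locate relevant, high-quality information. Applying Web data mining techniques to search engines can greatly improve their search efficiency and search quality. This paper proposes a specific algorithm and describes its application in search engines.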

4.
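Research and Design of an Agent-Based Meta-Search Engine   Total citations: 20, self-citations: 2, citations by others: 18
This paper proposes and presents an agent-based meta-search engine system designed to help Internet users find information that matches their needs quickly and accurately. The system adopts a meta-search engine architecture with agents as its basic building blocks, using the autonomy and cooperation of agents to carry out personalized Internet information search for each user. In the system design, a scheduling strategy for member search engines based on user preferences is proposed, which improves the system's performance and usability. Finally, the paper analyzes the significance of the system and the problems that remain to be solved.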
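A preference-based scheduling strategy for member engines could look roughly like the sketch below. The abstract does not give the actual strategy, so this is our own assumption: each user keeps a score per member engine, nudged toward 1 or 0 by an exponential moving average of satisfaction feedback, and the query is dispatched only to the top-k engines. `UserProfile`, `record_feedback`, and `schedule` are hypothetical names, and the agent layer itself is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical per-user preference score in [0, 1] for each member search engine."""
    scores: dict = field(default_factory=dict)
    alpha: float = 0.3    # assumed learning rate of the moving average
    default: float = 0.5  # assumed prior for engines the user has not rated yet

    def score(self, engine):
        return self.scores.get(engine, self.default)

    def record_feedback(self, engine, satisfied):
        """Move the engine's score toward 1 if the user liked its results, else toward 0."""
        target = 1.0 if satisfied else 0.0
        self.scores[engine] = (1 - self.alpha) * self.score(engine) + self.alpha * target


def schedule(profile, engines, k=2):
    """Dispatch the query only to the k member engines this user currently prefers."""
    return sorted(engines, key=lambda e: (-profile.score(e), e))[:k]


profile = UserProfile()
profile.record_feedback("EngineA", satisfied=True)   # about 0.5 -> 0.65
profile.record_feedback("EngineB", satisfied=False)  # about 0.5 -> 0.35
print(schedule(profile, ["EngineA", "EngineB", "EngineC"]))  # ['EngineA', 'EngineC']
```

Giving unrated engines a neutral prior keeps new member engines in rotation until feedback accumulates.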

5.
This study analyzes users' queries directed at different search engines to investigate trends and suggest better search engine capabilities. The distribution of queries among search engines, including query spawning, the number of terms per query, and query length, is discussed to highlight the principal factors affecting a user's choice of search engine and to evaluate why query lengths vary. The results could help search engine service providers develop long- and short-term business plans, for example to decide whether to offer more focused, topic-specific search in order to gain market share.
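The per-query measures mentioned above (terms per query, query length, and share of queries per engine) are simple to compute from a query log. The sketch below assumes a log of `(engine, query)` pairs and treats whitespace-separated tokens as terms; both are our assumptions, and query spawning is not modeled. `query_stats` is a hypothetical name.

```python
from collections import Counter
from statistics import mean


def query_stats(log):
    """log: list of (engine, query) pairs. Terms are whitespace-separated tokens (assumed)."""
    term_counts = [len(query.split()) for _, query in log]
    engine_counts = Counter(engine for engine, _ in log)
    return {
        "mean_terms": mean(term_counts),
        "mean_chars": mean(len(query) for _, query in log),
        "terms_histogram": dict(Counter(term_counts)),
        "engine_share": {e: n / len(log) for e, n in engine_counts.items()},
    }


log = [("A", "search engine"), ("A", "python"), ("B", "long tail query terms")]
stats = query_stats(log)
print(stats["terms_histogram"])  # {2: 1, 1: 1, 4: 1}
```

Whitespace splitting undercounts terms for unsegmented languages such as Chinese, so a real analysis would need a language-aware tokenizer.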

6.
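Search Engine Selection Based on Related Term Sets   Total citations: 1, self-citations: 0, citations by others: 1
Ou Jie 《Computer Science》2003,30(7):56-59
1 Introduction  Since its emergence in 1991, the Web has grown into a vast global information space, and the volume of information it holds is still growing exponentially. Given these massive Web information resources, retrieving Web information effectively, that is, helping users find the subset of documents in a large collection that is useful for a given query, has become an important and pressing research problem.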

7.
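This paper first introduces the basic principles and architecture of traditional search engines and points out their shortcomings, and then describes the definition, operating mechanism, and development directions of meta-search engines. On this theoretical basis, it proposes a user-based scheduling improvement for a new generation of meta-search engines. Experiments show that the improvement raises the efficiency and quality of users' retrieval.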

8.
Voting techniques for expert search   Total citations: 2, self-citations: 2, citations by others: 2
In an expert search task, the user's need is to identify people who have relevant expertise on a topic of interest. An expert search system predicts and ranks the expertise of a set of candidate persons with respect to the user's query. In this paper, we propose a novel approach for predicting and ranking candidate expertise with respect to a query, called the Voting Model for Expert Search. In the Voting Model, we view the problem of ranking experts as a voting problem, which we model using 12 different voting techniques inspired by the data fusion field. We investigate the effectiveness of the Voting Model and the associated voting techniques across a range of document weighting models, in the context of the TREC 2005 and TREC 2006 Enterprise tracks. The evaluation results show that the voting paradigm is very effective without using any query- or collection-specific heuristics. Moreover, we show that improving the quality of the underlying document representation can significantly improve the retrieval performance of the voting techniques on an expert search task. In particular, we demonstrate that applying field-based weighting models improves the ranking of candidates. Finally, we demonstrate that the relative performance of the voting techniques for the proposed approach is stable on a given task regardless of the weighting model used, suggesting that some of the proposed voting techniques will always perform better than others. Extended version of ‘Voting for candidates: adapting data fusion techniques for an expert search task’. C. Macdonald and I. Ounis. In Proceedings of ACM CIKM 2006, Arlington, VA. 2006. doi: 10.1145/1183614.1183671.
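The voting idea can be sketched as follows: each retrieved document that belongs to a candidate's profile counts as a vote for that candidate, and a data fusion technique turns those votes into a score. Only four well-known techniques are shown (Votes, CombSUM, CombMNZ, CombMAX); this is an illustrative sketch, not the paper's full set of 12, and `rank_candidates` is a hypothetical name.

```python
def rank_candidates(doc_scores, profiles, technique="CombMNZ"):
    """
    doc_scores: {doc_id: score} for the documents retrieved for the query, R(Q).
    profiles:   {candidate: set of doc_ids associated with that candidate}.
    Returns [(candidate, score), ...] sorted by score, ties broken by name.
    """
    results = {}
    for cand, docs in profiles.items():
        voted = [doc_scores[d] for d in docs if d in doc_scores]  # votes from R(Q)
        if not voted:
            continue  # candidates with no retrieved documents are not ranked
        if technique == "Votes":
            score = len(voted)                 # number of votes
        elif technique == "CombSUM":
            score = sum(voted)                 # sum of document scores
        elif technique == "CombMNZ":
            score = len(voted) * sum(voted)    # votes times summed score
        elif technique == "CombMAX":
            score = max(voted)                 # best single document
        else:
            raise ValueError(f"unknown technique: {technique}")
        results[cand] = score
    return sorted(results.items(), key=lambda kv: (-kv[1], kv[0]))


doc_scores = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
profiles = {"alice": {"d1"}, "bob": {"d2", "d3", "d9"}}
print(rank_candidates(doc_scores, profiles, "Votes"))    # [('bob', 2), ('alice', 1)]
print(rank_candidates(doc_scores, profiles, "CombMAX"))  # [('alice', 3.0), ('bob', 2.0)]
```

The toy example shows why the choice of technique matters: counting votes favors the candidate with more supporting documents, while CombMAX favors the one with the single best document.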

9.
A Probabilistic Approach for Distillation and Ranking of Web Pages   Total citations: 1, self-citations: 0, citations by others: 1
Gianluigi Greco, Sergio Greco, Ester Zumpano 《World Wide Web》2001,4(3):189-207
A great number of recent papers have investigated the possibility of introducing more effective and efficient algorithms for search engines. In traditional search engines, ranking is carried out using textual information only, and, as several works have shown, such rankings are not very useful for extracting relevant information. Recent research has instead taken a new approach, called Topic Distillation, whose main task is to find relevant documents using a different similarity criterion: retrieved documents are those related to the query topic, but they do not necessarily contain the query string. Current algorithms for topic distillation first compute a base set containing all the relevant pages and then obtain the authoritative pages by applying an iterative procedure. In this paper, we present a different approach, which computes the authoritative pages by analyzing the structure of the base set. The technique applies a statistical approach to the co-citation matrix of the base set to find the most co-cited pages, and combines link analysis with evaluation of page content. Several experiments have shown the validity of our approach.
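The co-citation matrix mentioned above has a simple definition: pages a and b are co-cited once for every page that links to both, which corresponds to the off-diagonal entries of AᵀA for the base set's adjacency matrix A. The sketch below only computes those counts and ranks pages by total co-citation; the paper's statistical analysis and content evaluation are not reproduced, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cocitation(links):
    """links: {page: set of pages it links to}.
    Returns {(a, b): count} with a < b, where count is the number of pages linking to both."""
    counts = defaultdict(int)
    for targets in links.values():
        for a, b in combinations(sorted(targets), 2):
            counts[(a, b)] += 1
    return dict(counts)


def most_cocited(links, top=3):
    """Rank pages by their total co-citation with all other pages in the base set."""
    totals = defaultdict(int)
    for (a, b), n in cocitation(links).items():
        totals[a] += n
        totals[b] += n
    return sorted(totals, key=lambda p: (-totals[p], p))[:top]


base_set = {"p1": {"x", "y"}, "p2": {"x", "y", "z"}, "p3": {"y", "z"}}
print(cocitation(base_set))    # {('x', 'y'): 2, ('x', 'z'): 1, ('y', 'z'): 2}
print(most_cocited(base_set))  # ['y', 'x', 'z']
```

Iterating over each page's out-links avoids materializing the full matrix, which matters when the base set is large and sparse.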

10.
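With the rapid development of the Internet, online information has grown explosively. Web information retrieval is a set of techniques for retrieving information of interest to users from massive Web data. It meets users' information needs to some extent, but the number of returned pages is still enormous, so how search results are ranked has become an important factor in search quality. This paper introduces two page ranking algorithms, PageRank and HITS, and discusses several improvements to web page ranking algorithms.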
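As a reference point for the PageRank algorithm named above, here is a minimal power-iteration sketch. The damping factor d = 0.85 is the commonly used value, and spreading the rank of dangling pages (pages with no out-links) uniformly is one standard convention; both are assumptions here, and HITS is not shown.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=100):
    """links: {page: list of pages it links to}. Returns {page: rank}, ranks summing to 1."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed uniformly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = d * rank[p] / len(outs)  # split p's rank over its out-links
                for q in outs:
                    new[q] += share
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


ranks = pagerank({"a": ["c"], "b": ["c"], "c": []})
print(max(ranks, key=ranks.get))  # c
```

Because both the teleport term and the dangling mass are spread over all n pages, the ranks keep summing to 1 at every iteration.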

11.
An inverted index is a core data structure of Information Retrieval systems, especially search engines. As search environments have become more dynamic, many on-line index maintenance strategies have been proposed. Previous strategies were designed for HDDs; consequently, to avoid expensive random accesses, Merge-based strategies have been preferred to In-place index update strategies on HDDs. However, flashSSDs have become solid alternatives to HDDs and are currently adopted in a wide range of areas thanks to superior features such as short access latency, energy efficiency, and high bandwidth. In this article, we first re-examine the potential of In-place index update strategies on flashSSDs. Thanks to the insignificant access latency of flashSSDs, we found that In-place index update strategies outperform Merge-based strategies: although they induce frequent random accesses, they generate far less I/O. Based on this finding, we propose a new inverted index maintenance strategy for flashSSDs, based on an In-place index update strategy and called the Multipath Flash In-place Strategy (MFIS). To improve index maintenance performance, MFIS stores the posting list of each term non-contiguously and exploits the internal parallelism of flashSSDs. Thus, MFIS not only induces the minimum amount of I/O but also uses the maximum bandwidth of flashSSDs. Furthermore, MFIS is designed to achieve high query processing performance by exploiting the internal parallelism of flashSSDs, even though the posting list of each term is stored non-contiguously. In our experiments, the index maintenance performance of MFIS was considerably better than that of previous maintenance strategies.
Its index maintenance performance was up to 14.93, 4.04, 5.12, and 2.33 times higher than that of the Merge-based strategies Immediate Merge, Geometric Partitioning, Hybrid, and SSD-aware Hybrid, respectively. The query processing performance of MFIS was up to 1.62 times higher than that of the non-contiguous In-place strategy, and nearly as good as that of the Merge-based strategies. In conclusion, MFIS is the best on-line inverted index maintenance strategy on flashSSDs in terms of both index maintenance and query processing performance.
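The I/O trade-off described above can be made concrete with a deliberately simplified model of our own (not MFIS): under Immediate Merge, every flush of the in-memory buffer rewrites the whole on-disk index in one large sequential write, while a non-contiguous In-place policy writes only the new postings, as one separate chunk per updated term. Reads, Geometric Partitioning, and device parallelism are not modeled, and `simulate` is a hypothetical name.

```python
def simulate(batches):
    """
    batches: list of {term: number_of_new_postings}, one buffer flush per entry.
    Returns {"merge": (postings_written, write_ops), "inplace": (postings_written, write_ops)}.
    """
    on_disk = 0
    merge_written = merge_ops = 0
    inplace_written = inplace_ops = 0
    for batch in batches:
        new = sum(batch.values())
        on_disk += new
        merge_written += on_disk    # the old index is rewritten together with the new postings
        merge_ops += 1              # one large sequential write per flush
        inplace_written += new      # only the new postings are written
        inplace_ops += len(batch)   # one scattered chunk write per updated term
    return {"merge": (merge_written, merge_ops), "inplace": (inplace_written, inplace_ops)}


print(simulate([{"a": 2, "b": 1}, {"a": 1}, {"c": 3}]))
# {'merge': (14, 3), 'inplace': (7, 4)}
```

In this toy model the merge policy writes twice as many postings in fewer, larger operations. That favors merging on an HDD, where each random write pays a seek, but favors in-place updates on flash, where random writes are cheap.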

12.
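A search engine is a system that accepts users' queries over the Internet and returns the addresses of information resources that match them; it is the preferred tool for online information retrieval. Search engines of every kind keep emerging, and market demand for them is huge, so research on search engine technology is well worthwhile. This paper focuses on how search engines work and on the problems that current search engines face, and it recommends developing new search engine models to address those problems.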

13.
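Research and Development of Search Engine Technology   Total citations: 20, self-citations: 0, citations by others: 20
Yin Jian, Chen Yiqun, Zhang Gang 《Computer Engineering》2005,31(14):54-56,104
This paper introduces search engine technology. It first classifies search engines by how they operate, then describes the working principles of each component and the related technical research, covering key techniques such as crawler strategies, retrieval strategies, search result processing, information retrieval agents, and multimedia search engines. Finally, it outlines important future directions for search engine development.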

14.
Series feature aggregation for content-based image retrieval   Total citations: 1, self-citations: 0, citations by others: 1
Feature aggregation is a critical technique in content-based image retrieval (CBIR) systems that use multiple visual features to characterize image content. Most previous feature aggregation schemes use a parallel topology (e.g., the linear combination scheme) and suffer from two problems. First, the role of each individual visual feature is limited, since the ranks of the retrieved images are determined only by the combined similarity. Second, irrelevant images seriously degrade retrieval performance, since all images in a collection are ranked. To address these problems, we propose a new feature aggregation scheme, series feature aggregation (SFA). SFA applies the visual features one at a time, in series: each feature selects relevant images from among those ranked highly by the previous feature. Irrelevant images are thus effectively filtered out by the individual visual features at each stage, and the remaining images are collectively described by all visual features. Experiments on the IAPR TC-12 benchmark image collection (ImageCLEF2006), which contains over 20,000 photographic images and defined queries, show that SFA can outperform conventional parallel feature aggregation schemes.
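The series topology can be sketched as a cascade of filters. Everything concrete here is an assumption for illustration: images are represented by one vector per visual feature, similarity is 1 / (1 + Euclidean distance), and the per-stage cut-offs in `keep` are hypothetical parameters. Each stage re-ranks only the survivors of the previous stage by one feature, and the final survivors are ordered by their summed similarity over all features.

```python
import math


def similarity(a, b):
    """Map Euclidean distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def series_aggregation(query_feats, collection, keep):
    """
    query_feats: one query vector per visual feature, in series order.
    collection:  {image_id: [vector for feature 0, vector for feature 1, ...]}.
    keep:        number of images kept after each stage (one entry per feature).
    """
    candidates = sorted(collection)
    for f, k in enumerate(keep):
        # Stage f: rank only the current survivors by feature f and keep the top k.
        candidates.sort(key=lambda i: (-similarity(query_feats[f], collection[i][f]), i))
        candidates = candidates[:k]

    def combined(i):
        return sum(similarity(q, v) for q, v in zip(query_feats, collection[i]))

    return sorted(candidates, key=lambda i: (-combined(i), i))


query = [(0, 0), (0,)]  # feature 0: colour (2-D), feature 1: texture (1-D)
images = {
    "a": [(0, 0), (0,)],
    "b": [(0, 1), (5,)],
    "c": [(0, 0.5), (0.1,)],
    "d": [(9, 9), (0,)],  # perfect texture match, very different colour
}
print(series_aggregation(query, images, keep=[3, 2]))  # ['a', 'c']
```

Image "d" is dropped at the colour stage despite its perfect texture match. That is the filtering behaviour the abstract describes: a parallel linear combination would still rank it, ahead of "b".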

15.
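Research on the Application of Fuzzy Clustering in Web Information Retrieval   Total citations: 4, self-citations: 0, citations by others: 4
He Peng, Xu Lizhen, Zhuang Xiaoqing 《Computer Engineering》2002,28(10):241-242,260
How to retrieve Web information quickly and effectively from massive amounts of data has become an important research problem. However, traditional search engines present their results only as a flat list ordered by decreasing relevance to the query, with no hierarchical structure, which is inconvenient for users. Starting from the vagueness, or fuzziness, of terms in Web resources, this paper proposes using fuzzy clustering to organize search engine results automatically and so solve this problem.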
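The abstract does not say which fuzzy clustering algorithm is used. As an assumed stand-in, here is standard fuzzy c-means, in which every result gets a degree of membership in each cluster rather than a single hard label. The fuzzifier m = 2, the fixed iteration count, and the random initialization are assumptions. In practice the points would be term vectors of result snippets; the toy 2-D points below are for illustration only.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Returns (centers, u): u[i][j] is the membership of point i in cluster j (rows sum to 1)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = []
    for _ in points:  # random initial memberships, normalized per point
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    for _ in range(iters):
        # Center update: mean of all points weighted by membership**m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / total
                            for k in range(dim)])
        # Membership update: u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1)).
        for i, p in enumerate(points):
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centers]  # avoid division by zero
            for j in range(c):
                u[i][j] = 1.0 / sum((d[j] / d[l]) ** (2 / (m - 1)) for l in range(c))
    return centers, u


results = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fuzzy_c_means(results, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in u]
print(labels)  # points 0-2 share one label, points 3-5 the other
```

Unlike hard k-means, a result that mixes two topics keeps substantial membership in both clusters, which suits the fuzziness of Web terms that the abstract mentions.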

16.
Searching for relevant information on the World Wide Web is often a laborious and frustrating task for both casual and experienced users. To help improve Web searching through a better understanding of user characteristics, we investigate what types of knowledge are relevant for Web-based information seeking, and which knowledge structures and strategies are involved. Two experimental studies are presented, which address these questions from different angles and with different methodologies. In the first experiment, 12 established Internet experts are first interviewed about search strategies and then perform a series of realistic search tasks on the World Wide Web. From this study, a model of information seeking on the World Wide Web is derived and then tested in a second study. In the second experiment, two potentially relevant types of knowledge are compared directly: the effects of Web experience and of domain-specific background knowledge are investigated with a series of search tasks in an economics-related domain (the introduction of the Euro currency). We find differential and combined effects of Web experience and domain knowledge: while successful search performance requires the combination of both types of expertise, specific strategies directly related to either Web experience or domain knowledge can be identified.

17.
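A Comparative Study of Web Meta-Search Models Based on Data Fusion   Total citations: 1, self-citations: 0, citations by others: 1
Ding Yi, Yang Pengying 《Computer Simulation》2007,24(4):120-123
No search engine performs better than all the others in every situation, which is why meta-search engines are worth studying. This paper presents three traditional data fusion methods for meta-search: similarity fusion based on linear combination, and two rank-based methods, Unbiased and Biased-Bayes fusion. Similarity fusion derives the linear combination parameters by analyzing the content of a subset of Web documents; Unbiased fusion merges the result lists of the individual search engines evenly; and Biased-Bayes fusion uses the ODP classification service and a Bayesian probability model to compute document relevance. Experiments show that these are effective fusion methods: they improve somewhat on the performance of traditional methods and are more efficient than fusion methods that analyze the content of every document.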
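Two of the three fusion styles can be sketched briefly. The first function is a weighted linear combination of per-engine similarity scores; the engine weights are supplied by the caller, whereas the paper derives them from document content. The second is our interpretation of "merging the result lists evenly" as round-robin interleaving with duplicate removal; the paper's exact Unbiased procedure may differ, and Biased-Bayes is not shown. Both function names are hypothetical.

```python
def linear_fusion(result_lists, weights):
    """result_lists: {engine: {doc: similarity}}; weights: {engine: weight}.
    A document an engine did not return contributes 0 for that engine."""
    fused = {}
    for engine, scores in result_lists.items():
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + weights[engine] * s
    return sorted(fused, key=lambda d: (-fused[d], d))


def unbiased_fusion(ranked_lists):
    """Treat every engine equally: take each list's next unseen document in turn."""
    merged, seen = [], set()
    for depth in range(max(map(len, ranked_lists), default=0)):
        for lst in ranked_lists:
            if depth < len(lst) and lst[depth] not in seen:
                seen.add(lst[depth])
                merged.append(lst[depth])
    return merged


print(unbiased_fusion([["a", "b", "c"], ["b", "d"]]))  # ['a', 'b', 'd', 'c']
print(linear_fusion({"e1": {"a": 0.9, "b": 0.5}, "e2": {"b": 0.8, "c": 0.3}},
                    {"e1": 0.6, "e2": 0.4}))             # ['b', 'a', 'c']
```

The rank-based merge needs no scores at all, which is useful because different engines' scores are generally not comparable; linear combination can exploit scores when they are available and normalized.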

18.
In this paper, a novel approach is proposed for generating optimal ranked clicked URLs using a genetic algorithm (GA) based on clustered web query sessions, for effective personalized web search. An experimental study was conducted on a data set of web query sessions captured in the academics, entertainment, and sports domains to test the effectiveness of cluster-wise optimal ranked clicked URLs for personalized web search (PWS). The results, verified statistically, show an improvement in the average precision of personalized web search based on optimal ranked clicked URLs over both classic IR and personalized web search without optimal ranked clicked URLs. This confirms the effectiveness of optimal ranked clicked URLs for better tailoring web search to the user's information need.

19.
Search engines are among the most popular and useful services on the web. There is a need, however, to cater to users' preferences when supplying search results to them. We propose maintaining a search profile for each user, on the basis of which the search results are determined. This requires integrating techniques for measuring search quality, learning from user feedback, biased rank aggregation, and so on. To measure web search quality, “user satisfaction” is gauged by the order in which the user picks the results, the time the user spends on those documents, and whether the user prints, saves, bookmarks, e-mails to someone, or copies and pastes a portion of the document. For rank aggregation, we adopt and evaluate the classical fuzzy rank ordering techniques for web applications, and also propose a few novel techniques that outperform the existing ones. A “user satisfaction”-guided web search procedure is also put forward. Learning from user feedback proceeds so that the ranking of documents consistently preferred by users improves. As an integration of our work, we propose a personalized web search system.
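The implicit "user satisfaction" signals listed above (pick order, dwell time, and actions such as printing or bookmarking) can be combined into a single score. The abstract does not give the actual formula, so the weights, the 120-second dwell cap, and the additive combination below are all hypothetical, and `Interaction` and `satisfaction` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One clicked result as observed by the client (field names are hypothetical)."""
    click_order: int        # 1 = first result the user picked
    dwell_seconds: float    # time spent on the document
    printed: bool = False
    saved: bool = False
    bookmarked: bool = False
    emailed: bool = False
    copied: bool = False


# Assumed weights: each explicit "keep this" action adds evidence of satisfaction.
ACTION_WEIGHT = {"printed": 1.0, "saved": 1.0, "bookmarked": 1.0, "emailed": 1.0, "copied": 0.5}
MAX_DWELL = 120.0  # assumed cap: dwell beyond two minutes adds no further evidence


def satisfaction(it):
    """Earlier picks, longer (capped) dwell, and explicit actions all raise the score."""
    order_score = 1.0 / it.click_order
    dwell_score = min(it.dwell_seconds, MAX_DWELL) / MAX_DWELL
    action_score = sum(w for name, w in ACTION_WEIGHT.items() if getattr(it, name))
    return order_score + dwell_score + action_score


skim = Interaction(click_order=1, dwell_seconds=5)
deep = Interaction(click_order=2, dwell_seconds=200, bookmarked=True)
print(satisfaction(skim) < satisfaction(deep))  # True
```

Capping dwell time keeps a tab left open in the background from dominating the score. The per-document scores could then serve as the feedback signal for re-ranking.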

20.
The number of vertical search engines and portals has increased rapidly in recent years, making the importance of a topic-driven (focused) crawler self-evident. In this paper, we develop a latent semantic indexing classifier that combines link analysis with text content in order to retrieve and index domain-specific web documents. Our implementation presents a different approach to focused crawling and aims to overcome the limitations imposed by the need to provide initial training data, while maintaining a high recall/precision ratio. We compare its efficiency with that of other well-known web information retrieval techniques.
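A focused crawler is typically built around a best-first frontier. The sketch below shows that skeleton under plainly stated substitutions: instead of the paper's latent semantic indexing classifier, it scores relevance by cosine similarity of raw term counts against a topic string. A link's priority mixes the parent page's relevance (content) with its anchor text's relevance (link context), and `beta` is an assumed mixing weight. `fetch` is a caller-supplied function, so no network library is assumed.

```python
import heapq
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def focused_crawl(seeds, fetch, topic, budget=10, beta=0.5):
    """fetch(url) -> (page_text, [(link_url, anchor_text), ...]).
    Returns [(url, page_relevance), ...] in crawl order."""
    topic_vec = Counter(topic.lower().split())
    frontier = [(-1.0, url) for url in seeds]  # heapq is a min-heap, so negate priorities
    heapq.heapify(frontier)
    seen, visited = set(seeds), []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        page_rel = cosine(Counter(text.lower().split()), topic_vec)
        visited.append((url, page_rel))
        for link, anchor in links:
            if link not in seen:
                seen.add(link)
                anchor_rel = cosine(Counter(anchor.lower().split()), topic_vec)
                priority = beta * page_rel + (1 - beta) * anchor_rel
                heapq.heappush(frontier, (-priority, link))
    return visited


web = {
    "seed": ("search engine crawler index", [("a", "search engine ranking"), ("b", "cooking recipes")]),
    "a": ("crawler index ranking search", [("c", "index compression")]),
    "b": ("pasta and sauce", []),
    "c": ("inverted index compression", []),
}
print([u for u, _ in focused_crawl(["seed"], web.__getitem__, "search engine index", budget=3)])
# ['seed', 'a', 'c']
```

The off-topic page "b" is discovered early but never fetched within the budget, which is the point of a focused crawler. Swapping `cosine` for a trained classifier (LSI or otherwise) changes only the scoring, not the frontier logic.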
