Similar literature
20 similar documents found
1.
Mukherjea Sougata, Hirata Kyoji, Hara Yoshinori. 《World Wide Web》 1999, 2(3): 115-132
Search engines are useful because they allow the user to find information of interest from the World Wide Web. However, most of the popular search engines today are textual; they do not allow the user to find images from the Web. This paper describes AMORE, a Web search engine that allows the user to retrieve images from the Web by specifying relevant keywords or a similar image. Text and image search can also be combined. Moreover, we have developed a Query Result Visualization Environment that allows the results to be organized when many images are retrieved. In this paper we present AMORE's user interface and explain the technique for retrieving images visually similar to a user-specified image. The method of automatically assigning relevant keywords to the images is then explained. Finally, the architecture of the system as well as some interesting observations from our experiences with AMORE are discussed. This revised version was published online in August 2006 with corrections to the Cover Date.

2.
Search engines are useful because they allow the user to find information of interest from the World Wide Web (WWW). However, most of the popular search engines today are textual; they do not allow the user to find images from the Web. For effective retrieval, determining the semantics of the images is essential. In this paper, we describe the problems in determining the semantics of images on the WWW and the approach of AMORE, a WWW search engine that we have developed. AMORE's techniques can be extended to other media such as audio and video. We explain how we assign keywords to the images based on HTML pages and the method to determine similar images based on the assigned text. We also discuss some statistics showing the effectiveness of our technique. Finally, we present the visual interface of AMORE with the help of several retrieval scenarios.

3.
4.
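A Search Result Ranking Algorithm Based on User Tags   Total citations: 1, self-citations: 0, citations by others: 1
With the rapid development of computer networks, the volume of information on the Web has grown increasingly large and complex. Helping people obtain the information they need from massive Web data accurately and quickly is the foremost problem that search engines must currently solve, and a variety of search ranking algorithms have emerged in response. At present, however, Web page content is expressed in very simple forms, and users describe their queries in even simpler forms, which makes it very difficult to judge the relevance of a page to a user query. The paper first classifies and summarizes existing search engine ranking algorithms and analyzes their strengths and weaknesses. It then proposes a new method based on semantic tags derived from user feedback. Finally, several evaluation methods are used to compare the method against Google search results. The experimental results show that the rankings produced by the method are closer to user needs than Google's rankings.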

5.
The Internet is one of the most important sources of knowledge at the present time. It offers a huge volume of information which grows dramatically every day. Web search engines (e.g. Google, Yahoo…) are widely used to find specific data among that information. However, these useful tools also represent a privacy threat for the users: the web search engines profile them by storing and analyzing all the searches that they have previously submitted. To address this privacy threat, current solutions propose new mechanisms that introduce a high cost in terms of computation and communication. In this paper, we propose a new scheme designed to protect the privacy of the users from a web search engine that tries to profile them. Our system uses social networks to provide a distorted user profile to the web search engine. The proposed protocol submits standard queries to the web search engine; thus it does not require any change on the server side. In addition, this scheme does not require the server to collaborate with the users. Our protocol improves on the existing solutions in terms of query delay. Besides, the distorted profiles still allow the users to get a proper service from the web search engines.

6.
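Cluster-Based Browsing Techniques in Search Engines   Total citations: 1, self-citations: 0, citations by others: 1
Most search engines present results to users as a list of documents. As the number of Web documents grows sharply, it becomes harder and harder for users to find relevant information. One solution is to cluster the search results to make them easier to browse. Cluster-based browsing lets users view search results at a higher topical level and conveniently find the information that interests them. This paper introduces the basic requirements that cluster-based browsing places on clustering algorithms and how such algorithms are classified, analyzes the characteristics of the main clustering algorithms and their improved variants, discusses the evaluation of clustering quality, and finally points out development trends in cluster-based browsing.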

7.
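Text Database Selection for Metasearch Engines   Total citations: 8, self-citations: 0, citations by others: 8
The information users need to retrieve is often scattered across the databases of multiple search engines. For ordinary users, visiting several search engines and picking out the genuinely useful pages from the returned results is time-consuming and laborious. A metasearch engine offers users an integrated environment for accessing multiple search engines at once: it submits the user queries it receives to multiple underlying search engines. As a search tool, a metasearch engine has advantages such as broader Web query coverage than a traditional engine and better scalability. The paper discusses several techniques for solving the database selection problem in metasearch engines. Given a query submitted by a user, database selection identifies the underlying search engines most likely to return useful information.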

8.
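With the rapid growth of Web information and rising expectations for retrieval quality, traditional search engines can no longer meet people's needs well. This paper proposes a personalized metasearch engine model. Personalization means that the model builds a separate interest model for each user and then, according to the user's interests, filters and re-ranks the search results so that the results shown are better targeted. The paper describes how each main functional module works and details the algorithm that ranks search results according to the user interest model. Experiments show that the algorithm effectively improves retrieval quality for users.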
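The abstract above does not give the ranking algorithm itself, so the following is only a minimal sketch of the general idea of filtering and re-ranking results against a user interest model. It assumes, purely for illustration, that the interest model is a dictionary of term weights and that each result is scored by cosine similarity between its snippet and that model; the names `cosine`, `rerank`, `interest`, and `threshold` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(results, interest, threshold=0.0):
    """Filter and re-rank (title, snippet) pairs by similarity to the interest model.

    Assumption: results whose score does not exceed `threshold` are dropped
    (the filtering step); the rest are sorted by descending score.
    """
    scored = []
    for title, snippet in results:
        vec = Counter(snippet.lower().split())  # naive whitespace tokenization
        score = cosine(vec, interest)
        if score > threshold:
            scored.append((score, title))
    # Highest score first; ties broken by title for a deterministic order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


# Hypothetical interest model for a user who mostly reads about programming.
interest = {"python": 3.0, "programming": 2.0}
results = [
    ("Snake care", "python snake habitat and feeding"),
    ("Language tutorial", "python programming tutorial"),
    ("Cooking", "pasta recipes for dinner"),
]
ranked = rerank(results, interest)
print(ranked)
```

In this toy run the programming tutorial moves ahead of the snake page, and the unrelated cooking page is filtered out because it shares no terms with the interest model. A real system would need proper tokenization and a learned interest model.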

9.
The World Wide Web (the Web for short) is rapidly becoming an information flood as it continues to grow exponentially. This makes it difficult for users to find relevant pieces of information on the Web. Search engines and robots (spiders) are two popular techniques developed to address this problem. Search engines are indexing facilities over searchable databases. As the Web continues to expand, search engines are becoming redundant because of the large number of Web pages they return for a single search. Robots are similar to search engines; rather than indexing the Web, they traverse (“walk through”) the Web, analyzing and storing relevant documents. The main drawback of these robots is their high demand on network resources, which results in networks being overloaded. This paper proposes an alternative way of assisting users in finding information on the Web. Since the Web is made up of many Web servers, instead of searching all the Web servers, we propose that each server does its own housekeeping. A software agent named SiteHelper is designed to act as a housekeeper for the Web server and as a helper for a Web user to find relevant information at a particular site. In order to assist the Web user in finding relevant information at the local site, SiteHelper interactively and incrementally learns about the Web user's areas of interest and aids them accordingly. To provide such intelligent capabilities, SiteHelper deploys enhanced HCV with incremental learning facilities as its learning and inference engines.

10.
The storage and retrieval of multimedia has become a requirement for many information systems. This paper presents a comprehensive survey of image search engines, with many clarifying comments. First, we looked at image search engine architecture, followed by the role of the crawler in detecting images. We reviewed the common World Wide Web based systems for image retrieval developed in research institutions and in commercial business. A comparative performance study between the existing engines is also presented.

11.
Cellary W., Wiza W., Walczak K. 《Computer》 2004, 37(5): 87-89
The exponential growth in Web sites is making it increasingly difficult to extract useful information on the Internet using existing search engines. Despite a wide range of sophisticated indexing and data retrieval features, search engines often deliver satisfactory results only when users know precisely what they are looking for. Traditional textual interfaces present results as a list of links to Web pages. Because most users are unwilling to explore an extensive list, search engines arbitrarily reduce the number of links returned, aiming also to provide quick response times. Moreover, their proprietary ranking algorithms often do not reflect individual user preferences. Those who need comprehensive general information about a topic or have vague initial requirements instead want a holistic presentation of data related to their queries. To address this need, we have developed Periscope, a 3D search result visualization system that displays all the Web pages found in a synthetic, yet comprehensible format.

12.
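There are numerous specialized search engines on the Web, and ordinary users are often at a loss when choosing among them. This paper proposes an automatic query routing system that automatically directs user queries to suitable specialized search engines, resolving this problem.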

13.
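Research on an Automatic Classification System for Domain-Specific Search Engines   Total citations: 2, self-citations: 1, citations by others: 2
Xie Shipeng, Hu Maolin. 《微机发展》 (Microcomputer Development) 2005, 15(9): 16-17, 20
With the rapid development of the Internet, domain-specific search engines are becoming increasingly important, because they can usually provide more precise results as well as information that general-purpose search engines cannot. However, domain-specific search engines usually take a great deal of time to build and maintain. This paper proposes a machine-learning-based method for automatically building and maintaining such engines: it applies a maximum weighted dependence tree classification method to improve on earlier approaches to automatic classification, making the classification results more accurate. With this technique, a new domain-specific search engine can be built that will make people's lives and studies more convenient.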

14.
In this paper, we tackle the private information retrieval (PIR) problem associated with the use of Internet search engines. We address the desire for a user to retrieve information from the Web without the search provider learning about it. Traditional PIR protocols present two main shortcomings for their application: (i) they assume cooperation by the database, which is not affordable for a real‐world search engine like Google, and (ii) their computational complexity is linear in the size of the database, which is unfeasible in the case of the Web. More recent approaches relax PIR conditions to overcome these limitations and present some level of privacy. Mostly, they aim to distort server logs regardless of the loss of information that is involved. Server logs are used by search engines for profiling and, thereby, to provide personalized results. This becomes a user's need given the growth of the Web and can also be used for targeted advertising. This study focuses on a noncooperative agent for private search that considers profiling as valuable data used for both sides of the search process. It is based on the assumption that the user's identity is formed by the union of various areas of interest or facets. By managing the HTTP connections properly, submitted queries are mapped to different server logs according to these facets. The rationale is that these logs cannot be used for tracing the user while they are still helpful for profiling. We present a personalized query classification approach based on the user's browsing history and, to provide empirical results, we developed an attacking algorithm against the agent that shows that the disclosure risk is reduced.

15.
This paper investigates the composition of search engine results pages. We define what elements the most popular web search engines use on their results pages (e.g., organic results, advertisements, shortcuts) and to which degree they are used for popular vs. rare queries. To this end, we send 500 queries of both types to the major search engines Google, Yahoo, Live.com and Ask. We count how often the different elements are used by the individual engines. In total, our study is based on 42,758 elements. Findings include that search engines use quite different approaches to results page composition and, therefore, the user gets to see quite different results sets depending on the search engine and search query used. Organic results still play the major role in the results pages, but different shortcuts are of some importance, too. Regarding the frequency of certain hosts within the results sets, we find that all search engines show Wikipedia results quite often, while other hosts shown depend on the search engine used. Both Google and Yahoo prefer results from their own offerings (such as YouTube or Yahoo Answers). Since we used the .com interfaces of the search engines, results may not be valid for other country-specific interfaces.

16.
Nowadays, mashup services and especially metasearch engines play an increasingly important role on the Web. Most users use them directly or indirectly to access and aggregate information from more than one data source. As with other search systems, the effectiveness of a metasearch engine is mainly determined by the quality of the results it returns in response to user queries. Since these services do not maintain their own document index, they exploit multiple search engines using a rank aggregation method in order to classify the collected results. However, the rank aggregation methods which have been proposed until now utilize a very limited set of parameters regarding these results, such as the total number of the exploited resources and the rankings they receive from each individual resource. In this paper we present QuadRank, a new rank aggregation method, which takes into consideration additional information regarding the query terms, the collected results and the data correlated to each of these results (title, textual snippet, URL, individual ranking and others). We have implemented and tested QuadRank in a real-world metasearch engine, QuadSearch, a system developed as a testbed for algorithms related to the wide problem of metasearching. The name QuadSearch is related to the current number of the exploited engines (four). We have exhaustively tested QuadRank for both effectiveness and efficiency in the real-world search environment of QuadSearch and also using a task from the recent TREC-2009 conference. The results we present in our experiments reveal that in most cases QuadRank outperformed all component engines, another metasearch engine (Dogpile) and two successful rank aggregation methods, Borda Count and the Outranking Approach.
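For orientation, Borda Count, one of the baseline rank aggregation methods named in the abstract above, can be sketched in a few lines. This is not QuadRank, whose details the abstract does not give. The sketch assumes one common scoring variant: an item at zero-based position p in a list of n results earns n - p points, and items missing from a list earn nothing from it. The function name `borda_count` and the sample result identifiers are illustrative only.

```python
from collections import defaultdict


def borda_count(rankings):
    """Merge several ranked lists (best result first) into one ranking.

    Assumed scoring variant: an item at zero-based position p in a list of
    length n earns n - p points; an item absent from a list earns 0 from it.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    # Highest total first; ties broken alphabetically for a deterministic order.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical top-3 result lists returned by three component engines.
engine_a = ["u1", "u2", "u3"]
engine_b = ["u2", "u1", "u4"]
engine_c = ["u2", "u3", "u1"]

merged = borda_count([engine_a, engine_b, engine_c])
print(merged)
```

Here `u2` wins with 8 points (2 + 3 + 3), followed by `u1` with 6, `u3` with 3, and `u4` with 1. The method uses only the positions of the results, which is the kind of limited parameter set the abstract contrasts with the additional signals (title, snippet, URL) that QuadRank is said to consider.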

17.
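ISeeker: An Efficient Metasearch Engine   Total citations: 4, self-citations: 0, citations by others: 4
Peng Honghui, Lin Zuoquan. 《计算机工程》 (Computer Engineering) 2003, 29(10): 41-42, 52
This paper introduces ISeeker, an efficient metasearch engine system, and proposes a comprehensive set of algorithms for evaluating and selecting search engines. When merging retrieval results, the system selects the best results wherever possible, and it learns and adjusts online as users view the results.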

18.
19.
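Research on a Web Search Engine Framework   Total citations: 43, self-citations: 1, citations by others: 42
Web search engines are very useful information retrieval tools on the Internet. However, because current search engines return a huge amount of information, and a given search engine mainly covers information from one particular domain, it is difficult for users to obtain accurate navigational information from any single search engine. This paper proposes a new Web search engine framework, GSE, together with WERPL, a language suited to acquiring and processing Web information. Through WIRPL, multiple Web search engines can be combined to provide users with a consistent, efficient, and accurate Web search engine.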

20.
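Most search engines do not take users' personalities and interests into account, which greatly reduces search accuracy. This work applies Web mining to the historical pages stored in the Web cache to obtain information about user interests, represents those interests as an optimal binary tree, and uses the acquired interest information to build a personalization model. An intelligent agent tracks changes in the user's interests and continually updates the personalized interest model.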
