Similar literature
 20 similar articles found (search time: 0 ms)
1.
In this paper we explore the benefits of latent variable modelling of clickthrough data in the domain of image retrieval. Clicks in image search logs are regarded as implicit relevance judgements that express both user intent and important relations between selected documents. We posit that clickthrough data contains hidden topics and can be used to infer a lower dimensional latent space that can be subsequently employed to improve various aspects of the retrieval system. We use a subset of a clickthrough corpus from the image search portal of a news agency to evaluate several popular latent variable models in terms of their ability to model topics underlying queries. We demonstrate that latent variable modelling reveals underlying structure in clickthrough data and our results show that computing document similarities in the latent space improves retrieval effectiveness compared to computing similarities in the original query space. These results are compared with baselines using visual and textual features. We show performance substantially better than the visual baseline, which indicates that content-based image retrieval systems that do not exploit query logs could improve recall and precision by taking this historical data into account.

2.
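Web search analysis plays a pivotal role in optimizing search engines, and analyzing the search characteristics of individual users can improve search engine precision. Most existing models, such as the click graph model and its variants, focus on the characteristics shared by a user population. However, little research addresses how to capture both the characteristics common to a user population and those specific to an individual user. This paper studies a new problem, web search analysis for individual users: by examining the burstiness of user searches, it obtains the topic distribution of each user's search queries. Two search topic models are proposed, the Search Burstiness Model (SBM) and the Coupling-Sensitive Search Burstiness Model (CS-SBM). SBM assumes that the topics of query words and URLs are independent, whereas CS-SBM assumes that query words and URLs are topically coupled. The resulting topic distributions are stored in skewed Dirichlet priors, and a Beta distribution characterizes the temporal properties of user searches. Experimental results show that each user's web search trajectory exhibits multiple user-specific characteristics. Moreover, on a large volume of real user query logs, the proposed models show a clear advantage in generalization performance over LDA, DCMLDA, and TOT, and effectively depict how the topics of user search queries change over time.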

3.
In this study, we aim to develop a pricing mechanism that reduces the effects caused by vindictive advertisers who bid on sponsored search auctions run by search engine providers. In particular, we aim to ensure payment fairness and price stability in these auctions. With the generalized second price principle, advertisers pay the next-ranked bid value rather than the price that they bid. Vindictive bidders take advantage of this principle to manipulate the payment of the advertiser ranked immediately above them. Vindictive bidding results in unfair outcomes and eliminates equilibria. However, it is difficult to compute rational payments for all advertisers as advertisers’ valuations are private. Our proposed mechanism decreases the payment to make up for the utility loss that is induced by vindictive bidding. The vindictive advertiser is simultaneously punished with an additional payment. According to our theoretical analyses and simulations, the proposed mechanism efficiently decreases the effects that result from vindictive bidding, and guarantees equilibrium outcomes.
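
The generalized second price rule described in this abstract can be illustrated with a minimal sketch. The function name `gsp_payments`, the advertiser labels, and the bid values are hypothetical, and the sketch assumes no reserve price and no quality scores; it shows only the plain GSP rule that a vindictive bidder exploits, not the pricing mechanism the paper proposes.

```python
from typing import List, Tuple


def gsp_payments(bids: List[Tuple[str, float]], slots: int) -> List[Tuple[str, float]]:
    """Rank advertisers by bid; each winner pays the next-ranked bid per click."""
    ranked = sorted(bids, key=lambda b: b[1], reverse=True)
    results = []
    for i in range(min(slots, len(ranked))):
        name = ranked[i][0]
        # Assumption: with no lower bid and no reserve price, the payment is 0.
        price = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        results.append((name, price))
    return results


# Honest bids: A pays B's bid, B pays C's bid.
honest = gsp_payments([("A", 5.0), ("B", 3.0), ("C", 1.0)], slots=2)

# Vindictive bid: B raises its bid to just below A's. The ranking and B's own
# payment are unchanged, but A's payment rises from 3.0 to 4.99.
vindictive = gsp_payments([("A", 5.0), ("B", 4.99), ("C", 1.0)], slots=2)
```

In this toy case the vindictive bid costs B nothing, which is why such bidding is attractive under plain GSP.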

4.
One of the useful tools offered by existing web search engines is query suggestion (QS), which assists users in formulating keyword queries by suggesting keywords that are unfamiliar to users, offering alternative queries that deviate from the original ones, and even correcting spelling errors. The design goal of QS is to enrich the web search experience of users and spare them the frustrating process of choosing controlled keywords to specify their special information needs, which relieves them of the burden of creating web queries. Unfortunately, the algorithms and design methodologies of the QS module developed by Google, the most popular web search engine these days, are not publicly available, which means that software developers cannot duplicate them to build the tool for specifically designed software systems for enterprise search, desktop search, or vertical search, to name a few. Keywords suggested by Yahoo! and Bing, two other well-known web search engines, however, are mostly popular, currently searched words, which might not meet the specific information needs of the users. These problems can be solved by WebQS, our proposed web QS approach, which provides the same mechanism offered by Google, Yahoo!, and Bing to support users in formulating keyword queries that improve the precision and recall of search results. WebQS relies on frequency of occurrence, keyword similarity measures, and modification patterns of queries in user query logs, which capture information on millions of searches conducted by millions of users, to suggest useful queries/query keywords during the user query construction process and achieve the design goal of QS. Experimental results show that WebQS performs as well as Yahoo! and Bing in terms of effectiveness and efficiency and is comparable to Google in terms of query suggestion time.

5.
6.
We develop an infinite horizon alternative-move model of the unique second-price sponsored search auction. We use this model to explain two distinguishable bidding patterns observed in our bidding data: bidding war cycle and stable bid. With examples, we show that only a small portion of the value generated may be extracted by search engines, if advertisers are engaged in bidding war cycles. Finally, we show the impact of auction design on advertiser bids and search engine revenue.

7.
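Query and Retrieval Service Based on Web Service   Total citations: 1 (self-citations: 1, citations by others: 1)
To address the problem that multiple web applications offering chemical engineering information retrieval cannot be accessed in a unified way, an information query and retrieval service scheme based on Web Service technology was designed. On Microsoft's .NET platform, Web Service is used to combine chemical engineering information retrieval services hosted on different servers into a unified service system. ASP.NET and ADO.NET provide users with a unified query interface and multiple access methods, and data is transferred in an XML-based format. This enables information sharing among multiple services, so that a single query returns all available information provided by those services. This paper introduces the key technologies and implementation methods involved.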

8.
Identifying and interpreting user intent are fundamental to semantic search. In this paper, we investigate the association of intent with individual words of a search query. We propose that words in queries can be classified as either content or intent, where content words represent the central topic of the query, while users add intent words to make their requirements more explicit. We argue that intelligent processing of intent words can be vital to improving the result quality, and in this work we focus on intent word discovery and understanding. Our approach towards intent word detection is motivated by the hypothesis that query intent words satisfy certain distributional properties in large query logs similar to function words in natural language corpora. Following this idea, we first prove the effectiveness of our corpus distributional features, namely, word co-occurrence counts and entropies, towards function word detection for five natural languages. Next, we show that reliable detection of intent words in queries is possible using these same features computed from query logs. To make the distinction between content and intent words more tangible, we additionally provide operational definitions of content and intent words as those words that should match, and those that need not match, respectively, in the text of relevant documents. In addition to a standard evaluation against human annotations, we also provide an alternative validation of our ideas using clickthrough data. Concordance of the two orthogonal evaluation approaches provides further support to our original hypothesis of the existence of two distinct word classes in search queries. Finally, we provide a taxonomy of intent words derived through rigorous manual analysis of large query logs.
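
The distributional features named in this abstract (word co-occurrence counts and entropies) might be computed roughly as in the sketch below. The abstract does not give exact feature definitions, so the function `cooccurrence_features`, the whitespace tokenisation, and the toy query log are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def cooccurrence_features(queries: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """For each word, return (distinct co-occurring words, co-occurrence entropy in bits)."""
    neighbours = defaultdict(Counter)
    for query in queries:
        words = query.lower().split()  # assumption: whitespace tokenisation
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    neighbours[word][other] += 1
    features = {}
    for word, counts in neighbours.items():
        total = sum(counts.values())
        # Shannon entropy of the distribution over co-occurring words.
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        features[word] = (len(counts), entropy)
    return features


# Hypothetical toy query log.
log = ["hotels in paris", "hotels in rome", "pizza in rome", "paris map"]
features = cooccurrence_features(log)
```

In this toy log the word "in" co-occurs with four distinct words and has a higher entropy than "map", which matches the intuition that function-like words combine freely with many different topics.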

9.
Information Systems and e-Business Management - Users tend to use their own terms to search items in structured search systems such as restaurant searches (e.g. Yelp), but due to users’ lack...

10.
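Starting from the differences between the search habits of Chinese and English users, this paper introduces Chinese word segmentation to analyze the search logs of a Chinese search engine. It focuses on regularities in the search terms that users enter, including the language chosen, the length and frequency of search terms, the use of advanced search techniques, and how search terms are modified. It also proposes a model of how users submit search terms and gives an algorithm for computing the influence factor of historical search terms on search results.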

11.
Graph conductance queries, also known as personalized PageRank and related to random walks with restarts, were originally proposed to assign a hyperlink-based prestige score to Web pages. More general forms of such queries are also very useful for ranking in entity-relation (ER) graphs used to represent relational, XML and hypertext data. Evaluation of PageRank usually involves a global eigen computation. If the graph is even moderately large, interactive response times may not be possible. Recently, the need for interactive PageRank evaluation has increased. The graph may be fully known only when the query is submitted. Browsing actions of the user may change some inputs to the PageRank computation dynamically. In this paper, we describe a system that analyzes query workloads and the ER graph, invests in limited offline indexing, and exploits those indices to achieve essentially constant-time query processing, even as the graph size scales. Our techniques (data and query statistics collection, index selection and materialization, and query-time index exploitation) have parallels in the extensive relational query optimization literature, but are applied to supporting novel graph data repositories. We report on experiments with five temporal snapshots of the CiteSeer ER graph having 74–702 thousand entity nodes, 0.17–1.16 million word nodes, 0.29–3.26 million edges between entities, and 3.29–32.8 million edges between words and entities. We also used two million actual queries from CiteSeer’s logs. Queries run 3–4 orders of magnitude faster than whole-graph PageRank, the gap growing with graph size. Index size is smaller than a text index. Ranking accuracy is 94–98% with reference to whole-graph PageRank.
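
For orientation, the whole-graph computation that this abstract uses as its baseline can be sketched as a power iteration for a random walk with restart. This is a generic textbook formulation, not the indexing system the paper describes; the function name, the restart probability of 0.15, the handling of dangling nodes, and the toy graph are assumptions.

```python
from typing import Dict, List


def personalized_pagerank(graph: Dict[str, List[str]], seeds: Dict[str, float],
                          restart: float = 0.15, iters: int = 100) -> Dict[str, float]:
    """Power iteration for a random walk that restarts at the seed nodes.

    Assumes every edge target also appears as a key of `graph`.
    """
    nodes = list(graph)
    total = sum(seeds.values())
    teleport = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        nxt = {n: restart * teleport[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                # Spread the non-restarting mass evenly over the out-links.
                share = (1 - restart) * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Assumption: a dangling node returns its mass to the seeds.
                for m in nodes:
                    nxt[m] += (1 - restart) * rank[n] * teleport[m]
        rank = nxt
    return rank


# Hypothetical four-node graph, personalized on node "a".
toy = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
scores = personalized_pagerank(toy, seeds={"a": 1.0})
```

Every iteration touches every edge, which is the cost that makes interactive response times hard on large graphs and that motivates the offline indexing described above.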

12.
This paper describes and evaluates a unified approach to phrasal query suggestions in the context of a high-precision search engine. The search engine performs ranked extended-Boolean searches with the proximity operator near being the default operation. Suggestions are offered to the searcher when the length of the result list falls outside predefined bounds. If the list is too long, the engine specializes the query through the use of super phrases; if the list is too short, the engine generalizes the query through the use of proximal subphrases. We describe methods for generating both types of suggestions and present algorithms for ranking the suggestions. Specifically, we present the problem of counting proximal subphrases for generalization and the problem of counting unordered super phrases for specialization. The uptake of our approach was evaluated by analyzing search log data from before and after the suggestion feature was added to a commercial version of the search engine. We looked at approximately 1.5 million queries and found that, after they were added, suggestions represented nearly 30% of the total queries. Efficacy was evaluated through a controlled study of 24 participants performing nine searches using three different search engines. We found that the engine with phrasal query suggestions had better high-precision recall than both the same search engine without suggestions and a search engine with a similar interface but using an Okapi BM25 ranking algorithm.

13.
14.
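Query Expansion Method Based on the Context of Documents and Search Results   Total citations: 1 (self-citations: 0, citations by others: 1)
Jiang Hui, Yang Xiaohua. Journal of Computer Applications (《计算机应用》), 2009, 29(3): 852-853
In query expansion, if the weights of candidate keywords are computed from the context of keywords in the query results and the highest-weighted words are used as expansion terms, while the candidate keywords themselves come from the context of keywords in the document, the method suffers from topic drift. To solve this problem, a method is proposed that filters the initial query results and selects only the search results whose context is similar to that of the source document to help choose query expansion terms. Experimental results show that the method obtains more suitable query expansion terms.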

15.
In this paper, we propose a multimodal query suggestion method for video search which can leverage multimodal processing to improve the quality of search results. When users type general or ambiguous textual queries, our system MQSS provides keyword suggestions and representative image examples in an easy-to-use dropdown list, which helps users specify their search intent more precisely and effortlessly. It is a powerful complement to initial queries. After the queries are formulated as multimodal queries (i.e., text and image), the new queries are input to individual search models, such as text-based, concept-based and visual example-based search models. Then we apply a multimodal fusion method to aggregate the results of these searches. The effectiveness of MQSS is demonstrated by evaluations over a web video data set.

16.
One of the challenges in image search is to learn with few labeled examples. Existing solutions mainly focus on leveraging either unlabeled data or query logs to address this issue, but little is known about taking both into account. This work presents a novel learning scheme that exploits both unlabeled data and query logs through a unified Manifold Ranking (MR) framework. In particular, we propose a local scaling technique to facilitate MR by self-tuning the scale parameter, and a soft label propagation strategy to enhance the robustness of MR against erroneous query logs. Further, within the proposed MR framework, a hybrid active learning method is developed, which effectively and efficiently selects informative and representative unlabeled examples, so as to maximally reduce users’ labeling effort. An empirical study shows that the proposed scheme is significantly more effective than the state-of-the-art approaches.

17.
Unstructured Peer-to-Peer (P2P) networks have become a very popular architecture for content distribution in large-scale and dynamic environments. Searching for content in unstructured P2P networks is a challenging task because the distribution of objects has no association with the organization of peers. Methods proposed in recent years either depend too much on the object replication rate or suffer from a sharp decline in performance when objects stored in peers change rapidly, although their performance is better than flooding or random walk algorithms to some extent. In this paper, we propose a novel query routing mechanism for improving query performance in unstructured P2P networks. We design a data structure called traceable gain matrix (TGM) that records every query's gain at each peer along the query hit path, and allows for optimizing query routing decisions effectively. Experimental results show that our query routing mechanism achieves a relatively high query hit rate with low bandwidth consumption in different types of network topologies under static and dynamic network conditions.

18.
Experienced users who query search engines exhibit complex behavior. They explore many topics in parallel, experiment with query variations, consult multiple search engines, and gather information over many sessions. In the process they need to keep track of search context, namely useful queries and promising result links, which can be hard. We present an extension to search engines called SearchPad that makes it possible to keep track of 'search context' explicitly. We describe an efficient implementation of this idea deployed on four search engines: AltaVista, Excite, Google and Hotbot. Our design of SearchPad has several desirable properties: (i) portability across all major platforms and browsers; (ii) instant start requiring no code download or special actions on the part of the user; (iii) no server side storage; and (iv) no added client–server communication overhead. An added benefit is that it allows search services to collect valuable relevance information about the results shown to the user. In the context of each query SearchPad can log the actions taken by the user, and in particular record the links that were considered relevant by the user in the context of the query. The service was tested in a multi-platform environment with over 150 users for 4 months and found to be usable and helpful. We discovered that the ability to maintain search context explicitly seems to affect the way people search. Repeat SearchPad users looked at more search results than is typical on the Web, suggesting that availability of search context may partially compensate for non-relevant pages in the ranking.

19.
Automatic image annotation using supervised learning is performed by concept classifiers trained on labelled example images. This work proposes the use of clickthrough data collected from search logs as a source for the automatic generation of concept training data, thus avoiding the expensive manual annotation effort. We investigate and evaluate this approach using a collection of 97,628 photographic images. The results indicate that the contribution of search log based training data is positive despite its inherent noise; in particular, the combination of manual and automatically generated training data outperforms the use of manual data alone. It is therefore possible to use clickthrough data to perform large-scale image annotation with little manual annotation effort or, depending on performance, using only the automatically generated training data. An extensive presentation of the experimental results and the accompanying data can be accessed at .

20.
ABSTRACT

Understanding the search behaviour of online users is among the long-tail practices of Interactive Information Retrieval that helps identify the user information needs. The Interactive Social Book Search (SBS), under the umbrella of Interactive Information Retrieval (IIR), aims to understand the user interactions with book collections and the associated professionally-curated and socially-constructed metadata on the baseline and multistage user interfaces (UIs). This paper reports on the book search behaviour of users by reviewing research publications related to the Interactive SBS published during the last two decades. It presents a holistic view of the overall progress of Interactive SBS by summarising and visualising the experimental structure, search systems, datasets, demographics of participants, and findings to identify the research trends and possible future directions. Based on the collected evidence, it attempts to answer how the search system, user interface (UI), and the nature of tasks affect the book search behaviour of users. The article is the first of its kind that attempts to understand the book search behaviour of users in the context of Social Book Search with implications for usability experts and others working in UI design, web search engines, book search engines, digital libraries, collaborative social cataloguing websites, and e-Commerce applications.
