首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Search engine users often encounter the difficulty of phrasing the precise query that could lead to satisfactory search results. Query recommendation is considered an effective assistant in enhancing keyword-based queries in search engines and Web search software. In this paper, we present a Query-URL Bipartite based query reCommendation approach, called QUBiC. It utilizes the connectivity of a query-URL bipartite graph to recommend related queries and can significantly improve the accuracy and effectiveness of personalized query recommendation systems comparing with the conventional pairwise similarity based approach. The main contribution of the QUBiC approach is its three-phase framework for personalized query recommendations. The first phase is the preparation of queries and their search results returned by a search engine, which generates a historical query-URL bipartite collection. The second phase is the discovery of similar queries by extracting a query affinity graph from the bipartite graph, instead of operating on the original bipartite graph directly using biclique-based approach or graph clustering. The query affinity graph consists of only queries as its vertices and its edges are weighted according to a query-URL vector based similarity (dissimilarity) measure. The third phase is the ranking of similar queries. We devise a novel rank mechanism for ordering the related queries based on the merging distances of a hierarchical agglomerative clustering (HAC). By utilizing the query affinity graph and the HAC-based ranking, we are able to capture the propagation of similarity from query to query by inducing an implicit topical relatedness between queries. Furthermore, the flexibility of the HAC strategy makes it possible for users to interactively participate in the query recommendation process, and helps to bridge the gap between the determinacy of actual similarity values and the indeterminacy of users’ information needs, allowing the lists of related queries to be changed from user to user and query to query, thus adaptively recommending related queries on demand. Our experimental evaluation results show that the QUBiC approach is highly efficient and more effective compared to the conventional query recommendation systems, yielding about 13.3 % as the most improvement in terms of precision.  相似文献   

2.
Query reformulation, including query recommendation and query auto-completion, is a popular add-on feature of search engines, which provide related and helpful reformulations of a keyword query. Due to the dropping prices of smartphones and the increasing coverage and bandwidth of mobile networks, a large percentage of search engine queries are issued from mobile devices. This makes it possible to improve the quality of query recommendation and auto-completion by considering the physical locations of the query issuers. However, limited research has been done on location-aware query reformulation for search engines. In this paper, we propose an effective spatial proximity measure between a query issuer and a query with a location distribution obtained from its clicked URLs in the query history. Based on this, we extend popular query recommendation and auto-completion approaches to our location-aware setting, which suggest query reformulations that are semantically relevant to the original query and give results that are spatially close to the query issuer. In addition, we extend the bookmark coloring algorithm for graph proximity search to support our proposed query recommendation approaches online, and we adapt an A* search algorithm to support our query auto-completion approach. We also propose a spatial partitioning based approximation that accelerates the computation of our proposed spatial proximity. We conduct experiments using a real query log, which show that our proposed approaches significantly outperform previous work in terms of quality, and they can be efficiently applied online.  相似文献   

3.
In this paper a novel approach is proposed for generating the optimal ranked clicked URLs using genetic algorithm (GA) based on clustered web query sessions for effective personalized web search. Experimental study was conducted on the data set of web query sessions captured in the domains academics, entertainment and sports to test the effectiveness of clusterwise optimal ranked clicked URLs for personalized web search (PWS). The results, which are verified statistically shows an improvement in the average precision of the personalized web search based on optimal ranked clicked URLs over both Classic IR and personalized web search without optimal ranked clicked URLs. Thus the effectiveness of personalized web search using optimal ranked clicked URLs is confirmed for better customizing the web search according to the information need of the user.  相似文献   

4.
Keyword-based Web search is a widely used approach for locating information on the Web. However, Web users usually suffer from the difficulties of organizing and formulating appropriate input queries due to the lack of sufficient domain knowledge, which greatly affects the search performance. An effective tool to meet the information needs of a search engine user is to suggest Web queries that are topically related to their initial inquiry. Accurately computing query-to-query similarity scores is a key to improve the quality of these suggestions. Because of the short lengths of queries, traditional pseudo-relevance or implicit-relevance based approaches expand the expression of the queries for the similarity computation. They explicitly use a search engine as a complementary source and directly extract additional features (such as terms or URLs) from the top-listed or clicked search results. In this paper, we propose a novel approach by utilizing the hidden topic as an expandable feature. This has two steps. In the offline model-learning step, a hidden topic model is trained, and for each candidate query, its posterior distribution over the hidden topic space is determined to re-express the query instead of the lexical expression. In the online query suggestion step, after inferring the topic distribution for an input query in a similar way, we then calculate the similarity between candidate queries and the input query in terms of their corresponding topic distributions; and produce a suggestion list of candidate queries based on the similarity scores. Our experimental results on two real data sets show that the hidden topic based suggestion is much more efficient than the traditional term or URL based approach, and is effective in finding topically related queries for suggestion.  相似文献   

5.
Query recommendation helps users to describe their information needs more clearly so that search engines can return appropriate answers and meet their needs. State-of-the-art researches prove that the use of users’ behavior information helps to improve query recommendation performance. Instead of finding the most similar terms previous users queried, we focus on how to detect users’ actual information need based on their search behaviors. The key idea of this paper is that although the clicked documents are not always relevant to users’ queries, the snippets which lead them to the click most probably meet their information needs. Based on analysis into large-scale practical search behavior log data, two snippet click behavior models are constructed and corresponding query recommendation algorithms are proposed. Experimental results based on two widely-used commercial search engines’ click-through data prove that the proposed algorithms outperform practical recommendation methods of these two search engines. To the best of our knowledge, this is the first time that snippet click models are proposed for query recommendation task.  相似文献   

6.
互联网上很多资源蕴含人类群体智慧.分类网站目录人工地对网站按照主题进行组织.基于网站目录中具有主题标注的URL设计URL主题分类器,结合伪相关反馈技术以及搜索引擎查询日志,提出了自动、快速、有效的查询主题分类方法.具体地,方法为2种策略的结合.策略1通过计算搜索结果中URL的主题分布预测查询主题,策略2基于查询日志点击关系,利用具有主题标注的URL,对查询进行标注获取数据并训练统计分类器预测查询主题.实验表明,方法可获得比当前最好算法更好的准确率,更好的在线处理效率并且可基于查询日志自动获取训练数据,具有良好的可扩展性.  相似文献   

7.
信息检索的效果很大程度上取决于用户能否输入恰当的查询来描述自身信息需求。很多查询通常简短而模糊,甚至包含噪音。查询推荐技术可以帮助用户提炼查询、准确描述信息需求。为了获得高质量的查询推荐,在大规模“查询-链接”二部图上采用随机漫步方法产生候选集合。利用摘要点击信息对候选列表进行重排序,使得体现用户意图的查询排在比较高的位置。最终采用基于学习的算法对推荐查询中可能存在的噪声进行过滤。基于真实用户行为数据的实验表明该方法取得了较好的效果。  相似文献   

8.
基于用户搜索行为的query-doc关联挖掘   总被引:1,自引:0,他引:1  
朱亮  陆静雅  左万利 《自动化学报》2014,40(8):1654-1666
query和doc之间的关联关系是搜索引擎期望获取的一类有价值的信息. query和doc间准确的关联分析不仅可以帮助搜索结果排序,也在query和doc之间的桥接中起到重要作用,以实现相关query和doc之间的信息传递,有利于更深入的query理解和doc理解,并在此基础上开展相关应用.本文提出了一种基于用户搜索行为的query和doc关联关系挖掘算法,该方法首先对用户搜索点击日志中的数据进行整理与分析,构建query与doc间的二部图,再通过采用马尔可夫随机游走模型对二部图数据进行建模,挖掘二部图中的点击数据和session数据,最终挖掘出点击日志中用户没有点击到的doc数据,从而预测出query和doc间的隐含关联关系,同时也可以利用该算法得到query和query潜在的关联关系.基于以上理论基础,我们实现了一套完整的日志挖掘系统,通过大量的实验对比,该系统在各方面均取得了优异的表现,其中对检索结果相关性的性能提升可以达到71.23%,这充分表明,本文所提出的理论和算法能够很好地解决query和doc之间的隐含关系挖掘问题,为提高搜索结果的召回率、实现查询推荐和检索结果聚类奠定了良好的前提基础.  相似文献   

9.
In order to understand user intents behind their queries, many researchers study similar query finding. Recently, the click graph has shown its utility in describing the relationship between queries and URLs. The previous approaches mainly either generate related terms or find relevant queries based on the co-clicked URLs. However, these approaches may suffer from the complexity of natural language processing and click-through data sparseness. In this paper, we tackle this problem through three query probability distribution representation models: Click Model, Term Model, and Semantic Model. The Click Model extracts credible transition probability from queries to URLs, and describes a query without considering web contents. The Term Model focuses on representing a query via term distribution over its main entities and purposes, which can better capture information needs behind short and ambiguous keyword queries. The Semantic Model learns potential intent distribution of queries to distinguish user intents behind a query. Among the three models, we apply pairwise similarity metrics and graph-based personalized pagerank to find similar queries. Compared to traditional representation models, our representation models are verified to be effective and efficient, especially for long tail queries.  相似文献   

10.
Query suggestions help users refine their queries after they input an initial query.Previous work on query suggestion has mainly concentrated on approaches that are similarity-based or context-based,developing models that either focus on adapting to a specific user(personalization)or on diversifying query aspects in order to maximize the probability of the user being satisfied(diversification).We consider the task of generating query suggestions that are both personalized and diversified.We propose a personalized query suggestion diversification(PQSD)model,where a user's long-term search behavior is injected into a basic greedy query suggestion diversification model that considers a user's search context in their current session.Query aspects are identified through clicked documents based on the open directory project(ODP)with a latent dirichlet allocation(LDA)topic model.We quantify the improvement of our proposed PQSD model against a state-of-the-art baseline using the public america online(AOL)query log and show that it beats the baseline in terms of metrics used in query suggestion ranking and diversification.The experimental results show that PQSD achieves its best performance when only queries with clicked documents are taken as search context rather than all queries,especially when more query suggestions are returned in the list.  相似文献   

11.
查询会话检测的目的是确定用户为了满足某个特定需求而连续提交的相关查询。查询会话检测对于查询日志分析以及用户行为分析来说是非常有用的。传统的查询会话检测方法大都基于查询词的比较,无法解决词语不匹配问题(vocabulary-mismatch problem)——有些主题相关的查询之间并没有相同的词语。为了解决词语不匹配问题,我们在该文提出了一种基于翻译模型的查询会话检测方法,该方法将词与词之间的关系刻画为词与词之间的翻译概率,这样即使词与词之间没有相同的词语,我们也可以捕捉到它们之间的语义关系。同时,我们也提出了两种从查询日志中估计词翻译概率的方法,第一种方法基于查询的时间间隔,第二种方法基于查询的点击URLs。实验结果证明了该方法的有效性。  相似文献   

12.
13.
Web search engine: Characteristics of user behaviors and their implication   总被引:5,自引:0,他引:5  
In this paper, first studied are the distribution characteristics of user behaviors based on log data from a massive web search engine. Analysis shows that stochastic distribution of user queries accords with the characteristics of power-law function and exhibits strong similarity, and the user' s queries and clicked URLs present dramatic locality, which implies that query cache and 'hot click' cache can be employed to improve system performance. Then three typical cache replacement policies are compared, including LRU, FIFO, and LFU with attenuation. In addition, the distribution character-istics of web information are also analyzed, which demonstrates that the link popularity and replica pop-ularity of a URL have positive influence on its importance. Finally, variance between the link popularity and user popularity, and variance between replica popularity and user popularity are analyzed, which give us some important insight that helps us improve the ranking algorithms in a search engine.  相似文献   

14.
Seed URLs selection for focused Web crawler intends to guide related and valuable information that meets a user's personal information requirement and provide more effective information retrieval. In this paper, we propose a seed URLs selection approach based on user-interest ontology. In order to enrich semantic query, we first intend to apply Formal Concept Analysis to construct user-interest concept lattice with user log profile. By using concept lattice merger, we construct the user-interest ontology which can describe the implicit concepts and relationships between them more appropriately for semantic representation and query match. On the other hand, we make full use of the user-interest ontology for extracting the user interest topic area and expanding user queries to receive the most related pages as seed URLs, which is an entrance of the focused crawler. In particular, we focus on how to refine the user topic area using the bipartite directed graph. The experiment proves that the user-interest ontology can be achieved effectively by merging concept lattices and that our proposed approach can select high quality seed URLs collection and improve the average precision of focused Web crawler.  相似文献   

15.
查询推荐是搜索引擎系统中的一项重要技术,其通过推荐更合适的查询以提高用户的搜索体验。现有方法能够找到直接通过某种属性关联的相似查询,却忽略了具有间接关联的语义相关查询。该文将用户查询及查询间直接联系建模为查询关系图,并在图结构相似度算法SimRank的基础上提出了加权SimRank (简称WSimRank)用于查询推荐。WSimRank综合考虑了查询关系图的全局信息,因而能挖掘出查询间的间接关联和语义关系。然而,WSimRank复杂度太高而难以实用,该文将WSimRank转换为一个状态层次图的遍历和计算过程,进而采用动态规划、剪枝等策略对其进行优化从而可以实际应用。在大规模真实Web搜索日志上的实验表明, WSimRank在各项评价指标上均优于SimRank和传统查询推荐方法,其MAP指标接近0.9。  相似文献   

16.
dentifying ambiguous queries is crucial to research on personalized Web search and search result diversity. Intuitively, query logs contain valuable information on how many intentions users have when issuing a query. However, previous work showed user clicks alone are misleading in judging a query as being ambiguous or not. In this paper, we address the problem of learning a query ambiguity model by using search logs. First, we propose enriching a query by mining the documents clicked by users and the relevant follow up queries in a session. Second, we use a text classifier to map the documents and the queries into predefined categories. Third, we propose extracting features from the processed data. Finally, we apply a state-of-the-art algorithm, Support Vector Machine (SVM), to learn a query ambiguity classifier. Experimental results verify that the sole use of click based features or session based features perform worse than the previous work based on top retrieved documents. When we combine the two sets of features, our proposed approach achieves the best effectiveness, specifically 86% in terms of accuracy. It significantly improves the click based method by 5.6% and the session based method by 4.6%.  相似文献   

17.
In Web search, with the aid of related query recommendation, Web users can revise their initial queries in several serial rounds in pursuit of finding needed Web pages. In this paper, we address the Web search problem on aggregating search results of related queries to improve the retrieval quality. Given an initial query and the suggested related queries, our search system concurrently processes their search result lists from an existing search engine and then forms a single list aggregated by all the retrieved lists. We specifically propose a generic rank aggregation framework which consists of three steps. First we build a so-called Win/Loss graph of Web pages according to a competition rule, and then apply the random walk mechanism on the Win/Loss graph. Last we sort these Web pages by their ranks using a PageRank-like rank mechanism. The proposed framework considers not only the number of wins that an item won in competitions, but also the quality of its competitor items in calculating the ranking of Web page items. Experimental results show that our search system can clearly improve the retrieval quality in a parallel manner over the traditional search strategy that serially returns result lists. Moreover, we also provide empirical evidences as to demonstrate how different rank aggregation methods affect the retrieval quality.  相似文献   

18.
Scalability of a recommendation system is an important factor for large e-commerce sites containing millions of products visited by millions of users. Similarity search is the core operation in recommendation systems. In this paper, we explain a framework to alleviate performance bottleneck of similarity search for very large-scale recommendation systems. The framework employs inverted index for real-time similarity search and handles dynamic updates. As the inverted index gets larger, retrieving recommendations become computationally expensive. There are various works devoted to solve this problem, such as clustering and preprocessing to compute recommendations offline. Our solution is based on bipartite graph partitioning for minimizing the affinity between entities in different partitions. Number of operations in similarity search is reduced by executing search within the closest partitions to the query. Parts are balanced, so that computational loads of partitions are almost the same, which is significant for reducing the computational cost. Sequential experiments with several different recommendation approaches and large datasets consisting of millions of users and items validate the scalability of the proposed recommendation framework. Accuracy drops only by a small factor due to partitioning, if any. Even slight improvements in recommendation accuracy are observed in our collaborative filtering experiments.  相似文献   

19.
Searching for relevant code in the local code base is a common activity during software maintenance. However, previous research indicates that 88% of manually composed search queries retrieve no relevant results. One reason that many searches fail is existing search tools’ dependence on string matching algorithms, which cannot find semantically related code. To solve this problem by helping developers compose better queries, researchers have proposed numerous query recommendation techniques, relying on a variety of dictionaries and algorithms. However, few of these techniques are empirically evaluated by usage data from real-world developers. To fill this gap, we designed a multi-recommendation system that relies on the cooperation between several query recommendation techniques. We implemented and deployed this recommendation system within the Sando code search tool and conducted a longitudinal field study. Our study shows that over 34% of all queries were adopted from recommendation; and recommended queries retrieved results 11% more often than manual queries.  相似文献   

20.
Consider a family of sets and a single set, called the query set. How can one quickly find a member of the family which has a maximal intersection with the query set? Time constraints on the query and on a possible preprocessing of the set family make this problem challenging. Such maximal intersection queries arise in a wide range of applications, including web search, recommendation systems, and distributing on-line advertisements. In general, maximal intersection queries are computationally expensive. We investigate two well-motivated distributions over all families of sets and propose an algorithm for each of them. We show that with very high probability an almost optimal solution is found in time which is logarithmic in the size of the family. Moreover, we point out a threshold phenomenon on the probabilities of intersecting sets in each of our two input models which leads to the efficient algorithms mentioned above.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号