首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
针对搜索引擎查询结果缓存与预取问题,该文提出了一种基于查询特性的搜索引擎查询结果缓存与预取方法,该方法包括用来指导预取的查询结果页码预测模型和缓存与预取算法框架,用于提高搜索引擎系统性能。通过对国内某著名中文商业搜索引擎的某段时间的用户查询日志分析得出,用户对不同查询返回的查询结果所浏览的页数具有显著的非均衡性,结合该特性设计查询结果页码预测模型来进行预取和分区缓存。在该搜索引擎两个月的大规模真实用户查询日志上的实验结果表明,与传统的方法相比,该方法可以获得3.5%~8.45%的缓存命中率提升。  相似文献   

2.
正确识别搜索引擎日志中的短语,对搜索引擎用短语词典构建和提高搜索引擎性能具有重要的作用。该文提出一种应用条件随机场实现对搜狗日志语料中“N+V”和“N1+N2+V”型短语自动识别的方法。模型的特征集包含词、词性和词语长度。由人工设计候选特征集,从中选择有效的特征构成特征模板,训练生成用于短语自动识别的条件随机场模型。封闭测试和开放测试的实验结果表明,模型能够实现对这两种短语的有效识别。  相似文献   

3.
互联网上很多资源蕴含人类群体智慧.分类网站目录人工地对网站按照主题进行组织.基于网站目录中具有主题标注的URL设计URL主题分类器,结合伪相关反馈技术以及搜索引擎查询日志,提出了自动、快速、有效的查询主题分类方法.具体地,方法为2种策略的结合.策略1通过计算搜索结果中URL的主题分布预测查询主题,策略2基于查询日志点击关系,利用具有主题标注的URL,对查询进行标注获取数据并训练统计分类器预测查询主题.实验表明,方法可获得比当前最好算法更好的准确率,更好的在线处理效率并且可基于查询日志自动获取训练数据,具有良好的可扩展性.  相似文献   

4.
Caching query results is one efficient approach to improving the performance of XML management systems. This entails the discovery of frequent XML queries issued by users. In this paper, we model user queries as a stream of XML query pattern trees and mine the frequent query patterns over the query stream. To facilitate the one-pass mining process, we devise a novel data structure called DTS to summarize the pattern trees seen so far. By grouping the incoming pattern trees into batches, we can dynamically mark the active portion of the current batch in DTS and limit the enumeration of candidate trees to only the currently active pattern trees. We also design another summary data structure called ECTree that provides for the incremental computation of the frequent tree patterns over the query stream. Based on the above two constructs, we present two mining algorithms called XQSMinerI and XQSMinerII. XQSMinerI is fast, but it tends to overestimate, while XQSMinerII adopts a filter-and-refine approach to minimize the amount of overestimation. Experimental results show that the proposed methods are both efficient and scalable and require only small memory footprints.Received: 17 October 2003, Accepted: 16 April 2004, Published online: 14 September 2004Edited by: J. Gehrke and J. Hellerstein.  相似文献   

5.
针对搜索引擎查询结果缓存与预取问题,与传统的基于查询特性相关的方法不同,提出了一种基于用户特性的缓存与预取方法,用于提高搜索引擎系统性能,尤其针对部分用户效果更显著。通过对国内某著名商业搜索引擎用户的查询贡献分析得出,用户对搜索引擎的贡献具有长尾分布特性,结合该特性设计查询结果预测模型来进行预取和分区缓存。在该搜索引擎两个月的大规模真实用户查询日志上的实验结果表明,与传统的基于查询特性的典型方法相比,该方法可以获得3.03%~4.17%的命中率提升,对于查询贡献最大的0.25%的用户群体,可以获得20.52%~28.2%的命中率提升。  相似文献   

6.
汪晴  庄卫华 《计算机工程》2010,36(21):78-80
基于TF-IQF模型的建议方法不考虑用户查询行为的上下文,在满足用户个性化需求方面存在缺陷。针对这一情况,在该方法的基础上进行优化改进,根据不同用户的查询上下文来分析用户的查询偏好,重新排序系统推荐的查询。实验结果表明,改进方法能够给出个性化的查询建议,提高用户查询的满意度。  相似文献   

7.
The conventional approaches of finding related search engine queries rely on the common terms shared by two queries to measure their relatedness. However, search engine queries are usually short and the term overlap between two queries is very small. Using query terms as a feature space cannot accurately estimate relatedness. Alternative feature spaces are needed to enrich the term based search queries. In this paper, given a search query, first we extract the Web pages accessed by users from Japanese Web access logs which store the users individual and collective behavior. From these accessed Web pages we usually can get two kinds of feature spaces, i.e, content-sensitive (e.g., nouns) and content-ignorant (e.g., URLs), to enrich the expressions of search queries. Then, the relatedness between search queries can be estimated on their enriched expressions. Our experimental results show that the URL feature space produces much lower precision scores than the noun feature space which, however, is not applicable in non-text pages, dynamic pages and so on. It is crucial to improve the quality of the URL (content-ignorant) feature space since it is generally available in all types of Web pages. We propose a novel content-ignorant feature space, called Web community which is created from a Japanese Web page archive by exploiting link analysis. Experimental results show that the proposed Web community feature space generates much better results than the URL feature space.  相似文献   

8.
近年来,随着网络数据挖掘技术的迅猛发展,如何从搜索引擎查询日志中找到有用的信息成为一个重要的研究方向.首先详细讨论了Beeferman提出的针对搜索引擎查询日志的凝聚式聚类算法以及噪声数据对该算法的影响,指出了Chan的改进算法中的一个错误,最后提出一个新的改进算法,并且通过模拟实验对几种不同的算法进行了对比.  相似文献   

9.
One of the useful tools offered by existing web search engines is query suggestion (QS), which assists users in formulating keyword queries by suggesting keywords that are unfamiliar to users, offering alternative queries that deviate from the original ones, and even correcting spelling errors. The design goal of QS is to enrich the web search experience of users and avoid the frustrating process of choosing controlled keywords to specify their special information needs, which releases their burden on creating web queries. Unfortunately, the algorithms or design methodologies of the QS module developed by Google, the most popular web search engine these days, is not made publicly available, which means that they cannot be duplicated by software developers to build the tool for specifically-design software systems for enterprise search, desktop search, or vertical search, to name a few. Keyword suggested by Yahoo! and Bing, another two well-known web search engines, however, are mostly popular currently-searched words, which might not meet the specific information needs of the users. These problems can be solved by WebQS, our proposed web QS approach, which provides the same mechanism offered by Google, Yahoo!, and Bing to support users in formulating keyword queries that improve the precision and recall of search results. WebQS relies on frequency of occurrence, keyword similarity measures, and modification patterns of queries in user query logs, which capture information on millions of searches conducted by millions of users, to suggest useful queries/query keywords during the user query construction process and achieve the design goal of QS. Experimental results show that WebQS performs as well as Yahoo! and Bing in terms of effectiveness and efficiency and is comparable to Google in terms of query suggestion time.  相似文献   

10.
王兵  ;刘彩虹 《微机发展》2008,(7):176-180
随着Internet信息的迅速增长,许多Web信息已经被各种各样的可搜索在线数据库所深化,并被隐藏在Web查询接口下面。传统的搜索引擎由于技术原因不能索引这些信息——DeepWeb信息。由于DeepWeb惟一“入口点”是查询接口,为使查询接口自动产生有意义有查询,给出了DeepWeb信息集成系统框架,提出了基于数据类型的搜索驱动的用户查询转换方法,基于此设计并实现了一个针对中文DeepWeb信息集成原型系统。通过在实际DeepWeb站点上的实验证明了此方法是非常有效的。  相似文献   

11.
随着互联网海量信息的不断涌现,根据用户的兴趣提供相关查询结果,是现有搜索引擎要考虑的一个问题,PageRank算法是基于链接的排序算法,已在Google搜索引擎广泛应用,但其忽略了用户个性化需求。采用网页预分类技术,来表示用户查询的兴趣度,进一步提出改进传统的PageRank算法,从而能适当提高用户在使用搜索引擎方面的个性化需求。  相似文献   

12.
dentifying ambiguous queries is crucial to research on personalized Web search and search result diversity. Intuitively, query logs contain valuable information on how many intentions users have when issuing a query. However, previous work showed user clicks alone are misleading in judging a query as being ambiguous or not. In this paper, we address the problem of learning a query ambiguity model by using search logs. First, we propose enriching a query by mining the documents clicked by users and the relevant follow up queries in a session. Second, we use a text classifier to map the documents and the queries into predefined categories. Third, we propose extracting features from the processed data. Finally, we apply a state-of-the-art algorithm, Support Vector Machine (SVM), to learn a query ambiguity classifier. Experimental results verify that the sole use of click based features or session based features perform worse than the previous work based on top retrieved documents. When we combine the two sets of features, our proposed approach achieves the best effectiveness, specifically 86% in terms of accuracy. It significantly improves the click based method by 5.6% and the session based method by 4.6%.  相似文献   

13.
识别搜索引擎用户的查询意图在信息检索领域是备受关注的研究内容。文中提出一种融合多类特征识别Web查询意图的方法。将Web查询意图识别作为一个分类问题,并从不同类型的资源包括查询文本、搜索引擎返回内容及Web查询日志中抽取出有效的分类特征。在人工标注的真实Web查询语料上采用文中方法进行查询意图识别实验,实验结果显示文中采用的各类特征对于提高查询意图识别的效果皆有一定帮助,综合使用这些特征进行查询意图识别,88。5%的测试查询获得准确的意图识别结果。  相似文献   

14.
查询建议可以有效减少用户输入、消除查询歧义,提高信息检索的便捷性和准确率。随着电子商务的发展,查询建议也越来越多地应用于电子商务网站的商品搜索中。然而,传统的基于Web搜索的查询建议方法在电商领域并不能完全适用。针对电商这一特定领域,对不同的查询建议技术进行比较,提出了一种综合考虑用户的搜索以及购物行为的查询建议方法,运用MapReduce技术对用户日志进行挖掘,以此生成检索词词库;并通过在线计算与离线计算结合的方法,为用户提供实时查询建议。实验结果表明,本文提出的基于日志挖掘的电商查询建议方法能有效提高查询建议的准确率,并且具有良好的处理性能。  相似文献   

15.
查询扩展可以有效地消除查询歧义,提高信息检索的准确率和召回率.通过挖掘用户日志中查询词和相关文档的连接关系,构造关联查询,并在此基础上提出一种从关联查询中提取查询扩展词的查询扩展方法.同时,还提出一种查询歧义的判别方法,该方法可以对查询词所表达的检索意图的模糊程度进行有效度量,也可以对查询词的检索性能进行预先估计.通过对查询歧义的度量来动态调整扩展词的长度,提高查询扩展模型的灵活性和适应能力.  相似文献   

16.
17.
Keyword-based Web search is a widely used approach for locating information on the Web. However, Web users usually suffer from the difficulties of organizing and formulating appropriate input queries due to the lack of sufficient domain knowledge, which greatly affects the search performance. An effective tool to meet the information needs of a search engine user is to suggest Web queries that are topically related to their initial inquiry. Accurately computing query-to-query similarity scores is a key to improve the quality of these suggestions. Because of the short lengths of queries, traditional pseudo-relevance or implicit-relevance based approaches expand the expression of the queries for the similarity computation. They explicitly use a search engine as a complementary source and directly extract additional features (such as terms or URLs) from the top-listed or clicked search results. In this paper, we propose a novel approach by utilizing the hidden topic as an expandable feature. This has two steps. In the offline model-learning step, a hidden topic model is trained, and for each candidate query, its posterior distribution over the hidden topic space is determined to re-express the query instead of the lexical expression. In the online query suggestion step, after inferring the topic distribution for an input query in a similar way, we then calculate the similarity between candidate queries and the input query in terms of their corresponding topic distributions; and produce a suggestion list of candidate queries based on the similarity scores. Our experimental results on two real data sets show that the hidden topic based suggestion is much more efficient than the traditional term or URL based approach, and is effective in finding topically related queries for suggestion.  相似文献   

18.
王继民  龚笔宏  孟涛 《计算机工程》2006,32(14):25-26,6
用户在使用Web搜索引擎进行信息查询时,可能包含单个或多个主题。该文针对大规模中文搜索引擎系统——北大天网的多任务Web查询,进行了研究和分析。结果显示:多于1/3的用户进行多任务Web查询;超过1/2的多任务会话包含2个不同的主题并进行2~7次查询;多任务会话时间的均值是一般会话时间均值的2倍;天网用户的多任务查询主要有3个主题:计算机,娱乐和教育;近1/4的多任务会话中包含不确定的信息。该文用关联分析的方法发现了用户查询主题之间的一些关系。  相似文献   

19.
Although personalized search has been under way for many years and many personalization algorithms have been investigated, it is still unclear whether personalization is consistently effective on different queries for different users and under different search contexts. In this paper, we study this problem and provide some findings. We present a large-scale evaluation framework for personalized search based on query logs and then evaluate five personalized search algorithms (including two click-based ones and three topical-interest-based ones) using 12-day query logs of Windows Live Search. By analyzing the results, we reveal that personalized Web search does not work equally well under various situations. It represents a significant improvement over generic Web search for some queries, while it has little effect and even harms query performance under some situations. We propose click entropy as a simple measurement on whether a query should be personalized. We further propose several features to automatically predict when a query will benefit from a specific personalization algorithm. Experimental results show that using a personalization algorithm for queries selected by our prediction model is better than using it simply for all queries.  相似文献   

20.
Mining of music data is one of the most important problems in multimedia data mining. In this paper, two research issues of mining music data, i.e., online mining of music query streams and change detection of music query streams, are discussed. First, we proposed an efficient online algorithm, FTP-stream (Frequent Temporal Pattern mining of streams), to mine all frequent melody structures over sliding windows of music melody sequence streams. An effective bit-sequence representation is used in the proposed algorithm to reduce the time and memory needed to slide the windows. An effective list structure is developed in the FTP-stream algorithm to overcome the performance bottleneck of 2-candidate generation. Experiments show that the proposed algorithm FTP-stream only needs a half of memory requirement of original melody sequence data, and just scans the music query stream once. After mining frequent melody structures, we developed a simple online algorithm, MQS-change (changes of Music Query Streams), to detect the changes of frequent melody structures in current user-centered music query streams. Two music melody structures (set of chord-sets and string of chord-sets) are maintained and four melody structure changes (positive burst, negative burst, increasing change and decreasing change) are monitored in a new summary data structure, MSC-list (a list of Music Structure Changes). Experiments show that the MQS-change algorithm is an effective online method to detect the changes of music melody structures over continuous music query streams.
Hua-Fu LiEmail:
  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号