Similar articles
 Found 20 similar articles; search took 31 ms
1.
Query suggestions help users refine their queries after they input an initial query. Previous work on query suggestion has mainly concentrated on approaches that are similarity-based or context-based, developing models that either focus on adapting to a specific user (personalization) or on diversifying query aspects in order to maximize the probability of the user being satisfied (diversification). We consider the task of generating query suggestions that are both personalized and diversified. We propose a personalized query suggestion diversification (PQSD) model, where a user's long-term search behavior is injected into a basic greedy query suggestion diversification model that considers a user's search context in their current session. Query aspects are identified through clicked documents based on the Open Directory Project (ODP) with a latent Dirichlet allocation (LDA) topic model. We quantify the improvement of our proposed PQSD model against a state-of-the-art baseline using the public America Online (AOL) query log and show that it beats the baseline in terms of metrics used in query suggestion ranking and diversification. The experimental results show that PQSD achieves its best performance when only queries with clicked documents are taken as search context rather than all queries, especially when more query suggestions are returned in the list.

2.
One of the major challenges in Web search pertains to the correct interpretation of users’ intent. Query expansion is one of the well-known approaches for determining the intent of the user by addressing the vocabulary mismatch problem. A limitation of current query expansion approaches is that the relations between the query terms and the expanded terms are limited. In this paper, we capture users’ intent through query expansion. We build on earlier work in the area by adopting a pseudo-relevance feedback approach; however, we advance the state of the art by proposing an approach for feature learning within the process of query expansion. In our work, we specifically consider the Wikipedia corpus as the feedback collection space and identify the best features within this context for term selection in two models, one supervised and one unsupervised. We compare our work with state-of-the-art query expansion techniques, and the results show promising robustness and improved precision.

3.
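In microblogging systems, finding high-quality users to follow is a prerequisite for obtaining high-quality information. This paper studies the problem of discovering high-quality microblog users: given a domain-term query, the system returns a list of relevant users ranked by user quality. The problem is decomposed into two subproblems: retrieving domain-relevant users, and ranking microblog users. For user retrieval, we propose a user representation based on user tags and a Wikipedia-based query-user similarity matching method. As an extension of ESA (explicit semantic analysis), the method yields readily interpretable results, and experiments show that retrieval based on Wikipedia outperforms retrieval based on other resources. For user ranking, we propose UBRank, a graph-based iterative ranking method that considers both the number of messages a user posts and the authority of those messages when computing user quality, and that builds the graph only from messages containing URLs. Experiments confirm the efficiency and superiority of the method.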

4.
Query expansion is an information retrieval technique in which new query terms are selected to improve search performance. Although useful terms can be extracted from documents whose relevance is already known, it is difficult to get enough of such feedback from a user in actual use. We propose a query expansion method that performs well even if a user makes practically minimum effort, that is, chooses only a single relevant document. To improve searches in these conditions, we made two refinements to a well-known query expansion method. One uses transductive learning to obtain pseudorelevant documents, thereby increasing the total number of source documents from which expansion terms can be extracted. The other is a modified parameter estimation method that aggregates the predictions of multiple learning trials to sort candidate terms for expansion by importance. Experimental results show that our method outperforms traditional methods and is comparable to a state-of-the-art method.

5.
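Query expansion is an important technique for improving recall in information retrieval. This paper expands query terms by drawing on the respective strengths of Wikipedia and search engines, aiming to improve the precision of the top-N retrieval results. Wikipedia articles contain a large number of hyperlinks, and these hyperlinks contain entry terms closely related to the article topic; extracting these terms yields a Wikipedia-based expansion. Using query expansion based on search-engine pseudo-relevance feedback as the baseline, we evaluate a monolingual expansion system and a Chinese-English cross-language expansion system. The results show that, compared with the baseline, the method improves MAP by 6.41% in the monolingual system and Top10-precision by 10.90% in the cross-language system.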

6.
Query expansion by mining user logs   Cited by: 9 (self-citations: 0, citations by others: 9)
Queries to search engines on the Web are usually short. They do not provide sufficient information for an effective selection of relevant documents. Previous research has proposed the utilization of query expansion to deal with this problem. However, expansion terms are usually determined on term co-occurrences within documents. In this study, we propose a new method for query expansion based on user interactions recorded in user logs. The central idea is to extract correlations between query terms and document terms by analyzing user logs. These correlations are then used to select high-quality expansion terms for new queries. Compared to previous query expansion methods, ours takes advantage of the user judgments implied in user logs. The experimental results show that the log-based query expansion method can produce much better results than both the classical search method and the other query expansion methods.
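
The central idea of this abstract, correlating query terms with terms of clicked documents, can be illustrated with a minimal sketch. The abstract does not give the paper's actual estimation formula, so everything here is an assumption: the log is a hypothetical list of (query terms, clicked-document terms) pairs, the correlation is a plain conditional relative frequency, and the names `term_correlations` and `expand` are illustrative only.

```python
from collections import Counter, defaultdict


def term_correlations(click_log):
    """Estimate P(doc_term | query_term) as a relative frequency.

    click_log: iterable of (query_terms, clicked_doc_terms) pairs
    (a hypothetical, already-tokenized log format).
    """
    pair_counts = defaultdict(Counter)
    query_term_counts = Counter()
    for query_terms, doc_terms in click_log:
        for q in set(query_terms):
            query_term_counts[q] += 1
            for d in set(doc_terms):
                pair_counts[q][d] += 1
    return {
        q: {d: c / query_term_counts[q] for d, c in docs.items()}
        for q, docs in pair_counts.items()
    }


def expand(query_terms, correlations, k=2):
    """Return the k candidate terms with the highest summed correlation."""
    scores = Counter()
    for q in query_terms:
        for d, p in correlations.get(q, {}).items():
            if d not in query_terms:  # do not re-add original terms
                scores[d] += p
    return [term for term, _ in scores.most_common(k)]


# Toy log: each entry is one query and the terms of one clicked document.
log = [
    (["java"], ["programming", "language", "jvm"]),
    (["java"], ["programming", "compiler"]),
    (["java", "island"], ["indonesia", "travel"]),
]
corr = term_correlations(log)
print(expand(["java"], corr, k=1))
```

In this toy log, "programming" co-occurs with two of the three "java" queries, so it is the top expansion candidate. A real system would also need smoothing and a larger, noisier log.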

7.
We present Wiser, a new semantic search engine for expert finding in academia. Our system is unsupervised and it jointly combines classical language modeling techniques, based on textual evidence, with the Wikipedia Knowledge Graph, via entity linking. Wiser indexes each academic author through a novel profiling technique which models her expertise with a small, labeled and weighted graph drawn from Wikipedia. Nodes in this graph are the Wikipedia entities mentioned in the author’s publications, whereas the weighted edges express the semantic relatedness among these entities computed via textual and graph-based relatedness functions. Every node is also labeled with a relevance score which models the pertinence of the corresponding entity to the author’s expertise, and is computed by means of a proper random-walk calculation over that graph; and with a latent vector representation which is learned via entity and other kinds of structural embeddings derived from Wikipedia. At query time, experts are retrieved by combining classic document-centric approaches, which exploit the occurrences of query terms in the author’s documents, with a novel set of profile-centric scoring strategies, which compute the semantic relatedness between the author’s expertise and the query topic via the above graph-based profiles. The effectiveness of our system is established over a large-scale experimental test on a standard dataset for this task. We show that Wiser achieves better performance than all the other competitors, thus proving the effectiveness of modeling the author’s profile via our “semantic” graph of entities. Finally, we comment on the use of Wiser for indexing and profiling the whole research community within the University of Pisa, and its application to technology transfer in our University.

8.
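With the emergence of large-scale knowledge graphs and the need for enterprises to manage domain knowledge graphs efficiently, ad hoc entity retrieval in knowledge graphs has become a research hotspot. Given a knowledge graph and a user query, the goal of entity retrieval is to return a ranked list of entities from the knowledge graph. From a matching perspective, most traditional entity retrieval models map both the user query and the entities into a single word feature space. This has obvious drawbacks; for example, two words belonging to the same entity are treated as independent. We therefore propose mapping queries and entities into two feature spaces simultaneously, an entity space and a word space, which we call learning to rank with dual feature spaces. Entities are first abstracted into several fields. Ranking features are then extracted from the word space and the entity space separately and fed into a learning-to-rank algorithm. Experiments on a standard dataset show that the dual-feature-space entity ranking model significantly outperforms current state-of-the-art entity retrieval models.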

9.
10.
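Query ambiguity, as a subproblem of query classification, has received considerable attention in information retrieval. Existing research mainly classifies ambiguity in query content and overlooks ambiguity in the form of the user's query need. This paper studies ambiguity in query needs and proposes a corresponding query need classification model. We use a web directory to build a taxonomy of user need forms and a list of sites, and perform user click coverage detection on large-scale commercial search engine logs to obtain a description of the form of the query need. The contribution of this paper is a practical method for classifying query needs, with which a search engine can adjust its ranking according to differences in user needs and thereby improve search performance.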

11.
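Query expansion is an important information retrieval technique that starts from the user's query and, following some strategy, adds related expansion terms to the original query so that the query describes the user's information need more accurately. Learning to rank uses machine learning to construct ranking models for ordering data and is an active research topic at the intersection of machine learning and information retrieval. This paper applies pseudo-relevance feedback and introduces learning-to-rank algorithms into query expansion: features related to expansion terms are extracted from the document collection, a ranking model for expansion terms is trained, and the model is used to rerank the candidate expansion terms of a new query. The reranked expansion terms are weighted according to their ranking scores and added to the original query for a second retrieval pass, improving retrieval precision. Experiments on TREC collections show that introducing learning to rank improves the retrieval performance of pseudo-relevance feedback.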

12.
Identifying and interpreting user intent are fundamental to semantic search. In this paper, we investigate the association of intent with individual words of a search query. We propose that words in queries can be classified as either content or intent, where content words represent the central topic of the query, while users add intent words to make their requirements more explicit. We argue that intelligent processing of intent words can be vital to improving the result quality, and in this work we focus on intent word discovery and understanding. Our approach towards intent word detection is motivated by the hypothesis that query intent words satisfy certain distributional properties in large query logs similar to function words in natural language corpora. Following this idea, we first prove the effectiveness of our corpus distributional features, namely, word co-occurrence counts and entropies, towards function word detection for five natural languages. Next, we show that reliable detection of intent words in queries is possible using these same features computed from query logs. To make the distinction between content and intent words more tangible, we additionally provide operational definitions of content and intent words as those words that should match, and those that need not match, respectively, in the text of relevant documents. In addition to a standard evaluation against human annotations, we also provide an alternative validation of our ideas using clickthrough data. Concordance of the two orthogonal evaluation approaches provides further support for our original hypothesis of the existence of two distinct word classes in search queries. Finally, we provide a taxonomy of intent words derived through rigorous manual analysis of large query logs.

13.
Keyword queries have long been popular in search engines and in the information retrieval community, and have recently gained momentum in the expert systems community. The conventional semantics for processing a user query is to find a set of top-k web pages such that each page contains all user keywords. Recently, this semantics has been extended to find a set of cohesively interconnected pages, each of which contains one of the query keywords scattered across these pages. A keyword query with the extended semantics (i.e., more than a list of keywords hyperlinked with each other) is referred to as a graph query. In the case of a graph query, the query keywords may not all be present on a single Web page. Thus, a set of Web pages with the corresponding hyperlinks needs to be presented as the search result. Existing search systems exhibit serious performance problems because they fail to integrate information from multiple connected resources, so an efficient algorithm for keyword queries over graph-structured data is proposed. It integrates information from multiple connected nodes of the graph and generates result trees that contain all the query keywords. We also investigate a ranking measure called graph ranking score (GRS) to evaluate the relevant graph results so that the score can generate a scalar value for keywords as well as for the topology.

14.
Since engineering design is heavily informational, engineers want to retrieve existing engineering documents accurately during the product development process. However, engineers have difficulties searching for documents because of low retrieval accuracy. One of the reasons for this is the limitation of existing document ranking approaches, in which relationships between terms in documents are not considered when assessing the relevance of the retrieved documents. Therefore, we propose a new ranking approach that provides a more accurate evaluation of document relevance to a given query. Our approach exploits a domain ontology to consider relationships among terms in the relevance scoring process. Based on the domain ontology, the semantics of a document are represented by a graph (called a Document Semantic Network), and the proposed relation-based weighting schemes are then used to evaluate the graph and calculate the document relevance score. In our ranking approach, user interests and searching intent are also considered in order to provide personalized services. The experimental results show that the proposed approach outperforms existing ranking approaches. A precise representation of document semantics as a graph and multiple relation-based weighting schemes are important factors underlying the notable improvement.

15.
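林子雨, 邹权, 赖永炫, 林琛. 《软件学报》 (Journal of Software), 2014, 25(3): 528-546
Keyword queries help users quickly retrieve content of interest from a database without requiring them to master a specialized structured query language, which lowers the barrier to use. For keyword-based database queries, the data-graph approach is common: it converts the database into a data graph and then computes minimum Steiner trees on that graph. However, existing methods cannot dynamically optimize query results as user query interests change. This paper uses ant colony optimization to solve the keyword query problem in databases and proposes a method, based on concept drift theory, for detecting abrupt changes in user query interest so that such changes are discovered promptly. On this basis, it proposes ACOKS*, an algorithm based on concept drift theory and ant colony optimization that dynamically optimizes query results according to the changed user interests so that they better match user expectations. Extensive experiments on a prototype system show that the method scales well and achieves better performance than existing methods.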

16.
Engineers create engineering documents with their own terminologies, and want to search existing engineering documents quickly and accurately during a product development process. Keyword-based search methods have been widely used due to their ease of use, but their search accuracy has often been problematic because of the semantic ambiguity of terminologies in engineering documents and queries. The semantic ambiguity can be alleviated by using a domain ontology. Also, if queries are expanded to incorporate the engineer’s personalized information needs, the accuracy of the search result would be improved. Therefore, we propose a framework to search engineering documents with less semantic ambiguity and more focus on each engineer’s personalized information needs. The framework includes four processes: (1) developing a domain ontology, (2) indexing engineering documents, (3) learning user profiles, and (4) performing personalized query expansion and retrieval. A domain ontology is developed based on product structure information and engineering documents. Using the domain ontology, terminologies in documents are disambiguated and indexed. Also, a user profile is generated from the domain ontology. Through user profile learning, the user’s interests are captured from the relevant documents. During the personalized query expansion process, the learned user profile is used to reflect the user’s interests. Simultaneously, the user’s searching intent, which is implicitly inferred from the user’s task context, is also considered. To retrieve relevant documents, an expanded query in which both the user’s interests and intents are reflected is then matched against the document collection. The experimental results show that the proposed approach can substantially outperform both the keyword-based approach and the existing query expansion method in retrieving engineering documents. Reflecting a user’s information needs precisely has been identified as the most important factor underlying this notable improvement.

17.
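To address the shortcoming that traditional search engines are "retrieval-oriented" rather than "user-oriented", this work introduces personalized services into enterprise search engine ranking. It studies the key enabling technique, user interest modeling, applies the model to query expansion and ranking, and designs a personalized ranking method based on user interests for enterprise search engines, which can return different result lists to different users for the same search request. Experiments applying the work to an oilfield enterprise search engine show that it effectively improves retrieval precision, satisfies users' personalized retrieval needs, and adapts well.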

18.
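Current mainstream web search engines offer poor personalization and low retrieval precision. Building on technical improvements to existing methods, this paper presents a personalized query expansion method that combines the user's historically interesting web pages with historical query terms. When a user enters a query into the search engine, the method uses the learned model of the current user's interests to dynamically determine the user's latent interests and compute term relatedness, and submits appropriate expanded query phrases to the search engine, so that different users entering the same query receive different results. Experiments verify the effectiveness of the algorithm, and retrieval precision improves markedly over the original method.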

19.
Search engine users often encounter the difficulty of phrasing the precise query that could lead to satisfactory search results. Query recommendation is considered an effective assistant in enhancing keyword-based queries in search engines and Web search software. In this paper, we present a Query-URL Bipartite based query reCommendation approach, called QUBiC. It utilizes the connectivity of a query-URL bipartite graph to recommend related queries and can significantly improve the accuracy and effectiveness of personalized query recommendation systems compared with the conventional pairwise similarity based approach. The main contribution of the QUBiC approach is its three-phase framework for personalized query recommendations. The first phase is the preparation of queries and their search results returned by a search engine, which generates a historical query-URL bipartite collection. The second phase is the discovery of similar queries by extracting a query affinity graph from the bipartite graph, instead of operating on the original bipartite graph directly using a biclique-based approach or graph clustering. The query affinity graph consists of only queries as its vertices, and its edges are weighted according to a query-URL vector based similarity (dissimilarity) measure. The third phase is the ranking of similar queries. We devise a novel rank mechanism for ordering the related queries based on the merging distances of a hierarchical agglomerative clustering (HAC). By utilizing the query affinity graph and the HAC-based ranking, we are able to capture the propagation of similarity from query to query by inducing an implicit topical relatedness between queries. Furthermore, the flexibility of the HAC strategy makes it possible for users to interactively participate in the query recommendation process, and helps to bridge the gap between the determinacy of actual similarity values and the indeterminacy of users’ information needs, allowing the lists of related queries to be changed from user to user and query to query, thus adaptively recommending related queries on demand. Our experimental evaluation results show that the QUBiC approach is highly efficient and more effective than conventional query recommendation systems, with a maximum improvement in precision of about 13.3%.
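
The second phase described in this abstract, deriving a query affinity graph from a query-URL bipartite collection, might look roughly like the sketch below. The abstract only says the edge weights come from a "query-URL vector based similarity (dissimilarity) measure"; cosine similarity is used here as an assumed stand-in, the function name `affinity_graph` and the URLs are hypothetical, and the HAC-based ranking of the third phase is not covered.

```python
import math
from collections import Counter
from itertools import combinations


def affinity_graph(query_urls):
    """Build weighted query-query edges from a query -> URL list mapping.

    Each query is represented as a bag-of-URLs vector; an edge is created
    only when two queries share at least one URL. Cosine similarity is an
    assumption, not necessarily the measure used by QUBiC.
    """
    vectors = {q: Counter(urls) for q, urls in query_urls.items()}
    norms = {
        q: math.sqrt(sum(c * c for c in vec.values()))
        for q, vec in vectors.items()
    }
    edges = {}
    for a, b in combinations(sorted(vectors), 2):
        dot = sum(c * vectors[b][u] for u, c in vectors[a].items() if u in vectors[b])
        if dot:
            edges[(a, b)] = dot / (norms[a] * norms[b])
    return edges


# Hypothetical bipartite collection: query -> URLs returned for it.
collection = {
    "python tutorial": ["example.com/tut", "example.org/learn"],
    "learn python": ["example.org/learn", "example.com/tut", "example.net/video"],
    "python snake": ["example.edu/zoo"],
}
edges = affinity_graph(collection)
for pair, weight in edges.items():
    print(pair, round(weight, 3))
```

Here only "learn python" and "python tutorial" share URLs, so the graph has a single edge; "python snake" stays isolated even though it shares a keyword with the other two, which is the point of using URL overlap rather than term overlap.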

20.
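Query structure analysis based on pointwise mutual information   Cited by: 1 (self-citations: 0, citations by others: 1)
In web search engines, effective analysis of the structure of user queries leads to a better understanding of query intent and improves retrieval effectiveness. This paper proposes a simple and efficient query structure analysis method based on pointwise mutual information, comprising a MapReduce-based offline training algorithm and a bottom-up online algorithm for constructing query trees. Experiments show that the method segments queries very quickly and achieves competitive segmentation quality. The method also clearly improves retrieval performance, with gains on the MAP, p@5, and p@10 metrics.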
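
A bottom-up query tree built from pointwise mutual information, PMI(a, b) = log(p(a, b) / (p(a) p(b))), could be sketched as follows. This is not the paper's algorithm: the MapReduce training step is replaced by in-memory counting, the rule of scoring two adjacent subtrees by the PMI of their boundary words is an assumption, and all function names are illustrative.

```python
import math
from collections import Counter


def train(queries):
    """Count unigrams and adjacent bigrams in tokenized queries."""
    unigrams, bigrams = Counter(), Counter()
    for q in queries:
        unigrams.update(q)
        bigrams.update(zip(q, q[1:]))
    return unigrams, bigrams


def pmi(a, b, unigrams, bigrams):
    """PMI of the adjacent pair (a, b); -inf if the pair was never seen."""
    if bigrams[(a, b)] == 0:
        return float("-inf")
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    p_ab = bigrams[(a, b)] / n_bi
    return math.log(p_ab / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))


def first_word(node):
    return node if isinstance(node, str) else first_word(node[0])


def last_word(node):
    return node if isinstance(node, str) else last_word(node[-1])


def build_tree(query, unigrams, bigrams):
    """Repeatedly merge the adjacent pair of nodes with the highest PMI.

    Assumption: two subtrees are scored by the PMI of the words that meet
    at their boundary. Returns a nested tuple of the query words.
    """
    nodes = list(query)
    while len(nodes) > 1:
        scores = [
            pmi(last_word(nodes[i]), first_word(nodes[i + 1]), unigrams, bigrams)
            for i in range(len(nodes) - 1)
        ]
        i = max(range(len(scores)), key=scores.__getitem__)
        nodes[i:i + 2] = [(nodes[i], nodes[i + 1])]
    return nodes[0]


# Toy query log standing in for the offline training data.
training = [
    ["new", "york", "hotel"],
    ["new", "york", "times"],
    ["cheap", "hotel"],
    ["new", "york"],
]
uni, bi = train(training)
print(build_tree(["new", "york", "hotel"], uni, bi))
```

On this toy log "new york" has the highest PMI, so it is merged first and "hotel" attaches to the resulting node. Cutting the tree at a PMI threshold would turn it into a flat segmentation.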
