Similar Documents
20 similar documents found.
1.
One of the key difficulties for users in information retrieval is formulating appropriate queries to submit to the search engine. In this paper, we propose an approach that enriches the user's queries with additional context. We used a language model to build the query context, which is composed of the queries most similar to the query to be expanded and their top-ranked documents. Then, we applied a query expansion approach based on the query context and the Latent Semantic Analysis method. Using a Web test collection, we tested our approach on short and long queries. We varied the number of recommended queries and the number of expansion terms to determine appropriate parameters for the proposed approach. Experimental results show that the proposed approach improves the effectiveness of the information retrieval system by 19.23% for short queries and 52.94% for long queries, compared with retrieval using the original user queries.

2.
Keyword-based Web search is a widely used approach for locating information on the Web. However, Web users often have difficulty organizing and formulating appropriate queries because they lack sufficient domain knowledge, and this greatly affects search performance. An effective way to meet the information needs of a search engine user is to suggest Web queries that are topically related to the initial query. Accurately computing query-to-query similarity scores is key to improving the quality of these suggestions. Because queries are short, traditional pseudo-relevance and implicit-relevance approaches expand the representation of the queries before computing similarity. They explicitly use a search engine as a complementary source and directly extract additional features (such as terms or URLs) from the top-ranked or clicked search results. In this paper, we propose a novel approach that uses hidden topics as expansion features. It has two steps. In the offline model-learning step, a hidden topic model is trained. For each candidate query, its posterior distribution over the hidden topic space is computed and used to represent the query in place of its lexical form. In the online query suggestion step, we first infer the topic distribution of the input query in the same way. We then calculate the similarity between each candidate query and the input query in terms of their topic distributions, and produce a suggestion list of candidate queries based on the similarity scores. Our experimental results on two real data sets show that hidden-topic-based suggestion is much more efficient than traditional term- or URL-based approaches, and that it is effective in finding topically related queries for suggestion.
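The online step compares queries by their topic distributions rather than their words. The abstract does not name the similarity measure, so the sketch below assumes cosine similarity. It also assumes that the topic distributions (toy three-topic vectors here) were already inferred offline by some topic model. `suggest`, `cosine` and the parameter `k` are illustrative names, not the authors' code.

```python
import math

def cosine(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def suggest(input_topics, candidates, k=3):
    """Return the k candidate queries whose topic distributions are most
    similar to the input query's distribution.

    candidates: dict mapping a candidate query to its topic distribution,
    assumed to have been inferred offline by some hidden topic model.
    """
    scored = [(cosine(input_topics, dist), query) for query, dist in candidates.items()]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [query for _, query in scored[:k]]

# Toy three-topic distributions (illustrative numbers only).
candidates = {
    "cheap flights": [0.80, 0.10, 0.10],
    "airline tickets": [0.70, 0.20, 0.10],
    "python tutorial": [0.05, 0.05, 0.90],
}
print(suggest([0.75, 0.15, 0.10], candidates, k=2))
```

"cheap flights" and "airline tickets" share no words, yet both rank above "python tutorial" because their topic vectors are close. This is the kind of match that lexical features miss.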

3.
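Query session detection aims to identify the related queries that a user submits consecutively to satisfy a particular information need. It is very useful for query log analysis and user behavior analysis. Most traditional query session detection methods compare query terms and therefore cannot handle the vocabulary-mismatch problem, in which topically related queries share no terms. To address this problem, we propose a query session detection method based on a translation model. The method characterizes the relationship between words as word-to-word translation probabilities, so that semantic relationships between queries can be captured even when they share no words. We also propose two methods for estimating word translation probabilities from query logs. The first is based on the time interval between queries, and the second is based on the URLs clicked for the queries. Experimental results demonstrate the effectiveness of the proposed method.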
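The core idea is to score two queries through word translation probabilities instead of exact term overlap. A minimal sketch follows, assuming an IBM Model 1-style score in which each word of the second query is averaged over translations from every word of the first. The table `t`, the `self_prob` default, the `floor`, and the threshold rule in `same_session` are all hypothetical choices, not the paper's estimates.

```python
def translation_score(q1, q2, t, self_prob=0.5, floor=1e-6):
    """IBM Model 1-style probability that query q2 'translates' query q1.

    t: dict mapping (target_word, source_word) -> translation probability,
       e.g. estimated from query logs (time intervals or clicked URLs).
    self_prob: assumed probability of a word translating to itself when the
       pair is missing from t.
    floor: lower bound that keeps an unseen word from zeroing the product.
    """
    src = q1.split()
    score = 1.0
    for w in q2.split():
        p = sum(t.get((w, v), self_prob if w == v else 0.0) for v in src) / len(src)
        score *= max(p, floor)
    return score

def same_session(q1, q2, t, threshold=0.05):
    """Hypothetical decision rule: length-normalized score above a threshold."""
    n = len(q2.split())
    return translation_score(q1, q2, t) ** (1.0 / n) >= threshold

t = {("laptop", "notebook"): 0.3, ("notebook", "laptop"): 0.3}
print(same_session("cheap notebook", "cheap laptop", t))      # True
print(same_session("cheap notebook", "weather forecast", t))  # False
```

"cheap notebook" and "cheap laptop" are grouped together even though "laptop" never appears in the first query. A pure term-overlap method would credit only the shared word "cheap".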

4.
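Identifying the query intent of search engine users is a research topic that has attracted considerable attention in information retrieval. This paper proposes a method that combines multiple types of features to identify Web query intent. Web query intent identification is treated as a classification problem. Effective classification features are extracted from different types of resources, including the query text, the content returned by the search engine, and Web query logs. We ran query intent identification experiments with this method on a manually annotated corpus of real Web queries. The results show that each type of feature helps improve intent identification. When all of the features are used together, the intent of 88.5% of the test queries is identified correctly.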

5.
6.
Web search engine: Characteristics of user behaviors and their implication   Total citations: 5; self-citations: 0; citations by others: 5
In this paper, we first study the distribution characteristics of user behaviors based on log data from a massive Web search engine. The analysis shows that the distribution of user queries follows a power law and exhibits strong similarity. It also shows that users' queries and clicked URLs exhibit strong locality, which implies that a query cache and a 'hot click' cache can be used to improve system performance. We then compare three typical cache replacement policies: LRU, FIFO, and LFU with attenuation. We also analyze the distribution characteristics of Web information and show that the link popularity and replica popularity of a URL have a positive influence on its importance. Finally, we analyze the differences between link popularity and user popularity, and between replica popularity and user popularity. These differences provide important insights for improving the ranking algorithms of a search engine.
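The locality described above is what makes a query-result cache worthwhile. Of the three policies compared, LRU is the simplest to show. The sketch below is a generic LRU query cache built on `collections.OrderedDict`. It is not the paper's implementation, and the class name and capacity are illustrative.

```python
from collections import OrderedDict

class LRUQueryCache:
    """Query-result cache that evicts the least recently used query."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, query):
        if query in self._entries:
            self._entries.move_to_end(query)  # mark as most recently used
            self.hits += 1
            return self._entries[query]
        self.misses += 1
        return None

    def put(self, query, results):
        self._entries[query] = results
        self._entries.move_to_end(query)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

cache = LRUQueryCache(capacity=2)
cache.put("weather", ["url1"])
cache.put("news", ["url2"])
cache.get("weather")            # "weather" becomes most recently used
cache.put("sports", ["url3"])   # evicts "news"
print(cache.get("news"))        # None
```

Removing the `move_to_end` call in `get` makes eviction follow insertion order, which gives FIFO-like behavior. LFU with attenuation would instead keep a per-query frequency count that decays over time.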

7.
Seed URL selection for a focused Web crawler aims to steer crawling toward relevant, valuable information that meets a user's personal information needs, and to provide more effective information retrieval. In this paper, we propose a seed URL selection approach based on a user-interest ontology. To enrich semantic queries, we first apply Formal Concept Analysis to construct a user-interest concept lattice from the user's log profile. By merging concept lattices, we construct a user-interest ontology. This ontology describes implicit concepts and the relationships between them more appropriately for semantic representation and query matching. We then make full use of the user-interest ontology to extract the user's topic area of interest and to expand user queries. The most relevant pages retrieved in this way become the seed URLs, which serve as the entry points of the focused crawler. In particular, we focus on how to refine the user's topic area using a bipartite directed graph. Experiments show that the user-interest ontology can be built effectively by merging concept lattices. They also show that the proposed approach selects a high-quality set of seed URLs and improves the average precision of the focused Web crawler.

8.
9.
Recent efforts have enabled applications to query the entire Semantic Web. Such approaches are based either on a centralised store or on link traversal and URI dereferencing, as is often the case with Linked Open Data. These approaches make additional assumptions about the structure and/or location of data on the Web and are likely to limit the diversity of resulting usages. In this article, we propose a technique called Avalanche for querying the Semantic Web. It makes no prior assumptions about data location or distribution, schema alignment, pertinent statistics, data evolution, or accessibility of servers. Specifically, Avalanche finds up-to-date answers to queries over SPARQL endpoints. It first gathers online statistical information about potential data sources and their data distribution. Then, it plans and executes the query in a concurrent and distributed manner, trying to provide first answers quickly. We empirically evaluate Avalanche using the realistic FedBench data set over 26 servers. We also investigate its behaviour for varying degrees of instance-level distribution “messiness” using the LUBM synthetic data set spread over 100 servers. Results show that Avalanche is robust and stable despite varying network latency, finding first results for 80% of the queries in under 1 s. It also remains stable for some classes of queries as instance-level distribution messiness increases. We also illustrate how Avalanche addresses the other sources of messiness (pertinent data statistics, data evolution and data presence) by design. Finally, we show its robustness by removing endpoints during query execution.

10.
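The effectiveness of information retrieval depends largely on whether users can enter appropriate queries that describe their information needs. Many queries are short and ambiguous, and some even contain noise. Query recommendation techniques can help users refine their queries and describe their information needs accurately. To obtain high-quality query recommendations, a candidate set is generated by a random walk over a large-scale query-URL bipartite graph. The candidate list is then re-ranked using snippet click information, so that queries reflecting user intent are ranked higher. Finally, a learning-based algorithm filters out noise that may remain in the recommended queries. Experiments on real user behavior data show that the method performs well.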
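The abstract does not give the walk's details, and only the candidate-generation stage is sketched here. The sketch assumes that one step is a round trip from a query to a clicked URL and back to a query. At each hop, a neighbour is chosen with probability proportional to its click count. The function name and the toy click counts are hypothetical, and the snippet-click re-ranking and the learned noise filter are not shown.

```python
from collections import defaultdict

def random_walk_candidates(clicks, start, steps=1):
    """Rank candidate queries by random-walk probability on a query-URL graph.

    clicks: dict mapping (query, url) -> click count (the bipartite edges).
    One step is a round trip query -> url -> query; each hop picks a
    neighbour with probability proportional to its click count.
    """
    query_to_urls = defaultdict(dict)
    url_to_queries = defaultdict(dict)
    for (query, url), count in clicks.items():
        query_to_urls[query][url] = count
        url_to_queries[url][query] = count

    def hop(dist, edges):
        # Push probability mass from each node to its neighbours.
        out = defaultdict(float)
        for node, prob in dist.items():
            total = sum(edges[node].values())
            for neighbour, count in edges[node].items():
                out[neighbour] += prob * count / total
        return out

    dist = {start: 1.0}
    for _ in range(steps):
        dist = hop(hop(dist, query_to_urls), url_to_queries)
    return sorted(
        ((query, prob) for query, prob in dist.items() if query != start),
        key=lambda item: (-item[1], item[0]),
    )

clicks = {
    ("nba scores", "espn.com"): 3,
    ("nba scores", "nba.com"): 1,
    ("basketball news", "espn.com"): 2,
    ("nba tickets", "nba.com"): 4,
}
print(random_walk_candidates(clicks, "nba scores"))
```

With these counts, 75% of the mass from "nba scores" flows to espn.com, so "basketball news" (0.3) ranks ahead of "nba tickets" (0.2). Raising `steps` lets the walk reach queries that are further away in the graph.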

11.
Query reformulation, including query recommendation and query auto-completion, is a popular add-on feature of search engines that provides related and helpful reformulations of a keyword query. Because smartphone prices are falling and mobile network coverage and bandwidth are increasing, a large percentage of search engine queries are issued from mobile devices. This makes it possible to improve the quality of query recommendation and auto-completion by considering the physical locations of query issuers. However, little research has been done on location-aware query reformulation for search engines. In this paper, we propose an effective spatial proximity measure between a query issuer and a query. The query's location distribution is obtained from its clicked URLs in the query history. Based on this measure, we extend popular query recommendation and auto-completion approaches to a location-aware setting. The extended approaches suggest reformulations that are semantically relevant to the original query and whose results are spatially close to the query issuer. In addition, we extend the bookmark coloring algorithm for graph proximity search to support our query recommendation approaches online, and we adapt an A* search algorithm to support our query auto-completion approach. We also propose a spatial-partitioning-based approximation that speeds up the computation of the proposed spatial proximity. Experiments on a real query log show that our approaches significantly outperform previous work in quality and can be applied efficiently online.

12.
This paper presents research aimed at improving current virtual environments through an intelligent systems approach. It describes a novel architecture for solving vague queries, which lets users find objects and scenes in virtual environments. The architecture builds on a new virtual world representation model and an associated fuzzy querying approach. The new representation model adds a semantic level to the usual models, providing environments better suited to interaction with users. The query solver can handle queries that express the vagueness inherent in the human conceptualization of visual perception (for example, "tall tree", "a park with many tall trees", or "a park bench near approximately five tall trees"). The system has been developed and evaluated in user experiments that compared it with navigation-based and keyword-based query approaches. The results show that the proposed architecture is more powerful and intuitive for finding targets.

13.
Keyword-based search engines such as Google index Web pages for human consumption. Sophisticated as such engines have become, surveys indicate that almost 25% of Web searchers are unable to find useful results in the first set of URLs returned (Technology Review, March 2004). The lack of machine-interpretable information on the Web prevents software agents from matching human searches to desirable results. Tim Berners-Lee, inventor of the Web, has architected the Semantic Web, in which machine-interpretable information provides an automated means of traversing the Web. A necessary cornerstone application is a search engine capable of bringing the Semantic Web together into a searchable landscape. We implemented a Semantic Web Search Engine (SWSE) that performs semantic search, providing predictable and accurate results to queries. To compare keyword search with semantic search, we constructed the Google CruciVerbalist (GCV). GCV solves crossword puzzles by reformulating clues into Google queries that are processed via the Google API. Candidate answers are extracted from the query results. By integrating GCV with SWSE, we show quantitatively how semantic search improves upon keyword search. Our techniques mimic the human brain's ability to create and traverse relationships between facts. They enable Web applications to ‘think’ using semantic reasoning, opening the door to intelligent search applications that use the Semantic Web. Copyright © 2007 John Wiley & Sons, Ltd.

14.
Knowledge, 2007, 20(1): 1-16
In this paper, we discuss a formal system for representing and analyzing real-world events. The event representation accounts for three important event attributes: time, space, and label. We introduce the notion of sequence templates. These templates appear natural both for capturing event-related semantics and for semantically analyzing user queries. To harness this potential, we present a formal structure for representing queries related to real-world events. We also present an approach for semantically analyzing a user query and collating event-related information to be dispatched to the user. Finally, we discuss the design and implementation of the Query-Event Analysis System (QEAS). QEAS is an integrated system that (a) identifies the best-matching sequence template(s) for a user query; (b) derives the meta-events based on the selected sequence templates; and (c) uses the meta-event information to answer the user query.

15.
One key property of the Semantic Web is its support for interoperability. Recent research in this area focuses on integrating multiple data sources to facilitate tasks such as ontology learning, user query expansion and context recognition. Such mashups are growing in popularity, and a rising number of Web APIs support links between heterogeneous data providers. Both trends call for intelligent methods that spare remote resources and minimize the delays imposed by queries to external data sources. This paper proposes a cost and utility model for optimizing such queries by leveraging optimal stopping theory from business economics. In this model, applications are decision makers looking for optimal answer sets. Queries to remote resources incur additional cost, but they retrieve valuable information that improves the estimate of the answer set's utility. Optimal stopping optimizes the trade-off between query cost and answer utility, yielding optimal query strategies for remote resources. These strategies are compared with conventional approaches in an extensive evaluation based on real-world response times from seven popular Web services.

16.
Cache-Based Processing of Queries under Disconnection   Total citations: 5; self-citations: 0; citations by others: 5
Wu Tingting (吴婷婷), Zhang Wensong (章文嵩), Zhou Xingming (周兴铭). Chinese Journal of Computers (计算机学报), 2003, 26(10): 1393-1399
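In mobile environments, wireless networks are unreliable and costly, and mobile hosts are constrained by power, resources and other factors. As a result, mobile hosts are often disconnected, either voluntarily or involuntarily, meaning they have no network connection. This paper proposes QPID, a query processing algorithm for disconnected operation that is based on semantic caching. Its purpose is to improve mobile clients' access to data while disconnected and to make effective use of the mobile cache. The algorithm first finds the cache entries relevant to the current query. It then further processes the data in those entries to obtain the results in the cache that satisfy the query. Experiments show that query processing based on QPID better satisfies clients' query requests under disconnection.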
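The abstract does not describe QPID's internals. In a semantic cache, a query is commonly split into two parts: a probe part that cached data can answer, and a remainder part that needs the server. While a client is disconnected, only the probe part can be answered. The sketch below illustrates that split for a single integer-valued range predicate. `split_query` is a hypothetical helper, not the QPID algorithm.

```python
def split_query(query, cached):
    """Split a 1-D integer range query against cached ranges.

    query: (lo, hi) closed range on one integer attribute.
    cached: list of (lo, hi) closed ranges whose rows are in the local cache.
    Returns (probe, remainder): sub-ranges answerable from the cache, and
    sub-ranges that would have to be fetched once reconnected.
    """
    q_lo, q_hi = query
    probe = []
    for lo, hi in sorted(cached):
        lo, hi = max(lo, q_lo), min(hi, q_hi)  # clip to the query range
        if lo <= hi:
            # Merge with the previous piece if they overlap or touch.
            if probe and lo <= probe[-1][1] + 1:
                probe[-1] = (probe[-1][0], max(probe[-1][1], hi))
            else:
                probe.append((lo, hi))
    remainder = []
    cur = q_lo
    for lo, hi in probe:
        if lo > cur:
            remainder.append((cur, lo - 1))  # gap not covered by the cache
        cur = max(cur, hi + 1)
    if cur <= q_hi:
        remainder.append((cur, q_hi))
    return probe, remainder

print(split_query((10, 50), [(0, 20), (30, 40), (45, 60)]))
```

For the example query, the cache answers (10, 20), (30, 40) and (45, 50). A disconnected client could return those rows and flag (21, 29) and (41, 44) as unavailable.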

17.
18.
Previous work on temporal and historical databases has mainly assumed that the time intervals of temporal attributes and the start/finish points of modeled events are precisely known. In many real-life situations, however, the time boundaries of events and the duration of entity relationships may not be exactly known. Existing and proposed temporal database systems currently lack a way to model these situations and to write queries that deal with time impreciseness, and adding one would be a useful extension. In this paper, we discuss the problem of handling time impreciseness in temporal databases and present three models for representing imprecise time intervals. For each model, we illustrate the basic idea and motivation, its underlying logic, its trade-offs, and its important properties. We also propose query language extensions that let users formulate queries dealing with time impreciseness. We present extensions to existing query constructs at both the transaction level and the operator level, along with new operators related to time impreciseness. The models and extensions discussed in this paper make temporal databases more flexible and can help users obtain more meaningful answers to their temporal queries.
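The abstract does not spell out its three models. A common way to represent an imprecise interval is to know each endpoint only within a range. That representation gives two versions of a comparison operator: one that holds for some choice of endpoints ("possibly") and one that holds for every choice ("certainly"). The sketch below follows that approach with integer time points (e.g. day numbers). The class and method names are hypothetical and are not the paper's operators.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImpreciseInterval:
    """Closed interval whose start lies in [start_lo, start_hi] and whose
    end lies in [end_lo, end_hi]; the exact endpoints are unknown."""
    start_lo: int
    start_hi: int
    end_lo: int
    end_hi: int

    def __post_init__(self):
        if not (self.start_lo <= self.start_hi and self.end_lo <= self.end_hi
                and self.start_lo <= self.end_hi):
            raise ValueError("inconsistent endpoint ranges")

    def possibly_overlaps(self, other):
        # True if SOME choice of endpoints makes the intervals intersect:
        # take the earliest starts and the latest ends.
        return self.start_lo <= other.end_hi and other.start_lo <= self.end_hi

    def certainly_overlaps(self, other):
        # True if EVERY choice of endpoints makes them intersect:
        # even the latest starts precede the earliest ends.
        return self.start_hi <= other.end_lo and other.start_hi <= self.end_lo

a = ImpreciseInterval(1, 3, 5, 7)
b = ImpreciseInterval(6, 8, 10, 12)
print(a.possibly_overlaps(b), a.certainly_overlaps(b))  # True False
```

Offering both operators lets a query say whether it wants every tuple that might qualify or only the tuples that definitely qualify.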

19.
The general public is increasingly using search engines to seek information on risks and threats. This study is based on a three-month search log from a large search engine. It explores users' patterns of query submission and subsequent clicks within sessions for two important risk-related topics, healthcare and information security, and compares them with randomly sampled sessions. We investigate two session-level metrics that reflect users' interactivity with a search engine: session length and query click rate. Drawing on information foraging theory, we find that session length is well characterized by the Inverse Gaussian distribution. The study covers three types of sessions: healthcare, information security, and other randomly sampled sessions. Among them, healthcare sessions have the most queries and the highest query click rate, and information security sessions have the lowest query click rate. In addition, sessions initiated by users with a higher search engine activity level tend to have more queries and higher query click rates. Across the three session types, search engine activity level has the strongest effect on query click rate for information security sessions and the weakest for healthcare sessions. We discuss the theoretical and practical implications of the study.
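To fit an Inverse Gaussian distribution to session lengths, the standard maximum-likelihood estimates can be used: μ̂ is the sample mean and 1/λ̂ is the mean of (1/xᵢ − 1/μ̂). The sketch below applies these estimates and evaluates the density. The session lengths are toy numbers, not data from the study, and the function names are illustrative.

```python
import math

def fit_inverse_gaussian(xs):
    """Maximum-likelihood estimates (mu, lam) of an Inverse Gaussian distribution.

    mu_hat = sample mean; 1 / lam_hat = mean(1/x_i - 1/mu_hat).
    Requires positive observations that are not all identical.
    """
    if any(x <= 0 for x in xs):
        raise ValueError("observations must be positive")
    n = len(xs)
    mu = sum(xs) / n
    s = sum(1.0 / x - 1.0 / mu for x in xs)
    if s <= 0:
        raise ValueError("need at least two distinct observations")
    return mu, n / s

def inverse_gaussian_pdf(x, mu, lam):
    """Density f(x) = sqrt(lam / (2 pi x^3)) * exp(-lam (x - mu)^2 / (2 mu^2 x))."""
    return math.sqrt(lam / (2 * math.pi * x ** 3)) * math.exp(
        -lam * (x - mu) ** 2 / (2 * mu ** 2 * x)
    )

# Toy session lengths (queries per session), not data from the study.
lengths = [1, 1, 1, 2, 2, 3, 4, 7]
mu, lam = fit_inverse_gaussian(lengths)
print(mu, lam, inverse_gaussian_pdf(1, mu, lam))
```

Session lengths are discrete counts. Treating them as samples from a continuous distribution is a modeling approximation, and a goodness-of-fit check would normally follow the fit.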

20.
Automatically identifying the user intent behind web queries has started to attract the attention of the research community, because it allows search engines to enhance user experience by adapting results to that goal. It is broadly agreed that there are three archetypal intentions behind search queries: navigational, resource/transactional and informational. As a natural consequence, this task has been interpreted as a multi-class classification problem. By and large, recent works have focused on comparing several machine learning methods built with words as features. Conversely, this paper examines the influence of assorted properties on three classification approaches. In particular, it focuses on the contribution of linguistic-based attributes. However, most natural language processing tools are designed for documents, not web queries. To bridge this linguistic gap, we used caseless models, which are trained on traditionally labeled data in which all terms are converted to lowercase before the models are generated. Overall, the tested attributes proved effective, improving on word-based classifiers by up to 8.347% (accuracy) and outperforming a baseline by up to 6.17%. Most notably, linguistic features from caseless models are shown to be instrumental in narrowing the linguistic gap between queries and documents.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417