Similar Articles
1.
Evaluating refined queries in top-k retrieval systems (total citations: 2; self-citations: 0; citations by others: 2)
In many applications, users specify target values for certain attributes/features without requiring exact matches to these values in return. Instead, the result is typically a ranked list of the "top k" objects that best match the specified feature values. User subjectivity is an important aspect of such queries: which objects are relevant to the user and which are not depends on the user's perception. Because of the subjective nature of top-k queries, the answers the system returns for a user query often do not satisfy the user's need right away, either because the weights and distance functions associated with the features do not accurately capture the user's perception, or because the specified target values do not fully capture her information need, or both. In such cases, the user would like to refine the query and resubmit it to get back a better set of answers. While there has been a lot of research on query refinement models, we are not aware of any work on supporting the refinement of top-k queries efficiently in a database system. Done naively, each "refined" query can be treated as a "starting" query and evaluated from scratch. We explore alternative approaches that significantly reduce the cost of evaluating refined queries by exploiting the observation that refined queries do not change drastically from one iteration to the next. Our experiments over a real-life multimedia data set show that the proposed techniques save more than 80 percent of the execution cost of refined queries compared with the naive approach, and are more than an order of magnitude faster than a simple sequential scan.
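To make the query model concrete, here is a minimal Python sketch of the naive baseline the abstract describes: each (refined) query is evaluated from scratch by a sequential scan that ranks objects by a weighted distance to the target values. The weighted Euclidean distance and the names `weighted_distance` and `top_k_scan` are illustrative assumptions; the abstract does not specify the distance functions, and this is not the authors' refinement technique.

```python
import heapq
import math
from typing import Sequence


def weighted_distance(features: Sequence[float], target: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted Euclidean distance (an assumed choice) between an object and the target values."""
    return math.sqrt(sum(w * (f - t) ** 2 for f, t, w in zip(features, target, weights)))


def top_k_scan(objects: dict[str, Sequence[float]], target: Sequence[float],
               weights: Sequence[float], k: int) -> list[tuple[float, str]]:
    """Naive top-k evaluation: score every object and keep the k closest.

    Returns (distance, object_id) pairs in ascending order of distance.
    """
    scored = ((weighted_distance(f, target, weights), oid) for oid, f in objects.items())
    return heapq.nsmallest(k, scored)


# Usage: a refinement that only changes the feature weights is re-evaluated from scratch.
objects = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 2.0]}
print(top_k_scan(objects, [0.0, 0.0], [1.0, 1.0], 2))  # initial query
print(top_k_scan(objects, [0.0, 0.0], [1.0, 0.1], 2))  # refined weights
```

The techniques in the abstract aim to avoid exactly this repeated full scan by reusing work from the previous iteration, since a refined query differs only slightly from its predecessor.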

2.
An efficient method for privacy preserving location queries (total citations: 1; self-citations: 0; citations by others: 1)
Recently, privacy preserving location queries have attracted much research attention. However, few works focus on the tradeoff between preserving location privacy and collecting information about location queries. To tackle this tradeoff, we propose privacy preserving location query (PLQ), an efficient privacy preserving framework for processing location queries. The framework enables location-based queries without revealing the user's location information. It also allows location-based service providers to collect some information about location-based queries, which is useful in practice. PLQ consists of three key components: a location anonymizer on the client side, a privacy query processor on the server side, and an additional trusted third party connecting the client and the server. The location anonymizer blurs the user's location into a cloaked area based on a map hierarchy, which contains accurate regions partitioned according to real landforms. The privacy query processor handles the requested nearest-neighbor (NN) location-based queries. A new convex hull of polygon (CHP) algorithm is proposed for nearest-neighbor queries over a polygonal cloaked area. The experimental results show that our algorithms process location-based queries efficiently.

3.
A common task of Web users is querying structured information from Web pages. To realize this scenario, we propose a novel query processor that systematically discovers instances of semantic relations in Web search results and joins these relation instances into complex result tuples with conjunctive queries. Our query processor transforms a structured user query into keyword queries that are submitted to a search engine, forwards the search results to a relation extractor, and then combines the extracted relations into complex result tuples. The processor automatically learns discriminative and effective keywords for different types of semantic relations. In this way, the query processor leverages the index of a search engine to query potentially billions of pages. Unfortunately, relation extractors may fail to return a relation for a result tuple. Moreover, user-defined data sources may not return at least k complete result tuples. We therefore propose an adaptive routing model based on information theory for retrieving the missing attributes of incomplete result tuples. The model determines the most promising next incomplete tuple and attribute type for returning any-k complete result tuples at any point during query execution. We report a thorough experimental evaluation over multiple relation extractors. Our query processor returns complete result tuples while processing only very few Web pages.

4.
Keyword-based Web search is a widely used approach for locating information on the Web. However, Web users often have difficulty organizing and formulating appropriate queries because they lack sufficient domain knowledge, which greatly affects search performance. An effective way to meet the information needs of a search engine user is to suggest Web queries that are topically related to the initial query. Accurately computing query-to-query similarity scores is key to improving the quality of these suggestions. Because queries are short, traditional pseudo-relevance or implicit-relevance based approaches expand the representation of the queries for the similarity computation. They explicitly use a search engine as a complementary source and directly extract additional features (such as terms or URLs) from the top-ranked or clicked search results. In this paper, we propose a novel approach that uses hidden topics as expandable features. It has two steps. In the offline model-learning step, a hidden topic model is trained, and for each candidate query, its posterior distribution over the hidden topic space is determined and used to represent the query instead of its lexical form. In the online query suggestion step, after inferring the topic distribution of an input query in the same way, we calculate the similarity between the candidate queries and the input query in terms of their topic distributions, and produce a suggestion list of candidate queries based on the similarity scores. Our experimental results on two real data sets show that hidden topic based suggestion is much more efficient than the traditional term or URL based approaches and is effective in finding topically related queries for suggestion.
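The online suggestion step can be sketched as ranking candidate queries by the similarity of their topic distributions to that of the input query. This is a minimal sketch under stated assumptions: the topic vectors below are hypothetical stand-ins for the posteriors an offline-trained topic model would produce, and cosine similarity is our choice, since the abstract does not name the similarity measure.

```python
import math


def cosine(p: list[float], q: list[float]) -> float:
    """Cosine similarity between two topic distributions (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def suggest(input_topics: list[float], candidates: dict[str, list[float]], n: int) -> list[str]:
    """Rank candidate queries by topic-space similarity to the input query; return the top n."""
    ranked = sorted(candidates, key=lambda q: cosine(input_topics, candidates[q]), reverse=True)
    return ranked[:n]


# Hypothetical topic distributions over three topics (cars, sports, animals).
candidates = {
    "jaguar car": [0.9, 0.1, 0.0],
    "jaguar animal": [0.0, 0.1, 0.9],
    "sports car": [0.8, 0.2, 0.0],
}
print(suggest([1.0, 0.0, 0.0], candidates, 2))
```

Because queries are compared in topic space rather than by shared words, "sports car" can be suggested for a car-related input even though it shares no term with it.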

5.
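Traditional information retrieval techniques cannot accurately capture users' information needs, and ontology-based methods, although they take the complexity of semantic search into account, force users to use a query syntax that is very difficult to master. By analyzing users' query habits and query phrases, we found that query phrases are usually simple verb-object phrases. Based on the characteristics of scientific-effect knowledge in the chemistry domain and of users' query habits, we present a semantic retrieval method that maps natural-language queries to ontology knowledge.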

6.
We present a new text-to-image re-ranking approach for improving the relevance of search results. In particular, we address the fundamental semantic gap between the low-level visual features of images and high-level textual queries by dynamically maintaining a connected hierarchy in the form of a concept database. For each textual query, we take the results from popular search engines as an initial retrieval, followed by a semantic analysis that maps the textual query to higher-level concepts. To do this, we design a two-layer scoring system that identifies the relationship between the query and the concepts automatically. We then calculate image feature vectors and compare them with the classifier for each related concept. An image is considered relevant only when it is related to the query both semantically and in content. Second, we loosen the requirement on query accuracy from the user, which makes it possible to perform well on user queries containing less relevant information. Third, the concept database can be dynamically maintained to accommodate variations in user queries, which eliminates the need for human labor in building a sophisticated initial concept database. Our experiments using complex queries (based on five scenarios) demonstrate that our retrieval results are a significant improvement over those obtained from current state-of-the-art image search engines.

7.
Query expansion by mining user logs (total citations: 9; self-citations: 0; citations by others: 9)
Queries to Web search engines are usually short and do not provide sufficient information for effectively selecting relevant documents. Previous research has proposed query expansion to deal with this problem. However, expansion terms are usually determined from term co-occurrences within documents. In this study, we propose a new method for query expansion based on user interactions recorded in user logs. The central idea is to extract correlations between query terms and document terms by analyzing user logs. These correlations are then used to select high-quality expansion terms for new queries. Compared with previous query expansion methods, ours takes advantage of the user judgments implied in user logs. The experimental results show that the log-based query expansion method produces much better results than both the classical search method and the other query expansion methods.
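The central idea, correlating query terms with terms of the documents users clicked, can be sketched in a few lines. This is a simplified illustration, not the authors' exact formulation: we assume the log is a list of (query, clicked document id) pairs, documents are plain text split on whitespace, and candidate expansion terms are scored by summing their correlations over the query terms (the abstract does not say how per-term correlations are combined).

```python
from collections import Counter, defaultdict


def term_correlations(log: list[tuple[str, str]], docs: dict[str, str]) -> dict[str, Counter]:
    """Estimate P(w | t) for query term t and document term w from click-through records.

    P(w | t) = sum over clicked documents D of P(D | t) * P(w | D), where
    P(D | t) is the share of clicks on queries containing t that went to D,
    and P(w | D) is the relative frequency of w in D.
    """
    clicks: dict[str, Counter] = defaultdict(Counter)  # query term -> clicked doc counts
    for query, doc_id in log:
        for t in set(query.split()):
            clicks[t][doc_id] += 1

    corr: dict[str, Counter] = defaultdict(Counter)
    for t, doc_counts in clicks.items():
        total = sum(doc_counts.values())
        for doc_id, n_clicks in doc_counts.items():
            terms = docs[doc_id].split()
            if not terms:
                continue  # an empty document contributes nothing
            for w, tf in Counter(terms).items():
                corr[t][w] += (n_clicks / total) * (tf / len(terms))
    return corr


def expand(query: str, corr: dict[str, Counter], n: int) -> list[str]:
    """Return the n document terms most correlated with the query (summed over query terms)."""
    q_terms = query.split()
    scores: Counter = Counter()
    for t in q_terms:
        for w, p in corr.get(t, {}).items():
            if w not in q_terms:
                scores[w] += p
    return [w for w, _ in scores.most_common(n)]


docs = {"d1": "apple fruit nutrition", "d2": "apple iphone phone"}
log = [("apple", "d1"), ("apple", "d1"), ("apple phone", "d2")]
corr = term_correlations(log, docs)
print(expand("apple phone", corr, 1))
```

Here "iphone" is chosen for "apple phone" because users who issued queries containing "phone" clicked a document containing it, even though "iphone" never co-occurs with "phone" in the query itself.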

8.
Privacy-Conscious Location-Based Queries in Mobile Environments (total citations: 1; self-citations: 0; citations by others: 1)
In location-based services, users with location-aware mobile devices can make queries about their surroundings anywhere and at any time. While this ubiquitous computing paradigm brings great convenience for information access, it also raises concerns over potential intrusion into user location privacy. To protect location privacy, one typical approach is to cloak user locations into spatial regions based on user-specified privacy requirements and to transform location-based queries into region-based queries. In this paper, we identify and address three new issues concerning this location cloaking approach. First, we study the representation of cloaking regions and show that a circular region generally leads to a small result size for region-based queries. Second, we develop a mobility-aware location cloaking technique to resist trace analysis attacks, and design two cloaking algorithms, MaxAccu_Cloak and MinComm_Cloak, based on different performance objectives. Finally, we develop an efficient polynomial algorithm for evaluating circular-region-based kNN queries. Two query processing modes, bulk and progressive, are presented to return query results either all at once or incrementally. Experimental results show that the proposed mobility-aware cloaking algorithms significantly improve the quality of location cloaking in terms of an entropy measure without compromising much on query latency or communication cost. Moreover, the progressive query processing mode achieves a shorter response time than the bulk mode by parallelizing query evaluation and result transmission.
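A circular-region-based kNN query must return every object that could be among the k nearest neighbours of any point in the cloaking circle. The sketch below is a simple conservative filter for that candidate set, not the paper's polynomial algorithm: it assumes points in the Euclidean plane, and the function name `knn_candidates` is hypothetical. The client, which knows its true location, would then pick its exact kNN from the returned candidates.

```python
import heapq
import math

Point = tuple[float, float]


def knn_candidates(center: Point, radius: float,
                   objects: dict[str, Point], k: int) -> set[str]:
    """Return a superset of the objects that are among the k nearest neighbours
    of at least one point inside the circle (center, radius).

    For an object at distance d from the centre, every point in the circle is
    between max(0, d - radius) and d + radius away from it. If bound is the k-th
    smallest upper distance, every query point in the circle has k objects within
    bound, so an object whose lower distance exceeds bound can never be a kNN.
    """
    dists = {oid: math.dist(center, p) for oid, p in objects.items()}
    if len(dists) <= k:
        return set(dists)
    bound = heapq.nsmallest(k, (d + radius for d in dists.values()))[-1]
    return {oid for oid, d in dists.items() if max(0.0, d - radius) <= bound}


objects = {"a": (2.0, 0.0), "b": (0.0, 3.0), "c": (10.0, 0.0)}
print(sorted(knn_candidates((0.0, 0.0), 1.0, objects, k=1)))
```

Both "a" and "b" survive for k = 1 because each is the nearest object for some point of the circle (for example, "b" wins at (0, 1)), while "c" is pruned.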

9.
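Query session detection aims to identify the related queries that a user submits consecutively to satisfy a particular need. It is very useful for query log analysis and user behavior analysis. Most traditional query session detection methods are based on comparing query terms and cannot solve the vocabulary-mismatch problem: some topically related queries share no words. To address this problem, in this paper we propose a query session detection method based on a translation model, which characterizes the relationship between words as word-to-word translation probabilities, so that even when two queries share no words, we can still capture the semantic relationship between them. We also propose two methods for estimating word translation probabilities from query logs: the first is based on the time interval between queries, and the second on the URLs clicked for the queries. Experimental results demonstrate the effectiveness of the method.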
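The following minimal sketch illustrates the first estimation method (based on the time interval between queries) and one way to score query relatedness with the resulting translation probabilities. The scoring rule (each word of the later query takes its best translation probability from any word of the earlier query, with `self_weight` reserved for exact matches, averaged over words) is our assumption, as are all function names; the click-URL estimation method is not shown. A session boundary would be declared when the score falls below some threshold.

```python
from collections import Counter, defaultdict


def estimate_translation(log: list[tuple[float, str]], max_gap: float) -> dict[str, dict[str, float]]:
    """Estimate word translation probabilities P(w | v) from a time-ordered query log.

    Two consecutive queries submitted within max_gap seconds are assumed to be
    related, so every word pair (v from the earlier query, w from the later one)
    is counted; counts are then normalised per source word v.
    """
    pair_counts: dict[str, Counter] = defaultdict(Counter)
    for (t1, q1), (t2, q2) in zip(log, log[1:]):
        if t2 - t1 <= max_gap:
            for v in set(q1.split()):
                for w in set(q2.split()):
                    pair_counts[v][w] += 1
    probs: dict[str, dict[str, float]] = {}
    for v, counts in pair_counts.items():
        total = sum(counts.values())
        probs[v] = {w: c / total for w, c in counts.items()}
    return probs


def translation_similarity(q1: str, q2: str, probs: dict[str, dict[str, float]],
                           self_weight: float = 0.5) -> float:
    """Average, over words w of q2, of the best probability that some word of q1 'translates' to w."""
    src, tgt = q1.split(), q2.split()
    if not src or not tgt:
        return 0.0
    total = 0.0
    for w in tgt:
        total += max(
            self_weight * (1.0 if v == w else 0.0)
            + (1 - self_weight) * probs.get(v, {}).get(w, 0.0)
            for v in src
        )
    return total / len(tgt)


log = [(0, "cheap flights"), (30, "airfare deals"), (5000, "pasta recipe")]
probs = estimate_translation(log, max_gap=600)
print(translation_similarity("cheap flights", "airfare deals", probs))
print(translation_similarity("airfare deals", "pasta recipe", probs))
```

"cheap flights" and "airfare deals" share no words, yet they receive a positive score because they were submitted close together in the log; the later, unrelated query scores zero.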

10.
One of the key difficulties for users in information retrieval is formulating appropriate queries to submit to the search engine. In this paper, we propose an approach that enriches the user's queries with additional context. We use a language model to build the query context, which is composed of the queries most similar to the query being expanded and their top-ranked documents. We then apply a query expansion approach based on the query context and Latent Semantic Analysis. Using a web test collection, we tested our approach on short and long queries, varying the number of recommended queries and the number of expansion terms to determine appropriate parameters for the proposed approach. Experimental results show that the proposed approach improves the effectiveness of the information retrieval system by 19.23% for short queries and 52.94% for long queries compared with retrieval using the original user queries.

11.
In distributed DBMSs, one major issue in developing a horizontal fragmentation technique is which criteria to use to guide the fragmentation. The authors propose to use, in addition to typical user queries, specific knowledge about the data itself. This knowledge allows typical user queries to be revised into more precise forms. The revised query expressions give better estimates of the user reference clusters on the database than the original query expressions, and the estimated user reference clusters form the basis for partitioning relations horizontally. In the proposed approach, an ordinary many-sorted language is extended to represent the queries and the knowledge compatibly. The knowledge is identified in terms of five axiom schemata, and an inference procedure is developed to apply the knowledge to the queries deductively.

12.
The steady growth in the size of textual document collections is a key driver of progress in modern information retrieval techniques, whose effectiveness and efficiency are constantly challenged. Given a user query, the number of retrieved documents can be overwhelmingly large, hampering their efficient use by the user. In addition, retaining only relevant documents in a query answer is of paramount importance for effectively meeting user needs. In this situation, query expansion offers an interesting solution for obtaining a complete answer while preserving the quality of the retained documents. It mainly relies on an accurate choice of the terms added to the initial query. Interestingly enough, query expansion takes advantage of large text volumes by extracting statistical information about co-occurrences of index terms and using it to make user queries better fit the real information needs. In this respect, a promising track is the application of data mining methods to extract dependencies between terms. In this paper, we present a novel approach to mining knowledge that supports query expansion, based on association rules. The key feature of our approach is a better trade-off between the size of the mining result and the knowledge it conveys. To this end, our association rule mining method builds on results from Galois connection theory and on compact representations of rule sets in order to reduce the huge number of potentially useful associations. An experimental study applied our approach to several real collections, performing automatic query expansion. The results show a significant improvement in the performance of the information retrieval system in terms of both recall and precision, as confirmed by significance testing with the Wilcoxon test.
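To illustrate how association rules between terms can drive query expansion, here is a minimal sketch that treats each document as a transaction of terms, mines single-term rules a -> b with support and confidence thresholds, and appends the consequents of rules whose antecedent appears in the query. It deliberately uses plain pairwise mining; it does not implement the Galois-connection-based compact rule representations the abstract describes, and all names and thresholds are illustrative.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(docs: list[str], min_support: float,
                    min_confidence: float) -> dict[str, dict[str, float]]:
    """Return {antecedent: {consequent: confidence}} for rules a -> b between single terms."""
    transactions = [set(d.split()) for d in docs]
    n = len(transactions)
    term_counts = Counter(t for tr in transactions for t in tr)
    pair_counts = Counter(p for tr in transactions for p in combinations(sorted(tr), 2))
    rules: dict[str, dict[str, float]] = {}
    for (a, b), c in pair_counts.items():
        if c / n < min_support:
            continue  # the pair is too rare to yield a rule in either direction
        for x, y in ((a, b), (b, a)):
            conf = c / term_counts[x]  # confidence of x -> y
            if conf >= min_confidence:
                rules.setdefault(x, {})[y] = conf
    return rules


def expand_query(query: str, rules: dict[str, dict[str, float]]) -> list[str]:
    """Append consequents of rules fired by query terms, highest confidence first."""
    terms = query.split()
    added: list[str] = []
    for t in terms:
        for y, _ in sorted(rules.get(t, {}).items(), key=lambda kv: -kv[1]):
            if y not in terms and y not in added:
                added.append(y)
    return terms + added


docs = ["query expansion retrieval", "query expansion thesaurus", "query ranking", "galois lattice"]
rules = mine_pair_rules(docs, min_support=0.5, min_confidence=0.6)
print(rules)
print(expand_query("query", rules))
```

Even on this toy collection the number of candidate pairs grows quadratically with the vocabulary, which is the blow-up that compact rule representations are meant to control.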

13.
Novice users often do not have enough domain knowledge to create good queries for searching for information online. To help alleviate this, exploration techniques have been used to increase the diversity of search results, so that not only the explicitly requested results but also potentially relevant ones are returned. Most existing approaches, such as collaborative filtering, do not allow the level of exploration to be controlled. Consequently, the search results can be very different from what is expected. We propose an exploration strategy that performs intelligent query processing by first searching for usable old queries and then using them to adapt the current query, in the hope that the adapted query will be more relevant to the user's areas of interest. We applied the proposed strategy in a personal information assistant (PIA) deployed for user evaluation over 3 months. The experimental results showed that the proposed exploration method outperformed collaborative filtering and mutation and crossover methods by around 25% in terms of eliminating off-topic results.

14.
Boolean query mapping across heterogeneous information sources (total citations: 5; self-citations: 0; citations by others: 5)
Searching over heterogeneous information sources is difficult because their query languages are not uniform. Our approach is to let the user compose Boolean queries in one rich front-end language. For each user query and target source, we transform the user query into a subsuming query that the source can support but that may return extra documents. The results are then processed by a filter query to yield the correct final result. We introduce the architecture and associated algorithms for generating the supported subsuming queries and filters. We show that the generated subsuming queries return a minimal number of documents, and we discuss how minimal-cost filters can be obtained. We have implemented prototype versions of these algorithms and demonstrated them on heterogeneous Boolean systems.
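The subsuming-query-plus-filter idea can be sketched for the simplest case: a source that only supports searching on some fields, and queries built from AND and OR without negation. Replacing each unsupported term by True can only make such a query match more documents, so the weakened query subsumes the original, and re-applying the original query to the returned documents acts as the filter. This naive weakening is our illustration under those assumptions; it does not guarantee the minimality the paper's algorithms achieve, and the query encoding and function names are hypothetical.

```python
from typing import Union

# A query is True, ("term", field, word), ("and", q1, q2, ...) or ("or", q1, q2, ...).
Query = Union[bool, tuple]


def subsume(q: Query, supported_fields: set[str]) -> Query:
    """Weaken a negation-free Boolean query so it only uses fields the source supports.

    Replacing an unsupported term by True can only make an AND/OR query match
    more documents, so the result subsumes the original query.
    """
    if q is True:
        return True
    op = q[0]
    if op == "term":
        return q if q[1] in supported_fields else True
    parts = [subsume(p, supported_fields) for p in q[1:]]
    if op == "and":
        parts = [p for p in parts if p is not True]
        if not parts:
            return True
        return parts[0] if len(parts) == 1 else ("and", *parts)
    if op == "or":
        return True if any(p is True for p in parts) else ("or", *parts)
    raise ValueError(f"unsupported operator: {op!r}")


def matches(q: Query, doc: dict[str, set[str]]) -> bool:
    """Evaluate a query against a document given as {field: set of words}."""
    if q is True:
        return True
    op = q[0]
    if op == "term":
        return q[2] in doc.get(q[1], set())
    results = [matches(p, doc) for p in q[1:]]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unsupported operator: {op!r}")


original = ("and", ("term", "title", "database"), ("term", "abstract", "boolean"))
source_query = subsume(original, {"title"})  # what a title-only source receives
docs = [
    {"title": {"database"}, "abstract": {"boolean"}},
    {"title": {"database"}, "abstract": {"sql"}},
    {"title": {"retrieval"}},
]
returned = [d for d in docs if matches(source_query, d)]  # may contain extra documents
final = [d for d in returned if matches(original, d)]     # the filter query removes them
print(source_query, len(returned), len(final))
```

Negation breaks this argument (weakening a term under NOT strengthens the query), which is one reason a general mapping algorithm needs more care than this sketch.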

15.
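After surveying several methods for storing OWL and RDF objects in relational databases and analyzing the shortcomings of each, this paper proposes a new method: the instances of each class and each property are stored in a separate database table, and database views are used to represent the relationships between objects. We also implemented a Java program that provides the functionality of a description logic reasoner, converting first-order predicate logic queries entered by the user into SQL statements for the relational database. Experiments show that the method is suitable for storing a moderate number of OWL objects.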
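The table-per-class, table-per-property layout and the query-to-SQL conversion can be illustrated with a small sketch. It is written in Python with the standard `sqlite3` module rather than the authors' Java program, it handles only conjunctive queries (a small fragment of first-order predicate logic), and it omits the views the abstract uses for relationships between objects. The schema, the atom encoding and the function names are all hypothetical.

```python
import sqlite3


def create_store(conn: sqlite3.Connection, classes: list[str], properties: list[str]) -> None:
    """One table per class (instance ids) and one per property (subject, object pairs).

    Names are interpolated into SQL directly, so they must come from a trusted ontology.
    """
    for c in classes:
        conn.execute(f'CREATE TABLE "{c}" (id TEXT PRIMARY KEY)')
    for p in properties:
        conn.execute(f'CREATE TABLE "{p}" (subject TEXT, object TEXT)')


def to_sql(atoms: list[tuple[str, ...]], answer_var: str) -> str:
    """Translate a conjunctive query into one SQL join.

    ("Class", "x") means x is an instance of Class; ("prop", "x", "y") means prop(x, y).
    Variables shared between atoms become equality conditions.
    """
    from_items: list[str] = []
    where: list[str] = []
    first_ref: dict[str, str] = {}  # variable -> first column it was bound to
    for i, (table, *variables) in enumerate(atoms):
        alias = f"t{i}"
        from_items.append(f'"{table}" AS {alias}')
        columns = ["id"] if len(variables) == 1 else ["subject", "object"]
        for var, col in zip(variables, columns):
            ref = f"{alias}.{col}"
            if var in first_ref:
                where.append(f"{first_ref[var]} = {ref}")
            else:
                first_ref[var] = ref
    sql = f"SELECT DISTINCT {first_ref[answer_var]} FROM {', '.join(from_items)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


conn = sqlite3.connect(":memory:")
create_store(conn, ["Student", "Course"], ["takesCourse"])
conn.executemany('INSERT INTO "Student" VALUES (?)', [("alice",), ("bob",)])
conn.execute('INSERT INTO "Course" VALUES (?)', ("db101",))
conn.executemany('INSERT INTO "takesCourse" VALUES (?, ?)', [("alice", "db101"), ("carol", "db101")])

# Student(x) AND takesCourse(x, y) AND Course(y)
sql = to_sql([("Student", "x"), ("takesCourse", "x", "y"), ("Course", "y")], "x")
print(sql)
print(conn.execute(sql).fetchall())
```

Only "alice" is returned: "carol" takes a course but is not stored as a Student, and "bob" takes no course.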

16.
The problem of word mismatch in information retrieval (IR) arises because users often use different words to describe concepts in their queries than authors use to describe the same concepts in their documents. Query expansion is used to deal with this mismatch between author and user vocabularies. To support query expansion, indices on words related by lexical semantics and syntactic co-occurrence need to be maintained. Two issues become paramount in supporting query expansion: the size of the index tables and the query processing overhead. In this paper, we propose using multi-granularity for more efficient indexing and query processing while maintaining the same degrees of precision and recall. We also describe extensions of this technique to handle (1) query relaxation for words with multiple senses and other semantic relationships; (2) progressive processing of queries with top-N results; and (3) progressive processing of queries that specify the importance of each keyword.

17.
Information Systems, 2005, 30(7): 543-563
One of the main problems in (web) information retrieval is the ambiguity of users' queries, since users tend to post very short queries that do not express their information need clearly. This also seems to hold for ontology-based information retrieval, in which the domain ontology is used as the backbone of the search process. In this paper, we present a novel approach for determining possible refinements of an ontology-based query. The approach is based on measuring the ambiguity of a query with respect to the user's original information need. We define several types of ambiguity concerning the structure of the underlying ontology and the content of the information repository. These ambiguities are interpreted with respect to the user's information need, which we infer from the user's behaviour during the search process. Finally, the user is given a ranked list of potentially useful refinements of her query. We present a small evaluation study that shows the advantages of the proposed approach.

19.
When performing queries in web search engines, users often have difficulty choosing appropriate query terms. Search engines therefore usually suggest a list of expanded versions of the user query to disambiguate it or to resolve potential term mismatches. However, it has been shown that users find it difficult to choose an expanded query from such a list. In this paper, we describe the adoption of set-based text visualization techniques to visualize how query expansions enrich the result space of a given user query and how the result sets relate to each other. Our system uses a linguistic approach to expand queries and topic modeling to extract the most informative terms from the results of these queries. In a user study, we compare a common text list of query expansion suggestions with three set-based text visualization techniques adopted for visualizing expanded query results (Compact Euler Diagrams, Parallel Tag Clouds, and a List View) for resolving ambiguous queries using interactive query expansion. Our results show that the text visualization techniques do not increase retrieval efficiency, precision, or recall. Overall, users rate Parallel Tag Clouds, which visualize key terms of the expanded query space, lowest. Based on the results, we derive recommendations for visualizations of query expansion results and for text visualization techniques in general, and we discuss alternative use cases of set-based text visualization techniques in the context of web search.

Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417