期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Monochromatic and bichromatic ranked reverse boolean spatial keyword nearest neighbors search

Pengpeng?Zhao Email author Hailin?Fang Victor?S.?Sheng Zhixu?Li Jiajie?Xu Jian?Wu Zhiming?Cui 《World Wide Web》2017,20(1):39-59

Recently, Reverse k Nearest Neighbors (RkNN) queries, returning every answer for which the query is one of its k nearest neighbors, have been extensively studied on the database research community. But the RkNN query cannot retrieve spatio-textual objects which are described by their spatial location and a set of keywords. Therefore, researchers proposed a RSTkNN query to find these objects, taking both spatial and textual similarity into consideration. However, the RSTkNN query cannot control the size of answer set and to be sorted according to the degree of influence on the query. In this paper, we propose a new problem Ranked Reverse Boolean Spatial Keyword Nearest Neighbors query called Ranked-RBSKNN query, which considers both spatial similarity and textual relevance, and returns t answers with most degree of influence. We propose a separate index and a hybrid index to process such queries efficiently. Experimental results on different real-world and synthetic datasets show that our approaches achieve better performance. 相似文献

2.

Scalable and efficient processing of top-<Emphasis Type="Italic">k</Emphasis> multiple-type integrated queries

Hyuk-Yoon?Kwon Kyu-Young?Whang Email author 《World Wide Web》2016,19(6):1051-1075

In this paper, we define a new class of queries, the top-k multiple-type integrated query (simply, top-k MULTI query). It deals with multiple data types and finds the information in the order of relevance between the query and the object. Various data types such as spatial, textual, and relational data types can be used for the top-k MULTI query. The top-k MULTI query distinguishes itself from the traditional top-k query in that the component scores to calculate final scores are determined dependent of the query. Hence, each component score is calculated only when the query is given for each data type rather than being calculated apriori as in the top-k query. As a representative instance, the traditional top-k spatial keyword query is an instance of the top-k MULTI query. It deals with the spatial data type and text data type and finds the information based on spatial proximity and textual relevance between the query and the object, which is determined only when the query is given. In this paper, we first define the top-k MULTI query formally and define a new specific instance for the top-k MULTI query, the top-k spatial-keyword-relational(SKR) query, by integrating the relational data type into the traditional top-k spatial keyword query. Then, we investigate the processing approaches for the top-k MULTI query. We discuss the scalability of those approaches as new data types are integrated. We also devise the processing methods for the top-k SKR query. Finally, through extensive experiments on the top-k SKR query using real and synthetic data sets, we compare efficiency of the methods in terms of the query performance and storage. 相似文献

3.

Exact and approximate flexible aggregate similarity search

Feifei?Li Ke?Yi Yufei?Tao Bin?Yao Email author Yang?Li Dong?Xie Min?Wang 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(3):317-338

Aggregate similarity search, also known as aggregate nearest-neighbor (Ann) query, finds many useful applications in spatial and multimedia databases. Given a group Q of M query objects, it retrieves from a database the objects most similar to Q, where the similarity is an aggregation (e.g., \({{\mathrm{sum}}}\), \(\max \)) of the distances between each retrieved object p and all the objects in Q. In this paper, we propose an added flexibility to the query definition, where the similarity is an aggregation over the distances between p and any subset of \(\phi M\) objects in Q for some support \(0< \phi \le 1\). We call this new definition flexible aggregate similarity search and accordingly refer to a query as a flexible aggregate nearest-neighbor ( Fann ) query. We present algorithms for answering Fann queries exactly and approximately. Our approximation algorithms are especially appealing, which are simple, highly efficient, and work well in both low and high dimensions. They also return near-optimal answers with guaranteed constant-factor approximations in any dimensions. Extensive experiments on large real and synthetic datasets from 2 to 74 dimensions have demonstrated their superior efficiency and high quality. 相似文献

4.

Semantic-aware top-k spatial keyword queries

Zhihu Qian Jiajie Xu Kai Zheng Pengpeng Zhao Xiaofang Zhou 《World Wide Web》2018,21(3):573-594

The fast development of GPS equipped devices has aroused widespread use of spatial keyword querying in location based services nowadays. Existing spatial keyword query methodologies mainly focus on the spatial and textual similarities, while leaving the semantic understanding of keywords in spatial Web objects and queries to be ignored. To address this issue, this paper studies the problem of semantic based spatial keyword querying. It seeks to return the k objects most similar to the query, subject to not only their spatial and textual properties, but also the coherence of their semantic meanings. To achieve that, we propose novel indexing structures, which integrate spatial, textual and semantic information in a hierarchical manner, so as to prune the search space effectively in query processing. Extensive experiments are carried out to evaluate and compare them with other baseline algorithms. 相似文献

5.

The direction-constrained <Emphasis Type="Italic">k</Emphasis> nearest neighbor query

Min-Joong Lee Dong-Wan Choi SangYeon Kim Ha-Myung Park Sunghee Choi Chin-Wan Chung 《GeoInformatica》2016,20(3):471-502

Finding k nearest neighbor objects in spatial databases is a fundamental problem in many geospatial systems and the direction is one of the key features of a spatial object. Moreover, the recent tremendous growth of sensor technologies in mobile devices produces an enormous amount of spatio-directional (i.e., spatially and directionally encoded) objects such as photos. Therefore, an efficient and proper utilization of the direction feature is a new challenge. Inspired by this issue and the traditional k nearest neighbor search problem, we devise a new type of query, called the direction-constrained k nearest neighbor (DCkNN) query. The DCkNN query finds k nearest neighbors from the location of the query such that the direction of each neighbor is in a certain range from the direction of the query. We develop a new index structure called MULTI, to efficiently answer the DCkNN query with two novel index access algorithms based on the cost analysis. Furthermore, our problem and solution can be generalized to deal with spatio-circulant dimensional (such as a direction and circulant periods of time such as an hour, a day, and a week) objects. Experimental results show that our proposed index structure and access algorithms outperform two adapted algorithms from existing kNN algorithms. 相似文献

6.

Flexible integration of multimedia sub-queries with qualitative preferences 总被引：1，自引：0，他引：1

Ilaria Bartolini Paolo Ciaccia Vincent Oria M. Tamer Özsu 《Multimedia Tools and Applications》2007,33(3):275-300

Complex multimedia queries, aiming to retrieve from large databases those objects that best match the query specification, are usually processed by splitting them into a set of m simpler sub-queries, each dealing with only some of the query features. To determine which are the overall best-matching objects, a rule is then needed to integrate the results of such sub-queries, i.e., how to globally rank the m-dimensional vectors of matching degrees, or partial scores, that objects obtain on the m sub-queries. It is a fact that state-of-the-art approaches all adopt as integration rule a scoring function, such as weighted average, that aggregates the m partial scores into an overall (numerical) similarity score, so that objects can be linearly ordered and only the highest scored ones returned to the user. This choice however forces the system to compromise between the different sub-queries and can easily lead to miss relevant results. In this paper we explore the potentialities of a more general approach, based on the use of qualitative preferences, able to define arbitrary partial (rather than only linear) orders on database objects, so that a larger flexibility is gained in shaping what the user is looking for. For the purpose of efficient evaluation, we propose two integration algorithms able to work with any (monotone) partial order (thus also with scoring functions): MPO, which delivers objects one layer of the partial order at a time, and iMPO, which can incrementally return one object at a time, thus also suitable for processing top k queries. Our analysis demonstrates that using qualitative preferences pays off. In particular, using Skyline and Region-prioritized Skyline preferences for queries on a real image database, we show that the results we get have a precision comparable to that obtainable using scoring functions, yet they are obtained much faster, saving up to about 70% database accesses. 相似文献

7.

Kai?Zheng Bolong?Zheng Email author Jiajie?Xu Guanfeng?Liu An?Liu Zhixu?Li 《World Wide Web》2017,20(4):749-773

The proliferation of GPS-enabled smart mobile devices enables us to collect a large-scale trajectories of moving objects with GPS tags. While the raw trajectories that only consists of positional information have been studied extensively, many recent works have been focusing on enriching the raw trajectories with semantic knowledge. The resulting data, called activity trajectories, embed the information about behaviors of the moving objects and support a variety of applications for better quality of services. In this paper, we propose a Top-k Spatial Keyword (TkSK) query for activity trajectories, with the objective to find a set of trajectories that are not only close geographically but also meet the requirements of the query semantically. Such kind of query can deliver more informative results than existing spatial keyword queries for static objects, since activity trajectories are able to reflect the popularity of user activities and reveal preferable combinations of facilities. However, it is a challenging task to answer this query efficiently due to the inherent difficulties in indexing trajectories as well as the new complexity introduced by the textual dimension. In this work, we provide a comprehensive solution, including the novel similarity function, hybrid indexing structure, efficient search algorithm and further optimizations. Extensive empirical studies on real trajectory set have demonstrated the scalability of our proposed solution. 相似文献

8.

Continuous visible nearest neighbor query processing in spatial databases 总被引：1，自引：0，他引：1

Yunjun Gao Baihua Zheng Gencai Chen Qing Li Xiaofa Guo 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(3):371-396

In this paper, we identify and solve a new type of spatial queries, called continuous visible nearest neighbor (CVNN) search. Given a data set P, an obstacle set O, and a query line segment q in a two-dimensional space, a CVNN query returns a set of \({\langle p, R\rangle}\) tuples such that \({p \in P}\) is the nearest neighbor to every point r along the interval \({R \subseteq q}\) as well as p is visible to r. Note that p may be NULL, meaning that all points in P are invisible to all points in R due to the obstruction of some obstacles in O. In contrast to existing continuous nearest neighbor query, CVNN retrieval considers the impact of obstacles on visibility between objects, which is ignored by most of spatial queries. We formulate the problem, analyze its unique characteristics, and develop efficient algorithms for exact CVNN query processing. Our methods (1) utilize conventional data-partitioning indices (e.g., R-trees) on both P and O, (2) tackle the CVNN search by performing a single query for the entire query line segment, and (3) only access the data points and obstacles relevant to the final query result by employing a suite of effective pruning heuristics. In addition, several interesting variations of CVNN queries have been introduced, and they can be supported by our techniques, which further demonstrates the flexibility of the proposed algorithms. A comprehensive experimental evaluation using both real and synthetic data sets has been conducted to verify the effectiveness of our proposed pruning heuristics and the performance of our proposed algorithms. 相似文献

9.

Answering why-not and why questions on reverse top-<Emphasis Type="Italic">k</Emphasis> queries

Qing Liu Yunjun Gao Gang Chen Baihua Zheng Linlin Zhou 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(6):867-892

Why-not and why questions can be posed by database users to seek clarifications on unexpected query results. Specifically, why-not questions aim to explain why certain expected tuples are absent from the query results, while why questions try to clarify why certain unexpected tuples are present in the query results. This paper systematically explores the why-not and why questions on reverse top-k queries, owing to its importance in multi-criteria decision making. We first formalize why-not questions on reverse top-k queries, which try to include the missing objects in the reverse top-k query results, and then, we propose a unified framework called WQRTQ to answer why-not questions on reverse top-k queries. Our framework offers three solutions to cater for different application scenarios. Furthermore, we study why questions on reverse top-k queries, which aim to exclude the undesirable objects from the reverse top-k query results, and extend the framework WQRTQ to efficiently answer why questions on reverse top-k queries, which demonstrates the flexibility of our proposed algorithms. Extensive experimental evaluation with both real and synthetic data sets verifies the effectiveness and efficiency of the presented algorithms under various experimental settings. 相似文献

10.

Efficient moving <Emphasis Type="Italic">k</Emphasis> nearest neighbor queries over line segment objects

Yu Gu Hui Zhang Zhigang Wang Ge Yu 《World Wide Web》2016,19(4):653-677

The growing need for location based services motivates the moving k nearest neighbor query (MkNN), which requires to find the k nearest neighbors of a moving query point continuously. In most existing solutions, data objects are abstracted as points. However, lots of real-world data objects, such as roads, rivers or pipelines, should be reasonably modeled as line segments or polyline segments. In this paper, we present LV*-Diagram to handle MkNN queries over line segment data objects. LV*-Diagram dynamically constructs a safe region. The query results remain unchanged if the query point is in the safe region, and hence, the computation cost of the server is greatly reduced. Experimental results show that our approach significantly outperforms the baseline method w.r.t. CPU load, I/O, and communication costs. 相似文献

11.

Top-<Emphasis Type="Italic">k</Emphasis> coupled keyword recommendation for relational keyword queries

Xiangfu Meng Longbing Cao Xiaoyan Zhang Jingyu Shao 《Knowledge and Information Systems》2017,50(3):883-916

Providing top-k typical relevant keyword queries would benefit the users who cannot formulate appropriate queries to express their imprecise query intentions. By extracting the semantic relationships both between keywords and keyword queries, this paper proposes a new keyword query suggestion approach which can provide typical and semantically related queries to the given query. Firstly, a keyword coupling relationship measure, which considers both intra- and inter-couplings between each pair of keywords, is proposed. Then, the semantic similarity of different keyword queries can be measured by using a semantic matrix, in which the coupling relationships between keywords in queries are reserved. Based on the query semantic similarities, we next propose an approximation algorithm to find the most typical queries from query history by using the probability density estimation method. Lastly, a threshold-based top-k query selection method is proposed to expeditiously evaluate the top-k typical relevant queries. We demonstrate that our keyword coupling relationship and query semantic similarity measures can capture the coupling relationships between keywords and semantic similarities between keyword queries accurately. The efficiency of query typicality analysis and top-k query selection algorithm is also demonstrated. 相似文献

12.

Approximate Continuous Top-<Emphasis Type="Italic">k</Emphasis> Query over Sliding Window

下载免费PDF全文

Rui Zhu Bin Wang Shi-Ying Luo Xiao-Chun Yang Guo-Ren Wang 《计算机科学技术学报》2017,32(1):93-109

Continuous top-k query over sliding window is a fundamental problem in database, which retrieves k objects with the highest scores when the window slides. Existing studies mainly adopt exact algorithms to tackle this type of queries, whose key idea is to maintain a subset of objects in the window, and try to retrieve answers from it. However, all the existing algorithms are sensitive to query parameters and data distribution. In addition, they suffer from expensive overhead for incremental maintenance, and thus cannot satisfy real-time requirement. In this paper, we define a novel query named (ε, δ)-approximate continuous top-k query, which returns approximate answers for top-k query. In order to efficiently support this query, we propose an efficient framework, named PABF (Probabilistic Approximate Based Framework), to support approximate top-k query over sliding window. We firstly maintain a self-adaptive pruning value, which could filter out newly arrived objects who have a probability less than 1 ? δ of being a query result. For those objects that are not filtered, we combine them together, if the score difference among them is less than a threshold. To efficiently maintain these combined results, the framework PABF also proposes a multi-phase merging algorithm. Theoretical analysis indicates that even in the worst case, we require only logarithmic complexity for maintaining each candidate. 相似文献

13.

Uncertain top-<Emphasis Type="Italic">k</Emphasis> query processing in distributed environments

Xite?Wang Email author Derong?Shen Ge?Yu 《Distributed and Parallel Databases》2016,34(4):567-589

The top-k query on uncertain data set has been a very hot topic these years, and there have been many studies on uncertain top-k queries. Unfortunately, most of the existing algorithms only consider centralized processing environments, and they are not suitable for the large-scale data. In this paper, it is the first attempt to process probabilistic threshold top-k queries (an important uncertain top-k query, PT-k for short) in a distributed environment. We propose 3 efficient algorithms. The serial distributed approach adopts a new method, which only requires a few amount of calculations, to serially process PT-k queries in distributed environments. The global sorting first algorithm for PT-k query processing (GSP) is designed for improving the computation speed. In GSP, a distributed sorting operation is performed, and then we compute the candidates for PT-k queries in parallel. The query results can be computed by using a novel incremental method which can reduce the number of calculations. The local filtering first algorithm for PT-k query processing is designed for reducing the network overhead. Specifically, several filtering strategies are proposed to filter out redundant data locally, and then the incremental method in GSP is used to process the PT-k queries. Finally, the effectiveness of our proposed algorithms is verified through a series of experiments. 相似文献

14.

Multiple <Emphasis Type="Italic">k</Emphasis> nearest neighbor search

Yu-Chi Chung I-Fang Su Chiang Lee Pei-Chi Liu 《World Wide Web》2017,20(2):371-398

The problem of kNN (k Nearest Neighbor) queries has received considerable attention in the database and information retrieval communities. Given a dataset D and a kNN query q, the k nearest neighbor algorithm finds the closest k data points to q. The applications of kNN queries are board, not only in spatio-temporal databases but also in many areas. For example, they can be used in multimedia databases, data mining, scientific databases and video retrieval. The past studies of kNN query processing did not consider the case that the server may receive multiple kNN queries at one time. Their algorithms process queries independently. Thus, the server will be busy with continuously reaccessing the database to obtain the data that have already been acquired. This results in wasting I/O costs and degrading the performance of the whole system. In this paper, we focus on this problem and propose an algorithm named COrrelated kNN query Evaluation (COKE). The main idea of COKE is an “information sharing” strategy whereby the server reuses the query results of previously executed queries for efficiently processing subsequent queries. We conduct a comprehensive set of experiments to analyze the performance of COKE and compare it with the Best-First Search (BFS) algorithm. Empirical studies indicate that COKE outperforms BFS, and achieves lower I/O costs and less running time. 相似文献

15.

Reverse direction-based surrounder queries for mobile recommendations

Xi Guo Yoshiharu Ishikawa Yonghong Xie Aziguli Wulamu 《World Wide Web》2017,20(5):885-913

This paper proposes a new spatial query called a reverse direction-based surrounder (RDBS) query, which retrieves a user who is seeing a point of interest (POI) as one of their direction-based surrounders (DBSs). According to a user, one POI can be dominated by a second POI if the POIs are directionally close and the first POI is farther from the user than the second is. Two POIs are directionally close if their included angle with respect to the user is smaller than an angular threshold ??. If a POI cannot be dominated by another POI, it is a DBS of the user. We also propose an extended query called competitor RDBS query. POIs that share the same RDBSs with another POI are defined as competitors of that POI. We design algorithms to answer the RDBS queries and competitor queries. The experimental results show that the proposed algorithms can answer the queries efficiently. 相似文献

16.

Aggregate location recommendation in dynamic transportation networks

Jianmin Li Yan Wang Ying Zhong Danhuai Guo Shunzhi Zhu 《World Wide Web》2018,21(6):1637-1653

Travel planning and location recommendation are increasingly important in recent years. In this light, we propose and study a novel aggregate location recommendation query (ALRQ) of discovering aggregate locations for multiple travelers and planning the corresponding travel routes in dynamic transportation networks. Assuming the scenario that multiple travelers target the same destination, given a set of travelers’ locations Q, a set of potential aggregate location O, and a departure time t, the ALRQ finds an aggregate location o ∈ O that has the minimum global travel time \({\sum }_{q \in Q} T(q,o,t)\), where T(q,o,t) is the travel time between o and q with departure time t. The ALRQ problem is challenging due to three reasons: (1) how to model the dynamic transportation networks practically, and (2) how to compute ALRQ efficiently. We take two types of dynamic transportation networks into account, and we define a pair of upper and lower bounds to prune the search space effectively. Moreover, a heuristic scheduling strategy is adopted to schedule multiple query sources. Finally, we conducted extensive experiments on real and synthetic spatial data to verify the performance of the developed algorithms. 相似文献

17.

The <Emphasis Type="Italic">B</Emphasis><Superscript><Emphasis Type="Italic">dual</Emphasis></Superscript>-Tree: indexing moving objects by space filling curves in the dual space

Man Lung Yiu Yufei Tao Nikos Mamoulis 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(3):379-400

Existing spatiotemporal indexes suffer from either large update cost or poor query performance, except for the B ^x-tree (the state-of-the-art), which consists of multiple B ⁺-trees indexing the 1D values transformed from the (multi-dimensional) moving objects based on a space filling curve (Hilbert, in particular). This curve, however, does not consider object velocities, and as a result, query processing with a B ^x-tree retrieves a large number of false hits, which seriously compromises its efficiency. It is natural to wonder “can we obtain better performance by capturing also the velocity information, using a Hilbert curve of a higher dimensionality?”. This paper provides a positive answer by developing the B ^dual-tree, a novel spatiotemporal access method leveraging pure relational methodology. We show, with theoretical evidence, that the B ^dual-tree indeed outperforms the B ^x-tree in most circum- stances. Furthermore, our technique can effectively answer progressive spatiotemporal queries, which are poorly supported by B ^x-trees. 相似文献

18.

Monitoring distributed fragmented skylines

Odysseas Papapetrou Minos Garofalakis 《Distributed and Parallel Databases》2018,36(4):675-715

Distributed skyline computation is important for a wide range of domains, from distributed and web-based systems to ISP-network monitoring and distributed databases. The problem is particularly challenging in dynamic distributed settings, where the goal is to efficiently monitor a continuous skyline query over a collection of distributed streams. All existing work relies on the assumption of a single point of reference for object attributes/dimensions: objects may be vertically or horizontally partitioned, but the accurate value of each dimension for each object is always maintained by a single site. This assumption is unrealistic for several distributed applications, where object information is fragmented over a set of distributed streams (each monitored by a different site) and needs to be aggregated (e.g., averaged) across several sites. Furthermore, it is frequently useful to define skyline dimensions through complex functions over the aggregated objects, which raises further challenges for dealing with distribution and object fragmentation. We present the first known distributed algorithms for continuous monitoring of skylines over complex functions of fragmented multi-dimensional objects. Our algorithms rely on decomposition of the skyline monitoring problem to a select set of distributed threshold-crossing queries, which can be monitored locally at each site. We propose several optimizations, including: (a) a technique for adaptively determining the most efficient monitoring strategy for each object, (b) an approximate monitoring technique, and (c) a strategy that reduces communication overhead by grouping together threshold-crossing queries. Furthermore, we discuss how our proposed algorithms can be used to address other continuous query types. A thorough experimental study with synthetic and real-life data sets verifies the effectiveness of our schemes and demonstrates order-of-magnitude improvements in communication costs compared to the only alternative centralized solution. 相似文献

19.

A graph-based cache for large-scale similarity search engines

Veronica Gil-Costa Mauricio Marin Carolina Bonacic Roberto Solar 《The Journal of supercomputing》2018,74(5):2006-2034

Large-scale similarity search engines are complex systems devised to process unstructured data like images and videos. These systems are deployed on clusters of distributed processors communicated through high-speed networks. To process a new query, a distance function is evaluated between the query and the objects stored in the database. This process relays on a metric space index distributed among the processors. In this paper, we propose a cache-based strategy devised to reduce the number of computations required to retrieve the top-k object results for user queries by using pre-computed information. Our proposal executes an approximate similarity search algorithm, which takes advantage of the links between objects stored in the cache memory. Those links form a graph of similarity among pre-computed queries. Compared to the previous methods in the literature, the proposed approach reduces the number of distance evaluations up to 60%. 相似文献

20.

An intelligent,uncertainty driven aggregation scheme for streams of ordered sets

Kostas Kolomvatsos 《Applied Intelligence》2016,45(3):713-735

Data streams management has attracted the attention of many researchers during the recent years. The reason is that numerous devices generate huge amounts of data demanding an efficient processing scheme for delivering high quality applications. Data are reported through streams and stored into a number of partitions. Separation techniques facilitate the parallel management of data while intelligent methods are necessary to manage these multiple instances of data. Progressive analytics over huge amounts of data could be adopted to deliver partial responses and, possibly, to save time in the execution of applications. An interesting research domain is the efficient management of queries over multiple partitions. Usually, such queries demand responses in the form of ordered sets of objects (e.g., top-k queries). These ordered sets include objects in a ranked order and require novel mechanisms for deriving responses based on partial results. In this paper, we study a setting of multiple data partitions and propose an intelligent, uncertainty driven decision making mechanism that aims to respond to streams of queries. Our mechanism delivers an ordered set of objects over a number of partial ordered subsets retrieved by each partition of data. We envision that a number of query processors are placed in front of each partition and report progressive analytics to a Query Controller (QC). The QC receives queries, assigns the task to the underlying processors and decides the right time to deliver the final ordered set to the application. We propose an aggregation model for deriving the final ordered set of objects and a Fuzzy Logic (FL) inference process. We present a Type-2 FL system that decides when the QC should stop aggregating partial subsets and return the final response to the application. We report on the performance of the proposed mechanism through the execution of a large set of experiments. Our results deal with the throughput of the QC, the quality of the final ordered set of objects and the time required for delivering the final response. 相似文献