期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

CLASCN:Candidate Network Selection for Efficient Top-k Keyword Queries over Databases

张俊彭朝晖王珊聂惠静《计算机科学技术学报》2007,22(2)

Keyword Search Over Relational Databases (KSORD) enables casual or Web users easily access databases through free-form keyword queries. Improving the performance of KSORD systems is a critical issue in this area. In this paper, a new approach CLASCN (Classification, Learning And Selection of Candidate Network) is developed to efficiently perform top-fc keyword queries in schema-graph-based online KSORD systems. In this approach, the Candidate Networks (CNs) from trained keyword queries or executed user queries are classified and stored in the databases, and top-fc results from the CNs are learned for constructing CN Language Models (CNLMs). The CNLMs are used to compute the similarity scores between a new user query and the CNs from the query. The CNs with relatively large similarity score, which are the most promising ones to produce top-fc results, will be selected and performed. Currently, CLASCN is only applicable for past queries and New All-keyword-Used (NAU) queries which are frequently submitted queries. Extensive experiments also show the efficiency and effectiveness of our CLASCN approach. 相似文献

2.

Query Intent Disambiguation of Keyword-Based Semantic Entity Search in Dataspaces

下载免费PDF全文

杨丹申德荣于戈寇月聂铁铮《计算机科学技术学报》2013,28(2):382-393

Keyword query has attracted much research attention due to its simplicity and wide applications. The inherent ambiguity of keyword query is prone to unsatisfied query results. Moreover some existing techniques on Web query, keyword query in relational databases and XML databases cannot be completely applied to keyword query in dataspaces. So we propose KeymanticES, a novel keyword-based semantic entity search mechanism in dataspaces which combines both keyword query and semantic query features. And we focus on query intent disambiguation problem and propose a novel three-step approach to resolve it. Extensive experimental results show the effectiveness and correctness of our proposed approach. 相似文献

3.

Continuous Probabilistic Subspace Skyline Query Processing Using Grid Projections

下载免费PDF全文

赵雷杨艳艳周晓方《计算机科学技术学报》2014,29(2):332-344

As an important type of multidimensional preference query, the skyline query can find a superset of optimal results when there is no given linear function to combine values for all attributes of interest. Its processing has been extensively investigated in the past. While most skyline query processing algorithms are designed based on the assumption that query processing is done for all attributes in a static dataset with deterministic attribute values, some advanced work has been done recently to remove part of such a strong assumption in order to process skyline queries for real-life applications, namely, to deal with data with multi-valued attributes （known as data uncertainty）, to support skyline queries in a subspace which is a subset of attributes selected by the user, and to support continuous queries on streaming data. Naturally, there are many application scenarios where these three complex issues must be considered together. In this paper, we tackle the problem of probabilistic subspace skyline query processing over sliding windows on uncertain data streams. That is, to retrieve all objects from the most recent window of streaming data in a user-selected subspace with a skyline probability no smaller than a given threshold. Based on the subtle relationship between the full space and an arbitrary subspace, a novel approach using a regular grid indexing structure is developed for this problem. An extensive empirical study under various settings is conducted to show the effectiveness and efficiency of our PSS algorithm. 相似文献

4.

数据流分隔——传感网络中的一种数据融合方法

吴健康董梁包晓明《自动化学报》2006,32(6):856-866

相似文献

5.

Stream Segmentation-A Data Fusion Approach for Sensor Networks

WU Jian-Kang DONG Liang BAO Xiao-Ming 《自动化学报》2006,(6)

相似文献

6.

Web Data Extraction from Query Result Pages Based on Visual and Content Features

下载免费PDF全文

Daiyue Weng Jun Hong David A. Bell 《International Journal of Software and Informatics》2012,6(3):453-472

A rapidly increasing number of Web databases are now become accessible via their HTML form-based query interfaces. Query result pages are dynamically generated in response to user queries, which encode structured data and are displayed for human use. Query result pages usually contain other types of information in addition to query results, e.g., advertisements, navigation bar etc. The problem of extracting structured data from query result pages is critical for web data integration applications, such as comparison shopping, meta-search engines etc, and has been intensively studied. A number of approaches have been proposed. As the structures of Web pages become more and more complex, the existing approaches start to fail, and most of them do not remove irrelevant contents which may affect the accuracy of data record extraction. We propose an automated approach for Web data extraction. First, it makes use of visual features and query terms to identify data sections and extracts data records in these sections. We also represent several content and visual features of visual blocks in a data section, and use them to filter out noisy blocks. Second, it measures similarity between data items in different data records based on their visual and content features, and aligns them into different groups so that the data in the same group have the same semantics. The results of our experiments with a large set of Web query result pages in di?erent domains show that our proposed approaches are highly effective. 相似文献

7.

Query Expansion Using Wikipedia and a Concept Base in Cross-language Information Retrieval

Pham Huy Anh Yukawa Takashi 《计算机技术与应用:英文》2013,(10):522-531

相似文献

8.

Cooperative Answering of Fuzzy Queries

下载免费PDF全文

Hanène Chettaoui 《计算机科学技术学报》2009,24(4):675-686

The majority of existing information systems deals with crisp data through crisp database systems.Traditional Database Management Systems(DBMS) have not taken into account imprecision so one can say there is some sort of lack of flexibility.The reason is that queries retrieve only elements which precisely match to the given Boolean query.That is,an element belongs to the result if the query is true for this element;otherwise,no answers are returned to the user.The aim of this paper is to present a cooper... 相似文献

9.

A Semantic Cache Framework for Secure XML Queries

下载免费PDF全文

Jian-Hua Feng Guo-Liang Li and Na Ta 《计算机科学技术学报》2008,23(6):988-997

Secure XML query answering to protect data privacy and semantic cache to speed up XML query answering are two hot spots in current research areas of XML database systems. While both issues are explored respectively in depth,they have not been studied together,that is,the problem of semantic cache for secure XML query answering has not been addressed yet. In this paper,we present an interesting joint of these two aspects and propose an efficient framework of semantic cache for secure XML query answering,which can improve the performance of XML database systems under secure circumstances. Our framework combines access control,user privilege management over XML data and the state-of-the-art semantic XML query cache techniques,to ensure that data are presented only to authorized users in an efficient way. To the best of our knowledge,the approach we propose here is among the first beneficial efforts in a novel perspective of combining caching and security for XML database to improve system performance. The efficiency of our framework is verified by comprehensive experiments. 相似文献

10.

Compressed Data Cube for Approximate OLAP Query Processing 总被引：4，自引：0，他引：4

下载免费PDF全文

冯玉王珊《计算机科学技术学报》2002,17(5):0-0

Approximate query processing has emerged as an approach to dealing with the huge data volume and complex queries in the environment of data warehouse.In this paper,we present a novel method that provides approximate answers to OLAP queries.Our method is based on building a compressed (approximate) data cube by a clustering technique and using this compressed data cube to provide answers to queries directly,so it improves the performance of the queries.We also provide the algorithm of the OLAP queries and the confidence intervals of query results.An extensive experimental study with the OLAP council benchmark shows the effectiveness and scalability of our cluster-based approach compared to sampling. 相似文献

11.

Top-<Emphasis Type="Italic">k</Emphasis> coupled keyword recommendation for relational keyword queries

Xiangfu Meng Longbing Cao Xiaoyan Zhang Jingyu Shao 《Knowledge and Information Systems》2017,50(3):883-916

Providing top-k typical relevant keyword queries would benefit the users who cannot formulate appropriate queries to express their imprecise query intentions. By extracting the semantic relationships both between keywords and keyword queries, this paper proposes a new keyword query suggestion approach which can provide typical and semantically related queries to the given query. Firstly, a keyword coupling relationship measure, which considers both intra- and inter-couplings between each pair of keywords, is proposed. Then, the semantic similarity of different keyword queries can be measured by using a semantic matrix, in which the coupling relationships between keywords in queries are reserved. Based on the query semantic similarities, we next propose an approximation algorithm to find the most typical queries from query history by using the probability density estimation method. Lastly, a threshold-based top-k query selection method is proposed to expeditiously evaluate the top-k typical relevant queries. We demonstrate that our keyword coupling relationship and query semantic similarity measures can capture the coupling relationships between keywords and semantic similarities between keyword queries accurately. The efficiency of query typicality analysis and top-k query selection algorithm is also demonstrated. 相似文献

12.

Multi-dimensional top-k dominating queries 总被引：1，自引：0，他引：1

Man Lung Yiu Nikos Mamoulis 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(3):695-718

The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top-k and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, top-k dominating queries have not received adequate attention from the research community. This paper is an extensive study on the evaluation of top-k dominating queries. First, we propose a set of algorithms that apply on indexed multi-dimensional data. Second, we investigate query evaluation on data that are not indexed. Finally, we study a relaxed variant of the query which considers dominance in dimensional subspaces. Experiments using synthetic and real datasets demonstrate that our algorithms significantly outperform a previous skyline-based approach. We also illustrate the applicability of this multi-dimensional analysis query by studying the meaningfulness of its results on real data. 相似文献

13.

No-but-semantic-match: computing semantically matched xml keyword search results

Mehdi?Naseriparsa Email author Md.?Saiful?Islam Chengfei?Liu Irene?Moser 《World Wide Web》2018,21(5):1223-1257

Users are rarely familiar with the content of a data source they are querying, and therefore cannot avoid using keywords that do not exist in the data source. Traditional systems may respond with an empty result, causing dissatisfaction, while the data source in effect holds semantically related content. In this paper we study this no-but-semantic-match problem on XML keyword search and propose a solution which enables us to present the top-k semantically related results to the user. Our solution involves two steps: (a) extracting semantically related candidate queries from the original query and (b) processing candidate queries and retrieving the top-k semantically related results. Candidate queries are generated by replacement of non-mapped keywords with candidate keywords obtained from an ontological knowledge base. Candidate results are scored using their cohesiveness and their similarity to the original query. Since the number of queries to process can be large, with each result having to be analyzed, we propose pruning techniques to retrieve the top-k results efficiently. We develop two query processing algorithms based on our pruning techniques. Further, we exploit a property of the candidate queries to propose a technique for processing multiple queries in batch, which improves the performance substantially. Extensive experiments on two real datasets verify the effectiveness and efficiency of the proposed approaches. 相似文献

14.

Supporting top-<Emphasis Type="Italic">k</Emphasis><Emphasis Type="Italic">join</Emphasis> queries in relational databases

Ihab?F.?Ilyas Email author Walid?G.?Aref Ahmed?K.?Elmagarmid 《The VLDB Journal The International Journal on Very Large Data Bases》2004,13(3):207-221

Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middlewares, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance.Received: 23 December 2003, Accepted: 31 March 2004, Published online: 12 August 2004Edited by: S. AbiteboulExtended version of the paper published in the Proceedings of the 29th International Conference on Very Large Databases, VLDB 2003, Berlin, Germany, pp 754-765 相似文献

15.

Semantics and evaluation of top-<Emphasis Type="Italic">k</Emphasis> queries in probabilistic databases

Xi Zhang Jan Chomicki 《Distributed and Parallel Databases》2009,26(1):67-126

We study here fundamental issues involved in top-k query evaluation in probabilistic databases. We consider simple probabilistic databases in which probabilities are associated with individual tuples, and general probabilistic databases in which, additionally, exclusivity relationships between tuples can be represented. In contrast to other recent research in this area, we do not limit ourselves to injective scoring functions. We formulate three intuitive postulates for the semantics of top-k queries in probabilistic databases, and introduce a new semantics, Global-Topk, that satisfies those postulates to a large degree. We also show how to evaluate queries under the Global-Topk semantics. For simple databases we design dynamic-programming based algorithms. For general databases we show polynomial-time reductions to the simple cases, and provide effective heuristics to speed up the computation in practice. For example, we demonstrate that for a fixed k the time complexity of top-k query evaluation is as low as linear, under the assumption that probabilistic databases are simple and scoring functions are injective. Research partially supported by NSF grant IIS-0307434. An earlier version of some of the results in this paper was presented in [1]. 相似文献

16.

An effective suggestion method for keyword search of databases

Hai Huang Zonghai Chen Chengfei Liu He Huang Xiangliang Zhang 《World Wide Web》2017,20(4):729-747

This paper solves the problem of providing high-quality suggestions for user keyword queries over databases. With the assumption that the returned suggestions are independent, existing query suggestion methods over databases score candidate suggestions individually and return the top-k best of them. However, the top-k suggestions have high redundancy with respect to the topics. To provide informative suggestions, the returned k suggestions are expected to be diverse, i.e., maximizing the relevance to the user query and the diversity with respect to topics that the user might be interested in simultaneously. In this paper, an objective function considering both factors is defined for evaluating a suggestion set. We show that maximizing the objective function is a submodular function maximization problem subject to n matroid constraints, which is an NP-hard problem. An greedy approximate algorithm with an approximation ratio O(\(\frac {1}{1+n}\)) is also proposed. Experimental results show that our suggestion outperforms other methods on providing relevant and diverse suggestions. 相似文献

17.

Evaluating continuous top-k queries over document streams

Weixiong Rao Lei Chen Shudong Chen Sasu Tarkoma 《World Wide Web》2014,17(1):59-83

相似文献

18.

一种针对反向空间偏好top-k查询的高效处理方法

李淼谷峪陈默于戈《软件学报》2017,28(2):310-325

随着地理位置定位技术的蓬勃发展,基于在线位置服务技术的应用也越来越多.提出一种查询类型——反向空间偏好top-k查询.类似于传统的反向空间top-k查询,对于给定的空间查询对象,该查询返回使该对象满足top-k属性得分的那些用户.但不同的是,该对象的属性不是自身具有的特性,而是通过计算该对象与其他偏好对象之间的空间关系（如距离）而确定.这种查询在市场分析等许多重要领域具有需求,例如,根据查询结果,分析出某个地区中某个设施受欢迎的程度.但是,由于大量空间对象的存在导致对象之间空间关系的计算代价非常高,如何实时地计算出对象的空间属性得分,给查询处理带来很大的挑战.针对该问题提出优化的查询处理算法包括：数据集剪枝、数据集批量处理、基于权重的用户分组等策略.通过理论分析和充分的实验验证,证明了所提出方法的有效性.与普通方法相比,这些方法能够大幅度提高查询处理的执行时间和I/O效率. 相似文献

19.

Top-k typicality queries and efficient query answering methods on large databases

Ming Hua Jian Pei Ada W. C. Fu Xuemin Lin Ho-Fung Leung 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(3):809-835

Finding typical instances is an effective approach to understand and analyze large data sets. In this paper, we apply the idea of typicality analysis from psychology and cognitive science to database query answering, and study the novel problem of answering top-k typicality queries. We model typicality in large data sets systematically. Three types of top-k typicality queries are formulated. To answer questions like “Who are the top-k most typical NBA players?”, the measure of simple typicality is developed. To answer questions like “Who are the top-k most typical guards distinguishing guards from other players?”, the notion of discriminative typicality is proposed. Moreover, to answer questions like “Who are the best k typical guards in whole representing different types of guards?”, the notion of representative typicality is used. Computing the exact answer to a top-k typicality query requires quadratic time which is often too costly for online query answering on large databases. We develop a series of approximation methods for various situations: (1) the randomized tournament algorithm has linear complexity though it does not provide a theoretical guarantee on the quality of the answers; (2) the direct local typicality approximation using VP-trees provides an approximation quality guarantee; (3) a local typicality tree data structure can be exploited to index a large set of objects. Then, typicality queries can be answered efficiently with quality guarantees by a tournament method based on a Local Typicality Tree. An extensive performance study using two real data sets and a series of synthetic data sets clearly shows that top-k typicality queries are meaningful and our methods are practical. 相似文献

20.

Continually Answering Constraint <Emphasis Type="Italic">k</Emphasis>-<Emphasis Type="Italic">NN</Emphasis> Queries in Unstructured P2P Systems

下载免费PDF全文

Bin Wang Xiao-Chun Yang Guo-Ren Wang Ge Yu Lei Chen X. Sean Wang and Xue-Min Lin 《计算机科学技术学报》2008,23(4):538-556

We consider the problem of efficiently computing distributed geographical k-NN queries in an unstructured peer-to-peer （P2P） system, in which each peer is managed by an individual organization and can only communicate with its logical neighboring peers. Such queries are based on local filter query statistics, and require as less communication cost as possible which makes it more difficult than the existing distributed k-NN queries. Especially, we hope to reduce candidate peers and degrade communication cost. In this paper, we propose an efficient pruning technique to minimize the number of candidate peers to be processed to answer the k-NN queries. Our approach is especially suitable for continuous k-NN queries when updating peers, including changing ranges of peers, dynamically leaving or joining peers, and updating data in a peer. In addition, simulation results show that the proposed approach outperforms the existing Minimum Bounding Rectangle （MBR）-based query approaches, especially for continuous queries. 相似文献