首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 671 毫秒
1.
In this paper, a method for fast processing of data stream tuples in parallel execution of continuous queries over a multiprocessing environment is proposed. A copy of the query plan is assigned to each of processing units in the multiprocessing environment. Dynamic and continuous routing of input data stream tuples among the graph constructed by these copies (called the Query Mega Graph) for each input tuple determines that, after getting processed by each processing unit (e.g., processor), to which next processor it should be forwarded. Selection of the proper next processor is performed such that the destination processor imposes the minimum tuple latency to the corresponding tuple, among all of the alternative processors. The tuple latency is derived from processing, buffering and communication time delay which varies in different practical parallel systems.  相似文献   

2.
一个基于三元组存储的列式OLAP查询执行引擎   总被引:1,自引:0,他引:1  
朱阅岸  张延松  周烜  王珊 《软件学报》2014,25(4):753-767
大数据与传统的数据仓库技术相结合产生了大数据实时分析处理需要(volume+velocity),它要求大数据背景下的数据仓库不能过多地依赖物化、索引等高存储代价的优化技术,而要提高实时处理能力来应对大数据分析中数据量大、查询分析复杂等特点.这些查询分析操作一般表现为在事实表和维表之间连接操作的基础上对结果集上进行分组聚集等操作.因此,表连接和分组聚集操作是ROLAP(relational OLAP)性能的两个重要决定因素.研究了新硬件平台下针对大规模数据的OLAP查询的性能,设计新的列存储OLAP查询执行引擎CDDTA-MMDB(columnar direct dimensional tuple access-main memory databasequeryexecutionengine,直接维表元组访问的内存数据库查询执行引擎).基于三元组的物化策略,使得CDDTA-MMDB能够减少内存列存储模型上表连接操作访问基表和中间数据结构的次数.首先,CDDTA-MMDB将查询分解为作用在维表和事实表上的子查询,如果只涉及过滤操作,子查询将生成<代理键,布尔值>二元组;否则,子查询生成<代理键,关键字,值>三元组.然后,只需一趟扫描事实表,利用事实表的外键映射函数直接定位相应三元组或者二元组,完成相应的过滤、连接或聚集操作.CDDTA-MMDB充分考虑了内存列存储数据库的设计原则,尽量减少随机内存访问.实验结果表明:CDDTA-MMDB是高效的,与具代表性的列存储数据库相比,比MonetDB 5.5快2.5倍,比C-store的invisible join快5倍;并且,CDDTA-MMDB在多核处理器上具有线性加速比.  相似文献   

3.
The traditional approach to database querying and updating treats insertions and deletions of tuples in an asymmetric manner: if a tuple is inserted then, intuitively, we think of as being true and we use this knowledge in query and update processing; in contrast, if a tuple is deleted then we think of as being false but we do not use this knowledge at all! In this paper, we present a new approach to database querying and updating in which insertions and deletions of tuples are treated in a symmetric manner. Contrary to the traditional approach, we use both inserted and deleted tuples in our derivation algorithms. Our approach works as follows: if the deletion of a tuple is requested, then we mark as being deleted without removing it from the database; if the insertion of a tuple is requested, then we simply place in the database and remove all its marked subtuples. Derivation of tuples is done using two derivation rules under one constraint: a tuple is derived only if has no marked subtuples in the database. The derivation rules reflect relational projection and relational join. The main contribution of our work is to provide a method which allows insertion or deletion of a tuple over any relation scheme in a deterministic way. Received: 12 June 1995 / 19 February 1997  相似文献   

4.
In systems coordinated with a distributed set of tuple spaces, it is crucial to assist agents in retrieving the tuples they are interested in. This can be achieved by sorting techniques that group similar tuples together in the same tuple space, so that the position of a tuple can be inferred by similarity. Accordingly, we formulate the collective sort problem for distributed tuple spaces, where a set of agents is in charge of moving tuples up to a complete sort has been reached, namely, each of the N tuple spaces aggregate tuples belonging to one of the N kinds available. After pointing out the requirements for effectively tackling this problem, we propose a self-organizing solution resembling brood sorting performed by ants. This is based on simple agents that perform partial observations and accordingly take decisions on tuple movement. Convergence is addressed by a fully adaptive method for simulated annealing, based on noise tuples inserted and removed by agents on a need basis so as to avoid sub-optimal sorting. Emergence of sorting properties and scalability are evaluated through stochastic simulations.  相似文献   

5.
A common task of Web users is querying structured information from Web pages. For realizing this interesting scenario we propose a novel query processor for systematically discovering instances of semantic relations in Web search results and joining these relation instances into complex result tuples with conjunctive queries. Our query processor transforms a structured user query into keyword queries that are submitted to a search engine, forwards search results to a relation extractor, and then combines relations into complex result tuples. The processor automatically learns discriminative and effective keywords for different types of semantic relations. Thereby, our query processor leverages the index of a search engine to query potentially billions of pages. Unfortunately, relation extractors may fail to return a relation for a result tuple. Moreover, user defined data sources may not return at least k complete result tuples. Therefore we propose an adaptive routing model based on information theory for retrieving missing attributes of incomplete result tuples. The model determines the most promising next incomplete tuple and attribute type for returning any-k complete result tuples at any point during the query execution process. We report a thorough experimental evaluation over multiple relation extractors. Our query processor returns complete result tuples while processing only very few Web pages.  相似文献   

6.
Under the bag-theoretic semantics relations are bags of tuples, that is, a tuple may have any number of duplicates. Under this semantics, a conjunctive query is bag-contained in a conjunctive query , denoted , if for all databases , , the result of applying to , is a subbag of . It is not known whether testing is decidable. In this paper we prove that can be tested on a finite set of canonical databases built from the body of . Using that result we give a procedure that decides the bag-containment problem of conjunctive queries in a large number of cases. Received: 27 September 1995 / 19 June 1996  相似文献   

7.
Two research efforts have been conducted to realize sliding-window queries in data stream management systems, namely, query revaluation and incremental evaluation. In the query reevaluation method, two consecutive windows are processed independently of each other. On the other hand, in the incremental evaluation method, the query answer for a window is obtained incrementally from the answer of the preceding window. In this paper, we focus on the incremental evaluation method. Two approaches have been adopted for the incremental evaluation of sliding-window queries, namely, the input-triggered approach and the negative tuples approach. In the input-triggered approach, only the newly inserted tuples flow in the query pipeline and tuple expiration is based on the timestamps of the newly inserted tuples. On the other hand, in the negative tuples approach, tuple expiration is separated from tuple insertion where a tuple flows in the pipeline for every inserted or expired tuple. The negative tuples approach avoids the unpredictable output delays that result from the input-triggered approach. However, negative tuples double the number of tuples through the query pipeline, thus reducing the pipeline bandwidth. Based on a detailed study of the incremental evaluation pipeline, we classify the incremental query operators into two classes according to whether an operator can avoid the processing of negative tuples or not. Based on this classification, we present several optimization techniques over the negative tuples approach that aim to reduce the overhead of processing negative tuples while avoiding the output delay of the query answer. A detailed experimental study, based on a prototype system implementation, shows the performance gains over the input-triggered approach of the negative tuples approach when accompanied with the proposed optimizations  相似文献   

8.
This paper presents a framework for querying inconsistent databases in the presence of functional dependencies. Most of the works dealing with the problem of extracting reliable information from inconsistent databases are based on the notion of repair, a minimal set of tuple insertions and deletions which leads the database to a consistent state (called repaired database), and the notion of consistent query answer, a query answer that can be obtained from every repaired database. In this work, both the notion of repair and query answer differ from the original ones. In the presence of functional dependencies, tuple deletions are the only operations that are performed in order to restore the consistency of an inconsistent database. However, deleting a tuple to remove an integrity violation potentially eliminates useful information in that tuple. In order to cope with this problem, we adopt a notion of repair, based on tuple updates, which allows us to better preserve information in the source database. A drawback of the notion of consistent query answer is that it does not allow us to discriminate among non-consistent answers, namely answers which can be obtained from a non-empty proper subset of the repaired databases. To obtain more informative query answers, we propose the notion of probabilistic query answer, that is query answers are tuples associated with probabilities. This new semantics of query answering over inconsistent databases allows us to give a measure of uncertainty to query answers. We show that the problem of computing probabilistic query answers is FP #P -complete. We also propose a technique for computing probabilistic answers to arbitrary relational algebra queries.  相似文献   

9.
Traditional database search uses pattern match in the comparison process. For a query with some search words, tuples are selected only if the words of the tuples exactly match the query words. In this paper, we propose a new method for evaluating relational ranking queries (or top-N queries) with text attributes. This method defines semantic distance functions and utilizes semantic match between words in database search. The attempt is that tuples, not only exactly matching, but also close to the query according to semantic distances, can both be fetched. The basic idea of the method is to create an index based on WordNet to expand the tuple words semantically. The candidate results for a query are retrieved by the index and a simple SQL selection statement, and then top-N answers are obtained. Extensive experiments are carried out to measure the performance of this new strategy for the evaluation of ranking queries over relational databases.  相似文献   

10.
Uncertain data are inevitable in many applications due to various factors such as the limitations of measuring equipment and delays in data updates. Although modeling and querying uncertain data have recently attracted considerable attention from the database community, there are still many critical issues to be resolved with respect to conducting advanced analysis on uncertain data. In this paper, we study the execution of the probabilistic skyline query over uncertain data streams. We propose a novel sliding window skyline model where an uncertain tuple may take the probability to be in the skyline at a certain timestamp t. Formally, a Wp-Skyline(p, t) contains all the tuples whose probabilities of becoming skylines are at least p at timestamp t. However, in the stream environment, computing a probabilistic skyline on a large number of uncertain tuples within the sliding window is a daunting task in practice. In order to efficiently calculate Wp-Skyline, we propose an efficient and effective approach, namely the candidate list approach, which maintains lists of candidates that might become skylines in future sliding windows. We also propose algorithms that continuously monitor the newly incoming and expired data to maintain the skyline candidate set incrementally. To further reduce the computation cost of deciding whether or not a candidate tuple belongs to the skyline, we propose an enhanced refinement strategy that is based on a multi-dimensional indexing structure combined with a grouping-and-conquer strategy. To validate the effectiveness of our proposed approach, we conduct extensive experiments on both real and synthetic data sets and make comparisons with basic techniques.  相似文献   

11.
Answering heterogeneous database queries with degrees of uncertainty   总被引:1,自引:0,他引:1  
In heterogeneous database systems,partial values have been used to resolve some schema integration problems. Performing operations on partial values may producemaybe tuples in the query result which cannot be compared. Thus, users have no way to distinguish which maybe tuple is the most possible answer. In this paper, the concept of partial values is generalized toprobabilistic partial values. We propose an approach to resolve the schema integration problems using probabilistic partial values and develop a full set of extended relational operators for manipulating relations containing probabilistic partial values. With this approach, the uncertain answer tuples of a query are associated with degrees of uncertainty (represented by probabilities). That provides users a comparison among maybe tuples and a better understanding on the query results. Besides, extended selection and join are generalized to -selection and -join, respectively, which can be used to filter out maybe tuples with low probabilities — those which have probabilities smaller than .  相似文献   

12.
It is widely recognized that the integration of information retrieval (IR) and database (DB) techniques provides users with a broad range of high quality services. Along this direction, IR-styled m-keyword query processing over a relational database in an rdbms framework has been well studied. It finds all hidden interconnected tuple structures, for example connected trees that contain keywords and are interconnected by sequences of primary/foreign key relationships among tuples. A new challenging issue is how to monitor events that are implicitly interrelated over an open-ended relational data stream for a user-given m-keyword query. Such a relational data stream is a sequence of tuple insertion/deletion operations. The difficulty of the problem is related to the number of costly joins to be processed over time when tuples are inserted and/or deleted. Such cost is mainly affected by three parameters, namely, the number of keywords, the maximum size of interconnected tuple structures, and the complexity of the database schema when it is viewed as a schema graph. In this paper, we propose new approaches. First, we propose a novel algorithm to efficiently determine all the joins that need to be processed for answering an m-keyword query. Second, we propose a new demand-driven approach to process such a query over a high speed relational data stream. We show that we can achieve high efficiency by significantly reducing the number of intermediate results when processing joins over a relational data stream. The proposed new techniques allow us to achieve high scalability in terms of both query plan generation and query plan execution. We conducted extensive experimental studies using synthetic data and real data to simulate a relational data stream. Our approach significantly outperforms existing algorithms.  相似文献   

13.
Faudemay  P. Mhiri  M. 《Micro, IEEE》1991,11(6):22-34
The RAPID-1 (relational access processor for intelligent data), an associative accelerator that recognizes tuples and logical formulas, is presented. It evaluates logical formulas instantiated by the current tuple, or record, and operates on whole relations or on hashing buckets. RAPID- 1 uses a reduced instruction set and hardwired control and executes all comparisons in a bit-parallel mode. It speeds up the database by a significant factor and will adapt to future generations of microprocessors. The principal design issues, data structures, instruction set, architecture, environments and performance are discussed  相似文献   

14.
We elaborate on how to interpret the query answer on exclusive disjunctive databases and how to reduce the query answer into a more concise form. Exclusive disjunctive data are represented as a pair of value set and variable set in Pv-table which is an extension of the relational model. A value set corresponds to a finite set of possible values in which exactly one value is the true value. By variable sets, tuples may be related with certain relationships, namely disjunctive relationship and join relationship. Three kinds of tuple sets are classified according to these relationships, each possesses an important property, namely co-exist, co-nonempty, or co-instance. Based on these properties, the interpretation of Pv-tables can be formalized in a semantically meaningful way, Also, the redundant and mergeable tuples can be identified. After removing and merging tuples accordingly, a more concise Pv-table can thus provide a better understanding of the query result  相似文献   

15.
Top-k查询在传统的存储确定性数据的关系型数据库中得到了广泛的应用,但是对于存储不确定性数据的数据库,Top-k查询必须结合元组的分值和不确定性来处理.已有的Top-k查询没有很好地结合元组的分值和不确定性,因此,定义一种新的针对不确定性数据的Top-k查询语义,并且实现了查询算法,在新语义下,计算第i位排名时考虑了第i-1位元组,能够更好地权衡分值和不确定性.不同数据集上的实验显示,该算法是有效的.  相似文献   

16.
Tables are used to represent unknown relations. The three partial tuple types are defined in a table to specify incompleteness relationships among tuples of the same table. For tuples of different tables, the cases where incompleteness is introduced at the relation level, tuple level, or attribute value level are discussed. For each of the models, it is shown that query evaluation is sound in the Imielinski-Lipski sense. None of the models is complete in the Imielinski-Lipski sense. Two of the models in the family are compared with other approaches  相似文献   

17.
封闭数据立方体技术研究   总被引:14,自引:1,他引:14  
李盛恩  王珊 《软件学报》2004,15(8):1165-1171
数据立方体中有很多冗余信息,去除这些冗余信息不但可以节约存储空间,还可以加快计算速度.数据立方体中的元组可以划分为封闭元组和非封闭元组.对任何一个非封闭元组,一定存在一个封闭元组,它们都是从基本表的同一组元组中经过聚集运算得到的,因而具有相同的聚集函数值.去掉数据立方体中所有的非封闭元组就产生了一个封闭数据立方体.提出了封闭数据立方体的生成算法、查询算法和增量维护算法,并使用合成数据和实际数据做了一些实验.实验结果表明,封闭数据立方体技术是有效的.  相似文献   

18.
The filtering of incoming tuples of a data stream should be completed quickly and continuously, which requires strict time and space constraints. In order to guarantee these constraints, the selection predicates of continuous queries are grouped or indexed in most data stream management systems (DSMS). This paper proposes a new scheme called attribute selection construct (ASC). Given a set of continuous queries, an ASC divides the domain of an attribute of a data stream into a set of disjoint regions based on the selection predicates that are imposed on the attribute. Each region maintains the pre-computed matching results of the selection predicates. Consequently, an ASC can collectively evaluate all of its selection predicates at the same time. Furthermore, it can also monitor the overall evaluation statistics, such as its selectivity and tuple dropping ratio, dynamically. For those attributes that are employed to express the selection predicates of the queries, the processing order of their ASC’s can significantly influence the overall performance of a multiple query evaluation. The evaluation sequence can be optimized by periodically capturing the run-time tuple dropping ratio of its current evaluation sequence. The performance of the proposed method is analyzed by a series of experiments to identify its various characteristics.  相似文献   

19.
In distributed systems, tuple space is one of the coordination models that significantly maximizes system performance against failure due to its space and time decoupling features. With the growing popularity of distributed computing and increasing complexity in the network, host and link failure occurs frequently, resulting in poor system performance. This article proposes a fault-tolerant model named Tuple Space Replication (TSR) for tuple space coordination in distributed environments. The model introduces a multi-agent system that consists of multiple hosts. Each host in a multi-agent system comprises an agent space with a tuple space for coordination. In this model, we introduce three novel fault-tolerant algorithms for tuple space primitives to provide coordination among hosts with tolerance to multiple links and hosts failure. The first algorithm is given for out() operation to insert tuples in the tuple space. The second algorithm is presented for rdp() operation to read any tuple from the tuple space. The third algorithm is given for inp() operation to delete or withdraw tuples from the tuple space. These algorithms use less number of messages to ensure consistency in the system. The message complexity of the proposed algorithms is analyzed and found O(n) for out(), O(1) for rdp(), and O(n) for inp() operations which is comparable and better than existing works, where n is the number of hosts. The testbed experiment reveals that the proposed TSR model gives performance improvement up to 88%, 70.94%, and 63.80% for out(), rdp(), and inp() operations compared to existing models such as FT-SHE, LBTS, DEPSPACE, and E-DEPSPACE.  相似文献   

20.
This paper focuses on the protection of the confidentiality of the data space content when Shared Data Spaces are deployed in open, possibly hostile, environments. In previous approaches, the data space content was protected against access from unauthorised application components by means of access control mechanisms. The basic assumption is that the hosts (and their administrators) where the data space is deployed have to be trusted. When such an assumption does not hold, then encryption schemes can be used to protect the data space content from malicious hosts. However, such schemes do not support searching on encrypted data. As a consequence, performing retrieval operations is very expensive in terms of resource consumption. Moreover, in these schemes applications have to share secret keys requiring a very complex key management. In this paper, we present a novel encryption scheme that allows tuple matching on completely encrypted tuples. Since the data space does not need to decrypt tuples to perform the search, tuple confidentiality can be guaranteed even when the data space is deployed on malicious hosts (or an adversary gains access to the host). Our scheme does not require authorised components to share keys for inserting and retrieving tuples. Each authorised component can encrypt, decrypt, and search encrypted tuples without knowing other components’ keys. This is beneficial inasmuch as it simplifies the task of key management. An implementation of an encrypted data space based on this scheme is described and some preliminary performance results are given.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号