1.

Load shedding for multi-way stream joins based on arrival order patterns

Tae-Hyung Kwon Ki Yong Lee Myoung Ho Kim 《Journal of Intelligent Information Systems》2011,37(2):245-265

We address the problem of load shedding for continuous multi-way join queries over multiple data streams. When the arrival rates of tuples from data streams exceed the system capacity, a load shedding algorithm drops some subset of input tuples to avoid system overloads. To decide which tuples to drop among the input tuples, most existing load shedding algorithms determine the priority of each input tuple based on the frequency or some historical statistics of its join attribute value, and then drop tuples with the lowest priority. However, those value-based algorithms cannot determine the priorities of tuples properly in environments where join attribute values are unique and each join attribute value occurs at most once in each data stream. In this paper, we propose a load shedding algorithm specifically designed for such environments. The proposed load shedding algorithm determines the priority of each tuple based on the order of streams in which its join attribute value appears, rather than its join attribute value itself. Consequently, the priorities of tuples can be determined effectively in environments where join attribute values are unique and do not repeat. The experimental results show that the proposed algorithm outperforms the existing algorithms in such environments in terms of effectiveness and efficiency.

2.

Research on a Tuple Space Decomposition Algorithm for Message Distribution Systems

郑广 宫云战 张威 杨朝红 《计算机工程与设计》2010,31(1)

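To support on-demand communication in message distribution systems in distributed environments, this paper improves Gelernter's tuple space model, designs the structure of tuple space communication for message distribution systems, defines a feature model of the tuple space, and, based on the principle of locality, proposes a space decomposition algorithm for tuple space communication. Exploiting the differences in how frequently different elements of different tuples are matched in actual communication, the algorithm decomposes the tuple space into a set of buffer subspaces that depend on the abstract relationships among the feature space, feature tuples, and feature elements. When performing a matching operation, a communicating process can fetch matching tuples directly from the buffer subspaces, which lowers the computational cost of communication.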

3.

An extended algebra for constraint databases

Belussi A. Bertino E. Catania B. 《Knowledge and Data Engineering, IEEE Transactions on》1998,10(5):686-705

Constraint relational databases use constraints to both model and query data. A constraint relation contains a finite set of generalized tuples. Each generalized tuple is represented by a conjunction of constraints on a given logical theory and, depending on the logical theory and the specific conjunction of constraints, it may represent an infinite set of relational tuples. Owing to these characteristics, constraint databases are well suited to modeling multidimensional and structured data, such as spatial and temporal data. Defining an algebra for constraint relational databases is important for making constraint databases a practical technology. We extend the previously defined constraint algebra (called generalized relational algebra). First, we show that the relational model is not the only possible semantic reference model for constraint relational databases, and we show how constraint relations can be interpreted under the nested relational model. Then, we introduce two distinct classes of constraint algebras, one based on the relational algebra and one based on the nested relational algebra, and we present an algebra of the latter type. The algebra is proved equivalent to the generalized relational algebra when input relations are modified by introducing generalized tuple identifiers. However, from a user's point of view, it is more suitable. Thus, the difference between these algebras is similar to the difference between the relational algebra and the nested relational algebra, dealing with only one level of nesting. We also show how external functions can be added to the proposed algebra.

4.

The Threshold Algorithm: From Middleware Systems to the Relational Engine

Bruno N. Hui Wang 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(4):523-537

The answer to a top-k query is an ordered set of tuples, where the ordering is based on how closely each tuple matches the query. In the context of middleware systems, new algorithms to answer top-k queries have been recently proposed. Among these, the threshold algorithm (TA) is the best-known instance due to its simplicity and memory requirements. TA is based on an early-termination condition and can evaluate top-k queries without examining all the tuples. This top-k query model is prevalent not only over middleware systems, but also over plain relational data. In this work, we analyze the challenges that must be addressed to adapt TA to a relational database system. We show that, depending on the available indices, many alternative TA strategies can be used to answer a given query. Choosing the best alternative requires a cost model that can be seamlessly integrated with that of current optimizers. We address these challenges and conduct an extensive experimental evaluation of the resulting techniques by characterizing which scenarios can take advantage of TA-like algorithms to answer top-k queries in relational database systems.
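
The early-termination condition mentioned in this abstract can be illustrated with a minimal sketch of the classic middleware form of the threshold algorithm. It is not the relational adaptation the paper studies. The sketch assumes a sum aggregate, lists of equal length, and every object present in every list; the names `threshold_topk`, `by_price`, and `by_rating` are hypothetical.

```python
import heapq

def threshold_topk(sorted_lists, k):
    """Classic threshold algorithm, assuming sum is the aggregate function.

    sorted_lists: one list per attribute of (object_id, score) pairs, sorted
    by score descending. Assumes every object appears in every list.
    Returns (top-k list, number of rows read from each list).
    """
    # Random-access index for each attribute: object_id -> score.
    lookup = [dict(lst) for lst in sorted_lists]
    seen = {}  # object_id -> aggregated score
    depth = 0
    max_depth = max(len(lst) for lst in sorted_lists)
    while depth < max_depth:
        last_scores = []
        for lst in sorted_lists:
            obj, score = lst[depth]          # sorted access
            last_scores.append(score)
            if obj not in seen:              # random access to the other lists
                seen[obj] = sum(index[obj] for index in lookup)
        depth += 1
        # No unseen object can score higher than this threshold.
        threshold = sum(last_scores)
        best = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(best) == k and best[-1][1] >= threshold:
            return best, depth               # early termination
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1]), depth

# Hypothetical data: two attribute lists over four objects.
by_price = [("a", 9), ("b", 8), ("c", 3), ("d", 1)]
by_rating = [("b", 9), ("a", 7), ("d", 5), ("c", 2)]
top, rows_read = threshold_topk([by_price, by_rating], k=1)
```

In this example the scan stops after two of the four rows in each list, because the best aggregated score seen so far already meets the threshold.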

5.

TCRM: diagnosing tuple inconsistency for granulized datasets

Chien-Hsing Wu 《Knowledge》2002,15(8):507-514

Many approaches to granulization have been presented for knowledge discovery. However, the inconsistent tuples that exist in granulized datasets are hardly ever revealed. In this paper, we develop the tuple consistency recognition model (TCRM) to help detect inconsistent tuples efficiently in granulized datasets. The main outputs of the model are the inconsistent tuples found and the processing time consumed. We further conducted an empirical test that diagnosed eighteen continuous real-life datasets granulized by the equal width interval technique, which embeds the S-plus histogram binning algorithm (SHBA) and the largest binning size algorithm (LBSA). The results are remarkable: almost 40% of the granulized datasets contain inconsistent tuples, and in 22% of them more than 20% of the tuples are inconsistent.

6.

A Novel k-Anonymity Privacy Protection Algorithm

刘斐 樊华 金松昌 贾焰 《信息网络安全》2012,(8):199-202

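This article studies the protection of private data in published datasets. It analyzes the shortcomings of the classical k-anonymity algorithm when handling continuously published datasets and improves it for this new application scenario. The proposed algorithm reduces time overhead through incremental data processing and is suited to the rapid, continuous publication of large-scale datasets. By choosing the optimal equivalence class for each data tuple, the algorithm effectively controls information loss. It replaces the introduction of counterfeit tuples with generalization of sensitive attribute values, which guarantees that the dataset contains only real data and improves its usability. A case analysis shows that the proposed algorithm handles privacy protection for continuously published datasets well.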
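
As background for this entry, the sketch below checks the classical k-anonymity property: every combination of quasi-identifier values must be shared by at least k tuples. It does not implement the paper's incremental algorithm or its sensitive-value generalization. The column names, the `generalize_age` helper, and the sample rows are all hypothetical.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every equivalence class over the quasi-identifiers has >= k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

def generalize_age(row, width=10):
    """Replace an exact age with a range such as '20-29' (a simple generalization)."""
    low = (row["age"] // width) * width
    return {**row, "age": f"{low}-{low + width - 1}"}

# Hypothetical published table: age and zip are quasi-identifiers,
# disease is the sensitive attribute.
patients = [
    {"age": 23, "zip": "100**", "disease": "flu"},
    {"age": 27, "zip": "100**", "disease": "asthma"},
    {"age": 31, "zip": "100**", "disease": "flu"},
    {"age": 38, "zip": "100**", "disease": "diabetes"},
]
raw_ok = is_k_anonymous(patients, ["age", "zip"], k=2)
generalized = [generalize_age(row) for row in patients]
generalized_ok = is_k_anonymous(generalized, ["age", "zip"], k=2)
```

With exact ages, every tuple forms its own equivalence class, so the table is not 2-anonymous. After the ages are generalized to decades, each class holds two tuples.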

7.

Privacy-Preserving Tuple Matching in Distributed Databases

Yingpeng Sang Hong Shen Hui Tian 《Knowledge and Data Engineering, IEEE Transactions on》2009,21(12):1767-1782

We address the problems of privacy-preserving duplicate tuple matching (PPDTM) and privacy-preserving threshold attributes matching (PPTAM) in the scenario of a horizontally partitioned database among N parties, where each party holds a private share of the database's tuples and all tuples have the same set of attributes. In PPDTM, each party determines whether its tuples have any duplicates in other parties' private databases. In PPTAM, each party determines whether all attribute values of each tuple appear at least a threshold number of times in the attribute unions. We propose protocols for the two problems using additive homomorphic cryptosystems based on the subgroup membership assumption, e.g., Paillier's and ElGamal's schemes. An analysis of the total numbers of modular exponentiations, modular multiplications, and communication bits shows that our PPDTM protocol for the semihonest model is superior in total cost to the solution derivable from existing techniques: it trades off communication cost for a reduction in computation cost, which dominates the total cost. Our PPTAM protocol is superior in both computation and communication costs. The efficiency improvements are achieved mainly by using random numbers for perturbation, instead of the random polynomials used by existing techniques, without enabling successful attacks by polynomial interpolation. We also give detailed constructions of the required zero-knowledge proofs and extend our two protocols to the malicious model, which were previously unknown.
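
To make the additive homomorphic property concrete, here is a toy Paillier sketch with tiny primes, for illustration only and not secure. It is not the PPDTM or PPTAM protocol. The `masked_difference` helper is a hypothetical illustration of how a random multiplier can hide the difference between two encrypted values while still revealing whether they are equal. It is not taken from the paper. The code needs Python 3.8 or later for `pow` with a negative exponent.

```python
import math
import random

def keygen(p=101, q=113):
    """Toy key generation; real deployments use primes of 1024 bits or more."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def random_unit(n, rng):
    """Pick a random value in [1, n) that is coprime to n."""
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return r

def encrypt(n, m, rng):
    n2 = n * n
    return (pow(n + 1, m, n2) * pow(random_unit(n, rng), n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)

def masked_difference(n, c1, c2, rng):
    """Encrypts r * (m1 - m2) mod n for a random r: zero exactly when m1 == m2."""
    n2 = n * n
    diff = (c1 * pow(c2, -1, n2)) % n2
    return pow(diff, random_unit(n, rng), n2)

rng = random.Random(0)
n, priv = keygen()
total = decrypt(n, priv, add(n, encrypt(n, 20, rng), encrypt(n, 22, rng)))
same = decrypt(n, priv, masked_difference(n, encrypt(n, 7, rng), encrypt(n, 7, rng), rng))
different = decrypt(n, priv, masked_difference(n, encrypt(n, 7, rng), encrypt(n, 8, rng), rng))
```

Here `total` recovers 20 + 22 although the addition was done on ciphertexts. `same` decrypts to zero, while `different` decrypts to a nonzero value that depends on the random multiplier.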

8.

On the collective sort problem for distributed tuple spaces

Matteo Casadei Mirko Viroli 《Science of Computer Programming》2009,74(9):702-722

In systems coordinated with a distributed set of tuple spaces, it is crucial to assist agents in retrieving the tuples they are interested in. This can be achieved by sorting techniques that group similar tuples together in the same tuple space, so that the position of a tuple can be inferred by similarity. Accordingly, we formulate the collective sort problem for distributed tuple spaces, where a set of agents is in charge of moving tuples until a complete sort has been reached, namely, each of the N tuple spaces aggregates tuples belonging to one of the N kinds available. After pointing out the requirements for effectively tackling this problem, we propose a self-organizing solution resembling brood sorting performed by ants. This is based on simple agents that perform partial observations and accordingly make decisions on tuple movement. Convergence is addressed by a fully adaptive method for simulated annealing, based on noise tuples inserted and removed by agents on an as-needed basis so as to avoid sub-optimal sorting. Emergence of sorting properties and scalability are evaluated through stochastic simulations.

9.

Canonical Representation of Linear Order Constraints (total citations: 3; self-citations: 0; citations by others: 3)

范志新 施伯乐 《计算机研究与发展》1999,36(2):209-213

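This paper studies the canonical representation of linear order constraint relations in constraint databases. It proposes a table-structured reduced form for linear order constraint tuples, adds two new reduction principles (linear redundancy and variable reducibility), gives a reduction algorithm for linear order tuples, LCTRA, and discusses canonical forms of linear order constraint relations under absolute point semantics and the complex object language.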

10.

Optimal reachability for multi-priced timed automata

Kim Guldstrand Larsen Jacob Illum Rasmussen 《Theoretical computer science》2008,390(2-3):197-213

In this paper, we prove the decidability of the minimal and maximal reachability problems for multi-priced timed automata, an extension of timed automata with multiple cost variables evolving according to given rates for each location. More precisely, we consider the problems of synthesizing the minimal and maximal costs of reaching a given target location. These problems generalize conditional optimal reachability, i.e., the problem of minimizing one primary cost under individual upper bound constraints on the remaining, secondary, costs, and the problem of maximizing the primary cost under individual lower bound constraints on the secondary costs. Furthermore, under the liveness constraint that all traces eventually reach the goal location, we can synthesize all cost combinations that can reach the goal.

The decidability of the minimal reachability problem is proven by constructing a zone-based algorithm that always terminates while synthesizing the optimal cost tuples. For the corresponding maximization problem, we construct two zone-based algorithms, one with and one without the above liveness constraint. All algorithms are presented in the setting of two cost variables and then lifted to an arbitrary number of cost variables.

11.

General Properties and Termination Conditions for Soft Constraint Propagation

S. Bistarelli R. Gennari F. Rossi 《Constraints》2003,8(1):79-97

Soft constraints based on semirings are a generalization of classical constraints, where tuples of variables' values in each soft constraint are associated with elements from an algebraic structure called a semiring. This framework is able to express, for example, fuzzy, classical, weighted, valued and over-constrained constraint problems. Classical constraint propagation has been extended and adapted to soft constraints by defining a schema for soft constraint propagation [8]. On the other hand, in [1–3] it has been proven that most of the well-known constraint propagation algorithms for classical constraints can be cast within a single schema. In this paper we combine these two schemas and we provide a more general framework where the schema of [3] can be used for soft constraints. In doing so, we generalize the concept of soft constraint propagation, and we provide new sufficient and independent conditions for its termination.

12.

Incremental Evaluation of Sliding-Window Queries over Data Streams

Ghanem T.M. Hammad M.A. Mokbel M.F. Aref W.G. Elmagarmid A.K. 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(1):57-72

Two research efforts have been conducted to realize sliding-window queries in data stream management systems, namely, query reevaluation and incremental evaluation. In the query reevaluation method, two consecutive windows are processed independently of each other. On the other hand, in the incremental evaluation method, the query answer for a window is obtained incrementally from the answer of the preceding window. In this paper, we focus on the incremental evaluation method. Two approaches have been adopted for the incremental evaluation of sliding-window queries, namely, the input-triggered approach and the negative tuples approach. In the input-triggered approach, only the newly inserted tuples flow in the query pipeline and tuple expiration is based on the timestamps of the newly inserted tuples. On the other hand, in the negative tuples approach, tuple expiration is separated from tuple insertion where a tuple flows in the pipeline for every inserted or expired tuple. The negative tuples approach avoids the unpredictable output delays that result from the input-triggered approach. However, negative tuples double the number of tuples through the query pipeline, thus reducing the pipeline bandwidth. Based on a detailed study of the incremental evaluation pipeline, we classify the incremental query operators into two classes according to whether an operator can avoid the processing of negative tuples or not. Based on this classification, we present several optimization techniques over the negative tuples approach that aim to reduce the overhead of processing negative tuples while avoiding the output delay of the query answer. A detailed experimental study, based on a prototype system implementation, shows the performance gains of the negative tuples approach over the input-triggered approach when accompanied by the proposed optimizations.
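
The negative tuples approach described above can be sketched for a running SUM over a time-based window. This is a simplified illustration and does not reproduce the paper's operators or optimizations. The function names, the `end_time` cutoff, and the sample stream are assumptions.

```python
def to_signed_tuples(stream, window, end_time):
    """Turn (timestamp, value) arrivals into positive and negative tuples.

    A positive tuple (sign +1) is emitted at arrival time t, and a matching
    negative tuple (sign -1) at expiry time t + window, up to end_time.
    """
    events = []
    for t, value in stream:
        events.append((t, +1, value))
        if t + window <= end_time:
            events.append((t + window, -1, value))
    # Order by time; at equal times, expirations are applied before insertions.
    events.sort(key=lambda event: (event[0], event[1]))
    return events

def incremental_sum(signed_tuples):
    """Maintain SUM incrementally: add positive tuples, subtract negative ones."""
    total = 0
    answers = []
    for t, sign, value in signed_tuples:
        total += sign * value
        answers.append((t, total))
    return answers

# Hypothetical stream with a quiet period between t=2 and t=6.
stream = [(1, 10), (2, 20), (6, 5)]
signed = to_signed_tuples(stream, window=3, end_time=7)
answers = incremental_sum(signed)
```

In this example the answer is updated at times 4 and 5 even though no tuple arrives then. That is the output-delay advantage the abstract attributes to negative tuples. The cost is that five signed tuples flow through the pipeline for three arrivals.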

13.

A Fast Two-Dimensional Packet Classification Algorithm Based on Sub-Tuple Partitioning

刘彤 李华伟 李晓维 宫曙光 《计算机研究与发展》2006,43(10):1797-1803

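Packet classification is important for supporting network applications such as firewalls, attack detection, and differentiated services, and it has been studied extensively. Algorithms built on the tuple space idea proposed by Srinivasan all share one problem: they cannot use a pre-lookup to locate directly the tuples that contain matching rules, so their average lookup performance is unstable. For two-dimensional packet classification, this paper proposes criteria for partitioning tuples into sub-tuples. For a sub-tuple that satisfies the criteria, the results of three independent one-dimensional lookups determine whether it contains a matching rule. Eliminating unnecessary tuple lookups in this way raises lookup speed and yields stable lookup performance.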
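
The baseline that this abstract criticizes, tuple space search, can be sketched as follows. The sketch covers only the baseline idea, not the paper's sub-tuple partitioning. Prefixes are written as bit strings, a rule's position in the list is its priority, and all names and rules are hypothetical.

```python
def build_tuple_space(rules):
    """Group rules by tuple (source prefix length, destination prefix length).

    rules: list of (src_prefix, dst_prefix, action) with prefixes as bit
    strings; earlier rules have higher priority. Each tuple gets a hash table.
    """
    space = {}
    for priority, (src, dst, action) in enumerate(rules):
        table = space.setdefault((len(src), len(dst)), {})
        table.setdefault((src, dst), (priority, action))
    return space

def classify(space, src_bits, dst_bits):
    """Probe every tuple with one exact-match hash lookup and keep the best hit."""
    best = None
    probes = 0
    for (src_len, dst_len), table in space.items():
        probes += 1
        hit = table.get((src_bits[:src_len], dst_bits[:dst_len]))
        if hit is not None and (best is None or hit[0] < best[0]):
            best = hit
    return (best[1] if best else None), probes

# Hypothetical rule set occupying four distinct tuples.
rules = [
    ("1010", "11", "deny"),
    ("10", "", "allow"),
    ("", "11", "log"),
    ("0", "0", "drop"),
]
space = build_tuple_space(rules)
action, probes = classify(space, "10101111", "11000000")
```

Every packet costs one probe per tuple, whether or not that tuple holds a matching rule. This is the cost the abstract aims to cut by skipping tuples that cannot match.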

14.

Research on Multi-Table Join Optimization Techniques on a Memory Storage Model

张延松 于利胜 王珊 陈红 《计算机科学与探索》2010,4(6):531-541

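This paper analyzes database optimization techniques for advanced hardware platforms and proposes an optimization technique for multi-table join query processing based on a memory storage model. Dimension tables are stored in the memory storage model and their primary keys are made sequential, so that a dimension table's primary key coincides with the memory offset address of the corresponding in-memory dimension record, which allows dimension records to be accessed directly in memory. Column storage reduces the access width of dimension records and further improves the cache performance of dimension table access. Experimental comparison and performance analysis against join algorithms based on SQL Server 2005 query execution plans, the join index algorithm, and an optimized join algorithm based on the column storage model show that the multi-table join algorithm based on the memory storage model performs very well on complex multi-predicate, multi-join queries over star-schema data warehouses, needs no extra space compared with join index, and offers better compatibility and performance than the column storage data model.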
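
The core idea of this entry is that a sequential primary key doubles as the in-memory offset of the dimension record. The rough sketch below shows it with Python lists standing in for memory arrays. It assumes dense integer keys starting at 0. The table names, column names, and predicates are hypothetical, and the column-storage optimization is not modeled.

```python
def build_dimension_array(dim_rows, key):
    """Place each dimension row at the offset given by its primary key.

    Assumes the keys are exactly 0 .. len(dim_rows) - 1.
    """
    array = [None] * len(dim_rows)
    for row in dim_rows:
        array[row[key]] = row
    return array

def star_join(fact_rows, dims, predicates):
    """Join fact rows to dimension arrays by direct offset access.

    dims: {foreign_key_column: dimension_array}
    predicates: {foreign_key_column: function(dimension_row) -> bool}
    """
    results = []
    for fact in fact_rows:
        joined = dict(fact)
        for fk, array in dims.items():
            dim_row = array[fact[fk]]  # direct access: no hash probe, no index
            accept = predicates.get(fk)
            if accept is not None and not accept(dim_row):
                break
            joined.update({f"{fk}.{name}": v for name, v in dim_row.items()})
        else:
            results.append(joined)
    return results

# Hypothetical star schema with two dimension tables.
dates = build_dimension_array(
    [{"id": 1, "year": 2010}, {"id": 0, "year": 2009}], key="id")
products = build_dimension_array(
    [{"id": 0, "category": "book"}, {"id": 1, "category": "toy"}], key="id")
facts = [
    {"date_id": 1, "product_id": 0, "amount": 30},
    {"date_id": 0, "product_id": 1, "amount": 12},
    {"date_id": 1, "product_id": 1, "amount": 7},
]
results = star_join(
    facts,
    {"date_id": dates, "product_id": products},
    {"date_id": lambda d: d["year"] == 2010,
     "product_id": lambda p: p["category"] == "toy"},
)
```

Because each foreign key is used directly as an array offset, the join needs no auxiliary structure, which matches the abstract's claim of no extra space overhead compared with a join index.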

15.

The use of deleted tuples in database querying and updating

D. Laurent V. Phan Luong N. Spyratos 《Acta Informatica》1997,34(12):905-925

The traditional approach to database querying and updating treats insertions and deletions of tuples in an asymmetric manner: if a tuple is inserted then, intuitively, we think of it as being true and we use this knowledge in query and update processing; in contrast, if a tuple is deleted then we think of it as being false but we do not use this knowledge at all! In this paper, we present a new approach to database querying and updating in which insertions and deletions of tuples are treated in a symmetric manner. Contrary to the traditional approach, we use both inserted and deleted tuples in our derivation algorithms. Our approach works as follows: if the deletion of a tuple is requested, then we mark it as being deleted without removing it from the database; if the insertion of a tuple is requested, then we simply place it in the database and remove all its marked subtuples. Derivation of tuples is done using two derivation rules under one constraint: a tuple is derived only if it has no marked subtuples in the database. The derivation rules reflect relational projection and relational join. The main contribution of our work is to provide a method which allows insertion or deletion of a tuple over any relation scheme in a deterministic way. Received: 12 June 1995 / 19 February 1997

16.

A Lagrangian-based score for assessing the quality of pairwise constraints in semi-supervised clustering

Randel Rodrigo Aloise Daniel Blanchard Simon J. Hertz Alain 《Data mining and knowledge discovery》2021,35(6):2341-2368

Clustering algorithms help identify homogeneous subgroups from data. In some cases, additional information about the relationship among some subsets of the data exists. When using a semi-supervised clustering algorithm, an expert may provide additional information to constrain the solution based on that knowledge and, in doing so, guide the algorithm to a more useful and meaningful solution. Such additional information often takes the form of a cannot-link constraint (i.e., two data points cannot be part of the same cluster) or a must-link constraint (i.e., two data points must be part of the same cluster). A key challenge for users of such constraints in semi-supervised learning algorithms, however, is that the addition of inaccurate or conflicting constraints can decrease accuracy and little is known about how to detect whether expert-imposed constraints are likely incorrect. In the present work, we introduce a method to score each must-link and cannot-link pairwise constraint as likely incorrect. Using synthetic experimental examples and real data, we show that the resulting impact score can successfully identify individual constraints that should be removed or revised.


17.

On the Expressiveness of Probabilistic and Prioritized Data-retrieval in Linda

Mario Bravetti Roberto Gorrieri Roberto Lucchi Gianluigi Zavattaro 《Electronic Notes in Theoretical Computer Science》2005,128(5):39

The Linda tuple-space coordination model does not allow preferences among tuples to be expressed. In many applications, we may want to indicate tuples that should be returned more frequently than others, or tuples of low relevance that should be considered only if no tuple of higher importance is available. We present an extension of the tuple-space model with quantitative information that makes such forms of preference expressible. More precisely, we consider tuples decorated with a quantitative label. Such labels are considered under two different semantics, one modeling a probabilistic distribution of data retrieval and the other modeling priorities of tuples. Finally, we report the results concerning the expressiveness gap between the standard model and the proposed extensions. We show that by adding probabilities the leader election problem can be solved. More surprisingly, the addition of priorities makes the model Turing complete, while we prove that this is not the case for the other two calculi.

18.

Continuous monitoring of skylines over uncertain data streams

Xiaofeng Ding Xiang Lian Lei Chen Hai Jin 《Information Sciences》2012,184(1):196-214

Uncertain data are inevitable in many applications due to various factors such as the limitations of measuring equipment and delays in data updates. Although modeling and querying uncertain data have recently attracted considerable attention from the database community, there are still many critical issues to be resolved with respect to conducting advanced analysis on uncertain data. In this paper, we study the execution of the probabilistic skyline query over uncertain data streams. We propose a novel sliding window skyline model where an uncertain tuple has some probability of being in the skyline at a certain timestamp t. Formally, a Wp-Skyline(p, t) contains all the tuples whose probabilities of becoming skylines are at least p at timestamp t. However, in the stream environment, computing a probabilistic skyline on a large number of uncertain tuples within the sliding window is a daunting task in practice. In order to calculate Wp-Skyline efficiently, we propose the candidate list approach, which maintains lists of candidates that might become skylines in future sliding windows. We also propose algorithms that continuously monitor the newly incoming and expired data to maintain the skyline candidate set incrementally. To further reduce the computation cost of deciding whether or not a candidate tuple belongs to the skyline, we propose an enhanced refinement strategy that is based on a multi-dimensional indexing structure combined with a grouping-and-conquer strategy. To validate the effectiveness of our proposed approach, we conduct extensive experiments on both real and synthetic data sets and make comparisons with basic techniques.
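
The sketch below illustrates a probability-thresholded skyline over one window. It assumes a simple uncertainty model that may differ from the paper's: each tuple exists independently with a given probability, smaller values are better in every dimension, and a tuple is in the skyline when it exists and no existing tuple dominates it. It recomputes from scratch and does not implement the candidate list approach. All names and data are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def skyline_probability(target, others):
    """P(target is a skyline point) under independent existence probabilities."""
    point, prob = target
    result = prob
    for other_point, other_prob in others:
        if dominates(other_point, point):
            result *= 1.0 - other_prob  # the dominating tuple must be absent
    return result

def thresholded_skyline(window, p):
    """Tuples in the window whose skyline probability is at least p."""
    return [
        t for i, t in enumerate(window)
        if skyline_probability(t, window[:i] + window[i + 1:]) >= p
    ]

# Hypothetical window of (point, existence probability) pairs.
window = [((1, 5), 0.5), ((2, 2), 0.75), ((3, 3), 0.5), ((4, 1), 1.0)]
result = thresholded_skyline(window, p=0.5)
```

The point (3, 3) is dominated by (2, 2), which exists with probability 0.75, so its skyline probability is 0.5 × 0.25 = 0.125 and it falls below the threshold.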

19.

Programming mobile context-aware applications with TOTAM

《Journal of Systems and Software》2014

In tuple space approaches to context-aware mobile systems, the notion of context is defined by the presence or absence of certain tuples in the tuple space. Existing approaches define such presence either by collocation of devices holding the tuples or by replication of tuples across all devices. We show that both approaches can lead to an erroneous perception of context. Collocation ties the perception of context to network connectivity, which does not always yield the expected result. Tuple replication can cause a context to be perceived even though the device left that context long ago. We propose a tuple space approach in which tuples themselves carry a predicate that determines whether they are in the right context or not. We present a practical API for our approach and show its use by means of the implementation of various mobile applications. Benchmarks show that our approach can lead to a significant increase in performance compared to other approaches.

20.

A Timestamp-Based Simple Tabular Reduction Algorithm

杨明奇 李占山 张家晨 《软件学报》2019,30(11):3355-3363

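A table constraint is an extensional form of knowledge representation: each constraint enumerates all supported or forbidden tuples over its set of variables. Generalized arc consistency (GAC) is the most widely used consistency for solving constraint satisfaction problems. Simple Tabular Reduction (STR) is a family of efficient algorithms for maintaining GAC. During backtracking search, STR dynamically removes invalid tuples, which lowers the cost of seeking supports, and it has a unit-time backtracking cost. STR is widely used on high-arity table constraints, and many improved algorithms based on it have been proposed, among which compressed representations of the tuple set are currently the most studied. Building on the same idea of dynamically maintaining the valid part of the tuple set, this paper proposes an algorithm for STR that detects and removes invalid tuples and updates supports for variables; it works on the original table constraints and has a unit-time backtracking cost. Experimental results show that, on table constraints, the algorithm generally maintains GAC more efficiently than existing STR algorithms that are not based on compressed representations, and on some instances it is more efficient than the latest STR algorithms based on compressed tuple-set representations.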
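
The basic STR idea summarized above can be sketched for a single positive table constraint. This shows plain STR only, not the timestamp-based variant the paper proposes. Valid tuples are kept in the prefix `table[:limit]`, and invalid tuples are swapped behind the limit, so undoing a pass on backtrack only requires restoring the saved `limit` (and the saved domains). The function name and the example constraint are hypothetical.

```python
def str_pass(scope, table, limit, domains):
    """One Simple Tabular Reduction pass over a positive table constraint.

    scope: tuple of variable names; table: list of allowed tuples;
    limit: number of currently valid tuples (stored in table[:limit]);
    domains: {variable: set of values}, pruned in place.
    Returns the new limit.
    """
    supported = {var: set() for var in scope}
    i = 0
    while i < limit:
        tup = table[i]
        if all(value in domains[var] for var, value in zip(scope, tup)):
            for var, value in zip(scope, tup):
                supported[var].add(value)
            i += 1
        else:
            # Swap the invalid tuple behind the limit instead of deleting it.
            limit -= 1
            table[i], table[limit] = table[limit], table[i]
    for var in scope:
        domains[var] &= supported[var]  # drop values without a supporting tuple
    return limit

# Hypothetical ternary constraint; the search has just assigned y = 2.
scope = ("x", "y", "z")
table = [(1, 1, 2), (1, 2, 3), (2, 2, 1), (3, 1, 1)]
domains = {"x": {1, 2, 3}, "y": {2}, "z": {1, 2, 3}}
saved_limit = len(table)
new_limit = str_pass(scope, table, saved_limit, domains)
```

After the pass, two tuples remain valid, and the values x = 3 and z = 2 are pruned because no remaining tuple supports them. Resetting the limit to `saved_limit` makes all four tuples current again without copying the table.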