首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Parallel execution of hash joins in parallel databases   总被引:1,自引:0,他引:1  
We explore two important issues, processor allocation and the use of hash filters, to improve the parallel execution of hash joins. To exploit the opportunity of pipelining for hash join execution, a scheme to transform a bushy execution tree to an allocation tree is first devised. In an allocation tree, each node denotes a pipeline. Then, using the concept of synchronous execution time, processors are allocated to the nodes in the allocation tree in such a way that inner relations in a pipeline can be made available at approximately the same time. Also, the approach of hash filtering is investigated to further improve the parallel execution of hash joins. Extensive performance studies are conducted via simulation to demonstrate the importance of processor allocation and to evaluate various schemes using hash filters. It is experimentally shown that processor allocation is, in general, the dominant factor of performance, and the effect of hash filtering becomes more prominent as the number of relations in a query increases  相似文献   

2.
The problem of efficiently finding similar items in a large corpus of high-dimensional data points arises in many real-world tasks, such as music, image, and video retrieval. Beyond the scaling difficulties that arise with lookups in large data sets, the complexity in these domains is exacerbated by an imprecise definition of similarity. In this paper, we describe a method to learn a similarity function from only weakly labeled positive examples. Once learned, this similarity function is used as the basis of a hash function to severely constrain the number of points considered for each lookup. Tested on a large real-world audio dataset, only a tiny fraction of the points (~0.27%) are ever considered for each lookup. To increase efficiency, no comparisons in the original high-dimensional space of points are required. The performance far surpasses, in terms of both efficiency and accuracy, a state-of-the-art Locality-Sensitive-Hashing-based (LSH) technique for the same problem and data set.  相似文献   

3.
Existing methods for spatial joins require pre-existing spatial indices or other precomputation, but such approaches are inefficient and limited in generality. Operand data sets of spatial joins may not all have precomputed indices, particularly when they are dynamically generated by other selection or join operations. Also, existing spatial indices are mostly designed for spatial selections, and are not always efficient for joins. This paper explores the design and implementation of seeded trees, which are effective for spatial joins and efficient to construct at join time. Seeded trees are R-tree-like structures, but divided into seed levels and grown levels. This structure facilitates using information regarding the join to accelerate the join process, and allows efficient buffer management. In addition to the basic structure and behavior of seeded trees we present techniques for efficient seeded tree construction, a new buffer management strategy to lower I/O costs, and theoretical analysis for choosing algorithmic parameters. We also present methods for reducing space requirements and improving the stability of seeded tree performance with no additional I/O costs. Our performance studies show that the seeded tree method outperforms other tree-based methods by far both in terms of the number disk pages accessed and weighted I/O costs. Further, its performance gain is stable across different input data, and its incurred CPU penalties are also lower  相似文献   

4.
In this paper, we propose the approach of using multiple hash tables for lock requests with different data access patterns to minimize the number of false contentions in a data sharing environment. We first derive some theoretical results on using multiple hash tables. Then, in light of these derivations, a two-step procedure to design multiple hash tables is developed. In the first step, data items are partitioned into a given number of groups. Each group of data items is associated with the use of a hash table in such a way that lock requests to data items in the same group will be hashed into the same hash table. In the second step, given an aggregate hash table size, the hash table size for each individual data group is optimally determined so as to minimize the number of false contentions. Some design examples and remarks on the proposed method are given. It is observed from real database systems that different data sets usually have their distinct data access patterns, thus resulting in an environment where this approach can offer significant performance improvement  相似文献   

5.
Network filtering is a challenging area in high-speed computer networks, mostly because lots of filtering rules are required and there is only a limited time available for matching these rules. Therefore, network filters accelerated by field-programmable gate arrays (FPGAs) are becoming common where the fast lookup of filtering rules is achieved by the use of hash tables. It is desirable to be able to fill-up these tables efficiently, i.e. to achieve a high table-load factor in order to reduce the offline time of the network filter due to rehashing and/or table replacement. A parallel reconfigurable hash function tuned by an evolutionary algorithm (EA) is proposed in this paper for Internet Protocol (IP) address filtering in FPGAs. The EA fine-tunes the reconfigurable hash function for a given set of IP addresses. The experiments demonstrate that the proposed hash function provides high-speed lookup and achieves a higher table-load factor in comparison with conventional solutions.  相似文献   

6.
Universal hash functions for an infinite universe and hash trees   总被引:1,自引:0,他引:1  
In this note we describe the adaptation of the universal hashing idea to an infinite universe, and to prefix hash trees. These are a structure underlying all extendible hash methods, which have up to now only been studied under the uniform hashing assumption.  相似文献   

7.
In this paper, we observe that in the seminal work on indifferentiability analysis of iterated hash functions by Coron et al. and in subsequent works, the initial value $(IV)$ of hash functions is fixed. In addition, these indifferentiability results do not depend on the Merkle–Damg?rd (MD) strengthening in the padding functionality of the hash functions. We propose a generic $n$ -bit-iterated hash function framework based on an $n$ -bit compression function called suffix-free-prefix-free (SFPF) that works for arbitrary $IV$ s and does not possess MD strengthening. We formally prove that SFPF is indifferentiable from a random oracle (RO) when the compression function is viewed as a fixed input-length random oracle (FIL-RO). We show that some hash function constructions proposed in the literature fit in the SFPF framework while others that do not fit in this framework are not indifferentiable from a RO. We also show that the SFPF hash function framework with the provision of MD strengthening generalizes any $n$ -bit-iterated hash function based on an $n$ -bit compression function and with an $n$ -bit chaining value that is proven indifferentiable from a RO.  相似文献   

8.
We propose a new algorithm, called Stripe-join, for performing a join given a join index. Stripe-join is inspired by an algorithm called ‘Jive-join’ developed by Li and Ross. Stripe-join makes a single sequential pass through each input relation, in addition to one pass through the join index and two passes through a set of temporary files that contain tuple identifiers but no input tuples. Stripe-join performs this efficiently even when the input relations are much larger than main memory, as long as the number of blocks in main memory is of the order of the square root of the number of blocks in the participating relations. Stripe-join is particularly efficient for self-joins. To our knowledge, Stripe-join is the first algorithm that, given a join index and a relation significantly larger than main memory, can perform a self-join with just a single pass over the input relation and without storing input tuples in intermediate files. Almost all the I/O is sequential, thus minimizing the impact of seek and rotational latency. The algorithm is resistant to data skew. It can also join multiple relations while still making only a single pass over each input relation. Using a detailed cost model, Stripe-join is analyzed and compared with competing algorithms. For large input relations, Stripe-join performs significantly better than Valduriez's algorithm and hash join algorithms. We demonstrate circumstances under which Stripe-join performs significantly better than Jive-join. Unlike Jive-join, Stripe-join makes no assumptions about the order of the join index.  相似文献   

9.
TYPHOON is a capability-aware peer-to-peer (P2P) system. It exploits the heterogeneity of nodes in the system based on the concept of virtual homes. Nodes participating in the system are classified as good and inactive. TYPHOON uses resources provided by good peers. It is thus more reliable and agile than a naive structured P2P system. When a good peer is overloaded, it picks a suitable inactive node and migrates some loads (i.e., virtual homes) to that node. However, migration of virtual homes may cause instability in the system. TYPHOON thus incorporates a mechanism for tracking virtual homes. A migrated home can receive states of relevant homes using an adaptive, logical tree structure that can also react to system heterogeneity, node loading and network locality. A migrated home can also proactively discover the state of an interested home to ensure the correctness of lookup. We evaluate TYPHOON using theoretical and simulation analysis. We also benchmark TYPHOON using a prototype system on 34 desktop PCs. The results all confirm the effectiveness of TYPHOON.  相似文献   

10.
In this paper, we present a preimage attack on reduced versions of Keccak hash functions. We use our recently developed toolkit CryptLogVer for generating the conjunctive normal form, CNF, which is passed to the SAT solver PrecoSAT. We found preimages for some reduced versions of the function and showed that full Keccak function has a comfortable security margin against this kind of attack.  相似文献   

11.
非易失性内存(NVM)因其大容量、持久化、按位存取和读延迟低等特性而受到人们的关注,但它同时也具有写次数有限、读写速度不均衡等缺点.针对传统线性哈希索引直接在NVM上实现时会导致大量的随机写操作这一问题,提出了一种新的NVM友好的线性哈希索引NVM-LH.NVM-LH通过存储数据时的缓存行对齐实现了缓存友好性,同时提出...  相似文献   

12.
Current data repositories include a variety of data types, including audio, images, and time series. State-of-the-art techniques for indexing such data and doing query processing rely on a transformation of data elements into points in a multidimensional feature space. Indexing and query processing then take place in the feature space. We study algorithms for finding relationships among points in multidimensional feature spaces, specifically algorithms for multidimensional joins. Like joins of conventional relations, correlations between multidimensional feature spaces can offer valuable information about the data sets involved. We present several algorithmic paradigms for solving the multidimensional join problem and we discuss their features and limitations. We propose a generalization of the size separation spatial join algorithm, named multidimensional spatial join (MSJ), to solve the multidimensional join problem. We evaluate MSJ along with several other specific algorithms, comparing their performance for various dimensionalities on both real and synthetic multidimensional data sets. Our experimental results indicate that MSJ, which is based on space filling curves, consistently yields good performance across a wide range of dimensionalities  相似文献   

13.
This paper gives a survey of MAC algorithms and hash functions. It describes the main properties, summarizes the most important constructions and reports on their security.  相似文献   

14.
提出了一种基于可并行和变参数的混沌分段线性映射hash函数算法。该函数通过明文扩展将并行处理的明文消息矩阵元素信息关联起来,实现了并行性。由矩阵元素位置标号决定的可变参数和矩阵元素相应的ASCII码值分别作为混沌分段线性映射的输入参数和迭代次数来生成相应明文的中间hash值。最终的128 bit的hash值由中间hash值的异或而得到。计算机模拟表明,本算法具有较好的单向性、混乱、扩散性以及抗碰撞性,满足单向hash 函数的各项性能要求。  相似文献   

15.
为提高跨多个区域大数据存储效率,提出一种基于布隆(Bloom)滤波器(BF)的海量数据存储空间部署策略.采用模糊交叉方法(FFBF),使用模糊交叉操作合并压缩两个Bloom滤波器,实现散列数据在两个Bloom滤波器的共享容纳,减少海量数据存储需求;利用双哈希计算k个哈希函数降低计算成本.实验结果表明,所提算法的误报受压...  相似文献   

16.
Many Geographic Information Systems (GIS) handle a large volume of geospatial data. Spatial joins over two or more geospatial datasets are very common operations in GIS for data analysis and decision support. However, evaluating spatial joins can be very time intensive due to the size of datasets. In this paper, we propose an interactive framework that provides faster approximate answers of spatial joins. The proposed framework utilizes two statistical methods: probabilistic join and sampling based join. The probabilistic join method provides speedup of two orders of magnitude with no correctness guarantee, while the sampling based method provides an order of magnitude improvement over the full indexing tree joins of datasets and also provides running confidence intervals. The framework allows users to trade-off speed versus bounded accuracy, hence it provides truly interactive data exploration. The two methods are evaluated empirically with real and synthetic datasets.  相似文献   

17.
Computer-aided software engineering (CASE) tools are defined, and ten CASE tools are briefly overviewed. Individual presentations on the various tools follow. The focus is on structured analysis, design, and programming. Two of the tools (Cradle and JSP Workbench) are implementations of the Yourdon and Jackson methods for structured analysis, structured design, and programming; three tools (Time Bench, Card Tools, and Prosa) are for real-time systems development; four (Excelerator, Adagen, Smart System, and Software Through Pictures) are general-purpose front-end CASE tools and one (Virtual Software Factory) is a CASE tool for building CASE tools  相似文献   

18.
State-of-the-art molecular dynamics (MD) simulations generate massive datasets involving billion-vertex chemical bond networks, which makes data mining based on graph algorithms such as K-ring analysis a challenge. This paper proposes an algorithm to improve the efficiency of ring analysis of large graphs, exploiting properties of K-rings and spatial correlations of vertices in the graph. The algorithm uses dual-tree expansion (DTE) and spatial hash-function tagging (SHAFT) to optimize computation and memory access. Numerical tests show nearly perfect linear scaling of the algorithm. Also a parallel implementation of the DTE + SHAFT algorithm achieves high scalability. The algorithm has been successfully employed to analyze large MD simulations involving up to 500 million atoms.  相似文献   

19.
This paper proposes a category- and selection-enabled nearest neighbor join (NNJ) between relation r and relation s, with similarity on T and support for category attributes C and selection predicate θ. Our solution does not suffer from redundant fetches and index false hits, which are the main performance bottlenecks of current nearest neighbor join techniques.A category-enabled NNJ leverages the category attributes C for query evaluation. For example, the categories of relation r can be used to limit relation s accessed at most once. Solutions that are not category-enabled must process each category independently and end up fetching, either from disk or memory, the blocks of the input relations multiple times. A selection-enabled NNJ performs well independent of whether the DBMS optimizer pushes the selection down or evaluates it on the fly. In contrast, index-based solutions suffer from many index false hits or end up in an expensive nested loop.Our solution does not constrain the physical design, and is efficient for row- as well as column-stores. Current solutions for column-stores use late materialization, which is only efficient if the data is clustered on the category attributes C. Our evaluation algorithm finds, for each outer tuple r, the inner tuples that satisfy the equality on the category and have the smallest distance to r with only one scan of both inputs. We experimentally evaluate our solution using a data warehouse that manages analyses of animal feeds.  相似文献   

20.
Design and implementation of hardware efficient stream ciphers using hash functions and analysis of their periodicity and security are presented in this paper. The hash generation circuits used for the design and development of stream ciphers are low power, low hardware complexity Linear Feedback Shift Register (LFSR) based circuits. One stream cipher design uses LFSR based Toeplitz hash generation circuit together with LFSR keystream generator circuit, while the other design combines LFSR based filter generator circuit with LFSR based polynomial modular division circuit. Both designs possess good security and periodicity properties for the keystreams generated. The developed circuits can compete with the most popular classic LFSR based stream ciphers in hardware complexity at the same time providing additional advantage that the same circuit can be used for hash generation.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号