
1.

Efficient index-based KNN join processing for high-dimensional data

《Information and Software Technology》2007,49(4):332-344

In many advanced database applications (e.g., multimedia databases), data objects are transformed into high-dimensional points and manipulated in high-dimensional space. One of the most important but costly operations is the similarity join that combines similar points from multiple datasets. In this paper, we examine the problem of processing K-nearest neighbor similarity join (KNN join). KNN join between two datasets, R and S, returns for each point in R its K most similar points in S. We propose a new index-based KNN join approach using the iDistance as the underlying index structure. We first present its basic algorithm and then propose two different enhancements. In the first enhancement, we optimize the original KNN join algorithm by using approximation bounding cubes. In the second enhancement, we exploit the reduced dimensions of data space. We conducted an extensive experimental study using both synthetic and real datasets, and the results verify the performance advantage of our schemes over existing KNN join algorithms.
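The KNN join semantics described in this abstract can be illustrated with a naive nested-loop baseline. This is only a sketch of the operation's definition, not the iDistance-based method the paper proposes; the function name `knn_join`, the use of Euclidean distance, and the sample points are assumptions made for illustration.

```python
import heapq
import math


def knn_join(R, S, k):
    """Brute-force KNN join (hypothetical helper): for each point in R,
    return its k nearest points in S under Euclidean distance."""
    result = {}
    for i, r in enumerate(R):
        # nsmallest scans all of S and keeps the k points closest to r
        result[i] = heapq.nsmallest(k, S, key=lambda s: math.dist(r, s))
    return result


# Illustrative data only
R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (4.0, 5.0), (9.0, 9.0)]
print(knn_join(R, S, k=2))
```

The baseline compares every point of R with every point of S, which is the cost an index-based approach aims to avoid.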

2.

Efficient algorithms for parallel sorting on mesh multicomputers

V. Singh V. Kumar G. Agha C. Tomlinson 《International journal of parallel programming》1991,20(2):95-131

We present two new parallel algorithms QSP1 and QSP2 based on sequential quicksort for sorting data on a mesh multicomputer, and analyze their scalability using the isoefficiency metric. We show that QSP2 matches the lower bound on the isoefficiency function for mesh multicomputers, while QSP1 is fairly close to optimal. Lang et al. ⁽¹⁾ and Schnorr et al. ⁽²⁾ have developed parallel sorting algorithms for the mesh architecture that have either optimal (Schnorr) or close to optimal (Lang) run-time complexity for the one-element-per-processor case. Both QSP1 and QSP2 have better scalability than the scaled-down variants of these algorithms (for the case in which there are more elements than processors). We also analyze a different variant of Lang's sort which is as scalable as QSP2. We briefly discuss another metric called resource consumption. According to this metric, both QSP1 and QSP2 are superior to variants of Lang's sort.

3.

Hypercube algorithms on mesh connected multicomputers

de Cerio L.D. Valero-Garcia M. Gonzalez A. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(12):1247-1260

A new methodology named CALMANT (CC-cube Algorithms on Meshes and Tori) is proposed for mapping a type of algorithm that we call CC-cube algorithm onto multicomputers with hypercube, mesh, or torus interconnection topology. This methodology is suitable when the initial problem can be expressed as a set of processes that communicate through a hypercube topology (a CC-cube algorithm). There are many important algorithms that fit into the CC-cube type. CALMANT is based on three different techniques: (a) the standard embedding to assign the processes of the algorithm to the nodes of the mesh multicomputer; (b) the communication pipelining technique to increase the level of communication parallelism inherent in the CC-cube algorithms; and (c) optimal message-scheduling algorithms proposed in this work to avoid conflicts and thereby minimize the communication time. Although CALMANT is proposed for multicomputers with different interconnection network topologies, the paper focuses only on the particular case of meshes.

4.

Efficient scheduling of page access in index-based join processing

Chee Yong Chan Beng Chin Ooi 《Knowledge and Data Engineering, IEEE Transactions on》1997,9(6):1005-1011

The paper examines the issue of scheduling page accesses in join processing, and proposes new heuristics for the following scheduling problems: 1) an optimal page access sequence for a join such that there are no page reaccesses using the minimum number of buffer pages, and 2) an optimal page access sequence for a join such that the number of page reaccesses for a given number of buffer pages is minimum. The experimental performance results show that the new heuristics perform better than existing heuristics for the first problem and also perform better for the second problem, provided that the number of available buffer pages is not much less than the optimal buffer size.

5.

Parallel image understanding algorithms on MIMD multicomputers

A. Petrosino E. Tarantino 《Computing》1998,60(2):91-107

The heterogeneous nature of the data types and computational structures involved in Computer Vision algorithms makes the design and implementation of massively parallel image processing systems a problem that is not yet fully solved. It is commonly believed that in the near future MIMD architectures, with their high degree of flexibility, will play a very important role in this research area by using a limited number of identical but powerful processing elements. The aim of this paper is to show how a selected list of algorithms into which a single Image Understanding process can be decomposed could map onto a distributed-memory MIMD architecture. The operating modes we adopt are the SPMD mode for low-level processing and the MIMD mode for the intermediate and high levels of processing. Both efficient parallel formulations of the algorithms with respect to the interconnection topology of the processors and their optimized implementations on a target transputer-based architecture are reported.

6.

Comments on `Hash-based and index-based join algorithms for cube and ring connected multicomputers' by E.R. Omiecinski and E.T. Lin

Tien J.-Y. Yang W.-P. 《Knowledge and Data Engineering, IEEE Transactions on》1991,3(3):387-389

The cost distributions of both the parallel hybrid-hash join and the parallel join-index join algorithms proposed in the above-named work (ibid., vol.1, p.329-43, Sept. 1989) are presented in more detail. The result shows that almost the entire relation may need to be retrieved from disk, even though the join selectivity is low. A table of semi-join selectivities and cube sizes is given to show the condition under which the join-index method performs better than the hybrid-hash method, i.e., a very low selectivity for the join-index method. An error in one of the cost formulas is corrected, and a more efficient method for the final join in the join-index method is proposed.

7.

Late join algorithms for distributed interactive applications

Jürgen Vogel Martin Mauve Volker Hilt Wolfgang Effelsberg 《Multimedia Systems》2003,9(4):327-336

Distributed interactive applications such as shared whiteboards and multiplayer games often support dynamic groups where users may join and leave at any time. A participant joining an ongoing session has missed the data that have previously been exchanged by the other session members. It is therefore necessary to initialize the application instance of the latecomer with the current state. In this paper, we propose a late join algorithm for distributed interactive applications that provides such an initialization of applications. The algorithm is scalable and robust and can be easily adapted to the needs of different applications by means of late join policies. The behavior of the late join algorithm and the impact of design alternatives are investigated in detail by means of an extensive simulation study. This study also shows that an improper handling of the late join problem can cause very high application and network load.

8.

Parallel hash-based join algorithms for a shared-everything environment

Martin T.P. Larson P.-A. Deshpande V. 《Knowledge and Data Engineering, IEEE Transactions on》1994,6(5):750-763

Analyzes the costs, and describes the implementation, of three hash-based join algorithms for a general purpose shared-memory multiprocessor. The three algorithms considered are the hashed loops, GRACE and hybrid algorithms. We also describe the results of a set of experiments that validate the cost models presented and demonstrate the relative performance of the three algorithms.
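For readers unfamiliar with the GRACE algorithm named in this abstract, its two phases (partition both inputs with the same hash function, then join matching partitions) can be sketched as follows. This is a sequential, in-memory illustration under stated assumptions, not the shared-memory multiprocessor implementation the paper analyzes: in-memory lists stand in for the partition files a real system would write to disk, and `grace_hash_join`, `key_r`, `key_s`, and `num_partitions` are hypothetical names.

```python
from collections import defaultdict


def grace_hash_join(R, S, key_r, key_s, num_partitions=4):
    """Sketch of a GRACE-style equi-join on key_r(r) == key_s(s)."""
    # Phase 1: split both inputs into partitions using the same hash function,
    # so matching tuples always land in partitions with the same number.
    parts_r, parts_s = defaultdict(list), defaultdict(list)
    for r in R:
        parts_r[hash(key_r(r)) % num_partitions].append(r)
    for s in S:
        parts_s[hash(key_s(s)) % num_partitions].append(s)

    # Phase 2: join each pair of matching partitions with a build/probe hash table.
    out = []
    for p, bucket_r in parts_r.items():
        table = defaultdict(list)
        for r in bucket_r:                      # build side
            table[key_r(r)].append(r)
        for s in parts_s.get(p, []):            # probe side
            for r in table.get(key_s(s), []):
                out.append((r, s))
    return out


# Illustrative data only
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(1, "x"), (3, "y"), (3, "z"), (4, "w")]
print(sorted(grace_hash_join(R, S, key_r=lambda r: r[0], key_s=lambda s: s[0])))
```

Because only partitions with the same number are ever joined, each pair can be processed independently, which is what makes the scheme attractive for parallel execution.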

9.

Local algorithms for circuits in a Boolean cube

V. M. Neguritsa 《Cybernetics and Systems Analysis》1993,29(6):837-841

Translated from Kibernetika i Sistemnyi Analiz, No. 6, pp. 65–70, November–December, 1993.

10.

GPU-based efficient join algorithms on Hadoop

Wang Hongzhi Li Ning Wang Zheng Li Jianing 《The Journal of supercomputing》2021,77(1):292-321

The growing data have brought tremendous pressure for query processing and storage, so there are many studies that focus on using GPU to accelerate join operation,...

11.

Hash-based labeling techniques for storage scaling

Shu-Yuen D. Yao Cyrus Shahabi Per-Åke Larson 《The VLDB Journal The International Journal on Very Large Data Bases》2005,14(2):222-237

Scalable storage architectures allow for the addition or removal of storage devices to increase storage capacity and bandwidth or retire older devices. Assuming random placement of data objects across multiple storage devices of a storage pool, our optimization objective is to redistribute a minimum number of objects after scaling the pool. In addition, a uniform distribution, and hence a balanced load, should be ensured after redistribution. Moreover, the redistributed objects should be retrieved efficiently during the normal mode of operation: in one I/O access and with low complexity computation. To achieve this, we propose an algorithm called random disk labeling (RDL), based on double hashing, where storage can be added or removed without any increase in complexity. We compare RDL with other proposed techniques and demonstrate its effectiveness through experimentation. Received: 23 June 2003, Accepted: 16 February 2004, Published online: 23 June 2004. Edited by: G. Alonso. This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC), IIS-0082826 (ITR), IIS-0238560 (CAREER), IIS-0324955 (ITR), and IIS-0307908 and unrestricted cash gifts from Okawa Foundation and Microsoft. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

12.

New plane-sweep algorithms for distance-based join queries in spatial databases

George Roumelis Antonio Corral Michael Vassilakopoulos Yannis Manolopoulos 《GeoInformatica》2016,20(4):571-628

Efficient and effective processing of the distance-based join query (DJQ) is of great importance in spatial databases due to the wide range of applications that may issue such queries (mapping, urban planning, transportation planning, resource management, etc.). The most representative and studied DJQs are the K Closest Pairs Query (KCPQ) and the εDistance Join Query (εDJQ). These spatial queries involve two spatial data sets and a distance function to measure the degree of closeness, along with a given number of pairs in the final result (K) or a distance threshold (ε). In this paper, we propose four new plane-sweep-based algorithms for KCPQs and their extensions for εDJQs in the context of spatial databases, without the use of an index for either of the two disk-resident data sets (since building and using indexes is not always in favor of processing performance). They employ a combination of plane-sweep algorithms and space partitioning techniques to join the data sets. Finally, we present the results of an extensive experimental study that compares the efficiency and effectiveness of the proposed algorithms for KCPQs and εDJQs. This performance study, conducted on medium and big spatial data sets (real and synthetic), validates that the proposed plane-sweep-based algorithms are very promising in terms of both efficiency and effectiveness measures when neither input is indexed. Moreover, the best of the new algorithms is experimentally compared to the best algorithm that is based on the R-tree (a widely accepted access method), for KCPQs and εDJQs, using the same data sets. This comparison shows that the new algorithms outperform R-tree based algorithms in most cases.
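The two query types defined in this abstract can be made concrete with a brute-force sketch that enumerates all pairs. It illustrates only what KCPQ and εDJQ return, not the plane-sweep algorithms the paper proposes. The function names, the Euclidean distance function, and the choice of a non-strict threshold (distance ≤ ε) are assumptions for illustration.

```python
import heapq
import math
from itertools import product


def k_closest_pairs(P, Q, k):
    """KCPQ by exhaustive search: the k pairs (p, q) with the smallest distance."""
    return heapq.nsmallest(k, product(P, Q),
                           key=lambda pq: math.dist(pq[0], pq[1]))


def eps_distance_join(P, Q, eps):
    """εDJQ by exhaustive search: all pairs (p, q) with distance at most eps
    (the non-strict comparison is an assumption)."""
    return [(p, q) for p, q in product(P, Q) if math.dist(p, q) <= eps]


# Illustrative data only
P = [(0, 0), (10, 10)]
Q = [(1, 0), (10, 12), (50, 50)]
print(k_closest_pairs(P, Q, k=2))
print(eps_distance_join(P, Q, eps=2.0))
```

Both functions examine all |P| × |Q| pairs, which is the work that plane-sweep and space partitioning techniques are designed to prune.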

13.

Coarse-grain parallelizations of interval algorithms decomposing dense matrices and solving triangular systems on multicomputers

Dr. R. Reith Prof. Dr. C. P. Ullrich 《Computing》1994,53(3-4):243-257

The evaluation of scalar products with maximum accuracy plays an important role in computing inclusions for the solutions of linear systems. In this paper, we discuss this operation within the context of parallel algorithms for distributed-memory systems (multicomputers). We describe new variants for solving triangular systems of linear equations and for computing the LU factorization of matrices under the assumption that scalar products are implemented as single, indivisible operations and that no processor works on different scalar products simultaneously. All algorithms work in the real and interval case; the theoretical results are supplemented by measurements obtained from a transputer network.

14.

Handling data skew in join algorithms using MapReduce

《Expert systems with applications》2016

One of the major obstacles hindering effective join processing on MapReduce is data skew. Since MapReduce’s basic hash-based partitioning method cannot solve the problem properly, two alternatives have been proposed: range-based and randomized methods. However, some drawbacks remain: the range-based method does not handle join product skew, and the randomized method performs worse than the basic hash-based partitioning when input relations are not skewed. In this paper, we present a new skew handling method, called multi-dimensional range partitioning (MDRP). The proposed method overcomes the limitations of traditional algorithms in two ways: 1) the number of output records expected at each machine is considered, which leads to better handling of join product skew, and 2) a small number of input records are sampled before the actual join begins so that an efficient execution plan considering the degree of data skew can be created. As a result, in a scalar skew experiment, the proposed join algorithm is about 6.76 times faster than the range-based algorithm when join product skew exists and about 5.14 times faster than the randomized algorithm when input relations are not skewed. Moreover, through the worst-case analysis, we show that the input and the output imbalances are less than or equal to 2. The proposed algorithm does not require any modification to the original MapReduce environment and is applicable to complex join operations such as theta-joins and multi-way joins.
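To see why plain hash-based partitioning struggles with skew, the following sketch counts the input and output records each machine would handle when join keys are assigned by hash. It does not implement MDRP. The function names and the imbalance measure used here (maximum load divided by average load) are assumptions, since the abstract does not define how imbalance is computed.

```python
from collections import Counter


def partition_loads(r_keys, s_keys, num_machines):
    """Per-machine input and output record counts under hash partitioning."""
    r_cnt, s_cnt = Counter(r_keys), Counter(s_keys)
    inputs = [0] * num_machines
    outputs = [0] * num_machines
    for key in set(r_cnt) | set(s_cnt):
        m = hash(key) % num_machines               # basic hash-based partitioning
        inputs[m] += r_cnt[key] + s_cnt[key]       # records shipped to machine m
        outputs[m] += r_cnt[key] * s_cnt[key]      # join results produced for this key
    return inputs, outputs


def imbalance(loads):
    """Assumed measure: maximum load relative to the average load."""
    avg = sum(loads) / len(loads)
    return max(loads) / avg if avg else 0.0


# Illustrative data only: key 0 is heavily repeated in both relations
r_keys = [0] * 6 + [1, 2, 3]
s_keys = [0] * 5 + [1, 2, 3]
inputs, outputs = partition_loads(r_keys, s_keys, num_machines=4)
print(inputs, imbalance(inputs))
print(outputs, imbalance(outputs))
```

In this toy input the machine that receives the repeated key produces almost all of the join output, so the output imbalance is larger than the input imbalance. That gap is the join product skew the abstract refers to.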

15.

Efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers

Chun-Yuan Lin Yeh-Ching Chung Jen-Shiuh Liu 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(7):625-639

Array operations are useful in a large number of important scientific codes, such as molecular dynamics, finite element methods, climate modeling, atmosphere and ocean sciences, etc. In our previous work, we have proposed a scheme of extended Karnaugh map representation (EKMR) for multidimensional array representation. We have shown that sequential multidimensional array operation algorithms based on the EKMR scheme have better performance than those based on the traditional matrix representation (TMR) scheme. Since parallel multidimensional array operations have been an extensively investigated problem, we present efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers. In a data parallel programming paradigm, in general, we distribute array elements to processors based on various distribution schemes, do local computation in each processor, and collect computation results from each processor. Based on the row, column, and 2D mesh distribution schemes, we design data parallel algorithms for matrix-matrix addition and matrix-matrix multiplication array operations in both TMR and EKMR schemes for multidimensional arrays. We also design data parallel algorithms for six Fortran 90 array intrinsic functions: All, Maxval, Merge, Pack, Sum, and Cshift. We compare the time of the data distribution, the local computation, and the result collection phases of these array operations based on the TMR and the EKMR schemes. The experimental results show that algorithms based on the EKMR scheme outperform those based on the TMR scheme for all test cases.

16.

The join algorithms on a shared-memory multiprocessor database machine

Qadah G.Z. Irani K.B. 《IEEE transactions on pattern analysis and machine intelligence》1988,14(11):1668-1683

The authors develop and present a large set of parallel algorithms for implementing the join operation on a shared-memory multiprocessor database machine. The development of these algorithms follows a structured approach. The major steps involved in the processing of the join operation by the machine are first identified. Then, alternative join algorithms are constructed by combining the different ways of performing these steps. A study of the performance of the proposed algorithms is presented. This study shows, among other things, that for a given hardware configuration there is not just one overall best-performing join algorithm; rather, different algorithms achieve the best performance depending on the characteristics of the data participating in the join operation.

17.

Message-routing systems for transputer-based multicomputers

Talia D. 《Micro, IEEE》1993,13(3):62-72

The Tiny, CSN, Multiple Rings, Ordered Dimensions, and interval labeling routing systems for transputer networks are reviewed. The systems are compared with respect to several criteria, such as adaptivity, deadlock freedom, generality, livelock freedom, and network latency.

18.

An improved RFID security authentication protocol based on hash functions

Yin Jipei (殷继沛) Zhang Yingxin (张迎新) Zhao Lixin (赵利新) 《信息安全与技术》2013,(4):19-21,46

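To address the security and privacy problems in communication between readers and tags in RFID systems, and in view of the shortcomings of existing hash-function-based security authentication protocols, an improved security authentication protocol is proposed. Analysis shows that the protocol can effectively improve the security of RFID systems and offers high efficiency and low tag cost.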

19.

The Paradigm compiler for distributed-memory multicomputers

Banerjee P. Chandy J.A. Gupta M. Hodges E.W. IV. Holm J.G. Lain A. Palermo D.J. Ramaswamy S. Su E. 《Computer》1995,28(10):37-47

To harness the computational power of massively parallel distributed-memory multicomputers, users must write efficient software. This process is laborious because of the absence of global address space. The programmer must manually distribute computations and data across processors and explicitly manage communication. The Paradigm (PARAllelizing compiler for DIstributed-memory, General-purpose Multicomputers) project at the University of Illinois addresses this problem by developing automatic methods for the efficient parallelization of sequential programs. A unified approach efficiently supports regular and irregular computations using data and functional parallelism.

20.

Internal and external algorithms for the points-in-regions problem—the inside join of geo-relational algebra

Gabriele Blankenagel Ralf Hartmut Güting 《Algorithmica》1990,5(1-4):251-276

We consider the problem of collectively locating a set of points within a set of disjoint polygonal regions when preprocessing is allowed neither for the points nor for the regions. This problem arises in geometric database systems. More specifically, it is equivalent to computing the inside join of geo-relational algebra, a conceptual model for geo-data management. We describe efficient algorithms for solving this problem based on plane-sweep and divide-and-conquer, requiring O(n log n + t) and O(n log² n + t) time, respectively, and O(n) space, where n is the total number of points and edges, and t is the number of reported (point, region) pairs. Since the algorithms are meant to be practically useful, we consider, in addition to the internal versions (running completely in main memory), versions that run internally but use much less than linear space and versions that run externally, that is, require only a constant amount of internal memory regardless of the amount of data to be processed. Comparing plane-sweep and divide-and-conquer, it turns out that divide-and-conquer can be expected to perform much better in the external case even though it has a higher internal asymptotic worst-case complexity. An interesting theoretical by-product is a new general technique for handling arbitrarily large sets of objects clustered on a single x-coordinate within a planar divide-and-conquer algorithm and a proof that the resulting “unbalanced” dividing does not lead to a more than logarithmic height of the tree of recursive calls.
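The output of the inside join, a list of (point, region) pairs, can be illustrated with a naive sketch that tests every point against every region using the standard ray-casting point-in-polygon test. This is not the plane-sweep or divide-and-conquer algorithm the paper describes; it only shows what result they compute. The function names, the region labels, and the treatment of points lying exactly on a boundary are assumptions.

```python
def point_in_polygon(pt, poly):
    """Ray casting: count how many polygon edges a rightward ray from pt crosses.
    Points exactly on an edge are not handled specially in this sketch."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def inside_join(points, regions):
    """Naive inside join: every (point, region name) pair with the point inside."""
    return [(p, name) for p in points
            for name, poly in regions.items() if point_in_polygon(p, poly)]


# Illustrative data only: two disjoint regions with hypothetical labels
regions = {"A": [(0, 0), (4, 0), (4, 4), (0, 4)],
           "B": [(10, 0), (14, 0), (12, 5)]}
points = [(1, 1), (12, 1), (20, 20)]
print(inside_join(points, regions))
```

With p points and e polygon edges in total, this sketch performs on the order of p × e edge tests, in contrast to the near-linear bounds quoted in the abstract.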