共查询到20条相似文献,搜索用时 15 毫秒
1.
Lin E.T. Omiecinski E.R. Yalamanchili S. 《Knowledge and Data Engineering, IEEE Transactions on》1994,6(2):304-315
Optimizing large join queries that consist of many joins has been recognized as NP-hard. Most of the previous work focuses on a uniprocessor environment. In a multiprocessor, the location of each join adds another dimension to the complexity of the problem. In this paper, we examine the feasibility of exploiting the inherent parallelism in optimizing large join queries on a hypercube multiprocessor. This includes using the multiprocessor not only to answer the large join query but also to optimize it. We propose an algorithm to estimate the cost of a parallel large join plan. Three heuristics are provided for generating an initial solution, which is further optimized by an iterative local-improvement method. The entire process of parallel query optimization and execution is simulated on an Intel iPSC/2 hypercube machine. Our experimental results show that the performance of each heuristic depends on the characteristics of the query 相似文献
2.
Adaptive mesh refinement (AMR) is an increasingly important simulation methodology for many science and engineering problems.
AMR has the potential to generate highly resolved simulations efficiently by dynamically refining the computational mesh near
key numerical solution features. AMR requires more complex numerical algorithms and programming than uniform fixed mesh approaches.
Software libraries that provide general AMR functionality can ease these burdens significantly. A major challenge for library
developers is to achieve adequate flexibility to meet diverse and evolving application requirements. In this paper, we describe
the design of software abstractions for general AMR data management and parallel communication operations in SAMRAI, an object-oriented
C++
structured AMR (SAMR) library developed at Lawrence Livermore National Laboratory (LLNL). The SAMRAI infrastructure provides
the foundation for a variety of diverse application codes at LLNL and elsewhere. We illustrate SAMRAI functionality by describing
how its unique features are used in these codes which employ complex data structures and geometry. We highlight capabilities
for moving and deforming meshes, coupling multiple SAMR mesh hierarchies, and immersed and embedded boundary methods for modeling
complex geometrical features. We also describe how irregular data structures, such as particles and internal mesh boundaries,
may be implemented using SAMRAI tools without excessive application programmer effort.
This work was performed under the auspices of the US Department of Energy by University of California Lawrence Livermore National
Laboratory under contract number W-7405-Eng-48 and is released under UCRL-JRNL-214559. 相似文献
3.
Models of parallel computations are considered for a wide class of data processing programs. Properties of programs are investigated and approaches to parallelizing sequential data processing programs and designing parallel programs are proposed. Computation optimizing problems are formulated.Translated from Kibernetika, No. 4, pp. 1–8, 42, July–August, 1989. 相似文献
4.
The performance evaluation, workload characterization, and trace-driven simulation of a hypercube multicomputer running realistic workloads are presented. Eleven representative parallel applications were selected as benchmarks. Software monitoring techniques were then used to collect execution traces. Based on the measurement results, both the computation and communication behavior of these parallel programs were investigated. The various time interval distributions were modeled by statistical functions which were verified by a nonlinear regression technique using the empirical data. The temporal and spatial localities of message destinations were also studied. A model for the temporal locality of message length was introduced and used to analyze the communication traces. A trace-drive simulation environment, which uses the communication patterns of the parallel programs as inputs, was developed to study the behavior of the communication hardware under real workload. Simulation results on DMA and link utilizations are reported 相似文献
5.
Resultants were originally developed in the 18th and 19th centuries to solve certain simple algebraic problems. Here we shall present some applications of resultants to several important problems in computational geometry, including the implicitization, inversion, and intersection of parametric rational polynomial curves and surfaces.This paper was first delivered orally at the International Conference on Engineering and Computer Graphics in Beijing, China, held in August 1984 相似文献
6.
Database query processing can benefit significantly from parallelism. Parallel database algorithms combine substantial CPU and I/O activity, memory requirements, and massive data exchange between processes, all of which must be considered to obtain optimal performance. Since parallel external sorting is a very typical example, we have focused on sorting to tune Volcano, a new query processing system. The purpose of the Volcano project is to provide efficient, extensible tools for query and request processing in novel application domains, particularly in object-oriented and scientific database systems, and for experimental database performance research. It includes all query processing algorithms conventionally used in relational database systems as well as several new ones, and can execute all of them in parallel. In this article, we present Volcano's parallel external sorting algorithm and a sequence of enhancements to improve its performance. We obtained very good absolute performance, 84 seconds for 100 MB of data, as well as near-linear speedup with sixteen CPUs and disks. Furthermore, these results were achieved on a shared-memory machine despite the common belief that parallel query processing is best implemented on distributed-memory systems. We detail our tuning measures and report on their effectiveness. 相似文献
7.
Resultants were originally developed in the 18th and 19th centuries to solve certain simple algebraic problems. Here we shall present some applications of resultants to several important problems in computational geometry, including the implicitization, inversion, and intersection of parametric rational polynomial curves and surfaces. 相似文献
8.
K. Yu. Gorbunov A. V. Seliverstov V. A. Lyubetsky 《Problems of Information Transmission》2012,48(2):185-192
In a space of dimension 30 we find a pair of parallel hyperplanes, uniquely determined by vertices of a unit cube lying on them, such that strictly between the hyperplanes there are no vertices of the cube, though there are integer points. A similar two-sided example is constructed in dimension 37. We consider possible locations of empty quadrics with respect to vertices of the cube, which is a particular case of a discrete optimization problem for a quadratic polynomial on the set of vertices of the cube. We demonstrate existence of a large number of pairs of parallel hyperplanes such that each pair contains a large number of points of a prescribed set. 相似文献
9.
Alan GeorgeJoseph W. H. LiuEsmond Ng 《Parallel Computing》1989,10(3):287-298
We consider the problem of reducing data traffic among processor nodes during the parallel factorization of a sparse matrix on a hypercube multiprocessor. A task assignment strategy based on the structure of an elimination tree is presented. This assignment is aimed at achieving load balancing among the processors and also reducing the amount of processor-to-processor data communication. An analysis of regular grid problems is presented, providing a bound on communication volume generated by the new strategy, and showing that the allocation scheme is optimal in the asymptotic sense. Some experimental results on the performance of this scheme are presented. 相似文献
10.
《Computers & Mathematics with Applications》2006,51(6-7):1057-1064
We present a new parallel algorithm for computing N point lagrange interpolation on an n-dimensional hypercube with total number of nodes p = 2n. Initially, we consider the case when N = p. The algorithm is extended to the case when only p (p fixed) processors are available, p < N. We assume that N is exactly divisible by p. By dividing the hypercube into subcubes of dimension two, we compute the products and sums appearing in Lagrange's formula in a novel way such that wasteful repetitions of forming products are avoided. The speed up and efficiency of our algorithm is calculated both theoretically and by simulating it over a network of PCs. 相似文献
11.
I-Ling Yen Bastani F.B. Taylor D.J. 《IEEE transactions on pattern analysis and machine intelligence》2001,27(3):193-207
Multiprocessor systems are widely used in many application programs to enhance system reliability and performance. However, reliability does not come naturally with multiple processors. We develop a multi-invariant data structure approach to ensure efficient and robust access to shared data structures in multiprocessor systems. Essentially, the data structure is designed to satisfy two invariants, a strong invariant, and a weak invariant. The system operates at its peak performance when the strong invariant is true. The system will operate correctly even when only the weak invariant is true, though perhaps at a lower performance level. The design ensures that the weak invariant will always be true in spite of fail-stop processor failures during the execution. By allowing the system to converge to a state satisfying only the weak invariant, the overhead for incorporating fault tolerance can be reduced. We present the basic idea of multi-invariant data structures. We also develop design rules that systematically convert fault-intolerant data abstractions into corresponding fault-tolerant versions. In this transformation, we augment the data structure and access algorithms to ensure that the system always converges to the weak invariant, even in the presence of fail-stop processor failures. We also design methods for the detection of integrity violations and for restoring the strong invariant. Two data structures, namely binary search tree and double-linked list, are used to illustrate the concept of multi-invariant data structures 相似文献
12.
13.
This paper describes a parallel implementation of the finite element method on a multiprocessor computer. The proposed strategy does not require the formation of global system equations. An element or substructure is mapped onto each processor of the multiple-instruction, multiple-data multiprocessing system. Throughout the program, each processor stores only the information relevant to its element (substructure) and generates the local stiffness matrix. A parallel element (substructure) oriented conjugate gradient procedure is employed to compute the displacements. Each processor then determines the strains and stresses for its associated element (substructure). A prototype implementation of this parallel finite element program strategy on a hypercube computer is discussed. Examples for both linear and nonlinear analyses are presented. 相似文献
14.
A parallel ray tracing algorithm is presented. It subdivides the seene into 3D regions, the adjacency of which is modelled by a connectivity graph of regions. Since with each region is associated a ray tracing process, this graph becomes a graph of processes, the edges of which represent the communications between processes. This graph of processes is suitably mapped onto a hypercube topology so as to minimize the communication cost. Static load balancing is performed and solutions are brought to the problems of network congestion and termination.This work has been supported byC
3 and by the CCETT (Centre Commun d'Etudes de Télédiffusion et Télécommunications) under contract 86ME46 相似文献
15.
A theory of rational data structures is developed, making it possible to view various models of computation as data whose processing is determined by their semantic essence and to allow for the structure induced by the computational aspect of these data. Applications of rational data structures to the theory of formal languages are described.Translated from Kibernetika, No. 6, pp. 45–49, 61, November–December, 1989. 相似文献
16.
17.
Zhang X. Qin X. 《IEEE transactions on pattern analysis and machine intelligence》1991,17(10):1059-1068
The efficiency of the basic operations of a NUMA (nonuniform memory access) multiprocessor determines the parallel processing performance on a NUMA multiprocessor. The authors present several analytical models for predicting and evaluating the overhead of interprocessor communication, process scheduling, process synchronization, and remote memory access, where network contention and memory contention are considered. Performance measurements to support the models and analyses through several numerical examples have been done on the BBN GP1000, a NUMA shared-memory multiprocessor. Analytical and experimental results give a comprehensive understanding of the various effects, which are important for the effective use of NUMA shared-memory multiprocessor. The results presented can be used to determine optimal strategies in developing an efficient programming environment for a NUMA system 相似文献
18.
In this article we discuss the detailed implementation of a parallel pseudospectral code for integration of the Navier-Stokes equations on an Intel iPSC/860 Hypercube. Issues related to the basic efficient parallelization of the algorithm on a hypercube are discussed, as well as optimization issues specific to the iPSC/860 system. With the combination of optimizations presented, the code runs on a 32-node iPSC/860 system at a speed exceeding that of the fastest implementation on a Cray YMP by nearly 25%. 相似文献
19.
There are two ways, other than the standard fast Fourier transform (FFT) algorithm, of computing Fourier transforms of real data, namely, (1)the real fast Fourier transform (RFFT) algorithm, and (2) the fast Hartley transform (FHT) algorithm. On a sequential computer, it has been shown that both the RFFT and the FHT algorithms are faster than the FFT algorithm. However, it is not obvious that the same is true on a parallel machine. The communication requirements of the RFFT and the FHT algorithms, which are critical to the cost of any parallel implementation, are different from those of the FFT algorithm. In this paper we present efficient implementations of the RFFT and the FHT algorithms on a hypercube machine. Experimental results are given for the implementation of the RFFT and the FHT algorithms on the NCUBE machine. 相似文献
20.
Parallel algorithms are given for determining geometric properties of systems of moving point-objects. The objects are assumed to be moving in a Euclidean space such that each coordinate of a point's motion is a polynomial of bounded degree in the time variable. The properties investigated include nearest (farthest) neighbor, closest (farthest) pair, collision, convex hull, diameter, and containment. Several of these properties are investigated from both the dynamic and steady-state points of view. Efficient, and often optimal, implementations of these algorithms are given for the mesh and hypercube.Research partially supported by a grant from the Niagara University Research Council.Research partially supported by National Science Foundation grants DCR-8608640 and IRI-8800514. 相似文献