期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Performance Measures for Evaluating Algorithms for SIMD Machines

《IEEE transactions on pattern analysis and machine intelligence》1982,(4):319-331

This paper examines measures for evaluating the performance of algorithms for single instruction stream–multiple data stream (SIMD) machines. The SIMD mode of parallelism involves using a large number of processors synchronized together. All processors execute the same instruction at the same time; however, each processor operates on a different data item. The complexity of parallel algorithms is, in general, a function of the machine size (number of processors), problem size, and type of interconnection network used to provide communications among the processors. Measures which quantify the effect of changing the machine-size/problem-size/network-type relationships are therefore needed. A number of such measures are presented and are applied to an example SIMD algorithm from the image processing problem domain. The measures discussed and compared include execution time, speed, parallel efficiency, overhead ratio, processor utilization, redundancy, cost effectiveness, speed-up of the parallel algorithm over the corresponding serial algorithm, and an additive measure called "sprice" which assigns a weighted value to computations and processors. 相似文献

2.

Parallel processing with minicomputers

Alan Colvin 《Computer Communications》1980,3(3):127-131

The four general methods of parallel processing are described. These are: single instruction single data stream (SISD), multiple instruction single data stream (MISD), single instruction multiple data stream (SIMD), and multiple instruction multiple data stream (MIMD). Most single computers in use are SISD machines, while most parallel processing applications use the MIMD aproach. The paper outlines and compares the four basic MIMD architectures: tightly coupled, loosely coupled, voting, and peripheral processing. The latter is one of the most practical methods using existing minicomputers. Many modern programmable intelligent I/O controllers are peripheral processors. For a computer manufacturer to remain competitive, it is concluded that he will have to include such devices in his hardware. 相似文献

3.

Genetic algorithms for the optimal common due date assignment and the optimal scheduling policy in parallel machine earliness/tardiness scheduling problems 总被引：6，自引：0，他引：6

Liu Min Wu Cheng 《Robotics and Computer》2006,22(4):279-287

Earliness/tardiness scheduling problems with undetermined common due date which have wide application background in textile industry, mechanical industry, electronic industry and so on, are very important in the research fields such as industry engineering and CIMS. In this paper, a kind of genetic algorithm based on sectional code for minimizing the total cost of assignment of due date, earliness and tardiness in this kind of scheduling problem is proposed to determine the optimal common due date and the optimal scheduling policy for determining the job number and their processing order on each machine. Also, simulated annealing mechanism and the iterative heuristic fine-tuning operator are introduced into the genetic algorithm so as to construct three kinds of hybrid genetic algorithms with good performance. Numerical computational results focusing on the identical parallel machine scheduling problem and the general parallel machine scheduling problem shows that these algorithms outperform heuristic procedures, and fit for larger scale parallel machine earliness/tardiness scheduling problem. Moreover, with practical application data from one of the largest cotton colored weaving enterprises in China, numerical computational results show that these genetic algorithms are effective and robust, and that especially the performance of the hybrid genetic algorithm based on simulated annealing and the iterative heuristic fine-tuning operator is the best among them. 相似文献

4.

Sparse QR factorization on a massively parallel computer 总被引：1，自引：0，他引：1

Steven G. Kratzer 《The Journal of supercomputing》1992,6(3-4):237-255

This paper shows that QR factorization of large, sparse matrices can be performed efficiently on massively parallel SIMD (single instruction stream/multiple data stream) computers such as the Connection Machine CM-2. The problem is cast as a dataflow graph, whose nodes are mapped to a virtual dataflow machine in such a way that only nearest-neighbor communication is required. This virtual machine is implemented by programming the CM-2 processors to support a restricted dataflow protocol. Execution results for several test matrices show that good performance can be obtained without relying on nested dissection techniques. 相似文献

5.

A parallel processing algorithm for nonlinear programming problems

Y. Tsujimura K. Ida M. Gen 《Computers & Industrial Engineering》1992,23(1-4):305-307

Algorithms for solving multiple criteria nonlinear programming problems are frequently based on the use of the generalized reduced gradient (GRG) metod. Since the GRG method gives complex and large size processing for computation, it takes much time to solve large-scale multiple criteria nonlinear programming problems. Therefore, parallel processing dealing with the GRG method is required to solve the problems. We propose a parallel processing algorithm for the GRG method under multiple processors systems. 相似文献

6.

Parallel structure-aware halftoning

Huisi Wu Tien-Tsin Wong Pheng-Ann Heng 《Multimedia Tools and Applications》2013,67(3):529-547

Structure-aware halftoning technique is one of the state-of-the-art algorithms for generating structure-preserving bitonal images. However, the slow optimization process prohibits its real-time application. This is due to its high computational cost of similarity measurement and iterative refinement. Unfortunately, the structure-aware halftoning cannot be straightforwardly parallelized due to its data dependency nature. In this paper, we propose a parallel algorithm to boost the optimization of the structure-aware halftoning. Our main idea is to exploit the spatial independence during the evaluation of the objective function and temporal independence among the iterations. Specifically, we introduce a parallel Poisson-disk algorithm during the selection of pixel swaps, which guarantees the independency between parallel processes. Graphics processing unit (GPU) implementation of the technique leads to a significant speedup without sacrificing the quality. Our experiments demonstrate the effectiveness of the proposed parallel algorithm in generating structure-preserving bitonal images with much less time, especially for large images. 相似文献

7.

一种基于SMP的并行逐次超松弛迭代法

胡长军魏硕张纪林王珏《计算机研究与发展》2007,44(10):1688-1693

逐次超松弛迭代方法被广泛应用于油藏数值模拟中压力方程的求解.其并行实现是提高模拟速度的重要途径.传统并行方案大都只是在一次迭代内进行数据划分,而没有进一步将数据划分与迭代空间划分相结合,故针对SOR算法和SMP（symmetric multi-processors）系统的特点,以OpenMP为并行化实现工具,提出了基于SMP的并行逐次超松弛迭代方法（parallelSOR）.方法通过改变不同迭代步内数据点的更新次序,使不同区域内的数据点可以并行执行多次迭代.总结出针对三维油藏区域在数据空间划分和迭代空间合并上相对较优的策略,分析了迭代过程中网格块的生长形状.与传统的并行策略相比,该方法具有可减小同步开销、改进数据局部性、cache命中率高等优点.实验结果表明,该方法具有较高的加速比和效率. 相似文献

8.

FFT algorithms for SIMD parallel processing systems

《Journal of Parallel and Distributed Computing》1986,3(1):48-71

SIMD (single instruction stream - multiple data stream) algorithms for one- and two-dimensional discrete Fourier transforms are presented. Parallel structurings of algorithms for efficient computation for a variety of machine size/problem size combinations are presented and analyzed. Through these algorithms, techniques for exploiting relationships between problem size and machine size are demonstrated. The algorithms are evaluated in terms of the number of arithmetic operations and interprocessor data transfers required. The ability of various interconnection networks presented in the literature to perform the needed transfers is examined. It is shown that the efficiency of a particular data distribution/algorithm decomposition approach is a function of the machine-size/problem-size relationship. 相似文献

9.

Integrated range comparison for data-parallel compilation systems

Xian-He Sun Pantano M. Fahringer T. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(5):448-458

A major difficulty in restructuring compilation, and in parallel programming in general, is how to compare parallel performance over a range of system and problem sizes. Execution time varies with system and problem size and an initially fast implementation may become slow when system and problem size scale up. This paper introduces the concept of range comparison. Unlike conventional execution time comparison in which performance is compared for a particular system and problem size, range comparison compares the performance of programs over a range of ensemble and problem sizes via scalability and performance crossing point analysis. A novel algorithm is developed to predict the crossing point automatically. The correctness of the algorithm is proven and a methodology is developed to integrate range comparison into restructuring compilations for data-parallel programming. A preliminary prototype of the methodology is implemented and tested under Vienna Fortran Compilation System. Experimental results demonstrate that range comparison is feasible and effective. It is an important asset for program evaluation, restructuring compilation, and parallel programming 相似文献

10.

Machine scheduling with job class setup and delivery considerations

T.C.E. Cheng Xiuli Wang 《Computers & Operations Research》2010,37(6):1123-1128

We study machine scheduling problems in which the jobs belong to different job classes and they need to be delivered to customers after processing. A setup time is required for a job if it is the first job to be processed on a machine or its processing on a machine follows a job that belongs to another class. Processed jobs are delivered in batches to their respective customers. The batch size is limited by the capacity of the delivery vehicles and each shipment incurs a transport cost and takes a fixed amount of time. The objective is to minimize the weighted sum of the last arrival time of jobs to customers and the delivery (transportation) cost. For the problem of processing jobs on a single machine and delivering them to multiple customers, we develop a dynamic programming algorithm to solve the problem optimally. For the problem of processing jobs on parallel machines and delivering them to a single customer, we propose a heuristic and analyze its performance bound. 相似文献

11.

Successive overrelaxation for support vector machines 总被引：36，自引：0，他引：36

Mangasarian O.L. Musicant D.R. 《Neural Networks, IEEE Transactions on》1999,10(5):1032-1037

Successive overrelaxation (SOR) for symmetric linear complementarity problems and quadratic programs is used to train a support vector machine (SVM) for discriminating between the elements of two massive datasets, each with millions of points. Because SOR handles one point at a time, similar to Platt's sequential minimal optimization (SMO) algorithm (1999) which handles two constraints at a time and Joachims' SVM(light) (1998) which handles a small number of points at a time, SOR can process very large datasets that need not reside in memory. The algorithm converges linearly to a solution. Encouraging numerical results are presented on datasets with up to 10 000 000 points. Such massive discrimination problems cannot be processed by conventional linear or quadratic programming methods, and to our knowledge have not been solved by other methods. On smaller problems, SOR was faster than SVM(light) and comparable or faster than SMO. 相似文献

12.

Scalable parallel Benders decomposition for stochastic linear programming

《Parallel Computing》1997,23(8):1069-1088

We develop a scalable parallel implementation of the classical Benders decomposition algorithm for two-stage stochastic linear programs. Using a primal-dual, path-following algorithm for solving the scenario subproblems we develop a parallel implementation that alleviates the difficulties of load balancing. Furthermore, the dual and primal step calculations can be implemented using a data-parallel programming paradigm. With this approach the code effectively utilizes both the multiple, independent processors and the vector units of the target architecture, the Connection Machine CM-5. The, usually limiting, master program is solved very efficiently using the interior point code LoQo on the front-end workstation. The implementation scales almost perfectly with problem and machine size. Extensive computational testing is reported with several large problems with up to 2 million constraints and 13.8 million variables. 相似文献

13.

基于工作站机群并行求解有限元线性方程组 总被引：2，自引：0，他引：2

付朝江《计算机工程与设计》2008,29(24)

随着计算机高速网络技术的发展,工作站机群正在成为并行计算的主要平台.有限元线性方程组在土木工程结构分析中是最常见的问题.预处理共轭梯度法(PCGM)是求解线性方程组的迭代方法.对预处理共轭梯度法进行并行化并在两个不同的机群上实现,对存储方式进行详细分析,编程中采用了稀疏矩阵向量相乘的优化技术.数值结果表明,设计的并行算法具有良好的加速比和并行效率,说明并行计算能更快地求解大规模问题. 相似文献

14.

Parallelizing a GIS on a shared address space architecture

Shekhar S. Ravada S. Kumar V. Chubb D. Turner G. 《Computer》1996,29(12):42-48

We are developing a high-performance GIS (our term for a parallel GIS) on an SGI Challenge, a 16-processor machine with a shared address space architecture (SASA). We describe how we parallelized a key GIS operation using a message-passing algorithm. We advocate the linking of two diverse approaches to the design of parallel architectures and algorithms. As part of our project, we evaluated the effect of parallelizing an important GIS operation: range query. We parallelized a range query using data partitioning (to reduce synchronization) and dynamic load balancing (to improve speedup). We found that these approaches do achieve the performance required for many GIS applications 相似文献

15.

Scheduling a single machine with concurrent jobs for the frozen food industry

《Computers & Industrial Engineering》2015

This study examines the air blast freezing process of the frozen food industry, which processes multiple products with variable processing rates. The analysis depicts a new, single machine-scheduling problem in which the machine can process multiple jobs concurrently, within its capacity. The machine processes independent jobs arriving at various times while incurring interruption costs when allowing the jobs to enter or leave the machine. A mixed integer linear programming (MILP) model and a heuristic algorithm are developed for scheduling, the objectives of which are to minimize the costs associated with machine activities including that of waiting to load, waiting to unload and interruption time. The heuristic algorithm demonstrates the high potential of the computational time savings by obtaining the solution within one-fifth of the mathematical model computational time. 相似文献

16.

A scheduling problem in blocking hybrid flow shop robotic cells with multiple robots

Atabak Elmi Seyda Topaloglu 《Computers & Operations Research》2013

This paper addresses the robotic scheduling problem in blocking hybrid flow shop cells that consider multiple part types, unrelated parallel machines, multiple robots and machine eligibility constraints. Initially, a mixed integer linear programming (MILP) model is proposed to minimize the makespan for this problem. Due to the complexity of the model, a simulated annealing (SA) based solution approach is developed for its solution. To increase the efficiency of the SA algorithm, a new neighborhood structure based on block properties is applied. The performance of the proposed SA is assessed over a set of randomly generated instances. The computational results demonstrate that the SA algorithm is effective with the employed neighborhood structure. Additionally, this study shows that the appropriate number of robots depends on the sequence of processing operations to be performed at each stage. 相似文献

17.

The Effect of Process Topology and Load Balancing on Parallel Programming Models for SMP Clusters and Iterative Algorithms

Nikolaos Drosinos Nectarios Koziris 《The Journal of supercomputing》2006,35(1):65-91

This article focuses on the effect of both process topology and load balancing on various programming models for SMP clusters and iterative algorithms. More specifically, we consider nested loop algorithms with constant flow dependencies, that can be parallelized on SMP clusters with the aid of the tiling transformation. We investigate three parallel programming models, namely a popular message passing monolithic parallel implementation, as well as two hybrid ones, that employ both message passing and multi-threading. We conclude that the selection of an appropriate mapping topology for the mesh of processes has a significant effect on the overall performance, and provide an algorithm for the specification of such an efficient topology according to the iteration space and data dependencies of the algorithm. We also propose static load balancing techniques for the computation distribution between threads, that diminish the disadvantage of the master thread assuming all inter-process communication due to limitations often imposed by the message passing library. Both improvements are implemented as compile-time optimizations and are further experimentally evaluated. An overall comparison of the above parallel programming styles on SMP clusters based on micro-kernel experimental evaluation is further provided, as well. 相似文献

18.

并行化洪水演进模拟研究综述

下载免费PDF全文

李健张大伟姜晓明向立云《计算机工程与应用》2021,57(13):1-7

近年来,并行化洪水演进模拟技术发展迅速,在防汛减灾领域发挥重要作用。在考虑洪水演进模型的数值方法、并行模式和编程技术等因素后,选取一些有代表性的洪水演进模型,分析了同构并行和异构并行洪水演进模型涉及的技术细节,提出并行化模型开发的技术难点和解决方法。最后,提出将来并行化洪水演进模型研发的着力点：非结构网格模型的异构并行化;混合并行的洪水演进模型;适于GPU异构并行的网格形式;并行环境下的实时可视化和交互式计算;基于动态编程语言的模型开发;界面式开发及模型应用推广。相似文献

19.

Parallel implementation of bisection for the calculation of eigenvalues of tridiagonal symmetric matrices

H. J. Bernstein M. Goldstein 《Computing》1986,37(1):85-91

Kuck and Sameh [9], Huang [8], and Wallach [10] have investigated parallel implementations of Givens' bisection algorithm [4, 5]. On a MIMD (multiple instruction stream — multiple data stream) machine one could apply parallelism on any or all of three levels: within each Sturm sequence calculation, within each individual bisection, or on the outer level assigning intervals to be searched. We show that allocating bisections to individual processors on the outer level is more effective than having processors share the work of a bisection when the number of processors is smaller than the number of eigenvalues to be found. 相似文献

20.

Scheduling jobs on a single batch processing machine with incompatible job families and weighted number of tardy jobs objective

Stéphane Dauzère-Pérès Lars Mönch 《Computers & Operations Research》2013

In this paper, we minimize the weighted and unweighted number of tardy jobs on a single batch processing machine with incompatible job families. We propose two different mixed integer linear programming (MILP) formulations based on positional variables. The second formulation does not contain a big-M coefficient. Two iterative schemes are discussed that are able to provide tighter linear programming bounds by reducing the number of positional variables. Furthermore, we also suggest a random key genetic algorithm (RKGA) to solve this scheduling problem. Results of computational experiments are shown. The second MILP formulation is more efficient with respect to lower bounds, while the first formulation provides better upper bounds. The iterative scheme is effective for the weighted case. The RKGA is able to find high-quality solutions in a reasonable amount of time. 相似文献