This paper introduces a model for parallel computation, called the distributed random-access machine (DRAM), in which the communication requirements of parallel algorithms can be evaluated. A DRAM is an abstraction of a parallel computer in which memory accesses are implemented by routing messages through a communication network. A DRAM explicitly models the congestion of messages across cuts of the network. We introduce the notion of a conservative algorithm as one whose communication requirements at each step can be bounded by the congestion of pointers of the input data structure across cuts of a DRAM. We give a simple lemma that shows how to shortcut pointers in a data structure so that remote processors can communicate without causing undue congestion. We give O(lg n)-step, linear-processor, linear-space, conservative algorithms for a variety of problems on n-node trees, such as computing treewalk numberings, finding the separator of a tree, and evaluating all subexpressions in an expression tree. We give O(lg² n)-step, linear-processor, linear-space, conservative algorithms for problems on graphs of size n, including finding a minimum-cost spanning forest, computing biconnected components, and constructing an Eulerian cycle. Most of these algorithms use as a subroutine a generalization of the prefix computation to trees. We show that any such treefix computation can be performed in O(lg n) steps using a conservative variant of Miller and Reif's tree-contraction technique. This research was supported in part by the Defense Advanced Research Projects Agency under Contract N00014-80-C-0622 and by the Office of Naval Research under Contract N00014-86-K-0593. Charles Leiserson is supported in part by an NSF Presidential Young Investigator Award with matching funds provided by AT&T Bell Laboratories and Xerox Corporation. Bruce Maggs is supported in part by an NSF Fellowship.
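
As a rough illustration of what "shortcutting pointers" can mean, the sketch below ranks the nodes of a linked list by plain pointer jumping (recursive doubling). This is a standard textbook technique, not the conservative algorithm of the abstract above; the function name `list_rank` and the convention that the tail points to itself are assumptions made for the example. Whether plain pointer jumping satisfies the paper's conservativeness condition is not addressed here.

```python
def list_rank(succ):
    """Rank every node of a linked list by pointer jumping.

    succ[i] is the successor of node i; the tail points to itself
    (an assumed convention for this sketch). Returns (rank, rounds),
    where rank[i] is the number of hops from node i to the tail.
    """
    n = len(succ)
    nxt = list(succ)
    # Invariant: rank[i] is the number of hops from i to nxt[i].
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = 0
    # Stop once every pointer already reaches the tail.
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # One synchronous step: all nodes read the old values, then update.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
        rounds += 1
    return rank, rounds


if __name__ == "__main__":
    # List 0 -> 1 -> 2 -> 3 -> 4, with node 4 as the tail.
    print(list_rank([1, 2, 3, 4, 4]))
```

Each round doubles the span of every pointer, so a list of n nodes needs about lg n rounds; on a parallel machine each list comprehension would be one step executed by all processors at once.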

This paper addresses the fact that conventional line-drawing algorithms, when applied directly on parallel machines, can lead to very inefficient code. It suggests that, instead of modifying an existing algorithm for a parallel machine, a more efficient implementation can be produced by going back to the invariants in the definition. Popular line-drawing algorithms are compared with two alternatives: distance to a line (a point is on the line if it is sufficiently close to it) and intersection with a line (a point is on the line if it is an intersection point). For massively parallel single-instruction-multiple-data (SIMD) machines (with thousands of processors and up), the alternatives provide viable line-drawing algorithms. Because of the pixel-per-processor mapping, their performance is independent of the line length and orientation.
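
A minimal sketch of the distance-to-a-line idea, assuming a half-pixel tolerance and Euclidean distance to the segment (neither is specified in the abstract). The names `pixel_on_line` and `draw_line` are hypothetical. Every pixel runs the same test independently, which is what makes the approach a natural fit for a pixel-per-processor SIMD mapping; the sequential loops here only stand in for that parallel step.

```python
import math


def pixel_on_line(px, py, x0, y0, x1, y1, tol=0.5):
    """True if pixel (px, py) is within tol of the segment (x0, y0)-(x1, y1)."""
    dx, dy = x1 - x0, y1 - y0
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: compare against the single endpoint.
        return math.hypot(px - x0, py - y0) <= tol
    # Parameter of the orthogonal projection, clamped onto the segment.
    t = ((px - x0) * dx + (py - y0) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (x0 + t * dx), py - (y0 + t * dy)) <= tol


def draw_line(width, height, x0, y0, x1, y1, tol=0.5):
    """Evaluate the same test at every pixel; grid[y][x] is True if lit."""
    return [
        [pixel_on_line(x, y, x0, y0, x1, y1, tol) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    for row in draw_line(8, 4, 0, 0, 7, 3):
        print("".join("#" if lit else "." for lit in row))
```

The cost per pixel is a constant number of arithmetic operations regardless of where the endpoints are, which matches the abstract's remark that performance does not depend on the line itself.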

肖满, 丁璐, 张怡. 《计算机工程与科学》 (Computer Engineering & Science), 2020, 42(12): 2252-2258
This paper studies a semi-online hierarchical scheduling problem on three identical machines. There is one machine of hierarchy 1 and two machines of hierarchy 2, and the goal is to minimize the makespan. When the total size of low-hierarchy jobs is known, an online algorithm with a competitive ratio of 5/3 is given, together with a lower bound of 3/2. When the total size of high-hierarchy jobs is known, an online algorithm with a competitive ratio of 9/5 is given, together with a lower bound of 3/2. When the total size of each hierarchy is known, an online algorithm with a competitive ratio of 3/2 is given, together with a lower bound of 4/3. When the total size of all jobs is known, a best possible online algorithm with a competitive ratio of 3/2 is given.

Efficient parallel algorithms developed on hypercube SIMD (single-instruction, multiple-data-stream) machines for image template matching are presented. Most of these parallel algorithms are asymptotically optimal in their time complexities. These results improve the known bounds in the literature.

We explore novel algorithms for DVS (Dynamic Voltage Scaling) based energy minimization of DAG (Directed Acyclic Graph) based applications on parallel and distributed machines in dynamic environments. Static DVS algorithms for DAG execution use the estimated execution time. In practice, this estimate is either too high or too low, so many tasks complete earlier or later than expected during actual execution. When the time is overestimated, the extra slack can be given to future tasks so that energy requirements are reduced. When it is underestimated, the increased time may cause the application to miss its deadline, and the slack of future tasks can be reduced to lower the possibility of missing the deadline. In this paper, we present novel dynamic scheduling algorithms for reallocating the slack for future tasks to reduce energy and/or satisfy deadline constraints. Experimental results show that our algorithms are comparable to static algorithms applied at runtime in terms of energy minimization and deadline satisfaction, but require considerably smaller computational overhead.

This paper proposes new heuristic distributed parallel algorithms for searching and planning, which are based on the concepts of concurrent wave propagation and competitive activation mechanisms. These algorithms are characterized by simple and clear control strategies for searching, and by distinguished abilities in many aspects, such as high-speed processing, wide suitability for searching AND/OR implicit graphs, and ease of hardware implementation.

We consider optimizations that are required for efficient execution of code segments that consist of loops over distributed data structures. The PARTI execution-time primitives are designed to perform these optimizations and can be used to implement a wide range of scientific algorithms on distributed-memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to perform gather and scatter operations on distributed arrays. Communication patterns are derived at run time, and the appropriate send and receive messages are automatically generated.

We address the problem of scheduling jobs with family setup times on identical parallel machines to minimize total weighted flowtime. We present two dynamic programming algorithms, a backward algorithm and a forward algorithm, and we identify characteristics of problems where each algorithm is best suited. We also derive two properties that improve the computational efficiency of the algorithms. Scope and purpose: While most production schedulers must balance conflicting goals of high system efficiency and timely completion of individual jobs, consideration of this conflict is underdeveloped in the scheduling literature. This paper examines a model that incorporates a fundamental cause of the efficiency/timeliness conflict in practice. We propose solution methodologies and properties of an optimal solution for the purpose of exposing insights that may ultimately be useful in research on more complex models.

We consider an online scheduling problem on two identical parallel machines with a single server. Jobs arrive one by one and each job has to be loaded by the server before being processed on one of the machines, and unloaded immediately by the server after its processing. Both loading and unloading times are equal to one time unit. The goal is to minimize the makespan. For the variant of the problem involving both loading and unloading operations, we present an online algorithm with competitive ratio of 5/3. For the variant with loading operation only, we show that the competitive ratio of list scheduling is at least 8/5 and provide an improved online algorithm with competitive ratio of 11/7. Finally, we discuss the lower bounds for these problems. We show that both variants have a lower bound of 3/2. Furthermore, we show that the lower bound of the first variant is at least 8/5 if the online algorithm satisfies a certain constraint.

The branch and bound (B&B) algorithm has been widely used in various discrete and combinatorial optimization fields. To obtain optimal solutions to scheduling problems as quickly as possible, three tools have been developed for the B&B algorithm: branching, bounding, and dominance rules. Of these, branching is the method for generating subproblems, and it directly determines the size of the solution space to be searched by the B&B algorithm. It is therefore very important to devise an effective branching scheme for the problem. In this note, we survey branching schemes for parallel machines scheduling (PMS) problems with n independent jobs and m machines, and we suggest new branching schemes that can be used for identical and unrelated PMS problems, respectively. The suggested branching methods generate far fewer subproblems than methods developed earlier, and they are therefore expected to substantially reduce the CPU time required to obtain optimal solutions in the B&B algorithm.

《Parallel Computing》, 2004, 30(5-6): 677-697
Numerous parallel and distributed evolutionary algorithms (PDEAs) and their implementations have been proposed and are available on the Web. A robust way to ease the reuse of their code and design is the framework approach. In this paper, we present some existing frameworks for PDEAs and their development requirements, and propose a new C++ open source framework, named Parallel and distributed Evolving Objects (ParadisEO). ParadisEO is basically devoted to the reusable and flexible design of parallel and distributed metaheuristics, but we focus here only on PDEAs. Compared to other related frameworks, ParadisEO allows more reuse flexibility, and provides more implemented parallel and distributed models. Furthermore, these models can be exploited by the user in a transparent way, and deployed on shared-memory multiprocessors as well as on distributed-memory machines. The architecture has been evaluated on two real-world applications: radio network design and spectroscopic data mining. The experimental results demonstrate the efficiency and robustness of the different models.

Zhou Qiang, Chen Yu, Pan Sinno Jialin. 《Machine Learning》, 2020, 109(3): 569-601
This work focuses on distributed optimization for multi-task learning with matrix sparsity regularization. We propose a fast communication-efficient distributed optimization...

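We establish a data distribution model suited to sorting integer sequences and design a communication-efficient parallel sorting algorithm for integer sequences on heterogeneous clusters built from multi-core compute nodes. Based on the differing computing power, communication rate, and storage capacity of the nodes in the cluster, the proposed data distribution model dynamically computes the size of the data block scheduled to each node so as to balance the load across nodes. The parallel sorting algorithm exploits the properties of integer sequences: the master node distributes data and receives results in two rounds, the slave nodes return sorted integer subsequences to the master node using bucket packing, and the master node uses bucket mapping to assemble the sorted subsequences directly into the final sorted sequence, which reduces the data-merging operations that would otherwise consume considerable communication time. Analysis and experimental results show that the proposed parallel sorting algorithm for integer sequences on multi-core clusters is efficient and scales well.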
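
The merge-free assembly step can be sketched as follows, under simplifying assumptions that are not from the abstract: buckets have equal value-range width, each bucket stands for one worker, and everything runs in a single process. The function name `bucket_partition_sort` is hypothetical, and the heterogeneous load-balancing model and the two-round distribution are not modelled.

```python
def bucket_partition_sort(values, num_workers):
    """Sort integers by value-range buckets so that no final merge is needed.

    Every value in bucket k is <= every value in bucket k + 1, so the
    sorted buckets can simply be concatenated in bucket order.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    # Equal-width ranges (assumption); the + 1 keeps the largest value
    # inside the last bucket.
    width = (hi - lo) // num_workers + 1
    buckets = [[] for _ in range(num_workers)]
    for v in values:
        buckets[(v - lo) // width].append(v)
    # In a cluster setting each worker would sort its own bucket.
    sorted_parts = [sorted(b) for b in buckets]
    # Master side: place the sorted parts in bucket order, no merging.
    result = []
    for part in sorted_parts:
        result.extend(part)
    return result


if __name__ == "__main__":
    print(bucket_partition_sort([5, 3, 9, 1, 7, 3, -2], 3))
```

Because the buckets partition the value range, the master only copies each returned subsequence into place; a comparison-based merge of the subsequences is never required.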

Several commercial hypercube parallel processors with the potential to deliver massive parallelism cost-effectively have been announced recently. They open the door to a wide variety of application areas that could benefit from parallelism. Computer vision is one of these application areas. This paper develops a general model for hypercube machines, and uses it to show how vision algorithms can be executed on hypercubes. In particular, the steps in the problem of thick-film inspection are used as a concrete example. The time needed to complete a typical inspection is used to demonstrate the performance of hypercube machines. Experimental results from a hypercube machine illustrate the potential use of such machines.
