Similar Documents
20 similar documents found.
1.
The problem of optimally assigning the modules of a program over the processors of an inhomogeneous distributed processor system is analyzed. The objective is to assign modules, wherever possible, to the processors on which they execute most rapidly while taking into account the overhead of interprocessor communication. Factors contributing to the cost of an assignment are 1) the amount of computation required by each module, 2) the amount of data transmitted between each pair of modules, 3) the speed of each processor, and 4) the speed of the communication link between each pair of processors.
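The abstract does not give the cost function explicitly; the sketch below is a minimal illustration of how the four factors could be combined into an assignment cost. All names and the additive form are assumptions, not the paper's formulation.

```python
# Hypothetical illustration: total cost of assigning program modules to
# processors in a heterogeneous system, combining execution and
# communication terms. Names and the additive model are assumptions.

def assignment_cost(assign, work, data, proc_speed, link_speed):
    """assign[m] -> processor of module m; work[m] = computation of m;
    data[(a, b)] = data volume between modules a and b;
    proc_speed[p] = speed of processor p;
    link_speed[(p, q)] = speed of the link between processors p and q."""
    # Execution cost: slower processors take longer for the same work.
    exec_cost = sum(work[m] / proc_speed[assign[m]] for m in assign)
    # Communication cost: only module pairs split across processors pay it.
    comm_cost = 0.0
    for (a, b), volume in data.items():
        p, q = assign[a], assign[b]
        if p != q:
            comm_cost += volume / link_speed[(p, q)]
    return exec_cost + comm_cost

# Example: two modules on two processors of different speeds.
cost = assignment_cost(
    assign={"m1": "p1", "m2": "p2"},
    work={"m1": 10.0, "m2": 4.0},
    data={("m1", "m2"): 2.0},
    proc_speed={"p1": 2.0, "p2": 1.0},
    link_speed={("p1", "p2"): 0.5, ("p2", "p1"): 0.5},
)
print(cost)  # 10/2 + 4/1 + 2/0.5 = 13.0
```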

2.
For parallel computer systems, a new formulation of the problem of constructing parallel asynchronous abstract programs of a desired length was proposed. The conditions of the planning problem were represented as a system of Boolean equations (constraints) whose solutions define the feasible plans for activating the program modules specified in the planner's knowledge base. Constraints on the number of processors and on the time delays arising during execution of the program modules were taken into consideration.

3.
The problem of mapping the parallel bottom-up execution of Datalog programs to an interconnected network of processors is studied. The parallelization is achieved by using hash functions that partition the set of instantiations for the rules. We first examine this problem in an environment where the number of processors and the interconnection topology are known, and communication between program segments residing at non-adjacent processors is not permitted. An algorithm is presented that decides whether a given Datalog program can be mapped onto such an architecture. We then relax the constraint on the architecture by allowing program segments residing at non-adjacent processors to communicate. A theory of approximate mappings is developed, and an algorithm to obtain the closest approximate mapping of a given Datalog program onto a given architecture is presented.

4.
We consider the problem of finding an optimal assignment of the modules of a program to processors in a distributed system. A module incurs an execution cost that may be different for each processor assignment, and modules that are not assigned to the same processor but that communicate with one another incur a communication cost. An optimal assignment minimizes the sum of the module execution costs and the intermodule communication costs. This problem is known to be NP-complete for more than three processors. Using a branch-and-bound-with-underestimates algorithm to reduce the size of the search tree, we evaluate its average time and space complexity for two underestimating functions through simulation. The more complex of the two functions, called the minimum independent assignment cost underestimate (MIACU), performs extremely well over a wide range of values of program model parameters such as the number of modules, the number of processors, and the ratio of average module execution cost to average intermodule communication cost. By reordering the list of modules to allow a subset of modules that do not communicate with one another to be assigned last, further improvements using MIACU are possible.
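As a hedged illustration of the kind of underestimate the abstract describes, the sketch below computes a simple lower bound for a partial assignment by giving every unassigned module its cheapest execution cost and ignoring its communication. The function and variable names, and this exact bound, are assumptions rather than the paper's precise MIACU definition.

```python
# Hypothetical sketch of an optimistic lower bound for branch-and-bound
# task assignment: cost already committed by the partial assignment plus,
# for each unassigned module, its cheapest possible execution cost with
# all of its communication ignored. This is an assumed reading of the
# MIACU idea, not the paper's exact definition.

def lower_bound(partial, exec_cost, comm_cost, modules):
    """partial: dict module -> processor for the modules assigned so far.
    exec_cost[m][p]: execution cost of module m on processor p.
    comm_cost[(a, b)]: communication cost if a and b end up on
    different processors."""
    committed = sum(exec_cost[m][p] for m, p in partial.items())
    committed += sum(c for (a, b), c in comm_cost.items()
                     if a in partial and b in partial
                     and partial[a] != partial[b])
    # Optimistic completion: each remaining module at its cheapest cost.
    optimistic = sum(min(exec_cost[m].values())
                     for m in modules if m not in partial)
    return committed + optimistic
```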

5.
We consider the problem of optimally assigning the modules of a parallel/pipelined program over the processors of a multiple processor system under certain restrictions on the interconnection structure of the program as well as the multiple computer system. We show that for a variety of such problems, it is possible to determine whether a partition of the modular program exists in which the load on any processor is within a certain bound. This method, when combined with a binary search over a fixed range, provides an optimal solution to the partitioning problem. The specific problems we consider are partitioning of (1) a chain-structured parallel program over a chain-like computer system, (2) multiple chain-like programs over a host-satellite system, and (3) a tree-structured parallel program over a host-satellite system. For a problem with N modules and M processors, the complexity of our algorithm is no worse than O(M log(N) log(W_T/ε)), where W_T is the cost of assigning all modules to one processor and ε is the desired accuracy. This algorithm provides an improvement over the recently developed best known algorithm, which runs in O(MN log(N)) time. This research was supported by a grant from the Division of Research Extension and Advisory Services, University of Engineering and Technology Lahore, Pakistan. Further support was provided by NASA Contracts NAS1-17070 and NAS1-18107 while the author was resident at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, Virginia, USA.
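The abstract only states that a feasibility test ("is there a partition whose bottleneck load stays within a bound?") combined with binary search yields the optimum. The sketch below shows that idea for the simplest case it lists, a chain-structured program on a chain of processors, with communication weights ignored for brevity; it is a minimal illustration under those assumptions, not the paper's algorithm.

```python
# Minimal sketch of the probe-plus-binary-search idea: partition a chain
# of module weights into at most M contiguous segments (one per processor
# in the chain) so that no segment exceeds a load bound. Communication
# costs are ignored; the paper's algorithm handles a richer cost model.

def feasible(weights, m, bound):
    """Can the chain be cut into <= m contiguous pieces, each of total
    weight <= bound?"""
    pieces, load = 1, 0.0
    for w in weights:
        if w > bound:
            return False
        if load + w > bound:
            pieces += 1
            load = w
        else:
            load += w
    return pieces <= m

def min_bottleneck(weights, m, eps=1e-6):
    """Binary search for the smallest feasible bound, over the fixed
    range mentioned in the abstract (max weight .. total weight)."""
    lo, hi = max(weights), sum(weights)
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if feasible(weights, m, mid):
            hi = mid
        else:
            lo = mid
    return hi

print(min_bottleneck([4, 2, 7, 1, 5, 3], m=3))  # close to 8
```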

6.
In distributed systems, an application program is divided into several software modules, which need to be allocated to processors connected by communication links. The distributed system reliability (DSR) can be defined as the probability of successfully completing the distributed program. Previous studies of optimal task allocation with respect to DSR focused on the effects of the inter-connectivity of processors, the failure rates of the processors, and the failure rates of the communication links. We are the first to study the effects of module software reliabilities and module execution frequencies on the optimal task allocation. By viewing each module as a state in a Markov process, we build a task allocation decision model to maximize DSR for distributed systems with a 100% reliable network. In this model, the DSR is derived from the module software reliabilities, the processor hardware reliabilities, the transition probabilities between modules, and the task allocation matrix. Resource constraints of memory space limitation and computation load limitation on each processor are considered. The constraint of total system cost, including the execution cost, the communication cost, and the failure cost, is also considered. We solve the problem by constraint programming using the ILOG SOLVER library. We then apply the proposed model to a case extended from previous studies. Finally, a sensitivity analysis is performed to verify the effects of module software reliabilities and processor hardware reliabilities on the DSR and on the task allocation decision.
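The paper's exact DSR expression is not given in the abstract. The following sketch shows one common Markov-based way such a measure can be computed: expected visit counts of each module are obtained from the transition matrix, and each visit multiplies in the module's software reliability and the hardware reliability of its assigned processor. All formulas and names here are assumptions for illustration only.

```python
# Hypothetical Markov-style reliability sketch (not the paper's model):
# expected visit counts v solve v = e1 + Q^T v for the transient
# transition matrix Q, and reliability is the product of per-visit
# module/processor reliabilities raised to those counts.
import numpy as np

def dsr_estimate(Q, soft_rel, hw_rel, assign):
    """Q[i, j]: probability module i transfers control to module j
    (rows may sum to < 1; the deficit is program termination).
    soft_rel[i]: software reliability of module i per visit.
    hw_rel[p]: hardware reliability of processor p per visit.
    assign[i]: processor hosting module i. Execution starts at module 0."""
    n = Q.shape[0]
    e1 = np.zeros(n)
    e1[0] = 1.0
    visits = np.linalg.solve(np.eye(n) - Q.T, e1)  # expected visit counts
    rel = 1.0
    for i in range(n):
        rel *= (soft_rel[i] * hw_rel[assign[i]]) ** visits[i]
    return rel

Q = np.array([[0.0, 0.6, 0.3],
              [0.0, 0.0, 0.7],
              [0.0, 0.0, 0.0]])
print(dsr_estimate(Q, [0.99, 0.98, 0.995], {0: 0.999, 1: 0.997}, [0, 1, 0]))
```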

7.
8.
The paper presents a generic technique for mapping parallel algorithms onto parallel architectures. The proposed technique is a fast recursive mapping algorithm which is a component of the Cluster-M programming tool. The other components of Cluster-M are the Specification module and the Representation module. In the Specification module, for a given task specified by a high-level machine-independent program, a clustered task graph called a Spec graph is generated. In the Representation module, for a given architecture or computing organization, a clustered system graph called a Rep graph is generated. Given a task (or system) graph, a Spec (or Rep) graph can be generated using one of the clustering algorithms presented in the paper. The clustering is done only once for a given task graph (system graph), independent of any system graphs (task graphs). It is a machine-independent (application-independent) clustering, and therefore it is not repeated for different mappings. The Cluster-M mapping algorithm presented produces a sub-optimal matching of a given Spec graph containing M task modules onto a Rep graph of N processors in O(MN) time. This generic algorithm is suitable for both the allocation problem and the scheduling problem. Its performance is compared to other leading techniques. We show that Cluster-M produces better or similar results in significantly less time and using fewer or an equal number of processors as compared to the other known methods.

9.
In any distributed processing environment, decisions need to be made concerning the assignment of computational task modules to various processors. Many versions of the task allocation problem have appeared in the literature. Intertask communication makes the assignment decision difficult; capacity limitations at the processors increase the difficulty. This problem is naturally formulated as a nonlinear integer program, but can be linearized to take advantage of commercial integer programming solvers. While traditional approaches to linearizing the problem perform well when only a few tasks communicate, they have considerable difficulty solving problems involving a large number of intercommunicating tasks. This paper introduces new mixed integer formulations for three variations of the task allocation problem. Results from extensive computational tests conducted over real and generated data indicate that the reformulations are particularly efficient when a large number of tasks communicate, solving reasonably large problems faster than other exact approaches available.
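For context, the quadratic model that such linearizations start from, and the traditional product-variable linearization the paper improves upon, can be written as below. The symbols (e_{ip} execution cost, c_{ij} communication cost, r_i resource demand, R_p capacity) are assumed notation for illustration, not the paper's specific formulations.

```latex
% Standard quadratic task-allocation model (assumed notation), with
% x_{ip} = 1 iff task i is placed on processor p:
\begin{aligned}
\min \quad & \sum_{i}\sum_{p} e_{ip}\,x_{ip}
  \;+\; \sum_{i<j}\sum_{p}\sum_{q\neq p} c_{ij}\,x_{ip}\,x_{jq} \\
\text{s.t.} \quad & \sum_{p} x_{ip} = 1 \;\;\forall i, \qquad
  \sum_{i} r_{i}\,x_{ip} \le R_{p} \;\;\forall p, \qquad
  x_{ip} \in \{0,1\}.
\end{aligned}
% A traditional linearization introduces y_{ipjq} = x_{ip} x_{jq} with
%   y_{ipjq} \le x_{ip}, \quad y_{ipjq} \le x_{jq}, \quad
%   y_{ipjq} \ge x_{ip} + x_{jq} - 1, \quad y_{ipjq} \ge 0,
% which turns the quadratic objective into a linear one at the price of
% many extra variables when many task pairs communicate.
```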

10.
A problem of relevant interest to some industries is that of optimum two-dimensional layout. In this problem, one is given a number of rectangular sheets and an order for a specified number of each of certain types of two-dimensional regular and irregular shapes. The aim is to cut the shapes out of the sheets in such a way as to minimize the amount of waste produced. A two-stage solution is proposed in which the problem is converted from one of placing irregularly shaped pieces to one of allocating rectangular modules. The clustering algorithm used in the first stage to produce rectangular modules is presented and the results obtained when it was applied to some typical layout problems are described.

11.
A subsequence of a given string is any string obtained by deleting none or some symbols from the given string. A longest common subsequence (LCS) of two strings is a common subsequence of both that is at least as long as any other common subsequence. The problem is to find the LCS of two given strings. The bound on the complexity of this problem under the decision tree model is known to be mn if the number of distinct symbols that can appear in the strings is infinite, where m and n are the lengths of the two strings, respectively, and m⩽n. In this paper, we propose two parallel algorithms for this problem on the CREW-PRAM model. One takes O(log² m + log n) time with mn/log m processors, which is faster than all the existing algorithms on the same model. The other takes O(log² m log log m) time with mn/(log² m log log m) processors when log² m log log m > log n, or otherwise O(log n) time with mn/log n processors, which is optimal in the sense that the time × processors bound matches the complexity bound of the problem. Both algorithms exploit nice properties of the LCS problem that are discovered in this paper.
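As a point of reference for the cost bounds above, the standard sequential dynamic program for the LCS length runs in O(mn) time. The sketch below is that textbook baseline, not the parallel CREW-PRAM algorithms proposed in the paper.

```python
# Textbook O(mn) dynamic program for the length of a longest common
# subsequence; the paper's contribution is parallelizing this problem
# on the CREW-PRAM model, which this baseline does not attempt.

def lcs_length(x: str, y: str) -> int:
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(lcs_length("ABCBDAB", "BDCABA"))  # 4, e.g. "BCBA" or "BDAB"
```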

12.
When building a cluster-bus multiprocessor system based on the MIPS R10000 processor, it was found that the external-conflict resolution scheme given in the R10000 user manual applies only to single- or multi-processor systems that use a dedicated EA. In view of this, this paper introduces the system configuration of the R10000 and the coherence of its system interface, analyzes the limitations of the external-conflict resolution scheme given in the R10000 user manual, and, building on that scheme, studies external conflicts in cluster-bus multiprocessor systems and presents an external-conflict resolution scheme that the cluster coordinator can adopt.

13.
Event-trace-based performance analysis of parallel programs reveals performance problems by analyzing the event records collected on each processor, computing the execution times of program objects, and exploring the relationships between events. This work requires that the event timestamps collected on the different processors be comparable. For various reasons, the timestamps obtained by measurement are often not synchronized, which directly hampers performance analysis. This paper introduces the concept of processor clock error and its causes, the measurement errors in parallel program performance analysis, the clock conditions, and the requirements for timestamp synchronization, and finally presents a linear error interpolation technique based on a constant clock drift, which solves the timestamp synchronization problem in parallel program performance analysis to a certain extent.
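As a hedged illustration of linear error interpolation under a constant-drift assumption, the sketch below fits a linear mapping from a processor's local clock to a reference clock using two synchronization points and then corrects arbitrary local timestamps. The function names and the two-point fit are assumptions, not the technique exactly as described in the paper.

```python
# Minimal sketch: assuming a constant clock drift, a processor's local
# clock relates to the reference clock linearly, t_ref = a * t_local + b.
# Two (local, reference) synchronization samples determine a and b, and
# all other local event timestamps are corrected by interpolation.

def fit_clock_map(sync_start, sync_end):
    """Each argument is a (local_time, reference_time) pair taken at a
    synchronization point near the start/end of the measured run."""
    (l0, r0), (l1, r1) = sync_start, sync_end
    a = (r1 - r0) / (l1 - l0)   # 1 + relative drift
    b = r0 - a * l0             # initial offset
    return lambda t_local: a * t_local + b

to_ref = fit_clock_map((10.000, 10.020), (110.000, 110.031))
print(to_ref(60.000))  # local event mapped onto the reference timeline
```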

14.
We consider the problem of scheduling tasks on multiprocessor architectures in the presence of communication delays. Given a set of dependent tasks, the scheduling problem is to allocate the tasks to processors such that the pre-specified precedence constraints among the tasks are obeyed and certain cost measures (such as the computation time) are minimized. Several cases of the scheduling problem have been proven to be NP-complete. Nevertheless, there are polynomial time algorithms for interesting special cases of the general scheduling problem. Most of these results, however, do not take into consideration the delays due to message passing among processors. In this paper we study the increase in time complexity of scheduling problems due to the introduction of communication delays. In particular, we address the open problem of scheduling Out-forests (In-forests) in a multiprocessor system of m identical processors when communication delays are considered. The corresponding problem of scheduling Out-forests (In-forests) without communication delays admits an elegant polynomial time solution, as presented first by Hu in 1961; however, the problem in the presence of communication delays has remained unsolved. We present here the first known polynomial time algorithms for the computation of the optimal schedule when the number of available processors is given and bounded and both computation and communication delays are assumed to take one unit of time. Furthermore, we present a linear-time algorithm for computing a near-optimal schedule for unit-delay out-forests. The schedule's length exceeds the optimum by no more than (m-2) time units, where m is the number of processors. Hence for two processors the computed schedule is strictly optimal.

15.
An efficient strategy to allocate tasks to processors in distributed computer systems is required for proper utilization of the available computational power. The various task allocation models developed so far share the basic assumption that an application program is suitably divided into a number of subtasks or modules and that the execution costs of the modules and the intermodule communication costs are known. The subdivision of a given program into modules is assumed to be carried out by a software mechanism, which forms a separate discipline of research. The task allocation models developed earlier did not consider constraints such as load balancing, precedence relations, and timing constraints. Most recent work on the formulation of task allocation strategies takes a number of such constraints into account. This paper presents a brief outline of the various available models and algorithms for task allocation, with suggestions for possible future directions in the field.

16.
Benchmarking and Comparison of the Task Graph Scheduling Algorithms
The problem of scheduling a parallel program represented by a weighted directed acyclic graph (DAG) to a set of homogeneous processors for minimizing the completion time of the program has been extensively studied. The NP-completeness of the problem has stimulated researchers to propose a myriad of heuristic algorithms. While most of these algorithms are reported to be efficient, it is not clear how they compare against each other. A meaningful performance evaluation and comparison of these algorithms is a complex task and it must take into account a number of issues. First, most scheduling algorithms are based upon diverse assumptions, making the performance comparison rather meaningless. Second, there does not exist a standard set of benchmarks to examine these algorithms. Third, most algorithms are evaluated using small problem sizes, and, therefore, their scalability is unknown. In this paper, we first provide a taxonomy for classifying various algorithms into distinct categories according to their assumptions and functionalities. We then propose a set of benchmarks that are based on diverse structures and are not biased toward a particular scheduling technique. We have implemented 15 scheduling algorithms and compared them on a common platform by using the proposed benchmarks, as well as by varying important problem parameters. We interpret the results based upon the design philosophies and principles behind these algorithms, drawing inferences why some algorithms perform better than others. We also propose a performance measure called scheduling scalability (SS) that captures the collective effectiveness of a scheduling algorithm in terms of its solution quality, the number of processors used, and the running time.

17.
B. Indurkhya et al. (1986) concluded that the optimal partitioning of a homogeneous random program over a homogeneous distributed system either assigns all modules to a single processor or distributes the modules as evenly as possible among all processors. Their analysis rests heavily on the approximation that equates the expected maximum of a set of independent random variables with the set's maximum expectation. The author strengthens this result by providing an approximation-free proof for two processors under general conditions on the module execution time distribution. It is found that the additional rigor leads to a different characterization of the optimality points. The author also shows that, under a rigorous analysis, one is led to different conclusions in the general P-processor case than those reached using B. Indurkhya et al.'s approximation.
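To make the key approximation concrete: for independent random variables, the expected maximum generally exceeds the maximum of the expectations, and that gap is exactly what the approximation suppresses. The short check below is an illustrative numerical example with made-up distributions, not part of the paper.

```python
# Illustrative check (made-up data): E[max(X1,...,Xk)] generally exceeds
# max(E[X1],...,E[Xk]) for independent module execution times, which is
# the gap the criticized approximation ignores.
import random

random.seed(0)
k, trials = 4, 100_000           # 4 processors, exponential(1) loads
exp_of_max = sum(max(random.expovariate(1.0) for _ in range(k))
                 for _ in range(trials)) / trials
print(exp_of_max)                # about 1 + 1/2 + 1/3 + 1/4 = 2.083...
print(1.0)                       # the maximum of the expectations is 1
```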

18.
Simulation-Based Nonlinear Methods for Estimating ARM Program Execution Cycles
To estimate program execution time on an ARM processor quickly and accurately, the structure of a simulation-based nonlinear estimator of program execution time is studied. It consists of a program functional-profile generation module and a program execution-time prediction module connected in series. The functional-profile generation module is implemented directly with the accurate instruction simulator Sim-profile. Based on the nonlinear relationship between the dynamic instruction counts during program execution and the execution time on the processor, an artificial neural network scheme is first designed for the execution-time prediction module; then, in view of the limitations of artificial neural networks, such as the local-optimum problem, unsuitability for regression on small samples, and the dependence of the network topology on prior knowledge, a method based on least-squares support vector machines is proposed. Experiments show that these nonlinear methods, especially the one based on least-squares support vector machines, can achieve high accuracy in program execution-time estimation at a relatively low simulation cost.
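As a hedged sketch of the least-squares SVM regression step (the feature values, kernel choice, and hyperparameters below are assumptions, not the paper's setup), an LS-SVM can be trained by solving a single linear system:

```python
# Minimal LS-SVM regression with an RBF kernel: training reduces to one
# linear system in (b, alpha). Feature values and hyperparameters are
# made up for illustration; in the paper the inputs are dynamic
# instruction counts gathered by the Sim-profile simulator.
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    # Block system: [[0, 1^T], [1, K + I/gamma]] @ [b, alpha] = [0, y]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    b, alpha = sol[0], sol[1:]
    return lambda Xq: rbf_kernel(Xq, X, sigma) @ alpha + b

# Toy data: "instruction-count features" -> "cycles" (synthetic).
X = np.array([[1.0, 0.2], [2.0, 0.1], [3.0, 0.4], [4.0, 0.3]])
y = np.array([120.0, 210.0, 335.0, 420.0])
predict = lssvm_fit(X, y)
print(predict(np.array([[2.5, 0.25]])))
```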

19.
The communication and synchronization overhead inherent in parallel processing can lead to situations where adding processors to the solution method actually increases execution time. Problem type, problem size, and architecture type all affect the optimal number of processors to employ. In this paper we examine the numerical solution of an elliptic partial differential equation in order to study the relationship between problem size and architecture. The equation's domain is discretized into n² grid points, which are divided into partitions and mapped onto the individual processor memories. We analytically quantify the relationships among grid size, stencil type, partitioning strategy, processor execution time, and communication network type. In doing so, we determine the optimal number of processors to assign to the solution (and hence the optimal speedup), and identify (i) the smallest grid size which fully benefits from using all available processors, (ii) the leverage on performance given by increasing processor speed or communication network speed, and (iii) the suitability of various architectures for large numerical problems. Finally, we compare the predictions of our analytic model with measurements from a multiprocessor and find that the model accurately predicts performance.
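The abstract does not reproduce the analytic model itself. As a hedged stand-in, the sketch below uses a generic per-iteration cost for an n×n grid split into p strips (compute time proportional to n²/p plus a communication term that grows with p) and picks the p that minimizes the total, showing why small grids favor few processors. All constants and the strip-partition assumption are illustrative, not the paper's model.

```python
# Hypothetical per-iteration cost model for an n x n grid solved on p
# processors with a strip partition and a 5-point stencil, on a bus-like
# network where the p boundary exchanges serialize:
#   T(p) = (n*n/p) * t_point  +  p * t_startup  +  n * t_word
# All constants are illustrative only.

def iteration_time(n, p, t_point=1e-7, t_startup=1e-4, t_word=1e-8):
    compute = (n * n / p) * t_point
    comm = 0.0 if p == 1 else p * t_startup + n * t_word
    return compute + comm

def best_processor_count(n, p_max=128):
    return min(range(1, p_max + 1), key=lambda p: iteration_time(n, p))

for n in (64, 256, 1024):
    p = best_processor_count(n)
    print(f"n={n}: best p={p}, T={iteration_time(n, p):.2e} s")
```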

20.
This paper addresses the optimal mapping of parallel programs composed of a chain of data parallel tasks onto the processors of a parallel system. The input to the programs is a stream of data sets, each of which is processed in order by the chain of tasks. This computation structure, also referred to as a data parallel pipeline, is common in several application domains, including digital signal processing, image processing, and computer vision. The performance parameters for such stream processing are latency (the time to process an individual data set) and throughput (the aggregate rate at which data sets are processed). These two criteria are distinct, since multiple data sets can be pipelined or processed in parallel. The central contribution of this research is a new algorithm to determine a processor mapping for a chain of tasks that optimizes latency in the presence of a throughput constraint. We also discuss how this algorithm can be applied to solve the converse problem of optimizing throughput with a latency constraint. The problem formulation uses a general and realistic model of intertask communication and addresses the entire mapping problem, which includes clustering tasks into modules, assigning processors to modules, and possibly replicating modules. The main algorithms are based on dynamic programming, and their execution time complexity is polynomial in the number of processors and tasks. The entire framework is implemented as an automatic mapping tool in the Fx parallelizing compiler for a dialect of High Performance Fortran.
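To illustrate the flavor of such a dynamic program, the sketch below minimizes latency subject to a throughput constraint under strong simplifications: contiguous stages only, one processor per stage, a single fixed handoff cost between stages, and no replication. It is a minimal sketch under those assumptions, not the paper's general model.

```python
# Simplified DP sketch: split a chain of task times into at most n_proc
# contiguous stages, one processor per stage. Each stage after the first
# pays a fixed 'handoff' communication time; the throughput constraint
# requires every stage's busy time to fit within 'period' (~1/throughput),
# and latency is the sum of stage busy times. Replication and the paper's
# general communication model are omitted.
import math
from functools import lru_cache

def map_chain(task_times, n_proc, period, handoff=1.0):
    n = len(task_times)
    prefix = [0.0]
    for t in task_times:
        prefix.append(prefix[-1] + t)

    @lru_cache(maxsize=None)
    def best(i, k):
        """Minimum latency for tasks i..n-1 using at most k stages."""
        if i == n:
            return 0.0
        if k == 0:
            return math.inf
        overhead = handoff if i > 0 else 0.0
        res = math.inf
        for j in range(i + 1, n + 1):          # stage covers tasks i..j-1
            busy = prefix[j] - prefix[i] + overhead
            if busy > period:                  # would violate throughput
                break
            res = min(res, busy + best(j, k - 1))
        return res

    return best(0, n_proc)

print(map_chain([3.0, 1.0, 2.0, 2.0, 4.0], n_proc=3, period=6.0))  # 14.0
```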

