Similar Documents
20 similar documents found
1.
In a two-processor distributed computer network, prior research showed that a maximum flow algorithm can be used to find optimal program-module-to-processor assignments that maximize the performance of distributed programs. This paper examines the sequence of optimal assignments found as the load on one processor is held fixed and the load on the other is varied. For every program module M there exists a critical load factor f_M such that when the load on the variable-load processor is below f_M, an optimal assignment places M on that processor, and otherwise places it on the other processor. This characteristic opens the possibility of performing optimal dynamic assignments in real time.
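A minimal sketch of the max-flow formulation this line of work builds on (Stone's two-processor model): modules sit between two terminal nodes standing for the processors, and a minimum cut yields an optimal assignment. The module names, cost figures, and the use of networkx are illustrative assumptions, not data from the paper; varying the costs on one side with a load factor and re-solving reproduces the kind of assignment sequence the abstract analyses.

```python
import networkx as nx

# run_cost[m] = (execution cost of m on P1, execution cost of m on P2)
run_cost = {"A": (4, 10), "B": (6, 3), "C": (5, 5)}
# comm_cost[(m1, m2)] = cost paid only if m1 and m2 sit on different processors
comm_cost = {("A", "B"): 4, ("B", "C"): 2}

G = nx.DiGraph()
for m, (c1, c2) in run_cost.items():
    G.add_edge("P1", m, capacity=c2)   # this edge is cut iff m ends up on P2
    G.add_edge(m, "P2", capacity=c1)   # this edge is cut iff m ends up on P1
for (m1, m2), c in comm_cost.items():
    G.add_edge(m1, m2, capacity=c)     # cut iff the two modules are separated
    G.add_edge(m2, m1, capacity=c)

cut_value, (p1_side, p2_side) = nx.minimum_cut(G, "P1", "P2")
print("total cost:", cut_value)
print("on P1:", sorted(p1_side - {"P1"}), "| on P2:", sorted(p2_side - {"P2"}))
```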

2.
In this paper, we present a computation offloading scheme for handheld devices. The scheme partitions an ordinary program into a client–server distributed program, such that the client code runs on the handheld device and the server code runs on the server. Our partition analysis and program transformation guarantee correct distributed execution under all possible execution contexts. We give a polynomial-time algorithm to find the optimal program partition for given program input data, and use an option-clustering approach to handle different program partitions for different program execution options. Experimental results show significant improvements in performance and energy consumption on an HP IPAQ handheld device through computation offloading.

3.
An important issue for the efficient use of multiprocessor systems is the assignment of parallel processors to nested parallel loops. It is desirable for a processor assignment algorithm to be fast and to always generate an optimal processor assignment. The paper proposes two efficient algorithms to decide the optimal number of processors assigned to each individual loop, along with efficient parallel counterparts of both. These algorithms not only always generate an optimal processor assignment, but are also much faster than the existing optimal algorithm in the literature. The paper also discusses improving the performance of parallel execution by transforming a nested parallel loop into a semantically equivalent one; three loop transformations are investigated. It is observed that, in most cases, the parallel execution time improves after applying these transformations.

4.
Scheduling is a fundamental issue in achieving high performance on metacomputers and computational grids. For the first time, the job scheduling problem for grid computing on metacomputers is studied as a combinatorial optimization problem. A cost model is proposed for modeling communication heterogeneity on computational grids. A processor allocation algorithm is developed that always finds an optimal processor allocation minimizing the effective execution time of a job at the moment the job is scheduled. It is proven that the list scheduling (LS) algorithm achieves a reasonable worst-case performance bound in grid environments supporting distributed supercomputing with large applications. We compare the performance of various job scheduling and processor allocation algorithms for grid computing on metacomputers, evaluating 128 combinations of two job scheduling algorithms, four initial job ordering strategies, four processor allocation algorithms, and four metacomputers by extensive simulation. It is found that the combination of largest job first (LJF) initial job ordering with the minimum effective execution time (MEET) or largest machine first (LMF) processor allocation algorithm yields the best average-case performance, and that the choice between FCFS and LS depends on the range of job sizes. It is also observed that communication heterogeneity has a significant impact on schedule lengths.
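As a concrete illustration of the policies named above, here is a minimal list-scheduling sketch that orders jobs largest-first (LJF) and then sends each job to the machine minimizing its finish time, a simplified MEET-style rule. The machine sizes, job parameters, and the work/processors cost model are assumptions; the paper's cost model also captures communication heterogeneity, which is omitted here.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    size: int    # processors requested
    work: float  # total sequential work

machines = {"M1": 64, "M2": 32, "M3": 16}   # processors per machine (assumed)
free_at = {m: 0.0 for m in machines}        # time at which each machine becomes free

jobs = [Job("a", 48, 960.0), Job("b", 8, 80.0), Job("c", 24, 480.0)]
jobs.sort(key=lambda j: j.size, reverse=True)       # LJF initial ordering

schedule = []
for job in jobs:
    def effective_time(m):
        # simplified effective execution time: work spread over usable processors
        return job.work / min(job.size, machines[m])
    # MEET-style choice: minimize availability + effective execution time
    best = min(machines, key=lambda m: free_at[m] + effective_time(m))
    start = free_at[best]
    free_at[best] = start + effective_time(best)
    schedule.append((job.name, best, start, free_at[best]))

for entry in schedule:
    print(entry)
```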

5.
To improve the execution performance of homogeneous applications on computational grids, a sub-job assignment method is proposed. For computation-intensive applications, inter-task communication is negligible, so such a job is partitioned into several sub-jobs and the sub-jobs are assigned to different clusters; the partitioning is performed according to the load balance of the grid. Applications that are not computation-intensive rarely achieve satisfactory performance when executed across multiple sites, so such a job is assigned to a single cluster as a whole. To find the most suitable cluster, the processor performance and inter-processor communication performance of each cluster are measured, and the job running time is predicted with an application performance model. Experiments show that the sub-job assignment method is effective in optimizing the execution performance of homogeneous applications.
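A minimal sketch of the load-balance-driven split described above for a computation-intensive job: work is divided among clusters in proportion to each cluster's currently available computing power. The cluster names, capacity figures, and the simple proportional rule are illustrative assumptions, not the paper's measured performance model.

```python
total_work = 1000.0   # work units of the whole job

# available capacity = processors * per-processor speed * (1 - current load)
clusters = {
    "clusterA": 32 * 2.0 * (1 - 0.25),
    "clusterB": 16 * 3.0 * (1 - 0.50),
    "clusterC": 64 * 1.5 * (1 - 0.10),
}

capacity_sum = sum(clusters.values())
sub_jobs = {name: total_work * cap / capacity_sum for name, cap in clusters.items()}

for name, work in sub_jobs.items():
    # with a proportional split every sub-job finishes at about the same time
    print(f"{name}: {work:.1f} work units, est. time {work / clusters[name]:.2f}")
```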

6.
One of the major issues that need to be addressed in distributed memory multiprocessor (DMM) systems is the program task partitioning and scheduling problem, i.e. the mapping of an application program's precedence-related task threads onto the processing elements of a DMM system. The optimal task partitioning and scheduling problem, with the goal of minimizing the program execution time and interprocessor communication overhead, is known to be NP-complete. The paper addresses the design, development and performance evaluation of a novel static task partitioning and scheduling method called linear clustering with task duplication (LCTD). LCTD employs linear (sequential) execution of tasks and task-duplication heuristics to minimize computation and interprocessor communication delays in DMMs. The superiority of the proposed LCTD algorithm is demonstrated through simulation studies and comparison against several existing static scheduling schemes, such as heavy node first (HNF) and linear clustering. We show that the proposed method obtains an average improvement of 33% in program execution time and 21% in processor utilization compared to the linear clustering and HNF methods.

7.
The problem of distributing tasks to processors in a distributed computing system is addressed. A task should be assigned to a processor whose capabilities are most appropriate for executing that task, while excessive interprocessor communication is avoided. A simple algorithm for task allocation is presented. The execution costs and communication costs of the tasks are represented by arrays. A task is either assigned to a processor or fused with another task using a simple criterion, and the execution and communication costs are then modified accordingly. The process continues until all the tasks are assigned to processors. The algorithm also facilitates the incorporation of various system constraints, and is applicable to random program structures and to systems containing any number of processors.
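Below is a minimal sketch in the spirit of the assign-or-fuse loop described above: each step either binds a task to its cheapest processor or fuses it with its heaviest communication partner, after which the cost arrays are merged. The cost figures, the greedy criterion, and the bookkeeping details are illustrative assumptions rather than the paper's exact algorithm.

```python
# exec_cost[task][processor]; comm cost is paid only if the pair is split
exec_cost = {
    "t1": {"p1": 5, "p2": 9},
    "t2": {"p1": 8, "p2": 4},
    "t3": {"p1": 6, "p2": 6},
}
comm = {frozenset(("t1", "t3")): 7, frozenset(("t2", "t3")): 2}

assignment = {}
while exec_cost:
    task = next(iter(exec_cost))
    row = exec_cost.pop(task)
    # heaviest communication link to a task that is still unassigned
    links = []
    for pair, c in comm.items():
        if task in pair:
            other = next(iter(pair - {task}))
            if other in exec_cost:
                links.append((c, other))
    best_proc = min(row, key=row.get)
    saving = max(row.values()) - row[best_proc]   # benefit of assigning freely now
    if links and max(links)[0] > saving:
        # Fuse: the communication cost outweighs the assignment freedom, so merge
        # the two tasks and redirect their remaining links to the fused node.
        _, mate = max(links)
        mate_row = exec_cost.pop(mate)
        fused = task + "+" + mate
        exec_cost[fused] = {p: row[p] + mate_row[p] for p in row}
        merged = {}
        for pair, c in comm.items():
            if pair == frozenset((task, mate)):
                continue
            key = frozenset(fused if x in (task, mate) else x for x in pair)
            merged[key] = merged.get(key, 0) + c
        comm = merged
    else:
        for t in task.split("+"):                 # a fused group shares one processor
            assignment[t] = best_proc

print(assignment)
```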

8.
Modern cyber-physical systems assume a complex and dynamic interaction between the real world and the computing system in real time. In this context, changes in the physical environment trigger changes in the computational load to be executed. On the other hand, task migration services offered by networked control systems also require management of dynamic real-time computing load in nodes. In such systems it would be difficult, if not impossible, to analyse off-line all the possible combinations of processor loads. For this reason, it is worthwhile to define new flexible architectures that enable computing systems to adapt to potential changes in the environment. We assume a system composed of three main components: the first is responsible for managing the requests that arise when new tasks require execution. This management component asks the second component about the resources available to accept the new tasks. The second component performs a feasibility analysis to determine whether the new tasks can be accepted while meeting their real-time constraints; a new processor speed is also computed. A third component monitors the execution of tasks, applying a fixed-priority scheduling policy and additionally controlling the frequency of the processor. This paper focuses on the second component, providing a "correct" (a task is never accepted if it is not schedulable) and "near-exact" (a task is rarely rejected if it is schedulable) algorithm that is applicable in practice because of its low-to-medium and predictable computational cost. The algorithm analyses task admission in terms of processor frequency scaling. The paper presents the details of a novel algorithm for analysing task admission and processor frequency assignment. Additionally, we perform several simulations to evaluate the comparative performance of the proposed approach, in terms of energy consumption, task rejection ratios, and real computing costs. The simulation results show that, from the cost, execution predictability, and task acceptance points of view, the proposed algorithm mostly outperforms other constant voltage scaling algorithms.
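The following is a minimal sketch of an admission test of this flavour, assuming rate-monotonic priorities and using the Liu & Layland utilization bound as a sufficient (hence "correct", though not near-exact) test; the lowest frequency at which the augmented task set still passes is selected. The frequency set, task parameters, and the bound-based test are assumptions, not the paper's algorithm.

```python
from dataclasses import dataclass

@dataclass
class Task:
    wcet_at_fmax: float   # worst-case execution time at the maximum frequency
    period: float

FREQS = [0.4, 0.6, 0.8, 1.0]          # normalized available frequencies (assumed)

def admit(current: list, new: Task) -> tuple:
    """Return (accepted, chosen frequency); prefer the lowest feasible frequency."""
    tasks = current + [new]
    n = len(tasks)
    ll_bound = n * (2 ** (1 / n) - 1)  # Liu & Layland bound for n tasks
    for f in FREQS:                    # lowest frequency first -> least energy
        utilization = sum(t.wcet_at_fmax / f / t.period for t in tasks)
        if utilization <= ll_bound:    # sufficient test: never accepts an unschedulable set
            return True, f
    return False, FREQS[-1]

running = [Task(1.0, 10.0), Task(2.0, 20.0)]
print(admit(running, Task(1.5, 15.0)))   # -> (True, 0.4) with these illustrative numbers
```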

9.
One of the challenges for parallel compilers and compiler-related tools is, given a machine-independent parallel language, to generate executable code for a variety of computational models, and to identify those specific parallel modes for which a program is well-suited. One portion of this problem, developing a method for estimating the relative execution time of a data-parallel algorithm in an environment capable of the SIMD and SPMD (MIMD) modes of parallelism, is presented. Given a data-parallel program in a language whose syntax is mode-independent and empirical information about instruction execution time characteristics, the goal is to use static source-code analysis to determine an implementation that results in an optimal execution time for a mixed-mode machine capable of SIMD and SPMD parallelism. Statistical information about individual operation execution times and paths of execution through a parallel program is assumed. A secondary goal of this study is to indicate language, algorithm, and machine characteristics that must be researched to learn how to provide the information needed to obtain an optimal assignment of parallel modes to program segments.

10.
Programming models for distributed systems often construct a task graph for the program to be executed on a distributed system of processors. While the topology of the task graph can be constructed from the program structure, the task execution times and data transfer costs between tasks often depend on the input data, or more specifically, on the particular problem instance. Though this indicates that the optimal schedule of a task graph cannot be determined until the input data is available, it is possible to estimate the worst-case processor requirement for the optimal schedule of a program solely from the topology of its task graph. In this paper, we study the problem of estimating worst-case processor requirements for scheduling (with cloning) layered task graphs based on their topology. We show that computing an accurate processor bound for layered graphs is NP-hard (even for two layers) and present a polynomial-time algorithm which computes an upper bound on the processor requirement. We show that the algorithm provides tight bounds for several common classes of layered task graphs.

11.
On clustered VLIW DSPs, instruction clustering is a compiler optimization with a significant impact on program performance, but existing instruction clustering algorithms can only handle sequential program regions and have difficulty obtaining the best clustering scheme. To address these problems, a unified instruction clustering and instruction scheduling method based on integer linear programming is proposed. The method uses zero-one decision variables to represent the clustering of the instructions in a function, the local scheduling of instructions, and the global scheduling of inter-cluster transfer instructions, and encodes the dependences between instructions and the contention for processor resources as linear constraints, yielding an integer linear programming model whose objective is to minimize the estimated execution time of the function. Experimental results show that the clustering and scheduling scheme obtained by solving the model optimizes program performance significantly better than existing algorithms, and that the time spent solving the model is acceptable.
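A minimal 0-1 ILP sketch for the clustering part alone (the paper's model also folds in local scheduling and inter-cluster transfer instructions, which is far more detailed): x[i][c] = 1 iff instruction i goes to cluster c, and the objective trades off cluster load imbalance against dependence edges that cross clusters. The instruction names, dependence edges, and the use of PuLP are assumptions for illustration.

```python
import pulp

instrs = ["i1", "i2", "i3", "i4", "i5", "i6"]
deps = [("i1", "i3"), ("i2", "i3"), ("i3", "i5"), ("i4", "i6")]  # producer -> consumer
clusters = [0, 1]

prob = pulp.LpProblem("instruction_clustering", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (instrs, clusters), cat="Binary")
split = pulp.LpVariable.dicts("split", list(range(len(deps))), cat="Binary")
load_max = pulp.LpVariable("load_max", lowBound=0)

for i in instrs:                                   # each instruction on exactly one cluster
    prob += pulp.lpSum(x[i][c] for c in clusters) == 1
for c in clusters:                                 # load_max bounds every cluster's load
    prob += pulp.lpSum(x[i][c] for i in instrs) <= load_max
for k, (a, b) in enumerate(deps):                  # a dependence is "split" if its ends differ
    for c in clusters:
        prob += split[k] >= x[a][c] - x[b][c]

# rough proxy for estimated execution time: issue pressure + inter-cluster transfers
prob += load_max + pulp.lpSum(split.values())

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({i: next(c for c in clusters if x[i][c].value() > 0.5) for i in instrs})
```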

12.
Distributed decision-making is one of the key technologies for improving the autonomy of a group. Against the background of unmanned search aerial vehicles (USAVs) and unmanned combat aerial vehicles (UCAVs) performing cooperative search and attack over a grey target area, a distributed task allocation model is established that accounts for multiple constraints such as local chain communication, UAV flight performance, and task execution capability; based on Bayes' theorem, the continuous and discrete uncertain quantities of the task space are quantified as task reward values. A consensus-coordination-based online cooperation strategy is then proposed, and a conflict resolution rule is established using consensus coordination theory. On this basis, a distributed task allocation algorithm is designed that enables fast cooperative multi-task allocation for multiple USAVs and UCAVs. Finally, numerical simulations verify the feasibility and speed of the proposed algorithm in solving task allocation problems over an uncertain space.

13.
Evolutionary Algorithms for Allocating Data in Distributed Database Systems
A major cost in executing queries in a distributed database system is the data transfer cost incurred in transferring relations (fragments) accessed by a query from different sites to the site where the query is initiated. The objective of a data allocation algorithm is to determine an assignment of fragments at different sites so as to minimize the total data transfer cost incurred in executing a set of queries. This is equivalent to minimizing the average query execution time, which is of primary importance in a wide class of distributed conventional as well as multimedia database systems. The data allocation problem, however, is NP-complete, and thus requires fast heuristics to generate efficient solutions. Furthermore, the optimal allocation of database objects highly depends on the query execution strategy employed by a distributed database system, and the given query execution strategy usually assumes an allocation of the fragments. We develop a site-independent fragment dependency graph representation to model the dependencies among the fragments accessed by a query, and use it to formulate and tackle data allocation problems for distributed database systems based on query-site and move-small query execution strategies. We have designed and evaluated evolutionary algorithms for data allocation for distributed database systems.
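A minimal genetic-algorithm sketch for the fragment allocation problem described above: a chromosome maps each fragment to a site, and fitness is the total data shipped for a fixed query workload under the simplification that a fragment stored away from the query site must be transferred whole. The fragment sizes, query frequencies, and GA parameters are assumptions; the paper's evaluation uses its fragment dependency graph and the query-site and move-small strategies.

```python
import random

random.seed(1)
SITES = 3
frag_size = [40, 10, 25, 5]                      # MB per fragment (assumed)
queries = [                                       # (originating site, fragments used, frequency)
    (0, [0, 1], 30), (1, [1, 2], 20), (2, [2, 3], 25),
]

def cost(alloc):                                  # total MB shipped for the workload
    return sum(freq * sum(frag_size[f] for f in frags if alloc[f] != site)
               for site, frags, freq in queries)

def mutate(alloc):
    a = alloc[:]
    a[random.randrange(len(a))] = random.randrange(SITES)
    return a

def crossover(a, b):                              # single-point crossover
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

pop = [[random.randrange(SITES) for _ in frag_size] for _ in range(20)]
for _ in range(100):
    pop.sort(key=cost)
    parents = pop[:10]                            # simple truncation selection
    pop = parents + [mutate(crossover(random.choice(parents), random.choice(parents)))
                     for _ in range(10)]

best = min(pop, key=cost)
print("allocation:", best, "cost:", cost(best))
```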

14.
To solve the load imbalance problem of a solution-adaptive finite element application program on a distributed memory multicomputer, nodes of a refined finite element graph can be remapped to processors, or the load of a refined finite element graph can be redistributed based on the current load of each processor. In the former case, remapping can be performed by a fast mapping algorithm; in the latter case, a load-balancing algorithm can be applied to balance the computational load of the processors. In this paper, three tree-based parallel load-balancing methods, the MCSTLB method, the BTLB method, and the CBTLB method, are proposed to deal with the load imbalance problems of solution-adaptive finite element application programs. To evaluate the performance of the proposed methods, we have implemented them along with three mapping methods, the AE/ORB method, the AE/MC method, and the MLkP method, on an SP2 parallel machine. Three criteria are used for the performance evaluation: the execution time of the mapping/load-balancing methods, the execution time of a solution-adaptive finite element application program under different mapping/load-balancing methods, and the speedups achieved by the mapping/load-balancing methods for a solution-adaptive finite element application program. The experimental results show that 1) if the initial mapping is performed by a mapping method, and either the same mapping method or a load-balancing method is used in each refinement to balance the load of the processors, the execution time of an application program under a load-balancing method is always shorter than that under the mapping method, and 2) the execution time of an application program under the CBTLB method is shorter than that under the BTLB method and the MCSTLB method.

15.
The problem of finding an optimal dynamic assignment of a modular program for a two-processor system is analyzed. Stone's formulation of the static assignment problem is extended to include the cost of dynamically reassigning a module from one processor to the other and the cost of module residence without execution. By relocating modules during the course of program execution, changes in the locality of the program can be taken into account. It is shown that network flow algorithms may be used to find a dynamic assignment that minimizes the sum of module execution costs, module residence costs, intermodule communication costs, and module reassignment costs. Techniques for reducing the size of the problem are described for the case where the costs of residence are negligible.
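To illustrate the dynamic ingredient above in isolation, here is a deliberately simplified sketch: with intermodule communication and residence costs ignored, the cheapest sequence of per-phase placements for a single module (per-phase execution cost on each processor plus a reassignment cost for moving between phases) reduces to a two-state dynamic program. The paper's network-flow formulation handles the full problem including communication; the numbers and the single-module simplification here are assumptions.

```python
exec_cost = [            # exec_cost[phase] = (cost on P1, cost on P2)
    (2.0, 6.0),
    (5.0, 1.0),
    (4.0, 1.5),
    (1.0, 7.0),
]
REASSIGN = 2.5           # cost of moving the module between processors

# best[p] = cheapest total cost so far with the module currently on processor p
best = list(exec_cost[0])
for c1, c2 in exec_cost[1:]:
    best = [
        c1 + min(best[0], best[1] + REASSIGN),   # stay on / move to P1
        c2 + min(best[1], best[0] + REASSIGN),   # stay on / move to P2
    ]
print("minimum total cost:", min(best))          # 10.5 with these numbers
```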

16.
To efficiently execute a finite element program on a 2D torus, we need to map nodes of the corresponding finite element graph to processors of the torus such that each processor has approximately the same amount of computational load and the communication among processors is minimized. If the nodes of a finite element graph do not increase during the execution of a program, the mapping only needs to be performed once. However, if a finite element graph is solution-adaptive, that is, its nodes increase discretely due to the refinement of some finite elements during the execution of a program, a dynamic load-balancing algorithm has to be performed many times in order to balance the computational load of the processors while keeping the communication cost as low as possible. In this paper we propose a parallel dynamic load-balancing algorithm (LB) to deal with the load-imbalance problem of a solution-adaptive finite element program on a 2D torus. The algorithm uses an iterative approach to achieve load balancing. We have implemented the proposed algorithm along with two parallel mapping algorithms, parallel orthogonal recursive bisection (ORB) and parallel recursive mincut bipartitioning (MC), on a simulated 2D torus. Three criteria are used for performance evaluation: the execution time of the load-balancing algorithms, the computation time of an application program under different load-balancing algorithms, and the total execution time of an application program (over several refinement phases). Simulation results show that (1) the execution of LB is faster than those of MC and ORB; (2) the mappings produced by LB are better than those of ORB and MC; and (3) the speedups achieved by LB are better than those of ORB and MC.
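A minimal sketch of iterative nearest-neighbour load balancing on a 2D torus, in the spirit of the iterative approach mentioned above: in each round every processor exchanges a fixed fraction of its load difference with its four torus neighbours. The diffusion coefficient, grid size, and initial load distribution are illustrative assumptions, not the paper's exact LB scheme.

```python
R, C = 4, 4
ALPHA = 0.2                       # fraction of each load difference moved per round
load = [[0.0] * C for _ in range(R)]
load[0][0] = 160.0                # one heavily refined region to start with

for _ in range(50):
    new = [row[:] for row in load]
    for i in range(R):
        for j in range(C):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = (i + di) % R, (j + dj) % C      # torus wrap-around
                new[i][j] += ALPHA * (load[ni][nj] - load[i][j])
    load = new

# loads converge toward the average (10.0 per processor here)
print([[round(v, 1) for v in row] for row in load])
```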

17.
This paper examines the relative effectiveness of fixed priority pre-emptive scheduling in a uniprocessor system, compared to an optimal algorithm such as Earliest Deadline First (EDF). The quantitative metric used in this comparison is the processor speedup factor, equivalent to the factor by which processor speed needs to increase to ensure that any taskset that is schedulable according to an optimal scheduling algorithm can be scheduled using fixed priority pre-emptive scheduling, assuming an optimal priority assignment policy.
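A worked illustration of the speedup-factor metric for the special case of implicit-deadline tasks, using the standard utilization-bound argument rather than the paper's general analysis: EDF schedules any such taskset with utilization U ≤ 1, while rate-monotonic fixed priorities are guaranteed up to the Liu & Layland bound n(2^(1/n) - 1), which decreases toward ln 2; running the processor 1/ln 2 ≈ 1.443 times faster therefore suffices for this task model.

```python
import math

def liu_layland_bound(n: int) -> float:
    # sufficient utilization bound for rate-monotonic scheduling of n tasks
    return n * (2 ** (1 / n) - 1)

for n in (2, 5, 10, 100):
    print(n, round(liu_layland_bound(n), 4))

print("limit ln 2 =", round(math.log(2), 4),
      "-> sufficient speedup factor", round(1 / math.log(2), 4))
```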

18.
Time Warp is an optimistic protocol for synchronizing parallel discrete event simulations. To achieve performance in a multiuser network of workstations (NOW) environment, Time Warp must continue to operate efficiently in the presence of external workloads caused by other users, processor heterogeneity, and irregular internal workloads caused by the simulation model. These performance problems can cause a Time Warp program to become grossly unbalanced, resulting in slower execution. The key observation asserted in this article is that each of these performance problems, while different in source, has a similar manifestation. For a Time Warp program to be balanced, the amount of wall clock time necessary to advance an LP one unit of simulation time should be about the same for all LPs. Using this observation, we devise a single algorithm that mitigates these performance problems and enables the "background" execution of Time Warp programs on heterogeneous distributed computing platforms in the presence of external as well as irregular internal workloads.

19.
In distributed systems, an application program is divided into several software modules, which need to be allocated to processors connected by communication links. The distributed system reliability (DSR) can be defined as the probability of successfully completing the distributed program. Previous studies on optimal task allocation with respect to DSR focused on the effects of the inter-connectivity of processors, the failure rates of the processors, and the failure rates of the communication links. We are the first to study the effects of module software reliabilities and module execution frequencies on the optimal task allocation. By viewing each module as a state in a Markov process, we build a task allocation decision model to maximize DSR for distributed systems with a 100% reliable network. In this model, the DSR is derived from the module software reliabilities, the processor hardware reliabilities, the transition probabilities between modules, and the task allocation matrix. Resource constraints of memory space limitation and computation load limitation on each processor are considered, as is a constraint on total system cost, including the execution cost, the communication cost, and the failure cost. We solve the problem by constraint programming using the ILOG SOLVER library. We then apply the proposed model to a case extended from previous studies. Finally, a sensitivity analysis is performed to verify the effects of module software reliabilities and processor hardware reliabilities on the DSR and on the task allocation decision.

20.
Distributed execution of simulation models comes into play when memory limitations of a single computational resource prohibit their execution. In addition, parallel execution of a model on a distributed platform, through the integration of multiple computational cores, can potentially reduce the execution time of a simulation. However, such gains can be voided by the overhead that time synchronization protocols for parallel and distributed simulation induce. This overhead is determined by the protocol used, the characteristics of the simulation model, and the architectural and performance characteristics of the hardware platform used. Recently, Infrastructure-as-a-Service offerings in the cloud computing domain have introduced flexibility in acquiring access to virtualized hardware platforms on a pay-as-you-go basis. At present, it is however unclear to what extent these offerings are suited for the distributed execution of discrete-event simulations, and how the characteristics of different resource types impact the performance of distributed simulation under different time synchronization protocols. Likewise, it is unclear which type of resource is most cost-efficient for this type of workload. To our knowledge, this paper is the first to investigate these aspects through an assessment of the performance and cost efficiency of different conservative time synchronization protocols on a range of cloud resource types that are currently available on Amazon EC2. Our analysis shows that performance levels comparable to those realized on commodity-hardware-based clusters are attainable, and that the relative performance of different synchronization protocols is retained on high-end IaaS resources. In terms of cost efficiency, we find that IaaS products tailored to traditional cluster workloads do not necessarily constitute the optimal choice, and we assess the impact of different packing configurations for logical processes in this regard.
