Similar Documents
20 similar documents were retrieved.
1.
The high energy consumption of modern processors, which increases heat dissipation and reduces system reliability, has become a major concern in computing. However, most existing techniques for reducing energy consumption target uniprocessor systems and rarely consider multiprocessors. The scheduling algorithms proposed here target multiprocessor computing environments: building on scheduling that gives priority to the task with the shortest execution time, combined with other effective techniques (shared slack reclamation), they allow real-time tasks to finish within their deadlines while effectively reducing the energy consumption of the whole system. For independent task sets and for task sets with dependencies, two algorithms for homogeneous computing environments are proposed, STFBA1 (Shortest-Task-First-Based Algorithm) and STFBA2, along with two algorithms for multiple task sets, HSA1 (Hybrid Scheduling Algorithm) and HSA2. In the single-task-set setting the algorithms outperform the best known algorithms in both schedule length and energy consumption; in the multiple-task-set setting the hybrid scheduling strategy clearly improves scheduling performance.

2.
Energy-Aware Dynamic Scheduling Algorithms for Real-Time Multiprocessor Systems
The high energy consumption of modern processors, which increases heat dissipation and reduces system reliability, has become a major concern in computing. However, most existing techniques for reducing energy consumption target uniprocessor systems and rarely consider multiprocessors. The scheduling algorithms proposed in this paper target multiprocessor systems: building on shortest-task-first scheduling and combining it with other effective techniques such as shared slack reclamation, they allow real-time tasks to finish within their deadlines while effectively reducing the energy consumption of the whole system. For independent task sets and for task sets with dependencies, two algorithms are proposed: STFBA1 and STFBA2 (Shortest-Task-First-Based Algorithms). Compared with the best known algorithms, ours achieve better performance in both schedule length and energy consumption.
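
As a rough illustration of the general idea behind shortest-task-first scheduling combined with shared slack reclamation, here is a minimal Python sketch. It is a sketch of the concept only, not the STFBA1/STFBA2 algorithms themselves; the function name, task fields, frequency bounds, and the single shared slack pool are assumptions.

```python
# Sketch: shortest-task-first dispatch with shared slack reclamation on a
# DVS-capable multiprocessor. Illustrative only; not the paper's algorithms.
import heapq

def schedule_with_slack_reclamation(tasks, num_procs, f_max=1.0, f_min=0.4):
    """tasks: list of (wcet, actual, deadline); times are seconds at speed f_max."""
    ready = sorted(tasks, key=lambda t: t[0])          # shortest worst-case time first
    procs = [(0.0, p) for p in range(num_procs)]       # (time the processor becomes free, id)
    heapq.heapify(procs)
    shared_slack = 0.0
    plan = []
    for wcet, actual, deadline in ready:
        free_at, p = heapq.heappop(procs)
        # Stretch the task over its worst-case budget plus any shared slack,
        # but never past its deadline and never below the minimum speed.
        budget = min(wcet + shared_slack, max(deadline - free_at, wcet))
        speed = max(f_min, min(f_max, wcet / budget))
        elapsed = actual / speed                       # real work takes longer at lower speed
        shared_slack = max(0.0, budget - elapsed)      # leftover time feeds the next task
        finish = free_at + elapsed
        plan.append((p, free_at, finish, speed))
        heapq.heappush(procs, (finish, p))
    return plan
```

The point of the sketch is that time left over when a task finishes earlier than its worst case is pooled and used to slow down later tasks, trading unused schedule slack for lower energy.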

3.
The development of computing systems with large numbers of processors has been motivated primarily by the need to solve large, complex problems more quickly than is possible with uniprocessor systems. Traditionally, multiprocessor systems have been uniprogrammed, i.e., dedicated to the execution of a single set of related processes, since this approach provides the fastest response for an individual program once it begins execution. However, if the goal of a multiprocessor system is to minimize average response time or to maximize throughput, then multiprogramming must be considered. In this paper, a model of a simple multiprocessor system with a two-program workload is reviewed; the model is then applied to an Intel iPSC/2 hypercube multiprocessor with a workload consisting of parallel wavefront algorithms for solving triangular systems of linear equations. Throughputs predicted by the model are compared with throughputs obtained experimentally from an actual system. The results provide validation for the model and indicate that significant performance improvements for multiprocessor systems are possible through multiprogramming.

4.
Both parallel and distributed network environment systems play a vital role in the improvement of high performance computing. Of primary concern when analyzing these systems is multiprocessor task scheduling. Therefore, this paper addresses the challenge of scheduling the tasks of parallel programs, represented as a directed acyclic task graph (DAG), for execution on multiprocessors with communication costs. Moreover, we investigate an alternative paradigm based on genetic algorithms (GAs), a class of robust stochastic search algorithms for various combinatorial optimization problems that has recently received much attention. We design a new encoding mechanism with a multi-functional chromosome that uses the priority representation, the so-called priority-based multi-chromosome (PMC). PMC can efficiently represent a task schedule and assign tasks to processors. The proposed priority-based GA has shown effective performance for scheduling in various parallel environments.
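
A minimal sketch of how a priority-based chromosome might be decoded into a schedule on a DAG with communication costs. This is illustrative only; the function and field names (`decode`, `priorities`, `proc_genes`, `exec_cost`, `comm_cost`) are assumptions, not the paper's exact PMC encoding.

```python
# Sketch: decode one chromosome (per-task priorities plus processor genes) into a
# schedule and return its makespan. Illustrative only; not the paper's PMC decoder.
def decode(priorities, proc_genes, dag, exec_cost, comm_cost, num_procs):
    """dag: {task: set(predecessors)}; exec_cost[task]; comm_cost[(u, v)]."""
    done, finish, proc_free = {}, {}, [0.0] * num_procs
    remaining = set(dag)
    while remaining:
        ready = [t for t in remaining if dag[t] <= set(done)]
        t = max(ready, key=lambda x: priorities[x])        # highest-priority ready task
        p = proc_genes[t] % num_procs                      # processor chosen by the chromosome
        # Data from predecessors on other processors arrives after a communication delay.
        data_ready = max((finish[u] + (comm_cost[(u, t)] if done[u] != p else 0.0)
                          for u in dag[t]), default=0.0)
        start = max(proc_free[p], data_ready)
        finish[t] = start + exec_cost[t]
        proc_free[p] = finish[t]
        done[t] = p
        remaining.remove(t)
    return max(finish.values())                            # makespan = fitness to minimize
```

A GA would then evolve the `priorities` and `proc_genes` vectors, using the returned makespan as the fitness value.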

5.
Scheduling program tasks on processors is at the core of the efficient use of multiprocessor systems. Most task-scheduling problems are known to be NP-hard and, thus, heuristics are the method of choice in all but the simplest cases. The utilization of acknowledged sets of benchmark-problem instances is essential for the correct comparison and analysis of heuristics. Yet, such sets are not available for several important classes of scheduling problems, including the multiprocessor scheduling problem with communication delays (MSPCD), in which one is interested in scheduling dependent tasks onto homogeneous multiprocessor systems, with processors connected in an arbitrary way, while explicitly accounting for the time required to transfer data between tasks allocated to different processors. We propose test-problem instances for the MSPCD that are representative in terms of number of processors, type of multiprocessor architecture, number of tasks to be scheduled, and task graph characteristics (task execution times, communication costs, and density of dependencies between tasks). Moreover, we define our task-graph generators in a way appropriate to ensure that the corresponding problem instances obey the theoretical principles recently proposed in the literature.

6.
It has been difficult to develop simple formulations to predict the execution time of parallel programs due to the complexity of characterizing parallel hardware and software. In an attempt to clarify these characterizations, we introduce a methodology for applying a simple performance model based on Amdahl's law. Our formulation results in accurate predictions of execution time on available systems, allowing programmers to select the optimal number of processors to apply to a particular problem or to select an appropriate problem size for the number of processors available. In short, we accurately quantify the scalability of a specific algorithm when it is run on a specific parallel computer. Our predictions are based on simple experiments that characterize machine performance and on a simple analysis of the parallel program. We illustrate our method for a program executed on a Sequent Symmetry multiprocessor with 20 processors. Our predictions closely match experimental results, differing by no more than 5% from the actual execution times. Our results illustrate key performance limitations of parallel systems, showing the impact of overhead and the scaling of problem size.
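
For reference, a generic Amdahl-style model of the kind this abstract refers to. The overhead term and its form are assumptions for illustration, not necessarily the paper's exact formulation.

```latex
% Generic Amdahl-style execution-time model: a serial fraction s, the remaining
% work split over p processors, plus an overhead term o(p) (the overhead term is
% an assumption, not necessarily part of the paper's model).
T(p) = T(1)\left( s + \frac{1 - s}{p} \right) + o(p),
\qquad
\mathrm{Speedup}(p) = \frac{T(1)}{T(p)} \;\le\; \frac{1}{s + \frac{1 - s}{p}}.
```

Fitting the serial fraction and the overhead term from a few timing runs then lets one predict execution time for other processor counts or problem sizes, which is the kind of prediction the abstract describes.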

7.
Using a directed acyclic graph (dag) model of algorithms, we investigate precedence-constrained multiprocessor schedules for the n×n×n directed mesh. This cubical mesh is fundamental, representing the standard algorithm for square matrix product, as well as many other algorithms. Its completion requires at least 3n-2 multiprocessor steps. Time-minimal multiprocessor schedules that use as few processors as possible are called processor-time-minimal. For the cubical mesh, such a schedule requires at least ⌈3n²/4⌉ processors. Among such schedules, one with the minimum period (i.e., maximum throughput) is referred to as a period-processor-time-minimal schedule. The period of any processor-time-minimal schedule for the cubical mesh is at least 3n/2 steps. This lower bound is shown to be exact by constructing, for n a multiple of 6, a period-processor-time-minimal multiprocessor schedule that can be realized on a systolic array whose topology is a toroidally connected n/2×n/2×3 mesh.
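
Collected in one place, the bounds quoted in this abstract for the n×n×n cubical mesh are:

```latex
% Lower bounds stated in the abstract for the n x n x n cubical mesh:
T \;\ge\; 3n - 2 \ \text{steps (any schedule)}, \qquad
P \;\ge\; \left\lceil \tfrac{3n^{2}}{4} \right\rceil \ \text{processors (time-minimal schedules)}, \qquad
\text{period} \;\ge\; \tfrac{3n}{2} \ \text{steps (processor-time-minimal schedules)}.
```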

8.
This paper proposes a routing algorithm for the interconnection of multiple processors based on the shortest-path and deflection-routing principles. The routing algorithm, named SPDRA (Shortest Path and Deflection Routing Algorithm), is applied to multiprocessor systems with a single-stage shuffle physical topology. SPDRA is general-purpose, as opposed to the majority of routing algorithms for multiprocessor systems, which are optimized for particular traffic patterns generated by a restricted class of parallel algorithms. The general-purpose nature of SPDRA allows performance comparisons with a wide class of routing algorithms for multiprocessor systems that, similar to the single-stage shuffle physical topology, have a fixed node-to-processor ratio. The paper compares SPDRA with hypercube algorithms for bidimensional meshes and torus physical topologies, routing algorithms for hierarchical tridimensional tori, and algorithms for routing permutations in shuffle networks, which constitute the most widely accepted approaches for multiprocessor interconnection. SPDRA exhibits a performance advantage for a broad range of network sizes and, in general, the performance advantage grows as the number of processors increases. However, this paper compares the SPDRA algorithm against a limited set of multiprocessor systems and does not demonstrate a general superiority of SPDRA over all systems with a fixed node-to-processor ratio and, especially, with a growing node-to-processor ratio, such as multistage networks.

9.
An important issue for the efficient use of multiprocessor systems is the assignment of parallel processors to nested parallel loops. It is desirable for a processor assignment algorithm to be fast and always generate an optimal processor assignment. The paper proposes two efficient algorithms to decide the optimal number of processors assigned to each individual loop. Efficient parallel counterparts of these two algorithms are also presented. These algorithms not only always generate an optimal processor assignment, but also are much faster than the existing optimal algorithm in the literature. The paper discusses improving the performance of parallel execution by transforming a nested parallel loop into a semantically equivalent one. Three loop transformations are investigated. It is observed that, in most cases, the parallel execution time is improved after applying these transformations.

10.
The multiprocessor scheduling problem is the problem of scheduling the tasks of a precedence-constrained task graph (representing a parallel program) onto the processors of a multiprocessor in a way that minimizes the completion time. Since this problem is known to be NP-hard in the strong sense in all but a few very restricted cases, heuristic algorithms are being developed which obtain near-optimal schedules in a reasonable amount of computation time. We present an efficient heuristic algorithm for scheduling precedence-constrained task graphs with nonnegligible intertask communication onto multiprocessors, taking contention in the communication channels into consideration. Our algorithm for obtaining satisfactory suboptimal schedules is based on the classical list scheduling strategy. It simultaneously exploits the schedule holes generated in the processors and in the communication channels during the scheduling process in order to produce better schedules. We demonstrate the effectiveness of our algorithm by comparing it with two competing heuristic algorithms available in the literature.
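
A minimal sketch of the hole-filling (insertion) idea on the processor side; the paper additionally exploits holes in the communication channels, which is not shown. The data structures and function names here are assumptions for illustration.

```python
# Sketch: insertion-based list scheduling places a task into the earliest idle gap
# ("schedule hole") that is long enough, instead of only appending at the end.
def earliest_slot(proc_schedule, est, duration):
    """proc_schedule: sorted list of (start, end) busy intervals; est: earliest start time."""
    prev_end = 0.0
    for start, end in proc_schedule:
        gap_start = max(prev_end, est)
        if gap_start + duration <= start:       # the task fits inside this hole
            return gap_start
        prev_end = end
    return max(prev_end, est)                   # otherwise append after the last task

def insert_task(proc_schedule, start, duration):
    proc_schedule.append((start, start + duration))
    proc_schedule.sort()
```

An insertion-based list scheduler calls `earliest_slot` for each candidate processor and picks the one that yields the earliest finish time for the task at hand.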

11.
The n-dimensional grid is one of the most representative patterns of data flow in parallel computation. Many scientific algorithms, which require nearest-neighbor communication in a lattice space, are modeled by a task graph with the properties of a simple or enhanced grid. The two most frequently used scheduling models for grids are the unit execution time–zero communication delay (UET) and the unit execution time–unit communication time (UET-UCT). In this paper we introduce an enhanced model of the n-dimensional grid by adding extra diagonal edges and allowing unequal boundaries for each dimension. For this generalized grid topology we establish the optimal makespan for both cases of UET/UET-UCT grids. Then we give a closed formula that calculates the minimum number of processors required to achieve the optimal makespan. Finally, we propose a low-complexity optimal time and processor scheduling strategy for both cases.

12.
An admissible multiprocessor preemptive scheduling problem is solved for the given execution intervals. In addition, a number of generalizations are considered—interprocessor communications are arbitrary and may vary in time; costs for processing interruptions and switches from one processor to another are taken into account; and besides the processors, additional resources are used. Algorithms based on reducing the original problem to finding paths of a specific length in a graph, a flow problem, and an integer system of linear restrictions are developed.

13.
Multiprocessor execution of functional programs
Functional languages have recently gained attention as vehicles for programming in a concise and elegant manner. In addition, it has been suggested that functional programming provides a natural methodology for programming multiprocessor computers. This paper describes research that was performed to demonstrate that multiprocessor execution of functional programs on current multiprocessors is feasible, and results in a significant reduction in their execution times. Two implementations of the functional language ALFL were built on commercially available multiprocessors. Alfalfa is an implementation on the Intel iPSC hypercube multiprocessor, and Buckwheat is an implementation on the Encore Multimax shared-memory multiprocessor. Each implementation includes a compiler that performs automatic decomposition of ALFL programs and a run-time system that supports their execution. The compiler is responsible for detecting the inherent parallelism in a program and decomposing the program into a collection of tasks, called serial combinators, that can be executed in parallel. The abstract machine model supported by Alfalfa and Buckwheat is called heterogeneous graph reduction, which is a hybrid of graph reduction and conventional stack-oriented execution. This model supports parallelism, lazy evaluation, and higher-order functions while at the same time making efficient use of the processors in the system. The Alfalfa and Buckwheat runtime systems support dynamic load balancing, interprocessor communication (if required), and storage management. A large number of experiments were performed on Alfalfa and Buckwheat for a variety of programs. The results of these experiments, as well as the conclusions drawn from them, are presented. This research was supported in part by National Science Foundation grants DCR-8302018 and DCR-8521451, by a DARPA subcontract with SDC/Unisys, and by gifts from Burroughs Austin Research Center and the Intel Corporation.

14.
Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempted to achieve the minimum completion time by distributing the workload as evenly as possible while minimizing the number of synchronization operations required. The authors consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. They show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. They propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. They compare the performance of this new algorithm to other known algorithms by using five representative kernel programs on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that the new algorithm offers substantial performance improvements, up to a factor of 4 in some cases. The authors conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.
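
A rough sketch of the affinity idea: iterations are first bound to the processor that repeatedly works on the same block of data, and an idle processor takes work from the most loaded queue. The class name, chunking rule, and steal fraction are assumptions for illustration, not the paper's exact algorithm.

```python
# Sketch: affinity-style loop scheduling with per-processor iteration queues and
# stealing from the most loaded queue when the local queue runs dry.
import threading

class AffinityScheduler:
    def __init__(self, num_iters, num_procs, steal_fraction=0.25):
        chunk = (num_iters + num_procs - 1) // num_procs
        self.queues = [list(range(p * chunk, min((p + 1) * chunk, num_iters)))
                       for p in range(num_procs)]
        self.locks = [threading.Lock() for _ in range(num_procs)]
        self.steal_fraction = steal_fraction

    def next_batch(self, p):
        with self.locks[p]:                      # take work from the local queue first
            if self.queues[p]:
                take = max(1, len(self.queues[p]) // 4)   # local chunking choice is arbitrary here
                batch, self.queues[p] = self.queues[p][:take], self.queues[p][take:]
                return batch
        victim = max(range(len(self.queues)), key=lambda q: len(self.queues[q]))
        with self.locks[victim]:                 # otherwise steal part of the most loaded queue
            if not self.queues[victim]:
                return []                        # every queue is empty: all iterations are done
            take = max(1, int(len(self.queues[victim]) * self.steal_fraction))
            batch, self.queues[victim] = self.queues[victim][:take], self.queues[victim][take:]
            return batch
```

Keeping iterations on their "home" processor is what preserves data locality; stealing only when idle keeps the load balanced without giving that locality away too eagerly.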

15.
A Fast and Effective Static Task Scheduling Algorithm for Homogeneous Computing Environments
Fast and effective task scheduling is a key problem in multiprocessor computing environments. The most popular model for describing task dependencies in current scheduling algorithms is the DAG. Earlier work proposed the TTIG model, which is more realistic and more general, together with its corresponding MATE algorithm for homogeneous computing environments. This paper extends the TTIG model and proposes a new algorithm for homogeneous systems together with two heuristics (GBHA1 and GBHA2). GBHA eliminates cycles in the graph as far as possible at the level of groups, so it captures global information about the task graph and achieves better scheduling performance. In simulation experiments the algorithms were compared, under various test conditions, with MATE and with other effective DAG-based scheduling algorithms for homogeneous environments. The results show that GBHA clearly outperforms MATE; compared with DAG-based scheduling algorithms it is comparable in scheduling quality but has a significant advantage in time complexity.

16.
Computer Networks, 2003, 41(5): 601-621
To provide flexibility in deploying new protocols and services, general-purpose processing engines are being placed in the datapath of routers. Such network processors (NPs) are typically simple RISC multiprocessors that perform forwarding and custom application processing of packets. The inherent unpredictability of execution time of arbitrary instruction code poses a significant challenge in providing service guarantees for data flows that compete for such processing resources in the network. However, we show that network processing workloads are highly regular and predictable, which can be exploited for scheduling purposes. We present two such predictive processor scheduling algorithms that aim at providing service guarantees as well as improving the performance of the NP by increasing the instruction data locality. Simulation results show that these algorithms provide significantly better performance than processor scheduling algorithms that do not take packet processing times into consideration.

17.
Robotics and Computer, 1994, 11(2): 91-98
A new model is presented to describe data-flow algorithms implemented in a multiprocessing system. Called the resource/data flow graph (RDFG), the model explicitly represents cyclo-static processor schedules as circuits of processor arcs that reflect the order in which processors execute graph nodes. The model also makes it possible to guarantee that hard real-time deadlines are met. When unfolded, the model statically identifies the processor schedule. The model is therefore useful for determining the throughput and latency of systems with heterogeneous processors. The applicability of the model is demonstrated using a space surveillance algorithm.

18.
The high power consumption of modern processors becomes a major concern because it leads to decreased mission duration (for battery-operated systems), increased heat dissipation, and decreased reliability. While many techniques have been proposed to reduce power consumption for uniprocessor systems, there has been considerably less work on multiprocessor systems. In this paper, based on the concept of slack sharing among processors, we propose two novel power-aware scheduling algorithms for task sets with and without precedence constraints executing on multiprocessor systems. These scheduling techniques reclaim the time unused by a task to reduce the execution speed of future tasks and, thus, reduce the total energy consumption of the system. We also study the effect of discrete voltage/speed levels on the energy savings for multiprocessor systems and propose a new scheme of slack reservation to incorporate voltage/speed adjustment overhead in the scheduling algorithms. Simulation and trace-based results indicate that our algorithms achieve substantial energy savings on systems with variable voltage processors. Moreover, processors with a few discrete voltage/speed levels obtain nearly the same energy savings as processors with continuous voltage/speed, and the effect of voltage/speed adjustment overhead on the energy savings is relatively small.
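
One common way to handle discrete voltage/speed levels, sketched below, is to split a task's allotted time between the two available levels that bracket the ideal continuous speed. This illustrates the issue the abstract raises; the level set, function name, and split rule are assumptions, not the paper's slack-reservation scheme.

```python
# Sketch: run a task partly at the level just above the ideal continuous speed and
# partly at the level just below it, so the work still finishes within the budget.
def split_between_levels(work, budget, levels=(0.4, 0.6, 0.8, 1.0)):
    """work: cycles normalized to full speed; budget: wall-clock time available."""
    ideal = work / budget
    higher = min((f for f in levels if f >= ideal), default=max(levels))
    lower = max((f for f in levels if f < ideal), default=higher)
    if higher == lower:
        return [(higher, budget)]               # ideal speed is directly available (or clamped)
    # Solve t_hi + t_lo = budget and higher*t_hi + lower*t_lo = work.
    t_hi = (work - lower * budget) / (higher - lower)
    t_hi = min(max(t_hi, 0.0), budget)
    return [(higher, t_hi), (lower, budget - t_hi)]
```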

19.
In simultaneous multithreading (SMT) multiprocessors, using all the available threads (logical processors) to run a parallel loop is not always beneficial due to the interference between threads and parallel execution overhead. To maximize the performance of a parallel loop on an SMT multiprocessor, it is important to find an appropriate number of threads for executing the parallel loop. This article presents adaptive execution techniques that find a proper execution mode for each parallel loop in a conventional loop-level parallel program on SMT multiprocessors. A compiler preprocessor generates code that, based on dynamic feedback, automatically determines at run time the optimal number of threads for each parallel loop in the parallel application. We evaluate our technique using a set of standard numerical applications and running them on a real SMT multiprocessor machine with 8 hardware contexts. Our approach is general enough to work well with other SMT multiprocessor or multicore systems.
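
A minimal sketch of the dynamic-feedback idea described above: time the loop under a few candidate thread counts during early invocations, then lock in the fastest. The class, candidate set, and the `parallel_loop_body` callable are assumptions, not the compiler preprocessor's actual output.

```python
# Sketch: explore a few thread counts on the first invocations of a parallel loop,
# then reuse the best-performing count for all later invocations.
import time

class AdaptiveLoop:
    def __init__(self, candidates=(1, 2, 4, 8)):
        self.candidates = list(candidates)
        self.timings = {}
        self.best = None

    def run(self, parallel_loop_body):
        if self.best is None:                         # still exploring candidate thread counts
            n = self.candidates[len(self.timings)]
            t0 = time.perf_counter()
            parallel_loop_body(num_threads=n)
            self.timings[n] = time.perf_counter() - t0
            if len(self.timings) == len(self.candidates):
                self.best = min(self.timings, key=self.timings.get)
        else:                                         # exploit the fastest mode found so far
            parallel_loop_body(num_threads=self.best)
```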

20.
Computer vision is regarded as one of the most complex and computationally intensive problems. In general, a Computer Vision System (CVS) attempts to relate scene(s) in terms of model(s). A typical CVS employs algorithms from a very broad spectrum, such as numerical methods, image processing, graph algorithms, symbolic processing, and artificial intelligence. The authors present a multiprocessor architecture, called "NETRA," for computer vision systems. NETRA is a highly flexible architecture. The topology of NETRA is recursively defined and hence is easily scalable from small to large systems. It is a hierarchical architecture with a tree-type control hierarchy. Its leaf nodes consist of clusters of processors connected with a programmable crossbar with selective broadcast capability to provide the desired flexibility. The processors in clusters can operate in SIMD, MIMD, or systolic-like modes. Other features of the architecture include integration of limited data-driven computation within a primarily control-flow mechanism, block-level control and data flow, decentralization of memory management functions, and hierarchical load balancing and scheduling capabilities. The paper also presents a qualitative evaluation and preliminary performance results of a cluster of NETRA.
