首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
2.
Extract-transform-load (ETL) workflows model the population of enterprise data warehouses with information gathered from a large variety of heterogeneous data sources. ETL workflows are complex design structures that run under strict performance requirements and their optimization is crucial for satisfying business objectives. In this paper, we deal with the problem of scheduling the execution of ETL activities (a.k.a. transformations, tasks, operations), with the goal of minimizing ETL execution time and allocated memory. We investigate the effects of four scheduling policies on different flow structures and configurations and experimentally show that the use of different scheduling policies may improve ETL performance in terms of memory consumption and execution time. First, we examine a simple, fair scheduling policy. Then, we study the pros and cons of two other policies: the first opts for emptying the largest input queue of the flow and the second for activating the operation (a.k.a. activity) with the maximum tuple consumption rate. Finally, we examine a fourth policy that combines the advantages of the latter two in synergy with flow parallelization.  相似文献   

3.
In the literature, the problem of global termination detection in parallel systems is usually solved by message passing. In shared-memory systems, this problem can also be solved by using exclusively accessible variables with locking mechanisms. In this paper, we present an algorithm that solves the problem of global termination detection in shared-memory asynchronous multiprocessor systems without using locking. We assume a reasonable computation model in which concurrent reading does not require locking and concurrent writing different values without locking results in an arbitrary one of the values being actually written. For a system of n processors, the algorithm allocates a working space of 2n+1 bits. The worst case time complexity of the algorithm is n+2√+1, which we prove is the lower bound under a reasonable model of computation  相似文献   

4.
Scheduling multiprocessor tasks with genetic algorithms   总被引:4,自引:0,他引:4  
In the multiprocessor scheduling problem, a given program is to be scheduled in a given multiprocessor system such that the program's execution time is minimized. This problem being very hard to solve exactly, many heuristic methods for finding a suboptimal schedule exist. We propose a new combined approach, where a genetic algorithm is improved with the introduction of some knowledge about the scheduling problem represented by the use of a list heuristic in the crossover and mutation genetic operations. This knowledge-augmented genetic approach is empirically compared with a “pure” genetic algorithm and with a “pure” list heuristic, both from the literature. Results of the experiments carried out with synthetic instances of the scheduling problem show that our knowledge-augmented algorithm produces much better results in terms of quality of solutions, although being slower in terms of execution time  相似文献   

5.
This paper presents the implementation and performance results of anand-parallel execution model of logic programs on a shared-memory multiprocessor. The execution model is meant for logic programs with “don't-know nondeterminism”, and handles binding conflicts by dynamically detecting dependencies among literals. The model also incorporates intelligent backtracking at the clause level. Our implementation of this model is based upon the Warren Abstract Machine (WAM); hence it retains most of the efficiency of the WAM for sequential segments of logic programs. Performance results on Sequent Balance 21000 show that on suitable programs, our parallel implementation can achieve linear speedup on dozens of processors. We also present an analysis of different overheads encountered in the implementation of the execution model.  相似文献   

6.
7.
The paper describes the implementation of the Successive Overrelaxation (SOR) method on an asynchronous multiprocessor computer for solving large, linear systems. The parallel algorithm is derived by dividing the serial SOR method into noninterfering tasks which are then combined with an optimal schedule of a feasible number of processors. The important features of the algorithm are: (i) achieves a speedup Sp O(N/3) and an efficiency Ep 2/3 using P = [N/2] processors, where N is the number of the equations, (ii) contains a high level of inherent parallelism, whereas on the other hand, the convergence theory of the parallel SOR method is the same as its sequential counterpart and (iii) may be modified to use block methods in order to minimise the overhead due to communication and synchronisation of the processors.  相似文献   

8.
《Performance Evaluation》1994,20(4):361-371
In the classical scheduling theory it is widely assumed that any task requires for its processing only one processor at a time. In this paper the problem of deterministic scheduling of tasks requiring for their processing more than one processor at a time, i.e., a constant set of dedicated processors, is analyzed. Schedule length is assumed to be a performance measure. Tasks are assumed to be preemptable and independent. Low order polynomial algorithms for simple cases of the problem are given. Then a method to solve the general version of the problem for a limited number of processors is presented, while the case of an arbitrary number of processors is known to be NP-hard. Finally, a version of the problem, where besides processors every task can also require additional resources, is considered.  相似文献   

9.
Real-time systems (RTS) are omnipresent in several domains. The trend is to use multiprocessor architecture to satisfy the timing constraints of such systems. The model-checking methods have proven to be useful for making the development process reliable at a high abstraction level. Based on this approach, the present paper proposes a new technique for scheduling analysis of a partitioned multiprocessor RTS. Starting from a model with dynamic priority time Petri Nets modeling the system, we have proposed a generation of a reduced states graph. Thus, through the properties of the graph the schedulability is checked. Our approach provides an implementation of a Partition Checker tool, which produces an affirmation of the schedulability or a counterexample in the case of non-schedulable system to reduce the SW/HW space exploration.  相似文献   

10.
Timely run‐time software replacement techniques are a corner stone for reconciling real‐time systems development and dynamic behavior. Typical real‐time systems do not consider dynamic behavior because it deeply challenges predictability and timeliness. Current efforts are starting to merge the safe and predictable execution with a controllable level of dynamicity by imposing a set of bounds and limitations to the system dynamic behavior. One of the obstacles for this is how to time‐bound the different operations required to effectively implement component replacement. In this paper, the main challenges for this problem are identified, and a model to ensure that components can be replaced at run time preserving the temporal properties of the system is provided that also avoids failures in replacements. A real example and simulations of our replacement model are provided that validate the presented ideas. Copyright © 2013 John Wiley & Sons, Ltd.  相似文献   

11.
This paper deals with the problem of scheduling spawned tasks when a query is issued to a database which resides on a MIMD multiprocessor. These tasks have the property that their associated dependency scheme can be presented as a directed tree. We present a theoretical framework with extensive experimental simulations which increase the throughput of database applications. We derive a family of algorithms for scheduling tasks. Their performance is tested on several common multiprocessor configurations. For better performance the adaptation of the scheduling algorithm to the multiprocessor configuration is examined and analyzed. The scheduling algorithms are divided into two cases: (a) permitted changes in the resources connection scheme of the multiprocessor, and (b) no changes allowed. The algorithms are scalable and their complexity is computed. In particular, we present an algorithm for scheduling tasks in the case where the construction of a central storage location is permitted. One of the main tools for the construction of the above algorithms is the notion of (t, 1)-domination and k-domination sets. Copyright © 1999 John Wiley & Sons, Ltd.  相似文献   

12.
《Parallel Computing》1997,23(13):1963-1986
Memory conflict is a major phenomenon which may cause dramatic loss of performance in vector pipeline multiprocessors. Various techniques have been proposed and implemented to avoid such conflicts. They rely mostly on well-tuned vector element allocation in memory banks (either using programming tools or hard-wired features). We tackle this problem in another way. Instead of trying to avoid memory contention, we aim to enhance the performance of the memory system by scheduling vector element accesses in order to increase memory accesses. This scheduling depends on the memory bank activity when an access is issued, leading to out-of-order access to vector elements. An out-of-order pipeline execution is associated with this out-of-order memory access in order to maintain the processor functional unit chaining. In this paper we study some factors influencing this execution model: vector length, number of processors and number of banks. An analysis of this model using the Markov chain technique and simulation results are also presented. They show the importance of this model in comparison with the classical one encountered in pipelined vector supercomputers.  相似文献   

13.
Conclusion The mathematical model of execution of asynchronous competing processes in a macropipelined MS considered in this article makes it possible to estimate the minimum overall execution time of given volumes of computation and to find the optimal balancing of transfer and computing, the ratio of the number of processors and channels in the MS. Moreover, the proposed mathematical model and the derived balancing conditions fully corroborate the basic principle of macropipelined computation advanced previously by Glushkov [6]. This principle states that when the work is allocated to processors, each processor is assigned in the current step a task that will keep it busy for the longest possible time without requiring interaction with other processors. Further research of this model can proceed in several directions. First, it is very interesting to determine the total idle time of the processors due to busy channels, and also the “idle” time of transfer blocks. Second, it is relevant to calculate the efficiency of the macropipelined method of computation. A similar study of efficiency estimates has been previously conducted in [7]. Third, it is necessary to derive formulas for the total computing time and the corresponding balancing conditions for other classes of competing processes and various operating regimes of channels and processors. Translated from Kibernetika i Sistemnyi Analiz, No. 5, pp. 150–158, September–October, 1998.  相似文献   

14.
Various toolkits exist today for the distributed execution of computational algorithms on clusters of machines. These toolkits are often referred to by the terms ‘Grid Toolkits’, ‘Job Execution Environments’, and ‘Problem Solving Environments (PSEs)’. Here, we introduce iJob—an Internet-based job execution environment that sets out to meet many of the goals of PSEs, such as providing facilities and services to solve a class of problems. In addition, the iJob software allows execution of computational algorithms utilizing standard Internet technologies such as Java, XML, and asynchronous communication protocols. The goals of this project include: (1) deploying the toolkit easily to multiple platforms using the Java technologies; (2) running multiple types of algorithms and supporting multiple users simultaneously; (3) providing a web-based GUI for monitoring and controlling the status of jobs; and (4) providing security at both the user-level and at the network-level. The toolkit has been tested using several simulation codes on pools of Windows 2000 and Solaris systems.  相似文献   

15.
Synchronous VLSI design is approaching a critical point, with clock distribution becoming an increasingly costly and complicated issue and power consumption rapidly emerging as a major concern. Hence, recently, there has been a resurgence of interest in asynchronous digital design techniques which promise to liberate digital design from the inherent problems of synchronous systems. This activity has revealed a need for modelling and simulation techniques suitable for the asynchronous design style. The concurrent process algebra communicating sequential processes (CSP) and its executable counterpart, occam, are increasingly advocated as particularly suitable for this purpose. This paper focuses on issues related to the execution of CSP/occam models of asynchronous hardware on multiprocessor machines, including I/O, monitoring and debugging, partition, mapping and load balancing. These issues are addressed in the context of occarm, an occam simulation model of the AMULET1 asynchronous microprocessor; however, the solutions devised are more general and may be applied to other systems too. Copyright © 2001 John Wiley & Sons, Ltd.  相似文献   

16.
17.
We consider many-core processors with a task-graph oriented programming model, whereby scheduling constraints among tasks are decided offline, and are then enforced by the runtime system using dedicated hardware. Here, exposing and beneficially exploiting fine grain data and control parallelism is increasingly important. Therefore, high expressive power for stating such constraints/directives, along with the ability to implement them in fast, simple hardware, is critical for success. In this paper, we focus on the relationship among different duplicable (multi-instance) tasks, which are used to express and exploit data parallelism. We extend the conventional Start-After-Complete (precedence) constraint to also be usable between replicas of different such tasks rather than only between entire tasks, thereby increasing the exposable parallelism. Additionally, we propose the parameterized Start-After-Start constraint, which can be used to control the degree of “lockstep” among multiple such tasks, e.g., in order to improve cache performance when the tasks work on the same data. Also, we briefly describe several additional interesting directives. Finally, we show that the directives can be supported efficiently in hardware. Hypercore, a very efficient CREW PRAM-like shared-cache architecture, which is very challenging because it has extremely fast dispatching for basic constraints, is used in the discussion. However, the new directives have broader applicability. Having shown the possibility of simple implementation and indications of benefit, this motivates further exploration of these directives and their implementation in hardware, as well as their support by programming tools.  相似文献   

18.
The Hopfield neural network is extensively applied to obtaining an optimal/feasible solution in many different applications such as the traveling salesman problem (TSP), a typical discrete combinatorial problem. Although providing rapid convergence to the solution, TSP frequently converges to a local minimum. Stochastic simulated annealing is a highly effective means of obtaining an optimal solution capable of preventing the local minimum. This important feature is embedded into a Hopfield neural network to derive a new technique, i.e., mean field annealing. This work applies the Hopfield neural network and the normalized mean field annealing technique, respectively, to resolve a multiprocessor problem (known to be a NP-hard problem) with no process migration, constrained times (execution time and deadline) and limited resources. Simulation results demonstrate that the derived energy function works effectively for this class of problems.  相似文献   

19.
As the impact of the communication architecture on performance grows in a Multiprocessor System-on-Chip (MPSoC) design, the need for performance analysis in the early stage in order to consider various communication architectures is also increasing. While a simulation is commonly performed for performance evaluation of an MPSoC, it often suffers from a lengthy run time as well as poor performance coverage due to limited input stimuli or their ad hoc applications. In this paper, we propose a novel system-level performance analysis method to estimate the performance distribution of an MPSoC. Our approach consists of two techniques: (1) analytical model of on-chip crossbar-based communication architectures and (2) enumeration of task-level execution time variations for a target application. The execution time variation of tasks is efficiently captured by a memory access workload model. Thus, the proposed approach leads to better performance coverage for an MPSoC application in a reasonable computation time than the simulation-based approach. The experimental results validate the accuracy, efficiency, and practical usage of the proposed approach.  相似文献   

20.
The use of parallel operations in automation, such as part fabrication and assembly, computation and control of industrial robots, etc., may yield a minimum production time and thereby increase productivity. However, the coupling between consecutive phases of operations results in series-parallel precedence constraints that may create unavoidable idle time intervals during the operations. To solve the problem, idle time intervals must be broken down and reduced. An algorithm that determines a minimum time-ordered schedule for the parallel operaitons is developed based on the Program Evaluation and Review Technique using the method of “variable” branch and bound.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号