Similar literature
Found 20 similar documents (search time: 443 ms)
1.
The development of database systems with a hierarchical hardware architecture is currently a promising trend in the field of parallel database machines. Hierarchical architectures have been suggested with the aim of combining the advantages of shared-nothing architectures with those of architectures with shared memory and disks. A commonly accepted way of constructing hierarchical systems is to combine shared-memory (shared-everything) clusters into a single system without shared resources. However, such architectures cannot ensure data accessibility under hardware failures at the processor cluster level, which limits their use in systems with high fault-tolerance requirements. In this paper, an alternative approach to the construction of hierarchical systems is suggested. In accordance with this approach, the system is constructed as an assembly of processor clusters with shared disks, with each cluster being a two-level multiprocessor structure with a standard strongly connected topology of interprocessor connections. A stream model for organizing parallel query processing in systems with the suggested hierarchical architecture is described. This model has been implemented in Omega, a prototype parallel database management system designed for the Russian multiprocessor computational systems MBC-100/1000. Our experiments show that the total performance of the processor clusters in the Omega system is comparable with that of processor clusters with shared resources, even in the case of severe data skew. At the same time, the clusters of the Omega system are capable of ensuring a higher degree of data availability than clusters with shared-memory architectures.

2.
Parallel Computing, 1997, 22(13): 1837-1851
The PAPS (Performance Analysis of Parallel Systems) toolset is a testbed for the model-based performance prediction of message-passing parallel applications executed on private-memory multiprocessor computer systems. PAPS allows the execution behavior of the computer hardware and operating system software resources to be described at a very detailed level. This enables very accurate performance prediction of parallel applications even in the case of substantial performance degradation due to contention for shared resources. In this paper, the fundamental design principles and implementation methodologies for the development of the PAPS toolset are presented, and the PAPS parallel system specification formalisms are described. A simplified performance study of a parallel Gaussian elimination application on the nCUBE 2 multiprocessor system is used to demonstrate the usage of the tool.

3.
A new approach to data layout and load balancing is proposed for multiprocessor relational database systems with a hierarchical architecture. A database multiprocessor model is described that enables simulation and examination of arbitrary hierarchical multiprocessor configurations in the context of on-line transaction processing applications. An important subclass of symmetrical multiprocessor hierarchies is considered, and a new data layout strategy based on the method of partial mirroring is proposed for them. The disk space used to replicate the data is evaluated analytically. For symmetrical hierarchies having a certain regularity, theorems estimating the effort required for replica formation are proved. An efficient method of load balancing based on the partial mirroring technique is proposed. The methods described are oriented to clusters and Grid systems.

4.
This paper presents the performance analysis results for the RAP-WAM AND-Parallel Prolog architecture on shared-memory multiprocessor organizations. The goal of this parallel model is to provide inference speeds beyond those attainable in sequential systems, while supporting conventional logic programming semantics. Special emphasis is placed on sequential performance, storage efficiency, and low control overhead. First, the concepts and techniques used in the parallel execution model are described, along with the general methodology, benchmarks, and simulation tools used for its evaluation. Results are given both at the memory reference level and at the memory organization level. A two-level shared-memory architecture model is presented together with an analysis of various solutions to the cache coherency problem. Finally, RAP-WAM shared-memory simulation results are presented. It is argued that the RAP-WAM model can exploit coherent caches and attain speeds in excess of 2 MLIPS with current shared-memory multiprocessing technology for real applications that exhibit medium degrees of parallelism. MCC

5.
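Parallel simulation on multicomputer systems is an effective way to solve the real-time simulation problem for large-scale continuous systems. The key problem in multicomputer parallel simulation is how to distribute a simulation task effectively across the multicomputer system for concurrent execution while obtaining a high speedup. This paper introduces PARSIM, a parallel simulation software support environment developed by the authors, which can automatically convert a simulation program that executes serially on a conventional single machine into a parallel simulation program that executes concurrently and efficiently on a homogeneous multicomputer system. Issues such as parallelism identification and automatic multitask partitioning are discussed, and the corresponding algorithms and application examples are given.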

6.
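The "Multiprocessor Parallel Processing Simulator" is a computer-aided instruction system intended to help users consolidate and deepen their understanding of the basic working process of parallel processing on a typical parallel computer system, the MIMD multiprocessor system. By having users actually write parallel application programs and simulate their execution, it further guides them toward more in-depth research and development. By simulating the basic characteristics of the architecture, compiler, and operating system of an MIMD multiprocessor system, it realizes explicit and implicit exploitation of parallelism at the job/job-step level and the DO-loop level. Based on the "separate parallel segment" and "active extraction" strategies, parallel code can be …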

7.
A hardware accelerator for self-organizing feature maps is presented. We have developed a massively parallel architecture that, on the one hand, allows a resource-efficient implementation of small or medium-sized maps for embedded applications, requiring only small areas of silicon. On the other hand, large maps can be simulated with systems that consist of several integrated circuits that work in parallel. Apart from the learning and recall of self-organizing feature maps, the hardware accelerates data pre- and postprocessing. For the verification of our architectural concepts in a real-world environment, we have implemented an ASIC that is integrated into our heterogeneous multiprocessor system for neural applications. The performance of our system is analyzed for various simulation parameters. Additionally, the performance that can be achieved with future microelectronic technologies is estimated.

8.
The two major design approaches taken to build distributed and parallel computer systems, multiprocessing and multicomputing, are discussed. A model that combines the best properties of both multiprocessor and multicomputer systems, easy-to-build hardware, and a conceptually simple programming model is presented. Using this model, a programmer defines and invokes operations on shared objects, the runtime system handles reads and writes on these objects, and the reliable broadcast layer implements indivisible updates to objects using the sequencing protocol. The resulting system is easy to program, easy to build, and has acceptable performance on problems with a moderate grain size in which reads are much more common than writes. Orca is also described: a procedural language whose sequential constructs are roughly similar to those of languages like C or Modula 2, but which also supports parallel processes and shared objects, and which has been used to develop applications for the prototype system.

9.
The prevalence of multicore processors has resulted in the wider applicability of parallel programming models such as OpenMP and MapReduce. A common goal of running parallel applications implemented under such models is to guarantee bounded response times while maximizing system utilization. Unfortunately, little previous work has been done that can provide such performance guarantees. In this paper, this problem is addressed by applying soft real-time scheduling analysis techniques. Analysis and conditions are presented for guaranteeing bounded response times for parallel applications under global EDF multiprocessor scheduling.
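
The global EDF policy named in this abstract can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not taken from the paper: time is discrete, every job is sequential (the paper itself treats parallel applications), and the function name `global_edf` and the `(release, wcet, deadline)` tuple layout are hypothetical. At each time unit, the `m` released, unfinished jobs with the earliest absolute deadlines run.

```python
def global_edf(jobs, m):
    """Simulate global EDF on m identical processors in unit time steps.

    jobs: list of (release, wcet, deadline) tuples with integer fields
          and wcet > 0 (hypothetical layout, chosen for this sketch).
    Returns the finish time of each job, in input order.
    """
    remaining = [wcet for _, wcet, _ in jobs]
    finish = [None] * len(jobs)
    t = 0
    while any(r > 0 for r in remaining):
        # Jobs that have been released and still have work left.
        ready = [i for i, (release, _, _) in enumerate(jobs)
                 if release <= t and remaining[i] > 0]
        # Earliest absolute deadline first; ties broken by job index.
        ready.sort(key=lambda i: (jobs[i][2], i))
        # The m highest-priority ready jobs each run for one time unit.
        for i in ready[:m]:
            remaining[i] -= 1
            if remaining[i] == 0:
                finish[i] = t + 1
        t += 1
    return finish


if __name__ == "__main__":
    example = [(0, 2, 4), (0, 2, 3), (0, 3, 4)]
    print(global_edf(example, m=2))
```

In this example the third job misses its deadline of 4 by one time unit on two processors. A miss of this kind, as long as it stays bounded, is what a soft real-time analysis tolerates.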

10.
An accurate and efficient model of a commercial multiprocessor bus is developed. Four important characteristics of the bus design are modeled: asynchronous memory write operations; in-order delivery of responses to processor read requests; priority scheduling of memory responses; and upper bounds on the number of outstanding processor requests. A two-level hierarchical model employing both Markov chain and mean value analysis techniques for analyzing queueing networks is used. The model is shown to accurately predict measured system performance for two parallel program workloads that have different memory access characteristics. The results provide evidence that analytic queueing models can be extremely accurate in spite of simplifying assumptions required for model tractability. Model estimates are compared against detailed simulation of the bus to investigate in more detail the likely source of small model inaccuracies. The use of the analytical model for assessing system design tradeoffs is illustrated.

11.
We propose and evaluate a parallel “decomposite best-first” search branch-and-bound algorithm (dbs) for MIN-based multiprocessor systems. We start with a new probabilistic model to estimate the number of evaluated nodes for a serial best-first search branch-and-bound algorithm. This analysis is used in predicting the parallel algorithm speed-up. The proposed algorithm initially decomposes a problem into N subproblems, where N is the number of processors available in a multiprocessor. Afterwards, each processor executes the serial best-first search to find a local feasible solution. Local solutions are broadcast through the network to compute the final solution. A conflict-free mapping scheme, known as the step-by-step spread, is used for subproblem distribution on the MIN. A speed-up expression for the parallel algorithm is then derived using the serial best-first search node evaluation model. Our analysis considers both computation and communication overheads in order to provide a realistic speed-up estimate. Communication modeling is also extended to the parallel global best-first search technique. All the analytical results are validated via simulation. For large systems, when communication overhead is taken into consideration, it is observed that the parallel decomposite best-first search algorithm provides better speed-up than other reported schemes.

12.
In previous work, we introduced and analyzed a generalized class of m-level hierarchical multiprocessor systems [1]. The m levels of hierarchy employed by these systems allowed the use of relatively small crossbar switches to support processor-memory communication at the local, nonlocal, and global levels. The analysis showed that, for a high rate of local requests, the m-level system offers a bandwidth (BW) close to that of a crossbar system and better than that of a typical multiple-bus system (with the number of buses equal to half the number of processors). In this paper, the cost effectiveness of the m-level hierarchical multiprocessor system is evaluated in terms of a cost-related performance measure (BW/Cost). Based on an approximate cost analysis, the bandwidth-to-cost ratio of both the m-level and the crossbar multiprocessor systems has been determined for a hierarchically nonuniform reference model. It has been observed that the m-level system is more cost effective than the crossbar system for medium- and large-scale multiprocessor systems.

13.
Coupling of application programs designed for multiprocessor computing systems requires simultaneous use of several paradigms implemented as communication middleware. In this paper, we propose a method of integration of MPI, which is widely used in scientific parallel computations, and CORBA, which is designed for the development of object-oriented applications. This makes it possible to assemble integrated software systems for interdisciplinary computations on heterogeneous multiprocessor systems with the reuse of available application software. An example of inclusion of an MPI linear algebra package into a CORBA-based distributed object-oriented model for solving systems of equations is presented.

14.
This paper presents a new queuing approach for the modeling of single-bus tightly coupled multiprocessors, and a new simulation method which allows for its analysis. The model considers the macroscopic behavior of the multiprocessor, as well as the local program behavior. A new simulation method for each processing element in the multiprocessor is also presented. An expansion of the simulator for a multiprocessor system is also described. This expansion takes into account the fact that the workload distribution may be either static or dynamic. The simulator is shown to be capable of addressing the steady-state behavior of the multiprocessor model, as well as the transient behavior. Experimental results for a single processor are presented, along with a brief discussion of problems that arose during the simulation experiments.

15.
The simulation of systems that include components at varying levels of abstraction is addressed. A performance model of a hierarchical discrete-event simulation algorithm running on a hypercube architecture is presented. The model allows the performance impact of decisions made in the design of the parallel processor as well as in the design of the simulation algorithm to be examined. Three static component partitioning strategies are considered: random partitioning, heuristic partitioning, and simulated annealing. The performance model is applied to digital system simulation.

16.
A large-scale, cache-based multiprocessor that is interconnected by a hierarchical network such as hierarchical buses or a multistage interconnection network (MIN) is considered. An adaptive cache coherence scheme for the system is proposed based on a hardware approach that handles multiple shared reads efficiently. The new protocol allows multiple copies of a shared data block in the hierarchical network, but minimizes the cache coherence overhead by dynamically partitioning the network into sharing and nonsharing regions based on program behavior. The new cache coherence scheme effectively utilizes the bandwidth of the hierarchical networks and exploits the locality properties of parallel algorithms. Simulation experiments have been carried out to analyze the performance of the new protocol. The simulation results show that the new protocol gives a 15% to 30% performance improvement over some existing cache coherence schemes on similar systems for a wide range of workload parameters.

17.
Exploiting coherence for multiprocessor ray tracing (total citations: 2; self-citations: 0; citations by others: 2)
The scalability and cost effectiveness of general-purpose distributed-memory multiprocessor systems make them particularly suitable for ray-tracing applications. However, the limited memory available to each processor in such a system requires schemes to distribute the model database among the processors. The authors identify a form of coherence in the ray-tracing algorithm that can be exploited to develop optimum schemes for data distribution in a multiprocessor system. This in turn gives rise to high processor efficiency for systems with limited distributed memory.

18.
This paper first develops an architecture for a multiprocessor job scheduling system with an embedded simulation technique. The architecture provides a shell for applications that are characterized by two scheduling policies, a heuristic algorithm policy and a First-In-First-Out (FIFO) policy. These policies are implemented in the simulation model by using the embedded technique. The paper compares the performance of the two policies using queue length, waiting time, and flow time as the criteria. Next, two simulation scenarios are designed using two different real-world applications. The purpose is to examine the performance of multiprocessor systems with and without inspection operations under the two scheduling policies. The two applications studied are berth allocation for container terminal operations and production scheduling in an Original Equipment Manufacturer (OEM) power supply factory. The final results show that a proper scheduling policy will perform better than the traditional FIFO approach for a multiprocessor system. The study also provides guidelines on balancing a system with the addition of a final inspection activity.
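
The kind of comparison this abstract describes, FIFO against a heuristic policy on the waiting-time criterion, can be sketched as follows. The abstract does not say which heuristic the paper uses, so shortest-processing-time-first (SPT) is swapped in here purely as an assumed stand-in. The function name `mean_waiting_time` and the assumption that all jobs are available at time 0 are also hypothetical simplifications.

```python
import heapq


def mean_waiting_time(service_times, num_processors, policy="fifo"):
    """Mean waiting time of a batch of jobs on identical processors.

    All jobs are assumed to be available at time 0 (a simplification).
    policy "fifo" dispatches jobs in list order; policy "spt" dispatches
    the shortest job first (an assumed heuristic, not the paper's).
    """
    if not service_times:
        return 0.0
    if policy == "fifo":
        order = list(service_times)
    elif policy == "spt":
        order = sorted(service_times)
    else:
        raise ValueError(f"unknown policy: {policy}")

    # Min-heap holding the time at which each processor becomes free.
    free_at = [0.0] * num_processors
    heapq.heapify(free_at)

    total_wait = 0.0
    for service in order:
        start = heapq.heappop(free_at)   # earliest available processor
        total_wait += start              # the job waited from time 0 until start
        heapq.heappush(free_at, start + service)
    return total_wait / len(order)


if __name__ == "__main__":
    jobs = [8, 1, 2, 9, 3, 1]
    print("FIFO:", mean_waiting_time(jobs, 2, "fifo"))
    print("SPT: ", mean_waiting_time(jobs, 2, "spt"))
```

For this particular batch the SPT ordering gives a lower mean waiting time than FIFO. That matches the direction of the result reported in the abstract, but it says nothing about the paper's actual heuristic or workloads.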

19.
An accurate and computationally efficient method for predicting the performance of a class of parallel computations running on concurrent systems is described. A parallel computation is modeled as a task system with precedence relationships expressed as a series-parallel directed acyclic graph. Resources in a concurrent system are modeled as service centers in a queuing network model. Using these two models as inputs, the method outputs predictions of expected execution time of the parallel computation and the concurrent system utilization. The method is validated against both detailed simulation and actual execution on a commercial multiprocessor. Using 100 test cases, the average error of the prediction when compared to simulation statistics is 1.7%, with a standard deviation of 1.5%; the maximum error is about 10%.

20.
Hierarchical scheduling has been proposed as a scheduling technique to achieve aggregate resource partitioning among related groups of threads and applications in uniprocessor and packet scheduling environments. Existing hierarchical schedulers are not easily extensible to multiprocessor environments because 1) they do not incorporate the inherent parallelism of a multiprocessor system when partitioning resources and 2) they can result in unbounded unfairness or starvation if applied to a multiprocessor system in a naive manner. In this paper, we present hierarchical multiprocessor scheduling (H-SMP), a novel hierarchical CPU scheduling algorithm designed for a symmetric multiprocessor (SMP) platform. The novelty of this algorithm lies in its combination of space and time multiplexing to achieve the desired bandwidth partition among the nodes of the hierarchical scheduling tree. This algorithm is also characterized by its ability to incorporate existing proportional-share algorithms as auxiliary schedulers to achieve efficient hierarchical CPU partitioning. In addition, we present a generalized weight feasibility constraint that specifies the limit on the achievable CPU bandwidth partitioning in a multiprocessor hierarchical framework and propose a hierarchical weight readjustment algorithm designed to transparently satisfy this feasibility constraint. We evaluate the properties of H-SMP using hierarchical surplus fair scheduling (H-SFS), an instantiation of H-SMP that employs surplus fair scheduling (SFS) as an auxiliary algorithm. This evaluation is carried out through a simulation study that shows that H-SFS provides better fairness properties in multiprocessor environments than existing algorithms and their naive extensions.
