Similar literature
1.
In previous work, we introduced and analyzed a generalized class of m-level hierarchical multiprocessor systems [1]. The m levels of hierarchy employed by these systems allow relatively small crossbar switches to support processor-memory communication at the local, nonlocal, and global levels. The analysis showed that, for a high rate of local requests, the m-level system offers a bandwidth (BW) close to that of a crossbar system and better than that of a typical multiple-bus system (with the number of buses equal to half the number of processors). In this paper, the cost effectiveness of the m-level hierarchical multiprocessor system is evaluated in terms of a cost-related performance measure (BW/Cost). Based on an approximate cost analysis, the bandwidth-to-cost ratio of both the m-level and the crossbar multiprocessor systems is determined for a hierarchically nonuniform reference model. The m-level system is found to be more cost-effective than the crossbar system for medium- and large-scale multiprocessor systems.

2.
Performance of multiprocessor interconnection networks (cited 1 time in total: 0 self-citations, 1 citation by others)
A tutorial is provided on the performance evaluation of multiprocessor interconnection networks, to guide system designers in their design process. A classification of parallel/distributed systems is followed by a classification of multiprocessor interconnection networks. Basic terminology for performance evaluation is presented. The performance of crossbar interconnection networks, multistage interconnection networks, and multiple-bus systems is then addressed, and a comparison is made among them.

3.
Earlier performance studies of multiple-bus multiprocessor systems assume a random selection of competing requests for bus assignment and ignore the effects of realistic bus arbitration schemes on the performance of such systems. In this paper, we present a performance analysis of multiple-bus systems with different arbitration protocols. The priority protocols considered are random selection, fixed priority, rotating priority, round-robin, and FIFO. Analytical models are developed for each of these five priority protocols. Each analysis models the behavior of the corresponding priority protocol exactly, at little computational cost. The analytical models are validated through extensive simulations and are then used to carry out a performance analysis and comparison of the different priority protocols. Numerical results obtained from our models show that the round-robin protocol performs the best among the five protocols in systems with only a few buses.
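To make the difference between the arbitration policies concrete, the following is a minimal sketch of a round-robin bus arbiter. It is not taken from the paper: the function name `round_robin_grant`, the boolean request vector, and the rule that priority moves to the processor just after the last winner are all assumptions chosen for illustration.

```python
def round_robin_grant(requests, num_buses, pointer):
    """Grant up to num_buses pending requests in one arbitration cycle.

    requests  -- list of bool, requests[p] is True if processor p wants a bus
    num_buses -- number of buses available in this cycle
    pointer   -- index of the processor that currently has highest priority
    Returns (granted, new_pointer).
    """
    n = len(requests)
    granted = []
    # Scan the processors cyclically, starting at the priority pointer.
    for offset in range(n):
        p = (pointer + offset) % n
        if requests[p]:
            granted.append(p)
            if len(granted) == num_buses:
                break
    # Assumed policy: the processor after the last winner gets top priority next.
    new_pointer = (granted[-1] + 1) % n if granted else pointer
    return granted, new_pointer
```

Calling the same function with `pointer` held at 0 in every cycle gives a fixed-priority arbiter, which is one way to compare the two policies in a small simulation.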

4.
We present an efficient approach to characterizing the fault tolerance of multiprocessor systems that employ multiple shared buses for interprocessor communication. Of concern is connective fault tolerance, which is defined as the ability to maintain communication between any two fault-free processors in the presence of faulty processors, buses, or processor-bus links. We introduce a model called processor-bus-link (PBL) graphs to represent a multiple-bus system's interconnection structure. The model is more general than previously proposed models, and has the advantages of simple representation, broad application, and the ability to model partial bus failures. The PBL graph implies a set of component adjacency graphs that highlights various connectivity features of the system. Using these graphs, we propose a method for analyzing the maximum number of faults a multiple-bus system can tolerate, and for identifying every minimum set of faulty components that disconnects the processors of the system. We also analyze the connective fault tolerance of several proposed multiple-bus systems to illustrate the application of our method.

5.
We study the problem of minimizing the makespan for the precedence-constrained multiprocessor scheduling problem with hierarchical communications (Parallel Process. Lett. 10(1) (2000) 133). We propose an -approximation algorithm for the Unit Communication Time hierarchical problem with arbitrary but integer processing times and an unbounded number of biprocessor machines. We extend this result to the case where each cluster has m processors (where m is a fixed constant) by presenting a (2−2/(2m+1))-approximation algorithm.

6.
The performance of multiple-bus interconnection networks for multiprocessor systems is analyzed, taking into account conflict arising from memory and bus interference. A discrete stochastic model of bandwidth is presented for systems in which each memory is connected either to all the buses or to a subset of the available buses. The effects of the assumptions made concerning independence among requests for different memories (spatial independence) and resubmission of blocked requests (temporal independence) are investigated systematically. The basic bandwidth model is extended to account for spatial dependence, and compared to previously proposed models. Finally, the various analytic models are shown to be in close agreement with simulation results.
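As a rough illustration of what a bandwidth model of this kind computes, the sketch below uses a common textbook approximation rather than the paper's own model. It assumes uniform references, every memory connected to every bus, and both of the independence assumptions named above (busy memories are treated as independent, and blocked requests are dropped rather than resubmitted). The function names and parameters are hypothetical.

```python
from math import comb


def crossbar_bandwidth(n_proc, n_mem, rate):
    """Expected number of memories served per cycle by a full crossbar."""
    # Probability that a given memory receives at least one request.
    q = 1.0 - (1.0 - rate / n_mem) ** n_proc
    return n_mem * q


def multiple_bus_bandwidth(n_proc, n_mem, n_bus, rate):
    """Expected number of requests served per cycle when only n_bus buses exist."""
    q = 1.0 - (1.0 - rate / n_mem) ** n_proc
    total = 0.0
    for i in range(n_mem + 1):
        # Approximation: the number of requested memories is Binomial(n_mem, q).
        p_i = comb(n_mem, i) * q**i * (1.0 - q) ** (n_mem - i)
        # At most n_bus of the i requested memories can be served.
        total += min(i, n_bus) * p_i
    return total
```

When `n_bus` is at least `n_mem`, the bus limit never binds and the second function reduces to the crossbar value, which is a useful sanity check on the approximation.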

7.
A class of highly scalable interconnect topologies called the Scalable Optical Crossbar-Connected Interconnection Networks (SOCNs) is proposed. This proposed class of networks combines the use of tunable Vertical Cavity Surface Emitting Lasers (VCSELs), Wavelength Division Multiplexing (WDM), and a scalable, hierarchical network architecture to implement large-scale optical crossbar-based networks. A free-space and optical waveguide-based crossbar interconnect utilizing tunable VCSEL arrays is proposed for interconnecting processor elements within a local cluster. A similar WDM optical crossbar using optical fibers is proposed for implementing intercluster crossbar links. The combination of the two technologies produces large-scale optical fan-out switches that could be used to implement relatively low-cost, large-scale, high-bandwidth, low-latency, fully connected crossbar clusters supporting up to hundreds of processors. An extension of the crossbar network architecture is also proposed that implements a hybrid network architecture that is much more scalable. This could be used to connect thousands of processors in a multiprocessor configuration while maintaining low latency and high bandwidth. Such an architecture could be very suitable for constructing relatively inexpensive, highly scalable, high-bandwidth, and fault-tolerant interconnects for large-scale, massively parallel computer systems. This paper presents a thorough analysis of two example topologies, including a comparison of the two topologies to other popular networks. In addition, an overview of a proposed optical implementation and power budget is presented, along with an analysis of proposed media access control protocols and the corresponding optical implementation.

8.
Parallel Computing, 2007, 33(1): 2-20
In multiprocessor systems, interconnection network design is critical for overall system performance. Among the popular interconnection networks, unidirectional ring-based networks have been a popular choice for high-performance, large-scale shared-memory multiprocessor systems. In this paper, we propose the “Torus Ring”, a modified version of the two-level hierarchical ring. The Torus Ring has the same complexity as the hierarchical ring, and the only difference is the way it connects the local rings. Compared to hierarchical rings, the Torus Ring helps exploit the memory access locality of application programs more efficiently. It has an advantage over the hierarchical ring when the destination of a packet is an adjacent local ring, especially the backward adjacent local ring. If we assume that the destination of a network packet is uniformly distributed across the processing nodes, the average number of hops in the Torus Ring is equal to that of the hierarchical ring. However, the performance gain of the Torus Ring is expected to increase because of the memory access locality of application programs in a real parallel programming environment. In the simulation results, the latency of the interconnection network is reduced by up to 19% and the execution time is reduced by up to 10% at a moderate ring utilization ratio.

9.
Clustered or hierarchical interconnections have advantages when designing large-scale multiprocessor systems. Earlier studies have either focused only on flat interconnections or proposed hierarchical/clustered interconnections with limited packaging and demanded-performance constraints. Large systems require several levels of packaging. Packaging technologies impose various physical constraints on the bisection bandwidth and channel width of a system. Pinout technologies and the capacity of packaging modules have been ignored in earlier studies, often leading to configurations that are not design-feasible. Similarly, the impact of processor and interconnect technologies on demanded performance has not been considered. We propose a new supply-demand framework for multiprocessor system design by considering packaging, processor, and interconnect technologies in an integrated manner. The elegance of this framework lies in its parameterised representation of different technologies. For a given set of technological parameters, the framework derives the best configuration while considering practical design aspects such as maximum board area, maximum available pinout, fixed channel width, and scalability. In order to build a scalable parallel system with a given number of processors, the framework explores the design space of flat k-ary n-cube topologies and their clustered variations (k-ary n-cube cluster-c) to derive design-feasible configurations with the best system performance.

10.
Optimally fault-tolerant partial-connection multiple-bus networks and their fault-tolerant routing algorithms are presented in this paper. The proposed networks are scalable and provide flexibility in the choice of network parameters determining construction cost, system performance, and fault tolerance, given a fixed number of processors. In this design, when performance begins to fall due to contention, the simple addition of a bus can improve performance without adding costly processors or changing the whole topology, as required for other multiple-bus designs. Also, in situations requiring high reliability, for a fixed number of processors, excellent fault tolerance can be obtained.

11.
Computer vision is regarded as one of the most complex and computationally intensive problems. In general, a Computer Vision System (CVS) attempts to relate scene(s) in terms of model(s). A typical CVS employs algorithms from a very broad spectrum, such as numerical, image processing, graph algorithms, symbolic processing, and artificial intelligence. The authors present a multiprocessor architecture, called “NETRA,” for computer vision systems. NETRA is a highly flexible architecture. The topology of NETRA is recursively defined, and hence is easily scalable from small to large systems. It is a hierarchical architecture with a tree-type control hierarchy. Its leaf nodes consist of a cluster of processors connected with a programmable crossbar with selective broadcast capability to provide the desired flexibility. The processors in clusters can operate in SIMD-, MIMD-, or Systolic-like modes. Other features of the architecture include integration of limited data-driven computation within a primarily control-flow mechanism, block-level control and data flow, decentralization of memory management functions, and hierarchical load balancing and scheduling capabilities. The paper also presents a qualitative evaluation and preliminary performance results of a cluster of NETRA.

12.
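Using the general theory of deadlock, this paper analyzes the deadlock problems that arise among the various access cycles in a crossbar multiprocessor system and, based on the practical circumstances of the circuit design, proposes feasible solutions.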

13.
Cluster Queue Structure for Shared-Memory Multiprocessor Systems (cited 1 time in total: 0 self-citations, 1 citation by others)
Three basic structures have been proposed to organize the task queues for shared-memory multiprocessor systems: centralized, distributed, and hierarchical. Centralized structures are not suitable for massively parallel systems, since the shared queue becomes a bottleneck for frequent enqueuing and dequeuing operations. Distributed structures suffer from load imbalance because they provide no support for workload sharing between queues. Hierarchical structures are intended to combine the advantages of the previous two structures and eliminate their disadvantages. Unfortunately, we find that load imbalance still exists in the hierarchical structure and has a significant impact on system performance, particularly when the workload is heavy and irregular. After identifying the cause of this problem, we propose the use of a clustered structure in place of the hierarchical one. Analyses and simulations show that the proposed structure can provide better load balancing and less contention than the hierarchical one.

14.
Hierarchical scheduling has been proposed as a scheduling technique to achieve aggregate resource partitioning among related groups of threads and applications in uniprocessor and packet scheduling environments. Existing hierarchical schedulers are not easily extensible to multiprocessor environments because 1) they do not incorporate the inherent parallelism of a multiprocessor system when partitioning resources and 2) they can result in unbounded unfairness or starvation if applied to a multiprocessor system in a naive manner. In this paper, we present hierarchical multiprocessor scheduling (H-SMP), a novel hierarchical CPU scheduling algorithm designed for a symmetric multiprocessor (SMP) platform. The novelty of this algorithm lies in its combination of space and time multiplexing to achieve the desired bandwidth partition among the nodes of the hierarchical scheduling tree. This algorithm is also characterized by its ability to incorporate existing proportional-share algorithms as auxiliary schedulers to achieve efficient hierarchical CPU partitioning. In addition, we present a generalized weight feasibility constraint that specifies the limit on the achievable CPU bandwidth partitioning in a multiprocessor hierarchical framework and propose a hierarchical weight readjustment algorithm designed to transparently satisfy this feasibility constraint. We evaluate the properties of H-SMP using hierarchical surplus fair scheduling (H-SFS), an instantiation of H-SMP that employs surplus fair scheduling (SFS) as an auxiliary algorithm. This evaluation is carried out through a simulation study that shows that H-SFS provides better fairness properties in multiprocessor environments as compared to existing algorithms and their naive extensions.
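The idea of a weight feasibility constraint can be illustrated for the simpler flat (single-level) case: a single thread can occupy at most one CPU, so on p CPUs its share of the total weight cannot usefully exceed 1/p. The sketch below caps infeasible weights accordingly. It is an assumption-laden illustration, not the hierarchical readjustment algorithm of the paper; the name `readjust_weights` is hypothetical, and the code assumes positive weights and at least as many threads as CPUs.

```python
def readjust_weights(weights, num_cpus):
    """Cap weights so that no thread's share of the total exceeds 1/num_cpus.

    Assumes all weights are positive and len(weights) >= num_cpus.
    Returns the adjusted weights in the original thread order.
    """
    # Work on the weights in descending order, remembering original positions.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    w = [float(weights[i]) for i in order]

    # Count how many of the heaviest threads demand more than one CPU's worth
    # of the bandwidth left for them and all lighter threads.
    k = 0
    while num_cpus - k > 1 and w[k] * (num_cpus - k) > sum(w[k:]):
        k += 1

    # Shrink the infeasible weights, lightest first, so that each ends up with
    # exactly a 1/(remaining CPUs) share of the remaining total weight.
    for j in range(k - 1, -1, -1):
        w[j] = sum(w[j + 1:]) / (num_cpus - j - 1)

    adjusted = [0.0] * len(weights)
    for pos, original_index in enumerate(order):
        adjusted[original_index] = w[pos]
    return adjusted
```

For example, with weights 10, 1, 1 on two CPUs, the heaviest thread would be entitled to 10/12 of the machine, which it cannot consume; after readjustment its weight becomes 2, giving it exactly half of the total.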

15.
The development of database systems with hierarchical hardware architecture is currently a promising trend in the field of parallel database machines. Hierarchical architectures have been suggested with the aim of combining the advantages of shared-nothing architectures and architectures with shared memory and disks. A commonly accepted way of constructing hierarchical systems is to combine shared-memory (shared-everything) clusters into a single system without shared resources. However, such architectures cannot ensure data accessibility under hardware failures at the processor cluster level, which limits their use in systems with high fault-tolerance requirements. In this paper, an alternative approach to the construction of hierarchical systems is suggested. In accordance with this approach, the system is constructed as an assembly of processor clusters with shared disks, with each cluster being a two-level multiprocessor structure with a standard strongly connected topology of interprocessor connections. A stream model for the organization of parallel query processing in systems with the suggested hierarchical architecture is described. This model has been implemented in a prototype parallel database management system, Omega, designed for the Russian multiprocessor computational systems MBC-100/1000. Our experiments show that the total performance of the processor clusters in the Omega system is comparable with that of processor clusters with shared resources, even in the case of great data skew. At the same time, the clusters of the Omega system are capable of ensuring a higher degree of data availability compared to clusters with shared-memory architectures.

16.
17.
A large-scale, cache-based multiprocessor that is interconnected by a hierarchical network such as hierarchical buses or a multistage interconnection network (MIN) is considered. An adaptive cache coherence scheme for the system is proposed based on a hardware approach that handles multiple shared reads efficiently. The new protocol allows multiple copies of a shared data block in the hierarchical network, but minimizes the cache coherence overhead by dynamically partitioning the network into sharing and nonsharing regions based on program behavior. The new cache coherence scheme effectively utilizes the bandwidth of the hierarchical networks and exploits the locality properties of parallel algorithms. Simulation experiments have been carried out to analyze the performance of the new protocol. The simulation results show that the new protocol gives a 15% to 30% performance improvement over some existing cache coherence schemes on similar systems for a wide range of workload parameters.

18.
We introduce the concept of hierarchical clustering as a way to structure shared-memory multiprocessor operating systems for scalability. The concept is based on clustering and hierarchical system design. Hierarchical clustering leads to a modular system, composed of easy-to-design and efficient building blocks. The resulting structure is scalable because it 1) maximizes locality, which is key to good performance in NUMA (non-uniform memory access) systems and 2) provides for concurrency that increases linearly with the number of processors. At the same time, there is tight coupling within a cluster, so the system performs well for local interactions that are expected to constitute the common case. A clustered system can easily be adapted to different hardware configurations and architectures by changing the size of the clusters. We show how this structuring technique is applied to the design of a microkernel-based operating system called Hurricane. This prototype system is the first complete and running implementation of its kind and demonstrates the feasibility of a hierarchically clustered system. We present performance results based on the prototype, demonstrating the characteristics and behavior of a clustered system. In particular, we show how clustering trades off the efficiencies of tight coupling for the advantages of replication, increased locality, and decreased lock contention.

19.
The paper addresses issues concerning the simulation and analysis of hierarchical multiprocessor systems oriented to database applications. Requirements for a parallel database system model are given. A survey and comparative analysis of known parallel database system models are presented. A new multiprocessor database system model is introduced. This model allows us to simulate and evaluate arbitrary hierarchical multiprocessor configurations in the context of OLTP-class database applications. Examples of using the database multiprocessor model for the simulation study of multiprocessor database systems are presented.

20.
This paper introduces a new closed-form solution for the reliability of large-scale multiprocessor systems. The systems are based on SCI rings interconnected in hierarchical structures. Reliability expressions are derived using an enumeration technique, assuming a Weibull failure process. The reliability function derived in this paper is general and valid for any hierarchical ring-based system with an arbitrary number of levels. The hierarchical interconnections are constructed from self-healing rings and basic rings. The analysis shows the improvement achieved in reliability when self-healing rings are used. Although we used hierarchical systems based on SCI rings, the technique followed in this work is applicable to any type of ring, such as slotted or token rings.
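The flavor of such a reliability calculation can be shown with a small sketch. The Weibull reliability function below is standard; the two ring models are simplifying assumptions made here for illustration only (a basic ring fails if any node fails, and a self-healing ring survives exactly one node failure, with independent and identical nodes). They are not the closed-form expressions derived in the paper, and all function names are hypothetical.

```python
from math import exp


def weibull_reliability(t, scale, shape):
    """Probability that a component with a Weibull lifetime survives to time t."""
    return exp(-((t / scale) ** shape))


def basic_ring_reliability(n_nodes, r):
    """Assumed model: the ring works only if all n_nodes nodes work."""
    return r ** n_nodes


def self_healing_ring_reliability(n_nodes, r):
    """Assumed model: the ring works if at most one node has failed."""
    all_up = r ** n_nodes
    exactly_one_down = n_nodes * r ** (n_nodes - 1) * (1.0 - r)
    return all_up + exactly_one_down
```

A typical use is to compute the node reliability `r = weibull_reliability(t, scale, shape)` at a mission time `t` and then compare the two ring models for the same ring size.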
