
1.

Performance modeling and analysis of heterogeneous meta-computing systems interconnection networks

Bahman Javadi Mohammad K. Akbari 《Computers & Electrical Engineering》2008,34(6):488-502

The overall performance of a distributed system often depends on the effectiveness of its interconnection network. Thus, the study of the communication networks for distributed systems is very important, which is the focus of this paper. In particular, we address the problem of interconnection network performance modeling for heterogeneous meta-computing systems. We consider the meta-computing system as a typical multi-cluster system. Since heterogeneity is becoming common in such systems, we take into account network as well as cluster size heterogeneity to propose the model. To this end, we present an analytical network model and validate the model through comprehensive simulation. The results of the simulation demonstrated that the proposed model exhibits a good degree of accuracy for various system organizations and under different working conditions.

2.

Analytical modeling of interconnection networks in heterogeneous multi-cluster systems

Bahman Javadi Jemal H. Abawajy Mohammad K. Akbari 《The Journal of supercomputing》2007,40(1):29-47

The study of interconnection networks is important because the overall performance of a distributed system often hinges critically on the effectiveness of its interconnection network. This paper addresses the problem of interconnection network performance modeling of large-scale distributed systems with emphasis on heterogeneous multi-cluster computing systems. We present an analytical model to predict message latency in multi-cluster systems in the presence of node, network and system organization heterogeneity. The model is validated through comprehensive simulation, which demonstrates that the proposed model exhibits a good degree of accuracy for various system organizations and under different working conditions.

3.

TLA: Temporal look-ahead processor allocation method for heterogeneous multi-cluster systems

Po-Chi Shih Kuo-Chan Huang Che-Rung Lee I-Hsin Chung Yeh-Ching Chung 《Journal of Parallel and Distributed Computing》2013

In a heterogeneous multi-cluster (HMC) system, processor allocation is responsible for choosing available processors among clusters for job execution. Traditionally, processor allocation in HMC considers only resource fragmentation or processor heterogeneity, which leads to heuristics such as Best-Fit (BF) and Fastest-First (FF). However, those heuristics only favor certain types of workloads and cannot be changed adaptively. In this paper, a temporal look-ahead (TLA) method is proposed, which uses an allocation simulation process to guide the decision of processor allocation. Thus, the allocation decision is made dynamically according to the current workload and system configurations. We evaluate the performance of TLA by simulations, with different workloads and system configurations, in terms of average turnaround time. Simulation results indicate that, with precise runtime information, TLA outperforms traditional processor allocation methods and has up to an 87% performance improvement.

4.

Design and performance of speculative flow control for high-radix datacenter interconnect switches

Cyriel Minkenberg Mitchell Gusat 《Journal of Parallel and Distributed Computing》2009

High-radix switches are desirable building blocks for large computer interconnection networks, because they are more suitable to convert chip I/O bandwidth into low latency and low cost than low-radix switches [J. Kim, W.J. Dally, B. Towles, A.K. Gupta, Microarchitecture of a high-radix router, in: Proc. ISCA 2005, Madison, WI, 2005]. Unfortunately, most existing switch architectures do not scale well to a large number of ports; for example, the complexity of the buffered crossbar architecture scales quadratically with the number of ports. Compounded with support for long round-trip times and many virtual channels, the overall buffer requirements limit the feasibility of such switches to modest port counts. Compromising on the buffer sizing leads to a drastic increase in latency and reduction in throughput, as long as traditional credit flow control is employed at the link level. We propose a novel link-level flow control protocol that enables high-performance scalable switches based on the increasingly popular buffered crossbar architecture to scale to higher port counts without sacrificing performance. By combining credited and speculative transmission, this scheme achieves reliable delivery, low latency, and high throughput, even with crosspoint buffers that are significantly smaller than the round-trip time. The proposed scheme substantially reduces message latency and improves throughput of partially buffered crossbar switches loaded with synthetic uniform and non-uniform bursty traffic. Moreover, simulations replaying traces of several typical MPI applications demonstrate communication speedup factors of 2 to 10 times.

5.

A new performance measure for characterizing fault rings in interconnection networks

F. Safaei A. Khonsari M.M. Gilak 《Information Sciences》2010,180(5):664-678

One of the fundamental issues in parallel computers is how to efficiently perform routing in a faulty network where each component fails with some probability. Adaptive fault-tolerant routing algorithms in such systems have been frequently suggested as a means of providing continuous operations in the presence of one or more failures by allowing graceful system degradation. Many algorithms involve adding buffer space and complex control logic to the routing nodes. However, the addition of extra logic circuits and buffer space makes nodes more liable to failure and less reliable. Further, if the shape of the fault pattern is confined, then many non-faulty nodes will be sacrificed and hence their resources are wasted. This is clearly an undesirable solution and motivates solutions that promote efficient use of non-faulty nodes. One such approach to reducing the number of functional nodes that must be marked as faulty is based on the concept of fault rings to support more flexible routing around rectangular fault regions. Before such schemes can be successfully incorporated in networks, it is necessary to have a clear understanding of the factors that affect their performance potential. In this paper, we propose the first general solution for computing the probability of a message facing fault rings, with and without overlapping, in the well-known torus networks. We also conduct extensive simulation experiments using various fault patterns, the results of which are used to confirm the good accuracy of the proposed analytical models.

6.

An accurate performance model for network-on-chip and multicomputer interconnection networks

Slavko Gajin Zoran Jovanovic 《Journal of Parallel and Distributed Computing》2012

In this paper, we present a mathematical background for a new approach to performance modeling of interconnection networks, based on analyzing the packet blocking and waiting time spent in each channel passing through all possible paths in the channel dependency graph. We have proposed a new, simple and very accurate analytical model for deterministic routing in wormhole networks, which is general in terms of the network topology and traffic distribution. An accurate calculation of the variance of the service time has been developed, which overcomes the rough approximation used, as a rule, in the existing models. The model supports two-dimensional mesh topologies, widely used in network-on-chip architectures, and multidimensional topologies, popular in multicomputer architectures. It is applicable even for irregular topologies and arbitrary application-specific traffic. Results obtained through simulation show that the model achieves a high degree of accuracy.

7.

Modeling parallel and distributed systems with finite workloads

Ahmed M. Lester Reda 《Performance Evaluation》2005,60(1-4):303-325

In studying or designing parallel and distributed systems one should have available a robust analytical model that includes the major parameters that determine the system performance. Jackson networks have been very successful in modeling computer systems. However, the ability of Jackson networks to predict performance with system changes remains an open question, since they do not apply to systems where there are population size constraints. Also, the product-form solution of Jackson networks assumes steady-state and exponential service centers or certain specialized queueing disciplines. In this paper, we present a transient model for Jackson networks that is applicable to any population size and any finite workload (no new arrivals). Using several non-exponential distributions, we show to what extent the exponential distribution can be used to approximate other distributions and transient systems with finite workloads. When the number of tasks to be executed is large enough, the model approaches the product-form solution (steady-state solution). We also study the case where the non-exponential servers have queueing (Jackson networks cannot be applied). Finally, we show how to use the model to analyze the performance of parallel and distributed systems.

8.

Performance analysis of BitTorrent-like systems with heterogeneous users (total citations: 1; self-citations: 0; citations by others: 1)

Wei-Cherng Fragkiskos Konstantinos 《Performance Evaluation》2007,64(9-12):876-891

Among all peer-to-peer (P2P) systems, BitTorrent seems to be the most prevalent one. This success has drawn a great deal of research interest on the system. In particular, there have been many lines of research studying its scalability, performance, efficiency, and fairness. However, despite the large body of work, there has been no attempt mathematically to model, in a heterogeneous (and hence realistic) environment, what is perhaps the most important performance metric from an end user’s point of view: the average file download delay.

In this paper we propose a mathematical model that accurately predicts the average file download delay in a heterogeneous BitTorrent-like system. Our model is quite general, has been derived with minimal assumptions, and requires minimal system information. Then, we propose a flexible token-based scheme for BitTorrent-like systems that can be used to trade off between overall system performance and fairness to high bandwidth users, by properly setting its parameters. We extend our mathematical model to predict the average file download delays in the token-based system, and demonstrate how this model can be used to decide on the scheme’s parameters that achieve a target performance/fairness.

9.

A new approach to model virtual channels in interconnection networks

N. Alzeidi A. Khonsari 《Journal of Computer and System Sciences》2007,73(8):1121-1130

Dealing with virtual channels has always been a critical issue in developing analytical performance models for interconnection networks. Almost all previous studies relied on a method proposed by Dally to capture the effect of virtual channel multiplexing on the performance of interconnection networks. This paper presents a new method to model the effect of virtual channel multiplexing in high-speed wormhole-switched interconnection networks. Dally's method loses its accuracy as the traffic load increases due to the blocking nature of wormhole-switched networks. Our new method is based on a finite capacity queue, M/G/1/V, and, compared with Dally's method, achieves a higher degree of accuracy under low, moderate and high traffic loads. Furthermore, its simplicity eases its employment under different network conditions and setups. The presented model is validated by means of an event-driven simulator, and a detailed comparison with Dally's method is presented.

10.

SimuRed: A flit-level event-driven simulator for multicomputer network performance evaluation

Fernando Pardo Jose A. Boluda 《Computers & Electrical Engineering》2009,35(5):803-814

The interconnection network is one of the most important multicomputer components, since it has a great impact on global system performance. Many models and simulators have been proposed to evaluate network performance. This paper presents SimuRed, an event-driven flit-level, cycle-accurate simulator to evaluate different orthogonal network configurations. The core of the simulator has been designed to be expandable and portable to different situations. Some of the advantages of this simulator over other similar tools are its visual interface, its fast execution and its simplicity. Moreover, it is multiplatform and its source code versions (C++ and Java) are freely available under GNU open-source license. The performance of this simulator has been evaluated, including a performance impact study of injection channels and deterministic/adaptive routing for meshes and hypercubes.

11.

A comprehensive analytical model of interconnection networks in large‐scale cluster systems

Bahman Javadi Jemal H. Abawajy Mohammad K. Akbari 《Concurrency and Computation》2008,20(1):75-97

The trends in parallel processing system design and deployment have been toward networked distributed systems such as cluster computing systems. Since the overall performance of such distributed systems often depends on the efficiency of their communication networks, performance analysis of the interconnection networks for such distributed systems is paramount. In this paper, we develop an analytical model, under non‐uniform traffic and in the presence of communication locality, for the m‐port n‐tree family of interconnection networks commonly employed in large‐scale cluster computing systems. We use the proposed model to study two widely used interconnection network flow control mechanisms, namely wormhole and store&forward. The proposed analytical model is validated through comprehensive simulation. The results of the simulation demonstrated that the proposed model exhibits a good degree of accuracy for various system organizations and under different working conditions. Copyright © 2007 John Wiley & Sons, Ltd.

12.

A performance comparison of current HPC systems: Blue Gene/Q, Cray XE6 and InfiniBand systems

《Future Generation Computer Systems》2014

We present here a performance analysis of three current architectures that have become commonplace in the High Performance Computing world. Blue Gene/Q is the third generation of systems from IBM that use modestly performing cores but at large scale in order to achieve high performance. The XE6 is the latest in a long line of Cray systems that use a 3-D topology but the first to use its Gemini interconnection network. InfiniBand provides the flexibility of using compute nodes from many vendors that can be connected in many possible topologies. The performance characteristics of each vary vastly, and the way in which nodes are allocated in each type of system can significantly impact achieved performance. In this work we compare these three systems using a combination of micro-benchmarks and a set of production applications. In addition, we examine the differences in performance variability observed on each system and quantify the lost performance using a combination of both empirical measurements and performance models. Our results show that significant performance can be lost in normal production operation of the Cray XE6 and InfiniBand clusters in comparison to Blue Gene/Q.

13.

Performance analysis of opportunistic broadcast for delay-tolerant wireless sensor networks

Abbas Nayebi Hamid Sarbazi-Azad Gunnar Karlsson 《Journal of Systems and Software》2010,83(8):1310-1317

This paper investigates a class of mobile wireless sensor networks that are unconnected most of the time; we refer to them as delay-tolerant wireless sensor networks (DTWSN). These networks inherit their characteristics from both delay-tolerant networks (DTN) and traditional wireless sensor networks. After introducing DTWSNs, three main problems in the design space of these networks are discussed: routing, data gathering, and neighbor discovery. A general protocol is proposed for DTWSNs based on opportunistic broadcasting in delay-tolerant networks with radio device on-off periods. Three performance measures are defined in the study: the energy for sending queries to ask for data from possible neighbors (querying energy), data transfer energy, and absorption time (delay). A simple yet accurate approximation for the data-transfer energy is proposed. An analytic model is provided to evaluate the querying energy per contact (epc). Simulation results for the data propagation delay show that the querying energy per contact measure obtained from the analytic model is proportional to the product of the querying energy and the delay. A practical rule of thumb for an optimal query interval in terms of delay and energy is derived from different parts of the study.

14.

Introducing probabilities in Statecharts to specify reactive systems for performance analysis

N.L. Vijaykumar S.V. Carvalho V.M.B. Andrade V. Abdurahiman 《Computers & Operations Research》2006

Statecharts are expressed in a graphical language to specify complex reactive systems. They are an extension of state-transition diagrams to which notions of hierarchy and orthogonality have been added. Recently, they have been suggested to represent performance models and in this regard a software package has been developed. In these performance models, the behavior of a system under study is considered to be probabilistic. Therefore, the inclusion of probabilities in the Statecharts formalism will be studied. The proposed extension considers that a modeled system reacts probabilistically to events. In order to deal with these models, an analytical computational method based on constructing a Continuous-Time Markov Chain that is equivalent to the Statecharts model is proposed. The aspect of generating a Continuous-Time Markov Chain from a Statecharts representation along with the solution to include probabilities among the transitions will be covered in this paper.

15.

Efficient and scalable scheduling for performance heterogeneous multicore systems

Pengcheng Nie Zhenhua Duan 《Journal of Parallel and Distributed Computing》2012

Performance heterogeneous multicore processors (HMP for brevity) consisting of multiple cores with the same instruction set but different performance characteristics (e.g., clock speed, issue width), are of great concern since they are able to deliver higher performance per watt and area for programs with diverse architectural requirements than comparable homogeneous ones. However, such power and area efficiencies of performance heterogeneous multicore systems can only be achieved when workloads are matched with cores according to both the properties of the workload and the features of the cores.

16.

A framework for modular analysis and exploration of heterogeneous embedded systems

Arne Hamann Marek Jersak Kai Richter Rolf Ernst 《Real-Time Systems》2006,33(1-3):101-137

The increasing complexity of heterogeneous systems-on-chip, SoC, and distributed embedded systems makes system optimization and exploration a challenging task. Ideally, a designer would try all possible system configurations and choose the best one regarding specific system requirements. Unfortunately, such an approach is not possible because of the tremendous number of design parameters with sophisticated effects on system properties. Consequently, good search techniques are needed to find design alternatives that best meet constraints and cost criteria. In this paper, we present a compositional design space exploration framework for system optimization and exploration using SymTA/S, a software tool for formal performance analysis. In contrast to many previous approaches pursuing closed automated exploration strategies over large sets of system parameters, our approach allows the designer to effectively control the exploration process to quickly find good design alternatives. An important aspect and key novelty of our approach is system optimization with traffic shaping.

17.

Performance modeling of Cartesian product networks (total citations: 1; self-citations: 0; citations by others: 1)

Reza Moraveji Hamid Sarbazi-Azad Albert Y. Zomaya 《Journal of Parallel and Distributed Computing》2011,71(1):105-113

This paper presents a comprehensive performance model for fully adaptive routing in wormhole-switched Cartesian product networks. Besides the generality of the model, which makes it suitable for any product graph, experimental (simulation) results show that the proposed model exhibits high accuracy even in heavy traffic and the saturation region, where other models have severe problems predicting the performance of the network. Most popular interconnection networks can be defined as a Cartesian product of two or more networks, including the mesh, hypercube, and torus networks. Torus and mesh networks are the most popular topologies used in recent supercomputing parallel machines. They have been widely used for realizing on-chip networks in recent on-chip multicore and multiprocessor systems.
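
The statement that meshes, tori, and hypercubes are Cartesian products of simpler networks can be illustrated with a minimal Python sketch. It is not taken from the paper: the helper names (`cycle`, `path`, `cartesian_product`, `edge_count`) and the example sizes are assumptions chosen for illustration, and only the standard graph-theoretic definition of the Cartesian product is used.

```python
from itertools import product

def cycle(n):
    """Adjacency sets of an n-node ring (n >= 3), nodes 0..n-1. Hypothetical helper."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def path(n):
    """Adjacency sets of an n-node linear array, nodes 0..n-1. Hypothetical helper."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

def cartesian_product(g, h):
    """Cartesian product of two graphs given as adjacency-set dicts.

    (u, v) is adjacent to (u2, v2) iff u == u2 and v ~ v2 in h,
    or v == v2 and u ~ u2 in g.
    """
    adj = {}
    for u, v in product(g, h):
        adj[(u, v)] = {(u2, v) for u2 in g[u]} | {(u, v2) for v2 in h[v]}
    return adj

def edge_count(adj):
    """Number of undirected edges (each edge is stored twice in the adjacency sets)."""
    return sum(len(neighbours) for neighbours in adj.values()) // 2

# 3 x 4 torus = ring(3) x ring(4); 3 x 4 mesh = path(3) x path(4);
# 3-dimensional hypercube = path(2) x path(2) x path(2).
torus = cartesian_product(cycle(3), cycle(4))
mesh = cartesian_product(path(3), path(4))
hypercube = cartesian_product(cartesian_product(path(2), path(2)), path(2))
```

In this sketch every torus node ends up with degree 4 and every hypercube node with degree 3, which matches the usual definitions of those topologies.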

18.

InferFair: Towards QoS-aware scheduling for performance isolation guarantee in heterogeneous model serving systems

《Future Generation Computer Systems》2024

19.

An accurate mathematical performance model of adaptive routing in the star graph

A.E. H. M. 《Future Generation Computer Systems》2008,24(6):461-474

Analytical modelling is indeed the most cost-effective method to evaluate the performance of a system. Several analytical models have been proposed in the literature for different interconnection network systems. This paper proposes an accurate analytical model to predict message latency in wormhole-switched star graphs with fully adaptive routing. Although the focus of this research is on the star graph, the approach can also be used for modelling some other regular and irregular interconnection networks. The results obtained from simulation experiments confirm that the proposed model exhibits good accuracy for various network sizes and under different operating conditions.

20.

A data-based approach for multivariate model predictive control performance monitoring (total citations: 2; self-citations: 0; citations by others: 2)

Xuemin Tian Sheng Chen 《Neurocomputing》2011,74(4):588-597

An intelligent statistical approach is proposed for monitoring the performance of a multivariate model predictive control (MPC) controller, which systematically integrates both the assessment and diagnosis procedures. The model prediction error is included in the monitored variable set and a 2-norm based covariance benchmark is presented. By comparing the data of a monitored operational period with the “golden” user-predefined one, this method can properly evaluate the performance of an MPC controller at the monitored operational stage. Characteristic direction information is mined from the operating data and the corresponding classes are built. The eigenvector angle is defined to describe the similarity between the current data set and the established classes, and an angle-based classifier is introduced to identify the root cause of MPC performance degradation when poor performance is detected. The effectiveness of the proposed methodology is demonstrated in a case study of the Wood-Berry distillation column system.