
1.

Methodology for predicting performance of distributed and parallel systems

Rakesh Kushwaha 《Performance Evaluation》1993,18(3):189-204

This paper describes an accurate and efficient method to model and predict the performance of distributed/parallel systems. Various performance measures, such as the expected user response time, the system throughput and the average server utilization, can be easily estimated using this method. The methodology is based on known product form queueing network methods, with some additional approximations. The method is illustrated by evaluating performance of a multi-client multi-server distributed system. A system model is constructed and mapped to a probabilistic queueing network model which is used to predict its behavior. The effects of user think time and various design parameters on the performance of the system are investigated by both the analytical method and computer simulation. The accuracy of the former is verified. The methodology is applied to identify the bottleneck server and to establish proper balance between clients and servers in distributed/parallel systems.
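The abstract above does not spell out its approximations, so the following is only a minimal sketch of the standard exact Mean Value Analysis (MVA) recursion for a closed, single-class product-form network with user think time. It is not the paper's own method. The names `mva`, `service_demands`, `think_time` and `n_clients` are hypothetical, chosen for illustration.

```python
def mva(service_demands, think_time, n_clients):
    """Exact MVA for a closed, single-class product-form queueing network.

    service_demands: total service time a job needs at each server (assumed input)
    think_time:      mean user think time between requests (assumed input)
    n_clients:       number of clients circulating in the network
    Returns (response_time, throughput, per-server utilization).
    """
    queue = [0.0] * len(service_demands)  # mean queue lengths with 0 clients
    response = 0.0
    throughput = 0.0
    for n in range(1, n_clients + 1):
        # Arrival theorem: an arriving job sees the queues of a network
        # that holds n - 1 jobs.
        residence = [d * (1.0 + q) for d, q in zip(service_demands, queue)]
        response = sum(residence)
        # Little's law applied to the whole cycle (think time + response).
        throughput = n / (think_time + response)
        # Little's law applied to each server.
        queue = [throughput * r for r in residence]
    utilization = [throughput * d for d in service_demands]
    return response, throughput, utilization


if __name__ == "__main__":
    r, x, u = mva([0.2, 0.1], think_time=1.0, n_clients=20)
    bottleneck = max(range(len(u)), key=lambda i: u[i])
    print(f"response={r:.3f} throughput={x:.3f} bottleneck=server {bottleneck}")
```

The server with the highest utilization is the bottleneck, and rerunning the function for different client counts gives a rough view of the client/server balance the abstract mentions.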

2.

Performance of symbolic applications on a parallel architecture

Adolfo Guzman Edward J. Krall Patrick F. McGehearty Nader Bagherzadeh 《International journal of parallel programming》1987,16(3):183-214

The results of a study of a family of parallel symbolic architectures executing several parallel applications are presented. The class of architectures being simulated is characterized by a shared memory structure, by a hierarchical interconnect, and by clustered processors. Speedup measurements were obtained from six different application kernels. Measurements were also performed to assess the degradation of speedup as a function of the interconnection delays, and to study the effect of different scheduling algorithms. The results presented support the claim that the proposed architecture would be a powerful parallel symbolic computation system. The paper discusses processor starvation, fine grain parallelism, uneven loads, foreign reference, schedule and indeterminate computation with respect to the applications chosen. This work was completed within the Advanced Computer Architecture Program, Micro-electronics and Technology Computer Corporation, Austin, Texas.

3.

Performance evaluation methodology for massively parallel computer systems

LIU Jie CHI Lihua JIANG Jie XU Han YAN Yihui HU Qingfeng 《计算机工程与科学》2013,35(3):25


4.

A massively parallel fault-tolerant architecture for time-critical computing

Ishfaq Ahmad 《The Journal of supercomputing》1995,9(1-2):135-162

Building large-scale parallel computer systems for time-critical applications is a challenging task since the designers of such systems need to consider a number of related factors such as proper support for fault tolerance, efficient task allocation and reallocation strategies, and scalability. In this paper we propose a massively parallel fault-tolerant architecture using hundreds or thousands of processors for critical applications with timing constraints. The proposed architecture is based on an interconnection network called the bisectional network. A bisectional network is isomorphic to a hypercube in that a binary hypercube network can be easily extended as a bisectional network by adding additional links. These additional links add to the network some rich topological properties such as node symmetry, small diameter, small internode distance, and partitionability. The important property of partitioning is exploited to propose a redundant task allocation and a task redistribution strategy under real-time constraints. The system is partitioned into symmetric regions (spheres) such that each sphere has a central control point. The central points, called fault control points (FCPs), are distributed throughout the entire system in an optimal fashion and provide two-level task redundancy and efficiently redistribute the loads of failed nodes. FCPs are assigned to the processing nodes such that each node is assigned two types of FCPs for storing two redundant copies of every task present at the node. Similarly, the number of nodes assigned to each FCP is the same. For a failure-repair system environment the performance of the proposed system has been evaluated and compared with a hypercube-based system. Simulation results indicate that the proposed system can yield improved performance in the presence of a high number of node failures.

5.

Chiron parallel program performance visualization system

Hendrik A. Goosen Anna R. Karlin David Cheriton Dieter Polzin 《Computer aided design》1994,26(12):899-906

Chiron is a prototype visualization system for displaying the memory system performance of shared memory multiprocessor applications. The system uses 3D graphics techniques to display large amounts of both code-oriented and data-oriented information. Chiron is designed to isolate problems such as low cache block utilization, improper layout of data in memory resulting in excessive replacement interference, and improper partitioning of work among the processors resulting in excessive coherence interference. A 3D interactive user interface provides the user with flexibility in displaying the data and facilitates the job of focusing in on memory bottlenecks. The paper describes the design and implementation of Chiron, and illustrates its use.

6.

Research on the scalability of parallel computing systems


祝永志 李丙峰 孙婷婷 李佩 《计算机工程与应用》2011,47(21):47-49

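Scalability is an important performance metric to consider when designing parallel computing systems and parallel algorithms. The paper analyzes the characteristics of several scalability models for parallel systems, namely isoefficiency, isospeed, average latency, and iso-parallel-computing-overhead ratio, and proposes a new, more effective scalability metric. Analysis of experimental results shows that the model evaluates the scalability of parallel computing systems well.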

7.

Using redundant parallel architecture to improve speaker recognition performance

Zhengquan QIU Junxun YIN Caiyun FAN 《控制理论与应用》2008,6(2):221-223

In this paper, we propose two kinds of modifications in speaker recognition. First, the correlations between frequency channels are of prime importance for speaker recognition. Some of these correlations are lost when the frequency domain is divided into sub-bands. Consequently we propose a particularly redundant parallel architecture for which most of the correlations are kept. Second, generally a log transformation used to modify the power spectrum is done after the filter-bank in the classical spectrum calculation. We will see that performing this transformation before the filter bank is more interesting in our case. In the processing of recognition, the Gaussian mixture model (GMM) recognition arithmetic is adopted. Experiments on speech corrupted by noise show a better adaptability of this approach in noisy environments, compared with a conventional device, especially when pruning of some recognizers is performed.

8.

Application of a process-oriented testing method to large-scale data-intensive systems

刘莹 宋怀明 焦丽梅 《计算机应用》2006,26(6):1452-1455

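A process-oriented testing method is proposed for large-scale data-intensive systems. Based on the different roles of the test nodes, it combines workload characteristics with resource utilization to analyze a large-scale system comprehensively. This both reduces the complexity of testing large-scale systems and largely hides the diversity of upper-layer applications. The method has been applied in the testing of several large-scale systems, where it exposed problems in system design and system equipment in a timely manner, with good results.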

9.

Scheduling parallel iterative methods on multiprocessor systems

Nikolaos M. Missirlis 《Parallel Computing》1987,5(3):295-302

The paper describes the implementation of the Successive Overrelaxation (SOR) method on an asynchronous multiprocessor computer for solving large linear systems. The parallel algorithm is derived by dividing the serial SOR method into noninterfering tasks, which are then combined with an optimal schedule of a feasible number of processors. The important features of the algorithm are that it (i) achieves a speedup S_p of O(N/3) and an efficiency E_p of 2/3 using P = [N/2] processors, where N is the number of equations, (ii) contains a high level of inherent parallelism, while the convergence theory of the parallel SOR method is the same as that of its sequential counterpart, and (iii) may be modified to use block methods in order to minimise the overhead due to communication and synchronisation of the processors.
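For orientation, here is a minimal sketch of the serial SOR iteration that the abstract takes as its starting point. The paper's task decomposition and processor schedule are not given in the abstract and are not reproduced. The function name `sor`, the relaxation factor `omega=1.25`, and the stopping rule are assumptions made for illustration.

```python
def sor(a, b, omega=1.25, tol=1e-10, max_iter=10_000):
    """Solve a x = b with serial Successive Overrelaxation.

    a is a dense square matrix given as a list of rows, b the right-hand side.
    omega is the relaxation factor (0 < omega < 2); the default is an
    arbitrary illustrative choice, not a value taken from the paper.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            # Uses already-updated x[j] for j < i, as in Gauss-Seidel.
            sigma = sum(a[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / a[i][i]
            # Blend the old value with the Gauss-Seidel update.
            new_value = (1.0 - omega) * x[i] + omega * gauss_seidel
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            return x
    raise RuntimeError("SOR did not converge within max_iter sweeps")


if __name__ == "__main__":
    a = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
    b = [2.0, 4.0, 10.0]
    print(sor(a, b))
```

Each update of `x[i]` depends on the freshly updated values of earlier unknowns. That data dependence is what a parallel version has to break into noninterfering tasks.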

10.

Using redundant parallel architecture to improve speaker recognition performance

Zhengquan QIU Junxun YIN Caiyun FAN 《控制理论与应用(英文版)》2008,6(2):221-223

In this paper, we propose two kinds of modifications in speaker recognition. First, the correlations between frequency channels are of prime importance for speaker recognition. Some of these correlations are lost when the frequency domain is divided into sub-bands. Consequently we propose a particularly redundant parallel architecture for which most of the correlations are kept. Second, generally a log transformation used to modify the power spectrum is done after the filter-bank in the classical spectrum calculation. We will see that performing this transformation before the filter bank is more interesting in our case. In the processing of recognition, the Gaussian mixture model (GMM) recognition arithmetic is adopted. Experiments on speech corrupted by noise show a better adaptability of this approach in noisy environments, compared with a conventional device, especially when pruning of some recognizers is performed.

11.

Designing communication strategies for heterogeneous parallel systems

Ravi Prakash Dhabaleswar K. Panda 《Parallel Computing》1998,24(14):2035-2052

This paper explores the suitability of the emerging passive star-coupled optical interconnection using wavelength division multiplexing as the system interconnect to provide high bandwidth (Gbits/sec) communication demanded by heterogeneous systems. Several different communication strategies (combinations of communication topologies and protocols) are investigated under a representative master-slave computational model. The interplay between system speed, network speed, task granularity, and degree of parallelism is studied using both analytical modeling and simulations. It is shown that a hierarchical ALOHA-based communication strategy between the master and the slaves, implemented on top of the passive star-coupled network, leads to a considerable reduction in channel contention and provides 50–80% reduction in task completion time for applications with medium to high degrees of coarse grain parallelism. Comparable reduction in channel contention is also shown to be achieved by using tunable acoustooptic filters at master nodes.

12.

Strategies for improving the performance of PVM parallel programs in a networked microcomputer environment

尚月强《计算机工程与设计》2007,28(13):3100-3102,3129

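Network parallel computing is one of the most important development directions for parallel and distributed computing technology. Drawing on concrete numerical experiments, the paper examines several important factors that affect the performance of PVM parallel programs in PVM-based network parallel numerical computation under the Windows operating system, including load balancing, communication overhead, network performance, task granularity, number of processors, accuracy requirements, and processor memory capacity. It proposes corresponding strategies for improving the performance of PVM parallel programs so that problems can be solved quickly and efficiently.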

13.

Exploitation of parallel processing for implementing high-performance deduction systems

Anita Jindal Ross Overbeek Waldo C. Kabat 《Journal of Automated Reasoning》1992,8(1):23-38

We describe a scheme for parallelizing first-order logic deduction systems. This scheme has been successfully used for parallelizing OTTER, which is a sequential deduction system developed at Argonne National Laboratory. This parallel deduction system, called PARRallel OTter-II (PARROT-II), has attained real speedups in excess of 20 over the best results of current sequential deduction systems. We believe that our results are of interest for two distinct reasons: (1) this is (as far as we know) the first case in which a system has successfully exploited parallelism to outperform the best sequential deduction systems on difficult problems, and (2) we believe that our approach generalizes to other deduction paradigms (e.g., term rewriting systems). This paper discusses the motivation for developing the scheme used by PARROT-II and the implementation details of PARROT-II. It also presents timing results for PARROT-II for some benchmark problems. Work submitted as partial fulfillment of the requirements for the doctoral degree at the Graduate College of the University of Illinois at Chicago. This work was supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.

14.

Encoding of parallel program schemata by vector addition systems

A. Thayse 《International journal of parallel programming》1979,8(3):209-218

A new vector addition system is proposed which represents a possible encoding of a parallel schema. This vector addition system contains fewer vectors and vectors of smaller length than the vector addition systems presented in the literature.

15.

Applying parallel computer systems to solve symmetric tridiagonal eigenvalue problems

Mi Lu Xiangzhen Qiao 《Parallel Computing》1992,18(12):1301-1315

A block parallel partitioning method for computing the eigenvalues of a symmetric tridiagonal matrix is presented. The algorithm is based on partitioning, in a way that ensures load balance during computation. This method is applicable to both shared-memory and distributed-memory MIMD systems. Compared with other parallel tridiagonal eigenvalue algorithms existing in the literature, the proposed algorithm achieves a higher, linear speedup of O(p) on a parallel computer with p-fold parallelism, and requires less data communication between processors than other methods. The results were tested and evaluated on an MIMD machine, and were within 62% to 98% of the predicted performance.
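The abstract does not say which eigenvalue algorithm is partitioned. As an illustration only, the sketch below uses a well-known technique that partitions naturally: bisection with Sturm counts, with the eigenvalue indices split into nearly equal chunks, one per worker. Whether the paper uses this approach is an assumption, and all function names are hypothetical. The chunks are processed serially here to keep the example self-contained.

```python
def count_below(diag, off, x):
    """Count eigenvalues of the symmetric tridiagonal matrix that are < x.

    diag holds the n diagonal entries, off the n - 1 off-diagonal entries.
    The count is the number of negative pivots in the LDL^T factorization
    of (T - x I), by Sylvester's law of inertia.
    """
    count = 0
    d = 1.0
    for i in range(len(diag)):
        sub = off[i - 1] ** 2 / d if i > 0 else 0.0
        d = diag[i] - x - sub
        if d == 0.0:
            d = 1e-30  # nudge an exactly zero pivot to avoid dividing by zero
        if d < 0.0:
            count += 1
    return count


def gershgorin_bounds(diag, off):
    """Interval guaranteed to contain every eigenvalue."""
    n = len(diag)
    lo, hi = float("inf"), float("-inf")
    for i in range(n):
        left = abs(off[i - 1]) if i > 0 else 0.0
        right = abs(off[i]) if i < n - 1 else 0.0
        lo = min(lo, diag[i] - left - right)
        hi = max(hi, diag[i] + left + right)
    return lo, hi


def kth_eigenvalue(diag, off, k, tol=1e-12):
    """Bisection for the k-th smallest eigenvalue (k is 0-based)."""
    lo, hi = gershgorin_bounds(diag, off)
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) > k:
            hi = mid  # eigenvalue k lies below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def partition(n, workers):
    """Split indices 0..n-1 into contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, workers)
    chunks, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def eigenvalues_partitioned(diag, off, workers):
    """Each chunk is independent, so each could run on its own processor."""
    results = []
    for chunk in partition(len(diag), workers):
        results.extend(kth_eigenvalue(diag, off, k) for k in chunk)
    return results


if __name__ == "__main__":
    print(eigenvalues_partitioned([2.0, 2.0, 2.0], [-1.0, -1.0], workers=2))
```

Because every index costs roughly the same number of bisection steps, equal-sized chunks give a simple form of load balance, and no data needs to be exchanged between workers beyond the matrix entries themselves.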

16.

Integrated office systems over LANs — a performance study

Nicolas Georganas Najah Naffah 《Computer Communications》1987,10(6):291-296

In this paper, an integrated office system environment is modelled and studied. Composed of multi-media workstations, printer servers, database servers, electronic mail servers etc., the office information system (OIS) is interconnected by a LAN. The performance of three selected networks namely Appletalk, Starlan and Ethernet, in handling the typical office system applications, is evaluated. The study applies discrete-event computer simulation, using the QNAP2 simulation software. Results comparing the performance and utilization of the three networks are presented and some useful conclusions obtained.

17.

Reduced-order performance of parallel and series-parallel identifiers with weakly observable parasitics

Petros Ioannou C.Richard Johnson 《Automatica》1983,19(1):75-80

The stability properties of discrete-time parallel and series-parallel identifiers with respect to a specific model-plant order mismatch are analyzed. While in a deterministic environment with no modeling error the two schemes give identical results, when used in a deterministic environment with modeling error their performance is different. We assume a singularly perturbed state representation for the plant where the modeling error consists of fast parasitics which are weakly observable in the plant output. Detailed bounds on parameter and output estimate errors are established and the robustness of the adaptive identifiers is established by showing that the error bound goes to zero as the modeling error goes to zero, i.e. as the parasitics become infinitely fast. The dependence of this residual identification error on the input signal, the neglected parasitics, and the initial error conditions is shown to be crucial. The bounds indicate possibilities for reducing the error by a proper choice of the input signal.

18.

An approach to evaluation of hierarchical systems performance

I. M. Titenko 《Cybernetics and Systems Analysis》2000,36(4):531-538

An approach to assessment of performance of hierarchical systems with simple subordination is developed with regard to moment characteristics of sequences of random sums. Translated from Kibernetika i Sistemnyi Analiz, No. 4, pp. 70–79, July–August, 2000.

19.

What do users of parallel computer systems really need?

David J. Kuck 《International journal of parallel programming》1994,22(1):99-127

High performance computers have played key roles in many scientific and engineering advances over the past 40 years, and many more may be expected in the future. However, unless practical parallel systems can be produced in this decade, a performance crisis will arise by 2000 across the spectrum of systems from workstations to supercomputers. There is widespread confusion today about how best to proceed with future parallel systems because so many different approaches have been taken and the performance results have been so spotty. A fundamental flaw in our approach to parallel computing, as a nation, is the poor understanding we have obtained about delivered performance. This paper analyzes the situation and suggests fundamental changes that are necessary to achieve practical parallelism in this decade. A great deal of money is now being spent and more is planned, to advance the field, but money is not so much the problem as shortages of qualified people and a sharp focus for their work. Our national goals for the end of this decade must be the creation of an infrastructure for understanding performance, and its natural consequence, the development of practical parallel systems.

20.

Modelling test data for performance evaluation of large parallel database machines

Chris Bates Innes Jelly Jon Kerridge 《Distributed and Parallel Databases》1996,4(1):5-23

Parallel servers offer improved processing power for relational database systems and provide system scalability. In order to support the users of these systems, new ways of assessing the performance of such machines are required. If these assessments are to show how the machines perform under commercial workloads they need to be based upon models which have a real commercial basis. This paper shows how a realistic model of a financial application has been developed and how a set of tools has been created which allow the implementation of the model on any commercial database system. The tools allow the generation of large quantities of test data in a manner which renders it amenable to subsequent independent analysis. The test data thus generated forms the basis for the performance tuning of parallel database machines. Recommended by: Patrick Valduriez