1.

黄忍冬 彭舰 冯灏 《计算机应用》2008,28(9):2371-2374

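In serial mode, solving scattering problems for complex electron-molecule collisions suffers from long single-machine processing times and heavy memory use. Parallel processing can markedly reduce the computation time and is an effective way to address the problem. A parallel algorithm for electron-molecule collision scattering is designed and implemented on top of a master-slave parallel model, and effective optimization steps are proposed. The experiments yielded satisfactory speedup and parallel efficiency, verifying the feasibility and correctness of the scheme.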
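For reference, speedup and parallel efficiency are conventionally defined as S = T_1 / T_p and E = S / p, where T_1 is the serial run time, T_p the run time on p workers. A minimal sketch of those two definitions follows; the function names and the timings are hypothetical and are not taken from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Parallel efficiency E = S / p, where p is the number of workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(t_serial, t_parallel) / workers


if __name__ == "__main__":
    # Hypothetical timings for illustration only; they are not from the paper.
    t1, t8 = 120.0, 20.0
    print(speedup(t1, t8))        # 6.0
    print(efficiency(t1, t8, 8))  # 0.75
```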

2.

Predicting and Bounding the Speedup of Multithreaded Solaris Programs

Lars Lundberg 《Journal of Parallel and Distributed Computing》1999,57(3):358

In Solaris, threads are frequently relocated. The data associated with a relocated thread have to be moved from the cache of the old processor to the new processor. In order to avoid poor memory performance due to thread relocation, threads can be bound to processors (static scheduling). Finding a static schedule which results in maximum speedup is NP-hard. It is even difficult to determine if a static schedule is close to the optimal case or not. Here, a technique for predicting the speedup of multithreaded Solaris programs is presented. Based on an existing theoretical result, a lower bound on the maximal speedup is also obtained. The predicted speedup and the bound are based on recordings from a single-processor execution. When comparing the predictions with the real speedup using a multiprocessor with eight processors, we see that the predictions are very good. By comparing the speedup of a static schedule with the bound, we see that it is worthwhile to look for other schedules.

3.

Design and implementation of parallel approximate inverse classes using OpenMP

Konstantinos M. Giannoutakis George A. Gravvanis 《Concurrency and Computation》2009,21(2):115-131

A new parallel normalized optimized approximate inverse algorithm, based on the concept of antidiagonal wave pattern, for computing classes of explicitly approximate inverses, is introduced for symmetric multiprocessor systems. The parallel normalized explicit approximate inverses are used in conjunction with parallel normalized explicit preconditioned conjugate gradient schemes for the efficient solution of finite element sparse linear systems. The parallel design and implementation issues of the new algorithm are discussed and the parallel performance is presented using OpenMP. Copyright © 2008 John Wiley & Sons, Ltd.

4.

Methodology for predicting performance of distributed and parallel systems

Rakesh Kushwaha 《Performance Evaluation》1993,18(3):189-204

This paper describes an accurate and efficient method to model and predict the performance of distributed/parallel systems. Various performance measures, such as the expected user response time, the system throughput and the average server utilization, can be easily estimated using this method. The methodology is based on known product form queueing network methods, with some additional approximations. The method is illustrated by evaluating performance of a multi-client multi-server distributed system. A system model is constructed and mapped to a probabilistic queueing network model which is used to predict its behavior. The effects of user think time and various design parameters on the performance of the system are investigated by both the analytical method and computer simulation. The accuracy of the former is verified. The methodology is applied to identify the bottleneck server and to establish proper balance between clients and servers in distributed/parallel systems.

5.

A study of dynamic finite size scaling behavior of the scaling functions—calculation of dynamic critical index of Wolff algorithm

Semra Gündüç Yiğit Gündüç 《Computer Physics Communications》2005,166(1):1-7

In this work we have studied the dynamic scaling behavior of two scaling functions and we have shown that scaling functions obey the dynamic finite size scaling rules. Dynamic finite size scaling of scaling functions opens possibilities for a wide range of applications. As an application we have calculated the dynamic critical exponent (z) of Wolff's cluster algorithm for 2-, 3- and 4-dimensional Ising models. Configurations with vanishing initial magnetization are chosen in order to avoid complications due to initial magnetization. The observed dynamic finite size scaling behavior during early stages of the Monte Carlo simulation yields z for Wolff's cluster algorithm for 2-, 3- and 4-dimensional Ising models with vanishing values which are consistent with the values obtained from the autocorrelations. Especially, the vanishing dynamic critical exponent we obtained for d=3 implies that the Wolff algorithm is more efficient in eliminating critical slowing down in Monte Carlo simulations than previously reported.

6.

Research progress in high-performance parallel computing for aerospace

龚春叶 包为民 汤国建 王玲 孙学功 刘杰 《计算机工程与科学》2014,36(9):1629-1636

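Numerical simulation of large-scale scientific and engineering problems in aerospace both depends on high-performance parallel computing and drives its development. This paper surveys research progress in high-performance parallel computing for aerospace. It briefly introduces high-performance parallel computing environments, then classifies and discusses in detail the related research areas, including aerodynamic force, aerodynamic heating, chemical non-equilibrium, structural strength, thermal protection, Monte Carlo methods, and turbulence. It summarizes two tensions in the field: the high parallel efficiency of scientific computing versus the low practical value of engineering computing, and the diversity of parallel applications versus the lack of sound parallel methods. It also points out directions for further research.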

7.

On the promise of general-purpose parallel computing

James J. Hack 《Parallel Computing》1989,10(3):261-275

It has become generally accepted that continued improvements in high-performance scientific computation will be achieved only through the ‘exploitation of parallelism’. Despite the nebulous nature of this expression, enthusiasm for the potential of parallel computing has led to calls for improvements in computational performance of more than a thousand-fold in the next few years, or for what is sometimes referred to as a Teraflop (one trillion floating-point operations per second) Computer. Such a system is envisioned as a general-purpose tool for accelerating progress in such widely varied applications as astronomy, biochemistry, circuit analysis, computational fluid dynamics, global economic modeling, high energy physics, materials science, structural analysis, and weather prediction.

Although parallel architectures appear to offer the greatest promise for significant improvements in overall computational performance, it is not yet clear whether a general-purpose parallel architecture can realize the large increases solicited by the scientific community. This note will take a practical look at the prospect for general-purpose parallel computation and will consider some of the potential limitations by using a simple parametric model of computational performance.

8.

Optimum tactics of parallel multi-grid algorithm with virtual boundary forecast method running on a local network with the PVM platform

郭庆平 章社生 卫加宁 《计算机科学技术学报》2000,15(4):355-359

In this paper, an optimum tactic of multi-grid parallel algorithm with virtual boundary forecast method is discussed, and a two-stage implementation is presented. The numerical results of solving a non-linear heat transfer equation show that the optimum implementation is much better than the non-optimum one.

9.

Optimization of parallel query execution plans in XPRS

Wei Hong Michael Stonebraker 《Distributed and Parallel Databases》1993,1(1):9-32

In this paper, we describe our approach to optimization of query execution plans in XPRS, a multiuser parallel database system based on a shared memory multiprocessor and a disk array. The main difficulties in this optimization problem are the compile-time unknown parameters such as available buffer size and number of free processors, and the enormous search space of possible parallel plans. We deal with these problems with a novel two phase optimization strategy which dramatically reduces the search space and allows run time parameters without significantly compromising plan optimality. In this paper we present our two phase optimization strategy and give experimental evidence from XPRS benchmarks that indicate that it almost always produces optimal or close to optimal plans.

10.

Updating method for the computation of orbits in parallel and sequential dynamical systems

《International Journal of Computer Mathematics》2012,89(9):1796-1808

In this article, we provide a matrix method in order to compute orbits of parallel and sequential dynamical systems on Boolean functions. In this sense, we develop algorithms for systems defined over directed (and undirected) graphs when the evolution operator is a general minterm or maxterm and, likewise, when it is constituted by independent local Boolean functions, so providing a new tool for the study of orbits of these dynamical systems.
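For orientation, the orbit of a parallel (synchronously updated) Boolean system can also be found by direct iteration. The sketch below is not the article's matrix method. It assumes an undirected graph given as an adjacency dictionary and uses the OR of each vertex's closed neighborhood (the simplest maxterm, with no negated variables) as the evolution operator. The names `step` and `orbit` are illustrative only.

```python
from typing import Dict, List, Tuple

State = Tuple[int, ...]


def step(adj: Dict[int, List[int]], state: State) -> State:
    """Synchronous update: each vertex takes the OR of itself and its neighbors."""
    return tuple(
        int(state[v] or any(state[u] for u in adj[v]))
        for v in range(len(state))
    )


def orbit(adj: Dict[int, List[int]], start: State) -> Tuple[List[State], int]:
    """Iterate until a state repeats.

    Returns the visited states in order and the index at which the cycle begins.
    """
    seen: Dict[State, int] = {}
    path: List[State] = []
    state = start
    while state not in seen:
        seen[state] = len(path)
        path.append(state)
        state = step(adj, state)
    return path, seen[state]


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with only vertex 0 initially active.
    adj = {0: [1], 1: [0, 2], 2: [1]}
    path, cycle_start = orbit(adj, (1, 0, 0))
    print(path)         # [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
    print(cycle_start)  # 2
```

In this example the orbit reaches the fixed point (1, 1, 1) after two updates. Brute-force iteration like this visits at most 2^n states for n vertices, so it is only practical for small systems.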

11.

Strategy and Simulation of Adaptive RID for Distributed Dynamic Load Balancing in Parallel Systems

Lin Chengjiang Li Sanli 《计算机科学技术学报》1997,12(2):113-120

Dynamic load balancing schemes are significant for efficiently executing nonuniform problems in highly parallel multicomputer systems. The objective is to minimize the total execution time of single applications. This paper has proposed an ARID strategy for distributed dynamic load balancing. Its principle and control protocol are described, and the communication overhead, the effect on system stability and the performance efficiency are analyzed. Finally, simulation experiments are carried out to compare the adaptive strategy with other dynamic load balancing schemes.

12.

Modeling and analyzing the energy consumption of fork‐join‐based task parallel programs

Thomas Rauber Gudula Rünger 《Concurrency and Computation》2015,27(1):211-236

Because of environmental and monetary concerns, it is increasingly important to reduce the energy consumption in all areas, including parallel and high performance computing. In this article, we propose an approach to reduce the energy consumption needed for the execution of a set of tasks computed in parallel in a fork‐join fashion. The approach consists of an analytical model for the energy consumption of a parallel computation in fork‐join form on dynamic voltage frequency scaling processors, a theoretical specification of an energy‐optimal frequency‐scaled state, and the energy minimization by computing optimal scaling factors. For larger numbers of tasks, the approach is extended by scheduling algorithms, which exploit the analytical result and aim at a reduction of the energy. Energy measurements of a complex numerical method and the SPEC CPU2006 benchmarks as well as simulations for a large number of randomly generated tasks illustrate and validate the energy modeling, the minimization, and the scheduling results. Copyright © 2014 John Wiley & Sons, Ltd.

13.

Reduced-order performance of parallel and series-parallel identifiers with weakly observable parasitics

Petros Ioannou C. Richard Johnson 《Automatica》1983,19(1):75-80

The stability properties of discrete-time parallel and series-parallel identifiers with respect to a specific model-plant order mismatch are analyzed. While in a deterministic environment with no modeling error the two schemes give identical results, when used in a deterministic environment with modeling error their performance is different. We assume a singularly perturbed state representation for the plant where the modeling error consists of fast parasitics which are weakly observable in the plant output. Detailed bounds on parameter and output estimate errors are established and the robustness of the adaptive identifiers is established by showing that the error bound goes to zero as the modeling error goes to zero, i.e. as the parasitics become infinitely fast. The dependence of this residual identification error on the input signal, the neglected parasitics, and the initial error conditions is shown to be crucial. The bounds indicate possibilities for reducing the error by a proper choice of the input signal.

14.

Computer comparisons in the presence of performance variation

Samuel IRVING Bin LI Shaoming CHEN Lu PENG Weihua ZHANG Lide DUAN 《Frontiers of Computer Science》2020,14(1):21-41

Performance variability, stemming from nondeterministic hardware and software behaviors or deterministic behaviors such as measurement bias, is a well-known phenomenon of computer systems which increases the difficulty of comparing computer performance metrics and is slated to become even more of a concern as interest in Big Data analytics increases. Conventional methods use various measures (such as geometric mean) to quantify the performance of different benchmarks to compare computers without considering this variability, which may lead to wrong conclusions. In this paper, we propose three resampling methods for performance evaluation and comparison: a randomization test for a general performance comparison between two computers, bootstrapping confidence estimation, and an empirical distribution and five-number-summary for performance evaluation. The results show that for both PARSEC and high-variance BigDataBench benchmarks 1) the randomization test substantially improves our chance to identify the difference between performance comparisons when the difference is not large; 2) bootstrapping confidence estimation provides an accurate confidence interval for the performance comparison measure (e.g., ratio of geometric means); and 3) when the difference is very small, a single test is often not enough to reveal the nature of the computer performance due to the variability of computer systems. We further propose using empirical distribution to evaluate computer performance and a five-number-summary to summarize computer performance. We use published SPEC 2006 results to investigate the sources of performance variation by predicting performance and relative variation for 8,236 machines. We achieve a correlation of predicted performances of 0.992 and a correlation of predicted and measured relative variation of 0.5. Finally, we propose the utilization of a novel biplotting technique to visualize the effectiveness of benchmarks and cluster machines by behavior. We illustrate the results and conclusion through detailed Monte Carlo simulation studies and real examples.
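The bootstrapping step in this abstract can be illustrated with a percentile bootstrap over benchmarks. This is a minimal sketch and not the authors' procedure: it assumes paired per-benchmark scores for two machines, resamples benchmark indices with replacement, and uses made-up scores. The names `geo_mean` and `bootstrap_ratio_ci` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def geo_mean(values: Sequence[float]) -> float:
    """Geometric mean computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def bootstrap_ratio_ci(
    a: Sequence[float],
    b: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for geo_mean(a) / geo_mean(b).

    a[i] and b[i] are scores of the same benchmark on two machines, so
    benchmark indices are resampled jointly (an assumption of this sketch).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("a and b must be non-empty and the same length")
    rng = random.Random(seed)
    n = len(a)
    ratios: List[float] = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        ratios.append(geo_mean([a[i] for i in idx]) / geo_mean([b[i] for i in idx]))
    ratios.sort()
    lo = ratios[int((alpha / 2) * n_resamples)]
    hi = ratios[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


if __name__ == "__main__":
    # Made-up benchmark scores for two machines; not data from the paper.
    machine_a = [12.1, 30.4, 8.7, 45.0, 19.3]
    machine_b = [10.9, 33.0, 8.1, 39.5, 20.2]
    print(bootstrap_ratio_ci(machine_a, machine_b))
```

If the resulting interval contains 1.0, the data do not clearly separate the two machines on this measure.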

15.

A collection of parallel linear equations routines for the Denelcor HEP

Jack J. Dongarra Robert E. Hiromoto 《Parallel Computing》1984,1(2):133-142

This paper describes the implementation and performance results for a few standard linear algebra routines on the Denelcor HEP computer. The algorithms used here are based on high-level modules that facilitate portability and perform efficiently in a wide range of environments. The modules are chosen to be of a large enough computational granularity so that reasonably optimum performance may be insured. The design of algorithms with such fundamental modules in mind will also facilitate their replacement by others more suited to gain the desired performance on a particular computer architecture.

16.

Implementing a dynamic processor allocation policy for multiprogrammed parallel applications in the Solaris™

Kelvin K. Yue David J. Lilja 《Concurrency and Computation》2001,13(6):449-464

Parallel applications typically do not perform well in a multiprogrammed environment that uses time‐sharing to allocate processor resources to the applications' parallel threads. Co‐scheduling related parallel threads, or statically partitioning the system, often can reduce the applications' execution times, but at the expense of reducing the overall system utilization. To address this problem, there has been increasing interest in dynamically allocating processors to applications based on their resource demands and the dynamically varying system load. The Loop‐Level Process Control (LLPC) policy (Yue K, Lilja D. Efficient execution of parallel applications in multiprogrammed multiprocessor systems. 10th International Parallel Processing Symposium, 1996; 448–456) dynamically adjusts the number of threads an application is allowed to execute based on the application's available parallelism and the overall system load. This study demonstrates the feasibility of incorporating the LLPC strategy into an existing commercial operating system and parallelizing compiler and provides further evidence of the performance improvement that is possible using this dynamic allocation strategy. In this implementation, applications are automatically parallelized and enhanced with the appropriate LLPC hooks so that each application interacts with the modified version of the Solaris operating system. The parallelism of the applications is then dynamically adjusted automatically when they are executed in a multiprogrammed environment so that all applications obtain a fair share of the total processing resources. Copyright © 2001 John Wiley & Sons, Ltd.

17.

Some thoughts on improving parallel database performance

王勇智 胡虚怀 唐志平 唐乙秋 《计算机与现代化》2005,(5):69-71

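The structure of parallel databases is discussed, the factors that affect parallel database performance are analyzed, and some strategies for improving that performance are proposed.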

18.

An energy-saving real-time dynamic scheduling algorithm for multiprocessor computing environments

韩建军 李庆华 缪天鹏 《小型微型计算机系统》2006,27(5):866-872

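The high energy consumption of current processors increases heat dissipation and lowers system reliability, and has become a prominent concern in computing. However, most effective techniques for reducing energy consumption target single-processor systems and seldom consider multiprocessor systems. The proposed scheduling algorithms target multiprocessor computing environments. They are based on scheduling the task with the shortest execution time first, combined with other effective techniques (shared slack reclamation), so that real-time tasks finish within their deadlines while the energy consumption of the whole system is effectively reduced. For independent task sets and task sets with dependencies, two algorithms for homogeneous computing environments are proposed, STFBA1 (Shortest-Task-First-Based Algorithm) and STFBA2, along with two algorithms for multiple task sets, HSA1 (Hybrid Scheduling Algorithm) and HSA2. In a single-task-set environment, the algorithms outperform the effective algorithms known to date in schedule length and energy consumption. In a multiple-task-set environment, the algorithms based on the hybrid scheduling strategy markedly improve scheduling performance.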

19.

Overview of parallel processing research in Japan

R Ohbuchi 《Parallel Computing》1985,2(3):219-228

This paper gives an overview of Japanese research and development efforts on the parallel processing architectures. Projects are categorized by their application domains. Following an introduction, general trends and some examples of research projects for each of the application domains such as artificial intelligence, numerical processing, and others like database, image, graphics, etc. are presented.

20.

A CUDA parallel algorithm for the integer DCT in the AVS standard

孟小华 刘坚强 《微计算机应用》2011,32(11)

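As the processing power of graphics processing units (GPUs) keeps growing, GPUs are increasingly used for compute-intensive data processing. Some steps of the AVS-standard video compression algorithm are inherently parallel, while serial compression of high-definition and ultra-high-definition video takes too long to meet the needs of real-time encoding. The integer DCT algorithm in the AVS standard is therefore implemented in parallel using the GPU's parallel processing capability and the CUDA programming framework. In experimental tests, the parallel algorithm achieves a high speedup over the serial algorithm.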