1.

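许瑾晨, 黄永忠, 郭绍忠, 周蓓, 赵捷 《软件学报》 (Journal of Software) 2015, 26(6): 1306-1321

Mathematical function libraries are an important part of CPU software and play a critical role in scientific and engineering numerical computation on high-performance computing platforms. Existing test tools cover function libraries only partially: they do not jointly consider correctness, accuracy, and performance, and they usually target a single class of architecture, which limits their applicability. To address these shortcomings, the paper proposes BMltest (basic math library test), an integrated, fully reusable test platform that targets multiple architectures. The platform builds its test set from function characteristic values, IEEE-754 special values, and IEEE-754 normal numbers that are exponentially distributed over the full floating-point domain and generated with floating-point construction rules, which effectively improves the floating-point coverage of the test set. It proposes an accuracy test method based on the multiple-precision library MPFR (multiple-precision floating-point reliable library), which improves the reliability of accuracy testing, and a performance test method based on code isolation, which minimizes interference from the external environment. A reasonable evaluation scheme is given for the large volume of floating-point test results. The test data are decoupled from the functions under test as far as possible, which keeps the test method general. Results from testing all 855 functions in the GNU, Open64, and Mlib function libraries show that the BMltest data set is more comprehensive and effective, that its accuracy test method is more reliable, and that its performance results are more accurate and stable than those of other test platforms.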
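The accuracy-testing idea in this abstract (compare each library result with a higher-precision reference over inputs spread across many magnitudes) can be illustrated with a minimal sketch. It is not the BMltest implementation: Python's standard `decimal` module stands in for MPFR, `math.exp` stands in for the function under test, and the helper names `exponent_spread_inputs`, `ulp_error`, and `worst_exp_error` are hypothetical. It needs Python 3.9 or later for `math.ulp`.

```python
import math
import random
from decimal import Decimal, localcontext
from fractions import Fraction


def exponent_spread_inputs(count, min_exp=-20, max_exp=8, seed=0):
    """Sample floats whose binary exponents are spread evenly over a range.

    Drawing the exponent uniformly (rather than the value) gives small
    magnitudes as many samples as large ones. The default range keeps
    |x| <= 512 so that math.exp(x) neither overflows nor underflows.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        exponent = rng.randint(min_exp, max_exp)
        mantissa = rng.uniform(1.0, 2.0)
        sign = rng.choice((-1.0, 1.0))
        samples.append(sign * math.ldexp(mantissa, exponent))
    return samples


def ulp_error(computed, reference):
    """Distance between a float result and a Decimal reference, in ULPs.

    The subtraction is done in exact rational arithmetic so the
    measurement itself adds no rounding error.
    """
    diff = abs(Fraction(computed) - Fraction(reference))
    return float(diff / Fraction(math.ulp(computed)))


def worst_exp_error(inputs, digits=60):
    """Largest ULP error of math.exp over `inputs`.

    The reference is Decimal.exp() evaluated at `digits` significant
    digits (an assumed stand-in for an MPFR reference value).
    """
    worst = 0.0
    with localcontext() as ctx:
        ctx.prec = digits
        for x in inputs:
            reference = Decimal(x).exp()  # Decimal(x) converts the float exactly
            worst = max(worst, ulp_error(math.exp(x), reference))
    return worst


if __name__ == "__main__":
    print(worst_exp_error(exponent_spread_inputs(1000)))
```

Reporting the error in ULPs rather than as an absolute difference keeps results comparable across magnitudes, which matters when the inputs span many binary exponents.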

2.

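A New Exploration of Software Reliability Testing Methods (total citations: 2; self-citations: 0; citations by others: 2)

蔡建平 《计算机工程与设计》 (Computer Engineering and Design) 2009, 30(20)

Traditional software reliability testing methods fall short in testing for performance degradation, or even complete failure, caused by long-term use of software, a problem that seriously affects reliability. Based on an in-depth analysis of the characteristics of software and of the problems faced in software reliability estimation, and building on traditional reliability testing methods, the paper proposes the idea of operational-profile-based software reliability stress testing, together with a reliability testing method in which operational profiles and stress test points are combined and complement each other. It also gives a technical approach for implementing the new method. The idea and method are both a bold exploration of, and a useful supplement to, traditional software reliability testing methods.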

3.

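Genetic Selection of State-Coverage Vectors for Sequential Circuits

杨修涛, 鲁巍, 李晓维 《计算机辅助设计与图形学学报》 (Journal of Computer-Aided Design & Computer Graphics) 2006, 18(2): 251-256

Traditional state-coverage methods under-test a circuit's data units, while random testing is blind. Combining the two, the paper presents a method that uses state and state-transition coverage as the evaluation criterion and genetic selection as the tool for choosing the best of the generated test vectors. To guide test generation, it introduces the concepts of dynamic and static state transitions. A test generation tool, GRTT, is built on the method. Finally, the method is applied to the ITC99 benchmark circuits, and the results are compared with those of the test generation system X-Pulling.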

4.

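A Simulation-Driven Performance Testing Method for Web Applications (total citations: 6; self-citations: 0; citations by others: 6)

梁晟, 李明树, 梁金能, 陈振冲 《计算机研究与发展》 (Journal of Computer Research and Development) 2003, 40(7): 1069-1075

Performance is one of the keys to a Web application's success, and performance testing is an important means of ensuring it. Because of the uncertainty of the Internet and of Web users, however, performance testing of Web applications is harder than testing traditional client/server systems. The paper compares three main Web performance testing methods and proposes a simple, feasible, and general one: simulation-driven automated load testing. The key steps are: determine the test load and design test cases from the system usage patterns and the distribution of client characteristics; develop the corresponding test scripts with a test tool; run the test cases to simulate the typical behavior of different types of users; and collect performance data from the program under test. The method is described in detail with an example, and a test plan template is provided.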

5.

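Research on Performance Testing Methods for IPv4/IPv6 Translation Gateways

孙红兵, 陈沫, 蔡一兵, 李忠诚 《计算机工程》 (Computer Engineering) 2006, 32(24): 93-95

Drawing on existing test standards for network devices such as routers and firewalls, and taking into account the characteristics of translation gateways, the paper proposes a series of performance testing methods for them. The single-device method and the two-device joint method can be used under different test conditions. Building on the two-device joint method, the paper also proposes a mixed-network method for testing the number of concurrent TCP connections.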

6.

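Research on Firewall Performance Benchmarking (total citations: 2; self-citations: 0; citations by others: 2)

田原, 云晓春, 朱晓晖 《计算机仿真》 (Computer Simulation) 2003, 20(7): 123-126

By analyzing the factors that affect firewall performance testing and two test models, the paper focuses on firewall performance metrics, test methods, and report contents, and, based on the test principles, gives two typical design examples of test systems.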

7.

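Research on Performance Metrics for IPv4/IPv6 Translation Gateways

孙红兵, 钟声, 陈沫, 李忠诚 《计算机工程》 (Computer Engineering) 2007, 33(6): 150-152

Drawing on test standards and methods for general network devices, and taking into account the characteristics of IPv4/IPv6 translation gateways, the paper proposes a set of performance test metrics suited to them. For environments with mixed network traffic, it proposes three metrics for evaluating network performance as a whole: bidirectional average latency, mixed average latency, and weighted mixed latency. These three metrics better reflect what users experience. Based on the proposed metrics, the paper gives the corresponding test methods and test examples and analyzes the test results.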

8.

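Research on Web Service Testing (total citations: 2; self-citations: 0; citations by others: 2)

李乔, 郑啸, 秦锋 《计算机技术与发展》 (Computer Technology and Development) 2006, 16(9): 93-96

Web services are a new distributed computing technology that offers genuine platform heterogeneity and language independence. As Web service technology develops and spreads, testing is needed to ensure that Web services run correctly and effectively. Because Web services adopt a new architecture and new core protocols, however, testing them differs from traditional software testing and network protocol testing, so their testing methods and techniques need to be studied. The paper analyzes Web service testing, proposes testing methods for different test objectives, and gives a test execution framework for Web services.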

9.

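Research on Test Methods for Vibration Transmitters

刘陆燕, 袁雪松 《物联网技术》 (Internet of Things Technologies) 2020, (2): 90-91, 95

Demand for testing vibration transmitters keeps growing as their applications increase. To address the lack of test methods targeted at vibration transmitters, the paper studies how to test them. It describes the common mounting methods and focuses on the test methods and the data validation results. Following the methods given, statistics were obtained by testing several models of vibration transmitter from multiple manufacturers, and an error criterion for vibration transmitters was derived from their analysis. The method meets the testing needs of vibration transmitters well.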

10.

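Performance Testing Challenges as Applications Move to the Cloud

周悦, 覃文闯, 胡一鸣 《微型机与应用》 (Microcomputer & Its Applications) 2013, 32(17): 1-2, 8

Because cloud applications differ in nature from traditional applications, using traditional test methods and tools to performance-test cloud applications has many limitations. The paper argues that performance testing for the cloud application model calls for new approaches and introduces several new test methods.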

11.

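Building a Beowulf Parallel Computing System Based on Windows and MPI (total citations: 7; self-citations: 0; citations by others: 7)

陈星, 黄卡玛 《计算机工程与应用》 (Computer Engineering and Applications) 2003, 39(4): 59-61

Building a parallel computing cluster from ordinary PCs (commonly called a Beowulf system) delivers substantial computing power at low cost. The article describes a Beowulf parallel computing system built from 16 PCs. The nodes run Windows 2000; MPICH.NT1.2.3, the latest MPICH release of MPI (Message Passing Interface) at the time, provides the parallel computing environment; and 100 Mbps switched Ethernet serves as the interconnect. Parallel efficiency was measured with purpose-written parallel programs, and the results show that the system achieves very high parallel speedup and parallel efficiency.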

12.

Latency Metric: An Experimental Method for Measuring and Evaluating Parallel Program and Architecture Scalability

Zhang X. D. Yan Y. He K. Q. 《Journal of Parallel and Distributed Computing》1994,22(3)

Latency measures the delay caused by communication between processors and memory modules over the network in a parallel system. Using intensive measurements and simulation, we show that network latency forms a major obstacle to improving parallel computing performance and scalability. We present an experimental metric, using network latency to measure and evaluate the scalability of parallel programs and architectures. This latency metric is an extension to the isoefficiency function [Grama et al., IEEE Parallel Distrib. Technology 1, 3 (1993), 12-21] and iso-speed metric [Sun and Rover, IEEE Trans. Parallel Distrib. Systems 5, 6 (1994), 599-613]. We give a measurement method for using this latency metric, and report the experimental results of evaluating the scalabilities of several scientific computing algorithms on the KSR-1 shared-memory architecture. Our analysis and experiments show that the latency metric is a practical method to effectively predict and evaluate scalability based on measured latencies inherent in the program and the architecture.

13.

Parallel algorithms for bipartite matching problems on distributed memory computers

Johannes Langguth, Md. Mostofa Ali Patwary, Fredrik Manne 《Parallel Computing》2011,37(12):820-845

We present a new parallel algorithm for computing a maximum cardinality matching in a bipartite graph suitable for distributed memory computers. The presented algorithm is based on the Push-Relabel algorithm, which is known to be one of the fastest algorithms for the bipartite matching problem. Previous attempts at developing parallel implementations of it have focused on shared memory computers using only a limited number of processors. We first present a straightforward adaptation of these shared memory algorithms to distributed memory computers. However, this is not a viable approach as it requires too much communication. We then develop our new algorithm by modifying the previous approach through a sequence of steps with the main goal being to reduce the amount of communication and to increase load balance. The first goal is achieved by changing the algorithm so that many push and relabel operations can be performed locally between communication rounds and also by selecting augmenting paths that cross processor boundaries infrequently. To achieve good load balance, we limit the speed at which global relabelings traverse the graph. In several experiments on a large number of instances, we study weak and strong scalability of our algorithm using up to 128 processors. The algorithm can also be used to find ε-approximate matchings quickly.

14.

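A Hybrid Parallel Streamline Generation System for Heterogeneous Clusters

刘俊, 高阳, 单桂华, 迟学斌 《计算机系统应用》 (Computer Systems & Applications) 2021, 30(3): 60-69

Streamlines are one of the main methods of flow field visualization. Generating streamlines for large-scale flow fields is computationally expensive, so it usually requires a parallel environment such as a high-performance computer together with parallel algorithms to accelerate the computation. With heterogeneous computing systems becoming increasingly common, and in order to exploit the computing power of parallel heterogeneous environments for more efficient parallel streamline generation, the paper adopts an architecture that combines data-parallel primitives with distributed message passing and designs a hybrid parallel streamline generation system for heterogeneous clusters. On this basis, it designs the data partitioning, data redundancy, and inter-process communication strategies, and proposes and implements a parallel particle tracing algorithm. The system was deployed on a Chinese domestic supercomputing platform, and experiments were carried out on visualizing large-scale CFD flow field simulation results. The paper reports the experimental results and analyzes the speed, scalability, and load balance of the core parallel algorithm, demonstrating the effectiveness and scalability of the system and the algorithm.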

15.

Scalable Load Balancing Strategies for Parallel A* Algorithms

Dutt S. Mahapatra N. R. 《Journal of Parallel and Distributed Computing》1994,22(3)

In this paper, we develop load balancing strategies for scalable high-performance parallel A* algorithms suitable for distributed-memory machines. In parallel A* search, inefficiencies such as processor starvation and search of nonessential spaces (search spaces not explored by the sequential algorithm) grow with the number of processors P used, thus restricting its scalability. To alleviate this effect, we propose a novel parallel startup phase and an efficient dynamic load balancing strategy called the quality equalizing (QE) strategy. Our new parallel startup scheme executes optimally in Θ(log P) time and, in addition, achieves good initial load balance. The QE strategy possesses certain unique quantitative and qualitative load balancing properties that enable it to significantly reduce starvation and nonessential work. Consequently, we obtain a highly scalable parallel A* algorithm with an almost-linear speedup. The startup and load balancing schemes were employed in parallel A* algorithms to solve the Traveling Salesman Problem on an nCUBE2 hypercube multicomputer. The QE strategy yields average speedup improvements of about 20-185% and 15-120% at low and intermediate work densities (the ratio of the problem size to P), respectively, over three well-known load balancing methods: the round-robin (RR), the random communication (RC), and the neighborhood averaging (NA) strategies. The average speedup observed on 1024 processors is about 985, representing a very high efficiency of 0.96. Finally, we analyze and empirically evaluate the scalability of parallel A* algorithms in terms of the isoefficiency metric. Our analysis gives (1) a Θ(P log P) lower bound on the isoefficiency function of any parallel A* algorithm, and (2) a general expression for the upper bound on the isoefficiency function of our parallel A* algorithm using the QE strategy on any topology; for the hypercube and 2-D mesh architectures the upper bounds on the isoefficiency function are found to be Θ(P log²P) and Θ(P[formula]), respectively. Experimental results validate our analysis, and also show that parallel A* search has better scalability using the QE load balancing strategy than using the RR, RC, or NA strategies.
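The efficiency figure quoted in this abstract is consistent with the usual definition of parallel efficiency as speedup divided by processor count. In the worked line below, P is the abstract's processor count, while the symbols S (speedup) and E (efficiency) are notation assumed here rather than taken from the abstract.

```latex
E = \frac{S}{P} = \frac{985}{1024} \approx 0.962
```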

16.

Parallel Computation of High-Dimensional Robust Correlation and Covariance Matrices

James Chilson Raymond Ng Alan Wagner Ruben Zamar 《Algorithmica》2006,45(3):403-431

The computation of covariance and correlation matrices is critical to many data mining applications and processes. Unfortunately, the classical covariance and correlation matrices are very sensitive to outliers. Robust methods, such as Quadrant Correlation (QC) and the Maronna method, have been proposed. However, existing algorithms for QC only give acceptable performance when the dimensionality of the matrix is in the hundreds, and the Maronna method is rarely used in practice because of its high computational cost. In this paper we develop parallel algorithms for both QC and the Maronna method. We evaluate these parallel algorithms using a real data set of the gene expression of over 6000 genes, giving rise to a matrix of over 18 million entries. In our experimental evaluation, we explore scalability in dimensionality and in the number of processors, and the trade-offs between accuracy and computational efficiency. We also compare the parallel behaviours of the two methods. From a statistical standpoint, the Maronna method is more robust than QC. From a computational standpoint, while QC requires less computation, interestingly the Maronna method is much more parallelizable than QC. After thorough experimentation, we conclude that for many data mining applications, both QC and Maronna are viable options. Less robust, but faster, QC is the recommended choice for small parallel platforms. On the other hand, the Maronna method is the recommended choice when a high degree of robustness is required, or when the parallel platform features a large number of processors (e.g., 32).
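For readers unfamiliar with quadrant correlation, the following is a minimal serial sketch of the common textbook form of the estimator for a single pair of variables: center each variable at its median, average the products of the signs, and apply the sin(π/2 · q) consistency transform. The abstract does not spell out the paper's exact variant or its parallel algorithm, so the function name `quadrant_correlation` and this formulation are assumptions made for illustration.

```python
import math
from statistics import median


def _sign(value):
    """Return -1, 0, or 1 according to the sign of `value`."""
    return (value > 0) - (value < 0)


def quadrant_correlation(xs, ys):
    """Quadrant correlation of two equal-length samples (textbook form).

    Only the signs of the median-centered values are used, so each
    observation contributes a single term of -1, 0, or +1 regardless of
    how extreme its value is.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    mx, my = median(xs), median(ys)
    q = sum(_sign(x - mx) * _sign(y - my) for x, y in zip(xs, ys)) / len(xs)
    # Transform that makes q consistent with the usual correlation
    # coefficient under a bivariate normal model.
    return math.sin(math.pi / 2 * q)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [1, 2, 3, 4, 5, -1000]  # last point is a gross outlier
    print(quadrant_correlation(x, y))
```

A full correlation matrix applies this to every pair of variables, which is where the computational cost for thousands of dimensions comes from.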

17.

Parallel programming in computational science: an introductory practical training course for computer science undergraduates at Aachen University

H. M. B. C. H. 《Future Generation Computer Systems》2003,19(8):1309-1319

Parallel programming of high-performance computers has emerged as a key technology for the numerical solution of large-scale problems arising in computational science and engineering (CSE). The authors believe that principles and techniques of parallel programming are among the essential ingredients of any CSE as well as computer science curriculum. Today, opinions on the role and importance of parallel programming are diverse. Rather than seeing it as a marginally beneficial skill optionally taught at the graduate level, we understand parallel programming as a crucial basic skill that should be taught as an integral part of the undergraduate computer science curriculum. A practical training course developed for computer science undergraduates at Aachen University is described. Its goal is to introduce young computer science students to different parallel programming paradigms for shared and distributed memory computers as well as to give a first exposure to the field of computational science through simple, yet carefully chosen sample problems.

18.

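Evaluation and Testing of the Scalability of Numerical Parallel Computation (total citations: 3; self-citations: 1; citations by others: 2)

迟利华, 刘杰, 胡庆丰 《计算机研究与发展》 (Journal of Computer Research and Development) 2005, 42(6): 1073-1078

The paper analyzes the problems of several scalability evaluation models and, to meet the needs of practical evaluation and testing, proposes a scalability evaluation model for numerical parallel computation based on equal average load. The model redefines scalable speedup and scalability and gives a method for testing both. Combined with curve fitting or a parallel execution time model, it can predict the scalability of a parallel system. Scalability predictions are made for the NPB BT and SP benchmarks and for matrix multiplication.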

19.

High-Performance Radix-2, 3 and 5 Parallel 1-D Complex FFT Algorithms for Distributed-Memory Parallel Computers (total citations: 3; self-citations: 0; citations by others: 3)

Takahashi Daisuke Kanada Yasumasa 《The Journal of Supercomputing》2000,15(2):207-228

In this paper, we propose high-performance radix-2, 3 and 5 parallel 1-D complex FFT algorithms for distributed-memory parallel computers. We use the four-step or six-step FFT algorithms to implement the radix-2, 3 and 5 parallel 1-D complex FFT algorithms. In our parallel FFT algorithms, since we use cyclic distribution, all-to-all communication takes place only once. Moreover, the input data and output data are both in natural order. We also show that the suitability of a parallel FFT algorithm is machine-dependent because of the differences in the architecture of the processor elements in distributed-memory parallel computers. Experimental results for 2^p3^q5^r-point FFTs on two distributed-memory parallel computers, the HITACHI SR2201 and the IBM SP2, are reported. We achieved performance of about 130 GFLOPS on a 1024-PE HITACHI SR2201 and about 1.25 GFLOPS on a 32-PE IBM SP2.
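The four-step and six-step algorithms mentioned in this abstract rest on factoring an n-point transform with n = n1 · n2 into short transforms plus a twiddle-factor multiplication. The sketch below shows only that serial decomposition, with a naive O(n²) DFT standing in for the short radix-2, 3 and 5 transforms; it is not the paper's parallel algorithm and does not model cyclic distribution or all-to-all communication. The names `dft`, `four_step_fft`, `n1`, and `n2` are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform, used here for the short transforms."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_fft(x, n1, n2):
    """DFT of length n1 * n2 via the row/twiddle/column decomposition.

    Input index j = j1 + n1 * j2 and output index k = k2 + n2 * k1, so
    both input and output are in natural order.
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Stage 1: n1 transforms of length n2; row j1 holds x[j1], x[j1 + n1], ...
    rows = [dft(x[j1::n1]) for j1 in range(n1)]

    # Stage 2: multiply element (j1, k2) by the twiddle factor w_n^(j1 * k2).
    for j1 in range(n1):
        for k2 in range(n2):
            rows[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)

    # Stage 3: n2 transforms of length n1 down the columns, written out
    # to the natural-order position k2 + n2 * k1.
    out = [0j] * n
    for k2 in range(n2):
        column = dft([rows[j1][k2] for j1 in range(n1)])
        for k1 in range(n1):
            out[k2 + n2 * k1] = column[k1]
    return out


if __name__ == "__main__":
    data = [complex(j % 7, -(j % 4)) for j in range(30)]  # 30 = 2 * 3 * 5
    result = four_step_fft(data, 5, 6)
    print(max(abs(a - b) for a, b in zip(result, dft(data))))
```

Because each stage consists of independent short transforms, the rows (and later the columns) can be assigned to different processors, which is what makes this factorization attractive on distributed-memory machines.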

20.

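A More Effective Scalability Model for Parallel Systems (total citations: 12; self-citations: 0; citations by others: 12)

王与力, 杨晓东 《计算机学报》 (Chinese Journal of Computers) 2001, 24(1): 84-90

The paper first analyzes the characteristics of three scalability models for parallel systems (isoefficiency, isospeed, and equal ratio of parallel overhead to computation), proves that the three conditions are equivalent, and points out that these models are unintuitive and limited when describing scalability. It then proposes a new scalability model that directly reflects how the performance of a parallel system scales as machine size and problem size grow. Case studies show that the model more effectively addresses the following: (1) quantitative study of the scalability of parallel systems; (2) comprehensive reflection of how program, machine, and environment factors affect scalability; (3) guidance on how to maintain the scalability of parallel systems.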