Similar Documents
1.
Most benchmarks are smaller than actual application programs. One reason is to improve benchmark universality by demanding resources every computer is likely to have. However, users dynamically increase the size of application programs to match the power available, whereas most benchmarks are static and of a size appropriate for computers available when the benchmark was created; this is particularly true for parallel computers. Thus, the benchmark overstates computer performance, since smaller problems spend more time in cache. Scalable benchmarks, such as HINT, examine the full spectrum of performance through various memory regimes, and express a superset of the information given by any particular fixed-size benchmark. Using 5,000 experimental measurements, we have found that performance on the NAS Parallel Benchmarks, SPEC, LINPACK, and other benchmarks is predicted accurately by subsets of the HINT performance curve. Correlations are typically better than 0.995. Predicted rankings are often perfect.
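
A minimal sketch of the style of prediction described: regress a fixed-size benchmark score on a few sampled points of a scalable benchmark's performance curve. All numbers below are invented for illustration; they are not the paper's measurements:

```python
import numpy as np

# Hypothetical data: HINT performance (QUIPS) sampled at four memory
# regimes for several machines, plus each machine's LINPACK score.
hint_curves = np.array([
    [120.0, 95.0, 40.0, 12.0],   # machine A
    [200.0, 150.0, 60.0, 20.0],  # machine B
    [80.0,  70.0, 35.0, 10.0],   # machine C
    [150.0, 110.0, 50.0, 15.0],  # machine D
])
linpack = np.array([5.1, 8.3, 3.6, 6.2])  # illustrative scores

# Least-squares fit: predict the fixed-size benchmark from a subset
# (here, the mid-memory and out-of-cache points) of the curve.
subset = hint_curves[:, [1, 3]]
X = np.column_stack([subset, np.ones(len(subset))])
coeffs, *_ = np.linalg.lstsq(X, linpack, rcond=None)

predicted = X @ coeffs
corr = np.corrcoef(predicted, linpack)[0, 1]
print(f"correlation between predicted and measured: {corr:.4f}")
```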

2.
With the rapid development of cloud computing, cloud file systems play an increasingly important role in cloud infrastructure. Although many performance evaluation tools for cloud file systems already exist, most of them focus only on traditional performance metrics such as IOPS and throughput, and can hardly assess a cloud file system's performance isolation in multi-tenant environments. The dynamic and heterogeneous nature of I/O workloads in the cloud makes accurate evaluation of isolation even more challenging. This paper proposes a new isolation measurement model for cloud file systems and implements it in a benchmark tool, Porcupine. By issuing I/O requests that reproduce the characteristics of real workloads, Porcupine simulates workload and performance accurately and improves the efficiency of file-system testing. Experiments on the Ceph file system validate the effectiveness and accuracy of the proposed isolation measurement model.
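
The abstract does not spell out Porcupine's isolation metric; one standard way to quantify isolation from per-tenant throughput, shown purely as an assumed stand-in, is Jain's fairness index:

```python
def jains_fairness(throughputs):
    """Jain's fairness index over per-tenant throughputs: 1.0 means
    perfectly equal service; 1/n means one tenant gets everything."""
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))

# Hypothetical per-tenant IOPS under a mixed workload.
print(jains_fairness([950, 910, 880, 930]))   # well isolated -> close to 1
print(jains_fairness([1800, 300, 250, 200]))  # poorly isolated -> much lower
```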

3.
Based on the data of the China HPC TOP100 list released in November 2017, this paper analyzes the current state of high-performance computing in China in terms of overall performance, manufacturers, and application domains. The average Linpack performance of the China TOP100 remains higher than that of the international TOP500, and the entry threshold of the TOP100 still exceeds that of the TOP500. Nearly all supercomputing systems on the China TOP100 are now domestically built; Inspur became the new champion by system count, and the three-way contest among Sugon, Lenovo, and Inspur continues to intensify. On this basis, using the performance data of fifteen editions of the list, the paper forecasts the development trend of high-performance computers in mainland China. Given the new data, we expect machines with a peak of 1 Exaflops to appear between 2018 and 2019, 10 Exaflops between 2022 and 2023, and 100 Exaflops between 2024 and 2025.

4.
[Objective] Based on the data of the China HPC TOP100 list released in November 2019, this paper analyzes the current state of high-performance computing in China in terms of overall performance, manufacturers, and application domains. [Results] The average Linpack performance of the China TOP100 remains higher than that of the international TOP500, and the entry threshold of the TOP100 still exceeds that of the TOP500. All supercomputing systems on the China TOP100 remain domestically built; Sugon and Lenovo tie as champions by system count, and the three-way contest among Sugon, Lenovo, and Inspur continues to intensify. [Conclusions] On this basis, using the performance data of eighteen editions of the list, the paper forecasts the development trend of high-performance computers in mainland China. Given the new data, we expect machines with a peak of 1 Exaflops to appear between 2020 and 2021, 10 Exaflops between 2022 and 2023, and 100 Exaflops between 2024 and 2025.
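
Forecasts of this kind come from extrapolating an exponential growth trend in peak performance. A sketch of that style of extrapolation, with invented peak values rather than actual TOP100 data:

```python
import numpy as np

# Illustrative (not actual TOP100) peak performance of the #1 system,
# in Petaflops, by list year.
years = np.array([2013, 2014, 2015, 2016, 2017, 2018, 2019])
peak_pflops = np.array([33.9, 33.9, 33.9, 93.0, 93.0, 100.0, 125.0])

# Fit log(peak) = a * year + b, then invert for a target peak.
a, b = np.polyfit(years, np.log(peak_pflops), 1)

def year_reaching(target_pflops):
    return (np.log(target_pflops) - b) / a

for target in (1e3, 1e4, 1e5):  # 1, 10, 100 Exaflops, in Pflops
    print(f"{target / 1e3:g} Exaflops projected around {year_reaching(target):.0f}")
```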

5.
Transaction processing performance council benchmark C (TPC-C) is the de facto standard for evaluating the performance of high-end computers running on-line transaction processing applications. Differing from other standard benchmarks, the transaction processing performance council only defines specifications for the TPC-C benchmark, but does not provide any standard implementation for end-users. Due to the complexity of the TPC-C workload, it is a challenging task to obtain optimal performance for TPC-C evaluation on a large-scale high-end computer. In this paper, we designed and implemented a large-scale TPC-C evaluation system based on the latest TPC-C specification using solid-state drive (SSD) storage devices. By analyzing the characteristics of the TPC-C workload, we propose a series of system-level optimization methods to improve the TPC-C performance. First, we propose an approach based on SmallFile table space to organize the test data in a round-robin method on all of the disk array partitions; this can make full use of the underlying disk arrays. Second, we propose using a NOOP-based disk scheduling algorithm to reduce the utilization rate of processors and improve the average input/output service time. Third, to improve the system translation lookaside buffer hit rate and reduce the processor overhead, we take advantage of the huge page technique to manage a large amount of memory resources. Lastly, we propose a locality-aware interrupt mapping strategy based on the asymmetry characteristic of non-uniform memory access systems to improve the system performance. Using these optimization methods, we performed the TPC-C test on two large-scale high-end computers using SSD arrays. The experimental results show that our methods can effectively improve the TPC-C performance. For example, the performance of the TPC-C test on an Intel Westmere server reached 1.018 million transactions per minute.
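
The second through fourth optimizations correspond to standard Linux knobs. A minimal sketch of how they might be applied from Python; the device name, IRQ number, and page count are hypothetical, root privileges are assumed, and the paper's actual tooling is not described:

```python
"""Sketch of the system-level knobs discussed above, applied via the
standard Linux sysfs/procfs interfaces."""
from pathlib import Path

def set_io_scheduler(device: str, scheduler: str = "noop") -> None:
    # NOOP disk scheduling; on newer blk-mq kernels the equivalent is "none".
    Path(f"/sys/block/{device}/queue/scheduler").write_text(scheduler)

def reserve_huge_pages(count: int) -> None:
    # Pre-allocate 2 MiB huge pages to raise the TLB hit rate.
    Path("/proc/sys/vm/nr_hugepages").write_text(str(count))

def pin_irq(irq: int, cpu_mask_hex: str) -> None:
    # Steer a device interrupt to CPUs local to the device's NUMA node.
    Path(f"/proc/irq/{irq}/smp_affinity").write_text(cpu_mask_hex)

if __name__ == "__main__":
    set_io_scheduler("sdb")    # hypothetical SSD-array device
    reserve_huge_pages(4096)   # 8 GiB of 2 MiB pages
    pin_irq(42, "f")           # hypothetical IRQ -> CPUs 0-3
```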

6.
Understanding species interactions is critical to discovering community dynamics. Recently, statistical methods for estimating species interaction strengths from time series data have been developed based on multivariate auto-regressive first-order, or MAR(1), models. However, the complex coding required presents a substantial barrier for most ecologists. We have developed LAMBDA, a software program that allows users to easily fit MAR(1) models to multi-species time series data. The LAMBDA package covers: data input and transformation, selection of the interactions to include via a search algorithm and model selection, estimation of interaction parameters via conditional least squares (CLS) regression or two different maximum-likelihood (ML) algorithms, estimation of confidence intervals via bootstrapping, and computation of community stability properties using the estimated model. We describe performance tests on the variability of estimates, computation speed, and CLS versus ML estimation using simulated data.
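
For context, the MAR(1) model is x_t = a + B·x_{t-1} + e_t, where the matrix B holds the pairwise interaction strengths, and CLS estimation reduces to a linear regression of each species' abundance on the lagged abundances of all species. A minimal sketch with simulated data, assuming the model form above (this is not LAMBDA's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a 2-species MAR(1) community: x_t = a + B @ x_{t-1} + noise.
a_true = np.array([0.5, 0.2])
B_true = np.array([[0.6, -0.2],
                   [0.1,  0.7]])
x = np.zeros((200, 2))
for t in range(1, 200):
    x[t] = a_true + B_true @ x[t - 1] + rng.normal(0, 0.1, 2)

# Conditional least squares: regress x_t on (1, x_{t-1}).
X = np.column_stack([np.ones(199), x[:-1]])
coeffs, *_ = np.linalg.lstsq(X, x[1:], rcond=None)
a_hat, B_hat = coeffs[0], coeffs[1:].T

print("estimated intercepts:", np.round(a_hat, 2))
print("estimated interaction matrix:\n", np.round(B_hat, 2))
```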

7.
By using the principle of fixed-time benchmarking, it is possible to compare a wide range of computers, from a small personal computer to the most powerful parallel supercomputer, on a single scale. Fixed-time benchmarks promise greater longevity than those based on a particular problem size and are more appropriate for “grand challenge” capability comparison. We present the design of a benchmark, SLALOM, that adjusts automatically to the computing power available and corrects several deficiencies in various existing benchmarks: it is highly scalable, solves a real problem, includes input and output times, and can be run on parallel computers of all kinds, using any convenient language. The benchmark provides an estimate of the size of problem solvable on scientific computers. It also can be used to demonstrate a new source of superlinear speedup in parallel computers. Results that span six orders of magnitude for contemporary computers of various architectures are presented.
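
A minimal sketch of the fixed-time principle: hold the time budget constant and search for the largest problem solvable within it, so that the problem size becomes the score. The toy workload below stands in for SLALOM's actual radiosity computation:

```python
import time

TIME_BUDGET = 0.1  # seconds: fixed time, not fixed size

def workload(n: int) -> float:
    # Toy stand-in for a real scalable problem; cost grows with n.
    return sum(i * i for i in range(n)) ** 0.5

def runs_within_budget(n: int) -> bool:
    start = time.perf_counter()
    workload(n)
    return time.perf_counter() - start <= TIME_BUDGET

def largest_problem_within_budget() -> int:
    # Geometric search upward, then binary search for the boundary.
    lo, hi = 1, 2
    while runs_within_budget(hi):
        lo, hi = hi, hi * 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if runs_within_budget(mid):
            lo = mid
        else:
            hi = mid
    return lo  # the score: a bigger solvable problem means a faster machine

print("problem size solvable in", TIME_BUDGET, "s:", largest_problem_within_budget())
```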

8.
Cloud-based CNC (C-CNC) is an emerging concept within Industry 4.0 where computer numerical control (CNC) functionalities are moved to the cloud and provided to manufacturing machines as a service. Among many benefits, C-CNC allows manufacturing machines to leverage advanced control algorithms running on cloud computers to boost their performance at low cost, without need for major hardware upgrades. However, a fundamental challenge of C-CNC is how to guarantee safety and reliability of machine control given variable Internet quality of service (e.g. delays), especially on public Internet networks. We propose a three-tier redundant architecture to address this challenge. We then prototype tier one of the architecture on a 3D printer successfully controlled via C-CNC over public Internet connections, and discuss follow-on research opportunities.

9.
This paper's aim is to evaluate the effectiveness of bootstrap methods in improving the estimation of clutter properties in speckled imagery. Estimation is performed by standard maximum likelihood methods. We show that estimators obtained this way can be quite biased in finite samples, and develop bias correction schemes using bootstrap resampling. In particular, we propose a bootstrapping scheme which is an adaptation of that proposed by Efron (J. Amer. Statist. Assoc. 85 (1990) 79). Unlike Efron's original proposal, the proposed bootstrap does not require the quantity of interest to have a closed form. The adaptation we suggest is particularly important since the maximum likelihood estimator of interest does not have a closed form. We show that this particular bootstrapping scheme outperforms alternative forms of bias reduction mechanisms, thus delivering more accurate inference. We also consider interval estimation using bootstrap methods, and show that a particular parametric bootstrap-based confidence interval is typically more reliable than both the asymptotic confidence interval and other bootstrap-based confidence intervals. An application to real data is presented and discussed.
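
The generic bootstrap bias-correction step looks as follows; the variance estimator below is a deliberately biased stand-in, not the clutter-parameter MLE from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def mle_variance(sample):
    # ML variance estimator (divides by n), biased in finite samples.
    return np.mean((sample - sample.mean()) ** 2)

sample = rng.normal(0.0, 2.0, size=25)  # true variance = 4
theta_hat = mle_variance(sample)

# Nonparametric bootstrap estimate of the bias: E*[theta*] - theta_hat.
B = 2000
boot = np.array([
    mle_variance(rng.choice(sample, size=sample.size, replace=True))
    for _ in range(B)
])
bias = boot.mean() - theta_hat
theta_corrected = theta_hat - bias  # equivalently 2*theta_hat - boot.mean()

print(f"ML estimate: {theta_hat:.3f}, bias-corrected: {theta_corrected:.3f}")
```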

10.
The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. Application benchmarks after task reordering through the genetic algorithm show a significant improvement in performance and reduction in variability, thereby enabling the applications to achieve better time to solution and scalability on Titan during production.
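
A sketch of the objective such a placement optimizer works with: communication volume weighted by network hop distance under a task-to-node mapping. The 2-D torus and the mutation-only search below are simplifications of the paper's GA on Titan's 3-D torus:

```python
import random

def hop_distance(a, b, dims=(8, 8)):
    # Manhattan distance on a 2-D torus, a toy stand-in for a 3-D torus.
    return sum(min(abs(x - y), d - abs(x - y))
               for x, y, d in zip(a, b, dims))

def comm_cost(placement, comm_pairs, node_coords):
    # placement[task] = node index; comm_pairs = [(t1, t2, volume), ...]
    return sum(vol * hop_distance(node_coords[placement[t1]],
                                  node_coords[placement[t2]])
               for t1, t2, vol in comm_pairs)

# Hypothetical 16-task ring communication pattern on 16 nodes.
node_coords = [(i % 8, i // 8) for i in range(16)]
comm_pairs = [(t, (t + 1) % 16, 1.0) for t in range(16)]

placement = list(range(16))
best = comm_cost(placement, comm_pairs, node_coords)
for _ in range(20000):  # mutation-only stand-in for the GA
    i, j = random.sample(range(16), 2)
    placement[i], placement[j] = placement[j], placement[i]
    cost = comm_cost(placement, comm_pairs, node_coords)
    if cost < best:
        best = cost
    else:  # revert worsening swaps
        placement[i], placement[j] = placement[j], placement[i]
print("optimized communication cost:", best)
```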

11.
Parallel computers offer a solution to improve the lengthy computation time of many conventional, sequential programs used in molecular biology. On a parallel computer, different pieces of the computation are performed simultaneously on different processors. LINKMAP is a sequential program widely used by scientists to perform genetic linkage analysis. We have converted LINKMAP to run on a parallel computer, using the machine-independent parallel programming language, Linda. Using the parallelization of LINKMAP as a case study, the paper outlines an approach to converting existing highly iterative programs to a parallel form. The paper describes the steps involved in converting the sequential program to a parallel program. It presents performance benchmarks comparing the sequential version of LINKMAP with the parallel version running on different parallel machines. The paper also discusses alternative approaches to the problem of "load balancing," making sure the computational load is shared as evenly as possible among the available processors.

12.
Benchmarking and Comparison of the Task Graph Scheduling Algorithms
The problem of scheduling a parallel program represented by a weighted directed acyclic graph (DAG) to a set of homogeneous processors for minimizing the completion time of the program has been extensively studied. The NP-completeness of the problem has stimulated researchers to propose a myriad of heuristic algorithms. While most of these algorithms are reported to be efficient, it is not clear how they compare against each other. A meaningful performance evaluation and comparison of these algorithms is a complex task and it must take into account a number of issues. First, most scheduling algorithms are based upon diverse assumptions, making the performance comparison rather meaningless. Second, there does not exist a standard set of benchmarks to examine these algorithms. Third, most algorithms are evaluated using small problem sizes, and, therefore, their scalability is unknown. In this paper, we first provide a taxonomy for classifying various algorithms into distinct categories according to their assumptions and functionalities. We then propose a set of benchmarks that are based on diverse structures and are not biased toward a particular scheduling technique. We have implemented 15 scheduling algorithms and compared them on a common platform by using the proposed benchmarks, as well as by varying important problem parameters. We interpret the results based upon the design philosophies and principles behind these algorithms, drawing inferences as to why some algorithms perform better than others. We also propose a performance measure called scheduling scalability (SS) that captures the collective effectiveness of a scheduling algorithm in terms of its solution quality, the number of processors used, and the running time.
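
For a concrete sense of the algorithms being benchmarked, here is a generic list-scheduling heuristic: rank tasks by bottom level (longest path to an exit node) and greedily assign each to the processor where it can start earliest. This is an illustrative baseline, not one of the 15 algorithms compared, and it ignores communication costs:

```python
def bottom_level(task, succ, weight, memo=None):
    # Longest path from `task` to an exit node, counting task weights.
    if memo is None:
        memo = {}
    if task not in memo:
        memo[task] = weight[task] + max(
            (bottom_level(s, succ, weight, memo) for s in succ[task]),
            default=0)
    return memo[task]

def list_schedule(weight, succ, pred, n_procs):
    order = sorted(weight, key=lambda t: -bottom_level(t, succ, weight))
    proc_free = [0] * n_procs
    finish = {}
    for t in order:
        ready = max((finish[p] for p in pred[t]), default=0)
        p = min(range(n_procs), key=lambda i: max(proc_free[i], ready))
        start = max(proc_free[p], ready)
        finish[t] = start + weight[t]
        proc_free[p] = finish[t]
    return max(finish.values())  # makespan

# Toy DAG: A -> B, A -> C, B -> D, C -> D.
weight = {"A": 2, "B": 3, "C": 1, "D": 2}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
pred = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
print("makespan on 2 processors:", list_schedule(weight, succ, pred, 2))
```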

13.
A practical methodology for evaluating and comparing the performance of distributed memory Multiple Instruction Multiple Data (MIMD) systems is presented. The methodology determines machine parameters and program parameters separately, and predicts the performance of a given workload on the machines under consideration. Machine parameters are measured using benchmarks that consist of parallel algorithm structures. The methodology takes a workload-based approach in which a mix of application programs constitutes the workload. The performance of different systems is compared, under the given workload, using the ratio of their speeds. In order to validate the methodology, an example workload has been constructed and the time estimates have been compared with actual runs, yielding good predictions. Variations in the workload are analysed in terms of increases in problem sizes and changes in the frequency of particular algorithm groups. Utilization and scalability are used to compare the systems when the number of processors is increased. It has been shown that the performance of parallel computers is sensitive to changes in the workload and therefore any evaluation and comparison must consider a given user workload. The performance improvement that can be obtained by increasing the size of a distributed memory MIMD system depends on the characteristics of the workload as well as the parameters that characterize the communication speed of the parallel system.

14.
This paper is concerned with the analytical modeling of computer architectures to aid in the design of high-level language-directed computer architectures. High-level language-directed computers are computers that execute programs in a high-level language directly. The design procedure for these computers is at best ad hoc. In order to systematize the design procedure, we introduce analytical models of computers that predict the performance of parallel computations on concurrent computers. We model computers as queueing networks and parallel computations as precedence graphs. The models that we propose are simple and lead to computationally efficient procedures for predicting the performance of parallel computations on concurrent computers. We demonstrate the use of these models in the design of high-level language-directed computer architectures.

15.
Artificial intelligence began with an enthusiastic embrace of newly available computing machinery and the basic question of what kinds of problems we could solve with it. The first 50 years focused on programming computers to perform tasks that previously only humans could do. Then, people began comparing machines to humans as problem solvers, and the race was on to see where machines could match or even surpass human performance. Success in solving math word problems, winning checkers and chess championships, understanding natural language, and generating plans and schedules reinforced our efforts to build supercapable machines. The author calls these puppets, not to derogate the machines but to respect the importance of the programmers and builders who were actually responsible for their accomplishments. From time to time, many of us have recognized the field's rate-limiting factor under various names and viewpoints, such as the knowledge-acquisition bottleneck and the challenges of machine learning, system bootstrapping, artificial life, and self-organizing systems. Mostly, however, these efforts have had limited success. The little bit of learning and adaptation they've demonstrated has paled in comparison to the puppeteers' laborious inputs.

16.

Heterogeneous networks of workstations and/or personal computers (NOW) are increasingly used as a powerful platform for the execution of parallel applications. When applications previously developed for traditional parallel machines (homogeneous and dedicated) are ported to NOWs, performance worsens, owing partly to less efficient communication but more often to load imbalance.

In this paper, we address the problem of the efficient porting to heterogeneous NOWs of data-parallel applications originally developed using the SPMD paradigm for homogeneous parallel systems with regular topology like ring.

To achieve good performance, the computation time on the various machines composing the NOW must be as balanced as possible. This can be obtained in two ways: by using a heterogeneous data-partition strategy with a single process per node, or by splitting data homogeneously among processes and assigning to each node a number of processes proportional to its computing power. The first method is, however, more difficult, since it always requires some code modifications, whereas the second approach requires very few changes.

We carry out a simplified but reliable analysis, and propose a simple model able to simulate performance in the various situations. Two test cases, matrix multiplication and computation of long-range interactions, are considered, obtaining a good agreement between simulated and experimental results.

Our analysis shows that an efficient porting of regular homogeneous data-parallel applications to heterogeneous NOWs is possible. In particular, the approach based on multiple processes per node turns out to be a straightforward and effective way to achieve very satisfactory performance in almost all situations, even when dealing with highly heterogeneous systems.
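
The second approach comes down to one computation: assign each node a process count proportional to its measured power. A minimal sketch, with hypothetical relative powers obtained from a small benchmark run:

```python
def processes_per_node(power, total_procs):
    """Split `total_procs` homogeneous processes across nodes in
    proportion to relative computing power (largest-remainder rounding)."""
    total_power = sum(power.values())
    shares = {n: total_procs * p / total_power for n, p in power.items()}
    alloc = {n: int(s) for n, s in shares.items()}
    leftovers = total_procs - sum(alloc.values())
    # Give remaining processes to the nodes with the largest fractions.
    for n in sorted(shares, key=lambda n: shares[n] - alloc[n],
                    reverse=True)[:leftovers]:
        alloc[n] += 1
    return alloc

# Hypothetical NOW with three nodes of differing power.
print(processes_per_node({"node1": 1.0, "node2": 2.0, "node3": 0.5}, 13))
```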

17.
Parallel computers are having a profound impact on computational science. Recently highly parallel machines have taken the lead as the fastest supercomputers, a trend that is likely to accelerate in the future. We describe some of these new computers, and issues involved in using them. We present elliptic PDE solutions currently running at 3.8 gigaflops, and an atmospheric dynamics model running at 1.7 gigaflops, on a 65 536-processor computer.

One intrinsic disadvantage of a parallel machine is the need to perform inter-processor communication. It is important to ensure that such communication time is maintained at a small fraction of computation time. We analyze standard multigrid algorithms in two and three dimensions from this point of view, indicating that performance efficiencies in excess of 95% are attainable under suitable conditions on moderately parallel machines. We also demonstrate that such performance is not attainable for multigrid on massively parallel computers, as indicated by an example of poor multigrid efficiency on 65 536 processors. The fundamental difficulty is the inability to keep 65 536 processors busy when operating on very coarse grids.

Most algorithms used for implementing applications on parallel machines have been derived directly from algorithms designed for serial machines. The previously mentioned multigrid example indicates that such ‘parallelized’ algorithms may not always be optimal. Parallel machines open the possibility of finding totally new approaches to solving standard tasks—intrinsically parallel algorithms. In particular, we present a class of superconvergent multiple scale methods that were motivated directly by massively parallel machines. These methods differ from standard multigrid methods in an intrinsic way, and allow all processors to be used at all times, even when processing on the coarsest grid levels. Their serial versions are not sensible algorithms. The idea that parallel hardware—the Connection Machine in this case—can lead to the discovery of new mathematical algorithms was surprising to us.


18.
Simulation-Based Performance Prediction for Large Parallel Machines
We present a performance prediction environment for large scale computers such as the Blue Gene machine. It consists of a parallel simulator, BigSim, for predicting performance of machines with a very large number of processors, and BigNetSim, which incorporates a pluggable module of a detailed contention-based network model. The simulators provide the ability to make performance predictions for very large machines such as Blue Gene/L. We illustrate the utility of our simulators using validation and prediction studies of several applications using smaller numbers of processors for simulations.

19.
We present a 3-D correspondence method to match the geometric extremities of two shapes which are partially isometric. We consider the most general setting of the isometric partial shape correspondence problem, in which the shapes to be matched may have multiple common parts at arbitrary scales as well as parts that are not similar. Our rank-and-vote-and-combine algorithm identifies and ranks potentially correct matches by exploring the space of all possible partial maps between coarsely sampled extremities. The qualified top-ranked matchings are then subjected to a more detailed analysis at a denser resolution and assigned confidence values that accumulate into a vote matrix. A minimum-weight perfect matching algorithm is finally iterated to combine the accumulated votes into an optimal (partial) mapping between shape extremities, which can further be extended to a denser map. We test the performance of our method on several data sets and benchmarks in comparison with the state of the art.
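
The final combine step can be phrased as an assignment problem on the vote matrix. A sketch of that step alone, with a hypothetical vote matrix, using SciPy's Hungarian-algorithm solver (maximizing votes by minimizing their negation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical vote matrix: votes[i, j] = accumulated confidence that
# extremity i of shape A corresponds to extremity j of shape B.
votes = np.array([
    [9.0, 1.0, 0.5],
    [0.5, 7.5, 2.0],
    [1.0, 2.5, 8.0],
])

# Minimum-weight perfect matching on negated votes = maximum-vote matching.
rows, cols = linear_sum_assignment(-votes)
for i, j in zip(rows, cols):
    print(f"extremity A{i} <-> extremity B{j}  (votes: {votes[i, j]})")
```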

20.
Accurate estimation of geometric properties from discrete curves is an important problem in many application domains, such as computer vision, pattern recognition, image processing, and geometric modeling. In this paper, we propose a novel method for estimating the geometric properties of discrete curves based on derivative estimation. We define the derivative of a discrete function at a point, which we call the discrete derivative; the second- and higher-order discrete derivatives at that point are defined similarly, and their convergence is demonstrated by theoretical analysis. These definitions provide a simple and reliable way to estimate derivatives from discrete curves. Based on the discrete derivatives, classical differential geometry can be discretized, and the geometric properties are estimated from discrete curves using differential geometry theory. The proposed method is independent of any analytic curve and estimates the geometric properties directly from discrete data points, which makes it robust to the geometric shapes of discrete curves. Another advantage of the proposed method is its robustness to noise, owing to the way the discrete derivatives are computed. The proposed method is evaluated and compared with other existing methods in experiments with both synthetic and real discrete curves. The test results show that the proposed method performs well, is robust to noise, and is suitable for different curve shapes.
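
To make the pipeline concrete: given discrete first and second derivatives, classical differential geometry yields curvature as kappa = |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2). A sketch using plain central differences as the discrete derivative (the paper defines its own, more robust estimator):

```python
import numpy as np

def central_diff(f, h):
    # Central-difference discrete derivative of periodic samples f, spacing h.
    return (np.roll(f, -1) - np.roll(f, 1)) / (2 * h)

# Sample a circle of radius 2: true curvature is 1/r = 0.5 everywhere.
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
h = t[1] - t[0]
x, y = 2 * np.cos(t), 2 * np.sin(t)

dx, dy = central_diff(x, h), central_diff(y, h)
ddx, ddy = central_diff(dx, h), central_diff(dy, h)

kappa = np.abs(dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5
print("estimated curvature (mean):", kappa.mean())  # ~0.5
```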
