5.
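High-performance computing is increasingly applied across all fields of science and engineering, but the performance that real applications achieve has not risen in proportion to the peak performance of the machines: applications typically reach only about 5%–10% of peak, and the gap is widening. Program performance optimization, as one way to address this problem, has received wide attention in academia. Based on the Itanium microprocessor, this paper summarizes general methods of program optimization and presents the general steps of program optimization and analysis. Following these steps, it first performs a detailed performance analysis of four programs to locate their performance bottlenecks and key subroutines; it then optimizes each program according to its characteristics, using cache-based and instruction-pipeline-based optimization techniques; finally, it reports test results showing performance improvements of 8%–33%, a good optimization result.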
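The abstract names cache-based optimization without listing the transformations used. As a generic illustration only, the sketch below shows loop blocking (tiling), one standard cache-oriented technique, applied to matrix multiplication. It is written in Python for readability, so it shows the loop structure rather than any real cache speedup; the tile size `block` and both function names are hypothetical.

```python
def matmul_naive(a, b):
    """Reference i-k-j matrix multiply on lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


def matmul_blocked(a, b, block=2):
    """Same product, computed tile by tile.

    In C or Fortran, `block` (hypothetical here) would be chosen so that the
    tiles of a, b and c touched by the inner loops fit in cache together.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # Outer loops step over tiles; inner loops stay inside one tile so the
    # same rows of a, b and c are reused while they are (ideally) cached.
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + block, p)):
                            c[i][j] += aik * b[k][j]
    return c
```

Both functions accumulate each `c[i][j]` in the same order of `k`, so for the same inputs they return identical results; only the traversal order of the iteration space changes.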
6.
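To better defend against botnets, this paper studies how bot programs are designed and how botnets are assembled. It analyzes the functional structure and working mechanism of botnets and designs a bot program composed mainly of scanning, vulnerability-exploitation, tool-upload, and communication modules. Each module is implemented with Windows programming techniques, and an experimental environment is set up to test the overall performance. The test results show that the overall performance of the botnet meets expectations. Finally, countermeasures against botnets are discussed.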
7.
Research on Program Characteristic Analysis Techniques Based on SUIF2 (total citations: 1; self-citations: 0; citations by others: 1)
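Program characteristics, especially memory and loop characteristics, are crucial to the design and optimization of computer systems and compilers, but the sheer size of software systems and the inherent complexity of program analysis make it very difficult to automate the extraction and analysis of program characteristics. To address this problem, a new SUIF2-based program characteristic analysis method is proposed, and a characteristic analysis tool for C programs is designed and implemented on the SUIF2 platform. The paper first introduces the design ideas and overall structure of the tool, then describes the function and implementation principles of each component in detail, summarizes the tool's features, and finally presents and analyzes test results for two SPEC2000 benchmarks, 188.ammp and 177.mesa.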
8.
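The increasingly complex architecture of high-performance computing systems, together with the limited intelligence of existing performance analysis tools, makes performance analysis and optimization of HPC applications ever more costly. Fortunately, artificial intelligence has recently made important progress, with deep learning playing a key role, and this creates an opportunity to make performance analysis tools more intelligent. This paper proposes a deep-learning-based framework for intelligent program performance analysis. Its core idea is to recast program performance analysis as a classification problem that machine learning can describe: the processor's PMU (performance monitoring unit) collects the performance data needed for classification, which are then standardized; cluster evaluation techniques, combined with the practical meaning of each cluster, determine the categories of performance problems; and sparse coding automatically learns features of the performance data to build a classification model of performance problems. A prototype of the framework was implemented on the Sunway TaihuLight supercomputer. Experimental results show that the method can intuitively guide programmers to quickly identify the most prominent performance bottleneck of the current application, improving optimization efficiency and lowering the cost of code tuning.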
9.
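Existing deep learning models treat program code as a serial sequence and thereby miss considerable room for performance optimization. To address this, a new heuristic program optimization method based on deep graph networks is proposed. It uses a graph neural network to model the program's data and dependence graphs, automatically extracts effective program features from the source code, and feeds the extracted features into a downstream model that predicts loop vectorization parameters. On the LLVM loop vectorization test suite, the proposed method achieves a 2.08× speedup, a 12% performance improvement over existing methods.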
10.
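Analyzing Application Performance with a Principal Component Linear Regression Model (total citations: 3; self-citations: 0; citations by others: 3)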
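Performance analysis of applications can provide effective reference and guidance for architecture designers and performance optimizers. This paper uses a principal component linear regression model to analyze the performance of the SPEC CPU2006 integer programs. The model takes the events sampled by the performance monitoring unit as independent variables and cycles per instruction (CPI) as the dependent variable. Principal component analysis is used to eliminate the correlation among the performance events. Experimental results show that the model's goodness of fit exceeds 90% and that its average relative prediction error is 15%. The model quantifies how L1 and L2 cache misses, the key factors affecting performance, influence program performance.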
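The abstract does not give the number of components or the event list, so the following is only a generic sketch of principal component regression: standardize the counter columns, extract the leading eigenvectors of their correlation matrix (by power iteration, to avoid dependencies), and regress CPI on the resulting uncorrelated scores. The feature columns and all function names are hypothetical.

```python
import math
import random


def _mean(xs):
    return sum(xs) / len(xs)


def _standardize(columns):
    """Z-score each column (one column per PMU event)."""
    means = [_mean(c) for c in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(columns, means)]
    scaled = [[(v - m) / s for v in c] for c, m, s in zip(columns, means, stds)]
    return scaled, means, stds


def _top_eigvecs(mat, k, iters=500, seed=0):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    d = len(mat)
    mat = [row[:] for row in mat]
    vecs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
        # Remove the found component so the next pass converges to the next one.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        vecs.append(v)
    return vecs


def fit_pcr(features, y, k):
    """features: list of event columns; y: CPI for each sample."""
    scaled, means, stds = _standardize(features)
    n, d = len(y), len(features)
    corr = [[sum(scaled[a][t] * scaled[b][t] for t in range(n)) / n for b in range(d)]
            for a in range(d)]
    comps = _top_eigvecs(corr, k)
    y_mean = _mean(y)
    coefs = []
    for v in comps:
        scores = [sum(v[a] * scaled[a][t] for a in range(d)) for t in range(n)]
        # Scores of different components are uncorrelated, so each coefficient
        # is a simple regression of y on that component's score.
        var = sum(s * s for s in scores) / n
        cov_sy = sum(s * (yy - y_mean) for s, yy in zip(scores, y)) / n
        coefs.append(cov_sy / var)
    return means, stds, comps, coefs, y_mean


def predict_pcr(model, sample):
    means, stds, comps, coefs, y_mean = model
    z = [(x - m) / s for x, m, s in zip(sample, means, stds)]
    return y_mean + sum(c * sum(v[a] * z[a] for a in range(len(z))) for v, c in zip(comps, coefs))
```

Keeping all components reproduces ordinary least squares; keeping fewer (`k` smaller than the number of events) is what discards the correlated, low-variance directions.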
11.
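The IMCI instruction set of the Intel Xeon Phi coprocessor introduces a hardware-implemented vgather instruction, intended to help 512-bit SIMD registers access data at non-contiguous memory addresses. Experimental results, however, show that vgather is likely to become one of the key performance bottlenecks of applications on the Xeon Phi coprocessor. A performance model of vgather can therefore help users understand the performance characteristics of the Xeon Phi coprocessor in depth. Methodologically, unlike existing approaches that collect statistics by embedding assembly code in program sections, this paper uses performance analysis tools such as PAPI to collect hardware counter statistics directly as the model's experimental data. The performance model is built from the number of AGI events and the average latency caused by vgather, which is estimated from the number of VPU_DATA_READ events. The model can predict the total latency caused by vgather in Xeon Phi application code. To verify its accuracy, the model was applied to a 3D 7-point stencil code; the prediction shows that vgather accounts for about 40% of the total computation time. This result was checked against the computation time after vgather was removed using intrinsics, and the comparison shows that the model's prediction is accurate. In summary, a vgather performance model was built on the Xeon Phi coprocessor from hardware counter statistics. Furthermore, a comparison with vgather on other platforms suggests that the model can also be applied to Intel CPU platforms that provide vgather.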
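The abstract names the model's inputs (the AGI event count and an average vgather latency estimated from VPU_DATA_READ) but not its formula. The sketch below assumes the simplest reading, total vgather latency = event count × average latency, and takes the average latency as a given input because the abstract does not say how it is derived from VPU_DATA_READ. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RegionCounters:
    """Counter totals for one measured code region (hypothetical layout)."""
    agi_events: int            # AGI event count, e.g. read through PAPI
    avg_gather_latency: float  # estimated cycles per AGI event (assumed given)
    total_cycles: int          # total cycles spent in the region


def predicted_vgather_cycles(c: RegionCounters) -> float:
    # Assumed multiplicative form: number of events x average latency per event.
    return c.agi_events * c.avg_gather_latency


def predicted_vgather_share(c: RegionCounters) -> float:
    return predicted_vgather_cycles(c) / c.total_cycles


def measured_vgather_share(time_with: float, time_without: float) -> float:
    # Validation idea from the abstract: time the kernel with vgather and again
    # after rewriting it with intrinsics that avoid vgather; the difference is
    # attributed to vgather.
    return (time_with - time_without) / time_with
```

With illustrative numbers (not the paper's data), 1,000 AGI events at 4 cycles each in a 10,000-cycle region give a predicted share of 0.4, which would then be compared with `measured_vgather_share` from the two timed runs.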
13.
Stefan Manegold, Peter A. Boncz, Martin L. Kersten. The VLDB Journal: The International Journal on Very Large Data Bases, 2000, 9(3): 231-246
In the past decade, advances in the speed of commodity CPUs have far out-paced advances in memory latency. Main-memory access
is therefore increasingly a performance bottleneck for many computer applications, including database systems. In this article,
we use a simple scan test to show the severe impact of this bottleneck. The insights gained are translated into guidelines
for database architecture, in terms of both data structures and algorithms. We discuss how vertically fragmented data structures
optimize cache performance on sequential data access. We then focus on equi-join, typically a random-access operation, and
introduce radix algorithms for partitioned hash-join. The performance of these algorithms is quantified using a detailed analytical
model that incorporates memory access cost. Experiments that validate this model were performed on the Monet database system.
We obtained exact statistics on events such as TLB misses and L1 and L2 cache misses by using hardware performance counters
found in modern CPUs. Using our cost model, we show how the carefully tuned memory access pattern of our radix algorithms
makes them perform well, which is confirmed by experimental results.
Received April 20, 2000 / Accepted June 23, 2000
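The radix-partitioned hash join can be sketched as follows. This is a simplified, hypothetical Python rendering of the idea (cluster both inputs on hash bits over a few passes, then join each pair of matching clusters with a small hash table); it does not reproduce Monet's implementation, and `bits_per_pass` is an illustrative parameter. In C, splitting the clustering into passes keeps each pass's fan-out within TLB and cache limits; Python only shows the data flow.

```python
from collections import defaultdict


def radix_cluster(rows, key, bits_per_pass):
    """Multi-pass radix clustering on the hash of each row's key.

    Each pass refines every existing cluster by the next group of hash bits,
    so the number of clusters written per pass stays small.
    """
    clusters = [rows]
    shift = 0
    for bits in bits_per_pass:
        mask = (1 << bits) - 1
        refined = []
        for cluster in clusters:
            buckets = [[] for _ in range(1 << bits)]
            for row in cluster:
                buckets[(hash(key(row)) >> shift) & mask].append(row)
            refined.extend(buckets)
        clusters = refined
        shift += bits
    return clusters


def partitioned_hash_join(left, right, lkey, rkey, bits_per_pass=(2, 2)):
    """Equi-join: clusters at the same position share the same hash bits."""
    lparts = radix_cluster(left, lkey, bits_per_pass)
    rparts = radix_cluster(right, rkey, bits_per_pass)
    out = []
    for lp, rp in zip(lparts, rparts):
        # Small per-cluster hash table; the goal is for it to fit in cache.
        table = defaultdict(list)
        for row in lp:
            table[lkey(row)].append(row)
        for row in rp:
            for match in table.get(rkey(row), ()):
                out.append((match, row))
    return out
```

Because both inputs are clustered with the same `bits_per_pass`, any two rows with equal keys land in clusters at the same index, so only matching cluster pairs need to be joined.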
14.
Jung-Hua Wang, Jen-Da Rau, Chung-Yun Peng. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2000, 30(4): 586-593
This paper optimizes the performance of the growing cell structures (GCS) model in learning topology and vector quantization. Each node in GCS has an attached resource counter. During the competitive learning process, the counter of the best-matching node is increased by a defined resource measure after each input presentation, and then all resource counters are decayed by a factor alpha. We show that the sum of all resource counters is conserved. This conservation principle provides useful clues for exploring important characteristics of GCS, which in turn provide insight into how GCS can be optimized. In the context of information entropy, we show that the performance of GCS in learning topology and vector quantization can be optimized by using alpha = 0 together with a threshold-free node-removal scheme, regardless of whether the input data are stationary or nonstationary. The meaning of optimization is twofold: (1) for learning topology, the information entropy is maximized in terms of the equiprobable criterion, and (2) for learning vector quantization, the use is minimized in terms of the equi-error criterion.
15.
In this paper we propose a methodology for developing system-wide energy consumption models for servers, based on the analysis of performance counters. It makes it possible to estimate the power usage of a machine under any load at runtime. By clustering applications, we extract groups of programs with similar characteristics, which allows us to create more specialized and accurate power usage models. Decision trees are used to automatically select the model appropriate to the current system load. Training and test sets of programs were used to evaluate the estimates. The presented models are accurate to within 4% error, as verified on servers from different vendors, including the latest pre-production one.
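A minimal sketch of the per-cluster idea, assuming cluster labels are already known for the training programs and that each cluster's model is a straight line from a single counter rate to watts. As a simpler stand-in for the paper's decision tree, the model is selected here by the nearest cluster centroid. The cluster names, the single-counter feature, and the numbers in the usage note are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for watts = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def train(samples):
    """samples: list of (cluster_label, counter_rate, measured_watts)."""
    by_cluster = defaultdict(list)
    for label, x, watts in samples:
        by_cluster[label].append((x, watts))
    models = {}
    for label, pts in by_cluster.items():
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        # Store the cluster centroid (for model selection) and its line.
        models[label] = (sum(xs) / len(xs), fit_line(xs, ys))
    return models


def predict(models, x):
    """Pick the cluster whose centroid is closest to x, then apply its model."""
    label = min(models, key=lambda name: abs(models[name][0] - x))
    _, (a, b) = models[label]
    return label, a + b * x
```

For example, training on a compute-heavy cluster whose power follows 100 + 50x and a memory-heavy cluster following 80 + 10x, a new load at x = 2.5 is routed to the compute-heavy model and estimated at 225 W.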
16.
Due to the increasing complexity of processors, developers often seek tools that simplify finding bottlenecks in executing applications. Although more and more data can be collected from processors, understanding it usually requires detailed knowledge of the internals of a given architecture. This paper introduces a Top-Down Characterization Approximation for analyzing the performance of applications executed on AMD processors, as an extension of the Top-Down Method originally developed by Intel. Because not all of the performance counters required to calculate the exact values of the metrics are available on AMD processors, the method is called an approximation. It gives a deeper understanding of the different stages of program execution, allows different architectures to be compared, and identifies bottlenecks in out-of-order processors. It hides the complexity of microarchitectural details from the user while exposing the main contributors to inefficient program execution. The method defines a few main metrics on top of performance counters so that the main efficiency issues can be located easily. Until now, the Top-Down method had been applied only to Intel processors, mainly because it relies on designated performance counters that differ between processors, so porting it is not straightforward. Positive feedback from users encouraged the authors to develop a similar technique for AMD processors.
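For reference, the level-1 breakdown of Intel's Top-Down method divides the pipeline's issue slots into four categories: Frontend Bound, Bad Speculation, Retiring, and Backend Bound. The sketch below follows those Intel formulas with generic input names; the abstract does not say which AMD counters approximate each input, so that mapping is left to the caller, and the 4-slot width is an assumption taken from Intel's original description for a 4-wide core.

```python
from dataclasses import dataclass


@dataclass
class SlotCounters:
    cycles: float               # unhalted core cycles
    slots_not_delivered: float  # issue slots the front end left empty
    uops_issued: float          # micro-ops issued
    uops_retired: float         # retire slots used by retired micro-ops
    recovery_cycles: float      # cycles spent recovering from bad speculation
    width: int = 4              # issue slots per cycle (assumed 4-wide core)


def top_down_level1(c: SlotCounters) -> dict:
    slots = c.width * c.cycles
    frontend_bound = c.slots_not_delivered / slots
    # Micro-ops issued but never retired, plus slots lost while recovering.
    bad_speculation = (c.uops_issued - c.uops_retired + c.width * c.recovery_cycles) / slots
    retiring = c.uops_retired / slots
    # The remaining slots are attributed to the back end (memory or core stalls).
    backend_bound = 1.0 - frontend_bound - bad_speculation - retiring
    return {
        "frontend_bound": frontend_bound,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend_bound,
    }
```

The four fractions always sum to 1, which is what lets the method point at a single dominant category before descending to lower levels.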
17.
The unprecedented burst in power consumption encountered by contemporary datacenters continually drives the development of energy-efficient techniques, from both hardware and software perspectives, to alleviate the energy problem. The most widely adopted power-saving solutions in datacenters that deliver cloud computing services are power capping and VM consolidation. However, without the ability to track VM power usage precisely, the combined effect of these two techniques could cause severe performance degradation to the consolidated VMs, thus violating the user service level agreements. In this paper, we propose an integrated VM power model called iMeter, which overcomes the drawbacks of overpresumption and overapproximation in the segregated power models used in previous studies. We leverage kernel-based performance counters, which provide accurate performance statistics as well as high portability across heterogeneous platforms, to build the VM power model. Principal component analysis is applied to identify, with mathematical confidence, the performance counters that have a strong impact on VM power consumption. We also present a brief interpretation of what the first four selected principal components indicate about VM power consumption. Using clustering analysis, we demonstrate that our approach is independent of the underlying hardware and virtualization configurations. We use support vector regression to build the VM power model, which predicts the power consumption of both a single VM and multiple consolidated VMs running various workloads. The experimental results show that our model predicts the instantaneous VM power usage with an average error of 5% and 4.7%, respectively, against the actual power measurement.