1.

刘华徐炜民孙强《计算机工程》2004,30(10):82-84

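Presents a performance monitoring and analysis tool for the MPI programming environment. The tool collects data on the hardware system resources used while a program runs and provides both real-time and post-run visualization views, so that programmers can monitor a program as it executes and analyze its performance afterwards. This helps them locate and remove performance bottlenecks and improve the performance of parallel programs.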

2.

陆璐厉旻叶喻《微计算机应用》2009,30(5)

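Studies a functional testing platform that generates test data cases automatically in a model-driven manner, including how the test model is built. An architecture for this kind of functional testing under the Windows GUI environment is proposed, its test script development platform is analyzed, and a prototype tool for model-driven capture and replay of test actions is designed and implemented. The tool generates test scripts and dynamically regression-tests GUI-based programs by running the recorded scripts. Tests on a real application system confirm the tool's effectiveness.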

3.

GPRS server program design based on the WSAAsyncSelect model

熊光亚李晓辉解祥富李冰袁帅《测控技术》2016,35(12):110-113

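Briefly introduces the basic principles of GPRS communication and its use in automated monitoring systems, then describes in detail the architecture and key implementation code of a GPRS server program based on the WSAAsyncSelect network socket model. High-volume concurrency tests of a server built on this model show that WSAAsyncSelect can handle concurrent data from at least 1000 GPRS terminals, which meets the GPRS communication needs of small and medium-sized automated monitoring systems.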

4.

A vectorization benefit evaluation method based on polyhedral representation

张媛媛赵荣彩韩林《计算机工程》2012,38(7):266-268,272

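Loop transformations can improve program performance, but vectorizing the transformed loops may cause a performance loss instead of the expected gain. To address this, and taking the features of the target architecture into account, a vectorization benefit evaluation model that uses polyhedral representation to guide loop transformation is implemented in Open64. The model analyzes the cost of the candidate loop transformation schemes and selects the combination with the greatest vectorization benefit as the final vectorization scheme. Tests on five programs from the SPEC suite, including swim, show that the estimated benefit is close to the measured vectorization speedup, which avoids blind optimization.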

5.

Program performance optimization and analysis on the Itanium microprocessor

迟利华刘杰《计算机工程与科学》2011,33(9):42

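High-performance computing is applied ever more widely across science and engineering, but the performance that real applications achieve has not risen in proportion to machine peak performance: applications reach only about 5% to 10% of peak, and the gap is widening. Program performance optimization, one way to address this problem, has drawn wide academic attention. Based on the Itanium microprocessor, the paper summarizes general program optimization methods and gives the general steps of optimization and analysis. Following these steps, four programs are first analyzed in detail to find their performance bottlenecks and key subroutines. Then, according to the characteristics of each program, optimization techniques based on the cache and the instruction pipeline are applied. Finally, the test results are reported: the programs gain between 8% and 33% in performance.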

6.

Key technologies of botnets and their defense

李庆朋郑连清杨仝张串绒《计算机工程与设计》2012,33(1):78-82

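To defend against botnets more effectively, the paper studies how bot programs are designed and how botnets are assembled. It analyzes the functional structure and working mechanism of botnets and designs a bot program consisting mainly of scanning, vulnerability exploitation, tool upload, and communication modules. The modules are implemented with Windows programming techniques, and an experimental environment is set up to test overall performance. The results show that the overall performance of the botnet meets expectations. Finally, defensive measures against botnets are discussed.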

7.

Research on program characteristic analysis techniques based on SUIF2 (total citations: 1, self-citations: 0, citations by others: 1)

陈桂茸窦勇徐炜遐《计算机研究与发展》2007,44(Z1):254-258

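Program characteristics, especially memory and loop characteristics, are crucial to the design and optimization of computer systems and compilers, but the size of software systems and the inherent complexity of program analysis make it very hard to automate their extraction and analysis. To address this, a new SUIF2-based program characteristic analysis method is proposed, and a C program characteristic analysis tool is designed and implemented on the SUIF2 platform. The paper first introduces the tool's design ideas and overall structure, then details the function and implementation principle of each part, summarizes the tool's features, and finally gives test results and analysis for two SPEC2000 benchmarks, 188.ammp and 177.mesa.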

8.

Design and implementation of a deep-learning-based performance analysis framework

冯赟龙刘勇何王全《计算机工程与科学》2018,40(6):984-991

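The increasingly complex architecture of high-performance computing systems and the limited intelligence of existing performance analysis tools make performance analysis and optimization of HPC applications ever more costly. Recent major progress in artificial intelligence, in which deep learning has played an important role, offers an opportunity to make performance analysis tools more intelligent. The paper proposes a deep-learning-based framework for intelligent program performance analysis. Its core idea is to abstract program performance analysis as a classification problem that machine learning can describe: the performance data needed for classification are collected with the processor's PMU and normalized, cluster evaluation combined with the actual meaning of each cluster determines the categories of performance problems, and sparse coding learns the features of the performance data automatically to build a classification model for performance problems. A prototype of the framework was implemented on the Sunway TaihuLight supercomputer. Experiments show that the method guides programmers directly to the most prominent performance bottleneck of the current application, which improves optimization efficiency and lowers the cost of tuning code.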

9.

A compiler vectorization heuristic based on deep graph networks

冯晖王亚刚《计算机应用研究》2021,38(8):2349-2353

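Existing deep learning models treat program code as a serial sequence and therefore miss considerable room for performance optimization. To address this, a new heuristic program optimization method based on deep graph networks is proposed. The method models a program's data and dependence graphs with a graph neural network, extracts effective program features automatically from source code, and feeds the extracted features to a downstream model that predicts loop vectorization parameters. On the LLVM loop vectorization test suite, the method achieves a 2.08x speedup, a 12% performance improvement over existing methods.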

10.

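Analyzing application performance with a principal component linear regression model (total citations: 3, self-citations: 0, citations by others: 3)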

李胜梅程步奇高兴誉乔林汤志忠《计算机研究与发展》2009,46(11)

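Performance analysis of applications gives architecture designers and performance optimizers effective reference and guidance. A principal component linear regression model is used to analyze the performance of the SPEC CPU2006 integer programs. The model takes events sampled by the performance monitoring unit as independent variables and cycles per instruction (CPI) as the dependent variable, and uses principal component analysis to remove the correlation among performance events. Experiments show a goodness of fit above 90% and an average relative prediction error of 15%. The model quantifies how L1 and L2 cache misses, the key factors affecting performance, influence program performance.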
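The idea of regressing CPI on principal components of correlated counter events can be illustrated with a minimal sketch. It is not the authors' model: it keeps only the first principal component, finds it by power iteration, and all function names and the sample event columns are hypothetical.

```python
import math


def standardize(columns):
    """Scale each event column to zero mean and unit (population) variance."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        # Assumes the column is not constant (std > 0).
        scaled.append([(v - mean) / std for v in col])
    return scaled


def first_component(columns, iterations=200):
    """Leading eigenvector of the correlation matrix, found by power iteration."""
    k, n = len(columns), len(columns[0])
    corr = [[sum(a * b for a, b in zip(columns[i], columns[j])) / n
             for j in range(k)] for i in range(k)]
    vec = [1.0 / (j + 1) for j in range(k)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec


def fit_pcr(event_columns, cpi):
    """Regress CPI on the first principal component of the standardized events."""
    z = standardize(event_columns)
    weights = first_component(z)
    n = len(cpi)
    # Component scores have zero mean because every column of z has zero mean.
    scores = [sum(w * col[i] for w, col in zip(weights, z)) for i in range(n)]
    mean_cpi = sum(cpi) / n
    slope = (sum(s * (y - mean_cpi) for s, y in zip(scores, cpi))
             / sum(s * s for s in scores))
    predicted = [mean_cpi + slope * s for s in scores]
    return weights, predicted


def mean_relative_error(actual, predicted):
    """Average of |predicted - actual| / actual over all samples."""
    return sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical, perfectly correlated event counts and a CPI linear in them.
l1_misses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
l2_misses = [2 * v + 1 for v in l1_misses]
cpi = [0.5 + 0.1 * v for v in l1_misses]
weights, predicted = fit_pcr([l1_misses, l2_misses], cpi)
print(weights, mean_relative_error(cpi, predicted))
```

Because the two sample columns are perfectly correlated, a plain least-squares fit on both would be ill-conditioned, while the single component captures all of their shared variation. A fuller model would presumably keep several components and use measured counter data.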

11.

Modeling and evaluating the Intel IMCI vgather instruction using stencils

林新华王一超秦强李硕文敏华松岡聡《计算机工程与科学》2016,38(9):1741-1747

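IMCI, the instruction set of the Intel Xeon Phi coprocessor, introduces a hardware-implemented vgather instruction intended to let 512-bit SIMD registers access data at non-contiguous memory addresses. Experiments show, however, that vgather is likely to become one of the key performance bottlenecks for applications on the Xeon Phi coprocessor. A performance model of vgather can therefore help users understand the performance characteristics of the coprocessor in depth. Unlike existing approaches, which gather statistics by embedding assembly code in program segments, this work uses performance analysis tools such as PAPI to collect hardware counter statistics directly as the experimental data for the model. The model is built from the number of AGI events and the average latency caused by vgather, which is estimated from the VPU_DATA_READ count, and it predicts the total latency that vgather causes in Xeon Phi application code. To validate its accuracy, the model is applied to a three-dimensional 7-point stencil code, where it predicts that vgather accounts for about 40% of total computation time. Comparing this prediction with the computation time measured after vgather is removed by means of intrinsics shows that the prediction is accurate. In summary, the work builds a performance model for vgather on the Xeon Phi coprocessor from hardware counter statistics. A comparison with vgather on other platforms suggests that the model can also be applied to Intel CPU platforms that provide vgather.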

12.

Processor performance data mining based on gradient boosted regression trees

吕依蓉孙斌喻之斌《集成技术》2018,7(5):47-57

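Modern processors generally have only 4 to 8 built-in performance counters, yet they can monitor up to thousands of cycle-level performance events. These events easily generate large volumes of data, referred to as processor performance big data. Extracting valuable information from this data poses many challenges. The paper proposes a method for analyzing processor performance data that builds a performance model by applying the gradient boosted regression tree algorithm iteratively and ranks the importance of performance events for cloud computing workloads, thereby guiding the performance tuning of cloud computing platforms.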

13.

Optimizing database architecture for the new bottleneck: memory access

Stefan Manegold Peter A. Boncz Martin L. Kersten 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(3):231-246

In the past decade, advances in the speed of commodity CPUs have far out-paced advances in memory latency. Main-memory access is therefore increasingly a performance bottleneck for many computer applications, including database systems. In this article, we use a simple scan test to show the severe impact of this bottleneck. The insights gained are translated into guidelines for database architecture, in terms of both data structures and algorithms. We discuss how vertically fragmented data structures optimize cache performance on sequential data access. We then focus on equi-join, typically a random-access operation, and introduce radix algorithms for partitioned hash-join. The performance of these algorithms is quantified using a detailed analytical model that incorporates memory access cost. Experiments that validate this model were performed on the Monet database system. We obtained exact statistics on events such as TLB misses and L1 and L2 cache misses by using hardware performance counters found in modern CPUs. Using our cost model, we show how the carefully tuned memory access pattern of our radix algorithms makes them perform well, which is confirmed by experimental results. Received April 20, 2000 / Accepted June 23, 2000
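The partitioned hash-join idea mentioned in this abstract can be sketched as follows. This is a simplified, single-pass illustration under stated assumptions, not the multi-pass radix-cluster algorithm of the article: rows are tuples, and the function names and the `bits` parameter are hypothetical.

```python
from collections import defaultdict


def radix_partition(rows, key_index, bits):
    """Split rows into 2**bits clusters using the low-order bits of the key hash."""
    mask = (1 << bits) - 1
    clusters = [[] for _ in range(1 << bits)]
    for row in rows:
        clusters[hash(row[key_index]) & mask].append(row)
    return clusters


def partitioned_hash_join(left, right, left_key=0, right_key=0, bits=4):
    """Equi-join two lists of tuples, one pair of matching clusters at a time."""
    result = []
    left_parts = radix_partition(left, left_key, bits)
    right_parts = radix_partition(right, right_key, bits)
    # Rows with equal keys always land in clusters with the same index,
    # so each cluster pair can be joined independently with a small hash table.
    for lpart, rpart in zip(left_parts, right_parts):
        if not lpart or not rpart:
            continue
        table = defaultdict(list)
        for row in lpart:
            table[row[left_key]].append(row)
        for row in rpart:
            for match in table.get(row[right_key], ()):
                result.append(match + row)
    return result


orders = [(1, "a"), (2, "b"), (3, "c")]
items = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
print(sorted(partitioned_hash_join(orders, items)))
```

The point of partitioning first is that each per-cluster hash table is small; in a compiled implementation the number of bits would be chosen so that a cluster fits in cache, which plain Python lists do not model.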

14.

Toward optimizing a self-creating neural network

Jung-Hua Wang Jen-Da Rau Chung-Yun Peng 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2000,30(4):586-593

This paper optimizes the performance of the growing cell structures (GCS) model in learning topology and vector quantization. Each node in GCS is attached with a resource counter. During the competitive learning process, the counter of the best-matching node is increased by a defined resource measure after each input presentation, and then all resource counters are decayed by a factor alpha. We show that the summation of all resource counters conserves. This conservation principle provides useful clues for exploring important characteristics of GCS, which in turn provide an insight into how the GCS can be optimized. In the context of information entropy, we show that performance of GCS in learning topology and vector quantization can be optimized by using alpha=0 incorporated with a threshold-free node-removal scheme, regardless of input data being stationary or nonstationary. The meaning of optimization is twofold: (1) for learning topology, the information entropy is maximized in terms of equiprobable criterion and (2) for learning vector quantization, the use is minimized in terms of equi-error criterion.

15.

Runtime power usage estimation of HPC servers for various classes of real-life applications

《Future Generation Computer Systems》2014

In this paper we propose a methodology for developing system-wide energy consumption models for servers, based on the analysis of performance counters. It makes it possible to estimate the power usage of a machine under any load at runtime. By clustering applications we extract groups of programs having similar characteristics. This allows us to create more specialized and accurate power usage models. By using decision trees it is possible to automatically select an appropriate model for the current system load. Training and test sets of programs were used to test the estimates. The presented models are accurate within an error of 4% as verified on servers from different vendors, including the latest pre-production one.

16.

Top-Down Characterization Approximation based on performance counters architecture for AMD processors

《Simulation Modelling Practice and Theory》2016

Due to the increasing complexity of processors, developers often seek tools that would simplify the process of finding bottlenecks while executing applications. Although more and more data may be collected from processors, detailed knowledge about the internals of a given architecture is usually required to understand it. This paper introduces a Top-Down Characterization Approximation for analyzing the performance of applications executed on AMD processors; it is an extension of the Top-Down Method initially developed by Intel. Since not all performance counters required to calculate the exact values of the metrics are available on AMD processors, the method is called an approximation. It allows one to get a deeper understanding of the different stages of program execution, compare different architectures, and identify bottlenecks in out-of-order processors. It hides the complexity of microarchitecture details from the user and at the same time exposes the main contributors to inefficient program execution. The method aims at defining a few main metrics on top of performance counters to easily locate the main efficiency issues. Until now the method had been applied to Intel processors only, mainly because it uses designated performance counters that differ between processors, so porting it is not straightforward. Positive feedback from users encouraged the authors to develop a similar technique for AMD processors.
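For orientation, the first level of the Intel Top-Down Method that this abstract extends can be sketched as below. The formulas follow the commonly published level-1 breakdown for a 4-wide Intel core; the function and parameter names are hypothetical, and the AMD approximation described in the paper may rely on different counters and formulas.

```python
def top_down_level1(cycles, uops_issued, uops_retired_slots,
                    uops_not_delivered, recovery_cycles, width=4):
    """Split pipeline issue slots into the four level-1 top-down categories.

    All inputs are raw counter totals for the measured interval; `width`
    is the assumed number of issue slots per cycle.
    """
    slots = width * cycles
    if slots <= 0:
        raise ValueError("cycles and width must be positive")
    # Slots in which the frontend delivered no micro-op to the backend.
    frontend_bound = uops_not_delivered / slots
    # Slots spent on micro-ops that never retire, plus recovery after mispredictions.
    bad_speculation = (uops_issued - uops_retired_slots
                       + width * recovery_cycles) / slots
    # Slots that did useful work.
    retiring = uops_retired_slots / slots
    # Whatever remains is attributed to backend stalls.
    backend_bound = 1.0 - (frontend_bound + bad_speculation + retiring)
    return {
        "frontend_bound": frontend_bound,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend_bound,
    }


# Made-up counter values, for illustration only.
print(top_down_level1(cycles=1000, uops_issued=2200, uops_retired_slots=2000,
                      uops_not_delivered=800, recovery_cycles=50))
```

The four fractions sum to one by construction, which is what lets a user read them as a breakdown of where issue slots went rather than as independent metrics.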

17.

iMeter: An integrated VM power model based on performance profiling

《Future Generation Computer Systems》2014

The unprecedented burst in power consumption encountered by contemporary datacenters continually boosts the development of energy efficient techniques from both hardware and software perspectives to alleviate the energy problem. The most widely adopted power saving solutions in datacenters that deliver cloud computing services are power capping and VM consolidation. However, without the capability to track the VM power usage precisely, the combined effect of the above two techniques could cause severe performance degradation to the consolidated VMs, thus violating the user service level agreements. In this paper, we propose an integrated VM power model called iMeter, which overcomes the drawbacks of overpresumption and overapproximation in segregated power models used in previous studies. We leverage the kernel-based performance counters that provide accurate performance statistics as well as high portability across heterogeneous platforms to build the VM power model. Principal component analysis is applied to identify performance counters that show strong impact on the VM power consumption with mathematical confidence. We also present a brief interpretation of the first four selected principal components on their indications of VM power consumption. We demonstrate that our approach is independent of underlying hardware and virtualization configurations with clustering analysis. We utilize the support vector regression to build the VM power model predicting the power consumption of both a single VM and multiple consolidated VMs running various workloads. The experimental results show that our model is able to predict the instantaneous VM power usage with an average error of 5% and 4.7% respectively against the actual power measurement.

18.

MOTEC: a memory consistency model verification tool

吕正陈昊陈峰吕毅《计算机工程》2012,38(11):242-246

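Because additional means of observation are lacking, verifying the memory consistency model at the post-silicon stage is difficult. To address this, the MOTEC tool uses the performance counters that are common in multicore processor systems, scanning them periodically to obtain information about the set of key active memory access instructions. The tool consists of three parts: the MOTEC random instruction generation module, the multicore processor performance counter recording module, and the MOTEC analysis module. Analysis of its core algorithm shows that MOTEC's time complexity is only …, the lowest among tools currently used for post-silicon verification.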

19.

Performance data collection for OpenMP parallel programs

富弘毅周海芳杨学军《计算机工程》2005,31(19):67-69,78

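With the rapid development of large-scale parallel computing, performance analysis and modeling of parallel programs are becoming increasingly important, and collecting performance data is the basis of such analysis. Hardware counters make it more convenient to collect performance data while a program executes. The article discusses performance data collection techniques for OpenMP parallel programs and describes an implementation that collects the data with PAPI.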

20.

Software performance data collection and analysis based on hardware performance counters

程克非张聪汪林林张勤《计算机应用》2005,25(10):2431-2433

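Introduces a performance data collection and analysis method based on CPU hardware performance counters. The method analyzes the runtime performance of software starting from fine-grained runtime parameters, and thus reflects the actual dynamic state of the system more accurately. Experiments show that the method provides very effective analysis data for applications that need detailed knowledge of the system's dynamic runtime state, and to some extent it also provides reference data for compiler performance optimization.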