1.

SPEC CPU2017 Evaluation of Intel Cascade Lake Architecture CPUs

Du Qi, Huang Hui, Gong Sheng, Liu Xinwa, Huang Chun 《Computer Engineering & Science》2021,43(1):49-57

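The SPEC CPU2017 benchmark package embodies SPEC's next-generation industry standard and is currently one of the objective and credible benchmark suites for CPU performance evaluation. Using SPEC CPU2017, combined tests were run on an Intel Xeon Gold 6252N CPU (Intel Cascade Lake architecture) across different memory frequencies, different numbers of copies, and with Turbo enabled or disabled, and the behavior of different applications under ... was summarized.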
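The number of copies enters the SPECrate score directly. As a minimal sketch, assuming SPEC's published SPECrate definition (per-benchmark ratio = copies × reference time / measured time; overall score = geometric mean of the per-benchmark ratios), the effect of the copy count can be computed as below. The benchmark names and timings are hypothetical placeholders, not measurements from this paper, and the sketch ignores SPEC's rule of reporting the median of repeated runs.

```python
from statistics import geometric_mean

def rate_ratio(copies: int, ref_seconds: float, run_seconds: float) -> float:
    """SPECrate-style ratio: copies * (reference time / measured time)."""
    return copies * ref_seconds / run_seconds

def rate_score(copies: int, ref: dict, measured: dict) -> float:
    """Geometric mean of the per-benchmark ratios (hypothetical benchmark set)."""
    return geometric_mean([rate_ratio(copies, ref[b], measured[b]) for b in ref])

# Hypothetical reference and measured run times, in seconds.
ref_times = {"bench_a": 1000.0, "bench_b": 2000.0}
measured = {
    24: {"bench_a": 250.0, "bench_b": 500.0},
    48: {"bench_a": 400.0, "bench_b": 900.0},
}

for copies, times in measured.items():
    print(f"{copies} copies -> score {rate_score(copies, ref_times, times):.2f}")
```

Because the ratio is copies × reference time / run time, doubling the copies raises a benchmark's ratio only if its run time grows by less than 2×; settings that change run time, such as memory frequency and Turbo, therefore interact with the copy count.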

2.

A Relative Sustained Performance Metric Model for High-Performance Computer Systems

Xiao Huadong, Sun Jing, Wei Min, Li Juan, Shen Yu 《Computer Engineering and Applications》2015,51(5):33-37

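The sustained performance of a high-performance computer system is an important metric of how well the system performs on real-world domain applications. This paper briefly introduces common performance evaluation methods for high-performance computer systems and, combining them with an application benchmark suite, proposes a relative sustained performance metric model. Experiments based on high-performance meteorological application benchmarks show that the model can distinguish the performance of high-performance computer systems from different vendors, providing a reference for system selection, and that it also reflects, to some extent, the scalability of the meteorological applications themselves.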

3.

A Parallel Granularity Adjustment Method for Speculative Multithreading Based on Speculation Cost Evaluation

Li Meirong, Zhao Yinliang 《Computer Applications and Software》2019,36(4)

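Traditional speculative multithreading techniques assume that a program's parallel granularity should grow as the number of processor cores increases, without considering how different numbers of cores affect the program's own parallel performance. To address this, an adaptive loop parallel-granularity adjustment method is proposed to optimize the allocation of processor cores. Working at the granularity of speculation levels, the method dynamically collects quantitative performance analysis results for all speculative threads in a loop and uses them to evaluate speculation cost. It then uses the evaluation results to dynamically adjust the loop's parallel granularity and optimize the number of cores allocated, reducing unnecessary speculation cost. Experiments show that the method not only achieves good performance gains on the SPEC CPU benchmark suite but also further reduces the energy overhead of speculation.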

4.

Simultaneous Multithreading Design of the Godson-2 Processor (total citations: 1; self-citations: 0; citations by others: 1)

Li Zusong, Xu Xianchao, Hu Weiwu, Tang Zhimin 《Chinese Journal of Computers》2009,32(11)

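A simultaneous multithreading (SMT) processor model suited to the Godson-2 processor is proposed, and its microarchitecture design and the corresponding Linux operating system implementation are described. Performance was evaluated by booting Linux on the designed Godson-2 SMT processor and running applications such as SPEC CPU2000. The results show that, by exploiting thread-level parallelism, the Godson-2 SMT processor improves the performance of the Godson-2 processor by 31.1%.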

5.

Design of a Static LoC Criticality Predictor

Li Qingbo, Gou Pengfei, Sun Jun, Yang Bing, Wang Jinxiang 《Computer Engineering》2012,38(7):253-256

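Based on the characteristics of the likelihood of criticality (LoC) of instructions in SPEC2000 programs across different clustered superscalar processor organizations, a design method for a static LoC criticality predictor is proposed. Instruction LoC is studied, and the predictor is designed around two of its properties: independence from the processor organization and invariance at run time. Simulation results show that with this design on a 1×8 clustered superscalar processor, average instructions per cycle (IPC) increase by 5.3%, outperforming a dynamic LoC predictor.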

6.

News

《China Computer Users》2007,(32):8-10

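AMD to Win Back Orders with Next Month's Quad-Core CPU Launch. AMD will launch its quad-core processor "Barcelona" worldwide on September 10. AMD has reportedly submitted its latest performance evaluation report to the SPEC benchmarking organization, and the report shows Barcelona outperforming competing products by more than 25%. At launch, major OEMs will release Barcelona-based products, and many software companies are also working with AMD on Barcelona. AMD's CEO expects Barcelona to quickly win back orders for AMD.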

7.

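Hardware/Software Interface Design of the Godson-2 Simultaneous Multithreading Processor (total citations: 1; self-citations: 0; citations by others: 1)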

Li Zusong, Xu Xianchao, Hu Weiwu, Tang Zhimin 《Journal of Software》2007,18(7):1806-1817

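As fabrication technology improves, more and more transistors can be integrated on a chip, and multithreading is becoming a mainstream processor architecture technique; the hardware/software interface of multithreaded processors has therefore become a pressing problem. Based on an analysis of the software requirements of simultaneous multithreading, this paper proposes a hardware/software co-design solution for the interface of the Godson-2 SMT processor and presents the corresponding operating system implementation. An operating system for the Godson-2 SMT processor was implemented on top of Linux 2.4.20. Performance evaluation with benchmarks such as SPEC CPU2000 shows that the Godson-2 SMT processor with this interface greatly improves performance on multiprocess workloads. The analysis and design apply not only to SMT processors but also offer guidance for the design of on-chip multicore processors.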

8.

Hardware Design and Performance Analysis of Mainstream Convolutional Neural Networks

Xu Qingqing, An Hong, Wu Zheng, Jin Xu 《Computer Systems & Applications》2020,29(2):49-57

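As one of the most influential network architectures in deep learning, convolutional neural networks (CNNs) are evolving toward deeper and more complex designs that place higher demands on hardware computing power, and dedicated neural network processors have emerged in response. To compare such processors objectively and guide hardware/software optimization, this paper proposes macro-benchmarks and micro-benchmarks for CNNs. The macro-benchmarks contain mainstream CNN models for multi-faceted evaluation and comparison of processor performance; the micro-benchmarks contain the core layers of CNNs for fine-grained localization of performance bottlenecks to guide optimization. To characterize accurately how this benchmark suite performs on real hardware, the paper selects three system-level metrics (I/O wait latency, cross-node communication latency, and CPU utilization) together with microarchitecture-level metrics such as IPC, branch prediction, resource contention, and memory access behavior. Based on the results, the paper offers well-founded recommendations for processor hardware design and architectural improvement.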

9.

Research and Implementation of a Temperature-Aware Scheduler for Linux

Xia Liang, Zhu Yongxin 《Microcomputer Applications》2009,25(2):21-24

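To address processor thermal management at the operating-system level, a round-robin scheduling algorithm is proposed and a threshold-temperature-based scheduling algorithm is improved; both are implemented in the Linux kernel. SPEC2K programs were classified and combined into workloads with different hot and cold characteristics and tested on an Intel dual-core processor. The tests show that the baseline Linux scheduler lacks effective thermal management. The threshold-temperature-based algorithm migrates hot tasks to cooler cores, relieving processor overheating. The round-robin algorithm regularly rotates tasks so that each runs for equal time on both cores, which balances processor temperature better without affecting system throughput.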

10.

A Parallel Data Dependence Computation Method Based on LLVM Intermediate Representation

Zhu Yan 《Application Research of Computers》2020,37(2):437-442

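Low Level Virtual Machine (LLVM) is a widely used compilation framework whose intermediate representation (IR) carries rich program analysis information, and much LLVM-based work builds on the IR. Data dependences have important applications in error detection, fault localization, and program debugging. IR-based data dependence computation mostly uses serial iteration, which scales poorly to large IR files. This work therefore exploits the parallelism among instruction reads and writes in data dependence computation and, leveraging the parallel computing strengths of graphics processors, proposes DRPC, a parallel method for computing data dependences on LLVM IR. Taking IR as input, DRPC computes program data dependences efficiently through CPU-GPU cooperation. Experiments on the SPEC benchmark suite show that DRPC achieves maximum speedups of 3.48× and 4.91× for direct and transitive data dependence computation, respectively.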
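The difference between direct and transitive data dependences can be made concrete with a small sketch. It assumes straight-line code in which each instruction is reduced to hypothetical sets of written and read variables; real LLVM IR analysis must also handle control flow, memory, and aliasing, and this is not the DRPC algorithm. Direct read-after-write dependences are found independently for each instruction, which is the kind of per-instruction independence that makes a parallel (for example, GPU) formulation attractive; transitive dependences are then the closure of that relation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: each instruction is (variables written, variables read),
# listed in program order for straight-line code (no branches, no aliasing).
Instr = Tuple[Set[str], Set[str]]

def direct_deps(prog: List[Instr]) -> Dict[int, Set[int]]:
    """Read-after-write dependences: j maps to the instructions whose values j reads."""
    deps: Dict[int, Set[int]] = {}
    for j, (_, reads) in enumerate(prog):
        # Each j looks only at earlier instructions, never at other j's results,
        # so this outer loop could be distributed across parallel workers.
        found: Set[int] = set()
        for var in reads:
            for i in range(j - 1, -1, -1):  # nearest earlier writer of var
                if var in prog[i][0]:
                    found.add(i)
                    break
        deps[j] = found
    return deps

def transitive_deps(direct: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """Transitive closure of the direct relation, computed by DFS per instruction."""
    closure: Dict[int, Set[int]] = {}
    for j, preds in direct.items():
        seen: Set[int] = set()
        stack = list(preds)
        while stack:
            i = stack.pop()
            if i not in seen:
                seen.add(i)
                stack.extend(direct[i])
        closure[j] = seen
    return closure

# x1 = a; x2 = x1 + b; x3 = x2 * x1; x4 = x3 + c
prog: List[Instr] = [
    ({"x1"}, {"a"}),
    ({"x2"}, {"x1", "b"}),
    ({"x3"}, {"x2", "x1"}),
    ({"x4"}, {"x3", "c"}),
]
direct = direct_deps(prog)
print(direct)
print(transitive_deps(direct))
```

Instruction 3 directly depends only on instruction 2, but transitively on instructions 0, 1, and 2; the closure step is where result sets grow, which is consistent with transitive computation being the more expensive of the two.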

11.

Accelerating sequential programs on commodity multi-core processors

Yuanming Zhang, Gang Xiao, Takanobu Baba 《Journal of Parallel and Distributed Computing》2014

A recently proposed pipelined multithreading (PMT) technique exhibits wide applicability in parallelizing general sequential programs on multi-core processors. However, significant inter-core communication overhead limits PMT performance and prevents its commercial use. A simple and effective clustered pipelined multithreading (CPMT) approach is presented to accelerate sequential programs on commodity multi-core processors. CPMT adopts a clustered communication mechanism that yields very low average communication overhead by eliminating false sharing and by reducing communication operation and transit delays in the software-only approach. A single-producer/single-consumer concurrent lock-free clusteredQueue algorithm based on a two-level queue structure is also proposed. The accuracy of CPMT is theoretically demonstrated. The performance of the algorithm and of CPMT is evaluated on a commodity AMD Phenom four-core processor. With an appropriate parameter, the algorithm's enqueue and dequeue operations take 20.8 and 23 cycles, respectively. The speedup of CPMT ranges from 13.1% to 119.8% for typical loops extracted from the SPEC CPU 2000 benchmark suite.
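Single-producer/single-consumer queues rest on a well-known idea: if only the producer ever writes the tail index and only the consumer ever writes the head index, no lock is needed. Below is a minimal sketch of that classic Lamport-style bounded ring buffer. It is not the paper's two-level clusteredQueue, and because Python offers no explicit memory-ordering control, it shows only the index logic; a real lock-free version would use atomic loads and stores with acquire/release ordering (for example, C11 or C++ atomics).

```python
from typing import Any, List, Tuple

class SPSCRing:
    """Bounded single-producer/single-consumer ring buffer (Lamport-style sketch).

    The producer is the only writer of `_tail`; the consumer is the only writer
    of `_head`. That ownership split is what removes the need for a lock.
    """

    def __init__(self, capacity: int) -> None:
        # One slot stays empty so that "full" and "empty" can be told apart.
        self._buf: List[Any] = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def enqueue(self, item: Any) -> bool:
        """Producer side. Returns False instead of blocking when full."""
        nxt = (self._tail + 1) % len(self._buf)
        if nxt == self._head:
            return False
        self._buf[self._tail] = item  # store the payload before publishing it
        self._tail = nxt              # publish (a release store in C/C++)
        return True

    def dequeue(self) -> Tuple[bool, Any]:
        """Consumer side. Returns (False, None) when empty."""
        if self._head == self._tail:
            return False, None
        item = self._buf[self._head]
        self._buf[self._head] = None  # drop the reference to the consumed item
        self._head = (self._head + 1) % len(self._buf)
        return True, item

q = SPSCRing(2)
print(q.enqueue("a"), q.enqueue("b"), q.enqueue("c"))  # True True False
print(q.dequeue(), q.dequeue(), q.dequeue())  # (True, 'a') (True, 'b') (False, None)
```

Returning a status flag rather than blocking keeps both sides non-blocking; a pipelined thread would typically spin or do other work when an operation fails.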

12.

An I/O Framework for X86 Emulation on Heterogeneous Platforms

Fang Ming, Jiang Liehui, Zhao Qiuxia, Dong Weiyu, Xu Jinlong 《Computer Engineering》2011,37(15):246-248

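To address X86 architecture emulation on heterogeneous processor platforms, an I/O framework is proposed, and its three main modules are described: registration and mapping of buses and interface functions; data structure design and function-body layout for the bridge chip; and implementation techniques for interrupt signal selection and delivery. Running the SPEC2000 suite on different framework structures shows that this I/O framework improves performance by 10% to 20% over comparable frameworks.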

13.

SPEC as a performance evaluation measure

Giladi R., Ahitav N. 《Computer》1995,28(8):33-42

Potential computer system users or buyers usually employ a computer performance evaluation technique only if they believe its results provide valuable information. System Performance Evaluation Cooperative (SPEC) measures are perceived to provide such information and are therefore the ones most commonly used. SPEC measures are designed to evaluate the performance of engineering and scientific workstations, personal vector computers, and even minicomputers and superminicomputers. Along with the Transaction Processing Council (TPC) measures for database I/O performance, they have become de facto industry standards, but do SPEC's evaluation outcomes actually provide added information value? In this article, we examine these measures by considering their structure, advantages, and disadvantages. We use two criteria in our examination: are the programs used in the SPEC suite properly blended to reflect a representative mix of different applications, and are they properly synthesized so that the aggregate measures correctly rank computers by performance? We conclude that many programs in the SPEC suites are superfluous; the benchmark size can be reduced by more than 50%. The way the measure is calculated may cause distortion. Substituting the harmonic mean for the geometric mean used by SPEC roughly preserves the measure, while giving better consistency. SPEC measures reflect the performance of the CPU rather than the entire system. Therefore, they might be inaccurate in ranking an entire system. To remedy these problems, we propose a revised methodology for obtaining SPEC measures.
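The choice between the geometric and harmonic mean can be illustrated with a small worked example on hypothetical run times. Two standard properties are shown: the ratio of two machines' geometric means does not depend on which reference machine is used for normalization, whereas the harmonic mean, when all reference times are equal, equals total reference time divided by total measured time and so tracks total run time. Which property matters more is the question the article weighs; the sketch does not reproduce its data or its revised methodology.

```python
from math import isclose
from statistics import geometric_mean, harmonic_mean

def ratios(ref_times, run_times):
    """Per-benchmark ratio = reference time / measured time (higher is faster)."""
    return [r / t for r, t in zip(ref_times, run_times)]

def relative(mean, ref_times, times_a, times_b):
    """How much faster machine A is than B under a given mean and reference."""
    return mean(ratios(ref_times, times_a)) / mean(ratios(ref_times, times_b))

# Hypothetical run times (seconds) of three benchmarks on two machines.
machine_a = [50.0, 25.0, 12.5]
machine_b = [40.0, 40.0, 10.0]

# Two hypothetical reference machines.
ref_equal = [100.0, 100.0, 100.0]
ref_mixed = [300.0, 50.0, 80.0]

for ref in (ref_equal, ref_mixed):
    gm = relative(geometric_mean, ref, machine_a, machine_b)
    hm = relative(harmonic_mean, ref, machine_a, machine_b)
    print(f"GM A/B = {gm:.4f}   HM A/B = {hm:.4f}")

# With equal reference times, the harmonic mean equals the total-time ratio.
hm_a = harmonic_mean(ratios(ref_equal, machine_a))
print(isclose(hm_a, sum(ref_equal) / sum(machine_a)))
```

Under `ref_equal` the harmonic-mean ratio of A to B is 90/87.5, exactly the ratio of B's to A's total run time; switching to `ref_mixed` changes it, while the geometric-mean ratio stays the same under both references.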

14.

MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research (total citations: 1; self-citations: 0; citations by others: 1)

《Computer Architecture Letters》2002,1(1):7-7

Computer architects must determine how to most effectively use finite computational resources when running simulations to evaluate new architectural ideas. To facilitate efficient simulations with a range of benchmark programs, we have developed the MinneSPEC input set for the SPEC CPU 2000 benchmark suite. This new workload allows computer architects to obtain simulation results in a reasonable time using existing simulators. While the MinneSPEC workload is derived from the standard SPEC CPU 2000 workload, it is a valid benchmark suite in and of itself for simulation-based research. MinneSPEC also may be used to run large numbers of simulations to find "sweet spots" in the evaluation parameter space. This small number of promising design points subsequently may be investigated in more detail with the full SPEC reference workload. In the process of developing the MinneSPEC datasets, we quantify its differences in terms of function-level execution patterns, instruction mixes, and memory behaviors compared to the SPEC programs when executed with the reference inputs. We find that for some programs, the MinneSPEC profiles match the SPEC reference dataset program behavior very closely. For other programs, however, the MinneSPEC inputs produce significantly different program behavior. The MinneSPEC workload has been recognized by SPEC and is distributed with Version 1.2 and higher of the SPEC CPU 2000 benchmark suite.

15.

Design and Analysis of the Godson Link-Time Optimizer

Chen Yu, Zhu Xiaojing, Zou Qiong, Liu Ling 《Journal of Computer Research and Development》2006,43(8):1450-1456

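Link-time optimization re-optimizes the whole program after compilation and linking. It overcomes the limitation of traditional compiler optimization to a single function or module, extends the optimization scope to the entire program, and fully exploits information that becomes known only at link time. Modeled on ALTO, the link-time optimizer designed by the University of Arizona for the Alpha processor, and tailored to the microarchitecture and instruction set of the Godson-2 processor, GLTO (Godson link time optimizer) was designed as a link-time optimizer for Godson. GLTO raises the Godson processor's ref score on the SPEC2000 integer (fixed-point) programs by 9.4%, a significant optimization effect. The paper analyzes the effects of the main optimization strategies and their causes, proposes improvements to the processor's architecture design, and compares GLTO with ALTO.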

16.

A demonstration of repeatable, non‐intrusive measurement of program performance and compiler optimization in Linux using IN‐Tune

W. E. Cohen, R. K. Gaede, J. B. Rodgers 《Software》2000,30(8):895-906

Collecting accurate program metrics is often complicated by environmental artifacts such as operating system workload, cache operation, and processor configuration. This paper demonstrates the ability of the IN‐Tune system to make accurate and repeatable measurements of program metrics by analyzing the computational workload of programs in the SPEC95 benchmark suite. It shows that metrics which are characteristic of program performance can be collected in both lightly loaded and heavily loaded environments without corruption. The IN‐Tune system accomplishes this by creating unique ‘virtual performance registers’ for each process or kernel thread monitored on an Intel processor. Further, this paper investigates the effect optimization has on the performance of the benchmarks. The results clearly show improvements in the quality of code generated by the compiler when optimizations are performed and that, whereas measurements of time can be misleading, the IN‐Tune measurements are not. Copyright © 2000 John Wiley & Sons, Ltd.

17.

SPEC CPU2000: measuring CPU performance in the New Millennium (total citations: 1; self-citations: 0; citations by others: 1)

Henning J.L. 《Computer》2000,33(7):28-35

As computers and software have become more powerful, it seems almost human nature to want the biggest and fastest toy you can afford. But how do you know if your toy is tops? Even if your application never does any I/O, it's not just the speed of the CPU that dictates performance. Cache, main memory, and compilers also play a role. Software applications also have differing performance requirements. So whom do you trust to provide this information? The Standard Performance Evaluation Corporation (SPEC) is a nonprofit consortium whose members include hardware vendors, software vendors, universities, customers, and consultants. SPEC's mission is to develop technically credible and objective component- and system-level benchmarks for multiple operating systems and environments, including high-performance numeric computing, Web servers, and graphical subsystems. On 30 June 2000, SPEC retired the CPU95 benchmark suite. Its replacement is CPU2000, a new CPU benchmark suite with 19 applications that have never before been in a SPEC CPU suite. The article discusses how SPEC developed this benchmark suite and what the benchmarks do.

18.

An Arbitrary-Stride Pre-Promotion Technique for Non-Uniform Caches (total citations: 2; self-citations: 0; citations by others: 2)

Wu Junjie, Yang Xuejun 《Journal of Frontiers of Computer Science and Technology》2010,4(7):577-588

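With continuing advances in microelectronics fabrication, large on-chip non-uniform caches have attracted wide attention. This paper proposes an arbitrary-stride pre-promotion technique for non-uniform caches, which optimizes data placement so that data about to be accessed is moved into cache banks closer to the processor, reducing memory access latency and improving system performance. The design of the technique is described in detail, pre-promotion is compared with prefetching, and a combination of the two is proposed. Experiments with 11 benchmarks from NPB and SPEC2000 on a full-system simulator show that arbitrary-stride pre-promotion effectively reduces memory access latency: with access prediction table sizes of 16 and 32 entries, system IPC increases by 4.17% and 4.91% on average, respectively; combining pre-promotion with prefetching increases system IPC by 8.84% and 11.06% on average, respectively.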
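Arbitrary-stride schemes rely on detecting a repeating address stride per load instruction, which is also the core of a conventional stride prefetcher. The sketch below is a generic PC-indexed stride detector with LRU replacement, not the paper's pre-promotion mechanism. The hypothetical `entries` parameter loosely mirrors the 16- and 32-entry prediction tables in the abstract, and the predicted address stands in for the block that would be promoted to a nearer cache bank (or prefetched).

```python
from collections import OrderedDict
from typing import Optional, Tuple

class StrideTable:
    """PC-indexed stride detector with LRU replacement (generic sketch).

    Each entry stores (last address, last stride) for one load instruction.
    Once the same non-zero stride is seen twice in a row, the next address
    is predicted; a NUCA scheme could promote that block toward the core.
    """

    def __init__(self, entries: int = 16) -> None:
        self.entries = entries
        self.table: "OrderedDict[int, Tuple[int, int]]" = OrderedDict()

    def access(self, pc: int, addr: int) -> Optional[int]:
        """Record a load of `addr` by instruction `pc`; return a predicted address or None."""
        prediction = None
        if pc in self.table:
            last_addr, last_stride = self.table.pop(pc)
            stride = addr - last_addr
            if stride != 0 and stride == last_stride:
                prediction = addr + stride  # stride confirmed
        else:
            stride = 0  # first sighting: no stride yet
            if len(self.table) >= self.entries:
                self.table.popitem(last=False)  # evict least recently used PC
        self.table[pc] = (addr, stride)  # (re)insert as most recently used
        return prediction

t = StrideTable(entries=16)
for a in (100, 164, 228, 292):
    print(t.access(0x400, a))  # None, None, 292, 356
```

A small table limits how many concurrent access streams can be tracked, which is one plausible reason a larger prediction table could help; the sketch itself makes no claim about the paper's measured IPC gains.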