Similar Literature
1.
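Yin Mengjia, Xu Xianbin, Xiong Zenggang, Zhang Tao. Computer Science, 2015, 42(12): 13-17, 22
Performance evaluation and optimization are indispensable in designing efficient parallel programs, and the performance of the memory system directly affects the overall performance of the processor. GPGPU-Sim was used to simulate the GPU memory hierarchy, and the optimal configuration relationship between the number of SMs and the number of memory controllers was identified. Matrix multiplication is a basic building block of scientific computing and a typical application that is both compute-intensive and memory-intensive; its performance is an important indicator of GPU high-performance computing. As a new technical approach to evaluating parallel systems, performance modeling offers many advantages that other evaluation methods cannot match. A performance model was built that quantitatively analyzes the instruction pipeline, shared-memory accesses, and global-memory accesses; it located the program's bottleneck and improved execution speed. Experiments show that the model is practical and effectively optimizes matrix multiplication.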
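GPU matrix-multiplication kernels typically stage tile-sized sub-blocks of the inputs in shared memory so that each element loaded from global memory is reused many times. The CPU-side sketch below mimics that blocking pattern; it is an illustration under that assumption, not the authors' kernel, and `tiled_matmul`, `naive_matmul`, and the `tile` parameter are hypothetical names.

```python
def naive_matmul(a, b):
    """Reference triple-loop product of an n x m and an m x p matrix."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def tiled_matmul(a, b, tile=2):
    """Blocked product: each (ii, jj, kk) step works on tile x tile sub-blocks,
    the CPU analogue of loading tiles into GPU shared memory."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, p, tile):
            for kk in range(0, m, tile):
                # Copy the current sub-blocks into small local buffers
                # (the "shared memory" tiles) before computing with them.
                a_tile = [row[kk:kk + tile] for row in a[ii:ii + tile]]
                b_tile = [row[jj:jj + tile] for row in b[kk:kk + tile]]
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        c[ii + i][jj + j] += sum(
                            a_row[k] * b_tile[k][j] for k in range(len(a_row)))
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
    print(tiled_matmul(a, b) == naive_matmul(a, b))  # True
```

Each tile element is read once into the local buffer and then reused `tile` times, which is the reuse a performance model of shared-memory accesses has to account for.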

2.
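Mesh reordering is one of the key means of improving the efficiency of CPU and GPU parallel computation in computational fluid dynamics. For unstructured meshes, data are stored without a regular pattern, so indirect data access causes memory-access latency; in GPU parallel computation in particular, indirect access leads to misaligned memory accesses, which amplifies the effect of that latency. To address this, the Reverse Cuthill-McKee (RCM) mesh reordering method was used to improve the data locality of unstructured meshes, and a face-numbering reordering method was designed. Test cases show that mesh reordering does not affect the final computed results. The impact of mesh reordering on the performance of the unstructured solver on CPU and GPU was compared: on the CPU, the run time of some hotspot functions dropped by about 20% and the overall run time by 15%–20%; on the GPU, the run time of most hotspot functions dropped by 35%–60% and the overall run time by about 40%.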
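The Reverse Cuthill-McKee ordering named above can be sketched on a generic adjacency map, where nodes stand for mesh cells and edges for shared faces. This is the textbook algorithm, not the paper's solver code; the function names and the toy graph are hypothetical. Lowering the bandwidth places cells that share faces close together in memory, which is the locality gain the abstract describes.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """Return a new node ordering for an undirected graph.

    adj maps each node to a list of its neighbours. Each connected
    component is traversed breadth-first from a minimum-degree node,
    visiting unvisited neighbours in increasing degree; the final order
    is reversed (the "reverse" in RCM).
    """
    degree = {v: len(adj[v]) for v in adj}
    visited = set()
    order = []
    for start in sorted(adj, key=lambda v: degree[v]):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted((w for w in adj[v] if w not in visited),
                            key=lambda w: degree[w]):
                visited.add(w)
                queue.append(w)
    return order[::-1]


def bandwidth(adj, order):
    """Largest |position(u) - position(v)| over all edges in the ordering."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


if __name__ == "__main__":
    # A 6-cell chain whose labels are scrambled: 0-5-1-4-2-3.
    adj = {0: [5], 1: [5, 4], 2: [4, 3], 3: [2], 4: [1, 2], 5: [0, 1]}
    order = reverse_cuthill_mckee(adj)
    print(order)                                                  # [3, 2, 4, 1, 5, 0]
    print(bandwidth(adj, list(range(6))), bandwidth(adj, order))  # 5 1
```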

3.
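The multi-level on-chip memory hierarchy of Sunway many-core processors is an important structure for alleviating the many-core "memory wall". The fully software-managed SPM (scratchpad memory) structure and the on-chip RMA communication mechanism offer many opportunities to improve application performance, but they also make developing, optimizing, and porting applications considerably harder. To fully exploit the characteristics of the on-chip memory hierarchy while reducing the user's programming and optimization burden, this paper proposes a compiler optimization method that fuses memory access and communication across the multi-level memory hierarchy. The method first introduces fusion compiler directives that pass high-level program information to the compiler. It then builds a compiler optimization benefit model and an iterative solving framework for heuristic loop-optimization schemes; the compiler solves for the loop-optimization scheme and performs the code transformation. Through compiler-generated bulk DMA and RMA data transfers, core data with high access latency in lower levels of the memory hierarchy are buffered in bulk into higher levels with low access latency. Experiments on three typical test cases show that the proposed optimization performs on par with hand optimization and significantly outperforms the unoptimized programs.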
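Why bulk DMA/RMA transfers pay off can be illustrated with the common latency-plus-bandwidth transfer cost model, in which every transfer pays a fixed start-up cost. The model, the function name, and the parameter values below are illustrative assumptions, not the paper's benefit model or measured Sunway figures.

```python
def transfer_time(total_bytes, batch_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to move total_bytes in transfers of at most batch_bytes each.

    Assumed model: each transfer costs a fixed start-up latency, and the
    payload itself costs total_bytes / bandwidth regardless of batching.
    """
    n_transfers = -(-total_bytes // batch_bytes)  # ceiling division
    return n_transfers * latency_s + total_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # 64 KiB moved as 8-byte elements vs. 4 KiB batches (hypothetical values:
    # 1 microsecond start-up latency, 1 GB/s bandwidth).
    per_element = transfer_time(65536, 8, 1e-6, 1e9)
    batched = transfer_time(65536, 4096, 1e-6, 1e9)
    print(per_element > batched)  # True
```

Under this model the start-up term shrinks from 8192 transfers to 16, while the bandwidth term is unchanged; the trade-off in practice is that larger batches need more space in the small fast memory level.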

4.
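As GPUs have developed, both their computing power and their memory bandwidth have surpassed those of CPUs, and general-purpose computing on GPUs has become increasingly popular, giving rise to the new CPU-GPGPU heterogeneous architecture. Although this architecture shows strong performance advantages and has attracted wide attention from academia and industry, writing and running programs efficiently on it remains a major challenge. This paper surveys existing programmability, reliability, and low-power techniques for this architecture and, drawing on these techniques, discusses future trends for CPU-GPGPU heterogeneous systems.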

5.
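Wang Guibin. Chinese Journal of Computers, 2012, 35(5): 979-989
As a typical many-core architecture, the GPU (Graphics Processing Unit) integrates a large number of parallel processing cores, and its power consumption has grown accordingly, so that it is becoming one of the largest power consumers in computer systems; software low-power optimization is an effective way to reduce chip power. This paper proposes a model-guided, multi-dimensional low-power optimization technique that combines dynamic voltage/frequency scaling (DVFS) with dynamic core shutdown to reduce GPU power without affecting performance. First, based on the characteristics of the GPU multithreaded execution model, a power optimization model is built for memory-bound programs. Then, using this model, the effects of DVFS and dynamic core shutdown on program execution time and energy consumption are analyzed, and the power optimization problem is formulated as a general integer programming problem. Finally, an evaluation on nine typical GPU programs and a comparison with existing methods verify that the proposed technique effectively reduces chip power without affecting performance.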
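The idea of choosing a frequency and an active-core count as a discrete optimization can be sketched with a toy model. Everything here is an assumption for illustration: run time is the larger of compute time (work / (cores × frequency)) and a fixed memory time, and power is an uncore term plus, per active core, a leakage term and a dynamic term proportional to f³. The paper solves an integer program; this sketch simply enumerates the small discrete space instead.

```python
from itertools import product


def best_config(work, mem_time, freqs, core_counts,
                p_uncore=10.0, p_leak=2.0, k_dyn=1.0, slack=0.0):
    """Choose (frequency, active cores) with minimum energy and no slowdown.

    Toy model (assumed, not the paper's):
      run_time(f, n) = max(work / (n * f), mem_time)
      power(f, n)    = p_uncore + n * (p_leak + k_dyn * f**3)
    A configuration is feasible if its run time is within (1 + slack) of
    the run time at the highest frequency with all cores on.
    """
    def run_time(f, n):
        return max(work / (n * f), mem_time)

    def energy(f, n):
        return (p_uncore + n * (p_leak + k_dyn * f ** 3)) * run_time(f, n)

    t_ref = run_time(max(freqs), max(core_counts))
    feasible = [(f, n) for f, n in product(freqs, core_counts)
                if run_time(f, n) <= (1 + slack) * t_ref]
    # Exhaustive search stands in for an integer-programming solver.
    return min(feasible, key=lambda fn: energy(*fn))


if __name__ == "__main__":
    freqs, cores = [0.5, 1.0, 1.5, 2.0], [4, 8, 16]
    print(best_config(100, 10, freqs, cores))  # memory-bound: (1.5, 8)
    print(best_config(100, 0, freqs, cores))   # compute-bound: (2.0, 16)
```

In the memory-bound case the memory time hides the slower compute, so the sketch both lowers the frequency and switches off half the cores; in the compute-bound case only the full configuration meets the deadline.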

6.
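Many-core processors have become the mainstream microprocessor architecture for building high-performance computing (HPC) supercomputers, supplying the computing power for exascale HPC systems. As the number of cores integrated on a many-core chip keeps growing, competition among cores for memory resources intensifies and the "memory wall" problem becomes more prominent. The many-core on-chip memory hierarchy is an important structure for alleviating the memory wall and for helping HPC applications exploit the computing advantages of many-core processors to improve real application performance. Its design strongly affects the performance, power, and area of a many-core system-on-chip; it is a key part of many-core design and a research hotspot in the field. Because of differences in the development history of many-core chips, in on-chip microarchitecture design techniques, and in the requirements of their target application domains, the on-chip memory hierarchies of current mainstream HPC many-core processors are not uniform. However, judging from cross-processor comparison, from the evolution of each processor line, and from the changing application demands brought by the ongoing convergence of HPC with data science and machine learning, a hybrid SPM+Cache structure is most likely to become the mainstream choice for the on-chip memory hierarchy of many-core processors in future exascale HPC systems. At the software and algorithm level for exascale computing, design and optimization tailored to the characteristics of the many-core memory hierarchy can help HPC applications better exploit the computing advantages of many-core processors and thus effectively improve real application performance; software and algorithm design and optimization for many-core on-chip memory hierarchies is therefore also a research hotspot....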

7.
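Emerging non-volatile memories, represented by phase-change memory (PCM), offer advantages that conventional dynamic random-access memory (DRAM) lacks, such as high storage density and low static power, but their long write latency can seriously degrade memory-access performance. A PCM-based memory system for graphics processing units (GPUs) was designed. Simulation results show that memory write requests in GPU programs are highly unevenly distributed, with a small number of memory addresses accessed very frequently. A dedicated buffer unit designed around this uneven access distribution can efficiently hold frequently accessed memory data, reducing the number of PCM accesses and eliminating the negative impact of the long write latency on system performance. Results on a GPU simulator show that the buffer-based PCM memory system effectively improves GPU computing performance.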
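How a small buffer absorbs skewed write traffic can be sketched with an LRU write-back buffer that counts how many writes actually reach PCM. The class name, the LRU replacement policy, and the toy trace are assumptions for illustration; the paper's buffer unit may use a different organization.

```python
from collections import OrderedDict


class WriteBuffer:
    """LRU write-back buffer in front of a memory with slow writes (PCM).

    Writes to an address already in the buffer are absorbed; an entry is
    written to PCM only when it is evicted or when the buffer is flushed.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> latest value, oldest first
        self.pcm_writes = 0

    def write(self, addr, value):
        if addr in self.lines:
            self.lines.move_to_end(addr)      # hit: mark most recently used
        elif len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)    # evict least recently used...
            self.pcm_writes += 1              # ...which costs one PCM write
        self.lines[addr] = value

    def flush(self):
        """Write every buffered entry back to PCM."""
        self.pcm_writes += len(self.lines)
        self.lines.clear()


if __name__ == "__main__":
    buf = WriteBuffer(capacity=2)
    trace = [0, 1, 0, 0, 2, 0, 1, 0, 3, 0]   # address 0 is "hot"
    for i, addr in enumerate(trace):
        buf.write(addr, i)
    buf.flush()
    print(len(trace), buf.pcm_writes)          # 10 5
```

Because address 0 stays resident, half of the ten writes never reach PCM in this toy trace.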

8.
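OpenCL is a general-purpose programming framework for heterogeneous computing platforms; however, because hardware architectures differ, achieving performance portability on top of functional portability across platforms remains an open problem. Existing algorithm-optimization studies generally target a single hardware platform and rarely run efficiently on other platforms. Based on an analysis of the underlying hardware architectures of different GPU platforms, the effects of different optimization methods on performance across GPU platforms were examined from several angles, including global-memory access efficiency, effective utilization of GPU computing resources, and hardware resource limits; on this basis, the Laplacian image enhancement algorithm was implemented in OpenCL. Experimental results show that, excluding data-transfer time, the optimized algorithm achieves speedups of 3.7× to 136.1× (56.7× on average) on both AMD and NVIDIA GPUs, and the optimized kernel outperforms the corresponding function in the NVIDIA NPP library by 12.3% to 346.7% (143.1% on average), verifying the effectiveness and performance portability of the proposed optimizations.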
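Laplacian enhancement is a per-pixel stencil, which is why global-memory access efficiency dominates its GPU performance. A minimal pure-Python sketch, assuming the common 4-neighbour kernel, g = f − ∇²f, and replicated borders (the abstract does not state the paper's exact kernel or border handling):

```python
def laplacian_enhance(img, max_val=255):
    """Sharpen a grayscale image: g = f - laplacian(f), clamped to [0, max_val].

    Uses the 4-neighbour Laplacian
        lap(x, y) = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4 f(x,y)
    with border pixels replicated. In an OpenCL kernel, each work-item
    would typically compute one output pixel of this double loop.
    """
    h, w = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates so out-of-range reads return the nearest border pixel.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            lap = (px(y, x + 1) + px(y, x - 1) + px(y + 1, x) + px(y - 1, x)
                   - 4 * px(y, x))
            row.append(min(max(px(y, x) - lap, 0), max_val))
        out.append(row)
    return out


if __name__ == "__main__":
    spot = [[0, 0, 0], [0, 10, 0], [0, 0, 0]]
    print(laplacian_enhance(spot))  # [[0, 0, 0], [0, 50, 0], [0, 0, 0]]
```

Each output pixel reads five input pixels, and neighbouring work-items share most of them, which is what local-memory tiling and coalesced reads exploit on a GPU.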

9.
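Data prefetching is a technique for hiding memory-access latency, developed to mitigate the speed gap between microprocessors and DRAM. Current Intel processor families all use several prefetching mechanisms to speed up the movement of data and code into the cache and thereby improve program performance. Based on an analysis of the Intel 64 memory hierarchy, the data-prefetching mechanisms of the x86/x64 architecture, both hardware and software prefetching, are examined, together with compiler support for software prefetching. Finally, the effect of Intel 64 data prefetching on the performance of tightly nested loops in scientific computing programs is measured, and several factors that determine prefetching effectiveness are identified. This work offers guidance for prefetch optimization of array loops on Intel platforms.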

10.
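To address the lack of accurate performance-analysis models and targeted performance-optimization methods in GPU parallel computing, a quantitative performance-analysis model for GPU-based parallel computing is proposed. By modeling the performance of the instruction pipeline, shared-memory accesses, and global-memory accesses, it analyzes parallel programs quantitatively, helping programmers find runtime bottlenecks and optimize effectively. The experiments analyze three representative real applications (dense matrix multiplication, tridiagonal linear system solving, and sparse matrix-vector multiplication), demonstrating the practicality of the model and achieving effective optimization of the algorithms.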
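The overall shape of such a component-wise model can be sketched as follows. This is a deliberately simplified assumption (the three components overlap fully, so the slowest one bounds the kernel's run time), not the paper's actual equations; the function name, parameter names, and example rates are hypothetical.

```python
def estimate_kernel_time(n_instr, issue_rate, n_shared, shared_rate,
                         global_bytes, global_bw):
    """Return (estimated time in s, name of the bottleneck component).

    n_instr / issue_rate        : instructions / (instructions per s)
    n_shared / shared_rate      : shared-memory accesses / (accesses per s)
    global_bytes / global_bw    : bytes moved / (bytes per s)
    Assumes the three components fully overlap, so the slowest dominates.
    """
    components = {
        "instruction pipeline": n_instr / issue_rate,
        "shared memory": n_shared / shared_rate,
        "global memory": global_bytes / global_bw,
    }
    bottleneck = max(components, key=components.get)
    return components[bottleneck], bottleneck


if __name__ == "__main__":
    t, name = estimate_kernel_time(1e9, 1e12, 2e8, 1e12, 4e8, 1e11)
    print(name)  # global memory
```

Once the bottleneck is named, optimization effort goes to that component first (for example, improving coalescing when global memory dominates).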

11.
European Community policy and the market
Abstract: This paper starts with some reflections on the policy considerations and priorities which are shaping European Commission (EC) research programmes. Then it attempts to position the current projects which seek to capitalise on information and communications technologies for learning in relation to these priorities and the apparent realities of the marketplace. It concludes that while there are grounds to be optimistic about the contribution EC programmes can make to the efficiency and standard of education and training, they are still too technology-driven.

12.
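Fusion ensemble methods have been widely applied in pattern recognition; however, some base classifiers have poor stability in real-time performance, which degrades the performance of multi-classifier fusion. To address this problem, this paper proposes a new multi-classifier sub-fusion ensemble classifier system. Above the measurement-level fusion layer, the method dynamically selects among the base classifiers, and the class receiving the most votes is taken as the class the fusion system assigns to the feature vector, yielding a new adaptive sub-fusion ensemble classification method. Experiments show that the method achieves markedly higher recognition accuracy than traditional classifiers and classifier-fusion methods and is more robust.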
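The select-then-vote step can be sketched as follows. As an assumption, dynamic selection is reduced here to dropping base classifiers whose validation accuracy falls below a threshold; the paper's selection criterion and its measurement-level fusion are not reproduced, and the names `select_and_vote` and `min_accuracy` are hypothetical.

```python
from collections import Counter


def select_and_vote(predictions, val_accuracy, min_accuracy=0.7):
    """Fuse base-classifier outputs for one feature vector.

    predictions:  {classifier_name: predicted_label}
    val_accuracy: {classifier_name: accuracy on a validation set}
    Classifiers below min_accuracy are dropped (a simple stand-in for
    dynamic selection); the label with the most votes among the rest wins.
    Ties go to the label first encountered, per Counter.most_common.
    """
    selected = [c for c in predictions if val_accuracy[c] >= min_accuracy]
    if not selected:            # fall back to all classifiers
        selected = list(predictions)
    votes = Counter(predictions[c] for c in selected)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    preds = {"a": "cat", "b": "dog", "c": "dog", "d": "dog", "e": "cat"}
    acc = {"a": 0.9, "b": 0.5, "c": 0.55, "d": 0.9, "e": 0.85}
    print(select_and_vote(preds, acc))                    # cat
    print(select_and_vote(preds, acc, min_accuracy=0.0))  # dog
```

Without selection the two unreliable classifiers tip the plain majority vote to "dog"; filtering them first changes the fused decision.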

13.
Although there are many arguments that logic is an appropriate tool for artificial intelligence, there has been a perceived problem with the monotonicity of classical logic. This paper elaborates on the idea that reasoning should be viewed as theory formation where logic tells us the consequences of our assumptions. The two activities of predicting what is expected to be true and explaining observations are considered in a simple theory formation framework. Properties of each activity are discussed, along with a number of proposals as to what should be predicted or accepted as reasonable explanations. An architecture is proposed to combine explanation and prediction into one coherent framework. Algorithms used to implement the system as well as examples from a running implementation are given.
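The explanation side of such a framework can be sketched for propositional Horn clauses: an explanation is a set of assumable hypotheses that, together with the facts and rules, entails the observation without deriving a contradiction. The brute-force search over hypothesis subsets, the rule encoding, and the example atoms are assumptions for illustration, not the paper's algorithms.

```python
from itertools import combinations


def closure(rules, assumed):
    """Forward-chain Horn rules given as (head, [body atoms]); facts have an empty body."""
    known = set(assumed)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return known


def explanations(rules, hypotheses, observation):
    """Return the smallest consistent hypothesis sets that explain observation.

    A set H explains the observation if rules + H derive it; H is consistent
    if rules + H do not derive the special atom "false".
    """
    for size in range(len(hypotheses) + 1):
        found = []
        for hs in combinations(sorted(hypotheses), size):
            derived = closure(rules, hs)
            if observation in derived and "false" not in derived:
                found.append(set(hs))
        if found:
            return found
    return []


if __name__ == "__main__":
    rules = [("wet_grass", ["rained"]),
             ("wet_grass", ["sprinkler_on"]),
             ("false", ["rained", "drought"])]     # rain contradicts drought
    hyps = {"rained", "sprinkler_on"}
    print(explanations(rules, hyps, "wet_grass"))  # [{'rained'}, {'sprinkler_on'}]
    rules.append(("drought", []))                  # now drought is a known fact
    print(explanations(rules, hyps, "wet_grass"))  # [{'sprinkler_on'}]
```

Adding a fact removes an explanation, a small instance of the non-monotonic behaviour that motivates treating reasoning as theory formation.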

14.
This paper provides the author's personal views and perspectives on software process improvement. Starting with his first work on technology assessment in IBM over 20 years ago, Watts Humphrey describes the process improvement work he has been directly involved in. This includes the development of the early process assessment methods, the original design of the CMM, and the introduction of the Personal Software Process (PSP)℠ and Team Software Process (TSP)℠. In addition to describing the original motivation for this work, the author also reviews many of the problems he and his associates encountered and why they solved them the way they did. He also comments on the outstanding issues and likely directions for future work. Finally, this work has built on the experiences and contributions of many people. Mr. Humphrey only describes work that he was personally involved in and he names many of the key contributors. However, so many people have been involved in this work that a full list of the important participants would be impractical.

15.
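SAR Image Denoising Based on Significance Correction of the Noise Variance in the Complex Wavelet Domain
A speckle-noise filtering method for synthetic aperture radar (SAR) images is proposed that combines statistical modeling in the complex wavelet domain with a significance-based correction of the noise-variance estimate. The method first converts the multiplicative noise model into an additive one by a logarithmic transform, then applies the dual-tree complex wavelet transform (DCWT) to the transformed image and models the statistical distribution of the complex wavelet coefficients. Using this prior distribution, Bayesian estimation recovers the original coefficients from the noisy ones, thereby removing the noise. Experimental results show that the method removes noise while preserving image detail and achieves good denoising performance.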
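The processing chain described above (log transform, wavelet transform, coefficient estimation, inverse transform) can be sketched on a single image row. To keep it short, this sketch substitutes a one-level real Haar transform and plain soft thresholding for the paper's dual-tree complex wavelet transform and Bayesian estimator; the function names and threshold values are assumptions.

```python
import math


def haar_forward(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Exact inverse of haar_forward."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft(c, t):
    """Soft-threshold a coefficient: shrink its magnitude toward zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def despeckle_row(intensities, threshold):
    """Log transform -> Haar -> shrink detail coefficients -> inverse -> exp."""
    logs = [math.log(v) for v in intensities]   # multiplicative -> additive noise
    approx, detail = haar_forward(logs)
    detail = [soft(d, threshold) for d in detail]
    return [math.exp(v) for v in haar_inverse(approx, detail)]


if __name__ == "__main__":
    print(despeckle_row([2.0, 3.0, 5.0, 7.0], 0.0))  # ~ unchanged
    print(despeckle_row([1.0, 4.0], 100.0))          # ~ [2.0, 2.0]
```

With a zero threshold the row is reconstructed exactly (up to rounding); with a large threshold each pair collapses to its geometric mean, the strongest smoothing this one-level transform allows.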

16.
Abstract: This paper considers some results of a study designed to investigate the kinds of mathematical activity undertaken by children (aged between 8 and 11) as they learned to program in LOGO. A model of learning modes is proposed, which attempts to describe the ways in which children used and acquired understanding of the programming/mathematical concepts involved. The remainder of the paper is concerned with discussing the validity and limitations of the model, and its implications for further research and curriculum development.

17.
The demands of a rapidly advancing technology for faster and more accurate controllers have always had a strong influence on the progress of automatic control theory. In recent years control problems have been arising with increasing frequency in widely different areas, which cannot be addressed using conventional control techniques. The principal reason for this is the fact that a highly competitive economy is forcing systems to operate in regimes where

18.
Aim: The Journals of Zhejiang University-SCIENCE (A/B/C) are edited by an international board of distinguished Chinese and foreign scientists and aim to present the latest developments and achievements in scientific research in China and overseas to the world's scientific circles, and especially to stimulate and promote academic exchange between Chinese and foreign scientists everywhere.

19.
The relative concentrations of different pigments within a leaf have significant physiological and spectral consequences. Photosynthesis, light use efficiency, mass and energy exchange, and stress response are dependent on relationships among an ensemble of pigments. This ensemble also determines the visible characteristics of a leaf, which can be measured remotely and used to quantify leaf biochemistry and structure. But current remote sensing approaches are limited in their ability to resolve individual pigments. This paper focuses on the incorporation of three pigments (chlorophyll a, chlorophyll b, and total carotenoids) into the LIBERTY leaf radiative transfer model to better understand relationships between leaf biochemical, biophysical, and spectral properties.
Pinus ponderosa and Pinus jeffreyi needles were collected from three sites in the California Sierra Nevada. Hemispheric single-leaf visible reflectance and transmittance and concentrations of chlorophylls a and b and total carotenoids of fresh needles were measured. These data were input to the enhanced LIBERTY model to estimate optical and biochemical properties of pine needles. The enhanced model successfully estimated reflectance (RMSE = 0.0255, BIAS = 0.00477, RMS%E = 16.7%), had variable success estimating transmittance (RMSE = 0.0442, BIAS = 0.0294, RMS%E = 181%), and generated very good estimates of carotenoid concentrations (RMSE = 2.48 µg/cm², BIAS = 0.143 µg/cm², RMS%E = 20.4%), good estimates of chlorophyll a concentrations (RMSE = 10.7 µg/cm², BIAS = −0.992 µg/cm², RMS%E = 21.1%), and fair estimates of chlorophyll b concentrations (RMSE = 7.49 µg/cm², BIAS = −2.12 µg/cm², RMS%E = 43.7%). Overall root mean squared errors of reflectance, transmittance, and pigment concentration estimates were lower for the three-pigment model than for the single-pigment model.
The algorithm to estimate three in vivo specific absorption coefficients is robust, although estimated values are distorted by inconsistencies in model biophysics. The capacity to invert the model from single-leaf reflectance and transmittance was added to the model so it could be coupled with vegetation canopy models to estimate canopy biochemistry from remotely sensed data.
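The error statistics reported above can be computed as follows. RMSE and BIAS use their usual definitions (BIAS as mean estimate minus observation, so negative values indicate underestimation); the abstract does not define RMS%E, so the relative, percentage form used here is an assumption.

```python
import math


def error_stats(estimated, observed):
    """Return (RMSE, BIAS, RMS%E) for paired estimates and observations.

    RMSE  = sqrt(mean((e - o)^2))
    BIAS  = mean(e - o)                          (negative = underestimation)
    RMS%E = 100 * sqrt(mean(((e - o) / o)^2))    (assumed definition)
    """
    n = len(observed)
    diffs = [e - o for e, o in zip(estimated, observed)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    bias = sum(diffs) / n
    rms_pct = 100 * math.sqrt(
        sum((d / o) ** 2 for d, o in zip(diffs, observed)) / n)
    return rmse, bias, rms_pct


if __name__ == "__main__":
    rmse, bias, pct = error_stats([11.0, 19.0], [10.0, 20.0])
    print(rmse, bias, round(pct, 2))  # 1.0 0.0 7.91
```

Under a relative definition like this one, small observed values inflate RMS%E even when RMSE is small, so the two statistics can rank the same estimates quite differently.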

20.
This article discusses the history and design of the special versions of the bombe key-finding machines used by Britain's Government Code & Cypher School (GC&CS) during World War II to attack the Enigma traffic of the Abwehr (the German military intelligence service). These special bombes were based on the design of their more numerous counterparts used against the traffic of the German armed services, but differed from them in important ways that highlight the adaptability of the British bombe design, and the power and flexibility of the diagonal board. Also discussed are the changes in the Abwehr indicating system that drove the development of these machines, the ingenious ways in which they were used, and some related developments involving the bombes used by the U.S. Navy's cryptanalytic unit (OP-20-G).


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Filing No. 09084417