Found 20 similar documents (search time: 46 ms)
1.
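The classic flash translation layer (FTL) address-mapping scheme DFTL (demand-based FTL) keeps the global mapping information in flash and caches only the most recently and frequently used mapping entries, which resolves the conflict between the large size of page-level mapping information and limited cache capacity. However, DFTL does not fully exploit the spatial locality of workloads to raise the cache hit ratio; frequent eviction of dirty mapping entries on cache misses causes many mapping-page writes; and it does not address the write amplification caused by migrating valid pages during garbage collection. To address these shortcomings, the paper proposes IRR-FTL (inter-reference recency-based FTL), an address-mapping method based on the reuse distance of cached mapping entries. IRR-FTL sets up mapping-page cache slots to exploit workload spatial locality; it partitions the write-cache mapping table into hot and cold regions in a workload-adaptive way, based on the reuse distance of cached mapping entries, and manages each region with a different policy to reduce mapping-page writes; and it stores hot and cold data separately, based on reuse distance, to improve garbage-collection efficiency. Experiments with a variety of workloads show that, compared with DFTL, IRR-FTL raises the cache hit ratio by 29.1%, lowers average response time by 27.3%, and lowers the erase count by 10.7%.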
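The demand-based caching that DFTL relies on can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `DemandMappingCache`, plain LRU eviction, and the dictionary `flash_table` standing in for the mapping pages in flash are all assumptions made for illustration. It shows only why evicting a dirty entry costs a mapping-page write, which is the overhead IRR-FTL aims to reduce.

```python
from collections import OrderedDict


class DemandMappingCache:
    """Hypothetical LRU cache of logical-to-physical page mappings."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # lpn -> (ppn, dirty), oldest first
        self.hits = 0
        self.misses = 0
        self.mapping_page_writes = 0   # dirty evictions written back to flash

    def _load(self, lpn, flash_table):
        # Make room first: evict the least recently used entry if full.
        if len(self.entries) >= self.capacity:
            victim, (ppn, dirty) = self.entries.popitem(last=False)
            if dirty:
                # A dirty mapping must be persisted before it is dropped.
                flash_table[victim] = ppn
                self.mapping_page_writes += 1
        # Fetch the mapping on demand; it enters the cache clean.
        self.entries[lpn] = (flash_table.get(lpn), False)

    def lookup(self, lpn, flash_table):
        """Return the physical page for lpn, loading the mapping on a miss."""
        if lpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(lpn)   # mark as most recently used
        else:
            self.misses += 1
            self._load(lpn, flash_table)
        return self.entries[lpn][0]

    def update(self, lpn, new_ppn, flash_table):
        """Record a new physical page for lpn (a host write); marks it dirty."""
        self.lookup(lpn, flash_table)
        self.entries[lpn] = (new_ppn, True)
```

With a capacity of two entries, three successive updates to different logical pages force one dirty eviction and therefore one mapping-page write. A scheme that separates hot and cold entries, as the abstract describes, would try to keep frequently rewritten mappings cached longer.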
2.
3.
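In out-of-core computation, disk I/O has a high startup overhead, so file access accounts for a large share of the run time. Reducing the number of file reads can therefore improve efficiency substantially. Data reuse is an effective technique for reducing the number of I/O operations. This paper splits the data into several files and keeps the file whose Cholesky factorization has just finished in the in-memory buffer. When the next file is factorized, the file that was just factorized can be used to update its data. This reduces the number of I/O reads and thus improves factorization efficiency.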
4.
5.
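Traditional cache replacement algorithms perform poorly because they cannot adapt to the streaming access behavior of applications. The paper designs a prediction method based on period detection, analyzes the regularity of program memory-access reuse distances and the complexity of streaming accesses, and proposes RDP, an algorithm that uses reuse-distance prediction to adapt to both simple and complex stream access patterns. The basic idea of RDP is to predict reuse distances and dynamically maintain reuse-distance counts, dynamically adjust the replacement order of cached data, and reduce storage overhead through stream sampling. Experimental results show that RDP adapts well to the diverse stream access patterns in programs and outperforms the LRU and DIP algorithms overall; on a 32 MB cache it reduces cache misses by 27.5% on average compared with traditional LRU.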
6.
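Synthetic natural environment (SNE) modeling and simulation is a hot and difficult research topic in military modeling and simulation, and reusing SNE data across application domains is the trend. First, starting from the synthetic environment data level, the paper studies the basic principles and processing flow of a data reuse method based on an interchange mechanism. Then, it analyzes SEDRIS (Synthetic Environmental Data Representation and Interchange Specification) in detail to demonstrate the feasibility and practicality of the method. Finally, based on the shortcomings of SEDRIS in application, it analyzes the future direction of data interchange methods, offering some new ideas and insights for data reuse in modeling and simulation.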
7.
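The paper proposes a CMA blind equalization algorithm suited to short burst signals. Based on the idea of data reuse, the algorithm greatly shortens the convergence time of CMA blind equalization while keeping computational complexity essentially unchanged. The mathematical formulation of the algorithm is described in detail and its performance is analyzed. Simulation results show that the algorithm has high practical value for blind equalization of short burst signals.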
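The abstract does not give the update equations, so the following is only a sketch of one common reading of data reuse for the constant modulus algorithm (CMA): the standard CMA stochastic-gradient update is run over the same short burst several times instead of once. The function name `cma_data_reuse`, the center-spike initialization, and the parameters `mu`, `radius`, and `passes` are assumptions for illustration, not the paper's formulation.

```python
def cma_data_reuse(samples, num_taps=5, mu=0.01, radius=1.0, passes=4):
    """Hypothetical data-reuse CMA: iterate the CMA update over one burst.

    samples : list of complex received samples (the short burst)
    radius  : target value of |y|^2 for the constant-modulus constellation
    passes  : how many times the same burst is reused
    """
    # Center-spike initialization: a single unit tap in the middle.
    weights = [0j] * num_taps
    weights[num_taps // 2] = 1 + 0j

    for _ in range(passes):
        for n in range(num_taps - 1, len(samples)):
            # Tapped delay line, newest sample first.
            window = [samples[n - k] for k in range(num_taps)]
            # Equalizer output y = sum_k w_k * x_k.
            y = sum(w * x for w, x in zip(weights, window))
            # CMA error term: zero when the output has the target modulus.
            err = y * (abs(y) ** 2 - radius)
            # Stochastic-gradient step on the constant-modulus cost.
            weights = [w - mu * err * x.conjugate()
                       for w, x in zip(weights, window)]
    return weights
```

Each update costs the same as in plain CMA; the extra passes trade additional iterations over stored samples for not having to wait for more received data, which is what matters when the burst is short.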
8.
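Synthetic natural environment (SNE) modeling and simulation is a hot and difficult research topic in military modeling and simulation, and reusing SNE data across application domains is a trend. First, starting from the synthetic environment data level, the paper studies the basic principles and processing of a data reuse method based on a data product mechanism, and analyzes and designs a simulation implementation of that mechanism. Then, it analyzes CDB (Common DataBases) in detail to demonstrate the feasibility and practicality of the method. Finally, based on the shortcomings of CDB in application, it analyzes the future direction of the data product mechanism, offering some new insights for data reuse in modeling and simulation.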
9.
10.
11.
12.
Improving Memory Traffic by Assembly-Level Exploitation of Reuses for Vector Registers (total citations: 1, self-citations: 0, citations by others: 1)
In this paper, we propose a compilation scheme to analyze and exploit the implicit reuses of vector register data. Based on the reuse analysis, we present a translation strategy that translates vectorized loops into assembly vector code that exploits vector reuses. Experimental results show that our compilation technique can improve execution time and reduce traffic between shared memory and vector registers. The techniques discussed here are simple, systematic, and easy to implement in conventional vector compilers or translators to enhance the data locality of vector registers.
13.
Multi-core systems equipped with micro processing units and accelerators such as digital signal processors (DSPs) and graphics processing units (GPUs) have become a major trend in processor design in recent years in attempts to meet ever-increasing application performance requirements. Open Computing Language (OpenCL) is one of the programming languages that include new extensions proposed to exploit the computing power of these kinds of processors. Among the newly extended language features, the single-instruction multiple-data (SIMD) linguistics and vector types are added to OpenCL to exploit hardware features of the accelerators. The addition makes it necessary to consider how traditional compiler data flow analysis can be adapted to meet the optimization requirements of vector linguistics. In this paper, we propose a calculus framework to support the data flow analysis of vector constructs for OpenCL programs that compilers can use to perform SIMD optimizations. We model OpenCL vector operations as data access functions in the style of mathematical functions. We then show that the data flow analysis for OpenCL vector linguistics can be performed based on the data access functions. Based on the information gathered from data flow analysis, we illustrate a set of SIMD optimizations on OpenCL programs. The experimental results incorporating our calculus and our proposed compiler optimizations show that the proposed SIMD optimizations can provide average performance improvements of 22% on x86 CPUs and 4% on AMD GPUs. Of the 15 selected benchmarks, 11 are improved on x86 CPUs and six are improved on AMD GPUs. The proposed framework has the potential to be used to construct other SIMD optimizations on OpenCL programs. Copyright © 2015 John Wiley & Sons, Ltd.
14.
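Reuse distance has become an important metric of a program's cache behavior, but its high complexity and potential memory overflow make it hard to apply. By introducing a maximum cache size, this paper proposes a bounded reuse distance analysis. The method avoids the memory overflow that general reuse distance analysis may cause and gives the analysis linear time complexity. Experiments on a set of integer and floating-point programs demonstrate the feasibility and correctness of cache miss ratio analysis based on this reuse distance analysis.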
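One way to picture a bounded analysis is an LRU stack that never holds more than the maximum cache size of distinct addresses, so memory stays fixed and each access costs at most a scan of that bounded stack. The sketch below follows that idea; the function names and the use of `None` for "distance at or beyond the bound" are assumptions for illustration, and the paper's actual data structure is not described in the abstract.

```python
def bounded_reuse_distances(trace, max_cache_size):
    """Reuse distance of each access, capped at max_cache_size.

    The reuse distance is the number of distinct addresses touched since the
    previous access to the same address. Cold accesses and distances that
    reach the bound are reported as None.
    """
    stack = []        # LRU stack, most recently used first, bounded length
    distances = []
    for addr in trace:
        try:
            depth = stack.index(addr)   # scan costs at most max_cache_size
        except ValueError:
            depth = None
        if depth is not None:
            stack.pop(depth)
        elif len(stack) == max_cache_size:
            stack.pop()                 # forget the least recently used one
        stack.insert(0, addr)
        distances.append(depth)
    return distances


def miss_ratio(distances, cache_size):
    """Miss ratio of a fully associative LRU cache with cache_size blocks.

    Only meaningful for cache_size <= the bound used to build distances.
    """
    misses = sum(1 for d in distances if d is None or d >= cache_size)
    return misses / len(distances)
```

For the trace `a b c a b` with a bound of 4, the last two accesses each have distance 2, so a three-block cache hits on both while a two-block cache misses on every access. For a fixed bound, the total work grows linearly with the trace length.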
15.
Performance metrics and models are prerequisites for scientific understanding and optimization. This paper introduces a new footprint-based theory and reviews the research in the past four decades leading to the new theory. The review groups the past work into metrics and their models, in particular those of the reuse distance, metrics conversion, models of shared cache, performance and optimization, and other related techniques.
16.
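The efficiency of shared cache use on multicore processors, that is, program locality, is one of the key factors affecting parallel program performance. The paper proposes a footprint-based theory of locality. It describes the conversions among miss ratio, reuse distance, and footprint, and uses the composability of footprint to build a locality prediction model for parallel programs.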
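The footprint metric can be made concrete with a brute-force sketch: the average footprint for a window length is the mean number of distinct addresses over all windows of that length in the trace. The function names are hypothetical, and the finite difference in `footprint_miss_ratio` is one reading of the footprint-to-miss-ratio conversion the abstract mentions; treat it as an assumption rather than the paper's exact formula.

```python
def average_footprint(trace, window):
    """Mean number of distinct addresses over all windows of the given length."""
    if not 1 <= window <= len(trace):
        raise ValueError("window must be between 1 and len(trace)")
    starts = range(len(trace) - window + 1)
    total = sum(len(set(trace[i:i + window])) for i in starts)
    return total / len(starts)


def footprint_miss_ratio(trace, window):
    """Growth of the average footprint from window to window + 1.

    Assumption: this difference is read as the estimated miss ratio of a
    cache whose size equals average_footprint(trace, window).
    """
    return average_footprint(trace, window + 1) - average_footprint(trace, window)
```

For the trace `a b a b`, the footprint is 1.0 at window length 1 and 2.0 at lengths 2 and 3: the footprint stops growing once both addresses fit, which corresponds to no further misses at that cache size. This quadratic-time version is only for illustration on short traces.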
17.
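To exploit the large potential for data-level parallelism in odd-even merge sort, the paper implements it on a GPU, which provides strong data-level parallelism, and obtains a high speedup. In addition, because OpenCL does not support synchronization among work-items in different work-groups, two solutions are proposed. One lets the host program control the iteration, which entirely removes the need for synchronization among all work-items. The other uses a bucket-partitioning preprocessing step to confine the need for synchronization to a single work-group, and then uses the synchronization mechanism among work-items within a work-group to handle synchronization correctly. Experimental results show that programs implemented this way perform markedly better than the sort implementation in the C++ STL.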
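The host-controlled iteration can be sketched sequentially. Below is the well-known iterative form of Batcher's odd-even merge sort for power-of-two input lengths, written in plain Python rather than OpenCL; it is not the paper's code. Each `(p, k)` pair is one stage whose compare-exchanges touch disjoint pairs, so under the first approach in the abstract a host program could plausibly launch one kernel per stage, with the kernel boundary acting as the global synchronization point. That mapping is an assumption for illustration.

```python
def odd_even_merge_sort(values):
    """Batcher's odd-even merge sort; len(values) must be a power of two."""
    data = list(values)
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")

    p = 1
    while p < n:                 # sorted runs of length p merge into 2p
        k = p
        while k >= 1:            # one stage per (p, k): host-side loop
            # All compare-exchanges below are independent of each other,
            # so on a GPU each could be handled by its own work-item.
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    lo, hi = i + j, i + j + k
                    # Only compare elements inside the same 2p-sized block.
                    if lo // (2 * p) == hi // (2 * p) and data[lo] > data[hi]:
                        data[lo], data[hi] = data[hi], data[lo]
            k //= 2
        p *= 2
    return data
```

Because the sequence of comparisons does not depend on the data, the stages form a fixed sorting network, which is what makes the algorithm a good fit for data-parallel hardware.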
18.
The use of modern, high-performance graphical processing units (GPUs) for acceleration of scientific computation has been widely reported. The majority of this work has used the CUDA programming model supported exclusively by GPUs manufactured by NVIDIA. An industry standardisation effort has recently produced the OpenCL specification for GPU programming. This offers the benefits of hardware-independence and reduced dependence on proprietary tool-chains. Here we describe a source-to-source translation tool, “Swan” for facilitating the conversion of an existing CUDA code to use the OpenCL model, as a means to aid programmers experienced with CUDA in evaluating OpenCL and alternative hardware. While the performance of equivalent OpenCL and CUDA code on fixed hardware should be comparable, we find that a real-world CUDA application ported to OpenCL exhibits an overall 50% increase in runtime, a reduction in performance attributable to the immaturity of contemporary compilers. The ported application is shown to have platform independence, running on both NVIDIA and AMD GPUs without modification. We conclude that OpenCL is a viable platform for developing portable GPU applications but that the more mature CUDA tools continue to provide best performance.
Program summary
Program title: Swan
Catalogue identifier: AEIH_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIH_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU Public License version 2
No. of lines in distributed program, including test data, etc.: 17 736
No. of bytes in distributed program, including test data, etc.: 131 177
Distribution format: tar.gz
Programming language: C
Computer: PC
Operating system: Linux
RAM: 256 Mbytes
Classification: 6.5
External routines: NVIDIA CUDA, OpenCL
Nature of problem: Graphical Processing Units (GPUs) from NVIDIA are preferentially programed with the proprietary CUDA programming toolkit. An alternative programming model promoted as an industry standard, OpenCL, provides similar capabilities to CUDA and is also supported on non-NVIDIA hardware (including multicore x86 CPUs, AMD GPUs and IBM Cell processors). The adaptation of a program from CUDA to OpenCL is relatively straightforward but laborious. The Swan tool facilitates this conversion.
Solution method: Swan performs a translation of CUDA kernel source code into an OpenCL equivalent. It also generates the C source code for entry point functions, simplifying kernel invocation from the host program. A concise host-side API abstracts the CUDA and OpenCL APIs. A program adapted to use Swan has no dependency on the CUDA compiler for the host-side program. The converted program may be built for either CUDA or OpenCL, with the selection made at compile time.
Restrictions: No support for CUDA C++ features
Running time: Nominal
19.
20.
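On distributed parallel machines, the quality of the data layout greatly affects application performance. Previous work has generally approximated the automatic data layout optimization problem by splitting it into two steps, data alignment optimization and data distribution optimization, and has studied their combination only in the one-dimensional case. Building on related work, this paper combines data alignment optimization and data distribution optimization in a single model for the multidimensional case and proposes a unified multidimensional static data layout model. The model avoids heuristic strategies and therefore describes the automatic data layout optimization problem more precisely. At the same time, it gives ...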