2.
A real-time unwarping method for catadioptric panoramic images on DSP
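To speed up the unwarping of catadioptric panoramic images, time-consuming computation can be replaced by table lookup. However, the lookup table needs a large amount of storage, so a block-based prefetching method is adopted. The target image is partitioned into blocks, and for each target block the corresponding lookup-table entries and source image block are preloaded into the DSP's on-chip memory, where the unwarping is then performed. This reduces memory access overhead and speeds up lookup-based unwarping. Experimental results show that when a 1024×768 source panoramic image is unwarped into a 1152×256 target panoramic image, the block prefetching method reaches 97 frames per second, nearly 20 times faster than lookup-table unwarping without the block prefetching strategy.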
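The idea of tiled table-lookup unwarping can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' DSP implementation: the mapping model (an annulus centred at `cx, cy`, nearest-neighbour sampling), the function names, and the tile sizes are all hypothetical, and the per-tile list copies merely stand in for transfers into on-chip memory.

```python
import math

def build_lut(dst_w, dst_h, cx, cy, r_in, r_out):
    """For every target pixel (u, v), store the nearest source pixel (x, y).

    Hypothetical model: the mirror image is an annulus centred at (cx, cy);
    column u maps to angle 2*pi*u/dst_w, row v maps linearly from r_out to r_in.
    """
    lut = []
    for v in range(dst_h):
        r = r_out - (r_out - r_in) * v / max(dst_h - 1, 1)
        row = []
        for u in range(dst_w):
            theta = 2 * math.pi * u / dst_w
            row.append((int(round(cx + r * math.cos(theta))),
                        int(round(cy + r * math.sin(theta)))))
        lut.append(row)
    return lut

def unwarp_direct(src, lut):
    """Plain table-lookup unwarping over the whole image."""
    return [[src[y][x] for (x, y) in row] for row in lut]

def unwarp_tiled(src, lut, tile_h, tile_w):
    """Table-lookup unwarping, one target tile at a time.

    The two local copies stand in for the per-tile transfer into on-chip
    memory; the inner loops then touch only the local buffers.
    """
    dst_h, dst_w = len(lut), len(lut[0])
    dst = [[0] * dst_w for _ in range(dst_h)]
    for ty in range(0, dst_h, tile_h):
        for tx in range(0, dst_w, tile_w):
            # 1. Fetch the LUT entries for this target tile.
            local_lut = [row[tx:tx + tile_w] for row in lut[ty:ty + tile_h]]
            # 2. Fetch only the bounding box of source pixels the tile needs.
            xs = [x for row in local_lut for (x, _) in row]
            ys = [y for row in local_lut for (_, y) in row]
            x0, y0 = min(xs), min(ys)
            local_src = [r[x0:max(xs) + 1] for r in src[y0:max(ys) + 1]]
            # 3. Unwarp the tile from the local buffers.
            for dy, row in enumerate(local_lut):
                for dx, (x, y) in enumerate(row):
                    dst[ty + dy][tx + dx] = local_src[y - y0][x - x0]
    return dst
```

For example, `unwarp_tiled(src, build_lut(32, 8, 32, 24, 5, 20), 4, 16)` should produce the same image as `unwarp_direct` on a 64×48 source; tiling changes only where the data is read from, not the result.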
3.
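Memory access latency has become one of the key obstacles to realizing the performance of high-performance microprocessors. Prefetching is an important means of hiding memory access latency. The usual approach is to explicitly execute instructions that bring data close to the processor before it is actually used, but this increases the burden on programmers. This paper proposes a hardware prefetching method: a VPFB mechanism is designed into the memory controller to hide memory access latency, and its effectiveness is analyzed through simulation.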
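The abstract does not describe how VPFB works internally, so the sketch below models only a generic sequential (stream) prefetch buffer in a memory controller, a common form of hardware prefetching; it is not VPFB. The function name and the `depth`/`buf_size` parameters are hypothetical.

```python
from collections import deque

def simulate_prefetch_buffer(accesses, depth, buf_size):
    """Return how many block requests hit in a sequential prefetch buffer.

    After every request for block b, the controller prefetches blocks
    b+1 .. b+depth into a FIFO buffer holding at most buf_size blocks.
    """
    buf = deque()
    hits = 0
    for b in accesses:
        if b in buf:
            hits += 1          # served from the buffer, DRAM latency hidden
            buf.remove(b)
        for nb in range(b + 1, b + 1 + depth):
            if nb not in buf:
                buf.append(nb)
                if len(buf) > buf_size:
                    buf.popleft()  # drop the oldest prefetched block
    return hits
```

A purely sequential stream of 100 blocks gets 99 buffer hits (only the first request misses), while a scattered pattern such as `[0, 50, 10, 90]` gets none, which is why such prefetchers are evaluated by simulation against real access traces.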
4.
Application launch performance is of great importance to system platform developers and vendors, as it greatly affects user satisfaction. The single most effective way to improve application launch performance is to replace a hard disk drive (HDD) with a solid state drive (SSD), which has recently become affordable and popular. A natural question, then, is whether the traditional HDD-aware application launchers should be replaced with a new SSD-aware optimizer. We address this question by analyzing the inefficiency of HDD-aware application launchers on SSDs and then proposing a new SSD-aware application prefetching scheme, called the Fast Application STarter (FAST). The key idea of FAST is to overlap the computation (CPU) time with the SSD access (I/O) time during an application launch. FAST is composed of a set of user-level components and system debugging tools provided by the Linux operating system (OS). Hence, FAST can be easily deployed in any recent Linux version without kernel recompilation. We implemented FAST on a desktop PC with an SSD running Linux 2.6.32 and evaluated it by launching a set of widely used applications, demonstrating an average 28% reduction in application launch time compared with a PC without a prefetcher.
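Why overlapping CPU and I/O time shortens a launch can be shown with a small timing model. This is an illustrative sketch, not FAST itself: it assumes the launch is a known sequence of `(io_time, cpu_time)` steps (for instance, recorded from an earlier launch), that a prefetcher can issue the reads back to back in that order, and that the SSD and CPU work fully in parallel. The function names are hypothetical.

```python
def launch_time_serial(steps):
    """Each step blocks on its read, then computes: no overlap."""
    return sum(io + cpu for io, cpu in steps)

def launch_time_overlapped(steps):
    """A prefetcher issues the reads back to back in the recorded order,
    while the application computes on whatever data has already arrived."""
    io_done = 0   # time at which the prefetcher finishes the current step's read
    cpu_done = 0  # time at which the application finishes the current step's compute
    for io, cpu in steps:
        io_done += io
        cpu_done = max(cpu_done, io_done) + cpu  # stall only if data is late
    return cpu_done
```

With three steps of 2 time units of I/O and 3 of computation, the serial launch takes 15 units and the overlapped one 11. The saving is bounded by the smaller of total I/O and total CPU time, so an I/O-bound launch on an HDD and a more balanced launch on an SSD benefit differently.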
5.
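The performance of the non-transparent PCI-PCI bridge is one of the key factors affecting a general-purpose speech processing platform. This paper analyzes the factors that influence PCI-PCI bridge performance and proposes methods for improving system performance.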
6.
Recently, NAND flash memory has emerged as a next-generation storage device because of advantages such as low power consumption and shock resistance. However, because of the unique hardware characteristics of flash memory, a flash translation layer (FTL) is needed to mediate between NAND flash memory and conventional file systems. This paper proposes a new clustered FTL (CFTL) that uses clustered hash tables and a two-level software cache technique. The CFTL can anticipate consecutive addresses from the host because the clustered hash table exploits locality of reference in a large address space. It also adaptively maps logical addresses to physical addresses in the flash memory by using block mapping, page mapping, and a two-level software cache technique. Furthermore, anticipatory I/O management using continuity counters and a prefetch scheme enables fast address translation. Experimental results show that the proposed address translation mechanism for CFTL provides better performance in address translation and memory space usage than the well-known NAND FTL (NFTL) and adaptive FTL (AFTL).
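The clustered-hash-table idea, where one lookup brings in the mappings for a run of consecutive logical pages, can be sketched as below. This is a simplified illustration, not CFTL: the cluster size `CLUSTER`, the class name, and the single LRU cache level are assumptions, and CFTL's block/page mapping switch, second cache level, and continuity counters are not modelled.

```python
from collections import OrderedDict

CLUSTER = 4  # hypothetical: logical pages per cluster

class ClusteredMap:
    """Logical-to-physical page map keyed by cluster, with a small LRU cache."""

    def __init__(self, cache_clusters=2):
        self.table = {}              # cluster id -> list of PPNs (None = unmapped)
        self.cache = OrderedDict()   # recently used clusters, oldest first
        self.cache_clusters = cache_clusters
        self.cache_hits = 0

    def _cluster(self, cid):
        if cid in self.cache:
            self.cache_hits += 1
            self.cache.move_to_end(cid)
            return self.cache[cid]
        # Miss: load the whole cluster, i.e. the mappings of CLUSTER
        # consecutive logical pages arrive together.
        entry = self.table.setdefault(cid, [None] * CLUSTER)
        self.cache[cid] = entry
        if len(self.cache) > self.cache_clusters:
            self.cache.popitem(last=False)  # evict least recently used cluster
        return entry

    def map(self, lpn, ppn):
        self._cluster(lpn // CLUSTER)[lpn % CLUSTER] = ppn

    def lookup(self, lpn):
        return self._cluster(lpn // CLUSTER)[lpn % CLUSTER]
```

Mapping logical pages 0 to 7 in order touches only two clusters, so six of the eight accesses hit the cache; this is the locality that lets a clustered table anticipate consecutive host addresses.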
7.
Data deduplication has been widely used in large-scale storage systems, particularly backup systems. Data deduplication systems typically divide data streams into chunks and identify redundant chunks by comparing chunk fingerprints. Keeping all fingerprints in memory is not cost-effective because fingerprint indexes are typically very large. Many data deduplication systems therefore maintain a fingerprint cache in memory and use fingerprint prefetching to accelerate the deduplication process. Although fingerprint prefetching can improve the performance of data deduplication systems by exploiting workload locality, inaccurately prefetched fingerprints may pollute the cache by evicting useful fingerprints. We observed that, across a wide variety of applications, most prefetched fingerprints are never used or are used only once, which severely limits the performance of data deduplication systems. We introduce a prefetch-aware fingerprint cache management scheme for data deduplication systems (PreCache) to alleviate prefetch-related cache pollution. We propose three prefetch-aware fingerprint cache replacement policies (PreCache-UNU, PreCache-UOO, and PreCache-MIX) to handle different types of cache pollution. Additionally, we propose an adaptive policy selector that chooses suitable policies for prefetch requests. We implemented PreCache on two representative data deduplication systems (Block Locality Caching and SiLo) and evaluated its performance using three real-world workloads (Kernel, MacOS, and Homes). The experimental results show that, by mitigating prefetch-related fingerprint cache pollution, PreCache reduces on-disk fingerprint index lookups and improves the deduplication ratio, increasing deduplication throughput by up to 32.22%.
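A prefetch-aware replacement rule in the spirit of the "never used" case described above can be sketched as an LRU cache that first evicts prefetched fingerprints that have never been hit. This is a hypothetical illustration, not any of the actual PreCache policies (whose details are not given here); the class and method names are assumptions.

```python
from collections import OrderedDict

class PrefetchAwareCache:
    """Fingerprint cache that evicts prefetched-but-never-used entries first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # fingerprint -> hit count, oldest first

    def _evict(self):
        # Oldest entry that was prefetched and has never been hit, if any...
        victim = next((fp for fp, hits in self.entries.items() if hits == 0), None)
        if victim is None:
            victim = next(iter(self.entries))  # ...otherwise plain LRU
        del self.entries[victim]

    def prefetch(self, fingerprints):
        """Insert a batch of prefetched fingerprints (e.g. from one container)."""
        for fp in fingerprints:
            if fp in self.entries:
                continue
            if len(self.entries) >= self.capacity:
                self._evict()
            self.entries[fp] = 0

    def lookup(self, fp):
        """Return True on a cache hit and update recency and hit count."""
        if fp not in self.entries:
            return False
        self.entries[fp] += 1
        self.entries.move_to_end(fp)
        return True
```

If fingerprint `a` has been hit once and `b` and `c` were prefetched afterwards, plain LRU would evict `a` when `d` arrives, whereas this rule evicts the unused `b` and keeps the fingerprint that has already proved useful.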
9.
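Reference [1] proposed a data fusion optimization across arrays and evaluated it on an IA-32 server. The results showed that on IA-32 machines, under the control of a performance cost model, data fusion optimization effectively improves cache utilization for applications with non-contiguous data access patterns. How well, then, does data fusion optimization work on the new-generation IA-64 architecture? This paper uses an Intel IA-32 server and an HP Itanium server as platforms. Programs before and after the data fusion transformation are compiled with the Intel Fortran compilers ifc and efc and with the free-software compiler g95, then run to collect execution times and related performance data on both platforms. The results show that source-level data fusion optimization does not work well together with the high-level optimizations of the efc compiler on IA-64: at the O3 optimization level, the optimization has a negative effect. This further indicates that high-level compiler optimizations such as data prefetching, loop transformations, and data transformations must be considered together with the characteristics of the target architecture to achieve good global optimization. This paper is intended as a starting point for research on the performance portability, to the IA-64 architecture, of compiler optimizations designed for IA-32.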
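The abstract does not spell out the transformation, so the sketch below assumes that "data fusion across arrays" means array merging (interleaving): two arrays that are always accessed together at the same index are stored side by side, so one cache line can serve both. In Fortran this would correspond to replacing `A(N)` and `B(N)` with a merged `AB(2, N)`. Python cannot show the cache effect itself; it only shows the index mapping and that the transformed kernel computes the same result. All names are hypothetical.

```python
def fuse(a, b):
    """Interleave two equal-length arrays: fused[2*i] = a[i], fused[2*i+1] = b[i]."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    fused = [0] * (2 * len(a))
    fused[0::2] = a
    fused[1::2] = b
    return fused

def kernel_separate(a, b, idx):
    # Irregular (non-contiguous) access: a[i] and b[i] live in different arrays.
    return sum(a[i] * b[i] for i in idx)

def kernel_fused(fused, idx):
    # After fusion a[i] and b[i] are adjacent, so one memory fetch can serve both.
    return sum(fused[2 * i] * fused[2 * i + 1] for i in idx)
```

Whether this pays off depends on the platform: as the results above indicate, a layout change that helps on IA-32 can conflict with a compiler's own prefetching and loop transformations elsewhere.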
10.
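In processors with caches, code layout and instruction prefetching are common techniques for reducing instruction fetch latency. Code layout focuses on the relative spatial positions of executed code, whereas instruction prefetching concerns the relative timing of code execution. On-chip trace technology non-intrusively captures a program's execution path and timing information, linking the spatial and temporal relationships of code execution and thereby providing a basis for combining layout and prefetching. On the YHFT-DSP platform, prefetches are placed according to the periodic behavior of program execution, prefetch instructions are executed in idle functional units of the VLIW processor, and a function-level code layout method aimed at increasing prefetch slack is proposed. Experimental results show that the method prefetches effectively and reduces instruction cache misses.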
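How a trace links layout (space) with prefetch timing (time) can be sketched as follows. This is a hypothetical illustration, not the YHFT-DSP method: it assumes a trace of `(function name, entry cycle)` pairs, lays functions out in first-use order as a stand-in heuristic, and measures the slack of a prefetch of the next function issued when the current one is entered. The periodic-behavior analysis and VLIW idle-slot scheduling from the abstract are not modelled.

```python
def layout_by_first_use(trace, sizes, base=0):
    """Place functions contiguously in the order they are first entered."""
    addr, placed = base, {}
    for name, _cycle in trace:
        if name not in placed:
            placed[name] = addr
            addr += sizes[name]
    return placed

def prefetch_slack(trace, placed, sizes):
    """For each transition f -> g where g is laid out directly after f, a
    prefetch of g issued on entry to f has slack = entry(g) - entry(f)."""
    slacks = []
    for (f, t0), (g, t1) in zip(trace, trace[1:]):
        if placed[f] + sizes[f] == placed[g]:
            slacks.append(t1 - t0)
    return slacks
```

A prefetch is only timely if its slack is at least the instruction-cache miss latency, so a layout method that aims to increase slack tries to place code so that these adjacent-transition gaps grow.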