1.

Linked instruction caches for enhancing power efficiency of embedded systems

Chang-Jung Ku Ching-Wen Chen An Hsia Chun-Lin Chen 《Microprocessors and Microsystems》2014

The power consumed by memory systems accounts for 45% of the total power consumed by an embedded system, and the power consumed during a memory access is 10 times higher than during a cache access. Thus, increasing the cache hit rate can effectively reduce the power consumption of the memory system and improve system performance. In this study, we increased the cache hit rate and reduced the cache-access power consumption by developing a new cache architecture known as a single linked cache (SLC) that stores frequently executed instructions. By adding a new link field, SLC combines the low power consumption and low access delay of a direct-mapped cache with a high hit rate similar to that of a two-way set-associative cache. In addition, we developed another design known as multiple linked caches (MLC) to further reduce the power consumption during each cache access and avoid unnecessary cache accesses when the requested data is absent from the cache. In MLC, the linked cache is split into several small linked caches that store frequently executed instructions to reduce the power consumption during each access. To avoid unnecessary cache accesses when a requested instruction is not in the linked caches, the addresses of the frequently executed blocks are recorded in the branch target buffer (BTB). By consulting the BTB, a processor can access the memory to obtain the requested instruction directly if the instruction is not in the cache. In the simulation results, our method performed better than selective compression, traditional cache, and filter cache in terms of the cache hit rate, power consumption, and execution time.
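
The link between hit rate and fetch energy in the abstract above can be illustrated with a small calculation. This is only a sketch: the 10x memory-to-cache ratio comes from the abstract, but the energy model (every fetch probes the cache, and a miss additionally pays one memory access) and the name `fetch_energy` are assumptions made for illustration, not the paper's model.

```python
def fetch_energy(hit_rate, e_cache=1.0, mem_ratio=10.0):
    """Average energy per fetch, in units of one cache access.

    Assumed model (not from the paper): every fetch probes the cache,
    and a miss additionally pays one memory access.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    e_mem = mem_ratio * e_cache  # abstract: a memory access costs ~10x a cache access
    return e_cache + (1.0 - hit_rate) * e_mem


if __name__ == "__main__":
    for h in (0.90, 0.95, 0.99):
        print(f"hit rate {h:.2f}: {fetch_energy(h):.2f} cache-access units")
```

Under this assumed model, each percentage point of hit rate removes a tenth of a cache-access unit from the average fetch, which is why a higher hit rate matters as much as a cheaper cache access.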

2.

An on-chip instruction cache design with one-bit tag for low-power embedded systems

Ji Gu Hui Guo Patrick Li 《Microprocessors and Microsystems》2011,35(4):382-391

The on-chip instruction cache is a potentially power-hungry component in embedded systems due to its large chip area and high access frequency. Aiming to reduce the power consumption of the on-chip cache, we propose a Reduced One-Bit Tag Instruction Cache (ROBTIC), where the cache size is judiciously reduced and the cache tag field contains only the least significant bit of the full tag. We develop a cache operational control scheme for ROBTIC so that, with the one-bit cache tag, program locality can still be efficiently exploited. For applications where most of the memory accesses are localized, our cache can achieve performance similar to that of a traditional full-tag cache; however, the power consumption of the cache can be significantly reduced due to the much smaller cache size, narrower tag array (just one bit), and smaller tag comparison circuit. Experiments on a set of benchmarks implemented in a 180 nm CMOS process technology demonstrate that our proposed design can reduce the dynamic power consumption of the traditional cache by up to 27.3% and its area by 30.9% when the cache size is fixed at 32 instructions, which outperforms the existing partial-tag based cache design. With cache size customization, a further 47.8% power saving can be achieved. Our experimental results also show that when implemented in deep sub-micron technologies, where leakage power is not negligible, our design is still efficient: a consistent power saving trend (about 22%) has been observed for technologies from 130 nm down to 65 nm.

3.

Design and implementation of the I/O interface power simulation module of the power simulator HMSim

周雪梅 郭兵 沈艳 王继禾 伍元胜 《计算机应用》2010,30(7):1987-1990

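With a "low-carbon economy" now being advocated worldwide, the power consumption of embedded software has become a major bottleneck in embedded system design, and using simulation to measure and experiment with embedded software power consumption is an important development technique. HMSim is a high-precision, instruction-level power simulator for embedded software. This paper introduces the overall design of HMSim and the structure of its instruction set simulator, designs in detail the functional simulation models of I/O interfaces such as the UART and the LCD controller, and proposes a method for accounting for I/O interface power consumption. Finally, the correctness of the design and implementation of the HMSim I/O interface power simulation module is verified by running applications based on the μC/OS-II RTOS.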

4.

Source-program-level software energy consumption modeling and analysis for embedded systems

叶珊 郭荣佐 黄君 《计算机应用研究》2017,34(10)

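To address the effect of embedded system energy consumption on the operating time of embedded devices, this paper considers software energy consumption from the instruction level up to the source-program level. It first analyzes the relevant features of source-level statements and, based on the instruction energy of those statements, proposes a source-program-level energy model. Guided by the model analysis, it then optimizes the energy consumption of different categories of statements in the source programs of five classic algorithms, and finally compares the energy consumption of the five algorithms before and after optimization. Experiments show that the model reduces the energy consumption of the optimized source programs by 9.46% to 50.29%, achieving the goal of reducing embedded system software energy consumption.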
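
The idea of deriving statement-level energy from instruction-level energy can be sketched as follows. The abstract does not give the model's equations, so this is a minimal illustration under assumptions: a statement's energy is taken to be the sum of the energies of the instructions it compiles to, weighted by how often it executes. The energy table, instruction names, and function names are all hypothetical.

```python
# Hypothetical per-instruction energy costs in nanojoules (illustrative only).
INSTR_ENERGY_NJ = {"load": 4.0, "store": 4.5, "add": 1.0, "mul": 3.0, "branch": 2.0}


def statement_energy(instrs, energy_table=INSTR_ENERGY_NJ):
    """Energy of one source statement: the sum of its instructions' energies."""
    return sum(energy_table[i] for i in instrs)


def program_energy(statements):
    """statements: iterable of (instruction_list, execution_count) pairs."""
    return sum(statement_energy(instrs) * count for instrs, count in statements)


def saving_percent(before, after):
    """Relative energy reduction in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # A loop body executed 100 times, before and after replacing a
    # multiply-by-two with an add (an assumed source-level optimization).
    before = [(["load", "mul", "store"], 100), (["add", "branch"], 100)]
    after = [(["load", "add", "store"], 100), (["add", "branch"], 100)]
    e0, e1 = program_energy(before), program_energy(after)
    print(f"{e0:.0f} nJ -> {e1:.0f} nJ ({saving_percent(e0, e1):.1f}% saved)")
```

A model of this shape lets statements be ranked by energy before any measurement on hardware, which is the kind of analysis the abstract describes applying to different statement categories.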

5.

FPGA emulation methodology for fast and accurate power estimation of embedded processors

《Journal of Systems Architecture》2017

Early estimation of application-specific power consumption has become one of the major constraints of modern ASIC design. While in early stages of the design process precise power consumption can only be obtained from very time-consuming gate-level (GTL) simulation, power estimation methodologies aim to reduce computational overhead by deriving models to approximate power consumption on higher levels. This work presents an FPGA-accelerated power estimation methodology for programmable processors based on a hybrid of functional level power analysis (FLPA) and instruction level power analysis (ILPA) that can be mapped onto an FPGA together with the functional emulation. It enables fast and accurate estimation of application-specific power consumption and energy per task, which is crucial for power-aware design of embedded processor architectures. The approach allows both hardware and software designers to optimize their implementations not only for processing performance but also for power efficiency. The power emulation methodology and considerations for the FPGA implementation of the power estimation are described in detail. Model validation against GTL power simulation and results are given for a typical embedded RISC processor and a commercial-grade Application Specific Instruction Set Processor (ASIP). Power consumption models yield fast and accurate power estimation with a %MAE of less than 9% and an NRMSE of less than 7%, enabling co-optimization of both hardware and software with respect to power consumption in early design stages.
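
For readers unfamiliar with the two error figures quoted above, the sketch below shows one way such metrics can be computed. The abstract does not define %MAE or NRMSE, and several conventions exist (normalizing by the mean, the range, or per sample), so the definitions here, both normalized by the mean reference value, are an assumption and may differ from the paper's. The sample values are made up.

```python
import math


def percent_mae(reference, estimate):
    """Mean absolute error as a percentage of the mean reference value."""
    n = len(reference)
    mae = sum(abs(e - r) for r, e in zip(reference, estimate)) / n
    return 100.0 * mae / (sum(reference) / n)


def nrmse_percent(reference, estimate):
    """Root-mean-square error as a percentage of the mean reference value."""
    n = len(reference)
    mse = sum((e - r) ** 2 for r, e in zip(reference, estimate)) / n
    return 100.0 * math.sqrt(mse) / (sum(reference) / n)


if __name__ == "__main__":
    # Made-up power samples in mW: gate-level reference vs. model estimate.
    ref = [10.0, 20.0, 30.0]
    est = [11.0, 19.0, 33.0]
    print(f"%MAE  = {percent_mae(ref, est):.2f}")
    print(f"NRMSE = {nrmse_percent(ref, est):.2f}")
```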

6.

A tagless cache design for power saving in embedded systems

Ching-Wen Chen Chang-Jung Ku 《The Journal of supercomputing》2012,62(1):174-198

In embedded systems, cache is commonly used to improve system performance. However, the cache consumes a large amount of power, and among the components of the cache memory, tag comparisons consume the most power. Therefore, designing a cache that consumes less power on tag comparisons while keeping a high hit ratio is an important challenge. In this paper, we propose a Tagless Instruction Cache, called TL-IC, that does not perform tag comparisons in order to save power in embedded systems. To guarantee that an instruction fetched from TL-IC is the desired instruction, the basic blocks of programs, rather than cache lines, are placed into TL-IC. In addition, to utilize TL-IC as much as possible in order to save the most power, and to take into account both general-purpose and special-purpose applications, both static allocation and dynamic allocation of basic blocks are used to select the frequently executed basic blocks of programs placed in TL-IC. With a high utilization of TL-IC, which does not perform tag comparisons, the power consumed in fetching instructions can be efficiently reduced. In the simulation results, we show and compare the power consumption of our proposed TL-IC, L0 cache, Linebuffer, and TH-IC.

7.

Design of an on-chip Flash acceleration controller based on prefetching and caching

蒋进松 黄凯 陈辰 王钰博 严晓浪 《计算机工程与科学》2016,38(12):2381-2391

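To improve the read speed of on-chip Flash in embedded applications, an on-chip Flash acceleration controller based on prefetching and caching is proposed. The controller provides two acceleration schemes: a prefetch buffer and a cache. The prefetch-buffer scheme uses bit-width extension and prefetching to speed up sequential instruction fetches, and uses a branch buffer to hold non-sequential instructions, reducing the prefetch-miss penalty they cause. The cache scheme uses set associativity and way prediction to raise the instruction reuse rate, reduce the number of Flash accesses, and lower system power consumption. For different application scenarios, the two schemes can be switched statically through a register or switched dynamically and adaptively by software, to obtain the best read-speed improvement. Test results on multiple benchmark programs demonstrate the feasibility and efficiency of the proposed on-chip Flash acceleration controller for performance and power optimization.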

8.

Data memory power optimization and performance exploration of embedded systems for implementing motion estimation algorithms

《Real》2003,9(6):371-386

A memory power optimization and performance exploration methodology based on high-level (C language) code transformations is introduced. It allows the system designer to explore various data memory power, data memory area, and performance trade-offs early in the design process of embedded multimedia systems. The exploration strategy is introduced for both single-processor and multiprocessor environments; the latter requires partitioning of the application. After software transformations are applied, the experimental results, obtained using four well-known motion estimation kernels, provide insight into the performance and energy consumption trade-offs by comparing memory hierarchies for the ARM programmable core, and prove the validity of the proposed approach.

9.

Compressed tag architecture for low-power embedded cache systems

Jong Wook Kwak Young Tae Jeon 《Journal of Systems Architecture》2010,56(9):419-428

Processors in embedded systems mostly employ cache architectures in order to alleviate the access latency gap between processors and memory systems. Caches in embedded systems usually occupy a major fraction of the implemented chip area. The power dissipation of the cache system thus constitutes a significant fraction of the power dissipated by the entire processor in embedded systems. In this paper, we propose the compressed tag architecture to reduce the power dissipation of the tag store in cache systems. We introduce a new tag-matching mechanism by using a locality buffer and a tag compression technique. The main power reduction feature of our proposal is the use of small tag space matching instead of full tag matching, with modest additional hardware costs. The simulation results show that the proposed model provides a power and energy-delay product reduction of up to 27.8% and 26.5%, respectively, while still providing a comparable level of system performance to regular cache systems.

10.

Optimizing a combined WCET-WCEC problem in instruction fetching for real-time systems

《Journal of Systems Architecture》2013,59(9):667-678

In real-time systems, time is usually so critical that other parameters such as energy consumption are often not even considered. However, optimizing the worst-case energy consumption can be a key factor in systems with severe power-supply limitations. In this paper we study several memory architectures using combined time and energy optimization models for real-time multitasking systems. Each task is modeled using Lock-MS, a method to optimize the WCET of a task, with an added set of constraints to model the WCEC (worst case energy consumption) in the same way. Our tested hardware components focus on instruction fetching, including a lockable cache, a line buffer, and a sequential prefetch buffer. We test a variety of instruction fetch alternatives, optimizing time and energy consumption. Our results show that the accuracy of the estimate of the worst-case number of context switches can strongly affect the resulting WCEC (up to 8 times in our experiments), and that optimizing the WCEC may provide execution times similar to those obtained by optimizing the WCET, with up to 5 times less energy consumption. Additionally, optimization functions combining WCET and WCEC with different weights show very interesting WCET-WCEC trade-offs. This confirms that methodologies testing such optimizations at design time could be very helpful in providing a precise system set-up.

11.

COMPASS – A tool for evaluation of compression strategies for embedded processors

Sreejith K. Priti 《Journal of Systems Architecture》2008,54(10):995-1003

A major concern of embedded system architects is the design for low power. We address one aspect of the problem in this paper, namely the effect of executable code compression. There are two benefits of code compression: first, a reduction in the memory footprint of embedded software, and second, a potential reduction in memory bus traffic and power consumption. Since decompression has to be performed at run time, it is done in hardware. We describe a tool called COMPASS which can evaluate a range of strategies for any given set of benchmarks and display compression ratios. Also, given an execution trace, it can compute the effect on bus toggles and cache misses for a range of compression strategies. The tool is interactive and allows the user to vary a set of parameters and observe their effect on performance. We describe an implementation of the tool and demonstrate its effectiveness. To the best of our knowledge this is the first tool proposed for such a purpose.

12.

A low-power strategy for instruction caches based on way-access traces

冷冰 严晓浪 孟建熠 葛海通 《传感器与微系统》2012,31(9):14-17

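The instruction cache accounts for a significant share of power consumption in modern embedded processors. To address this, a low-power strategy for set-associative instruction caches based on way-access traces is proposed. It uses a modified instruction cache and branch target buffer to build and maintain the way-access trace of the instruction cache at run time, reducing instruction cache hit checks and accesses to irrelevant ways. A maintenance strategy for the way-access trace information is further proposed, based on cross-line access predecessor pointers, branch predecessor state, branch predecessor pointers, and branch target indexes, to lower the frequency with which the information must be rebuilt and so make more effective use of the trace information already established. Experimental results show that, with the optimized way-access trace strategy, tag memory accesses and data memory accesses of the instruction cache fall to 3.60% and 27.70%, respectively, of those of a conventional instruction cache.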

13.

A comparison of instruction memories from the WCET perspective

《Journal of Systems Architecture》2014,60(5):452-466

Hard real-time systems demand high performance in combination with timing-predictable program execution. The performance of a system in the worst case, represented by its worst case execution time (WCET), highly depends on the design of the memory subsystem. In this paper we focus on the instruction memory hierarchy and quantify the impact of different on-chip instruction memories on the worst-case timing of the system. A function-based dynamic instruction scratchpad (D-ISP), an instruction cache, and static instruction scratchpads using basic-block-based and function-based assignment algorithms are compared. To that end, we provide WCET bounds for systems with different on-chip instruction memories and different off-chip memory timings. We show that for small memory sizes a static instruction scratchpad usually outperforms the other memories in terms of the WCET estimate. However, with increasing memory sizes the D-ISP is able to reach lower WCET bounds. An instruction cache can only provide lower WCET bounds than the other memories if no suitable assignment for the static instruction scratchpads is found or if the D-ISP suffers from thrashing or frequently loads unused code.

14.

Research on embedded software power optimization based on instruction clustering and instruction scheduling

陈嘉 董渊 杨阳 戴桂兰 王生原 《小型微型计算机系统》2006,27(1):175-179

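Using an instruction-level energy estimation model, a power optimization scheme based on instruction clustering and instruction scheduling is proposed and validated. The scheme uses a depth-first algorithm to search for a locally optimal solution and selects an instruction sequence with lower energy consumption. To balance measurement effort against accuracy, instructions with similar energy consumption are grouped into the same class, which effectively reduces the otherwise excessive effort of obtaining the switching-energy parameters between adjacent instructions. Analysis of experimental results from the SimpleScalar/Wattch simulator shows that instruction-level power optimization using instruction scheduling alone has limited effect; to improve optimization efficiency, power estimation and optimization must be carried out at a higher level.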
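
The combination of instruction classes and depth-first scheduling can be sketched as below. This is an illustration under assumptions, not the paper's algorithm: it treats "local" as meaning a single basic block, searches every dependency-respecting order of that block depth-first with simple pruning, and charges a switching energy only between adjacent instruction classes. The class names, cost table, and function names are hypothetical.

```python
def pair_cost(table, a, b):
    """Switching energy between two adjacent instruction classes (symmetric)."""
    return table.get((a, b), table.get((b, a), 0.0))


def best_schedule(classes, deps, table):
    """Return (cost, order) minimizing the summed switching energy.

    classes: class label of each instruction, indexed by position
    deps:    dict mapping an instruction index to the set of indices
             that must be scheduled before it
    table:   dict of (class_a, class_b) -> switching energy
    """
    n = len(classes)
    best = [float("inf"), None]

    def dfs(order, done, cost):
        if cost >= best[0]:  # prune: this branch cannot beat the best found
            return
        if len(order) == n:
            best[0], best[1] = cost, list(order)
            return
        for i in range(n):
            if i in done or not deps.get(i, set()) <= done:
                continue  # already placed, or a predecessor is still pending
            step = pair_cost(table, classes[order[-1]], classes[i]) if order else 0.0
            order.append(i)
            done.add(i)
            dfs(order, done, cost + step)
            done.remove(i)
            order.pop()

    dfs([], set(), 0.0)
    return best[0], best[1]


if __name__ == "__main__":
    classes = ["alu", "mem", "alu", "mem"]  # one class label per instruction
    deps = {3: {0}}                         # instruction 3 needs the result of 0
    table = {("alu", "mem"): 2.0, ("alu", "alu"): 0.5, ("mem", "mem"): 0.5}
    print(best_schedule(classes, deps, table))
```

Grouping instructions into classes keeps the cost table small: with k classes only on the order of k squared switching energies need to be measured, rather than one per pair of opcodes, which is the measurement saving the abstract refers to.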

15.

The ChARM tool for tuning embedded systems

Prete C.A. Graziano M. Lazzarini F. 《Micro, IEEE》1997,17(4):67-76

ChARM is a simulation tool for tuning ARM-based embedded systems that include cache memories. ChARM provides a parametric, trace-driven simulation for tuning system configuration. A designer can observe performance while varying the timing, the architectural features, and the management policies of the system components. Designers can therefore evaluate the execution time of the program, the time spent in memory accesses, the miss ratio, code miss ratio, and data miss ratio, and the number of burst-read operations. They can also evaluate the number of write operations for write-through cache models and burst-write operations for copy-back cache models. Finally, ChARM's program locality analysis illustrates the sequentiality, temporality, and loops of a program in easy-to-read three-dimensional graphs. These graphs, together with the graphs showing the distribution of the replacement conflicts in cache, help designers understand how a program works and how it stresses the memory hierarchy.

16.

Design and implementation of a hardware/software co-simulation verification environment for embedded systems

严迎建 王世好 刘明业 《计算机工程》2004,30(9):45-47

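This paper presents a hardware/software co-simulation verification environment for embedded systems. The environment is built on an instruction set simulator and an event-driven hardware simulator, with a bus scheduling model and a bus interface model providing the interaction interface between software and hardware simulation. The discussion focuses on the design and implementation of the interface between the software and hardware simulators, and closes with an application example of embedded system co-verification.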

17.

SEProf: A high-level software energy profiling tool for an embedded processor enabling power management functions

Shiao-Li Tsao Jian Jhen Chen 《Journal of Systems and Software》2012,85(8):1757-1769

Energy efficiency has become one of the most important design issues for embedded systems. To examine the power consumption of an embedded system, an energy profiling tool is in high demand. Although a number of energy profiling tools have been proposed, they are not directly applicable to embedded processors with power management functions, which are widely used in battery-operated embedded systems to reduce power consumption. Hence, this study presents a high-level energy profiling tool, called SEProf, that estimates the energy consumption of an embedded system running multithreaded software and a multitasking operating system (OS) that supports power management functions. This study implements the proposed SEProf in Linux 2.6.19 and evaluates its performance on an ARM11 MPCore processor. Experimental results demonstrate that the proposed tool can provide accurate energy profiling results with a low profiling overhead.

18.

A Novel Memory Structure for Embedded Systems: Flexible Sequential and Random Access Memory

Ying Chen Karthik Ranganathan Vasudev V. Pai David J. Lilja and Kia Bazargan 《计算机科学技术学报》2005,20(5):596-606

The on-chip memory performance of embedded systems directly affects the system designers' decision about how to allocate expensive silicon area. A novel memory architecture, flexible sequential and random access memory (FSRAM), is investigated for embedded systems. To realize sequential accesses, small "links" are added to each row in the RAM array to point to the next row to be prefetched. The potential cache pollution is ameliorated by a small sequential access buffer (SAB). To evaluate the architecture-level performance of FSRAM, we ran the Mediabench benchmark programs on a modified version of the SimpleScalar simulator. Our results show that the FSRAM improves the performance of a baseline processor with a 16KB data cache by up to 55%, with an average of 9%; furthermore, the FSRAM reduces the data cache miss count by 53.1% on average due to its prefetching effect. We also designed RTL and SPICE models of the FSRAM, which show that the FSRAM significantly improves memory access time while reducing power consumption, with negligible area overhead.

19.

Two-level caches tuning technique for energy consumption in reconfigurable embedded MPSoC

《Journal of Systems Architecture》2013,59(8):656-666

In order to meet the ever-increasing computing requirements of the embedded market, multiprocessor chips have been proposed as the best solution. In this work we investigate the energy consumption of these embedded MPSoC systems. One efficient way to reduce the energy consumption is to reconfigure the cache memories. This approach has been applied to architectures with one cache level and one processor, but has not yet been investigated for multiprocessor architectures with two-level caches. The main contribution of this paper is to explore a two-level cache (L1/L2) multiprocessor architecture by estimating the energy consumption. Using a simulation platform, we first build a multiprocessor architecture, and then we propose a new algorithm that tunes the two-level cache memory hierarchy (L1 and L2). The cache tuning approach is based on three parameters: cache size, line size, and associativity. To find the best cache configuration, the application is divided into several execution intervals, and for each interval we generate the best cache configuration. Finally, the approach is validated using a set of open source benchmarks (SPEC 2006, SPLASH-2, and MediaBench), and we discuss the performance in terms of speedup and energy reduction.
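
The per-interval tuning over cache size, line size, and associativity can be pictured with the sketch below. The abstract does not describe the paper's algorithm, so this substitutes a plain exhaustive search over one cache level; the energy estimator `toy_energy`, its coefficients, and the interval fields are hypothetical stand-ins for what a simulation platform would report.

```python
from itertools import product


def toy_energy(interval, config):
    """Hypothetical energy estimate for one interval under one configuration."""
    size, line, assoc = config
    # Assumed trend: larger and more associative caches cost more per access.
    per_access = 1.0 + size / 4096 + 0.1 * assoc
    # Assumed trend: misses jump once the working set no longer fits.
    miss_rate = 0.02 if size >= interval["working_set"] else 0.30
    return interval["accesses"] * (per_access + miss_rate * 20.0)


def tune_intervals(intervals, sizes, line_sizes, assocs, energy_fn=toy_energy):
    """Pick, for each execution interval, the configuration with least energy."""
    configs = [
        (s, l, a)
        for s, l, a in product(sizes, line_sizes, assocs)
        if s >= l * a  # a cache must hold at least one full set
    ]
    return [min(configs, key=lambda c: energy_fn(iv, c)) for iv in intervals]


if __name__ == "__main__":
    intervals = [
        {"working_set": 2048, "accesses": 1000},
        {"working_set": 8192, "accesses": 1000},
    ]
    for iv, cfg in zip(intervals, tune_intervals(intervals, (2048, 4096, 8192), (32,), (1, 2))):
        print(iv["working_set"], "->", cfg)
```

Extending the same loop to two levels would mean searching the product of L1 and L2 configurations, which grows quickly and is presumably one reason a dedicated tuning algorithm is needed.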

20.

Optimizing CAM-based instruction cache designs for low-power embedded systems

Juan L. Alexander V. 《Journal of Systems Architecture》2008,54(12):1155-1163

Energy consumption and power dissipation are important concerns in the design of embedded systems, and they will become even more crucial with finer process geometry, higher frequencies, deeper pipelines, and wider issue designs. In particular, the instruction cache consumes more energy than any other processor module, especially with commonly used highly associative CAM-based implementations. Two energy-efficient approaches for highly associative CAM-based instruction cache designs are presented: a segmented wordline and a predictor-based instruction fetch mechanism. The latter is based on the fact that not all instructions in a given I-cache fetch are used, due to taken branches. The proposed Fetch Mask Predictor unit determines which instructions in a cache access will actually be used, to avoid fetching any of the other instructions. Both proposed approaches are evaluated for an embedded 4-wide issue processor in 100 nm technology. Experimental results show average I-cache energy savings of 48% and overall processor energy savings of 19%.