1.

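Wu Zhenhai, Liu Fuyan, Li Xuemin, Yi Song 《Computer Engineering and Design》2010,31(13)

To make address-space switching in an operating system kernel more efficient, the fast context switching extension (FCSE) of ARM processors is used to relocate each task's address space in two stages. The task address space is first relocated into one large single address space, and the memory management unit (MMU) then translates from that virtual address space to the physical address space. Because the translation information from the single address space to the physical address space always remains valid, the MMU does not need to flush the translation look-aside buffer (TLB) on a task switch, which improves processor efficiency during task switching. Experiments on an ARM920RT processor show that a kernel designed and implemented with this method runs stably on a virtual machine.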
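The first relocation stage described in the abstract above can be illustrated with a small sketch. It models the general ARM FCSE rule, in which a virtual address inside the lowest 32 MB is combined with a 7-bit process ID to form the modified virtual address that the MMU then translates. The function and constant names are hypothetical, and the paper's own kernel layout is not given in the abstract, so this is only an approximation of the idea.

```python
FCSE_SLOT_BITS = 25                    # each FCSE slot spans 2**25 bytes (32 MB)
FCSE_SLOT_SIZE = 1 << FCSE_SLOT_BITS
FCSE_MAX_PID = 127                     # the FCSE process ID is 7 bits wide


def fcse_modified_va(va: int, pid: int) -> int:
    """Return the modified virtual address (MVA) presented to the MMU.

    Hypothetical helper: addresses below 32 MB are relocated into the
    slot selected by `pid`; all other addresses pass through unchanged.
    """
    if not 0 <= va < (1 << 32):
        raise ValueError("va must be a 32-bit address")
    if not 0 <= pid <= FCSE_MAX_PID:
        raise ValueError("pid must fit in 7 bits")
    if va < FCSE_SLOT_SIZE:
        return va | (pid << FCSE_SLOT_BITS)
    return va


if __name__ == "__main__":
    # The same task-local address lands in different slots for different PIDs,
    # so TLB entries of different tasks do not collide.
    print(hex(fcse_modified_va(0x8000, 3)))
    print(hex(fcse_modified_va(0x8000, 4)))
```

Because every task occupies its own slot of the single address space, the page tables for that space can stay unchanged across task switches, which is why the TLB need not be flushed.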

2.

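A Fast WLAN Handoff Scheme Based on Mobile Terminals

Xu Wei, Yang Yi, Tao Jun 《Computer Engineering》2009,35(14):135-137

The delay and jitter caused when a mobile terminal hands off between APs seriously degrade the quality of real-time services. By analyzing the handoff process of mobile terminals and existing improvement schemes, a scan-triggering mechanism based on a dynamic threshold is proposed, which effectively avoids cache updates when the mobile terminal is stationary or the AP signal is good. A fragmented cache update algorithm driven by dynamic-threshold-triggered scanning is implemented on the STA. It keeps the cache up to date while lowering the cost of each cache update and effectively reducing handoff delay.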

3.

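Design and Implementation of a MIPS Memory Management Unit

Lu Shiting, You Kaidi, Han Jun, Zeng Xiaoyang 《Computer Engineering》2010,36(21):270-271,274

A memory management unit (MMU) is designed for the MIPS32 4Kc processor. The module checks the validity of processor addresses and maps virtual addresses statically or dynamically according to the address space they fall in. In hardware, the JTLB is implemented as a three-stage pipeline, and dedicated fast translation tables are designed for the processor's instruction and data ports to speed up TLB lookups. The timing between the MMU and the bus interface module follows a simplified AMBA protocol. The MMU is debugged jointly with the processor, runs the Linux operating system, and is functionally verified on an FPGA. After synthesis with DC, the module occupies about 32K equivalent logic gates.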

4.

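Research on Memory Virtualization Based on the MIPS Architecture

Cai Wanwei, Tai Yunfang, Liu Qi, Zhang Ge 《Journal of Computer Research and Development》2013,50(10)

Memory virtualization, an important means of effectively abstracting, using, and isolating a computer's physical memory in system virtualization, determines the overall performance of system virtualization. Traditional software-only memory virtualization incurs large resource overhead and has poor compatibility, while hardware-assisted memory virtualization requires redesigning the processor hardware architecture. A hardware/software cooperative memory virtualization method is proposed for MIPS processors that improves memory virtualization performance without adding hardware support. The proposed multi-layer virtual address space model not only resolves the virtualization deficiencies of MIPS processors but also improves on the performance of existing memory virtualization methods. On top of this model, a dynamically partitioned translation lookaside buffer (TLB) sharing method based on the address space identifier (ASID) is proposed, which reduces the cost of virtual machine switches. Finally, the system virtual machine VIRT-LOONGSON is implemented on the MIPS-based Loongson 3 processor. Performance tests show that the proposed method improves the performance of most test programs, reaching 3 to 5 times the performance of binary translation, and improves performance by 5% to 16% over the TLB emulation method.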

5.

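An Efficient Fast Handover Scheme for Hierarchical Mobile IPv6

Yang Yi 《Computer Technology and Development》2012,(10)

Fast handover schemes are widely recognized as effective in reducing handover latency, but their signaling exchange is relatively complex, which hinders large-scale deployment of the technique. To address this, the paper proposes an efficient fast handover scheme built on hierarchical mobile IPv6. The scheme further reduces fast handover latency by shortening the delay of duplicate address detection, and it uses application-layer and link-layer information to perform fast handover selectively, thereby lowering handover latency, simplifying the handover exchange, and reducing message traffic during handover registration. Experimental results show that both registration overhead and handover latency are lower than in the existing fast hierarchical mobile IPv6 scheme.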

6.

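An Address Translation Cache in the Interconnection Network of Parallel Computers

Zhang Jianmin, Li Tiejun, Li Sikun 《Journal of Computer Research and Development》2016,53(2):390-398

In today's large-scale parallel computers, most users of parallel programs are accustomed to programming with virtual addresses. The efficiency of translating virtual addresses to physical addresses therefore directly affects the execution performance of parallel programs, and a cache can effectively improve translation efficiency and reduce latency. An address translation cache (ATC) for the interconnection network of large-scale parallel computers is proposed. It uses embedded DRAM (embedded dynamic random access memory, eDRAM) to hold more address translation entries and thus raise the hit rate. An eDRAM refresh mechanism is designed that hides refresh operations and avoids the performance loss they would cause. The ATC implements several reliability features, such as error-correcting codes and a bypass mechanism. Running the widely accepted NPB benchmarks on 32 compute nodes yields an average ATC hit rate of 95.3% across the nodes, which indicates that the ATC design is correct and performs well. Comparison experiments against three caches implemented in conventional SRAM (static random access memory) show that cache capacity is the key factor in raising the hit rate.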

7.

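An Improved Predictive Fast Handover Technique

Li Huiqin, Gao Zhonghe, Hu Guangchang 《Microcomputer & Its Applications》2010,29(11)

The principles of mobile IPv6 fast handover are introduced, the handover process is analyzed, its shortcomings are pointed out, and an optimized scheme is proposed. The new technique uses link-layer triggers and fast router advertisements to obtain the IP address information of the access router (AR), and uses a mapping between IP addresses and MAC addresses to obtain the AR's IP address more quickly. During handover, a tunnel between base stations is established in advance for data transmission, which further improves fast handover performance.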

8.

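A Fast Context Switching Mechanism for DSPs

Liu Yueji, Zhang Shengbing, Huang Songren 《Application Research of Computers》2012,29(1):203-206

To meet the real-time control and signal processing needs of embedded systems, a fast context switching mechanism based on a DSP architecture is established, providing strong support for real-time processing. The mechanism uses two independent buses to carry address and data information respectively, so that addresses and data are transferred in parallel and the bandwidth for saving and restoring context increases. It also switches between shadow registers and general-purpose registers, which effectively reduces memory accesses. Deferred saving and early restoring of context are introduced to solve the inefficiency of nested task or interrupt calls, significantly speeding up context switches.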

9.

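A Fast Handover Scheme for Mobile IPv6 Without Duplicate Address Detection

Fang Jiabao, Wang Zhenxing, Zhang Liancheng 《Application Research of Computers》2017,34(5)

The handover performance of mobile IPv6 (MIPv6) is one of the key factors in guaranteeing quality of service on the mobile Internet. To address the long handover delay of mobile nodes in MIPv6, a fast handover scheme for mobile IPv6 that needs no duplicate address detection is proposed. The scheme repartitions the 64-bit interface identifier of an IPv6 address into a 1-bit mobility flag, the 32-bit IPv4 address of the home agent, and a 31-bit random number. This guarantees that a mobile node's interface identifier is unique across the whole network, so the node no longer needs duplicate address detection when handing over between links, which effectively improves handover performance. Simulation results show that the scheme completes seamless handover of mobile nodes without any duplicate address detection and shortens handover delay by about 25%.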
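The 1 + 32 + 31 bit split of the interface identifier described above can be sketched as simple bit packing. The abstract gives only the field widths, so the field order used here (flag in the most significant bit, then the home agent IPv4 address, then the random number) and all function names are assumptions made for illustration.

```python
import ipaddress
import secrets
from typing import Optional, Tuple


def build_interface_id(home_agent_ipv4: str, mobile: bool = True,
                       rand31: Optional[int] = None) -> int:
    """Pack a 64-bit interface identifier (assumed field order, see lead-in)."""
    ha = int(ipaddress.IPv4Address(home_agent_ipv4))   # 32-bit home agent address
    if rand31 is None:
        rand31 = secrets.randbits(31)                  # 31-bit random number
    if not 0 <= rand31 < (1 << 31):
        raise ValueError("rand31 must fit in 31 bits")
    return (int(mobile) << 63) | (ha << 31) | rand31


def parse_interface_id(iid: int) -> Tuple[bool, str, int]:
    """Unpack the identifier into (mobility flag, home agent IPv4, random part)."""
    mobile = bool(iid >> 63)
    ha = str(ipaddress.IPv4Address((iid >> 31) & 0xFFFFFFFF))
    return mobile, ha, iid & 0x7FFFFFFF


if __name__ == "__main__":
    iid = build_interface_id("192.0.2.1")
    print(f"{iid:016x}", parse_interface_id(iid))
```

Embedding the home agent's address makes identifiers from different home agents distinct by construction, which is the property the scheme relies on to skip duplicate address detection.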

10.

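A Fast Handover Scheme for Mobile IPv6 in 802.16 Networks

Liu Jian, Sun Shixin 《Fujian Computer》2007,(11):167-168

Mobile IPv6 is the standard proposed by the IETF to solve network-layer mobility, but it suffers unacceptable handover delay and packet loss during handover and cannot meet the requirements of real-time applications. Based on a study of the 802.16 link-layer handover process, this paper proposes a fast handover scheme for mobile IPv6 in 802.16 networks. The scheme uses link-layer trigger signals and tunnels to reduce the delays of mobile IPv6 movement detection, care-of address configuration, and binding update, as well as packet loss during handover.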

11.

Moving address translation closer to memory in distributed shared-memory multiprocessors

Qiu X. Dubois M. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(7):612-623

To support a global virtual memory space, an architecture must translate virtual addresses dynamically. In current processors, the translation is done in a TLB (translation lookaside buffer), before or in parallel with the first-level cache access. As processor technology improves at a rapid pace and the working sets of new applications grow insatiably, the latency and bandwidth demands on the TLB are difficult to meet, especially in multiprocessor systems, which run larger applications and are plagued by the TLB consistency problem. We describe and compare five options for virtual address translation in the context of distributed shared memory (DSM) multiprocessors, including CC-NUMAs (cache-coherent non-uniform memory access architectures) and COMAs (cache only memory access architectures). In CC-NUMAs, moving the TLB to shared memory is a bad idea because page placement, migration, and replication are all constrained by the virtual page address, which greatly affects processor node access locality. In the context of COMAs, the allocation of pages to processor nodes is not as critical because memory blocks can dynamically migrate and replicate freely among nodes. As the address translation is done deeper in the memory hierarchy, the frequency of translations drops because of the filtering effect. We also observe that the TLB is very effective when it is merged with the shared memory, because of the sharing and prefetching effects and because there is no need to maintain TLB consistency. Even if the effectiveness of the TLB merged with the shared memory is very high, we also show that the TLB can be removed in a system with address translation done in memory because the frequency of translations is very low.

12.

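A TLB Structure Optimization Method

He Jun, Zhang Xiaodong, Guo Yong 《Computer Engineering》2012,38(21):253-256

To address the insufficient translation lookaside buffer (TLB) performance of a domestic processor, the existing virtual-to-physical address translation flow is analyzed and a method of optimizing the data TLB structure is proposed: a separate cache holds the virtual-to-physical mappings of third-level page table base addresses, which reduces the eviction of higher-level page table mappings by lower-level ones. SPEC CPU2000 results show that nearly half of the benchmarks reduce the number of data TLB DM events by more than 60%, and a few by more than 90%, effectively lowering the data TLB miss rate.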

13.

Code Transformations for TLB Power Reduction

Reiley Jeyapaul Aviral Shrivastava 《International journal of parallel programming》2010,38(3-4):254-276

The Translation Look-aside Buffer (TLB) is a very important part in the hardware support for virtual memory management implementation of high performance embedded systems. The TLB, though small, is frequently accessed, and therefore not only consumes significant energy, but also is one of the important thermal hot-spots in the processor. Recently, several circuit and microarchitectural implementations of TLBs have been proposed to reduce TLB power. One simple, yet effective TLB design for power reduction is the Use-Last TLB architecture proposed in IEEE J Solid State Circuits, 1190–1199, (2004). The Use-Last TLB architecture reduces the power consumption when the last page is accessed again. In this work, we develop code transformation techniques to reduce the page switchings in data cache accesses and propose an efficient page-aware code placement technique to enhance the energy reduction capabilities achieved by the Use-Last TLB architecture for instruction cache accesses. Our comprehensive page switch reduction algorithm results in an average of 39% reduction in the data-TLB page switching, and our code placement heuristic results in an average of 76% reduction in the instruction-TLB page switchings with negligible impact on the performance on benchmarks from MiBench, Multimedia, DSPStone and BDTI suites. The reduced page switch count through our techniques achieves an equivalent power savings, above and beyond the reduction achieved by the Use-Last TLB architecture implementation.

14.

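Design and Implementation of a High-Speed TLB

Liu Zonglin, Wu Hucheng, Tang Tao, Dang Guibin 《Computer Engineering and Applications》2007,43(16):1-3

To speed up the translation from linear addresses to physical addresses in a microprocessor, a high-speed TLB structure is proposed. The structure uses full-custom CAM and SRAM arrays, with compact sense-amplifier logic designed around the output characteristics of the CAM and SRAM cells, which effectively increases TLB read speed. Tape-out testing shows that the design is correct and reliable and keeps address translation delay at about 1 ns.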

15.

CASA: A New IFU Architecture for Power-Efficient Instruction Cache and TLB Designs

Sun Hanxin, Yang Kunpeng, Zhao Yulai, Tong Dong, Cheng Xu 《Journal of Computer Science and Technology》2008,23(1):141-153

The instruction fetch unit (IFU) usually dissipates a considerable portion of total chip power. In traditional IFU architectures, as soon as the fetch address is generated, it needs to be sent to the instruction cache and TLB arrays for instruction fetch. Since limited work can be done by the power-saving logic after the fetch address generation and before the instruction fetch, previous power-saving approaches usually suffer from the unnecessary restrictions from traditional IFU architectures. In this paper, we present CASA, a new power-aware IFU architecture, which effectively reduces the unnecessary restrictions on the power-saving approaches and provides sufficient time and information for the power-saving logic of both instruction cache and TLB. By analyzing, recording, and utilizing the key information of the dynamic instruction flow early in the front-end pipeline, CASA brings the opportunity to maximize the power efficiency and minimize the performance overhead. Compared to the baseline configuration, the leakage and dynamic power of instruction cache is reduced by 89.7% and 64.1% respectively, and the dynamic power of instruction TLB is reduced by 90.2%. Meanwhile the performance degradation in the worst case is only 0.63%. Compared to previous state-of-the-art power-saving approaches, the CASA-based approach saves IFU power more effectively, incurs less performance overhead and achieves better scalability. It is promising that CASA can stimulate further work on architectural solutions to power-efficient IFU designs.

16.

A complete compiler approach to auto-parallelizing C programs for multi-DSP systems

Franke B. O'Boyle M.F.P. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(3):234-245

Auto-parallelizing compilers for embedded applications have been unsuccessful due to the widespread use of pointer arithmetic and the complex memory model of multiple-address space digital signal processors (DSPs). This work develops, for the first time, a complete auto-parallelization approach, which overcomes these issues. It first combines a pointer conversion technique with a new modulo elimination transformation for program recovery enabling later parallelization stages. Next, it integrates a novel data transformation technique that exposes the processor location of partitioned data. When this is combined with a new address resolution mechanism, it generates efficient programs that run on multiple address spaces without using message passing. Furthermore, as DSPs do not possess any data cache structure, an optimization is presented which transforms the program to both exploit remote data locality and local memory bandwidth. This parallelization approach is applied to the DSPstone and UTDSP benchmark suites, giving an average speedup of 3.78 on four Analog Devices TigerSHARC TS-101 processors.

17.

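Design of a Memory Management Unit for Embedded Systems

Zhu Hefei, Lu Chao, Zhou Xiaofang, Min Hao, Zhou Dian 《Computer Engineering and Applications》2007,43(1):96-99

A memory management unit for a 32-bit RISC embedded processor is implemented, targeting the Linux operating system. A pre-compare circuit added to the instruction TLB improves address translation efficiency when the processor accesses the same virtual page consecutively. On a TLB miss, dedicated hardware performs the page table walk and TLB refill, which is clearly faster than doing so in software. The MMU designed in the paper works well with Linux and handles address mapping and memory access permission management.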
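The pre-compare idea in the abstract above can be modelled in software as remembering the last translated virtual page and skipping the full TLB lookup when the next fetch falls in the same page. This is only a behavioural sketch under assumptions: the 4 KB page size, the class name `PreCompareTLB`, and the dictionary standing in for the TLB array are all hypothetical, and the paper's actual circuit is not described in the abstract.

```python
PAGE_SHIFT = 12                         # assumed 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class PreCompareTLB:
    """Behavioural model: a last-page register in front of a TLB lookup."""

    def __init__(self, entries):
        self.entries = dict(entries)    # virtual page number -> physical frame number
        self.last_vpn = None            # page of the previous translation
        self.last_pfn = None
        self.full_lookups = 0           # how often the full TLB had to be searched

    def translate(self, va: int) -> int:
        vpn = va >> PAGE_SHIFT
        if vpn != self.last_vpn:
            # Different page: search the TLB (a KeyError stands in for a TLB miss).
            self.full_lookups += 1
            self.last_pfn = self.entries[vpn]
            self.last_vpn = vpn
        # Same page as last time: reuse the remembered frame number.
        return (self.last_pfn << PAGE_SHIFT) | (va & PAGE_MASK)


if __name__ == "__main__":
    tlb = PreCompareTLB({0x10: 0x80, 0x11: 0x90})
    for addr in (0x10004, 0x10008, 0x1000C, 0x11000):
        print(hex(addr), "->", hex(tlb.translate(addr)))
    print("full lookups:", tlb.full_lookups)
```

Sequential instruction fetches mostly stay within one page, so most translations in this model are served by the single comparison rather than by the full lookup.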

18.

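Sector Cache in Embedded Processors: A Trade-off Between Performance and Area

Zuo Qi, Fu Yuzhuo, Lu Xin 《Journal of Chinese Computer Systems》2006,27(1):166-170

Sector caches were used in some of the earliest computer systems to adopt cache technology. Although a sector cache performs slightly worse than a conventional cache, at the same capacity the sector organization needs far fewer tag bits. Because embedded processors have very strict chip-area requirements, this advantage is even more pronounced there. This paper uses simulation to analyze in detail the performance of sector-organized caches in embedded application environments. The results show that sensible use of the sector organization effectively reduces the number of tag bits at a small performance cost, so a sector cache can minimize cache controller area while still meeting performance requirements. The paper concludes that the sector cache is an effective means for embedded processor designers to trade off performance against area.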
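The tag-bit saving mentioned above can be made concrete with a little arithmetic. The sketch below assumes a direct-mapped cache, power-of-two sizes, 32-bit addresses, and one valid bit per block. These parameters and the function name are illustrative choices, not figures from the paper.

```python
def tag_store_bits(cache_bytes: int, block_bytes: int,
                   blocks_per_sector: int = 1, addr_bits: int = 32) -> int:
    """Bits of tag plus valid-bit storage for a direct-mapped cache.

    With blocks_per_sector == 1 this is a conventional cache (one tag per
    block); with more blocks per sector, one tag is shared by the whole
    sector and each block keeps only its own valid bit.
    """
    sector_bytes = block_bytes * blocks_per_sector
    num_sectors = cache_bytes // sector_bytes
    offset_bits = (sector_bytes - 1).bit_length()   # log2 for powers of two
    index_bits = (num_sectors - 1).bit_length()
    tag_bits = addr_bits - index_bits - offset_bits
    return num_sectors * (tag_bits + blocks_per_sector)


if __name__ == "__main__":
    conventional = tag_store_bits(8 * 1024, 16)                   # 512 tags
    sectored = tag_store_bits(8 * 1024, 16, blocks_per_sector=4)  # 128 tags
    print(conventional, sectored)
```

For this assumed 8 KB cache with 16-byte blocks, grouping four blocks per sector cuts the tag store from 10240 bits to 2944 bits, at the cost of evicting a whole sector when its tag changes.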

19.

Translation-lookaside buffer consistency

Teller P.J. 《Computer》1990,23(6):26-36

Nine solutions to the cache consistency problem for shared-memory multiprocessors with multiple translation-lookaside buffers (TLBs) are described. A TLB's function is defined, and it is shown how TLB inconsistency arises in uniprocessor and multiprocessor architectures. The problem of TLB consistency is solved in a uniprocessor and in multiprocessors with a shared bus, virtual-address caches, and hardware cache consistency. Solutions that can be implemented in multiprocessors with more general interconnection networks and without hardware cache consistency are presented.

20.

H-NMRU: An Efficient Cache Replacement Policy with Low Area

Sourav Roy 《International journal of parallel programming》2010,38(3-4):277-287

In present multi-core devices, the individual processors do not need to operate at the highest possible frequencies. Instead there is a need to reduce the power, complexity and area of individual processor components like caches. In this paper we propose a low area, high performance cache replacement policy for embedded processors called Hierarchical Non-Most-Recently-Used (H-NMRU). The H-NMRU is a parameterizable policy where we can trade off performance with area. We extended the Dinero cache simulator with the H-NMRU policy and performed architectural exploration with a set of cellular and multimedia benchmarks. On a 16 way cache, a two level H-NMRU policy where the first and second levels have 8 and 2 branches, respectively, performs as well as the Pseudo-LRU policy with storage area saving of 27%. Compared to true LRU, H-NMRU on a 16 way cache saves a large amount of area (82%) with a marginal increase in cache misses (3%). Similar results were also observed on other cache-like structures such as branch target buffers. Therefore, the two level H-NMRU cache replacement policy (with associativity/2 and 2 branches on the two levels) is a very attractive option for caches on embedded processors with associativities greater than 4. We present a case-study where it can be used on the L2 cache with substantial gain in performance and area for single and dual core platforms.