1.

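Chaoyang JIA, Dunbo ZHANG, Qiong WANG, Li SHEN 《Computer Engineering & Science》2020,42(9):1521-1528

General-purpose graphics processing units (GPGPUs) are widely used in modern high-performance computing systems. The single-instruction, multiple-thread (SIMT) execution model of GPGPUs leads to a low TLB hit rate, especially for irregular applications. A page walk cache (PWC) is therefore needed to reduce the number of page table accesses actually performed. A traditional PWC holds a great deal of redundant information, and because its capacity is limited, it is not very effective in practice. This paper analyzes the information redundancy in the traditional PWC and proposes a new structure, the compressed PWC. Without increasing lookup overhead, the compressed PWC eliminates the redundant information entirely and saves space, so the PWC can record more page table access history. This reduces the number of page table accesses during address translation. Test results show that, compared with a traditional PWC of the same capacity, the compressed PWC significantly reduces the time overhead of virtual-to-physical address translation.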
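For readers unfamiliar with page walk caches, here is a minimal sketch of a conventional PWC. It assumes an x86-64-style four-level radix page table (9 index bits per level, 4 KiB pages) and a fully associative LRU organization. The class `PageWalkCache` is hypothetical and models only access counts. It is not the compressed PWC proposed in the paper. A hit on the longest cached upper-level prefix lets the walker skip those levels.

```python
from collections import OrderedDict

LEVELS = 4        # L4, L3, L2, L1 (x86-64 style, assumed)
LEVEL_BITS = 9    # index bits per level (assumed)
PAGE_SHIFT = 12   # 4 KiB pages (assumed)


def indices(va: int) -> tuple[int, ...]:
    """Split a virtual address into its (L4, L3, L2, L1) table indices."""
    vpn = va >> PAGE_SHIFT
    mask = (1 << LEVEL_BITS) - 1
    return tuple((vpn >> (LEVEL_BITS * lvl)) & mask
                 for lvl in reversed(range(LEVELS)))


class PageWalkCache:
    """Hypothetical fully associative PWC with LRU replacement.

    Keys are upper-level index prefixes: (L4,), (L4, L3) or (L4, L3, L2).
    A hit on a prefix of length d means the walker already knows the
    address of the next table down and can skip d memory accesses.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()

    def walk(self, va: int) -> int:
        """Return the page-table memory accesses needed for one TLB miss."""
        idx = indices(va)
        skipped = 0
        for depth in range(LEVELS - 1, 0, -1):     # longest prefix first
            key = idx[:depth]
            if key in self.entries:
                self.entries.move_to_end(key)      # refresh LRU position
                skipped = depth
                break
        for depth in range(skipped + 1, LEVELS):   # cache prefixes just walked
            self._insert(idx[:depth])
        return LEVELS - skipped

    def _insert(self, key: tuple[int, ...]) -> None:
        self.entries[key] = None                   # table address not modeled
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)       # evict least recently used


pwc = PageWalkCache(capacity=16)
print(pwc.walk(0x7F00_0000_1000))   # cold miss: 4 accesses
print(pwc.walk(0x7F00_0000_2000))   # same L4/L3/L2 prefix: 1 access
```

The redundancy the paper targets is visible even here: every full-length key repeats the same L4 and L3 indices.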

2.

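Research on ARM Support for Virtual Address Translation in the SDSM Operating System

Qingmin WANG, Fuyan LIU, Jia LIAN 《Microcomputer Information》2007,23(11):170-172

The single data storage model (SDSM) operating system is a new type of operating system with only one data type: the file. This paper draws on the address mapping provided by the ARM memory management unit (MMU), on the way the ARM MMU constructs page tables, and on the characteristics of memory management in the SDSM operating system. On that basis it builds a segment table and presents a method for translating SDSM virtual addresses with the ARM first-level page table. The result shows that ARM page tables can support virtual address translation for the SDSM OS.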
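For background, here is a minimal sketch of how an ARM first-level (short-descriptor) page table resolves a 1 MiB section. VA[31:20] selects one of 4096 32-bit descriptors, and descriptor bits [31:20] supply the physical section base. The sketch handles only section descriptors (type bits `0b10`). It ignores second-level tables, supersections, domains and access permissions. The helper names are hypothetical, not taken from the paper.

```python
SECTION_SHIFT = 20            # VA[31:20] indexes the first-level table
FIRST_LEVEL_ENTRIES = 4096    # 4096 entries x 1 MiB = 4 GiB
SECTION_TYPE = 0b10           # descriptor bits [1:0] for a section
SECTION_BASE_MASK = 0xFFF0_0000
SECTION_OFFSET_MASK = 0x000F_FFFF


class TranslationFault(Exception):
    """Raised when the first-level descriptor is not a section."""


def make_section_descriptor(pa_base: int) -> int:
    """Build a bare section descriptor (AP, domain and C/B bits left at 0)."""
    return (pa_base & SECTION_BASE_MASK) | SECTION_TYPE


def translate_section(table: list[int], va: int) -> int:
    """Translate a 32-bit VA using only first-level section descriptors."""
    desc = table[va >> SECTION_SHIFT]
    if (desc & 0b11) != SECTION_TYPE:
        raise TranslationFault(f"no section mapping for VA {va:#010x}")
    return (desc & SECTION_BASE_MASK) | (va & SECTION_OFFSET_MASK)


table = [0] * FIRST_LEVEL_ENTRIES
table[0xC00] = make_section_descriptor(0x3000_0000)  # VA 0xC00xxxxx -> PA 0x300xxxxx
print(hex(translate_section(table, 0xC00A_BCDE)))    # 0x300abcde
```

A segment table like the one the paper builds would sit above this layer and decide which descriptors to install.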

3.

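An Optimization Method for the EPT Mechanism Based on Memory Virtualization Technology

Shao LI, Shengxian LUO 《Guangdong Computer & Telecommunication》2011,1(2):0-0

KVM, a virtual machine solution built on the Linux kernel, is increasingly widely used. This paper aims to reduce the address translation overhead incurred while guest virtual machines run. It proposes an optimization method that performs address translation with EPT (Extended Page Table) page tables based on Intel VT technology, and experiments confirm that the method is effective.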
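The overhead that EPT optimizations target comes from the two-dimensional page walk. This sketch counts worst-case memory references for one TLB miss, assuming no TLB or paging-structure caches. Each of the n guest page-table entries lives at a guest-physical address, so reading it costs an m-level EPT walk plus the read itself. The final guest-physical address needs one more EPT walk, giving n(m + 1) + m = (n + 1)(m + 1) - 1. This is the standard bound, not the paper's optimization.

```python
def nested_walk_refs(guest_levels: int, host_levels: int) -> int:
    """Worst-case memory references for one TLB miss under nested paging.

    Each guest page-table entry sits at a guest-physical address, so reading
    it costs a full host (EPT) walk plus the read itself; the final
    guest-physical address of the data needs one more host walk.
    """
    per_guest_level = host_levels + 1
    return guest_levels * per_guest_level + host_levels


def native_walk_refs(levels: int) -> int:
    """Memory references for a native (non-virtualized) page walk."""
    return levels


print(native_walk_refs(4), nested_walk_refs(4, 4))   # 4 24
```

The jump from 4 to 24 references per miss explains why EPT work tends to focus on avoiding or shortening walks.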

4.

Research and Implementation of a Virtual Hash Page Table Based on the IA-64 Architecture

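Haiyan CHEN, Rangyu DENG, Zuocheng XING 《Computer Engineering & Science》2006,28(8):101-104

The virtual hash page table (VHPT) is the structure that high-performance microprocessor systems use to map virtual addresses to physical addresses, and it is one of the key technologies of memory management. This paper first discusses the IA-64 microprocessor address space. It then analyzes the application requirements of the single address space (SAS) and multiple address space (MAS) models and studies the two page-table mapping mechanisms, long format and short format. Based on these two formats, it implements hash-address algorithms for a 64-bit virtual address space, improving virtual address translation performance. Simulation results show that the design is compatible with the IA-64 architecture.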
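Here is a minimal sketch of a hashed page table lookup in the spirit of the long-format VHPT. Each entry carries a full tag (region ID and virtual page number), so entries from different address spaces can share buckets. The 16 KiB page size, bucket count, chaining and especially the hash function are assumptions for illustration. The actual IA-64 `thash` hash is implementation-specific and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Optional

PAGE_SHIFT = 14      # 16 KiB pages (assumed)
TABLE_SIZE = 1024    # number of hash buckets (assumed power of two)


@dataclass
class Entry:
    region_id: int   # tag part 1: address-space / region identifier
    vpn: int         # tag part 2: virtual page number
    pfn: int         # physical frame number


@dataclass
class HashedPageTable:
    buckets: list[list[Entry]] = field(
        default_factory=lambda: [[] for _ in range(TABLE_SIZE)])

    @staticmethod
    def _hash(region_id: int, vpn: int) -> int:
        # Hypothetical hash: fold the region ID into the VPN, keep low bits.
        return (vpn ^ (region_id * 0x9E37)) & (TABLE_SIZE - 1)

    def insert(self, region_id: int, va: int, pfn: int) -> None:
        vpn = va >> PAGE_SHIFT
        self.buckets[self._hash(region_id, vpn)].append(Entry(region_id, vpn, pfn))

    def translate(self, region_id: int, va: int) -> Optional[int]:
        vpn = va >> PAGE_SHIFT
        for e in self.buckets[self._hash(region_id, vpn)]:   # walk the chain
            if e.region_id == region_id and e.vpn == vpn:    # full tag match
                return (e.pfn << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1))
        return None   # miss: a real system would raise a page fault here
```

Storing the region ID in the tag is what allows one table to serve multiple address spaces. A single-address-space design could drop that field.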

5.

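A Method for Reducing TLB Miss Overhead to Improve 64-bit Linux System Performance

Xianchao XU 《Computer Engineering》2006,32(2):70-72

This paper proposes FAST_TLB_REFILL (fast TLB refill), a method for reducing TLB miss overhead in 64-bit Linux. Test results show that it cuts TLB miss handling time by more than 30%. Programs with frequent TLB misses gain 1% to 7% in performance.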
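The two reported figures can be related with Amdahl's law. This is a back-of-the-envelope check, not from the paper. Suppose TLB-miss handling takes a fraction f of run time and its cost falls by a fraction r. The overall speedup is then 1 / (1 - f r). With r = 0.30, a 1% to 7% gain corresponds to f of roughly 3% to 22%. Because the paper reports a reduction of more than 30%, these values are upper bounds.

```python
def overall_speedup(f: float, r: float) -> float:
    """Amdahl's law: a fraction f of run time becomes r (0..1) cheaper."""
    return 1.0 / (1.0 - f * r)


def fraction_needed(gain: float, r: float) -> float:
    """Share of run time that TLB-miss handling must take to yield `gain`
    (e.g. 0.07 for 7%) when that handling becomes r (e.g. 0.30) cheaper."""
    return (1.0 - 1.0 / (1.0 + gain)) / r


for gain in (0.01, 0.07):
    print(f"{gain:.0%} gain -> f = {fraction_needed(gain, 0.30):.3f}")
```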

6.

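Design and Implementation of Inverted Page Table Technology in a Page Migration System

Jing DU, Huadong DAI, Xuejun YANG 《Computer Science》2004,31(12):210-213

Page migration is an important strategy for memory optimization in CC-NUMA systems because it dynamically exploits data locality. Implementing a page migration policy requires translating physical addresses to virtual addresses in the virtual memory system. The traditional approach traverses the virtual address spaces of all processes, which is slow and costly. To address this problem, this paper presents the inverted page table, a technique for efficient physical-to-virtual address translation, and focuses on its design, implementation and maintenance.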
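This minimal sketch contrasts the two approaches. The naive one scans every process's page table. The other keeps a reverse table indexed by physical frame, updated on every map and unmap. Page tables are modeled as `{pid: {vpn: pfn}}` dictionaries, and `ReverseMap` is a hypothetical name. Neither is the paper's kernel data layout.

```python
from collections import defaultdict

# Per-process page tables, modeled as {pid: {vpn: pfn}} (assumed layout).
PageTables = dict[int, dict[int, int]]


def owners_by_scan(tables: PageTables, pfn: int) -> list[tuple[int, int]]:
    """Naive lookup: traverse every process's whole address space."""
    return sorted((pid, vpn)
                  for pid, table in tables.items()
                  for vpn, frame in table.items() if frame == pfn)


class ReverseMap:
    """Hypothetical reverse table: physical frame -> {(pid, vpn), ...}."""

    def __init__(self) -> None:
        self.by_frame: defaultdict = defaultdict(set)

    def map(self, pid: int, vpn: int, pfn: int) -> None:
        self.by_frame[pfn].add((pid, vpn))       # maintain on every map ...

    def unmap(self, pid: int, vpn: int, pfn: int) -> None:
        self.by_frame[pfn].discard((pid, vpn))   # ... and on every unmap

    def owners(self, pfn: int) -> list[tuple[int, int]]:
        return sorted(self.by_frame.get(pfn, ()))
```

The scan costs time proportional to all mapped pages in the system. The reverse lookup costs time proportional to the number of sharers of one frame, at the price of extra bookkeeping on every mapping change.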

7.

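Research and Implementation of EPT-Based Memory Virtualization

Yong LI, Yudong GUO, Xiaorui WANG, Guang SHI 《Computer Engineering and Design》2010,31(18)

This paper aims to reduce the memory virtualization overhead of the virtual machine monitor and improve memory virtualization performance. It analyzes two memory virtualization mechanisms, focusing on the one based on Intel extended page tables (EPT). It compares the strengths and weaknesses of two EPT-based memory virtualization schemes and analyzes the factors that affect memory virtualization performance. To handle EPT page faults, it proposes a dynamic memory allocation scheme based on a page pool. The implementation shows that using EPT for memory virtualization simplifies the design process and effectively improves memory virtualization performance.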
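One way to picture a page-pool allocation scheme is the minimal sketch below. It assumes a hypothetical `PagePool` that pre-allocates host frames in batches, so an EPT-violation handler can back a guest frame without calling the host allocator on every fault. The batch size, the `host_alloc` callback and the dictionary standing in for the EPT are all assumptions. The paper's actual pool policy is not described in the abstract.

```python
from typing import Callable


class PagePool:
    """Hypothetical pool of pre-allocated host frames, refilled in batches."""

    def __init__(self, host_alloc: Callable[[int], list[int]], batch: int = 64):
        self.host_alloc = host_alloc   # assumed: returns n free host frame numbers
        self.batch = batch
        self.free: list[int] = []

    def take(self) -> int:
        if not self.free:                              # pool empty: one bulk refill
            self.free.extend(self.host_alloc(self.batch))
        return self.free.pop()


def handle_ept_violation(ept: dict[int, int], pool: PagePool, gfn: int) -> int:
    """Back guest frame `gfn` with a host frame on first touch (dict = EPT)."""
    if gfn not in ept:
        ept[gfn] = pool.take()
    return ept[gfn]
```

With a batch of 64, the host allocator runs once per 64 first-touch faults instead of once per fault.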

8.

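MMU Design for the Linux Operating System

Chao LU, Hefei ZHU, Zhaoqian CHEN, Xiaofang ZHOU 《Journal of Chinese Computer Systems》2007,28(4):738-741

Targeting the memory management mechanism of the Linux operating system, this paper designs an MMU that walks the page table and refills the TLB automatically on a TLB miss. It also builds a dedicated verification and debugging platform for the MMU. Simulation shows that the MMU works well with Linux and translates virtual addresses to physical addresses efficiently.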
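This is a behavioral sketch of what such an MMU does on a TLB miss. It assumes a 32-bit two-level page table (10 + 10 + 12 bits, as on 32-bit x86 without PAE), 4-byte PTEs with bit 0 as the valid bit, and FIFO TLB replacement. `WalkingMMU` and the dictionary memory model are hypothetical. The abstract does not give the paper's PTE format or TLB organization.

```python
PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1
VALID = 0x1                      # assumed PTE valid bit


class PageFault(Exception):
    pass


class WalkingMMU:
    """TLB backed by a hardware-style two-level page table walker.

    Assumed layout: VA[31:22] -> L1 index, VA[21:12] -> L2 index,
    VA[11:0] -> offset; each PTE is a 32-bit word: frame address | flags.
    """

    def __init__(self, memory: dict[int, int], root: int, tlb_entries: int = 8):
        self.memory = memory             # physical address -> 32-bit word
        self.root = root                 # physical address of the L1 table
        self.tlb: dict[int, int] = {}    # vpn -> pfn, insertion-ordered
        self.tlb_entries = tlb_entries
        self.walks = 0

    def _read_pte(self, addr: int) -> int:
        pte = self.memory.get(addr, 0)
        if not pte & VALID:
            raise PageFault(hex(addr))
        return pte

    def translate(self, va: int) -> int:
        vpn = va >> PAGE_SHIFT
        if vpn not in self.tlb:                        # TLB miss: walk the table
            self.walks += 1
            l1 = self._read_pte(self.root + ((va >> 22) & 0x3FF) * 4)
            l2 = self._read_pte((l1 & ~OFFSET_MASK) + ((va >> 12) & 0x3FF) * 4)
            if len(self.tlb) >= self.tlb_entries:      # FIFO replacement
                self.tlb.pop(next(iter(self.tlb)))
            self.tlb[vpn] = l2 >> PAGE_SHIFT           # refill the TLB
        return (self.tlb[vpn] << PAGE_SHIFT) | (va & OFFSET_MASK)
```

In hardware the walk replaces a software trap handler. That is the main reason such designs translate faster than a software-refilled TLB.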

9.

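Design of Android-Based Monitoring Software for Industrial Control

Xin PENG, Zhang TAN, Wenjun HUANG, Xinghua WANG 《Computer Engineering》2013,39(7)

This paper proposes a design for mobile monitoring software for industrial control on the Android platform. Using object-oriented and layered methods, it develops monitoring software that displays industrial process flow diagrams, pushes alarms and performs security authentication. A mobile terminal server is added to the existing plant network topology to preserve system compatibility. Android NDK development, multi-level page table mapping and asynchronous network transmission speed up the transfer of tag point data and keep the software real-time. Test results show that the scheme removes the limitation that traditional supervisory (host-computer) software can run only on PCs, and that it has good usability.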

10.

Design and Implementation of a MIPS Memory Management Unit

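Shiting LU, Kaidi YOU, Jun HAN, Xiaoyang ZENG 《Computer Engineering》2010,36(21):270-271,274

This paper designs the memory management unit (MMU) of the MIPS32 4Kc processor. The module checks whether processor addresses are legal and maps virtual addresses statically or dynamically depending on the address space they fall in. In hardware, the JTLB is implemented as a three-stage pipeline, and separate micro-TLBs serve the processor's instruction and data ports to speed up TLB lookups. The interface timing between the MMU and the bus interface module follows a simplified AMBA protocol. The MMU was debugged together with the processor while running the Linux operating system, and its functionality was verified on an FPGA. After synthesis with DC, the module occupies about 32K equivalent logic gates.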
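The static/dynamic split described above follows the standard MIPS32 segment layout. This minimal sketch reports which addresses are translated by fixed arithmetic and which must go to the TLB, and it raises an error for user-mode access to kernel segments. It is simplified: Status.ERL, debug segments and the supervisor segment are ignored, and `static_translate` is a hypothetical name.

```python
from typing import Optional

KSEG0, KSEG2 = 0x8000_0000, 0xC000_0000


class AddressError(Exception):
    """User-mode access to a kernel-only segment."""


def static_translate(va: int, kernel_mode: bool) -> Optional[int]:
    """Return the PA for unmapped segments, or None if the TLB must be used.

    Simplified MIPS32 map:
      kuseg 0x00000000-0x7FFFFFFF  mapped via TLB
      kseg0 0x80000000-0x9FFFFFFF  unmapped, cached:   PA = VA - 0x80000000
      kseg1 0xA0000000-0xBFFFFFFF  unmapped, uncached: PA = VA - 0xA0000000
      kseg2/kseg3 0xC0000000-0xFFFFFFFF  mapped via TLB
    """
    va &= 0xFFFF_FFFF
    if va < KSEG0:
        return None                      # kuseg: dynamic (TLB) mapping
    if not kernel_mode:
        raise AddressError(f"{va:#010x} is kernel-only")
    if va < KSEG2:
        return va & 0x1FFF_FFFF          # kseg0/kseg1: strip the top three bits
    return None                          # kseg2/kseg3: dynamic (TLB) mapping
```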

11.

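Inverted Page Table Technology for Page Migration in CC-NUMA Systems

Jing DU, Huadong DAI, Xuejun YANG 《Computer Engineering》2005,31(6):76-78,116

Page migration is an important strategy for optimizing memory-access locality in CC-NUMA systems. Implementing it requires translating physical addresses to virtual addresses in the virtual memory system. The traditional approach traverses the virtual address spaces of all processes, which is slow and costly. To address this problem, this paper presents the inverted page table, a technique for efficient physical-to-virtual address translation inside the operating system kernel, and describes in detail its use in page migration policies.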
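This minimal sketch shows why a reverse table helps page migration. After the frame contents are copied, only the page-table entries listed for that frame need to be repointed, with no scan of every address space. The dictionary layouts (`PageTables`, `FrameOwners`) and `migrate_page` are assumptions for illustration. A real kernel would also lock the PTEs and shoot down stale TLB entries.

```python
PageTables = dict[int, dict[int, int]]         # pid -> {vpn: pfn} (assumed)
FrameOwners = dict[int, set[tuple[int, int]]]  # pfn -> {(pid, vpn)} (assumed)


def migrate_page(tables: PageTables, owners: FrameOwners,
                 frames: dict[int, bytes], old: int, new: int) -> int:
    """Move frame `old` to `new` (e.g. on a closer NUMA node) and repoint
    every PTE that maps it; returns the number of PTEs updated."""
    frames[new] = frames.pop(old)              # copy contents, free old frame
    sharers = owners.pop(old, set())
    for pid, vpn in sharers:                   # visit only the real sharers
        tables[pid][vpn] = new
        # A real kernel would invalidate stale TLB entries for (pid, vpn) here.
    owners[new] = sharers
    return len(sharers)
```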

12.

Code Transformations for TLB Power Reduction

Reiley Jeyapaul, Aviral Shrivastava 《International Journal of Parallel Programming》2010,38(3-4):254-276

The Translation Look-aside Buffer (TLB) is a very important part of the hardware support for virtual memory management in high-performance embedded systems. Although small, the TLB is accessed frequently. It therefore consumes significant energy and is also one of the major thermal hot spots in the processor. Several circuit-level and microarchitectural TLB implementations have recently been proposed to reduce TLB power. One simple yet effective design is the Use-Last TLB architecture proposed in IEEE J Solid State Circuits, 1190–1199, (2004), which reduces power consumption when the most recently accessed page is accessed again. In this work, we develop code transformation techniques that reduce page switches in data cache accesses. We also propose an efficient page-aware code placement technique that strengthens the energy reduction of the Use-Last TLB architecture for instruction cache accesses. Our comprehensive page switch reduction algorithm reduces data-TLB page switches by 39% on average. Our code placement heuristic reduces instruction-TLB page switches by 76% on average. Both have negligible performance impact on benchmarks from the MiBench, Multimedia, DSPStone and BDTI suites. The reduction in page switches yields equivalent power savings, beyond those achieved by the Use-Last TLB architecture itself.
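A Use-Last TLB saves energy only when consecutive accesses hit the same page, so the quantity to minimize is the number of page switches in the access stream. This minimal sketch counts them for one simple transformation, loop interchange, on a hypothetical 64 x 1024 array of 4-byte elements with 4 KiB pages. In this layout each row is exactly one page. The sketch illustrates the metric only. It is not the paper's page switch reduction algorithm.

```python
PAGE_SIZE = 4096         # bytes (assumed)
ELEM = 4                 # bytes per element, e.g. int32 (assumed)
ROWS, COLS = 64, 1024    # each row is exactly one page in this example


def page_switches(addresses) -> int:
    """Count accesses whose page differs from the previous access's page,
    i.e. the events a Use-Last style TLB cannot serve from its last entry."""
    switches, last = 0, None
    for addr in addresses:
        page = addr // PAGE_SIZE
        if last is not None and page != last:
            switches += 1
        last = page
    return switches


def row_major():
    """Inner loop over columns: stays within one page for a whole row."""
    return ((r * COLS + c) * ELEM for r in range(ROWS) for c in range(COLS))


def column_major():
    """Inner loop over rows: every access lands on a different page."""
    return ((r * COLS + c) * ELEM for c in range(COLS) for r in range(ROWS))


print(page_switches(column_major()), page_switches(row_major()))  # 65535 63
```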

13.

UCat: heterogeneous memory management for unikernels

Chong TIAN, Haikun LIU, Xiaofei LIAO, Hai JIN 《Frontiers of Computer Science》2023,17(1):171204

Unikernels provide an efficient and lightweight way to deploy cloud computing services in application-specialized, single-address-space virtual machines (VMs). Hundreds of unikernel-based VMs can be deployed efficiently on a single physical server. On such a cloud computing platform, main memory is the primary bottleneck resource for high-density application deployment. Recently, non-volatile memory (NVM) technologies have become increasingly popular in cloud data centers because they offer extremely large memory capacity at low cost. However, many challenges remain in using NVM for unikernel-based VMs, such as the difficulty of heterogeneous memory allocation and the high performance overhead of address translation. In this paper, we present UCat, a heterogeneous memory management mechanism that supports multi-grained memory allocation for unikernels. We propose front-end/back-end cooperative address space mapping to expose host memory heterogeneity to unikernels. UCat uses large pages to reduce the cost of two-layer address translation in virtualized environments. It also uses slab allocation to reduce memory wasted by internal fragmentation. We implement UCat on OSv, a popular unikernel, and conduct extensive experiments to evaluate its efficiency. Experimental results show that UCat reduces the memory consumption of unikernels by 50% and the TLB miss rate by 41%. It also improves the throughput of real-world benchmarks such as memslap and YCSB by up to 18.5% and 14.8%, respectively.
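To see why slab allocation reduces internal fragmentation, compare the bytes wasted when every object gets whole pages with the bytes wasted when small objects share pages through size classes. This minimal sketch assumes 4 KiB pages and a hypothetical set of power-of-two slab classes. It ignores per-slab leftover space and metadata and is not UCat's allocator.

```python
PAGE = 4096
SIZE_CLASSES = (16, 32, 64, 128, 256, 512, 1024, 2048)   # assumed slab classes


def page_granular_waste(size: int) -> int:
    """Bytes wasted if every object is given its own whole page(s)."""
    pages = -(-size // PAGE)                  # ceiling division
    return pages * PAGE - size


def slab_waste(size: int) -> int:
    """Bytes wasted if small objects use the smallest fitting size class;
    objects larger than every class fall back to whole pages."""
    for cls in SIZE_CLASSES:
        if size <= cls:
            return cls - size
    return page_granular_waste(size)


for size in (100, 1500, 5000):
    print(size, page_granular_waste(size), slab_waste(size))
```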

14.

Compressed page walk cache

Dunbo ZHANG, Chaoyang JIA, Li SHEN 《Frontiers of Computer Science》2022,16(3):163104

GPUs are widely used in modern high-performance computing systems. To reduce the burden on GPU programmers, operating systems and GPU hardware provide extensive support for shared virtual memory, which lets the GPU and CPU share the same virtual address space. Unfortunately, the SIMT execution model of current GPUs makes virtual-to-physical address translation on the GPU side very challenging. The main causes are the huge number of virtual addresses generated simultaneously and their poor locality. The resulting flood of TLB accesses raises the TLB miss ratio. The Page Walk Cache (PWC) has received wide attention as a solution because it reduces the memory accesses caused by TLB misses. However, the current PWC mechanism suffers from heavy redundancy, which significantly limits its efficiency. In this paper, we first investigate the factors behind this issue by evaluating PWC performance with typical GPU benchmarks. We find that repeated L4 and L3 indices of virtual addresses increase the redundancy in the PWC, and that the low locality of L2 indices causes a low PWC hit ratio. Based on these observations, we propose a new PWC structure, the Compressed Page Walk Cache (CPWC), to remove the redundancy of the current PWC. CPWC can be organized in either direct-mapped or set-associative mode. Experimental results show that CPWC increases the number of page table entries by 3 times over TPC, improves the L2 index hit ratio by 38.3% over PWC, and reduces page table memory accesses by 26.9%. The average number of memory accesses per TLB miss falls to 1.13. Overall, average IPC improves by 25.3%.
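This minimal sketch shows where the redundancy comes from. It assumes x86-64 four-level paging (9 bits per level, 4 KiB pages) and a conventional PWC that tags each L2-level entry with the full (L4, L3, L2) index triple. It compares that with storing each distinct (L4, L3) prefix once. The comparison is one simple way to picture removing the redundancy, not CPWC's actual layout. The address trace is hypothetical: 64 adjacent 2 MiB regions, all sharing one L4/L3 prefix.

```python
def split(va: int) -> tuple[int, int, int, int]:
    """(L4, L3, L2, L1) indices of a 48-bit x86-64 virtual address."""
    return ((va >> 39) & 0x1FF, (va >> 30) & 0x1FF,
            (va >> 21) & 0x1FF, (va >> 12) & 0x1FF)


def tag_bits_stored(addresses: list[int]) -> tuple[int, int]:
    """Tag storage for L2-level PWC entries (one per distinct 2 MiB region):
    full 27-bit (L4, L3, L2) tags per entry vs. storing each distinct
    (L4, L3) prefix once and only the 9-bit L2 index per entry."""
    l2_keys = {split(va)[:3] for va in addresses}
    upper = {key[:2] for key in l2_keys}
    full = len(l2_keys) * 27
    shared = len(upper) * 18 + len(l2_keys) * 9
    return full, shared


trace = [0x7F00_0000_0000 + i * (2 << 20) for i in range(64)]   # hypothetical
print(tag_bits_stored(trace))   # (1728, 594)
```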

15.

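A Large-Page Memory Management Model Based on the IA-64 Architecture

Mingchun CHEN, Jingui PAN 《Computer Science》2007,34(4):276-278

This paper proposes a large-page memory model based on the IA-64 architecture, in which the data segment of ELF executables uses large pages. With large pages, the translation lookaside buffer (TLB) can map a larger range of virtual memory, which lowers the TLB miss rate. This can improve performance for high-performance computing (HPC) applications that use large pages, and for any memory-intensive application that uses a large amount of virtual memory.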
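The benefit of large pages follows from TLB reach: the amount of memory a full TLB can map equals entries × page size. This short calculation assumes a 128-entry TLB and compares 16 KiB with 16 MiB pages. Both figures are illustrative assumptions. Which page sizes a given IA-64 implementation supports is implementation-dependent.

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of virtual memory a fully populated TLB can map."""
    return entries * page_size


def pages_to_cover(region: int, page_size: int) -> int:
    """TLB entries needed to map `region` bytes without misses."""
    return -(-region // page_size)            # ceiling division


for size in (16 * KiB, 16 * MiB):
    print(f"page {size // KiB} KiB: reach {tlb_reach(128, size) // MiB} MiB, "
          f"1 GiB data segment needs {pages_to_cover(GiB, size)} entries")
```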

16.

Moving address translation closer to memory in distributed shared-memory multiprocessors

Qiu X., Dubois M. 《IEEE Transactions on Parallel and Distributed Systems》2005,16(7):612-623

To support a global virtual memory space, an architecture must translate virtual addresses dynamically. In current processors, translation is done in a TLB (translation lookaside buffer), before or in parallel with the first-level cache access. Processor technology is improving rapidly and the working sets of new applications keep growing. As a result, the latency and bandwidth demands on the TLB are difficult to meet, especially in multiprocessor systems, which run larger applications and are plagued by the TLB consistency problem. We describe and compare five options for virtual address translation in distributed shared memory (DSM) multiprocessors, including CC-NUMAs (cache-coherent non-uniform memory access architectures) and COMAs (cache-only memory architectures). In CC-NUMAs, moving the TLB to shared memory is a bad idea. Page placement, migration and replication are then all constrained by the virtual page address, which greatly affects processor node access locality. In COMAs, the allocation of pages to processor nodes is less critical because memory blocks can migrate and replicate freely among nodes. As address translation moves deeper into the memory hierarchy, translations become less frequent because of the filtering effect. We also observe that the TLB is very effective when merged with the shared memory, because of sharing and prefetching effects and because TLB consistency no longer needs to be maintained. Even so, we show that the TLB can be removed entirely in a system that performs address translation in memory, because translations are very infrequent there.

17.

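Software Flat Nested Page Tables on the Sunway Architecture

Sai SHA, Hanlin DU, Yingwei LUO, Xiaolin WANG, Zhenlin WANG 《Journal of Computer Research and Development》2022,59(4):737-746

Nested page tables are a hardware-assisted memory virtualization model, but current domestic Sunway processors lack the hardware support this model requires. However, the privileged programmable interface unique to the Sunway architecture can build the necessary low-level hardware support in software. This interface runs in Sunway hardware mode at the highest CPU privilege level. Exploiting this property, the authors implement swFNPT, a software flat nested page table model for the Sunway platform, and use software design optimizations to make up for the missing hardware support. In particular, a flat (one-level) nested page table replaces the four-level nested page table to speed up page table lookups. The design was evaluated with several benchmark suites, and results on a Sunway 1621 server show that swFNPT performs well overall. Average memory virtualization overhead is about 3% for SPEC CPU 2006 and about 4% for large-working-set programs in SPEC CPU 2017. STREAM memory bandwidth tests show a bandwidth loss below 3%. This work can serve as a useful reference for developing hardware-assisted virtualization on the Sunway architecture.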
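The flat design trades memory for lookup depth. A single array indexed by guest frame number needs one memory reference per lookup, where a four-level radix table needs four. This minimal sketch assumes 4 KiB pages and 9-bit radix indices, and it counts accesses only. `RadixTable` and `FlatTable` are hypothetical names, not swFNPT's actual data structures.

```python
BITS, LEVELS = 9, 4     # radix index bits per level, number of levels (assumed)
MASK = (1 << BITS) - 1


class RadixTable:
    """Four-level radix table; each lookup costs one access per level."""

    def __init__(self) -> None:
        self.root: dict = {}
        self.accesses = 0

    def map(self, gfn: int, hfn: int) -> None:
        node = self.root
        for lvl in range(LEVELS - 1, 0, -1):          # create interior tables
            node = node.setdefault((gfn >> (BITS * lvl)) & MASK, {})
        node[gfn & MASK] = hfn

    def lookup(self, gfn: int) -> int:
        node = self.root
        for lvl in range(LEVELS - 1, -1, -1):
            self.accesses += 1                        # one memory reference per level
            node = node[(gfn >> (BITS * lvl)) & MASK]
        return node


class FlatTable:
    """Flat (one-level) table: a single array indexed by guest frame number."""

    def __init__(self, guest_frames: int) -> None:
        self.entries = [0] * guest_frames
        self.accesses = 0

    def map(self, gfn: int, hfn: int) -> None:
        self.entries[gfn] = hfn

    def lookup(self, gfn: int) -> int:
        self.accesses += 1                            # exactly one memory reference
        return self.entries[gfn]
```

The cost is a table sized to the whole guest-physical space. For example, an 8 GiB guest with 4 KiB pages needs 2^21 entries, which is 16 MiB at an assumed 8 bytes per entry.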

18.

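Memory Access Optimization for Cross-Platform System-Level Virtual Machines

Songsong CAI, Qi LIU, Haihua SHEN, Longbing ZHANG 《Journal of Computer Research and Development》2012,(Z1):131-136

In cross-platform system-level virtual machines, memory accesses emulated in software are slow and severely degrade virtual machine performance. To make these accesses faster, this paper proposes a hardware/software co-optimization method that uses the host system's TLB hardware to accelerate memory address translation in such virtual machines. Compared with software emulation of memory accesses, the method makes effective use of the host's hardware resources and speeds up memory operations in the virtual machine. Experimental results show that it improves overall virtual machine performance by nearly 15%. The method is in practical use in the Loongson cross-platform system-level virtual machine.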

19.

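Efficient and Transparent CPU-GPU Data Communication via Partial Page Migration

Shiqing ZHANG, Yaohua YANG, Li SHEN, Zhiying WANG 《Computer Engineering & Science》2019,41(7):1168-1175

Research investment in integrated GPUs and next-generation interconnects keeps growing. Even so, discrete GPUs connected via PCI Express still dominate the market, and the management of CPU-GPU data communication is still evolving. Originally, programmers controlled data transfers between CPU and GPU explicitly. To simplify programming, GPU vendors developed a programming model that provides a single virtual address space for "CPU+GPU" heterogeneous systems. Its page migration mechanism automatically moves pages between CPU and GPU on demand. Page sizes are growing to meet the needs of high-performance workloads. Over low-bandwidth, high-latency interconnects, however, larger pages take longer to migrate. This can hamper the overlap of computation and transfer and cause severe performance degradation. This paper proposes a partial page migration mechanism that migrates only the required part of a page. It shortens migration latency and avoids the slowdown that whole-page migration suffers as pages grow. Experiments use a 2 MB page size and 16 GB/s PCI Express bandwidth. Under these conditions, partial page migration effectively hides the overhead of whole-page migration. Relative to programmer-controlled data transfer, whole-page migration shows an average slowdown of 98.62%, while partial page migration achieves an average speedup of 1.29x. The authors also measure how page size affects the TLB miss rate and how the migration unit size affects performance, giving designers a basis for their decisions.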
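A rough calculation shows why migrating part of a page helps. At 16 GB/s, moving a whole 2 MiB page takes about 131 µs of raw wire time, while a 64 KiB block takes about 4 µs. This minimal sketch makes the following assumptions: the 64 KiB block size is hypothetical, and the figures ignore fault-handling latency and protocol overhead. `blocks_needed` picks the blocks of a page that a faulting access range touches. None of this is the paper's implementation.

```python
KiB, MiB = 1 << 10, 1 << 20


def transfer_us(nbytes: int, bandwidth_gb_s: float) -> float:
    """Raw wire time in microseconds (no latency or protocol overhead)."""
    return nbytes / (bandwidth_gb_s * 1e9) * 1e6


def blocks_needed(page_offset: int, length: int, block: int) -> range:
    """Indices of the sub-page blocks covering [offset, offset + length)."""
    first = page_offset // block
    last = (page_offset + length - 1) // block
    return range(first, last + 1)


print(f"whole 2 MiB page: {transfer_us(2 * MiB, 16):.3f} us")    # 131.072 us
print(f"one 64 KiB block: {transfer_us(64 * KiB, 16):.3f} us")   # 4.096 us
print(list(blocks_needed(65500, 100, 64 * KiB)))                 # [0, 1]
```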

20.

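Memory Management Unit Design for Embedded Systems

Hefei ZHU, Chao LU, Xiaofang ZHOU, Hao MIN, Dian ZHOU 《Computer Engineering and Applications》2007,43(1):96-99

Targeting the Linux operating system, this paper implements a memory management unit for a 32-bit RISC embedded processor. A pre-comparison circuit added to the instruction TLB speeds up address translation when the processor accesses the same virtual page repeatedly. On a TLB miss, dedicated hardware walks the page table and refills the TLB, clearly faster than a software handler. The MMU works well with Linux, handling address mapping and memory access permission management.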
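A behavioral sketch of the pre-comparison idea follows. A one-entry "last page" register is checked first, and the full instruction-TLB search runs only when the virtual page changes. The model makes several assumptions: 4 KiB pages, a dictionary standing in for the ITLB contents, and a `KeyError` standing in for a TLB miss. `ITLBWithPrecompare` is a hypothetical name. The abstract does not describe the circuit's internals.

```python
PAGE_SHIFT = 12                       # 4 KiB pages (assumed)
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class ITLBWithPrecompare:
    """ITLB front-ended by a one-entry 'last page' register (assumed model)."""

    def __init__(self, entries: dict[int, int]):
        self.entries = entries        # vpn -> pfn: contents of the ITLB
        self.last_vpn = None          # page of the previous fetch
        self.last_pfn = 0
        self.full_lookups = 0         # full associative searches performed

    def translate(self, va: int) -> int:
        vpn = va >> PAGE_SHIFT
        if vpn != self.last_vpn:      # pre-compare failed: search the ITLB
            self.full_lookups += 1
            self.last_pfn = self.entries[vpn]   # KeyError stands in for a miss
            self.last_vpn = vpn
        return (self.last_pfn << PAGE_SHIFT) | (va & OFFSET_MASK)


itlb = ITLBWithPrecompare({0x400: 0x12, 0x401: 0x13})
for va in range(0x40_0000, 0x40_2000, 4):      # 2048 sequential fetches
    itlb.translate(va)
print(itlb.full_lookups)                       # 2
```

Sequential instruction fetch stays in one page for long stretches, so most fetches are served by the single comparison.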