Similar Literature
 19 similar articles found (search time: 109 ms)
1.
许立  罗军  卢凯 《计算机工程》2006,32(1):99-101,156
Non-Uniform Memory Access (NUMA) is one of the mainstream architectures of today's high-performance servers. Because the scheduler in a traditional operating system cannot perceive the topology of a complex NUMA system, it incurs substantial overhead from remote-node data accesses. Building on an in-depth analysis of the O(1) scheduling algorithm's support for NUMA, this paper proposes a hierarchical scheduling algorithm based on the NUMA topology. Experiments show that the algorithm achieves node-affine NUMA scheduling, improves data-access locality, and optimizes system performance.
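The paper's algorithm itself is not reproduced here, but the core idea of node-affine scheduling can be sketched in a few lines. The following is a minimal, illustrative Python sketch (not the paper's implementation): the two-node topology, the SLIT-style distance matrix, and the `pick_cpu` helper are all assumptions for the example.

```python
# Hypothetical two-node NUMA machine; distances follow the SLIT convention
# (10 = local node, 20 = remote node). All values are illustrative.
DISTANCE = [[10, 20],
            [20, 10]]
CPUS_PER_NODE = {0: [0, 1], 1: [2, 3]}

def pick_cpu(home_node, idle_cpus):
    """Node-affine CPU selection: prefer an idle CPU on the task's home
    node, then fall back to idle CPUs on nodes in order of distance."""
    for node in sorted(range(len(DISTANCE)),
                       key=lambda n: DISTANCE[home_node][n]):
        for cpu in CPUS_PER_NODE[node]:
            if cpu in idle_cpus:
                return cpu
    return None  # no idle CPU anywhere
```

A scheduler built this way only pays the remote-access penalty when the home node has no idle CPU, which is exactly the locality property the abstract describes.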

2.
This paper introduces a NUMA multiprocessor prototype system and analyzes how its operating system manages physical memory. Based on this system, a global shared-memory system is designed and implemented, allowing the operating system to make full use of the physical memory of the entire machine and reduce application execution time. Experimental results show that the system provides better support for memory-intensive applications.

3.
An Active Storage System Architecture   (Cited: 4; self-citations: 0; by others: 4)
靳超  郑纬民  张悠慧 《计算机学报》2005,28(6):1013-1020
With advances in architecture and growing demand for data access, the current storage architecture has become a performance bottleneck: while magnetic storage density grows about 100% per year, access latency improves by only about 10% per year, so the current block-based access interface cannot prevent I/O performance from falling further behind CPU and memory speeds. With the development of embedded chip technology, migrating computation toward data has become an inevitable trend. This article proposes an active storage architecture in which active storage devices use embedded computing capability to support simple computational tasks and storage-management functions through access interfaces that integrate storage objects and application objects. Experimental data from a prototype system show that it supports scalable access for data-intensive applications significantly better than traditional systems, and offers higher scalability as well.

4.
Research on Storage Virtualization Technology   (Cited: 2; self-citations: 0; by others: 2)
Storage virtualization is a network-based storage-management technology. It hides the differences among large numbers of heterogeneous devices and presents users with a simple logical storage access interface; it simplifies storage management, optimizes system performance, and improves storage-device utilization. Starting from the concept of storage virtualization, this paper analyzes its model structure, layering, and protocols in detail, surveys the latest advances in virtualization technology, and discusses the strengths and weaknesses of network-based virtual storage.

5.
Research and Implementation of TH-MSNS, a Highly Scalable Storage Network System   (Cited: 3; self-citations: 1; by others: 2)
Network storage systems are important for storing and processing massive data, for scalable and available data access, and for quality of service and storage security. Based on FCP, this paper designs and implements a scalable storage area network system, TH-MSNS. The system can increase bandwidth and availability with dual HBA cards, improve reliability and availability with dual I/O node machines, and scale capacity to 260 TB with multiple I/O nodes. The paper describes TH-MSNS's architecture, its SCSI target simulator, the embedded operating system EOS, and the design and implementation of its storage management. The SCSI target simulator uses a layered design with standard interfaces, so different SCSI devices and different network protocols can be added; the core software is implemented as kernel modules running in the embedded operating system's kernel mode, improving efficiency; the storage-management software has a distributed, OS-independent structure that implements object management, automatic device discovery, access control, logging, and other management functions. Compared with similar systems, TH-MSNS is efficient, easy to scale, easy to maintain, and highly compatible.

6.
Memory-Management Design in a Smart Card Operating System   (Cited: 1; self-citations: 0; by others: 1)
This paper describes design methods for memory management in a smart card operating system, presents the data structures involved, and analyzes the characteristics and applications of the various memory-management approaches.

7.
This paper describes design methods for memory management in a smart card operating system, presents the data structures involved, and analyzes the characteristics and applications of the various memory-management approaches.

8.
This paper describes design methods for memory management in a smart card operating system, presents the data structures involved, and analyzes the characteristics and applications of the various memory-management approaches.

9.
This paper describes design methods for memory management in a smart card operating system, presents the data structures involved, and analyzes the characteristics and applications of the various memory-management approaches.

10.
This paper describes the multimedia storage-management techniques of a conferencing and co-authoring system. The system adopts a client/server architecture with two-level storage, consisting of a local database (LDB) and a central database (CDB), and applies a tiered sharing policy to the multimedia data in the CDB. It supports multimedia data storage, multiple query methods, and concurrent access to the server.

11.
A Parallel Operating System Based on Multiple Virtual Spaces with Multiple Mappings   (Cited: 3; self-citations: 0; by others: 3)
陈左宁  金怡濂 《软件学报》2001,12(10):1562-1568
Scalability is a major challenge in designing high-performance computer systems, and the NUMA (non-uniform memory architecture) structure was proposed precisely to make shared-memory systems scalable. Research and practice show that whole-system scalability is closely related to the structure of the operating system. Typical multiprocessor operating systems adopt one of two structures: a shared single-kernel structure or a message-based multi-kernel structure. Analysis leads to the conclusion that neither adapts well to scalable parallel machines, especially NUMA parallel machines. To address these problems, a new structural design is proposed: multiple virtual spaces with multiple mappings, combined with active messages. Test and operational results show that this structure successfully solves the system's scalability problem.

12.
Performance Analysis of Parallel Programs Based on a CC-NUMA System Simulator   (Cited: 1; self-citations: 0; by others: 1)
Targeting the characteristics of CC-NUMA parallel systems, this paper describes the design and implementation of a simulator, AMY. The simulator runs under Linux on x86 PCs and employs several optimization techniques; it can accurately measure the time overhead of parallel programs and the various parameters of a CC-NUMA parallel system, with fast execution, high precision, and low memory overhead. By simulating several typical parallel benchmarks under AMY, the paper reports the measured results, analyzes the execution behavior and overheads of the benchmarks, and derives useful guidelines for performance-optimizing parallel programs on CC-NUMA systems.

13.
The efficiency of the basic operations of a NUMA (non-uniform memory access) multiprocessor determines its parallel processing performance. The authors present several analytical models for predicting and evaluating the overhead of interprocessor communication, process scheduling, process synchronization, and remote memory access, taking network contention and memory contention into account. Performance measurements supporting the models and analyses, through several numerical examples, were carried out on the BBN GP1000, a NUMA shared-memory multiprocessor. The analytical and experimental results give a comprehensive understanding of the various effects, which is important for the effective use of NUMA shared-memory multiprocessors. The results presented can be used to determine optimal strategies in developing an efficient programming environment for a NUMA system.
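The paper's analytical models are not reproduced in the abstract, but the general shape of such a remote-memory-access cost model is easy to illustrate. The following Python function is a toy sketch under assumed parameters (the name `avg_access_time`, the latency values, and the additive contention term are illustrative, not the authors' model):

```python
def avg_access_time(local_fraction, t_local, t_remote, contention_delay=0.0):
    """Toy NUMA cost model: mean access time is a mix of local and remote
    latencies, with network/memory contention added on the remote path.
    Illustrative only; the paper's actual models are more detailed."""
    remote_fraction = 1.0 - local_fraction
    return (local_fraction * t_local
            + remote_fraction * (t_remote + contention_delay))
```

Even this simplified mix makes the point quantitatively: if remote accesses cost several times more than local ones, a modest drop in the local-access fraction (or a rise in contention) dominates the average, which is why locality-aware scheduling and placement matter on NUMA machines.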

14.
Persistent memory (PM) is non-volatile, byte-addressable, low-latency, and high-capacity; it breaks down the traditional boundary between memory and storage and has a disruptive impact on existing software architectures. However, current PM hardware still suffers from uneven wear and read/write asymmetry, and in particular from severe I/O performance degradation when PM is accessed across NUMA (non-uniform memory access) nodes. This paper proposes a NUMA-aware optimized design for a PM storage engine, applied in ZTE's new-generation database system GoldenX, which significantly reduces the cost of cross-NUMA-node access to persistent memory. The main contributions are: a cross-NUMA-node data-space distribution strategy and distributed access model under a hybrid DRAM+PM memory architecture, enabling efficient use of the PM data space; an I/O-agent routine for the high cost of cross-NUMA PM access, which converts a cross-NUMA PM access into one remote DRAM memory copy plus a local PM access, together with a Cache Line Area (CLA) cache-page mechanism that mitigates I/O write amplification and improves local PM access efficiency; and an extension of the traditional tablespace concept so that each tablespace has both independent table-data storage and dedicated WAL (write-ahead logging) storage; for this distributed WA...

15.
The performance and energy efficiency of current systems is influenced by accesses to the memory hierarchy. One important aspect of memory hierarchies is the introduction of different memory access times, depending on the core that requested the transaction, and which cache or main memory bank responded to it. In this context, the locality of the memory accesses plays a key role for the performance and energy efficiency of parallel applications. Accesses to remote caches and NUMA nodes are more expensive than accesses to local ones. With information about the memory access pattern, pages can be migrated to the NUMA nodes that access them (data mapping), and threads that communicate can be migrated to the same node (thread mapping). In this paper, we present LAPT, a hardware-based mechanism to store the memory access pattern of parallel applications in the page table. The operating system uses the detected memory access pattern to perform an optimized thread and data mapping during the execution of the parallel application. Experiments with a wide range of parallel applications (from the NAS and PARSEC Benchmark Suites) on a NUMA machine showed significant performance and energy efficiency improvements of up to 19.2% and 15.7%, respectively (6.7% and 5.3% on average).
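The data-mapping policy that LAPT enables can be sketched independently of the hardware mechanism: count which NUMA node touches each page, then place the page on the node that accesses it most. The Python below is a plain-software illustration of that policy only; in LAPT the bookkeeping lives in the page table, and the class and method names here are invented for the example.

```python
from collections import Counter, defaultdict

class AccessTracker:
    """Illustrative per-page access bookkeeping for a data-mapping policy.
    Not LAPT itself: LAPT stores this information in page-table entries."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # page -> (node -> access count)

    def record(self, page, node):
        """Note one memory access to `page` issued from NUMA `node`."""
        self.counts[page][node] += 1

    def preferred_node(self, page):
        """Node that accessed this page most often: the migration target."""
        return self.counts[page].most_common(1)[0][0]
```

A runtime or OS would periodically consult `preferred_node` and migrate pages whose preferred node differs from their current placement; the analogous thread-mapping step groups communicating threads on one node.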

16.
To capitalize on multicore power, modern high-speed data transfer applications usually adopt multi-threaded design and aggregate multiple network interfaces. However, NUMA introduces another dimension of complexity to these applications. In this paper, we undertook comprehensive experiments on real systems to illustrate the importance of NUMA-awareness to applications with intensive memory accesses and network I/Os. Instead of simply attributing the NUMA effect to the physical layout, we provide an in-depth analysis of underlying interactions inside hardware devices. We profile the system performance by monitoring relevant hardware counters, and reveal how the NUMA penalty occurs during prefetch and cache synchronization processes. Consequently, we implement a thread mapping module in a bulk data transfer software, BBCP, as a practical example of enabling NUMA-awareness. The enhanced application is then evaluated on our high-performance testbed with storage area networks (SAN). Our experimental results show that the proposed NUMA optimizations can significantly improve BBCP’s performance in memory-based tests with various contention levels and realistic data transfers involving SAN-based storage.

17.
Design and Implementation of NeuroC, a General-Purpose Scalable Parallel Neurocomputer System   (Cited: 1; self-citations: 0; by others: 1)
NeuroC is a general-purpose parallel neural-network computer system whose scale can be adjusted. For neural computation, the system is designed with a non-uniform-access shared-memory system offering selectable broadcast communication. This paper first analyzes the requirements of neural computation, then discusses the choice of computing elements, the memory organization, and the communication implementation. It then presents the main hardware structure of the system, briefly describes the composition and structure of the system software, and finally summarizes NeuroC's features.

18.
Embedded manycore architectures are often organized as fabrics of tightly-coupled shared memory clusters. A hierarchical interconnection system is used, with a crossbar-like medium inside each cluster and a network-on-chip (NoC) at the global level, which makes memory operations non-uniform (NUMA). Due to NUMA, regular applications typically employed in the embedded domain (e.g., image processing, computer vision, etc.) ultimately behave as irregular workloads if a flat memory system is assumed at the program level. Nested parallelism represents a powerful programming abstraction for these architectures, provided that (i) streamlined middleware support is available, whose overhead does not dominate the run-time of fine-grained applications; and (ii) a mechanism to control thread binding at the cluster level is supported. We present a lightweight runtime layer for nested parallelism on cluster-based embedded manycores, integrating our primitives in the OpenMP runtime system, and implementing a new directive to control NUMA-aware nested parallelism mapping. We explore on a set of real application use cases how NUMA makes regular parallel workloads behave as irregular, and how our approach allows us to control such effects and achieve up to 28× speedup versus flat parallelism.

19.
For Non-Uniform Memory Access (NUMA) multiprocessors, memory access overhead is crucial to system performance. Processor scheduling and page placement schemes, dominant factors of memory access overhead, are closely related. In particular, if the processor scheduling scheme is dynamic space-sharing, it should be considered together with the page placement scheme for efficient process execution. Most research in this area, however, has focused exclusively on either the processor scheduling scheme or the page placement scheme alone without considering the interaction between the two. This paper proposes several policies for cluster-based NUMA multiprocessors that are combinations of a processor scheduling scheme and a page placement scheme and investigates the interaction between them. The simulation results show that policies that cooperate to employ the home-cluster concept achieve the best performance. The paper also compares the best of the proposed policies with other existing dynamic processor scheduling policies. Based on our study reported here, the best policy is found to perform better than other existing policies.
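The "home cluster" cooperation between scheduling and page placement can be sketched as a single policy object: assign each process a home cluster once, then have both the scheduler and the allocator consult it. The Python below is an illustrative sketch only; the class name, the round-robin admission rule, and the cluster layout are assumptions for the example, not the paper's policies.

```python
class HomeClusterPolicy:
    """Sketch of cooperating scheduling + page placement: a process's
    threads run, and its pages are allocated, in one home cluster."""

    def __init__(self, num_clusters):
        self.num_clusters = num_clusters
        self.next_cluster = 0
        self.home = {}  # pid -> home cluster

    def admit(self, pid):
        """Assign a new process a home cluster (round-robin for the sketch)."""
        self.home[pid] = self.next_cluster
        self.next_cluster = (self.next_cluster + 1) % self.num_clusters
        return self.home[pid]

    def schedule_cluster(self, pid):
        """Scheduler side: run the process's threads in its home cluster."""
        return self.home[pid]

    def place_page(self, pid, page):
        """Allocator side: back the process's pages from the same cluster."""
        return self.home[pid]
```

Because both decisions read the same `home` table, threads and their pages stay co-located, which is the cooperative behavior the simulation results favor.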


Copyright©北京勤云科技发展有限公司  京ICP备09084417号