Similar Documents
20 similar documents found (search time: 15 ms)
1.
Dynamic page placement policies for NUMA (nonuniform memory access time) shared-memory architectures are explored using two approaches that complement each other in important ways. The authors measure the performance of parallel programs running on the experimental DUnX operating system kernel for the BBN GP1000, which supports a highly parameterized dynamic page placement policy. They also develop and apply an analytic model of the memory system performance of a local/remote NUMA architecture based on approximate mean-value analysis techniques. The model is validated against experimental data obtained with DUnX running a synthetic workload; this validation shows that, in general, model predictions are quite good. Experiments investigating the effectiveness of dynamic page placement, and in particular of dynamic multiple-copy page placement, the cost of replication/coherency faults, and the cost of errors in deciding whether a page should move or be remotely referenced are described.
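A minimal sketch of the kind of local/remote trade-off such a model captures (the parameters and functions below are illustrative, not the paper's actual mean-value-analysis model): average memory reference cost as a function of the fraction of references that stay local, and the speedup obtained when page placement raises that fraction.

```python
def mean_access_time(local_fraction, t_local=1.0, t_remote=5.0):
    """Average cost of one memory reference, in units of a local access."""
    return local_fraction * t_local + (1.0 - local_fraction) * t_remote

def placement_benefit(f_before, f_after, t_local=1.0, t_remote=5.0):
    """Memory-system speedup from raising the local-reference fraction."""
    return (mean_access_time(f_before, t_local, t_remote) /
            mean_access_time(f_after, t_local, t_remote))

# With remote accesses 5x slower than local ones, raising the local-reference
# fraction from 50% to 90% cuts the mean reference cost from 3.0 to 1.4.
print(mean_access_time(0.5))        # 3.0
print(placement_benefit(0.5, 0.9))
```

A queueing-aware model would additionally account for contention at the remote memory modules; this sketch shows only the first-order latency effect.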

2.
聂朝恩  高荣芳 《计算机应用》2007,27(8):1858-1861
This paper designs and implements PFC, a packet-filter-based network traffic capture system on the Linux platform. PFC performs packet filtering and merging in kernel space and implements memory sharing between user space and kernel space, thereby breaking through the performance bottleneck of traditional packet-filter-based traffic capture systems.

3.
Nitzan Niv  Assaf Schuster 《Software》2001,31(15):1439-1459
In this paper we propose a mechanism that provides distributed shared memory (DSM) systems with a flexible sharing granularity. The size of the shared memory units is determined dynamically by the system at runtime, and can range from a single variable up to the entire shared memory space. During runtime, the DSM transparently adapts the granularity to the memory access pattern of the application in each phase of its execution. This adaptation, called ComposedView, provides efficient data sharing in software DSM while preserving sequential consistency. Neither complex code analysis nor annotation by the programmer or the compiler is required. Our experiments indicate a substantial performance boost (up to 80% speed-up improvement) when running a large set of applications using our method, compared with running these benchmark applications under the best fixed granularity. Copyright © 2001 John Wiley & Sons, Ltd.
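A toy illustration of the granularity-adaptation idea (this is not the ComposedView algorithm; the rule and names are assumptions for exposition): variables written by a single processor in the current phase can safely be merged into one coarse sharing unit, while variables written by several processors are kept at single-variable granularity to avoid false sharing.

```python
def choose_units(writers_per_var):
    """writers_per_var: list of sets; writers_per_var[i] is the set of
    processors that write variable i in this phase.
    Returns a list of sharing units, each a list of variable indices."""
    units, current = [], []
    for i, writers in enumerate(writers_per_var):
        if len(writers) <= 1:
            current.append(i)          # effectively private: safe to coarsen
        else:
            if current:
                units.append(current)  # close the coarse unit built so far
                current = []
            units.append([i])          # contended: keep fine granularity
    if current:
        units.append(current)
    return units

# Vars 0-1 written only by P0, var 2 contended by P0 and P1, var 3 by P1:
print(choose_units([{0}, {0}, {0, 1}, {1}]))  # [[0, 1], [2], [3]]
```

A real DSM would re-run such a decision at phase boundaries as access patterns change.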

4.
The performance of network hosts can be severely degraded when subjected to the heavy traffic of today's Gigabit networks. This degradation results from the interrupt overhead associated with the high rate of packet arrivals. NAPI, a packet reception mechanism integrated into the Linux networking subsystem, was designed to improve Linux's performance under today's Gigabit traffic. NAPI is a major step up from earlier reception mechanisms; however, it has shortcomings, and its performance can be further enhanced. A hybrid interrupt-handling scheme, recently proposed in Salah et al. [K. Salah, K. El-Badawi, F. Haidari, Performance Analysis and Comparison of Interrupt-Handling Schemes in Gigabit Networks, International Journal of Computer Communications 30 (17) (2007) 3425–3441], can better improve the performance of Gigabit network hosts. The hybrid scheme switches between interrupt disabling–enabling (DE) and polling (NAPI). In this paper, we present and discuss the major changes required to implement such a hybrid scheme in Linux kernel 2.6.15. We show experimentally that the hybrid scheme can significantly improve the performance of general-purpose network desktops or servers running network I/O-bound applications, under both light and heavy traffic load conditions. Performance is measured and analyzed in terms of throughput, packet loss, latency, and CPU availability.
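The switching logic of such a hybrid scheme can be sketched as a small state machine (the thresholds and hysteresis below are illustrative assumptions, not the values used by Salah et al.): at low arrival rates the host takes interrupts; past a threshold it disables interrupts and polls, re-enabling interrupts only when the rate falls well below the entry threshold.

```python
class HybridReceiver:
    """Toy model of switching between interrupt mode and polling mode."""
    def __init__(self, enter_polling=8000, leave_polling=5000):
        self.enter, self.leave = enter_polling, leave_polling  # pkts/sec
        self.mode = "interrupt"

    def observe(self, pkts_per_sec):
        if self.mode == "interrupt" and pkts_per_sec >= self.enter:
            self.mode = "polling"      # heavy load: avoid interrupt livelock
        elif self.mode == "polling" and pkts_per_sec <= self.leave:
            self.mode = "interrupt"    # light load: avoid wasted poll cycles
        return self.mode

rx = HybridReceiver()
print([rx.observe(r) for r in (1000, 9000, 6000, 4000)])
# ['interrupt', 'polling', 'polling', 'interrupt']
```

The gap between the two thresholds provides hysteresis, preventing rapid mode flapping when the arrival rate hovers near a single cutoff.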

5.
Shared-memory operating systems use carefully designed locks to protect shared data; accessing such data requires first acquiring the corresponding lock. When multiple kernel control flows (system calls, kernel threads, interrupt handlers, etc.) try to acquire the same lock simultaneously, contention arises, and the more flows involved, the fiercer the contention. As the number of processing units in a system grows, the number of such flows keeps increasing, and lock contention then degrades overall system performance and can even become a bottleneck. In addition, the operating system and applications run alternately on the same processor core; because hardware cache capacity is limited, the OS's code and data frequently evict the application's code and data. When the application is rescheduled, it must reload that code and data from slower caches or even from memory, reducing performance. These problems are quantitatively verified through experiments on a 16-core AMD node, and a heterogeneous operating system model is proposed to address them, in which applications and the operating system run on separate processor cores. Experiments show that this model effectively reduces both lock contention and cache pollution.

6.
Container-based virtualization is becoming increasingly popular in cloud computing due to its efficiency and flexibility. Resource isolation is a fundamental property of containers. Existing work has shown that weak resource isolation can cause significant performance degradation for containerized applications, and has proposed enhancements to resource isolation. However, current studies have largely ignored the isolation of the page cache, a key resource for containers. Containers rely on the memory cgroup to control page cache usage. Unfortunately, the existing policy introduces two major problems in a container-based environment. First, containers can use more memory than their cgroup limits allow, effectively breaking memory isolation. Second, the OS kernel has to evict page cache to make space for newly arrived memory requests, slowing down containerized applications. This paper performs an empirical study of these problems and demonstrates their performance impact on containerized applications. We then propose pCache (precise control of page cache), which addresses the problems by dividing the page cache into private and shared parts and controlling each kind separately and precisely. To do so, pCache leverages two new techniques: fair account (f-account) and evict on demand (EoD). F-account splits the charging of shared page cache across containers according to their per-container shares, preventing containers from using memory for free and enhancing memory isolation. EoD reduces unnecessary page cache evictions to avoid the performance impact. Evaluation results demonstrate that our system can effectively enhance memory isolation for containers and achieve substantial performance improvement over the original page cache management policy.
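The fair-account arithmetic can be illustrated with a small sketch (the equal-split rule and function names are assumptions for exposition, not pCache's actual accounting): each shared page's charge is divided among the containers mapping it, so no container gets shared page cache for free.

```python
def charge_containers(private_pages, shared_pages):
    """private_pages: {container: number of private pages charged}.
    shared_pages: list of sets, one per shared page, naming its sharers.
    Returns total pages charged to each container."""
    charge = dict(private_pages)
    for sharers in shared_pages:
        for c in sharers:
            # Split the one-page charge evenly across all sharers.
            charge[c] = charge.get(c, 0) + 1.0 / len(sharers)
    return charge

# Two private pages for A, one for B, plus two pages shared by A and B:
print(charge_containers({"A": 2, "B": 1}, [{"A", "B"}, {"A", "B"}]))
# {'A': 3.0, 'B': 2.0}
```

Under the naive policy, whichever container touched a shared page first would be charged the whole page; the split above keeps each container's cgroup accounting proportional to its actual use.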

7.
Considering the packet capture mechanisms of current network analysis tools and their strengths and weaknesses, this paper implements PF_ZEROCOPY, a new network protocol family dedicated to packet capture. Built on the zero-copy idea and shared-memory techniques, it DMA-transfers network packets directly into user-space buffers, bypassing the Linux network protocol stack and reducing the number of memory copies. A ring of DMA buffer descriptors lets the NIC and the user program access the shared buffer without conflicts. Packaged as a kernel network protocol family, PF_ZEROCOPY is easy to apply and to port. Experimental results show that its capture rate for packets of random length exceeds 900 Mb/s, a clear improvement in capture capability over libpcap.
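A toy model of the producer/consumer descriptor ring described above (the layout and the one-free-slot convention are illustrative, not PF_ZEROCOPY's actual data structures): the "NIC" fills slots at the head while the user program drains them from the tail, and the two sides never touch the same index.

```python
class DescriptorRing:
    """Single-producer/single-consumer ring; one slot is kept empty so that
    head == tail unambiguously means 'empty'."""
    def __init__(self, size=8):
        self.slots = [None] * size
        self.head = 0   # next slot the NIC (producer) fills
        self.tail = 0   # next slot the user program (consumer) reads
        self.size = size

    def produce(self, packet):
        if (self.head + 1) % self.size == self.tail:
            return False                  # ring full: the NIC would drop
        self.slots[self.head] = packet
        self.head = (self.head + 1) % self.size
        return True

    def consume(self):
        if self.tail == self.head:
            return None                   # ring empty
        packet, self.slots[self.tail] = self.slots[self.tail], None
        self.tail = (self.tail + 1) % self.size
        return packet

ring = DescriptorRing(size=4)
for p in (b"p1", b"p2", b"p3"):
    ring.produce(p)
print(ring.produce(b"p4"))  # False: one slot stays free to distinguish full
print(ring.consume())       # b'p1'
```

In the real mechanism the slots would be physical DMA buffers mapped into user space, and the indices would be updated with appropriate memory barriers rather than plain Python assignments.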

8.
A Network Packet Capture Platform on RTAI
段辰生  杨昌昊  褚伟 《计算机工程》2009,35(20):160-162
To address the drawbacks of traditional packet capture mechanisms, such as redundant memory copies and frequent interrupt handling and context switches, this paper proposes controlling packet transfer from user space and reworks the NIC driver using a real-time Linux kernel and the zero-copy idea. When capturing 64-byte and 1500-byte packets, the capture platform achieves throughputs of 473 Mb/s and 943 Mb/s respectively. Experimental results show that the new platform performs significantly better than traditional packet capture platforms.

9.
Frequent interrupt handling, redundant data copies, and context switches are the main factors limiting packet capture performance. To reduce their impact, this paper proposes combining PF_RING with NAPI in the capture path to optimize performance as a whole. The differences between PF_RING and traditional packet capture mechanisms are compared, and the advantages of combining the two are analyzed. An experimental platform is built and simulations are run using kernel-generated packets. In the experiments, capture rate and processing efficiency are compared against the traditional approach; analysis of the data shows that the method effectively improves capture performance.

10.
The comparative performance of different message passing system designs is studied experimentally on a shared-memory Encore Multimax multiprocessor. The systems are measured both with benchmarks and by running example parallel applications. As a control, the shared-memory machine results are compared with the performance of the same benchmarks and applications on the Intel iPSC/2 running the NX/2 operating system. The design alternatives considered are buffering, buffer organization, reference and value semantics, synchronization, coordination strategy, and the location of the system in user or kernel space. The results include measurements of the effects of these design alternatives, memory caching, message sizes, and copying.

11.
With the development of general-purpose multicore technology, multicore processors in core network devices can provide high I/O throughput and support more complex network processing, bringing unprecedented flexibility and generality to packet processing and forwarding. In core networks, however, the processing and forwarding performance of multicore processors still faces great challenges. First, as network bandwidth keeps growing, multicore processors must deliver ever higher processing capacity. Second, as network services grow more complex, the per-packet processing overhead of applications increases, placing higher demands on a device's raw I/O forwarding capability. This paper proposes a hardware buffer management mechanism, Self-Described Buffer (SDB), which combines low hardware overhead with high software performance. Based on SDB, a general-purpose network processing development kit, NPDK, is designed and implemented. NPDK adopts zero-interrupt, zero-copy operation, provides both kernel and user-space drivers, and targets general-purpose multicore CPU systems. It is simple, flexible, and easy to develop with; in kernel mode it supports heterogeneous per-core packet-processing programming, and in user mode it supports both exclusive multithreaded programming and shared multiprocess parallel programming. Test results show that packet I/O forwarding at 10 Gb/s approaches line rate; in particular, 64-byte packets reach 7.49 Gb/s. NPDK has already been applied in the Click router, OpenFlow switches, and network probes.

12.
李鹏  王雷 《计算机工程》2006,32(4):58-60
Distributed shared memory (DSM) systems build a logically shared memory model on top of physically distributed memories. This paper proposes a framework for implementing DSM at the operating system layer and describes its implementation on Linux. The system provides a simple call interface and integrates tightly with the Linux memory management framework; adopting a suitable DSM consistency protocol improves overall performance.

13.
Design and Implementation of HEROS, a High-Performance Router Operating System
A real-time distributed operating system is the control core of a high-performance distributed router. To guarantee the overall performance and security of the router system, this paper designs and implements the real-time distributed operating system HEROS (Highly Efficient Router Operating System). HEROS is based on a microkernel architecture; its multitasking kernel implements priority-based preemptive scheduling, efficient inter-task synchronization and communication primitives, real-time interrupt handling, and efficient memory management. To better serve the distributed router architecture, HEROS implements a distributed communication mechanism over the CompactPCI bus and a high-performance buffer management mechanism oriented to network protocols. A prototype high-performance secure router based on HEROS has been completed.

14.
Song  Xiaodong   《Performance Evaluation》2005,60(1-4):5-29
Most computer systems use a global page replacement policy based on the LRU principle to approximately select the least recently used page for replacement across the entire user memory space. During execution, a memory page can be marked as LRU even while its program is servicing page faults. We call LRU pages arising under this condition false LRU pages, because they are not produced by program memory reference delays, which is inconsistent with the LRU principle. False LRU pages can significantly increase page faults and even cause system thrashing. The risk is more serious in large parallel systems with distributed memories because of the coordination among processes running on individual nodes: thrashing on a single node or a small number of nodes can severely affect other nodes running coordinating processes, and can even crash the whole system. In this paper, we focus on improving the page replacement algorithm running on one node.

After carefully characterizing memory usage and thrashing behavior in a multiprogramming system using LRU replacement, we propose an LRU replacement alternative, called token-ordered LRU, which eliminates or reduces unnecessary page faults by effectively ordering and scheduling memory space allocations. Compared with traditional thrashing protection mechanisms such as load control, our policy allows more processes to keep running, supporting synchronous distributed process computing. We have implemented the token-ordered LRU algorithm in a Linux kernel to show its effectiveness.
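The core of the token idea can be sketched in a few lines (a deliberate simplification of the kernel policy described above, with assumed names): when memory is tight, one faulting process holds a token, and the replacement code skips that process's pages, letting it build up its working set instead of every process evicting every other's pages.

```python
def pick_victim(lru_list, token_holder):
    """lru_list: pages ordered oldest-first as (owner_pid, page_id) pairs.
    Evict the least recently used page NOT owned by the token holder; if all
    pages belong to the holder, fall back to plain LRU."""
    for owner, page in lru_list:
        if owner != token_holder:
            return (owner, page)
    return lru_list[0]

lru = [(1, "a"), (2, "b"), (1, "c")]
print(pick_victim(lru, token_holder=1))  # (2, 'b') -- pid 1's pages are spared
print(pick_victim(lru, token_holder=3))  # (1, 'a') -- plain LRU otherwise
```

Passing the token among faulting processes serializes their memory grabs, which is what allows one process at a time to escape thrashing rather than all of them stalling together.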


15.
《Parallel Computing》1988,6(2):235-245
We investigate the use of a hypercube packet switching network as a shared memory server for vector multiprocessors. Using the generalization of a high performance switch node introduced in an earlier paper, we develop a packet switched memory server capable of providing high bandwidth vector access to a shared memory. The network exhibits adaptive behavior, absorbing conflicts as a vector operation proceeds, and delivers full vector bandwidth to all processors simultaneously.

In addition to its vector performance, the hypercube has another feature that makes it attractive as a shared memory server. The memory words are not equidistant from the processors: a hierarchy of distances occurs. By taking advantage of this, one can provide segments of fast access memory within the global shared memory environment. This makes the shared memory hypercube very promising as a general purpose parallel computer.
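The "hierarchy of distances" falls directly out of the hypercube topology, which a short sketch makes concrete: the hop count between two nodes is the Hamming distance of their binary labels, so each processor sees the 2^d memory modules of a d-cube spread over distances 0 through d in binomial-coefficient proportions.

```python
def hops(x, y):
    """Network hops between hypercube nodes x and y (Hamming distance)."""
    return bin(x ^ y).count("1")

def memory_hierarchy(node, dim):
    """How many of the 2**dim memory modules sit at each distance from node."""
    counts = {}
    for other in range(2 ** dim):
        d = hops(node, other)
        counts[d] = counts.get(d, 0) + 1
    return counts

# In a 4-cube, every node sees 1 module at distance 0, 4 at distance 1,
# 6 at distance 2, 4 at distance 3, and 1 at distance 4:
print(memory_hierarchy(0, 4))  # {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}
```

Placing a processor's hottest data in the modules at distance 0 or 1 is exactly the "fast access segments within the global shared memory" that the abstract alludes to.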

16.
Live migration of virtual machines has been a powerful tool to facilitate system maintenance, load balancing, fault tolerance, and power-saving, especially in clusters or data centers. Although pre-copy is extensively used to migrate memory data of virtual machines, it cannot provide quick migration with low network overhead but leads to large performance degradation of virtual machine services due to the great amount of transferred data during migration. To solve the problem, this paper presents the design and implementation of a novel memory-compression-based VM migration approach (MECOM for short) that uses memory compression to provide fast, stable virtual machine migration, while guaranteeing the virtual machine services to be slightly affected. Based on memory page characteristics, we design an adaptive zero-aware compression algorithm for balancing the performance and the cost of virtual machine migration. Using the proposed scheme pages are rapidly compressed in batches on the source and exactly recovered on the target. Experimental results demonstrate that compared with Xen, our system can significantly reduce downtime, total migration time, and total transferred data by 27.1%, 32%, and 68.8% respectively.
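A hedged sketch of zero-aware page encoding for migration (the tag bytes, the zlib choice, and the thresholds are illustrative assumptions, not MECOM's actual adaptive algorithm): all-zero pages go on the wire as a one-byte marker, and other pages are compressed only when compression actually shrinks them.

```python
import zlib

PAGE = 4096  # bytes per memory page

def encode_page(page: bytes) -> bytes:
    """Zero page -> 1-byte marker; compressible page -> tag + zlib stream;
    incompressible page -> tag + raw bytes."""
    if page.count(0) == PAGE:
        return b"Z"
    comp = zlib.compress(page, 1)        # level 1: favor speed over ratio
    return b"C" + comp if len(comp) < PAGE else b"R" + page

def decode_page(data: bytes) -> bytes:
    tag, body = data[:1], data[1:]
    if tag == b"Z":
        return bytes(PAGE)
    return zlib.decompress(body) if tag == b"C" else body

zero_page = bytes(PAGE)
text_page = (b"migration " * 410)[:PAGE]                 # highly compressible
print(len(encode_page(zero_page)))                        # 1
print(decode_page(encode_page(text_page)) == text_page)   # True
```

Batching many pages per compression call and hashing for unchanged pages would be natural extensions, but the decode path must always recover pages bit-exactly, as above.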

17.
The availability of low-cost, high-performance microprocessors has led to various designs of shared memory multiprocessor systems, and commercial products based on shared memory have proliferated. Such a multiprocessor system is heavily influenced by the structure of its memory system, and most configurations include local cache memories. The more processors a system carries, the larger the local cache memories needed to keep traffic to and from the shared memory at a reasonable level. Implementing local cache memories, however, is not a simple task because of environmental limitations; in particular, the general lack of board space presents a formidable problem. A cache memory system needs space mostly for its complex control logic circuits, both for the cache itself and for network interfaces such as the snooping logic for the shared bus. Although denser packaging can reduce system size, there are still multiple processors per board, which calls for a more area-efficient cache memory architecture. This paper presents a design of a shared cache for the dual-processor board of a bus-based symmetric multiprocessor. The design and implementation issues are described first, and then the evaluation and measurement results are discussed. The shared cache proposed in this paper has been determined to be quite area-efficient without significant loss of throughput or scalability. It has been implemented as a plug-in unit for TICOM, a prevalent commercial multiprocessor system.

18.
Shared memory is a simple yet powerful paradigm for structuring systems. Recently, there has been interest in extending this paradigm to non-shared memory architectures as well. For example, the virtual address spaces of all objects in a distributed object-based system could be viewed as constituting a global distributed shared memory. We propose a set of primitives for managing distributed shared memory, and present an implementation of these primitives both in the context of an object-based operating system and on top of Unix.

19.
Research on Fault Diagnosis Methods for Distributed Networked Control Systems
To avoid the delay and packet loss caused by excessive network load in distributed networked control systems, a packet-based transmission mechanism is introduced, and a central fault diagnosis unit is designed for this particular structure. In the design of the diagnosis unit, because the subsystems have different operating periods and transmit packets of different sizes, the system is treated as a multirate sampled-data system, and the lifting technique is used to obtain its discrete-time model. On this basis a state observer is designed, achieving fault diagnosis for the distributed networked control system. Finally, simulations verify the effectiveness of the proposed method.

20.
任建宝  齐勇  戴月华  王晓光  宣宇  史椸 《软件学报》2015,26(8):2124-2137
Operating system vulnerabilities are often exploited by attackers to execute arbitrary code with kernel privileges (return-to-user attacks, ret2user) and to steal users' private data. This work uses a virtual machine monitor to build a memory access auditing mechanism that is transparent to the operating system and applications, and proposes a low-overhead, unbypassable strategy for tracking memory page usage in real time. Combined with a secure loader, it guarantees the code integrity of dynamic link libraries and applications, ensuring that an application's private data in memory cannot be stolen even if the OS kernel is compromised. A prototype is implemented and evaluated on Linux; experimental results show that the privacy protection mechanism imposes only a 6%–10% performance overhead for most applications.

