首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 250 毫秒
1.
There has been great progress from the traditional allocation algorithms designed for small memories to more modern algorithms exemplified by McKusick's and Karels' allocator (McKusick MK, Karels MJ. Design of a general purpose memory allocator for the 4.3BSD UNIX kernel. In USENIX Conference Proceedings, Berkeley, CA, June 1988). Nonetheless, none of these algorithms have been designed to meet the needs of UNIX kernels supporting commercial data‐processing applications in a shared‐memory multiprocessor environment. On a shared‐memory multiprocessor, memory is a global resource. Therefore, allocator performance depends on synchronization primitives and manipulation of shared data as well as on raw CPU speed. Synchronization primitives and access to shared data depend on system bus interactions. The speed of system buses has not kept pace with that of CPUs, as witnessed by the ever‐larger caches found on recent systems. Thus, the performance of synchronization primitives and of memory allocators that use them have not received the full benefit of increased CPU performance. An earlier paper (McKenney PE, Slingwine J. Efficient kernel memory allocation on shared‐memory multiprocessors. In USENIX Conference Proceedings, Berkeley, CA, February 1993), describes an allocator designed to meet this situation. This article reviews the motivation for and design of the allocator and presents the experience gained during the seven years that the allocator has been in production use. Copyright © 2001 John Wiley & Sons, Ltd.  相似文献   

2.
马明理  陈刚  董金祥 《计算机测量与控制》2006,14(11):1551-1553,1556
介绍了一种新的多线程内存分配技术(NIXMalloc)的设计和实现,提出了两种高效的分配策略及其自适应调优方法,有效地提高多线程应用程序的内存管理性能;其中Local分配策略对超级块对象Span进行了线程私有化,基于超级块对象为单位的垃圾回收和内存布局调整使多线程性能更优越;Global分配策略采用了自适应调优方法,在动态检测应用程序内存使用情况的基础上进行内存预取和线程缓存限值的动态调整;实验证明NIXMalloc可改善内存管理性能,提高吞吐量,同时降低内存使用量;在多线程应用系统中能获得较好的时空效率。  相似文献   

3.
动态内存分配器是现代应用程序重要组成部分, 它负责管理空闲内存并处理用户内存请求. 现代通用动态内存分配器能够提供较为平衡的性能与内存利用率, 但考虑到不同应用场景的内存使用情况和优化目标不同, 使用通用内存分配器并非最优解. 针对应用场景定制的专用内存分配器通常能够更好地满足系统需要, 然而编写专用内存分配器较为费时, 也容易出错. 开发者通常使用内存分配框架搭建专用动态内存分配器. 然而, 现有的内存分配框架存在抽象能力较差, 组合性与定制性不足的问题. 为此, 从函数式编程视角审视动态内存分配过程, 基于函数可组合性提出了一种可组合的定制化动态内存分配器框架榫卯. 榫卯框架将系统内存分配抽象为多个互不耦合的内存分配层级函数的组合, 这些层级函数能够扩展出策略槽, 以提供更高的定制性和组合性. 榫卯框架基于标准C实现, 依赖C预处理器的元编程特性实现层级函数组合的零性能开销. 开发者能够通过组合与定制分配器的层级函数, 快速构建出适合应用场景的内存分配器. 为了证明榫卯框架的有效性, 使用榫卯框架构建了3种不同的内存分配器实例: tlsfcc, hslab与wfslab, 其中tlsfcc针对多核嵌入式应用场景, 通过替换同步策略优化并发吞吐率; hslab是核心感知的slab式分配器, 通过定制线程缓存优化在异构硬件的性能; wfslab是低延迟的无等待/无锁分配器. 为了评估这3种内存分配器实例, 通过运行基准测试对比现有内存分配器. 实验分别在8核x86/64平台和8核异构aarch64嵌入式平台进行. 实验表明tlsfcc与原始tlsf分配器相比, 在上述两个平台上分别取得了平均1.76和1.59的加速比; 对比hslab与类似架构的tcmalloc, 它在两个平台的平均执行时间仅为tcmalloc的69.6%和85.0%; wfslab则取得了参与实验对比的内存分配器中最小的最差情况内存请求延迟, 其中包括目前最先进的无锁内存分配器mimalloc和snmalloc.  相似文献   

4.
Memory leaks are a continuing problem in the software developed with programming languages, such as C and C++. A recent approach adopted by some researchers is to tolerate leaks in the software application and to reclaim the leaked memory by use of specially constructed memory allocation routines. However, such routines replace the usual general‐purpose memory allocator and tend to be less efficient in speed and in memory utilization. We propose a new scheme which coexists with the existing memory allocation routines and which reclaims memory leaks. Our scheme identifies and reclaims leaked memory at the kernel level. There are some major advantages to our approach: (1) the application software does not need to be modified; (2) the application does not need to be suspended while leaked memory is reclaimed; (3) a remote host can be used to identify the leaked memory, thus minimizing impact on the application program's performance; and (4) our scheme does not degrade the service availability of the application while detecting and reclaiming memory leaks. We have implemented a prototype that works with the GNU C library and with the Linux kernel. Our prototype has been tested and evaluated with various real‐world applications. Our results show that the computational overhead of our approach is around 2% of that incurred by the conventional memory allocator in terms of throughput and average response time. We also verified that the prototype successfully suppressed address space expansion caused by memory leaks when the applications are run on synthetic workloads. Copyright © 2010 John Wiley & Sons, Ltd.  相似文献   

5.
Modern programming languages often include complex mechanisms for dynamic memory allocation and garbage collection. These features drive the need for more efficient implementation of memory management functions, both in terms of memory usage and execution performance. In this paper, we introduce a software and hardware co-design to improve the speed of the software allocator used in free-BSD systems. The hardware complexity of our design is independent of the dynamic memory size, thus making the allocator suitable for any memory size. Our design improves the performance of memory management intensive benchmarks by as much as 43%. To oar knowledge, this is the first-ever work of this kind, introducing "hybrid memory allocator"  相似文献   

6.
The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. Yet, in some applications, programmers use domain-specific knowledge in an attempt to improve the speed or memory utilization of memory allocators. In this paper, we describe a program (CustoMalloc) that synthesizes a memory allocator customized for a specific application. Our experiments show that the synthesized allocators are uniformly faster and more space efficient than the Berkeley UNIX allocator. Constructing a custom allocator requires little programmer effort, usually taking only a few minutes. Experience has shown that the synthesized allocators are not overly sensitive to properties of input sets and the resulting allocators are superior even to domain-specific allocators designed by programmers. Measurements show that synthesized allocators are from two to ten times faster than widely-used allocators.  相似文献   

7.
An opportunistic resource allocation approach is proposed to guarantee both fair resource allocation and high system throughput under combinations of QoS and non-QoS connections in OFDMA networks. This approach features dynamic connection classification and packet prioritization based on real-time network conditions and QoS constraints. A classifier is first employed to prioritize QoS connections by observing the channel state of each subscriber station and the utilization of network resources. It performs a finite-horizon Markov decision process with dynamic rules affected by system load. The transmission order of packets is then determined by an opportunistic multiservice scheduler according to the QoS requirements of connections and the output of the classifier. Having the scheduling result, an allocator assigns slots to the scheduled packets, and its output is linked back to the connection classifier through a resource usage observer for all subscriber stations. The sub-channel allocation problem is also solved by cooperation between the slot allocator and the packet scheduler. Results of numerical analysis and NS2 simulation confirm the advantages claimed above. The same conclusion can also be drawn from the comparison with several existing approaches in terms of system throughput, service successful ratio, average spectral efficiency, and system revenue.  相似文献   

8.
We compare five existing dynamic memory allocators optimized for GPUs and show their strengths and weaknesses. In the measurements, we use three generic evaluation tests proposed in the past and we add one with a real workload, where dynamic memory allocation is used in building the k‐d tree data structure. Following the performance analysis we propose a new dynamic memory allocator and its variants that address the limitations of the existing dynamic memory allocators. The new dynamic memory allocator uses few resources and is targeted towards large and variably sized memory allocations on massively parallel hardware architectures.  相似文献   

9.
Allocation,dereferencing,and freeing of memory data in kernels are coherently linked.There widely exist real cases where the correctness of memory is compromised.This incorrectness in kernel memory brings about significant security issues,e.g.,information leaking.Though memory allocation,dereferencing,and freeing are closely related,previous work failed to realize they are closely related.In this paper,we study the life-cycle of kernel memory,which consists of allocation,dereferencing,and freeing.Errors in them are called memory life-cycle(MLC)bugs.We propose an in-depth study of MLC bugs and implement a memory life-cycle bug sanitizer(MEBS)for MLC bug detection.Utilizing an inter-procedural global call graph and novel identification approaches,MEBS can reveal memory allocation,dereferencing,and freeing sites in kernels.By constructing a modified define-use chain and examining the errors in the life-cycle,MLC bugs can be identified.Moreover,the experimental results on the latest kernels demonstrate that MEBS can effectively detect MLC bugs,and MEBS can be scaled to different kernels.More than 100 new bugs are exposed in Linux and FreeBSD,and 12 common vulnerabilities and exposures(CVE)are assigned.  相似文献   

10.
Dynamic storage allocation is an important part of a large class of computer programs written in C and C + +. High-performance algorithms for dynamic storage allocation have been, and will continue to be, of considerable interest. This paper presents detailed measurements of the cost of dynamic storage allocation in 11 diverse C and C + + programs using five very different dynamic storage allocation implementations, including a conservative garbage collection algorithm. Four of the allocator implementations measured are publicly available on the Internet. A number of the programs used in these measurements are also available on the Internet to facilitate further research in dynamic storage allocation. Finally, the data presented in this paper is an abbreviated version of more extensive statistics that are also publicly available on the Internet.  相似文献   

11.
Dynamic memory allocation often makes up a large part of program execution time. Different variants of the best-fit allocator are implemented and their space and time costs measured and compared. We found variants of this algorithm that are 3–33% faster than the Doug Lea 2.7.0 allocator.  相似文献   

12.
This paper describes the design criteria and implementation details of a dynamic storage allocator for real‐time systems. The main requirements that have to be considered when designing a new allocator are concerned with temporal and spatial constraints. The proposed algorithm, called TLSF (two‐level segregated fit), has an asymptotic constant cost, O(1), maintaining a fast response time (less than 200 processor instructions on a x86 processor) and a low level of memory usage (low fragmentation). TLSF uses two levels of segregated lists to arrange free memory blocks and an incomplete search policy. This policy is implemented with word‐size bitmaps and logical processor instructions. Therefore, TLSF can be categorized as a good‐fit allocator. The incomplete search policy is shown also to be a good policy in terms of fragmentation. The fragmentation caused by TLSF is slightly smaller (better) than that caused by best fit (which is one of the best allocators regarding memory fragmentation). In order to evaluate the proposed allocator, three analyses are presented in this paper. The first one is based on worst‐case scenarios. The second one provides a detailed consideration of the execution cost of the internal operations of the allocator and its fragmentation. The third analysis is a comparison with other well‐known allocators from the temporal (number of cycles and processor instructions) and spatial (fragmentation) points of view. In order to compare them, a task model has been presented. Copyright © 2007 John Wiley & Sons, Ltd.  相似文献   

13.
非易失性内存以其卓越的特性被视作具有巨大潜力的下一代存储设备。然而,非易失性存储单元存在写耐受度低的缺点,使其难以承受频繁的小粒度数据更新操作。文中针对非易失性存储器,提出带磨损均衡的小粒度内存分配管理系统(IWMM)。IWMM将单个内存页分割为多个基本存储单元以适应小粒度的内存分配和数据更新操作。IWMM采用定向顺序分配算法轮流地使用单个内存页中的基本存储单元,从而将小粒度写操作均衡地分布到内存页内的各个存储单元中。实验表明,对比同样致力于磨损均衡的小粒度内存管理系统NVMalloc,IWMM能将内存页的写次数降低52.6%;同时,在内存回收率高于50%的应用场景中,性能比glibc malloc高27.6%。  相似文献   

14.
MULTIFIT-COM, a static task allocator which could be incorporated into an automated compiler/linker/loader for distributed processing systems, is presented. The allocator uses performance information for the processes making up the system in order to determine an appropriate mapping of tasks onto processors. It uses several heuristic extensions of the MULTIFIT bin-packing algorithm to find an allocation that will offer a high system throughput, taking into account the expected execution and interprocessor communication requirements of the software on the given hardware architecture. Throughput is evaluated by an asymptotic bound for saturated conditions and under an assumption that only processing resources are required. A set of options are proposed for each of the allocator's major steps. An evaluation was made on 680 small randomly generated examples. Using all the search options, an average performance difference of just over 1% was obtained. Using a carefully chosen small subset of only four options, a further degradation of just over 1.5% was obtained. The allocator is also applied to a digital signal processing system consisting of 119 tasks to illustrate its clustering and load balancing properties on a large system  相似文献   

15.
Neil Burroughs 《Software》2016,46(11):1499-1523
The primary goal of the register allocation phase in a compiler is to minimize register spills to memory. Spill decisions by the allocator are often made based on the costs of spilling a virtual register and, therefore, on an assumed placement of spill instructions. However, because most allocators make these decisions incrementally, placement opportunities can change as allocation proceeds, calling into question the basis for the original spill decision. An alternative heuristic to placement costs for spill decisions focuses on where program execution will lead. Spilling the virtual register with the Furthest Next Use is known to lead to the minimum number of loads under certain conditions in straight‐line code. While it has been implemented in register allocation in different forms, none of these implementations fully exploits profiling information. We present a register allocator that can adapt to improved profiling information, using branch probabilities to compute an Expected Distance to Next Use for making spill decisions and block frequency information to optimize post‐allocation spill instruction placement. Spill placement is optimized after allocation using a novel method for minimizing spill instruction costs on the control flow graph. Our evaluation of the allocator compared with LLVM recognizes more than 36% and 50% reductions, on average, in the number of dynamically executed store and load instructions, respectively, when using statically derived profiling information. When using dynamically gathered profiling, these improvements increase to 50% and 60% reductions, on average, for stores and loads, respectively. Copyright © 2016 John Wiley & Sons, Ltd.  相似文献   

16.
Efficient, scalable memory allocation for multithreaded applications on multiprocessors is a significant goal of recent research. In the distributed computing literature it has been emphasized that lock-based synchronization and concurrency-control may limit the parallelism in multiprocessor systems. Thus, system services that employ such methods can hinder reaching the full potential of these systems. A natural research question is the pertinence and the impact of lock-free concurrency control in key services for multiprocessors, such as in the memory allocation service, which is the theme of this work. We show the design and implementation of NBmalloc, a lock-free memory allocator designed to enhance the parallelism in the system. The architecture of NBmalloc is inspired by Hoard, a well-known concurrent memory allocator, with modular design that preserves scalability and helps avoiding false-sharing and heap-blowup. Within our effort to design appropriate lock-free algorithms for NBmalloc, we propose and show a lock-free implementation of a new data structure, flat-set, supporting conventional “internal” set operations as well as “inter-object” operations, for moving items between flat-sets. The design of NBmalloc also involved a series of other algorithmic problems, which are discussed in the paper. Further, we present the implementation of NBmalloc and a study of its behaviour in a set of multiprocessor systems. The results show that the good properties of Hoard w.r.t. false-sharing and heap-blowup are preserved.  相似文献   

17.
For the last 30 years, a large variety of memory allocators have been proposed. Since performance, memory usage and energy consumption of each memory allocator differs, software engineers often face difficult choices in selecting the most suitable approach for their applications. To this end, custom allocators are developed from scratch, which is a difficult and error-prone process. This issue has special impact in the field of portable consumer embedded systems, that must execute a limited amount of multimedia applications, demanding high performance and extensive memory usage at a low energy consumption. This paper presents a flexible and efficient simulator to study Dynamic Memory Managers (DMMs), a composition of one or more memory allocators. This novel approach allows programmers to simulate custom and general DMMs, which can be composed without incurring any additional runtime overhead or additional programming cost. We show that this infrastructure simplifies DMM construction, mainly because the target application does not need to be compiled every time a new DMM must be evaluated and because we propose a structured method to search and build DMMs in an object-oriented fashion. Within a search procedure, the system designer can choose the “best” allocator by simulation for a particular target application and embedded system. In our evaluation, we show that our scheme delivers better performance, less memory usage and less energy consumption than single memory allocators.  相似文献   

18.
Memory fragmentation is a serious obstacle preventing efficient memory usage. Garbage collectors may solve the problem; however, they cause serious performance impact, memory and energy consumption. Therefore, various memory allocators have been developed. Software developers must test memory allocators, and find an efficient one for their programs. Instead of this cumbersome method, we propose a novel approach for dynamically deciding the best memory allocator for every application. The proposed solution tests each process with various memory allocators. After the testing, it selects an efficient memory allocator according to condition of operating system (OS). If OS runs out of memory, then it selects the most memory efficient allocator for new processes. If most of the CPU power was occupied, then it selects the fastest allocator. Otherwise, the balanced allocator is selected. According to test results, the proposed solution offers up to 58% less fragmented memory, and 90% faster memory operations. In average of 107 processes, it offers 7.16±2.53% less fragmented memory, and 1.79±7.32% faster memory operations. The test results also prove the proposed approach is unbeatable by any memory allocator. In conclusion, the proposed method is a dynamic and efficient solution to the memory fragmentation problem.  相似文献   

19.
Dynamic memory allocation has been used for decades. However, it has seldom been used in real-time systems since the worst case of spatial and temporal requirements for allocation and deallocation operations is either unbounded or bounded but with a very large bound. In this paper, a new allocator called TLSF (Two Level Segregated Fit) is presented. TLSF is designed and implemented to accommodate real-time constraints. The proposed allocator exhibits time-bounded behaviour, O(1), and maintains a very good execution time. This paper describes in detail the data structures and functions provided by TLSF. We also compare TLSF with a representative set of allocators regarding their temporal cost and fragmentation. Although the paper is mainly focused on timing analysis, a brief study and comparative analysis of fragmentation incurred by the allocators has been also included in order to provide a global view of the behaviour of the allocators. The temporal and spatial results showed that TLSF is also a fast allocator and produces a fragmentation close to that caused by the best existing allocators.
Alfons Crespo (Corresponding author)Email:
  相似文献   

20.
Torquati  M.  Mencagli  G.  Drocco  M.  Aldinucci  M.  De Matteis  T.  Danelutto  M. 《The Journal of supercomputing》2019,75(8):4114-4131

This work studies the issues related to dynamic memory management in Data Stream Processing, an emerging paradigm enabling the real-time processing of live data streams. In this paper, we consider two streaming parallel patterns and we discuss different implementation variants related to how dynamic memory is managed. The results show that the standard mechanisms provided by modern C++ are not entirely adequate for maximizing the performance. Instead, the combined use of an efficient general purpose memory allocator, a custom allocator optimized for the pattern considered and a custom variant of the C++ shared pointer mechanism, provides a performance improvement up to 16% on the best case.

  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号