1.

许健于鸿洋《电子技术应用》2012,38(11):146-149

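This paper studies the details of a memory pool: how memory blocks are obtained, the allocation mechanism, block size, memory release, and safe handling in a multithreaded environment, so that the pool stays fast under multithreading. It also adopts an array-based linked-list mechanism to improve the block lookup algorithm, keeping its time complexity at a stable O(1) and avoiding the drop in block-acquisition performance that traditional memory pools suffer when too many threads issue requests. In addition, an internal management thread dynamically adds or removes free memory blocks. Experimental results show that, compared with traditional memory allocation, the improved memory pool has lower overhead and higher efficiency.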
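The array-based linked list mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's data layout, so everything here is an assumption made for illustration: the class name `ArrayFreeListPool`, the `next_free` index array, and the use of a single `threading.Lock` for thread safety. The management thread that grows or shrinks the pool is left out.

```python
import threading


class ArrayFreeListPool:
    """Fixed-size block pool whose free list lives in a plain index array.

    Hypothetical sketch: names and layout are not taken from the paper.
    """

    def __init__(self, block_count, block_size):
        self.block_size = block_size
        self.blocks = [bytearray(block_size) for _ in range(block_count)]
        # next_free[i] is the index of the free block that follows block i,
        # or -1 at the end of the list: a linked list stored in an array.
        self.next_free = [i + 1 for i in range(block_count)]
        if block_count:
            self.next_free[-1] = -1
        self.head = 0 if block_count else -1
        self.free_count = block_count
        self.lock = threading.Lock()  # assumed locking scheme

    def acquire(self):
        """Pop the head of the free list; O(1) regardless of pool size."""
        with self.lock:
            if self.head == -1:
                return None  # pool exhausted
            index = self.head
            self.head = self.next_free[index]
            self.next_free[index] = -1
            self.free_count -= 1
            return index

    def release(self, index):
        """Push a block back onto the free list; also O(1).

        Double releases are not detected in this sketch.
        """
        with self.lock:
            self.next_free[index] = self.head
            self.head = index
            self.free_count += 1
```

A caller would use `pool.blocks[index]` as the storage for the block it acquired. Because both operations only touch the list head, their cost does not depend on how many blocks or requesting threads exist, which is the property the abstract emphasizes.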

2.

A resource-stealing lock-free memory pool for dynamic lock-free data structures

刘恒杨小帆《计算机应用研究》2012,29(10):3772-3775

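Dynamic memory management is especially critical to the performance of lock-free dynamic data structures, because dynamic memory management in a multithreaded environment involves costly synchronization. This paper proposes a method for building memory pools for dynamic lock-free data structures that reduces dynamic memory usage and the accompanying management overhead. The method reduces memory overhead by balancing the dynamic memory consumption of threads; the resulting memory pool is based on thread-private lock-free circular queues that support node stealing. The method has the following advantages: a) the memory pool it builds is lock-free; b) it balances the heap memory consumption of threads; c) it integrates easily with dynamic lock-free data structures. Experimental results show that a resource-stealing memory pool built this way scales well and, under high load, effectively reduces the heap memory consumption and operation execution time of lock-free data structures. The balancing algorithm largely determines memory consumption, and the pool's scalability under high load is also affected by the multithreaded access performance of the data structure the pool uses.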
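The balancing idea (take a recycled node from another owner's queue before going to the heap) can be sketched as follows. This is not the paper's algorithm: it models only the stealing policy, uses ordinary `collections.deque` objects instead of lock-free circular queues, and is meant to be read as a single-threaded simulation. The names `StealingPool`, `get_node`, and `put_node`, and the rule of stealing from the fullest queue, are assumptions.

```python
from collections import deque


class StealingPool:
    """Per-owner queues of recycled nodes with a simple stealing policy.

    Hypothetical sketch of the balancing idea only; it is NOT lock-free.
    """

    def __init__(self, owner_count, capacity):
        self.capacity = capacity  # maximum recycled nodes kept per owner
        self.queues = [deque() for _ in range(owner_count)]
        self.heap_allocations = 0  # how often we had to create a new node

    def get_node(self, owner):
        own = self.queues[owner]
        if own:
            return own.popleft()
        # Own queue is empty: try to steal from the fullest queue.
        victim = max(range(len(self.queues)), key=lambda i: len(self.queues[i]))
        if self.queues[victim]:
            return self.queues[victim].popleft()
        # Nothing to steal anywhere: fall back to a fresh allocation.
        self.heap_allocations += 1
        return {"payload": None}

    def put_node(self, owner, node):
        """Recycle a node; returns False when the owner's queue is full."""
        own = self.queues[owner]
        if len(own) < self.capacity:
            own.append(node)
            return True
        return False  # caller lets the node go back to the heap
```

Stealing lets an owner that mostly consumes nodes reuse what another owner mostly frees, so fewer fresh allocations are needed overall.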

3.

SpMT WaveCache: Exploiting Speculative Multithreading in Dataflow Computers (total citations: 1; self-citations: 0; citations by others: 1)

裴颂文吴百锋《计算机学报》2009,32(7)

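Speculative Multithreading (SpMT) exploits thread-level parallelism by speculatively executing multiple threads, improving the performance of superscalar processors. By adding extra hardware units, such as a Thread Synchronization Unit (TSU), a Thread Context Table (TCT), and a Thread Memory History (TMH) table, the authors extend the transactional memory system and improve the performance of the WaveCache simulator, which is built on the WaveScalar instruction set architecture (WaveScalar ISA). A new two-level thread-level transaction commit mechanism is also proposed. Finally, six real benchmark programs from the SPEC, Media, and Mibench suites are used to evaluate the performance of the speculative multithreaded WaveCache (SpMT WaveCache). Experiments show that SpMT WaveCache improves performance by a factor of 2 to 3 over a superscalar architecture, making it an effective way to exploit the performance of dynamic dataflow computers.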

4.

Design of an embedded memory allocation algorithm based on the prediction principle

程小辉何军权梁启亮黄佳欢顾俊杰《计算机工程与设计》2014,35(9)

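To address problems in embedded memory management such as poor real-time behavior and a high fragmentation rate, the paper analyzes both the time and space aspects and adopts a predictive-merge allocation mechanism based on the principles of predictive allocation and merged allocation. In terms of time, a prediction thread predicts the size of the next memory request and allocates it in advance, reducing the time the system waits for memory to be created. In terms of space, the blocks of two requests are merged into one large block, and memory is requested in units of large blocks, reducing the internal fragmentation caused by splitting memory blocks repeatedly. Comparative experiments on the μC/OS-II platform show that the improved predictive-merge allocation algorithm effectively improves overall system performance in terms of both time and memory fragmentation rate.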

5.

Design of a memory management system for interactive CAD

王伟《微计算机信息》2007,23(16):232-234

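Interactive CAD systems have to allocate and free memory frequently, and frequent allocation and deallocation is a major cause of degraded application performance. Applications use memory in a default way and pay a performance penalty for features they do not need. We developed a dedicated memory management system that changes the allocation behavior of the memory used to hold objects; it solves this problem well and significantly improves the efficiency of interactive CAD.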

6.

An improvement to a dynamic equal-size memory management algorithm

林川吴景东《单片机与嵌入式系统应用》2008,(1):14-15

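This paper proposes an improvement to the dynamic equal-size memory management algorithm. The improved algorithm no longer links free memory blocks with a linked list and instead uses a memory allocation table, which separates control information from the user's memory blocks and makes memory management safer and more reliable.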
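A minimal sketch of the allocation-table idea follows. The abstract does not describe the table format, so the one-byte-per-block table, the class name `TablePool`, and the linear scan for a free entry are all assumptions. The point illustrated is only that bookkeeping lives outside the user-visible blocks, so user writes cannot damage it.

```python
class TablePool:
    """Equal-size block allocator with a separate allocation table.

    Hypothetical sketch: the table layout is assumed, not taken from the paper.
    """

    def __init__(self, block_count, block_size):
        self.block_size = block_size
        self.memory = bytearray(block_count * block_size)  # user data only
        # One entry per block, kept outside the user blocks: 0 = free, 1 = in use.
        self.table = bytearray(block_count)

    def alloc(self):
        index = self.table.find(b"\x00")  # linear scan for the first free entry
        if index == -1:
            return None  # no free block
        self.table[index] = 1
        return index

    def view(self, index):
        """Writable window onto the block's bytes."""
        start = index * self.block_size
        return memoryview(self.memory)[start:start + self.block_size]

    def free(self, index):
        if self.table[index] != 1:
            raise ValueError("block is not allocated")
        self.table[index] = 0
```

With an intrusive free list, a user who overwrites a whole block could corrupt a stored link; here, filling a block via `pool.view(i)[:] = ...` leaves the table untouched. The trade-off in this sketch is that `alloc` scans the table instead of popping a list head.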

7.

A memory power optimization method combining page allocation and group scheduling

贾刚勇万健李曦蒋从锋代栋《软件学报》2014,25(7):1403-1415

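In multi-core systems, the memory subsystem accounts for a large share of energy consumption, and that share will keep growing, so addressing memory power is key to optimizing system power. Threads in the system are divided into thread groups according to their memory address spaces and the load-balancing policy; threads in the same group are allocated physical pages from the same memory rank, and scheduling is then performed group by group. On this basis, the paper proposes a memory power optimization method that combines page allocation and group scheduling (CAS). CAS periodically activates the memory ranks that are currently needed, so that ranks not in use for the moment can be put into a low-power state, and it lengthens the idle time of low-power ranks. Simulation results show that, compared with other similar methods, CAS reduces memory power consumption by 10% on average while improving performance by 8%.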

8.

Memory management for large 3D scene roaming systems (total citations: 1; self-citations: 0; citations by others: 1)

肖康刘福岩《计算机工程与设计》2010,31(10)

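In large 3D scene roaming systems, individual resources (such as models and textures) need large amounts of memory and are allocated and freed frequently. To prevent memory fragmentation and speed up allocation, the paper proposes a new memory management method. One or more large virtual memory regions are first reserved according to the program's needs, and allocation and reclamation are then managed within those regions. Small resources are served from a memory pool, while large resources use the buddy system. Experimental results show that the method is efficient and stable.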
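For the large-resource path, a generic binary buddy allocator over one reserved region can be sketched as below. This is the textbook buddy scheme, not the paper's implementation; the class name `BuddyAllocator`, the offset-based interface, and the requirement that `total_size` and `min_block` be powers of two are assumptions.

```python
class BuddyAllocator:
    """Binary buddy allocator over a region of `total_size` bytes.

    Hypothetical sketch; sizes are assumed to be powers of two and
    blocks are identified by their offset inside the region.
    """

    def __init__(self, total_size, min_block):
        self.total_size = total_size
        self.min_block = min_block
        self.free_lists = {total_size: [0]}  # block size -> free offsets
        self.allocated = {}                  # offset -> block size

    def alloc(self, size):
        # Round the request up to the next power-of-two block size.
        need = self.min_block
        while need < size:
            need *= 2
        # Find the smallest free block that is large enough.
        block = need
        while block <= self.total_size and not self.free_lists.get(block):
            block *= 2
        if block > self.total_size:
            return None  # request too large or region exhausted
        offset = self.free_lists[block].pop()
        # Split repeatedly, keeping the lower half and freeing the upper half.
        while block > need:
            block //= 2
            self.free_lists.setdefault(block, []).append(offset + block)
        self.allocated[offset] = need
        return offset

    def free(self, offset):
        size = self.allocated.pop(offset)
        # Merge with the buddy while the buddy is also free.
        while size < self.total_size:
            buddy = offset ^ size
            peers = self.free_lists.get(size, [])
            if buddy not in peers:
                break
            peers.remove(buddy)
            offset = min(offset, buddy)
            size *= 2
        self.free_lists.setdefault(size, []).append(offset)
```

Merging on `free` is what keeps the region from fragmenting: once both halves of a split are released, they become one larger block again. Small resources would bypass this path and come from a fixed-size pool, as the abstract describes.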

9.

A Markov-chain-based predictive memory allocation algorithm for embedded systems

程小辉龚幼民许安明《计算机工程与设计》2013,34(8)

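To improve the efficiency of dynamic memory allocation in embedded systems, the paper analyzes classic memory allocation algorithms and the Markov chain prediction principle, and on that basis proposes a predictive memory allocation algorithm for embedded systems. The algorithm incorporates cluster analysis and uses transition-count statistics of memory allocations and their probability matrix to predict dynamic memory allocation. The implementation uses a lightweight prediction thread to predict the size of the next requested memory block, reducing the time spent waiting for memory to be created during dynamic allocation. Comparative experiments between a μC/OS-II system with the prediction thread and one without show that the algorithm is feasible and efficient.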
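The prediction step can be illustrated with a first-order Markov chain over request size classes. This is a sketch under stated assumptions, not the paper's algorithm: size classes are taken as given (the clustering step is omitted), the prediction is simply the most frequent successor of the last observed class, and the name `MarkovSizePredictor` is hypothetical.

```python
from collections import defaultdict


class MarkovSizePredictor:
    """First-order Markov predictor over allocation size classes.

    Hypothetical sketch: size classes are assumed to be precomputed.
    """

    def __init__(self):
        # counts[src][dst] = how often a request of class dst followed class src
        self.counts = defaultdict(lambda: defaultdict(int))
        self.last = None

    def observe(self, size_class):
        """Record one allocation request and update transition counts."""
        if self.last is not None:
            self.counts[self.last][size_class] += 1
        self.last = size_class

    def transition_probability(self, src, dst):
        """Estimated P(next = dst | current = src) from the counts so far."""
        row = self.counts.get(src)
        total = sum(row.values()) if row else 0
        return row.get(dst, 0) / total if total else 0.0

    def predict_next(self):
        """Most likely next size class, or None without any history."""
        row = self.counts.get(self.last)
        if not row:
            return None
        return max(row, key=row.get)
```

In a design like the one the abstract describes, a background prediction thread would call `predict_next()` after each request and prepare a block of that class ahead of time, so the next matching request does not wait for the block to be created.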

10.

Implementing multithreading in VB.NET

欧广宇邓桂英《微机发展》2004,14(11):101-103

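Multithreading is a good choice for applications that need concurrent execution and plays an irreplaceable role. The article introduces the concepts of processes, threads, and application domains and the relationships among them, and discusses Visual Basic .NET's support for multithreading. It shows how to use multithreaded programming to build multithreaded applications, covering thread creation and management, thread cancellation, thread priority, thread state, thread pools, and thread synchronization. Every thread needs resources, so creating too many threads actually degrades application performance. Multithreaded applications should be designed with care, using a sound system model, for the application to achieve its best performance.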
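The warning about creating too many threads is the usual motivation for a thread pool, which runs many tasks on a bounded number of worker threads. The article itself uses Visual Basic .NET; the sketch below shows the same idea with Python's standard `concurrent.futures.ThreadPoolExecutor` instead. The helper name `run_with_pool` is hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_with_pool(tasks, max_workers):
    """Run zero-argument callables on at most `max_workers` threads.

    Returns the results in task order and the number of distinct
    worker threads that actually executed a task.
    """
    seen = set()
    lock = threading.Lock()

    def wrapped(task):
        with lock:  # protect the shared set of thread identifiers
            seen.add(threading.get_ident())
        return task()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(wrapped, tasks))
    return results, len(seen)
```

Calling `run_with_pool([lambda i=i: i * i for i in range(20)], 4)` runs twenty tasks without ever holding more than four worker threads, so thread resources stay bounded however many tasks are submitted.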

11.

Architectural support for thread communications in multi-core processors

Sevin Varoglu Stephen Jenks 《Parallel Computing》2011,37(1):26-41

In the ongoing quest for greater computational power, efficiently exploiting parallelism is of paramount importance. Architectural trends have shifted from improving single-threaded application performance, often achieved through instruction level parallelism (ILP), to improving multithreaded application performance by supporting thread level parallelism (TLP). Thus, multi-core processors incorporating two or more cores on a single die have become ubiquitous. To achieve concurrent execution on multi-core processors, applications must be explicitly restructured to exploit parallelism, either by programmers or compilers. However, multithreaded parallel programming may introduce overhead due to communications among threads. Though some resources are shared among processor cores, current multi-core processors provide no explicit communications support for multithreaded applications that takes advantage of the proximity between cores. Currently, inter-core communications depend on cache coherence, resulting in demand-based cache line transfers with their inherent latency and overhead. In this paper, we explore two approaches to improve communications support for multithreaded applications. Prepushing is a software controlled data forwarding technique that sends data to destination’s cache before it is needed, eliminating cache misses in the destination’s cache as well as reducing the coherence traffic on the bus. Software Controlled Eviction (SCE) improves thread communications by placing shared data in shared caches so that it can be found in a much closer location than remote caches or main memory. Simulation results show significant performance improvement with the addition of these architecture optimizations to multi-core processors.

12.

Helper threads via virtual multithreading

《Micro, IEEE》2004,24(6):74-82

Memory latency dominates the performance of many applications on modern processors, despite advances in caches and prefetching techniques. Numerous prefetching techniques, both in hardware and software, try to alleviate the memory bottleneck. One such technique, known as helper threading, improves single-thread performance on a simultaneous multithreaded architecture (SMT), which shares processor resources, including caches, among logical threads. It uses otherwise idle hardware thread contexts to execute speculative threads on behalf of the main thread. Helper threading accelerates a program by exploiting a processor's multithreading capability to run assist threads. Based on the helper threading usage model, virtual multithreading (VMT), a form of switch-on-event user-level multithreading, can improve performance for real-world workloads with a wall-clock speedup of 5.0 to 38.5 percent.

13.

Chip Multithreaded Consistency Model

Zu-Song Li Dan-Dan Huan Wei-Wu Hu and Zhi-Min Tang 《计算机科学技术学报》2008,23(2):298-ver

Multithreading is the development trend of high-performance processors. The memory consistency model is essential to the correctness, performance, and complexity of a multithreaded processor. This paper proposes a chip multithreaded consistency model suited to multithreaded processors. The restrictions that chip multithreaded consistency imposes on memory event ordering are presented and formalized. Using the idea of the critical cycle developed by Wei-Wu Hu, we prove that the proposed chip multithreaded consistency model satisfies the criterion for correct execution under the sequential consistency model. The chip multithreaded consistency model offers a way to achieve higher performance than the sequential consistency model and ensures software compatibility, in that the execution result on a multithreaded processor is the same as on a uniprocessor. An implementation strategy for the chip multithreaded consistency model in the Godson-2 SMT processor is also proposed. The Godson-2 SMT processor supports the chip multithreaded consistency model correctly through an exception scheme based on the sequential memory access queue of each thread.

14.

Adaptive thread mapping strategies for transactional memory applications

Márcio Castro Luís Fabrício W. Góes Jean-François Méhaut 《Journal of Parallel and Distributed Computing》2014

Transactional Memory (TM) is a programmer-friendly alternative to traditional lock-based concurrency. Although it intends to simplify concurrent programming, the performance of applications still depends on how frequently they synchronize and the way they access shared data. These aspects must be taken into consideration if one intends to exploit the full potential of modern multicore platforms. Since these platforms feature complex memory hierarchies composed of different levels of cache, applications may suffer from memory latencies and bandwidth problems if threads are not properly placed on cores. An interesting approach to efficiently exploit the memory hierarchy is called thread mapping. However, a single fixed thread mapping cannot deliver the best performance when dealing with a large range of transactional workloads, TM systems and platforms. In this article, we propose and implement in a TM system a set of adaptive thread mapping strategies for TM applications to tackle this problem. They range from simple strategies that do not require any prior knowledge to strategies based on Machine Learning techniques. Taking the Linux default strategy as baseline, we achieved performance improvements of up to 64.4% on a set of synthetic applications and an overall performance improvement of up to 16.5% on the standard STAMP benchmark suite.

15.

A thread allocation algorithm for distributed shared memory systems (total citations: 3; self-citations: 0; citations by others: 3)

刘轶郑守淇钱德沛《计算机研究与发展》2000,37(5):521-526

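The paper discusses communication overhead and thread allocation in software-implemented multithreaded DSM, presents a scheduling model based on a thread relation graph, and on that basis proposes an iterative thread allocation algorithm. The algorithm is evaluated on a large number of thread relation graphs and implemented in a software DSM system; evaluation results for the algorithm and performance data for applications are also given.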

16.

A pointer data prefetching method based on CMP

朱会东黄永雨宋宝卫《计算机工程》2011,37(6):71-73

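To address the memory wall problem in modern computer systems, the paper proposes a data prefetching method suited to linked data structures: the pure traversal push method. Using multithreading on a chip multiprocessor (CMP) platform with a shared cache, a push thread is split off while the main program runs; it prefetches the data the main thread will need into the processor's shared cache ahead of time, hiding the main thread's memory latency. Experimental results show that the method brings some performance improvement on the CMP architecture for memory-bound programs dominated by linked structures.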

17.

Fast and flexible tracepoints in x86

Christian Harper-Cyr Michel R. Dagenais Ahmad S. Bushehri 《Software》2019,49(12):1712-1727

Tracing is often the most effective technique for analyzing the performance of complex multithreaded applications. This paper presents an improvement on existing techniques for dynamic tracepoint insertion. To add a tracepoint, the technique inserts a jump at the tracing point, possibly replacing several shorter instructions. This jump embeds trap instructions inside its offset at the address of every replaced instruction. This makes the jump thread safe if any thread is about to execute a replaced instruction. It also makes it jump safe if a jump landing pad is at one of the replaced instructions. In both cases, a trap will be raised, and the thread can be redirected to the out-of-line equivalent instruction. The use of a jump instead of a trap to execute the tracepoint improves the performance of the execution. It also adds the flexibility to place the tracepoint at almost any instruction, since multiple instructions can be replaced atomically and safely. The downside of this technique is the increased memory usage, since it requires unaligned allocations with high external fragmentation.

18.

Isolating bugs in multithreaded programs using execution suppression

Dennis Jeffrey Yan Wang Chen Tian Rajiv Gupta 《Software》2011,41(11):1259-1288

Memory-related program failures in multithreaded programs can be caused by a variety of bugs. Concurrency bugs can occur due to unexpected or incorrect thread interleavings during execution. Other kinds of memory bugs, such as buffer overflows and uninitialized reads, may also occur in multithreaded as well as single-threaded programs. Most prior techniques for isolating these bugs are specialized, addressing only one type of concurrency bug or certain types of other memory bugs. The memory corruption caused by these bugs can also undergo significant propagation during program execution. When a program failure finally occurs due to memory corruption, the true root cause of the failure may be effectively concealed as significant portions of memory may have become corrupted. We propose a general framework that can isolate the root cause of any failure in a multithreaded program that involves memory corruption and reveals at least a subset of this memory corruption. This includes three important types of concurrency bugs (data races, atomicity violations, and order violations) as well as other kinds of memory bugs. To account for propagation of memory corruption, our approach uses a dynamic technique called ‘execution suppression’ that iteratively reveals memory corruption in a failing execution to isolate the true root cause of the failure. Copyright © 2011 John Wiley & Sons, Ltd.