首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
CCNoC: Cache-Coherent Network on Chip for Chip Multiprocessors   总被引:1,自引:1,他引:0       下载免费PDF全文
As the number of cores in chip multiprocessors(CMPs) increases,cache coherence protocol has become a key issue in integration of chip multiprocessors.Supporting cache coherence protocol in large chip multiprocessors still faces three hurdles:design complexity,performance and scalability.This paper proposes Cache Coherent Network on Chip(CCNoC),a scheme that decouples cache coherency maintenance from processors and shared L2 caches and implements it completely in network on chip to free up processors and ...  相似文献   

2.
Due to economical reasons, the traditional philosophy in data centers was to scale out, rather than scaling up. However, the advances in CMP technology enabled chip multiprocessors to become more prevalent and they are expected to become more affordable and power-efficient in the coming years. Current trend towards more densely packaged systems and increasing demand for higher performance push the market towards placing datacenters on highly powerful chips that have many cores on a single platform. However, increasing the number of cores on a single chip brings along very important problems to be addressed at the chip level regarding the use of shared resources and QoS satisfaction. After briefly exploring current datacenter perspective, this paper captures the current state of the art in the field of chip multiprocessors through a detailed discussion of different studies that pave the way to the datacenters on-chip. Finally, a number of open research issues are highlighted with the intention of inspiring new contributions and developments in the field of datacenters on-chip.  相似文献   

3.
This paper evaluates asymmetric cluster chip multiprocessor (ACCMP) architectures as a mechanism to achieve the highest performance for a given power budget. ACCMPs execute serial phases of multithreaded programs on large high-performance cores whereas parallel phases are executed on a mix of large and many small simple cores. Theoretical analysis reveals a performance upper bound for symmetric multiprocessors, which is surpassed by asymmetric configurations at certain power ranges. Our emulations show that asymmetric multiprocessors can reduce power consumption by more than two thirds with similar performance compared to symmetric multiprocessors.  相似文献   

4.
多核环境下的Cache设计技术受到线延时和应用等多方面因素影响,私有和共享方案都存在各自的不足.提出了一种异构的CMP Cache结构,采用两类具有不同Cache层次的结点组成多核芯片,设计了基于间接索引的Cache容量复用等技术,提供了容量有效且访问迅速的片上存储层次.在全系统环境下对SPEC CPU2000, SPLASH2等程序的评测结果表明,异构CMP Cache结构能够适应各类应用的需要,对单进程和多线程应用平均性能提高分别可达16%和9%.异构CMP Cache同时具有硬件设计简单的特点,具有较好的工程可实现性,其设计思想将应用在未来的龙芯多核处理器设计中.  相似文献   

5.
Resource sharing in modern chip multiprocessors (multicores) provides many cost and performance benefits. However, component sharing also creates drawbacks for fault, performance, and security isolation.Thus, integration of components on a multicore chip should also be accompanied by features that help isolate effects of faults, destructive performance interference, and security breaches.  相似文献   

6.
Hybrid multiprocessor architectures which combine re-configurable computing and multiprocessors on a chip are being proposed to transcend the performance of standard multi-core parallel systems. Both fine-grained and coarse-grained parallel algorithm implementations are feasible in such hybrid frameworks.  相似文献   

7.
The Journal of Supercomputing - In chip multiprocessors, the thermal hot spot prediction is vital to take proactive thermal management decisions for mitigating its impact on the performance of the...  相似文献   

8.
Formal control techniques for power-performance management   总被引:1,自引:0,他引:1  
These techniques determine when to speed up a processor to reach performance targets and when to slow it down to save energy. They use dynamic voltage and frequency scaling to balance speed and avoid worst case frequency limitations for both multiple-clock-domain and chip multiprocessors.  相似文献   

9.
The sort operation is a core part of many critical applications (e.g., database management systems). Despite the large efforts to parallelize it, the fact that it suffers from high data-dependencies vastly limits its performance. Multithreaded architectures are emerging as the most demanding technology in leading-edge processors. These architectures include simultaneous multithreading, chip multiprocessors, and machines combining different multithreading technologies. In this paper, we analyze the memory behavior and improve the performance of the most recent parallel radix and quick integer sort algorithms on modern multithreaded architectures. We achieve speedups up to 4.69× for radix sort and up to 4.17× for quicksort on a machine with 4 multithreaded processors compared to single threaded versions, respectively. We find that since radix sort is CPU-intensive, it exhibits better results on chip multiprocessors where multiple CPUs are available. While quicksort is accomplishing speedups on all types of multithreading processers due to its ability to overlap memory miss latencies with other useful processing.  相似文献   

10.
内存竞争记录是解决多核程序执行不确定性的关键技术,然而现有点到点的内存竞争记录机制带来的硬件开销大,难以应用到实际的片上多核处理器系统中.以降低点到点内存竞争记录方式的硬件开销为出发点,为采用监听一致性协议的片上多核处理器(chip multiprocessor, CMP)系统设计了基于并发记录策略的点到点内存竞争记录算法.该记录算法将两两线程间点到点的内存竞争关系扩展到所有线程,采用分布式记录方法为每个线程记录一个由内存竞争关系的一方构成的内存竞争日志;重演时采用简化的生产者消费者模型,确保了确定性重演的实现,有效降低了硬件消耗和带宽开销.在8核处理器系统中的仿真结果表明,该并发式点到点内存竞争记录算法为每个处理器核添加硬件资源约171B,每千条内存操作指令记录日志大小约2.3B,记录和重演阶段均添加不到1.5%的带宽开销.  相似文献   

11.
半导体工艺的进步使片上可以集成更多的处理核心,对于消耗较多面积和功耗的存储单元,如何有效地减小面积、降低功耗是片上多核研究的一个重要方向。软件指令缓存技术是降低指令存储复杂性,以及降低功耗的有效方式,本文深入对比了硬件Cache结构和软件指令缓存结构,并且详细分析了两款典型的软件指令缓存结构,总结了其特点和需要解决的关键问题,为片上多核的指令存储设计提供了参考。  相似文献   

12.
Transactional Memory: An Overview   总被引:3,自引:0,他引:3  
Writing applications that benefit from the massive computational power of future multicore chip multiprocessors will not be an easy task for mainstream programmers accustomed to sequential algorithms rather than parallel ones. This article presents a survey of transactional memory, a mechanism that promises to enable scalable performance while freeing programmers from some of the burden of modifying their parallel code.  相似文献   

13.
The quest to improve performance forces designers to explore finer-grained multiprocessor machines. Ever increasing chip densities based on CMOS improvements fuel research in highly parallel chip multiprocessors with 100s of processing elements. With such increasing levels of parallelism, synchronization is set to become a major performance bottleneck and efficient support for synchronization an important design criterion. Previous research has shown that integrating support for fine-grained synchronization can have significant performance benefits compared to traditional coarse-grained synchronization. Not much progress has been made in supporting fine-grained synchronization transparently to processor nodes: a key reason perhaps why wide adoption has not followed.  相似文献   

14.
Heterogeneous chip multiprocessors   总被引:2,自引:0,他引:2  
Heterogeneous (or asymmetric) chip multiprocessors present unique opportunities for improving system throughput, reducing processor power, and mitigating Amdahl's law. On-chip heterogeneity allow the processor to better match execution resources to each application's needs and to address a much wider spectrum of system loads - from low to high thread parallelism - with high efficiency.  相似文献   

15.
Transient-fault recovery for chip multiprocessors   总被引:1,自引:0,他引:1  
Chip-level redundant threading with recovery (CRTR) for chip multiprocessors extends previous transient-fault detection schemes to provide fault recovery. To hide interprocessor latency, CRTR uses a long slack enabled by asymmetric commit and uses the trailing thread state for recovery. CRTR increases bandwidth supply by pipelining communication paths and reduces bandwidth demand by extending the dependence-based checking elision.  相似文献   

16.
The MAJC architecture enhances application performance by exploiting parallelism at multiple levels-instruction, data, thread, and process. Supporting vertical multithreading, speculative multithreading, and chip multiprocessors, the scalable VLIW architecture is also capable of advanced speculation and predication and treats all data types similarly  相似文献   

17.
We designed and evaluated a microfluidic test chip for human blood filtration and imaging label-free detection of multiple biomarkers. The microfluidic chip has a total size of 75 mm × 25 mm × 2 mm. It is realized as an assembly of a plastic chip and a functionalized photonic crystal slab on a glass substrate. The plastic chip contains capillary channels, a filter membrane and a cavity open on one side. The photonic crystal chip is bonded with an adhesive foil to the open cavity. Human blood filtration is demonstrated. We determined that fluorescently labelled particles of diameter 3 µm or larger are filtered out by the system. Refractometric measurements are performed with the test chip in combination with a compact imaging read-out system to investigate the system response to refractive index changes. Flow dynamics in the sensor cavity are imaged for replacing water by isopropanol. Finally, the binding of 500 nm biotin dissolved in phosphate buffer saline to a photonic crystal surface locally functionalized with streptavidin is demonstrated.  相似文献   

18.
This paper presents a scalable and partitionable asynchronous bus arbiter for use with chip multiprocessors and its corresponding pre-layout simulation results using VHDL. The arbiter exploits the advantage of a concurrency control instruction (Brk) provided by the micro-threaded microprocessor model to set the priority processor and move the circulated arbitration token to the most likely processor to issue the create instruction. This mechanism provides latency hiding during token circulation by decoupling the micro-threaded processor from the ring’s timing. The arbiter provides a very simple arbitration mechanism and can be used for chip multiprocessor arbitration purposes.  相似文献   

19.
Cache是高性能微处理器解决CPU和存储器速度差异问题的有效措施之一。在共享存储器的多机环境下,共享数据在多个处理器的片上Cache中分布,Cache间维持数据一致性成为关键。该文讨论了32位嵌入式微处理器“龙腾R2”的Cache的设计和实现和支持多机环境的Cache一致性实现方法,并给出了实现的结果。  相似文献   

20.
多处理器片上系统任务调度研究进展评述   总被引:9,自引:0,他引:9  
多处理器片上系统在单芯片上集成了多种指令集处理器,可完成复杂完整的功能,在图像处理、网络多媒体和嵌入式系统等应用领域前景广阔.任务映射与调度是多处理器片上系统设计的关键问题之一.介绍了多处理器片上系统的基本结构和面临的挑战,从调度算法分析和实现框架两个方面着重探讨了近年来多处理器片上系统任务调度的国内外研究进展情况,分析了当前亟待解决的问题与下一步主要的研究方向,可为多处理器片上系统相关研究提供参考.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号