Similar literature
 20 similar documents found.
1.
3D IC solutions have been developed for several different reasons: to reduce the system form factor for portable platforms; to increase system performance by alleviating the interconnect-delay bottleneck; and to manage overall system cost by stacking heterogeneous chips, rather than integrating diverse system components into a single chip through 2D scaling. However, although some 3D IC markets are emerging, and most technical issues of 3D integration are nearly solved, several thermal and production-test challenges remain as obstacles. This special issue of IEEE Design & Test takes a look at those issues.

2.
From a system architecture perspective, 3D technology can satisfy the high memory bandwidth demands that future multicore/manycore architectures require. This article presents a 3D DRAM architecture design and the potential for using 3D DRAM stacking for both L2 cache and main memory in 3D multicore architecture.

3.
3D integration is a practical solution for overcoming the problems of long and slow global wires in current and future generations of integrated circuits. This emerging technology stacks several die slices on top of each other in a single chip. It provides higher bandwidth and lower latency in the third dimension than a 2D design because inter-layer distances are much shorter. However, thermal challenges are a key impediment to stacking logic dies on top of each other. In particular, routers in a 3D network-on-chip (NoC) are a main source of thermal hotspots, limiting the potential performance gains of 3D integration. In this paper, we take advantage of the low-latency 3D vertical links to design a temperature-aware router architecture for 3D NoCs. This architecture reduces the peak temperature of routers, particularly routers that are farther from the heat sink, by balancing the traffic across all layers in a temperature-aware distributed way. In this way, a router with high temperature can borrow the link and crossbar bandwidth of the routers in the layers closer to the heat sink to forward its packets, effectively offloading part of its traffic to them to reduce its temperature. Experimental results show that the proposed method can control the temperature of 3D NoCs and reduce the temperature gradient across the network with minimal negative impact on performance, compared to a state-of-the-art 3D NoC temperature management method.
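
The offloading idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the function name `pick_forwarding_layer`, the fixed temperature threshold, and the convention that layer 0 sits closest to the heat sink are all assumptions made for illustration only.

```python
from typing import Sequence


def pick_forwarding_layer(temps_c: Sequence[float], src_layer: int,
                          threshold_c: float) -> int:
    """Choose which layer's router forwards a packet (hypothetical policy).

    temps_c[i] is the router temperature on layer i, and layer 0 is
    assumed to be the layer closest to the heat sink.
    """
    # A router that is not too hot keeps its own traffic.
    if temps_c[src_layer] <= threshold_c:
        return src_layer
    # Otherwise look only at layers closer to the heat sink (lower index)
    # whose routers are cooler than the source router.
    cooler = [layer for layer in range(src_layer)
              if temps_c[layer] < temps_c[src_layer]]
    if not cooler:
        return src_layer  # nobody to borrow bandwidth from
    # Offload to the coolest eligible router.
    return min(cooler, key=lambda layer: temps_c[layer])


# Hypothetical temperatures in degrees Celsius, layer 0 nearest the heat sink.
temps = [60.0, 70.0, 85.0, 95.0]
print(pick_forwarding_layer(temps, src_layer=3, threshold_c=80.0))
```

A real router would make this decision per packet and in a distributed way, as the abstract states; the sketch only shows the direction of the offload, from hot routers toward layers nearer the heat sink.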

4.
With the prevalence of data-centric computing, the key to achieving energy efficiency is to reduce the latency and energy cost of data movement. Near data processing (NDP) is one such technique which, instead of moving data around, moves computing closer to where data is stored. The emerging 3D stacked memory brings opportunities for achieving both high power efficiency and lower data movement overhead. In this paper, we explore power-efficient NDP architectures using the 3D stacked memory. We integrate programmable GPU streaming multiprocessors into the NDP architectures in order to fully exploit the bandwidth provided by 3D stacked memory. In addition, we study the tradeoffs between area, performance and power of the NDP components, especially the NoC designs. Our experimental results show that, compared to traditional architectures, the proposed GPU-based NDP architectures can achieve up to 43.8% reduction in EDP and 41.9% improvement in power efficiency in terms of performance-per-Watt.
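
The two metrics quoted in this abstract, energy-delay product (EDP) and performance-per-Watt, follow standard definitions that can be sketched as below. The function names and all input numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """EDP = energy x delay; lower is better."""
    return energy_joules * delay_seconds


def relative_reduction(baseline: float, proposed: float) -> float:
    """Fractional reduction of `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline


def performance_per_watt(work_units: float, seconds: float,
                         watts: float) -> float:
    """Throughput (work per second) divided by average power."""
    return (work_units / seconds) / watts


# Hypothetical measurements for a baseline and a proposed design.
baseline_edp = energy_delay_product(energy_joules=10.0, delay_seconds=2.0)
proposed_edp = energy_delay_product(energy_joules=7.5, delay_seconds=1.5)
print(f"EDP reduction: {relative_reduction(baseline_edp, proposed_edp):.1%}")
```

Because EDP multiplies energy by delay, a design that lowers both compounds the gain, which is why EDP reductions are often larger than the energy or latency reductions taken separately.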

5.
Three-dimensional (3D) Chip Multiprocessors (CMPs) have the potential to improve communication latency as well as integration density. Nevertheless, the stacked nature of the cores introduces thermal challenges that can have severe reliability consequences. In this work, we introduce a reliability-aware platform that tries to optimize power and reliability. We achieve this by integrating a power management policy that we introduced in Kdouh and El-Rewini (ISCA 22nd International Conference on Computer Applications in Industry and Engineering (CAINE-2009), 4–6 November 2009), along with a thermal management policy, as well as a temperature-aware 3D routing algorithm. The thermal management policy is responsible for respecting different correlations between the cores. As for the temperature-aware 3D routing algorithm, it has the capability to dynamically react to the thermal constraints. Furthermore, we introduce a 3D CMP architecture that accommodates our policies. The proposed platform is evaluated using multi-threaded benchmarks in an integrated power, performance, and temperature full system simulator.

6.
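Zhang Zhanpeng, Zhang Zhiguo. Computer Science (《计算机科学》), 2012, 39(106): 350-356
High-level synthesis starts from a behavioral description of a system written in a high-level programming language and moves the system's computation onto reconfigurable hardware to accelerate execution. Generating an efficient memory subsystem is especially important in high-level synthesis, particularly for data-intensive computation. This paper analyzes current FPGA high-level synthesis techniques and their memory subsystems, and classifies the generated memory subsystems architecturally into three categories: DSP-style architectures, CPU-centric architectures, and architectures based on reconfigurable memory functional units. The characteristics of each category are introduced with examples, and memory-subsystem optimization techniques are then discussed by category, following the front end and back end of the high-level synthesis flow. The analysis and evaluation indicate that problems such as the mapping between off-chip and on-chip memory and the effective modeling of programs remain to be solved, and that automated generation of memory organizations and multi-module synthesis are possible research directions.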

7.
OpenMP is an emerging industry standard for shared memory architectures. While OpenMP has advantages in its ease of use and incremental programming, message passing is still the most widely used programming model for distributed memory architectures today. How to effectively extend OpenMP to distributed memory architectures has been a hot topic. This paper proposes an OpenMP system, called KLCoMP, for distributed memory architectures. Based on the partially replicating shared arrays memory model, we propose ...

8.
Energy costs have become increasingly problematic for high performance processors, but the rising number of cores on-chip offers promising opportunities for energy reduction. Further, emerging architectures such as heterogeneous multicores present new opportunities for improved energy efficiency. While previous work has presented novel memory architectures, multithreading techniques, and data mapping strategies for reducing energy, consideration of thread generation mechanisms that take data locality into account for this purpose has been limited. This study presents methodologies for the joint partitioning of data and threads to parallelize sequential codes across an innovative heterogeneous multicore processor called the Passive/Active Multicore (PAM), reducing energy consumption from on-chip data transport and cache access components while also improving execution time. Experimental results show that the design with automatic thread partitioning offered reductions in energy-delay product (EDP) of up to 48%.

9.
Radiation-induced soft errors have become an emerging reliability threat to high performance microprocessor design. As the size of on-chip cache memory has steadily increased over the past decades, techniques for resilience against soft errors in caches are becoming increasingly important for processor reliability. However, conventional soft error resilient techniques significantly increase the access latency and energy consumption of cache memory, thereby degrading performance and energy efficiency. The emerging 3D integration technology provides an attractive advantage, as the 3D microarchitecture exhibits heterogeneous soft error resilience characteristics due to the shielding effect of die stacking. Moreover, the 3D shielding effect can offer several inner dies that are inherently invulnerable to soft errors, as they are implicitly protected by the outer dies. To exploit this invulnerability, we propose a soft error resilient 3D cache architecture in which data blocks on the soft error invulnerable dies have no protection against soft errors. Access to a data block on a soft error invulnerable die therefore incurs considerably reduced latency and energy. Furthermore, we propose to maximize accesses to the soft error invulnerable dies by dynamically moving data blocks among different dies, thereby achieving further performance and energy efficiency improvement. Simulation results show that the proposed 3D cache architecture can reduce the power consumption by up to 65% for the L1 instruction cache, 60% for the L1 data cache and 20% for the L2 cache, respectively. In general, the overall IPC performance can be improved by 5% on average.

10.
The interconnection structures in FPGA devices contribute increasingly to delay, power consumption and area overhead. The demand for even higher clock frequencies makes this problem even more important. Three-dimensional (3-D) chip stacking is touted as the silver bullet technology that can keep Moore's momentum and fuel the next wave of consumer electronics products. However, the benefits of such a new integration paradigm have not been sufficiently explored yet. In this paper, a novel 3-D architecture, as well as the supporting software tools for exploring and evaluating application implementation, are introduced. More specifically, by assigning logic and I/O resources to different layers, we achieve a notable wire-length reduction. Experimental results demonstrate the effectiveness of such a selection, since the target architectures outperform conventional 2-D FPGAs.

11.
The advantages of 3D design can be exploited by reducing the memory access time. In this article, the authors use a simulator based on analytical models to build an optimal processor-memory configuration for two designs: a graphics processor and a microprocessor. One emerging alternative approach to relieving these interconnect constraints is the use of wafer-level 3D integration, which provides a high density of high-performance, low-parasitic vertical interconnects. A wafer-level 3D design is partitionable into multiple chips connected by short vertical vias. This arrangement reduces the length of many global interconnects without introducing any logic complexity. Wafer-level 3D integration also reduces the required number of repeaters, thereby improving the area efficiency and reducing the power consumed within the interconnect network. With micron-size interwafer vias, wafer-level 3D integration allows a large memory bandwidth with little wafer area consumption. We have developed a software program that allows a first-order comparison of cache designs in 2D and 3D IC technologies. We present a first-order estimate of the performance improvements achieved by 3D implementation of cache memory, with emphasis on large caches in deep-submicron technologies.

12.
Although various forms of 3D fabrication technology have existed for a few decades, only in recent years have researchers developed highly integrated 3D design technologies that are potentially manufacturable and economically feasible. Several companies are already marketing 3D structures built by wafer stacking, where the distance between the 3D layers on a wafer is on the order of the wafer thickness. But 3D technology must also surmount several challenges, such as exploiting the design space to build high-performance systems and architectures to gain the fullest advantage achievable. The articles in this special issue highlight these advances and challenges, providing an excellent snapshot of the state of 3D technology as it stands today.

13.
In this paper we investigate architectures that combine message-passing and shared-memory technologies, hereinafter called hybrid architectures. We introduce hybrid architectures in which the large buses of the shared memory are split into a number of small high-performance shared-memory blocks, which are connected via a message-passing architecture such as a hypercube, grid or ring. In this way we avoid the performance degradation that arises because bus performance does not scale well as the number of processors connected to the bus increases. We study the saturation situations of several hybrid network architectures, where adding processors does not reduce the overall execution time. We show that the use of hybrid network architectures leads to a significant improvement in the system's price/performance ratio, by significantly improving the performance with almost no increase in system cost. Therefore, the usage of hybrid architectures demonstrates how minimal ‘cost’ spending could significantly increase the system performance. In addition, we show that different types of applications have different best hybrid architectures. Copyright © 2000 John Wiley & Sons, Ltd.

14.
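Research on Cache Resource Management Mechanisms for Chip Multiprocessors   Total citations: 2, self-citations: 1, citations by others: 1
As chip multiprocessors have become the mainstream of processor development and on-chip cache resources continue to grow, cache resource management has become a key issue for chip multiprocessors. This paper reviews research progress in cache resource management for chip multiprocessors and, according to research content, divides it into two categories: cache partitioning and cache sharing. For cache partitioning, it discusses the main components and the general form, and analyzes and compares typical cache partitioning mechanisms for chip multiprocessors. For cache sharing, it presents the main research topics and introduces and compares several mainstream cache sharing mechanisms for chip multiprocessors. The analysis concludes that page partitioning under cooperative software-hardware management should be the focus of future research on cache partitioning mechanisms, while research on cache sharing mechanisms should start from the cache behavior characteristics of the target applications.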

15.
Chip-multiprocessor (CMP) architectures are a promising design alternative to exploit the ever-increasing number of transistors that can be put on a die. To deliver high performance on applications that cannot be easily parallelized, CMPs can use additional support for speculatively executing the possibly data-dependent threads of an application. For cross-thread dependences that must be handled dynamically, the threads can be made to synchronize and communicate either at the register level or at the memory level. In the past, it has been unclear whether the higher hardware cost of register-level communication is cost-effective. In this paper, we show that the wide-issue dynamic processors that will soon populate CMPs make fast communication a requirement for high performance. Consequently, we propose an effective hardware mechanism to support communication and synchronization of registers between on-chip processors. Our scheme adds enough support to enable register-level communication without specializing the architecture much toward speculation. Finally, our scheme allows the system to achieve near-ideal performance.

16.
This article aims to present an account of state-of-the-art research in the field of integrated cognitive architectures by providing a review of six cognitive architectures, namely Soar, ACT-R, ICARUS, BDI, the subsumption architecture and CLARION. We conduct a detailed functional comparison by looking at a wide range of cognitive components, including perception, memory, goal representation, planning, problem solving, reasoning, learning, and relevance to neurobiology. In addition, we study the range of benchmarks and applications that these architectures have been applied to. Although no single cognitive architecture has provided a full solution with the level of human intelligence, important design principles have emerged, pointing to promising directions towards generic and scalable architectures with close analogy to human brains.

17.
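With reference to the DoDAF enterprise architecture design standard, and using Web services as the upper-layer integration architecture for interconnected enterprises and Midrange (mid-range) systems as the underlying integration architecture, this paper explores a new management model and an integration architecture design method that apply an SOA integration architecture to break down the interconnection barriers between the enterprise control layer and management layer, and between enterprises. Through case analysis and a study of the relationships between the systems view and services view models of DoDAF V 2.0, the paper proposes a layered framework for an SOA integration architecture for interconnected enterprises based on the Midrange platform, and demonstrates the advantages of the Midrange platform in SOA enterprise integration architectures.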

18.
Chip multiprocessors (CMPs) are promising candidates for the next generation computing platforms to utilize large numbers of gates and reduce the effects of high interconnect delays. One of the key challenges in CMP design is to balance out the often-conflicting demands. Specifically, for today’s image/video applications and systems, power consumption, memory space occupancy, area cost, and reliability are as important as performance. Therefore, a compilation framework for CMPs should consider multiple factors during the optimization process. Motivated by this observation, this paper addresses energy-aware reliability support for CMP architectures, targeting in particular array-intensive image/video applications. There are two main goals behind our compiler approach. First, we want to minimize the energy wasted in executing replicas when there is no error during execution (which should be the most frequent case in practice). Second, we want to minimize the time to recover (through the replicas) from an error when it occurs. This approach has been implemented and tested using four parallel array-based applications from the image/video processing domain. Our experimental evaluation indicates that the proposed approach saves significant energy over the case when all the replicas are run under the highest voltage/frequency level, without sacrificing any reliability over the latter.

19.
Service oriented architectures: approaches, technologies and research issues   Total citations: 15, self-citations: 0, citations by others: 15
Service-oriented architecture (SOA) is an emerging approach that addresses the requirements of loosely coupled, standards-based, and protocol-independent distributed computing. Typically, business operations running in an SOA comprise a number of invocations of these different components, often in an event-driven or asynchronous fashion that reflects the underlying business process needs. To build an SOA, a highly distributable communications and integration backbone is required. This functionality is provided by the Enterprise Service Bus (ESB), an integration platform that utilizes Web services standards to support a wide variety of communications patterns over multiple transport protocols and deliver value-added capabilities for SOA applications. This paper reviews technologies and approaches that unify the principles and concepts of SOA with those of event-based programming. The paper also focuses on the ESB and describes a range of functions that are designed to offer a manageable, standards-based SOA backbone that extends middleware functionality throughout by connecting heterogeneous components and systems and offers integration services. Finally, the paper proposes an approach to extend the conventional SOA to cater for essential ESB requirements that include capabilities such as service orchestration, “intelligent” routing, provisioning, integrity and security of messages, as well as service management. The layers in this extended SOA, in short xSOA, are used to classify research issues and current research activities.

20.
A major step towards practical use of parallel computers is the integration of cost into transformational or derivational software development methods. The difficulties with doing this come from the wide variety of parallel architectures possible and the effects of memory access and congestion phenomena. This paper presents a model of costs for uniform architectures that is compatible with refinement-based methods of development, that is much simpler than those previously suggested, but which accurately assesses the costs of an implemented computation. Decisions about architecture type and machine size can be made at any stage in the development, even at the end.
