Similar articles
 Found 20 similar articles (search time: 84 ms)
1.
Describes and analyzes the Hybrid Array Ring Processor (HARP) architecture. The HARP is an application-specific architecture built around a host processor, shared memory, and a set of memory-mapped processing cells that are connected both into an open backplane and a bidirectional systolic ring. The architecture is analyzed through detailed simulation of a system implementation based on the Texas Instruments TMS34082 floating point RISC. A bus controller is designed that provides a tightly coupled DMA function that accelerates systolic communication and supports new interleaved transparent communications and reduced overhead message passing. The architecture is benchmarked with the matrix multiplication, FFT, QRD, and SVD algorithms.

2.
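In networks-on-chip (NoCs), replacing conventional electrical signaling with optical communication to obtain low latency and low power consumption has become an emerging research direction: the optical network-on-chip (ONoC). This paper proposes a new bidirectional wavelength-routed on-chip network. The new structure routes between network nodes by examining the wavelength of the modulated optical signal, and it achieves bidirectional data transmission by sharing devices and transmission channels. Compared with a conventional electrical network, the proposed bidirectional structure reduces hardware overhead by 50% and chip area overhead by 70%, improves device utilization, lowers network transmission latency, and greatly improves network transmission performance, which is of considerable significance for optical-interconnect networks-on-chip.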

3.
To support a global virtual memory space, an architecture must translate virtual addresses dynamically. In current processors, the translation is done in a TLB (translation lookaside buffer), before or in parallel with the first-level cache access. As processor technology improves at a rapid pace and the working sets of new applications grow insatiably, the latency and bandwidth demands on the TLB are difficult to meet, especially in multiprocessor systems, which run larger applications and are plagued by the TLB consistency problem. We describe and compare five options for virtual address translation in the context of distributed shared memory (DSM) multiprocessors, including CC-NUMAs (cache-coherent non-uniform memory access architectures) and COMAs (cache-only memory access architectures). In CC-NUMAs, moving the TLB to shared memory is a bad idea because page placement, migration, and replication are all constrained by the virtual page address, which greatly affects processor node access locality. In the context of COMAs, the allocation of pages to processor nodes is not as critical because memory blocks can dynamically migrate and replicate freely among nodes. As the address translation is done deeper in the memory hierarchy, the frequency of translations drops because of the filtering effect. We also observe that the TLB is very effective when it is merged with the shared memory, because of the sharing and prefetching effects and because there is no need to maintain TLB consistency. Even though the effectiveness of the TLB merged with the shared memory is very high, we also show that the TLB can be removed in a system with address translation done in memory because the frequency of translations is very low.

4.
This paper develops a performance model of an optically interconnected parallel computer system operating in a distributed shared memory environment. The performance model is developed to reflect the impact of low level optical media access protocol and optical device switching latency on high level system performance. This enables the model to predict the performance impact of supporting distributed shared memory with different address allocation schemes and media access protocols. The passive star-coupled photonic network operates through wavelength division multiple access. Two media access protocols are examined for this WDM network; both are designed to operate in a multiple-channel multiple-access environment and require each node to possess a wavelength tunable transmitter and a fixed (or slow tunable) receiver. A semi-Markov model has been developed to study the interaction of the distributed shared memory architecture and the two access protocols of the photonic network. This analytical model has been validated by extensive simulation. The model is then used to examine the system performance with varying numbers of nodes and wavelength channels and varying memory and channel access times.

5.
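Backplane design for gigabit switched routers   Cited by: 3 (self-citations: 0, citations by others: 3)
Replacing congested, shared backplanes with switched backplanes that can transfer multiple packets simultaneously is an inevitable trend in the development of high-performance routers. This article reviews the evolution of router architecture, analyzes the requirements for switched backplanes, and examines several problems that must be solved when designing one. Finally, it presents a concrete design of a switched backplane with a throughput of 64 Gbps.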

6.
Multiprocessor system-on-chip (MP-SoC) platforms represent an emerging trend for embedded multimedia applications. To enable MP-SoC platforms, scalable communication-centric interconnect fabrics, such as networks-on-chip (NoCs), have been recently proposed. The shared memory represents one of the key elements in designing MP-SoCs to provide data exchange and synchronization support. This paper focuses on the energy/delay exploration of a distributed shared memory architecture, suitable for low-power on-chip multiprocessors based on NoC. A mechanism is proposed for the data allocation on the distributed shared memory space, dynamically managed by an on-chip hardware memory management unit (HwMMU). Moreover, the exploitation of the HwMMU primitives for the migration, replication, and compaction of shared data is discussed. Experimental results show the impact of different distributed shared memory configurations for a selected set of parallel benchmark applications from the power/performance perspective. Furthermore, a case study for a graph exploration algorithm is discussed, accounting for the effects of the core mapping and the network topology on energy and performance at the system level.

7.
Computer Networks, 2008, 52(10): 1864-1872
Optical networks will change greatly over the next 10 years. This is because, if the current growth rate is maintained, the Internet will have expanded 100–1000 times. Networked wireless appliances, such as radio frequency identification (RFID) tags and wireless sensors, are expected to greatly outnumber PCs. Such exponential changes in network capacity and terminals may lead to the emergence of post-IP networks. This paper introduces a candidate for a post-IP network called the “appliance defined ubiquitous network (ADUN)”, which supports niche ubiquitous network applications for affordable implementation. The ADUN will demand optical networks that can transport 10–100 Gbps streams, each of which requires almost the full transmission capacity of one wavelength or a wavelength group. This paper discusses directions for the functional enhancement of optical network architecture, dynamically using wavelengths for grid computing, so as to support the ADUN.

8.
In general, message passing multiprocessors suffer from communication overhead between processors and shared memory multiprocessors suffer from memory contention. Also, in computer vision tasks, data I/O overhead limits performance. In particular, high level vision tasks, which are complex and require nondeterministic communication, are strongly affected by these disadvantages. This paper proposes a flexibly (tightly/loosely) coupled hypercube multiprocessor (FCHM) for high level vision to alleviate these problems. A variable address space memory scheme is proposed in which a set of adjacent memory modules can be merged into a shared memory module by a dynamically partitionable hypercube topology. The architecture is quantitatively analyzed using computational models and simulated on Intel's Personal SuperComputer (iPSC/I), a hypercube multiprocessor. A parallel algorithm for exhaustive search is simulated on FCHM using the iPSC/I, showing significant performance improvements over that of the iPSC/I. This research was supported in part by IBM Corporation.

9.
By using current technology, it is possible to design and fabricate performance‐competitive TV‐sized AMOLED displays. In this paper, the system design considerations are described that lead to the selection of the device architecture (including a stacked white OLED‐emitting unit), the backplane technology [an amorphous Si (a‐Si) backplane with compensation for TFT degradation], and module design (for long life and low cost). The resulting AMOLED displays will meet performance and lifetime requirements, and will be manufacturing cost‐competitive for TV applications. A high‐performance 14‐in. AMOLED display was fabricated by using an in‐line OLED deposition machine to demonstrate some of these approaches. The chosen OLED technologies are scalable to larger glass substrate sizes compatible with existing a‐Si backplane fabs.

10.
The scalable coherent interface (SCI), a local or extended computer backplane interface being defined by an IEEE standard project (P1596), is discussed. The interconnection is scalable, meaning that up to 64 K processor, memory, or I/O nodes can effectively interface to a shared SCI interconnection. The SCI sharing-list structures are described, and sharing-list addition and removal are examined. Optimizations being considered to improve the performance of large system configurations are discussed. Request combining, a useful feature of linked-list coherence, is described. SCI's optional extensions, including synchronization using a queued-on-lock bit, are considered.

11.
Planar optical waveguides for applications in communication networks can be fabricated using conventional chip-manufacturing techniques. We present a planar optical waveguide technology that is based on a silicon-oxynitride (SiON) core and silicon-oxide cladding layers. In addition to more compact, conventional optical devices, it also enables enhanced optical functions such as dynamically reconfigurable planar integrated optical devices. Examples of adaptive devices realized in this technology include finite and infinite impulse response (FIR and IIR) filters. Received: 13 February 2002 / Accepted: 28 February 2002. In realizing the SiON waveguide technology and the adaptive optical filter functions with the subsystem control, the dedicated work of the waveguide process technology, the photonic device technology, and the engineering services teams at IBM's Zurich Research Laboratory was instrumental and is gratefully acknowledged. For the concept-level optical-packaging work we thank Optospeed SA. This paper was presented at the Workshop “Optical MEMS and Integrated Optics” in June 2001.

12.
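Multicore processors have become mainstream and are widely used in embedded devices. Research on how operating systems can effectively support multicore processors, both in China and abroad, has mostly targeted common multicore processors with tightly coupled shared-memory architectures, while multicore processors with special memory architectures have received little attention. For multicore processors with a memory-constrained, multi-level memory architecture, this paper proposes a single-code, multiple-data embedded multicore operating system model. Experiments show that, when applied to an eight-core DSP with a multi-level memory architecture, the model reduces code space overhead by about 80% compared with the AMP model; compared with the SMP model, it reduces the time overhead closely tied to real-time performance by a factor of about 10.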

13.
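Design and implementation of a memory model simulator   Cited by: 2 (self-citations: 1, citations by others: 1)
Memory consistency and cache coherence are the two most critical problems in shared-memory parallel computers. This work studies them quantitatively through simulation, and designs and implements a memory model simulator, MMS. Using MMS, the behavior of several memory consistency models is simulated under different parallel machine architecture models; different memory consistency models are compared for different types of computational problems, and the experimental results are analyzed; several cache coherence protocols are implemented and their performance is compared.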

14.
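The next-generation Internet must satisfy quality of service (QoS) requirements. This paper discusses a scheme for buffer queue management and memory management at the input side of next-generation Internet routers. It proposes a basic two-level queue structure together with methods for congestion control and threshold setting, so that the scheme can support multi-level QoS and dynamic memory allocation.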

15.
Conventional multiprocessors mostly use centralized, memory-based barriers to synchronize concurrent processes created in multiple processors. These centralized barriers often become the bottleneck or hot spots in the shared memory. In this paper, we overcome the difficulty by presenting a distributed and hardwired barrier architecture that is hierarchically constructed for fast synchronization in cluster-structured multiprocessors. The hierarchical architecture enables the scalability of cluster-structured multiprocessors. A special set of synchronization primitives is developed for explicit use of distributed barriers dynamically. To show the application of the hardwired barriers, we demonstrate how to synchronize Doall and Doacross loops using a limited number of hardwired barriers. Timing analysis shows an O(10^2) to O(10^5) reduction in synchronization overhead, compared with the use of software-controlled barriers implemented in a shared memory. The hardwired architecture is effective in implementing any partially ordered set of barriers or fuzzy barriers with extended synchronization regions. The versatility, scalability, programmability, and low overhead make the distributed barrier architecture attractive in constructing fine-grain, massively parallel MIMD systems using multiprocessor clusters with distributed shared memory.
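
As a rough illustration of the centralized, memory-based barrier that the abstract above names as the software baseline, the following is a minimal sketch, assuming Python threads stand in for processors. The names `CentralBarrier` and `run_phases` are hypothetical and do not come from the paper; the sketch shows a generic sense-reversing barrier, not the hardwired design the authors propose.

```python
import threading


class CentralBarrier:
    """Hypothetical centralized barrier: one shared counter, one shared sense flag."""

    def __init__(self, parties):
        self.parties = parties
        self.count = 0          # arrivals in the current episode
        self.sense = False      # flipped once per completed episode
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            my_sense = not self.sense   # value the flag takes when this episode ends
            self.count += 1
            if self.count == self.parties:
                # Last arrival resets the counter and releases everyone.
                self.count = 0
                self.sense = my_sense
                self.cond.notify_all()
            else:
                while self.sense != my_sense:
                    self.cond.wait()


def run_phases(num_threads=4, num_phases=3):
    """Run workers through several phases and return the (phase, thread) log."""
    barrier = CentralBarrier(num_threads)
    log = []
    log_lock = threading.Lock()

    def worker(tid):
        for phase in range(num_phases):
            with log_lock:
                log.append((phase, tid))
            barrier.wait()      # nobody starts phase + 1 until all finish phase

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Every arrival updates the same counter and flag, which is the kind of shared-memory hot spot the abstract describes; a hierarchical or hardware scheme would spread that traffic out.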

16.
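A distributed object database server system on the NT platform   Cited by: 4 (self-citations: 0, citations by others: 4)
FISH is a new-generation distributed object database system for supporting advanced applications such as GIS, EC, and CIMS. The system adopts many novel techniques, including DSVM (distributed shared virtual memory), persistent heaps, paged objects, transparent locking, and compact commit. This paper focuses on the overall architecture and design philosophy of the system, and in particular on the low-level techniques involved in implementing FISH on Windows NT, including memory mapping, shared memory, remote procedure calls, multithreaded connections, and page fault handling. Performance tests based on OO7 show that FISH achieves distributed execution efficiency in an NT cluster environment as high as in a distributed UNIX environment.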

17.
This paper presents a joint study of application and architecture to improve the performance and scalability of an irregular application, computing betweenness centrality, on the IBM Cyclops64 many-core architecture. The characteristics of unstructured parallelism, dynamically non-contiguous memory access, and low arithmetic intensity in betweenness centrality pose an obstacle to an efficient mapping of parallel algorithms on such many-core architectures. By identifying several key architectural features, we propose and evaluate efficient strategies for achieving scalability on a massive multi-threading many-core architecture. We demonstrate several optimization strategies including multi-grain parallelism, just-in-time locality with explicit memory hierarchy and non-preemptive thread execution, and fine-grain data synchronization. Compared with a conventional parallel algorithm, we get 4X-50X improvement in performance and 16X improvement in scalability on a 128-core IBM Cyclops64 simulator.
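
For readers unfamiliar with the kernel being mapped, the following is a minimal sequential sketch of betweenness centrality on an unweighted, undirected graph. It uses Brandes' algorithm, which is chosen here for illustration; the abstract does not say which algorithm the authors parallelize, and the function name and adjacency-dict input format are assumptions.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected graph.

    adj: dict mapping each node to a list of its neighbors (assumed symmetric).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in bc.items()}
```

The data-dependent BFS frontier and the scattered updates to `sigma` and `delta` are examples of the irregular memory access and low arithmetic intensity that the abstract mentions.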

18.
Computer Networks, 2007, 51(6): 1643-1659
This paper addresses various issues in an all-photonic network called APOSN (All-Photonic Overlaid-Star Network). By revisiting the fundamental star topology, a scalable architecture is proposed to cover widely separated nodes and to take advantage of the currently available optical technologies. An equivalent connection pattern is presented to facilitate the modeling of switching and routing operations. The proposed overlaid-star topology can support different switching techniques and reduce the complexity as well as the cost of switching and routing. We demonstrate the feasibility of such switching operations by evaluating different link resources, traffic patterns, delay constraints and other typical factors. We show that this topology is a good choice to capture both Time-Division-Multiplexing (TDM) and Optical Burst Switching (OBS) operations. Therefore, the APOSN can provide a feasible and efficient solution for high-speed networks of the near future by making use of the currently available optical technologies.

19.
Emerging non-volatile memories (e.g. STT-MRAM, OxRRAM and CBRAM) based on resistive switching are under intense research and development by both academia and industry. They provide high performance such as fast write/read speed, low power and good endurance (e.g. >10^12), and could be used as both computing and storage memories beyond flash memories. However, the conventional access architecture based on 1 transistor + 1 memory cell limits storage density, as the selection transistor must be large enough to supply sufficient current for the switching operation. This paper presents the design and analysis of a crossbar architecture based on complementary resistive switching non-volatile memory cells, with a particular focus on reliability and power performance. This architecture allows fewer selection transistors and minimum contacts between memory cells and CMOS control circuits. The complementary cell and parallel data sensing mitigate the impact of sneak currents in the crossbar architecture and provide fast data access for computing purposes. We perform transient and statistical simulations based on two memory technologies, STT-MRAM and OxRRAM, to validate the functionality of this design by using a CMOS 40 nm design kit and memory compact models, which were developed based on the relevant physics and experimental parameters.

20.
The Stanford Dash multiprocessor   Cited by: 2 (self-citations: 0, citations by others: 2)
The overall goals and major features of the directory architecture for shared memory (Dash) are presented. The fundamental premise behind the architecture is that it is possible to build a scalable high-performance machine with a single address space and coherent caches. The Dash architecture is scalable in that it achieves linear or near-linear performance growth as the number of processors increases from a few to a few thousand. This performance results from distributing the memory among processing nodes and using a network with scalable bandwidth to connect the nodes. The architecture allows shared data to be cached, significantly reducing the latency of memory accesses and yielding higher processor utilization and higher overall performance. A distributed directory-based protocol that provides cache coherence without compromising scalability is discussed in detail. The Dash prototype machine and the corresponding software support are described.
