Similar literature
 Found 20 similar documents
1.
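As a new on-chip interconnect architecture, the network-on-chip (NoC) overcomes the bottlenecks encountered in the development of systems-on-chip. However, router faults and faults on the links between routers both degrade network performance. To address this, the paper proposes a fault-tolerant NoC routing algorithm that targets path faults and local congestion. First, a path fault model for nodes separated by an intermediate node is designed; under this model, a router dynamically senses the path fault status within two hops at a small overhead. Second, a novel congestion model that reflects local network congestion more accurately is proposed to balance network traffic. Finally, when the network is fault-free, the algorithm guarantees the optimal path; when faults are present, it not only tolerates them but also keeps network performance good. Experiments show that, in the fault-free case, the proposed scheme reduces latency by 10% to 20% and raises throughput by about 25% relative to the schemes it is compared with. When faults are present, its advantage over the compared schemes is even more pronounced.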

2.
As semiconductor feature sizes shrink and the number of intellectual property (IP) cores grows, the on-chip interconnection network architecture strongly influences the performance and area of system-on-chip (SoC) designs. Focusing on the trade-off among performance, cost, and implementation, a regular network-on-chip (NoC) architecture, the mesh-connected rings (MCR) interconnection network, is proposed. The MCR topology, which combines mesh with ring, is simple, planar, and scalable. A detailed theoretical analysis of MCR and mesh is given, and a simulation analysis based on a virtual channel router with wormhole switching is also presented. Compared with the general mesh architecture, the results show that MCR has better performance, especially for local traffic and low loads, and lower cost.

3.
FABSYN: floorplan-aware bus architecture synthesis   Cited by: 1 (self-citations: 0; cited by others: 1)
As system-on-chip (SoC) designs become more complex, it is becoming harder to design communication architectures to handle the ever increasing volumes of inter-component communication. Manual traversal of the vast communication design space to synthesize a communication architecture that meets performance requirements becomes infeasible. In this paper, we address this problem by proposing an automated approach for floorplan-aware bus architecture synthesis (FABSYN) to synthesize cost-effective, bus-based communication architectures that satisfy the performance constraints in a design. Our synthesis approach incorporates a high-level floorplanning and wire delay estimation engine to evaluate the feasibility of the synthesized bus architecture and detect bus cycle time violations early in the design flow, at the system level. We present case studies of network communication SoC subsystems for which we synthesized bus architectures, detected and eliminated timing violations, and generated core placements in a matter of hours instead of several days for a manual effort.

4.
High population density leads to crowded cities. The future city is envisaged to encompass a large-scale network with diverse applications and a massive number of interconnected, heterogeneous, wireless-enabled devices. Hence, green technology elements are crucial for designing sustainable and future-proof network architectures. They address the spectrum scarcity, high latency, interference, energy efficiency, and scalability problems that occur in dense and heterogeneous wireless networks, especially in the home area network (HAN). Radio-over-fiber (ROF) is a candidate technology for providing a global view of the HAN's activities, which can be leveraged to allocate orthogonal communication channels for transmissions by wireless-enabled HAN devices, taking the clustered-frequency-reuse approach into account. Our proposed network architecture design focuses mainly on enhancing network throughput and reducing the average network communication latency by introducing a data aggregation unit (DAU). The results show that, with the DAU, the average network communication latency falls significantly while network throughput rises, compared with the existing ROF architecture without the DAU.

5.
Code-division multiple-access (CDMA) is a data transmission method based on spreading-code technology, wherein multiple data streams share the same physical medium with no interference. A novel architecture for on-chip communication networks based on this approach is devised. The proposed design allows coding resources to be shared among the network's users through dynamic assignment of spreading codes. Data transmission latency is reduced by adopting a parallel structure for the coding/decoding circuitry. A 14-node CDMA network based on the proposed architecture is synthesised using a 65 nm ST technology library. Performance analysis reveals that the proposed approach achieves significantly lower data packet latency compared to both conventional CDMA and packet switched network-on-chip implementations. Large area and power savings compared to existing approaches are also obtained.
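The interference-free sharing described in this abstract can be illustrated in software. The sketch below is not the authors' circuit: it assumes Walsh (Hadamard) spreading codes, one bit per sender, and an ideal channel that simply adds the senders' chips. The function names `walsh_codes`, `spread`, `transmit`, and `despread` are hypothetical and chosen only for illustration.

```python
def walsh_codes(order):
    """Build 2**order mutually orthogonal Walsh codes with chips in {+1, -1}."""
    codes = [[1]]
    for _ in range(order):
        # Hadamard doubling: [H H] stacked on top of [H -H].
        codes = ([row + row for row in codes]
                 + [row + [-chip for chip in row] for row in codes])
    return codes


def spread(bit, code):
    """Map a bit to +1/-1 and multiply it by every chip of the sender's code."""
    symbol = 1 if bit else -1
    return [symbol * chip for chip in code]


def transmit(bits_by_sender, codes):
    """Assumed ideal shared medium: the chip-wise sum of all spread signals."""
    signals = [spread(bit, codes[sender]) for sender, bit in bits_by_sender.items()]
    return [sum(chips) for chips in zip(*signals)]


def despread(channel, code):
    """Correlate the channel with one code; the sign recovers that sender's bit."""
    correlation = sum(sample * chip for sample, chip in zip(channel, code))
    return 1 if correlation > 0 else 0


codes = walsh_codes(2)                          # four codes of four chips each
channel = transmit({1: 1, 2: 0, 3: 1}, codes)   # three senders share one medium
print([despread(channel, codes[s]) for s in (1, 2, 3)])  # [1, 0, 1]
```

Because the codes are orthogonal, each receiver's correlation cancels every other sender's contribution. Correlating with a code that no sender used gives exactly zero, which this sketch would report as bit 0, so a real design needs a separate way to tell "idle" from "0".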

6.
Network-on-chip (NoC) has rapidly become a promising alternative for complex system-on-chip architectures, including recent multicore architectures. Additionally, optimizing NoC architectures with respect to different design objectives that are suitable for a particular application domain is crucial for achieving high-performance and energy-efficient customized solutions. Although many studies have provided solutions for different aspects of NoC design, a comprehensive NoC system solution has not yet emerged. This paper presents a novel methodology that provides a solution for complex on-chip communication problems to reduce power, latency, and area overhead. Our proposed NoC communication architecture is based on setting up virtual source–destination paths between selected pairs of NoC cores so that packets belonging to distant nodes in the network can bypass intermediate routers while traveling through these virtual paths. In this scheme, the paths are constructed for an application based on its task graph at design time. After that, a run-time scheduling mechanism is applied to improve the buffer management, virtual channel, and switch allocation schemes, and hence the constructed paths are optimized dynamically. Moreover, in our design the router complexity and its overheads are reduced. Additionally, the suggested router has been implemented on the Xilinx Virtex-5 FPGA family. The evaluation results obtained with the SPLASH-2 benchmark suite reveal that, in comparison with the conventional NoC router, the proposed router achieves 25% and 53% reductions in latency and energy, respectively, at a 3.5% area overhead. Indeed, our experimental results demonstrate a significant reduction in the average packet latency and total power consumption with negligible area overhead.

7.
Network-on-chip (NOC) is emerging as a revolutionary methodology to integrate numerous intellectual property blocks in a single die. It is the packet switching-based communications backbone that interconnects the components on multicore system-on-chip (SoC). A major challenge that NOC design is expected to face is related to the intrinsic unreliability of the interconnect infrastructure under technology limitations. By incorporating error control coding schemes along the interconnects, NOC architectures are able to provide correct functionality in the presence of different sources of transient noise and yet have lower overall energy dissipation. In this paper, designs of novel joint crosstalk avoidance and triple-error-correction/quadruple-error-detection codes are proposed, and their performance is evaluated in different NOC fabrics. It is demonstrated that the proposed codes outperform other existing coding schemes in making NOC fabrics reliable and energy efficient, with lower latency.

8.
The variable block-size motion estimation (VBSME) process accounts for a major part of the computation in an H.264 encoder and is usually accelerated by bit-parallel hardware architectures with a large I/O bit width to meet real-time constraints. However, such architectures increase the area overhead and pin count, and are therefore unsuitable for area-constrained consumer electronics designs such as small portable multimedia devices. This paper addresses this problem by proposing two area-efficient least significant bit (LSB) bit-serial architectures with small pin counts. Both designs take advantage of data reuse, in different ways, for sum of absolute differences (SAD) computation and for reading reference pixels, leading to a considerable reduction of memory bandwidth. The first architecture propagates the partial SAD and sum results and broadcasts the reference pixel rows, whereas the second design reuses the SAD of small blocks and has a reconfigurable reference buffer, leading to better memory bandwidth when using hardware parallelism. The proposed designs benefit from several optimization techniques, including an efficient serial absolute difference architecture, word length reduction by parallelism, bit truncation, mode filtering, and macroblock (MB) level subsampling, which significantly enhance their performance in terms of silicon area, throughput, latency, and power consumption. The first and second designs can support full search VBSME of 720 × 480 video at 30 frames per second (fps), with two reference frames and a [-16, 15] search range, at a clock frequency of 414 MHz with 29.28 k and 31.5 k gates, respectively.
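The idea of reusing the SADs of small blocks, mentioned in this abstract, can be sketched in a few lines. This is a plain software illustration, not the bit-serial hardware the paper proposes; it assumes an 8 × 8 region split into four 4 × 4 blocks, and the helper names `sad` and `sads_from_4x4` are hypothetical.

```python
def sad(cur, ref, top, left, height, width):
    """Sum of absolute differences over one rectangular block."""
    return sum(abs(cur[y][x] - ref[y][x])
               for y in range(top, top + height)
               for x in range(left, left + width))


def sads_from_4x4(cur, ref):
    """Compute the four 4x4 SADs once, then build larger block SADs by addition."""
    s = [[sad(cur, ref, 4 * by, 4 * bx, 4, 4) for bx in range(2)]
         for by in range(2)]
    return {
        "4x4": s,
        # Block names are width x height, as in H.264 partition naming.
        "8x4": [s[0][0] + s[0][1], s[1][0] + s[1][1]],  # top half, bottom half
        "4x8": [s[0][0] + s[1][0], s[0][1] + s[1][1]],  # left half, right half
        "8x8": s[0][0] + s[0][1] + s[1][0] + s[1][1],
    }


# Arbitrary synthetic pixel data, used only to exercise the functions.
cur = [[(3 * y + 5 * x) % 17 for x in range(8)] for y in range(8)]
ref = [[(7 * y + 2 * x) % 13 for x in range(8)] for y in range(8)]
result = sads_from_4x4(cur, ref)
print(result["8x8"] == sad(cur, ref, 0, 0, 8, 8))  # True
```

Each pixel difference is computed once, in a 4 × 4 block, and every larger partition costs only a few additions. That is the software analogue of the reuse the abstract describes.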

9.
System-on-Chip: Reuse and Integration   Cited by: 5 (self-citations: 0; cited by others: 5)
Over the past ten years, as integrated circuits became increasingly more complex and expensive, the industry began to embrace new design and reuse methodologies that are collectively referred to as system-on-chip (SoC) design. In this paper, we focus on the reuse and integration issues encountered in this paradigm shift. The reusable components, called intellectual property (IP) blocks or cores, are typically synthesizable register-transfer level (RTL) designs (often called soft cores) or layout level designs (often called hard cores). The concept of reuse can be carried out at the block, platform, or chip levels, and involves making the IP sufficiently general, configurable, or programmable, for use in a wide range of applications. The IP integration issues include connecting the computational units to the communication medium, which is moving from ad hoc bus-based approaches toward structured network-on-chip (NoC) architectures. Design-for-test methodologies are also described, along with verification issues that must be addressed when integrating reusable components.

10.
This paper presents a communication network targeted for complex system-on-chip (SoC) and network-on-chip (NoC) designs. The Heterogeneous IP Block Interconnection (HIBI) aims at maximum efficiency and minimum energy per transmitted bit combined with quality-of-service (QoS) in transfers. Other features include support for hierarchical topologies with several clock domains, flexible scalability, and runtime reconfiguration of network parameters. HIBI is intended for integrating coarse-grain components such as intellectual property (IP) blocks that are thousands of gates in size. HIBI has been implemented in VHDL and SystemC and synthesized on several CMOS technologies and on FPGA. A 32-bit wrapper requires 5400 gates and runs at 315 MHz in 0.18 μm technology, which shows that only minimal area overhead is paid for the advanced features. The area and frequency results are comparable to other NoC proposals. Furthermore, data transfers are shown to approach the maximum theoretical performance for protocol efficiency. The HIBI network is accompanied by a design framework with tools for optimizing the system through automated design space exploration.
Erno Salminen, Tampere University of Technology (TUT), Finland. Currently he is working towards his PhD degree in the Institute of Digital and Computer Systems (DCS) at TUT. His main research interests are digital systems design and communication issues in SoCs. Tero Kangas, Tampere University of Technology (TUT), Finland. Since 1999 he has been working as a research scientist in the Institute of Digital and Computer Systems (DCS) at TUT. Currently he is working towards his PhD degree, and his main research topics are system architectures and SoC design methodologies in multimedia applications. Timo D. Hämäläinen, Tampere University of Technology (TUT), Finland. He was appointed full professor at TUT/Institute of Digital and Computer Systems in 2001. He heads the DACI research group, which focuses on three main research areas: wireless sensor networks, high-performance multi-DSP and hardware based video encoding, and design flow tools for heterogeneous MP-SoC platforms. Jouni Riihimäki, Tampere University of Technology (TUT), Finland. Currently he is working as a senior design engineer at Nokia Technology Platforms. He is also working towards his PhD degree. His research interests include SoC design and verification methodologies. Vesa Lahtinen received his M.Sc. and Ph.D. from TUT in 1998 and 2004, respectively. At TUT, his main research areas were systems-on-chip and their interconnects. Currently, Dr. Lahtinen is a Senior Research Engineer in the Computing Architectures Laboratory of Nokia Research Center (NRC), concentrating on architecture modeling and, specifically, memory architectures. Kimmo Kuusilinna, Tampere University of Technology (TUT), Finland. His main research interests include system-level design and verification, on-chip interconnections, and parallel memories. Currently he is working as a senior research engineer at the Nokia Research Center.

11.
Memory and communication architecture have a significant impact on the performance, cost, and power of complex multiprocessor system-on-chip designs. In this paper, we present an automated bus matrix synthesis flow for efficient transaction-level design space exploration of the communication architecture in a reconfigurable multimedia system-on-chip platform. Specifically, we consider the hardware interface selection problem, which has a significant effect on the overall area and power cost. We propose a method that solves this hardware interface selection problem through static analysis of communication behavior. We experiment with JPEG encoder and H.264 encoder examples, and the results show that the area and power of the bus matrix are reduced by 56.91% and 48.61%, respectively, with a 0.58% performance overhead on average compared to the case of maximum performance. Using our HW interface selection algorithm, we also experiment with an MPEG4 video decoder example, and the result is evaluated on an FPGA prototyping board.

12.
Novel systolic and super-systolic architectures are presented for polynomial basis multiplication over GF(2^m) based on irreducible trinomials. By suitable cut-set retiming, we have derived here an efficient bit-level-pipelined bit-parallel systolic design for binary field multiplication which requires fewer gates and registers and involves nearly half the time-complexity of the corresponding existing design. We have also suggested a digit-level-pipelined design, which involves lower latency and fewer registers compared with the bit-level-pipelined structure. Moreover, we have proposed a super-systolic design consisting of a set of systolic arrays in a systolic pipeline, and a pipelined systolic-block design consisting of pipelined blocks of concurrent systolic arrays. The super-systolic designs have the same average computation time and the same critical path as the proposed bit-level-pipelined design, but can be used to reduce the latency by a factor of O(√m) at the cost of a marginally higher number of XOR gates and bit-registers. The hardware complexities of the proposed super-systolic designs are nearly three times that of the existing bit-parallel structures, but they offer very high throughput compared with the others for large values of m. For the field orders m = 233 and m = 409, the proposed structures offer, respectively, ten and eleven times more throughput than the others.
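For readers unfamiliar with the operation these architectures accelerate, the sketch below shows polynomial basis multiplication in GF(2^m) modulo an irreducible trinomial x^m + x^k + 1. It is a bit-serial shift-and-add routine in software, not the systolic design of the paper. The function name `gf2m_multiply` and its parameters are hypothetical, and field elements are assumed to be Python integers whose bit i holds the coefficient of x^i.

```python
def gf2m_multiply(a, b, m, k):
    """Multiply a and b in GF(2^m) modulo x^m + x^k + 1 (polynomial basis).

    Both inputs are assumed to be already reduced, i.e. smaller than 2**m.
    """
    modulus = (1 << m) | (1 << k) | 1
    product = 0
    for _ in range(m):
        if b & 1:
            product ^= a      # addition in GF(2) is XOR
        b >>= 1
        a <<= 1               # multiply the running operand by x
        if (a >> m) & 1:
            a ^= modulus      # x^m is replaced by x^k + 1
    return product


# GF(2^3) with x^3 + x + 1: x * x^2 = x^3 = x + 1, which is 0b011.
print(gf2m_multiply(0b010, 0b100, m=3, k=1))  # 3
```

The reduction step touches only two bits besides the top one, which is why trinomials are attractive for hardware. The abstract does not say which trinomials the authors used; x^233 + x^74 + 1 and x^409 + x^87 + 1 are the ones used by the NIST binary fields of those degrees and would be a natural assumption for m = 233 and m = 409.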

13.
Standard VLSI implementations of turbo decoding require substantial memory and incur a long latency, which cannot be tolerated in some applications. A parallel VLSI architecture for low-latency turbo decoding, comprising multiple single-input single-output (SISO) elements, operating jointly on one turbo-coded block, is presented and compared to sequential architectures. A parallel interleaver is essential to process multiple concurrent SISO outputs. A novel parallel interleaver and an algorithm for its design are presented, achieving the same error correction performance as the standard architecture. Latency is reduced up to 20 times and throughput for large blocks is increased up to six-fold relative to sequential decoders, using the same silicon area, and achieving a very high coding gain. The parallel architecture scales favorably: latency and throughput are improved with increased block size and chip area.

14.
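To address the network congestion caused by communication hotspots in 2D-mesh network-on-chip structures, a method for dispersing the load of communication hotspots is proposed: the network interconnect structure is adjusted locally to increase the number of routers connected to modules with heavy communication traffic, and a region-based XY-YX routing algorithm is designed. Simulation results show that the method effectively reduces communication latency and increases throughput.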
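The two dimension-ordered routes that an XY-YX scheme chooses between can be sketched as follows. The abstract does not describe how regions are defined or how a region selects an order, so the sketch only takes the order as a parameter. The function name `dimension_ordered_route` and the (x, y) coordinate convention are assumptions made for illustration.

```python
def dimension_ordered_route(src, dst, order="xy"):
    """Return the list of mesh nodes visited from src to dst, inclusive.

    "xy" finishes all hops along the X axis before moving along Y;
    "yx" does the reverse. Nodes are (x, y) tuples on a 2D mesh.
    """
    if order not in ("xy", "yx"):
        raise ValueError("order must be 'xy' or 'yx'")
    pos = list(src)
    path = [tuple(pos)]
    axes = (0, 1) if order == "xy" else (1, 0)
    for axis in axes:
        step = 1 if dst[axis] > pos[axis] else -1
        while pos[axis] != dst[axis]:
            pos[axis] += step
            path.append(tuple(pos))
    return path


print(dimension_ordered_route((0, 0), (2, 1), "xy"))
# [(0, 0), (1, 0), (2, 0), (2, 1)]
print(dimension_ordered_route((0, 0), (2, 1), "yx"))
# [(0, 0), (0, 1), (1, 1), (2, 1)]
```

Both routes have the same hop count but pass through different intermediate routers. This is what lets a scheme that picks the order per region steer traffic away from a congested node.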

15.
As technology scales toward deep submicron, the integration of complete system-on-chip (SoC) designs consisting of a large number of Intellectual Property (IP) blocks (cores) on the same silicon die is becoming technically feasible. Until recently, the design-space exploration for SoCs has been mainly focused on the computational aspects of the problem. However, as the number of IP blocks on a single chip and their performance continue to increase, a shift from computation-based to communication-based designs becomes mandatory. As a result, the communication architecture plays a major role in the area, performance, and energy consumption of the overall system [Pasricha S, Dutt N. On-chip communication architectures: system on chip interconnect. Amsterdam: Elsevier Inc.; 2008, Kim J, Verbauwhede I, Chang MCF. Design of an interconnect architecture and signaling technology for parallelism in communication. IEEE Trans VLSI Syst 2007;15(8):881-94]. This article presents the structure of a wrapper as a component of a Code Division Multiple Access (CDMA) based shared bus architecture in a SoC. Two types of wrappers can be identified, master and slave. A master wrapper is located between the arbiter and the CDMA coded physical interconnect, while a slave wrapper connects the CDMA coded bus with a memory/peripheral module. In the proposal, only bus lines that carry address and data signals are CDMA coded. We implemented a master-slave wrapper pair described in VHDL and confirmed its functionality using testbenches. We also synthesized the wrappers using Xilinx Spartan and Virtex devices to determine resource requirements with respect to the number of equivalent gates, communication bandwidth, latency, and power consumption. Specifically, we use a Design_Quality (DQ) metric for wrapper performance evaluation.
A master-slave wrapper pair occupies a reasonable area, on average 2000 equivalent gates; considering a CPU cost of about 30,000 gates, this is less than 8% hardware overhead per CPU. We also present experimental results which show that the benefits of CDMA coding relate both to decreasing the number of bus lines and to accomplishing simultaneous multiple master-slave connections at relatively low power consumption and high communication bandwidth. Convenient range indices, RW and RR, are used to determine the data transfer rate for Write and Read operations in multiprocessor bus systems that use TDMA and CDMA data transfer techniques. The obtained results show that the increased data transfer latencies introduced by CDMA data transfer are compensated by simultaneous master-slave transfers.

16.
A high-performance network architecture for a PA-RISC workstation   Cited by: 1 (self-citations: 0; cited by others: 1)
With current low-cost high-performance workstations, application-to-application throughput is limited more by host memory bandwidth than by the cost of protocol processing. Conventional network architectures are inefficient in their use of this memory bandwidth, because data is copied several times between the application and the network. As network speeds increase further, network architectures must be developed that reduce the demands on host memory bandwidth. The authors discuss the design of a single-copy network architecture, where data is copied directly between the application buffer and the network interface. Protocol processing is performed by the host, and transport layer buffering is provided on the network interface. They describe a prototype implementation for the HP Apollo Series 700 workstation family that consists of an FDDI network interface and a modified 4.3BSD TCP/IP protocol stack, and report some early results that demonstrate twice the throughput of a conventional network architecture and significantly lower latency.

17.
Technology trends are driving parallel on-chip architectures in the form of multiprocessor systems-on-a-chip (MPSoCs) and chip multiprocessors (CMPs). In these systems, the increasing on-chip communication demand among the computation elements necessitates the use of scalable, high-bandwidth network-on-chip (NoC) fabrics instead of dedicated interconnects and shared buses. As transistor feature sizes are further miniaturized, more complicated NoC architectures become feasible that can support more demanding applications. Given the myriad emerging software-hardware combinations, for cost-effectiveness, a system designer critically needs to prune this widening NoC design-space to predict the interconnect fabric(s) that best balance(s) cost/performance, before the actual design process begins. This prompted us to develop Polaris, a system-level roadmapping toolchain for on-chip interconnection networks that helps designers predict the most suitable interconnection network design(s) tailored to their performance needs and power/silicon area constraints with respect to a range of applications that the system will run. Polaris explores the plethora of NoC designs based on projections of network traffic, architectures, and process characteristics. While Polaris's toolchain is extensible so new traffic, network designs, and technology processes can be added, the current version already incorporates 7872 NoC design points. Polaris is rapid, efficiently iterating over thousands of NoC design points, while maintaining high relative and absolute accuracies when validated against detailed NoC synthesis results.

18.
Today, chip multiprocessors (CMPs) that accommodate multiple processor cores on the same chip have become a reality. As the communication complexity of such multicore systems is rapidly increasing, designing an interconnect architecture with predictable behavior is essential for proper system operation. In CMPs, general-purpose processor cores are used to run software tasks of different applications and the communication between the cores cannot be precharacterized. Designing an efficient network-on-chip (NoC)-based interconnect with predictable performance is thus a challenging task. In this paper, we address the important design issue of synthesizing the most power efficient NoC interconnect for CMPs, providing guaranteed optimum throughput and predictable performance for any application to be executed on the CMP. In our synthesis approach, we use accurate delay and power models for the network components (switches and links) that are obtained from layouts of the components using industry standard tools. The synthesis approach utilizes the floorplan knowledge of the NoC to detect timing violations on the NoC links early in the design cycle. This leads to a faster design cycle and quicker design convergence across the high-level synthesis approach and the physical implementation of the design. We validate the design flow predictability of our proposed approach by performing a layout of the NoC synthesized for a 25-core CMP. Our approach maintains the regular and predictable structure of the NoC and is applicable in practice to existing NoC architectures.

19.
The on-chip communication architecture is a primary determinant of overall performance in complex system-on-chip (SoC) designs. Since the communication requirements of SoC components can vary significantly over time, communication architectures that dynamically detect and adapt to such variations can substantially improve system performance. In this paper, we propose Flexbus, a new architecture that can efficiently adapt the logical connectivity of the communication architecture and the components connected to it. Flexbus achieves this by dynamically controlling both the communication architecture topology, as well as the mapping of SoC components to the communication architecture. This is achieved through new dynamic bridge by-pass, and component remapping techniques. In this paper, we introduce these techniques, describe how they can be realized within modern on-chip buses, and discuss policies for run-time reconfiguration of Flexbus-based architectures.

20.
This paper proposes a new three-input nodal structure within the data vortex packet switched interconnection network. With additional optical switches, the modified architecture allows two input packets, in addition to a buffered packet, to be processed simultaneously within a routing node. This allows a much higher degree of parallel processing than the previously proposed enhanced buffer node with two-input processing or the original network node with single-input processing. Unlike the previous contention prevention mechanism, the new network operates by blocking a packet within the node if no exit path is available. This eliminates the traffic control signaling and the strict timing alignment associated with the routing paths, which simplifies the overall network implementation. This study shows that both data throughput and latency are improved significantly within the new network. The study compares the three-input node with the two-input node as well as the original single-input data vortex node. Because of the additional switch count and nodal cost, networks that support the same I/O ports and have the same cost are used for a fair comparison. The limitation introduced by the blocking rate is also addressed. The study shows that, under reasonable traffic and network conditions, the blocking rate can be kept very low without introducing complex controls and management for dropped packets. As previous architectures require operation below the saturation point, the proposed architecture should also operate at a reasonable level of network redundancy to avoid excessive packet drops. This study provides guidance and criteria on the design and operation of the proposed three-input network for feasible applications. The proposed network provides an attractive alternative to the previous architectures for higher throughput and lower latency.
