Similar literature
1.
In this paper, we explore the designs of a circuit-switched router, a wormhole router, a quality-of-service (QoS) supporting virtual channel router and a speculative virtual channel router and accurately evaluate the energy-performance tradeoffs they offer. Power results from the designs placed and routed in a 90-nm CMOS process show that all the architectures dissipate significant idle state power. The additional energy required to route a packet through the router is then shown to be dominated by the data path. This leads to the key result that, if this trend continues, the use of more elaborate control can be justified and will not be immediately limited by the energy budget. A performance analysis also shows that dynamic resource allocation leads to the lowest network latencies, while static allocation may be used to meet QoS goals. Combining the power and performance figures then allows an energy-latency product to be calculated to judge the efficiency of each of the networks. The speculative virtual channel router was shown to have a very similar efficiency to the wormhole router, while providing a better performance, supporting its use for general purpose designs. Finally, area metrics are also presented to allow a comparison of implementation costs.
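The energy-latency product mentioned in this abstract can be illustrated with a minimal sketch. The abstract reports no figures, so the router names (`router_a`, `router_b`) and every number below are hypothetical placeholders chosen only to show the arithmetic; the helper names are likewise assumptions.

```python
# Hypothetical placeholder figures; none of these come from the paper.
ROUTERS = {
    "router_a": {"energy_pj": 12.0, "latency_ns": 20.0},
    "router_b": {"energy_pj": 15.0, "latency_ns": 15.0},
}


def energy_latency_product(energy_pj, latency_ns):
    """Figure of merit where lower is better: energy per packet times latency."""
    return energy_pj * latency_ns


def rank_by_efficiency(table):
    """Return router names ordered from best (lowest product) to worst."""
    return sorted(table, key=lambda name: energy_latency_product(**table[name]))
```

With these placeholder values `router_b` ranks first, because its lower latency outweighs its higher energy per packet.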

2.
The issues of applying the code-division multiple access (CDMA) technique to an on-chip packet switched communication network are discussed in this paper. A packet switched network-on-chip (NoC) that applies the CDMA technique is realized in register-transfer level (RTL) using VHDL. The realized CDMA NoC supports the globally-asynchronous locally-synchronous (GALS) communication scheme by applying both synchronous and asynchronous designs. In a packet switched NoC that applies a point-to-point connection scheme, e.g., a ring topology NoC, data transfer latency varies widely when packets are transferred to different destinations or to the same destination through different routes in the network. The CDMA NoC can eliminate the data transfer latency variations by sharing the data communication media among multiple users concurrently. A six-node GALS CDMA on-chip network is modeled and simulated. The characteristics of the CDMA NoC are examined by comparing them with the characteristics of an on-chip bidirectional ring topology network. The simulation results reveal that the data transfer latency in the CDMA NoC is a constant value for a certain length of packet and is equivalent to the best case data transfer latency in the bidirectional ring network when data path width is set to 32 bits.
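How several senders can share one medium concurrently may be easier to see in a small sketch. The abstract does not say which spreading codes the design uses; Walsh codes are assumed here purely for illustration, and all function names are hypothetical.

```python
def walsh_codes(n):
    """Return n mutually orthogonal +/-1 codes of length n (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    codes = [[1]]
    while len(codes) < n:
        # Sylvester construction: [H H; H -H]
        codes = [row + row for row in codes] + \
                [row + [-c for c in row] for row in codes]
    return codes


def encode(bit, code):
    """Spread one data bit over the chips of a sender's code."""
    symbol = 1 if bit else -1
    return [symbol * c for c in code]


def share_medium(chip_streams):
    """All senders drive the medium at once; chips add up per time slot."""
    return [sum(chips) for chips in zip(*chip_streams)]


def decode(medium, code):
    """Correlate the summed signal with one code to recover that sender's bit."""
    return sum(m * c for m, c in zip(medium, code)) > 0
```

Because the codes are orthogonal, each receiver recovers its own bit from the summed signal in the same number of chip periods regardless of how many other senders are active, which is the intuition behind the constant latency reported above.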

3.
In this paper, ultra-low-voltage circuit techniques are presented for CMOS RF frontends. By employing a complementary current-reused architecture, the RF building blocks including a low-noise amplifier (LNA) and a single-balanced down-conversion mixer can operate at a reduced supply voltage with microwatt power consumption while maintaining reasonable circuit performance at multigigahertz frequencies. Based on the MOSFET model in moderate and weak inversion, theoretical analysis and design considerations of the proposed circuit techniques are described in detail. Using a standard 0.18-μm CMOS process, prototype frontend circuits are implemented at the 5-GHz frequency band for demonstration. From the measurement results, the fully integrated LNA exhibits a gain of 9.2 dB and a noise figure of 4.5 dB at 5 GHz, while the mixer has a conversion gain of 3.2 dB and an IIP3 of -8 dBm. Operated at a supply voltage of 0.6 V, the power consumptions of the LNA and the mixer are 900 and 792 μW, respectively.

4.
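Design of node coding for network-on-chip and its application to routing   Cited by: 2 (self-citations: 2, citations by others: 0)
Network topology selection and routing algorithm design are key issues in network-on-chip design. After comparing three existing network topologies, the paper proposes a node coding method that implicitly captures the adjacency of nodes and the link relationships between them and that suits the two-dimensional torus topology. Combining this coding with the torus structure simplifies the design and implementation of the routing algorithm and improves network routing performance. Experimental results show that combining the proposed coding method with the 2-D torus topology effectively improves on-chip network communication performance.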
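The idea that a node code can make neighbours and links implicit in a 2-D torus can be sketched as follows. The abstract does not give the paper's actual encoding, so this sketch assumes plain `(x, y)` coordinates and simple dimension-order routing with wraparound; the function names are hypothetical.

```python
def neighbors(node, width, height):
    """Neighbours follow from the code itself via modular arithmetic."""
    x, y = node
    return [((x + 1) % width, y), ((x - 1) % width, y),
            (x, (y + 1) % height), (x, (y - 1) % height)]


def step_toward(src, dst, size):
    """Return +1, -1 or 0: the shorter direction around a ring of `size` nodes."""
    forward = (dst - src) % size
    if forward == 0:
        return 0
    return 1 if forward <= size - forward else -1


def route(src, dst, width, height):
    """Dimension-order route: resolve x first, then y, using wraparound links."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x = (x + step_toward(x, dst[0], width)) % width
        path.append((x, y))
    while y != dst[1]:
        y = (y + step_toward(y, dst[1], height)) % height
        path.append((x, y))
    return path
```

On a 4 x 4 torus, `route((0, 0), (3, 1), 4, 4)` takes the wraparound link from `x = 0` to `x = 3`, so it needs two hops rather than the four a mesh without wraparound links would require.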

5.
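A weighted myriad filter based on a hybrid genetic algorithm   Cited by: 1 (self-citations: 0, citations by others: 1)
Yang Jun (杨军), Ma Xiaoyan (马晓岩), Wan Shanhu (万山虎). Acta Electronica Sinica (《电子学报》), 2003, 31(12): 1807-1810
This paper proposes a hybrid genetic algorithm that operates directly on floating-point values, analyzes its characteristics, and applies it to implement a weighted myriad filter; that is, the hybrid genetic algorithm is used to estimate the weights of the weighted myriad filter and to compute its filtered output. Comparative simulation results show that, because all chromosomes of the hybrid genetic algorithm converge quickly to the global optimum, the method performs markedly better than other methods in all cases.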
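For readers unfamiliar with the weighted myriad, its output is commonly defined as the value of beta that minimizes the sum over i of log(K^2 + w_i (x_i - beta)^2). The sketch below evaluates that definition by brute-force grid search; it is not the hybrid genetic algorithm from the paper, and the function names, grid resolution, and sample data are assumptions.

```python
import math


def myriad_cost(beta, samples, weights, k):
    """Weighted myriad objective for a candidate output beta (k must be > 0)."""
    return sum(math.log(k * k + w * (x - beta) ** 2)
               for x, w in zip(samples, weights))


def weighted_myriad(samples, weights, k, steps=2001):
    """Approximate the minimizer on a uniform grid over [min, max] of the samples."""
    lo, hi = min(samples), max(samples)
    if lo == hi:
        return lo
    candidates = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(candidates, key=lambda b: myriad_cost(b, samples, weights, k))
```

For the window `[1.0, 1.1, 0.9, 1.0, 50.0]` with unit weights and `k = 1.0`, the result stays close to 1.0 despite the outlier at 50.0, whereas the sample mean would be pulled to about 10.8. This robustness to impulsive samples is why the filter weights are worth optimizing.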

6.
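This paper proposes a design method for a GALS interface based on a handshake protocol. The interface uses an asynchronous FIFO as its input buffer, which effectively reduces data transfer latency, and manages the buffer as a ring buffer, which makes the interface scalable. FPGA verification shows that the interface ensures correct asynchronous transfers between the adaptation unit and the network router; a 4-channel interface occupies 405 ALUTs (Adaptive Look-Up Tables) and supports a clock frequency of 211 MHz.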

7.
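Based on an analysis of pipelined and single-cycle network-on-chip router architectures, a low-latency network-on-chip router design is proposed and verified by tape-out in the SMIC 0.13 μm mixed-signal/RF 1.2 V/3.3 V process. Chip test results show that the router operates at a clock frequency of 300 MHz and, under the same load, delivers packets with lower latency than routers of other architectures.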

8.
In this paper, we consider the problem of synthesizing custom networks-on-chip (NoC) architectures that are optimized for a given application. We consider both unicast and multicast traffic flows in the input specification. Multicast traffic flows are used in a variety of applications, and their direct support with only replication of packets at optimal bifurcation points rather than full end-to-end replication can significantly reduce network contention and resource requirements. Our problem formulation is based on the decomposition of the problem into the inter-related steps of finding good flow partitions, deriving a good physical network topology for each group in the partition, and providing an optimized network implementation for the derived topologies. Our solutions may be comprised of multiple custom networks, each interconnecting a subset of communicating modules. We propose several algorithms that can systematically examine different flow partitions, and we propose Rectilinear–Steiner-Tree (RST)-based algorithms for generating efficient network topologies. Our design flow integrates floorplanning, and our solutions consider deadlock-free routing. Experimental results on a variety of NoC benchmarks showed that our synthesis results can on average achieve a 4.82 times reduction in power consumption over different mesh implementations on unicast benchmarks and a 1.92 times reduction in power consumption on multicast benchmarks. Significant improvements in performance were also achieved, with an average of 2.92 times reduction in hop count on unicast benchmarks and 1.82 times reduction in hop count on multicast benchmarks. To further gauge the effectiveness of our heuristic algorithms, we also implemented an exact algorithm that enumerates all distinct set partitions. For the benchmarks where exact results could be obtained, our algorithms on average can achieve results within 3% of exact results, but with much shorter execution times.
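The exact baseline described in this abstract enumerates all distinct set partitions of the traffic flows. A minimal sketch of such an enumeration follows; it is a generic recursive generator, not the authors' implementation, and the flow names in the usage note are hypothetical.

```python
def set_partitions(items):
    """Yield every partition of the list `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing group in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a new group of its own.
        yield [[first]] + partition
```

Calling `list(set_partitions(["f1", "f2", "f3"]))` yields 5 partitions, and four flows yield 15. The count follows the Bell numbers and grows very quickly, which is consistent with the abstract's remark that exact results were obtainable only for some benchmarks.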

9.
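For distributed SAR satellite systems consisting of three or more satellites, this paper proposes a configuration optimization design method based on an improved particle swarm algorithm. First, a method for determining the deviations of the satellites' mean orbital elements is given, and a general procedure for the configuration design of distributed SAR satellite systems is established. To complete mission-driven configuration design accurately and quickly, a configuration optimization design method based on the SM-PSO algorithm is proposed. Finally, the system configuration is optimized with configuration stability as the objective function and with the effective baseline range, operating time, and ground coverage swath as constraints. Simulation results show that the configuration parameters obtained by this method satisfy the system's coverage requirements well and allow the system to maintain a relatively stable effective baseline during long-term operation.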

10.
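The design of the virtual channel controller is the key to implementing virtual channel technology. Based on FPGA implementation technology, a virtual channel controller suitable for three topologies (mesh, half-ring, and ring) is designed and implemented in the VHDL hardware description language. The controller reaches an operating frequency of 689 MHz while occupying only 1% of the FPGA resources, making it an efficient virtual channel controller design.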

11.
We propose two novel wavelength-division-multiplexed passive-optical-network (WDM-PON) architectures where subcarriers are employed to transmit downstream data and optical carriers are reused for upstream transmission. Architecture I is designed for the situation where two short distribution fibers are available between the remote node (RN) and each optical network unit (ONU), whereas Architecture II is devised for the case where there is only one distribution fiber between the RN and each ONU. Both architectures use only one interferometric filter located at the RN to simultaneously separate all downlink optical carriers and subcarriers, leading to a considerable cost reduction in the implementation of the WDM-PONs. Separated optical carriers are then reused and injected into reflective semiconductor optical amplifiers as the uplink light sources, which eliminates the necessity of specific wavelength sources at the ONUs. The downstream subcarrier signals are directly detected using baseband receivers. Two multichannel upstream and downstream transmission experiments are carried out at 1.25 Gb/s using the proposed schemes. The impact of optical carrier-to-subcarrier ratio of downlink signal, Rayleigh-backscattering noise, and wavelength mismatch between laser source and filter on system performance is also investigated.

12.
Dynamically reconfigurable hardware has already been deployed for accelerating computationally demanding applications. Some of these hardware architectures allow run time reconfiguration but this usually leads to a large reconfiguration overhead. The advantage of run time reconfiguration is that it allows new algorithmic solutions for many applications. To study the potential of frequent run time reconfiguration it is interesting to investigate its costs and benefits from an abstract point of view and to develop new architectural concepts. Multi-level reconfigurable architectures are one such concept that introduces several levels of reconfiguration. This paper deals with new types of multi-level reconfigurable architectures. The corresponding problem of finding the best granularity for different reconfiguration levels is formulated and investigated. Although this problem is shown to be NP-complete, an interesting restricted subcase is solved optimally in polynomial time. For the general case, a good heuristic is proposed that is based on solutions for the restricted case. Results on three example applications show that the reconfiguration cost can be reduced with the new architectures. Based on a proposed measure of relative efficiency it is also shown that the new architectures are more efficient so that they obtain a larger reconfiguration cost reduction with less additional hardware.
Martin Middendorf

13.
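Faults in the input ports and crossbar switches of routers in a three-dimensional network-on-chip seriously degrade the performance of the whole network, so this paper proposes a fault- and congestion-aware fault-tolerant router. By adding one redundant input port and a bypass bus, the design not only tolerates faults in the input ports and the crossbar but also uses the redundant port to relieve congestion effectively when no port is faulty. Experiments show that this fault-tolerance mechanism keeps network performance good even when many routers are faulty and congestion is severe.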

14.
A network-on-chip (NoC) based parallel processor is presented for bio-inspired real-time object recognition with visual attention algorithm. It contains an ARM10-compatible 32-bit main processor, 8 single-instruction multiple-data (SIMD) clusters with 8 processing elements in each cluster, a cellular neural network based visual attention engine (VAE), a matching accelerator, and a DMA-like external interface. The VAE with 2-D shift register array finds salient objects on the entire image rapidly. Then, the parallel processor performs further detailed image processing within only the pre-selected attention regions. The low-latency NoC employs dual channel, adaptive switching and packet-based power management, providing 76.8 GB/s aggregated bandwidth. The 36 mm² chip contains 1.9 M gates and 226 kB SRAM in a 0.13 μm 8-metal CMOS technology. The fabricated chip achieves a peak performance of 125 GOPS and 22 frames/sec object recognition while dissipating 583 mW at 1.2 V.

15.
We propose a technique that enhances the performance of Torrieri and Bakhru's maximin algorithm in the presence of frequency offsets and modulated interference, a common scenario in multiple-access environments in which we observed the maximin algorithm to suffer performance degradation. To combat modulated interference and fading effects, we propose updating the step length at every update of the smart antenna weight vector. We found that this scheme improves the robustness of the maximin algorithm. Simulation results for a frequency-hopping system confirm the validity of the approach and demonstrate improvements in the performance of the maximin algorithm in the presence of frequency offsets and data-modulated interference in a fading channel environment.

16.
17.
In this paper, we propose an improvement of the normalized min-sum (MS) decoding algorithm and novel MS decoder architectures with reduced word length using nonuniform quantization schemes for low-density parity-check (LDPC) codes. The proposed normalized MS algorithm introduces a more exact adjustment with two optimized correction factors for check-node-updating computations, while the conventional normalized MS algorithm applies only one correction factor. The proposed algorithm provides a significant performance gain without any additional computation or hardware complexity. The finite word-length analysis in implementing an LDPC decoder is a very important factor since it directly impacts the size of memory to store the intrinsic and extrinsic messages and the overall hardware area in the partially parallel LDPC decoder. The proposed nonuniform quantization scheme can reduce the finite word length while achieving similar performances compared to a conventional quantization scheme. From simulation results, it is shown that the proposed 4-bit nonuniform quantization scheme achieves an acceptable decoding performance, unlike the conventional 4-bit uniform quantization scheme. Finally, the proposed MS decoder architectures by the nonuniform quantization scheme provide significant reductions of 20% and up to 8% for the memory area and combinational logic area, respectively, compared to the conventional 5-bit ones.

18.
Vector processors (supercomputers) can be effectively employed in MIC or MMIC applications to solve problems of large numerical size such as broad-band nonlinear design or statistical design (yield optimization). To fully exploit the capabilities of vector hardware, the program architecture must be structured accordingly. This paper presents a possible approach to the "semantic" vectorization of microwave circuit design software. Speed-up factors of the order of 50 can be obtained on a typical vector processor (Cray X-MP), with respect to the most powerful scalar computers (CDC 7600), with cost reductions of more than one order of magnitude. This could broaden the horizon of microwave CAD techniques to include problems that are practically out of the reach of conventional systems.

19.
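Based on the I-V characteristics of the SET and the complementary nature of SETs and MOS transistors, and following the design approach of MOS logic circuits, a hybrid SET/MOS inverter is first proposed, a NOR gate is then derived from it, and finally a unique-address decoder is implemented. Compared with a circuit built from SETs alone, the hybrid SET/MOS circuit has a stronger load-driving capability; compared with a circuit built from MOS transistors alone, it likewise needs only a single power supply, uses fewer components, and has greatly reduced static power consumption. Simulation results verify the correctness of the circuit design.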

20.
A queue typically connects a producer and a consumer and improves the overall performance by smoothing irregular production and consumption over time. In this paper, we introduce so-called multiple-input–multiple-output (MIMO) queues, connecting $N_{P}$ producers with $N_{C}$ consumers, that are symmetric, scalable, and have a high throughput. MIMO queues can be used to perform fine-grained load balancing and are proposed as key building blocks for variation-tolerant architectures. We present and analyze a family of asynchronous MIMO queues.
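The paper concerns asynchronous hardware queues, but the producer/consumer structure can be shown with a loose software analogy. The sketch below uses Python threads and the standard `queue.Queue`; it is not a model of the proposed circuits, and the function and variable names are hypothetical.

```python
import queue
import threading


def run_mimo(n_producers, n_consumers, items_per_producer):
    """Connect several producers to several consumers through one shared queue."""
    shared = queue.Queue()
    consumed = [[] for _ in range(n_consumers)]

    def producer(pid):
        for k in range(items_per_producer):
            shared.put((pid, k))

    def consumer(cid):
        while True:
            item = shared.get()
            if item is None:  # sentinel: no more work
                break
            consumed[cid].append(item)

    producers = [threading.Thread(target=producer, args=(p,))
                 for p in range(n_producers)]
    consumers = [threading.Thread(target=consumer, args=(c,))
                 for c in range(n_consumers)]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()
    for _ in consumers:
        shared.put(None)  # one sentinel per consumer
    for t in consumers:
        t.join()
    return consumed
```

Whichever consumer is free takes the next item, so work is balanced at the granularity of single items. How the items split between consumers varies from run to run, but every item is delivered exactly once.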
