Similar articles
 20 similar articles found (search time: 46 ms)
1.
Memory and communication architecture have a significant impact on the performance, cost, and power of complex multiprocessor system-on-chip designs. In this paper, we present an automated bus matrix synthesis flow for efficient transaction-level design space exploration of communication architecture in a reconfigurable multimedia system-on-chip platform. Specifically, we consider the hardware interface selection problem, which has a significant effect on overall area and power cost. We propose a method to solve this problem through static analysis of communication behavior. Experiments with JPEG encoder and H.264 encoder examples show that, compared with the maximum-performance case, the bus matrix area is reduced by 56.91% and power by 48.61%, with 0.58% performance overhead on average. We also apply our HW interface selection algorithm to an MPEG4 video decoder example and evaluate the result on an FPGA prototyping board.

2.
On-chip communication architectures have a significant impact on the power consumption and performance of emerging chip multiprocessor (CMP) applications. However, customization of such architectures for an application requires the exploration of a large design space. Designers need tools to rapidly explore and evaluate relevant communication architecture configurations exhibiting diverse power and performance characteristics. In this paper, we present an automated framework for fast system-level, application-specific, power–performance tradeoffs in a bus matrix communication architecture synthesis (CAPPS). Our study makes two specific contributions. First, we develop energy models for system-level exploration of bus matrix communication architectures. Second, we incorporate these models into a bus matrix synthesis flow that enables designers to efficiently explore the power–performance design space of different bus matrix configurations. Experimental results show that our energy macromodels incur less than 5% average cycle energy error across 180–65 nm technology libraries. Our early system-level power estimation approach also shows a significant speedup ranging from $1000\times$ to $2000\times$ when compared with detailed gate-level power estimation. Furthermore, on applying our synthesis framework to three industrial networking CMP applications, a tradeoff space that exhibits up to 20% variation in power and up to 40% variation in performance is generated, demonstrating the usefulness of our approach.

3.
A detailed study on the performance analysis and optimum design of an integrated front-end PIN/HBT photoreceiver for fiber-optic communication is presented. Receiver circuits with two different transimpedance amplifiers, a single-stage common-emitter (CE) amplifier and a three-stage amplifier comprising a CE amplifier and two emitter followers (EFs), are analyzed assuming a standard load of $50\,\Omega$. A technique to include the transit-time effect of a PIN photodetector in the overall receiver circuit analysis is introduced and discussed. Gain-bandwidth product (GB) and gain-bandwidth-sensitivity measure product (GBS) are obtained as functions of feedback resistance ($R_F$) and various device parameters. Hence, some optimum designs are suggested using a photodetector of area $100\,\mu\mathrm{m}^2$ and a feedback resistance of $500\,\Omega$. The bandwidth plays a major role in determining the optimum designs for maximum GB and maximum GBS. A bandwidth >8 GHz has been obtained for the photoreceiver even with a single-stage CE amplifier. The optimum design for a receiver with a three-stage amplifier shows a bandwidth of 35 GHz, which is suitable for receivers operating well beyond 40 Gb/s; however, in this case, the gain is reduced. The performance of different fixed square-emitter structures is investigated to choose the optimum designs corresponding to different gains. Very low power dissipation has been estimated for the optimized devices. The noise performance of the devices with optimum designs was calculated in terms of the minimum detectable optical power for a fixed bit-error rate of $10^{-9}$. The present design indicates that GB and noise performance can be improved by using an optimum device design.

4.
For many years, discrete gate sizing has been widely used for timing and power optimization in VLSI designs. The importance of gate sizing optimization has been emphasized by academia for many years, especially since the 2012/2013 ISPD gate sizing contests [1, 2]. These contests have provided practical impetus to academic sizers through the use of realistic constraints and benchmark formats. At the same time, due to simplified delay/power Liberty models and timing constraints, the contests fail to address real-world criteria for gate sizing that are highly challenging in practice. We observe that lack of consideration of practical issues such as electrical and multi-corner constraints – along with limited sets of benchmarks – can misguide the development of contest-focused academic sizers. Thus, we study implications of the “gap” between academic sizers and product design use cases. In this paper, we note important constraints of modern industrial designs that are generally not comprehended by academic sizers. We also point out that various optimization techniques used in academic sizers can fail to offer benefits in product design contexts due to differences in the underlying optimization formulation and constraints. To address this gap, we develop a new robust academic sizer, Sizer, from a fresh implementation of Trident [3]. Experimental results show that Sizer is able to achieve up to 10% leakage power and 4% total power reductions compared to leading commercial tools on designs implemented with foundry technologies, and 7% leakage power reduction on a modern industrial design in the multi-corner multi-mode (MCMM) context.

5.
In this paper, we propose a system-level design methodology for the efficient exploration of the architectural parameters of the memory sub-systems, from the energy-delay joint perspective. The aim is to find the best configuration of the memory hierarchy without performing the exhaustive analysis of the parameters space. The target system architecture includes the processor, separated instruction and data caches, the main memory, and the system buses. To achieve a fast convergence toward the near-optimal configuration, the proposed methodology adopts an iterative local-search algorithm based on the sensitivity analysis of the cost function with respect to the tuning parameters of the memory sub-system architecture. The exploration strategy is based on the Energy-Delay Product (EDP) metric taking into consideration both performance and energy constraints. The effectiveness of the proposed methodology has been demonstrated through the design space exploration of a real-world case study: the optimization of the memory hierarchy of a MicroSPARC2-based system executing the set of Mediabench benchmarks for multimedia applications. Experimental results have shown an optimization speedup of 2 orders of magnitude with respect to the full search approach, while the near-optimal system-level configuration is characterized by a distance from the optimal full search configuration in the range of 2%.
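
The idea of a sensitivity-driven local search over an Energy-Delay Product cost can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the parameter lists `SIZES_KB` and `WAYS`, the cost function `toy_edp`, and all of its coefficients are hypothetical stand-ins for a real simulator. The search simply probes each parameter one step up and down, moves to the neighbour with the lowest cost, and stops at a local minimum.

```python
from typing import Callable, List, Sequence, Tuple

Config = Tuple[int, ...]  # one index per tuning parameter

# Hypothetical design space: cache size (KB) and associativity.
SIZES_KB: List[int] = [1, 2, 4, 8, 16, 32, 64]
WAYS: List[int] = [1, 2, 4, 8]
SPACE: List[Sequence[int]] = [SIZES_KB, WAYS]


def toy_edp(cfg: Config) -> float:
    """Hypothetical energy-delay product; a real flow would call a simulator."""
    size_kb, ways = SIZES_KB[cfg[0]], WAYS[cfg[1]]
    miss_rate = 0.10 / (size_kb ** 0.5 * ways ** 0.25)
    delay = 1.0 + 20.0 * miss_rate
    energy = 0.5 + 0.02 * size_kb + 0.05 * ways + 5.0 * miss_rate
    return energy * delay


def local_search(space: List[Sequence[int]],
                 cost: Callable[[Config], float],
                 start: Config) -> Tuple[Config, float]:
    """Steepest-descent search: move to the best +/-1 neighbour until none improves."""
    current, current_cost = start, cost(start)
    while True:
        best, best_cost = current, current_cost
        for dim in range(len(space)):
            for step in (-1, 1):
                idx = current[dim] + step
                if 0 <= idx < len(space[dim]):
                    cand = current[:dim] + (idx,) + current[dim + 1:]
                    cand_cost = cost(cand)
                    if cand_cost < best_cost:
                        best, best_cost = cand, cand_cost
        if best == current:  # no neighbour is better: local minimum reached
            return current, current_cost
        current, current_cost = best, best_cost


if __name__ == "__main__":
    cfg, edp = local_search(SPACE, toy_edp, (0, 0))
    print(SIZES_KB[cfg[0]], WAYS[cfg[1]], edp)
```

Because each step strictly lowers the cost and the space is finite, the loop terminates, but it only guarantees a local minimum; that is consistent with the abstract's "near-optimal" wording rather than a claim of global optimality.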

6.
Relative timing (RT) is introduced as a method for asynchronous design. Timing requirements of a circuit are made explicit using relative timing. Timing can be directly added, removed, and optimized using this style. RT synthesis and verification are demonstrated on three example circuits, facilitating transformations from speed-independent circuits to burst-mode and pulse-mode circuits. Relative timing enables improved performance, area, power, and functional testability of up to a factor of $3\times$ in all three cases. This method is the foundation of optimized timed circuit designs used in an industrial test chip, and may be formalized and automated.

7.
Typical design flows supporting the software development for multiprocessor systems are based on a board support package and high-level programming interfaces. These software design flows fail to support critical design activities, such as design space exploration or software synthesis. One can observe, however, that design flows based on a formal model of computation can overcome these limitations. In this article, we analyze the major challenges in multiprocessor software development and present a taxonomy of software design flows based on this analysis. Afterwards, we focus on design flows based on the Kahn process network (KPN) model of computation and elaborate on corresponding design automation techniques. We argue that the productivity of software developers and the quality of designs could be considerably increased by making use of these techniques.

8.
We propose an integrated framework for the design of SOC test solutions, which includes a set of algorithms for early design space exploration as well as extensive optimization for the final solution. The framework deals with test scheduling, test access mechanism design, test sets selection, and test resource placement. Our approach minimizes the test application time and the cost of the test access mechanism while considering constraints on tests and power consumption. The main feature of our approach is that it provides an integrated design environment to treat several different tasks at the same time, which were traditionally dealt with as separate problems. We have made an implementation of the proposed heuristic used for the early design space exploration and an implementation based on Simulated Annealing for the extensive optimization. Experiments on several benchmarks and industrial designs show the usefulness and efficiency of our approach.

9.
Two new domino structures with improved logic evaluation accelerators and output drivers are proposed. They achieve ~30% performance improvement with a negligible power increase and a small area penalty.

10.
Chang, Y.-J. Electronics Letters, 2009, 45(6): 300-302
A leakage-suppressed ternary content-addressable memory (TCAM) cell design is introduced, in which 'don't care' information is used to minimise the leakage power dissipated in the prefix CAM. Measurements based on 90 nm process technology show that, without any performance penalty, the design can deliver a leakage power reduction of 18%.

11.
The main motivation of this paper is the lack of a high-level design flow for managing partial dynamic reconfiguration of field-programmable gate arrays (FPGAs). Our contribution is a high-level add-on methodology for the Xilinx design flow for dynamic partial reconfiguration (DPR). The main objective is to give an abstract view of the developed application in order to facilitate the designer's task. The suggested design flow offers an application-centric view of dynamic reconfiguration designs, which simplifies the optimisation and generation of such designs. A new formulation of the reconfigurable modules' mapping process is put forward. This allows a design space exploration to find a suitable number of reconfigurable regions and their sizes as well as the reconfiguration sequence. A new tool is proposed to support our methodology by allowing graphical models of the developed application to be created and synthesised. We introduce a new block diagram to represent the application and a sequence model that can be used for design optimisations. To validate the proposed DPR design environment, two application examples are given at the end of the paper. They demonstrate the usefulness of the suggested models and methods.

12.
Balanced phase-locked loops for optical homodyne receivers are investigated. When a balanced loop is employed in a communications system, a part of the transmitter power must be used for unmodulated residual carrier transmission. This leads to a power penalty. In addition, the performance of the balanced loops is affected by the laser phase noise, by the shot noise, and by the crosstalk between the data-detection and phase-lock branches of the receiver. The impact of these interferences is minimized if the loop bandwidth $B$ is optimized. The value of $B_{opt}$ and the corresponding optimum loop performance are evaluated in this paper. Further, the maximum permissible laser linewidth $\delta\nu$ is evaluated and found to be $5.9 \times 10^{-6} \times R_b$, where $R_b$ (bit/s) is the system bit rate. This number corresponds to $\mathrm{BER} = 10^{-10}$ and a power penalty of 1 dB (0.5 dB due to residual carrier transmission, and 0.5 dB due to imperfect carrier phase recovery). For comparison, decision-driven phase-locked loops require only $\delta\nu = 3.1 \times 10^{-4} \times R_b$. Thus, balanced loops impose more stringent requirements on the laser linewidth than decision-driven loops, but have the advantage of simpler implementation. An important additional advantage of balanced loops is their capability to suppress the excess intensity noise of semiconductor lasers.
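
The linewidth limits quoted above reduce to a one-line scaling with bit rate, which the following Python sketch works through. The two ratios are taken from the abstract; the 1 Gbit/s bit rate and the names `max_linewidth_hz`, `BALANCED_RATIO`, and `DECISION_DRIVEN_RATIO` are assumptions made only for illustration.

```python
# Linewidth-to-bit-rate ratios quoted in the abstract
# (BER = 1e-10, 1 dB total power penalty).
BALANCED_RATIO = 5.9e-6         # balanced phase-locked loop
DECISION_DRIVEN_RATIO = 3.1e-4  # decision-driven phase-locked loop


def max_linewidth_hz(bit_rate_bps: float, ratio: float) -> float:
    """Return the maximum permissible laser linewidth in Hz for a bit rate in bit/s."""
    if bit_rate_bps <= 0:
        raise ValueError("bit rate must be positive")
    return ratio * bit_rate_bps


if __name__ == "__main__":
    rb = 1e9  # hypothetical 1 Gbit/s link, not a value from the paper
    print("balanced loop:       ", max_linewidth_hz(rb, BALANCED_RATIO), "Hz")
    print("decision-driven loop:", max_linewidth_hz(rb, DECISION_DRIVEN_RATIO), "Hz")
```

At the assumed 1 Gbit/s this gives roughly 5.9 kHz for the balanced loop against roughly 310 kHz for the decision-driven loop, a factor of about 50, which is the "more stringent requirement" the abstract refers to.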

13.
A Carry-Select Adder (CSA) is one of the most suitable adders for high-speed applications, but its power and area penalties are greater because it requires a double Ripple-Carry Adder (RCA) structure corresponding to carry inputs 0 and 1. Current low-power and low-area techniques are not suitable for standard cell-based design, which is one of the most widely adopted design methodologies. Our work proposes two simple optimised architectures suitable for standard cell-based designs. A simple decision logic that replaces the RCA for carry input 1 in a conventional CSA is proposed. One of the proposed architectures reduces power and area significantly with a small delay penalty compared to the existing techniques. The other proposed architecture improves the speed of operation and reduces the power and area considerably. The first one is more suitable for high-speed arithmetic in battery-operated applications where there is a trade-off between speed and power, while the other one is suitable for high-performance applications which also require area and power optimisation. The proposed architectures were implemented in TSMC 0.18-$\mu$m CMOS technology, and compared with conventional Square Root Carry-Select Adders and an existing standard cell-based design.
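
The conventional carry-select structure that this abstract starts from can be sketched as a behavioural model in Python. This is only the textbook scheme with uniform block sizes, not the proposed decision logic (which the abstract does not specify) and not a square-root layout; the function names and the default `width` and `block` values are assumptions.

```python
from typing import List, Tuple


def ripple_carry_add(a_bits: List[int], b_bits: List[int],
                     carry_in: int) -> Tuple[List[int], int]:
    """Add two little-endian bit lists with a chain of full adders."""
    sum_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def carry_select_add(a: int, b: int, width: int = 16,
                     block: int = 4) -> Tuple[int, int]:
    """Return (sum modulo 2**width, carry_out) using a conventional carry-select scheme."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result_bits: List[int] = []
    carry = 0
    for start in range(0, width, block):
        a_blk = a_bits[start:start + block]
        b_blk = b_bits[start:start + block]
        # Both candidate results are computed for every block: this is the
        # duplicated RCA that costs the extra area and power.
        sum0, carry0 = ripple_carry_add(a_blk, b_blk, 0)
        sum1, carry1 = ripple_carry_add(a_blk, b_blk, 1)
        # The incoming carry acts as the multiplexer select signal.
        if carry:
            result_bits.extend(sum1)
            carry = carry1
        else:
            result_bits.extend(sum0)
            carry = carry0
    value = sum(bit << i for i, bit in enumerate(result_bits))
    return value, carry


if __name__ == "__main__":
    print(carry_select_add(40000, 30000))
```

In hardware the two ripple-carry chains of a block evaluate in parallel, so the critical path is one block delay plus a multiplexer per block; the software model above only reproduces the arithmetic, not the timing.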

14.
Power-efficient design of multicast wavelength-routed networks   (Cited by: 13 total, 0 self-citations, 13 by others)
In this paper, we introduce the power-efficient design space for multicast wavelength-routed networks. The power-efficient design space is based on the impact of power on the overall design of wavelength-routed networks. Two cross-connect architectures on this design concept are investigated. One is an existing architecture called splitter-and-delivery (SaD). The other is a new architecture called multicast-only splitter-and-delivery (MOSaD). The MOSaD architecture uses power splitters for multicast connections only, allowing unicast connections to pass without enduring unnecessary power losses. Our cross-connect design provides a strictly nonblocking service for unicast connections while eliminating unnecessary power loss of the SaD cross-connect. Experimental results demonstrate that the MOSaD architecture provides substantial savings in cost and reduction in signal power loss with minimal effects on the blocking performance of the network.

15.
Design optimization for performance enhancement in analog and mixed-signal circuits is an active area of research as technology scaling is moving towards the nanometer scale. This paper presents an approach towards the efficient simulation and characterization of mixed-signal circuits, using a 45 nm CMOS voltage controlled oscillator (VCO) with frequency divider as a case study. The performance characteristics of the analog and digital blocks in the circuit are simulated and the accuracy issues arising due to separate analog and digital simulation engines are considered. The tremendous impact of gate tunneling current on device performance is quantitatively analyzed with the help of an “effective tunneling capacitance”, which allows accurate modeling and simulation of digital blocks with almost analog accuracy. To meet the design specifications of the analog VCO using digital CMOS technology, we follow a design of experiments (DOE) approach. The functional specifications of the VCO optimized in this design are the center frequency and minimization of overall power consumption as well as minimization of power due to gate-oxide tunneling current leakage, a component that was not important in previous generations of CMOS technologies but is dominant at 45 nm and below. Due to the large number of available design parameters (gate-oxide thickness and transistor sizes), the concurrent achievement of all optimization goals is difficult. A DOE approach is shown to be very effective and a viable alternative to standard design exploration in the nanometer regime.

16.
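A new type of far-infrared acousto-optic modulator is introduced. It uses high-figure-of-merit single-crystal Ge, with sound propagating along the (III) direction, as the acousto-optic medium and a lithium niobate crystal as the piezoelectric transducer, and it operates at a center frequency of 80 MHz. Two heat-dissipation structures are also described. Comparative tests show that the wrap-around heat-dissipation structure improves the diffraction efficiency by 10% over the back-side structure and noticeably improves the beam spot quality, which considerably improves the performance of the far-infrared acousto-optic modulator. With 10 W of high-frequency drive power, the measured peak diffraction efficiency of the germanium acousto-optic device is 65%. This result is of practical significance for applying acousto-optic modulators in far-infrared technologies such as lidar and long-range target tracking.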

17.
18.
The increasing demand for low-power applications is adding pressure on circuit designers to come up with new circuit styles that can decrease power dissipation while making use of the performance improvement of the new CMOS technologies. Multi-threshold MOS current mode logic (MTMCML) appears to be a solution to this problem by making use of the high performance of MOS current mode circuits while minimizing power dissipation with the help of multi-threshold CMOS technologies. In this work, analytical formulations, based on the BSIM3v3 model, are proposed for MTMCML performance measures with an error within 10% compared to HSPICE. The formulation helps designers to efficiently design MTMCML circuits without undergoing the time-consuming HSPICE simulations. Furthermore, it provides design guidelines and aids for designers to fully understand the different tradeoffs in MTMCML design. In addition, the analysis is extended to study the impact of technology scaling and parameter variations on MTMCML. It is shown that the worst case variation in the minimum supply voltage of MTMCML is 1.16%, thus suggesting maximal power saving.

19.
This work presents a novel approach to optimizing the yield of digital integrated circuits with respect to speed, dynamic power, and leakage power constraints. The method is based on process parameter estimation circuits and active control of body bias performed by an on-chip digital controller. The associated design flow allows us to quantitatively predict the impact of the method on the expected yield in a specific design. We present the architecture scheme, the theoretical foundation, the estimation circuits used, and two application case studies based on data from an industrial 0.13-$\mu$m CMOS process. The approach proves to be remarkably effective at high operating temperature. In the presented case study, initial yields below 14% are improved to 86% by using a single controller and a single set of estimation circuits per die.

20.
This paper presents accurate area, time, power estimation models for implementations using FPGAs from the Xilinx Virtex-2Pro family (Deng et al. 2008). These models are designed to facilitate efficient design space exploration in an automated algorithm-architecture codesign framework. Detailed models for estimating the number of slices, block RAMs and 18×18-bit multipliers for fixed point and floating point IP cores have been developed. These models are also utilized to develop power models that consider the effect of logic power, signal power, clock power and I/O power. Timing models have been developed to predict the latency of the fixed point and floating point IP cores. In all cases, the model coefficients have been derived by using curve fitting or regression analysis. The modeling error is quite small for single IP cores; the error for the area estimate, for instance, is on the average 0.95%. The error for fairly large examples such as floating point implementation of 8-point FFTs is also quite small; it is 1.87% for estimation of number of slices and 3.48% for estimation of power consumption. The proposed models have also been integrated into a hardware-software partitioning tool to facilitate design space exploration under area and time constraints.
