Similar Literature
Found 20 similar documents; search took 46 ms.
1.
The module library for the Cathedral-II synthesis environment is discussed. The underlying architectural style of the environment is defined as a hierarchical composition of flexible and parameterizable data paths, microcoded control units, interprocessor communication protocols, and input/output interfaces. A data path is called an execution unit (EXU), which consists of three parts: an input block, an output block, and a core. Only the core varies for the different EXUs. The topology and functionality of an EXU can be influenced by a set of parameters. The module library consists of two parts: a leaf-cell library, and a procedure library that places and interconnects the leaf cells to create the functional building blocks that compose one EXU. The EXUs are guaranteed to work at 10 MHz. The described modules have been implemented and tested. The main features of this system are a very powerful parameterization, the technology independence of the CAD tools, and the generation speed of the modules. The current library is specially dedicated to application-specific IC (ASIC) customized processors, although it can be used for more hardwired architectures oriented towards higher throughput.

2.
Multi-V_DD design is an effective way to reduce power consumption, but the need for level conversion imposes delay and energy penalties that limit the potential gains. In this paper, we describe new level converting circuits that provide 10%-61% lower energy consumption at equivalent or better speeds compared to those available in the literature. Furthermore, we make the argument that level converters should be evaluated largely by their maximum speed since slower level converters consume valuable timing slack that can be used to reduce the energy of other gates in the circuit. Based on this criterion, we find the new structures to offer up to a 25% speed improvement over conventional level converters. Using an efficient dual-V_DD voltage assignment algorithm, we show that this speed improvement can yield a reduction of up to 7.3% in total circuit power in small benchmark circuits. We also propose embedding the functionality of logic gates into the level converting circuits. For typical values of the second supply voltage, this technique can reduce delay by 15% at constant energy or lower energy by up to 30% at fixed delay.

3.
This work presents an effective way of evaluating and validating ensembles of combinational CMOS gates and logic cell libraries. The major contributions include an innovative design methodology for this kind of test vehicle, as well as a simple and flexible multi-operating-mode circuit architecture. The resulting circuit is useful for cell library verification at different levels: in the EDA environment and on silicon prototypes. The proposed methodology can be applied to analyses that take into account logic gate functionality, timing performance, power consumption, and the impact of nanometer aging effects on circuit operation. Simulation results demonstrate the circuit operation, features and facilities described herein.

4.
In this brief, we present a high-speed AES IP-core, which runs at 880 MHz on a 0.13-µm CMOS standard cell library, and which achieves over 10-Gbps throughput in all encryption modes, including cipher block chaining (CBC) mode. Although the CBC mode is the most widely used and important, achieving such high throughput was difficult because pipelining and/or loop unrolling techniques cannot be applied. To reduce the propagation delays of the S-Box, the slowest function block, we developed a special circuit architecture that we call twisted-binary decision diagram (BDD), where the fanout of signals is distributed in the S-Box circuit. Our S-Box is 1.5 to 2 times faster than the conventional S-Box implementations. The T-Box algorithm, which merges the S-Box and another primitive function (MixColumns) into a single function, is also used for an additional speedup.

5.
The use of a realistic component library with multiple implementations of operators results in cost-efficient designs; slow components can then be used on noncritical paths and the more expensive components on only the critical paths. This paper presents a cost-optimized algorithm for selecting components and pipelining a data-flow graph, given such a library, and throughput and latency constraints. Experimental results on several large examples indicate the importance of component selection as a parameter in design exploration.

6.
Instruction level power analysis and optimization of software   Cited by: 4 (self-citations: 0, citations by others: 4)
The increasing popularity of power constrained mobile computers and embedded computing applications drives the need for analyzing and optimizing power in all the components of a system. Software constitutes a major component of today's systems, and its role is projected to grow even further. Thus, an ever increasing portion of the functionality of today's systems is in the form of instructions, as opposed to gates. This motivates the need for analyzing power consumption from the point of view of instructions, something that traditional circuit and gate level power analysis tools are inadequate for. This paper describes an alternative, measurement based instruction level power analysis approach that provides an accurate and practical way of quantifying the power cost of software. This technique has been applied to three commercial, architecturally different processors. The salient results of these analyses are summarized. Instruction level analysis of a processor helps in the development of models for power consumption of software executing on that processor. The power models for the subject processors are described and interesting observations resulting from the comparison of these models are highlighted. The ability to evaluate software in terms of power consumption makes it feasible to search for low power implementations of given programs. In addition, it can guide the development of general tools and techniques for low power software. Several ideas in this regard, motivated by the power analysis of the subject processors, are also described.

7.
We present results from an analogue Radial Basis Function chip which is based upon a compact Euclidean distance calculating circuit. Floating-gate devices are used to program the circuit, and also to compensate for interdevice threshold variations. Chip measurements confirm the functionality of the circuits. Simulations suggest that a large-scale implementation of an RBF system using this architecture would consume only a fraction of the power of comparable digital implementations.
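As a software illustration of what a Euclidean-distance-based RBF unit computes, here is a minimal sketch. The Gaussian kernel, the `sigma` width parameter, and the function names are assumptions made for illustration; the abstract does not state which kernel shape the chip realizes.

```python
import math


def euclidean_distance(x, c):
    """Euclidean distance between an input vector x and a stored centre c."""
    if len(x) != len(c):
        raise ValueError("x and c must have the same length")
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


def rbf_response(x, c, sigma=1.0):
    """Gaussian RBF response (assumed kernel): 1.0 at the centre, decaying with distance."""
    d = euclidean_distance(x, c)
    return math.exp(-(d * d) / (2.0 * sigma * sigma))
```

The response depends on the input only through its distance to the stored centre, which is why a compact distance circuit can serve as the core of such a unit.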

8.
This paper presents the realization of a fault tolerance technique for a dynamically reconfigurable array of programmable cells. The three parts of the technique, fault detection, fault reconfiguration, and fault recovery, are implemented completely in hardware and form a self-contained system. Each of the parts can be exchanged by an alternative implementation without affecting the remaining parts too much, thus making the concept adaptable to different reconfigurable circuits. A hardware realization for the core mechanism is discussed and a prototypical design of a field-programmable gate array implementing the complete system is described. The technological development towards nanoscale feature sizes and the growing influence of deep-submicrometer effects will result in an inherent unreliability of the individual components of future circuit implementations and a higher vulnerability towards external influences. The technique discussed can be used to exploit dynamic reconfiguration capabilities of programmable arrays to alleviate system vulnerability towards these effects and thus to enhance their overall reliability.

9.
10.
Recently, ISO/IEC standardized a dataflow-programming framework called Reconfigurable Video Coding (RVC) for the specification of video codecs. The RVC framework aims at providing the specification of a system at a high abstraction level so that the functionality (or behavior) of the system becomes independent of implementation details. The idea is to specify a system so that only intrinsic features of the algorithms are explicitly expressed, whereas implementation choices can then be made only once specific target platforms have been chosen. With this system design approach, one abstract design can be used to automatically create implementations for multiple target platforms. In this paper, we report our investigations on applying the methodology standardized by the MPEG RVC framework to develop secure computing in the domains of cryptography and multimedia security, leading to the conclusion that the RVC framework can successfully be applied as a general-purpose framework to other fields beyond multimedia coding. This paper also highlights the challenges we faced in conducting our study, and how our study helped the RVC and the secure computing communities benefit from each other. Our investigations started with the development of a Crypto Tools Library (CTL) based on RVC, which covers a number of widely used ciphers and cryptographic hash functions such as AES, Triple DES, ARC4 and SHA-2. Performance benchmarking results on the RVC-based AES and SHA-2 implementations in both C and Java revealed that the automatically generated implementations can achieve a performance comparable to that of some manually written reference implementations. We also demonstrated that the RVC framework can easily produce implementations with multi-core support without any change to the RVC code. A security protocol for mutual authentication was also implemented to demonstrate how one can build heterogeneous systems easily with RVC.
By combining CTL with Video Tool Library (a standard library defined by the RVC standard), a non-standard RVC-based H.264/AVC encoder and a non-standard RVC-based JPEG codec, we further demonstrated the benefits of using RVC to develop different kinds of multimedia security applications, which include joint multimedia encryption-compression schemes, digital watermarking and image steganography in JPEG compressed domain. Our study has shown that RVC can be used as a general-purpose implementation-independent development framework for diverse data-driven applications with different complexities.

11.
This paper proposes a novel nonlinear modulating function approach for generating n-scroll chaotic attractors based on a general jerk circuit. The systematic nonlinear modulating function methodology developed here can arbitrarily design the swings, widths, slopes, breakpoints, equilibrium points, shapes, and even the general phase portraits of the n-scroll chaotic attractors by using the adjustable sawtooth wave, triangular wave, and transconductor wave functions. The dynamic mechanism and chaos generation condition of the general jerk circuit are further investigated by analyzing the system stability. A simple block circuit diagram, including integrator, sawtooth wave and triangular wave generators, buffer, switch linkages, and voltage-current conversion resistors, is designed for the hardware implementations of various 3- to 12-scroll chaotic attractors via switchings of the switch linkages. This is the first experimental verification of a 12-scroll chaotic attractor generated by an analog circuit. In particular, the recursive formulas of system parameters and real physical circuit parameters are rigorously derived for the hardware implementations of the n-scroll chaotic attractors. Moreover, the adjustability of the nonlinear modulating function and the rigorous recursive formulas together provide a theoretical principle for the hardware implementations of various chaotic attractors with a large number of scrolls.

12.
The use of generic models in the synthesis of FMS systems, which allows for rapid modelling and analysis, does not ease the difficulty of the verification task. Even though generic modules can be verified separately, the verification of the interconnections between modules requires the whole model to be considered. A potential solution is to replace the generic modules with their functional abstractions, which realize the external functional behavior of these modules. The number of places and transitions involved in realizing the required functionality is, typically, a fraction of that used to represent complete components. This reduces the complexity of the components of the modelled system, and thus the complexity of the verification model. The verification task can then focus on the correctness of the interfaces, rather than on the internal nature of the components. In this paper, for a class of Petri net models, which can be used to represent the primary components of AGV based FMS systems, a method that allows one to systematically construct functional abstractions is presented.

13.
A practical system approach for time-multiplexing cellular neural network (CNN) implementations suitable for processing large and complex images using small CNN arrays is presented. For real size applications, due to hardware limitations, it is impossible to have a one-to-one mapping between the CNN hardware cells and all the pixels in the image involved. This paper presents a practical solution by processing the input image, block by block, with the number of pixels in a block being the same as the number of CNN cells in the array. Furthermore, unlike other implementations in which the output is observed at the hard-limiting block, the very large scale integrated (VLSI) architecture hereby described monitors the outputs from the state node. While previous implementations are mostly suitable for black and white applications because of the thresholded outputs, our approach is especially suitable for applications in color (gray) image processing due to the analog nature of the state node. Experimental complementary metal-oxide-semiconductor (CMOS) chip results in good agreement with theoretical results are presented.
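The block-by-block idea can be sketched in software as follows. This is only an illustration under stated assumptions: `split_into_blocks`, `process_image`, and the `process_block` callback are hypothetical names, the image is a plain list of pixel rows, and the sketch ignores how neighbouring blocks interact at their borders, which a real time-multiplexed CNN has to handle.

```python
def split_into_blocks(image, rows, cols):
    """Yield (top, left, block) tiles of at most rows x cols pixels."""
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height, rows):
        for left in range(0, width, cols):
            block = [row[left:left + cols] for row in image[top:top + rows]]
            yield top, left, block


def process_image(image, rows, cols, process_block):
    """Run process_block on each tile in turn and stitch the results back together.

    process_block stands in for one pass through the small CNN array and must
    return a block with the same shape as its input.
    """
    out = [list(row) for row in image]
    for top, left, block in split_into_blocks(image, rows, cols):
        result = process_block(block)
        for r, result_row in enumerate(result):
            out[top + r][left:left + len(result_row)] = result_row
    return out
```

Here `rows * cols` plays the role of the number of cells in the CNN array, so one array is reused for every tile of a larger image.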

14.
Due to shrinking feature sizes in integrated circuits, additional reliability effects have to be considered which influence the functionality of the system. These effects can result either from the manufacturing process or from external influences during the lifetime, such as radiation and temperature. Additionally, modern technology nodes are affected by time-dependent degradation, i.e., aging. Due to the age-dependent degradation of a circuit, processes on the atomic scale of the semiconductor material lead to charges in the oxide-silicon interface of CMOS devices, altering the performance parameters of the device and subsequently the behavior of the circuit. With the continuous downscaling of modern semiconductor technologies, the impact of these atomic scale processes on the overall system characteristics becomes more and more critical. Therefore, aging effects need to be assessed during the design phase and actions have to be taken to guarantee the correct system functionality throughout a system's lifetime. This work presents methods to investigate the influence of age-dependent degradation as well as process variability on different levels. An operating-point dependent sizing methodology based on the gm/ID method and extended to incorporate aging, which aims at developing aging-resistant circuits, is presented. Additionally, the sensitivity of circuit performances with regard to aging can be determined. In order to investigate the reliability of a complex system on the behavioral level, a modeling method to represent the performance of system components as a function of aging and process variability is introduced.

15.
Registers are one of the circuit elements that can be affected by soft errors. To ensure that soft errors do not affect the system functionality, Triple Modular Redundancy (TMR) is commonly used to protect registers. TMR can effectively protect against errors affecting a single flip-flop and has a low overhead in terms of circuit delay. The main drawback of TMR is that it requires more than three times the original circuit area as the flip-flops are triplicated and additional voting logic is inserted. Another alternative is to protect registers using Error Correction Codes (ECCs), but those typically require a large circuit delay overhead and are not suitable for high speed implementations. In this paper, DMR+, an alternative to TMR to protect registers in FPGAs, is presented. The proposed scheme exploits the FPGA structure to achieve a reduction in the FPGA resources (LUTs and Flip-Flops) at the cost of a certain overhead in delay. DMR+ can correct all single bit errors like TMR but is more vulnerable to multiple bit errors. To evaluate the benefits, the DMR+ technique has been implemented and compared with TMR considering standalone registers and also some simple designs.
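For readers unfamiliar with the TMR baseline mentioned in this abstract, a minimal software sketch of the majority voter follows. It models register copies as Python integers; the function name is hypothetical, and it illustrates plain TMR only, since the abstract does not describe the internal logic of DMR+.

```python
def majority_vote(a, b, c):
    """Bitwise majority of three register copies: each output bit is the value
    held by at least two of the three inputs."""
    return (a & b) | (a & c) | (b & c)


stored = 0b1011
upset = stored ^ 0b0100          # one copy with a single flipped bit

# A single upset in one copy is outvoted by the two good copies.
recovered = majority_vote(stored, upset, stored)

# The same bit flipped in two copies defeats the voter.
corrupted = majority_vote(upset, upset, stored)
```

The three stored copies plus the voter are the source of the "more than three times the original circuit area" overhead that the abstract cites.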

16.
The Fast Fourier Transform (FFT) is widely used in various digital signal processing applications. The performance requirements for FFT in modern real-time applications have increased dramatically due to the high demand on capacity and performance of modern telecommunication systems, where FFT plays a major role. Software implementations of FFT running on a general purpose computer can no longer meet current speed requirements. However, recent advances in VLSI technology have made it possible to implement the entire FFT system on a single silicon substrate. This article presents a column FFT design suitable for ULSI (Ultra Large Scale Integration) implementations. The basic building block is a 64-point column FFT. FFTs with longer transform lengths can be easily realized using the 64-point column FFT building block. The butterfly processors in the column FFT are connected using circuit switching networks. The circuit switching networks not only provide dynamically reconfigurable interconnections among the butterfly processors, but also provide a fault-tolerant capability. Bit-serial arithmetic is used in the architecture. Assuming the data word length is 16 bits, the 1024-point column FFT engine proposed in this article is capable of processing 1024 complex data samples in 533 clock cycles. If the clock frequency is 40 MHz, it will take 13.3 µs to complete a 1024-point FFT.
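The timing figure quoted above follows directly from the cycle count and the clock frequency. A small sketch of the arithmetic, with hypothetical function names and using only the numbers given in the abstract:

```python
def transform_time_us(clock_cycles, clock_hz):
    """Time to complete one transform, in microseconds."""
    return clock_cycles / clock_hz * 1e6


def sample_rate_msps(samples, clock_cycles, clock_hz):
    """Sustained rate in mega-samples per second, assuming back-to-back transforms."""
    return samples / (clock_cycles / clock_hz) / 1e6


t_us = transform_time_us(533, 40e6)        # about 13.3 microseconds
rate = sample_rate_msps(1024, 533, 40e6)   # about 76.8 Msamples/s
```

The sample-rate figure is derived here for illustration and assumes transforms are issued back to back; the abstract itself states only the 13.3 µs completion time.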

17.
Solid-State Electronics, 2006, 50(7-8): 1252-1260
A technique for modeling the effect of variations in multiple process parameters on circuit delay performance is proposed. The variation in saturation current Ion at the device level, and the variation in rising/falling edge stage delay for the NAND gate at the circuit level, are taken as performance metrics. The delay of a two-input NAND gate with 65 nm gate length transistors is extensively characterized by mixed-mode simulations, which is then used as a library element. Appropriate templates for the NAND gate library are incorporated in a general purpose circuit simulator SEQUEL. A 4-bit × 4-bit Wallace tree multiplier circuit, consisting of two-input NAND gates, is used to demonstrate the proposed methodology. The variation in the multiplier delay is characterized by generating delay distributions using an extensive Monte Carlo analysis. The use of linear interpolation and linear superposition is evaluated to study simultaneous variations in two and more process parameters. An analytical model for gate delays, in terms of device drive current Ion, is proposed, which can be used to extend this methodology for a generic technology library with a variety of library elements. The model is validated against Monte Carlo simulations and is shown to have a typical error of less than 0.1% for simultaneous variations in multiple process parameters. The proposed methodology can be used for statistical timing analysis and circuit simulation at the gate level.

18.
Interpolation in digital modems. II. Implementation and performance   Cited by: 1 (self-citations: 0, citations by others: 1)
For pt.I, see ibid., vol.41, no.3, p.502-208 (1993). Properties of a specific class of interpolators that are based upon polynomials are discussed. Several implementations are described, one of which is particularly convenient in practical hardware. Simulations demonstrate that simple interpolators give excellent performance. In many cases, two-point, linear interpolation is adequate. If better performance is needed, classical four-point, third-order polynomials could be used. Better yet, a novel four-point interpolating filter with piecewise-parabolic impulse response can have performance superior to that of the standard cubic interpolator and still be implemented much more simply. The NCO-based control method presented in Part I is shown to be equivalent to a conventional phase locked loop and its operation is verified by simulation.
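The two simplest interpolators named in this abstract can be sketched as follows. The sketch assumes the "classical four-point, third-order" interpolator is the cubic Lagrange polynomial through four consecutive samples, which is one common reading; the function names and the fractional-interval symbol `mu` are chosen for illustration, and the piecewise-parabolic filter is not shown.

```python
def linear_interp(x0, x1, mu):
    """Two-point linear interpolation at fractional position mu in [0, 1)
    between samples x0 (index 0) and x1 (index 1)."""
    return (1.0 - mu) * x0 + mu * x1


def cubic_interp(xm1, x0, x1, x2, mu):
    """Four-point cubic Lagrange interpolation at position mu in [0, 1),
    using samples at indices -1, 0, 1, 2."""
    c_m1 = -mu * (mu - 1.0) * (mu - 2.0) / 6.0
    c_0 = (mu + 1.0) * (mu - 1.0) * (mu - 2.0) / 2.0
    c_1 = -(mu + 1.0) * mu * (mu - 2.0) / 2.0
    c_2 = (mu + 1.0) * mu * (mu - 1.0) / 6.0
    return c_m1 * xm1 + c_0 * x0 + c_1 * x1 + c_2 * x2
```

At `mu = 0` the cubic form returns `x0` unchanged, and for samples drawn from any polynomial of degree three or lower it reproduces the polynomial up to floating-point rounding, which makes it easy to check.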

19.
20.
In this paper, two novel application circuits utilizing the differential voltage current conveyor (DVCC) are introduced and implemented. To the best of the authors' knowledge, these are the first reported monostable multivibrators employing a DVCC device. Each presented circuit is constructed from a single DVCC as the basic active building block together with a few passive components. Both can be operated via a positive-edge triggering signal to generate a pulse waveform with an adjustable width. The first one is a general monostable circuit. The second design is an improved construction, which shortens the recovery time before consecutive triggering signals can be applied. The circuit operations are first described and then the non-ideal issues and design considerations of the proposed circuits are discussed. To demonstrate their feasibility, the presented circuits are simulated using the circuit simulation program Is-Spice. Available commercial ICs and discrete components are used to implement the prototype circuits. Simulation and experimental results agree well with the theoretical analysis.
