首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
A low-power, area-efficient four-way 32-bit multifunction arithmetic unit has been developed for programmable shaders for handheld 3D graphics systems. It adopts the logarithmic number system (LNS) at the arithmetic core for the single-cycle throughput and the small-size low-power unification of various complicated arithmetic operations such as power, logarithm, trigonometric functions, vector-SIMD multiplication, division, square root and vector dot product. 24-region and 16-region piecewise linear logarithmic and antilogarithmic converters are proposed with 0.8% and 0.02% maximum conversion error, respectively. All the supported operations are implemented with less than 6.3% operation error and unified into a single arithmetic platform with maximum four-cycle latency and single-cycle throughput. A 93 K gate test chip is fabricated using one-poly five-metal 0.18-mum CMOS technology. It operates at 210 MHz with maximum power consumption of 15.3 mW at 1.8 V.  相似文献   

2.
A high speed redundant binary (RB) architecture, which is optimized for the fast CMOS parallel multiplier, is developed. This architecture enables one to convert a pair of partial products in normal binary (NB) form to one RE number with no additional circuit. We improved the RB adder (RBA) circuit so that it can make a fast addition of the RB partial products. We also simplified the converter circuit that converts the final RE number into the corresponding NE number. The carry propagation path of the converter circuit is carried out with only multiplexer circuits. A 54×54-bit multiplier is designed with this architecture. It is fabricated by 0.5 μm CMOS with triple level metal technology. The active area size is 3.0×3.08 mm2 and the number of transistors is 78,800. This is the smallest number for all 54×54-bit multipliers ever reported. Under the condition of 3.3 V supply voltage, the chip achieves 8.8 ns multiplication time. The power dissipation of 540 mW is estimated for the operating frequency of 100 MHz. These are, so far, the fastest speed and the lowest power for 54×54-bit multipliers with 0.5-μm CMOS  相似文献   

3.
A high-speed 4-bit ALU, 4×4-bit multiplier, and 8×8-bit multiplier/accumulator have been implemented in low-power GaAs enhanced/depletion E/D direct-coupled FET logic (DCFL). Circuits are fabricated with a high-yield titanium tungsten nitride self-aligned gate MESFET process. The 4-bit ALU performs at up to 1.2 GHz with only 131-mW power dissipation. The multiplication time for the 4×4-bit array multiplier is 940 ps, which is the fastest multiplication time reported for any semiconductor technology. The 8×8-bit two's complement multiplier/accumulator uses 4278 FETs (1317 logic gates) and exhibits a multiplication time of 3.17 ns. the fastest yet reported for a multiplier of this type. Yield on the best wafer for the 4×4-bit and 8×8-bit circuits is 94 and 43%, respectively. A digital arithmetic subsystem has been demonstrated, consisting of the 8×8-bit multiplier/accumulator, two of the 4-bit ALUs, three logical multiplexers, and a logical demultiplexer. The subsystem performs arithmetic and logic functions required in signal processing at clock rates as high as 325 MHz  相似文献   

4.
A novel mesochronous pipelining scheme is described in this paper. In this scheme, data and clock travel together. At any given time a pipeline stage could be operating on more than one data wave. The clock period in the proposed pipeline scheme is determined by the pipeline stage with largest difference between its minimum and maximum delays. This is a significant performance gain compared to conventional pipeline scheme where clock period is determined by the stage with the largest delay. A detailed analysis of the clock period constraints is provided to show the performance gains and Speedup of mesochronous pipelining over other pipelining schemes. Also, the number of pipeline stages and pipeline registers is small. The clock distribution scheme is simple in the mesochronous pipeline architecture. An 8 /spl times/ 8-bit carry-save adder multiplier has been implemented in mesochronous pipeline architecture using modest TSMC 180-nm (drawn length 200 nm) CMOS technology. The multiplier architecture and simulation results are described in detail in this paper. The pipelined multiplier is able to operate on a clock period of 350 ps (2.86 GHz). This is a Speedup of 1.7 times over conventional pipeline scheme, with fewer pipeline stages and pipeline registers.  相似文献   

5.
Design and implementation of a high-speed multiplexer-based fine-grain pipelined architecture for a general digital fuzzy logic controller has been presented. All the operators have been designed at gate level. For the multiplication, a multiplexer-based modified Wallace tree multiplier has been designed, and for the division and addition multiplexer-based non-restoring parallel divider and multiplexer-based Manchester adder have been used, respectively. To further increase the processing speed, fine-grain pipelining technique has been employed. By using this technique, the critical path of the circuit is broken into finer pieces. Based on the proposed architecture, and by using Quartus II 9.1, a sample two-input, one-output digital fuzzy logic controller with eight rules has been successfully synthesised and implemented on Stratix II field programmable gate array. Simulations were carried out using DSP Builder in the MATLAB/Simulink tool at a maximum clock rate of 301.84 MHz.  相似文献   

6.
王韧  刘敬波  秦玲  陈勇  赵建民 《微电子学》2006,36(5):651-654,658
设计了一种3.3 V 9位50 MS/s CMOS流水线A/D转换器。该A/D转换器电路采用1.5位/级,8级流水线结构。相邻级交替工作,各级产生的数据汇总至数字纠错电路,经数字纠错电路输出9位数字值。仿真结果表明,A/D转换器的输出有效位数(ENOB)为8.712位,信噪比(SNR)为54.624 dB,INL小于1 LSB,DNL小于0.6 LSB,芯片面积0.37 mm2,功耗仅为82 mW。  相似文献   

7.
Floating-point (FP) multipliers are the main energy consumers in many modern embedded digital signal processing (DSP) and multimedia systems. For lossy applications, minimizing the precision of FP multiplication operations under the acceptable accuracy loss is a well-known approach for reducing the energy consumption of FP multipliers. This paper proposes a multiple-precision FP multiplier to efficiently trade the energy consumption with the output quality. The proposed FP multiplier can perform low-precision multiplication that generates 8?, 14?, 20?, or 26-bit mantissa product through an iterative and truncated modified Booth multiplier. Energy saving for low-precision multiplication is achieved by partially suppressing the computation of mantissa multiplier. In addition, the proposed multiplier allows the bitwidth of mantissa in the multiplicand, multiplier, and output product to be dynamically changed when it performs different FP multiplication operations to further reduce energy consumption. Experimental results show that the proposed multiplier can achieve 59 %, 71 %, 73 %, and 82 % energy saving under 0.1 %, 1 %, 5 %, and 11 % accuracy loss, respectively, for the RGB-to-YUV & YUV-to-RGB conversion when compared to the conventional IEEE single-precision multiplier. In addition, the results also exhibit that the proposed multiplier can obtain more energy reduction than previous multiple-precision iterative FP multipliers.  相似文献   

8.
设计了一种257 bit快速Montgomery模乘器。针对Montgomery算法中大数乘法操作存在耗时过长问题,采用二次Booth32编码与Wallace树压缩思想,将三次乘法做成三级流水结构,并将加法和可能的减法巧妙的结合在第3次乘法中,最大限度地提高计算并行性。仿真结果表明,整个模乘器可工作在140 MHz频率下,建立流水的时间是42.329 ns,其后每次模乘时间是7.022 ns,性能远远优于现有的模乘器。所设计的模乘器可用于模乘运算的高性能实现,尤其在设计多核运算模块时其性能优势比较明显。  相似文献   

9.
A 32×32-bit multiplier using multiple-valued current-mode circuits has been fabricated in 2-μm CMOS technology. For the multiplier based on the radix-4 signed-digit number system, 32×32-bit two's complement multiplication can be performed with only three-stage signed-digit full adders using a binary-tree addition scheme. The chip contains about 23600 transistors and the effective multiplier size is about 3.2×5.2 mm2, which is half that of the corresponding binary CMOS multiplier. The multiply time is less than 59 ns. The performance is considered comparable to that of the fastest binary multiplier reported  相似文献   

10.
Erdogan  A.T. Arslan  T. 《Electronics letters》1996,32(21):1959-1960
A new multiplication scheme is proposed, for application to single multiplier CMOS based DSP processors, for the implementation of low-power digital FIR filters through the reduction of switching activity within the multiplier section of the filter. The scheme operates in conjunction with a transpose direct form FIR filter structure and a modified DSP processor architecture, through a significant reduction in power can be obtained by using algorithms to order the filter coefficients. This reduction is demonstrated using two basic examples, with different wordlengths and filter orders, achieving up to 63% reduction in switching activity  相似文献   

11.
The superior broadband performance of 2D IIR frequency-planar beam filters, relative to conventional 2D FIR true-time-delay beamforming, has recently been reported using computational electromagnetics and real-time emulations on an antenna test range, resulting in significant improvements of bit-error-rates (BERs) in the presence of broadband interference. Further, massively parallel systolic VLSI circuit polyphase architectures have also been reported (Madanayake et al. in Int. J. Circuit Theory Appl. 2010) for the case of the direct-form signal flow graph (SFG) architecture, operating at a maximum throughput of M-(antenna)-frames-per-clock-cycle (MFPCC). The superior broadband performance of 2D IIR frequency-planar beam filters is extended here from the direct-form signal flow graph (SFG) architecture (Madanayake et al. in Int. J. Circuit Theory Appl. 2010) to the novel differential-form SFG architecture in order to reduce overall complexity. The proposed method employs a differential-form polyphase 2D IIR frequency-planar beam SFG, and a corresponding circuit architecture, to implement the required input-output 2D space-time difference equation. The resultant digital hardware has the significant advantage of much-reduced multiplier complexity, relative to the direct-form structure. For example, when look-ahead pipelining is not employed and for polyphase architectures having two, three, and four phases, the corresponding reductions in multiplier complexity are 20%, 28.6% and 33.3%, respectively. A proof-of-concept prototype circuit is designed and implemented on a Xilinx Sx35 FPGA device for the two-phase case, operating at a frame-rate of 132 million linear frames per second on the uniform linear array (ULA), corresponding to 2-frames-per-clock-cycle at a circuit clock frequency of 66 MHz. The circuit is optimized for low critical path delays (CPDs) using look-ahead pipelining of order three. For ultra-wideband (UWB) radio-frequency (RF) implementations, in such fields as radio astronomy, radar and wireless communications, custom VLSI versions of the proposed circuits are required.  相似文献   

12.
针对不同格密码体制带来的数论变换参数多样性,以及数论变换的性能优化设计,该文提出一种基于随机存取存储器(RAM)的可重构多通道数论变换单元.在数论变换单元设计中,在按时间抽取的基础上改进多通道架构,并提出一种优化地址分配方法.最后基于Xilinx Artix-7现场可编程逻辑门阵列(FPGA)平台进行原型实现,结果显示...  相似文献   

13.
A 16-bit /spl times/ 16-bit multiplier for 2 two's-complement binary numbers based on a new algorithm is described. This multiplier has been fabricated on an LSI chip using a standard n-E/D MOS process technology with a 2.7-/spl mu/m design rule. This multiplier is characterized by use of a binary tree of redundant binary adders. In the new algorithm, n-bit multiplication is performed in a time proportional to log/SUB 2/ n and the physical design of the multiplier is constructed of a regular cellular array. This new algorithm has been proposed by N. Takagi et al. (1982, 1983). The 16-bit/spl times/16-bit multiplier chip size is 5.8 /spl times/ 6.3 mm/SUP 2/ using the new layout for a binary adder tree. The chip contains about 10600 transistors, and the longest logic path includes 46 gates. The multiplication time was measured as 120 ns. It is estimated that a 32-bit /spl times/ 32-bit multiplication time is about 140 ns.  相似文献   

14.
A combinatorial complex multiplier has been designed for use in a pipelined fast Fourier transform processor. The performance in terms of throughput of the processor is limited by the multiplication. Therefore, the multiplier is optimized to make the input-to-output delay as short as possible. A new architecture based on distributed arithmetic, Wallace-trees, and carry-lookahead adders has been developed. The multiplier has been fabricated using standard cells in a 0.5-μm process and verified for functionality, speed, and power consumption. Running at 40 MHz, a multiplier with input wordlengths of 16+16 times 10+10 bits consumes 54% less power compared to an distributed arithmetic array multiplier fabricated under equal conditions  相似文献   

15.
介绍了一种应用于ARM处理器的增强DSP功能乘加单元。为了减小乘加指令的周期数,采用了两个并行16×16位乘加单元构成的单指令多数据(SIMD)结构,可以通过适当的配置支持16到32位的各种乘加运算以及16位的复数乘法。理论分析表明,这种乘加单元与传统的单指令单数据(SISD)结构相比在周期数上有明显的减小。尤其对于16位乘加及16位复数乘法,其所需周期数分别只有ARM1022E的1/4和1/3。0.35mm的标准单元库实现表明该乘加单元可以工作在120MHz,使得其非常适合数字信号处理的应用。  相似文献   

16.
A novel hardware architecture for elliptic curve cryptography (ECC) over$ GF(p)$is introduced. This can perform the main prime field arithmetic functions needed in these cryptosystems including modular inversion and multiplication. This is based on a new unified modular inversion algorithm that offers considerable improvement over previous ECC techniques that use Fermat's Little Theorem for this operation. The processor described uses a full-word multiplier which requires much fewer clock cycles than previous methods, while still maintaining a competitive critical path delay. The benefits of the approach have been demonstrated by utilizing these techniques to create a field-programmable gate array (FPGA) design. This can perform a 256-bit prime field scalar point multiplication in 3.86 ms, the fastest FPGA time reported to date. The ECC architecture described can also perform four different types of modular inversion, making it suitable for use in many different ECC applications.  相似文献   

17.
Drolet  G. 《Electronics letters》1999,35(5):368-369
The multiplication, inversion, division and exponentiation of elements of GF(2m) are easily implemented with conventional arithmetic and logical units when the elements are in the logarithmic representation. An electronic architecture for the addition of two elements in the logarithmic representation is presented. The architecture of the adder is similar to that of the Massey-Omura multiplier for the normal basis representation. In particular, the same combinatorial circuit is used to successively compute every bit of the sum  相似文献   

18.
An 8-bit×8-bit signed two's complement pipelined multiplier megacell implemented in 1.6-μm single-poly, double-metal N-well CMOS is described. It is capable of throughputs of 230,000,000 multiplications/s at a clock frequency of 230 MHz, with a latency of 12 clock cycles. A half-bit level pipelined architecture, and the use of true single-phase clocked circuitry are the key features of this design. Simulation studies indicate that the multiplier dissipates 540 mW at 230 MHz. The multiplier cell has 5176 transistors, with dimensions of 1.5 mm×1.4 mm. This multiplier satisfies the need for very high-throughput multiplier cores required in DSP architectures  相似文献   

19.
RSA密码协处理器的实现   总被引:11,自引:0,他引:11  
李树国  周润德  冯建华  孙义和 《电子学报》2001,29(11):1441-1444
密码协处理器的面积过大和速度较慢制约了公钥密码体制RSA在智能卡中的应用.文中对Montgomery模乘算法进行了分析和改进,提出了一种新的适合于智能卡应用的高基模乘器结构.由于密码协处理器采用两个32位乘法器的并行流水结构,这与心动阵列结构相比它有效地降低了芯片的面积和模乘的时钟数,从而可在智能卡中实现RSA的数字签名与认证.实验表明:在基于0.35μm TSMC标准单元库工艺下,密码协处理器执行一次1024位模乘需1216个时钟周期,芯片设计面积为38k门.在5MHz的时钟频率下,加密1024位的明文平均仅需374ms.该设计与同类设计相比具有最小的模乘运算时钟周期数,并使芯片的面积降低了1/3.这个指标优于当今电子商务的密码协处理器,适合于智能卡应用.  相似文献   

20.
周云  冯全源 《微电子学》2016,46(1):54-57
数字下变频器是软件无线电接收机的关键组成部分,用于将模数转换器输出的中频信号进行下变频、抽取、滤波,变为低速基带信号,便于后级数字信号处理。针对DDC经典结构,在分布式算法、流水线技术及多速率数字信号处理技术的基础上,在Xilinx FPGA上实现了一种简单、高效的数字下变频器,并采用Matlab和Modelsim联合仿真验证。结果表明,该DDC不仅有效且无需乘法器资源,占用的各项硬件资源均较低,利于嵌入大型系统中,具有良好的工程应用性。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号