Similar literature
 20 similar articles found (search time: 62 ms)
1.
In orthogonal frequency division multiplexing (OFDM) based wireless systems, the Fast Fourier Transform (FFT) is a critical block because it occupies a large area and consumes considerable power. In this paper, we present area-efficient, low-power, 16-bit word-width, 64-point radix-2² and radix-2³ pipelined FFT architectures for an OFDM-based IEEE 802.11a wireless LAN baseband. The designs are derived from the radix-2ᵏ algorithm and adopt a Single-Path Delay Feedback (SDF) architecture for hardware implementation. To eliminate the complex multipliers and the read-only memory (ROM) used for internal storage of twiddle factor coefficients, the proposed 64-point FFT employs a Canonical Signed Digit (CSD) complex constant multiplier built from adders, multiplexers, and shifters. The complex constant multiplier (CCM) is modified with a common sub-expression sharing block that reduces the area of the design. The proposed radix-2² and radix-2³ pipelined FFT architectures are modeled and implemented in TSMC 180 nm CMOS technology with a supply voltage of 1.8 V. The implementation results show that the proposed architectures significantly reduce hardware cost and power consumption in comparison with existing 64-point FFT architectures.
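
The CSD idea in the abstract above can be illustrated with a small behavioral sketch. It only shows how a fixed integer constant is recoded into digits -1, 0, +1 with no two adjacent nonzero digits, and how multiplication then reduces to shifts and adds or subtracts. The function names `to_csd` and `csd_multiply` and the Q15 constant 23170 (approximately cos(π/4) · 2¹⁵) are assumptions made for illustration; the paper's complex multiplier and its common sub-expression sharing are not reproduced here.

```python
def to_csd(value):
    """Return the CSD digits (-1, 0, +1) of a non-negative integer, least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            # Pick +1 when value % 4 == 1 and -1 when value % 4 == 3, so the
            # remainder is divisible by 4 and the next digit is forced to 0.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply x by the constant encoded in `digits` using only shifts, adds, and subtracts."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


# Hypothetical twiddle coefficient: cos(pi/4) in Q15 fixed point.
COS_PI_4_Q15 = 23170
product = csd_multiply(1000, to_csd(COS_PI_4_Q15))
```

Each nonzero digit corresponds to one adder or subtractor in hardware, so fewer nonzero digits means less logic; for example, 7 is recoded as 8 - 1 instead of 4 + 2 + 1.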

2.
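Li Feixiong, Jiang Lin. 《电子科技》, 2013, 26(8): 46-48, 67
Building on a study of conventional Booth multipliers, this paper introduces a pipelined Booth multiplier with a novel structure. It is implemented with radix-4 Booth encoding, a Wallace tree compression structure, and a 64-bit Kogge-Stone prefix adder. Four levels of pipeline registers are inserted into the segmented 64-bit Kogge-Stone prefix adder to achieve fast 32 bit × 32 bit multiplication of unsigned and signed numbers. The multiplier is designed in a hardware description language, verified on a Field Programmable Gate Array (FPGA), and synthesized with the SMIC 0.18 μm CMOS standard cell process. The synthesis results show a critical path delay of 3.6 ns, a chip area below 0.134 mm², and power consumption below 32.69 mW.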
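
As a rough software illustration of the radix-4 Booth encoding mentioned in the abstract above, the sketch below recodes a two's-complement multiplier into digits in {-2, -1, 0, 1, 2}, which halves the number of partial products. The function names and the behavioral summation are assumptions for illustration; the Wallace tree, the Kogge-Stone adder, and the pipelining of the actual design are not modeled.

```python
# Booth radix-4 table indexed by the overlapping bit triplet (b[2i+1], b[2i], b[2i-1]).
BOOTH_TABLE = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_radix4_digits(multiplier, bits):
    """Recode a `bits`-wide two's-complement multiplier into radix-4 Booth digits, lowest first."""
    assert bits % 2 == 0, "radix-4 recoding consumes two bits per digit"
    masked = multiplier & ((1 << bits) - 1)
    extended = masked << 1  # append the implicit 0 below the least significant bit
    return [BOOTH_TABLE[(extended >> (2 * i)) & 0b111] for i in range(bits // 2)]


def booth_multiply(multiplicand, multiplier, bits):
    """Signed multiply: sum the Booth-selected partial products, each shifted by 2*i bits."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum((digit * multiplicand) << (2 * i) for i, digit in enumerate(digits))


# A 32-bit multiplier yields 16 partial products instead of 32.
partial_product_count = len(booth_radix4_digits(-7890, 32))
result = booth_multiply(123456, -7890, 32)
```

In this sketch the multiplier is treated as signed. One common way to handle an unsigned 32-bit multiplier is to recode it with a wider width (for example `bits=34`) so that the top digit sees zero-extension; whether the paper does this is not stated in the abstract.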

3.
A 32×32-bit multiplier using multiple-valued current-mode circuits has been fabricated in 2-μm CMOS technology. For the multiplier based on the radix-4 signed-digit number system, 32×32-bit two's complement multiplication can be performed with only three-stage signed-digit full adders using a binary-tree addition scheme. The chip contains about 23600 transistors and the effective multiplier size is about 3.2×5.2 mm², which is half that of the corresponding binary CMOS multiplier. The multiply time is less than 59 ns. The performance is considered comparable to that of the fastest binary multiplier reported.

4.
A shared n-well layout technique is developed for the design of dual-supply-voltage logic blocks. It is demonstrated on a design of a 64-bit arithmetic logic unit (ALU) module in domino logic. The second supply voltage is used to lower the power of noncritical paths in the sparse, radix-4 64-bit carry-lookahead adder and in the loopback bus. A 3 mm² test chip in 0.18-μm 1.8-V five-metal with local interconnect CMOS technology that contains six ALUs and test circuitry operates at 1.16 GHz at the nominal supply. For a target delay increase of 2.8%, the energy savings are 25.3% using dual supplies, while for an 8.3% increase in delay, 33.3% can be saved.

5.
A 14-bit digital-to-analog converter based on a fourth-order multibit sigma-delta modulator is described. The digital modulator is pipelined to minimize both its power dissipation and design complexity. The 6-bit output of this modulator is converted to analog using 64 current-steering cells that are continuously calibrated to a reference current. This converter achieves 85-dB dynamic range at 5-MHz signal bandwidth, with an oversampling ratio of 12. The chip was fabricated in a 0.5-μm CMOS technology and operates from a single 2.5-V supply.

6.
7.
In this paper, we present a novel fixed-point 16-bit word-width 64-point FFT/IFFT processor developed primarily for application in an OFDM-based IEEE 802.11a wireless LAN baseband processor. The 64-point FFT is realized by decomposing it into a two-dimensional structure of 8-point FFTs. This approach reduces the number of required complex multiplications compared to the conventional radix-2 64-point FFT algorithm. The complex multiplication operations are realized using shift-and-add operations. Thus, the processor does not use a two-input digital multiplier. It also does not need any RAM or ROM for internal storage of coefficients. The proposed 64-point FFT/IFFT processor has been fabricated and tested successfully using our in-house 0.25-μm BiCMOS technology. The core area of this chip is 6.8 mm². The average dynamic power consumption is 41 mW at 20 MHz operating frequency and 1.8 V supply voltage. The processor completes one parallel-to-parallel (i.e., when all input data are available in parallel and all output data are generated in parallel) 64-point FFT computation in 23 cycles. These features show that although it has been developed primarily for application in the IEEE 802.11a standard, it can be used for any application that requires fast operation as well as low power consumption.
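
The two-dimensional decomposition described in the abstract above can be checked numerically with a short floating-point sketch. It assumes the standard index mapping n = 8·n1 + n2 and k = k1 + 8·k2, under which a 64-point DFT becomes eight 8-point DFTs, one inter-stage twiddle multiplication by W64^(n2·k1), and eight more 8-point DFTs. The names `dft` and `fft64_as_8x8` are hypothetical, and the processor's fixed-point shift-and-add arithmetic is not modeled.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used both as the 8-point kernel and as the reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft64_as_8x8(x):
    """64-point DFT computed as a two-dimensional structure of 8-point DFTs."""
    assert len(x) == 64
    # Stage 1: for each n2, an 8-point DFT over n1 of the samples x[8*n1 + n2].
    stage1 = [dft([x[8 * n1 + n2] for n1 in range(8)]) for n2 in range(8)]
    # Inter-stage twiddle factors W64^(n2*k1).
    for n2 in range(8):
        for k1 in range(8):
            stage1[n2][k1] *= cmath.exp(-2j * cmath.pi * n2 * k1 / 64)
    # Stage 2: for each k1, an 8-point DFT over n2; the output index is k1 + 8*k2.
    out = [0j] * 64
    for k1 in range(8):
        column = dft([stage1[n2][k1] for n2 in range(8)])
        for k2 in range(8):
            out[k1 + 8 * k2] = column[k2]
    return out


samples = [complex(n % 7, (3 * n) % 5) for n in range(64)]
max_error = max(abs(a - b) for a, b in zip(fft64_as_8x8(samples), dft(samples)))
```

Inside each 8-point DFT the only twiddle values are ±1, ±j, and (±1 ± j)/√2, so the nontrivial complex multiplications are concentrated in the single inter-stage twiddle step.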

8.
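Digital self-calibration algorithms are increasingly used in high-precision pipelined ADCs. At present, pipelined ADCs based on digital self-calibration generally use a 1.5-bit/stage structure. Based on an analysis of the strengths and weaknesses of the various structures, this work selects the 2-bit/stage structure, which has strong advantages in chip power consumption and area, and designs an improved digital self-calibration algorithm suited to it. The improved algorithm resolves the problem of inaccurate calibration parameters in current digital self-calibration algorithms, so that the calibrated output data are more accurate. Experimental results show that the improved digital self-calibration algorithm greatly improves the linearity of the system.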

9.
A chip set for high-speed radix-2 fast Fourier transform (FFT) applications up to 512 points is described. The chip set comprises a (16+16)×(12+12)-bit complex number multiplier, and a 16-bit butterfly chip for data reordering, twiddle factor generation, and butterfly arithmetic. The chips have been implemented using a standard cell design methodology on a 2-μm bulk CMOS process. Three chips implement a complex FFT butterfly with a throughput of 10 MHz, and are cascadable up to 512 points. The chips feature an offline self-testing capability.

10.
In this paper, the design of a mixed-signal 64-bit adder based on the continuous valued number system (CVNS) is presented. The 64-bit adder is generated by cascading four 16-bit radix-2 CVNS adders. Truncated summation of the CVNS digits reduced the number of required interconnections in the system, which in turn reduced design complexity and hardware costs. This adder can perform one 64-bit, two 32-bit, four 16-bit, or eight 8-bit additions on demand for media signal processing applications. The compact, low-power, and low-noise design of the adder is suitable for this type of application. The 64-bit adder, designed in TSMC 0.18-μm CMOS technology, has a worst-case delay of 1.5 ns and an energy dissipation of about 14 pJ, with a core area of 13 250 μm².
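
The "one 64-bit, two 32-bit, four 16-bit, or eight 8-bit additions on demand" behavior in the abstract above can be sketched as a purely digital behavioral model. It captures only the lane partitioning, where carries must not cross lane boundaries; the mixed-signal CVNS circuit itself is not represented, and the function name `partitioned_add` is an assumption.

```python
def partitioned_add(a, b, lane_bits):
    """Add two 64-bit words as independent lanes of `lane_bits` bits each.

    Carries out of a lane are discarded instead of propagating into the next lane.
    """
    assert lane_bits in (8, 16, 32, 64)
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        result |= (lane_sum & lane_mask) << shift
    return result


# The same operands give different results depending on the lane width:
as_bytes = partitioned_add(0x00FF, 0x0001, 8)    # carry is dropped at the byte boundary
as_halves = partitioned_add(0x00FF, 0x0001, 16)  # carry propagates within the 16-bit lane
```

With 8-bit lanes the low byte wraps to 0x00 and the neighboring byte is unaffected, while with 16-bit lanes the carry moves into bit 8 and the result is 0x0100.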

11.
This work developed a modified direct form based on the radix-4 Booth algorithm to realize a finite impulse response (FIR) architecture with programmable dynamic ranges of input data and filter coefficients. This architecture comprises a preprocessing unit, data latches, configurable connection units, double Booth decoders, coefficient registers, a path control unit, and a postprocessing unit. The programmable dynamic ranges of input data and filter coefficients can be any positive even number or any multiple of the word length of the coefficient registers, using configurable connection units or a path control unit, respectively. In particular, the proposed architecture employs only data-path controls to accomplish programmable operations, without changing the word lengths and components of the data latches and filter taps. A practical 8-bit and 16-bit FIR processor has also been implemented using TSMC 5 V 0.6 μm CMOS technology. It is suitable for operations of asymmetric, symmetric, and anti-symmetric filters at 64, 63, 32, 31, and 16 taps, and is well explored to optimize its functional units. The proposed processor has throughput rates of 50 M and 25 M samples/s for 8-bit and 16-bit input data of various filter applications, respectively.

12.
Dynamic CMOS ternary logic circuits that can be used to form a pipelined system with nonoverlapped two-phase clocks are proposed and investigated. The proposed dynamic ternary gates do not dissipate DC power and have full voltage swings. A circuit structure called the simple ternary differential logic (STDL) is also proposed and analyzed, and an optimal procedure is developed. An experimental chip has been fabricated in a 1.2-μm CMOS process and tested. A binary pipelined multiplier has been designed, using the proposed dynamic ternary logic circuits in the interior of the multiplier for coding of radix-2 redundant positive-digit numbers. The structure has the advantages of higher operating frequency, less latency, and lower device count as compared with the conventional binary parallel pipelined multiplier. The advantages of the circuits over other dynamic ternary logic circuits are shown.

13.
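Design of a 43-bit floating-point pipelined multiplier   Cited by: 1 (self-citations: 0, citations by others: 1)
Liang Feng, Shao Zhibiao, Sun Haijun. 《电子器件》, 2006, 29(4): 1094-1096, 1102
This paper proposes a floating-point pipelined multiplier IP core. The multiplier uses an improved third-order Booth algorithm to reduce the number of partial products, introduces a Wallace tree compression array that mixes compressor types, and optimizes the 5-2 compressors, 4-2 compressors, and 64-bit CLA adder on the critical path, which effectively reduces the delay and area of the multiplier. FPGA simulation and verification show that the multiplier is 15.4% faster than a comparable multiplier unit recently provided by Altera.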

14.
This paper presents a novel variable-latency multiplier architecture, suitable for implementation as a self-timed multiplier core or as a fully synchronous multicycle multiplier core. The architecture combines a second-order Booth algorithm with a split carry-save array pipelined organization, incorporating multiple row skipping and a completion-predicting carry-select dual adder. The paper reports the architecture and logic design, CMOS circuit design, and performance evaluation. In 0.35 μm CMOS, the expected sustainable cycle time for a 32-bit synchronous implementation is 2.25 ns. Instruction-level simulations estimate 54% single-cycle and 46% two-cycle operations in SPEC95 execution. Using the same CMOS process, the 32-bit asynchronous implementation is expected to reach an average 1.76 ns throughput and 3.48 ns latency in SPEC95 execution.

15.
A single-chip 80-bit floating point VLSI processor capable of performing 5.6 million floating point operations per second has been realized using 1.2-μm n-well CMOS technology. The processor handles 80-bit double-extended floating point data conforming to IEEE standard 754. The chip has 128 microinstructions which are stored in an on-chip ROM. By programming microinstruction sequences in an external control storage, not only basic arithmetic operations but also special arithmetic functions can be performed. A composite design method supported by a hierarchical design automation system was used to quickly lay out 50K gates including a 64×64-bit multiplier and 15 kb of memory on a chip with a die size of 10×10 mm². Only 11 man-months were required for the effort.

16.
One lattice equalizer stage is designed on a single chip using 4-μm NMOS technology. All the arithmetic operations of the chip are performed bit-serially under the control of a global two-phase clock, and they are totally pipelined. The data are represented as 16-bit two's complement fixed-point numbers. A built-in test scheme allows the offline testing of the chip with high fault coverage at a minimal hardware overhead. Direct coupling between chips permits the realization of filters of higher order. In addition, the structure of the lattice equalizer permits the use of the same chip in linear prediction problems. SPICE simulation results and fabrication of the major blocks in the design demonstrated that operating clock frequencies of up to 8 MHz are possible. At the maximum estimated operating clock frequency, the chip can accommodate applications with data rates of up to 500 kHz.

17.
A fully integrated 32-bit VLSI CPU chip utilizing 1-μm features is described. It is fabricated in an n-channel silicon gate, self-aligned technology. The chip contains about 450000 transistors and executes microinstructions at approximately one per 55 ns clock cycle. It can execute a 32-bit binary integer add in 55 ns, a 32-bit binary integer multiply in 1.8 μs, and a 64-bit floating point multiply in 10.4 μs. The instruction set provides the functions of an advanced mainframe CPU. Because the implementation of such a complex device poses an organizational as well as a technical challenge, the design philosophy that was adopted is summarized briefly. Careful attention was paid to designer productivity, design flexibility, and testability.

18.
A chip architecture designed to compute a 16-point discrete Fourier transform (DFT) using S. Winograd's algorithm (1978) every 457 ns is presented. The 99500-transistor 1.2-μm chip incorporates arithmetic, control, and input/output circuitry with testability and fault detection into a 144-pin package. A throughput of 2.3×10¹² gate-Hz/cm² and 79 million multiplications/s is attained with 70-MHz pipelined bit-serial logic. Combined with similar chips computing 15- and 17-point DFTs, 4080-point DFTs can be computed every 118 μs. Using the 16- and 17-point chips, 272×272-point complex data imagery can be transformed in 4.25 ms. A 24-bit block floating-point data representation combined with an adaptive scaling algorithm delivers a numerical precision of 106 dB (17.6 bits) after computing 4080-point DFTs.

19.
The first single-chip 64-b vector-pipelined processor (VPP) ULSI is described. It executes vector operations indispensable to high-speed scientific computation. The VPP ULSI attains a 200-MFLOPS peak performance at a 100-MHz clock frequency. This extremely high performance is made possible by the integration on the VPP of a 64-b five-stage pipelined adder/shifter, a 64-b five-stage pipelined multiplier/divider/logic operation unit, and a 40-kb register file. Various new high-speed circuit techniques have also been developed for 100-MHz operation. The chip, which was fabricated with a 0.8-μm BiCMOS and triple-layer metallization process technology, has a 17.2-mm×17.3-mm area and contains about 693 K transistors. It consumes 13.2 W at a 100-MHz clock frequency with a single 5-V power supply.

20.
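To address the variable number of subcarriers in WiMAX systems, a high-speed configurable FFT/IFFT is implemented using a pipelined ping-pong structure with a mixed radix-2 and radix-4 scheme. The twiddle factors for FFTs of different sizes are stored in a unified way and the RAM units are optimized, which saves storage space. In addition, the radix-4 butterfly unit is optimized to reduce the number of addition and multiplication units. Simulation and synthesis results show that the design meets the FFT/IFFT requirements of high-speed WiMAX systems at different bandwidths.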
