1.
In orthogonal frequency division multiplexing (OFDM) based wireless systems, the Fast Fourier Transform (FFT) is a critical block because it occupies a large area and consumes considerable power. In this paper, we present area-efficient, low-power 16-bit word-width 64-point radix-2² and radix-2³ pipelined FFT architectures for an OFDM-based IEEE 802.11a wireless LAN baseband. The designs are derived from the radix-2ᵏ algorithm and adopt a Single-Path Delay Feedback (SDF) architecture for hardware implementation. The proposed 64-point FFT employs a Canonical Signed Digit (CSD) complex constant multiplier built from adders, multiplexers, and shifters. This eliminates the complex multipliers and the read-only memory (ROM) otherwise used to store the twiddle-factor coefficients. The complex constant multiplier (CCM) is further modified with a common sub-expression sharing block that reduces the area of the design. The proposed radix-2² and radix-2³ pipelined FFT architectures are modeled and implemented in TSMC 180 nm CMOS technology with a 1.8 V supply voltage. Implementation results show that the proposed architectures significantly reduce hardware cost and power consumption compared with existing 64-point FFT architectures.
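A CSD constant multiplier rewrites a fixed coefficient with digits in {-1, 0, +1}, with no two adjacent digits nonzero. This keeps the number of shift-and-add terms per constant small. The sketch below recodes a constant into that form and multiplies with shifts and adds only. It then applies two such constants to form a complex constant multiplication (CCM) by one twiddle factor. The Q1.14 coefficient format, the choice of twiddle W_64^1, and all function names are hypothetical assumptions for illustration, not taken from the paper. Common sub-expression sharing is not modeled.

```python
import math

def to_csd(n: int) -> list[int]:
    """Canonical signed-digit (non-adjacent form) digits of n, LSB first, each in {-1, 0, +1}."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            n -= d            # n is now a multiple of 4, so the next digit will be 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def csd_multiply(x: int, digits: list[int]) -> int:
    """Multiply x by the constant encoded in `digits` using only shifts and adds/subtracts."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc

FRAC_BITS = 14  # hypothetical Q1.14 twiddle format (assumption, not from the paper)

def quantize(v: float) -> int:
    return round(v * (1 << FRAC_BITS))

def ccm(a: int, b: int, c_digits: list[int], s_digits: list[int]) -> tuple[int, int]:
    """(a + jb) * (c + js) for fixed constants c, s given in CSD form.
    Results carry FRAC_BITS extra fractional bits; hardware would shift them out."""
    re = csd_multiply(a, c_digits) - csd_multiply(b, s_digits)
    im = csd_multiply(a, s_digits) + csd_multiply(b, c_digits)
    return re, im

# Twiddle factor W_64^1 = cos(2*pi/64) - j*sin(2*pi/64), recoded once at "design time"
C = quantize(math.cos(2 * math.pi / 64))
S = quantize(-math.sin(2 * math.pi / 64))
C_CSD, S_CSD = to_csd(C), to_csd(S)
```

Because each twiddle is a constant, its CSD digits are fixed at design time. Each nonzero digit becomes a hard-wired shift plus an adder or subtractor, so no ROM and no general multiplier are needed.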
2.
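Building on a study of conventional Booth multipliers, this paper presents a pipelined Booth multiplier with a novel structure. It uses radix-4 Booth encoding, a Wallace-tree compression structure, and a 64-bit Kogge-Stone prefix adder. Four pipeline register stages are inserted into the segmented 64-bit Kogge-Stone prefix adder, enabling fast 32×32-bit multiplication of both unsigned and signed numbers. The multiplier was designed in a hardware description language, verified on a field-programmable gate array (FPGA), and synthesized in SMIC 0.18 μm CMOS standard-cell technology. Synthesis results show a critical-path delay of 3.6 ns, a chip area below 0.134 mm², and power consumption below 32.69 mW.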
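Radix-4 Booth encoding recodes an n-bit multiplier into n/2 digits in {-2, -1, 0, 1, 2}. This halves the number of partial products, each of which is just 0, ±x or ±2x. The sketch below shows that recoding and checks the product. Python's plain `sum()` stands in for the Wallace tree and the Kogge-Stone final adder. Function names are hypothetical. Handling unsigned operands by zero-extending two bits, which adds a 17th partial product, is a common approach assumed here; the paper's exact scheme is not given in the abstract.

```python
def booth_radix4_digits(y: int, n: int = 32, signed: bool = True) -> list[int]:
    """Radix-4 Booth digits (LSB group first) of an n-bit operand y.

    Each digit is -2*y[2i+1] + y[2i] + y[2i-1], with y[-1] = 0, so it lies in {-2,...,2}.
    Unsigned operands are zero-extended by two bits (an assumption of this sketch),
    which adds one extra partial product but keeps the top bit at 0.
    """
    assert n % 2 == 0
    width = n if signed else n + 2
    u = y & ((1 << width) - 1)        # raw bit pattern of the operand
    digits, prev = [], 0              # prev holds y[2i-1]
    for i in range(0, width, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits

def booth_multiply(x: int, y: int, n: int = 32, signed: bool = True) -> int:
    """Form one partial product per Booth digit (0, +-x, +-2x) and sum them.

    In hardware the partial products would be compressed by a Wallace tree and
    resolved by a fast final adder; here Python's sum() stands in for both.
    """
    digits = booth_radix4_digits(y, n, signed)
    partials = [(d * x) << (2 * i) for i, d in enumerate(digits)]
    return sum(partials)
```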
3.
Kawahito S., Kameyama M., Higuchi T., Yamada H. IEEE Journal of Solid-State Circuits, 1988, 23(1): 124-132
A 32×32-bit multiplier using multiple-valued current-mode circuits has been fabricated in 2-μm CMOS technology. The multiplier is based on the radix-4 signed-digit number system. It performs 32×32-bit two's complement multiplication with only three stages of signed-digit full adders, using a binary-tree addition scheme. The chip contains about 23600 transistors, and the effective multiplier size is about 3.2×5.2 mm², half that of the corresponding binary CMOS multiplier. The multiply time is less than 59 ns, and the performance is considered comparable to that of the fastest binary multipliers reported.
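Signed-digit adders can be arranged in a shallow binary tree with a fixed delay per stage because radix-4 signed-digit (SD) addition is carry-free. Each output digit depends only on its own position and the one below it, so nothing ripples across the word. The sketch below assumes the maximally redundant digit set {-3, ..., 3} and a textbook two-step rule: an interim sum plus a transfer digit. The paper's actual digit set and its multiple-valued current-mode encoding are not modeled, and the function names are hypothetical.

```python
def to_sd4(v: int, n: int) -> list[int]:
    """n radix-4 digits of v (LSB first): digits 0..3 for v >= 0, 0..-3 for v < 0."""
    sign = -1 if v < 0 else 1
    m = abs(v)
    digits = []
    for _ in range(n):
        digits.append(sign * (m % 4))
        m //= 4
    assert m == 0, "n digits are not enough for v"
    return digits

def sd4_value(d: list[int]) -> int:
    return sum(di * 4**i for i, di in enumerate(d))

def sd4_add(x: list[int], y: list[int]) -> list[int]:
    """Carry-free addition of two radix-4 SD numbers with digits in {-3, ..., 3}."""
    n = len(x)
    assert len(y) == n
    w = [0] * n          # interim sum digits, in {-2, ..., 2}
    t = [0] * (n + 1)    # t[i] = transfer digit entering position i, in {-1, 0, 1}
    for i in range(n):   # every position works independently (no rippling)
        s = x[i] + y[i]  # in {-6, ..., 6}
        if s >= 2:
            t[i + 1], w[i] = 1, s - 4
        elif s <= -2:
            t[i + 1], w[i] = -1, s + 4
        else:
            t[i + 1], w[i] = 0, s
    # Final digit uses only positions i and i-1 and stays in {-3, ..., 3}
    return [w[i] + t[i] for i in range(n)] + [t[n]]
```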
4.
A shared n-well layout technique is developed for the design of dual-supply-voltage logic blocks and demonstrated on a 64-bit arithmetic logic unit (ALU) module in domino logic. The second supply voltage is used to lower the power of noncritical paths in the sparse, radix-4 64-bit carry-lookahead adder and in the loopback bus. The 3-mm² test chip is built in 0.18-μm 1.8-V five-metal CMOS technology with local interconnect. It contains six ALUs and test circuitry, and operates at 1.16 GHz at the nominal supply. For a target delay increase of 2.8%, dual supplies yield 25.3% energy savings; for an 8.3% delay increase, 33.3% can be saved.
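Carry-lookahead adders of this kind compute all carries with a logarithmic-depth tree of generate/propagate (G, P) combinations instead of a ripple chain. The sparse, radix-4 adder in this entry uses a sparser, higher-radix tree. The sketch below instead shows the dense radix-2 (Kogge-Stone) variant named in entry 2, only to illustrate the shared prefix principle. It is not the adder described here, and the function name is hypothetical.

```python
def kogge_stone_add(a: int, b: int, n: int = 64) -> tuple[int, int]:
    """n-bit addition whose carries come from a radix-2 parallel-prefix (Kogge-Stone) tree.

    Returns (sum mod 2**n, carry_out). Carry-in is assumed to be 0.
    """
    mask = (1 << n) - 1
    a, b = a & mask, b & mask
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # bit propagate
    G, P = g[:], p[:]
    d = 1
    while d < n:  # ceil(log2(n)) prefix levels; each level doubles the span covered
        G_next, P_next = G[:], P[:]
        for i in range(d, n):
            # G_next[i] is the generate of bit group [max(0, i-2d+1) .. i]
            G_next[i] = G[i] | (P[i] & G[i - d])
            P_next[i] = P[i] & P[i - d]
        G, P = G_next, P_next
        d *= 2
    # After the tree, G[i] is the carry out of bit i; the carry into bit i is G[i-1]
    s = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        s |= (p[i] ^ carry_in) << i
    return s, G[n - 1]
```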
5.
A 14-bit digital-to-analog converter based on a fourth-order multibit sigma-delta modulator is described. The digital modulator is pipelined to minimize both its power dissipation and design complexity. The 6-bit output of this modulator is converted to analog using 64 current-steering cells that are continuously calibrated to a reference current. This converter achieves 85-dB dynamic range at 5-MHz signal bandwidth, with an oversampling ratio of 12. The chip was fabricated in a 0.5-μm CMOS technology and operates from a single 2.5-V supply.
6.
7.
A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM
In this paper, we present a novel fixed-point 16-bit word-width 64-point FFT/IFFT processor developed primarily for use in an OFDM-based IEEE 802.11a wireless LAN baseband processor. The 64-point FFT is realized by decomposing it into a two-dimensional structure of 8-point FFTs, which requires fewer complex multiplications than the conventional radix-2 64-point FFT algorithm. The complex multiplications are realized with shift-and-add operations. The processor therefore uses no two-input digital multiplier and needs no RAM or ROM for internal storage of coefficients. The proposed 64-point FFT/IFFT processor has been fabricated and tested successfully in our in-house 0.25-μm BiCMOS technology. The core area of the chip is 6.8 mm², and the average dynamic power consumption is 41 mW at a 20 MHz operating frequency and 1.8 V supply voltage. The processor completes one parallel-to-parallel 64-point FFT computation in 23 cycles, meaning all input data are available in parallel and all output data are generated in parallel. Although it was developed primarily for the IEEE 802.11a standard, it can therefore be used in any application that requires both fast operation and low power consumption.
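The 8×8 decomposition described above can be sketched with the standard two-dimensional Cooley-Tukey index mapping n = 8n₁ + n₂, k = k₁ + 8k₂. That mapping gives X[k₁ + 8k₂] = Σ_{n₂} W₈^{n₂k₂} · W₆₄^{n₂k₁} · Σ_{n₁} x[8n₁ + n₂] W₈^{n₁k₁}. This is an assumed textbook mapping; the chip's exact ordering may differ. The 8-point kernel here is a direct DFT for clarity, whereas the chip uses 8-point FFTs with shift-and-add twiddles. Function names are hypothetical.

```python
import cmath

def dft(x: list[complex]) -> list[complex]:
    """Direct O(N^2) DFT, used both as the 8-point kernel and as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def w64(e: int) -> complex:
    """Twiddle factor W_64^e = exp(-2*pi*j*e/64)."""
    return cmath.exp(-2j * cmath.pi * e / 64)

def fft64_8x8(x: list[complex]) -> list[complex]:
    """64-point DFT via an 8x8 decomposition: n = 8*n1 + n2, k = k1 + 8*k2."""
    assert len(x) == 64
    # Stage 1: for each n2, an 8-point DFT over n1, then multiply by W_64^(n2*k1)
    a = [[0j] * 8 for _ in range(8)]          # a[n2][k1]
    for n2 in range(8):
        col = dft([x[8 * n1 + n2] for n1 in range(8)])
        for k1 in range(8):
            a[n2][k1] = col[k1] * w64(n2 * k1)
    # Stage 2: for each k1, an 8-point DFT over n2 gives outputs X[k1 + 8*k2]
    X = [0j] * 64
    for k1 in range(8):
        row = dft([a[n2][k1] for n2 in range(8)])
        for k2 in range(8):
            X[k1 + 8 * k2] = row[k2]
    return X
```

Only the inter-stage twiddles with n₂·k₁ ≠ 0 (49 of the 64 pairs) lie outside the two 8-point stages. This is where a constant shift-and-add multiplier replaces a general complex multiplier.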
8.
9.
IEEE Journal of Solid-State Circuits, 1987, 22(1): 15-19
A chip set for high-speed radix-2 fast Fourier transform (FFT) applications up to 512 points is described. The chip set comprises a (16+16)×(12+12)-bit complex number multiplier and a 16-bit butterfly chip for data reordering, twiddle factor generation, and butterfly arithmetic. The chips have been implemented using a standard cell design methodology on a 2-μm bulk CMOS process. Three chips implement a complex FFT butterfly with a throughput of 10 MHz, and are cascadable up to 512 points. The chips feature an offline self-testing capability.
10.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2008, 16(9): 1141-1150
11.
Chen O.T.-C., Wei-Lung Liu. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000, 8(4): 440-446
This work develops a modified direct form based on the radix-4 Booth algorithm to realize a finite impulse response (FIR) architecture with programmable dynamic ranges of input data and filter coefficients. The architecture comprises a preprocessing unit, data latches, configurable connection units, double Booth decoders, coefficient registers, a path control unit, and a postprocessing unit. The dynamic range of the input data can be any positive even number, set by the configurable connection units. That of the filter coefficients can be any multiple of the coefficient-register word length, set by the path control unit. In particular, the architecture uses only data-path controls to accomplish programmable operation, without changing the word lengths or components of the data latches and filter taps. A practical 8-bit/16-bit FIR processor has also been implemented in TSMC 5-V 0.6-μm CMOS technology. It supports asymmetric, symmetric, and anti-symmetric filters with 64, 63, 32, 31, and 16 taps, and its functional units have been explored and optimized. The processor achieves throughput rates of 50 M and 25 M samples/s for 8-bit and 16-bit input data, respectively, across various filter applications.
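Symmetric and anti-symmetric filters are cheaper in direct form because mirrored taps share one coefficient. Adding (or subtracting) the two matching delayed samples first halves the number of multiplications per output. The sketch below compares a plain direct-form FIR with this folded form. It illustrates the general folding technique under that assumption only; it does not model the paper's Booth decoders or programmable word lengths, and function names are hypothetical.

```python
def fir_direct(x: list[int], h: list[int]) -> list[int]:
    """Direct form: y[n] = sum_k h[k] * x[n - k], with zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def fir_folded(x: list[int], h: list[int], anti: bool = False) -> list[int]:
    """Same output, but pre-adds (or pre-subtracts) mirrored taps first,
    so only ceil(N/2) multiplications per output sample are needed."""
    N = len(h)
    sign = -1 if anti else 1
    assert all(h[k] == sign * h[N - 1 - k] for k in range(N)), "taps not (anti-)symmetric"

    def tap(i: int) -> int:
        return x[i] if i >= 0 else 0   # zero initial state

    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(N // 2):
            acc += h[k] * (tap(n - k) + sign * tap(n - (N - 1 - k)))
        if N % 2:   # centre tap (forced to 0 by the assertion when anti=True)
            acc += h[N // 2] * tap(n - N // 2)
        y.append(acc)
    return y
```

The same loop covers odd and even tap counts, such as the 63/64 and 31/32 cases listed above. Only the centre-tap term differs.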
12.
Dynamic CMOS ternary logic circuits that can form a pipelined system with nonoverlapping two-phase clocks are proposed and investigated. The proposed dynamic ternary gates dissipate no DC power and have full voltage swings. A circuit structure called simple ternary differential logic (STDL) is also proposed and analyzed, and an optimal procedure is developed. An experimental chip has been fabricated in a 1.2-μm CMOS process and tested. A binary pipelined multiplier has been designed that uses the proposed dynamic ternary logic circuits internally to encode radix-2 redundant positive-digit numbers. Compared with the conventional binary parallel pipelined multiplier, this structure offers higher operating frequency, lower latency, and lower device count. The advantages of the circuits over other dynamic ternary logic circuits are also shown.
13.
14.
This paper presents a novel variable-latency multiplier architecture suitable for implementation as either a self-timed multiplier core or a fully synchronous multicycle multiplier core. The architecture combines a second-order Booth algorithm with a split carry-save array pipelined organization, incorporating multiple row skipping and a completion-predicting carry-select dual adder. The paper reports the architecture and logic design, the CMOS circuit design, and a performance evaluation. In 0.35-μm CMOS, the expected sustainable cycle time for a 32-bit synchronous implementation is 2.25 ns. Instruction-level simulations estimate 54% single-cycle and 46% two-cycle operations in SPEC95 execution. In the same CMOS process, the 32-bit asynchronous implementation is expected to reach an average throughput of 1.76 ns and an average latency of 3.48 ns in SPEC95 execution.
15.
IEEE Journal of Solid-State Circuits, 1985, 20(5): 986-992
A single-chip 80-bit floating-point VLSI processor capable of performing 5.6 million floating-point operations per second has been realized in 1.2-μm n-well CMOS technology. The processor handles 80-bit double-extended floating-point data conforming to IEEE Standard 754. The chip has 128 microinstructions stored in an on-chip ROM. By programming microinstruction sequences in an external control store, not only basic arithmetic operations but also special arithmetic functions can be performed. A composite design method supported by a hierarchical design automation system was used to quickly lay out 50K gates on a 10×10 mm² die. These gates include a 64×64-bit multiplier and 15 kb of memory. Only 11 man-months were required for the effort.
16.
IEEE Journal of Solid-State Circuits, 1985, 20(6): 1235-1241
One lattice equalizer stage is designed on a single chip using 4-μm NMOS technology. All arithmetic operations on the chip are performed bit-serially under the control of a global two-phase clock, and they are fully pipelined. The data are represented as 16-bit two's complement fixed-point numbers. A built-in test scheme allows offline testing of the chip with high fault coverage at minimal hardware overhead. Direct coupling between chips permits the realization of higher-order filters. In addition, the structure of the lattice equalizer allows the same chip to be used in linear prediction problems. SPICE simulation results and fabrication of the major blocks in the design demonstrated that operating clock frequencies of up to 8 MHz are possible. At the maximum estimated operating clock frequency, the chip can accommodate applications with data rates of up to 500 kHz.
17.
IEEE Journal of Solid-State Circuits, 1981, 16(5): 537-542
A fully integrated 32-bit VLSI CPU chip utilizing 1-μm features is described. It is fabricated in an n-channel silicon gate, self-aligned technology. The chip contains about 450000 transistors and executes microinstructions at approximately one per 55-ns clock cycle. It can execute a 32-bit binary integer add in 55 ns, a 32-bit binary integer multiply in 1.8 μs, and a 64-bit floating-point multiply in 10.4 μs. The instruction set provides the functions of an advanced mainframe CPU. Because the implementation of such a complex device poses an organizational as well as a technical challenge, the design philosophy that was adopted is summarized briefly. Careful attention was paid to designer productivity, and design flexibility and testability.
18.
Linderman R.W., Shephard C.G., Taylor K., Coutee P.W., Rossbach P.C., Collins J.M., Hauser R.S. IEEE Journal of Solid-State Circuits, 1988, 23(2): 343-350
A chip architecture designed to compute a 16-point discrete Fourier transform (DFT) using S. Winograd's algorithm (1978) every 457 ns is presented. The 99500-transistor 1.2-μm chip incorporates arithmetic, control, and input/output circuitry with testability and fault detection into a 144-pin package. A throughput of 2.3×10¹² gate-Hz/cm² and 79 million multiplications/s is attained with 70-MHz pipelined bit-serial logic. Combined with similar chips computing 15- and 17-point DFTs, 4080-point DFTs can be computed every 118 μs. Using the 16- and 17-point chips, 272×272-point complex data imagery can be transformed in 4.25 ms. A 24-bit block floating-point data representation combined with an adaptive scaling algorithm delivers a numerical precision of 106 dB (17.6 bits) after computing 4080-point DFTs.
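In a block floating-point representation, a whole block of samples shares one exponent. When values grow, for example between DFT stages, the block is shifted right together so every mantissa still fits the word. The abstract does not describe the chip's adaptive scaling algorithm, so the sketch below shows only the shared-exponent normalization step. The 24-bit default mirrors the 24-bit representation mentioned above. The mantissa/exponent split and the function name are hypothetical.

```python
def normalize_block(block: list[int], mant_bits: int = 24) -> tuple[list[int], int]:
    """Scale a block of integers by a shared power of two so every mantissa fits in
    a signed mant_bits-bit word. Returns (mantissas, exponent); value ~= m * 2**exponent."""
    limit = (1 << (mant_bits - 1)) - 1
    peak = max((abs(v) for v in block), default=0)
    exponent = 0
    while (peak >> exponent) > limit:
        exponent += 1
    # Arithmetic right shift (floor); the dropped low bits are the scaling error
    mantissas = [v >> exponent for v in block]
    return mantissas, exponent
```

The quoted precision is consistent with the usual rule of about 6.02 dB per bit: 106 / 6.02 ≈ 17.6 bits.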
19.
Okamoto F., Hagihara Y., Ohkubo C., Nishi N., Yamada H., Enomoto T. IEEE Journal of Solid-State Circuits, 1991, 26(12): 1885-1893
The first single-chip 64-b vector-pipelined processor (VPP) ULSI is described. It executes the vector operations indispensable to high-speed scientific computation. The VPP ULSI attains a 200-MFLOPS peak performance at a 100-MHz clock frequency. This extremely high performance is made possible by integrating three units on the VPP: a 64-b five-stage pipelined adder/shifter, a 64-b five-stage pipelined multiplier/divider/logic operation unit, and a 40-kb register file. Various new high-speed circuit techniques have also been developed for 100-MHz operation. The chip is fabricated in a 0.8-μm BiCMOS, triple-layer-metal process technology. It has a 17.2-mm×17.3-mm area and contains about 693K transistors. It consumes 13.2 W at a 100-MHz clock frequency from a single 5-V power supply.