首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
It is shown how distributed arithmetic techniques can be applied in parallel-data arithmetic computations to achieve highly regular and efficient VLSI structures on silicon. Two individual arithmetic processor chips are described as examples of the technique. The chips described, which are intended primarily for computation of the FFT butterfly, each contain the functional equivalence of two parallel pipelined multipliers. The first chip is an 8-bit prototype device which has been designed and fabricated on a standard 5-/spl mu/m silicon-gate n-channel MOS process. The second chip is a 16-bit CMOS-SOS design which uses a modified architecture to achieve higher clocking rates and improved versatility in systems use.  相似文献   

2.
We consider offline testing, design-for-testability, and diagnosis for fast Fourier transform (FFT) networks. A practical FFT chip can contain millions of gates, so effective testing and fault-tolerance techniques usually are required in order to guarantee high-quality products. We propose M-testability conditions for FFT butterfly, omega, and flip networks at the double-multiply-subtract-add (DMSA) module level. A novel design-for-testability technique based on the functional bijectivity property of the specified modules to detect faults other than the cell faults is presented. It guarantees 100% combinational fault coverage with negligible hardware overhead-about 0.17% for an FFT network with 16-bit operand words, independent of the network size. Our design requires fewer test vectors compared with previous ones-a factor of up to 1/(6 /spl times/ 2/sup 5n/), where n is the word length. We also propose C-diagnosability conditions and a C-diagnosable FFT network design. By property exchanging and blocking certain fault propagation paths, a faulty DMSA module can be located using a two-phase deterministic algorithm. The blocking mechanism can be implemented with no additional hardware. Compared with previous schemes, our design reduces the diagnosis complexity from O(N) to O(1). For both testing and diagnosis, the hardware overhead for our approach is only about 0.43% for 16-bit numbers regardless of the FFT network size.  相似文献   

3.
A 2-/spl mu/m CMOS VLSI digital signal processor (DSP) family, the SP50, is described that is capable of eight million instructions per second and up to six concurrent operations in each instruction. Two DSPs, the PCB5010 and PCB5011, have been developed. Both are based on a common architecture which contains two 16-bit data buses, and a 16/spl times/16/spl rarr/40-bit multiplier accumulator and 16-bit ALU, both with multiprecision support in hardware. Also implemented are two static data RAMs (128/spl times/16 or 256/spl times/16), a data ROM (51/spl times/16), a 15-word three-port register file, three address computation units, and five serial and parallel I/O interfaces. The data path is controlled by an orthogonal instruction set, using 40-bit microcode words. The controller contains a five-level stack and an instruction repeat register, and can have either on-chip program memory (RAM: 32/spl times/40; ROM: 987/spl times/40) or off-chip program memory (up to 64K/spl times/40). Benchmarks show a two to sixfold improvement in overall performance over its predecessors.  相似文献   

4.
A high-responsivity 9-V/Lux-s high-speed 5000-frames/s (at full 512/spl times/512 resolution) CMOS active pixel sensor (APS) is presented in this paper. The sensor was designed for a 0.35-/spl mu/m 2P3M CMOS sensor process and utilizes a five-transistor pixel to provide a true parallel shutter. Column-parallel analog-to-digital converter (ADC) architecture yields fast readout from pixels and digitization of the data simultaneously with acquiring a new frame. The chip has a two-row SRAM to store data from the ADC and read previous rows of data out of the chip. There are a total of 16 parallel ports operating up to 90 MHz delivering /spl sim/1.3 Gpixel/s or 13 Gb/s of data at the maximum rate. In conclusion, a comparison between two high-speed digital CMOS sensor architectures, which are a column-parallel APS and a digital pixel sensor (DPS), is conducted.  相似文献   

5.
The paper proposes a new continuous-flow mixed-radix (CFMR) fast Fourier transform (FFT) processor that uses the MR (radix-4/2) algorithm and a novel in-place strategy. The existing in-place strategy supports only a fixed-radix FFT algorithm. In contrast, the proposed in-place strategy can support the MR algorithm, which allows CF FFT computations regardless of the length of FFT. The novel in-place strategy is made by interchanging storage locations of butterfly outputs. The CFMR FFT processor provides the MR algorithm, the in-place strategy, and the CF FFT computations at the same time. The CFMR FFT processor requires only two N-word memories due to the proposed in-place strategy. In addition, it uses one butterfly unit that can perform either one radix-4 butterfly or two radix-2 butterflies. The CFMR FFT processor using the 0.18 /spl mu/m SEC cell library consists of 37,000 gates excluding memories, requires only 640 clock cycles for a 512-point FFT and runs at 100 MHz. Therefore, the CFMR FFT processor can reduce hardware complexity and computation cycles compared with existing FFT processors.  相似文献   

6.
Describes a new 4-bit microcomputer fabricated using a low-power silicon gate CMOS process and working from a supply voltage down to 1.2 V. The /spl mu/C can directly drive up to seven 3:1 multiplexed LCD digits, scan up 48 keys, and perform 4-bit handshaking data transfer with external devices. 16-bit, single-word instructions and eight stack levels permit efficient use of the 640-word ROM. Operating from a 4.19 MHz crystal, the device has an instruction cycle time of 15 /spl mu/s. An operating power of 100 /spl mu/W at 1.5 W makes the chip ideal for performing control and timing functions in battery operated applications.  相似文献   

7.
A 16-bit /spl times/ 16-bit multiplier for 2 two's-complement binary numbers based on a new algorithm is described. This multiplier has been fabricated on an LSI chip using a standard n-E/D MOS process technology with a 2.7-/spl mu/m design rule. This multiplier is characterized by use of a binary tree of redundant binary adders. In the new algorithm, n-bit multiplication is performed in a time proportional to log/SUB 2/ n and the physical design of the multiplier is constructed of a regular cellular array. This new algorithm has been proposed by N. Takagi et al. (1982, 1983). The 16-bit/spl times/16-bit multiplier chip size is 5.8 /spl times/ 6.3 mm/SUP 2/ using the new layout for a binary adder tree. The chip contains about 10600 transistors, and the longest logic path includes 46 gates. The multiplication time was measured as 120 ns. It is estimated that a 32-bit /spl times/ 32-bit multiplication time is about 140 ns.  相似文献   

8.
A 24-bit serial-parallel multiplier was integrated in CMOS/silicon-on-sapphire (SOS) technology on a 155 mil/spl times/170 mil chip. The operation of this multiplier is described, showing how the parallel loaded multiplier x combines with the serial loaded multiplicand, a, to form the serial product. An addend, b, can also be accommodated to produce ax+b. The design of the multiplier cells are based on functional majority logic adders and weak or trickle inverter master-slave latches. The chip operates at clock rates up to 18 MHz. Power dissipation at 10 MHz and V/SUB DD/ of 5 V is about 20 mW, and the energy consumption for multiplying two 16-bit numbers is about 64 nJ. Typical application areas are mentioned.  相似文献   

9.
This investigation proposes a novel radix-42 algorithm with the low computational complexity of a radix-16 algorithm but the lower hardware requirement of a radix-4 algorithm. The proposed pipeline radix-42 single delay feedback path (R42SDF) architecture adopts a multiplierless radix-4 butterfly structure, based on the specific linear mapping of common factor algorithm (CFA), to support both 256-point fast Fourier transform/inverse fast Fourier transform (FFT/IFFT) and 8times8 2D discrete cosine transform (DCT) modes following with the high efficient feedback shift registers architecture. The segment shift register (SSR) and overturn shift register (OSR) structure are adopted to minimize the register cost for the input re-ordering and post computation operations in the 8times8 2D DCT mode, respectively. Moreover, the retrenched constant multiplier and eight-folded complex multiplier structures are adopted to decrease the multiplier cost and the coefficient ROM size with the complex conjugate symmetry rule and subexpression elimination technology. To further decrease the chip cost, a finite wordlength analysis is provided to indicate that the proposed architecture only requires a 13-bit internal wordlength to achieve 40-dB signal-to-noise ratio (SNR) performance in 256-point FFT/IFFT modes and high digital video (DV) compression quality in 8 times 8 2D DCT mode. The comprehensive comparison results indicate that the proposed cost effective reconfigurable design has the smallest hardware requirement and largest hardware utilization among the tested architectures for the FFT/IFFT computation, and thus has the highest cost efficiency. The derivation and chip implementation results show that the proposed pipeline 256-point FFT/IFFT/2D DCT triple-mode chip consumes 22.37 mW at 100 MHz at 1.2-V supply voltage in TSMC 0.13-mum CMOS process, which is very appropriate for the RSoCs IP of next-generation handheld devices.  相似文献   

10.
This paper presents a direct digital frequency synthesizer (DDFS) with a 16-bit accumulator, a fourth-order phase domain single-stage /spl Delta//spl Sigma/ interpolator, and a 300-MS/s 12-bit current-steering DAC based on the Q/sup 2/ Random Walk switching scheme. The /spl Delta//spl Sigma/ interpolator is used to reduce the phase truncation error and the ROM size. The implemented fourth-order single-stage /spl Delta//spl Sigma/ noise shaper reduces the effective phase bits by four and reduces the ROM size by 16 times. The DDFS prototype is fabricated in a 0.35-/spl mu/m CMOS technology with active area of 1.11mm/sup 2/ including a 12-bit DAC. The measured DDFS spurious-free dynamic range (SFDR) is greater than 78 dB using a reduced ROM with 8-bit phase, 12-bit amplitude resolution and a size of 0.09 mm/sup 2/. The total power consumption of the DDFS is 200mW with a 3.3-V power supply.  相似文献   

11.
A 1-GS/s FFT/IFFT processor for UWB applications   总被引:1,自引:0,他引:1  
In this paper, we present a novel 128-point FFT/IFFT processor for ultrawideband (UWB) systems. The proposed pipelined FFT architecture, called mixed-radix multipath delay feedback (MRMDF), can provide a higher throughput rate by using the multidata-path scheme. Furthermore, the hardware costs of memory and complex multipliers in MRMDF are only 38.9% and 44.8% of those in the known FFT processor by means of the delay feedback and the data scheduling approaches. The high-radix FFT algorithm is also realized in our processor to reduce the number of complex multiplications. A test chip for the UWB system has been designed and fabricated using 0.18-/spl mu/m single-poly and six-metal CMOS process with a core area of 1.76/spl times/1.76 mm/sup 2/, including an FFT/IFFT processor and a test module. The throughput rate of this fabricated FFT processor is up to 1 Gsample/s while it consumes 175 mW. Power dissipation is 77.6 mW when its throughput rate meets UWB standard in which the FFT throughput rate is 409.6 Msample/s.  相似文献   

12.
This work presents a shared fractional-N synthesizer used by two dual-band 802.11 radios integrated on a single chip for 2/spl times/2 multiple-input multiple-output (MIMO) applications. Additional 2/spl times/2 MIMO chips can be used in a system by phase synchronizing the signal paths through a bidirectional LO porting scheme developed for this application. This synthesizer was fully integrated with the exception of an off-chip loop filter. The synthesizer is a /spl Delta//spl Sigma/-based fractional-N frequency synthesizer with three on-chip LC tuned VCOs to cover the entire frequency bands specified in the IEEE 802.11a/b/g and Japanese WLAN standards. The radio uses a variable IF frequency so that both the RF LO and IF LO can be derived from a single synthesizer saving chip area and power. The synthesizer includes a programmable second/third-order /spl Delta//spl Sigma/ noise shaper, a phase frequency detector, a differential charge pump, and a 6-bit multimodulus divider (MMD). The nominal jitter from 100 Hz to 10 MHz is 0.63-0.86/spl deg/ rms in the 5-GHz band and 0.35-0.43/spl deg/ rms in the 2.4-GHz band. The maximum frequency deviation of the synthesizer when enabling the transmitter is less than 150 kHz and the frequency error settles to 2 kHz in less than 12 /spl mu/s. For MIMO applications requiring more than two full paths, a single synthesizer on one die can be used to generate the LOs for all other radios integrated in different dies.  相似文献   

13.
A 640-Mb/s 2048-bit programmable LDPC decoder chip   总被引:3,自引:0,他引:3  
A 14.3-mm/sup 2/ code-programmable and code-rate tunable decoder chip for 2048-bit low-density parity-check (LDPC) codes is presented. The chip implements the turbo-decoding message-passing (TDMP) algorithm for architecture-aware (AA-)LDPC codes which has a faster convergence rate and hence a throughput advantage over the standard decoding algorithm. It employs a reduced complexity message computation mechanism free of lookup tables, and features a programmable network for message interleaving based on the code structure. The chip decodes any mix of 2048-bit rate-1/2 (3,6)-regular AA-LDPC codes in standard mode by programming the network, and attains a throughput of 640 Mb/s at 125 MHz for 10 TDMP-decoding iterations. In augmented mode, the code rate can be tuned up to 14/16 in steps of 1/16 by augmenting the code. The chip is fabricated in 0.18-/spl mu/m six-metal-layer CMOS technology, operates at a peak clock frequency of 125 MHz at 1.8 V (nominal), and dissipates an average power of 787 mW.  相似文献   

14.
A set of eight chips that perform real-time image-processing tasks was designed and fabricated with a 4-/spl mu/m NMOS technology. The chips include a 3/spl times/3 linear convolver, a 3/spl times/3 sorting filter, a 7/spl times/7 logical convolver, a contour tracer, a feature extractor, a lookup-table ROM, and two postprocessors for the linear convolver. All chips were designed with architectures that are dedicated to the particular image-processing task to be performed. The image-processing circuits operate on 10-MHz video data (512/spl times/512-pixel images). The design time for the chips was kept to 1.5 man-years by reusing hardware and using (and developing) appropriate CAD tools, ROM generators and a data-path generator were developed to reduce the circuit design time. An image recognition system was built with these custom chips that can recognize two-dimensional objects that are characterized by their closed outer contours. The complete system is controlled by a Sun workstation and operates at rates up to 15 frames/s. The recognition system achieved a 98% recognition rate for eight objects over a wide range of orientation and size variations and a 100% recognition rate without size variations.  相似文献   

15.
An implementation of the IF section of WCDMA mobile transceivers with a set of two chips fabricated in an inexpensive 0.35-/spl mu/m two-poly three-metal CMOS process is presented. The transmit/receive chip set integrates quadrature modulators and demodulators, wide dynamic range automatic gain control (AGC) amplifiers, with linear-in-decibel gain control, and associated circuitry. This paper describes the problems encountered and the solutions envisaged to meet stringent specifications, with process and temperature variations, thus overcoming the limitations of CMOS devices, while operating at frequencies in the range of 100 MHz-1 GHz. Detailed measurement results corroborating successful application of the new techniques are reported. A receive AGC dynamic range of 73 dB with linearity error of less than /spl plusmn/2 dB and spread of less than 5 dB for a temperature range of -30/spl deg/C to +85/spl deg/C in the gain control characteristic has been measured. The modulator measurement shows a carrier suppression of 35 dB and sideband/third harmonic suppression of over 46 dB. The core die area of each chip is 1.5 mm/sup 2/.  相似文献   

16.
In this paper, we present a novel fixed-point 16-bit word-width 64-point FFT/IFFT processor developed primarily for the application in an OFDM-based IEEE 802.11a wireless LAN baseband processor. The 64-point FFT is realized by decomposing it into a two-dimensional structure of 8-point FFTs. This approach reduces the number of required complex multiplications compared to the conventional radix-2 64-point FFT algorithm. The complex multiplication operations are realized using shift-and-add operations. Thus, the processor does not use a two-input digital multiplier. It also does not need any RAM or ROM for internal storage of coefficients. The proposed 64-point FFT/IFFT processor has been fabricated and tested successfully using our in-house 0.25-/spl mu/m BiCMOS technology. The core area of this chip is 6.8 mm/sup 2/. The average dynamic power consumption is 41 mW at 20 MHz operating frequency and 1.8 V supply voltage. The processor completes one parallel-to-parallel (i.e., when all input data are available in parallel and all output data are generated in parallel) 64-point FFT computation in 23 cycles. These features show that though it has been developed primarily for application in the IEEE 802.11a standard, it can be used for any application that requires fast operation as well as low power consumption.  相似文献   

17.
This paper presents a mixed-signal programmable chip for high-speed vision applications. It consists of an array of processing elements, arranged to operate in accordance with the principles of single instruction multiple data (SIMD) computing architectures. This chip, implemented in a 0.35-/spl mu/m fully digital CMOS technology, contains /spl sim/ 3.75 M transistors and exhibits peak performance figures of 330 GOPS (8-bit equivalent giga-operations per second), 3.6 GOPS/mm/sup 2/ and 82.5 GOPS/W. It includes structures for image acquisition and for image processing, meaning that it does not require a separate imager for operation. At the sensory side, integration and log-compression sensing circuits are embedded, thus allowing the chip to handle a large variety of illumination conditions. At the processing plane, analog and digital circuits are employed whose parameters can be programmed and their architecture reconfigured for the realization of software-coded processing algorithms. The chip provides, and accepts, 8-bit digitized data through a 32-bit bidirectional data bus which operates at 120 MB/s. Experimental results show that frame rates of 1000 frames per second (FPS) can be achieved under room illumination conditions; applications using exposures of about 50 /spl mu/s have been recently reached by using special illumination setups. The chip can capture an image, run approximately 150 two-dimensional linear convolutions, and download the result in 8-bit digital format, in less than 1 ms. This feature, together with the possibility of executing sequences of user-definable instructions (stored on a full-custom 32-kb on-chip memory), and storing intermediate results (up to 8 grayscale images) makes the chip a true general-purpose sensory/processing device.  相似文献   

18.
A radix-8 wafer scale FFT processor   总被引:2,自引:0,他引:2  
Wafer Scale Integration promises radical improvements in the performance of digital signal processing systems. This paper describes the design of a radix-8 systolic (pipeline) fast Fourier transform processor for implementation with wafer scale integration. By the use of the radix-8 FFT butterfly wafer that is currently under development, continuous data rates of 160 MSPS are anticipated for FFTs of up to 4096 points with 16-bit fixed point data.  相似文献   

19.
This paper proposes a circuit-sharing approach to improve efficiency for the key digital audio broadcasting (DAB) techniques, i.e., MPEG1-audio decoding and orthogonal frequency division multiplexing (OFDM). Because OFDM's fast Fourier transform (FFT) requires heavy computational power for implementation, a single butterfly processing element (BPE) is adopted to reduce the chip area required for FFT. Furthermore, by modifying the BPE logical combinational circuit, both IMDCT (inverse modified discrete cosine transform) and FFT functions can be obtained simultaneously from a single VLSI chip. Therefore, the proposed technique reduces hardware overhead, enhances circuit efficiency and significantly reduces the cost of DAB receivers. The proposed circuit is simulated as a VLSI prototype chip using a 0.35 /spl mu/m CMOS process, with a chip area of about 22.09 mm/sup 2/ and a total gate count of approximately 10839 (excluding ROM and RAM).  相似文献   

20.
In this study, an improved butterfly structure and an address generation method for fast Fourier transform (FFT) are presented. The proposed method uses reduced logic to generate the addresses, avoiding the parity check and barrel shifters commonly used in FFT implementations. A general methodology for radix-2 $N$-point transforms is derived and the signal flow graph for a 16-point FFT is presented. Furthermore, as a case study, a 16-point FFT with 32-bit complex numbers is synthesized using a CMOS 0.18 $mu{rm m}$ technology. The circuit gate count analysis indicates that significant logic reduction can be achieved with improved throughput compared to the conventional implementations.   相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号