1.

A Low Power DSP Engine for Wireless Communications

Ingrid Verbauwhede Mihran Touriguian 《The Journal of VLSI Signal Processing》1998,18(2):177-186

This paper describes the architecture and the performance of a new programmable 16-bit Digital Signal Processor (DSP) engine. It is developed specifically for next generation wireless digital systems and speech applications. Besides providing a basic instruction set, similar to current day 16-bit DSPs, it contains distinctive architectural features and unique instructions, which make the engine highly efficient for compute-intensive tasks such as vector quantization and Viterbi operations. The datapath contains two Multiply-Accumulate units and one ALU. The external memory bandwidth is kept to two data busses and two corresponding address busses. Still, the internal bus network is designed such that all three units are operating in parallel. This parallelism is reflected in the performance benchmarks. For example, an FIR filter of N taps will take N/2 instruction cycles compared to N for a general purpose 16-bit DSP, and it will require only half the number of memory accesses of a general purpose DSP. This efficiency is reflected in the very low MIPS requirement to implement cellular standards.
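The N/2-cycle and halved-memory-access figures in the abstract above can be illustrated with a small sketch. It assumes (this is not stated in the abstract) that the two MAC units compute two consecutive outputs, y[n] and y[n+1], in the same pass while sharing each coefficient fetch and reusing the previous data sample from a register. The function name `fir_two_outputs` and the read-counting model are hypothetical.

```python
def fir_two_outputs(coeffs, samples, n):
    """Compute y[n] and y[n+1] of an FIR filter in one pass with two accumulators.

    Assumed model: each cycle reads one coefficient and one data sample,
    and both MAC units use that coefficient in the same cycle.
    """
    taps = len(coeffs)
    if n < taps - 1 or n + 1 >= len(samples):
        raise ValueError("n must leave a full window for y[n] and y[n+1]")

    acc_n = 0        # MAC unit 0 accumulates y[n]
    acc_n1 = 0       # MAC unit 1 accumulates y[n+1]
    held = samples[n + 1]   # register preload: x[n+1]
    memory_reads = 1
    cycles = 0

    for k, c in enumerate(coeffs):
        x = samples[n - k]      # one data read per cycle
        memory_reads += 2       # coefficient read + data read
        acc_n += c * x          # c[k] * x[n-k]
        acc_n1 += c * held      # c[k] * x[n+1-k], sample reused from register
        held = x                # x[n-k] becomes x[n+1-(k+1)] next cycle
        cycles += 1

    return acc_n, acc_n1, cycles, memory_reads


if __name__ == "__main__":
    y_n, y_n1, cycles, reads = fir_two_outputs([1, 2, 3, 4], [1, 2, 3, 4, 5, 6], 3)
    print(y_n, y_n1, cycles, reads)
```

Under this model, N cycles yield two outputs (N/2 cycles per output), and roughly 2N reads serve two outputs, compared with about 2N reads per output on a single-MAC machine.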

2.

A flexible adaptive FIR filter VLSI IC

Borth D.E. Gerson I.A. Haug J.R. Thompson C.D. 《Selected Areas in Communications, IEEE Journal on》1988,6(3):494-503

The architecture and features of the Motorola DSP56200 are described. The DSP56200 is an algorithm-specific cascadable digital signal processing peripheral designed to perform the computationally intensive tasks associated with finite impulse response (FIR) and adaptive FIR digital filtering applications. The DSP56200 is implemented in high-performance, low-power 1.5-μm HCMOS technology and is available in a 28-pin DIP package. The on-chip computation unit includes a 97.5-ns 24-bit×24-bit coefficient RAM, and a 256-bit×16-bit data RAM. Three modes of operation allow the part to be used as a single, dual, or single adaptive FIR filter, with up to 256 taps per chip. In the adaptive mode, the part performs the FIR filtering and least-mean-square (LMS) coefficient update operations for a single tap in 195 ns, permitting use of the part as a 19-kHz sampling rate, 256-tap adaptive FIR filter. A programmable DC tap, coefficient leakage, and adaptation coefficient parameters in the adaptive FIR mode allow the DSP56200 to be used in a wide variety of adaptive FIR filtering applications. The performance of the part in an echo canceler configuration is presented. Typical applications of the part are also described.
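As a rough illustration of the adaptive mode described above, the following sketch performs a textbook LMS iteration (filter, error, per-tap coefficient update) and checks the timing arithmetic: 256 taps at 195 ns per tap bounds the sample rate at about 20 kHz, consistent with the quoted 19 kHz once some overhead is allowed. This is a generic floating-point LMS, not the DSP56200's fixed-point datapath; the function names, step size `mu`, and the test system are assumptions made for illustration.

```python
import random


def lms_step(coeffs, samples, desired, mu):
    """One LMS iteration: FIR output, error, and updated coefficients."""
    output = sum(c * x for c, x in zip(coeffs, samples))
    error = desired - output
    # Each tap moves along its own input sample, scaled by the common error.
    new_coeffs = [c + mu * error * x for c, x in zip(coeffs, samples)]
    return output, error, new_coeffs


def max_sample_rate_hz(taps, seconds_per_tap):
    """Upper bound on the sample rate when every tap costs a fixed time."""
    return 1.0 / (taps * seconds_per_tap)


def identify(unknown, mu=0.1, steps=2000, seed=0):
    """Adapt an FIR filter until it matches a noiseless 'unknown' system."""
    rng = random.Random(seed)
    coeffs = [0.0] * len(unknown)
    delay_line = [0.0] * len(unknown)
    for _ in range(steps):
        delay_line = [rng.uniform(-1.0, 1.0)] + delay_line[:-1]
        desired = sum(h * x for h, x in zip(unknown, delay_line))
        _, _, coeffs = lms_step(coeffs, delay_line, desired, mu)
    return coeffs


if __name__ == "__main__":
    print(round(max_sample_rate_hz(256, 195e-9)))  # bound in Hz for 256 taps
    print(identify([0.5, -0.25]))
```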

3.

30-Msamples/s programmable filter processor

Golla C. Nava F. Cavallotti F. Cremonesi A. Casgrande G. 《Solid-State Circuits, IEEE Journal of》1990,25(6):1502-1509

A 30-MHz finite impulse response (FIR) programmable filter processor that has been developed using a 1.2-μm CMOS EPROM technology with single metal is discussed. Its 30-MHz worst-case operating frequency meets most video filtering requirements and demonstrates the potential of nonvolatile memory technologies in embedded applications. The processor has been designed with a high level of parallelism and pipelining by using a transposed FIR structure. In this approach, the multipliers are implemented with an EPROM-based look-up table containing the results of the products between video samples and filter coefficients, according to the user's application. The chip can implement every kind of FIR filter with a maximum complexity of 59 taps in a half-band filter configuration, 32 taps for a symmetric filter, and 167 taps for an asymmetric one. The equivalent coefficient precision is 12 b, assuming 8 b of input data precision. Multiprocessor configurations are allowed for more demanding performances such as longer filters, input signal precision extension, two-dimensional processing, and increased throughput.

4.

An 85-MHz fourth-order programmable IIR digital filter chip

Hatamian M. Parhi K.K. 《Solid-State Circuits, IEEE Journal of》1992,27(2):175-183

The authors describe the design and VLSI implementation of a single-chip 85-MHz fourth-order infinite impulse response (IIR) digital filter chip fabricated in 0.9-μm CMOS technology. The coefficient and input data word lengths of the filter are 10 b each, and the output data word length is 15 b. The coefficients are fully programmable. The chip can be programmed to implement any IIR filter from first to fourth order or an FIR filter up to 16th order at sample rates up to 85 MHz. A total of seventeen 10×10 multiply-add modules are used in this chip. The chip contains 80000 devices in an active area of 14 mm². It dissipates 2.2 W at 85-MHz clock rate and performs over 1.5×10⁹ multiply-add operations per second. The underlying filtering algorithm, chip architecture, circuit and layout design, speed issues, and test results are described. The results of an E-beam probing experiment on packaged chips at 100-MHz clock rates are also presented and discussed.

5.

Design and implementation of a configurable system-on-chip for audio applications

Tan Honghe Sun Yihe 《Microelectronics》2009,39(6)

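Based on a general-purpose 16-bit fixed-point digital signal processor, combined with analog circuit blocks such as D/A converters, A/D converters, and amplifiers, a configurable system-on-chip for audio applications is designed and implemented. The system supports stereo input and output, a programmable sampling frequency between 8 and 48 kHz, and programmable input and output amplifier gain. The design also uses a high-precision 24-bit Σ-Δ A/D converter with selectable digital filters. To support different applications, the system provides a programmable word length of 24 or 16 bits. The chip operates at 1.8 V, and each part of the chip supports suspend or sleep states, which favors the development of low-power portable applications. The simulation, verification, and testing of some key functional modules are described, along with the construction of a simulation model of the whole system.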

6.

A programmable FIR digital filter using CSD coefficients

Kei-Yong Khoo Kwentus A. Willson A.N. Jr. 《Solid-State Circuits, IEEE Journal of》1996,31(6):869-874

An area-efficient programmable FIR digital filter using canonic signed-digit (CSD) coefficients was implemented that uses a switchable unit-delay to allocate the desired number of nonzero CSD coefficient digits to each filter tap. The prototype chip can allocate up to 16 pairs of nonzero CSD coefficient digits for a linear-phase filter, thus realizing filters with 32 linear-phase taps operating at 180 MHz with two nonzero CSD digits per filter tap. Additional nonzero CSD digits can be allocated to filter taps at the penalty of a reduced filter length and a reduced data-rate. The chip was implemented with 16-bit I/O in a die size of 5.9 mm by 3.4 mm using 1.0-μm CMOS technology.
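For readers unfamiliar with CSD coefficients, here is a minimal sketch of how an integer coefficient can be recoded into canonic signed digits (digits in {-1, 0, +1} with no two adjacent nonzero digits). Each nonzero digit corresponds to one shift-and-add or shift-and-subtract, which is why the chip budgets nonzero digits per tap. The helper names `to_csd` and `from_csd` are hypothetical, and the code says nothing about the chip's actual digit-allocation hardware.

```python
def to_csd(value):
    """Return the CSD digits of an integer, least significant digit first."""
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 -> digit +1; value mod 4 == 3 -> digit -1.
            # Either choice leaves a remainder divisible by 4, which forces
            # the next digit to be zero (no adjacent nonzero digits).
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def from_csd(digits):
    """Rebuild the integer from CSD digits (least significant first)."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


if __name__ == "__main__":
    for coefficient in (7, 23):
        csd = to_csd(coefficient)
        nonzero = sum(1 for d in csd if d)
        print(coefficient, csd, "nonzero digits:", nonzero)
```

For example, 7 is recoded as 8 - 1, which needs two nonzero digits instead of the three ones in its plain binary form.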

7.

Kestrel: A Programmable Array for Sequence Analysis

Jeffrey D. Hirschberg David M. Dahle Kevin Karplus Don Speck Richard Hughey 《The Journal of VLSI Signal Processing》1998,19(2):115-126

Kestrel is a programmable linear array processor designed for sequence analysis. Among other features, Kestrel includes an 8-bit word, a single-cycle add-and-minimize instruction, a multiplier and efficient communication using shared registers. This paper describes Kestrel's functional units in detail, and examines each of their effects on system performance. With functional prototype chips completed, we will assemble a full single-board Kestrel array, with 512 processing elements on eight chips, in early 1998.

8.

Design of an FIR filter based on a two-level pipeline structure (total citations: 4, self-citations: 0, citations by others: 4)

Wang Qin Li Zhancai Qi Yue 《Acta Electronica Sinica》2005,33(2):367-369

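This paper proposes an implementation scheme for FIR filters based on a two-level pipeline architecture (2HPFIR). By running the internal clock several times faster than the input sampling frequency, the multiply-add units are heavily reused, which reduces chip area. Given the number of filter taps N and the factor M by which the internal clock exceeds the sampling frequency, N/M-1 taps are inserted into the tap chain of the two-level pipeline structure, dividing the computation into N/M groups. The pipeline has M stages within each group and N/M stages across groups. As the number of taps grows, the structure is easy to extend and does not lengthen the critical path. The method can be applied flexibly to the design of other similar dedicated filters.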
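The resource sharing behind this scheme can be sketched as follows: with an internal clock M times faster than the sample rate, N taps need only N/M multiply-add units, each reused for M taps per input sample. The sketch models only that sharing and the resulting unit count. It does not reproduce the paper's pipeline registers or timing, and the function name `fir_shared_macs` is hypothetical.

```python
def fir_shared_macs(coeffs, window, m):
    """Compute one FIR output with N/M multiply-add units reused M times each.

    coeffs and window hold N values each; m is the ratio of the internal
    clock to the sampling frequency (assumed to divide N evenly).
    """
    n = len(coeffs)
    if len(window) != n or m <= 0 or n % m:
        raise ValueError("need equal lengths and N divisible by M")

    groups = n // m                 # number of physical multiply-add units
    partial = [0] * groups          # one accumulator per unit

    for phase in range(m):          # M internal clock cycles per input sample
        for g in range(groups):     # all units work in parallel in hardware
            k = g * m + phase       # tap handled by unit g in this cycle
            partial[g] += coeffs[k] * window[k]

    return sum(partial), groups


if __name__ == "__main__":
    output, units = fir_shared_macs(
        [1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1], 4
    )
    print(output, units)
```

In this example an 8-tap filter with M = 4 uses two multiply-add units instead of eight.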

9.

A single-chip video ghost canceller

Edwards B. Corry A. Weste N. Greenberg C. 《Solid-State Circuits, IEEE Journal of》1993,28(3):379-383

A 450 K-transistor video ghost canceller chip, which implements a flexibly configurable IIR and FIR filter, is described. A very compact digital filter tap operating at a pixel rate of 14.32 MHz (4×F_sc) allows 180 programmable taps to be implemented in a die area of 56.25 mm² in a 1 μm TLM CMOS process. The device operates with 3.3- or 5-V power supplies.

10.

A single-chip programmable platform based on a multithreaded processor and configurable logic clusters

Young-Don Bae Seong-Il Park In-Cheol Park 《Solid-State Circuits, IEEE Journal of》2003,38(10):1703-1711

This paper presents a single-chip programmable platform that integrates most of the hardware blocks required in the design of embedded system chips. The platform includes a 32-bit multithreaded RISC processor (MT-RISC), configurable logic clusters (CLCs), programmable first-in-first-out (FIFO) memories, control circuitry, and on-chip memories. For rapid thread switch, a multithreaded processor equipped with a hardware thread scheduling unit is adopted, and configurable logics are grouped into clusters for IP-based design. By integrating both the multithreaded processor and the configurable logic on a single chip, high-level language-based designs can be easily accommodated by performing the complex and concurrent functions of a target chip on the multithreaded processor and implementing the external interface functions into the configurable logic clusters. A 64-mm² prototype chip integrating a four-threaded MT-RISC, three CLCs, programmable FIFOs, and 8-kB on-chip memories is fabricated in a 0.35-μm CMOS technology with four metal layers, which operates at 100-MHz clock frequency and consumes 370 mW at 3.3-V power supply.

11.

Minimization of switching activities of partial products for designing low-power multipliers

Chen O.T.-C. Sandy Wang Yi-Wen Wu 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(3):418-433

This work presents low-power 2's complement multipliers by minimizing the switching activities of partial products using the radix-4 Booth algorithm. Before computation for two input data, the one with a smaller effective dynamic range is processed to generate Booth codes, thereby increasing the probability that the partial products become zero. By employing the dynamic-range determination unit to control input data paths, the multiplier with a column-based adder tree of compressors or counters is designed. To further reduce power consumption, the two multipliers based on row-based and hybrid-based adder trees are realized with operations on effective dynamic ranges of input data. Functional blocks of these two multipliers can preserve their previous input states for noneffective dynamic data ranges and thus, reduce the number of their switching operations. To illustrate the proposed multipliers exhibiting low-power dissipation, the theoretical analyses of switching activities of partial products are derived. The proposed 16 × 16-bit multiplier with the column-based adder tree conserves more than 31.2%, 19.1%, and 33.0% of power consumed by the conventional multiplier, in applications of the ADPCM audio, G.723.1 speech, and wavelet-based image coders, respectively. Furthermore, the proposed multipliers with row-based, hybrid-based adder trees reduce power consumption by over 35.3%, 25.3% and 39.6%, and 33.4%, 24.9% and 36.9%, respectively. When considering product factors of hardware areas, critical delays and power consumption, the proposed multipliers can outperform the conventional multipliers. Consequently, the multipliers proposed herein can be broadly used in various media processing to yield low-power consumption at limited hardware cost or little slowing of speed.
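The operand-selection idea in this abstract can be sketched in software: recode whichever operand has the smaller effective dynamic range into radix-4 Booth digits, so that more digits (and hence more partial products) are zero. The code below is a behavioral illustration only; the names `booth_radix4_digits`, `effective_bits`, and `booth_multiply` are hypothetical, and nothing here models the paper's adder trees or its dynamic-range determination circuit.

```python
# Booth digit for each 3-bit group (y[i+1], y[i], y[i-1]): -2*y[i+1] + y[i] + y[i-1]
BOOTH_TABLE = [0, 1, 1, 2, -2, -1, -1, 0]


def booth_radix4_digits(value, bits):
    """Recode a two's complement value into radix-4 Booth digits, LSD first."""
    if bits % 2 or not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("bits must be even and value must fit in bits")
    # Mask to the word length and append the implicit bit y[-1] = 0.
    pattern = (value & ((1 << bits) - 1)) << 1
    return [BOOTH_TABLE[(pattern >> i) & 0b111] for i in range(0, bits, 2)]


def effective_bits(value):
    """Smallest two's complement width that can hold the value."""
    return (value if value >= 0 else ~value).bit_length() + 1


def booth_multiply(a, b, bits=16):
    """Multiply by Booth-coding the operand with the smaller effective range.

    Returns the product and the number of nonzero partial products.
    """
    if effective_bits(a) <= effective_bits(b):
        coded, other = a, b
    else:
        coded, other = b, a
    digits = booth_radix4_digits(coded, bits)
    product = sum(d * other * 4 ** k for k, d in enumerate(digits))
    nonzero = sum(1 for d in digits if d != 0)
    return product, nonzero


if __name__ == "__main__":
    print(booth_multiply(1234, -3))
    print(sum(1 for d in booth_radix4_digits(1234, 16) if d))
```

In the example, coding -3 produces two nonzero partial products, whereas coding 1234 would produce six, which is the effect the abstract exploits to cut switching activity.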

12.

A 30-MHz mixed analog/digital signal processor

Takeuchi S. Kouno H. Hayashi Y. Maeda A. Okada K. Yazawa N. 《Solid-State Circuits, IEEE Journal of》1990,25(6):1458-1463

A multiplying encoder architecture that is implemented in the design of a mixed analog and digital signal processor is presented. The processor is suitable for performing both high-speed A/D conversion and digital filtering in a single chip. The device can resolve the input with 8 b at 30 Msample/s and perform 28 multiply and 28 add operations per sample under typical conditions. The processor is designed for a 28-tap programmable FIR (finite impulse response) filter with analog input signal which can be used for waveform shaping of the modem to obtain the desired transmission performance for business satellite communication and mobile communication. The chip is fabricated in a 1-μm double-polysilicon and double-metal CMOS technology. The chip size is 9.73×8.14 mm², and the chip operates with a single +5.0-V power supply. Typical power dissipation is 950 mW; 330 mW is dissipated in analog and 620 mW is in the digital block.

13.

Merging VLIW and vector processing techniques for a simple, high-performance processor architecture

《Microelectronics Journal》2015,46(7):637-655

This paper proposes a new processor architecture called VVSHP for accelerating data-parallel applications, which are growing in importance and demanding increased performance from hardware. VVSHP merges VLIW and vector processing techniques for a simple, high-performance processor architecture. One key point of VVSHP is the execution of multiple scalar instructions within VLIW and vector instructions on unified parallel execution datapaths. Another key point is to reduce the complexity of VVSHP by designing a two-part register file: (1) shared scalar–vector part with eight-read/four-write ports 64×32-bit registers (64 scalar or 16×4 vector registers) for storing scalar/vector data and (2) vector part with two-read/one-write ports 48 vector-registers, each stores 4×32-bit vector data. Moreover, processing vector data with lengths varying from 1 to 256 represents a key point for reducing the loop overheads. VVSHP can issue up to four scalar/vector operations in each cycle for parallel processing a set of operands and producing up to four results to be written back into VVSHP register file. However, it cannot issue more than one memory operation at a time, which loads/stores 128-bit scalar/vector data from/to data memory. The design of our proposed VVSHP processor is implemented using VHDL targeting the Xilinx FPGA Virtex-5 and its performance is evaluated.

14.

Design of the I/O block array in a tile-based FPGA

Ding Guangxin Chen Lingdu Liu Zhongli 《Journal of Semiconductors》2009,30(8):085008-6

15.

The design for a Josephson micro-pipelined processor

Harada Y. Hioe W. Takagi K. Kawabe U. 《Applied Superconductivity, IEEE Transactions on》1994,4(2):97-106

A novel processor with micro-pipelined architecture is proposed for latch-type Josephson logic devices. The processor is segmented into several operating stages activated by a multi-phase power system. Independent register groups are allocated to each stage in order to support pipeline processing of several instruction streams. This architecture allows building of a fine pipeline pitch processor which is capable of MIMD processing. A 12-bit micro-pipelined Josephson processor, containing an ALU, a multiplier and 16 registers, is described. Driven by a 3-phase AC power system, it is able to process 4 instruction streams simultaneously. A pipeline pitch of 3.3 GHz is expected using conventional Josephson device technology. A 4-bit processor design for 12-bit data length is also discussed.

16.

A pipelined 50-MHz CMOS 64-bit floating-point arithmetic processor

Benschneider B.J. Bowhill W.J. Copper E.M. Gavrielov M.N. Gronowski P.E. Maheshwari V.K. Peng V. Pickholtz J.D. Samudrala S. 《Solid-State Circuits, IEEE Journal of》1989,24(5):1317-1323

A 135K transistor, uniformly pipelined 50-MHz CMOS 64-bit floating-point arithmetic processor chip is described. The execution unit is capable of sustaining pipelined performance of one 32-bit or 64-bit result every 20 ns for all operations except double-precision multiply (40 ns) and divide. The chip employs an exponent difference prediction scheme and a unified leading-one and sticky-bit computation logic for the addition and subtraction operations. A hardware multiplier using a radix-8 modified Booth algorithm and a divider using a radix-2 SRT algorithm are employed.

17.

Low-power and area-efficient FIR filter implementation suitable for multiple taps

Kyung-Saeng Kim Kwyro Lee 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(1):150-153

This paper describes a 32-tap finite impulse response (FIR) filter with two 16-tap macros suitable for multiple taps. The derived condition for a coded coefficient and data block shows 35% savings in power consumption and 44% improvement in occupied area compared to a typical radix-4 modified Booth algorithm. According to the condition and separated shifting-accessing clock scheme, we have implemented a 32-tap FIR filter in 0.6-μm CMOS technology with three levels of metal. The chip that occupies 2.3×2.5 mm² of silicon area has an operating frequency of 20 MHz and consumes 75 mW at V_dd=3.3 V.

18.

A Custom VLSI Chip Set for Digital Signal Processing in High-Speed Voiceband Modems

Qureshi S. Ahmed H. 《Selected Areas in Communications, IEEE Journal on》1986,4(1):81-91

Systems modems intended for use in relatively large private networks are characterized by high performance, reliability and flexibility to support network management, and multiple modes of operation and user features. This paper describes a programmable digital signal processor which is teamed with a 16-bit microprocessor in a dual processor architecture satisfying the requirements of high-speed voiceband systems modems. The architecture of the two custom integrated circuits which form the basis of the signal processor is presented. This processor has novel arithmetic, data structure address generation, and program flow-control capabilities, which result in a high utilization of the arithmetic unit and a low program overhead for housekeeping tasks. Some of these features are illustrated by programming examples.

19.

A low-power adder operating on effective dynamic data ranges

Chen O.T.-C. Sheen R.R.-B. Wang S. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2002,10(4):435-453

To design a power-efficient digital signal processor, this study develops a fundamental arithmetic unit of a low-power adder that operates on effective dynamic data ranges. Before performing an addition operation, the effective dynamic ranges of two input data are determined. Based on a larger effective dynamic range, only selected functional blocks of the adder are activated to generate the desired result while the input bits of the unused functional blocks remain in their previous states. The added result is then recovered to match the required word length. Using this approach to reduce switching operations of noneffective bits allows input data in 2's complement and sign magnitude representations to have similar switching activities. This investigation thus proposes a 2's complement adder with two master-stage and slave-stage flip-flops, a dynamic-range determination unit and a sign-extension unit, owing to the easy implementation of addition and subtraction in such a system. Furthermore, this adder has a minimum number of transistors addressed by carry-in bits and thus is designed to reduce the power consumption of its unused functional blocks. The dynamic range and sign-extension units are explored in detail to minimize their circuit area and power consumption. Experimental results demonstrate that the proposed 32-bit adder can reduce power dissipation of conventional low-power adders for practical multimedia applications. Besides the ripple adder, the proposed approach can be utilized in other adder cells, such as carry lookahead and carry-select adders, to compromise complexity, speed and power consumption for application-specific integrated circuits and digital signal processors.

20.

A 155-mW 50-Mvertices/s graphics processor with fixed-point programmable vertex shader for mobile applications

Ju-Ho Sohn Jeong-Ho Woo Min-Wuk Lee Hye-Jung Kim Woo R. Hoi-Jun Yoo 《Solid-State Circuits, IEEE Journal of》2006,41(5):1081-1091

A 36 mm² graphics processor with fixed-point programmable vertex shader is designed and implemented for portable two-dimensional (2-D) and three-dimensional (3-D) graphics applications. The graphics processor contains an ARM-10 compatible 32-bit RISC processor, a 128-bit programmable fixed-point single-instruction-multiple-data (SIMD) vertex shader, a low-power rendering engine, and a programmable frequency synthesizer (PFS). Different from conventional graphics hardware, the proposed graphics processor implements ARM-10 co-processor architecture with dual operations so that user-programmable vertex shading is possible for advanced graphics algorithms and various streaming multimedia processing in mobile applications. The circuits and architecture of the graphics processor are optimized for fixed-point operations and achieve the low power consumption with the help of instruction-level power management of the vertex shader and pixel-level clock gating of the rendering engine. The PFS with a fully balanced voltage-controlled oscillator (VCO) controls the clock frequency from 8 MHz to 271 MHz continuously and adaptively for low-power modes by software. The chip shows 50 Mvertices/s and 200 Mtexels/s peak graphics performance, dissipating 155 mW in 0.18-μm 6-metal standard CMOS logic process.