首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
A 2-/spl mu/m CMOS VLSI digital signal processor (DSP) family, the SP50, is described that is capable of eight million instructions per second and up to six concurrent operations in each instruction. Two DSPs, the PCB5010 and PCB5011, have been developed. Both are based on a common architecture which contains two 16-bit data buses, and a 16/spl times/16/spl rarr/40-bit multiplier accumulator and 16-bit ALU, both with multiprecision support in hardware. Also implemented are two static data RAMs (128/spl times/16 or 256/spl times/16), a data ROM (51/spl times/16), a 15-word three-port register file, three address computation units, and five serial and parallel I/O interfaces. The data path is controlled by an orthogonal instruction set, using 40-bit microcode words. The controller contains a five-level stack and an instruction repeat register, and can have either on-chip program memory (RAM: 32/spl times/40; ROM: 987/spl times/40) or off-chip program memory (up to 64K/spl times/40). Benchmarks show a two to sixfold improvement in overall performance over its predecessors.  相似文献   

2.
This paper describes the architecture and the performance of a new programmable 16-bit Digital Signal Processor (DSP) engine. It is developed specifically for next generation wireless digital systems and speech applications. Besides providing a basic instruction set, similar to current day 16-bit DSP's, it contains distinctive architectural features and unique instructions, which make the engine highly efficient for compute-intensive tasks such as vector quantization and Viterbi operations. The datapath contains two Multiply-Accumulate units and one ALU. The external memory bandwidth is kept to two data busses and two corresponding address busses. Still, the internal bus network is designed such that all three units are operating in parallel. This parallelism is reflected in the performance benchmarks. For example, an FIR filter of N taps will take N/2 instruction cycles compared to N for a general purpose 16-bit DSP, and it will require only half the number of memory accesses of a general purpose DSP. This efficiency is reflected in the very low MIPS requirement to implement cellular standards.  相似文献   

3.
Erdogan  A.T. Arslan  T. 《Electronics letters》1996,32(21):1959-1960
A new multiplication scheme is proposed, for application to single multiplier CMOS based DSP processors, for the implementation of low-power digital FIR filters through the reduction of switching activity within the multiplier section of the filter. The scheme operates in conjunction with a transpose direct form FIR filter structure and a modified DSP processor architecture, through a significant reduction in power can be obtained by using algorithms to order the filter coefficients. This reduction is demonstrated using two basic examples, with different wordlengths and filter orders, achieving up to 63% reduction in switching activity  相似文献   

4.
The architecture and features of the Motorola DSP56200 are described. The DSP56200 is an algorithm-specific cascadable digital signal processing peripheral designed to perform the computationally intensive tasks associated with finite impulse response (FIR) and adaptive FIR digital filtering applications. The DSP56200 is implemented in high-performance, low-power 1.5-μm HCMOS technology and is available in a 28-pin DIP package. The on-chip computation unit includes a 97.5-ns 24-bit×24-bit coefficient RAM, and a 256-bit×16-bit data RAM. Three modes of operation allow the part to be used as a single, dual, or single adaptive FIR filter, with up to 256 taps per chip. In the adaptive mode, the part performs the FIR filtering and least-mean-square (LMS) coefficient update operations for a single tap in 195 ns, permitting use of the part as a 19-kHz sampling rate, 256-tap adaptive FIR filter. A programmable DC tap, coefficient leakage, and adaptation coefficient parameters in the adaptive FIR mode allow the DSP56200 to be used in a wide variety of adaptive FIR filtering applications. The performance of the part in an echo canceler configuration is presented. Typical applications of the part are also described  相似文献   

5.
A processor architecture combining high-performance and low-power is presented. A prototype chip, Xetal-II, has been realized in 90 nm CMOS technology based on the proposed architecture. Recent experimental results show a compute performance of up to 140 GOPS at 785 mW when operating at 110 MHz. The main architectural feature that allows high computational efficiency is the massively-parallel single-instruction multiple-data (MP-SIMD) compute paradigm. Due to the high data-level parallelism, applications like video scene analysis can efficiently exploit the proposed architecture. The chip has an internal 16-bit datapath and 10 Mbit of on-chip video memory facilitating energy efficient implementation of video processing kernels.  相似文献   

6.
This paper describes a novel reconfigurable architecture for digital signal processing (DSP). This architecture consists of a two-level array of cells and interconnections. On the upper level, fundamental DSP operations such as multiplication and addition are mapped onto blocks of 4-bit cells. On the lower level, each cell uses a 4 × 4 matrix of smaller “elements” to perform the necessary computations. Cells also contain pipeline latches for increased throughput. The architecture features a simple VLSI implementation that combines the flexibility of memory elements with the speed of DOMINO logic. Initial prototypes have been fabricated using a modest 0.5-μm CMOS technology. Circuit simulations of the cell in 0.25-μm technology indicate that the design achieves a clock frequency of 200 MHz.  相似文献   

7.
文章通过对32位定点DSP的体系结构及其设计方法的研究,重点阐述了32位定点DSP中CPU包括ALU、MPY、ARAU、流水线、指令系统和总线接口等关键逻辑部件工作原理,对各个逻辑部件的设计思路和实现方法进行了分析描述。采用基于标准单元正向设计方法,设计了一款32位指令集的定点DSP电路,该电路采用哈佛总线结构,可以在单周期内实现16×16位有符号整数乘法、32位累加和32位数据的算术逻辑运算,处理精度高。该电路采用0.5μm 1P3M CMOS工艺流片,集成度7万门,工作频率可达36 MHz,动态功耗594 mW。  相似文献   

8.
This paper addresses the problem of reducing power dissipation of finite impulse response (FIR) filters implemented on programmable digital signal processors (DSPs). We describe a generic DSP architecture and identify the main sources of power dissipation during FIR filtering. We present seven transformations to reduce power dissipated in one or more of these sources. These transformations complement each other and together operate at algorithmic, architectural, logic and layout levels of design abstraction. Each of the transformations is discussed in detail and the results are presented to highlight its effectiveness. We show that the power dissipation can be reduced by more than 40% using these transforms. The transformations have been encapsulated in a framework that provides a comprehensive solution to low-power realization of FIR filters on programmable DSP's  相似文献   

9.
A general-purpose programmable digital signal processor (DSP) has been implemented in 1.5-/spl mu/m (L/SUB eff/) NMOS technology using full-custom circuit design for high performance. The DSP has a 32-bit instruction set, 32-bit data path, and full-hardware 32-bit floating-point arithmetic. The architecture is described section by section, and an overview of the instruction set is presented. The extensive design verification process applied to the DSP is also described.  相似文献   

10.
This paper presents a novel instruction cell-based reconfigurable computing architecture for low-power applications, thereafter referred to as the reconfigurable instruction cell array (RICA). For the development of the RICA, a top-down software driven approach was taken and revealed as one of the key design decisions for a flexible, easy to program, low-power architecture. These features make RICA an architecture that inherently solves the main design requirements of modern low-power devices. Results show that it delivers considerably less power consumption when compared to leading VLIW and low-power digital signal processors, but still maintaining their throughput performance.  相似文献   

11.
A multimedia system-on-a-chip (SoC) usually contains one or more programmable digital signal processors (DSP) to accelerate data-intensive computations. But most of these DSP cores are designed originally for standalone applications, and they must have some overlapped (and redundant) components with the host microprocessor. This paper presents a compact DSP for multi-core systems, which is fully programmable and has been optimized to execute a set of signal processing kernels very efficiently. The DSP core was designed concurrently with its automatic software generator based on high-level synthesis. Moreover, it performs lightweight arithmetic—the static floating-point (SFP), which approximates the quality of floating-point (FP) operations with the hardware similar to that of the integer arithmetic. In our simulations, the compact DSP and its auto-generated software can achieve 3X performance (estimated in cycles) of those DSP cores in the dual-core baseband processors with similar computing resources. Besides, the 16-bit SFP has above 40 dB signal to round-off noise ratio over the IEEE single-precision FP, and it even outperforms the hand-optimized programs based on the 32-bit integer arithmetic. The 24-bit SFP has above 64 dB quality, of which the maximum precision is identical to that of the single-precision FP. Finally, the DSP core has been implemented and fabricated in the UMC 0.18μm 1P6M CMOS technology. It can operate at 314.5 MHz while consuming 52mW average power. The core size is only 1.5 mm×1.5 mm including the 16 KB on-chip memory and the AMBA AHB interface. This work was supported by the National Science Council, Taiwan under Grant NSC93-2220-E-009-017. Besides, the authors would like to thank the National Chip Implementation Center (CIC) for chip fabrication. Tay-Jyi Lin received the BS degree in electrical and control engineering from National Chiao Tung University, Taiwan, in 1998. He is working toward the PhD degree in the Department of Electronics Engineering and the Institute of Electronics, National Chiao Tung University. His current researches include the heterogeneous computing platform for embedded multimedia systems, complexity-aware architecture design, and high-performance/low-power digital signal processors. Hung-Yueh Lin received the BS and the MS degrees in electronics engineering from National Chiao Tung University, Taiwan, in 2002 and 2004, respectively. He is now with MediaTek, Inc., Hsinchu, Taiwan. His research interests include lightweight computer arithmetic and DSP architecture. Chie-Min Chao received the BS degree in electronics engineering from National Chiao Tung University, Taiwan, in 2003, where he is currently pursuing his MS degree. His researches include system software development, VLSI system design, and DSP architecture. Chih-Wei Liu received the BS and the PhD degrees in electrical engineering from National Tsing Hua University, Taiwan, in 1991 and 1999, respectively. From 1999 to 2000, he was an integrated circuit design engineer at the Electronics Research and Service Organization (ERSO) of Industrial Technology Research Institute (ITRI), Taiwan. Then, near the end of 2000, he started to work for the SoC Technology Center (STC) of ITRI as a project leader and eventually left ITRI at the end of Oct., 2003. He is currently with the Department of Electronics Engineering and the Institute of Electronics, National Chiao Tung University, Taiwan, as an assistant professor. His current research interests include SoC and VLSI system design, processor architecture, digital signal processing, digital communications, and coding theory. Chein-Wei Jen received the BS degree from National Chiao Tung University, Taiwan, in 1970, the MS degree from Stanford University in 1977, and the PhD degree from National Chiao Tung University in 1983. From 1981 to 2004, he was with the Department of Electronics Engineering and the Institute of Electronics at National Chiao Tung University. Dr Jen was given the Outstanding Electrical Engineering Professor Award by the Chinese Institute of Electrical Engineering in 2002. He is currently the General Director of the SoC Technology Center at Industrial Technology Research Institute, the Adviser of National SoC Program, and the Managing Director of the Board of the Taiwan IC Design Society. His research interests include SoC design, VLSI architectures, multimedia processing, and design automation. He holds seven patents and has published over 50 journal and 100 conference papers in these areas.  相似文献   

12.
Describes a new 4-bit microcomputer fabricated using a low-power silicon gate CMOS process and working from a supply voltage down to 1.2 V. The /spl mu/C can directly drive up to seven 3:1 multiplexed LCD digits, scan up 48 keys, and perform 4-bit handshaking data transfer with external devices. 16-bit, single-word instructions and eight stack levels permit efficient use of the 640-word ROM. Operating from a 4.19 MHz crystal, the device has an instruction cycle time of 15 /spl mu/s. An operating power of 100 /spl mu/W at 1.5 W makes the chip ideal for performing control and timing functions in battery operated applications.  相似文献   

13.
This paper presents a network flow approach to solving the register binding and allocation problem for multiword memory access DSP processors. In recently announced DSP processors, 16-bit instructions which simultaneously access four words from memory are supported. A polynomial-time network flow methodology is used to allocate multiword accesses, including constant data-memory layout, while minimizing code size. Results show that improvements of up to 87% in terms of memory bandwidth are obtained compared to compiler-generated DSP code. This research is important for industry since this value-added technique can improve code size and utilize higher-memory bandwidths without increasing cost.  相似文献   

14.
We describe the design and implementation of a 16-bit clock-powered microprocessor that dissipates only 2.9 mW at 15.8 MHz based on laboratory measurements. Clock-powered logic (CPL) has been developed as a new approach for designing and building low-power VLSI systems that exploit the benefits of supply-voltage-scaled static CMOS and energy-recovery CMOS techniques. In CPL, the clock signals are a source of ac power for the other large on-chip capacitive loads. Clock amplitude and waveform shape combine to reduce power. By exploiting energy recovery and an energy-conserving clock driver, it is possible to build ultra-low-power CMOS processors with this approach. We compare the CPL approach with a conventional, fully dissipative approach for a processor with a similar ISA and VLSI architecture which was designed using the same set of VLSI CAD tools. The simulation results indicate that the CPL microprocessor would dissipate 40% less power than the conventional design  相似文献   

15.
A 32-b RISC/DSP microprocessor with reduced complexity   总被引:2,自引:0,他引:2  
This paper presents a new 32-b reduced instruction set computer/digital signal processor (RISC/DSP) architecture which can be used as a general purpose microprocessor and in parallel as a 16-/32-b fixed-point DSP. This has been achieved by using RISC design principles for the implementation of DSP functionality. A DSP unit operates in parallel to an arithmetic logic unit (ALU)/barrelshifter on the same register set. This architecture provides the fast loop processing, high data throughput, and deterministic program flow absolutely necessary in DSP applications. Besides offering a basis for general purpose and DSP processing, the RISC philosophy offers a higher degree of flexibility for the implementation of DSP algorithms and achieves higher clock frequencies compared to conventional DSP architectures. The integrated DSP unit provides instruction set support for highly specialized DSP algorithms. Subword processing optimized for DSP algorithms has been implemented to provide maximum performance for 16-b data types. While creating a unified base for both application areas, we also minimized transistor count and we reduced complexity by using a short instruction pipeline. A parallelism concept based on a varying number of instruction latency cycles made superscalar instruction execution superfluous  相似文献   

16.
The convergence of voice and video in next-generation wireless applications requires a processor that can efficiently implement a broad range of advanced third generation (3G) wireless algorithms. The micro signal architecture (MSA) core is a dual-MAC modified Harvard architecture that has been designed to have good performance on both voice and video algorithms. In addition, some of the best features and simplicity of microcontrollers has been incorporated into the MSA core. This article presents an overview of the MSA architecture, key engineering issues and their solutions, and details associated with the first implementation of the core. The utility of the MSA architecture for practical 3G wireless applications is illustrated with several application examples and performance benchmarks for typical DSP and image/video kernels. The DSP features of the MSA core include: two 16-bit single-cycle throughput multipliers, two 40-bit split data ALUs, and hardware support for on-the-fly saturation and clipping; two 32-bit pointer ALUs with support for circular and bit-reversed addressing; two separate data ports to a unified 4 GB memory space, a parallel port for instructions, and two loop counters that allow nested zero overhead looping  相似文献   

17.
Providing quality mobile video applications in hand-held mobile devices requires increased computational capability. Using Single Instruction Multiple Data (SIMD) techniques to expose and accelerate the data parallelism inherent in video processing increases performance in handheld and wireless systems. The paper introduces a new 64-bit SIMD coprocessor of the Intel® XScale® microarchitecture which is optimized for low-power handheld applications. The architecture blends the SIMD media processing style with the capabilities of the XScale microarchitecture. This paper provides an overview of the architecture, its instruction set, programming model, the pipeline organization and functional units. The paper also describes how key features of architecture improve the performance of video applications as compared to a scalar implementation. The performance and power improvements based upon measured results are analyzed to show how the opportunities of power savings by reducing the frequency and voltage can be realized.Nigel C. Paver has 13 years experience with the ARM architecture, and in the Intel PCA Components group in Austin, Texas, he is responsible for the architecture and implementation of multimedia coprocessors for the Intel XScale micro-architecture. He is also involved in product architecture and definition of Intel PCA processors. Before Intel, Nigel was one of the lead designers of the early AMULET asynchronous ARM microprocessors at the University of Manchester. He was also vice president in a startup company which used asynchronous design techniques to produce a low-power asynchronous DSP core. Nigel holds a Master of Science degree and Ph.D. in computer science from the University of Manchester and a Bachelor of Science degree in electronics from UMIST.Moinul Khan is a multimedia product architect at Intel Corporation PCA Components group. He is responsible PCA graphics and security architecture. His research interests are virtual prototyping, signal processing algorithms and architecture and communications networking. Before joining Intel he was a technology specialist and founding member of a startup at ATDC, Georgia. He worked on his doctoral research at Georgia Center for Advanced Telecommunications Technology at Georgia Institute of Technology. He received his B.Tech form Indian Insti-ture of Technology and MSEE from Georgia Tech. He also worked as a research member for Canadian Institute for Telecommunications Research and Bell Communications Laboratories.Bradley C. Aldrich joined Intel in 1997 where he is currently an architect within the PCA Components Group. His current work includes the development of coprocessor instruction support in addition to image capture and display technologies for XScale based application processors. He was previously a member of the Intel/Analog Devices joint development architecture team responsible for video enhancements for the Micro Signal Architecture. Prior to that he was a video system architect in Intel’s Digital Imaging and Video Division working on CMOS sensors, still cameras, and tethered PC based video peripherals. He has also worked as a device engineer for Motorola and as a test engineer for Tektronix. He received a BSEE in 1988 and MSEE in 1994 from the University of Texas at San Antonio.Christopher D. Emmons received a Bachelor of Science degree in Computer Science from the University of Texas at Austin in 2003. He joined Intel in 2001 and is currently a multimedia architect responsible for algorithm development and performance optimization for handheld products within the PCA Components Group. Prior to this he worked as an applications engineer providing performance and power analysis in support of product marketing groups. His research interests include video compression, operating system design, and dynamic resource management.  相似文献   

18.
Mathematical morphology has proven to be a very useful tool for applications such as smoothing, image skeletonization, pattern recognition, machine vision, etc. In this paper we present a 1-dimensional systolic architecture for the basic gray-scale morphology operations: dilation and erosion. Most other morphological operations like opening and closing, are also supported by the architecture since these operations are combinations of the basic ones. The advantages of our design stem from the fact that it has pipeline period = 1 (i.e., 100% processor utilization), it requires simple communications, and it is exploiting the simplicity of the morphological operations to make it possible to implement them in a linear target machine although the starting algorithm is a generalized 2-D convolution. We also propose a Locally Parallel Globally Sequential (LPGS) partitioning strategy for the best mapping of the algorithm onto the architecture. We conclude that for this particular problem LPGS is better than LSGP in a practical sense (pinout, memory requirement, etc.). Furthermore, we propose a chip design for the basic component of the array that will allow real-time video processing for 8- and 16-bit gray-level frames of size 512 × 512, using only 32 processors in parallel. The design is easily scalable so it can be custom-taylored to fit the requirement of each particular application.  相似文献   

19.
This paper reviews the design challenges that current and future processors must face, with stringent power limits, high-frequency targets, and the continuing system integration trends. This paper then describes the architecture, circuit design, and physical implementation of a first-generation Cell processor and the design techniques used to overcome the above challenges. A Cell processor consists of a 64-bit Power Architecture processor coupled with multiple synergistic processors, a flexible IO interface, and a memory interface controller that supports multiple operating systems including Linux. This multi-core SoC, implemented in 90-nm SOI technology, achieved a high clock rate by maximizing custom circuit design while maintaining reasonable complexity through design modularity and reuse.  相似文献   

20.
本文从设计和应用的角度分析了数字信号处理器(DSP)的特点,详细地从结构、指令集和运算单元方面阐述了DSP区别于其它处理器的特点;介绍了DSP的发展概况,从复杂指令单个乘法累加运算单元发展到复杂指令两个运算单元,又发展到简单指令多个运算单元,并指出是应用推动了DSP的飞速发展;最后,对DSP的发展作了预测,DSP将在多发射、嵌入式DSP核和控制运算混合处理器方向发展。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号