首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
A low-power multimedia processor for mobile applications is presented. An 80-MHz 32-b RISC with enhanced multiplier, two 20-MHz hardware accelerators with 7.125-Mb embedded DRAM for MPEG-4 visual SP@L1 decoding and 3-D graphics processing, 2-kB dual-port SRAM, and peripheral blocks are integrated together on a single chip, MPEG-4 SP@L1 video decoding and 3-D graphics rendering with a 16-b depth-buffer alpha-blending double-buffering and gouraud-shading features at 2, 2-Mpolygons/s speed are realized with the help of the dedicated hardware accelerators/ The architecture of the processor is optimized in terms of power consumption and performance, and various low-power circuit techniques are adopted in each hardware block. The chip is implemented using 0.18-μm embedded memory logic (EML) technology. Its area is 84 mm2, and power consumption is 160 mW when all of the functions are activated  相似文献   

2.
The first single-chip 64-b vector-pipelined processor (VPP) ULSI is described. It executes vector operations indispensable to high-speed scientific computation. The VPP ULSI attains a 200-MFLOPS peak performance at a 100-MHz clock frequency. This extremely high performance is made possible by the integration on the VPP of a 64-b five-stage pipelined adder/shifter, a 64-b five-stage pipelined multiplier/divider/logic operation unit, and a 40-kb register file. Various new high-speed circuit techniques have been also developed for 100-MHz operations. The chip, which was fabricated with a 0.8-μm BiCMOS and triple-layer metallization process technology, has a 17.2-mm×17.3-mm area and contains about 693 K transistors. It consumes 13.2 W at a 100-MHz clock frequency with a single 5-V power supply  相似文献   

3.
A 28 mW/MHz at 80 MHz structured-custom RISC microprocessor design is described. This 32-b implementation of the PowerPC architecture is fabricated in a 3.3 V, 0.5 μm, 4-level metal CMOS technology, resulting in 1.6 million transistors in a 7.4 mm by 11.5 mm chip size. Dual 8-kilobyte instruction and data caches coupled to a high performance 32/64-b system bus and separate execution units (float, integer, loadstore, and system units) result in peak instruction rates of three instructions per clock cycle. Low-power design techniques are used throughout the entire design, including dynamically powered down execution units. Typical power dissipation is kept under 2.2 W at 80 MHz. Three distinct levels of software-programmable, static, low-power operation-for system power management are offered, resulting in standby power dissipation from 2 mW to 350 mW. CPU to bus clock ratios of 1×, 2×, 3×, and 4× are implemented to allow control of system power while maintaining processor performance. As a result, workstation level performance is packed into a low-power, low-cost design ideal for notebooks and desktop computers  相似文献   

4.
方文森  许大丹 《信息技术》2007,31(11):118-120
随着嵌入式系统应用领域的发展,基于性价比、功耗、共享性、可靠性和实时性等诸多因素的考虑,许多嵌入式设计正在采用多处理器体系结构。但同时也带来了处理器间如何协调、相互沟通以确保一致性的问题。提出了I^2C总线和中断相结合的解决机制,并阐述了该机制在远程无线抄表系统中的应用。  相似文献   

5.
A 36 mm/sup 2/ graphics processor with fixed-point programmable vertex shader is designed and implemented for portable two-dimensional (2-D) and three-dimensional (3-D) graphics applications. The graphics processor contains an ARM-10 compatible 32-bit RISC processor,a 128-bit programmable fixed-point single-instruction-multiple-data (SIMD)vertex shader, a low-power rendering engine, and a programmable frequency synthesizer (PFS). Different from conventional graphics hardware, the proposed graphics processor implements ARM-10 co-processor architecture with dual operations so that user-programmable vertex shading is possible for advanced graphics algorithms and various streaming multimedia processing in mobile applications. The circuits and architecture of the graphics processor are optimized for fixed-point operations and achieve the low power consumption with help of instruction-level power management of the vertex shader and pixel-level clock gating of the rendering engine. The PFS with a fully balanced voltage-controlled oscillator (VCO) controls the clock frequency from 8 MHz to 271 MHz continuously and adaptively for low-power modes by software. The chip shows 50 Mvertices/s and 200 Mtexels/s peak graphics performance, dissipating 155 mW in 0.18-/spl mu/m 6-metal standard CMOS logic process.  相似文献   

6.
A time-shifted correlated double sampling (CDS) technique is proposed in the design of a 10-bit 100-MS/s pipelined ADC. This technique significantly reduces the finite opamp gain error without compromising the conversion speed, allowing the active opamp blocks to be replaced by simple cascoded CMOS inverters. Both high-speed and low-power operation is achieved without compromising the accuracy requirement. An efficient common-mode voltage control is introduced for pseudodifferential architecture which can further reduce power consumption. Fabricated in a 0.18-/spl mu/m CMOS process, the prototype 10-bit pipelined ADC occupies 2.5 mm/sup 2/ of active die area. With 1-MHz input signal, it achieves 65-dB SFDR and 54-dB SNDR at 100MS/s. For 99-MHz input signal, the SFDR and SNDR are 63 and 51 dB, respectively. The total power consumption is 67 mW at 1.8-V supply, of which analog portion consumes 45 mW without any opamp current scaling down the pipeline.  相似文献   

7.
This paper presents a single-chip programmable platform that integrates most of hardware blocks required in the design of embedded system chips. The platform includes a 32-bit multithreaded RISC processor (MT-RISC), configurable logic clusters (CLCs), programmable first-in-first-out (FIFO) memories, control circuitry, and on-chip memories. For rapid thread switch, a multithreaded processor equipped with a hardware thread scheduling unit is adopted, and configurable logics are grouped into clusters for IP-based design. By integrating both the multithreaded processor and the configurable logic on a single chip, high-level language-based designs can be easily accommodated by performing the complex and concurrent functions of a target chip on the multithreaded processor and implementing the external interface functions into the configurable logic clusters. A 64-mm/sup 2/ prototype chip integrating a four-threaded MT-RISC, three CLCs, programmable FIFOs, and 8-kB on-chip memories is fabricated in a 0.35-/spl mu/m CMOS technology with four metal layers, which operates at 100-MHz clock frequency and consumes 370 mW at 3.3-V power supply.  相似文献   

8.
This paper describes a 160 MHz 500 mW 32 b StrongARM(R) microprocessor designed for low-power, low-cost applications. The chip implements the ARM(R) V4 instruction set and is bus compatible with earlier implementations. The pin interface runs at 3.3 V but the internal power supplies can vary from 1.5 to 2.2 V, providing various options to balance performance and power dissipation. At 160 MHz internal clock speed with a nominal Vdd of 1.65 V, it delivers 185 Dhrystone 2.1 MIPS while dissipating less than 450 mW. The range of operating points runs from 100 MHz at 1.65 V dissipating less than 300 mW to 200 MHz at 2.0 V for less than 900 mW. An on-chip PLL provides the internal clock based on a 3.68 MHz clock input. The chip contains 2.5 million transistors, 90% of which are in the two 16 kB caches. It is fabricated in a 0.35-μm three-metal CMOS process with 0.35 V thresholds and 0.25 μm effective channel lengths. The chip measures 7.8 mm×6.4 mm and is packaged in a 144-pin plastic thin quad flat pack (TQFP) package  相似文献   

9.
AMULET2e: an asynchronous embedded controller   总被引:5,自引:0,他引:5  
AMULET2e is an embedded system chip incorporating a 32-bit ARM-compatible asynchronous processor core, a 4-Kb pipelined cache, a flexible memory interface with dynamic bus sizing, and assorted programmable control functions. Many on-chip performance-enhancing and power-saving features are switchable, enabling detailed experimental analysis of their effectiveness. AMULET2e silicon demonstrates competitive performance and power efficiency, ease of system design, and it includes innovative features that exploit its asynchronous operation to advantage in applications that require low standby power and/or freedom from the electromagnetic interference generated by system clocks  相似文献   

10.
A multicore system-on-chip (SoC) has been developed for various applications (recognition, inference, measurement, control, and security) that require high-performance processing and low power consumption. This SoC integrates three types of synthesizable processors: eight CPUs (M32R), two multi-bank matrix processors (MBMX), and a controller (M32C). These processors operate at 1 GHz, 500 MHz, and 500 MHz, respectively. These three types of processors are interconnected on this chip with a high-bandwidth multi-layer system bus. The eight CPUs are connected to a common pipelined bus using a cache coherence mechanism. Additionally, a 512-kB L2 cache memory is shared by the eight CPUs to reduce internal bus traffic. A multi-bank matrix processor with 2-read/1-write calculation and background I/O operation has been adopted. The 1-GHz CPU is realized using a delay management network which consists of delay monitors that can be applied for any kind of application or process technology. Our configurable heterogeneous architecture with nine CPUs and two matrix processors reduces power consumption by 45%.  相似文献   

11.
A low-power, large-scale parallel video compression architecture for a single-chip digital CMOS camera is discussed in this paper. This architecture is designed for highly computationally intensive image and video processing tasks necessary to support video compression. Two designs of this architecture, an MPEG2 encoder and a DV encoder, are presented. At an image resolution of 640 × 480 pixels (MPEG2) and 720 × 576 (DV) and a frame rate of 25 to 30 frames per second, a computational throughput of up to 1.8 billion operations per second (BOPS) is required. This is supported in the proposed architecture using a 40 MHz clock and an array of 40 to 45 parallel processors implemented in a 0.2 m CMOS technology and with a 1.5 V supply voltage. Power consumption is significantly reduced through the single-chip integration of the CMOS photo sensors, the embedded DRAM technology, and the proposed pipelined parallel processors. The parallel processors consume approximately 45 mW of power resulting a power efficiency of 40 BOPS/W.  相似文献   

12.
A single-chip rendering engine that consists of a DRAM frame buffer, a SRAM serial access memory, pixel/edge processor array and 32-b RISC core is proposed for low-power three-dimensional (3-D) graphics in portable systems. The main features are two-dimensional (2-D) hierarchical octet tree (HOT) array structure with bandwidth amplification, three dedicated network schemes, virtual page mapping, memory-coupled logic pipeline, low-power operation, 7.1-GB/s memory bandwidth, and 11.1-Mpolygon/s drawing speed. The 56-mm2 prototype die integrating one edge processor, eight pixel processors, eight frame buffers, and a RISC core are fabricated using 0.35-μm CMOS embedded memory logic (EML) technology with four poly layers and three metal layers. The fabricated test chip, 590 mW at 100 MHz 3.3 V operation, is demonstrated with a host PC through a PCI bridge  相似文献   

13.
A 250-MHz single-chip multiprocessor, which can implement multichannel decoding, encoding, and transcoding of various audio and video standards, was fabricated using 0.25-μm CMOS technology and consumes 2.38 W at 2.5 V. The multiprocessor integrates four processors and 64-kB shared level-2 cache and exploits coarse-grained parallelism inherent in audio and video signal processing with multithreaded programming. Three coprocessors and scratch-pad memory have been added to each processing element and perform subword parallel processing, background data transfer, and bitstream processing for audio and video signal processing. Useful-skew and clock gating have been utilized to achieve high-speed operation and low power consumption. Consequently, the multiprocessor achieves MPEG2 (MP@HL) video decoding at 20 frames/s  相似文献   

14.
First-in-first-out (FIFO) data storages are in great demand for telecommunication LSIs. This paper presents high-speed and low-power CMOS memory techniques specialized for FIFO operation. A size-configurable architecture using the tile methodology is employed to customize the word counts and/or data bits with a. short time of less than 30 min. Four flag bits are introduced to inform the internal state of FIFO memories. To obtain a higher operating speed, an SRAM-like memory cell with current-sense readout is used. The critical-path delay of the Gray-code up/down counter, indicating the stored data volume, is shortened to 6.0 ns (66%) by using a double-rail single-stage XOR circuit. As to the low-power techniques, a wordline/bitline-swapped dual-port memory-cell architecture is proposed to cut off the static power-supply current of unselected columns. By using the hidden blanket-precharged bitline scheme, the power dissipation of the writing circuitry is minimized without degrading the operating speed. A new data-driven gated-shift-pulse architecture is also proposed to reduce the power dissipation of shift-register-type address pointers (1.5 mW at 100 MHz). A 2K-words × 8-bits FIFO memory test chip, fabricated with a 0.6-μm CMOS process (a short effective channel length of 0.35 μm is available for both the nMOS and pMOS), has demonstrated the 140-MHz operation at a typical 3.3-V power supply. The power dissipation in standby is less than 0.1 μW and that at 100-MHz dual-port operation with single fan-out loads is in the range from 28 mW (in the best case with the M-scan test pattern) to 46 mW (in the worst case with the checkerboard test pattern)  相似文献   

15.
An MIMD multiprocessor digital signal-processing (DSP) chip containing four 64-b processing elements (PE's) interconnected by a 128-b pipelined split transaction bus (STBus) is presented. Each PE contains a 32-b RISC core with DSP enhancements and a 64-b single-instruction, multiple-data vector coprocessor with four 16-b MAC/s and a vector reduction unit. PEs are connected to the STBus through reconfigurable dual-ported snooping L1 cache memories that support shared memory multiprocessing using a modified-MESI data coherency protocol. High-bandwidth data transfers between system memory and on-chip caches are managed in a pipelined memory controller that supports multiple outstanding transactions. An embedded RTOS dynamically schedules multiple tasks onto the PEs. Process synchronization is achieved using cached semaphores. The 200-mm2, 0.25-μm CMOS chip operates at 100 MHz and dissipates 4 W from a 3.3-V supply  相似文献   

16.
A high performance 10-bit 100-MS/s two-channel time-interleaved pipelined ADC is designed for intermediate frequency 3G receivers,and OTA is shared among the channels for low power dissipation.Offset mismatch, gain mismatch and time skew mismatch are overcome by OTA sharing,increasing the accuracy of each channel and global passive sampling respectively.The linearity deterioration caused by the charge injection of the output switch and the crosstalk of the off-switch capacitor is removed by modifying the...  相似文献   

17.
10bit 100M低功耗时间交织运放共享模数转换器设计   总被引:1,自引:1,他引:0  
许莱  殷秀梅  杨华中 《半导体学报》2010,31(9):095012-6
本文设计了一个应用于3G接收机中频的10比特100兆采样率的双通道时间交织流水线模数转换器,为了降低功耗,运放在两通道间共享。针对通道间的直流失调失配,增益失配以及采样时间偏差,设计分别采用共享运放,增加每个通道转换精度以及全局采样技术来加以解决。通过改变时序,消除了输出开关电荷注入以及断开开关的电容造成的串扰,从而提高了整个模数转换器的线性度。整个模数转换器的供电电压为3.3V,功耗为70毫瓦,采用了180纳米CMOS工艺,面积为3×2mm2,在奈奎斯特频率以内,其杂散无失真动态范围大于70dB,其信杂比大于56dB。  相似文献   

18.
A 200-MHz 16-b BiCMOS super high-speed signal processing (SSSP) circuit has been developed for high-speed digital signal processor (DSP) LSIs. In order to produce extremely fast LSI circuits, several novel techniques have been combined for integration of the SSSP. They include a redundant binary convolver architecture, a double-stage pipelined convolver architecture, and submicrometer BiCMOS drivers with large capacitive load drivability. The SSSP performs 200-MHz addition. The chip, which was fabricated with 0.8-μm BiCMOS and triple-layer metallization technology, has an area of 5.87 mm×5.74 mm and contains 20150 transistors. It operates at a clock frequency of 200 MHz with a single 5-V power supply and typically consumes 800 mW  相似文献   

19.
We have developed a 0.25-μm, 200-MHz embedded RISC processor for multimedia applications. This processor has a dual-issue superscalar datapath that consists of a 32-bit integer unit and a 64-bit single-instruction multiple-data (SIMD) function unit that together have a total of five multiply-adders. An on-chip concurrent Rambus DRAM (C-RDRAM) controller uses interleaved transactions to increase the memory bandwidth of the Rambus channel to 533 Mb/s. The controller also reduces latency by using the transaction interleaving and instruction prefetching. A 64-bit, 200-MHz internal bus transfers data among the CPU core, the C-RDRAM, and the peripherals. These high-data-rate channels improve CPU performance because they eliminate a bottleneck in the data supply. The datapath part of this chip was designed using a functional macrocell library that included placement information for leaf cells and resulted in the SIMD function unit of this chip's having 68000 transistors per square millimeter  相似文献   

20.
A 1.4-V 8-bit 300-MS/s folding and interpolating analog-to-digital converter (ADC) is proposed. Fabricated in the 0.13-μm CMOS process and occupying only 0.6-mm2 active area, the ADC is especially suitable for embedded applications. The system is optimized for a low-power purpose. Pipelining sampling switches help to cut down the extra power needed for complete settling. An averaging resistor array is placed between two folding stages for power-saving considerations. The converter achieves 43.4-dB signal-to-noise and distortion ratio and 53.3-dB spurious-free dynamic range at 1-MHz input and 42.1-dB and 49.5-dB for Nyquist input. Measured results show a power dissipation of 34 mW and a figure of merit of 1.14 pJ/convstep at 250-MHz sampling rate at 1.4-V supply.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号