首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 172 毫秒
1.
A 36 mm/sup 2/ graphics processor with fixed-point programmable vertex shader is designed and implemented for portable two-dimensional (2-D) and three-dimensional (3-D) graphics applications. The graphics processor contains an ARM-10 compatible 32-bit RISC processor,a 128-bit programmable fixed-point single-instruction-multiple-data (SIMD)vertex shader, a low-power rendering engine, and a programmable frequency synthesizer (PFS). Different from conventional graphics hardware, the proposed graphics processor implements ARM-10 co-processor architecture with dual operations so that user-programmable vertex shading is possible for advanced graphics algorithms and various streaming multimedia processing in mobile applications. The circuits and architecture of the graphics processor are optimized for fixed-point operations and achieve the low power consumption with help of instruction-level power management of the vertex shader and pixel-level clock gating of the rendering engine. The PFS with a fully balanced voltage-controlled oscillator (VCO) controls the clock frequency from 8 MHz to 271 MHz continuously and adaptively for low-power modes by software. The chip shows 50 Mvertices/s and 200 Mtexels/s peak graphics performance, dissipating 155 mW in 0.18-/spl mu/m 6-metal standard CMOS logic process.  相似文献   

2.
A 121-mm/sup 2/ graphics LSI is designed and implemented for portable two-dimensional (2-D) and three-dimensional (3-D) graphics and MPEG-4 applications. The LSI contains a RISC processor with a multiply-accumulate unit (MAC), a 3-D rendering engine, a programmable power optimizer, and 29-Mb embedded DRAM. The chip is built in a 0.16-/spl mu/m pure DRAM technology to reduce the fabrication cost. Texture-mapped 3-D graphics with perspective-correct address calculation and bilinear MIPMAP filtering can be realized while consuming the low power with the help of depth-first clock gating, address alignment logic, and embedded DRAM. Programmable clocking allows the LSI to operate in lower power modes for various applications. The chip consumes less than 210 mW, delivering 66 Mpixels/s and 264 Mtexel/s texture-mapped pixels with real-time special effects such as full-scene antialiasing and motion blur.  相似文献   

3.
A low-power three-dimensional (3-D) rendering engine with two texture units and 29-Mb embedded DRAM is designed and integrated into an LSI for mobile third-generation (3G) multimedia terminals. Bilinear MIPMAP texture-mapped 3-D graphics can be realized with the help of low-power pipeline structure, optimization of datapath, extensive clock gating, texture address alignment, and the distributed activation of embedded DRAM. The scalable performance reaches up to 100 Mpixels/s and 400 Mtexels/s at 50 MHz. The chip is implemented with 0.16-/spl mu/m pure DRAM process to reduce the fabrication cost of the embedded-DRAM chip. The logic with DRAM takes 46 mm/sup 2/ and consumes 140 mW at 33-MHz operation, respectively. The 3-D graphics images are successfully demonstrated by using the fabricated chip on the prototype PDA board.  相似文献   

4.
A single-chip rendering engine that consists of a DRAM frame buffer, a SRAM serial access memory, pixel/edge processor array and 32-b RISC core is proposed for low-power three-dimensional (3-D) graphics in portable systems. The main features are two-dimensional (2-D) hierarchical octet tree (HOT) array structure with bandwidth amplification, three dedicated network schemes, virtual page mapping, memory-coupled logic pipeline, low-power operation, 7.1-GB/s memory bandwidth, and 11.1-Mpolygon/s drawing speed. The 56-mm2 prototype die integrating one edge processor, eight pixel processors, eight frame buffers, and a RISC core are fabricated using 0.35-μm CMOS embedded memory logic (EML) technology with four poly layers and three metal layers. The fabricated test chip, 590 mW at 100 MHz 3.3 V operation, is demonstrated with a host PC through a PCI bridge  相似文献   

5.
A high-speed three-dimensional (3-D) graphics SoC for consumer applications is presented. A 166-MHz 3-D graphics full pipeline engine with performance of 33 Mvertices/s and 1.3Gtexels/s, and 333-MHz ARM11 RISC processor, and video composition IPs are integrated together on a single chip. The geometry part of 3-D graphics IP provides full programmability in vertex and triangle level, and two-level multi-texturing with trilinear MIPMAP filtering are realized in the rasterization part. Per-pixel effects such as fog effects, alpha blending, and stencil test are also implemented in the proposed 3-D graphics IP. The rasterization architecture is designed for reducing external memory accesses to achieve the peak performance. The chip is fabricated using 0.13/spl mu/m CMOS technology and its area is 7.1/spl times/7.0mm/sup 2/.  相似文献   

6.
A low-power dual-standard video decoder has been developed for mobile applications. It supports MPEG-2 SP@ML and H.264/AVC BL@L4 video decoding in a single chip and features a scalable architecture to reach area/power efficiency. This chip integrates diverse algorithms of MPEG-2 and H.264/AVC to reduce silicon area. Three low-power techniques are proposed. First, a domain-pipelined scalability (DPS) technique is used to optimize the pipelined structure according to the number of processing cycles. Second, bandwidth scalability is implemented via a line-pixel-lookahead (LPL) scheme to improve the external bandwidth and reduce the internal memory size, leading to 51% of memory power reduction compared to a conventional design. Third, low-power motion compensation and deblocking filter are designed to reduce the operating frequency without degrading system performance. A test chip is fabricated in a 0.18mum one-poly six-metal CMOS technology with an area of 15.21 mm2. For mobile applications, H.264/AVC and MPEG-2 video decoding of quarter-common intermediate format (QCIF) sequences at 15 frames per second are achieved at 1.15 MHz clock frequency with power dissipation of 125 muW and 108 muW, respectively, at 1V supply voltage  相似文献   

7.
A 240-mW single-chip MPEG-4 videophone LSI with a 16-Mb embedded DRAM is fabricated utilizing a 0.25-μm CMOS triple-well quad-metal technology. The videophone LSI is applied to the 3GPP 3G-324M video-telephony standard for IMT-2000, and implements the MPEG-4 video SPL1 codec, the AMR speech codec, and the ITU-T H.223 Annex B multiplexing/demultiplexing at the same time. Three 16-bit multimedia-extended RISC processors, dedicated hardware accelerators, and a 16-Mb embedded DRAM are integrated on a 10.84 mm×10.84 mm die. It also integrates camera, display, audio, and network interfaces required for a mobile video-phone terminal. In addition to conventional low-power techniques, such as clock gating and parallel operation, some new low-power techniques are also employed. These include an embedded DRAM with optimized configuration, a low-power motion estimator, and the adoption of the variable-threshold voltage CMOS (VT-CMOS). The MPEG-4 videophone LSI consumes 240 mW at 60 MHz, which is only 22% of that for a conventional multichip design. Variable threshold voltage CMOS reduces standby leakage current to 26 μA, which is only 17% of that for the conventional CMOS design  相似文献   

8.
The token-ring controller (TRC) consists of five functional blocks. they are a dedicated 16-b microprocessor which includes 11 K-word×20-b protocol firmware ROM, finite-state machines for real-time handling of frames, an 896-word×16-b dual-port RAM for frame buffer FIFOs and working memory (FIFO/RAM), a host processor bus interface, and a three-channel DMA controller which can follow list structure frame buffers. The TRC interprets and executes 16 types of commands and handles 23 types of media access control (MAC) frames. It can continuously receive more than 90% of incoming packets with 64-byte information length at 40 Mbit/s network speed. It is fabricated with double-metal-layer 1.2-μm CMOS technology and integrates 510 K MOSFETs in a 14.49-mm×14.62-mm chip area. The maximum power consumption is 0.945 W at 8-MHz operating frequency and 5-V±5% power supply low-power systems but also for high-performance applications  相似文献   

9.
针对目前数字音频广播(DAB)收音机中DSP软件AAC+解码器功耗大的问题,该文提出了低功耗AAC LC解码器的ASIC设计,以极低的硬件代价完成了最基本的DAB+节目解码,加入DAB解码芯片后巧妙地实现了DAB+和DAB两种不同标准的兼容。该文设计优化了反量化与IMDCT算法,使用了分时工作法,从而实现了低功耗。该设计的系统时钟为16.384 MHz,采用0.18 m CMOS工艺,功耗约为6.5 mW,并与DAB信道解码结合,通过了FPGA开发板上的实时验证,且完成了芯片的版图设计,芯片面积为14 mm2。  相似文献   

10.
A low-power three-dimensional (3-D) rendering engine is implemented as part of a mobile personal digital assistant (PDA) chip. Six-megabit embedded DRAM macros attached to 8-pixel-parallel rendering logic are logically localized with a 3.2-GB/s runtime reconfigurable bus, reducing the area by 25% compared with conventional local frame-buffer architectures. The low power consumption is achieved by polygon-dependent access to the embedded DRAM macros with line-block mapping providing read-modify-write data transaction. The 3-D rendering engine with 2.22-Mpolygons/s drawing speed was fabricated using 0.18-/spl mu/m CMOS embedded memory logic technology. Its area is 24 mm/sup 2/ and its power consumption is 120 mW.  相似文献   

11.
A single chip system for real–time MPEG–2 decoding can be created by integrating a general purpose dual–issue RISC processor, with a small dedicated hardware for the variable length decoding (VLD) and block loading processes; a 32KB instruction RAM; and a 32KB data RAM. The VLD hardware performs Huffman decoding on the input data. The block loader performs the half–sample prediction for motion compensation and acts as a direct memory access (DMA) controller for the RISC processor by transferring data between an external 2MB DRAM and the internall 32 KB data RAM. The dual-issue RISC processor, running at 250MHz, is enhanced with a set of key sub-word and multimedia instructions for a sustained peak performance of 1000 MOPS. With this setup for MPEG-2 decoding applications, bi-directionally predicted non-intra video blocks are decoded in less than 800 cycles, leading to a single-chip, real-time MPEG-2 decoding system.  相似文献   

12.
A 600-MHz single-chip multiprocessor, which includes two M32R 32-bit CPU cores , a 512-kB shared SRAM and an internal shared pipelined bus, was fabricated using a 0.15-/spl mu/m CMOS process for embedded systems. This multiprocessor is based on symmetric multiprocessing (SMP), and supports modified-exclusive-shared-invalid (MESI) cache coherency protocol. The multiprocessor inherits the advantages of previously reported single-chip multiprocessors, while its multiprocessor architecture is optimized for use as an embedded processor. The internal shared pipelined bus has a low latency and large bandwidth (4.8 GB/s). These features enhance the performance of the multiprocessor. In addition, the multiprocessor employs various low-power techniques. The multiprocessor dissipates 800 mW in a 1.5-V 600-MHz multiprocessor mode. Standby power dissipation is less than 1.5 mW at 1.5 V. Hence, the multiprocessor achieves higher performance and lower power consumption. This paper presents a single-chip multiprocessor architecture optimized for use as an embedded processor and its various low-power techniques.  相似文献   

13.
A single-chip MPEG-2 MP@ML codec, integrating 3.8M gates on a 72-mm/sup 2/ die, is described. The codec employs a heterogeneous multiprocessor architecture in which six microprocessors with the same instruction set but different customization execute specific tasks such as video and audio concurrently. The microprocessor, developed for digital media processing, provides various extensions such as a very-long-instruction-word coprocessor, digital signal processor instructions, and hardware engines. Making full use of the extensions and optimizing the architecture of each microprocessor based upon the nature of specific tasks, the chip can execute not only MPEG-2 MP@ML video/audio/system encoding and decoding concurrently, but also MPEG-2 MP@HL decoding in real time.  相似文献   

14.
Presented here is MPEG-2 AAC low complexity profile decoder software for a low-power embedded RISC microprocessor, NEC V830 (300 mW @133 MHz). Fast processing methods for IMDCT reduce execution time by 41% and help achieve real-time decoding of a 5.1-channel audio signal, while using only 64.7% of processor capacity.  相似文献   

15.
A low-power and high-performance 4-way 32-bit stream processor core is developed for handheld low-power 3-D graphics systems. It contains a floating-point unified matrix, vector, and elementary function unit. By exploiting the logarithmic arithmetic and the proposed adaptive number conversion scheme, a 4-way arithmetic unit achieves a single-cycle throughput for all these operations except for the matrix-vector multiplication that takes 2 cycles per result, which were 4 cycles in conventional way. The processor featured by this functional unit and several proposed architectural schemes including embedded register index calculations, functional unit reconfiguration, and operand forwarding in logarithmic domain achieves 19.1% cycle count reduction for OpenGL transformation and lighting (TnL) operation from the latest work.   相似文献   

16.
This paper describes a heterogeneous multi-core processor (HMCP) architecture that integrates general-purpose processors (CPUs) and accelerators (ACCs) to achieve exceptional performance as well as low-power consumption for the SoCs of embedded systems. The memory architectures of CPUs and ACCs were unified to improve programming and compiling efficiency. Advanced audio codec-low complexity (AAC-LC) stereo audio encoding was parallelized on a heterogeneous multi-core having homogeneous processor cores and dynamically reconfigurable processor (DRP) ACC cores in a preliminary evaluation of the HMCP architecture. The performance evaluation revealed that 54times AAC encoding was achieved on the chip with two CPUs at 600 MHz and two DRPs at 300 MHz, which achieved encoding of an entire CD within 1- 2 min.  相似文献   

17.
This paper presents a single-chip programmable platform that integrates most of hardware blocks required in the design of embedded system chips. The platform includes a 32-bit multithreaded RISC processor (MT-RISC), configurable logic clusters (CLCs), programmable first-in-first-out (FIFO) memories, control circuitry, and on-chip memories. For rapid thread switch, a multithreaded processor equipped with a hardware thread scheduling unit is adopted, and configurable logics are grouped into clusters for IP-based design. By integrating both the multithreaded processor and the configurable logic on a single chip, high-level language-based designs can be easily accommodated by performing the complex and concurrent functions of a target chip on the multithreaded processor and implementing the external interface functions into the configurable logic clusters. A 64-mm/sup 2/ prototype chip integrating a four-threaded MT-RISC, three CLCs, programmable FIFOs, and 8-kB on-chip memories is fabricated in a 0.35-/spl mu/m CMOS technology with four metal layers, which operates at 100-MHz clock frequency and consumes 370 mW at 3.3-V power supply.  相似文献   

18.
The TANGRAM VLSI co-processor is intended as a building block for use in system-on-chip (SOC) designs for the versatile MPEG-4 multimedia standard. It is designed to perform the computation intensive final step of MPEG-4 video decoding: compositing of scenes at the display. This includes warping and alpha blending of multiple full-screen video textures in real-time. TANGRAM consists of a RISC control processor and multiple powerful arithmetic units that perform rendering calculations directly in hardware. This hybrid architecture enables adaptation to changes in algorithms or support for different video-formats in software. Communication to a host CPU and video decoding hardware is done via the very common PI-bus on-chip interface. TANGRAM directly interfaces with the ITU-R601/656 digital video output. VHDL implementation and synthesis for a 0.35 standard-cell library provide an estimate of 100 MHz achievable clock frequency (worst-case), 52 mm2 overall area and 1 Watt power dissipation. TANGRAM has sufficient performance for rendering of MPEG-4 Main Profile@Layer3 scenes (ITU-R 601).  相似文献   

19.
为了减少功耗与降低成本,根据ARM芯片对C语言良好支持的特点,在深度剖析MP3解码算法、分析C语言在ARM芯片上编程的优化方法的基础上,通过软件形式实现MP3音频解码器,使一些无硬件解码器支持的ARM嵌入式系统完成MP3解码任务,从而实现基于ARM的嵌入式系统的MP3软解码器,可以有效地降低系统功耗,提高解码效率,更好地扩展和增强便携嵌入式系统多媒体功能。  相似文献   

20.
A 28 mW/MHz at 80 MHz structured-custom RISC microprocessor design is described. This 32-b implementation of the PowerPC architecture is fabricated in a 3.3 V, 0.5 μm, 4-level metal CMOS technology, resulting in 1.6 million transistors in a 7.4 mm by 11.5 mm chip size. Dual 8-kilobyte instruction and data caches coupled to a high performance 32/64-b system bus and separate execution units (float, integer, loadstore, and system units) result in peak instruction rates of three instructions per clock cycle. Low-power design techniques are used throughout the entire design, including dynamically powered down execution units. Typical power dissipation is kept under 2.2 W at 80 MHz. Three distinct levels of software-programmable, static, low-power operation-for system power management are offered, resulting in standby power dissipation from 2 mW to 350 mW. CPU to bus clock ratios of 1×, 2×, 3×, and 4× are implemented to allow control of system power while maintaining processor performance. As a result, workstation level performance is packed into a low-power, low-cost design ideal for notebooks and desktop computers  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号