1.

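齐悦 李占才 王沁 《计算机工程》2006,32(23):236-237

Power consumption, like silicon area, has become a key concern in chip design, especially in integrated circuits for digital signal processing. Standard-cell-based VLSI design is an important way to implement digital signal processing chips or modules. This paper proposes a multi-level, standard-cell-based design scheme for a low-power FIR filter: at the architecture level it adopts a multi-stage pipelining strategy, at the logic level it merges the addition into partial-product compression, and at the circuit level it uses minimum-size devices, so that FIR power is reduced while area is kept to a minimum. The scheme is easy to extend and adapt to practical requirements and can be applied flexibly to the design of other similar filters. Implementation results show that, with the TSMC 0.25 standard-cell library, the best-case reduction in FIR power exceeds 20%.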

2.

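A Study of Power Consumption in Different ALU Implementations

孙军凯 蒋安平 《微处理机》2011,32(4):1-4

Low power is a challenging aspect of microprocessor design. Optimizing the power of every component unit is an indispensable approach to low-power microprocessor design. The arithmetic and logic unit (ALU) is one of the most basic component units of a microprocessor. The structure of an ALU has a complex relationship with power, delay, and area. Three ALU structures are in common use: the composite structure, the independent-adder structure, and the chained structure. An 8-bit ALU was implemented in each of these three structures, and power analysis of these 8-bit ALUs was used to study how ALU structure affects power. The results show that the composite-structure ALU consumes the least power, saving 19.38% and 33.87% respectively compared with the ALUs of the other two structures.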

3.

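Design and Implementation of an FALU Merging Fixed-Point and Floating-Point Operation

王云贵 杨靓 《微处理机》2011,32(2):7-9,13

The arithmetic logic unit (ALU) is a core functional component of modern general-purpose processors and DSP processors. An FALU that merges fixed-point and floating-point operation is designed. By combining functional-unit reuse, operand isolation, and bypassing, it can in theory effectively reduce chip area and power. The FALU implements 21 instructions, and simulation verifies that its functions are fully correct.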

4.

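Low-Power Physical Design Based on Innovus

《电子技术应用》2016,(8):21-24

Reliable low-power physical design is essential to reducing chip power. Based on the new-generation place-and-route tool Innovus, a new low-power physical design flow is described in four parts: low-power physical library design; low-power placement and optimization, including input-vector-based power optimization; low-power clock tree co-design with CCOPT (clock concurrent optimization); and post-clock-tree low-power optimization. As Cadence's all-new place-and-route tool, Innovus provides power-driven optimization based on the GigaOpt engine and advanced clock concurrent optimization (CCOPT), among other methods, which effectively helps designers achieve low-power chip designs. The new low-power physical design flow improves the power of the chip's digital logic by 15%.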

5.

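Design and Implementation of the X-DSP ALU and Shifter Units

彭元喜 邹佳骏 《计算机应用》2010,30(7):1978-1982

X-DSP is a low-power, high-performance DSP that we developed in-house. The CPU architecture of X-DSP is studied in depth and, on the basis of a detailed analysis of the instructions related to its ALU and shifter units, the ALU and shifter units are designed and implemented. Logic synthesis of the ALU and shifter units was performed with the Design Compiler synthesis tool on SMIC's 0.13 µm CMOS process library. The circuit consumes 4.2821 mW in total, occupies an area of 71042.9804 µm², and operates at up to 250 MHz.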

6.

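Low-Power Design of an Intelligent Remote-Control Chip

陈志冲 高军 周锦锋 《计算机应用研究》2004,21(6):193-195

Building on earlier research on low-power design, a low-power design of an intelligent remote-control chip is implemented. The focus is on methods for reducing power at the system-architecture level, such as low-power modes, reducing high-frequency clock activity, reducing memory power, and reducing signal transitions.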

7.

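CUPF-Based Low-Power Physical Design of a Digital IC in a 14 nm Process

《电子技术应用》2017,(9):25-29

With the rapid development of integrated-circuit manufacturing processes, power consumption, as an important measure of chip quality, has attracted growing attention and research from scholars in China and abroad. When transistor feature sizes shrink to the nanometer scale, increased leakage current, higher operating frequencies, and climbing gate counts greatly raise chip power. Meanwhile, the traditional low-power design flow based on UPF (Unified Power Format) suffers from low efficiency and poor repairability. To address these problems, and taking the fch_sata_t module of a digital chip in a 14 nm process as an example, a new low-power physical design flow based on CUPF (Constant UPF) is briefly introduced, applying techniques such as power gating and multiple supply voltages to the low-power design of the chip. Finally, power analysis results from Synopsys PrimeTime PX show that the chip's power meets the design requirements.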

8.

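Low-Power Design Methods at the Register-Transfer Level  Total citations: 3; self-citations: 0; citations by others: 3

罗旻 杨波 高德远 沈绪榜 《小型微型计算机系统》2004,25(7):1207-1211

With growing demand for mobile devices and ever-rising chip operating speeds, chip power has become an issue that circuit designers must consider. The evaluation of overall chip performance has shifted from the original trade-off between area and speed to a combined consideration of area, timing, testability, and power, and the weight given to power will keep increasing. This paper mainly describes how to achieve low-power design at the RTL design stage.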

9.

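Low-Power Design Methods at the Architecture Level  Total citations: 1; self-citations: 0; citations by others: 1

罗旻 杨波 高德远 沈绪榜 《小型微型计算机系统》2004,25(3):329-333

With growing demand for mobile devices and ever-rising chip operating speeds, chip power has become an issue that circuit designers must consider. The evaluation of overall chip performance has shifted from the original trade-off between area and speed to a combined consideration of area, timing, testability, and power, and the weight given to power will keep increasing. This paper mainly describes how different methods can achieve low power during architectural design, for example parallel structures, pipelined structures, and an optimized coding style.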

10.

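A Study of the Low-Power Design of a Main Control Chip for Electricity Meters

《微型机与应用》2017,(16)

The functions of smart electricity meters are growing ever more complex, and a meter is required to work continuously in the field for no less than 5 years, so the demands on both the functionality and the power of the main control chip are high. A new main control chip for electricity meters is designed. Functionally, it integrates an RTC module, an LCD driver, and common communication interface modules. To meet the low-power target, three different operating modes are designed to satisfy the low-power needs of different meter application scenarios. Measurements show that the chip achieves very good power performance in each of the meter's operating modes.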

11.

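Research and Design of a Four-Level Reconfigurable ALU for a Vocoder

荆涛 王沁 《小型微型计算机系统》2008,29(12)

In designing a high-performance vocoder for implementing speech codec algorithms, an ALU that supports a variable-length VLIW instruction set is key to achieving the design goals. This paper proposes a four-level reconfigurable ALU design built around a prefix adder. By reconfiguring operands and resources, it can complete 81 kinds of compound arithmetic and logic operations in a single cycle. Its control encoding is also compressed by 58.93% to fit the width constraint of the instruction set. The design efficiently exploits the high parallelism latent in the algorithms and meets the needs of compute-intensive algorithm applications well.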

12.

Design and analysis of high-speed 8-bit ALU using 18 nm FinFET technology

Shylashree N. Venkatesh B. Saurab T. M. Srinivasan Tarun Nath Vijay 《Microsystem Technologies》2019,25(6):2349-2359

All modern computational devices consist of ALU. With increase in complexity of software and the consistent shift of software towards parallelism, high speed processors with hardware support for time consuming operations such as multiplication would benefit. Smaller, compact devices such as IoT devices need to run software such as security software and be able to offload computation cost from the cloud. In this paper, a high speed 8-bit ALU using 18 nm FinFET technology is proposed. The arithmetic and logical unit consists of fast compute units such as Kogge Stone fast adder and Dadda multiplier along with basic logic gates. In this paper, an ALU with each compute unit optimized for speed is proposed, while responsibly consuming area. Dadda multiplier is of 8 × 8 architecture as opposed to conventional approach of 4 × 4 making it a true 8-bit ALU. Simulation and analysis is done using Cadence Virtuoso in Analog Design Environment. The transistor count of proposed design is 5298, the power consumption is 219 µW and maximum delay is 166.8 ps. The design is also expected to consume a maximum of one clock cycle for any computation.
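The abstract above names the Kogge-Stone adder as one of the ALU's fast compute units. As a rough illustration of the general parallel-prefix idea only, the following is a bit-level software model in Python. It is not the authors' transistor-level design; the function name `kogge_stone_add` and the default width of 8 are assumptions made for this sketch.

```python
from typing import Tuple


def kogge_stone_add(a: int, b: int, width: int = 8) -> Tuple[int, int]:
    """Add two unsigned `width`-bit integers with a Kogge-Stone prefix network.

    Hypothetical helper for illustration; returns (sum mod 2**width, carry_out).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    p = a ^ b        # per-bit propagate
    g = a & b        # per-bit generate
    half_sum = p     # original propagate bits, needed for the final XOR

    dist = 1
    while dist < width:
        # Parallel-prefix step: combine position i with position i - dist.
        # g is updated first so that it uses the propagate bits of this stage.
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist <<= 1

    # After log2(width) stages, bit i of g is the carry out of bit i.
    carries_in = (g << 1) & mask
    total = half_sum ^ carries_in
    carry_out = (g >> (width - 1)) & 1
    return total, carry_out
```

For an 8-bit width the loop runs three stages (distances 1, 2, and 4), which is the logarithmic depth that makes this adder family attractive when speed matters more than wiring area.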

13.

Processing in memory: the Terasys massively parallel PIM array

Gokhale M. Holmes B. Iobst K. 《Computer》1995,28(4):23-31

SRC researchers have designed and fabricated a processor-in-memory (PIM) chip, a standard 4-bit memory augmented with a single-bit ALU controlling each column of memory. In principle, PIM chips can replace the memory of any processor, including a supercomputer. To validate the notion of integrating SIMD computing into conventional processors on a more modest scale, we have built a half dozen Terasys workstations, which are Sun Microsystems Sparcstation-2 workstations in which 8 megabytes of address space consist of PIM memory holding 32K single-bit ALUs. We have designed and implemented a high-level parallel language, called data parallel bit C (dbC), for Terasys and demonstrated that dbC applications using the PIM memory as a SIMD array run at the speed of multiple Cray-YMP processors. Thus, we can deliver supercomputer performance for a small fraction of supercomputer cost. Since the successful creation of the Terasys research prototype, we have begun work on processing in memory in a supercomputer setting. In a collaborative research project, we are working with Cray Computer to incorporate a new Cray-designed implementation of the PIM chips into two octants of Cray-3 memory.

14.

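FPGA Implementation and Optimization of the ABC95 Array Machine

佟冬 黎冬梅 周永林 方滨兴 《计算机工程》1999,25(12):43-45

The ABC95 array machine is a SIMD parallel machine made up of 16 nodes. The main problem in designing and implementing this machine with FPGAs is that FPGA utilization is too low. Several optimization techniques are introduced, with the ALU, the multiplier, and the decoder all implemented in FPGAs. This reduces the number of interconnects between the system's modules and achieves the goal of raising FPGA utilization.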

15.

Implementation of a Sliding Memory Plane Image Processor

Soohwan Ong Myung H. Sunwoo 《Journal of Parallel and Distributed Computing》1998,55(2):995

This paper presents architectures and implementation of a Sliding Memory Plane (SliM) Image Processor to build a SIMD parallel computer. The paper also proposes an enhanced multiplication algorithm to reduce the gate count and the number of cycles. The SliM chip consists of mesh-connected 5×5 PEs. Due to the idea of sliding, that is, overlapping the inter-PE communication time with the computation time, SliM can greatly reduce the inter-PE communication overhead. In addition, four operations corresponding to ALU, shift, data I/O, and inter-PE communication can be grouped into an instruction to be executed in a cycle simultaneously. The implemented SliM chip operates at 25 MHz and gives 625 MIPS. Because of a mesh topology, a large number of chips can be easily connected to form a SIMD parallel computer. We have implemented the scalable SliM Array Processor and developed parallel algorithms for real-time image processing.

16.

Design and Improvement of the ALU and Shift Logic in a Microcontroller  Total citations: 2; self-citations: 0; citations by others: 2

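黄海林 钱刚 张盛兵 《计算机工程与科学》2004,26(1):95-98

In the context of designing an 8-bit microcontroller soft IP core, this paper analyzes the functions and characteristics of the instruction set, makes reasonable adjustments and optimizations to the processor's data path at the algorithm level, and proposes a method of designing the ALU and the shift logic in parallel. Compared with the traditional serial design approach, this parallel approach is simpler to describe, and the synthesized circuit consumes less power and computes faster without using more resources.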

17.

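Design and Implementation of a Low-Power 8-Bit MCU  Total citations: 3; self-citations: 0; citations by others: 3

张旭 李斌桥 李树荣 赵毅强 周建国 姚素英 《微处理机》2003,(4):7-9

This paper presents the architectural design of a low-power 8-bit microcontroller. An appropriate microcontroller architecture and instruction pipeline are chosen, which simplifies the circuit structure and greatly reduces the number of clock cycles needed to execute each instruction. In addition, optimizing the design of the arithmetic logic unit saves system resources and reduces the circuit's parasitic capacitance, thereby achieving the low-power design goal.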

18.

Improved GPU SIMD control flow efficiency via hybrid warp size mechanism

《Microprocessors and Microsystems》2014,38(7):717-729

High single instruction multiple data (SIMD) efficiency and low power consumption have made graphics processing units (GPUs) an ideal platform for many complex computational applications. Thousands of threads can be created by programmers and grouped into fixed-size SIMD batches, known as warps. High throughput is then achieved by concurrently executing such warps with minimal control overhead. However, if a branch instruction occurs, which assigns different paths to different threads, one warp will be broken into multiple warps that have to be executed serially, consequently reducing the efficiency advantage of SIMD. In this paper, the contemporary fixed-size warp design is abandoned for a hybrid warp size (HWS) mechanism. Mixed-size warps are generated according to HWS and are scheduled and issued flexibly. The simulation results show that this mechanism yields an average speedup of 1.20 over the baseline architecture for a wide variety of general purpose GPU applications. The paper also integrates HWS with dynamic warp formation (DWF), which is a well-known branch handling mechanism used to improve SIMD utilization by forming new warps out of split warps in real time. The simulation results show that the combination of DWF and HWS generates an average speedup of 1.27 over the DWF-only platform with an estimated area increase of about 1% of DWF.
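The divergence problem described in this abstract can be illustrated with a toy model. The sketch below is not the paper's HWS mechanism, whose details the abstract does not give. It only counts how many serialized passes a single two-way branch costs at a given fixed warp size, assuming the thread count is a multiple of the warp size. The names `serialized_passes` and `simd_efficiency` are hypothetical.

```python
from typing import List


def serialized_passes(taken: List[bool], warp_size: int) -> int:
    """Count execution passes when each warp runs its branch paths serially.

    Toy model: assumes len(taken) is a multiple of warp_size.
    """
    passes = 0
    for start in range(0, len(taken), warp_size):
        warp = taken[start:start + warp_size]
        # A warp whose threads all agree needs one pass;
        # a divergent warp needs one pass per distinct path (two here).
        passes += len(set(warp))
    return passes


def simd_efficiency(taken: List[bool], warp_size: int) -> float:
    """Fraction of lane slots doing useful work over all passes."""
    passes = serialized_passes(taken, warp_size)
    return len(taken) / (passes * warp_size)


# 64 threads: the first 40 take the branch, the last 24 do not.
outcomes = [True] * 40 + [False] * 24
print(serialized_passes(outcomes, 32))  # 3 passes: one uniform warp, one divergent warp
print(simd_efficiency(outcomes, 32))    # 64 / (3 * 32)
print(simd_efficiency(outcomes, 8))     # every 8-thread warp is uniform here
```

In this example the smaller warp size avoids divergence entirely, but the model ignores the extra scheduling and control overhead that smaller warps incur, which is the trade-off a mixed-size scheme would have to balance.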