Similar literature
1.
A two-stage scan architecture is proposed to confine transition propagation to a small subset of the scan flip-flops. Most scan flip-flops are deactivated during test application. The first stage includes multiple scan chains, where each scan chain is driven by a primary input. Each scan flip-flop in the multiple scan chains drives a group of scan flip-flops in the second stage. Scan flip-flops in different stages use separate clock signals. Test signals assigned to scan flip-flops in the multiple scan chains are applied to the scan flip-flops of the second stage in one clock cycle after the test vector has been applied to the multiple scan chains. No transition occurs at the scan flip-flops in the second stage while a test vector is being applied to the multiple scan chains.

2.
Power consumption during scan testing operations can be significantly higher than that expected in the normal functional mode of operation in the field. This may affect the reliability of the circuit under test (CUT) and/or invalidate the testing process, increasing yield loss. In this paper, a scan chain partitioning technique and a scan hold mechanism are combined for low-power scan operation. Substantial power reductions can be achieved without any impact on the test application time or the fault coverage, and without the need for scan cell reordering or clock and data gating techniques. Furthermore, the proposed design solution for scan power alleviation permits the efficient exploitation of X-filling techniques for capture power reduction, or the use of extreme (power-independent) compression techniques for test data volume reduction.

3.
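Low-power bitline structure for SRAM   Cited by: 1 (self-citations: 0, citations by others: 1)
Gao Ning (高宁), Shi Liang (施亮), Hou Weihua (侯卫华), Yu Zongguang (于宗光). 《半导体技术》 (Semiconductor Technology), 2006, 31(12): 935-937, 950
A low-power bitline structure for SRAM is proposed that achieves a low bitline voltage in two ways. During write operations, a single-sided driving structure suppresses excessive swing of the charging voltage on the bitlines; during read and write operations, an improved precharge structure keeps the bitline voltage low. Simulations show that the structure greatly reduces power consumption.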

4.
Most coarse-grained reconfigurable architectures (CGRAs) are composed of reconfigurable ALU arrays and a configuration cache (or context memory) to achieve high performance and flexibility. In particular, the configuration cache is the main component of a CGRA that enables dynamic reconfiguration in every cycle. However, the frequent memory-read operations needed for dynamic reconfiguration consume considerable power. Reducing power in the configuration cache has therefore become critical for CGRAs to be competitive and reliable in embedded systems. In this paper, we propose a dynamically compressible context architecture for power saving in the configuration cache. This power-efficient context architecture works without degrading the performance and flexibility of the CGRA. Experimental results show that the proposed approach saves up to 39.72% of the power in the configuration cache with negligible area overhead (2.16%).

5.
Segmented Virtual Ground Architecture for Low-Power Embedded SRAM   Cited by: 1 (self-citations: 0, citations by others: 1)
A new scheme to reduce the power consumption of static random access memories is presented. It is shown that using segmented virtual grounding (SVGND), it is possible to reduce both dynamic and static power consumption. The leakage power of the cells is reduced by reducing the voltage drop over a cell. The dynamic power dissipation is also reduced by eliminating the power consumption due to the discharge of the nondesired neighboring bitlines. The effectiveness of this scheme is compared to recently reported low-power schemes. It is shown that, unlike those schemes, SVGND can accommodate multiple words in one row, a significant improvement in soft error rate tolerance.

6.
We introduce a cross-layer customization methodology in which application knowledge about data sharing in producer-consumer relationships is used to aggressively eliminate unnecessary and predictable snoop-induced cache lookups, even for references to shared data, thus achieving significant power reductions with minimal hardware cost. The technique exploits application-specific information about the exact producer-consumer relationships between tasks, as well as information about the precise timing of synchronized accesses to shared memory buffers by their corresponding producers and/or consumers. Snoop-induced cache lookups for accesses to the shared data are eliminated when it is ensured that such lookups will not yield extra knowledge about the cache state with respect to the other caches and the memory. Our experiments show average power reductions of more than 80% compared to a general-purpose snoop protocol.

7.
In this paper, we propose a compact threshold-based resampling algorithm and architecture for efficient hardware implementation of particle filters (PFs). By using a simple threshold-based scheme, this resampling algorithm can reduce the complexity of hardware implementation and power consumption. Simulation results indicate that the performance of this algorithm is approximately equal to that of the traditional systematic resampling (SR) algorithm when the root-mean-square error (RMSE) and lost tracks are considered. Experimental comparison of the proposed hardware architecture with those based on the SR and the residual systematic resampling (RSR) algorithms was conducted on a Xilinx Virtex-II Pro field programmable gate array (FPGA) platform in the bearings-only tracking context, and the results establish the superiority of the proposed architecture in terms of high memory efficiency, low power consumption, and low latency.

8.
This paper describes a low-power programmable DSP architecture that targets audio signal processing. The architecture can be characterized as a heterogeneous multiprocessor consisting of small instruction set processors called mini-cores as well as standard DSP and CPU cores that communicate using message passing. The mini-cores are tailored for different classes of filtering algorithms (FIR, IIR, N-LMS, etc.), and in a typical system the communication among processors occurs at the sampling rate only. The mini-cores are intended as soft macros to be used in the implementation of system-on-chip solutions using a synthesis-based design flow targeting a standard-cell implementation. They are parameterized in word size, memory size, etc., and can be instantiated according to the needs of the application. To give an impression of the size of a mini-core, one of the FIR mini-cores in a prototype design has 16 instructions, a 32-word × 16-bit program memory, a 64-word × 16-bit data memory, and a 25-word × 16-bit coefficient memory. Results obtained from the design of a prototype chip containing mini-cores for a hearing aid application demonstrate a power consumption that is only 1.5–1.6 times larger than a hardwired ASIC and more than 6–21 times lower than current state-of-the-art low-power DSP processors. This is due to: (1) the small size of the processors and (2) a smaller instruction count for a given task.

9.
We propose an efficient method to select a minimal set of testable paths in scan designs, such that every line in the circuit is covered by at least one of the longest testable paths that contain it (if there are any). The proposed path selection approach is based on a stepwise path expansion procedure that uses delay information and compact information about untestable paths to select the longest paths while avoiding untestable paths. Techniques called delay analysis and delay-constrained path expansion are used to speed up the selection of paths to test. Compared to earlier approaches, the proposed approach is fast and is guaranteed to find testable paths. Additionally, the procedure derives tests for the selected paths. Experimental results for ISCAS89 benchmark circuits using standard scan and broadside testing are presented to demonstrate the effectiveness of the proposed method.

10.
Fault diagnosis of full-scan designs has progressed significantly. However, most existing techniques are aimed at a logic block with a single fault. Strategies on top of these block-level techniques are needed in order to successfully diagnose a large chip with multiple faults. In this paper, we present such a strategy. Our strategy is effective in identifying more than one fault accurately. It proceeds in two phases. In the first phase, we concentrate on identifying the so-called structurally independent faults, based on a concept referred to as the word-level prime candidate; in the second phase, we further trace the locations of the more elusive structurally dependent faults. Experimental results show that this strategy is able to find 3 to 4 faults within 10 signal inspections for three real-life designs randomly injected with 5 node-type or stuck-at faults. Part of this work appeared in the proceedings of the Asian Test Symposium in 2003. Yu-Chiun Lin received his BS degree in Electrical Engineering from National Central University in 2000 and his MS degree in Electrical Engineering from National Tsing Hua University in 2002. Since then, he has been with Ali Corporation as a design engineer. His current interests include the design of USB controllers and imaging peripherals. Shi-Yu Huang received his BS and MS degrees in Electrical Engineering from National Taiwan University in 1988 and 1992, respectively, and his Ph.D. degree in Electrical and Computer Engineering from the University of California, Santa Barbara, in 1997. From 1997 to 1998 he was a software engineer at National Semiconductor Corp., Santa Clara, investigating System-on-Chip design methodology. From 1998 to 1999, he was with Worldwide Semiconductor Manufacturing Corp., designing high-speed Built-In Self-Test circuits for memories. He joined the faculty of National Tsing Hua University, Taiwan, in 1999, where he is currently an Associate Professor. Dr. Huang's research interests include CMOS image sensor design, low-power memory design, power estimation, and fault diagnosis methodologies.

11.
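Shao Jingbo (邵晶波), Ma Guangsheng (马光胜), Feng Gang (冯刚). 《微电子学》 (Microelectronics), 2007, 37(4): 494-498, 503
A test compression method for multiple-scan circuits is proposed, based on a decompression technique with adjustable expansion width and on X-compaction. A variable-width scan-chain decompression method is used to decompress the test inputs. For the test responses, the method incorporates the advantages of X-compaction: the test response compactor minimizes the probability that faults are masked, and the scan chains are organized in broadcast-scan mode. The method is then improved so that it can simultaneously handle flip-flops with opposite values. Two operating modes (serial and parallel) further handle the remaining compacted flip-flop values. The advantages of the proposed test compression algorithm are that it reduces the storage requirements of the test equipment, the number of test input/output pins and test channels, and the test application time, thereby improving the overall compression ratio of both the test stimulus data and the test response data. Experimental results demonstrate the advantages of the algorithm over previous algorithms.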

12.
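In a VLSI implementation of block-matching motion estimation based on telescopic search, the conventional systolic array engine used to accelerate the search is structurally modified so that power consumption is significantly reduced. The method uses a new early-termination technique for the block-matching error computation and avoids unnecessary computation by masking operands in the array processing elements. A simple estimate based on algorithm simulation results shows that motion estimation using the new search engine reduces power consumption to about 40% of the original, while maintaining the same processing speed and similar decoded video quality.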

13.
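Low-power VLSI architecture for block-matching motion estimation based on telescopic search   Cited by: 1 (self-citations: 0, citations by others: 1)
In a VLSI implementation of block-matching motion estimation based on telescopic search, the conventional systolic array engine used to accelerate the search is structurally modified so that power consumption is significantly reduced. The method uses a new early-termination technique for the block-matching error computation and avoids unnecessary computation by masking operands in the array processing elements. A simple estimate based on algorithm simulation results shows that motion estimation using the new search engine reduces power consumption to about 40% of the original, while maintaining the same processing speed and similar decoded video quality.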
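
The early-termination idea mentioned in this abstract can be illustrated in software. The sketch below is only an assumption about the general principle, not the paper's hardware design: it uses the sum of absolute differences (SAD) as the block-matching error and stops accumulating once a candidate can no longer beat the best match so far. The function names `sad_early_exit` and `best_match` are hypothetical.

```python
def sad_early_exit(block, candidate, best_so_far):
    """Sum of absolute differences between two equal-sized blocks.

    Accumulation stops after any row at which the partial sum has
    already reached best_so_far, because the candidate can then no
    longer beat the current best match. The value returned in that
    case is the partial sum, not the full SAD.
    """
    total = 0
    for row_a, row_b in zip(block, candidate):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
        if total >= best_so_far:
            return total  # early exit: remaining rows are skipped
    return total


def best_match(block, candidates):
    """Return (index, SAD) of the candidate block with the lowest SAD."""
    best_index, best_sad = -1, float("inf")
    for i, cand in enumerate(candidates):
        sad = sad_early_exit(block, cand, best_sad)
        if sad < best_sad:
            best_index, best_sad = i, sad
    return best_index, best_sad


if __name__ == "__main__":
    block = [[1, 2], [3, 4]]
    candidates = [
        [[0, 0], [0, 0]],
        [[1, 2], [3, 5]],
        [[9, 9], [9, 9]],
    ]
    print(best_match(block, candidates))
```

In hardware, the skipped rows would correspond to operations that are masked rather than executed, which is where the power saving would come from; the software version only saves time.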

14.
As technology scales, the shrinking wire width increases the interconnect resistivity, while the decreasing interconnect spacing significantly increases the coupling capacitance. This paper proposes reducing the number of bus lines of the conventional parallel-line bus (PLB) architecture by multiplexing each group of $m$ bits onto a single line. This bus architecture, the serial-link bus (SLB), transforms an $n$-bit conventional PLB into an $n/m$-line (serial link) bus. The advantage of SLBs is that they have fewer lines, so if the bus width is kept the same, SLBs have a larger line pitch. Increasing the line width reduces the line resistance in two ways, because the resistivity of sub-100 nm wires also drops significantly as the line width increases. Increasing the line width and spacing also reduces the coupling capacitance between adjacent lines, but increases the line-to-ground capacitance. Thus, there exist an optimum degree of multiplexing $m_{\mathrm{opt}}$ and an optimum width-to-pitch ratio $\eta_{\mathrm{opt}}$ that minimize the bus energy dissipation and maximize the bus throughput per unit area. The optimum degree of multiplexing and the optimum width-to-pitch ratio for maximum throughput per unit area and minimum energy dissipation are determined for the 25–130 nm technologies. An encoding technique is also proposed and implemented to reduce the switching activity penalty due to serialization. HSPICE simulations show that, for the same throughput per unit area as conventional parallel-line data buses, the SLB architecture reduces the energy dissipation by up to 31% for a 64-bit bus implemented in an intermediate metal layer of a 50 nm technology, and a reduction of 53% is projected for a 25 nm technology.

15.
A low-power, large-scale parallel video compression architecture for a single-chip digital CMOS camera is discussed in this paper. This architecture is designed for the highly computationally intensive image and video processing tasks necessary to support video compression. Two designs of this architecture, an MPEG2 encoder and a DV encoder, are presented. At an image resolution of 640 × 480 pixels (MPEG2) and 720 × 576 pixels (DV) and a frame rate of 25 to 30 frames per second, a computational throughput of up to 1.8 billion operations per second (BOPS) is required. This is supported in the proposed architecture using a 40 MHz clock and an array of 40 to 45 parallel processors implemented in a 0.2 μm CMOS technology with a 1.5 V supply voltage. Power consumption is significantly reduced through the single-chip integration of the CMOS photo sensors, the embedded DRAM technology, and the proposed pipelined parallel processors. The parallel processors consume approximately 45 mW of power, resulting in a power efficiency of 40 BOPS/W.
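
The power-efficiency figure quoted in this abstract follows directly from the stated throughput and power, assuming the 1.8 BOPS peak throughput and the 45 mW processor power refer to the same operating point:

$$\frac{1.8\ \text{BOPS}}{45\ \text{mW}} = \frac{1.8\ \text{BOPS}}{0.045\ \text{W}} = 40\ \text{BOPS/W}$$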

16.
The discrete wavelet transform (DWT) is an upcoming compression technique that has been selected for MPEG-4 and JPEG 2000, because it has no blocking effects and it efficiently determines the frequency property of the temporary signals. In this paper, we propose a low-complexity, low-power bit-serial DWT architecture, employing a two-channel lattice-based quadrature mirror filter (QMF). The filter consists of four lattices (filter length = 8). We determine the quantization bit width for the coefficients using a fixed-length peak signal-to-noise ratio analysis and propose an architecture for a bit-serial multiplier with a fixed coefficient. Canonical signed digit encoding is applied to the coefficients to minimize the number of nonzero bits, thus reducing the hardware complexity. The proposed folded one-dimensional DWT architecture processes the other resolution levels during the idle periods created by decimation, and it provides efficient scheduling. The proposed architecture requires only flip-flops and full adders. This architecture has been designed and verified in Verilog HDL and synthesized using the Synopsys Design Compiler with the DongbuAnam 0.18 μm Standard Cell Library. The maximum throughput is 393 Mbps at 450 MHz with a latency of 16 clocks, and the gate count is about 5K in equivalent two-input NAND gates. The dynamic power is 7.02 mW at 1.8 V. The data scheduling using a data dependency graph, and the performance, power, and required hardware cost are discussed.
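
Canonical signed digit (CSD) encoding, mentioned in this abstract, rewrites a number with digits in {-1, 0, 1} so that no two adjacent digits are nonzero, which tends to reduce the number of add/subtract operations in a fixed-coefficient multiplier. The sketch below is a generic illustration for non-negative integers, assuming the coefficient has already been quantized and scaled to an integer; the function names `to_csd` and `from_csd` are hypothetical and are not taken from the paper.

```python
def to_csd(value):
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0, or 1, and no two adjacent digits are nonzero.
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = []
    while value != 0:
        if value & 1:
            # value % 4 == 1 gives digit +1, value % 4 == 3 gives digit -1
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def from_csd(digits):
    """Reconstruct the integer from CSD digits (least significant first)."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


if __name__ == "__main__":
    for n in (7, 11, 23):
        csd = to_csd(n)
        print(n, csd, from_csd(csd))
```

For example, 7 is `111` in binary (three nonzero bits) but 8 - 1 in CSD (two nonzero digits), so a multiplication by 7 needs one subtraction instead of two additions.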

17.
This paper presents a new detailed analysis of low-voltage differential signaling (LVDS) output buffers that are intended for use in high-speed integrated circuits. Three theoretically possible architectures of a LVDS output driver are discussed in rigorous detail, resulting in the recognition of the most power-conserving circuit configuration. An innovative realization of this identified low-power architecture is presented in this paper along with computer simulation results and test lab measurement data. The novel LVDS driver is designed using a unique hetero-junction bipolar transistor structure. Computer simulation results show total current consumption of 6.3 mA for the bipolar driver at a 1-GHz clock frequency while operating from a positive supply voltage between 1.7 and 3.3 V, as well as demonstrate full stage compliance with all the requirements of the IEEE 1596.3–1996 standard. The presented version of the buffer was utilized in a multiplexer/demultiplexer chip set that was fabricated in a modern 50-GHz-$f_T$ SiGe technology. Test results of the LVDS output buffer taken from five different chip samples reveal high-quality output eyes with more than 0.99 UI opening and close matching between the measured parameters and simulation results.

18.
In this paper, we describe a novel self-timed scan chain design approach that mitigates hold time and power supply noise problems during scan testing while avoiding the delay penalty of the front-end multiplexer in a multiplexer-D flip-flop (mux-DFF) scan cell. Hold time problems due to clock skew, and static and dynamic power supply noise (i.e., IR drop and $L\,dI/dt$ noise) due to simultaneous switching, are two problems associated with shift operations during scan testing using ATPG patterns. These problems are particularly serious with mux-DFF style scan, and are either nonexistent or negligible with level-sensitive scan design (LSSD). This paper presents a circuit technique that mitigates the hold time, power supply noise, and front-end delay penalty seen with mux-DFF, and achieves a middle ground in clock routing overhead between the LSSD and mux-DFF scan styles.

19.
Traditional approaches to automatic gain control (AGC) involve estimating the average power or the peak amplitude over an extended time period, which results in high hardware complexity and a long processing time. Moreover, the accuracy of traditional approaches is seriously degraded by noise and intersymbol interference. In this paper, we propose a joint AGC and equalization (Joint AGC-EQ) scheme, in which the AGC circuitry comprises only one-tenth of the area of a traditional AGC. In addition, the total convergence time of the proposed Joint AGC-EQ is only half that of traditional blind equalization. The scheme is already silicon proven for the application of a Fast Ethernet transceiver using Faraday/UMC 0.18-μm cell libraries.

20.
In this article, an improved Distributed Arithmetic (DA) architecture is proposed, in which the high power consumed by adder units is relocated in the system to reduce the switching activity and total power needed. We used the concept of the Time Domain Activity Duration Function (ADF) in architectural-level modification of target units at dynamic operating conditions. The proposed DA exploits the circuit activity, and the adder units are used in a minimum number of states. The proposed DA is run-time reconfigurable and lets the system change the coefficients of the FIR filter dynamically. The design was simulated, and the results were verified via a two-phase power calculation method. The power calculations are based on forward synthesis invariant points and a backward synthesis oriented activity approach. This method was applied to calculate the power and area of the proposed DA and other well-known counterparts in the literature. In the experimental results on 180 nm CMOS ASIC synthesis, a maximum clock of 100 MHz is achieved. In the 32-tap FIR filter implementation of our proposed DA and the best known DA2 in a serial DA structure, the switching power and internal power improvements are about 21% and 10%, respectively, at approximately equal speed and with a 5% area increase.
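
For readers unfamiliar with distributed arithmetic, the sketch below shows the basic bit-serial DA principle only, not the improved architecture of this article: the inner product of fixed coefficients with two's complement samples is computed from a precomputed lookup table addressed by one bit of every sample at a time, so no multipliers are needed. The names `build_lut` and `da_inner_product`, and the use of plain integer coefficients, are assumptions made for illustration.

```python
def build_lut(coeffs):
    """LUT[addr] is the sum of coeffs[k] for every bit k that is set in addr."""
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(coeffs, samples, bits):
    """Inner product of integer coeffs and signed samples via bit-serial DA.

    Each sample must fit in `bits`-bit two's complement, i.e. lie in
    the range [-2**(bits - 1), 2**(bits - 1) - 1].
    """
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]  # two's complement bit patterns
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one LUT address.
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> b) & 1) << k
        term = lut[addr] * (1 << b)
        # The most significant bit carries negative weight in two's complement.
        acc += -term if b == bits - 1 else term
    return acc


if __name__ == "__main__":
    coeffs = [3, -1, 4, 2]
    samples = [5, -7, 2, -1]
    print(da_inner_product(coeffs, samples, bits=4))
    print(sum(c * s for c, s in zip(coeffs, samples)))
```

The table grows as 2^N with the number of taps N, which is why practical DA designs for long filters (such as a 32-tap FIR) split the table into smaller partitions and add the partial results.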
