首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 828 毫秒
1.

Successive-cancellation list (SCL) decoding for polar codes has the disadvantage of high latency owing to serial operations. To improve the latency, several algorithms with additional circuits have been proposed, but the area becomes larger. This paper proposes a fast multibit decision method having-high area efficiency based on the SCL decoding algorithm. First, multiple bits can be determined to reduce clock cycles using new nodes represented by the information bits and frozen bits. We propose the new nodes called the combined nodes and the other node in this paper. The combined nodes that combine redundant operations of the fast-simplified SC (fast-SSC) algorithm can increase area efficiency. The other node with bit patterns other than the node types of the fast-SSC algorithm performs an 8-bit multibit decision to reduce the number of decoding cycles. Latency is further reduced by applying a sphere decoding method to the other node. In addition, a sorter is proposed to reduce the critical path delay. As a large number of path metrics causes sorter delays, the proposed sorter can achieve high throughput with the small area. The proposed (1024, 512) SCL decoder showed negligible performance degradation in the simulation using Matlab and was synthesized using 65 nm CMOS technology. The proposed decoder achieves about 1.3Gbps with the small area. As a result, the area-throughput efficiency is at least 1.4 times higher than the state-of-the-art works over 1 Gbps.

  相似文献   

2.
In a typical clock distribution scheme, a central clock signal is distributed to several sites on the integrated circuit (IC). Local regenerators at these sites buffer the clock signal for the logic in regions close to the regenerator. Minimizing the skew between the clocks at these regeneration sites is critical. In recent times, this is becoming harder due to increasing intra-die processing variations. In this paper, we describe a novel technique to distribute a clock signal from a central location to several sites on a VLSI IC. Our technique uses a buffered H-tree and includes circuitry to dynamically remove any skew that may result due to intra-die processing variations. While existing approaches to deskewing a clock tree have utilized several phase detection circuits (number of phase detectors dependent on the number of clock regenerators), our method requires only one phase detector. Also, in our approach, the resolution of the phase detector is inconsequential unlike existing techniques. Our deskewing technique can be applied dynamically, either at boot time or periodically during the operation of the IC. Using a six-level H-tree clock distribution network with process variations deliberately included, we demonstrate that our technique can reduce skews as high as 300 ps down to just 3 ps. We compare our clock tree with traditional buffered and unbuffered H-tree networks.   相似文献   

3.
提出并实现了一种高速缓存的V-LRU RAM单周期清零技术。运行操作系统的CPU在不同任务之间切换时,需要对V-LRU RAM清零。使用传统的计数器依次清空V-LRU RAM的各行,CPU会白白浪费很多个时钟周期。在一个时钟周期对V-LRU RAM清空,可以大大提高CPU的性能。在四路组相联的高速缓存设计中,容量为16k、8k和4k字节时,使用该技术可以将以前的256、128和64个时钟周期降低到只有1个时钟周期。基于SMIC 0.13μm工艺,实现该技术的硬件电路面积为6 312.8μm2,且高速缓存的缺失率保持在非常低的水平。这种技术同样适用于对RAM需要单周期清空的场合。  相似文献   

4.
设计了一种超高速高精度时钟占空比校准电路。采用一种新的脉冲宽度校准单元,通过控制电压调整时钟上升、下降时间来实现占空比调整。同时,设计了一种时钟放大模块,降低了占空比校准单元对输入时钟幅度的要求,提高了占空比校准精度。分析了各电路模块的作用以及对整体性能的影响。采用SMIC 65 nm CMOS工艺,在1.8 V电源电压下对各模块以及整体电路进行仿真验证。仿真结果表明,该时钟占空比校准电路能对输入频率为1~4 GHz、占空比为20%~80%的时钟进行精确校准,校准后的占空比为(50±1)%,系统稳定时间为200个输入时钟周期,功耗为10 mW。  相似文献   

5.
Three circuit techniques for a 1.5 V, 512 Mb graphic DDR4 (GDDR4) SDRAM using a 90-nm DRAM process have been developed. First, a dual-clock system increases clocking accuracy and expands internal timing margins for harmonious core operation regardless of external clock frequency. Second, a four-phase data input strobe scheme helps to increase the input data valid window. Third, a fully analog delay-locked loop which provides a stable I/O clock and has 31.67 ps peak-to-peak jitter characteristics is designed. On the basis of these circuit techniques, the data rate is 3.2 Gbps/pin, which corresponds to 12.8 Gbps in times32 GDDR4-based I/O. Also, a multidivided architecture consisting of four independent 128 Mb core arrays is designed to reduce power line and output noise.  相似文献   

6.
司慧玲 《电子设计工程》2012,20(23):41-43,46
针对数字钟双面板设计较为复杂的问题,利用国内知名度最高、应用最广泛的电路辅助设计软件Proteldxp2004进行了电路板设计,本文提供了设计的一些方法和技巧,快速、准确地完成数字钟双面电路板的设计,采用双面板设计.布线面积是同样大小的单面板面积的两倍,其布线可以在两面间互相交错,所以更节省空间。  相似文献   

7.
We propose a fast hardware algorithm for division in $hbox{GF}(2^{m})$ based on the extended Euclid's algorithm. The algorithm requires only one iteration to perform the operations that correspond to the ones performed in two iterations of previously reported division algorithms. Since the algorithm performs modular reductions in parallel by changing the order of execution of the operations, a circuit based on this algorithm has almost the same critical path delay as the previously proposed ones. The circuit computes division in $m$ clock cycles, whereas the previously proposed circuits take $2m - 1$ or more clock cycles.   相似文献   

8.
Timing analysis of network on chip architectures for MP-SoC platforms   总被引:1,自引:0,他引:1  
Recently, the use of multiprocessor system-on-chip (MP-SoC) platforms has emerged as an important integrated circuit design trend for high-performance computing applications. As the number of reusable intellectual property (IP) blocks on such platforms continues to increase, many have argued that monolithic bus-based interconnect architectures will not be able to support the clock cycle requirements of these leading-edge SoCs. While hierarchical system integration using multiple smaller buses connected through repeaters or bridges offer possible solutions, such approaches tend to be ad hoc in nature, and therefore, lack generality and scalability. Instead, many different forms of network on chip (NoC) architectures have been proposed in the past few years to specifically address this problem. We believe that the NoC approach will ultimately be the preferred communication fabric for next generation designs. To support this conjecture, this paper demonstrates, through detailed circuit design and timing analysis that different proposed NoC architectures to date are guaranteed to achieve the minimum possible clock cycle times in a given CMOS technology, usually specified in normalized units as 10-15 FO4 delays. This is contrasted with the bus-based approach, which may require several design iterations to deliver the same performance when the number of IP blocks connected to the bus exceeds certain limits.  相似文献   

9.
The authors propose and evaluate the performance of a 2N times clock multiplier that controls memory components for high-speed data communications. To improve the reliability of the circuit, a symmetric circuit structure is used, while to verify circuit operation by means of a simple method, an MVU estimator is found from simulation data. The proposed circuit can provide clock rates, which are usually required in the multiple phase shift keying (MPSK) or multiple quadrature amplitude modulation (MQAM) modulation schemes, of 2 to 2N times that of the input clock  相似文献   

10.
High-performance and power-efficient CMOS comparators   总被引:1,自引:0,他引:1  
Several design techniques for high-performance and power-efficient CMOS comparators are proposed. First, the comparator is based on the priority-encoding (PE) algorithm, and the dynamic circuit technique developed specifically for the priority encoder can be applied. Second, the PE function and the subsequent logic functions are merged and efficiently realized in the multiple output domino logic (MODL) to result in a shortened logic depth. The circuit in MODL CMOS is also compact and power efficient because few transistors are needed. Third, the multilevel look-ahead technique is used to shorten the path of priority-token propagation. Finally, the circuit is realized with a latch-based two-stage pipelined structure, and the comparison function is partitioned into two parts, with each part executed in each half of the clock cycle in a delay-balanced manner. Post-layout simulation results show that a 64-b comparator designed with the proposed techniques in a 3-V 0.6-/spl mu/m CMOS technology is 16% faster, 50% smaller, and 79% more power efficient as compared with the all-n-transistor comparator, which is the fastest among the conventional comparators. Measurement results of the test chip conform with simulation results and prove the feasibility of the proposed techniques.  相似文献   

11.
The load/store pipe for a low-power 1-GHz embedded processor is described. For area savings and logic complexity reduction, the load/store pipe is clocked at twice the frequency of the processor core. It can sustain two load or store operations per core clock cycle with zero load to use issue latency. The address generation unit for one of the two load/store pipes takes advantage of the common addressing mode in MIPS 64 ISA to generate the address within a core clock phase. Phase borrowing is employed in the translation lookaside buffer (TLB) design to enable a lookup process within a core clock phase. The data cache design enables the activation of a minimum number of data bank arrays for power savings. Small-swing differential buses are used for multiple address and data buses for improved signal transmission latency. The quadrature clocks used to derive the 2/spl times/ clock are generated with a novel 4-to-1 divider and distributed with matched paths, all to reduce the duty cycle variation of the 2/spl times/ clock phase. The design has been implemented in a 0.13-/spl mu/m CMOS process.  相似文献   

12.
周佳筠  沈海斌 《微电子学》2006,36(4):506-509
在一些复杂的SoC中,往往要使用嵌入式存储器,而双边访问的嵌入式存储器(DARAM)常用于许多低功耗的场合。这样,用时钟的双边沿来控制存储器的读写数据是不可避免的。这种时钟用作数据(clock as data)的情况通常会在SoC设计的逻辑物理综合阶段产生很多时序收敛的棘手问题,时钟隔离电路恰好能解决这个问题。实践证明,这种改进的时钟电路结构大大减少了设计的时序收敛时间和设计流程的复杂度。  相似文献   

13.
Energy-Efficient GHz-Class Charge-Recovery Logic   总被引:3,自引:0,他引:3  
In this paper, we present Boost Logic, a charge- recovery circuit family that can operate efficiently at clock frequencies in excess of 1 GHz. To achieve high energy efficiency, Boost Logic relies on a combination of aggressive voltage scaling, gate overdrive, and charge-recovery techniques. In post-layout simulations of 16-bit multipliers with a 0.13-mum CMOS process at 1GHz, a Boost Logic implementation achieves 5 times higher energy efficiency than its minimum-energy pipelined, voltage-scaled, static CMOS counterpart at the expense of 3 times longer latency. In a fully integrated test chip implemented using a 0.13-mum bulk silicon process and on-chip inductors, chains of Boost Logic gates operate at clock frequencies up to 1.3 GHz with a 1.5-V supply. When resonating at 850 MHz with a 1.2-V supply, the Boost Logic test chip achieves 60% charge-recovery  相似文献   

14.
The Pentium/spl reg/ 4 processor architecture uses a 2/spl times/ frequency core clock to implement low latency integer operations. Low-voltage-swing (LVS) logic circuits implemented in 90-nm technology meet the frequency demands of a third-generation integer-core design.  相似文献   

15.
Many methodologies for clock mesh networks have been introduced for two‐dimensional integrated circuit clock distribution networks, such as methods to reduce the total wirelength for power consumption and to reduce the clock skew variation through consideration of buffer placement and sizing. In this paper, we present a methodology for clock mesh to reduce both the clock skew and the total wirelength in three‐dimensional integrated circuits. To reduce the total wirelength, we construct a smaller mesh size on a die where the clock source is not directly connected. We also insert through‐silicon vias (TSVs) to distribute the clock signal using an effective clock TSV insertion algorithm, which can reduce the total wirelength on each die. The results of our proposed methods show that the total wirelength was reduced by 12.2%, the clock skew by 16.11%, and the clock skew variation by 11.74%, on average. These advantages are possible through increasing the buffer area by 2.49% on the benchmark circuits.  相似文献   

16.
Conventional two-step algorithm, long latency of interpolation and various motion vectors are three factors that mainly induce high computation complexity of fractional motion estimation and also prevent it from encoding high-definition video. In order to overcome these obstacles, a high performance fractional motion engine is proposed in this paper with three techniques. First, based on high correlation between motion vector of a block and its up-layer as well as relationship of integer candidates, one-step algorithm is proposed. Second, an 8×4 element block processing is adopted, which not only eliminates almost redundancies in interpolation, but also still ensures hardware reusability. Finally, a scheme of processing 4×4 and 4×8 block with free of cycles is presented, so that the number of motion vectors can be reduced up to 59%. Experimental results show that the proposed design just needs 50% of gate count and 56% of cycles when compared with previous design while nearly maintaining the coding performance.  相似文献   

17.
A robust, scalable, and power efficient dual-clock first-input first-out (FIFO) architecture which is useful for transferring data between modules operating in different clock domains is presented. The architecture supports correct operation in applications where multiple clock cycles of latency exist between the data producer, FIFO, and the data consumer; and with arbitrary clock frequency changes, halting, and restarting in either or both clock domains. The architecture is demonstrated in both a 0.18- mum CMOS full-custom design and a 0.18-mum CMOS standard cell design used in a globally asynchronous locally synchronous array processor. It achieves 580-MHz operation and 10.3-mW power dissipation while performing simultaneous FIFO read and write operations at 1.8 V.  相似文献   

18.
The constant-ratio-coupled multi-grain digital synchronizer (CRC-MGsynchronizer) is proposed as a means for making high-speed connections with very low power consumption, both among multiple chips such as processors, controllers, and storage devices, and among on-chip modules. The synchronizer not only provides a wide range of operating frequencies, but is fast locking and only occupies a small area on chip. Therefore, it contributes to large reductions in power consumption and costs. It is suitable for use in various low-power systems (e.g., battery-hungry mobile appliances and low-cost consumer electronic products). Three major techniques were applied to the design: 1)a multi-grain structure for the delay elements, which greatly reduces the number of gates while facilitating locking in a very small number of clock cycles;2) constant-ratio-coupled (CRC) delay lines (measurement versus generation)for flexible selection of the input-output delay; and 3) a new lock stage decision circuit (LSDC) scheme, conferring excellent testability. Moreover,the architecture is all-digital, and thus it has high process portability. By applying these techniques to a DDR memory interface circuit for a mobile application processor fabricated in 130-nm technology, we were able to reduce power consumption by 42% and chip area by 65% compared with a conventional implementation. Furthermore, the novel design spans a frequency range covering 12 times the minimum frequency.  相似文献   

19.
This paper reviews research and development in NTT Laboratories on IC's faster than 10 Gb/s for future optical communication systems. Novel design and circuit techniques achieve such high-speed IC's and stable operation even in packages and modules. High-bit-rate operation of 10 Gb/s (10-GHz equalizing amplifier circuit, a 10-GHz clock recovery circuit, 10-Gb/s decision circuits, and 10-Gb/s multiplexers and demultiplexers) is obtained. 20-Gb/s operation is also achieved for some IC's. Future improvements using advanced device and circuit technologies are discussed, and bit rates over 40 Gb/s are predicted  相似文献   

20.
A nonfeedback CMOS digital-clock-generator, direct-skew-detect synchronous-mirror-delay (direct SMD) circuit has been developed that achieves clock-skew suppression in only two clock cycles for application-specific integrated circuits having unfixed and various clock paths. The direct SMD circuit detects both clock skew and clock cycle by using a direct-skew detector and clock-suspension circuitry. The skew-detection scheme removes the phase errors caused by delay in the clock-driver circuit. Measurements demonstrated that the direct SMD circuit eliminates various amounts of clock skew (2.0-3.0 ns) at 200 MHz in two clock cycles  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号