1.
Leung O.Y.-H. Chi-Ying Tsui Cheng R.S.-K. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(1):34-41
Turbo codes have become popular for next-generation wireless communication systems because of their remarkable coding performance. One problem in decoding turbo codes at the receiver is the complexity and high power consumption, since multiple iterations of the Soft Output Viterbi Algorithm (SOVA) or Maximum a Posteriori (MAP) decoding have to be carried out to decode a data frame. To reduce the complexity of the turbo-code decoder, adaptive iteration based on cyclic redundancy checking (CRC) and output convergence approaches has been proposed to reduce the average number of iterations required for decoding a data frame. This results in a system with a variable workload, since the amount of computation required for decoding each data frame is different. In this work, we propose a dynamic voltage scaling approach to further reduce the power consumption. Unlike other variable-workload systems, the workload here is not known at the time the data is being decoded. Thus, optimum voltage assignment is not feasible. We propose several heuristic algorithms to assign the supply voltage for different decoding iterations. Simulation results show that a significant reduction in power consumption is achieved compared with a system using a fixed supply voltage.
2.
In this paper, we propose a low-complexity decoder architecture for low-density parity-check (LDPC) codes using a variable quantization scheme as well as an efficient highly-parallel decoding scheme. In the sum-product algorithm for decoding LDPC codes, finite-precision implementations face an important tradeoff between decoding performance and hardware complexity caused by two dominant area-consuming factors: one is the memory for storing updated messages and the other is the look-up table (LUT) that implements the nonlinear function Ψ(x). The proposed variable quantization schemes offer a large reduction in the hardware complexity of the LUT and memory. Also, an efficient highly-parallel decoder architecture for quasi-cyclic (QC) LDPC codes can be implemented with reduced hardware complexity by using the partially block overlapped decoding scheme, and with minimized power consumption by reducing the total number of memory accesses for updated messages. For (3, 6) QC LDPC codes, our proposed schemes for implementing the highly-parallel decoder architecture reduce the implementation area by 33% for the memory and by approximately 28% for the check node unit and variable node unit computation units, without significant performance degradation. Also, the memory accesses are reduced by 20%.
3.
In this paper, we propose a hardware architecture for a high-speed context-adaptive variable length coding (CAVLC) decoder in H.264. In the CAVLC decoder, the codeword length of the current decoding block is used to determine the next input bitstreams (valid bits). Since the computation of valid bits increases the total processing time of CAVLC, we propose two techniques to reduce processing time: one is to reduce the number of decoding steps by introducing a lookup table, and the other is to reduce the cycles needed to calculate the valid bits. The proposed CAVLC decoder can decode 1920×1088 30 fps video in real time at a 30.8 MHz clock.
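As a rough sanity check on the throughput figure in this abstract, the average cycle budget per macroblock can be worked out from the frame size, frame rate, and clock. The sketch below assumes 16×16 macroblocks (the H.264 macroblock size) and that every clock cycle is available to the CAVLC decoder. The function name is illustrative and does not come from the paper.

```python
def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Average clock cycles available per macroblock.

    Assumes the frame dimensions are multiples of mb_size.
    """
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    return clock_hz / (mbs_per_frame * fps)


# 1920x1088 at 30 fps with a 30.8 MHz clock, as quoted in the abstract.
budget = cycles_per_macroblock(1920, 1088, 30, 30.8e6)
print(f"{budget:.1f} cycles per macroblock")
```

Under these assumptions the budget comes out at roughly 126 cycles per macroblock, which suggests why shaving cycles from the valid-bit calculation matters at this resolution.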
4.
Seong‐Min Kim Ju‐Hyun Park Seong‐Mo Park Bon‐Tae Koo Kyoung‐Seon Shin Ki‐Bum Suh Ig‐Kyun Kim Nak‐Woong Eum Kyung‐Soo Kim 《ETRI Journal》2003,25(6):489-502
This paper presents an MPEG-4 video codec, called MoVa, for video coding applications that adopts 3G-324M. We designed MoVa to be optimal by embedding a cost-effective ARM7TDMI core and partitioning it into hardwired blocks and firmware blocks to provide a reasonable tradeoff between computational requirements, power consumption, and programmability. Typical hardwired blocks are motion estimation and motion compensation, discrete cosine transform and quantization, and variable length coding and decoding, while intra refresh, rate control, error resilience, error concealment, etc. are implemented in software. MoVa has a pipeline structure, and its operation is performed in four stages for encoding and in three stages for decoding. It meets the requirements of MPEG-4 SP@L2 and can perform either 30 frames/s (fps) of QCIF or SQCIF, or 7.5 fps (in codec mode) to 15 fps (in encode/decode mode) of CIF at a maximum clock rate of 27 MHz for 128 kbps or 144 kbps. MoVa can be applied to many video systems requiring a high bit rate and various video formats, such as videophone, videoconferencing, surveillance, news, and entertainment.
5.
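To address the long table lookup time of TLSS, the standard decoding method for CAVLC in H.264/AVC, a new CAVLC decoding lookup optimization method based on fast hash table queries is proposed. Hash table lookup is introduced into the CAVLC decoding table lookup, which speeds up the lookup and reduces the time needed to obtain codewords from the irregular variable length code tables (UVLCT), thereby reducing the overall CAVLC decoding lookup time. Simulation results show that, with no loss in video decoding quality, the new algorithm improves table lookup time by about 18% to 22% compared with the standard TLSS method.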
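The general idea of replacing a sequential table search with a hashed lookup can be sketched as follows. This is a minimal illustration and not the method of the paper: the codebook is a toy prefix-free code rather than any H.264 CAVLC table, and keying the hash table by the pair (codeword length, codeword value) is an assumption made for the example.

```python
# Toy prefix-free codebook; hypothetical, NOT one of the H.264 CAVLC tables.
TOY_CODEBOOK = {
    "1": "A",
    "01": "B",
    "001": "C",
    "0001": "D",
}


def build_hash_table(codebook):
    """Key each codeword by (length, integer value) for constant-time lookup."""
    return {(len(bits), int(bits, 2)): sym for bits, sym in codebook.items()}


def decode(bitstring, table, max_len):
    """Decode a string of '0'/'1' characters using the hashed table."""
    symbols = []
    pos = 0
    while pos < len(bitstring):
        value = 0
        for length in range(1, max_len + 1):
            if pos + length > len(bitstring):
                raise ValueError("truncated bitstream")
            # Extend the candidate codeword by one bit and probe the table.
            value = (value << 1) | int(bitstring[pos + length - 1])
            sym = table.get((length, value))
            if sym is not None:
                symbols.append(sym)
                pos += length
                break
        else:
            raise ValueError("no codeword matches")
    return symbols


table = build_hash_table(TOY_CODEBOOK)
decoded = decode("1010010001", table, max_len=4)
print(decoded)
```

Each probe is a single dictionary access, so the cost per symbol depends on the codeword length rather than on the number of entries in the table.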
6.
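When low-density parity-check (LDPC) codes in optical communication systems are decoded with the log-likelihood ratio belief propagation (LLR-BP) algorithm, the extrinsic information of the variable nodes can oscillate without converging during iterative decoding in the high signal-to-noise ratio region, which degrades the error-correction performance. To meet the requirements of optical communication systems, an improved LLR-BP decoding algorithm that weakens the oscillation of extrinsic messages is proposed. The algorithm introduces a weighting coefficient to balance the extrinsic information passed by the variable nodes in two successive iterations, which markedly reduces the oscillation. Simulation results show that, compared with the conventional LLR-BP decoding algorithm, the improved algorithm achieves better bit error performance, reduces the oscillation of the variable-node extrinsic information, and speeds up decoding convergence.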
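The weighting idea in this abstract can be illustrated with a simple blend of the extrinsic log-likelihood ratios from two successive iterations. The abstract does not give the exact update rule, so the convex combination below, the parameter name `alpha`, and its default value are assumptions made for illustration only.

```python
def damped_messages(current, previous, alpha=0.5):
    """Blend this iteration's variable-node extrinsic LLRs with the previous ones.

    alpha = 1.0 reproduces the undamped update; smaller values give more
    weight to the previous iteration. (Hypothetical rule, not from the paper.)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * c + (1.0 - alpha) * p for c, p in zip(current, previous)]


# A message that flips between +4 and -4 is pulled toward a steadier value.
previous = [4.0, 2.0]
current = [-4.0, 2.0]
blended = damped_messages(current, previous, alpha=0.75)
print(blended)
```

With this kind of rule, a message that changes sign from one iteration to the next shrinks in magnitude, while a message that is already stable passes through unchanged.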
7.
Liu T.-M. Lin T.-A. Wang S.-Z. Lee W.-P. Yang J.-Y. Hou K.-C. Lee C.-Y. 《Solid-State Circuits, IEEE Journal of》2007,42(1):161-169
A low-power dual-standard video decoder has been developed for mobile applications. It supports MPEG-2 SP@ML and H.264/AVC BL@L4 video decoding in a single chip and features a scalable architecture to reach area/power efficiency. This chip integrates diverse algorithms of MPEG-2 and H.264/AVC to reduce silicon area. Three low-power techniques are proposed. First, a domain-pipelined scalability (DPS) technique is used to optimize the pipelined structure according to the number of processing cycles. Second, bandwidth scalability is implemented via a line-pixel-lookahead (LPL) scheme to improve the external bandwidth and reduce the internal memory size, leading to a 51% memory power reduction compared to a conventional design. Third, low-power motion compensation and deblocking filter are designed to reduce the operating frequency without degrading system performance. A test chip is fabricated in a 0.18 μm one-poly six-metal CMOS technology with an area of 15.21 mm². For mobile applications, H.264/AVC and MPEG-2 video decoding of quarter-common intermediate format (QCIF) sequences at 15 frames per second are achieved at 1.15 MHz clock frequency with power dissipation of 125 μW and 108 μW, respectively, at 1 V supply voltage.
8.
9.
Edgar Holmann Toyohiko Yoshida Akira Yamada Shinichi Uramoto 《The Journal of VLSI Signal Processing》1998,18(2):155-165
A single-chip system for real-time MPEG-2 decoding can be created by integrating a general-purpose dual-issue RISC processor with small dedicated hardware for the variable length decoding (VLD) and block loading processes, a 32 KB instruction RAM, and a 32 KB data RAM. The VLD hardware performs Huffman decoding on the input data. The block loader performs the half-sample prediction for motion compensation and acts as a direct memory access (DMA) controller for the RISC processor by transferring data between an external 2 MB DRAM and the internal 32 KB data RAM. The dual-issue RISC processor, running at 250 MHz, is enhanced with a set of key sub-word and multimedia instructions for a sustained peak performance of 1000 MOPS. With this setup for MPEG-2 decoding applications, bi-directionally predicted non-intra video blocks are decoded in less than 800 cycles, leading to a single-chip, real-time MPEG-2 decoding system.
10.
Kubosawa H. Takahashi H. Ando S. Asada Y. Asato A. Suga A. Kimura M. Higaki N. Miyake H. Sato T. Anbutsu H. Tsuda T. Yoshimura T. Amano I. Kai M. Mitarai S. 《Solid-State Circuits, IEEE Journal of》1998,33(11):1640-1648
We have designed a microprocessor that is based on a single instruction multiple data stream (SIMD) architecture. It features a two-way superscalar architecture for multimedia embedded systems that especially need to support MPEG2 video decoding/encoding and 3DCG image processing. This microprocessor meets all requirements of embedded systems, including (a) MPEG2 (MP@ML) decoding and graphic processing capabilities for three-dimensional images, (b) programming flexibility, and (c) low power consumption and low manufacturing cost. High performance was achieved through enhanced parallel processing capabilities by adopting a SIMD architecture and a two-way superscalar architecture. Programming flexibility was increased by providing 170 dedicated multimedia instructions. Low power consumption was achieved by utilizing advanced process technology and power-saving circuits. The processor supports a general-purpose RISC instruction set. This feature is important, as the processor will have to work as a controller of various target systems. The processor has been fabricated in 0.21-μm CMOS four-metal technology on a 9.84×10.12 mm die. It performs 2.16 GOPS/720 MFLOPS at an operating frequency of 180 MHz, with a power consumption of 1.2 W and a power supply of 1.8 V.
11.
《Solid-State Circuits, IEEE Journal of》2009,44(11):2943-2956
12.
Optimization of EDGE terminal power amplifiers using memoryless digital predistortion
This paper describes a lookup-table (LUT)-based digital predistortion system usable for enhanced data for global system for mobile evolution (EDGE) handset transmitters. The system is memoryless and capable of improving average efficiency and performance in terms of the leakage power at offset frequencies and error vector magnitude. The obtainable efficiency at maximum linear output power is comparable to that of commercial EDGE power amplifiers (PAs), and superior at backoffs. Minimum system requirements on word length and LUT size have been investigated, showing that a LUT with approximately 500 coefficients and a system word length of 13 bits are sufficient for EDGE. The proposed system is simple compared to basestation implementations comprising PA memory compensation and can be easily implemented in handsets in order to improve the overall system performance. The effects of antenna mismatch on system performance have been investigated.
13.
The current forward error correction (FEC) scheme for very high bit-rate digital subscriber line (VDSL) systems in the ANSI
standard employs a 16-state four-dimensional (4D) Wei code as the inner code and the Reed-Solomon (RS) code as the outer code.
The major drawback of this scheme is that further improvement cannot be achieved without a substantial increase in the complexity
and power penalty. Also, a VDSL system employing the 4D Wei-RS scheme operates far below the channel capacity. In 1993, powerful
turbo codes were introduced whose performance closely approaches the Shannon limit. In this paper, we propose a bandwidth
and power efficient turbo coding scheme for VDSL modems in order to obtain high data rates, extended loop reach and increased
transmission robustness. We also propose a pipelined decoding scheme to reduce the latency at the receiver end. The objective
of the proposed scheme is to provide a higher coding gain than that given by the 4D Wei-RS scheme, resulting in an improved
performance of the VDSL modems in terms of bit rate, loop length and transmitting power. The scheme is investigated for various
values of transmitting power, signaling frequencies and numbers of crosstalkers for a targeted bit error rate of 10⁻⁵ and is implemented in a system with a quadrature amplitude modulation in which a mixed set partitioning mapping is employed
to reduce the decoding complexity. The effects of code complexity, interleaver length, the number of decoding iterations and
the level of modulation on the performance of VDSL modems are explored. Simulation results are presented and compared to those
of the 4D Wei-RS scheme. The results show that the choice of turbo codes not only provides a significant coding gain over
the standard FEC scheme but also efficiently maximizes the loop length and bit rate at a very low transmitting power in the
presence of dominant far-end crosstalk and intersymbol interference. In order to compare the hardware complexity, we synthesize
the proposed and 4D Wei-RS schemes using SYNOPSYS with the target technology of Xilinx 4020e-3. The Xilinx field programmable
gate array statistics of the proposed scheme are compared with those of the 4D Wei-RS scheme.
14.
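Based on the structural characteristics of SCG-LDPC codes, an efficient hierarchical reliable belief propagation (HRBP) decoding algorithm is proposed. The algorithm combines layered iteration with a reliability decision measure to effectively reduce the number of variable nodes in subsequent iterations while accelerating convergence. Simulations are carried out for the SCG-LDPC(3969,3720) code, which is suited to optical transmission systems. The results show that, compared with the conventional BP algorithm, the HRBP algorithm greatly reduces the computational load while maintaining performance. At a threshold of 15, the bit error rate performance of the HRBP decoding algorithm is comparable to that of the BP decoding algorithm, but the number of variable nodes in subsequent iterations is reduced by about 69% at high signal-to-noise ratio. As the threshold increases further, the HRBP algorithm gradually degenerates into the layered belief propagation (Layered-BP) decoding algorithm.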
15.
Szu-Wei Lee C.-C. Jay Kuo 《Journal of Visual Communication and Image Representation》2011,22(6):557-562
In this work, we propose a novel entropy coding mode decision algorithm to balance the tradeoff between the rate-distortion (R-D) performance and the entropy decoding complexity for the H.264/AVC video coding standard. Context-based adaptive binary arithmetic coding (CABAC), context-based adaptive variable length coding (CAVLC), and universal variable length coding (UVLC) are three entropy coding tools adopted by H.264/AVC. CABAC can be used to encode both the texture and the header data, while CAVLC and UVLC are employed to encode the texture and the header data, respectively. Although CABAC can provide better R-D performance than CAVLC/UVLC, its decoding complexity is higher. Thus, when the entropy decoding complexity is taken into account, CABAC may not be the best tool, which motivates us to examine the entropy coding mode decision problem in depth. It will be shown experimentally that the proposed mode decision algorithm can help the encoder generate bit streams that can be decoded at much lower complexity with little R-D performance loss.
16.
An effective hierarchical reliable belief propagation (HRBP) decoding algorithm is proposed according to the structural characteristics of systematically constructed Gallager low-density parity-check (SCG-LDPC) codes. The novel decoding algorithm combines layered iteration with a reliability judgment, and can greatly reduce the number of variable nodes involved in the subsequent iteration process and accelerate the convergence rate. Simulation results for the SCG-LDPC(3969,3720) code show that the novel HRBP decoding algorithm can greatly reduce the amount of computation while maintaining performance, compared with the traditional belief propagation (BP) algorithm. The bit error rate (BER) of the HRBP algorithm is comparable to that of the BP algorithm at a threshold value of 15, but in the subsequent iteration process, the number of variable nodes for the HRBP algorithm can be reduced by about 70% at high signal-to-noise ratio (SNR) compared with the BP algorithm. When the threshold value is further increased, the HRBP algorithm gradually degenerates into the layered-BP algorithm, but at a BER of 10⁻⁷ and a maximum iteration number of 30, the net coding gain (NCG) of the HRBP algorithm is 0.2 dB more than that of the BP algorithm, and the average number of iterations can be reduced by about 40% at high SNR. Therefore, the novel HRBP decoding algorithm is more suitable for optical communication systems.
17.
The Viterbi decoder is a common module in communication systems, where it must offer low power and low decoding latency.
The conventional register exchange (RE) algorithm and the memory-based trace-back (TB) algorithm cannot meet both the power
and the decoding latency constraints. In this paper, we propose a new Survivor Memory Unit (SMU) algorithm, named the State Exchange
(SE) algorithm. The SE algorithm uses the trace-forward unit (TFU) to run the decoding operation for low decoding latency. In addition, we enhance the SE algorithm with the concept of
trace-back (TB). Based on this enhancement, we propose two types of SE-SMU. The proposed type-I SE-SMU has a lower register requirement but
a long critical path. The proposed type-II SE-SMU can support high-speed requirements at the cost of additional TFUs and
latency. Both proposed SE-SMUs have a decoding latency slightly higher than that of the RE-SMU. We synthesized
the proposed architecture in TSMC 0.13 μm technology. Both approaches have fewer active registers during decoding. From the power analysis, the proposed SE-SMUs give
a 70% power reduction compared with the RE-SMU at 100 MHz with a decoding length of 96. The power saving ratio increases
further with longer decoding lengths.
18.
Sang‐Hyo Kim Hosung Park Jong‐Seon No Dong‐Joon Shin 《International Journal of Communication Systems》2013,26(11):1475-1484
Over severely unreliable channels, decoding of error-correcting codes frequently fails, which costs a great deal of computation, especially in iterative decoding algorithms. In hybrid automatic repeat request systems, most of the computation power is wasted on failed decoding if a codeword is retransmitted many times. Therefore, early stopping of iterative decoding needs to be adopted. In this paper, we propose a new stopping algorithm for iterative belief propagation decoding of low-density parity-check codes, which is effective in both high and low signal-to-noise ratio ranges and scalable to variable code rates and lengths. The proposed stopping algorithm combines several good stopping criteria. Each criterion is extremely simple and will not be a burden to the overall system. With the proposed stopping algorithm, it is shown via numerical analysis that the decoding complexity of a hybrid automatic repeat request system with an adaptive modulation and coding scheme can be considerably reduced. Copyright © 2012 John Wiley & Sons, Ltd.
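The abstract does not list the individual stopping criteria, so the sketch below only illustrates how two simple criteria might be combined. Both are assumptions chosen for the example and are not the criteria of the paper: stop with success when all parity checks are satisfied, and give up when the number of unsatisfied checks has not improved over a short window. The function names, the `window` parameter, and the toy parity-check matrix are hypothetical.

```python
def unsatisfied_checks(H, hard_bits):
    """Count parity checks (rows of H) violated by the hard decisions."""
    return sum(
        sum(h & b for h, b in zip(row, hard_bits)) % 2
        for row in H
    )


def should_stop(history, window=3):
    """Decide whether to stop, given unsatisfied-check counts per iteration.

    Returns "success", "give_up", or None (keep iterating).
    """
    if history and history[-1] == 0:
        return "success"  # zero syndrome: a valid codeword was found
    if len(history) > window and min(history[-window:]) >= history[-window - 1]:
        return "give_up"  # no improvement over the last `window` iterations
    return None


# Toy parity-check matrix, for illustration only.
H = [[1, 1, 0],
     [0, 1, 1]]
print(unsatisfied_checks(H, [1, 1, 1]), unsatisfied_checks(H, [1, 0, 0]))
print(should_stop([3, 1, 0]), should_stop([5, 5, 6, 5]), should_stop([5, 4, 3, 2]))
```

The first criterion saves iterations at high signal-to-noise ratio, where decoding usually succeeds early. The second saves them at low signal-to-noise ratio, where decoding is unlikely to succeed and a retransmission would be requested anyway.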
19.
《Broadcasting, IEEE Transactions on》2002,48(3):237-245
Techniques using Reed-Solomon (RS) codes to recover lost packets in digital video/audio broadcasting and packet switched network communications are reviewed. Usually, different RS codes and their corresponding encoders/decoders are designed and utilized to meet different requirements for different systems and applications. We incorporate these techniques into a variable RS code and present encoding and decoding algorithms suitable for the variable RS code. A mother RS code can be used to produce a variety of RS codes, and the same encoder/decoder can be used for all the derivative codes by adding/detecting zeros, removing some parity symbols and adding erasures. A VLSI implementation for erasure decoding of the variable RS code is described and the achievable performance is quantitatively analyzed. A typical example shows that the signal processing speed is up to 2.5 Gbits/second and the processing delay is less than one millisecond when the decoder is integrated on a single chip. Therefore, the proposed algorithm and the encoder/decoder can be used universally for different applications with various requirements, such as transmission data rate, packet length, packet loss protection capacity, as well as layered protection and adaptive redundancy protection in DVB/DAB, Internet and mobile Internet communications.