Similar literature
 20 similar documents found; search took 15 ms
1.
A high throughput parallel decoding method is developed for context-based adaptive variable length codes. In this paper, several new design ideas are devised and implemented for scalable parallel processing, a reduction in area, and a reduction in power requirements. First, simplified logical operations instead of memory lookups are used for parallel processing. Second, the codes are grouped based on their lengths for efficient logical operation. Third, up to M bits of the input stream can be analyzed simultaneously. For comparison, we designed a logical-operation-based parallel decoder for M=8 and a conventional parallel decoder. High-speed parallel decoding becomes possible with our method. In addition, for similar decoding rates (1.57 codes/cycle for M=8), our new approach uses 46% less chip area than the conventional method.

2.
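To address the long lookup time of TLSS, the standard table-lookup decoding method for CAVLC in H.264/AVC, this paper proposes a new CAVLC decoding lookup optimization based on fast hash-table search. Introducing hash-table lookup into CAVLC decoding speeds up table lookup and shortens the time needed to obtain codewords from the irregular variable-length code tables (UVLCT), thereby reducing the overall CAVLC lookup time. Simulation results show that, with no loss in decoded video quality, the proposed algorithm improves table lookup time by about 18% to 22% compared with the standard TLSS method.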
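The idea of replacing a sequential table scan with a hash lookup can be sketched as follows. This is only an illustration under stated assumptions: `TOY_TABLE` is a made-up prefix-free code, not one of the H.264 CAVLC tables, and the key layout `(length, value)` is a hypothetical choice rather than the hash function used in the paper.

```python
# Toy prefix-free code (ASSUMPTION: illustrative only, not an H.264 CAVLC table).
TOY_TABLE = {
    "1": "A",
    "01": "B",
    "001": "C",
    "0001": "D",
    "0000": "E",
}


def build_hash_table(table):
    """Key each codeword by (bit length, integer value) for O(1) average lookup."""
    return {(len(bits), int(bits, 2)): symbol for bits, symbol in table.items()}


def decode(bitstring, hashed, max_len):
    """Decode a string of '0'/'1' characters using hash lookups only."""
    symbols = []
    pos = 0
    while pos < len(bitstring):
        value = 0
        for length in range(1, max_len + 1):
            if pos + length > len(bitstring):
                raise ValueError("truncated bitstream")
            # Extend the candidate codeword by one bit.
            value = (value << 1) | int(bitstring[pos + length - 1])
            symbol = hashed.get((length, value))
            if symbol is not None:
                symbols.append(symbol)
                pos += length
                break
        else:
            # No codeword of any allowed length matched.
            raise ValueError("invalid codeword")
    return symbols


HASHED = build_hash_table(TOY_TABLE)
MAX_LEN = max(len(bits) for bits in TOY_TABLE)
```

Each candidate length costs one dictionary probe, so the work per codeword is bounded by the longest code length instead of by the number of table entries.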

3.
Polar codes have recently received considerable attention from researchers because they are proven to approach channel capacity at large codeword lengths. However, the decoding latency grows significantly with codeword length, making implementation impossible for latency-constrained applications. To tackle this problem, this paper proposes a polar decoder architecture based on radix-4 processing units, with a special last-stage processing unit that decodes up to 16 bits in the same clock cycle. In addition, it proposes decoding extended special subcodes to reduce latency. Moreover, it uses a partial-sum look-ahead technique, resulting in a high-throughput, low-latency decoding architecture.

4.
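Because CAVLC (context-based adaptive variable length coding) codewords have no fixed length, the CAVLC decoder is often one of the hardest parts of a video decoder to design. This paper studies the H.264 entropy decoding module and applies the idea of grouped, optimized table lookup. After analyzing the characteristics of the CAVLC code tables, it proposes a grouped decoding optimization that splits each CAVLC codeword into a prefix and a suffix, groups codewords by prefix, and uses the suffix to look up the corresponding bit string. Results show that the proposed grouped CAVLC decoding algorithm performs very well in both saving storage space and increasing decoding speed.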
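A minimal sketch of prefix-grouped decoding is given below. It assumes, purely for illustration, that the prefix is a run of leading zeros terminated by a `1` and that each group has a fixed suffix length; `TOY_GROUPS` and its symbols are hypothetical and do not reproduce the CAVLC tables or the grouping used in the paper.

```python
# ASSUMPTION: toy grouping, keyed by the number of leading zeros in the prefix.
# Each entry is (suffix length in bits, small per-group lookup table).
TOY_GROUPS = {
    0: (0, ["a"]),
    1: (1, ["b", "c"]),
    2: (2, ["d", "e", "f", "g"]),
}


def decode_one(bits, pos):
    """Decode one codeword starting at bits[pos]; return (symbol, next position)."""
    zeros = 0
    while pos < len(bits) and bits[pos] == "0":
        zeros += 1
        pos += 1
    if pos >= len(bits) or zeros not in TOY_GROUPS:
        raise ValueError("invalid or truncated prefix")
    pos += 1  # skip the '1' that terminates the prefix
    suffix_len, table = TOY_GROUPS[zeros]
    if pos + suffix_len > len(bits):
        raise ValueError("truncated suffix")
    # The suffix indexes directly into the small table of the selected group.
    index = int(bits[pos:pos + suffix_len], 2) if suffix_len else 0
    return table[index], pos + suffix_len


def decode_all(bits):
    """Decode a whole string of '0'/'1' characters."""
    symbols = []
    pos = 0
    while pos < len(bits):
        symbol, pos = decode_one(bits, pos)
        symbols.append(symbol)
    return symbols
```

The prefix selects a group and the suffix selects an entry within it, so only several small tables are stored rather than one table indexed by every possible bit pattern.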

5.
In this paper, we propose a flexible turbo decoding algorithm for a high order modulation scheme that uses a standard half-rate turbo decoder designed for binary/quadrature phase-shift keying (B/QPSK) modulation. A transformation applied to the incoming I-channel and Q-channel symbols allows the use of an off-the-shelf B/QPSK turbo decoder without any modifications. Iterative codes such as turbo codes process the received symbols recursively to improve performance. As the number of iterations increases, the execution time and power consumption also increase. The proposed algorithm reduces the latency and power consumption by combining the radix-4, dual-path processing, parallel decoding, and early-stop algorithms. We implement the proposed scheme on a field-programmable gate array and compare its decoding speed with that of a conventional decoder. The results show that the proposed flexible decoding algorithm is 6.4 times faster than the conventional scheme.

6.
Shao Zhen, Zheng Shibao, Yang Yuhong. 电视技术 (Video Engineering), 2006, (3): 21-23, 27
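This paper reviews the development and trends of SoC and proposes an optimized H.264 decoder architecture based on an SoC platform. The design adopts a flexible frame/field adaptive decoding strategy, pipelines the modules with demanding bus timing requirements, and time-division multiplexes the bus. In the variable-length decoding part, control is separated for each functional module. Besides effectively lowering the required clock frequency, these optimizations also provide a degree of compatibility with other video compression standards such as MPEG-2. Finally, the design is implemented and experimental results are given.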

7.
A low-complexity design architecture for implementing the Successive Cancellation (SC) decoding algorithm for polar codes is presented. Polar decoders are typically implemented in hardware using SC decoding because of the algorithm's low complexity. The merged processing element (MPE) block is the main contributor to the area of the SC decoder, as it incorporates numerous sign and magnitude conversions. The two's complement method is typically used in the MPE block of an SC decoder. In this paper, a low-complexity MPE architecture with minimal two's complement conversion is proposed. A reformulation is also applied to the merged processing elements at the final stage of the SC decoder to generate two output bits at a time. The proposed merged processing element thereby reduces the hardware complexity of the SC decoder and also reduces latency by an average of 64%. An SC decoder with code length 1024 and code rate 1/2 was designed and synthesized using 45-nm CMOS technology. The implementation results of the proposed decoder show a significant improvement in the Technology Scaled Normalized Throughput (TSNT) value and an average 48% reduction in hardware complexity compared to prevalent SC decoder architectures. Compared to the conventional SC decoder, the presented method showed a 23% reduction in area.
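For readers unfamiliar with what a processing element computes, the sketch below shows the textbook SC update functions for a length-2 polar kernel, using the common min-sum approximation for the upper branch. It is not the MPE architecture of the paper. The function names and the LLR convention (positive LLR means bit 0) are assumptions made for this illustration.

```python
def f(a, b):
    """Upper-branch LLR update (min-sum approximation)."""
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    return sign * min(abs(a), abs(b))


def g(a, b, u):
    """Lower-branch LLR update, given the already decided bit u (0 or 1)."""
    return b + (1 - 2 * u) * a


def hard_decision(llr, frozen=False):
    """Frozen bits are forced to 0; otherwise a non-negative LLR decodes to 0."""
    if frozen:
        return 0
    return 0 if llr >= 0 else 1


def decode_pair(l1, l2, frozen=(False, False)):
    """Decode both bits of a length-2 kernel from the two channel LLRs."""
    u1 = hard_decision(f(l1, l2), frozen[0])
    u2 = hard_decision(g(l1, l2, u1), frozen[1])
    return u1, u2
```

At the last stage both `f` and `g` operate on the same pair of LLRs, which is why a merged element can emit two decided bits in one step.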

8.
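To increase the decoding rate of the CAVLC decoder, an optimized CAVLC decoder architecture is proposed, consisting mainly of a level decoding module and a RunBefore decoding module. The level decoding module decodes level magnitudes with a pseudo-parallel structure, achieving one level per half clock cycle. A fast method for merging RunBefore with level forms the residual coefficients as soon as RunBefore decoding completes. An RTL model of the optimized architecture was built and its functional correctness verified. Synthesis with Xilinx ISE 13.3 shows that the design supports real-time decoding of 1080p high-definition video.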

9.
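The H.264 standard defines context-based adaptive variable length coding (CAVLC) in its codec modules. For real-time processing, excessive computation in this part slows down the whole system. This paper studies the H.264 entropy decoding module and, after analyzing the characteristics of the CAVLC code tables, applies the idea of grouped, optimized table lookup to propose a fast variable-length entropy decoding method based on grouping by code prefix. Results show that, with no loss of quality in the entropy decoding module, the method makes the H.264 decoder more than 4 times faster.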

10.
In this work, we propose a novel entropy coding mode decision algorithm to balance the tradeoff between the rate-distortion (R-D) performance and the entropy decoding complexity for the H.264/AVC video coding standard. Context-based adaptive binary arithmetic coding (CABAC), context-based adaptive variable length coding (CAVLC), and universal variable length coding (UVLC) are three entropy coding tools adopted by H.264/AVC. CABAC can be used to encode the texture and the header data while CAVLC and UVLC are employed to encode the texture and the header data, respectively. Although CABAC can provide better R-D performance than CAVLC/UVLC, its decoding complexity is higher. Thus, by taking the entropy decoding complexity into account, CABAC may not be the best tool, which motivates us to examine the entropy coding mode decision problem in depth. It will be shown experimentally that the proposed mode decision algorithm can help the encoder generate the bit streams that can be decoded at much lower complexity with little R-D performance loss.

11.
Context-based adaptive variable length coding (CAVLC) and universal variable length coding (UVLC) are two entropy coding tools that are supported in all profiles of H.264/AVC coders. In this paper, we investigate the relationship between the bit rate and the CAVLC/UVLC decoding complexity. This relationship can help the encoder choose the best coding parameter to yield the best tradeoff between the rate, distortion, and the decoding complexity performance. A practical application of CAVLC/UVLC decoding complexity reduction is also discussed.

12.
In this paper, we propose and present implementation results of a high-speed turbo decoding algorithm. The latency caused by (de)interleaving and iterative decoding in a conventional maximum a posteriori turbo decoder can be dramatically reduced with the proposed design. The latency reduction comes from the combination of the radix-4, center to top, parallel decoding, and early-stop algorithms. This reduced latency enables the use of the turbo decoder as a forward error correction scheme in real-time wireless communication services. The proposed scheme results in a slight degradation in bit error rate performance for large block sizes because the effective interleaver size in a radix-4 implementation is reduced to half, relative to the conventional method. To prove the latency reduction, we implemented the proposed scheme on a field-programmable gate array and compared its decoding speed with that of a conventional decoder. The results show an improvement of at least fivefold for a single iteration of turbo decoding.

13.

Successive-cancellation list (SCL) decoding for polar codes has the disadvantage of high latency owing to serial operations. To improve the latency, several algorithms with additional circuits have been proposed, but they increase the area. This paper proposes a fast multibit decision method with high area efficiency based on the SCL decoding algorithm. First, multiple bits can be determined at once to reduce clock cycles, using new nodes represented by the information bits and frozen bits. The new nodes proposed in this paper are called the combined nodes and the other node. The combined nodes, which combine redundant operations of the fast-simplified SC (fast-SSC) algorithm, can increase area efficiency. The other node, which covers bit patterns other than the node types of the fast-SSC algorithm, performs an 8-bit multibit decision to reduce the number of decoding cycles. Latency is further reduced by applying a sphere decoding method to the other node. In addition, a sorter is proposed to reduce the critical path delay. Although a large number of path metrics causes sorter delays, the proposed sorter can achieve high throughput with a small area. The proposed (1024, 512) SCL decoder showed negligible performance degradation in simulation using Matlab and was synthesized using 65 nm CMOS technology. The proposed decoder achieves about 1.3 Gbps with a small area. As a result, the area-throughput efficiency is at least 1.4 times higher than that of state-of-the-art works operating above 1 Gbps.


14.
In this paper we present a base-matrix-based decoder architecture for the multi-rate QC-LDPC codes proposed for a broadband broadcasting system. We use the Modified Min-Sum Algorithm (MMSA) as the decoding algorithm in this architecture, which lowers the complexity of the LDPC decoder while keeping almost the same performance or even improving it. Based on this algorithm, we designed a novel check node processing unit to reduce the complexity of the decoder and facilitate multiplexing of the processing units. The decoder, designed under hardware constraints, is not only scalable in throughput but also easily configurable to support different QC-LDPC codes with flexible code rates and code lengths.
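The abstract does not say which modification of min-sum is used. As one common variant, the sketch below shows a normalized min-sum check node update, where the scaling factor `alpha` (and its default value) is an assumption made for this example and not a parameter taken from the paper.

```python
def check_node_update(llrs, alpha=0.75):
    """Normalized min-sum check node update.

    For each incoming LLR, the outgoing message is alpha times the product of
    the signs and the minimum magnitude of all OTHER incoming LLRs.
    ASSUMPTION: alpha is an illustrative normalization factor.
    """
    if len(llrs) < 2:
        raise ValueError("a check node needs at least two inputs")
    mags = [abs(x) for x in llrs]
    # Only the two smallest magnitudes are ever needed.
    min1_idx = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[min1_idx]
    min2 = min(m for i, m in enumerate(mags) if i != min1_idx)
    # Sign of the product of all inputs (zero is treated as positive).
    sign_product = 1
    for x in llrs:
        if x < 0:
            sign_product = -sign_product
    out = []
    for i, x in enumerate(llrs):
        mag = min2 if i == min1_idx else min1
        # Dividing out this input's own sign gives the sign of the others.
        sign = sign_product if x >= 0 else -sign_product
        out.append(alpha * sign * mag)
    return out
```

Because every output depends only on the two smallest magnitudes and one overall sign, a hardware check node can be built from a compare-select tree and an XOR chain, which is part of what makes min-sum variants attractive for low-complexity decoders.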

15.
The problem of low complexity linear programming (LP) decoding of low-density parity-check (LDPC) codes is considered. An iterative algorithm, similar to min-sum and belief propagation, for efficient approximate solution of this problem was proposed by Vontobel and Koetter. In this paper, the convergence rate and computational complexity of this algorithm are studied using a scheduling scheme that we propose. In particular, we are interested in obtaining a feasible vector in the LP decoding problem that is close to optimal in the following sense. The distance, normalized by the block length, between the minimum and the objective function value of this approximate solution can be made arbitrarily small. It is shown that such a feasible vector can be obtained with a computational complexity which scales linearly with the block length. Combined with previous results that have shown that the LP decoder can correct some fixed fraction of errors we conclude that this error correction can be achieved with linear computational complexity. This is achieved by first applying the iterative LP decoder that decodes the correct transmitted codeword up to an arbitrarily small fraction of erroneous bits, and then correcting the remaining errors using some standard method. These conclusions are also extended to generalized LDPC codes.

16.
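Application and simulation of a Turbo decoder in data reconciliation   Total citations: 1, self-citations: 0, citations by others: 1
Turbo codes, whose decoding performance comes very close to the Shannon limit, are the best channel coding scheme to date. To reduce the bit errors caused by channel transmission and improve transmission reliability, a Turbo decoder based on the SOVA algorithm is designed, and its application to data reconciliation is described. Compared with a conventional Turbo decoder, two weighting modules are added, which improves decoding performance. Matlab simulation verifies the functional correctness of the designed Turbo decoder.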

17.
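System design of an H.264 decoder and hardware implementation of CAVLC   Total citations: 1, self-citations: 0, citations by others: 1
A hardware/software co-processing system scheme for an H.264 decoder is designed. Based on this scheme, a hardware architecture for the CAVLC decoding module is given, which uses a finite state machine to control the decoding flow and optimizes the table lookup part. Verification shows that, while keeping hardware resource usage as low as possible, the scheme meets the requirement of real-time decoding of H.264 Baseline Profile 4CIF pictures at 30 f/s (frames per second).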

18.
A lower bound to the distribution of computation for sequential decoding   Total citations: 1, self-citations: 0, citations by others: 1
In sequential decoding, the number of computations which the decoder must perform to decode the received digits is a random variable. In this paper, we derive a Paretian lower bound to the distribution of this random variable. We show that $P[C > L]$ is bounded below by a quantity of order $L^{-\rho}$, where $C$ is the number of computations which the sequential decoder must perform to decode a block of $\Lambda$ transmitted bits, and $\rho$ is a parameter which depends on the channel and the rate of the code. Our bound is valid for all sequential decoding schemes and all discrete memoryless channels. In Section II we give an example of a special channel for which a Paretian bound can be easily derived. In Sections III and IV we treat the general channel. In Section V we relate this bound to the memory buffer requirements of real-time sequential decoders. In Section VI, we show that this bound implies that certain moments of the distribution of the computation per digit are infinite, and we determine lower bounds to the rates above which these moments diverge. In most cases, our bounds coincide with previously known upper bounds to rates above which the moments converge. We conclude that the performance of systems using sequential decoding is limited by the computational and buffer capabilities of the decoder, not by the probability of making a decoding error. We further note that our bound applies only to sequential decoding, and that, in certain special cases (Section II), algebraic decoding methods prove superior.
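A short worked step may help connect a Paretian tail bound to the claim about infinite moments. The sketch assumes the bound takes the form $P[C > L] \ge A L^{-\rho}$ for all $L \ge L_0$, where the constants $A > 0$ and $L_0 > 0$ are hypothetical symbols introduced here for illustration and do not appear in the abstract. For a nonnegative random variable $C$ and any $k > 0$:

```latex
\begin{align*}
E[C^k] &= \int_0^\infty k L^{k-1}\, P[C > L]\, dL \\
       &\ge \int_{L_0}^\infty k L^{k-1} A L^{-\rho}\, dL
        = kA \int_{L_0}^\infty L^{k-1-\rho}\, dL
        = \infty \quad \text{whenever } k \ge \rho .
\end{align*}
```

Under this assumption every moment of order $k \ge \rho$ diverges, which matches the abstract's point that computation and buffer capacity, not error probability, limit sequential decoding.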

19.
This paper proposes two optimization methods based on dataflow representations and dynamic compilation that enhance the flexibility and performance of multimedia applications. These optimization methods are intended to be used in an adaptive decoding context, that is, where decoders have the ability to adapt their decoding processes according to a bitstream. This adaptation is made possible by embedding the decoding information needed to process a stream inside the coded stream. In this paper, we use dataflow representations from the upcoming MPEG Reconfigurable Media Coding (RMC) standard to supply the decoding information to adaptive decoders. The benefits claimed by MPEG RMC are a reuse of coding tools between different decoder specifications and execution scalability on different processing units with a single specification, which can target hardware and/or software platforms. These benefits are not yet achievable in practice, as these specifications are not used at the receiver side in MPEG RMC. We validate these benefits and propose two optimizations for the generation and the execution of dataflow models. The first optimization takes advantage of the reuse of coding tools to reduce the time needed to obtain (configure) enforceable decoders. The second provides an efficient, dynamic, and scalable execution according to the features of the execution platform. We show the practical impact of these two optimizations on two decoder representations compliant with the MPEG-4 Part 2 Simple Profile standard and the MPEG-4 Advanced Video Coding standard. The results show that configuration time can be reduced by a factor of 3 and the performance of decoders can be increased by 50%.

20.
Yun Feilong, Zhu Hongpeng, Lü Jing, Du Feng. 通信技术 (Communications Technology), 2015, 48(11): 1228-1233
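A low-complexity decoder is designed for LDPC codes with a quasi-cyclic structure. Exploiting the cyclic property of the parity-check matrix and a layered iterative decoding algorithm, the general layered iterative architecture is improved so that the decoder is pipelined, which effectively reduces iteration time and raises throughput. The architecture is implemented for an LDPC code of length 1200 on a Kintex7 xc7k325 chip on an FPGA platform. Results show that the decoder uses only a little over 100 slices and a few RAM blocks, effectively saving hardware resources, while decoding time is about 2/3 lower than with the general layered architecture and throughput increases by about 2 times. The results have practical value and can be applied to resource-constrained, low-rate communication.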
