1.
2.
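Reconfigurable systems offer flexibility within an application domain and performance close to that of dedicated circuits, which makes them a strong hardware option for video decoding. Real-time high-definition decoding on a reconfigurable system is still difficult, however: about 80% of the computation goes to compute-intensive tasks such as IDCT (inverse discrete cosine transform), MC (motion compensation), intra-prediction, and deblocking (deblocking filtering). Based on a coarse-grained reconfigurable processor, this paper proposes mapping schemes for these compute-intensive algorithms. The resulting performance exceeds that of the 2007 and 2009 schemes of M. Ganesan and D. Peng and meets the requirements of H.264 high-definition real-time decoding.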
3.
4.
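A general macroblock-parallel H.264 intra-frame decoding order is designed that avoids data conflicts during decoding. On this basis, a macroblock-parallel intra-prediction decoding unit with reusable memory and computing units is designed, which increases decoding speed while keeping resource overhead as low as possible. Speed tests of the parallel decoder and DC synthesis results verify the effectiveness of the VLSI architecture of the reusable macroblock-parallel intra decoder; the average decoding speed reaches 113 cycles per macroblock.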
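The abstract does not spell out the decoding order it uses. As a minimal sketch of one commonly used conflict-free order, assuming each macroblock depends on its left, top-left, top, and top-right neighbours, a wavefront schedule places macroblock (x, y) in time slot x + 2y. The function name `wavefront_schedule` and the grid sizes are hypothetical and are not taken from the paper.

```python
def wavefront_schedule(mb_cols, mb_rows):
    """Group macroblocks into time slots for a wavefront order.

    Assumption (not from the abstract): macroblock (x, y) depends on its
    left, top-left, top, and top-right neighbours. Placing it in slot
    x + 2*y puts every one of those neighbours in an earlier slot, so all
    macroblocks sharing a slot can be decoded in parallel.
    """
    slots = {}
    for y in range(mb_rows):
        for x in range(mb_cols):
            slots.setdefault(x + 2 * y, []).append((x, y))
    # Return the groups ordered by slot index.
    return [slots[t] for t in sorted(slots)]


if __name__ == "__main__":
    for t, group in enumerate(wavefront_schedule(4, 3)):
        print(t, group)
```

In this sketch the number of slots grows as roughly cols + 2 * rows rather than cols * rows, which is where the parallel speed-up would come from under the stated dependency assumption.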
5.
6.
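This paper analyzes the operational characteristics of the main block cipher algorithms and the features of their processing structures. Drawing on design methods for reconfigurable processing structures, it proposes a reconfigurable cipher processing structure and implements a verification prototype based on it. The analysis shows that all block cipher algorithms executed on the prototype achieve relatively high performance.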
7.
8.
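For distributed video coding systems without a feedback channel, a robust reconstruction algorithm for Wyner-Ziv frames is proposed. To address the loss of video quality caused by bit-plane decoding errors, DC and AC coefficients are reconstructed with different methods. In particular, for DC coefficient quantization values that fail to decode, the algorithm uses correlation information from the original image at the encoder to adaptively adjust the contributions of the side-information quantization value and the failed quantization value to the reconstruction. Experimental results show that, compared with the minimum mean square error reconstruction algorithm, the proposed algorithm effectively improves the average PSNR (peak signal-to-noise ratio) of the decoded video and clearly improves its subjective quality.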
9.
10.
11.
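As one of the most computation-intensive modules, motion compensation accounts for about 70% of the bandwidth between the decoder and off-chip data memory and is the bottleneck in ultra-high-definition video decoding. The Cache-based HEVC motion compensation module designed here reduces bandwidth consumption by 80% while sustaining the data throughput needed for real-time decoding. First, a motion compensation module with parallel, pipelined data processing is built from an interpolation module composed of reusable filters and a 2D Cache, meeting the high throughput demands of the computation. Second, an efficient internal RAM structure is designed, and an effective solution for reducing on-chip Cache power consumption is proposed. Finally, the data correlation of reference frames is exploited to reorder interpolation, reducing the Cache hardware overhead by 87.5%. Experimental results on HEVC standard test video sequences based on HM9.0 show that the design significantly reduces bandwidth consumption and hardware overhead.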
12.
Tie-Bin Wu Heng-Zhu Liu Peng-Xia Liu Dong-Sheng Guo Hai-Ming Sun 《Journal of Electronic Testing》2013,29(4):585-600
Input vector monitoring concurrent on-line BIST based on multilevel decoding logic is an attractive approach to reducing hardware overhead. In this paper, a novel optimization scheme is proposed for further reducing the hardware overhead of the decoding structure; it comprises improved decoding, input reduction, and simulated-annealing input swapping. Furthermore, using similar multilevel decoding logic as the response verifier, a novel cost-efficient input vector monitoring concurrent on-line BIST scheme is presented. The proposed scheme is applicable to concurrent on-line testing of a CUT whose details cannot be obtained, such as hard IP cores. Experimental results indicate that the proposed optimization approaches significantly reduce the hardware overhead of the decoding structure, and that the proposed scheme incurs lower hardware cost than other existing schemes.
13.
Michalis D. Galanis Athanassios Milidonis Athanassios P. Kakarountas Costas E. Goutis 《Microelectronics Journal》2006,37(6):554-564
In this paper, we propose a method for speeding up Digital Signal Processing applications by partitioning them between reconfigurable hardware blocks of different granularity and mapping critical parts of the applications onto coarse-grain reconfigurable hardware. The reconfigurable hardware blocks are embedded in a heterogeneous reconfigurable system architecture. The fine-grain part is implemented by an embedded FPGA unit, while for the coarse-grain reconfigurable hardware our developed high-performance coarse-grain data-path is used. The design flow mainly consists of three steps: the analysis procedure, the mapping onto coarse-grain blocks, and the mapping onto the fine-grain hardware. In this work, the methodology is validated using five real-life applications: an OFDM transmitter, a medical imaging technique, a wavelet-based image compressor, a video compression scheme, and a JPEG encoder. The experimental results show that the speedup, relative to an all-FPGA solution, ranges from 1.55 to 4.17 for the considered applications.
14.
Benaissa M. Yiqun Zhu 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(3):555-565
A novel reconfigurable sequential decoder architecture based on the Fano algorithm is presented in which the constraint length, the threshold spacing, and the time-out threshold are all run-time reconfigurable. To maximize decoding performance, the maximum possible backward depth (that of a whole frame) is supported. This is achieved by using shift registers combined with memory to store the information of an entire visited path. A field-programmable gate array (FPGA) prototype of the decoder is built, and actual hardware decoding performance in terms of decoding speed, bit error rate (BER), and buffer overflow rate is obtained and compared. To overcome the decoding delay that is inherent in sequential decoders, a hybrid scheme including simple block codes and a cyclic redundancy check is proposed to limit the number of backward search operations that the sequential decoder has to execute. As a result, a significant reduction in decoding delay and buffer overflow rate is achieved while maintaining comparable decoding performance in terms of BER.
15.
16.
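Turbo codes offer excellent performance approaching the Shannon capacity limit. This paper introduces the Turbo coding scheme used in deep-space communication and the corresponding decoding algorithm, and presents both the software-simulated performance and the measured hardware performance of the deep-space CCSDS standard Turbo code with a modified Max-Log-Map decoding algorithm. Computer simulation and hardware measurements show that a Turbo decoder using this modified Max-Log-Map algorithm is easy to implement in hardware, and that the simulated and measured performance agree, making it suitable for practical engineering applications.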
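The abstract does not describe what its modification to Max-Log-Map consists of. As background only, the sketch below shows the standard approximation that gives Max-Log-MAP its name: the exact Jacobian logarithm log(e^a + e^b) used in Log-MAP is replaced by max(a, b), dropping a correction term that never exceeds log 2. The function names are hypothetical.

```python
import math


def max_star(a, b):
    """Exact Jacobian logarithm used by Log-MAP: log(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a, b):
    """Max-Log-MAP approximation: keep only the max, drop the correction."""
    return max(a, b)


if __name__ == "__main__":
    for a, b in [(0.0, 0.0), (1.0, 3.0), (-2.0, 5.0)]:
        exact = max_star(a, b)
        approx = max_log(a, b)
        # The gap is largest when a == b and shrinks as |a - b| grows.
        print(a, b, exact, approx, exact - approx)
```

Removing the logarithm and exponential is what makes this family of decoders attractive for hardware, at the cost of a small approximation error in each metric update.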
17.
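Turbo product codes are a class of block codes for which high-speed iterative decoding is easy to implement in hardware. The soft-input soft-output iterative decoding algorithm for Turbo product codes is analyzed. Combining Turbo product codes with QAM modulation, a simplified joint demodulation and decoding scheme that is convenient for hardware implementation is proposed. Simulation results show that the simplification has little effect on decoding performance.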
18.
《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1999,7(1):135-138
This paper presents novel very large scale integration (VLSI) architectures in support of an efficient implementation of Leighton's well-known Columnsort. The designs take advantage of reconfigurable bus architectures enhanced with simple shift switches. Our first main contribution is to show that Columnsort can be partitioned into two components: a hardware scheme involving the task of sorting arrays of small size and a hardware or software scheme that involves simple data movement tasks. Our second main contribution is to demonstrate that the dynamically reconfigurable mesh architecture can be exploited to obtain a small and efficient hardware sorter. The resulting architectures feature high regularity of circuitry, simplicity of control structure, and adaptability. Both theoretical analyses and simulation tests have shown that the proposed VLSI architectures for sorting are superior to existing designs in the context of sorting small and moderate size arrays.
19.
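The encoding and decoding principles of the ITU-G.729 speech compression standard are introduced, a DSP-based software and hardware design is proposed, and several key techniques in the implementation are discussed in detail.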