1.

ReSSIM: a mixed-level simulator for dynamic coarse-grained reconfigurable processor

LIU LeiBo JIA Wen YIN ShouYi WANG Dong SUN GuanYi TANG Eugene WEI ShaoJun 《中国科学:信息科学(英文版)》2013,56(6):1-16

This paper proposes a mixed-level simulator for dynamic coarse-grained reconfigurable processors (CGRPs), called ReSSIM (reconfigurable system simulation implementation mechanism), and the corresponding simulation tool-chain, including a task compiler, profiler and debugger. A generic modeling methodology supporting convenient extension of on-chip modules is also proposed. To explore the details of the modules of interest while maintaining reasonable simulation speed, the RCA (reconfigurable computing array), the key reconfigurable device in ReSSIM, is modeled at cycle-accurate level, while the other modules are modeled at transaction level. The typical parameters of the RCA are scalable and adjustable, which helps architects explore the details of the reconfigurable device. Experiments show that the simulation speedup ranges from 9.26× to 18.39× compared with VCS (Synopsys Verilog compiler simulator) when running three computing-intensive kernel tasks of the H.264 decoding algorithm: IDCT (inverse discrete cosine transform), deblocking and MC-chroma (motion compensation). Simulation speed for a set of real applications, such as MPEG4, G.729 and EFR, is 35× slower than the corresponding native executions (i.e. measured from the real chip). The relative simulation errors are 11% less than the measured IPC (instructions per cycle) of the real chip.

2.

Hardware design and implementation of intra prediction in a dual-mode H.264/AVS video decoder

姜弢周佩海 MIN Bahadur K.C 《电子技术应用》2007,33(9):41-44

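Based on the characteristics of H.264/AVC and AVS, a hardware implementation suited to intra-prediction decoding is designed. Exploiting the similarity between the intra-prediction operations of H.264 and AVS, a reconfigurable parallel structure is proposed, which helps to raise decoding speed. Combined with the other decoder modules already designed, this structure achieves real-time decoding of near-high-definition H.264 and AVS video on an FPGA.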

3.

Relay selection with feedback beamforming information through designed sector sweep report frame for mmWave WPANs

HongYun Chu PingPing Xu Lin Sun SuHeng Zhang 《中国科学:信息科学(英文版)》2014,57(8):1-14

This paper proposes a novel relay selection strategy based on the feedback beamforming (BF) information through a designed sector sweep (SSW) report frame for millimeter-wave (mmWave) wireless personal area networks (WPANs). First, an SSW report frame compatible with the IEEE 802.11ad standard is designed. Second, an approach for collecting instantaneous channel state information (CSI) overheard during BF is devised. Third, with the aim of minimizing the outage probability and maximizing the overall network throughput capacity, the optimal relay selection problem for non-line-of-sight (NLoS) links is formulated as a bipartite graph, and the Kuhn-Munkres (KM) algorithm is used to solve it. Both theoretical analysis and simulation results show that, with CSI that accounts for NLoS conditions and relays selected according to the principle of maximizing overall network throughput capacity, the strategy improves on opportunistic relay selection in terms of overall network throughput capacity and outage probability, with minimal modifications to IEEE 802.11ad.
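
The bipartite formulation described in this abstract can be illustrated with a small sketch. It assumes a hypothetical table of achievable throughputs for every (NLoS link, candidate relay) pair and looks for the one-to-one assignment with the largest total. For brevity it enumerates permutations instead of implementing the Kuhn-Munkres algorithm, which finds the same optimum in polynomial time; the function name and all numbers are illustrative and not taken from the paper.

```python
from itertools import permutations


def best_relay_assignment(throughput):
    """Return (total, assignment) maximizing the summed throughput.

    throughput[i][j] is a hypothetical rate of link i when served by relay j.
    Brute force over permutations: a stand-in for Kuhn-Munkres on tiny inputs.
    Assumes there are at least as many relays as links.
    """
    n_links = len(throughput)
    n_relays = len(throughput[0])
    best_total, best_assign = float("-inf"), None
    for relays in permutations(range(n_relays), n_links):
        total = sum(throughput[i][r] for i, r in enumerate(relays))
        if total > best_total:
            best_total, best_assign = total, relays
    return best_total, best_assign


# Illustrative rates only (rows: links, columns: relays).
rates = [
    [3.0, 1.0, 2.0],
    [2.0, 4.0, 1.0],
    [3.5, 2.0, 1.5],
]
total, assign = best_relay_assignment(rates)
print(total, assign)
```

Here `assign[i]` is the relay index chosen for link `i`. A greedy choice (link 0 takes relay 0 because 3.0 is its best rate) would block link 2 from its best relay, which is why a global matching is used.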

4.

Configuration Reusing in On-Line Task Scheduling for Reconfigurable Computing Systems

Maisam Mansub Bassiri Hadi Shahriar Shahhoseini 《计算机科学技术学报》2011,26(3):463-473

Reconfigurable computing systems can be reconfigured at runtime and support partial reconfiguration, which makes it possible to execute tasks in a true multitasking manner. To manage such systems at runtime, a reconfigurable operating system is needed. The main part of this operating system is the resource management unit, which performs on-line scheduling and placement of hardware tasks at runtime. Reconfiguration overhead is an important obstacle that limits the performance of on-line scheduling algorithms in reconfigurable computing systems and increases the overall execution time. Configuration reusing (task reusing) can decrease reconfiguration overhead considerably, particularly in periodic applications or applications in which the probability of task recurrence is high. In this paper, we present a technique called reusing-based scheduling (RBS) for on-line scheduling and placement, in which configuration reusing is treated as a main characteristic in order to reduce reconfiguration overhead and decrease the total execution time of the tasks. Several experiments have been conducted on the proposed algorithm. The results show considerable improvement in the overall execution time of the tasks.
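
The effect of configuration reusing on total execution time can be sketched with a toy model. This is not the RBS algorithm from the paper: it assumes tasks run strictly one after another, a fixed reconfiguration cost, a fixed number of resident configurations, and least-recently-used eviction, all of which are simplifying assumptions made here for illustration.

```python
from collections import OrderedDict


def total_time(tasks, slots, reconfig_cost, reuse=True):
    """Sum execution plus reconfiguration time for a task sequence.

    tasks: list of (config_id, exec_time) pairs, run sequentially.
    slots: how many configurations can stay resident (hypothetical).
    LRU eviction is an assumption of this sketch, not taken from the paper.
    """
    resident = OrderedDict()
    elapsed = 0
    for config_id, exec_time in tasks:
        if reuse and config_id in resident:
            resident.move_to_end(config_id)   # hit: no reconfiguration needed
        else:
            elapsed += reconfig_cost          # miss: pay the loading cost
            resident[config_id] = True
            resident.move_to_end(config_id)
            if len(resident) > slots:
                resident.popitem(last=False)  # evict least recently used
        elapsed += exec_time
    return elapsed


# A periodic workload in which two configurations alternate.
periodic = [("fft", 5), ("fir", 3), ("fft", 5), ("fir", 3), ("fft", 5)]
with_reuse = total_time(periodic, slots=2, reconfig_cost=4)
without_reuse = total_time(periodic, slots=2, reconfig_cost=4, reuse=False)
print(with_reuse, without_reuse)
```

With reuse, only the first occurrence of each configuration pays the reconfiguration cost, which is the effect the abstract describes for periodic applications.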

5.

Timing parameter analysis in mapping critical loops onto reconfigurable arrays

朱敏刘雷波尹首一王星魏少军《计算机工程》2012,38(22):260-262

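By defining core timing parameters, such as setup time and hold time, for mapping an algorithm's critical loops onto a reconfigurable array, and by analyzing practical problems such as limited memory bandwidth and the irregular topology of the algorithm's data-flow graph, this paper gives an optimization algorithm for the configuration timing model and proposes a descriptive form for parameters such as path characteristics, providing a new approach for automatic reconfigurable compilation. Verification results show that, when mapping the deblocking critical loop of the H.264 video algorithm, the optimized mapping method improves performance by 43% over the original.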

6.

Dynamically reconfigurable architecture for symmetric ciphers

Bo Wang Leibo Liu 《中国科学:信息科学(英文版)》2016,59(4):042403

In this paper, a very large scale integration (VLSI) architecture for a reconfigurable cryptographic processor is presented. Several optimization methods have been introduced into the design process. The interconnection tree between rows (ICTR) method reduces the interconnection complexity and results in a small area overhead. The hierarchical context organization (HCO) scheme reduces the total context size and increases the dynamic configuration speed. Most symmetric ciphers, including AES, DES, SHACAL-1, SMS4, and ZUC, can be implemented using the proposed architecture. Experimental results show that the proposed architecture has obvious advantages over current state-of-the-art architectures reported in the literature in terms of performance, area efficiency (throughput/area) and energy efficiency (throughput/power).

7.

An efficient parallelization technique for x264 encoder on heterogeneous platforms consisting of CPUs and GPUs

Youngsub Ko Youngmin Yi Soonhoi Ha 《Journal of Real-Time Image Processing》2014,9(1):5-18

H.264/AVC video encoders have been widely used for their high coding efficiency. Since the computational demand, which is proportional to the frame resolution, is constantly increasing, there has been great interest in accelerating H.264/AVC by parallel processing. Recently, graphics processing units (GPUs) have emerged as a viable target for accelerating general-purpose applications by exploiting fine-grain data parallelism. Despite extensive research efforts to use GPUs to accelerate the H.264/AVC algorithm, no speed-up has been achieved over x264, which is known as the fastest CPU implementation, mainly due to significant communication overhead between the host CPU and the GPU and intra-frame dependency in the algorithm. In this paper, we propose a novel motion-estimation (ME) algorithm tailored for NVIDIA GPU implementation. It is accompanied by a novel pipelining technique, called sub-frame ME processing, to effectively hide the communication overhead between the host CPU and the GPU. Further, we incorporate a frame-level parallelization technique to improve the overall throughput. Experimental results show that our proposed H.264 encoder has higher performance than the x264 encoder.

8.

Performance overhead modeling in dynamically reconfigurable hardware acceleration

苑福利宫磊娄文启陈香兰《计算机工程与应用》2022,58(6):69-79

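In recent years, with the continuing evolution of reconfigurable computing methods and reconfigurable hardware features, building run-time reconfigurable accelerators based on FPGA dynamic partial reconfiguration has become an important way to address the hardware resource limits of traditional accelerator design. However, unlike in traditional statically reconfigured accelerators, the dynamic reconfiguration overhead of the FPGA is an important factor in the overall performance of hardware acceleration, and there is currently a lack of related ... able to estimate the dynamic reconfiguration overhead accurately at an early stage of reconfigurable hardware design...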

9.

Optimization of the H.264 decoding algorithm on the Trimedia DSP (total citations: 3, self-citations: 0, citations by others: 3)

林冰冯艳李学明《计算机工程与应用》2005,41(31):41-45,89

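H.264 is the latest video coding standard and offers excellent coding performance, but its algorithmic complexity is also high, making it hard to meet the needs of real-time applications. This paper analyzes in detail the factors that affect H.264 decoding speed and proposes an optimization scheme for the Trimedia DSP platform. The scheme raises the speed of the H.264 decoding algorithm by removing unnecessary conditional checks, avoiding frequent memory accesses, optimizing memory allocation and use, applying loop unrolling judiciously, and using DSP-specific instructions. Test results show that the optimized code runs 8 times faster on average and can decode a CIF-format H.264 baseline bitstream in real time on a Trimedia DSP clocked at 200 MHz.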

10.

An efficient method for CAVLC decoding

曹宁梅侠《中国图象图形学报》2008,13(2):230-233

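Existing CAVLC decoding methods include binary-tree decoding, full-code-table decoding and Hashemian decoding, but each attends to only one aspect of decoding performance, either decoding speed or storage space, and so cannot effectively improve overall performance. To address this problem, a fast decoding method is proposed. It uses automatic code-table allocation and code-table address transfer techniques to raise decoding speed under a fixed storage budget. Experimental results show that, with the same storage space, the method is 1.5 times as fast as traditional decoding methods and is better suited to the H.264 standard.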

11.

Real-time H.264/AVC baseline decoder implementation on TMS320C6416

Imen Werda Taheni Dammak Thierry Grandpierre Mohamed Ali Ben Ayed Nouri Masmoudi 《Journal of Real-Time Image Processing》2012,7(4):215-232

The H.264/AVC Advanced Video Coding standard (AVC) is poised to enable a wide range of applications. However, its increased complexity creates a big challenge for efficient software implementations. This work develops and optimises the H.264/AVC video decoder level two on the TMS320C6416 Digital Signal Processor (DSP) for video conference applications. In order to accelerate the decoding speed, several algorithmic optimisations have been ported to inverse entropy decoding and intra-prediction modules. The parallelism between algorithm execution and data transfers was fully exploited using Enhanced Direct Memory Access (EDMA) engine. Furthermore, based on the DSP architectural features, various core-specific optimisation techniques were adopted leading to an increase in speed by up to 70%. Intensive experimental tests prove that a real-time decoding on TMS320C6416 DSP running at 720 MHz is obtained for Common Intermediate Format resolution (CIF 352 × 288).

12.

A low-complexity fast mode selection algorithm for mobile terminals

金智鹏郁梅《计算机工程与应用》2014,50(5):151-155

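To address the limited hardware computing power of mobile terminals, a high-performance, low-complexity fast inter mode selection algorithm is proposed. Based on the characteristics of the 4×4 integer DCT, an early-termination strategy for motion estimation based on all-zero block detection is proposed; and, in view of the multi-mode predictive coding of H.264, all-zero block detection of the macroblock is further used as the basis for fast selection of the inter coding mode. Experimental tests show that the algorithm effectively reduces the computational load of H.264 encoding without loss of rate-distortion performance, cutting encoding time by 31.84% on average and by up to 63.82%.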
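
The idea of stopping motion estimation early once a block is known to quantize to all zeros can be sketched as follows. The test used here is a deliberately loose sufficient condition chosen for illustration, not the paper's detector: every coefficient of the H.264 4×4 core transform is bounded in magnitude by 4 times the block's sum of absolute differences (SAD), because no product of two transform-matrix entries exceeds 4 in magnitude. The `dead_zone` parameter (the magnitude below which a coefficient is assumed to quantize to zero) and all function names are hypothetical.

```python
def sad(block):
    """Sum of absolute values of a 4x4 residual block."""
    return sum(abs(v) for row in block for v in row)


def surely_all_zero(residual, dead_zone):
    """Sufficient (not necessary) test that all coefficients quantize to zero.

    Each core-transform coefficient has magnitude at most 4 * SAD, so if
    4 * SAD is below the hypothetical dead zone, no coefficient survives.
    """
    return 4 * sad(residual) < dead_zone


def search_with_early_stop(candidates, dead_zone):
    """Scan (motion_vector, residual) pairs; stop at the first all-zero block.

    Returns the best motion vector by SAD and how many candidates were checked.
    """
    best_mv, best_sad, checked = None, None, 0
    for mv, residual in candidates:
        checked += 1
        s = sad(residual)
        if best_sad is None or s < best_sad:
            best_mv, best_sad = mv, s
        if surely_all_zero(residual, dead_zone):
            break  # further search cannot reduce the coded residual
    return best_mv, checked


# Illustrative residuals for three candidate motion vectors.
big = [[3] * 4 for _ in range(4)]
small = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
zeros = [[0] * 4 for _ in range(4)]
candidates = [((0, 0), big), ((1, 0), small), ((2, 0), zeros)]
mv, checked = search_with_early_stop(candidates, dead_zone=10)
print(mv, checked)
```

In this example the search stops after the second candidate, so the third is never evaluated. Tighter thresholds than the 4 × SAD bound would terminate earlier on more blocks.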

13.

Research on an optimization algorithm for an H.264-based multi-view stereoscopic video decoder

韩晶晶李素梅王宝亮高得鑫《计算机应用研究》2012,29(2):749-753

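Compared with planar (single-view) video, multi-view stereoscopic video multiplies the data volume, which strongly affects decoding speed and playback smoothness and has become one of the major factors limiting its wide adoption. To raise the decoding speed of multi-view stereoscopic video, and based on the H.264/MVC standard, the original code table is divided into several regions according to the characteristics of codeword prefixes, which narrows the lookup range and optimizes the CAVLC table lookup in entropy decoding; the optimized decoder is then ported to a player. Experimental results show that the proposed optimization speeds up the table lookup by about 70% and improves overall decoding time by 5.9%. A measurable decoding optimization is thus achieved, and the player can decode and play 8-view stereoscopic video files in .264 format.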

14.

Architecture and design of a coprocessor-based H.264 decoder SoC

张志勋王永栋王娟《自动化与仪器仪表》2014,(1):42-44

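An SoC system architecture for an H.264 decoder and its implementation are described. The SoC uses coprocessor-based hardware/software partitioning: by analyzing each computational stage of H.264 decoding, the hardware arithmetic units and software instructions of the corresponding coprocessor are designed. While meeting a given performance level, the system is highly flexible and easy to upgrade and extend later. Verification results show that the SoC improves decoding time by roughly 75% or more, effectively accelerating the decoder.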

15.

A fast adaptive deblocking filter algorithm for H.264/AVC (total citations: 1, self-citations: 0, citations by others: 1)

颜洪奎朱珍民沈燕飞肖建华《计算机工程与应用》2008,44(29):65-68

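Deblocking filtering plays an important role in H.264 video encoding and decoding. The theory of deblocking filtering in H.264 is re-analyzed, and a fast deblocking filter algorithm is proposed that works in units of 4×4 blocks and computes the boundary strength (Bs) separately for intra-predicted frames and inter-predicted frames. Simulation results show that the algorithm applies to deblocking in both encoding and decoding, and effectively raises deblocking efficiency without affecting the bitstream or picture quality of the existing codec.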
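
As background for the boundary strength (Bs) computation mentioned in this abstract, the following sketch encodes the usual H.264 decision rules for one edge between two neighboring 4×4 blocks in a progressive frame. It is simplified to one reference index and one motion vector per block, and it is not the paper's fast algorithm; the `Block4x4` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block4x4:
    intra: bool        # block belongs to an intra-coded macroblock
    has_coeffs: bool   # block carries non-zero transform coefficients
    ref: int           # reference picture index (unused for intra blocks)
    mv: tuple          # motion vector (x, y) in quarter-sample units


def boundary_strength(p, q, on_mb_edge):
    """Boundary strength for the edge between neighboring blocks p and q."""
    if p.intra or q.intra:
        return 4 if on_mb_edge else 3   # strongest filtering at intra edges
    if p.has_coeffs or q.has_coeffs:
        return 2
    if p.ref != q.ref:
        return 1
    # A difference of 4 quarter-samples is one full sample.
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1
    return 0                            # edge is not filtered


intra_blk = Block4x4(intra=True, has_coeffs=True, ref=0, mv=(0, 0))
skip_blk = Block4x4(intra=False, has_coeffs=False, ref=0, mv=(0, 0))
moved_blk = Block4x4(intra=False, has_coeffs=False, ref=0, mv=(5, 0))
print(boundary_strength(intra_blk, skip_blk, on_mb_edge=True))
```

One consequence visible in the code: in a frame where every block is intra coded, only the first branch can fire, so Bs depends solely on whether the edge lies on a macroblock boundary. That is one plausible reason for treating intra and inter frames separately, though the abstract does not spell out the paper's exact rule.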

16.

A fast H.264 intra prediction algorithm based on texture features

张志涛梁光明陈明生刘东华王立松《中国图象图形学报》2011,16(8):1369-1373

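H.264/AVC is the latest video compression coding standard. In intra prediction it uses rate-distortion optimization (RDO) to select the prediction mode, which markedly improves coding performance but also markedly increases coding complexity and computation. Existing typical fast intra prediction algorithms are studied, and a fast intra prediction algorithm is proposed that combines macroblock flatness features with 4×4 block texture features. The algorithm selects the block size early by judging the flatness of the macroblock and determines the set of prediction modes from the internal texture of each 4×4 block, reducing complexity. Experimental results show that, compared with JM95, with peak signal-to-noise ratio (PSNR) essentially unchanged and output bit rate slightly higher, the proposed algorithm reduces the number of RDO computations per macroblock by 71.3% on average.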

17.

Research on a fast mode selection algorithm for H.264 intra 4×4 block prediction

韩青李莉应骏《中国图象图形学报》2007,12(10):1745-1748

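In H.264 video encoding, encoding time is affected by many factors, such as inter/intra mode selection, motion estimation (ME) and rate-distortion optimization (RDO). To encode at higher speed and with good quality, a fast mode selection algorithm for H.264 intra 4×4 block prediction is proposed. The algorithm exploits the high correlation between the rate-distortion cost (RDCost) of the optimal intra 4×4 prediction mode and those of its neighboring prediction modes, as well as the strong correlation between the sum of absolute transformed differences (SATD) and rate-distortion (RD) performance, to skip unlikely prediction modes, so that intra 4×4 mode selection needs only 4 rate-distortion cost computations. Experimental results show that the algorithm achieves a good trade-off between coding performance and coding speed.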
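
The SATD measure that this abstract relies on can be sketched as a 4×4 Hadamard transform of the prediction residual followed by a sum of absolute values. The shortlist step below, which keeps the four candidate modes with the lowest SATD for full rate-distortion evaluation, is an assumed simplification: the paper's actual rule also uses the correlation between neighboring modes, which is not modeled here. Normalization of the SATD (some encoders halve it) is omitted.

```python
# 4x4 Hadamard matrix (symmetric, entries +1 or -1).
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def satd4x4(orig, pred):
    """Sum of absolute transformed differences of two 4x4 blocks (unnormalized)."""
    diff = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    # tmp = H4 * diff
    tmp = [[sum(H4[i][k] * diff[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    # out = tmp * H4^T
    out = [[sum(tmp[i][k] * H4[j][k] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return sum(abs(v) for row in out for v in row)


def shortlist(satd_by_mode, keep=4):
    """Return the `keep` mode indices with the lowest SATD, best first."""
    return sorted(satd_by_mode, key=satd_by_mode.get)[:keep]


ones = [[1] * 4 for _ in range(4)]
zeros = [[0] * 4 for _ in range(4)]
print(satd4x4(ones, zeros))

# Hypothetical SATD values for six of the nine intra 4x4 modes.
print(shortlist({0: 50, 1: 20, 2: 90, 3: 10, 4: 70, 5: 30}))
```

Only the shortlisted modes would then go through the expensive rate-distortion cost computation, instead of all nine intra 4×4 modes.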

18.

Design and implementation of a 4×4 integer transform

胡红旗孙景楠《计算机工程与应用》2005,41(33):112-114

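The H.264 coding algorithm introduces an integer transform based on 4×4 pixel blocks. This paper presents a fast VLSI implementation of the 4×4 integer transform. By introducing distributed arithmetic (DA), a fast LUT-based 4×4 integer transform is realized. The algorithm uses pipelined operation, raising the throughput of the overall video codec system. Simulation results show that the design meets functional and timing requirements and substantially increases processing speed.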
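
For reference, the 4×4 forward core transform of H.264 that this abstract builds on can be written as Y = Cf · X · Cf^T with a small integer matrix Cf. The sketch below computes it twice, once by direct matrix multiplication and once with the add-and-shift-only butterfly that hardware and fast software implementations typically use. It shows only the arithmetic; the distributed-arithmetic LUT structure and pipelining described in the paper are not modeled, and the helper names are illustrative.

```python
# H.264 4x4 forward core transform matrix.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_matrix(x):
    """Reference form: Y = CF * X * CF^T."""
    return matmul(matmul(CF, x), transpose(CF))


def butterfly(a, b, c, d):
    """1-D transform of four samples using only adds and doublings."""
    s0, s1 = a + d, b + c
    d0, d1 = a - d, b - c
    return [s0 + s1, 2 * d0 + d1, s0 - s1, d0 - 2 * d1]


def forward_butterfly(x):
    """Same result as forward_matrix, via row then column butterflies."""
    rows = [butterfly(*row) for row in x]          # X * CF^T
    cols = [butterfly(*col) for col in zip(*rows)]  # CF applied to each column
    return transpose(cols)


block = [
    [5, 11, 8, 10],
    [9, 8, 4, 12],
    [1, 10, 11, 4],
    [19, 6, 15, 7],
]
assert forward_butterfly(block) == forward_matrix(block)
print(forward_matrix([[1] * 4 for _ in range(4)])[0][0])
```

The butterfly needs no multiplications, which is what makes the transform attractive for VLSI. Scaling is folded into quantization in H.264, so it does not appear here.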

19.

A Highly Efficient VLSI Architecture for H.264/AVC CAVLC Decoder

Heng-Yao Lin Ying-Hong Lu Bin-Da Liu Jar-Ferr Yang 《Multimedia, IEEE Transactions on》2008,10(1):31-42

In this paper, an efficient algorithm is proposed to improve the decoding efficiency of the context-based adaptive variable length coding (CAVLC) procedure. Due to the data dependency among symbols in the decoding flow, the CAVLC decoder requires large computation time, which dominates the overall decoder system performance. To expedite its decoding speed, the critical path in the CAVLC decoder is first analyzed and then reduced by forwarding the adaptive detection for succeeding symbols. With a shortened critical path, the CAVLC architecture is further divided into two segments, which can be easily implemented by a pipeline structure. Consequently, the overall performance is effectively improved. In the hardware implementation, a low-power combined LUT and a single output buffer have been adopted to reduce area as well as power consumption without affecting the decoding performance. Experimental results show that the proposed architecture surpasses other recent designs: it reduces power consumption by approximately 40% and achieves three times the decoding speed of the original decoding procedure suggested in the H.264 standard. The maximum frequency can exceed 210 MHz, which can easily support the real-time requirement for resolutions higher than the HD1080 format.

20.

Simplified algorithms for rate-distortion optimization in high efficiency video coding

《Displays》2015

HEVC is the latest coding standard; it improves coding efficiency by a factor of two over the previous H.264/AVC standard at the cost of increased computational complexity. Rate-distortion optimization (RDO) is one of the computationally demanding operations in HEVC and makes it difficult to perform HEVC compression in real time with reasonable computing power. This paper presents various simplified RDO algorithms and evaluates their RD performance and computational complexity. The algorithms proposed for H.264/AVC for simplified estimation of the sum of squared error (SSE) and of context-adaptive binary arithmetic coding (CABAC) are reviewed and then applied to the simplification of HEVC RDO. A new simplified RDO algorithm is proposed by modifying the previous H.264/AVC algorithm so that it is optimized for the hierarchical coding structure of HEVC. Further simplification is attempted to avoid the transform operations in RDO. The effectiveness of the existing H.264/AVC algorithms as well as the proposed algorithms targeted for HEVC is evaluated, and the trade-off between RD performance and computational complexity is presented for various simplification algorithms. Experimental results show that reasonable combinations of RDO algorithms reduce the computation by 80–85% at the sacrifice of 3.46–5.93% in BD-BR for the low-delay configuration.
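
The decision that RDO makes can be sketched with the standard Lagrangian cost J = D + λ·R, where D is the distortion (here the SSE between original and reconstructed samples) and R is the bit count. The sketch below picks the candidate with the lowest cost. The candidate names, the numbers, and the flat dictionary of (distortion, bits) pairs are illustrative assumptions; the paper's contribution is cheaper estimation of D and R, which is not reproduced here.

```python
def sse(orig, recon):
    """Sum of squared errors between two equally sized 2-D blocks."""
    return sum((o - r) ** 2
               for row_o, row_r in zip(orig, recon)
               for o, r in zip(row_o, row_r))


def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * bits


def pick_mode(candidates, lam):
    """candidates maps a mode name to (distortion, bits); return the cheapest."""
    return min(candidates, key=lambda m: rd_cost(*candidates[m], lam))


# Hypothetical candidates: mode A is cheap in bits, mode B is more accurate.
candidates = {"A": (100, 10), "B": (60, 40)}
print(pick_mode(candidates, lam=1.0), pick_mode(candidates, lam=2.0))
```

A larger λ weights rate more heavily, so the choice moves from the more accurate mode B to the cheaper mode A. Evaluating exact D and R for every candidate is what makes full RDO expensive, and estimating them is where the simplified algorithms save computation.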