Similar Documents
20 similar documents found.
1.
Recently, a new video coding standard, HEVC, has been developed to meet the challenges of today's media market. At the same video quality, it halves on average the bitstream size produced by its predecessor, H.264/AVC. However, this gain in compression efficiency comes with significantly higher computational requirements for encoding. In this paper, we apply parallel processing techniques to the HEVC encoder to significantly reduce its computational requirements while preserving coding efficiency. We propose several parallelization approaches for the HEVC encoder that are well suited to multicore architectures. Our proposals use the OpenMP programming paradigm at a coarse-grained parallelization level, which we call the GOP-based level: GOP-based approaches encode several groups of consecutive frames (GOPs) simultaneously. How these GOPs are formed and distributed is critical both for parallel performance and for the resulting degradation in coding efficiency. The results show that near-ideal parallel efficiencies are obtained using up to 12 cores.
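The GOP-based idea can be sketched as follows. This sketch assumes closed GOPs of a fixed size, so that GOPs share no reference frames, and it uses a Python thread pool in place of the paper's OpenMP threads. `encode_gop` is a hypothetical placeholder, not an HEVC reference-software function. The paper's specific GOP formation and distribution strategies are not reproduced.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_gops(num_frames, gop_size):
    """Split frame indices 0..num_frames-1 into groups of consecutive frames (GOPs)."""
    return [list(range(start, min(start + gop_size, num_frames)))
            for start in range(0, num_frames, gop_size)]


def encode_gop(gop):
    """Hypothetical placeholder for a real HEVC encoder call on one closed GOP."""
    return f"bitstream[{gop[0]}..{gop[-1]}]"


def parallel_encode(num_frames, gop_size, workers):
    gops = split_into_gops(num_frames, gop_size)
    # Closed GOPs have no inter-GOP references, so they can be encoded concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the GOP bitstreams can be
        # concatenated in display order afterwards.
        return list(pool.map(encode_gop, gops))
```

For example, `parallel_encode(10, 4, 3)` yields three GOP bitstreams covering frames 0-3, 4-7 and 8-9. Shorter GOPs expose more parallelism but insert more intra frames, which is one likely source of the coding-efficiency loss mentioned above.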

2.
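Hardware/software partitioning is a key step in hardware/software co-design, and the partitioning result directly affects the design quality of the target system. For a given application, a sound partitioning strategy is therefore essential if the target system is to execute quickly at low cost. A single task can have several different hardware implementations. Multiple-choice HW/SW partitioning therefore reflects real applications more faithfully than the traditional formulation, which allows only one hardware implementation per task. This makes the problem more challenging to solve, and such problems have been proven NP-complete. Targeting multicore system-on-chip platforms and applications whose task graphs are binary trees, this paper builds a computational model of the multiple-choice HW/SW partitioning problem and proposes a dynamic programming algorithm to solve it. Experimental results show that, for moderately sized problems, the proposed dynamic programming algorithm efficiently obtains exact solutions. They also reveal the relationship between the algorithm's computational capability and the hardware area constraint.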
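A minimal dynamic-programming sketch of multiple-choice HW/SW partitioning, written as a multiple-choice knapsack over hardware area. It assumes independent tasks executed sequentially, integer areas and no communication cost. It therefore does not model the binary-tree task graphs of the paper. The task format and example values are hypothetical.

```python
import math

# Hypothetical example: (software time, [(hardware area, hardware time), ...]) per task.
EXAMPLE_TASKS = [(10, [(2, 4), (4, 2)]), (8, [(3, 3)])]


def partition(tasks, area_budget):
    """Minimum total execution time with total hardware area <= area_budget.

    Simplifications (assumptions): tasks run sequentially, areas are integers,
    and HW/SW communication cost is ignored.
    """
    # best[a]: minimum time of the tasks seen so far using exactly area a
    best = [0] + [math.inf] * area_budget
    for sw_time, hw_options in tasks:
        nxt = [math.inf] * (area_budget + 1)
        for a, t in enumerate(best):
            if t == math.inf:
                continue
            # Choice 1: run the task in software (uses no area).
            nxt[a] = min(nxt[a], t + sw_time)
            # Choice 2: use exactly one of its hardware implementations.
            for hw_area, hw_time in hw_options:
                if a + hw_area <= area_budget:
                    nxt[a + hw_area] = min(nxt[a + hw_area], t + hw_time)
        best = nxt
    return min(best)
```

With `EXAMPLE_TASKS`, area budgets of 0, 4, 5 and 7 give total times of 18, 10, 7 and 5. The DP table grows linearly with the area budget, which is one way the hardware-area constraint affects the cost of exact solutions.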

3.
4.
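FENG Xin, GUO Wei. Computer Simulation, 2007, 24(10): 257-260
As SoC (System on Chip) design complexity keeps increasing, system-level HW/SW planning early in the design has a growing impact on SoC performance. Complex video-decoding SoC designs therefore urgently need efficient performance-analysis and verification platforms that optimize performance at the architectural level. This paper applies an Electronic System Level (ESL) simulation methodology to the HW/SW co-design of an MPEG-4 video-decoding SoC. The ARM SoC-Designer ESL platform is used to locate bottlenecks in the software algorithm and to perform HW/SW partitioning. Cycle-accurate models of the hardware units are built in SystemC, and HW/SW co-simulation and verification of the MPEG-4 decoder is finally achieved. Practice shows that ESL-based system design effectively increases simulation speed, and that the resulting video-decoding hardware effectively improves system performance.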

5.
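Design of a Hardware/Software Co-verification Platform for SoC   Cited by: 1 (self-citations: 1, other citations: 0)
BAO Hua, HONG Yi, GUO Erhui. Computer Engineering, 2009, 35(8): 271-273
To meet the practical needs of SoC design verification, this paper introduces a HW/SW co-verification platform for SoC design. In the platform, the hardware and software models run in separate environments and exchange information over a network. The hardware is modeled at the transaction level and at RTL in a hardware description language. The software is written in a high-level programming language and runs on an instruction-set simulator, which simulates the processor hardware. The simulators run in parallel as separate processes and exchange information through inter-process communication.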

6.
Sequential Monte Carlo (SMC) is a principal statistical method for tracking objects in video sequences by on-line estimation of the state of a non-linear dynamic system. The performance of individual stages of the SMC algorithm is usually data-dependent. This makes the performance of a real-time capable system difficult to predict and often leads to grossly overestimated and inefficient system designs. In addition, the considerable computational complexity is a major obstacle when implementing SMC methods on purely CPU-based, resource-constrained embedded systems. In contrast, heterogeneous multi-cores present a more suitable implementation platform. We use hybrid CPU/FPGA systems, as they can efficiently execute both the control-centric sequential parts and the data-parallel parts of an SMC application. However, even with hybrid CPU/FPGA platforms, determining the optimal HW/SW partitioning is challenging in general, and even impossible with a design-time approach. We therefore need self-adaptive architectures and system software layers that react autonomously to varying workloads and changing input data while preserving real-time constraints and area efficiency. In this article, we present a video tracking application modeled on top of a framework for implementing SMC methods on CPU/FPGA-based systems such as modern platform FPGAs. Based on a multithreaded programming model, our framework allows easy design space exploration with respect to the HW/SW partitioning. Additionally, the application can adaptively switch between several partitionings at run-time to react to changing input data and performance requirements. Our system uses two variants of an add/remove self-adaptation technique for task partitioning inside this framework. Both achieve soft real-time behavior while trying to minimize the number of active cores.
To evaluate its performance and area requirements, we demonstrate the application and the framework on a real-life video tracking case study and show that partial reconfiguration can be effectively and transparently used for realizing adaptive real-time HW/SW systems.
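The add/remove idea can be illustrated with a toy controller. This is a hypothetical policy, not either of the paper's two variants. Frame time is modeled as workload divided by the number of active cores. A core is added after a missed deadline, and one is removed when a frame finishes in under half the deadline.

```python
def adapt_cores(frame_workloads, deadline, max_cores, low_watermark=0.5):
    """Hypothetical add/remove policy; returns the active-core count used per frame.

    Assumption: processing time of a frame = workload / active cores.
    """
    cores = 1
    history = []
    for work in frame_workloads:
        history.append(cores)
        elapsed = work / cores
        if elapsed > deadline and cores < max_cores:
            cores += 1  # add: deadline missed, scale up for the next frame
        elif elapsed < low_watermark * deadline and cores > 1:
            cores -= 1  # remove: ample slack, release a core to save area/power
    return history
```

For workloads `[30, 30, 30, 30, 10, 10, 10]` with a deadline of 10 and at most 4 cores, the controller scales from 1 to 3 cores. It then releases one when the load drops, returning `[1, 2, 3, 3, 3, 2, 2]`.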

7.
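HEVC, also known as H.265, is the latest video coding standard. Compared with the previous generation of video coding standards, H.265/HEVC clearly improves compression efficiency. However, it also brings higher computational complexity, especially in intra prediction. To address this problem, a gradient-based hardware-acceleration algorithm for intra prediction is proposed. It reduces computation by skipping the prediction of some intra modes and partition depths. Image gradient information is used to roughly estimate the texture direction and texture complexity of a coding unit (CU). The texture direction is used to estimate the CU's optimal intra prediction direction, and the texture complexity is used to decide whether to skip predictive coding at the current partition depth. Experiments show that, compared with the H.265/HEVC test model HM16.18, the proposed algorithm reduces encoding time by 60.59%. This comes at the cost of only a 0.38 dB BD-PSNR decrease and an 8.52% BD-rate increase.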
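A rough sketch of how gradients can steer intra mode and depth decisions. It assumes central-difference gradients, a coarse horizontal/vertical direction decision, and a hypothetical rule that stops splitting smooth CUs. The abstract gives neither the paper's gradient operator, its skip rule nor its thresholds. A real encoder would also map the direction onto HEVC's 33 angular modes.

```python
def texture_features(block):
    """Return (texture direction, mean gradient magnitude) for a CU.

    block: 2-D list of luma samples (at least 3x3). Central differences are an
    assumed gradient operator; the paper's exact operator is not specified.
    """
    h, w = len(block), len(block[0])
    gx_sum = gy_sum = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx_sum += abs(block[y][x + 1] - block[y][x - 1])  # horizontal change
            gy_sum += abs(block[y + 1][x] - block[y - 1][x])  # vertical change
    # Texture runs perpendicular to the dominant gradient.
    if gx_sum > gy_sum:
        direction = "vertical"
    elif gy_sum > gx_sum:
        direction = "horizontal"
    else:
        direction = "none"
    complexity = (gx_sum + gy_sum) / ((h - 2) * (w - 2))
    return direction, complexity


def stop_splitting(block, threshold):
    """Hypothetical rule: do not evaluate deeper partitions of a smooth CU."""
    return texture_features(block)[1] < threshold
```

A CU whose rows are constant, with luma rising by 10 from row to row, is classified as horizontal texture, which suits HEVC's horizontal prediction mode. A flat CU has zero complexity, so its deeper partition levels can be skipped.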

8.
This paper presents a modelling-based methodology for embedded control system (ECS) design. Rather than developing a new methodology for ECS design, we propose to upgrade an existing one by bridging it with a methodology used in other areas of embedded systems design. We created a transformation bridge between control-scheduling tools and hardware/software (HW/SW) co-design tools. Defining this bridge enables automatic model transformation. As a result, we obtain more accurate timing-behaviour simulations that consider not only the real-time software but also the impact of the hardware architecture on control performance. We present an example that compares different model-evaluation results with real implementation measurements, which clearly demonstrates the benefits of our approach.

9.
Hardware/software (HW/SW) partitioning divides an application into software and hardware parts. It is one of the crucial steps in embedded system design. For a given task, hardware implementations with different areas may provide different execution speeds because of the potential for parallel execution in hardware. Thus, one task may have multiple hardware implementation choices, depending on the available hardware area. Existing HW/SW partitioning approaches typically consider only a single hardware implementation, overlooking these multiple choices. This paper presents a computing model for HW/SW partitioning problems with multiple-choice hardware implementations. An efficient heuristic algorithm is proposed to rapidly generate an approximate solution, which is then refined by a tabu search algorithm, also customized in this paper. Moreover, a dynamic programming algorithm is proposed to obtain exact solutions for relatively small problems. Extensive simulation results show that the approximate solutions are very close to the exact ones. Tabu search refines them to solutions with an error of no more than 1.5% for all cases considered in this paper.
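The refinement step can be sketched with a plain tabu search over multiple-choice assignments. This is a generic textbook tabu search, not the customized algorithm from the paper, and it works as follows:

* It starts from the all-software assignment instead of the paper's heuristic solution.
* A neighbor changes one task's implementation.
* Returning to a recently abandoned choice is tabu for `tenure` iterations.
* A tabu move is still accepted if it beats the best solution found so far (aspiration).

The task format, a software time plus a list of hardware `(area, time)` options, is an assumption.

```python
# Hypothetical example: (software time, [(hardware area, hardware time), ...]) per task.
EXAMPLE_TASKS = [(10, [(2, 4), (4, 2)]), (8, [(3, 3)])]


def evaluate(tasks, choice):
    """Return (total time, total area); choice[i] == 0 is software, k >= 1 is HW option k."""
    t = area = 0
    for (sw_time, hw_options), c in zip(tasks, choice):
        if c == 0:
            t += sw_time
        else:
            hw_area, hw_time = hw_options[c - 1]
            t += hw_time
            area += hw_area
    return t, area


def tabu_search(tasks, area_budget, iterations=50, tenure=2):
    """Generic tabu search over multiple-choice HW/SW assignments (a sketch)."""
    current = [0] * len(tasks)  # all-software start is always feasible
    best, best_time = list(current), evaluate(tasks, current)[0]
    tabu = {}  # (task, choice) -> last iteration in which that move is tabu
    for it in range(iterations):
        candidate = cand_time = move = None
        for i, (_, hw_options) in enumerate(tasks):
            for c in range(len(hw_options) + 1):
                if c == current[i]:
                    continue
                neighbor = current[:i] + [c] + current[i + 1:]
                t, area = evaluate(tasks, neighbor)
                if area > area_budget:
                    continue  # infeasible neighbor
                # Tabu moves are skipped unless they beat the best (aspiration).
                if tabu.get((i, c), -1) >= it and t >= best_time:
                    continue
                if cand_time is None or t < cand_time:
                    candidate, cand_time, move = neighbor, t, (i, current[i])
        if candidate is None:
            break  # every neighbor is tabu or infeasible
        tabu[move] = it + tenure  # forbid reverting to the abandoned choice
        current = candidate
        if cand_time < best_time:
            best, best_time = list(candidate), cand_time
    return best, best_time
```

On the two-task example with an area budget of 5, the search proceeds in three steps:

1. It first takes the fastest single hardware variant (time 10).
2. It accepts a temporarily worse move to the other variant (time 12).
3. From there it reaches the optimum of 7, the same value an exhaustive search gives.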

10.
This paper proposes a task-based hybrid parallel and hybrid pipeline (THPHP) scheme for implementing multi-standard video algorithms on a heterogeneous coarse-grained reconfigurable processor called the reconfigurable multimedia system (REMUS). The supported standards include MPEG-2, H.264 and the audio video coding standard (AVS). The proposed schemes greatly improve decoding performance and satisfy the real-time requirements of various high-definition (HD) video decoding standards. THPHP has two parts:

* A task-based hybrid parallel scheme, in which macroblock (MB)-level, block-level and sub-block-level decoding tasks are parallelized to improve data-processing throughput.
* A hybrid pipeline scheme, in which slice-level, MB-level, block-level and sub-block-level computations are pipelined to improve efficiency.

Computation-intensive tasks are implemented on two reconfigurable processing units, which are the core computing engines of REMUS. These tasks include motion compensation, intra prediction, inverse discrete cosine transform, reconstruction and deblocking filtering. With the proposed schemes and a 200 MHz clock, the implementations decode the following streams:

* H.264 high profile (HP) 1920×1080@30 fps
* AVS Jizhun profile (JP) 1920×1080@39 fps
* MPEG-2 main profile (MP) 1920×1080@41 fps

Compared with XPP-III (a commercial reconfigurable processor), REMUS improves H.264 HD decoding performance by 1.81× and energy efficiency by 14.3×.
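The benefit of pipelining can be estimated with an idealized model. In a linear pipeline with buffering between stages, the first item sees the full latency. After that, each item completes one bottleneck-stage time after the previous one. The stage latencies used below are hypothetical, not measurements from REMUS.

```python
def sequential_cycles(stage_cycles, n_items):
    """No overlap: each item passes through every stage before the next starts."""
    return n_items * sum(stage_cycles)


def pipelined_cycles(stage_cycles, n_items):
    """Ideal linear pipeline with buffers between stages.

    The first item takes the full latency; afterwards one item completes every
    max(stage_cycles) cycles, because the slowest stage sets the throughput.
    """
    return sum(stage_cycles) + (n_items - 1) * max(stage_cycles)
```

Consider four hypothetical MB-level stages of 100, 250, 150 and 200 cycles and the 8160 macroblocks of a 1920×1088 frame. Pipelining cuts the frame time from 5,712,000 to 2,040,450 cycles, about a 2.8× speedup. Balancing the 250-cycle bottleneck stage would raise it further.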

11.
In this paper, a TPP (Task-based Parallelization and Pipelining) scheme is proposed to implement the AVS (Audio Video coding Standard) video decoding algorithm on REMUS (REconfigurable MUltimedia System), a coarse-grained reconfigurable multimedia system. The AVS decoder is implemented with optimized HW/SW partitioning. To improve the performance of the AVS decoder, the hardware implementation adopts two groups of techniques:

* Parallel techniques, such as MB (Macro-Block)-based and block-based parallelism.
* Pipeline techniques, such as MB-level and block-level pipelining.

In addition, the most computation-intensive tasks of the AVS video standard are performed in the two RPUs (Reconfigurable Processing Units), which are the major computing engines of REMUS. These tasks include MC (Motion Compensation), IP (Intra Prediction), IDCT (Inverse Discrete Cosine Transform), REC (REConstruct) and DF (Deblocking Filter). Owing to the proposed scheme, the decoder supports AVS JP (Jizhun Profile) 1920×1088@39fps streams at a 200 MHz working frequency.
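Real-time figures like these translate into a per-macroblock cycle budget. The sketch assumes 16×16 macroblocks. It also assumes the picture is padded to a whole number of macroblocks, so 1080 lines are decoded as 1088.

```python
import math


def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Average clock cycles available per macroblock for real-time decoding."""
    # Pictures are padded to whole macroblocks, e.g. 1080 lines -> 68 MB rows (1088).
    mbs_per_frame = math.ceil(width / mb_size) * math.ceil(height / mb_size)
    return clock_hz / (mbs_per_frame * fps)
```

For AVS JP at 1920×1088 and 39 fps on a 200 MHz clock, 8160 macroblocks per frame leave roughly 628 cycles per macroblock. The H.264 HP figure from the previous entry (1920×1080 at 30 fps) leaves about 817. In a pipelined decoder, the slowest stage must handle a macroblock within roughly this budget.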

12.
13.
Hardware/software co-design for particle swarm optimization algorithm   Cited by: 1 (self-citations: 0, other citations: 1)
This paper presents a hardware/software (HW/SW) co-design approach that uses the SOPC technique and a pipeline design method to improve the design flexibility and execution performance of particle swarm optimization (PSO) for embedded applications. The approach is based on a modular design architecture with two modules that work closely together to carry out the evolution process at different design stages:

* A Particle Updating Accelerator, implemented in hardware, which updates the velocity and position of the particles.
* A Fitness Evaluation module, implemented either on a soft-core processor or on a Field Programmable Gate Array (FPGA), which evaluates the objective functions.

Thanks to this design flexibility, the proposed approach can tackle various optimization problems of embedded applications without hardware redesign. To further improve the execution performance of PSO, this paper also designs a hardware random number generator (RNG). It also adds a particle re-initialization scheme that promotes exploration during the optimization process. Experimental results demonstrate that the proposed HW/SW co-design approach for PSO algorithms efficiently obtains high-quality solutions for embedded applications.
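A software sketch of the PSO loop that the accelerator modules implement, using the standard global-best velocity and position update. Several parts are assumptions or simplifications:

* The inertia weight `w` and coefficients `c1` and `c2` are common textbook values.
* Python's seeded `random.Random` stands in for the hardware RNG.
* The paper's particle re-initialization scheme is omitted.

```python
import random


def pso(fitness, dim, bounds, n_particles=20, iterations=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO minimizing `fitness` over the box [lo, hi]^dim (a sketch)."""
    rng = random.Random(seed)  # stands in for the paper's hardware RNG
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = list(pbest[g]), pbest_fit[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Particle update: inertia + cognitive (pbest) + social (gbest) terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness(pos[i])  # fitness evaluation step
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = list(pos[i]), f
                if f < gbest_fit:
                    gbest, gbest_fit = list(pos[i]), f
    return gbest, gbest_fit
```

On the two-dimensional sphere function, `pso(sphere, 2, (-5.0, 5.0))` converges close to the optimum at the origin. The inner `d` loop is the part a particle-updating accelerator would parallelize in hardware. The `fitness` call corresponds to the separate fitness evaluation module.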

14.
15.
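Hardware offers strong processing capability and software offers flexible programmability. Video-decoding chip architectures have therefore shifted from pure hardware to partitioned HW/SW architectures. Taking into account the algorithms of the AVS standard and the complexity of implementing the decoder, this paper applies HW/SW co-design principles to propose a well-partitioned HW/SW architecture for an AVS high-definition video decoder. Based on the characteristics of the AVS algorithm, the architecture assigns the parsing of syntax elements above the macroblock layer to software and macroblock-layer decoding to hardware. Verification shows that the architecture can decode AVS high-definition streams. It has been implemented in a hardware-platform simulation program written in C.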

16.
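Using a HW/SW co-design methodology, this paper designs a low-cost MPEG-4 decoding system-on-chip (SoC) around the ARM946-S. To shorten verification time and improve verification coverage, a verification method based on a C reference model is adopted. Simulation results show a significant performance improvement: in the worst case, real-time decoding of MPEG-4 Simple Profile Level 3 requires only 130 MHz.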

17.
In heterogeneous system design, partitioning the functional specifications into hardware (HW) and software (SW) components is an important procedure. Often, an HW platform is chosen and the SW is mapped onto the existing partial solution, or the actual partitioning is performed in an ad hoc manner. The partitioning approach presented here is novel in that it uses Bayesian belief networks (BBNs) to classify functional components as HW or SW. The BBN's ability to propagate evidence allows the effects of a classification decision about one function to be felt throughout the entire network. In addition, because BBNs are built on degrees of belief in hypotheses, they provide a quantitative measure of the correctness of a partitioning decision. A methodology is given for automatically generating both the qualitative structural portion of the BBN and its quantitative link matrices. A case study of a programmable thermostat illustrates the BBN approach. The outcomes of the partitioning process are discussed and placed in a larger design context called model-based codesign.
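Evidence propagation can be illustrated on a hypothetical three-function chain F1 → F2 → F3. Each variable is 1 for hardware and 0 for software, and the probability tables are made up. The posterior is computed by brute-force enumeration, which is practical only for tiny networks. It is not the paper's automatic BBN construction method.

```python
from itertools import product

# Hypothetical chain F1 -> F2 -> F3; value 1 = hardware (HW), 0 = software (SW).
PRIOR_F1_HW = 0.5
P_CHILD_HW = {1: 0.8, 0: 0.3}  # P(child = HW | parent value), reused on both links


def joint(f1, f2, f3):
    """Joint probability of one complete HW/SW classification."""
    def link(child, parent):
        p = P_CHILD_HW[parent]
        return p if child == 1 else 1 - p
    prior = PRIOR_F1_HW if f1 == 1 else 1 - PRIOR_F1_HW
    return prior * link(f2, f1) * link(f3, f2)


def posterior_f1_hw(evidence):
    """P(F1 = HW | evidence) by exhaustive enumeration.

    evidence maps a variable index (0 = F1, 1 = F2, 2 = F3) to an observed value.
    """
    num = den = 0.0
    for assignment in product((0, 1), repeat=3):
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(*assignment)
        den += p
        if assignment[0] == 1:
            num += p
    return num / den
```

Observing F3 = HW raises P(F1 = HW) from 0.5 to 14/23 ≈ 0.61, and observing F2 = HW raises it to 8/11 ≈ 0.73. A decision about one function thus shifts belief about the others, and the posterior serves as a confidence measure for each classification.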

18.
Recent advances in capture and display technologies demand better compression techniques for the storage and transmission of still images and video. High Efficiency Video Coding (HEVC), the latest video compression standard, was developed with this objective by the Joint Collaborative Team on Video Coding (JCT-VC). Although the main design goal of HEVC is the compression of high-resolution video, its performance in still-image compression is on par with state-of-the-art still-image compression standards. This work explores the possibility of incorporating the efficient intra prediction techniques of HEVC into the compression of high-resolution still images. In the lossless coding mode of HEVC, sample-based angular intra prediction (SAP) methods have shown better prediction accuracy than conventional block-based prediction (BP). In this paper, we propose an improved sample-based angular intra prediction (ISAP), which enhances the accuracy of the crucial intra prediction stage in HEVC. The experimental results show that, in lossless compression of still images, ISAP outperforms archival tools, state-of-the-art image compression standards and other HEVC-based lossless image compression codecs.
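The difference between block-based prediction (BP) and sample-based prediction (SAP) is easiest to see for the pure horizontal direction. This simplified sketch assumes lossless coding, so reconstructed samples equal the originals. It omits the angular interpolation of real SAP and the refinements of the proposed ISAP.

```python
def block_residual_horizontal(block, left_column):
    """BP, horizontal mode: every sample in row y is predicted from left_column[y],
    the reconstructed neighbor to the left of the block."""
    return [[x - left_column[y] for x in row] for y, row in enumerate(block)]


def sample_residual_horizontal(block, left_column):
    """SAP, horizontal direction only (simplified): each sample is predicted from its
    immediate left neighbor, available because lossless reconstruction is exact."""
    residuals = []
    for y, row in enumerate(block):
        prev = left_column[y]
        out = []
        for x in row:
            out.append(x - prev)
            prev = x  # the current sample predicts the next one
        residuals.append(out)
    return residuals


def abs_sum(residuals):
    """Sum of absolute residuals, a rough proxy for coding cost."""
    return sum(abs(v) for row in residuals for v in row)
```

Take a 4×4 block whose rows are `[11, 12, 13, 14]`, next to a left column of 10s. BP leaves residuals 1 to 4 in every row (absolute sum 40), whereas SAP leaves all ones (absolute sum 16), which is typically cheaper to entropy-code.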

19.
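XIA Xinjun, WEN Hong, CHEN Jihua. Computer Engineering, 2004, 30(18): 176-178
In the SoC HW/SW co-design process, the WISHBONE bus standard is used to build a virtual-component-level SoC system. After HW/SW partitioning, the software and hardware are co-simulated at the virtual-component level and then synthesized at the real-component level. A method is proposed for modeling a virtual-component-level microprocessor (a virtual microprocessor) based on the ARMSim simulation kernel, which simplifies SoC system design.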
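A transaction-style sketch of a WISHBONE classic single read/write cycle between a virtual master and a memory slave. Only a subset of signals is modeled: CYC, STB, WE, ADR, data in both directions, and ACK. One call to `clock` stands for one clock edge. The class and helper names are hypothetical, and this is not the ARMSim-based model of the paper.

```python
class WishboneBus:
    """Subset of WISHBONE classic signals shared by one master and one slave."""

    def __init__(self):
        self.cyc = self.stb = self.we = self.ack = False
        self.adr = 0
        self.dat_m2s = 0  # master-to-slave data (write)
        self.dat_s2m = 0  # slave-to-master data (read)


class MemorySlave:
    """Hypothetical memory slave; each call to clock() models one clock edge."""

    def __init__(self, size):
        self.mem = [0] * size

    def clock(self, bus):
        if bus.cyc and bus.stb and not bus.ack:
            # Valid request (CYC and STB asserted): perform it and assert ACK.
            if bus.we:
                self.mem[bus.adr] = bus.dat_m2s
            else:
                bus.dat_s2m = self.mem[bus.adr]
            bus.ack = True
        elif not bus.stb:
            bus.ack = False  # request withdrawn: release ACK


def single_access(bus, slave, adr, we, data=0):
    """Virtual master: one single read or write cycle; returns read data or None."""
    bus.cyc = bus.stb = True
    bus.we, bus.adr, bus.dat_m2s = we, adr, data
    while not bus.ack:  # wait states until the slave acknowledges
        slave.clock(bus)
    bus.cyc = bus.stb = False  # terminate the cycle
    slave.clock(bus)
    return None if we else bus.dat_s2m
```

A virtual microprocessor model would issue such calls for each load and store. Software and hardware components can then be co-simulated at the virtual-component level before synthesis.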

20.
This paper describes the development of efficient hardware/software (HW/SW) neuro-fuzzy systems. The model used in this work is an adaptive neuro-fuzzy inference system modified for efficient HW/SW implementation. Two different on-chip approaches are presented: a high-performance parallel architecture for offline training and a pipelined architecture suitable for online parameter adaptation. Important aspects of designing HW/SW solutions are described in detail. The proposed architectures have been implemented on a system-on-a-programmable-chip, which contains an embedded processor core and a large field programmable gate array (FPGA). The processor provides flexibility and high precision for implementing the learning algorithms, while the FPGA enables high-speed inference architectures for real-time embedded applications.
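The inference part of an adaptive neuro-fuzzy inference system (ANFIS) can be sketched as a first-order Sugeno system with Gaussian membership functions and a product T-norm. The rule format and parameter values are assumptions. Training, which the paper runs on the processor, is not shown.

```python
import math


def gaussian(x, c, sigma):
    """Gaussian membership function centered at c with width sigma."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))


def anfis_forward(x1, x2, rules):
    """Forward pass of a two-input, first-order Sugeno fuzzy system.

    rules: list of ((c1, s1), (c2, s2), (p, q, r)), a hypothetical layout giving
    the membership parameters for x1 and x2 and the linear consequent p*x1 + q*x2 + r.
    """
    firing, outputs = [], []
    for (c1, s1), (c2, s2), (p, q, r) in rules:
        # Layers 1-2: fuzzify the inputs and combine them with the product T-norm.
        firing.append(gaussian(x1, c1, s1) * gaussian(x2, c2, s2))
        # Layer 4: first-order (linear) rule consequent.
        outputs.append(p * x1 + q * x2 + r)
    # Layers 3 and 5: normalize the firing strengths and take the weighted sum.
    total = sum(firing)
    return sum(w * f for w, f in zip(firing, outputs)) / total
```

Each rule's membership evaluations and consequent are independent of the other rules. This is what makes the inference stage amenable to the parallel or pipelined FPGA architectures described above.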
