1.

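Optimizing an H.264 Decoder for Cortex-A8 with ARM NEON and OpenMAX DL

Tero Rintaluoma 《电子元器件资讯》2010,(2)

This article shows how ARM NEON technology can improve and optimize the performance of a software-based H.264 video decoder. Several measurements were taken with the ARM Profiler in RealView and on real hardware, and the corresponding data are given for the H.264 and MPEG-4 decoders and the MPEG-4 encoder. Compared with the original ARM-optimized C code compiled for the Cortex-A8 processor architecture, the overall performance of the H.264 decoder on the Profiler improved by 54%.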

2.

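Implementation and Optimization of an MPEG-4 AAC Decoder on the NIOS II Platform

潘岳 王建中 戎蒙恬 刘文江《电声技术》2006,(11)

Presents an implementation scheme for a real-time MPEG-4 AAC decoder on the NIOS II platform, and introduces the MPEG-4 AAC-LC decoding algorithm and the optimization algorithms for each key module. Subject to the real-time decoding requirement, the decoder is optimized in both software code and processor configuration according to the characteristics of the NIOS II platform. Experimental results show that the real-time decoding requirement is met at a CPU clock of 80 MHz.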

3.

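H.264 Decoder Implementation on the OMAP Platform (total citations: 1; self-citations: 0; citations by others: 1)

张智玮《电子元器件应用》2008,10(9):62-65

Presents an implementation scheme for designing an H.264 decoder on the OMAP5910 platform. Because the OMAP5910 is a dual-core processor, the scheme follows its programming model and is optimized for its specific architecture: a client program on the ARM side controls the DSP, while an application on the DSP side performs the actual decoding. The decoder was then tested on images. Experimental results show that the decoder can meet the application requirements of handheld devices.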

4.

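Parallel Design and Performance Optimization of Deblocking Filtering in an H.265 Decoder

周建政 刘华平《电视技术》2015,39(14):13-16

H.265 continues to use the H.264 coding architecture, and the deblocking filter remains an important option in the H.265 video coding standard: removing the blocking artifacts introduced by hybrid coding greatly improves video quality. However, because of H.265's super macroblocks, the deblocking parameters are nested layer by layer inside each small processing unit. This structure hinders parallelization across macroblock rows and also makes it hard to exploit the SIMD capability of the Cortex-A9 architecture efficiently. The paper first analyzes in detail the processing flow of the H.265 deblocking filter and the difficulties of parallelizing it, then proposes a parallel deblocking structure that is easy to implement across macroblock rows, and finally applies Cortex-A9 assembly optimization. In experiments based on HM14.0, the improved deblocking filter's share of the decoder's total computational complexity falls from 25% to 14%, which greatly improves decoder performance and lays a foundation for real-time playback of high-resolution H.265 video on mobile devices.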

5.

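ST Multimedia Processor Gains New Wings: Nomadik Highlights Video Capabilities for Next-Generation Mobile Phones

《电子产品世界》2005,(8B):18-18

STMicroelectronics (ST) has announced a worldwide licensing agreement with Mcube Works for mobile decoder software. ST will use Mcube Works' high-performance H.264 mobile decoder software in its Nomadik mobile multimedia application processor. H.264 is a video compression standard widely used in digital video systems to improve picture quality over limited data bandwidth. The mobile video platform formed by ST's Nomadik processor and Mcube Works' H.264 decoder will power video phones running video-on-demand, digital multimedia broadcasting, and media player applications.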

6.

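Implementation and Optimization of an MPEG-4 AAC Decoder on the NIOS II Platform

潘岳 王建中 戎蒙恬 刘文江《电声技术》2006,(11):46-49

Presents an implementation scheme for a real-time MPEG-4 AAC decoder on the NIOS II platform, and introduces the MPEG-4 AAC-LC decoding algorithm and the optimization algorithms for each key module. Subject to the real-time decoding requirement, the decoder is optimized in both software code and processor configuration according to the characteristics of the NIOS II platform. Experimental results show that the real-time decoding requirement is met at a CPU clock of 80 MHz.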

7.

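Optimization of an H.264 Decoder Based on the TMS320DM642

陈梅芳《现代电子技术》2006,29(3):112-115

By analyzing the structure and complexity of an H.264 software decoder, the paper identifies the key points and difficulties in optimizing the decoder. Taking into account the performance characteristics of the TMS320DM642 DSP, it discusses in detail the optimization methods used for the H.264 decoder on that platform. These methods mainly involve increasing the parallelism of the program code and improving the efficiency of memory access, with a focus on optimizing key modules such as motion compensation and the IDCT. Experimental results show that the decoder can decode CIF-format video streams in real time.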

8.

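Electronics Encyclopedia

《世界电子元器件》2012,(3):42

Multimedia signal processing: NEON technology. ARM NEON technology is a 128-bit SIMD (Single Instruction, Multiple Data) extension architecture for the ARM Cortex-A series of processors. From smartphones and mobile computing devices to HDTV, it is widely recognized as one of the best processors for multimedia applications. It is purpose-designed to simplify porting software between platforms, and for intensive multimedia applications such as Dolby Mobile it provides ...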

9.

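Optimization of a DSP-Based H.264 Decoder

李杰 蔡灿辉《信号处理》2005,21(Z1):312-315

This paper discusses how to implement an H.264 decoder on TI's TMS320C64x family of DSP chips. It gives basic methods for optimizing C code on the DAM6416P processing platform from Wintech (闻亭), together with the specific measures taken to optimize the C code of the H.264 decoder on that platform. Experimental results confirm that the optimization approach is sound.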

10.

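Optimization of an H.264 Software Decoder (total citations: 3; self-citations: 0; citations by others: 3)

朱冬冬 丁嵘 尹亚光 戴琼海《电视技术》2003,(12):4-6,9

Analyzes the structure of an H.264 software decoder, identifies the bottlenecks that limit its speed, and presents an optimization scheme: starting from the program structure and making use of MMX™ technology, the H.264 software decoder is optimized comprehensively. The optimized decoder can decode CIF-format H.264 sequences in real time on PCs with a P3/800 MHz processor or faster.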

11.

Application Specific Processor Design for H.264 Decoder with a Configurable Embedded Processor

Jin Ho Han Mi Young Lee Younghwan Bae Hanjin Cho 《ETRI Journal》2005,27(5):491-496

An application specific processor for an H.264 decoder with a configurable embedded processor is designed in this research. The motion compensation, inverse integer transform, inverse quantization, and entropy decoding algorithm of H.264 decoder software are optimized. We improved the performance of the processor with instruction‐level hardware optimization, which is tailored to configurable embedded processor architecture. The optimized instructions for video processing can be used in other video compression standards such as MPEG 1, 2, and 4. A significant performance improvement is achieved with high flexibility. Experimental results show that we could achieve 300% performance for the H.264 baseline profile level 2 decoder.

12.

High-Performance Viterbi Decoding on CELL/B.E.

Lai Junjie Tang Jun Peng Yingning Chen Jianwen 《中国通信》2009,6(2):150-156

Viterbi decoding is widely used in many radio systems. Because of its high computational complexity, it is usually implemented with ASIC chips, FPGA chips, or optimized hardware accelerators. With the rapid development of multicore technology, multicore platforms have become a reasonable choice for software radio (SR) systems. The Cell Broadband Engine processor is a state-of-the-art multicore processor designed by Sony, Toshiba, and IBM. In this paper, we present a 64-state soft-input Viterbi decoder for a WiMAX SR baseband system based on the Cell processor. With one Synergistic Processor Element (SPE) of a Cell processor running at 3.2 GHz, our Viterbi decoder achieves a throughput of up to 30 Mb/s when decoding the tail-biting convolutional code. This performance demonstrates that the proposed Viterbi decoding implementation is very efficient. Moreover, the Viterbi decoder can be easily integrated into the SR system and can provide a highly integrated SR solution. The optimization methodology used in this module design can be extended to other modules on the Cell platform.

13.

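Research on Optimizing Embedded Machine Vision Systems

付连锐 王兆仲《电子设计工程》2012,20(14):179-182

Introduces the characteristics of embedded machine vision systems based on an ARM+DSP architecture and analyzes the factors that limit their performance. Optimization schemes are discussed at both the operating system and application levels. By trimming the embedded Linux kernel and file system, heavily optimizing the application code, and making full use of the NEON acceleration technology specific to Cortex-A processors, system boot time is shortened by 25 s and application execution speed is increased 2.5-fold.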

14.

Design of a transport triggered vector processor for turbo decoding

Shahriar Shahabuddin Janne Janhunen Markku Juntti Amanullah Ghazi Olli Silvén 《Analog Integrated Circuits and Signal Processing》2014,78(3):611-622

In order to meet the requirement of high data rates for next generation wireless systems, efficient implementations of receiver algorithms are essential. On the other hand, faster time-to-market motivates the investigation of programmable implementations. This paper presents a novel design of a programmable turbo decoder as an application-specific instruction-set processor (ASIP) using transport triggered architecture (TTA). The processor architecture is designed in such a manner that it can be programmed with high level language to support different suboptimal maximum a posteriori (MAP) algorithms in a single TTA processor. The design enables the designer to change the algorithms according to the frame error rate performance requirement. A quadratic polynomial permutation interleaver is used for contention-free memory access and to make the processor 3GPP LTE compliant. Several optimization techniques to enable real time processing on programmable platforms are introduced. The essential parts of the turbo decoding algorithm are designed with vector function units. Unlike most other turbo decoder ASIPs, high level language is used to program the processor to meet the time-to-market requirements. With a single iteration, 68.35 Mbps decoding speed is achieved for the max-log-MAP algorithm at a clock frequency of 210 MHz on 90 nm technology.

15.

Parallel programming models for a multiprocessor SoC platform applied to networking and multimedia

Paulin P.G. Pilkington C. Langevin M. Bensoudane E. Lyonnard D. Benny O. Lavigueur B. Lo D. Beltrame G. Gagne V. Nicolescu G. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(7):667-680

The MultiFlex system is an application-to-platform mapping tool that integrates heterogeneous parallel components (H/W or S/W) into a homogeneous platform programming environment. This leads to higher quality designs through encapsulation and abstraction. Two high-level parallel programming models are supported by the following MultiFlex platform mapping tools: a distributed system object component (DSOC) object-oriented message passing model and a symmetrical multiprocessing (SMP) model using shared memory. We demonstrate the combined use of the MultiFlex multiprocessor mapping tools, supported by high-speed hardware-assisted messaging, context-switching, and dynamic scheduling using the StepNP demonstrator multiprocessor system-on-chip platform, for two representative applications: 1) an Internet traffic management application running at 2.5 Gb/s and 2) an MPEG4 video encoder (VGA resolution, at 30 frames/s). For these applications, a combination of the DSOC and SMP programming models was used in interoperable fashion. After optimization and mapping, processor utilization rates of 85%-91% were demonstrated for the traffic manager. For the MPEG4 decoder, the average processor utilization was 88%.

16.

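Implementing an AVS Decoder on the OMAP3530 Platform (total citations: 1; self-citations: 0; citations by others: 1)

展鹏飞 张杰飞 张刚《电视技术》2014,38(1)

Discusses in detail the dual-core communication mechanism of the OMAP architecture and analyzes core modules such as Codec Engine, DSPlink, and CMEM. An AVS video decoding system is then implemented on the OMAP3530 development platform, in which the ARM processor receives network data and displays images while the DSP processor decodes the AVS bitstream. The AVS video decoder was ported to the 3530 development board under Linux and reaches 25 frames/s at D1 resolution.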

17.

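Implementation of H.264 Motion Compensation Decoding on the Blackfin533 Hardware Platform

王凯 干宗良 朱秀昌《电视技术》2006,(10):30-32,35

Using optimization measures such as C code optimization, DMA techniques, and rewriting in assembly, the motion compensation decoding module of an H.264 decoder was optimized on the Blackfin533. Experimental test results show that these measures substantially improve computational efficiency.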

18.

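Design of a Video Capture System Based on the TMS320DM6467

文武 吴勇 张杰《电视技术》2011,35(17):39-41

Using TI's DaVinci-series digital media processor DM6467 as the platform and the TVP5158 decoder, an 8-channel real-time video capture system was implemented. The interface circuit was designed, and drivers were developed for the multichannel video port and for the V4L2-based DaVinci video interface. Finally, the captured video frames are sent through the VPIF interface to an LCD device for display.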

19.

A Load Balancing Algorithm for Heterogeneous Signal Processing Platforms

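沈小龙 马金全 胡泽明 李宇东《电讯技术》2023,63(12):1978-1984

Current scheduling algorithms for signal processing applications on heterogeneous signal processing platforms optimize a single objective and produce schedules with unbalanced processor loads. To address this, a load balancing algorithm based on ant colony optimization is proposed. The algorithm combines the fast search and combinatorial optimization capabilities of ant colony optimization and takes both the schedule length of the signal processing application and processor load balance as optimization objectives. It improves the initial pheromone matrix and the ants' traversal order, introduces a schedule-length heuristic factor and a load-balancing heuristic factor to improve the processor selection formula, and uses a roulette wheel strategy to decide which processor each subtask of the application is assigned to, thereby completing the scheduling of the application. Simulation results show that the resulting schedules improve in both schedule length and load balance, making full use of each processor's performance and raising the overall efficiency of the heterogeneous signal processing platform.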
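
The roulette wheel step mentioned in this abstract can be sketched as follows. This is a minimal illustration only, not the authors' method: the abstract does not give the processor selection formula, so the product form used here (pheromone times the two heuristic factors, each raised to an exponent) and the names `selection_probabilities`, `roulette_select`, `alpha`, `beta`, and `gamma` are assumptions borrowed from the generic ant colony formulation.

```python
import random


def selection_probabilities(pheromone, length_heuristic, balance_heuristic,
                            alpha=1.0, beta=1.0, gamma=1.0):
    """Turn per-processor scores into selection probabilities.

    Assumed (hypothetical) form: weight = pheromone^alpha
    * length_heuristic^beta * balance_heuristic^gamma, normalized to sum to 1.
    """
    weights = [
        (t ** alpha) * (h ** beta) * (b ** gamma)
        for t, h, b in zip(pheromone, length_heuristic, balance_heuristic)
    ]
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


def roulette_select(probabilities, rng=random):
    """Pick an index with probability proportional to its slice of the wheel."""
    r = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return index
    # Guard against floating-point rounding leaving r just above the final sum.
    return len(probabilities) - 1


if __name__ == "__main__":
    # Three hypothetical processors competing for one subtask.
    probs = selection_probabilities(
        pheromone=[1.0, 1.0, 1.0],
        length_heuristic=[0.5, 1.0, 2.0],   # e.g. inverse of finish time
        balance_heuristic=[1.0, 1.0, 0.5],  # e.g. lower for busier processors
    )
    print("probabilities:", probs)
    print("chosen processor:", roulette_select(probs))
```

Sampling in proportion to the weights, rather than always taking the best-scoring processor, keeps some exploration in the search, which is the usual reason ant colony methods use a roulette wheel at this step.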

20.

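Clock Tree Implementation of a Cortex A9 in the SMIC 40nm Low-Power Process Based on the CCopt Engine

王建中《中国集成电路》2012,(9):55-58,64

As smartphone platforms and tablet computers demand ever higher chip performance and shorter time to market, back-end engineers face growing design pressure. Traditional digital implementation flows are reaching their limits in meeting the power, frequency, and area requirements of today's SoC designs, so improving a chip's power, frequency, and area within a very short time has become especially important. Based on an actual physical design of an ARM Cortex A9 in the SMIC 40nm low-power process, this article describes in detail how to use Cadence's latest clock concurrent optimization technology, known as CCopt, to unify clock tree synthesis and physical optimization. Judging by the implementation results, the CCopt engine met its targets well: design frequency improved by 8%, and clock tree power and area were reduced. Cadence's latest CCopt engine offers significant advantages for implementing the physical design of complex chips, shortening design cycles, and improving chip performance.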