
1.

An optimized parallel order scheme of the deblocking filtering process for enhancing the performance of the HEVC standard using GPUs

Mohamed M. Fouad Richard M. Dansereau 《Multimedia Tools and Applications》2017,76(23):24609-24634

In HEVC, deblocking filtering (DF) is responsible for about 20% of the time consumed to perform video compression. In a typical parallel DF scheme, a set of horizontal and vertical edges is processed using deblocking filters. In conventional parallel DF schemes, deblocking filters may be applied to the same edges more than once. Moreover, some edges are assigned to cores for filtering even though they are not designated to be filtered. Accordingly, the parallel hardware architecture used requires more on-chip memory modules. These challenges negatively affect HEVC performance, increasing computational complexity. In this paper, an optimized parallel DF scheme is proposed for HEVC using graphics processing units (GPUs). The proposed scheme outperforms competing ones in reducing the decoding time of all frames of video sequences, with average speed-up factors of 2.83 and 2.45 in the all-intra and low-delay video coding configuration modes, respectively. The proposal does not change the rate-distortion between the decoded video sequences and their original sequences.

2.

New access modes of parallel memory subsystem for sub-pixel motion estimation

Radomir Jakovljević Aleksandar Berić Edwin van Dalen Dragan Milićev 《Journal of Real-Time Image Processing》2018,15(2):279-296

Accessing pixels in memory is a well-known bottleneck of SIMD (single instruction multiple data) processors in video/imaging. To tackle it, we propose new block and row access modes of a parallel on-chip memory subsystem, which enable higher processing throughput and lower energy consumption than the access modes of state-of-the-art subsystems. The new access modes significantly reduce the number of on-chip memory accesses, and thereby accelerate one of the key video/imaging kernels: sub-pixel block-matching motion estimation. The main idea is to exploit spatial overlaps of blocks/rows accessed for pixel interpolation, which are known at subsystem design time, and merge multiple accesses into a single one by accessing somewhat more pixels at a time than with other parallel memories. To avoid the need for a wider, and therefore more costly, SIMD datapath, we propose new memory read operations that split all pixels accessed at a time into multiple SIMD-wide blocks/rows, in a way that is convenient for further processing. As a proof of concept, we describe a parametric, scalable, and cost-efficient architecture that supports the new access modes. The architecture is based on a previously proposed set of memory banks with multiple pixels per bank word, and a previously proposed shifted scheme for arranging pixels in the banks. We analytically and experimentally demonstrate the advantages of this work in a case study of sub-pixel motion estimation for video frame-rate conversion. The implemented motion estimator processes 2160p video at 60 fps in real time, while clocked at 600 MHz. Compared to implementations based on state-of-the-art subsystems, this work enables 40–70 % higher throughput, consumes 17–44 % less energy, and has similar silicon area and off-chip memory bandwidth costs. That is 1.8–2.9 times more efficient than the prior art, considering the throughput and all costs, i.e., consumption, area, and off-chip bandwidth. This higher efficiency results from the new access modes, which reduce the number of on-chip memory accesses by 1.6–2.1 times, and from the cost-efficient architecture.

3.

On-chip debugging and real-time trace support for a superscalar DSP

王刚 张盛兵 黄嵩人 《计算机应用研究》2012,29(1):207-210

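To address the increasingly severe debugging challenges of embedded systems, an on-chip debugging and real-time trace architecture based on a 32-bit superscalar DSP core is proposed and implemented. By designing a dedicated trace interface and other hardware resources, and by extending the JTAG port, the memory protection logic, and the pipeline control logic, the architecture supports a set of typical features at low hardware overhead: real-time run control of the core, non-intrusive access to internal registers and memory, breakpoints and watchpoints with complex trigger conditions, hardware single-stepping, and real-time program flow tracing. It can meet the development and debugging needs of the vast majority of embedded systems.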

4.

Buffer structure optimized VLSI architecture for efficient hierarchical integer pixel motion estimation implementation

Haibing Yin Dong Sun Park Xiao Yun Zhang 《Journal of Real-Time Image Processing》2016,11(3):507-525

Integer pixel motion estimation (IME) is a crucial, high-complexity module in a high-definition video encoder. Joint algorithm and architecture design must trade off multiple target parameters, including throughput capacity, logic gate count, on-chip SRAM size, memory bandwidth, and rate-distortion performance. Data organization and on-chip buffer structure are crucial factors in IME architecture design because they determine this trade-off. In this work, we combine global hierarchical search and local full search into a hardware-efficient IME algorithm, and then propose a hardware VLSI architecture with an optimized on-chip buffer structure. The major contributions of this work are: (1) an improved hierarchical IME algorithm with presearch and deliberate data organization, (2) a multistage on-chip reference pixel buffer structure with high data reuse between integer and fractional pixel motion estimation, and (3) a highly reused and reconfigurable processing element structure. The optimized data organization and buffer structure achieve nearly 70 % buffer saving, with PSNR degradation of less than 0.08 dB on average and 0.12 dB in the worst case compared with a full-search-based architecture. At a hardware cost of 336 K and 382 K logic gates and 20 kB of SRAM, the proposed architecture achieves throughputs of 384 and 272 cycles per macroblock at system frequencies of 95 and 264 MHz for 1080p and QFHD video coding at 30 fps, respectively.

5.

Three-level pipelined multi-resolution integer motion estimation engine with optimized reference data sharing search for AVS

Xiaofeng Huang Kaijin Wei Haibing Yin Chuang Zhu Huizhu Jia Don Xie 《Journal of Real-Time Image Processing》2018,15(1):43-55

Integer motion estimation (IME), a key component of a video encoder, removes temporal redundancies by searching for the best integer motion vectors for dynamic partition blocks in a macro-block (MB). Huge memory bandwidth requirements and heavy computational resource demands are two key bottlenecks in IME engine design, especially for large search windows (SW). In this paper, a three-level pipelined VLSI architecture is proposed, which efficiently integrates the reference data sharing search (RDSS) into a multi-resolution motion estimation algorithm (MMEA). First, a hardware-friendly MMEA is mapped onto a three-level pipelined architecture with negligible coding quality loss. Second, sub-sampled RDSS coupled with Level C+ is adopted to reduce on-chip memory and bandwidth at the coarsest and middle levels. Data sharing between IME and fractional motion estimation (FME) is achieved by loading only a local predictive SW at the finest level. Finally, the three levels are parallelized and pipelined to guarantee the gradual refinement of MMEA and the hardware utilization. Experimental results show that the proposed architecture reaches a good balance among complexity, on-chip memory, bandwidth, and data flow regularity. Only 320 processing elements (PE) and 550 cycles are required for the IME search with the SW set to 256 × 256. The architecture achieves 1080P@30 fps real-time processing at a working frequency of 134.6 MHz, with 135 K gates and 8.93 KB of on-chip memory.

6.

A new DCT audio watermarking scheme based on preliminary MP3 study

Maha Charfeddine Maher El’arbi Chokri Ben Amar 《Multimedia Tools and Applications》2014,70(3):1521-1557

In this paper, a new audio watermarking scheme operating in the frequency domain and based on a neural network architecture is described. The watermark is hidden in the middle frequency band after performing a discrete cosine transform (DCT). Embedding and extraction of the watermark are based on a back-propagation neural network (BPNN) architecture. In addition, the selection of the frequencies and of the block hiding the watermark is based on a preliminary study of the effect of MP3 compression at several rates on the signal. Experimental results show that the proposed technique offers good robustness and perceptual quality. We also investigate the application of the proposed technique to video watermarking. Traditional techniques have used the audio channel as a supplementary embedding space and adopted state-of-the-art techniques that resist the MP3 compression attack. In these techniques, the MPEG compression attack is evaluated only on the video part, and the audio part is left unaffected. In this paper, we adapt the preliminary MP3 study to the video watermarking technique, with a preliminary study of the MPEG compression applied to the audio channel. Here again, we observe that applying the preliminary MPEG study to the audio channel improves the robustness of the video watermarking scheme while keeping high-quality watermarked video sequences.

7.

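A copyright protection scheme for digital television broadcasting systems (total citations: 4, self-citations: 1, citations by others: 4)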

范科峰 莫玮 赵新华 《网络安全技术与应用》2005,87(6):58-60

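This paper studies copyright protection schemes for digital television broadcasting and proposes protecting copyright at two stages: the video interface and program content transmission. For the video interface, a secure authentication and encryption mechanism is proposed; at the receiving or recording device, an authorized decryption method enables playback and recording of the encrypted data stream. For content transmission, a frequency-domain active watermarking algorithm is proposed for broadcast monitoring. The paper also discusses a watermark-based access control model for digital television network systems, together with the principles of digital watermark embedding and extraction.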

8.

Design of an FPGA-based real-time video processing platform

林辉 吴黎明 潘启军 《计算机测量与控制》2012,20(1):196-198,201

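To capture, process, and display video in real time, a real-time video processing platform based on a dual PowerPC hard-core architecture is designed and implemented. The video preprocessing algorithm is implemented in hardware and added to the hardware system as a user IP core, and the upper-layer video processing software reads the preprocessed image data directly from memory. The paper focuses on building the dual PowerPC hard-core hardware system on an FPGA. A ping-pong control algorithm buffers one line of image data, and DMA is used to store the image data in memory. Edge detection is implemented on the platform as an example of a video preprocessing algorithm; experimental results show that it takes only 40 ms on this platform. The platform can process video in real time and has high practical value.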
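The ping-pong line buffering mentioned in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract gives no implementation details, so the class name `PingPongLineBuffer`, the function `stream_lines`, and the pixel-at-a-time interface are hypothetical, and real hardware would fill one buffer while the other is read concurrently rather than sequentially.

```python
class PingPongLineBuffer:
    """Software model of two line buffers that alternate roles (hypothetical names)."""

    def __init__(self, line_width):
        self.line_width = line_width
        self.buffers = [[0] * line_width, [0] * line_width]
        self.write_index = 0  # buffer currently being filled
        self.fill_pos = 0     # next pixel position in the write buffer

    def push_pixel(self, pixel):
        """Store one pixel; return the completed line when the buffer fills, else None."""
        self.buffers[self.write_index][self.fill_pos] = pixel
        self.fill_pos += 1
        if self.fill_pos < self.line_width:
            return None
        # Line complete: swap roles so the writer moves on to the other buffer
        # while this one is handed to the reader.
        full_line = self.buffers[self.write_index]
        self.write_index ^= 1
        self.fill_pos = 0
        return full_line


def stream_lines(pixels, line_width):
    """Yield complete image lines from a flat pixel stream; a trailing partial line is dropped."""
    buf = PingPongLineBuffer(line_width)
    for pixel in pixels:
        line = buf.push_pixel(pixel)
        if line is not None:
            # Copy the line out before the buffer is reused, standing in for
            # the transfer to memory described in the abstract.
            yield list(line)
```

For example, `list(stream_lines(range(6), 3))` groups the six pixels into two lines of three. The point of the two-buffer arrangement is that the producer never has to wait for the consumer to finish with the previous line.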

9.

A fast and scalable architecture to run convolutional neural networks in low density FPGAs

《Microprocessors and Microsystems》2020

Deep learning and, in particular, convolutional neural networks (CNNs) achieve very good results in several computer vision applications, such as security and surveillance, where image and video analysis is required. These networks are quite demanding in terms of computation and memory and are therefore usually implemented on high-performance computing platforms or devices. Running CNNs on embedded platforms or devices with low computational and memory resources requires careful optimization of system architectures and algorithms to obtain very efficient designs. In this context, Field Programmable Gate Arrays (FPGAs) can achieve this efficiency, since the programmable hardware fabric can be tailored to each specific network. In this paper, a very efficient configurable architecture for CNN inference targeting FPGAs of any density is described. The architecture uses fixed-point arithmetic and image batching to reduce computational, memory, and memory bandwidth requirements without compromising network accuracy. The developed architecture supports the execution of large CNNs on any FPGA device, including those with small on-chip memory and few logic resources. With the proposed architecture, it is possible to run inference on an image with AlexNet in 4.3 ms on a ZYNQ7020 and in 1.2 ms on a ZYNQ7045.

10.

Object based watermarking for H.264/AVC video resistant to RST attacks

Sibaji Gaj Ashish Singh Patel Arijit Sur 《Multimedia Tools and Applications》2016,75(6):3053-3080

In this paper, a compressed domain video watermarking scheme is proposed which embeds the watermark in the homogeneous moving object within a shot of a video sequence to resist geometric attacks such as rotation and scaling. Intuitively, object based watermarking results in low payload and has the least impact on visual quality, since the object area is generally small and highly textured. The proposed work has two main contributions. First, an existing compressed domain motion coherent block detection algorithm [7] is extended to detect the moving objects within a video shot. Second, a watermarking scheme is proposed that embeds within the moving objects to resist RST attacks. A comprehensive set of experiments has been carried out to justify the applicability of the proposed scheme over the existing literature.

11.

Baring it all to software: Raw machines (total citations: 2, self-citations: 0, citations by others: 2)

Waingold E. Taylor M. Srikrishna D. Sarkar V. Lee W. Lee V. Kim J. Frank M. Finch P. Barua R. Babb J. Amarasinghe S. Agarwal A. 《Computer》1997,30(9):86-93

The most radical of the architectures that appear in this issue are Raw processors: highly parallel architectures with hundreds of very simple processors, each coupled to a small portion of the on-chip memory. Each processor, or tile, also contains a small bank of configurable logic, allowing synthesis of complex operations directly in configurable hardware. Unlike the others, this architecture does not use a traditional instruction set architecture. Instead, programs are compiled directly onto the Raw hardware, with all units told explicitly what to do by the compiler. The compiler even schedules most of the intertile communication. The real limitation of this architecture is the efficacy of the compiler. The authors demonstrate impressive speedups for simple algorithms that lend themselves well to this architectural model, but whether this architecture will be effective for future workloads is an open question.

12.

A bit-domain real-time video watermarking algorithm (total citations: 1, self-citations: 0, citations by others: 1)

邹复好 卢正鼎 凌贺飞 《小型微型计算机系统》2005,26(11):2005-2008

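Among current digital watermarking algorithms, image watermarking algorithms far outnumber video watermarking algorithms, yet in practice protecting video products is more important. Besides the general properties of image watermarks, a video watermark must also satisfy real-time requirements, so the embedding and extraction algorithms must have low computational complexity. This paper proposes a bit-domain real-time video watermarking algorithm for integrity authentication of video products. Experiments show that the algorithm achieves high visual quality and can accurately localize modifications, meeting the requirements of integrity authentication watermarking.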

13.

Aging Bloom Filter with Two Active Buffers for Dynamic Sets

Yoon MyungKeun 《Knowledge and Data Engineering, IEEE Transactions on》2010,22(1):134-138

A Bloom filter is a simple but powerful data structure that can check membership in a static set. As Bloom filters become more popular for network applications, membership queries on dynamic sets are also required. Some network applications require high-speed processing of packets. For this purpose, Bloom filters should reside in a fast but small memory, SRAM. In this case, due to the limited memory size, stale data in the Bloom filter should be deleted to make space for new data. Namely, the Bloom filter needs aging, as in LRU caching. In this paper, we propose a new aging scheme for Bloom filters. The proposed scheme utilizes the memory space more efficiently than double buffering, the current state of the art. We prove theoretically that the proposed scheme outperforms double buffering. We also perform experiments on real Internet traces to verify the effectiveness of the proposed scheme.
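To make the aging idea concrete, the sketch below shows a simplified form of the double-buffering baseline that the abstract compares against. It is not the scheme proposed in the paper, whose details the abstract does not give. The class names, the SHA-256-based hashing, and the swap-when-full policy are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over a fixed-size bit array."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

    def clear(self):
        self.bits = bytearray(self.num_bits)


class DoubleBufferedBloomFilter:
    """Simplified aging: remembers roughly the last `capacity` to 2 * `capacity` items."""

    def __init__(self, num_bits, num_hashes, capacity):
        self.current = BloomFilter(num_bits, num_hashes)
        self.previous = BloomFilter(num_bits, num_hashes)
        self.capacity = capacity  # insertions allowed before the buffers swap
        self.count = 0

    def add(self, item):
        if self.count >= self.capacity:
            # Age out: the older buffer is discarded and refilled from empty.
            self.previous, self.current = self.current, self.previous
            self.current.clear()
            self.count = 0
        self.current.add(item)
        self.count += 1

    def __contains__(self, item):
        # An item is remembered if either buffer still holds it.
        return item in self.current or item in self.previous
```

In this simplified form, half of the memory holds data that is about to be discarded, which is the kind of inefficiency the abstract says the proposed scheme improves on.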

14.

Real-time video watermarking system on the compressed domain for high-definition video contents: Practical issues

Min-Jeong Lee Dong-Hyuck Im Hae-Yeoun Lee Kyung-Su Kim Heung-Kyu Lee 《Digital Signal Processing》2012,22(1):190-198

Every day, we encounter high-quality multimedia content from HDTV broadcasting, DVD, and high-speed Internet services. Unfortunately, this content is processed and distributed without protection. This paper proposes a practical compressed-domain video watermarking technique that runs in real time and is robust against video processing attacks. In particular, we focus on video processing that is commonly used in practice, such as resolution downscaling, frame-rate changing, and transcoding. Most previous watermarking algorithms fail to survive when this processing is strong or composite. We quickly extract low-frequency coefficients of frames by partially decoding videos, and apply a quantization index modulation scheme to embed and detect the watermark. We implement a prototype system on an Intel architecture computer and measure its performance against video processing attacks that frequently occur in the real world. Simulation results show that our video watermarking system satisfies real-time requirements and is robust enough to protect the copyright of HD video content.
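The quantization index modulation (QIM) step named in this abstract can be sketched for a single coefficient as follows. This is a textbook scalar QIM under assumptions, not the paper's exact embedding rule: the function names, the step size, and the choice of two lattices offset by half a step are illustrative.

```python
def qim_embed(coeff, bit, step):
    """Quantize `coeff` onto the lattice that encodes `bit` (0 or 1).

    Bit 0 uses multiples of `step`; bit 1 uses the same lattice shifted by step / 2.
    """
    offset = bit * step / 2.0
    return step * round((coeff - offset) / step) + offset


def qim_detect(coeff, step):
    """Return the bit whose lattice point lies closest to the received `coeff`."""
    dist_to_zero_lattice = abs(coeff - qim_embed(coeff, 0, step))
    dist_to_one_lattice = abs(coeff - qim_embed(coeff, 1, step))
    return 0 if dist_to_zero_lattice <= dist_to_one_lattice else 1
```

With a step of 8, a marked coefficient survives any distortion smaller than a quarter step (2) before the detector picks the wrong lattice. A larger step gives more robustness at the price of more visible distortion, which is the trade-off a practical system has to tune.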

15.

Modular Neural Tile Architecture for Compact Embedded Hardware Spiking Neural Network

Sandeep Pande Fearghal Morgan Seamus Cawley Tom Bruintjes Gerard Smit Brian McGinley Snaider Carrillo Jim Harkin Liam McDaid 《Neural Processing Letters》2013,38(2):131-153

Biologically-inspired packet switched network on chip (NoC) based hardware spiking neural network (SNN) architectures have been proposed as an embedded computing platform for classification, estimation and control applications. Storing the large synaptic connectivity (SNN topology) information in SNNs requires large distributed on-chip memory, which poses serious challenges for compact hardware implementation of such architectures. Based on the structured neural organisation observed in the human brain, a modular neural network (MNN) design strategy partitions complex application tasks into smaller subtasks executing on distinct neural network modules, and integrates intermediate outputs in higher level functions. This paper proposes a hardware modular neural tile (MNT) architecture that reduces the SNN topology memory requirement of NoC-based hardware SNNs by using a combination of fixed and configurable synaptic connections. The proposed MNT contains a 16:16 fully-connected feed-forward SNN structure and integrates into a mesh topology NoC communication infrastructure. The SNN topology memory requirement is 50 % of that of the monolithic NoC-based hardware SNN implementation. The paper also presents a lookup table based SNN topology memory allocation technique, which further increases the memory utilisation efficiency. Overall, the area requirement of the architecture is reduced by an average of 66 % for practical SNN application topologies. The paper presents micro-architecture details of the proposed MNT and digital neuron circuit. The proposed architecture has been validated on a Xilinx Virtex-6 FPGA and synthesised using 65 nm low-power CMOS technology. The evolvable capability of the proposed MNT and its suitability for executing subtasks within an MNN execution architecture are demonstrated by successfully evolving benchmark SNN application tasks representing classification and non-linear control functions. The paper addresses hardware modular SNN design and implementation challenges and contributes to the development of a compact hardware modular SNN architecture suitable for embedded applications.

16.

Design and implementation of motion compensator in memory reduced HDTV decoder with embedded compression engine

Hongli Gao Fei Qiao Huazhong Yang 《Multimedia Tools and Applications》2012,56(3):597-614

In this paper, a low-cost compatible motion compensator is implemented and integrated into a macroblock-level three-stage-pipelined HDTV decoder, in which an embedded compression (EC) engine is realized as well. The decoder with the EC engine is designed to reduce power consumption and memory bandwidth requirements, since memory accesses are reduced. In the motion compensator, a boundary judgment scheme for reference pixel fetching is proposed to provide seamless integration of block-based EC engines into the HDTV video decoder. Furthermore, a buffer sharing mechanism is adopted to reduce the extra memory requirement introduced by EC. The reference pixel fetching unit costs only 17.3 K logic gates at a working frequency of 166.7 MHz. On average, when decoding HD1080 video sequences, a 30% reduction in memory accesses and a 24% saving in memory power consumption are achieved when a near-lossless EC algorithm is integrated in the video decoder. In other words, the proposed motion compensator makes the EC engine an integral part of a memory-reduced decoder without extra cost. Additionally, since the work in this paper is based on EC schemes, the EC design criteria are discussed, and several useful rules for selecting an EC algorithm are given for video decoders of the corresponding VLSI architecture.

17.

Pipelined architecture for real-time cost-optimized extraction of visual primitives based on FPGAs

F. Barranco M. Tomasi J. Díaz M. Vanegas E. Ros 《Digital Signal Processing》2013,23(2):675-688

This paper presents an architecture for the on-chip extraction of visual primitives: energy, orientation, disparity, and optical flow. This cost-optimized architecture processes high-resolution images in real time for real-life applications. In fact, we present a versatile architecture that may be customized for different performance requirements depending on the target application. In this case, dedicated hardware and its potential on-chip implementation on FPGA devices become an efficient solution. We have developed a multi-scale approach for the computation of the gradient-based primitives. Gradient-based methods are very popular in the literature because they provide a very competitive accuracy vs. efficiency trade-off. The hardware implementation of the system uses superscalar fine-grain pipelines to exploit the maximum degree of parallelism provided by the FPGA. The system reaches 350 and 270 VGA frames per second (fps) for the disparity and optical flow computations, respectively, in their mono-scale versions, and up to 32 fps for the multi-scale scheme extracting all the described features in parallel. In this work we also analyze the accuracy and hardware resource usage of the proposed implementation.

18.

MPEG-4 video content protection based on the non-uniform discrete cosine transform

赵险峰 李宁 邓艺 夏冰冰 《计算机应用研究》2008,25(8):2469-2473

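Because video data is large and demands real-time processing, the efficiency of the encryption method is usually the key to video content security. A new video protection method is proposed within the MPEG-4 framework. The method replaces the conventional discrete cosine transform (DCT) in video encoding and decoding with the non-uniform discrete cosine transform (NDCT), scrambles and descrambles MPEG-4 video data in the frequency domain, and uses the parameters that control the non-uniformity of the transform as the key. Because there is no dedicated cryptographic module, the time and space overhead of the whole method is comparable to that of normal encoding and decoding, and its protection effect and security meet the requirements of a wide range of applications.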

19.

Implementation of an embedded network video server based on the TMS320DM642 (total citations: 2, self-citations: 2, citations by others: 0)

鹿宝生 陈启美 丁胜军 《计算机工程与设计》2006,27(13):2362-2364

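To build a highly reliable multimedia digital video surveillance system, an embedded network video server running on the DM642 and based on the H.264 coding standard is designed. The structural features of the DM642 fixed-point DSP chip are introduced, and the main techniques are described in detail, including the hardware structure of the video server, its software design, and the network transmission and processing of video data. From the standpoint of network bandwidth and real-time encoding, and drawing on the DM642's architecture and dedicated instructions, an optimization scheme for the core coding algorithm and the software is proposed. Experimental results show that the scheme compresses video images well and meets the surveillance system's real-time requirements for video.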

20.

A video dual-watermarking algorithm based on compressed sensing (total citations: 1, self-citations: 0, citations by others: 1)

周燕 曾凡智 赵慧民 《计算机科学》2016,43(5):132-139

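To address the challenges of content protection and of detecting intra-frame and inter-frame tampering in digital video, compressed sensing theory is used to extract content features of the video as watermarks, and a dual-watermark algorithm for video protection and tamper detection is proposed. First, the compressed sensing process extracts content features of I-frame macroblocks to generate a semi-fragile content authentication watermark. Then, a binary operation on the frame number generates an integrity watermark. Finally, the compressed sensing signal reconstruction algorithm OMP (Orthogonal Matching Pursuit) is used to embed the two watermarks into the compressed measurements of the high-frequency DCT coefficients of the corresponding macroblocks in I-frames and P-frames, which improves the watermark's resistance to attacks and enables detection of video tampering. Simulation experiments show that the proposed algorithm can localize intra-frame tampering precisely to the sub-block level, and that it detects inter-frame tampering such as frame insertion, frame deletion, and frame swapping with high reliability.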