Found 20 similar documents (search time: 961 ms)
1.
In this paper, we present a fine-grained parallel implementation of the MPEG-2 video encoder on the Intel Paragon XP/S parallel computer. We use a data-parallel approach and exploit parallelism within each frame, unlike some previous approaches that process several disjoint video sequences in parallel. This makes our encoder suitable for real-time applications where the complete video sequence may not be present on disk and may instead become available frame by frame over time. The Express parallel programming environment is employed as the underlying message-passing system, making our encoder portable across a wide range of parallel and distributed architectures. The encoder also provides control over various parameters such as the number of processors in each dimension, the size of the motion search window, buffer management, and bitrate. Moreover, it allows new and faster algorithms for different stages of the codec to be included in the program, replacing the current ones. Comparisons of execution times, speedups, and frame encoding rates using different numbers of processors are provided. An analysis of frame data distribution among multiple processors is also presented. In addition, our study reveals the degrees of parallelism and the bottlenecks in the various computational modules of the MPEG-2 algorithm. We have used two motion estimation techniques and five different video sequences for our experiments. Using maximum parallelism by assigning one block per processor, an encoding rate higher than 30 frames/s has been achieved.
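The frame data distribution mentioned in this abstract can be illustrated with a minimal sketch. It assumes the frame is measured in blocks and split as evenly as possible over a 2-D processor grid; the function names `tile_bounds` and `distribute_frame` are hypothetical and are not taken from the paper.

```python
def tile_bounds(n_items, n_parts):
    """Split n_items into n_parts contiguous ranges whose sizes differ by at most 1."""
    base, extra = divmod(n_items, n_parts)
    bounds, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)  # the first `extra` parts get one more item
        bounds.append((start, start + size))
        start += size
    return bounds


def distribute_frame(block_cols, block_rows, procs_x, procs_y):
    """Map each processor (px, py) to its (column range, row range) of blocks."""
    cols = tile_bounds(block_cols, procs_x)
    rows = tile_bounds(block_rows, procs_y)
    return {(px, py): (cols[px], rows[py])
            for py in range(procs_y) for px in range(procs_x)}
```

With as many processors as blocks in each dimension, every processor receives exactly one block, which corresponds to the maximum-parallelism configuration the abstract describes.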
2.
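To address the load imbalance that arises in parallel HEVC video encoders using uniform slice partitioning, a load-balancing algorithm for multi-slice parallel HEVC encoders is proposed. Starting from the encoding parameters, the relationship between encoding time and factors such as the quantization parameter, the number of reference frames, and the group of pictures is analyzed, and an encoding-time prediction model based on the encoding parameters is proposed. Using the encoding information of already-encoded frames that are adjacent in position and in temporal layer, together with the actual encoding parameters, the model predicts the encoding time of the current frame; this predicted time then drives load balancing in the multi-slice parallel HEVC encoder. Experimental results show that, compared with the existing uniform slice partitioning method, the proposed method improves the speedup by about 9.23%, with almost negligible loss in coding performance.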
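As a rough illustration of cost-driven slice partitioning, the sketch below splits coding-block rows into contiguous slices of similar predicted cost. The per-row costs are assumed to come from some prediction model; the abstract does not give the model, so `row_costs` is a hypothetical input and `balanced_slices` is an illustrative name, not the authors' algorithm.

```python
from itertools import accumulate


def balanced_slices(row_costs, num_slices):
    """Split rows into contiguous slices with similar total predicted cost.

    Returns a list of (start_row, end_row_exclusive) pairs.
    """
    n = len(row_costs)
    if not 1 <= num_slices <= n:
        raise ValueError("need between 1 and len(row_costs) slices")
    prefix = [0.0] + list(accumulate(row_costs))  # prefix[i] = cost of rows [0, i)
    total = prefix[-1]
    bounds = [0]
    for k in range(1, num_slices):
        target = total * k / num_slices       # ideal cumulative cost at the k-th cut
        lo = bounds[-1] + 1                   # every slice keeps at least one row
        hi = n - (num_slices - k)             # leave one row for each remaining slice
        cut = min(range(lo, hi + 1), key=lambda i: abs(prefix[i] - target))
        bounds.append(cut)
    bounds.append(n)
    return list(zip(bounds[:-1], bounds[1:]))
```

For predicted costs `[4, 4, 1, 1, 1, 1, 2, 2]` and two slices, a uniform split gives slice costs 10 and 6, while this greedy cut places the boundary after the second row so that both slices cost 8.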
3.
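Based on the characteristics of the new H.264 video compression standard, Intel's Hyper-Threading Technology and OpenMP can be used to give a software encoder thread-level parallelism, greatly increasing encoding speed. This paper analyzes Hyper-Threading Technology, OpenMP, and the different levels of parallelism in the encoding process. It demonstrates that parallelizing the H.264 encoder with OpenMP, combined with a processor that supports Hyper-Threading, will greatly improve encoder performance.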
4.
5.
MPEG-4, currently being finalized by the Moving Picture Experts Group of the ISO, is a multimedia standard that aims to support content-based coding of audio, text, image, and video (synthetic and natural) data, multiplexing of coded data, and composition and representation of audiovisual scenes. One of the most critical components of an MPEG-4 environment is the system encoder. An MPEG-4 scene may contain several audio and video objects, images, and text, each of which must be encoded individually and then multiplexed to form the system bitstream. Due to its flexible features, object-based nature, and provision for user interaction, the MPEG-4 encoder is highly suitable for a software-based implementation. Building a full-scale software-based MPEG-4 system encoder with real-time encoding speed is a nontrivial task and requires massive computation. We have built such an encoder using a cluster of workstations collectively working as a virtual parallel machine. Parallel processing in the MPEG-4 encoder needs to be carried out carefully, as objects may appear or disappear dynamically in a scene. In addition, objects may be synchronized with each other, and user interactions may prohibit a straightforward parallelization. We propose a modeling methodology that captures the spatio-temporal relationships between the various objects and user interaction. We then propose a number of scheduling algorithms that periodically allocate MPEG-4 objects to multiple workstations while ensuring load balancing and meeting the synchronization requirements among multiple objects. Each scheduling algorithm has its own performance and complexity characteristics. The experimental results, while showing real-time encoding rates, exhibit tradeoffs between load balancing, scheduling overhead cost, and global performance.
6.
Dongsan Jun Sung-Chang Lim Jinho Lee Hahyun Lee Jongho Kim Jungwon Kang Jinwook Seok Younhee Kim Soon-heung Jung Hui-Yong Kim Jin Soo Choi 《The Journal of supercomputing》2017,73(3):940-960
High efficiency video coding (HEVC) is the newest video coding standard and provides powerful compression for the increased picture resolutions of ultrahigh definition (UHD) video. Compared to the previous standard, HEVC doubles coding efficiency at the cost of a tremendous increase in encoder computational complexity, which makes it difficult to support commercial UHD video services. In particular, an optimized HEVC encoder for UHD is expected to be deployed as a key technology for emerging smart surveillance systems in Internet of Things environments. A single-instruction-multiple-data implementation on an Intel x86 processor and several fast encoding schemes were investigated to reduce the complexity of the HEVC reference model (HM) encoder. The fast encoding schemes included early termination processes and data-level parallel processing. The computational complexity of the proposed HEVC encoder was reduced by a factor of approximately 192 compared with the HM encoder, with an acceptable coding loss.
7.
8.
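A multi-granularity parallel H.264 encoder based on homogeneous multi-core processors  Total citations: 2, self-citations: 0, citations by others: 2
The low bit rate and high video quality of H.264 come at the cost of increased encoding complexity. Developing parallel encoding algorithms suited to multi-core processor platforms is an important research topic for raising encoding speed, and it matters greatly for real-time transmission and large-scale sharing of high-definition video. Using the open-source H.264 encoder project X264 and building on slice-level and data-level parallel encoding algorithms, the authors analyze the reference relationships between frames and propose and implement a frame-level parallel algorithm with a variable number of B-frames. Based on the reference relationships between macroblocks, they design a pipeline-like macroblock-level parallel method. On an Intel homogeneous multi-core platform, they propose a parallel encoding scheme that combines four granularities (frame level, slice level, macroblock level, and data level) and develop a multi-granularity parallel H.264 encoder. Experimental results show that, with only a small increase in bit rate, the multi-granularity parallel H.264 encoder achieves a good encoding speedup, and the encoded video meets high-quality requirements.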
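A pipeline-like macroblock-level scheme can be illustrated with the commonly used wavefront ordering. The sketch assumes each macroblock depends on its left, top-left, top, and top-right neighbours; the abstract does not spell out the authors' exact dependency model, so this pattern and the name `wavefront_schedule` are assumptions for illustration only.

```python
def wavefront_schedule(mb_cols, mb_rows):
    """Group macroblocks (x, y) into waves that can be processed in parallel.

    Assumes (x, y) depends on (x-1, y), (x-1, y-1), (x, y-1) and (x+1, y-1),
    so it becomes ready at step x + 2*y.
    """
    waves = {}
    for y in range(mb_rows):
        for x in range(mb_cols):
            waves.setdefault(x + 2 * y, []).append((x, y))
    return [waves[step] for step in sorted(waves)]
```

Each row starts two steps after the row above it, so the rows advance like stages of a pipeline; all macroblocks within one wave are mutually independent under the assumed dependencies.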
9.
10.
11.
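This paper presents the algorithm design and implementation of a multi-slice parallel real-time AVS encoder. Slice-based parallel encoding offers fast processing, low latency, and simple data handling; its drawback is a striping artifact. A method for reducing the striping artifact caused by slice-based parallel encoding is described. Experimental results show that a real-time standard-definition AVS encoder is feasible.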
12.
13.
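A configurable integer transform unit is proposed and used in the adaptive transform module of an H.264/AVC High Profile video encoder. Configured by a transform-type signal, the unit performs the corresponding transform operation. The design was implemented and verified on an Altera Cyclone II series FPGA. The maximum operating frequency after place and route is 63 MHz, and a transform module with four configurable transform units can meet the real-time encoding requirement of HD 1080p video at 50 frames/s.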
14.
Document-level machine translation (MT) remains challenging because it is difficult to use document-level global context efficiently during translation. In this paper, we propose a hierarchical model to learn the global context for document-level neural machine translation (NMT). This is done through a sentence encoder that captures intra-sentence dependencies and a document encoder that models document-level inter-sentence consistency and coherence. With this hierarchical architecture, we feed the extracted document-level global context back to each word in a top-down fashion to distinguish different translations of a word according to its specific surrounding context. Notably, we explore the effect of three popular attention functions during the information backward-distribution phase to take a closer look at how our model distributes global context information. In addition, since large-scale in-domain document-level parallel corpora are usually unavailable, we use a two-step training strategy that takes advantage of a large-scale corpus of out-of-domain parallel sentence pairs and a small-scale corpus of in-domain parallel document pairs to achieve domain adaptability. Experiments on Chinese-English and English-German corpora show that our model improves on the Transformer baseline by 4.5 BLEU points on average, which demonstrates the effectiveness of the proposed hierarchical model in document-level NMT.
15.
Theoretical aspects of encoding cyclic redundancy codes (CRCs) are reviewed. A method of designing hardware parallel encoders for CRCs that is based on digital system theory and z-transforms is presented. It allows designers to derive the logic equations of the parallel encoder circuit for any generator polynomial. A few interesting application areas for hardware parallel encoders are pointed out.
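To make the idea of parallel encoder logic equations concrete, the sketch below derives them by symbolically stepping the serial shift register over GF(2). This is a swapped-in technique, not the z-transform method the abstract describes; the function names and the MSB-first, zero-initialized register convention are assumptions.

```python
def serial_crc_step(poly, n, state, bit):
    """One step of an MSB-first serial CRC register of width n (poly without x^n term)."""
    feedback = ((state >> (n - 1)) & 1) ^ bit
    state = (state << 1) & ((1 << n) - 1)
    return state ^ poly if feedback else state


def parallel_crc_equations(poly, n, w):
    """Derive XOR equations that advance the register by w input bits at once.

    Each returned integer is a mask over symbols: bit k < n means "current
    state bit k", bit n + j means "input bit j" (j = 0 is shifted in first).
    Next-state bit i is the XOR of all symbols set in equations[i].
    """
    state = [1 << k for k in range(n)]            # state bit k starts as itself
    for j in range(w):
        feedback = state[n - 1] ^ (1 << (n + j))  # symbolic XOR of top bit and input j
        state = [(state[i - 1] if i > 0 else 0) ^
                 (feedback if (poly >> i) & 1 else 0)
                 for i in range(n)]
    return state


def apply_equations(equations, state, data_bits):
    """Evaluate the parallel equations for a concrete state and w input bits."""
    n = len(equations)
    values = state | (sum(b << j for j, b in enumerate(data_bits)) << n)
    return sum((bin(eq & values).count("1") & 1) << i
               for i, eq in enumerate(equations))


def crc_parallel(poly, n, data, equations):
    """CRC of a byte string using byte-wide (w = 8) parallel updates."""
    state = 0
    for byte in data:
        bits = [(byte >> (7 - j)) & 1 for j in range(8)]  # MSB first
        state = apply_equations(equations, state, bits)
    return state
```

Each mask in the result corresponds to one XOR gate tree in a hardware encoder: the set bits name the register outputs and data inputs that feed it. For example, `parallel_crc_equations(0x07, 8, 8)` yields eight equations for a byte-wide CRC-8 update with generator x^8 + x^2 + x + 1.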
16.
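The H.264 video coding standard adopts many new techniques and achieves better coding efficiency, but it also increases computational complexity, so that it cannot meet the needs of real-time applications. The parallel arithmetic capability of the Streaming SIMD Extensions 2 (SSE2) instruction set can improve a computer's real-time processing of multimedia data. This paper applies SSE2 instruction-level optimization to several time-consuming key modules of H.264: SAD computation in integer-pixel motion estimation, the integer DCT, quantization, the Hadamard transform, and SATD computation in sub-pixel motion estimation. Experimental results show that, with video quality preserved, the optimized modules run faster, so that the overall encoding speed of the H.264 encoder better meets real-time requirements.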
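The two block-matching costs named in this abstract can be written out in scalar form to show what the SIMD instructions vectorize. This is a plain-Python sketch for 4x4 blocks, not the paper's SSE2 code; the SATD here is left unnormalized, which is an assumption, since encoders often scale the sum.

```python
# 4x4 Hadamard matrix with entries +1 / -1
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def matmul4(left, right):
    """Multiply two 4x4 integer matrices."""
    return [[sum(left[i][k] * right[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(block_a, block_b):
    """Sum of absolute Hadamard-transformed differences of two 4x4 blocks (unnormalized)."""
    diff = [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(block_a, block_b)]
    h_transposed = [list(col) for col in zip(*H4)]
    coeffs = matmul4(matmul4(H4, diff), h_transposed)  # H * D * H^T
    return sum(abs(c) for row in coeffs for c in row)
```

The inner loops are independent across pixels, which is what makes them a natural fit for data-parallel instructions that process many 8-bit or 16-bit values per operation.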
17.
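Taking single-instruction-multiple-data parallel computation as the starting point, object-oriented ideas are introduced into the parallel computation of SAD values, and an improved image organization optimization algorithm is given. Motion prediction tests on several standard test sequences show that, for H.264/AVC, currently the most complex video coding standard, the algorithm clearly increases the encoding speed of the encoder, which helps make real-time video communication over narrowband links possible.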
18.
In this paper, we introduce a turbo-coded modulation scheme called multilevel turbo coded-continuous phase frequency shift keying (MLTC-CPFSK). The underlying basis of multilevel coding is to partition a signal set into several levels and to encode each level separately through the respective layer of the encoder. In MLTC-CPFSK, to provide phase continuity of the signals, a turbo encoder and a continuous phase encoder (CPE) are serially concatenated at the last level, while all other levels consist of only a turbo encoder. Therefore, the proposed system contains multiple turbo encoder/decoder blocks in its architecture. The parallel input data sequences are encoded by our multilevel scheme and mapped to CPFSK signals. Then, for the purpose of performance analysis, these modulated signals are passed through AWGN and fading channels. At the receiver side, the input sequence of the first level is estimated by the first turbo decoder block. Subsequently, the input sequences of the other levels are computed using the estimated input bit streams of the respective previous levels. Simulation results are presented for two-level 4-ary CPFSK and three-level 8-ary CPFSK turbo codes over AWGN, Rician, and Rayleigh channels for three iterations, with frame sizes of 100 and 1024. It is concluded that satisfactory performance is achieved in MLTC-CPFSK systems for all SNR values in various fading environments.
19.
20.
Recent advances in semiconductor technologies make it possible to integrate many processor cores in a small device package. The parallel execution capability of such multi-core processors can be exploited to enhance the performance of many traditional sequential applications. There have been numerous research activities to develop parallelization techniques using the OpenMP programming model in order to speed up sequential applications such as the H.264/AVC codec, but mostly in the PC environment. Therefore, it is difficult to understand which parallelization technique fits well with the H.264/AVC encoder on an embedded multi-core architecture. In this paper, we present parallelization techniques applicable to the H.264/AVC encoder on ARM MPCore using the OpenMP programming model. Further, we propose an analytical model for the performance estimation of the H.264/AVC encoder, and we then verify the model accuracy by performing simulations using a hardware/software co-verification tool. Our experimental results show that the parallelization techniques proposed in this paper for the embedded multi-core platform improve the encoder performance by up to 2.36 times, and that the parallelization technique exploiting data-level parallelism outperforms the one using task-level parallelism by 41%. It is also observed that balancing loads among processor cores is critical to achieving better scalability in the encoder.