Similar Documents
1.
This paper presents a TriMedia processor extended with three reconfigurable designs for entropy decoding (ED), inverse quantization (IQ), and two-dimensional (2-D) inverse discrete cosine transform (IDCT), and assesses the performance gain that such extensions provide when performing MPEG2-compliant pel reconstruction. We first describe an extension of the TriMedia architecture, which consists of a multiple-context field programmable gate array (FPGA)-based reconfigurable functional unit (RFU), a configuration unit managing the reconfiguration of the RFU, and their associated instructions. Then, we address the computation of the ED, IQ, and 2-D IDCT tasks, and propose reconfigurable hardware support for a variable-length decoder that can decode two symbols per call (VLD-2), an inverse quantizer that can dequantize four coefficients per call (IQ-4), and a one-dimensional IDCT unit (1-D IDCT). The most important aspects of the implementation of the FPGA-mapped VLD-2, IQ-4, and 1-D IDCT units, as well as the organization of the software routines calling these FPGA-mapped computing units, are outlined. Experimental results indicate that by configuring each of the VLD-2, IQ-4, and 1-D IDCT units on a different FPGA context, and by activating the contexts as needed, the FPGA-augmented TriMedia can perform MPEG2-compliant pel reconstruction with an average speed-up of 1.4× over the standard TriMedia.
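The abstract does not give the IQ-4 datapath. The following minimal sketch shows what "four coefficients per call" means arithmetically. It assumes the MPEG-2 (ISO/IEC 13818-2) reconstruction rule F = ((2·QF + k)·W·quantiser_scale) / 32, with k = 0 for intra and k = Sign(QF) for non-intra coefficients, followed by saturation to [-2048, 2047]. Intra DC handling and mismatch control are omitted. The function names `iq4`, `sign`, and `div_trunc` are hypothetical.

```python
def sign(v: int) -> int:
    """Sign(v) as used in MPEG-2: -1, 0 or +1."""
    return (v > 0) - (v < 0)


def div_trunc(a: int, b: int) -> int:
    """Integer division truncating toward zero (MPEG-2 '/'), assuming b > 0."""
    q = abs(a) // b
    return q if a >= 0 else -q


def iq4(qf, w, quantiser_scale, intra=False):
    """Hypothetical IQ-4 call: dequantize four coefficients at once.

    qf: four quantized levels QF; w: the four matching weighting-matrix entries.
    Intra DC and mismatch control are deliberately left out of this sketch.
    """
    assert len(qf) == len(w) == 4
    out = []
    for level, weight in zip(qf, w):
        k = 0 if intra else sign(level)          # k = Sign(QF) for non-intra blocks
        f = div_trunc((2 * level + k) * weight * quantiser_scale, 32)
        out.append(max(-2048, min(2047, f)))     # saturate to the 12-bit range
    return out
```

For example, `iq4([1, -1, 0, 3], [16] * 4, 2)` yields `[3, -3, 0, 7]`. Processing four independent coefficients per call is what lets a wide FPGA unit replace four sequential software iterations.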

2.
This paper describes a block processing unit in a single-chip MPEG-2 MP@ML video encoder LSI. The block processing unit executes algorithms such as a discrete cosine transform (DCT), quantization, inverse quantization, and an inverse discrete cosine transform (IDCT). A double-block pipeline scheme has been introduced to execute DCT and IDCT operations on shared circuits. Using a time-multiplexed DCT/IDCT architecture, we achieve a processing performance of 2.0 clk/pel. This architecture has 21% fewer transistors and 30% less power dissipation than a conventional one. The block processing unit contains 240 kTr, which is 7.7% of the chip total. By controlling the clock signal supply, power dissipation can be reduced to 43%, about 400 mW at 3.3 V, using a 0.35-μm triple-layer-metal CMOS cell-based technology at 54 MHz.
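The double-block pipeline itself is not described in enough detail to reproduce. The following sketch only illustrates the mathematical property that makes DCT/IDCT circuit sharing possible. With the orthonormal 8-point DCT-II matrix C, the inverse is Cᵀ, so one coefficient table and one multiply-accumulate loop serve both directions, and a mode bit selects the read order. The names `C` and `shared_transform` are hypothetical.

```python
import math

N = 8
# Orthonormal DCT-II table: C[k][i] = a_k * cos(pi * (2i + 1) * k / (2N)).
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos(math.pi * (2 * i + 1) * k / (2 * N)) for i in range(N)]
     for k in range(N)]


def shared_transform(x, inverse: bool):
    """One multiply-accumulate loop and one coefficient table for both directions.

    Forward DCT: y[r] = sum_s C[r][s] * x[s]
    Inverse DCT: y[r] = sum_s C[s][r] * x[s]   (C is orthonormal, so C^-1 = C^T)
    The `inverse` flag only changes the order in which the table is read.
    """
    assert len(x) == N
    out = []
    for r in range(N):
        acc = 0.0
        for s in range(N):
            acc += (C[s][r] if inverse else C[r][s]) * x[s]
        out.append(acc)
    return out
```

Running the forward transform and then the inverse on the same data returns the input up to floating-point rounding.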

3.
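Based on the H.264 video coding standard proposed by ITU-T, the JM algorithms and the characteristics of the TI Blackfin 533 DSP were analyzed. The JM algorithms for the integer discrete cosine transform (DCT) and quantization used in encoding, and for the inverse DCT and inverse quantization used in decoding, were successfully ported to the DSP. Hardware and software optimizations tailored to the DSP were then applied, with good results.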
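As an illustration of the quantization stage that was ported, here is a minimal sketch of H.264-style forward quantization of a 4×4 block of integer-transform coefficients. It uses the widely published multiplication factors MF, indexed by QP mod 6 and coefficient position, with qbits = 15 + floor(QP/6). It assumes the deadzone offsets 2^qbits/3 (intra) and 2^qbits/6 (inter) often associated with the JM reference software. This is not the authors' DSP code, and `quantize_4x4` is a hypothetical name.

```python
# MF[QP % 6] = (a, b, c), where position class
#   a: (0,0),(0,2),(2,0),(2,2)   b: (1,1),(1,3),(3,1),(3,3)   c: all others
MF = [
    (13107, 5243, 8066),
    (11916, 4660, 7490),
    (10082, 4194, 6554),
    (9362, 3647, 5825),
    (8192, 3355, 5243),
    (7282, 2893, 4559),
]


def position_class(i: int, j: int) -> int:
    """0 if both indices even, 1 if both odd, 2 otherwise."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def quantize_4x4(w, qp: int, intra: bool = True):
    """Quantize a 4x4 block of core-transform coefficients w (list of lists)."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // (3 if intra else 6)  # assumed rounding/deadzone offset
    z = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            mf = MF[qp % 6][position_class(i, j)]
            level = (abs(w[i][j]) * mf + f) >> qbits  # multiply + shift, no divide
            z[i][j] = level if w[i][j] >= 0 else -level
    return z
```

The multiply-and-shift form, with the transform's scaling folded into MF, is what makes this stage cheap on a DSP without a hardware divider.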

4.
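VLSI Architecture Design of the Pixel Compression Module in an MPEG-4 Video Encoder   Total citations: 1, self-citations: 0, citations by others: 1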
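This paper designs a VLSI architecture for the pixel compression module of an MPEG-4-based video compression encoder. The design uses NEDA, a distributed-arithmetic structure, as the core technique for the DCT. A LUT-based structure keeps the design of the quantization/inverse-quantization module simple and clear. A new storage strategy for the AC/DC prediction module greatly reduces the use of scarce FPGA memory. While meeting the processing-speed and precision requirements, the pixel compression module is implemented with relatively few transistors and a simple structure.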
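The abstract says only that the quantization/inverse-quantization module is LUT-based. One common way a LUT removes the divider is to store a fixed-point reciprocal per quantizer parameter. A divide-by-2·QP step (assumed here, as in H.263-style quantization) then becomes a table read, a multiply, and a shift. This is an assumption, not the paper's design. With K = 20 fraction bits, the result equals exact integer division for all |coef| < 4096 and QP in 1..31, because |coef| times each reciprocal's rounding error stays below 2^K. `RECIP_LUT` and `lut_quantize` are hypothetical names.

```python
K = 20  # fixed-point fraction bits of each stored reciprocal

# Hypothetical table: one entry per QP = 1..31, holding ceil(2^K / (2*QP)).
RECIP_LUT = {qp: -(-(1 << K) // (2 * qp)) for qp in range(1, 32)}


def lut_quantize(coef: int, qp: int) -> int:
    """Return sign(coef) * (|coef| // (2*QP)) using a lookup, a multiply and a shift."""
    level = (abs(coef) * RECIP_LUT[qp]) >> K
    return level if coef >= 0 else -level
```

The inverse direction needs no table in this simplified scheme, since reconstruction is a multiplication by 2·QP.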

5.
Ye Lin, Ye Yutang, Cheng Zhiqiang, He Yu, Hu Yingbin, Liu Lin, Chen Zhenlong. 《电子器件》 (Chinese Journal of Electron Devices), 2007, 30(6): 2119-2121
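A new method is proposed for an all-hardware implementation of a high-speed MPEG-2 video DCT/quantization module, and the DCT/quantization algorithm and its FPGA-based implementation are studied. Simulation, verification, and test results all show that implementing the DCT/quantization module on an FPGA effectively reduces hardware and software design complexity. Because the implementation is entirely in hardware, it also greatly increases the module's processing speed. The results are of some value in meeting the demand for decoder chips in broadcast television systems.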

6.
A new linear-array architecture for computing both the discrete cosine transform (DCT) and the inverse DCT (IDCT) is derived from the heterogeneous dependence graphs that represent the factorised coefficient matrices in the matrix formulation of the recursive algorithm. Using the Kronecker product representation of the order-recursive algorithm, it is observed that the kernel operations of the DCT and IDCT can be merged through proper input/output data reordering. The processor, which contains only O(log₂N) stages, is fully pipelineable and easily scalable to longer DCT/IDCTs whose transform length N is a power of two. Owing to the systematic matrix formulation and the corresponding efficient architectural design, the new DCT/IDCT processor offers a high throughput rate and low hardware cost. Furthermore, power consumption can be reduced significantly by turning off the arithmetic units whenever possible.

7.
In recent years, the need for low-power embedded systems has become very significant in microelectronics. A power-driven methodology is mandatory during embedded-system design to meet system-level requirements while meeting time-to-market constraints. The aim of this paper is to introduce accurate and efficient power metrics, included in a hardware/software (HW/SW) codesign environment, to guide system-level partitioning. Power evaluation metrics have been defined to explore the architectural design space widely at a high abstraction level. This is one of the first approaches that considers both HW and SW contributions to power globally in a system-level design flow for control-dominated embedded systems.

8.
A configurable architecture for image transform algorithms is presented that provides a better tradeoff between low complexity and algorithm flexibility than either software-programmable processors or dedicated ASICs. The configurable processor unit requires only 110 K transistors and can execute several image transform algorithms. Because it emulates the signal flow of the algorithms in hardware rather than in software, complexity is reduced by an order of magnitude compared with current software-programmable video signal processors, while flexibility is greater than that of single-function ASICs. The processor has been fabricated in 1.2-μm CMOS and has been used successfully to execute the discrete cosine transform/inverse discrete cosine transform (DCT/IDCT), subband coding, vector quantization, and two-dimensional filtering algorithms at pixel rates up to 25 Mpixels/s.

9.
Hardware/software (HW/SW) codesign and reconfigurable computing are commonly used methodologies for digital-systems design. However, no previous work has defined a HW/SW codesign methodology with dynamic scheduling for run-time reconfigurable architectures. In addition, all previous approaches to multicontext scheduling in reconfigurable computing are based on static-scheduling techniques. In this paper, we present three main contributions: 1) a novel HW/SW codesign methodology with dynamic scheduling for discrete event systems using dynamically reconfigurable architectures; 2) a new dynamic approach to multicontext scheduling in reconfigurable computing; and 3) a HW/SW partitioning algorithm for dynamically reconfigurable architectures. We have developed a complete codesign framework and applied our methodology and algorithms to a software-acceleration case study. An exhaustive study has been carried out, and the results demonstrate the benefits of our approach.

10.
Zhao Bin, Huang Daqing. 《电子设计工程》 (Electronic Design Engineering), 2011, 19(24): 126-129
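A new FPGA architecture for the 2-D DCT and IDCT is proposed. A row-column fast algorithm decomposes the 2-D transform into two 1-D transforms. Each 1-D transform uses a parallel pipelined structure that processes 8 data samples per clock, greatly increasing the circuit's data throughput and computation speed. Simulation with ModelSim confirms that the design is functionally correct. One 8×8 block 2-D DCT takes only 16 clock cycles, meeting real-time image and video requirements.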
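The row-column decomposition described above can be sketched in software as two passes of the same 8-point 1-D DCT with a transpose in between, the role a transpose buffer plays in hardware. If each pass streams 8 samples per clock, one pass over 64 samples takes 8 clocks and two passes take 16, which is consistent with the reported figure if pipeline latency is ignored. This floating-point, orthonormal DCT-II version is an illustration only, and the function names are hypothetical.

```python
import math

N = 8
# Orthonormal DCT-II coefficients: C[k][i] = a_k * cos(pi * (2i + 1) * k / (2N)).
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos(math.pi * (2 * i + 1) * k / (2 * N)) for i in range(N)]
     for k in range(N)]


def dct_1d(v):
    """8-point 1-D DCT: X[k] = sum_i C[k][i] * v[i]."""
    return [sum(C[k][i] * v[i] for i in range(N)) for k in range(N)]


def dct_2d_row_column(block):
    """8x8 2-D DCT computed as 1-D DCTs on rows, then on columns."""
    rows = [dct_1d(r) for r in block]           # pass 1: transform each row
    transposed = [list(c) for c in zip(*rows)]  # transpose buffer
    cols = [dct_1d(c) for c in transposed]      # pass 2: transform each column
    return [list(r) for r in zip(*cols)]        # back to [vertical][horizontal] order
```

A constant block of ones maps to a single DC coefficient of 8 with all AC coefficients zero (up to rounding).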

11.
Wang Yongxia, Liu Bo, Zhang Gang. 《电视技术》 (Video Engineering), 2014, 38(7): 71-74, 65
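Based on the Audio Video coding Standard (AVS), and after analyzing the complete algorithm flow together with the structural characteristics of the FPGA hardware platform, the full pipeline of prediction, DCT and inverse DCT, quantization and inverse quantization, and entropy coding was designed. Using the FPGA development tool ISE 10.1 and the simulator ModelSim SE 6.2b, and verifying on a Xilinx xc5vlx110t-1ff1136 platform, the authors took the AVS P-frame from design through implementation. This fills the gap left by the absence of an FPGA implementation of AVS encoder P-frames, advances AVS on FPGAs, and plays a key role in the development of AVS+ and AVS2.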

12.
Design and Implementation of a Novel 2-D DCT/IDCT Architecture   Total citations: 2, self-citations: 0, citations by others: 2
Fu Yuzhuo, Wang Jiafang, Hu Mingzeng. 《电子学报》 (Acta Electronica Sinica), 2002, 30(Z1): 2126-2129
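Based on the characteristics of MPEG-2 video coding, this paper designs a 2-D DCT/IDCT architecture that uses only one 1-D DCT core. The architecture's transposition matrix is implemented in SRAM with dual-port input and output, which gives high data throughput and effectively saves chip area. The 1-D DCT core consists of 7 multipliers, which can be designed flexibly according to the required computation speed. To achieve conflict-free dual-port memory access, a data arrangement scheme is proposed. Because one operand of each multiplier is a constant, a constant-modification scheme is designed that effectively reduces multiplier hardware cost. The 2-D DCT/IDCT architecture has been verified on an FPGA and has strong practical engineering value.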

13.
Implementation of 2-D DCT/IDCT for Real-Time Video Coding   Total citations: 1, self-citations: 0, citations by others: 1
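The two-dimensional discrete cosine transform and its inverse (DCT/IDCT) were implemented on an FPGA FLEX10130. The architecture uses row-column decomposition, implements the multipliers by shift-and-add, and adopts a pipelined structure to improve the performance of the processing core. The system clock reaches 33 MHz, and the computational precision meets CCITT standard requirements.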
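Because DCT coefficients are fixed, each multiplier can be reduced to a sum of shifted copies of the input, one for every set bit of the constant. This is the shift-and-add approach mentioned above. The abstract does not give the coefficient values or word lengths. The constant 181 (about 256·cos(π/4), an 8-bit fixed-point DCT coefficient) is an illustrative assumption, and the function names are hypothetical.

```python
def shift_add_terms(constant: int) -> list:
    """Bit positions of the set bits of a non-negative constant multiplier."""
    return [b for b in range(constant.bit_length()) if (constant >> b) & 1]


def shift_add_multiply(x: int, constant: int) -> int:
    """Multiply x by a fixed constant using only shifts and additions."""
    acc = 0
    for b in shift_add_terms(constant):
        acc += x << b  # one hard-wired shift feeding one adder input
    return acc
```

Here 181 = 2^7 + 2^5 + 2^4 + 2^2 + 2^0, so the multiplier becomes five shifted operands summed by an adder tree. Recoding the constant (for example in canonical signed digit form) can reduce the number of adders further.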

14.
A direct method for computing the 2-D DCT/IDCT on a linear-array architecture is presented. The 2-D DCT/IDCT is first converted into its corresponding 1-D DCT/IDCT problem through proper input/output index reordering. Then, a new coefficient matrix factorisation is derived, leading to a cascade of several basic computation blocks. Unlike other previously proposed high-speed 2-D N × N DCT/IDCT processors, which usually require an intermediate transpose memory and have computation complexity O(N³), the proposed hardware-efficient architecture with a distributed memory structure has computation complexity O(N² log₂N) and requires only log₂N multipliers. The new pipelinable and scalable 2-D DCT/IDCT processor uses storage elements local to the processing elements and thus does not require any address generation hardware or global memory-to-array routing.

15.
This paper proposes the new concepts of the all phase biorthogonal transform (APBT) and dual biorthogonal basis vectors. In the light of all phase digital filtering theory, three kinds of all phase biorthogonal transforms are proposed, based on the Walsh transform (WT), the discrete cosine transform (DCT), and the inverse discrete cosine transform (IDCT). The APBT matrices based on WT, DCT, and IDCT are derived and can be used in image compression instead of the conventional DCT. Compared with the DCT-based JPEG (DCT-JPEG) image compression algorithm at the same bit rates, images reconstructed with these transforms have PSNR and visual quality close to DCT-JPEG, and better than DCT-JPEG especially at low bit rates. A further advantage is that the quantization table is simplified and the transform coefficients can be quantized uniformly. As a result, computing time is shorter and hardware implementation is easier.

16.
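DCT/IDCT/Hadamard transforms are widely used in many video coding standards. H.264/MPEG-4 AVC, a new-generation video compression standard, achieves a higher compression ratio than other video compression standards at the same image quality [1]. Studying the DCT/IDCT/Hadamard transforms in H.264/MPEG-4 AVC is therefore of considerable importance. This work analyzes the transform algorithms in H.264/MPEG-4 AVC and proposes a practical, efficient hardware circuit architecture that processes 4 input pixels in parallel.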
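The abstract does not describe its circuit. One common way to share hardware between the H.264 4-point forward core transform and the 4-point Hadamard transform is sketched below as an assumption. Both transforms start from the same first-stage sums and differences of the 4 parallel inputs. In the second stage, the core transform only adds 1-bit shifts. The 2-D transform applies this 1-D butterfly to rows and then columns, and the extra scaling used for the luma DC Hadamard path is omitted. `butterfly4` is a hypothetical name.

```python
def butterfly4(x0: int, x1: int, x2: int, x3: int, hadamard: bool):
    """4-input butterfly shared by the H.264 forward core transform and Hadamard."""
    s0, s1 = x0 + x3, x1 + x2  # first-stage sums (shared)
    d0, d1 = x0 - x3, x1 - x2  # first-stage differences (shared)
    if hadamard:
        # Rows [1,1,1,1], [1,1,-1,-1], [1,-1,-1,1], [1,-1,1,-1]
        return (s0 + s1, d0 + d1, s0 - s1, d0 - d1)
    # Core transform rows [1,1,1,1], [2,1,-1,-2], [1,-1,-1,1], [1,-2,2,-1]
    return (s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1))
```

A single mode signal selects whether the two shifters are active, so one set of eight adders covers both transforms.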

17.
The Rapid Prototyping of Application-Specific Signal Processors (RASSP) [1–3] program of the US Department of Defense (ARPA and Tri-Services) targets a 4X improvement in the design, prototyping, manufacturing, and support processes relative to current practice. According to a 1993 current-practice study [4], the prototyping time of multiboard signal processors, from system requirements definition to production and deployment, is between 37 and 73 months. Of this time, 25–49 months are devoted to detailed hardware/software (HW/SW) design and integration, with 10–24 months devoted to integration alone. Using a promising top-down, hardware-less codesign methodology based on VHDL models of HW/SW components at multiple abstraction levels, design time has been shown to decrease, especially in hardware/software integration [5]. The authors describe a top-down design approach in VHDL that starts with capturing system requirements in executable form and proceeds through successive stages of design refinement, ending with a detailed hardware design. This hardware/software codesign process is based on the RASSP design methodology called virtual prototyping, in which VHDL models are used throughout the design process to capture the information needed to describe the design as it develops through successive refinement and review. Examples illustrate the information captured at each stage of the process, and the links between stages are described to clarify the flow of information from requirements to hardware.

18.
The hardware implementation of intra prediction described in this paper allows the H.264/AVC encoder to achieve optimal compression efficiency in real-time conditions. The architecture has several features that distinguish it from other solutions in the literature. First, it supports all intra prediction modes defined in the High Profile of the H.264/AVC standard for all chroma formats. Second, it can generate predictions for several quantization parameters. Third, hardware cost is reduced because the same resources compute prediction samples for all modes. Fourth, the high sample-generation rate enables the encoder to achieve high throughput. Fifth, 4×4 block reordering and interleaving with other modes minimize the impact of the long-delay reconstruction loop on encoder throughput. The architecture is verified against the JM.12 reference model and within a real-time FPGA hardware encoder. Synthesis results show that the design can operate at 100 MHz on an Arria II FPGA and at 200 MHz in 0.13 μm TSMC technology. These frequencies allow the encoder to support 720p and 1080p video at 30 fps.
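To show why intra prediction depends on the reconstruction loop, the following sketch covers three of the nine H.264 Intra_4x4 luma modes: vertical (0), horizontal (1), and DC (2), assuming 8-bit video. Each mode reads already reconstructed neighbouring pixels, so a block cannot be predicted until its neighbours have passed through transform, quantization, and reconstruction. The paper's architecture covers all High Profile modes and chroma formats. This sketch does not, and `intra4x4_predict` is a hypothetical name.

```python
def intra4x4_predict(mode: int, top=None, left=None):
    """4x4 prediction block for H.264 Intra_4x4 modes 0-2 (8-bit video assumed).

    top:  the 4 reconstructed pixels above the block, or None if unavailable
    left: the 4 reconstructed pixels to the left, or None if unavailable
    """
    if mode == 0:  # Vertical: repeat the row above downward
        if top is None:
            raise ValueError("vertical mode needs the top neighbours")
        return [list(top) for _ in range(4)]
    if mode == 1:  # Horizontal: repeat each left pixel across its row
        if left is None:
            raise ValueError("horizontal mode needs the left neighbours")
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:  # DC: rounded mean of the available neighbours
        if top is not None and left is not None:
            dc = (sum(top) + sum(left) + 4) >> 3
        elif top is not None:
            dc = (sum(top) + 2) >> 2
        elif left is not None:
            dc = (sum(left) + 2) >> 2
        else:
            dc = 128  # 1 << (bit_depth - 1) for 8-bit video
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0-2 are sketched here")
```

For example, with `top=[10, 20, 30, 40]` and `left=[1, 2, 3, 4]`, DC mode fills the block with (100 + 10 + 4) >> 3 = 14.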

19.
A new algorithm to compute the DCT and its inverse   Total citations: 2, self-citations: 0, citations by others: 2
A novel algorithm to convert the discrete cosine transform (DCT) into skew-circular convolutions is presented. The motivation for this algorithm is that VLSI implementations of distributed arithmetic are very efficient for computing convolutions. It is also shown that the inverse DCT (IDCT) can be computed with the same building blocks used for the DCT. A DCT/IDCT processor can therefore be designed to compute either the DCT or the IDCT, depending on a 1-bit control signal.
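The efficiency argument above rests on distributed arithmetic (DA). The following sketch shows only the core DA idea, not the paper's DCT-to-skew-circular-convolution mapping. An inner product with fixed coefficients is computed bit-serially from a table of precomputed coefficient sums. The table is addressed by one bit of every input, and the two's-complement sign-bit step is subtracted. Integer coefficients and 8-bit inputs are assumed, and the function names are hypothetical.

```python
def build_da_lut(coeffs):
    """LUT[m] = sum of coeffs[k] over every bit k set in address m (2^N entries)."""
    n = len(coeffs)
    return [sum(coeffs[k] for k in range(n) if (m >> k) & 1) for m in range(1 << n)]


def da_inner_product(xs, coeffs, bits: int = 8) -> int:
    """Compute sum(coeffs[k] * xs[k]) for two's-complement inputs of width `bits`."""
    assert len(xs) == len(coeffs)
    assert all(-(1 << (bits - 1)) <= x < (1 << (bits - 1)) for x in xs)
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(bits):  # one input bit-plane per step, LSB first
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k  # bit b of every input forms the address
        term = lut[addr] << b            # weight the partial sum by 2^b
        acc += -term if b == bits - 1 else term  # the sign bit has negative weight
    return acc
```

No multiplier appears in the loop: each step is one table read, one shift, and one add or subtract. This is why convolution-like computations with fixed coefficients map well onto DA hardware.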

20.
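An IP-core design for FPGA-based H.264 video decoding is proposed, and an SOPC system built around the Nios II soft processor is optimized, in particular for intra prediction. Accelerating the intra-prediction module in hardware shortens decoding time compared with pure Nios II software decoding. The method makes real-time H.264 video decoding and playback on an FPGA possible.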
