Similar literature
1.
The TANGRAM VLSI co-processor is intended as a building block for use in system-on-chip (SOC) designs for the versatile MPEG-4 multimedia standard. It is designed to perform the computation-intensive final step of MPEG-4 video decoding: compositing of scenes at the display. This includes warping and alpha blending of multiple full-screen video textures in real time. TANGRAM consists of a RISC control processor and multiple powerful arithmetic units that perform rendering calculations directly in hardware. This hybrid architecture enables adaptation to changes in algorithms or support for different video formats in software. Communication with a host CPU and video decoding hardware is done via the widely used PI-bus on-chip interface. TANGRAM directly interfaces with the ITU-R 601/656 digital video output. VHDL implementation and synthesis for a 0.35 µm standard-cell library provide an estimate of 100 MHz achievable clock frequency (worst case), 52 mm² overall area and 1 W power dissipation. TANGRAM has sufficient performance for rendering MPEG-4 Main Profile@Layer3 scenes (ITU-R 601).

2.
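A multimedia system-on-chip (M-SoC) is a typical multitasking SoC. The many on-chip data requesters all access a single off-chip memory over the bus, so scheduling these bus requests sensibly becomes the key to system design. By analyzing in detail the characteristics and data traffic of the on-chip and off-chip data channels on the bus, this paper presents a bus scheduling strategy based on multichannel DMA and applies it successfully to the bus design of a single-chip audio/video decoding SoC. The strategy effectively combines DMA requests with bus arbitration and is generally applicable to multimedia SoCs with multiple requesters on a chip-level bus.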

3.
Future mobile and wireless communication networks require flexible modem architectures to support seamless services between different network standards. Hence, a common hardware platform that can support multiple protocols implemented or controlled by software, generally referred to as software defined radio (SDR), is essential. This paper presents a family of dynamically reconfigurable application-specific instruction-set processors (ASIPs) for channel coding in wireless communication systems. As a weakly programmable intellectual property (IP) core, it can implement trellis-based channel decoding in an SDR environment. It features binary convolutional decoding, and turbo decoding for binary as well as duobinary turbo codes for all current and upcoming standards. The ASIP consists of a specialized pipeline with 15 stages and a dedicated communication and memory infrastructure. Logic synthesis revealed a maximum clock frequency of 400 MHz and an area of 0.11 mm$^{2}$ for the processor's logic using a low-power 65-nm technology. Memories require another 0.31 mm$^{2}$. Simulation results for Viterbi and turbo decoding demonstrate maximum throughput of 196 and 34 Mb/s, respectively. The ASIP hence outperforms state-of-the-art decoder architectures targeting software defined radio by at least a factor of three while consuming only 60% or less of the logic area.

4.
A VLSI architecture for an entropy decoder, inverse quantiser and predictor is proposed in this article. This architecture is used for decoding video streams of three standards on a single chip, i.e. H.264/AVC, AVS (China National Audio Video coding Standard) and MPEG2. The proposed scheme is called MPMP (Macro-block-Parallel based Multilevel Pipeline), which is intended to improve the decoding performance to satisfy the real-time requirements while maintaining a reasonable area and power consumption. Several techniques, such as slice-level pipelining, MB (Macro-Block) level pipelining and MB-level parallelism, are adopted. Input and output buffers for the inverse quantiser and predictor are shared by the decoding engines for H.264, AVS and MPEG2, thereby effectively reducing the implementation overhead. Simulation shows that the decoding process consumes 512, 435 and 438 clock cycles per MB in H.264, AVS and MPEG2, respectively. Owing to the proposed techniques, the video decoder can support H.264 HP (High Profile) 1920 × 1088@30fps (frames per second) streams, AVS JP (Jizhun Profile) 1920 × 1088@41fps streams and MPEG2 MP (Main Profile) 1920 × 1088@39fps streams at a 200 MHz working frequency.

5.
Iterative decoding of convolutional turbo code (CTC) has a large memory power consumption. To reduce the power consumption of the state metrics cache (SMC), low-power memory-reduced traceback maximum a posteriori algorithm (MAP) decoding is proposed. Instead of storing all state metrics, the traceback MAP decoding reduces the size of the SMC by accessing difference metrics. The proposed traceback computation requires no complicated reversion checker, path selection, and reversion flag cache. For double-binary (DB) MAP decoding, radix-2$\times$2 and radix-4 traceback structures are introduced to provide a tradeoff between power consumption and operating frequency. These two traceback structures achieve an around 20% power reduction of the SMC, and around 7% power reduction of the DB MAP decoders. In addition, a high-throughput 12-mode WiMAX CTC decoder applying the proposed radix-2$\times$2 traceback structure is implemented by using a 0.13-$\mu$m CMOS process in a core area of 7.16 mm$^{2}$. Based on postlayout simulation results, the proposed decoder achieves a maximum throughput rate of 115.4 Mbps and an energy efficiency of 0.43 nJ/bit per iteration.

6.
In this paper, a highly efficient inter-interpolation architecture for the H.264/AVC standard is proposed. Since frame pixels are placed in memory either row-wise or column-wise, which may not suit sample prediction in a particular direction, this paper proposes a novel interpolator design that dynamically configures the data path for different prediction modes so that the computation schedule matches the natural input order of the reference samples. The proposed design methodology not only avoids an additional data transposition buffer but, most importantly, allows the data transfer time spent fetching the reference samples to be overlapped with the data computation time. Furthermore, by decomposing the chroma interpolation into a series of shift and addition operations, both luma and chroma interpolations can be integrated within the same module. In addition to the data-path design, this paper also proposes a new data-reuse buffer design based on a two-dimensional cache architecture to exploit the possible data reuse among the inter and intra partitions. This design can be easily integrated with the H.264 interpolator to reduce the enormous demand for memory access. Our experimental results show that our saving of memory bandwidth can be 23% more than what the best design can achieve by exploiting the intra-partition data reuse only. The proposed design methodology has been implemented, and the result shows that the proposed interpolation architecture is the most compact design in the literature that can perform real-time H.264 video decoding at resolutions up to the 1920×1088 high-definition television standard. The proposed interpolator can be applied to dedicated H.264 hardware codec designs for various consumer devices.
Yun-Nan Chang

7.
8.
We propose a high-performance hardwired deblocking filter for H.264/AVC decoding. To decode QFHD (3840 $\times$ 2160, i.e., four times full HD) ultra high definition video, we minimize the number of processing cycles, the working frequency and the amount of external memory traffic. We propose a novel filtering order and employ a 5-stage pipelined and resource-shared dual-edge filter to generate two filtering results every cycle. Taking advantage of skip modes, our filter takes only 48 cycles to filter a macroblock in the best case and 100 in the worst case. Furthermore, it eliminates most unnecessary off-chip memory traffic with a novel on-chip memory scheme. Our design can support QFHD applications at 30 fps while running at only 98 MHz.
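
The 98 MHz figure in the abstract above can be sanity-checked from the stated cycle counts. The following is a minimal sketch, not the authors' model: it assumes 16 × 16 macroblocks, frame dimensions that are multiples of 16, and no stalls or memory wait states. The function name `required_mhz` is hypothetical.

```python
# Rough cycle budget for deblocking every macroblock of a QFHD stream.
# Assumptions (not from the abstract): 16x16 macroblocks, no pipeline stalls.
MB_SIZE = 16


def required_mhz(width, height, fps, cycles_per_mb):
    """Clock rate in MHz needed to filter all macroblocks of all frames."""
    # Integer division is exact here because 3840 and 2160 are multiples of 16.
    mbs_per_frame = (width // MB_SIZE) * (height // MB_SIZE)
    return mbs_per_frame * fps * cycles_per_mb / 1e6


worst = required_mhz(3840, 2160, 30, 100)  # every macroblock at 100 cycles
best = required_mhz(3840, 2160, 30, 48)    # every macroblock at 48 cycles
print(f"worst case: {worst:.1f} MHz, best case: {best:.1f} MHz")
```

Under these assumptions the worst-case requirement comes out just under the 98 MHz quoted in the abstract, which suggests the clock was chosen to cover the 100-cycle worst case.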

9.
AVS1-P2 is the newest video standard of the Audio Video coding Standard (AVS) workgroup of China, which provides performance close to the H.264/AVC main profile with lower complexity. In this paper, a platform-independent software package with a macroblock-based (MB-based) architecture is proposed to facilitate AVS video standard implementation on embedded systems. Compared with the frame-based architecture, which is commonly used for PC-oriented video applications, the MB-based decoder performs all of the decoding processes, except the high-level syntax parsing, in a set of MB-based buffers that are just large enough to hold the information of the current MB and the neighboring reference MBs, minimizing the on-chip memory and the time consumed in on-chip/off-chip data transfer. By modifying the data flow and decoding hierarchy, simulating the data transfer between the on-chip memory and the off-chip memory, and modularizing the buffer definition and management for low-level decoding kernels, the MB-based system architecture provides over 80% reduction in on-chip memory compared to the frame-based architecture when decoding 720p sequences. The storage complexity is also analyzed by referencing the performance evaluation of the MB-based decoder. The MB-based decoder implementation provides an efficient reference to facilitate development of AVS applications on embedded systems. The complexity analysis provides rough storage complexity requirements for AVS video standard implementation and optimization.

10.
This paper presents the hardware architecture of a co-processor supporting the real-time rendering of all 2D natural or synthetic visual objects proposed by the MPEG-4 standard as well as sprite decoding. It enables the composition and the transformation of natural video objects and the texture mapping on triangles, allowing 2D-mesh decoding. This architecture is able to render scenes that are compliant with MPEG-4 Main Profile, Level3 and Hybrid Visual Profile. The co-processor is designed to be used in a shared memory system architecture. It consists of a dedicated implementation that seeks the best compromise between cost and performance. In a first step, a software model is used to guarantee the visual quality of the rendered scene and to validate the algorithmic choices. Then, the complexity and performance of this novel architecture are evaluated. Finally, a behavioral model validates the architectural choices.

11.
From a system-level perspective, this paper presents a 128 × 128 flexible and reconfigurable Focal-Plane Analog Programmable Array Processor, which has been designed as a single chip in a 0.35 µm standard digital 1P-5M CMOS technology. The core processing array has been designed to achieve high-speed operation and sufficient accuracy (7 bits) with low power consumption. The chip includes on-chip program memory to allow for the execution of complex, sequential and/or bifurcation flow image processing algorithms. It also includes the structures and circuits needed to guarantee its embedding into conventional digital hosting systems: external data interchange and control are completely digital. The chip contains close to four million transistors, 90% of them working in analog mode. The chip features up to 330 GOPS (giga operations per second), uses the power supply (180 GOP/Joule) and the silicon area (3.8 GOPS/mm²) efficiently, and is able to maintain VGA processing throughputs of 100 frames/s with about 10–20 basic image processing tasks on each frame.

12.
Wireless communication standards make use of parallel turbo decoders for higher data rates at the cost of large hardware resources. This paper presents a memory-reduced back-trace technique, which is based on a new method of estimating backward-recursion factors, for maximum a posteriori probability (MAP) decoding. Mathematical reformulations of branch-metric equations are performed to reduce the memory requirement of branch metrics for each trellis stage. Subsequently, an architecture of the MAP decoder and its scheduling based on the proposed back trace as well as the branch-metric reformulation are presented in this work. Comparative analysis of bit-error-rate (BER) performance in an additive white Gaussian noise channel environment is carried out for MAP as well as parallel turbo decoders. It is shown that a MAP decoder with a code rate of 1/2 and a parallel turbo decoder with a code rate of 1/3 achieve coding gains of 1.28 dB at a BER of 10\(^{-5}\) and of 0.4 dB at a BER of 10\(^{-4}\), respectively. In order to meet high-data-rate benchmarks of recently deployed wireless communication standards, very large scale integration implementations of parallel turbo decoders with 8–64 MAP decoders have been reported. Accordingly, the savings of hardware resources by such parallel turbo decoders based on the suggested memory-reduced techniques are accounted for in terms of complementary metal oxide semiconductor transistor count. It is shown that the parallel turbo decoders with 32 and 64 MAP decoders achieve hardware savings of 34% and 44%, respectively.

13.
Recent sub- semiconductor technology supports the monolithic integration of multiprocessor systems. High wiring density and short on-chip memory access cycles motivate novel architecture concepts, outperforming conventional parallel systems. An efficient controlling strategy is key to gaining high performance from limited silicon resources. In this paper, a controlling concept for a monolithic Autonomous Single-Instruction/Multiple Data (ASIMD) processor is presented, which combines the high parallelism of an SIMD approach with the flexibility of standard DSP architectures. To demonstrate the performance gains of the concept, a digital video signal processor, the HiPAR-DSP, has been implemented. It consists of an array of 4 or 16 datapaths, local memories for each datapath, a shared memory with concurrent data access in the shape of a matrix and a central RISC controller. A three-stage execution autonomy has been implemented, consisting of conditional instructions, conditional skip of instructions by the datapaths and global evaluation of local conditions by the central controller. This allows efficient execution of data-dependent medium- and high-level algorithms with very low controlling overhead. A performance of up to two arithmetic gigaoperations per second is achieved for algorithms with irregular data flow or control flow for the 100 MHz clocked processor with 16 datapaths.

14.
This paper presents a channel decoder that completes both turbo and Viterbi decodings, which are pervasive in many wireless communication systems, especially those that require very low signal-to-noise ratios. The trellis decoding algorithm merges them with less redundancy. However, the implementation is still challenging due to the power consumption in wearable devices. This research investigates an optimized memory scheme and rescheduled data flow to reduce power consumption and chip area. The memory access is reduced by buffering the input symbols, and the area is reduced by reducing the embedded interleaver memory. A test chip is fabricated in a 1.8 V 0.18-µm standard CMOS technology and verified to provide 4.25-Mb/s turbo decoding and 5.26-Mb/s Viterbi decoding. The measured power dissipation is 83 mW, while decoding a 3.1 Mb/s turbo encoded data stream with six iterations for each block. The power consumption in Viterbi decoding is 25.1 mW in the 1-Mb/s data rate. The measurement shows the power dissipation is 83 mW for the turbo decoding with six iterations at 3.1 Mb/s, and 25.1 mW for the Viterbi decoding at 1 Mb/s.

15.
A low-power dual-standard video decoder has been developed for mobile applications. It supports MPEG-2 SP@ML and H.264/AVC BL@L4 video decoding in a single chip and features a scalable architecture to reach area/power efficiency. This chip integrates diverse algorithms of MPEG-2 and H.264/AVC to reduce silicon area. Three low-power techniques are proposed. First, a domain-pipelined scalability (DPS) technique is used to optimize the pipelined structure according to the number of processing cycles. Second, bandwidth scalability is implemented via a line-pixel-lookahead (LPL) scheme to improve the external bandwidth and reduce the internal memory size, leading to 51% of memory power reduction compared to a conventional design. Third, low-power motion compensation and deblocking filter are designed to reduce the operating frequency without degrading system performance. A test chip is fabricated in a 0.18 µm one-poly six-metal CMOS technology with an area of 15.21 mm². For mobile applications, H.264/AVC and MPEG-2 video decoding of quarter-common intermediate format (QCIF) sequences at 15 frames per second are achieved at 1.15 MHz clock frequency with power dissipation of 125 µW and 108 µW, respectively, at 1 V supply voltage.

16.
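Taking the rate-1/2 LDPC code in the CCSDS (Consultative Committee for Space Data Systems) standard as an example, this paper analyzes the characteristics of decoding algorithms for low-density parity-check (LDPC) codes and proposes a ping-pong design method for the FPGA implementation of the decoder. The structure of the decoder's channel likelihood-ratio storage module is optimized so that two frames of data are received alternately, which keeps the decoder working without interruption, improves hardware resource utilization, and doubles the decoder's throughput.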
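
The ping-pong operation mentioned in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions: the function `ping_pong_decode` and the `decode` callback are hypothetical, and the loop is sequential, whereas an FPGA would fill one buffer and decode the other in the same frame period.

```python
def ping_pong_decode(frames, decode):
    """Model two alternating buffers: one receives frame n while the
    other, filled during the previous frame period, is decoded.

    `frames` is a list of received frames and `decode` is any function
    that decodes one frame. Both names are illustrative assumptions.
    """
    buffers = [None, None]
    results = []
    for n, frame in enumerate(frames):
        write_idx = n % 2          # buffer being filled in this period
        read_idx = 1 - write_idx   # buffer filled in the previous period
        buffers[write_idx] = frame
        if n > 0:
            # In hardware this decode overlaps with the write above.
            results.append(decode(buffers[read_idx]))
    if frames:
        # Drain: decode the last frame once reception has stopped.
        results.append(decode(buffers[(len(frames) - 1) % 2]))
    return results


decoded = ping_pong_decode([1, 2, 3], lambda llrs: llrs * 10)
print(decoded)
```

Because the decoder never waits for a buffer to be refilled, it stays busy in every frame period, which is the source of the throughput gain the abstract reports.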

17.
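VHDL Description and Verification of MPEG-2 Video Decoding   Total citations: 2 (self-citations: 0, citations by others: 2)
This paper proposes a hardware architecture for MPEG-2 video decoding and describes it in VHDL. To achieve real-time decoding of MPEG-2 video, corresponding circuit structures are proposed for timing control, variable-length decoding, inverse quantization, IDCT, motion compensation, and input/output control. Verification and simulation results show that the design performs the intended functions and can be used to implement a real-time MPEG-2 MP@ML decoder chip.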

18.
An on-chip high-speed two-cell Bose–Chaudhuri–Hocquenghem (BCH) decoder for error correction in a multilevel-cell (MLC) NOR flash memory is presented. To satisfy the reliability requirements, a double-error-correcting (DEC) BCH code is required in NOR flash memories with the process shrinking beyond 45 nm. A novel fast-decoding algorithm is developed to speed up the BCH decoding process using iteration-free solutions and division-free transformations in finite fields. As a result, the decoding latency is significantly reduced by 80%. Furthermore, a novel architecture of a two-cell decoder that is suitable for an MLC flash memory is proposed to obtain a good time–area tradeoff. Experimental results show that the latency of the proposed two-cell BCH decoder is only 7.5 ns, which satisfies the fast-access-time requirements of NOR flash memories.

19.
In this paper, we study the security of a general two-level E0-like encryption model and its instance, the real-world Bluetooth encryption scheme. Both unconditional and conditional correlation properties of the two-level model are investigated in theory, and a key-recovery framework based on condition masking, which studies how to choose the condition to get better tradeoffs on the time/memory/data complexity curve, is refined. A novel design criterion to resist the attack is proposed and analyzed. Inspired by these cryptanalytic principles, we describe more threatening, real-time attacks on two-level E0. It is shown that only the latest four inputs going into the FSM play the most important role in determining the magnitude of the conditional correlation, and that the data complexity analyses of the previous practical attacks on two-level E0 are inaccurate. A new decoding method to improve the data complexity is provided. In the known-IV scenario, if the first 24 bits of \(2^{24}\) frames are available, the secret key can be reliably found with \(2^{25}\) on-line computations, \(2^{21.1}\) off-line computations and 4 MB memory. Then, we convert the attack into a ciphertext-only attack, which needs the first 24 bits of \(2^{26}\) frames, with all complexities under \(2^{26}\). This is the first practical ciphertext-only attack on the real Bluetooth encryption scheme so far. A countermeasure is suggested to strengthen the security of Bluetooth encryption in practical applications.

20.
For the decoding of a binary linear block code of minimal Hamming distance $d$ over additive white Gaussian noise (AWGN) channels, a soft-decision decoder achieves bounded-distance (BD) decoding if its squared error-correction radius is equal to $d$. A Chase-like algorithm outputs the best (most likely) codeword in a list of candidates generated by a conventional algebraic binary decoder in a few trials. It is of interest to design Chase-like algorithms that achieve BD decoding with as few trials as possible. In this paper, we show that Chase-like algorithms can achieve BD decoding with only $O(d^{1/2+\varepsilon})$ trials for any given positive number $\varepsilon$.
