1.
Xiaofeng Huang Huizhu Jia Binbin Cai Chuang Zhu Jie Liu Mingyuan Yang Don Xie Wen Gao 《Journal of Real-Time Image Processing》2016,12(2):285-302
The emerging intra-coding tools of the High Efficiency Video Coding (HEVC) standard can achieve up to 36% bit-rate reduction compared to H.264/AVC, but with a significant increase in complexity. Design challenges such as data dependency and computational complexity make it difficult to implement a hardware encoder for real-time applications. In this paper, the data dependency in HEVC intra-mode decision is first fully analyzed; it is caused by the reconstruction loop, the Most Probable Mode, the context adaptation during Context-based Adaptive Binary Arithmetic Coding (CABAC) based rate estimation, and the Chroma derived mode. Then, several fast algorithms are proposed to remove the data dependency and reduce the computational complexity, including source-signal-based Rough Mode Decision, coarse-to-fine rough mode search, Prediction Mode Interlaced RDO mode decision, parallelized context adaptation, and Chroma-free Coding Unit (CU)/Prediction Unit (PU) decision. Finally, a parallelized VLSI architecture with CU reordering and Chroma reordering scheduling is proposed to improve the throughput. The experimental results demonstrate that the proposed intra-mode decision achieves 41.6% complexity reduction with a 4.3% Bjøntegaard Delta Rate (BDR) increase on average compared to the reference software, HM-13.0. The intra-mode decision scheme is implemented with a 1571.7K gate count in 55 nm CMOS technology. The implementation results show that the design can achieve 1080p@60fps real-time processing at an operating frequency of 294 MHz.
2.
Integer pixel motion estimation (IME) is a crucial, high-complexity module in a high-definition video encoder. Joint design of an efficient algorithm and architecture must trade off multiple target parameters, including throughput, logic gate count, on-chip SRAM size, memory bandwidth, and rate-distortion performance. Data organization and on-chip buffer structure are crucial factors in IME architecture design because they determine how these targets are traded off. In this work, we combine global hierarchical search and local full search into a hardware-efficient IME algorithm, and then propose a VLSI architecture with an optimized on-chip buffer structure. The major contributions of this work are: (1) an improved hierarchical IME algorithm with presearch and deliberate data organization, (2) a multistage on-chip reference pixel buffer structure with high data reuse between integer and fractional pixel motion estimation, and (3) a highly reused and reconfigurable processing element structure. The optimized data organization and buffer structure achieve nearly 70% buffer saving with an average PSNR degradation of less than 0.08 dB (0.12 dB in the worst case) compared with a full-search-based architecture. At a hardware cost of 336 K and 382 K logic gates and 20 kB of SRAM, the proposed architecture achieves a throughput of 384 and 272 cycles per macroblock at system frequencies of 95 and 264 MHz for 1080p and QFHD @30fps video coding, respectively.
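The local full-search stage mentioned in this abstract can be illustrated with a minimal sketch. It assumes a sum-of-absolute-differences (SAD) cost, which the abstract does not name, and it shows only plain exhaustive search, not the paper's hierarchical algorithm or buffer structure. The function names, the toy frames and the search radius are all hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, radius):
    """Test every integer displacement within +/- radius and return
    (best_sad, (dx, dy)); candidates that leave the frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if not (0 <= bx + dx and bx + dx + n <= width):
                continue
            if not (0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Toy frames (hypothetical): the current frame is the reference frame
# shifted right by one pixel, so the true motion vector is (-1, 0).
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
best_sad, best_mv = full_search(cur, ref, bx=2, by=2, n=4, radius=2)
```

Every candidate reads a full block of reference pixels, which is why data reuse in the on-chip buffer matters so much for a hardware implementation.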
3.
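A VLSI architecture design based on the integer wavelet transform is proposed. The design exploits two characteristics of the integer wavelet transform, namely the optimal factorization of the wavelet matrix and its behavior under finite precision: representing the mantissa with only a few bits causes very limited performance degradation. Based on these results, a VLSI implementation of the integer wavelet transform is proposed that achieves a fast frame rate with moderate gate complexity.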
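As a rough illustration of an integer (reversible) wavelet transform built from lifting steps, the sketch below uses the well-known 5/3 lifting scheme. That choice is an assumption: the abstract does not say which wavelet or factorization the design uses. The function names are hypothetical, and the input length is assumed to be even.

```python
def forward_53(x):
    """One level of the reversible 5/3 lifting transform (even-length input).
    Returns (s, d): integer low-pass and high-pass coefficients."""
    half = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict: detail = odd sample minus the floored mean of its even
    # neighbours (the right neighbour is mirrored at the boundary).
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
         for i in range(half)]
    # Update: smooth = even sample plus a rounded quarter of the two
    # adjacent details (the left detail is mirrored at the boundary).
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(half)]
    return s, d


def inverse_53(s, d):
    """Undo forward_53 exactly by running the lifting steps in reverse."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
            for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x


signal = [12, 15, 9, 7, 200, 198, 3, 0]
low, high = forward_53(signal)
restored = inverse_53(low, high)
```

Only additions and shifts are involved, which is the property that makes lifting-based integer transforms attractive for hardware with modest gate counts.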
4.
《Future Generation Computer Systems》1988,4(3):245-254
The Applicative Programming System Architecture contains a novel Data Structure Memory (DSM) which supports fast access operations on compact linear data structures. Several problems that arise in implementations of applicative and functional programming languages can be solved efficiently using special data representations on the DSM. Each memory word in the DSM contains a very small local processor, and there is also a tree-structured communications network within the DSM. Therefore the DSM is a massively parallel SIMD machine. This paper describes a VLSI implementation of the DSM architecture and compares its performance with implementations on a conventional sequential computer and the NASA Massively Parallel Processor.
5.
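Based on a study of the planar and DC mode prediction algorithms used in intra prediction in the new-generation High Efficiency Video Coding (HEVC) standard, an efficient VLSI architecture is designed for each mode, using adaptive state-machine control and module reuse to increase speed and reduce area. For planar mode, a register-accumulation architecture based on adaptive state-machine control is designed; for DC mode, an algorithm-based partitioned-processing architecture is designed. Experimental results show that, in a TSMC 180 nm process, the designed architecture reaches a maximum frequency of 350 MHz with a total area of 68.1 kgate, and can encode 4:2:0 format 7680×4320@30 f/s video sequences in real time, where the highest operating frequency can reach 23.4 MHz.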
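For orientation, the two prediction modes can be sketched in software as below. The formulas follow the DC and planar definitions as commonly stated for HEVC intra prediction; the boundary smoothing that HEVC applies to some DC-predicted luma blocks is deliberately omitted. The function and parameter names are hypothetical, and the sketch says nothing about the paper's register-accumulation or partitioned-processing architectures.

```python
def dc_predict(top, left):
    """DC mode: fill the N x N block with the rounded mean of the N top
    and N left reference samples (N a power of two)."""
    n = len(top)
    shift = n.bit_length()          # log2(N) + 1 when N is a power of two
    dc = (sum(top) + sum(left) + n) >> shift
    return [[dc] * n for _ in range(n)]


def planar_predict(top, left, top_right, bottom_left):
    """Planar mode: average of a horizontal and a vertical linear
    interpolation; the result is indexed as pred[y][x]."""
    n = len(top)
    shift = n.bit_length()
    return [
        [
            ((n - 1 - x) * left[y] + (x + 1) * top_right
             + (n - 1 - y) * top[x] + (y + 1) * bottom_left + n) >> shift
            for x in range(n)
        ]
        for y in range(n)
    ]


dc_block = dc_predict([10, 20, 30, 40], [50, 60, 70, 80])
flat_block = planar_predict([100] * 4, [100] * 4, 100, 100)
```

Both modes reduce to sums, small multiplications and a final shift, so accumulators shared between block sizes are a natural fit for hardware.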
6.
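A hardware design for a two-dimensional wavelet transformer based on the 9/7 wavelet is proposed. Optimizing the algorithm and processing the row and column transforms in parallel increases the data throughput of the transformer. The design uses pipelining, which considerably improves hardware efficiency. Synthesis results show that the system clock can reach 110 MHz and that the design offers high speed, high throughput, and a small on-chip memory.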
7.
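A method for simplifying the integer DCT matrix of the AVS standard is introduced; it speeds up the hardware implementation of the one-dimensional integer DCT. Based on this one-dimensional integer DCT, module reuse and a pipelined design allow the direct two-dimensional integer DCT to be completed within one clock cycle, at an operating frequency of up to 160 MHz. Simulation results confirm the effectiveness of the algorithm.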
8.
9.
James M. Apffel K. Wayne Current Jorge L. C. Sanz Anil K. Jain 《Machine Vision and Applications》1989,2(4):193-214
A novel hardware architecture for extracting region boundaries in two raster scan passes through a binary image is presented. The first pass gathers statistics regarding the size of each object contour. This information is used by the hardware to allocate dynamically off-chip memory for storage of boundary codes. In the second raster pass the same architecture constructs lists of grid-joint codes to represent the perimeter pixels of each object. These codes, referred to variously as crack codes or raster-chain codes in the literature, are later decoded by the hardware to reproduce the ordered sequence of coordinates surrounding each object. This list of coordinates is useful for a variety of shape recognition and manipulation algorithms that utilize boundary information. We present results of software simulations of the VLSI architecture, along with measurements on the coding efficiency of the basic algorithm, and estimates of the overall complexity of a proposed VLSI chip.
10.
A VLSI architecture for real-time edge linking (total citations: 1, self-citations: 0, citations by others: 1)
A real-time algorithm for edge linking and its VLSI implementation are presented. The linking process is based on the break points' directions and the weak level points. The proposed VLSI architecture is capable of outputting one pixel of the linked edge map per clock cycle with a latency of 11n+12 clock cycles, where n is the number of pixel columns in the image.
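To make the latency figure concrete, the short sketch below evaluates 11n+12 for an example image width and estimates the time to emit one frame. The 512×512 image size and the 25 MHz clock are hypothetical values chosen for illustration, and the total assumes the latency is paid once before the first output pixel.

```python
def linking_latency(n_columns):
    """Latency in clock cycles as quoted in the abstract: 11n + 12."""
    return 11 * n_columns + 12


def total_cycles(n_rows, n_columns):
    """Cycles to emit a whole linked edge map at one pixel per cycle,
    assuming the latency is incurred once before the first output."""
    return linking_latency(n_columns) + n_rows * n_columns


CLOCK_HZ = 25_000_000  # hypothetical clock frequency, not from the abstract

latency = linking_latency(512)
frame_time_s = total_cycles(512, 512) / CLOCK_HZ
```

For a 512-column image the latency corresponds to roughly eleven image rows, which is small next to the 512 rows of the frame itself.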
11.
Harry Rowbottom 《Computer Communications》1982,5(4):186-190
Progress in communications technology has produced a wide range of communications techniques for different applications. All these methods basically perform the same operation: transferring information between users. Little uniformity exists, however, and the main computer manufacturers have each produced their own network architectures, all incompatible with each other. The paper describes a network architecture designed by NCR. The requirements of such a network architecture are listed, and the need for an architecture to be compatible with international standards is emphasized by comparing NCR's architecture with the open systems interconnection (OSI) reference model from the ISO. The structure and operations of the layered architecture are described and compared to the OSI reference model.
12.
Corner detection is a low-level feature detection operator that is of great use in image processing applications, for example, optical flow and structure from motion by image correspondence. The detection of corners is a computationally intensive operation. Past implementations of corner detection techniques have been restricted to software. In this paper we propose an efficient very large-scale integration (VLSI) architecture for detection of corners in images. The corner detection technique is based on the half-edge concept and the first directional derivative of Gaussian. Apart from the location of the corner points, the algorithm also computes the corner orientation and the corner angle and outputs the edge map of the image. The symmetrical properties of the masks are utilized to reduce the number of convolutions effectively, from eight to two. Therefore, the number of multiplications required per pixel is reduced from 1800 to 392. Thus, the proposed architecture yields a speed-up factor of 4.6 over conventional convolution architectures. The architecture uses the principles of pipelining and parallelism and can be implemented in VLSI.
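The quoted speed-up factor is consistent with the two multiplication counts, assuming the factor is simply the ratio of multiplications per pixel before and after the reduction (the abstract does not state how it was measured):

```latex
\[
\text{speed-up} \;=\; \frac{1800}{392} \;\approx\; 4.59 \;\approx\; 4.6
\]
```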
13.
An efficient VLSI architecture and FPGA implementation of the Finite Ridgelet Transform (total citations: 1, self-citations: 0, citations by others: 1)
Shrutisagar Chandrasekaran Abbes Amira Shi Minghua Amine Bermak 《Journal of Real-Time Image Processing》2008,3(3):183-193
In this paper, an efficient architecture for the Finite Ridgelet Transform (FRIT) suitable for VLSI implementation is presented, based on a parallel, systolic Finite Radon Transform (FRAT) sub-block and a Haar Discrete Wavelet Transform (DWT) sub-block. The FRAT sub-block is a novel parametrisable, scalable and high performance core with a time complexity of O(p^2), where p is the block size. Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) implementations are carried out to analyse the performance of the FRIT core developed.
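A direct software rendering of the Finite Radon Transform may help in reading this abstract. The sketch below uses the usual definition for a p×p block with p prime: sums along the lines j = (k·i + l) mod p for k = 0..p-1, plus one extra direction that sums each row, all scaled by 1/sqrt(p). It is a naive reference version that takes on the order of p^3 additions, so it does not reproduce the O(p^2) time complexity the abstract claims for the parallel systolic core. The function name and the sample block are hypothetical.

```python
import math


def frat(f):
    """Finite Radon Transform of a p x p block f (p prime).

    Returns p + 1 projections of length p. Projection k < p sums f[i][j]
    over the line j = (k * i + l) mod p; projection p sums over row i = l.
    """
    p = len(f)
    scale = 1.0 / math.sqrt(p)
    r = [[0.0] * p for _ in range(p + 1)]
    for k in range(p):
        for l in range(p):
            r[k][l] = scale * sum(f[i][(k * i + l) % p] for i in range(p))
    for l in range(p):
        r[p][l] = scale * sum(f[l][j] for j in range(p))
    return r


block = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
projections = frat(block)
```

Each projection visits every sample of the block exactly once, so all p + 1 projections have the same total; that makes a convenient sanity check. The FRIT described in the abstract would then apply a one-dimensional Haar DWT to each projection.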
14.
《电子技术应用》2015,(11):61-64
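A new spatial multiplexing precoding and resource mapping module is designed for the serial structure of a TD-LTE downlink with two codewords and two antennas. A conventional spatial multiplexing precoding module cannot process the two codewords of the serial structure in a time-shared serial manner, so a buffer module must be added before precoding, which increases the system's memory overhead. By applying a matrix transformation to the spatial multiplexing precoding coefficient matrix, the addition between the two codewords is separated out, which allows the two codewords to be processed serially in a time-shared manner. In addition, exploiting the characteristics of the resource mapping module, the inter-codeword addition of the precoding module is moved into the resource mapping module and a new memory reuse scheme is proposed; this eliminates the buffer module and reduces the memory overhead of the two modules by 16.4%. Experimental results show that, compared with designs of the same type, the proposed spatial multiplexing precoding and resource mapping module offers simpler control and lower hardware resource overhead.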
15.
The organization and operation of a semantic network array processor (SNAP) are described. The architecture consists of an array of identical cells, each containing a content addressable memory, microprogram control, and a communication unit. Each cell is dedicated to one node of the semantic network and its associated relations. The array can perform global associative functions under the supervision of an outside controller. In addition, each cell is equipped with the necessary logic to perform individual functions. A set of primitive instructions was carefully chosen. Some of the applications discussed include pattern search operations, production systems, and inferences. A LISP simulator was developed for this architecture, and some simulation results are presented.
16.
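To make the DCT generally applicable, the principle of the DCT is first studied, revealing the relationship between the number of distinct values taken by the transform basis coefficients and the transform order; this relationship is then proved using properties of the cosine function. On this basis, a general algorithm for generating N-order (N = 2^k, k > 0, likewise below) integer DCT bases is proposed; the algorithm requires no case-by-case analysis of the corresponding floating-point basis. Next, by carefully arranging the indices of the coefficients, the generated intermediate polynomials are given a highly regular structure. Finally, an N-digit base-M number is designed to implement an N-fold nested loop that enumerates all candidate solutions, and arbitrary systems of polynomials in N variables are successfully solved. Experimental results show that, given sufficient computing power, the algorithm can find all usable bases of any N×N integer DCT.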
17.
Graham R. Nudd R.David Etchells Jan Grinberg 《Journal of Parallel and Distributed Computing》1985,2(1):1-29
The recent advances in integration technology for microelectronic circuitry will provide unprecedented systems capabilities in the upcoming decade. Among the most significant aspects of these systems will be their increasing “intelligence,” based on their manipulation of a variety of sensory data. Presently, the impressive advances in image understanding technology for visible, infrared, and synthetic array radar data provide added impetus to the development of truly autonomous systems. With the impending advent of these systems, it is crucially important to understand the impact that the new integration technologies will have on the necessary hardware. Furthermore, how the resulting systems may best be made to serve the requirements of each intended application must be understood. The computational requirements of autonomous systems based on image understanding are examined, and how those requirements might be satisfied by a cellular machine employing three-dimensional microelectronic technology is demonstrated.
18.
Image segmentation is a crucial part of machine vision applications. In this paper a system to perform real-time segmentation of images is presented. It uses a real-time segmentation VLSI chip that is based on a gradient relaxation algorithm and is designed using the Path Programmable Logic design methodology developed at the University of Utah. The system design considerations, system specifications, and an input/output format for the chip are discussed. The actual design of the chip, which uses a pipeline methodology to achieve real-time performance with a compact VLSI layout, is given. The implementation of the segmentation system is presented, and the segmentation chip and the overall system are evaluated with regard to real-time performance and segmentation results. This work was supported in part by Grant ISI-856-0393 from the National Science Foundation.
19.
沈少华 《计算技术与自动化》2001,20(3):266-269
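Starting from the concept and goals of intelligent building systems, this paper analyzes the problems in the current design and construction of such systems and proposes measures in five respects: the "integrity" of the structure, the "openness" of the system, the "comprehensiveness" of the cabling, the "suitability" of the interface, and the "standardization" of management, with the aim of providing a useful reference for the healthy and orderly development of intelligent buildings.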
20.
This paper presents a novel two-stage class of decimation filters with superior spurious signal rejection performance around the so-called folding bands, i.e., frequency intervals whose signals get folded down to baseband due to decimation. The key idea for enhancing signal rejection in the frequency domain lies in an effective way of placing the zeros of a classical comb filter in the aforementioned folding bands. In addition, the paper provides a mathematical framework for designing two-stage multiplierless and nonrecursive structures of the proposed filters. Examples are provided to highlight the key steps in the design of the proposed filters. Moreover, the frequency behavior of the proposed filters in both baseband and stopband is compared with classical and generalized comb filters, and a droop compensator is proposed to counteract the passband distortion of the proposed filters.
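As background for the comparison with classical comb filters, the sketch below evaluates the magnitude response of a classical N-th order comb filter with decimation factor D, using the standard closed form |sin(πfD) / (D·sin(πf))|^N with f in cycles per input sample. It shows only the classical baseline, whose zeros sit at the centres k/D of the folding bands; the zero placement and two-stage structure proposed in the paper are not reproduced. The function name and the example values (D = 8, N = 3) are hypothetical.

```python
import math


def comb_magnitude(f, D, N):
    """Magnitude response of an N-th order comb filter with decimation
    factor D and unity DC gain; f is in cycles per input sample."""
    den = math.sin(math.pi * f)
    if abs(den) < 1e-12:
        # f is an integer: the ratio tends to +/-1, so the magnitude is 1.
        return 1.0
    return abs(math.sin(math.pi * f * D) / (D * den)) ** N


D, N = 8, 3
at_dc = comb_magnitude(0.0, D, N)             # passband reference
at_zero = comb_magnitude(1 / D, D, N)         # centre of the first folding band
near_zero = comb_magnitude(1 / D - 0.015, D, N)  # towards the band edge
```

The response is essentially zero at the band centre but rises towards the band edges, which is the weakness that motivates moving or spreading the zeros across the folding bands.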