Similar literature
20 similar documents found
1.
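A hardware implementation scheme for a half-pixel motion estimation algorithm is proposed. The scheme effectively increases video coding speed, consumes few hardware resources, and reduces processor area.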

2.
《微型机与应用》2017,(21):41-44
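Scalable High-efficiency Video Coding (SHVC) enables layered coding of video sequences, but layered coding also greatly increases the time complexity of encoding, especially in intra prediction, where the best prediction mode must be selected from 35 modes through rate-distortion optimization (RDO). To accelerate the intra prediction mode decision for the enhancement layer (EL), a fast intra prediction mode decision algorithm is proposed. It is based on the correlation between the intra prediction mode of the current prediction unit (PU) and the modes of the co-located PU in the base layer (BL), and of the PUs spatially adjacent to the co-located PU in the BL or to the current PU in the EL. Experimental results show that, with video quality essentially unchanged, the algorithm reduces encoding time by about 40%~50% compared with the SHVC reference model SHM-9.0.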

3.
Integer pixel motion estimation (IME) is a crucial, high-complexity module in high-definition video encoders. Joint algorithm and architecture design must trade off multiple target parameters, including throughput, logic gate count, on-chip SRAM size, memory bandwidth, and rate-distortion performance. Data organization and on-chip buffer structure are crucial factors in IME architecture design when trading off these performance targets. In this work, we combine global hierarchical search and local full search into a hardware-efficient IME algorithm, and then propose a VLSI architecture with an optimized on-chip buffer structure. The major contributions of this work are: (1) an improved hierarchical IME algorithm with presearch and deliberate data organization, (2) a multistage on-chip reference pixel buffer structure with high data reuse between integer and fractional pixel motion estimation, and (3) a highly reused and reconfigurable processing element structure. The optimized data organization and buffer structure achieve nearly 70% buffer saving with an average PSNR degradation of less than 0.08 dB (0.12 dB in the worst case) compared with a full-search-based architecture. At a hardware cost of 336 K and 382 K logic gates and 20 kB of SRAM, the proposed architecture achieves a throughput of 384 and 272 cycles per macroblock at system frequencies of 95 and 264 MHz for 1080p and QFHD @30fps video coding, respectively.
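
The throughput figures quoted above can be sanity-checked with a little arithmetic. The sketch below is only a rough check, assuming 16×16 macroblocks, frame dimensions padded up to a multiple of 16, and QFHD meaning 3840×2160; none of these are stated in the abstract, and the function name `required_clock_mhz` is hypothetical.

```python
import math

def required_clock_mhz(width, height, fps, cycles_per_mb, mb_size=16):
    """Clock (MHz) needed to process every macroblock of every frame.

    Assumes square macroblocks of mb_size pixels and frames padded up
    to a whole number of macroblocks (assumptions, not from the abstract).
    """
    mbs_per_frame = math.ceil(width / mb_size) * math.ceil(height / mb_size)
    return mbs_per_frame * fps * cycles_per_mb / 1e6

f_1080p = required_clock_mhz(1920, 1080, 30, 384)  # 8160 macroblocks per frame
f_qfhd = required_clock_mhz(3840, 2160, 30, 272)   # 32400 macroblocks per frame
print(round(f_1080p, 1), round(f_qfhd, 1))  # prints 94.0 264.4
```

Under these assumptions the required clocks come out at roughly 94 MHz and 264 MHz, which is in line with the 95 MHz and 264 MHz system frequencies reported in the abstract.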

4.
In this paper, an efficient architecture for the Finite Ridgelet Transform (FRIT) suitable for VLSI implementation is presented, based on a parallel, systolic Finite Radon Transform (FRAT) sub-block and a Haar Discrete Wavelet Transform (DWT) sub-block. The FRAT sub-block is a novel parametrisable, scalable and high-performance core with a time complexity of O(p^2), where p is the block size. Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) implementations are carried out to analyse the performance of the FRIT core developed.
Abbes Amira

5.
Endoscopic images are subject to spatial distortion due to the wide-angle configuration of the camera lenses. This barrel type of non-linear distortion should be corrected before these images are subjected to further analysis for diagnostic purposes. An efficient digital architecture suitable for an embedded system which can correct the barrel distortion in real time is presented in this paper. The theoretical approach of this spatial warping technique is based on least-squares estimation. The images in the distorted image space are mapped onto the corrected image space by using a polynomial mapping model. The polynomial parameters include the expansion coefficients, back-mapping coefficients, distortion centre and corrected centre. Several experiments were conducted by applying the spatial warping algorithm to many endoscopic images. A digital architecture suitable for hardware implementation of the distortion correction technique is developed by mapping the algorithmic steps onto a linear array of processing modules. Each module of a particular unit communicates with its nearest neighbours. The spatial warping architecture, implemented and simulated with Altera's Quartus II software, shows an overall computation time of 1.8 ms with a 50 MHz clock for an image of size 256 × 192 pixels, which confirms that the spatial warping module could be mounted as a dedicated unit in an endoscopy system for real-time applications.
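
As an illustration of the back-mapping idea, the sketch below maps each pixel of the corrected image back into the distorted image through a polynomial in the radius. The form of the polynomial (distorted radius as a sum of `b_k * rho_c**(k+1)`), the nearest-neighbour sampling, and the name `correct_barrel` are assumptions made for this example; the abstract does not specify the exact model used by the authors.

```python
import math

def correct_barrel(img, coeffs, centre_d, centre_c):
    """Back-map every corrected pixel to the distorted image.

    coeffs is a hypothetical list of back-mapping coefficients b_k, used as
    rho_d = sum(b_k * rho_c ** (k + 1)); centre_d and centre_c are the
    (x, y) distortion centre and corrected centre.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for yc in range(h):
        for xc in range(w):
            # Polar coordinates of the corrected pixel about the corrected centre.
            dx, dy = xc - centre_c[0], yc - centre_c[1]
            rho_c = math.hypot(dx, dy)
            theta = math.atan2(dy, dx)
            # Polynomial back-mapping of the radius; the angle is unchanged.
            rho_d = sum(b * rho_c ** (k + 1) for k, b in enumerate(coeffs))
            xd = round(centre_d[0] + rho_d * math.cos(theta))
            yd = round(centre_d[1] + rho_d * math.sin(theta))
            # Nearest-neighbour sampling; pixels mapped outside stay 0.
            if 0 <= xd < w and 0 <= yd < h:
                out[yc][xc] = img[yd][xd]
    return out
```

With `coeffs=[1.0]` and identical centres the mapping is the identity, which is a convenient first check before trying coefficients fitted by least squares.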

6.
The use of a special-purpose VLSI chip for relational operations is proposed. The chip is structured like a tree with processors at the nodes, called TOP (Tree of Processors). Each node is capable of storing a data element and of performing elementary operations on elements. A table of n tuples of k elements each (e.g., a relation as defined in database theory) is stored in n subtrees of at least k nodes each, at the lowest levels of TOP. The upper portion of TOP is used for routing and bookkeeping purposes. A number of elementary operations are defined for the nodes, and high-level operations on tables are performed as combinations of these. In particular, some operations for data input/output and update are discussed, and the basic operations of UNION, DIFFERENCE, PROJECTION, PRODUCT, SELECTION, and JOIN, defined in relational algebra, are studied for TOP realization. Even the most complex operations are executed in O(kn) steps, that is, the size of the data. This result is optimal in our system, where we assume that data are transmitted to TOPs through channels of constant bandwidth. Dedicated to Professor S. Faedo on his 70th birthday. This research has been partially supported by Ministero della Pubblica Istruzione of Italy.

7.
In this paper, an efficient VLSI architecture for a hierarchical block matching algorithm is proposed for motion estimation. At the lowest resolution level, two motion vector (MV) candidates are selected to improve performance. At the next search level, these two candidates provide the center points for local searches that yield one MV candidate. Then, at the next level and at the finest level, one MV candidate is chosen from one local search area (LSA), defined by the MV candidate obtained from the lower resolution level. This architecture requires nine processing elements, and data are processed in such a way that the calculation of frames at different resolutions overlaps with the MV calculation. Simulation results indicate that this architecture is more area-efficient and faster than many full-search, three-step-search and multiresolution architectures, which makes it suitable for SD and HD video. To avoid the delay due to pipelining, the MVs of all the macroblocks are calculated for one resolution level and stored in RAM to obtain the LSA for the next resolution level. This architecture, with about 16 K gates, is implemented for a search range of [-15, +15]. As this architecture requires only two-port memory, which is very common in consumer electronics systems, it can be integrated easily into any existing system at the expense of a very small area.
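
The coarse-to-fine search described above can be sketched in software as follows. This is a simplified illustration, not the architecture from the abstract: it keeps a single MV candidate at every level (the abstract keeps two at the lowest resolution), uses three levels built by 2×2 averaging, and refines within ±1 pixel at each finer level. All function names and parameters are hypothetical.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block (integer arithmetic)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
             for x in range(w)] for y in range(h)]

def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences; None if the candidate leaves the frame."""
    h, w = len(ref), len(ref[0])
    if not (0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h):
        return None
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))

def local_search(cur, ref, bx, by, n, center, radius):
    """Full search in a (2*radius+1)^2 window around center; returns (cost, dx, dy)."""
    best = None
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost is not None and (best is None or cost < best[0]):
                best = (cost, dx, dy)
    return best

def hierarchical_mv(cur, ref, bx, by, n=8, levels=3, coarse_radius=2):
    """Estimate the MV of the n x n block at (bx, by), coarsest level first."""
    pyramid = [(cur, ref)]
    for _ in range(levels - 1):
        c, r = pyramid[-1]
        pyramid.append((downsample(c), downsample(r)))
    mv = (0, 0)
    for lvl in range(levels - 1, -1, -1):
        c, r = pyramid[lvl]
        scale = 2 ** lvl
        radius = coarse_radius if lvl == levels - 1 else 1
        best = local_search(c, r, bx // scale, by // scale, max(n // scale, 1), mv, radius)
        if best is not None:
            mv = (best[1], best[2])
        if lvl > 0:
            mv = (mv[0] * 2, mv[1] * 2)  # project the MV to the next finer level
    return mv
```

A call such as `hierarchical_mv(cur, ref, 16, 16)` returns a `(dx, dy)` pair such that the block at `(16, 16)` in `cur` best matches the block at `(16 + dx, 16 + dy)` in `ref`. With these defaults the reachable range is roughly ±11 pixels; a fourth level or a larger `coarse_radius` would be needed to cover the [-15, +15] range mentioned in the abstract.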

8.
This article proposes high efficiency video coding (HEVC) standard hardware using a block matching motion estimation algorithm. A hybrid parallel spiral and adaptive threshold star diamond search algorithm (Hyb PS-ATSDSA) is proposed for fast motion estimation in HEVC. The parallel spiral search approach uses a spiral pattern to search from the center to the surroundings, and the adaptive threshold SDA consists of two phases: adaptive threshold and star diamond algorithm. To lower computational complexity in the HEVC architecture, the parallel spiral search algorithm uses several block matching schemes. The adaptive threshold and star diamond algorithm are used to reduce matching errors, remove invalid blocks early from the motion estimation procedure, and finally predict the final motion of the image. Speed increases with this hybrid algorithm. The proposed structure is implemented in the Xilinx ISE 14.5 design suite, and the experimental outcomes are compared with existing motion estimation strategies on field-programmable gate array devices. The proposed Hyb-PS-ATSDSA-ME-HEVC method attains 33.97%, 32.97%, 62.97%, and 26.97% lower delay and 34.867%, 45.97%, 27.97%, and 43.967% lower area compared with the existing methods FSA-ME-HEVC, TZSA-ME-HEVC, hyb TZSA-IME-HEVC, and IBMSA-ME-HEVC, respectively.

9.
The recent advances in integration technology for microelectronic circuitry will provide unprecedented systems capabilities in the upcoming decade. Among the most significant aspects of these systems will be their increasing “intelligence,” based on their manipulation of a variety of sensory data. Presently, the impressive advances in image understanding technology for visible, infrared, and synthetic array radar data provide added impetus to the development of truly autonomous systems. With the impending advent of these systems, it is crucially important to understand the impact that the new integration technologies will have on the necessary hardware. Furthermore, how the resulting systems may best be made to serve the requirements of each intended application must be understood. The computational requirements of autonomous systems based on image understanding are examined, and how those requirements might be satisfied by a cellular machine employing three-dimensional microelectronic technology is demonstrated.

10.
In this paper, high-performance VLSI architectures for lifting-based 1D and 2D discrete wavelet transforms (DWTs) are proposed. The area-efficient lifting-based DWT performs the whole operation with one processing element. Similarly, the delay-efficient lifting-based DWT performs the whole operation with multiple processing elements in parallel. In both cases, the processing element consists of one floating-point adder and one proposed fused multiply-add design. The proposed and existing lifting-based 1D and 2D DWTs are implemented in 45 nm technology. The results show that the proposed designs achieve significant improvement over existing architectures. For example, the proposed 9-point 2-parallel (9, 7) single-level 1D-DWT achieves a 33.5% reduction in total cycle delay compared with the direct form. Similarly, the proposed 9-point single-PE (9, 7) single-level 1D-DWT achieves 59.8% and 75.5% reductions in total area and net power, respectively, over the direct form.

11.
12.
This paper presents the design of a VLSI fuzzy processor capable of dealing with complex fuzzy inference systems, i.e., fuzzy inferences that include rule chaining. The architecture of the processor is based on a computational model whose main features are: the capability to cope effectively with complex fuzzy inference systems; a phase that detects the rules with a positive degree of activation, to reduce the number of rules to be processed per inference; parallel computation of the degree of activation of active rules; and representation of membership functions based on α-level sets. As the fuzzy inference can be divided into different processing phases, the processor is made up of a number of pipelined stages. In each stage, several inference processing phases are performed in parallel. Its performance is on the order of 2 MFLIPS with 256 rules, eight inputs, two chained variables, and four outputs, and 5.2 MFLIPS with 32 rules, three inputs, and one output, at a clock frequency of 66 MHz.

13.
14.
A VLSI architecture for real-time edge linking   Cited by: 1 (self-citations: 0, citations by others: 1)
A real-time algorithm for edge linking and its VLSI implementation are presented. The linking process is based on the break points' directions and the weak level points. The proposed VLSI architecture is capable of outputting one pixel of the linked edge map per clock cycle with a latency of 11n+12 clock cycles, where n is the number of pixel columns in the image.

15.
An efficient fuzzification algorithm named Dynamic Precision Fuzzification (DPF), developed mainly for hardware implementation, is introduced in this paper. The DPF, which can be used with any piecewise linear membership function, exploits an inherent capacity of the normal fuzzification algorithm to improve its efficiency when realized on a finite-precision implementation platform such as digital VLSI. Accuracy simulation results for the DPF and the normal fuzzification method are presented and compared to show the superiority of the DPF. As the word length is the most important parameter in a finite-precision implementation environment, determining the system's cost-precision trade-off, the simulation results show that DPF provides suitable precision improvements over traditional fuzzification without increasing the system word length. The VLSI synthesis results of both methods are also presented to show that this considerable accuracy improvement is achieved with an acceptable increase in VLSI implementation costs in terms of area, delay, and power consumption compared with traditional methods.

16.
Spatial architecture neural network (SANN), which is inspired by the connecting mode of excitatory pyramidal neurons and inhibitory interneurons of the neocortex, is a multilayer artificial neural network with good learning accuracy and generalization ability in real applications. However, the backpropagation-based learning algorithm (named BP-SANN) may be time-consuming and slow to converge. In this paper, a new fast and accurate two-phase sequential learning scheme for SANN is introduced to guarantee the network performance. With this new learning approach (named SFSL-SANN), only the weights connecting to output neurons are trained during the learning process. In the first phase, a least-squares method is applied to estimate the span-output-weight on the basis of fixed, randomly generated initial weight values. The improved iterative learning algorithm is then used to learn the feedforward-output-weight in the second phase. SFSL-SANN is compared in detail with BP-SANN and other popular neural network approaches on benchmark problems drawn from classification, regression and time-series prediction applications. The results demonstrate that SFSL-SANN converges faster and takes less time than BP-SANN, and produces better learning accuracy and generalization performance than other approaches.

17.
This paper discusses the challenges of the design of real-time image and video processing systems and reviews some practical design approaches for these systems.

18.
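To reduce memory consumption in the 2D wavelet transform while increasing circuit processing speed, a 2D parallel VLSI architecture is proposed. By fully exploiting the relationship between the row transform and the column transform in the 2D transform, the parallel data-scan input scheme of the row and column transform cores is optimized, reducing the intermediate storage of the 9/7 wavelet transform to 4N. In addition, a pipelining technique based on the flipping structure shortens the critical path of the circuit to one multiplier delay, effectively increasing processing speed, and an optimization that merges the scaling circuits reduces the number of multipliers to 10, effectively reducing hardware resource consumption.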

19.
The CORDIC algorithm, originally proposed using nonredundant radix-2 arithmetic, has been refined in terms of throughput and latency with the introduction of redundant arithmetic and higher-radix techniques. In this paper, we propose a pipelined architecture using signed-digit arithmetic for the VLSI-efficient implementation of the rotational radix-4 CORDIC algorithm, eliminating the z path completely. A detailed comparison of the proposed architecture with the available radix-2 architectures shows the latency and hardware improvement. The proposed architecture achieves a latency improvement over the previously proposed radix-4 architecture with a relatively small hardware overhead. The proposed architecture for 16-bit precision was implemented in VHDL, and extensive simulations have been performed to validate the results. The functionally simulated netlist has been synthesized for 16-bit precision with a 90 nm CMOS technology library, and the area-time measures are provided. This architecture was also implemented using Xilinx ISE 9.1 software and a Virtex device.
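
For readers unfamiliar with the baseline that this work refines, the sketch below shows the classic radix-2 rotation-mode CORDIC iteration. It is not the radix-4, signed-digit, z-path-free design from the abstract: it uses floating point instead of fixed-point hardware arithmetic and keeps the z (angle) path explicitly. The function name `cordic_rotate` is hypothetical.

```python
import math

def cordic_rotate(angle, iterations=16):
    """Approximate (cos(angle), sin(angle)) for |angle| <= pi/2.

    Classic radix-2 rotation mode: each step rotates by +/- atan(2^-i),
    which in hardware needs only shifts and adds.
    """
    # Elementary rotation angles and the accumulated gain of all micro-rotations.
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    # Pre-scale x by 1/gain so the result lands on the unit circle.
    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0           # rotation direction from the z path
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atans[i]                     # drive the residual angle to zero
    return x, y

c, s = cordic_rotate(0.5)
print(abs(c - math.cos(0.5)) < 1e-3, abs(s - math.sin(0.5)) < 1e-3)
```

Each radix-2 iteration contributes about one bit of precision, which is why higher-radix schemes such as the radix-4 design described above can reduce the iteration count.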

20.
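A VLSI implementation based on the integer wavelet transform   Cited by: 1 (self-citations: 1, citations by others: 0)
A design of a VLSI architecture based on the integer wavelet transform is presented. Drawing on certain properties of the integer wavelet transform, namely the optimal factorization of the wavelet matrix and the effects of finite precision, representing the mantissa with a small number of bits causes only very limited performance degradation. Based on these results, a VLSI implementation of the integer wavelet transform is proposed that achieves a fast frame rate with moderate gate complexity.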
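
To make the idea of an integer wavelet transform concrete, the sketch below implements one level of the reversible 5/3 lifting transform, a common integer-to-integer wavelet. The abstract does not say which filter the authors use, so the 5/3 filter, the mirrored boundary handling, and the function names `lift53_forward` and `lift53_inverse` are assumptions for illustration only.

```python
def lift53_forward(x):
    """One level of the reversible 5/3 integer lifting transform.

    x must have even length. Returns (s, d): the smooth (low-pass) and
    detail (high-pass) integer coefficients.
    """
    even, odd = x[0::2], x[1::2]
    n = len(odd)
    # Predict: detail = odd sample minus the floor-average of its even neighbours.
    d = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # Update: smooth = even sample plus a rounded quarter of neighbouring details.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    return s, d

def lift53_inverse(s, d):
    """Undo lift53_forward exactly by reversing the lifting steps."""
    n = len(d)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    odd = [d[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x

signal = [3, 7, 1, 8, 2, 9, 4, 6]
s, d = lift53_forward(signal)
print(lift53_inverse(s, d) == signal)  # prints True
```

Because every lifting step uses only integer additions and shifts, the transform needs no multipliers and is exactly invertible, which is one reason lifting factorizations are attractive for low-gate-count hardware.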
