1.
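VLIW (Very Long Instruction Word) instructions contain many no-ops (NOPs), which causes severe code-size bloat; code compression is an effective remedy. VLIW code compression must address three key problems: improving the compression ratio, reducing the performance impact of decompression, and relocating branch targets. Targeting the characteristics of VLIW instructions on a stream architecture, this paper proposes two-dimensional compression, which compresses VLIW code in both the vertical and horizontal directions. Horizontal decompression can proceed in parallel with code execution, and a stack register is provided to cache loop entry addresses. Experimental results show that two-dimensional compression effectively solves the VLIW code-size bloat problem, reducing the area of the instruction memory by 36.48% and the area of the whole CISP system by 7.85%.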
2.
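VLSI parallel computation of the two-dimensional orthogonal wavelet transform  Total citations: 1, self-citations: 1, citations by others: 1
This paper proposes a VLSI parallel architecture for the two-dimensional discrete orthogonal wavelet transform. The architecture partitions the 2-D input signal into several non-overlapping groups of rows, so that all rows within a group are processed in parallel, while the processing of rows in different groups, the computation at different levels, and even the computation of different signals can be pipelined on the architecture.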
3.
Several parallel, pipelined and folded architectures with different throughput rates are presented for computation of the DCT, one of the fundamental operations in image/video coding. This paper begins with a new decomposition algorithm for the 1-D DCT coefficient matrix. Then the 2-D DCT problem is converted into the corresponding 1-D counterpart through a regular index mapping technique. Afterward, depending on the trade-off between hardware complexity and speed performance, the derived decomposition algorithm is transformed into different parallel-pipelined and folded architectures that realize the butterfly operations and the post-processing operations. Compared with other DCT processors, the proposed parallel-pipelined architectures, which need no intermediate transpose memory, have the features of modularity, regularity, locality, scalability, and pipelinability, with arithmetic hardware cost proportional to the logarithm of the transform length.
4.
5.
In H.264/AVC, adapting the transform size to the block size of the motion-compensated prediction residue has proven
to be an important coding tool. This paper presents a highly parallel joint circuit architecture for 8 × 8 and 4 × 4 adaptive
block-size transforms in H.264/AVC. By decomposing the 8 × 8 transform into basic 4 × 4 transforms, a unified architecture is
designed for both the 8 × 8 and 4 × 4 transforms, and the transform data-path can be efficiently reused for six kinds of transforms,
i.e., 8 × 8 forward, 8 × 8 inverse, 4 × 4 forward, 4 × 4 inverse, forward-Hadamard, and inverse-Hadamard transforms. Linear shift
mapping is applied to the memory buffer to support parallel access in both row and column directions, which eliminates the
need for a transpose circuit. For a reusable and configurable transform data-path, a multiple-stage pipeline is designed to
reduce the critical path length and increase throughput. The design is implemented in UMC 0.18 µm technology at 200 MHz
with 13.651 K logic gates, and can support a 1,920 × 1,088, 30 fps H.264/AVC HDTV decoder.
Yu Li
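One way to picture the linear shift mapping mentioned in the abstract above is a skewed bank assignment. The sketch below is only an illustration under assumptions: the formula (row + col) mod N, the bank count, and all function names are hypothetical and are not taken from the paper.

```python
# Skewed ("linear shift") storage sketch. ASSUMPTION: element (row, col) of an
# N x N block is placed in memory bank (row + col) mod N. The paper's actual
# mapping is not given in the abstract.

def bank_of(row, col, n):
    """Bank index for element (row, col) under the assumed skewed mapping."""
    return (row + col) % n

def banks_for_row(row, n):
    """Banks touched when one whole row is read in parallel."""
    return [bank_of(row, c, n) for c in range(n)]

def banks_for_col(col, n):
    """Banks touched when one whole column is read in parallel."""
    return [bank_of(r, col, n) for r in range(n)]

def conflict_free(n):
    """True if every row and every column hits each of the n banks exactly once."""
    every_bank = set(range(n))
    return all(
        set(banks_for_row(i, n)) == every_bank
        and set(banks_for_col(i, n)) == every_bank
        for i in range(n)
    )
```

Under this assumed mapping, a row read and a column read each touch every bank once, which is why a separate transpose stage would not be needed between the row pass and the column pass.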
6.
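This paper proposes a design method for a parallel-array VLSI architecture that uses the lifting algorithm to implement the two-dimensional discrete wavelet transform (DWT) in a JPEG2000 coding system. The resulting architecture consists of two row processors, one column processor, and a small number of row buffers. The row and column processors are built from parallel arrays of processing elements, so row and column filtering can proceed simultaneously, and optimized shift-and-add operations replace multiplications. The whole architecture is pipelined. At the same precision, it greatly reduces the amount of computation and raises hardware utilization to nearly 100%, which speeds up the transform and reduces circuit size. For an N×N image, the architecture takes O(N²/2) clock cycles. The 2-D DWT filter architecture has been verified on an FPGA and can serve as a standalone IP core in a JPEG2000 image codec chip under development.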
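As a rough illustration of how lifting replaces multiplications with shifts and adds, the sketch below implements one level of a 1-D reversible 5/3 lifting step. This is an assumption made for illustration: the abstract does not say which wavelet filter the architecture uses, and the function names are hypothetical.

```python
# One level of the 1-D reversible 5/3 lifting transform (integer to integer).
# ASSUMPTIONS: even-length input, mirrored samples at the edges, 5/3 filter.

def fwd53(x):
    """Return (lowpass s, highpass d) using only additions and shifts."""
    assert len(x) % 2 == 0 and len(x) >= 2
    even, odd = x[0::2], x[1::2]
    half = len(even)
    # Predict step: odd sample minus the floor-average of its even neighbours.
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
         for i in range(half)]
    # Update step: even sample plus a rounded quarter of neighbouring details.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(half)]
    return s, d

def inv53(s, d):
    """Undo fwd53 by running the lifting steps in reverse order."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = []
    for e, o in zip(even, odd):
        x.extend([e, o])
    return x
```

A 2-D transform would apply such a step along rows and then along columns. Each lifting step is exactly invertible, so the reconstruction is lossless in integer arithmetic.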
7.
8.
Francesco Gregoretti Roberto Passerone Leonardo Maria Reyneri Claudio Sansoé 《The Journal of VLSI Signal Processing》2001,28(3):259-278
This article presents PAPRICA-3, a VLSI-oriented architecture for real-time processing of images, and its implementation on HACRE, a high-speed, cascadable, 32-processor VLSI slice. The architecture is based on an array of programmable processing elements with an instruction set tailored to image processing, mathematical morphology, and neural network emulation. Dedicated hardware features allow simultaneous image acquisition, processing, neural network emulation, and a straightforward interface with a hosting PC. HACRE has been fabricated and successfully tested at a clock frequency of 50 MHz. A board hosting up to four chips and providing a 33 MHz PCI interface has been manufactured and used to build BEATRIX, a system for the recognition of handwritten check amounts, by integrating image processing and neural network algorithms (on the board) with context analysis techniques (on the hosting PC).
9.
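This paper first briefly reviews the two-dimensional real-valued discrete Gabor transform previously proposed by the authors and its simple relationship to the complex-valued discrete Gabor transform. It then focuses on fast computation of the 2-D real-valued discrete Gabor transform: it proposes a time-recursive algorithm for computing the transform coefficients and a block time-recursive algorithm for reconstructing the original image from those coefficients, and it studies how to implement the algorithms with a two-layer parallel lattice structure. Computational complexity analysis and comparison with other algorithms demonstrate the advantage of the two-layer parallel lattice implementation for real-time processing.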
10.
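This paper studies how unavoidable amplitude and phase errors of the array elements affect the two-dimensional clutter spectrum of an airborne early warning (AEW) radar with a rectangular side-looking phased array in which each column is first combined at microwave frequency into a column subarray. It also analyzes why the performance of two-dimensional signal processing based on the ideal "ridge-shaped" clutter spectrum degrades, providing a basis for further research on space-time two-dimensional processing.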
11.
12.
Implementing the memory that stores the image and transform coefficients in 2-D DWT processing systems with a more
cost-effective external memory module such as DDR DRAM is shown to suffer from an effective memory bandwidth that is significantly
lower than the memory system's peak bandwidth if the conventional direct logical-to-physical memory address mapping is adopted.
The low effective memory bandwidth is caused by the frequent occurrence of memory overhead cycles, which in turn is closely
related to the logical memory access patterns of 2-D DWT processes. The problem becomes even more severe for 2-D DWT processing
of video. The logical memory access patterns of the multi-level 2-D DWT are analyzed, and an enhanced logical-to-physical
memory mapping scheme that minimizes the occurrence of memory overhead cycles is proposed. The proposed scheme is simulated,
and its performance in terms of effective memory access bandwidth is evaluated and compared with the conventional direct mapping
scheme.
Soon-Chieh Lim
13.
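As SIMD (Single Instruction, Multiple Data) DSPs (digital signal processors) integrate more and more processing units on chip, the flexibility and bandwidth efficiency of parallel memory access have a growing impact on delivered performance. This paper analyzes in detail the memory-access problems faced by parallel radix-2 FFT (Fast Fourier Transform) algorithms on a general SIMD DSP. It uses simple XOR logic on part of the address bits to perform SIMD parallel address conversion, achieving conflict-free SIMD parallel memory access for the FFT, and it proposes several vector memory-access instructions with special shuffle modes that completely eliminate the extra shuffle instructions otherwise needed for radix-2 FFT on a SIMD architecture. Finally, the approach is applied to optimizing the vector memory (VM) of YHFT-Matrix2, a 16-lane SIMD digital signal processor. Test results show that the optimized VM, at a hardware overhead of 18%, achieves fully pipelined, conflict-free parallel memory access for the FFT with 100% utilization of the parallel access bandwidth. Compared with the design before optimization, FFTs of various sizes achieve speedups of 1.32 to 2.66.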
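The idea of XOR-based address conversion can be sketched as follows. This is a generic illustration under assumptions, not the YHFT-Matrix2 scheme: the bank function (XOR of all m-bit address fields), the helper names, and the access pattern are hypothetical.

```python
# XOR bank mapping sketch for power-of-two strided vector accesses, the kind
# of pattern radix-2 FFT stages produce.
# ASSUMPTION: bank index = XOR of all m-bit fields of the address.

def xor_bank(addr, m_bits):
    """Fold the address into m_bits by XOR-ing its m-bit fields together."""
    mask = (1 << m_bits) - 1
    b = 0
    while addr:
        b ^= addr & mask
        addr >>= m_bits
    return b

def vector_addresses(fixed, s, m_bits):
    """Addresses whose bits [s, s+m_bits) take every value; other bits come from `fixed`."""
    window = ((1 << m_bits) - 1) << s
    base = fixed & ~window
    return [base | (j << s) for j in range(1 << m_bits)]

def conflict_free(addr_bits, m_bits):
    """Check every stride 2**s and every base: all lanes must hit distinct banks."""
    lanes = 1 << m_bits
    for s in range(addr_bits - m_bits + 1):
        for fixed in range(1 << addr_bits):
            banks = {xor_bank(a, m_bits)
                     for a in vector_addresses(fixed, s, m_bits)}
            if len(banks) != lanes:
                return False
    return True
```

With plain low-order interleaving (bank = addr mod 4), a stride-4 access sends all four lanes to the same bank, whereas the XOR fold spreads them over all four banks. The exhaustive check in `conflict_free` is practical only for small address widths.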
14.
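Matrix multiplication is a very common basic operation in numerical analysis and is widely used in pattern recognition and in image and signal processing. Because matrix operations exhibit locality and regularity, they are particularly well suited to two-dimensional mesh parallel computers. This article discusses the implementation of parallel matrix multiplication algorithms on a 2-D mesh interconnection network: it first gives a parallel algorithm for a square mesh processor array and then generalizes it to a rectangular mesh processor array. Finally, its application on the LSMPP computer shows that the algorithm is feasible and effective.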
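For readers unfamiliar with mesh-based matrix multiplication, the sketch below simulates Cannon's algorithm on an n × n processor grid. This is a swapped-in, well-known technique used only for illustration: the abstract does not name the algorithm it uses, and the simulation assumes wraparound links between edge processors and one matrix element per processor.

```python
# Sequential simulation of Cannon's algorithm on an n x n processor grid.
# ASSUMPTIONS: square matrices, one element per processor, wraparound links.

def cannon_matmul(a, b):
    n = len(a)
    # Initial skew: row i of A rotates left by i, column j of B rotates up by j,
    # so processor (i, j) starts with a[i][k] and b[k][j] for k = (i + j) mod n.
    A = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    B = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    C = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Every processor multiplies its local pair and accumulates.
        for i in range(n):
            for j in range(n):
                C[i][j] += A[i][j] * B[i][j]
        # A shifts one step left along rows, B one step up along columns.
        A = [[A[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        B = [[B[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return C
```

Each processor only ever exchanges data with its immediate neighbours, which is the locality property that makes mesh networks a natural fit for this operation.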
15.
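The range migration (RM) algorithm can accurately correct near-field range migration and, by using the fast Fourier transform, achieves high computational efficiency, so it has the potential to be applied to real-time three-dimensional imaging with near-field MIMO radar. The main challenge in applying the RM algorithm to near-field MIMO imaging is designing a suitable array structure. This paper derives a near-field 3-D RM imaging algorithm for MIMO radar by decomposing a spherical wave into infinitely many plane waves and, based on an in-depth analysis of the algorithm's implementation flow, obtains four constraints that the RM algorithm imposes on the MIMO array configuration. A MIMO array design method suited to the RM algorithm is proposed and used to design a MIMO array, and the imaging performance of the designed array is analyzed through simulation.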
16.
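A circular array can provide two-dimensional DOA estimation. In phase-mode space, the UCA-ESPRIT algorithm has advantages such as low computational cost and automatic pairing of DOA parameters. However, its mean square error is larger than that of algorithms such as UCA-RB-MUSIC. To retain the original advantages of UCA-ESPRIT while reducing the mean square error of the DOA estimates, a row-weighting technique is introduced to improve the algorithm.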
17.
18.
A new algorithm for 2-D DOA estimation  Total citations: 1, self-citations: 0, citations by others: 1
In this paper we present a new algorithm to estimate the 2-D direction of arrival (DOA) of narrowband sources lying in the far field of the array. The array consists of matched co-directional triplets, and can be considered as an extension of the 1-D ESPRIT scenario to 2-D. The proposed approach is simple and direct and does not require a search procedure or initialization. Existing algorithms require a search to match the correct elevation and azimuth angles and are computationally more expensive. This technique automatically pairs the azimuth and elevation angles by marking them. The computational complexity is twice that of 1-D ESPRIT. Simulation results and comparisons with other existing algorithms are presented to demonstrate the performance of the proposed technique.
19.
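To speed up 2-D IDCT decoding, this paper designs a 2-D IDCT processor based on distributed arithmetic (DA). Algorithmically, the processor implements the 2-D IDCT with 1-D IDCTs, implements the 1-D IDCT with the Chen algorithm, and implements the multiply-accumulate structure with DA. Splitting the input data into two groups, the upper 6 bits and the lower 6 bits, speeds up the processor; sharing lookup tables and using an encoding that maps the input data onto (-1, 1) reduce the number and size of the lookup tables. Pre-storing the rounding value in Q0 eliminates the addition that rounding would otherwise require. Synthesized for an Altera EP2C20F484C7, the processor reaches a maximum clock frequency of 165.37 MHz.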
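The core of distributed arithmetic, replacing a multiply-accumulate with table lookups and shifts, can be sketched as below. This is a textbook bit-serial form with integer coefficients, assumed for illustration. It does not model the paper's 6-bit input split, shared tables, (-1, 1) encoding, or rounding trick, and the function names are hypothetical.

```python
# Bit-serial distributed arithmetic (DA) for y = sum(c[k] * x[k]).
# ASSUMPTIONS: integer coefficients, inputs in `bits`-bit two's complement.

def build_lut(coeffs):
    """lut[addr] = sum of the coefficients whose index bit is set in addr."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]

def da_dot(coeffs, xs, bits=12):
    """Inner product computed with one table lookup per input bit plane."""
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    words = [x & mask for x in xs]          # two's-complement bit patterns
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into a lookup address.
        addr = 0
        for i, w in enumerate(words):
            addr |= ((w >> b) & 1) << i
        term = lut[addr] << b
        # The sign bit plane carries negative weight in two's complement.
        acc += -term if b == bits - 1 else term
    return acc
```

The table has 2^K entries for K coefficients, which is why techniques that shrink or share lookup tables matter in hardware. Processing two 6-bit halves of the input at once, as the abstract describes, would halve the number of sequential steps at the cost of a second lookup per step.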