Similar Literature
 Found 20 similar documents (search time: 0 ms)
1.
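Two-Dimensional VLIW Compression and Parallel Decompression Based on a Stream Architecture   Total citations: 1, self-citations: 0, citations by others: 1
VLIW (Very Long Instruction Word) instructions contain many no-operation (NOP) slots, which causes severe code-size bloat; code compression is an effective remedy. VLIW code compression must address three key problems: improving the compression ratio, reducing the performance impact of decompression, and relocating branch targets. Based on the characteristics of VLIW instructions on a stream architecture, this paper proposes two-dimensional compression, which compresses VLIW code in both the vertical and horizontal directions. Horizontal decompression can proceed in parallel with code execution, and stack registers are used to cache loop entry addresses. Experimental results show that two-dimensional compression effectively solves the VLIW code-size bloat problem: it reduces the area of the instruction memory by 36.48% and the area of the whole CISP system by 7.85%.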

2.
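VLSI Parallel Computation of the Two-Dimensional Orthogonal Wavelet Transform   Total citations: 1, self-citations: 1, citations by others: 1
This paper proposes a VLSI parallel architecture for the two-dimensional discrete orthogonal wavelet transform. The architecture decomposes the two-dimensional input signal into several non-overlapping groups of rows, so that all rows within a group are processed in parallel, while the processing of rows in different groups, the computations at different levels, and even the computations for different signals can be pipelined on the architecture.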

3.
Several parallel, pipelined and folded architectures with different throughput rates are presented for computation of the DCT, one of the fundamental operations in image/video coding. This paper begins with a new decomposition algorithm for the 1-D DCT coefficient matrix. Then the 2-D DCT problem is converted into the corresponding 1-D counterpart through a regular index mapping technique. Afterward, depending on the trade-off between hardware complexity and speed performance, the derived decomposition algorithm is transformed into different parallel-pipelined and folded architectures that realize the butterfly operations and the post-processing operations. Compared to other DCT processors, our proposed parallel-pipelined architectures, without any intermediate transpose memory, have the features of modularity, regularity, locality, scalability, and pipelinability, with arithmetic hardware cost proportional to the logarithm of the transform length.

4.
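张光烈, 郑南宁, 吴勇, 张霞. 《电子学报》 (Acta Electronica Sinica), 2002, 30(7): 945-948
Building on a discussion of the VLSI implementation of a progressive-scan conversion algorithm for interlaced video signals and of the hardware implementation of chroma processing and color-space conversion, this paper proposes a VLSI architecture based on a synchronous parallel pipeline that targets the real-time, concurrent, and computation-intensive nature of video signal processing. Corresponding hardware implementation algorithms are given in the context of SoC IP module design. The design has been synthesized and verified with a 0.35 μm CMOS standard-cell library.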

5.
A Highly Parallel Joint VLSI Architecture for Transforms in H.264/AVC   Total citations: 1, self-citations: 0, citations by others: 1
In H.264/AVC, the concept of adapting the transform size to the block size of the motion-compensated prediction residue has proven to be an important coding tool. This paper presents a highly parallel joint circuit architecture for 8 × 8 and 4 × 4 adaptive block-size transforms in H.264/AVC. By decomposing the 8 × 8 transform into basic 4 × 4 transforms, a unified architecture is designed for both 8 × 8 and 4 × 4 transforms, and the transform data-path can be efficiently reused for six kinds of transforms, i.e., 8 × 8 forward, 8 × 8 inverse, 4 × 4 forward, 4 × 4 inverse, forward-Hadamard, and inverse-Hadamard transforms. Linear shift mapping is applied to the memory buffer to support parallel access in both row and column directions, which eliminates the need for a transpose circuit. For the reusable and configurable transform data-path, a multiple-stage pipeline is designed to reduce the critical path length and increase throughput. The design is implemented in UMC 0.18 μm technology at 200 MHz with 13.651 K logic gates, and can support a 1,920 × 1,088, 30 fps H.264/AVC HDTV decoder.
Corresponding author: Yu Li
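
The linear shift mapping mentioned in the abstract above can be illustrated with a small sketch. It assumes the common skewed-storage form in which element (r, c) of an N × N block is stored in bank (r + c) mod N; the abstract does not state the exact mapping, so the helper names (`bank_of`, `conflict_free`) and the block size are illustrative assumptions, not the paper's design.

```python
# Skewed (linear shift) storage sketch: element (r, c) of an N x N block
# goes to bank (r + c) mod N. This is an assumed textbook mapping; the
# paper's exact scheme is not given in the abstract.
N = 4  # illustrative block size


def bank_of(r, c, n=N):
    """Return the memory bank that holds element (r, c)."""
    return (r + c) % n


def banks_for_row(r, n=N):
    """Banks touched when reading row r in parallel."""
    return [bank_of(r, c, n) for c in range(n)]


def banks_for_col(c, n=N):
    """Banks touched when reading column c in parallel."""
    return [bank_of(r, c, n) for r in range(n)]


def conflict_free(n=N):
    """True if every row and every column touches each bank exactly once."""
    full = list(range(n))
    rows_ok = all(sorted(banks_for_row(r, n)) == full for r in range(n))
    cols_ok = all(sorted(banks_for_col(c, n)) == full for c in range(n))
    return rows_ok and cols_ok
```

Because each row and each column hits every bank exactly once under this mapping, a full row or a full column can be fetched in one parallel access, which is why no separate transpose step is needed.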

6.
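VLSI Architecture Design of a Parallel-Array Wavelet Filter for JPEG2000   Total citations: 2, self-citations: 0, citations by others: 2
兰旭光, 郑南宁, 梅魁志, 刘跃虎. 《电子学报》 (Acta Electronica Sinica), 2004, 32(11): 1806-1809
This paper proposes a design method for a parallel-array VLSI architecture that implements the two-dimensional discrete wavelet transform (DWT) of a JPEG2000 coding system using the lifting algorithm. The resulting architecture consists of two row processors, one column processor, and a small number of row buffers. The row and column processors are internally built from parallel arrays of processing elements, which lets the row and column filters operate simultaneously, and multiplications are replaced by optimized shift-add operations. The whole architecture is pipelined. At the same precision, it greatly reduces the amount of computation and raises hardware utilization to nearly 100%, which speeds up the transform and reduces circuit size. For an N×N image, the architecture completes processing in O(N²/2) clock cycles. The two-dimensional discrete wavelet filter architecture has been verified on an FPGA and can serve as a standalone IP core in a JPEG2000 image codec chip under development.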
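
To make the lifting idea with shift-add arithmetic concrete, here is a minimal one-level, 1-D sketch. It assumes the reversible 5/3 filter of JPEG2000 and symmetric boundary extension; the abstract does not say which filter the paper's hardware implements, and the function names are illustrative.

```python
# One level of 1-D reversible 5/3 lifting, using only adds and shifts.
# Assumption: 5/3 filter with symmetric extension at the borders; the
# abstract does not state which JPEG2000 filter the paper implements.

def lift53_forward(x):
    """Split an even-length list of ints into low-pass s and high-pass d."""
    m = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: d[n] = odd[n] - floor((even[n] + even[n+1]) / 2)
    d = [odd[n] - ((even[n] + even[min(n + 1, m - 1)]) >> 1) for n in range(m)]
    # Update step: s[n] = even[n] + floor((d[n-1] + d[n] + 2) / 4)
    s = [even[n] + ((d[max(n - 1, 0)] + d[n] + 2) >> 2) for n in range(m)]
    return s, d


def lift53_inverse(s, d):
    """Undo lift53_forward exactly by reversing the two lifting steps."""
    m = len(s)
    even = [s[n] - ((d[max(n - 1, 0)] + d[n] + 2) >> 2) for n in range(m)]
    odd = [d[n] + ((even[n] + even[min(n + 1, m - 1)]) >> 1) for n in range(m)]
    x = [0] * (2 * m)
    x[0::2], x[1::2] = even, odd
    return x
```

A 2-D transform in this style applies the same routine to every row and then to every column, which matches the row-processor and column-processor split described in the abstract.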

7.
8.
This article presents PAPRICA-3, a VLSI-oriented architecture for real-time processing of images and its implementation on HACRE, a high-speed, cascadable, 32-processor VLSI slice. The architecture is based on an array of programmable processing elements with the instruction set tailored to image processing, mathematical morphology, and neural networks emulation. Dedicated hardware features allow simultaneous image acquisition, processing, neural network emulation, and a straightforward interface with a hosting PC. HACRE has been fabricated and successfully tested at a clock frequency of 50 MHz. A board hosting up to four chips and providing a 33 MHz PCI interface has been manufactured and used to build BEATRIX, a system for the recognition of handwritten check amounts, by integrating image processing and neural network algorithms (on the board) with context analysis techniques (on the hosting PC).

9.
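This paper first briefly reviews the two-dimensional real-valued discrete Gabor transform previously proposed by the authors and its simple relationship to the complex-valued discrete Gabor transform. It then focuses on fast computation of the two-dimensional real-valued discrete Gabor transform: it proposes a time-recursive algorithm for computing the transform coefficients and a block time-recursive algorithm for reconstructing the original image from the coefficients, and studies how to implement the algorithms with a two-layer parallel lattice structure. A computational complexity analysis and a comparison with other algorithms demonstrate the advantage of the two-layer parallel lattice implementation for real-time processing.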

10.
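廖桂生, 保铮. 《电子学报》 (Acta Electronica Sinica), 1994, 22(3): 116-119
This paper studies the effect of unavoidable element amplitude and phase errors on the two-dimensional clutter spectrum of an airborne early warning (AEW) radar with a rectangular side-looking phased array in which each column is first combined at microwave frequency into a column subarray. It also analyzes why the performance of two-dimensional signal processing based on the ideal "ridge-shaped" clutter spectrum degrades, providing a basis for further study of space-time two-dimensional processing.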

11.
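Compared with a planar array, the conventional L-shaped array has a simpler structure and achieves two-dimensional direction-of-arrival (DOA) estimation with fewer elements, but its estimation performance is limited by its physical aperture. This paper combines MIMO technology with the L-shaped array and proposes a MIMO-based two-dimensional DOA estimation method for L-shaped arrays. Using the MIMO equivalent virtual array principle, the method treats the L-shaped array as an equivalent rectangular planar array and then applies MUSIC on that equivalent array for two-dimensional DOA estimation, achieving the estimation performance of a rectangular planar array with the physical aperture of an L-shaped array. The paper derives the CRB for two-dimensional DOA estimation, and computer simulations confirm the effectiveness of the proposed method.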
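
The virtual array principle in the abstract above can be sketched in a few lines: each transmit/receive pair behaves like one virtual element located at the sum of the two element positions. The sketch assumes that one arm of the L transmits and the other receives, with half-wavelength spacing; the abstract does not give the actual element assignment, so `M`, `N`, `d`, and `virtual_array` are illustrative.

```python
# MIMO virtual array sketch for an L-shaped array.
# Assumptions (not from the source): the x-axis arm transmits, the y-axis
# arm receives, and spacing d is in units of wavelength.

def virtual_array(tx, rx):
    """Return sorted virtual element positions, one per (tx, rx) pair."""
    return sorted({(tx_x + rx_x, tx_y + rx_y)
                   for tx_x, tx_y in tx
                   for rx_x, rx_y in rx})


M, N, d = 3, 4, 0.5                       # illustrative sizes and spacing
tx = [(m * d, 0.0) for m in range(M)]     # transmit arm along x
rx = [(0.0, n * d) for n in range(N)]     # receive arm along y
grid = virtual_array(tx, rx)              # M x N rectangular grid of positions
```

With these assumptions, M + N - 1 physical elements (the corner element is shared) yield M × N virtual elements on a rectangular grid, which is the aperture gain the abstract refers to.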

12.
The implementation of the memory for storing image and transform coefficients in 2-D DWT processing systems using a more cost-effective external memory module such as DDR DRAM is shown to suffer from an effective memory bandwidth that is significantly lower than the memory system peak bandwidth if the conventional direct logical-to-physical memory address mapping is adopted. The low effective memory bandwidth is caused by the frequent occurrence of memory overhead cycles, which in turn is closely related to the logical memory access patterns of 2-D DWT processes. The problem becomes even more severe for the 2-D DWT processing of video. An analysis of the logical memory access patterns of multi-level 2-D DWT is carried out, and an enhanced logical-to-physical memory mapping scheme that minimizes the occurrence of memory overhead cycles is proposed. The proposed scheme is simulated, and its performance in terms of effective memory access bandwidth is evaluated and compared with the conventional direct mapping scheme.
Corresponding author: Soon-Chieh Lim

13.
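陈海燕, 杨超, 刘胜, 刘仲. 《电子学报》 (Acta Electronica Sinica), 2016, 44(2): 241-246
As DSPs (Digital Signal Processors) with a SIMD (Single Instruction Multiple Data stream) architecture integrate more and more processing elements on chip, the flexibility and bandwidth efficiency of parallel memory access increasingly affect actual computing performance. This paper analyzes in detail the memory access problems faced by the radix-2 FFT (Fast Fourier Transform) parallel algorithm on a general SIMD DSP. It uses simple XOR logic on part of the address to perform SIMD parallel memory address translation, achieving conflict-free SIMD parallel memory access for FFT computation. It also proposes several vector memory access instructions with special shuffle modes, which completely eliminate the extra shuffle instructions otherwise needed for radix-2 FFT on a SIMD architecture. Finally, the scheme is applied to the optimized design of the vector memory (VM) in YHFT-Matrix2, a 16-lane SIMD digital signal processor. Test results show that the VM optimized with this SIMD parallel memory structure achieves fully pipelined, conflict-free parallel memory access for FFT and 100% parallel memory bandwidth utilization at the cost of an 18% increase in hardware overhead; compared with the design before optimization, FFTs of different sizes obtain speedups of 1.32 to 2.66.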
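
The idea of deriving a bank index from XOR logic on address bits can be shown with a deliberately simplified two-bank sketch. It is not the 16-lane scheme of the paper, whose exact address translation the abstract does not give; it only illustrates why XOR-based mapping removes conflicts for radix-2 butterflies. The two operands of an in-place radix-2 butterfly differ in exactly one address bit, so the XOR of all address bits (the parity) always differs between them. All function names are illustrative.

```python
# Two-bank XOR (parity) mapping sketch for in-place radix-2 FFT operands.
# Assumption: a simplified 2-bank memory, not the paper's 16-lane design.

def parity_bank(addr):
    """Bank index = XOR of all address bits (0 or 1)."""
    bank = 0
    while addr:
        bank ^= addr & 1
        addr >>= 1
    return bank


def butterfly_pairs(n_points):
    """Yield (stage, a, b) operand addresses of an in-place radix-2 FFT."""
    stages = n_points.bit_length() - 1
    for stage in range(stages):
        span = 1 << stage
        for a in range(n_points):
            if a & span == 0:
                yield stage, a, a | span


def conflict_free(n_points, bank=parity_bank):
    """True if both operands of every butterfly fall in different banks."""
    return all(bank(a) != bank(b) for _, a, b in butterfly_pairs(n_points))
```

By contrast, taking the bank from the lowest address bit alone (`lambda a: a & 1`) conflicts in every stage except the first, because operands such as 0 and 2 land in the same bank.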

14.
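Matrix multiplication is a very common basic operation in numerical analysis and is widely used in pattern recognition and in image and signal processing. Because matrix operations are local and regular, they are particularly well suited to two-dimensional mesh parallel computers. This article discusses the implementation of a parallel matrix multiplication algorithm on a two-dimensional mesh interconnection network. It first gives a parallel algorithm for a square mesh processor array and then extends it to rectangular mesh processor arrays. Finally, an application on the LSMPP computer shows that the algorithm is feasible and effective.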
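
As a point of reference for the square-mesh case, the sketch below simulates Cannon's algorithm, a classic matrix multiplication scheme for a p × p mesh with wraparound links. The abstract does not name the algorithm the article uses, so this is an assumed stand-in, with one matrix element per simulated processor and the function name `cannon_matmul` chosen for illustration.

```python
# Sequential simulation of Cannon's algorithm on a p x p mesh (torus),
# one matrix element per processor. Assumption: the article's own mesh
# algorithm is not named in the abstract; Cannon's is a classic example.

def cannon_matmul(A, B):
    """Multiply two p x p matrices (lists of lists) mesh-style."""
    p = len(A)
    # Initial alignment: row i of A rotates left by i,
    # column j of B rotates up by j.
    a = [[A[i][(j + i) % p] for j in range(p)] for i in range(p)]
    b = [[B[(i + j) % p][j] for j in range(p)] for i in range(p)]
    C = [[0] * p for _ in range(p)]
    for _ in range(p):
        # Every processor (i, j) multiplies its local pair and accumulates.
        for i in range(p):
            for j in range(p):
                C[i][j] += a[i][j] * b[i][j]
        # Nearest-neighbour shifts: A one step left, B one step up.
        a = [[a[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        b = [[b[(i + 1) % p][j] for j in range(p)] for i in range(p)]
    return C
```

Each step uses only local multiply-accumulate work and neighbour-to-neighbour shifts, which is the locality property the abstract credits for the good fit between matrix operations and mesh machines.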

15.
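The range migration (RM) algorithm can accurately correct near-field range migration and, by using the fast Fourier transform, achieves high computational efficiency, so it has potential for real-time three-dimensional near-field MIMO radar imaging. The main challenge in applying the RM algorithm to near-field MIMO imaging is designing a suitable array structure. This paper derives a near-field three-dimensional RM imaging algorithm for MIMO radar by decomposing a spherical wave into infinitely many plane waves and, from a detailed analysis of the algorithm's implementation flow, obtains four constraints that the RM algorithm places on the MIMO array configuration. It proposes a MIMO array design method suited to the RM algorithm, designs a MIMO array with it, and analyzes the imaging performance of the designed array through simulation.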

16.
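张雪, 孙超. 《电声技术》 (Audio Engineering), 2009, 33(6): 26-28
A circular array can provide two-dimensional DOA estimation. In phase-mode space, the UCA-ESPRIT algorithm has advantages such as low computational cost and automatic pairing of DOA parameters. However, compared with algorithms such as UCA-RB-MUSIC, its mean square error is larger. To retain the original advantages of UCA-ESPRIT while reducing the mean square error of DOA estimation, a row-weighting technique is introduced to improve the algorithm.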

17.
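This paper uses an improved particle swarm optimization algorithm to optimize the maximum sidelobe level of a rectangular planar array antenna with non-uniform amplitude excitation. It applies a perturbation to the globally best particle and a jumping inertia-weight strategy, and uses the particle swarm algorithm itself to select the parameter combination. The new algorithm greatly improves optimization speed and convergence accuracy. Good results in sidelobe-level optimization of two-dimensional array antennas and in pattern synthesis of sparse arrays also demonstrate the effectiveness of the method.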

18.
A new algorithm for 2-D DOA estimation   Total citations: 1, self-citations: 0, citations by others: 1
In this paper we present a new algorithm to estimate the 2-D direction of arrival (DOA) of narrowband sources lying in the far field of the array. The array consists of matched co-directional triplets, and can be considered as an extension of the 1-D ESPRIT scenario to 2-D. The proposed approach is simple and direct and does not require a search procedure or initialization. Existing algorithms require a search to match the correct elevation and azimuth angles and are computationally more expensive. This technique automatically pairs the azimuth and elevation angles by marking them. The computational complexity is twice that of 1-D ESPRIT. Simulation results and comparisons with other existing algorithms are presented to demonstrate the performance of the proposed technique.

19.
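To increase the decoding speed of the 2-D IDCT, this paper designs a 2-D IDCT processor based on DA (distributed arithmetic). Algorithmically, the processor implements the 2-D IDCT with 1-D IDCTs, implements the 1-D IDCT with the Chen algorithm, and implements the multiply-accumulate structure with DA. Splitting the input data into a high 6-bit group and a low 6-bit group speeds up the processor; sharing lookup tables and using an encoding that maps the input data to (-1, 1) reduce the number and size of the lookup tables. Pre-storing the rounding value in Q0 eliminates the addition otherwise needed for rounding. The processor was synthesized for an Altera EP2C20F484C7, and the maximum clock frequency reaches 165.37 MHz.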
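
The way DA replaces a multiply-accumulate structure with table lookups can be sketched as follows. This is a generic bit-serial form for two's-complement inputs, assuming integer (fixed-point) coefficients and a 12-bit input width inferred from the 6-bit plus 6-bit split in the abstract; it does not reproduce the paper's table sharing, (-1, 1) encoding, or rounding trick, and the names `build_lut` and `da_dot` are illustrative.

```python
# Distributed arithmetic (DA) sketch: an inner product sum(c[k] * x[k])
# computed with one lookup table and shift-adds, no multipliers.
# Assumptions: integer coefficients, inputs in 12-bit two's complement.

def build_lut(coeffs):
    """LUT entry idx = sum of the coefficients whose bit is set in idx."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (idx >> i) & 1)
            for idx in range(1 << k)]


def da_dot(coeffs, xs, bits=12):
    """Bit-serial inner product of coeffs and xs (two's-complement ints)."""
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    acc = 0
    for j in range(bits):
        # Gather bit j of every input into one LUT address.
        idx = 0
        for i, x in enumerate(xs):
            idx |= (((x & mask) >> j) & 1) << i
        term = lut[idx] << j
        # The sign bit of two's complement carries negative weight.
        acc += -term if j == bits - 1 else term
    return acc
```

In hardware the loop over `j` becomes one lookup and one shifted accumulation per bit; processing the high and low 6-bit halves in parallel, as the abstract describes, halves the number of such steps.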

20.
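After describing the basic principle of the two-dimensional MUSIC algorithm, this paper proposes a modified two-dimensional MUSIC algorithm based on an orthogonal array, which reuses the received data through conjugate rearrangement to construct the correlation matrix. When the number of snapshots is limited, the algorithm markedly improves direction-of-arrival (DOA) estimation performance, especially for coherent signals. Computer simulation results demonstrate the effectiveness and feasibility of the improved algorithm.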
