1.
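This paper proposes a highly parallel, two-level pipelined CAVLC hardware architecture for H.264 encoders. The architecture uses a four-way parallel scan-and-statistics module, removing the bottleneck of earlier designs, which could scan only one coefficient per clock cycle. FIFOs balance the processing latency of each pipeline stage and improve overall pipeline efficiency, and the individual coding modules are themselves extensively pipelined to raise data throughput. In a 0.18 μm CMOS process at 166.7 MHz, the new architecture synthesizes to 20,685 equivalent gates and processes 27M coefficient blocks per second, enough to encode digital-cinema-format video (4096×2048 at 30 fps) in real time. The design raises data throughput to 3.46 times that of earlier architectures without a significant increase in hardware cost.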
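The real-time claim above can be sanity-checked with a short calculation. This is only a sketch under assumptions the abstract does not state: 4:2:0 chroma subsampling (1.5 samples per pixel) and 4×4 coefficient blocks (16 coefficients each). The function name `blocks_per_second` is hypothetical.

```python
def blocks_per_second(width, height, fps, chroma_factor=1.5, coeffs_per_block=16):
    """Coefficient blocks per second needed for a given video format.

    chroma_factor=1.5 assumes 4:2:0 sampling and coeffs_per_block=16 assumes
    4x4 blocks; both are assumptions, not values given in the abstract.
    """
    samples = width * height * fps * chroma_factor
    return samples / coeffs_per_block

required = blocks_per_second(4096, 2048, 30)
reported = 27e6  # "27M coefficient blocks per second" from the abstract

print(f"required: {required / 1e6:.2f}M blocks/s, reported: {reported / 1e6:.0f}M blocks/s")
```

Under these assumptions the required rate stays below the reported 27M blocks per second, which is consistent with the real-time claim.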
2.
3.
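To reduce hardware complexity and computation latency when implementing SVD of high-order matrices, this paper improves the CORDIC iteration structure and designs a low-complexity, high-speed CORDIC computing unit for SVD. Taking a 2×2 matrix as an example, an SVD module using the optimized CORDIC unit is designed and implemented on the Xilinx Virtex-6 platform, reaching a throughput of 25.9 Gbps at a 19-bit word width. Compared with the equivalent module in the Xilinx IP core, the design saves 27.6% of registers and 27.7% of look-up tables, and improves real-time performance by 14%. For high-order matrices, the paper gives resource-consumption trend curves showing that the optimized CORDIC unit reduces the hardware complexity of a 16th-order matrix SVD module by about 40%.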
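For readers unfamiliar with CORDIC, the following minimal sketch shows the textbook vectoring-mode iteration, which yields the magnitude and angle of a vector using only shift-and-add style updates. It is not the optimized unit described in the abstract; the function name `cordic_vectoring`, the floating-point arithmetic, and the iteration count are illustrative assumptions.

```python
import math

def cordic_vectoring(x, y, iterations=16):
    """Textbook CORDIC vectoring mode (illustrative, floating point).

    Drives y toward zero with micro-rotations by atan(2^-i) and returns
    (magnitude, angle) of the input vector. Assumes x > 0.
    """
    angle = 0.0
    for i in range(iterations):
        d = -1.0 if y > 0 else 1.0           # rotate toward the x-axis
        shift = 2.0 ** -i                    # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        angle -= d * math.atan(shift)        # table lookup in hardware
    # Each micro-rotation scales the vector by sqrt(1 + 2^-2i); divide the total gain out.
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
    return x / gain, angle

magnitude, angle = cordic_vectoring(3.0, 4.0)
print(magnitude, angle)
```

In a two-sided Jacobi SVD of a 2×2 matrix, angle computations and rotations of this kind are the basic operations, which is why the cost of the CORDIC unit dominates the cost of the SVD module.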
4.
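An OFDM baseband modulation processor for IEEE 802.11a is designed. Hardware designs are implemented for the IFFT, cyclic prefix insertion, windowing, and training sequence generation modules. The 64-point FFT/IFFT module uses an optimized four-way parallel structure with a single butterfly unit, and a four-way parallel conflict-free address read/write algorithm is proposed, which effectively raises data throughput and reduces area. A multi-stage pipelined radix-4 butterfly unit effectively shortens the critical-path delay; its maximum ...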
5.
6.
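Existing compression algorithms lower the compression ratio at the cost of added complexity in order to transmit information efficiently. To address this, the paper proposes an array configuration speedup model, proves that a low compression ratio does not necessarily improve transmission efficiency, and identifies the factors that affect transmission efficiency: decompression-module throughput and data-block compression ratio. Combining these factors with the characteristics of the configuration information, a new lossless compression algorithm is designed and its decompression module is implemented in hardware, reaching a throughput of 16.1 Gbps. The algorithm is tested on AES, A5-1, and SM4 and compared with the mainstream lossless compression algorithms LZW, Huffman, LPAQ1, and Arithmetic. The results show comparable overall compression ratios, but the data-block compression ratio produced by the proposed algorithm is optimized, so it meets the speedup requirement while offering high-throughput decompression. The configuration speedup it achieves is about 8%, 9%, 10%, and 22% higher than that of LPAQ1, Arithmetic, Huffman, and LZW, respectively, even assuming ideal hardware throughput for those algorithms.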
7.
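Based on the BP decoding algorithm, a high-speed, partially parallel QC-LDPC decoder architecture is designed, applicable to all LDPC codes whose parity-check matrices are quasi-cyclic. To address the high structural complexity, low operating frequency, and low throughput of conventional BP decoders, the design implements the many complex function evaluations in the BP algorithm with look-up tables, processes both check nodes and variable nodes in five-stage pipelines, and adopts an early-termination iterative decoding strategy. The design effectively reduces the hardware complexity of the decoder while raising the operating frequency and data throughput.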
8.
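Based on a study of LDPC decoding algorithms and decoder architectures, this paper proposes an improved high-throughput QC-LDPC decoder design. Weighing hardware complexity against decoding throughput, the scheme uses a layered decoding algorithm and a partially parallel structure, and applies early detection to eliminate redundant iterations and achieve high throughput. The decoder is then simulated with ModelSim SE 6.0 to verify its functional correctness. Finally, using...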
9.
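The RLWE encryption scheme is one of the most promising candidates among lattice-based cryptosystems for the post-quantum era. To address the high latency and low throughput of RLWE encryption processors, this paper proposes a high-performance RLWE encryption processor hardware architecture. The architecture uses a parallel structure of two NTT modules and four butterfly modules. During pre-computation and post-computation, the multipliers in the four butterfly modules compute in parallel. During encryption, the NTT operation runs in parallel with ciphertext computation. During NTT and INTT processing, data reads/writes and computation are ping-ponged, which hides the read/write cycles, reduces the latency of the processor, and raises its throughput. A resource-sharing hardware architecture is designed: encryption and decryption reuse the multipliers and adders in the butterfly modules, and the INTT reuses the NTT circuitry, reducing hardware resource consumption. An encryption processor with parameters n = 256 and q = 65537 is implemented on a Spartan-6 FPGA development platform. Experimental results show an encryption time of only 12.18 μs at a throughput of 21.01 Mbit/s and a decryption time of only 8.65 μs at a throughput of 29.60 Mbit/s. Comparison with other encryption processors shows that both latency and throughput are improved.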
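The reported throughput figures can be related to the reported latencies with a one-line calculation. The sketch below assumes that each encryption or decryption handles n = 256 message bits, which the abstract does not state explicitly; the function name `throughput_mbit_per_s` is hypothetical.

```python
def throughput_mbit_per_s(block_bits, latency_us):
    """Throughput when one block of block_bits completes every latency_us.

    Bits per microsecond is numerically equal to Mbit/s.
    """
    return block_bits / latency_us

# Assumption: one operation processes n = 256 message bits.
enc = throughput_mbit_per_s(256, 12.18)
dec = throughput_mbit_per_s(256, 8.65)
print(f"encryption: {enc:.2f} Mbit/s, decryption: {dec:.2f} Mbit/s")
```

Under that assumption both values land within a few hundredths of the 21.01 Mbit/s and 29.60 Mbit/s quoted in the abstract.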
10.
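The scale-invariant feature transform (SIFT) algorithm is currently one of the most active algorithms in image registration, but computing the descriptor feature vectors is extremely expensive, and the data throughput of existing hardware implementations is only 50%. To overcome this bottleneck, a new scan control mechanism for descriptor feature vectors is proposed; it uses few hardware resources and significantly raises the sampling rate of valid pixels. With hardware usage increased by only 1.16%, the average throughput reaches 77%, a 54% improvement over the conventional method.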
11.
12.
In this article, we present the implementation of a high-throughput two-dimensional (2-D) 8×8 forward and inverse integer DCT transform for H.264. Using matrix decomposition and matrix operations such as the Kronecker product and direct sum, the forward and inverse integer transforms can be represented using simple addition operations. The dual-clocked pipelined structure of the proposed implementation uses non-floating-point adders and does not require any transpose memory. Hardware synthesis shows that the maximum operating frequency of the proposed pipelined architecture is 1.31 GHz, which achieves a throughput of 21.05 Gpixels/s at a hardware cost of 42932 gates. High throughput and low hardware cost make the proposed design useful for real-time H.264/AVC high-definition processing.
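The role of the Kronecker product can be illustrated with the standard identity vec(C X Cᵀ) = (C ⊗ C) vec(X) for row-major vectorization, which turns a 2-D separable transform into a single matrix-vector product with no transposition step. The sketch below uses the well-known 4×4 H.264 core transform matrix instead of the 8×8 matrix from the article to stay short; the helper names and the sample block are illustrative assumptions, and this is not the article's decomposition.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def kron(a, b):
    """Kronecker product: entry (i*p+k, j*q+l) equals a[i][j] * b[k][l]."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

# 4x4 H.264 forward core transform matrix (used here for brevity).
C = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]

# Arbitrary sample block (illustrative values only).
X = [[5, 11, 8, 10],
     [9, 8, 4, 12],
     [1, 10, 11, 4],
     [19, 6, 15, 7]]

# Row-column form: transform columns, then rows.
direct = matmul(matmul(C, X), transpose(C))

# Kronecker form: one matrix-vector product on the row-major flattened block.
flat_x = [[v] for row in X for v in row]
flat_y = [r[0] for r in matmul(kron(C, C), flat_x)]
via_kron = [flat_y[4 * i:4 * i + 4] for i in range(4)]

print(direct == via_kron)
```

Because every entry of C ⊗ C is a small integer, each output coefficient is a signed sum of inputs scaled by small constants, which is what makes adder-only realizations possible.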
13.
14.
15.
16.
A new fast algorithm for the computation of the modulated lapped transform (MLT) is proposed and its efficient implementation using pipelining techniques and complex programmable logic device (CPLD) is presented. The new algorithm computes a length-M MLT via the length-M/2 fast Fourier transform (FFT). Computational overhead due to data shuffling in pre-processing and post-processing is offset in hardware realisation. Hence the overall throughput of the MLT computation for real-time applications is significantly improved. The pipelined CPLD architecture and circuitry are described in detail. Computational complexity of the proposed algorithm is analysed, and throughput improvement is verified by experimental results.
17.
This paper presents a high-throughput digital design of the 128-bit Advanced Encryption Standard (AES) algorithm based on the 2-slow retiming technique on FPGA. C-slow retiming is a well-known optimization and high-performance technique. It can enhance designs with feedback loops and automatically rebalances the registers in the design. C-slow retiming can break the critical path of the design into finer pieces to improve the throughput of the design. The difficulty of C-slow retiming on FPGA lies in finding the best register allocation in the data path, so that the added and relocated registers balance the AES architecture as well as possible and the critical path is optimally pipelined. In this paper, the architecture of the AES algorithm is implemented at the gate level with high-speed, breakable structures that are desirable for 2-slow retiming. The MixColumns transformation is implemented with multiply-by-constant modules for the constants 2 and 3, built from combinational logic circuits. This work has been successfully verified and synthesized using Xilinx ISE 11 for a Virtex-5 XC5VLX85 FPGA. The proposed implementation achieves a high throughput of 86 Gb/s and a high maximum operating frequency of 671.524 MHz, whereas the highest throughput and the highest operating frequency reported in the literature are 73.737 Gb/s and 576.07 MHz, respectively.
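The multiply-by-2 and multiply-by-3 modules mentioned for MixColumns correspond to standard GF(2^8) operations from the AES specification. The following is a minimal software sketch of those operations, not the paper's gate-level circuit; the function names `mul2`, `mul3`, and `mix_column` are illustrative.

```python
def mul2(b):
    """Multiply a byte by 2 in GF(2^8) with the AES polynomial x^8+x^4+x^3+x+1."""
    b <<= 1
    if b & 0x100:          # reduce when the shift overflows 8 bits
        b ^= 0x11B
    return b & 0xFF

def mul3(b):
    """Multiply by 3: 3*b = 2*b XOR b in GF(2^8)."""
    return mul2(b) ^ b

def mix_column(col):
    """Apply the AES MixColumns matrix to one 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        mul2(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ mul2(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ mul2(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ mul2(a3),
    ]

mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
print([hex(v) for v in mixed])
```

Since `mul2` is a one-bit shift plus a conditional XOR and `mul3` adds one more XOR, both map naturally onto small combinational circuits.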
18.
This paper presents two types of high-speed hardware architectures for the block cipher ARIA. First, the loop architectures for feedback modes are presented. Area-throughput trade-offs are evaluated depending on the S-box implementation by using look-up tables or combinational logic which involves composite field arithmetic. The sub-pipelined architectures for non-feedback modes are also described. With loop unrolling, inner and outer round pipelining techniques, and S-box implementation using composite field arithmetic over GF(2^4)^2, throughputs of 16 Gbps to 43 Gbps are achievable in a 0.25 μm CMOS technology. This is the first sub-pipelined architecture of ARIA for high throughput to date.
19.
Several DSP algorithms need to remove high-frequency or impulsive noise while preserving edges, e.g., in speech and image processing applications: median filtering has been proved to be more effective for achieving this goal than other filtering techniques. Efficient architectural implementation for real-time applications involves a careful VLSI design, which takes into account modularity, regularity, adaptability, scalability, throughput, circuit complexity and fault tolerance. Four new architectural approaches are presented and evaluated in this paper to deal with different application and implementation constraints. They are: the serial-input polarizing median filter, the floating median filter, the pipelined polarizing median filter and the pipelined sorting median filter. The 1st and the 2nd architectures are based on majority voting, while the 3rd and the 4th ones are based on sorting techniques. All of them are designed so as to exhibit high scalability and to be easily pipelined for higher working frequencies.
20.
Shih-Chang Hsia. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2003, 11(4): 651-658
High-quality televisions (TVs) such as improved digital TV, enhanced TV, and high-definition TV have become popular in recent years. However, impulse noise affects TV broadcasts. This paper proposes an efficient noise-removal algorithm using an adaptive digital signal-processing approach. Simulations have demonstrated that the new adaptive algorithm can efficiently reduce impulse noise even in highly corrupted images. In order to achieve real-time implementation, a cost-effective architecture is proposed using a parallel structure and pipelined processing. The proposed processor can achieve a throughput rate of 45M pixels/s using only 4k gates and two line buffers. Compared with median-filtering chips, this processor provides better filtering quality and its circuit is much less complex.