
1.

A Highly Parallel, Two-Level Pipelined CAVLC Encoder for the H.264 Standard. Total citations: 1, self-citations: 0, citations by others: 1

乔飞魏鼎力杨华中汪蕙《电子学报》2010,38(7):1705-1710

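This paper proposes a highly parallel, two-level pipelined hardware architecture for CAVLC in H.264 encoders. The architecture uses a four-way parallel scan-and-statistics module, which removes the bottleneck of earlier designs that scan only one coefficient per clock cycle. FIFOs balance the processing latency of each pipeline stage and raise overall pipeline efficiency, and the individual coding modules are themselves extensively pipelined to raise data throughput. In a 0.18 μm CMOS process at 166.7 MHz, the design synthesizes to 20,685 equivalent gates and processes 27 M coefficient blocks per second, enough to encode digital-cinema-format video (4096×2048 at 30 frames/s) in real time. Data throughput is 3.46 times that of earlier architectures, without a significant increase in hardware cost.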
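To make the scan-and-statistics step concrete, here is a minimal software sketch, not the authors' hardware. It assumes a 16-coefficient block already in zigzag order and models the four-way scan as a loop that consumes four coefficients per "cycle". The names `cavlc_stats` and `blocks_per_second` are hypothetical. The second function checks the real-time claim under the assumption of 4:2:0 sampling (chroma adds half as many 4×4 blocks as luma) and ignores separate DC blocks.

```python
def cavlc_stats(coeffs, lanes=4):
    """Gather CAVLC statistics for one block of zigzag-ordered coefficients.

    `lanes` coefficients are examined per modeled clock cycle (assumption:
    this mirrors a four-way parallel scan; it is not the paper's circuit).
    """
    assert len(coeffs) % lanes == 0
    nonzero_positions = []
    cycles = 0
    for base in range(0, len(coeffs), lanes):
        cycles += 1  # one modeled cycle handles `lanes` coefficients
        for lane in range(lanes):
            if coeffs[base + lane] != 0:
                nonzero_positions.append(base + lane)

    total_coeff = len(nonzero_positions)

    # TrailingOnes: up to three consecutive +/-1 values at the high-frequency end.
    trailing_ones = 0
    for pos in reversed(nonzero_positions):
        if abs(coeffs[pos]) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    # total_zeros: zeros located before the last nonzero coefficient.
    total_zeros = nonzero_positions[-1] + 1 - total_coeff if nonzero_positions else 0
    return {
        "total_coeff": total_coeff,
        "trailing_ones": trailing_ones,
        "total_zeros": total_zeros,
        "scan_cycles": cycles,
    }


def blocks_per_second(width, height, fps, chroma_factor=1.5, block_pixels=16):
    """4x4 coefficient blocks per second (assumes 4:2:0, so chroma_factor = 1.5)."""
    return width * height * chroma_factor / block_pixels * fps
```

Under these assumptions, 4096×2048 at 30 frames/s needs about 23.6 M blocks per second, which sits below the reported 27 M blocks per second.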

2.

Design and Implementation of an Ethernet Interface Based on MPSoC. Total citations: 1, self-citations: 0, citations by others: 1

李桦林宋同晶赵成伟《电子科技》2011,24(12):106-108,132

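This work studies Ethernet data communication in multi-core systems and designs a hardware interface between an Ethernet IP core and the MPSoC network resources. It describes the function and design method of each module. Simulation and FPGA verification show that the Ethernet interface provides real-time, high-throughput data communication. The design enables data exchange between the multi-core system and the network, with a simple hardware structure and stable, reliable performance.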

3.

Design and Implementation of an SVD Module Based on a Low-Hardware-Complexity, High-Speed CORDIC


张晓帆李广军《电子学报》2015,43(4):738-742

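To reduce the hardware complexity and computation latency of SVD for high-order matrices, this paper improves the CORDIC iteration structure and designs a low-complexity, high-speed CORDIC computing unit for SVD. Taking a 2×2 matrix as an example, an SVD module using the optimized CORDIC unit is designed and implemented on a Xilinx Virtex6 platform, reaching a throughput of 25.9 Gbps at a 19-bit word width. Compared with the equivalent module in the Xilinx IP core, the design saves 27.6% of registers and 27.7% of look-up tables, and improves real-time performance by 14%. For higher-order matrices, the paper gives resource-consumption trend curves showing that the optimized CORDIC unit can reduce the hardware complexity of a 16th-order matrix SVD module by about 40%.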
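For readers unfamiliar with CORDIC, the following is a minimal sketch of the textbook vectoring-mode iteration, the operation a Jacobi-style 2×2 SVD typically uses to obtain rotation angles. It is not the improved iteration structure of the paper, and it uses floating point rather than a 19-bit fixed-point datapath. The function name `cordic_vectoring` and the iteration count are assumptions.

```python
import math


def cordic_vectoring(x, y, iterations=16):
    """Rotate (x, y) onto the positive x-axis with shift-and-add style steps.

    Returns (magnitude, angle). Assumes x > 0 so the iteration converges.
    """
    angle = 0.0
    for i in range(iterations):
        d = -1 if y > 0 else 1  # choose the direction that drives y toward 0
        # In hardware the multiplications by 2**-i are plain right shifts.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        angle -= d * math.atan(2.0 ** -i)  # angles come from a small ROM table
    # Each micro-rotation stretches the vector; divide out the constant gain.
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(iterations))
    return x / gain, angle
```

With 16 iterations the residual angle error is bounded by roughly atan(2^-15), which is why the word width and the iteration count are usually chosen together.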

4.

FPGA Design and Implementation of an OFDM Baseband Processor Based on IEEE802.11a

梁赫西《电视技术》2012,36(17):79-81,85

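An OFDM baseband modulation processor for IEEE802.11a is designed, with hardware implementations of the IFFT, cyclic-prefix insertion, windowing, and training-sequence generation modules. The 64-point FFT/IFFT module uses an optimized four-way parallel structure built around a single butterfly unit, and a four-way parallel conflict-free address read/write algorithm is proposed, which effectively raises data throughput and reduces area. A radix-4 butterfly unit with a multi-stage pipeline effectively shortens the critical-path delay; its maximum ...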
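The IFFT and cyclic-prefix steps can be sketched in a few lines. This is an illustrative model only: it uses a naive O(n²) inverse DFT instead of the paper's radix-4 pipelined butterfly, and the function names are hypothetical. The 64-point transform size and the 16-sample prefix follow the IEEE 802.11a OFDM symbol layout (80 samples per symbol).

```python
import cmath


def idft(freq):
    """Naive inverse DFT (stand-in for the hardware IFFT, same result)."""
    n = len(freq)
    return [
        sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def add_cyclic_prefix(samples, cp_len=16):
    """Prepend a copy of the last cp_len time-domain samples."""
    return samples[-cp_len:] + samples


# One OFDM symbol with a single active subcarrier (illustrative data only).
freq = [0j] * 64
freq[1] = 1 + 0j
symbol = add_cyclic_prefix(idft(freq))
```

The prefix makes the symbol look periodic to the receiver, so multipath delay shorter than the prefix does not cause inter-symbol interference.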

5.

VLSI Design and Implementation of an AES Algorithm Resistant to Differential Power Analysis and Differential Fault Analysis

韩军曾晓洋赵佳《通信学报》2010,31(1):20-29

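A hardware design and implementation scheme is proposed for an AES algorithm resistant to differential power analysis and differential fault analysis. The design combines data masking with two-dimensional parity checking as its countermeasures. While preserving hardware security, it lowers implementation cost by splitting each 128-bit operation into four 32-bit operations, reusing modules, and optimizing the order of operations, and it uses a three-stage pipeline to raise speed and throughput. The resulting AES IP core resists both kinds of side-channel attack while keeping hardware cost and computing performance reasonable.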
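As a rough illustration of two-dimensional parity checking, the sketch below keeps one XOR parity byte per row and per column of a 4×4 byte state and uses the mismatching row and column to localize a single corrupted byte. This is a generic scheme written for illustration; the abstract does not give the paper's exact parity layout, and `parity_2d` and `locate_fault` are hypothetical names.

```python
from functools import reduce
from operator import xor


def parity_2d(state):
    """Return (row_parities, column_parities) for a 4x4 byte state."""
    rows = [reduce(xor, row) for row in state]
    cols = [reduce(xor, col) for col in zip(*state)]
    return rows, cols


def locate_fault(state, ref_rows, ref_cols):
    """Compare recomputed parities with reference values.

    Returns the lists of row and column indices whose parity disagrees.
    A single-byte fault shows up as exactly one row and one column.
    """
    rows, cols = parity_2d(state)
    bad_rows = [i for i, (a, b) in enumerate(zip(rows, ref_rows)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(cols, ref_cols)) if a != b]
    return bad_rows, bad_cols
```

In a real datapath the reference parities would have to be predicted through each AES transformation rather than stored, which is where most of the design effort goes.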

6.

A Lossless Compression Algorithm Based on an Array Configuration Speedup Ratio Model

徐金甫刘露李伟王周闯杨宇航《电子与信息学报》2018,40(6):1492-1498

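Existing compression algorithms add complexity to lower the compression ratio in pursuit of efficient information transfer. To address this, the paper proposes an array configuration speedup ratio model, proves that a low compression ratio does not necessarily improve transfer efficiency, and identifies the factors that affect transfer efficiency: decompression-module throughput and data-block compression ratio. Combining these factors with the characteristics of configuration information, the authors design a new lossless compression algorithm and implement the decompression module in hardware, reaching a throughput of 16.1 Gbps. The algorithm is tested with AES, A5-1, and SM4 and then compared with the mainstream lossless compression algorithms LZW, Huffman, LPAQ1, and Arithmetic. The results show that the overall compression ratio is comparable, but the data-block compression ratio produced by the proposed algorithm is optimized, so it both meets the acceleration requirement and offers high-throughput decompression. The configuration speedup ratio obtained is about 8%, 9%, 10%, and 22% higher than that of LPAQ1, Arithmetic, Huffman, and LZW, respectively, under ideal hardware throughput.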

7.

Design of a QC_LDPC Decoder

李振辉崔媛媛张洵颖沈绪榜杨佩《微电子学与计算机》2014,(9)

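Based on the BP decoding algorithm, a high-speed, partially parallel QC_LDPC decoder architecture is designed; it applies to all LDPC codes whose parity-check matrix is quasi-cyclic. Conventional BP decoders suffer from high structural complexity, low operating frequency, and low throughput. To address this, the design implements the many complex function evaluations in BP decoding with look-up tables, processes both check nodes and variable nodes in five-stage pipelines, and adopts an early-termination iterative decoding strategy. The design effectively reduces the hardware complexity of the decoder while raising the operating frequency and data throughput.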

8.

Design of an Improved High-Throughput QC-LDPC Decoder

伊方龙汪鹏君《电路与系统学报》2011,(4):19-23

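Based on a study of LDPC decoding algorithms and decoder architectures, this paper proposes an improved high-throughput QC-LDPC decoder design. Weighing hardware complexity against decoding throughput, the scheme uses a layered decoding algorithm and a partially parallel structure, and adopts early detection to eliminate redundant iterations and achieve high throughput. The decoder is then simulated and tested in ModelSim SE6.0, which verifies its functional correctness. Finally, using ...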

9.

Design and Implementation of a High-Performance RLWE Encryption Processor

王春华李斌杜高明李桢旻《电子科技》2022,(11):13-20

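The RLWE encryption scheme is one of the most promising candidates among lattice-based cryptosystems for the post-quantum era. To address the high latency and low throughput of RLWE encryption processors, this paper proposes a high-performance RLWE encryption processor hardware architecture. The architecture uses a parallel structure of two NTT modules and four butterfly modules. During pre-computation and post-computation, the multipliers in the four butterfly modules compute in parallel. During encryption, the NTT operations run in parallel with the ciphertext computation. During NTT and INTT processing, data reads and writes are ping-ponged with computation, which hides the read/write cycles, lowers the latency of the processor, and raises its throughput. The hardware is designed for resource reuse: encryption and decryption share the multipliers and adders in the butterfly modules, and the INTT reuses the NTT circuit, which reduces hardware resource consumption. An encryption processor with parameters n = 256 and q = 65537 is implemented on a Spartan-6 FPGA development platform. Experimental results show that encryption takes only 12.18 μs at a throughput of 21.01 Mbit·s^-1, and decryption takes only 8.65 μs at a throughput of 29.60 Mbit·s^-1. Comparison with other encryption processors shows that the proposed processor improves both latency and throughput.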
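The roles of the butterfly, the pre-computation, and the post-computation can be seen in a small software model of negacyclic polynomial multiplication for n = 256 and q = 65537. This is a sketch under assumptions, not the paper's architecture: it uses a recursive textbook NTT, takes 3 as a generator of the multiplicative group modulo 65537, and does not model the two NTT modules, four butterfly units, or ping-pong buffering. All function names are hypothetical, and the three-argument `pow` with a negative exponent needs Python 3.8 or later.

```python
Q = 65537                               # modulus from the abstract
N = 256                                 # polynomial length from the abstract
PSI = pow(3, (Q - 1) // (2 * N), Q)     # primitive 2N-th root of unity mod Q
OMEGA = PSI * PSI % Q                   # primitive N-th root of unity mod Q


def ntt(a, omega):
    """Recursive radix-2 NTT of a list whose length is a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % Q)
    odd = ntt(a[1::2], omega * omega % Q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % Q
        out[k] = (even[k] + t) % Q            # butterfly: sum branch
        out[k + n // 2] = (even[k] - t) % Q   # butterfly: difference branch
        w = w * omega % Q
    return out


def negacyclic_mul(a, b):
    """Multiply two length-N polynomials modulo (x^N + 1, Q)."""
    psi_pows = [pow(PSI, i, Q) for i in range(N)]
    # Pre-computation: weight coefficient i by psi^i.
    fa = ntt([x * p % Q for x, p in zip(a, psi_pows)], OMEGA)
    fb = ntt([x * p % Q for x, p in zip(b, psi_pows)], OMEGA)
    fc = [x * y % Q for x, y in zip(fa, fb)]
    # INTT reuses the NTT routine with the inverse root, then scales by 1/N.
    c = ntt(fc, pow(OMEGA, -1, Q))
    n_inv = pow(N, -1, Q)
    psi_inv = pow(PSI, -1, Q)
    # Post-computation: remove the psi^i weights.
    return [x * n_inv % Q * pow(psi_inv, i, Q) % Q for i, x in enumerate(c)]
```

Because the inverse transform is just the forward routine with a different root, the sketch also shows why an INTT can share the NTT circuit.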

10.

Design of an Efficient Scan-Control Circuit Structure for SIFT Descriptors

李茜桑红石张静《微电子学与计算机》2014,(9)

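The scale-invariant feature transform (SIFT) is one of the most active algorithms in image registration, but computing the descriptor feature vector is very expensive, and existing hardware implementations reach a data throughput of only 50%. To overcome this bottleneck, a new scan-control mechanism for the descriptor feature vector is proposed; it uses few hardware resources and significantly raises the sampling rate of valid pixels. With hardware usage increased by only 1.16%, the average throughput reaches 77%, which is 54% higher than that of the conventional method.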

11.

High-Speed Engine Implementation of the SMS4 Cipher Algorithm. Total citations: 1, self-citations: 0, citations by others: 1


周洲何一凡沈海斌赵旭鑫《电子器件》2007,30(4):1469-1471,1480

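SMS4 is a block cipher algorithm published in China for wireless LANs. After analyzing its characteristics, the article presents a high-speed engine structure that is pipelined within a single round. By configuring multiple engines for second-level pipelining or parallel processing, output performance and hardware cost can be adjusted flexibly to suit different applications. In comparative FPGA tests at the same output performance, the structure costs less hardware than a 32-round pipelined structure. In a single-engine configuration, the SMS4 encryption/decryption system fits easily on a low-end Spartan Ⅱ XC2S50 FPGA.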

12.

Low cost high throughput pipelined architecture of 2-D 8 × 8 integer transforms for H.264/AVC

Meeturani Sharma Honey Durga Tiwari 《International Journal of Electronics》2013,100(8):1033-1045

In this article, we present the implementation of high throughput two-dimensional (2-D) 8 × 8 forward and inverse integer DCT transform for H.264. Using matrix decomposition and matrix operation, such as the Kronecker product and direct sum, the forward and inverse integer transform can be represented using simple addition operations. The dual clocked pipelined structure of the proposed implementation uses non-floating point adders and does not require any transpose memory. Hardware synthesis shows that the maximum operating frequency of the proposed pipelined architecture is 1.31 GHz, which achieves 21.05 Gpixels/s throughput rate with the hardware cost of 42932 gates. High throughput and low hardware cost make the proposed design useful for real time H.264/AVC high definition processing.
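The Kronecker-product view mentioned in the abstract rests on a standard identity: with row-major vectorization, the 2-D transform Y = C·X·Cᵀ equals (C ⊗ C)·vec(X). The sketch below checks this numerically. The matrix `C8` is the commonly cited H.264 8 × 8 integer transform matrix (before the 1/8 scaling) and should be verified against the standard before reuse; the helper names are hypothetical, and the paper's actual decomposition into adder stages is not reproduced here.

```python
# Commonly cited H.264 8x8 integer transform matrix (assumption: check the standard).
C8 = [
    [8, 8, 8, 8, 8, 8, 8, 8],
    [12, 10, 6, 3, -3, -6, -10, -12],
    [8, 4, -4, -8, -8, -4, 4, 8],
    [10, -3, -12, -6, 6, 12, 3, -10],
    [8, -8, -8, 8, 8, -8, -8, 8],
    [6, -12, 3, 10, -10, -3, 12, -6],
    [4, -8, 8, -4, -4, 8, -8, 4],
    [3, -6, 10, -12, 12, -10, 6, -3],
]


def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def kron(a, b):
    """Kronecker product of two square matrices."""
    n, m = len(a), len(b)
    return [
        [a[r // m][c // m] * b[r % m][c % m] for c in range(n * m)]
        for r in range(n * m)
    ]


def transform_2d(x):
    """Separable form: row and column transforms, Y = C X C^T."""
    return matmul(matmul(C8, x), transpose(C8))


def transform_kron(x):
    """Same result as one 64x64 matrix acting on the row-major vector of X."""
    vec = [v for row in x for v in row]
    big = kron(C8, C8)
    return [sum(c * v for c, v in zip(row, vec)) for row in big]
```

The separable form needs far fewer operations than the 64 × 64 product; the Kronecker form is mainly useful as an algebraic tool for deriving factorizations.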

13.

FPGA Implementation of a Lossless Image Compression Algorithm

范文晶王召利王惠娟费聚锋李萧萧《电子科技》2016,29(11):126

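Conventional hardware implementations of the JPEG-LS lossless image compression algorithm suffer from long delays and poor real-time performance. To address this, the paper proposes a fully pipelined FPGA architecture for JPEG-LS. Aiming mainly at maximum throughput, the architecture uses multi-stage pipelining to reduce the delay of each stage, greatly improving real-time performance; the hardware circuit operates at up to 120 MHz.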
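The prediction stage that such a pipeline starts from can be sketched as follows. The median edge detector (MED) predictor is the one used by JPEG-LS (LOCO-I); everything else here is a simplification for illustration: out-of-image neighbors are treated as 0, and context modeling, bias correction, and Golomb coding are omitted. The function names are hypothetical.

```python
def med_predict(a, b, c):
    """MED predictor: a = left, b = above, c = above-left neighbor."""
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c


def _neighbors(img, y, x):
    # Simplification: neighbors outside the image are taken as 0.
    a = img[y][x - 1] if x > 0 else 0
    b = img[y - 1][x] if y > 0 else 0
    c = img[y - 1][x - 1] if x > 0 and y > 0 else 0
    return a, b, c


def residuals(img):
    """Prediction errors in raster order (the values an entropy coder would see)."""
    return [
        [img[y][x] - med_predict(*_neighbors(img, y, x)) for x in range(len(img[0]))]
        for y in range(len(img))
    ]


def reconstruct(res):
    """Invert residuals(); each pixel depends on already-decoded neighbors."""
    img = [[0] * len(res[0]) for _ in res]
    for y in range(len(res)):
        for x in range(len(res[0])):
            img[y][x] = res[y][x] + med_predict(*_neighbors(img, y, x))
    return img
```

The decoder loop shows the data dependency on the left neighbor that makes a high clock rate hard to reach, which is the kind of delay a multi-stage pipeline has to work around.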

14.

A New Systolic Implementation Structure for High-Speed Adaptive Filtering

尚勇吴顺君项海格《电子与信息学报》2002,24(8):1022-1027

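The LMS algorithm is computationally simple and easy to implement, and it is widely used in signal processing fields such as communications and radar; its high-speed implementation has long been a focus of filter architecture research. Based on the parallel pipelined LMS (PIPLMS) algorithm, this paper designs a systolic architecture for high-speed adaptive filtering. The architecture combines the deep pipelining of a systolic structure with a degree of parallelism. Compared with existing architectures it achieves higher data throughput, and thanks to its parallelism it also offers lower system power consumption, a wider range of usable step sizes, and faster convergence.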
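For reference, the baseline that pipelined variants start from is the plain sample-by-sample LMS update. The sketch below is that baseline only, not PIPLMS or the systolic structure; the function name `lms`, the tap count, and the step size are illustrative assumptions.

```python
def lms(x, d, taps, mu):
    """Standard LMS adaptive FIR filter.

    x: input samples, d: desired samples, mu: step size.
    Returns (final_weights, error_history).
    """
    w = [0.0] * taps
    buf = [0.0] * taps          # most recent input first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))        # filter output
        e = dn - y                                        # a-priori error
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]  # weight update
        errors.append(e)
    return w, errors
```

Note that the weight update needs the error from the current sample before the next output can be computed; this feedback loop is what limits the clock rate and what pipelined formulations restructure.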

15.

Design of an RS Decoder for the DVB-C System

高松彭大芹刘华平《电视技术》2011,35(5)

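Building on a Matlab simulation of the outer channel code of the DVB-C system, this paper describes the implementation structure of each part of an RS decoder and designs an RS decoder for DVB-C. The decoder is based on the modified Euclidean algorithm and uses a three-stage pipeline to raise throughput; its feasibility and reliability are verified on an FPGA.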

16.

Design and implementation of a fast algorithm for modulated lapped transform

Jing C.Y. Tai H.-M. 《Vision, Image and Signal Processing, IEE Proceedings -》2002,149(1):27-32

A new fast algorithm for the computation of the modulated lapped transform (MLT) is proposed and its efficient implementation using pipelining techniques and complex programmable logic device (CPLD) is presented. The new algorithm computes a length-M MLT via the length-M/2 fast Fourier transform (FFT). Computational overhead due to data shuffling in pre-processing and post-processing is offset in hardware realisation. Hence the overall throughput of the MLT computation for real-time applications is significantly improved. The pipelined CPLD architecture and circuitry are described in detail. Computational complexity of the proposed algorithm is analysed, and throughput improvement is verified by experimental results.

17.

FPGA based fast and high-throughput 2-slow retiming 128-bit AES encryption algorithm

Reza Rezaeian Farashahi Bahram Rashidi Sayed Masoud Sayedi 《Microelectronics Journal》2014

This paper presents a high throughput digital design of the 128-bit Advanced Encryption Standard (AES) algorithm based on the 2-slow retiming technique on FPGA. The C-slow retiming is a well-known optimization and high performance technique. It can enhance designs with feedback loops and automatically rebalances the registers in the design. The C-slow retiming can break the critical path of the design into finer pieces to improve the throughput of the design. The complexity of the C-slow retiming on FPGA is to find the best register allocation in the data path of the design so that by increasing the number of registers, relocation of the registers to balance the AES architecture be in the best mode, and the critical path be optimally pipelined and improved. In this paper, architecture of the AES algorithm is implemented in the gate level by high-speed and breakable structures that are desirable for the 2-slow retiming. The Mix-columns transformation is implemented based on multiplication by constants 2 and 3 modules with combinational logic circuits. This work has been successfully verified and synthesized using Xilinx ISE 11 by Virtex-5, XC5VLX85 FPGA. The proposed implementation achieves a high throughput of 86 Gb/s and high maximum operation frequency of 671.524 MHz whereas the highest throughput and the highest operation frequency reported in the literature are 73.737 Gb/s and 576.07 MHz, respectively.
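The reported throughput and clock figures are consistent with a fully pipelined datapath that completes one 128-bit block per clock cycle. That reading is an assumption (the abstract does not state it), and the helper below is a hypothetical name used only to make the arithmetic explicit.

```python
def pipelined_throughput_gbps(f_mhz, block_bits=128, blocks_per_cycle=1):
    """Throughput in Gb/s, assuming `blocks_per_cycle` blocks finish every clock."""
    return f_mhz * 1e6 * block_bits * blocks_per_cycle / 1e9


proposed = pipelined_throughput_gbps(671.524)   # about 85.96 Gb/s
prior_best = pipelined_throughput_gbps(576.07)  # about 73.74 Gb/s
```

Under this assumption the throughput gain comes entirely from the higher clock frequency, which is the quantity retiming is meant to improve.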

18.

High‐Speed Hardware Architectures for ARIA with Composite Field Arithmetic and Area‐Throughput Trade‐Offs

Sang‐Woo Lee Sang‐Jae Moon Jeong‐Nyeo Kim 《ETRI Journal》2008,30(5):707-717

This paper presents two types of high‐speed hardware architectures for the block cipher ARIA. First, the loop architectures for feedback modes are presented. Area‐throughput trade‐offs are evaluated depending on the S‐box implementation by using look‐up tables or combinational logic which involves composite field arithmetic. The sub‐pipelined architectures for non‐feedback modes are also described. With loop unrolling, inner and outer round pipelining techniques, and S‐box implementation using composite field arithmetic over GF(2⁴)², throughputs of 16 Gbps to 43 Gbps are achievable in a 0.25 μm CMOS technology. This is the first sub‐pipelined architecture of ARIA for high throughput to date.

19.

Digital Median Filters

Luca Breveglieri Vincenzo Piuri 《The Journal of VLSI Signal Processing》2002,31(3):191-206

Several DSP algorithms need to remove high-frequency or impulsive noise while preserving edges, e.g., in speech and image processing applications: median filtering has been proved to be more effective for achieving this goal than other filtering techniques. Efficient architectural implementation for real-time applications involves a careful VLSI design, which takes into account modularity, regularity, adaptability, scalability, throughput, circuit complexity and fault tolerance. Four new architectural approaches are presented and evaluated in this paper to deal with different application and implementation constraints. They are: the serial-input polarizing median filter, the floating median filter, the pipelined polarizing median filter and the pipelined sorting median filter. The 1st and the 2nd architectures are based on majority voting, while the 3rd and the 4th ones are based on sorting techniques. All of them are designed so as to exhibit high scalability and to be easily pipelined for higher working frequencies.
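The majority-voting idea can be illustrated with the classic bit-serial median algorithm: bits are examined from the most significant down, the majority bit becomes the next median bit, and any word that disagrees is "polarized" so that it keeps voting with that bit from then on. The sketch below is this generic algorithm in software, not a model of any of the four architectures in the paper; `majority_median` is a hypothetical name and the window length is assumed odd.

```python
def majority_median(window, bits=8):
    """Median of an odd-length window of unsigned `bits`-bit integers."""
    assert len(window) % 2 == 1
    polarized = [None] * len(window)  # None = still undecided, else fixed vote
    median = 0
    for pos in range(bits - 1, -1, -1):
        # Undecided words vote with their own bit; polarized words repeat
        # the bit at which they first disagreed with the median.
        votes = [
            (w >> pos) & 1 if p is None else p
            for w, p in zip(window, polarized)
        ]
        majority = 1 if sum(votes) > len(votes) // 2 else 0
        median = (median << 1) | majority
        for i, v in enumerate(votes):
            if polarized[i] is None and v != majority:
                polarized[i] = v  # word is now known to lie above or below
    return median
```

Each step needs only a majority gate over one bit per sample, which is why this family of filters scales well with window size and pipelines easily.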

20.

Parallel VLSI design for a real-time video-impulse noise-reduction processor

Shih-Chang Hsia 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(4):651-658

High-quality televisions (TVs) such as improved digital TV, enhanced TV, and high-definition TV have become popular in recent years. However, impulse noise affects TV broadcasts. This paper proposes an efficient noise-removal algorithm using an adaptive digital signal-processing approach. Simulations have demonstrated that the new adaptive algorithm could efficiently reduce impulse noise even in highly corrupted images. In order to achieve real-time implementation, a cost-effective architecture is proposed using a parallel structure and pipelined processing. The proposed processor can achieve the throughput rate of 45M pixels/s using only 4k gates and two line buffers. Unlike median-filtering chips, this processor provides better filtering quality and its circuit is much less complex.