1.

陈子钰何军郭翔宇《计算机工程与科学》2022,44(7):1162-1170

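This paper introduces the international mainstream cryptographic algorithms AES and SHA and surveys the current state of cryptographic instruction support in mainstream general-purpose processor architectures. To improve the performance of domestic general-purpose processors in cryptographic security, the authors design an AES and SHA instruction set extension for domestic general-purpose processors, implement a fully pipelined execution unit for the AES and SHA instructions, and evaluate and optimize the implementation. The execution unit runs at 2.0 GHz, with a total area of 17,644 μm² and a total power consumption of 59.62 mW. Compared with software implementations that use the original general-purpose instructions, the minimum speedup is 8.90× for AES and 4.47× for SHA, reaching 19.30× when the instructions execute fully pipelined. This significantly improves the processor's performance on AES and SHA; the design is expected to be applied to domestic general-purpose processors and to further strengthen the competitiveness of domestic general-purpose processor chips in cryptographic security applications. The execution unit can also be packaged as a dedicated cryptographic IP block for use in application-specific chips in the cryptographic security field.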

2.

An 8-bit systolic AES architecture for moderate data rate applications

Sheikh Muhammad Farhan Shoab A. Khan Habibullah Jamal 《Microprocessors and Microsystems》2009,33(3):221-231

The complexity involved in mapping an algorithm to hardware is a function of the controller logic and data path. Minimizing data path size can lead to significant savings in hardware area and power dissipation. This paper presents an implementation of a novel architectural transformation technique for mapping a word-wide algorithm to a byte-vector serial architecture. The technique divides the input word into several bytes and then traces each byte to extract the architectural transformation. The technique is applied to the Advanced Encryption Standard (AES) algorithm, which is non-linear in nature. Using this technique, the 32-bit AES algorithm is transformed into a byte-systolic architecture. The novelty of the technique is most pronounced in the mix column design, which is the most complex part of the AES algorithm. The complex matrix multiplication component and standard transformations of the 32-bit AES algorithm are transformed to support 8-bit operations. The resulting AES architectures reuse the same logic resources for key expansion and encryption/decryption. The proposed design offers moderate data rates of 41 Mbps for encryption and 37 Mbps for decryption while utilizing 236 and 280 slices, respectively, on a Xilinx Virtex II xc2v1000-6 FPGA. Comparison results show a significant gain in throughput over other 8-bit designs. This makes it a viable data/communication security solution for a variety of embedded and consumer electronics.

3.

Compact and unified hardware architecture for SHA-1 and SHA-256 of trusted mobile computing

Mooseop Kim Deok Gyu Lee Jaecheol Ryou 《Personal and Ubiquitous Computing》2013,17(5):921-932

This paper presents a compact and unified hardware architecture implementing the SHA-1 and SHA-256 algorithms that is suitable for the mobile trusted module (MTM), which must satisfy small-area and low-power constraints. The built-in hardware hash engine in an MTM is one of the most important circuit blocks and dominates the performance of the whole platform, because it is used as a key primitive to support most MTM commands concerning platform integrity and command authentication. Unlike the general trusted platform module (TPM) for PCs, the MTM, which is to be employed in mobile devices, has very stringent limitations with respect to available power, circuit area, and so on. Therefore, the MTM needs a spatially optimized architecture and design method for the construction of compact SHA hardware. The proposed hardware for a unified SHA-1 and SHA-256 component can compute a sequence of 512-bit data blocks and has been implemented in 12,400 gates of a 0.25 μm CMOS process. Furthermore, in processing speed and power consumption, it shows better performance than commercial TPM chips and software-only implementations. The highest operating frequency and throughput of the proposed architecture are 137 MHz and 197.6 Mbps, respectively, which satisfy the processing requirements of mobile applications.

4.

On the design and implementation of a RISC processor extension for the KASUMI encryption algorithm

Tomas Balderas-Contreras Claudia Feregrino-Uribe 《Computers & Electrical Engineering》2008,34(6):531-546

Modern cellular networks allow users to transmit information at high data rates, to access IP-based networks deployed around the world, and to use sophisticated services. In this context, it is necessary not only to develop new radio interface technologies and improve existing core networks, but also to guarantee confidentiality and integrity during transmission. The KASUMI block cipher lies at the core of both the f8 data confidentiality algorithm and the f9 data integrity algorithm for Universal Mobile Telecommunications System networks. KASUMI implementations must reach high performance and have low power consumption in order to be adequate for network components. This paper describes a specialized processor core designed to perform the KASUMI algorithm efficiently. Experimental results show a performance improvement of two orders of magnitude over software-only implementations. We describe the design technique used, which can also be applied to implement other Feistel-like ciphering algorithms. The proposed architecture was implemented on an FPGA; results are presented and discussed.

5.

Low-power parallel VLSI architecture design for wavelet filters

兰旭光郑南宁薛建儒王飞刘跃虎《计算机研究与发展》2005,42(11):1889-1895

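This paper proposes a low-power, parallel VLSI architecture design method, based on line-based processing and the lifting algorithm, for the forward and inverse discrete wavelet transform in a JPEG2000 coding system. The resulting architecture processes two rows of data at a time and time-multiplexes the row processor, so that processing is parallel both within the row processor and between the row and column processors, while the row buffer is minimized. Symmetric extension is implemented with embedded circuitry, and the whole architecture is optimized with pipelining, which speeds up the transform, raises hardware utilization, and lowers power consumption; efficiency reaches almost 100%. The forward and inverse wavelet filter architecture has been verified on an FPGA and can be used as a standalone IP core in a JPEG2000 image codec chip under development.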

6.

SHA2 and SHA-3 accelerator design in a 7 nm technology within the European Processor Initiative

《Microprocessors and Microsystems》2021

This paper proposes the architecture of the hash accelerator developed in the framework of the European Processor Initiative. The proposed circuit supports all the SHA2 and SHA-3 operative modes and is to be one of the hardware cryptographic accelerators within the crypto-tile of the European Processor Initiative. The accelerator has been verified on a Stratix IV FPGA and then synthesised on the Artisan 7 nanometre TSMC silicon technology, obtaining throughputs higher than 50 Gbps for SHA2 and 230 Gbps for SHA-3, with complexity ranging from 15 to about 30 kGE and estimated power dissipation of about 13 (SHA2) to 26 (SHA-3) mW (supply voltage 0.75 V). The proposed design demonstrates absolute performance beyond the state of the art and efficiency aligned with it. One of the main contributions is that this is the first SHA-2/SHA-3 accelerator synthesised on such an advanced technology.

7.

A design scheme for high-speed FPGA access to USB devices

曾志斌 ;袁雨舟 ;姚引娣《单片机与嵌入式系统应用》2014,(8):46-48

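To address the low transfer rate, high resource consumption, and complex development involved when an FPGA accesses USB devices, this paper proposes a scheme that combines an ARM processor with an FPGA to achieve high-speed access to USB devices. The scheme uses the ARM processor's USB host to read data from the USB device and buffer it in high-speed memory, then passes the data to the FPGA over an SRAM interface using a ping-pong mechanism. In tests, the data transfer rate reached 48 Mbps. The scheme is easy to develop, uses few resources, and offers a high transfer rate, making it suitable for FPGAs that need to read large amounts of external data at high speed.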

8.

A static scheduling algorithm for sparse matrix solving on FPGA architectures

王晞阳陈继林李猛刘首文《计算机工程》2022,48(7):199-205+213

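In power system simulation, solving large sparse matrices consumes substantial storage and computing resources, and failing to exploit matrix sparsity wastes storage and lowers computational efficiency. Current research on sparse matrix solvers mainly targets many-core acceleration hardware and focuses on exploiting level-set parallelism to improve parallel efficiency, but frequent cache checks and fine-grained memory accesses on many-core processor architectures can cause performance problems. For the problem of solving lower-triangular sparse systems on field-programmable gate arrays (FPGAs), this paper proposes a static scheduling algorithm built on the FPGA sparse matrix solver hardware designed by 吴志勇 et al. By preprocessing the sparse matrix and designing the data distribution and instruction arrangement flow, the algorithm statically maps the lower-triangular solve onto multiple on-chip processing units of the FPGA, achieving a fast parallel solve on the FPGA. All of the implicit parallelism in the serial algorithm is laid out into buffers, so that every compute unit can overlap computation, memory access, and inter-unit communication efficiently, making maximal use of the FPGA's hardware resources. Tests on typical cases show that the algorithm achieves a 5 to 10× speedup over conventional CPU/GPU solvers.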
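As a rough illustration of the general idea in this abstract, and not the paper's actual algorithm, the sketch below precomputes a static schedule for forward substitution on a sparse lower-triangular system. Rows are grouped into dependency levels, and the rows of each level are assigned round-robin to a fixed number of processing elements before any arithmetic runs. The function names, the `num_pes` parameter, the dict-per-row matrix layout, and the round-robin policy are all assumptions made for this example.

```python
from collections import defaultdict


def build_levels(rows):
    """rows[i] maps column j -> value for the nonzeros of row i (j <= i).

    Returns level[i], the earliest step at which x[i] can be computed.
    """
    level = [0] * len(rows)
    for i, row in enumerate(rows):
        deps = [level[j] + 1 for j in row if j < i]
        level[i] = max(deps, default=0)
    return level


def static_schedule(rows, num_pes):
    """Assign every row to a (step, PE) slot ahead of time (hypothetical policy)."""
    by_level = defaultdict(list)
    for i, lv in enumerate(build_levels(rows)):
        by_level[lv].append(i)
    schedule = []
    for lv in sorted(by_level):
        step = [[] for _ in range(num_pes)]
        for k, i in enumerate(by_level[lv]):
            step[k % num_pes].append(i)  # round-robin over PEs
        schedule.append(step)
    return schedule


def solve(rows, b, schedule):
    """Forward substitution that simply replays the precomputed schedule."""
    x = [0.0] * len(rows)
    for step in schedule:        # steps must run in order
        for pe_rows in step:     # rows in one step do not depend on each other
            for i in pe_rows:
                s = sum(v * x[j] for j, v in rows[i].items() if j < i)
                x[i] = (b[i] - s) / rows[i][i]
    return x


# Small 4x4 lower-triangular example.
L = [{0: 2.0}, {1: 4.0}, {0: 1.0, 2: 1.0}, {1: 2.0, 2: 1.0, 3: 2.0}]
b = [2.0, 4.0, 3.0, 8.0]
sched = static_schedule(L, num_pes=2)
x = solve(L, b, sched)
print(sched)
print(x)
```

Because the schedule depends only on the sparsity pattern, it can be computed once and reused for every right-hand side that shares that pattern, which is the property a static mapping onto hardware relies on.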

9.

A motion estimation array processor architecture with an efficient buffering strategy

苏睿刘贵忠张彤宇《计算机学报》2006,29(10):1772-1779

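Based on an improved linear processor array, this paper proposes an array processor architecture for full-search motion estimation that performs computation in parallel while requiring only serial data input. Analysis shows that the architecture is highly efficient and needs only a very small internal buffer. Thanks to its simple structure and regular data flow, it can easily be implemented in FPGA devices and used as a coprocessor for real-time encoders.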

10.

Research on an identity authentication model based on FPGA secure encapsulation

严博欧庆于吴晓平《电子技术应用》2009,35(1)

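Building on an in-depth analysis of the FPGA-based secure encapsulation structure, and addressing the security requirements of identity authentication in its practical use, this paper studies and designs an identity authentication model suited to the FPGA secure encapsulation structure. The model uses the RSA public-key algorithm and the SHA-1 algorithm to achieve mutual authentication between the user and the FPGA. It offers good portability and security, resists a variety of attacks, and provides strong user-privilege authentication for applications based on FPGA secure encapsulation.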

11.

FPGA verification for heterogeneous multi-core processors

李小波唐志敏李文《计算机研究与发展》2021,58(12):2684-2695

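As processor architectures evolve, high-performance heterogeneous multi-core processors keep emerging. Because their design is highly complex, a field-programmable gate array (FPGA) prototype verification platform is usually built to reduce design risk, shorten the verification cycle, start software development early, and reproduce post-silicon problems, and a wide variety of hardware/software co-verification and debugging tasks are carried out on that platform. This paper proposes a method for FPGA debugging and verification of heterogeneous multi-core high-performance processors on a homogeneous FPGA platform. The method exploits the architectural features of the heterogeneous multi-core processor and the symmetry of the homogeneous FPGAs, partitioning the design across FPGAs hierarchically from the top down and building the FPGA platform from the bottom up. Combined with techniques such as a differential-speed bridge, adaptive delay adjustment, and an embedded virtual logic analyzer (VLA), the FPGA platform can be brought up and deployed quickly. The proposed methods, including multi-core complementation and a debugging shell that emulates cores by inter-core substitution, allow the target processor to be verified on FPGA quickly and completely. Using this FPGA prototype platform, the authors completed pre-silicon verification, hardware/software co-development and testing, and reproduction of post-silicon problems, and the platform provides a fast hardware platform for the architecture design of next-generation processors.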

12.

Robust feature extraction algorithm suitable for real-time embedded applications

Abiel Aguilar-González Miguel Arias-Estrada François Berry 《Journal of Real-Time Image Processing》2018,14(3):647-665

Smart cameras integrate processing close to the image sensor, so they can deliver high-level information to a host computer or a high-level decision process. One of the most common processing tasks is visual feature extraction, since many vision-based use cases rely on such algorithms. Unfortunately, in most cases, feature detection algorithms are not robust or do not reach real-time processing. Given these limitations, a feature detection algorithm that is robust enough to deliver reliable features in any type of indoor/outdoor scenario is proposed. This was achieved by applying a non-textured corner filter combined with a subpixel refinement. Furthermore, an FPGA architecture is proposed. This architecture allows compact system design, real-time processing for Full HD images (it can process up to 44 frames or 91,238,400 pixels per second for Full HD images), and high efficiency for smart camera implementations (hardware resources similar to those of previous formulations without subpixel refinement and without the non-textured corner filter). For accuracy and robustness, experimental results for several real-world scenes are encouraging and show the feasibility of our algorithmic approach.
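The pixel rate quoted in this abstract can be checked against the frame rate with a few lines of arithmetic. The sketch assumes that Full HD means 1920 × 1080 pixels, which the abstract does not state explicitly.

```python
# Assumed Full HD resolution; the abstract only says "Full HD".
FULL_HD_WIDTH = 1920
FULL_HD_HEIGHT = 1080
FRAMES_PER_SECOND = 44  # frame rate quoted in the abstract

pixels_per_frame = FULL_HD_WIDTH * FULL_HD_HEIGHT
pixels_per_second = pixels_per_frame * FRAMES_PER_SECOND
print(pixels_per_second)  # 91238400
```

Under that assumption, 44 frames per second corresponds exactly to the pixel throughput given in the abstract.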

13.

Method and implementation of head contour extraction in a 3D scanner

雷海军李德华钱铮铁雷丰中《计算机工程与应用》2003,39(19):37-39

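To meet the real-time requirements of a 3D scanner, this paper proposes a fast algorithm in which an FPGA processor and a host PC cooperate interactively to extract contour lines. The algorithm has two stages: in the first, the host computes the segmentation threshold between background and target; in the second, the FPGA processor detects contour position information in real time. The algorithm is computationally simple and fast, reduces the amount of data to be transmitted and stored, and lightens the host's subsequent computation. It also eliminates the need for an expensive image capture and compression card and a high-speed hard disk, lowering cost. The reconfigurable FPGA processor is designed as a pipeline, keeping the average processing time per pixel within 70 ns. Simulation and synthesis results show that contour information can be extracted from one 720×576 standard PAL video frame in real time within 40 ms.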
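The timing figures in this abstract are consistent with each other, as the short check below shows. It assumes the frame size is 720 × 576 pixels and that the 40 ms budget is the PAL frame period at 25 frames per second; the variable names are illustrative.

```python
# Assumed PAL frame geometry (720 x 576) and the per-pixel bound from the abstract.
FRAME_WIDTH = 720
FRAME_HEIGHT = 576
NS_PER_PIXEL = 70              # upper bound on average per-pixel time
FRAME_BUDGET_NS = 40_000_000   # 40 ms, one PAL frame period at 25 fps

pixels = FRAME_WIDTH * FRAME_HEIGHT
total_ns = pixels * NS_PER_PIXEL
fits_in_frame = total_ns <= FRAME_BUDGET_NS

print(pixels)           # 414720
print(total_ns / 1e6)   # about 29.03 ms
print(fits_in_frame)    # True
```

At 70 ns per pixel a full frame takes roughly 29 ms, which leaves about 11 ms of slack inside the 40 ms frame period.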

14.

Hardware implementation and study of the SM3 hash algorithm

刘宗斌马原荆继武夏鲁宁《信息网络安全》2011,(9):191-193,218

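As the information society develops further, hash algorithms, which are important cryptographic algorithms for protecting information integrity, are being used ever more widely. NIST in the United States has completed the solicitation of the hash standards SHA0, SHA1, and SHA2, and the SHA-3 solicitation is due to conclude in 2012. SM3, the national standard hash algorithm for commercial use in China, was published in December 2010. This paper implements a high-throughput SM3 on an FPGA hardware platform; after optimization, SM3 reaches a throughput of about 1.5 Gbps on the Xilinx V5 platform. The paper also compares and analyzes the FPGA efficiency of SM3 against that of SHA1, SHA2, and the SHA-3 candidate BLAKE.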

15.

Research and implementation of a trusted system based on a domestic processor

苏培培刘宝明《电子技术应用》2012,(1):136-138

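Following the trusted computing specifications of the Trusted Computing Group (TCG) and the idea of a chain of trust, this paper designs a trusted computing platform based on the domestic Loongson 2F processor and a Trusted Platform Module (TPM). The platform comprises a trusted system hardware layer, a trusted BootLoader layer, and a trusted operating system layer. The authors also design the boot procedure for the whole system and establish the chain of trust, thereby building a trusted system on a domestic processor.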

16.

Research on real-time FPGA implementation of the PGA algorithm

郝智泉王贞松刘波《计算机研究与发展》2008,45(2):342-347

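Synthetic aperture radar (SAR) imaging involves huge data volumes and relatively complex algorithms. How to implement SAR imaging algorithms in real time is a problem worth studying in embedded high-performance computing. With their high performance and reconfigurability, FPGAs are increasingly used in this field as an efficient, low-cost solution. For PGA, the classic algorithm for Doppler frequency-rate estimation in SAR imaging, this paper takes the FPGA as the implementation platform and, by examining the essence of the algorithm, proposes an improved version of the classic algorithm suited to real-time FPGA implementation. It also describes the design process of mapping the improved algorithm onto the FPGA. Experimental results show that the improved algorithm needs markedly fewer iterations than the classic PGA algorithm, and that the precision of the hardware computation in the SoC meets system requirements.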

17.

Hardware implementation of the Fourier transform in OFDM systems

汤晓峰戎蒙恬邓波林巍《计算机工程与应用》2005,41(25):106-108,111

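The FFT processor is an important component of wideband OFDM systems. This paper presents a VLSI design method for an efficient FFT processor suited to OFDM systems. For efficiency, it adopts an improved radix-4 DIT algorithm, ping-pong RAM, and a pipelined structure. Following the characteristics of the radix-4 algorithm, the design also has distinctive features in the radix-4 computing unit (CU), shuffled memory addressing, per-stage iteration control, and data alignment. The design performs the FFT on 256 points of 36-bit floating-point complex data. The FFT processor has been verified on an FPGA, with a processing capability of 100 MSPS.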

18.

Implementation of a secure TLS coprocessor on an FPGA

《Microprocessors and Microsystems》2016

In this paper we present a secure implementation architecture of a coprocessor for the TLSv1.2 protocol on an FPGA. Techniques were used that increase the resistance of the design to side-channel attacks and also protect the private key data from software-based attacks. The processor was implemented with a secure true random number generator which incorporates failure detection and thorough post-processing of the random bitstream. The design also includes hardware for signature generation and verification, based on elliptic curve algorithms. The algorithms used for performing the elliptic curve arithmetic were chosen to provide resistance against SPA and DPA attacks. Implementations of the AES and SHA256 algorithms are also included in order to provide full hardware acceleration for a specific suite of the TLSv1.2 protocol. The design is analysed for area and speed on a Virtex 5 FPGA.

19.

An HOS design on an embedded processor

尹震宇赵海王金英徐久强林恺《计算机工程》2008,34(5):268-270

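This paper proposes a design structure based on a hardware operating system (HOS) for embedded processors, in which system scheduling, control, and other processing previously implemented in complex operating system software is executed by the processor hardware as programmable microcode. By adding task scheduling and other HOS features to a 51 processor core, the authors implement a processor with HOS support for embedded systems in household appliances. The design was downloaded to an FPGA chip to replace the embedded processor in an air-conditioner controller, and test results show that the processor executes more efficiently than the original system.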

20.

Design and implementation of a SAR autofocus processor

郝智泉王贞松《计算机工程》2007,33(10):255-257

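Synthetic aperture radar (SAR) imaging involves huge data volumes and relatively complex algorithms. How to implement SAR imaging algorithms in real time is a problem worth studying in embedded high-performance computing. For PGA, the classic algorithm for Doppler frequency-rate estimation in SAR imaging, this paper describes real-time-oriented improvements to the algorithm. It presents the system-level design of an FPGA-based SAR autofocus processor and the process of mapping the PGA algorithm onto FPGA logic.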