Similar literature
 20 similar documents found (search time: 15 ms)
1.
《电子学报:英文版》 (Chinese Journal of Electronics), 2017, (6): 1161-1167
This paper presents a reconfigurable processor architecture for symmetric ciphers, based on a very long instruction word (VLIW) structure, that exploits the data-level and instruction-level parallelism of symmetric cryptography. An application-specific instruction set for symmetric ciphers is proposed. For the arithmetic operations shared among symmetric ciphers, eleven kinds of reconfigurable cryptographic arithmetic units are designed. To meet the requirement for energy-efficient design, a loop buffer structure is proposed for the instruction fetch unit, which significantly reduces power consumption at the same frequency as a conventional design; in addition, a chain processing mechanism is proposed to improve cryptographic throughput without any area overhead. The processor has been fabricated in 0.18 μm CMOS technology. Results show that it runs at up to 200 MHz and that fourteen cryptographic algorithms were mapped onto it; the encryption throughput of AES, SNOW 2.0, and SHA-2 reaches 1.19 Gbps, 1.05 Gbps, and 407 Mbps, respectively.

2.

In the Internet of Things (IoT), the massive connectivity of devices and the enormous volume of data in the air have made information susceptible to different types of attacks. Cryptographic algorithms are used to provide confidentiality and maintain the integrity of the information. However, the small size, limited computational capability, limited memory, and limited power resources of the devices make it difficult to use resource-intensive traditional cryptographic algorithms for information security. In this scenario it becomes imperative to develop lightweight security schemes for IoT. This work presents a thorough study of lightweight cryptography as a solution to the security problem of resource-constrained devices in IoT. The paper is a comprehensive attempt to provide an in-depth, state-of-the-art survey of the lightweight cryptographic primitives available up to 2019. It discusses 21 lightweight block ciphers, 19 lightweight stream ciphers, 9 lightweight hash functions, and 5 variants of elliptic curve cryptography (ECC); in total, 54 LWC primitives are compared within their respective classes. The comparison is carried out in terms of chip area, energy and power, hardware and software efficiency, throughput, latency, and figure of merit (FoM). Based on the findings, AES and ECC are observed to be the most suitable for use as lightweight cryptographic primitives. Several open research problems in the field of lightweight cryptography are also identified.


3.
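The cryptography-specific programmable logic array (CSPLA) is a dataflow-driven cryptographic processing architecture. This paper addresses the relationship between array size and the energy efficiency of cryptographic algorithm mappings. Starting from the specific hardware structure of the CSPLA and taking energy-efficient implementation of block ciphers as the entry point, it first builds an energy-efficiency model for mapping block cipher algorithms onto this structure and analyzes the factors that affect energy efficiency. It then proposes a mapping algorithm based on the basic process of mapping an algorithm onto the array. Finally, several typical block ciphers are mapped onto arrays of different sizes. The results show that a larger array does not necessarily bring higher energy efficiency: to obtain the best energy efficiency, the array size parameters should match the specific hardware resource constraints and the computational requirements of the cipher. Mappings achieve the best energy efficiency for CSPLA sizes from 4×4 to 4×6, where the optimal energy efficiency of AES is 33.68 Mbps/mW. Compared with other cryptographic processing architectures, the CSPLA has favorable energy-efficiency characteristics.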

4.
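Targeting high energy efficiency, this paper builds an energy-efficiency probability model for cryptography-specific processors and uses it to guide the design of an energy-efficient cryptography-specific processor architecture. The design space exploration problem for application-specific instruction set processors in the cryptographic domain is described as the problem of locating the "1" values in a configuration matrix. By introducing a probability matrix, this location problem is further converted into a probability problem over the optimal configuration, and a maximum-energy-efficiency probability model for cryptography-specific processors is proposed based on machine learning ideas. Experiments show that the model outputs its final result after an average of 2300 iterations, with a prediction accuracy of 92.7%. Guided by the model, the design space is explored to obtain a set of arithmetic units that meets the high-energy-efficiency requirement. These units are integrated as instruction extensions into Ariane, an open-source general-purpose 64-bit RISC-V processor core, to form the proposed energy-efficient cryptography-specific processor architecture. Logic synthesis in a 55 nm CMOS process shows that, compared with the core before extension, the proposed RISC-V cryptographic processor increases area by 426874 μm² and critical path delay by 0.51 ns; the sum of the increases in the total time-area product for completing the cryptographic algorithms is 0.46; and the energy efficiency when executing common cryptographic algorithms ranges from 1.61 to 35.16 Mbps/mW.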

5.
Cryptography circuits for portable electronic devices provide user authentication and secure data communication. These circuits should achieve high performance, occupy small chip area, and handle several cryptographic algorithms. This paper proposes a high-performance ASIP (application-specific instruction set processor) for five standard cryptographic algorithms, including both block ciphers (AES, Camellia, and ARIA) and stream ciphers (ZUC and SNOW 3G). The processor reaches ASIC-like performance, such as 11.6 Gb/s for AES encryption, 16.0 Gb/s for ZUC, and 32.0 Gb/s for SNOW 3G, at a clock frequency of 1.0 GHz with an area of 0.56 mm² (65 nm). Compared with state-of-the-art VLSI designs, our design achieves high performance, low silicon cost, low power consumption, and sufficient programmability. Thanks to this programmability, the design can accommodate algorithm modification if a supported algorithm is cracked and becomes unusable, which extends the product lifetime of the design.

6.
Security processors are used to implement cryptographic algorithms with high throughput and/or low energy consumption constraints. The design of these processors is a balancing act between flexibility and energy consumption. The target is to create a processor with just enough programmability to cover a set of algorithms, that is, an application domain. This paper proposes GEZEL, a design environment consisting of a design language and an implementation methodology that can be used for such domain-specific processors. We use the security domain as driver, and discuss the impact of the domain on the target architecture. We also present a methodology to create, refine and verify a security processor.

7.
Wireless networks are very widespread nowadays, so secure and fast cryptographic algorithms are needed. The most widely used security technology in wireless computer networks is WPA2, which employs the AES algorithm, a powerful and robust cryptographic algorithm. Encryption speed is very important if the Quality of Service (QoS) of these networks is not to be degraded, so we have implemented the AES algorithm in an FPGA, taking advantage of the hardware characteristics and the software-like flexibility of these devices. In this paper, we propose our own methodology for FPGA-based AES implementation. This methodology combines the use of three hardware languages (Handel-C, VHDL and JBits) with partial and dynamic reconfiguration, and a pipelined and parallel implementation. The same design methodology could be extended to other cryptographic algorithms. Thanks to all these improvements, our pipelined and parallel implementation reaches a very high throughput (24.922 Gb/s) and the best efficiency (throughput/area ratio) of all the related works found in the literature (6.97 Mb/s per slice).
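
The two figures quoted at the end of this abstract are linked by the throughput/area definition of efficiency, so the occupied area can be estimated from them. The sketch below is only a back-of-the-envelope check; the function name is hypothetical and it assumes 1 Gb/s = 1000 Mb/s and that both figures refer to the same design point.

```python
def implied_slices(throughput_gbps: float, mbps_per_slice: float) -> float:
    """Estimate area in slices from throughput and throughput-per-slice.

    Assumption (not stated in the abstract): 1 Gb/s = 1000 Mb/s.
    """
    return throughput_gbps * 1000.0 / mbps_per_slice


# Figures quoted in the abstract: 24.922 Gb/s and 6.97 Mb/s per slice.
slices = implied_slices(24.922, 6.97)
print(f"implied area: about {slices:.0f} slices")
```

Under these assumptions the quoted numbers correspond to roughly 3,576 slices.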

8.
In this paper, we characterize the performance of datapath architectures of the Advanced Encryption Standard (AES). These architectures are parameterized by a datapath width of 8, 16, 32, 64, or 128 bits and, for the 128-bit width, an unrolling factor of 1, 2, 5 or 10. Composite field S-boxes are adopted for all the architectures, and shift-register-based ShiftRows and MixColumns components are used for architectures with datapath widths of less than 128 bits. Their performance in terms of area, peak power and average energy is benchmarked using a 90-nm standard cell CMOS technology under a variety of throughput requirements. Through this characterization, the performance trade-offs affected by the architecture parameters are extensively explored, and the parameters leading to the best performance are identified. It is found that the 8-bit datapath, which is conventionally adopted for resource efficiency, has the worst energy efficiency and does not result in the minimal peak power among the architectures. In addition, the 16-, 32- and 64-bit AES datapath architectures are newly considered or represent improvements over previous work.

9.
Security protocols, such as IPSec and SSL, are being increasingly deployed in the context of networked embedded systems. The resource-constrained nature of embedded systems and, in particular, the modest capabilities of embedded processors make it challenging to achieve satisfactory performance while executing security protocols. A promising approach for improving performance in embedded systems is to use application-specific instruction set processors that are designed based on configurable and extensible processors. In this paper, we perform a comprehensive performance analysis of the IPSec protocol on a state-of-the-art configurable and extensible embedded processor (Xtensa from Tensilica Inc.). We present performance profiles of a lightweight embedded IPSec implementation running on the Xtensa processor, and examine in detail the various factors that contribute to the processing latencies, including cryptographic and protocol processing. In order to improve the efficiency of IPSec processing on embedded devices, we then study the impact of customizing an embedded processor by synergistically 1) configuring architectural parameters, such as instruction and data cache sizes, processor-memory interface width, write buffers, etc., and 2) extending the base instruction set of the processor using custom instructions for both cryptographic and protocol processing. Our experimental results demonstrate that up to 3.2× speedup in IPSec processing is possible over a popular embedded IPSec software implementation.

10.
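Signal processing on general-purpose processors is one of the development directions of software-defined radio, but existing shared-memory parallel programming (OpenMP) and direct thread-level parallelization struggle to accelerate signal processing in parallel. To parallelize serial algorithms, a multicore pipeline method is introduced, and the real-time performance of the traditional serial method and of the multicore pipeline is analyzed and compared. To address synchronization in the multicore pipeline, a distributed adaptive thread synchronization method is studied. Tests on a signal processing example show that the throughput of the multicore pipeline is 2.1 times that of the serial method, a substantial improvement in processing capability.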
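
The pipeline idea in this abstract, splitting a serial processing chain into stages that run concurrently and hand data to one another, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the stage functions are placeholders, plain bounded queues stand in for the paper's adaptive synchronization method, and in CPython the threads show only the structure, not a real multicore speedup.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the sample stream


def stage(func, inbox, outbox):
    """Run one pipeline stage: apply func to each item, then pass it on."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            break
        outbox.put(func(item))


def run_pipeline(funcs, samples, depth=8):
    """Chain funcs into a pipeline, one thread per stage, and collect results.

    Bounded queues (size `depth`) provide simple back-pressure between stages.
    """
    queues = [queue.Queue(maxsize=depth) for _ in range(len(funcs) + 1)]

    def feed():
        for sample in samples:
            queues[0].put(sample)
        queues[0].put(SENTINEL)

    workers = [threading.Thread(target=feed)]
    workers += [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for w in workers:
        w.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


# Hypothetical three-stage chain standing in for real signal processing steps.
stages = [lambda x: x * 2, lambda x: x + 1, lambda x: x * x]
print(run_pipeline(stages, range(5)))
```

Because each stage is a single thread reading from a FIFO queue, results come out in input order and match what the serial chain would produce.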

11.
This paper presents a standard-cell-based semiautomatic design methodology for a new conceptual countermeasure against electromagnetic (EM) analysis and fault-injection attacks. The countermeasure, called the EM attack sensor, utilizes LC oscillators that react to variations in the EM field around a cryptographic LSI caused by a microprobe brought near the LSI. A dual-coil sensor architecture with digital calibration based on lookup table programming can prevent various microprobe-based EM attacks that cannot be thwarted by conventional countermeasures. All components of the sensor core are semiautomatically designed by standard electronic design automation tools with a fully digital standard cell library and hence minimum design cost. This sensor can therefore be scaled together with the cryptographic LSI to be protected. The sensor prototype is designed based on the proposed methodology together with a 128-bit-key composite AES processor in 0.18-μm CMOS with overheads of only 2% in area, 9% in power, and 0.2% in performance, respectively. The countermeasure has been validated against a variety of EM attack scenarios. In particular, some further experimental results are shown for a detailed discussion.

12.
Network security for mobile devices is in high demand because of the increasing virus count. Since mobile devices have limited CPU power, dedicated hardware is essential to provide sufficient virus detection performance. A TCAM-based virus-detection unit provides high throughput, but makes low power and low cost challenging. In this paper, an adaptively dividable dual-port BiTCAM (unifying binary and ternary CAMs) is proposed to achieve a high-throughput, low-power, and low-cost virus-detection processor for mobile devices. The proposed dual-port BiTCAM is realized with a dual-port AND-type match-line scheme composed of dual-port dynamic AND gates. The dual-port design reduces power consumption and increases storage efficiency through shared storage space. In addition, the dividable BiTCAM provides high flexibility for regularly updating the virus database. The BiTCAM achieves a 48% power reduction and a 40% transistor count reduction compared with a design using a conventional single-port TCAM. The implemented 0.13 μm processor performs virus detection at up to 3 Gbps with an energy consumption of 0.44 fJ/pattern-byte/scan at peak throughput.

13.
The proposed AI processor architecture offers high throughput for accelerating neural networks and reduces the external memory bandwidth required to process them. To achieve high throughput, the proposed super thread core (STC) includes 128 × 128 nano cores operating at a clock frequency of 1.2 GHz. A function-safe architecture is proposed for fault-tolerant systems such as the electronics of autonomous cars. A general-purpose processor (GPP) core is integrated with the STC to control it and to process the AI algorithm; the GPP has a self-recovering cache and a dynamic lockstep function. The function-safe design is shown to reach ASIL D among the fault-tolerance levels of the ISO 26262 standard. The entire AI processor is fabricated as a prototype chip in a 28-nm CMOS process. Its peak computing performance is 40 TFLOPS at 1.2 GHz with a supply voltage of 1.1 V, and the measured energy efficiency is 1.3 TOPS/W. The control GPP with the function-safe design can meet ISO 26262 ASIL D with a single-point fault-tolerance rate of 99.64%.

14.
Next-generation mobile devices will continue to demand high processing power for imaging applications. The expected performance is in the class of supercomputers, but it must be delivered within the limited energy and memory bandwidth of embedded systems. This article advocates a streaming computation model that leverages the deterministic access patterns in imaging applications to deliver the necessary processing throughput. A reconfigurable datapath connects a set of functional units, forming a computation pipeline to offer energy efficiency. The architecture and implementation of a stream processor are presented along with the memory subsystem to support stream data transfers. The results show speedups ranging from a factor of 2 to 28 for imaging applications, comparing favorably against scalar processors.

15.
In power analysis attacks on the AES cryptographic algorithm that use a single information leakage point, the traditional attack method does not exploit as much of the information in the algorithm and the power traces as possible. This leads to problems such as a large number of required power traces and a low information utilization rate. A novel multi-point joint power analysis attack against AES is proposed to solve these problems. Taking the correlation power analysis attack as an example, the detailed attack process is presented. The round key addition and SubBytes operations are chosen simultaneously as the attacked intermediate variables. A joint power leakage function is then constructed for these intermediate variables, and the multi-point joint correlation power analysis attack is given. For an AES implementation on a smart card, the multi-point joint power analysis attack and the single-leakage-point correlation power analysis attacks on the key addition and on SubBytes were carried out. The measured results validate the effectiveness of the proposed method. They also show that, compared with attacks using a single information leakage point, the proposed method has a higher success rate and needs fewer power traces.
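
A minimal sketch may help make the joint attack concrete. It is not the paper's method: the abstract does not give the joint leakage function, so the sketch assumes a Hamming-weight model and simply sums the leakage at the two points (the outputs of the key addition and of SubBytes). The traces are simulated with Gaussian noise, and all function names and parameters are hypothetical.

```python
import random


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    """Build the AES S-box: field inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def hw(x):
    """Hamming weight of a byte."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(key, n, sigma, rng):
    """Simulate n noisy leakage samples at the two attacked points."""
    pts = [rng.randrange(256) for _ in range(n)]
    ark = [hw(p ^ key) + rng.gauss(0, sigma) for p in pts]        # key addition
    sub = [hw(SBOX[p ^ key]) + rng.gauss(0, sigma) for p in pts]  # SubBytes
    return pts, ark, sub


def joint_cpa(pts, ark, sub):
    """Return the key-byte guess whose joint model correlates best."""
    measured = [a + s for a, s in zip(ark, sub)]  # assumed joint leakage: sum
    def score(guess):
        model = [hw(p ^ guess) + hw(SBOX[p ^ guess]) for p in pts]
        return pearson(model, measured)
    return max(range(256), key=score)


rng = random.Random(1)
pts, ark, sub = simulate(key=0x3C, n=1000, sigma=1.0, rng=rng)
print(hex(joint_cpa(pts, ark, sub)))
```

The S-box is generated rather than typed in to keep the block short. A single-point attack would correlate only `ark` or only `sub` against the corresponding half of the model.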

16.
Pervasive computing has turned many ordinary commodity products into smart, digital computing devices. Though these devices are mostly equipped with low-cost processors offering limited computing power, they are often required to handle user-sensitive data. This calls for the integration of security services that typically involve computationally expensive cryptography. In this context, lightweight cryptographic constructions have recently emerged to minimize the computational burden on such constrained devices. Unfortunately, many of those constructions were too simplistic to preserve long-lasting confidence in their security. We therefore take another approach in this work and implement standardized, well-established cryptography on an alternative lightweight platform, namely the asynchronous GA144 ultra-low-power multi-core processor with 144 tiny cores. We demonstrate that symmetric and asymmetric cryptography such as AES and RSA can be realized on this low-end device. With energy consumption as low as 0.63 μJ and 22.3 mJ, this platform achieves a performance of 38 μs and 462.9 ms per AES and RSA operation, respectively. This translates to an energy consumption and computation time significantly lower than those of many lightweight implementations reported so far. Finally, we emphasize that low-power, asynchronous operation of cryptography does not eliminate the threat of physical attacks, in particular power attacks. We evaluate the side-channel resistance of our design and find that fewer than 5,000 measurements are sufficient to fully recover the 128-bit key of the unprotected AES implementation.
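
The energy and time figures quoted per operation imply an average power draw, since power is energy divided by time. The sketch below only reworks the abstract's numbers; the function name is hypothetical, and the result is an average over one operation, not a measured value from the paper.

```python
def average_power_mw(energy_joules: float, time_seconds: float) -> float:
    """Average power in milliwatts for one operation (P = E / t)."""
    return energy_joules / time_seconds * 1e3


# Figures quoted in the abstract.
aes_mw = average_power_mw(0.63e-6, 38e-6)       # AES: 0.63 uJ in 38 us
rsa_mw = average_power_mw(22.3e-3, 462.9e-3)    # RSA: 22.3 mJ in 462.9 ms
print(f"AES: {aes_mw:.1f} mW, RSA: {rsa_mw:.1f} mW")
```

With these inputs the averages come out at about 16.6 mW for AES and about 48.2 mW for RSA.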

17.
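The signal processor is an important part of radar reconnaissance equipment. Improving its performance, processing capability, reliability, application flexibility, generality, and standardization, while reducing its size and power consumption, is of great significance. This paper first describes the current state of and requirements for signal processing hardware platforms. It then proposes a design for a general-purpose signal processing platform, based on the CPCI standard, that combines dual TMS320C6455 DSPs with an FPGA, and discusses in detail the platform's hardware composition, operating principle, features, and implementation.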

18.
This article compares the vulnerability of software and hardware implementations of cryptographic algorithms to power analysis attacks. Representative platforms, an Atmel 89S8252 8-bit processor and a 0.25 μm, 1.8 V standard cell circuit, are used to implement the Advanced Encryption Standard (AES). A simulation-based experimental environment is built to acquire power data, and single-bit differential power analysis (DPA), multi-bit DPA, and correlation power analysis (CPA) attacks are conducted on the two implementations. The experimental results show that the hardware implementation has less data-dependent power leakage and so resists power attacks better. Furthermore, an improved DPA approach is proposed. It adopts the Hamming distance of intermediate results as the power model and arranges plaintext inputs to differentiate power traces with maximal probability. Compared with the original power attacks, the improved DPA successfully attacks AES hardware implementations with an acceptable number of power measurements and fewer computations.

19.
Turbo code is a computationally intensive channel code that is widely used in current and upcoming wireless standards. The general-purpose graphics processing unit (GPGPU) is a programmable commodity processor that achieves high computational performance by using many simple cores. In this paper, we present a 3GPP LTE compliant Turbo decoder accelerator that takes advantage of the processing power of the GPU to offer fast Turbo decoding throughput. Several techniques are used to improve the performance of the decoder. To fully utilize the computational resources of the GPU, our decoder can decode multiple codewords simultaneously, divide the workload for a single codeword across multiple cores, and pack multiple codewords to fit the single instruction multiple data (SIMD) instruction width. In addition, we use shared memory judiciously to enable hundreds of concurrent threads while keeping frequently used data local so that memory access stays fast. To improve the efficiency of the decoder in the high-SNR regime, we also present a low-complexity early termination scheme based on average extrinsic LLR statistics. Finally, we examine how different workload partitioning choices affect the error correction performance and the decoder throughput.

20.
Two Kalman filter algorithms are implemented on a DSP32C processor. The two filters use conventional matrix operations and U-D factorization, respectively. The real-time processing performance of each algorithm is evaluated in terms of throughput and of program and data memory sizes. Both DSP32C assembly and high-level C language programs are developed for the two algorithms (four programs in total) to evaluate coding efficiency. Four observations are made. Both algorithms can be programmed more efficiently in assembly language. The matrix-based algorithm benefits from its simple and regular operations, so it requires less program memory in both assembly and C. The U-D factorization algorithm involves fewer multiply-accumulate operations and provides faster throughput, but only in C. Finally, the advantage of fewer multiply-accumulate operations in the U-D factorization algorithm no longer holds in assembly language when the number of states of the Kalman filter is large.
