
1.

FPGA based unified architecture for public key and private key cryptosystems

Yi WANG Renfa LI 《Frontiers of Computer Science》2013,7(3):307-316

Security in embedded systems has recently attracted attention because modern electronic devices must exchange and communicate sensitive data with care. Although security is a classical research topic in worldwide communication, researchers still face the problem of how to deal with these resource-constrained devices and enhance assurance and certification. Therefore, some cryptographic computations are built on hardware platforms such as field-programmable gate arrays (FPGAs). The cryptographic algorithms commonly used for digital signature algorithms (DSA) are Rivest-Shamir-Adleman (RSA) and elliptic curve cryptosystems (ECC), which are based on the presumed difficulty of factoring large integers and on the algebraic structure of elliptic curves over finite fields, respectively. Usually, RSA is computed over GF(p), and ECC is computed over GF(p) or GF(2^p). Moreover, embedded applications need the Advanced Encryption Standard (AES) algorithm for encryption and decryption. To reuse hardware resources and balance area against performance, we propose a new triple-function arithmetic unit for computing high-radix RSA and ECC operations over GF(p) and GF(2^p), which can also be extended to support AES operations. A new high-radix signed-digit (SD) adder is proposed to eliminate carry propagation over GF(p). The proposed unified design uses 28.7% fewer hardware resources than implementing RSA, ECC, and AES individually, and experimental results show that the proposed architecture can achieve 141.8 MHz using approximately 5.5k CLBs on a Virtex-5 FPGA.

2.

A versatile Montgomery multiplier architecture with characteristic three support

E. Öztürk 《Computers & Electrical Engineering》2009,35(1):71-85

We present a novel unified core design which is extended to realize Montgomery multiplication in the fields GF(2^n), GF(3^m), and GF(p). Our unified design supports RSA and elliptic curve schemes, as well as the identity-based encryption which requires a pairing computation on an elliptic curve. The architecture is pipelined and is highly scalable. The unified core utilizes the redundant signed digit representation to reduce the critical path delay. While the carry-save representation used in classical unified architectures is only good for addition and multiplication operations, the redundant signed digit representation also facilitates efficient computation of comparison and subtraction operations besides addition and multiplication. Thus, there is no need for a transformation between the redundant and the non-redundant representations of field elements, which would be required in the classical unified architectures to realize the subtraction and comparison operations. We also quantify the benefits of the unified architectures in terms of area and critical path delay. We provide detailed implementation results. The metric shows that the new unified architecture provides an improvement over a hypothetical non-unified architecture of at least 24.88%, while the improvement over a classical unified architecture is at least 32.07%.
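For readers unfamiliar with the operation this abstract builds on, the following is a minimal software sketch of Montgomery multiplication in GF(p) only. It is not the paper's hardware design: the function names, the choice R = 2^k and the toy prime are illustrative assumptions, and the GF(2^n) and GF(3^m) cases and the redundant signed digit representation are not modelled.

```python
# Minimal sketch of Montgomery multiplication modulo an odd prime p.
# Assumption: R = 2**k with p < R; all names here are hypothetical.

def montgomery_setup(p: int, k: int) -> int:
    """Return p' such that p * p' = -1 (mod 2**k). Requires Python 3.8+."""
    R = 1 << k
    return (-pow(p, -1, R)) % R

def mont_mul(a: int, b: int, p: int, k: int, p_neg_inv: int) -> int:
    """Return a * b * R^-1 mod p for 0 <= a, b < p, using only shifts and masks."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * p_neg_inv) & mask   # makes t + m*p divisible by R
    u = (t + m * p) >> k                  # exact division by R
    return u - p if u >= p else u         # single conditional subtraction

def to_mont(x: int, p: int, k: int) -> int:
    """Map x to its Montgomery form x * R mod p."""
    return (x << k) % p

def from_mont(x: int, p: int, k: int, p_neg_inv: int) -> int:
    """Map a Montgomery-form value back to the ordinary residue."""
    return mont_mul(x, 1, p, k, p_neg_inv)

def modmul_via_montgomery(a: int, b: int, p: int, k: int) -> int:
    """Compute a * b mod p by converting in and out of Montgomery form."""
    p_neg_inv = montgomery_setup(p, k)
    prod_m = mont_mul(to_mont(a, p, k), to_mont(b, p, k), p, k, p_neg_inv)
    return from_mont(prod_m, p, k, p_neg_inv)
```

The conversion cost only pays off when many multiplications are chained, as in modular exponentiation or elliptic curve scalar multiplication.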

3.

New Technique for Decoding Codes in the Rank Metric and Its Cryptography Applications

Ourivski A. V. Johansson T. 《Problems of Information Transmission》2002,38(3):237-246

We present two new algorithms for decoding an arbitrary (n, k) linear rank distance code over GF(q^N). These algorithms correct errors of rank r in O((Nr)^3 q^{(r–1)(k+1)}) and O((k + r)^3 r^3 q^{(r–1)(N–r)}) operations in GF(q) respectively. The algorithms give one of the most efficient attacks on public-key cryptosystems based on rank codes, as well as on the authentication scheme suggested by Chen.

4.

Current-mode circuit-level technique to design variation-aware nanoscale summing circuit for ultra-low power applications

Guduri Manisha Mehra Rishab Srivastava Pragya Islam Aminul 《Microsystem Technologies》2017,23(9):4045-4056

Strong demand for high-performance, ultra-low-power electronic devices has driven the search for circuit styles that promise reduced propagation delay (t_p) as well as low power dissipation (PWR). MOS current mode logic (MCML) has emerged as a promising logic style that offers high-speed operation at the expense of acceptable power dissipation. This paper proposes an MCML full adder that employs a load controller circuit and compares it with a hybrid-CMOS full adder in terms of various design metrics in the superthreshold and subthreshold regions. The MCML topology with load controller offers high-speed operation and low power dissipation in the superthreshold region. The same circuit arrangement, when operated in the subthreshold region, also delivers higher operating speed with ultralow power dissipation compared to its hybrid-CMOS counterpart. Power dissipation analysis shows the MCML-based full adder to be more robust than its hybrid-CMOS counterpart. In particular, the MCML full adder design achieves 3.77× (2.38×) improvement in propagation delay, 10.43× (3.45×) improvement in average power dissipation, 39.43× (8.21×) lower power-delay product (PDP) and 149.07× (19.55×) improvement in energy-delay product (EDP) in the superthreshold (subthreshold) region of operation at the 16-nm technology node. These results are also validated using TSMC's industry-standard 0.18-μm technology model parameters, and a similar trend is observed in the design metrics of the MCML and hybrid-CMOS full adder circuits. In addition, a noise analysis of these circuits is carried out. It is observed that the noise induced by the hybrid-CMOS full adder is about 14× that of the MCML full adder.


5.

Design and leakage assessment of side channel attack resistant binary edwards Elliptic Curve digital signature algorithm architectures

《Microprocessors and Microsystems》2019

Considering that Elliptic Curve Digital Signature Algorithm (ECDSA) implementations need to be efficient, flexible and Side Channel Attack (SCA) resistant, this paper proposes a design approach and architecture for ECDSA and the underlying scalar multiplication operation over GF(2^k) that satisfies all three requirements. To that end, Binary Edwards Curves (BECs) are adopted as an alternative to traditional Weierstrass Elliptic Curves (ECs) over GF(2^k), since they offer intrinsic SCA resistance against simple attacks due to their uniformity, operation regularity and completeness. To achieve high performance and flexibility, we propose a hardware/software ECDSA codesign approach where scalar multiplication is implemented in hardware and integrated into the ECDSA functionality through appropriate drivers of an ECDSA software stack. To increase BEC scalar multiplier performance and introduce SCA resistance, we adopt and expand a parallelism design strategy in which the GF(2^k) operations of both point operations performed in a scalar multiplier round are reordered and assigned to parallelism layers so that they execute concurrently. Within this strategy we include hardware- and software-based SCA countermeasures that rely on masking/randomization and hiding. While scalar randomization is easily realized by the ECDSA software stack, to achieve resistance by hardware means we introduce, in every scalar multiplier round and within the parallelism layers, projective coordinate randomization of all the round's output points. Because BEC point operations are performed in parallel in every scalar multiplier round and the round's output points are randomized with a different number in each round, we manage to achieve maximum SCA resistance.
To validate this resistance, we introduce and realize a leakage assessment process on BEC scalar multipliers for the first time in the research literature. This process is based on real measurements collected from a controlled SAKURA X environment with a GF(2^233)-based scalar multiplier implementation. Using Welch's t-test, we investigate possible information leakage of the multiplier's input point and scalar, and after an extended analysis we find only trivial leakage. Finally, we validate the ECDSA architecture and its scalar multiplier efficiency by implementing it on a Zynq 7000 series FPGA Avnet Zedboard and collecting very promising, well-balanced results on speed and hardware resources in comparison with other works.
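The Welch's t-test mentioned in this abstract can be illustrated with a short sketch. This is not the authors' measurement pipeline: the fixed-versus-random trace split, the function names and the 4.5 threshold (a value commonly used in leakage assessment practice, not stated in the abstract) are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    va, vb = variance(a), variance(b)          # sample variances (n-1 denominator)
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))

def leaking_points(fixed_traces, random_traces, threshold=4.5):
    """Return the sample indices where |t| exceeds the (assumed) threshold.

    Each argument is a list of traces; every trace is a list of samples
    of equal length. The test is applied independently at each sample index.
    """
    columns = zip(zip(*fixed_traces), zip(*random_traces))
    return [i for i, (f, r) in enumerate(columns) if abs(welch_t(f, r)) > threshold]
```

A sample index is reported only when the two trace sets differ there by more than the chosen threshold; an empty list means no leakage was detected at that confidence level, not that none exists.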

6.

An efficient architecture for designing reverse converters based on a general three-moduli set

Amir Sabbagh Keivan Omid Ali 《Journal of Systems Architecture》2008,54(10):929-934

In this paper, a high-speed, low-cost and efficient design of reverse converter for the general three-moduli set {2^α, 2^β − 1, 2^β + 1} where α < β is presented. The simple proposed architecture consists of a carry save adder (CSA) and a modulo adder. As a result it can be efficiently implemented in VLSI circuits. The values of α and β are set in order to provide the desired dynamic range and also to obtain a balanced moduli set. Based on the above, two new moduli sets {2^{n+k}, 2^{2n} − 1, 2^{2n} + 1} and {2^{2n−1}, 2^{2n+1} − 1, 2^{2n+1} + 1}, which are the special cases of the moduli set {2^α, 2^β − 1, 2^β + 1} are proposed. The reverse converters for these new moduli sets are derived from the proposed general architecture with better performance compared to the other reverse converters for moduli sets with similar dynamic range.
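To make the term "reverse converter" concrete, the sketch below converts residues for the moduli set {2^α, 2^β − 1, 2^β + 1} back to a binary integer using the textbook Chinese Remainder Theorem. This is a swapped-in reference method, not the CSA-plus-modulo-adder architecture the paper proposes, and all function names are hypothetical.

```python
from math import prod

def moduli_set(alpha: int, beta: int):
    """Return the three pairwise coprime moduli (2^alpha, 2^beta - 1, 2^beta + 1)."""
    return (1 << alpha, (1 << beta) - 1, (1 << beta) + 1)

def forward_convert(x: int, mods):
    """Binary to residue representation."""
    return tuple(x % m for m in mods)

def reverse_convert(residues, mods) -> int:
    """Residue to binary via the Chinese Remainder Theorem (Python 3.8+)."""
    M = prod(mods)                         # dynamic range
    x = 0
    for r, m in zip(residues, mods):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)       # pow(..., -1, m) is the modular inverse
    return x % M
```

With α = 3 and β = 4 the moduli are (8, 15, 17), giving a dynamic range of 2040, so any integer in [0, 2040) round-trips through `forward_convert` and `reverse_convert`.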

7.

An area efficient multi-mode quadruple precision floating point adder

《Microprocessors and Microsystems》2016

Most scientific and engineering applications require accurate computations. Double precision floating point computations are not enough for many applications like climate modelling, computational physics, etc. Efficient design of a quadruple precision floating point adder is needed for these applications. The proposed multi-mode quadruple precision floating point adder architecture supports four single precision operations in parallel, two double precision operations in parallel, or one quadruple precision operation. Compared to existing quadruple precision floating point adders and the dual-mode quadruple precision floating point adder, the proposed architecture can perform more computations with less area because of resource sharing among different precision operands. The proposed multi-mode quadruple precision adder supports both normal and subnormal operations, as well as exceptional case handling for infinity, Not a Number (NaN) and zero. The proposed adder has been designed and implemented in both ASIC and FPGA. In the ASIC implementation with 90 nm technology using the Synopsys tool, the proposed multi-mode quadruple precision floating point adder has a 38.57% smaller area compared to the existing quadruple precision floating point adder. Similarly, the proposed design reduces the area by 29.28% and 35.68% when implemented on Virtex 4 and Virtex 5 FPGAs respectively.

8.

High performance hardware support for elliptic curve cryptography over general prime field

《Microprocessors and Microsystems》2017

Secure information exchange in resource-constrained devices can be accomplished efficiently through elliptic curve cryptography (ECC). Due to the high computational complexity of ECC arithmetic, a dedicated high-performance hardware architecture is essential to provide sufficient performance in computing elliptic curve scalar multiplication. This paper presents high-performance hardware support for elliptic curve cryptography over a prime field GF(p). It exploits the best available parallelism of elliptic curve points in projective representation. The proposed hardware for ECC is implemented on Xilinx Virtex-4, Virtex-5 and Virtex-6 FPGAs. A 256-bit scalar multiplication is completed in 2.01 ms, 2.62 ms and 3.91 ms on Virtex-6, Virtex-5 and Virtex-4 FPGA platforms, respectively. The results show that the proposed design is 1.96 times faster, with an insignificant increase in area consumption, compared to the other reported designs. Therefore, it is a good choice for use in many ECC-based schemes.
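The scalar multiplication that this abstract accelerates can be sketched in software as follows. This is a plain double-and-add loop in affine coordinates on a toy curve, chosen only for readability; the paper works in projective representation and exploits hardware parallelism, neither of which is modelled here. The curve y^2 = x^3 + 2x + 2 over GF(17) with base point (5, 1) is an illustrative assumption, not taken from the abstract.

```python
# Toy sketch: elliptic curve scalar multiplication over GF(p), affine coordinates.
# The point at infinity is represented by None. Requires Python 3.8+.

def point_add(P, Q, a, p):
    """Add two points on y^2 = x^3 + a*x + b over GF(p)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P + (-P) = infinity
    if P == Q:
        s = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p      # tangent slope
    else:
        s = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p           # chord slope
    x3 = (s * s - x1 - x2) % p
    y3 = (s * (x1 - x3) - y1) % p
    return (x3, y3)

def scalar_mult(k, P, a, p):
    """Compute k*P with right-to-left double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, P, a, p)
        P = point_add(P, P, a, p)
        k >>= 1
    return result
```

This loop is not constant-time, since the addition is skipped for zero bits of the scalar, so it is unsuitable where side-channel resistance matters.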

9.

Research and Implementation of a Scalable Dual-Field Modular Adder-Subtractor


Jun ZHANG Zibin DAI Qiang MENG Fan QIN 《计算机工程》 (Computer Engineering) 2010,36(8):158-160

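Building on an improved general modular addition and subtraction algorithm, a structurally optimized modular adder-subtractor is implemented. A word-based unified hardware architecture for modular addition and subtraction gives the design good scalability, so that it can perform modular addition and subtraction on operands of arbitrary length over the prime finite field GF(p) and the binary finite field GF(2^m). The design introduces a pipeline structure that raises its working efficiency by 50% to 80%, and it can be applied to the design of various high-performance elliptic curve cryptography coprocessors.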
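A word-serial software model may help show why one datapath can serve both fields: GF(p) needs carry propagation between words plus one correction step, whereas GF(2^m) addition and subtraction are both a carry-free XOR. The sketch below is an assumption-laden illustration, not the design from the paper; the word width, the function names and the correction strategy are all hypothetical, and pipelining is not modelled.

```python
def to_words(x, n, w):
    """Split x into n little-endian words of w bits."""
    return [(x >> (w * i)) & ((1 << w) - 1) for i in range(n)]

def from_words(words, w):
    """Reassemble little-endian w-bit words into an integer."""
    return sum(v << (w * i) for i, v in enumerate(words))

def word_add(xs, ys, w, sub=False):
    """Word-serial x + y, or x - y as x + ~y + 1. Returns (words, carry_out).

    For subtraction, carry_out == 1 means no borrow occurred (x >= y).
    """
    mask = (1 << w) - 1
    carry = 1 if sub else 0
    out = []
    for x, y in zip(xs, ys):
        t = x + ((~y & mask) if sub else y) + carry
        out.append(t & mask)
        carry = t >> w
    return out, carry

def dual_field_addsub(a, b, p, n, w=32, sub=False, binary=False):
    """a +/- b in GF(p) (operands < p < 2^(n*w)), or a XOR b in GF(2^m)."""
    A, B, P = (to_words(v, n, w) for v in (a, b, p))
    if binary:                                   # GF(2^m): add == sub == XOR
        return from_words([x ^ y for x, y in zip(A, B)], w)
    if sub:
        d, no_borrow = word_add(A, B, w, sub=True)
        if not no_borrow:                        # result went negative: add p back
            d, _ = word_add(d, P, w)
        return from_words(d, w)
    s, carry = word_add(A, B, w)
    d, no_borrow = word_add(s, P, w, sub=True)   # trial subtraction of p
    return from_words(d if (carry or no_borrow) else s, w)
```

Because operands are processed one word at a time, longer operands only require a larger word count `n`, which mirrors the scalability argument in the abstract.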

10.

On Cryptographic Propagation Criteria for Boolean Functions

Claude Carlet 《Information and Computation》1999,151(1-2)

We determine the functions on GF(2)ⁿ which satisfy the propagation criterion of degree n−2, PC(n−2). We study subsequently the propagation criterion of degree ℓ and order k and its extended version EPC. We determine those Boolean functions on GF(2)ⁿ which satisfy PC(ℓ) of order kn−ℓ−2. We show that none of them satisfies EPC(ℓ) of the same order. We finally give a general construction of nonquadratic functions satisfying EPC(ℓ) of order k. This construction uses the existence of nonlinear, systematic codes with good minimum distances and dual distances (e.g., Kerdock codes and Preparata codes).
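As background for this abstract, the standard definition of the propagation criterion can be checked by brute force: a Boolean function f on GF(2)^n satisfies PC(ℓ) when f(x) ⊕ f(x ⊕ a) is balanced for every nonzero a of Hamming weight at most ℓ. The sketch below tests exactly that and nothing more; it does not cover the order-k or EPC variants studied in the paper, and the two example functions are illustrative choices.

```python
from itertools import product

def satisfies_pc(f, n, degree):
    """True if f(x) XOR f(x XOR a) is balanced for all a with 1 <= wt(a) <= degree."""
    points = list(product((0, 1), repeat=n))
    for a in points:
        if not 1 <= sum(a) <= degree:
            continue
        ones = sum(f(x) ^ f(tuple(xi ^ ai for xi, ai in zip(x, a))) for x in points)
        if 2 * ones != len(points):        # derivative in direction a is unbalanced
            return False
    return True

def bent4(x):
    """x1*x2 XOR x3*x4, a bent function on GF(2)^4."""
    return (x[0] & x[1]) ^ (x[2] & x[3])

def linear2(x):
    """x1 XOR x2, whose derivatives are all constant."""
    return x[0] ^ x[1]
```

The exhaustive loop costs on the order of 4^n evaluations, so it is only practical for small n.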

11.

Area and delay efficient GDI based accuracy configurable adder design

《Microprocessors and Microsystems》2020

Adders are among the most fundamental arithmetic circuits in a system, and their performance affects the overall performance of the system. Traditional n-bit adders provide precise results, but their critical path delay is bounded below by Ω(log n). To achieve a critical path delay lower than Ω(log n), many inaccurate adders have been proposed. These inaccurate adders decrease the overall critical path delay and improve the speed of computation by sacrificing accuracy or predicting the computation results. In this work, a fast reconfigurable approximate ripple carry adder is proposed using a GDI (Gate Diffusion Input) passing cell. Here, the GDI cell acts as a reconfigurable cell that connects either the previous carry value or an approximated value into the adder chain. The adder can thus be configured as an accurate or inaccurate adder by selecting the working mode of the GDI cell. The implementation results show that, in the approximate working mode, the proposed 64-bit adder provides up to 23%, 34% and 95% reductions in area, power and delay, respectively, compared to those of the existing adder.

12.

Scalable Parallel Algorithms for Geometric Pattern Recognition

Laurence Boxer Russ Miller Andrew Rau-Chaplin 《Journal of Parallel and Distributed Computing》1999,58(3):477

This paper considers a variety of geometric pattern recognition problems on input sets of size n using a coarse grained multicomputer model consisting of p processors with Ω(n/p) local memory each (i.e., Ω(n/p) memory cells of Θ(log n) bits apiece), where the processors are connected to an arbitrary interconnection network. It introduces efficient scalable parallel algorithms for a number of geometric problems including the rectangle finding problem, the maximal equally spaced collinear points problem, and the point set pattern matching problem. All of the algorithms presented are scalable in that they are applicable and efficient over a very wide range of ratios of problem size to number of processors. In addition to the practicality imparted by scalability, these algorithms are easy to implement in that all required communications can be achieved by a small number of calls to standard global routing operations.

13.

New Minimum Distance Bounds for Linear Codes over Small Fields

Daskalov R. N. Gulliver T. A. 《Problems of Information Transmission》2001,37(3):206-215

Let [n, k, d]_q-codes be linear codes of length n, dimension k, and minimum Hamming distance d over GF(q). In this paper we consider codes over GF(3), GF(5), GF(7), and GF(8). Over GF(3), three new linear codes are constructed. Over GF(5), eight new linear codes are constructed and the nonexistence of six codes is proved. Over GF(7), the existence of 33 new codes is proved. Over GF(8), the existence of ten new codes and the nonexistence of six codes are proved. All of these results improve the corresponding lower and upper bounds in Brouwer's table [www.win.tue.nl/aeb/voorlincod.html].

14.

Design and analysis of high-speed 8-bit ALU using 18 nm FinFET technology

Shylashree N. Venkatesh B. Saurab T. M. Srinivasan Tarun Nath Vijay 《Microsystem Technologies》2019,25(6):2349-2359

All modern computational devices contain an ALU. With the increasing complexity of software and its consistent shift towards parallelism, high-speed processors with hardware support for time-consuming operations such as multiplication are beneficial. Smaller, compact devices such as IoT devices need to run software such as security software and be able to offload computation cost from the cloud. In this paper, a high-speed 8-bit ALU using 18 nm FinFET technology is proposed. The arithmetic and logic unit consists of fast compute units such as a Kogge-Stone fast adder and a Dadda multiplier along with basic logic gates. Each compute unit is optimized for speed while consuming area responsibly. The Dadda multiplier has an 8 × 8 architecture, as opposed to the conventional 4 × 4 approach, making it a true 8-bit ALU. Simulation and analysis are done using Cadence Virtuoso in the Analog Design Environment. The transistor count of the proposed design is 5298, the power consumption is 219 µW and the maximum delay is 166.8 ps. The design is also expected to consume a maximum of one clock cycle for any computation.
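The Kogge-Stone adder named in this abstract computes all carries in log2(width) parallel-prefix stages. The bit-parallel model below sketches that structure in software under stated assumptions: no carry-in, a default width of 8 bits, and hypothetical names. It says nothing about the paper's transistor-level FinFET implementation.

```python
def kogge_stone_add(a: int, b: int, width: int = 8):
    """Return (sum mod 2^width, carry_out) using Kogge-Stone prefix stages."""
    mask = (1 << width) - 1
    g = a & b & mask              # per-bit generate
    p = (a ^ b) & mask            # per-bit propagate
    half_sum = p                  # kept for the final sum bits
    d = 1
    while d < width:              # log2(width) stages, span doubles each time
        g = (g | (p & (g << d))) & mask    # G[i] = G[i] | (P[i] & G[i-d])
        p = (p & (p << d)) & mask          # P[i] = P[i] & P[i-d]
        d <<= 1
    carries = (g << 1) & mask     # carry into bit i is the group generate of bits 0..i-1
    carry_out = (g >> (width - 1)) & 1
    return half_sum ^ carries, carry_out
```

For 8-bit operands the loop runs three stages (spans 1, 2 and 4), compared with up to eight sequential carry steps in a ripple carry adder.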


15.

A class of injective compressing maps on linear recurring sequences over a Galois ring

D. N. Bylkov 《Problems of Information Transmission》2010,46(3):245-252

We consider pseudorandom sequences v over a field GF(p^r) obtained by mapping ℓ-grams of a linear recurring sequence u over a Galois ring to an arbitrary coordinate set. We study the possibility of uniquely reconstructing u given v. Earlier known results are briefly overviewed.

16.

Fast Evaluation of Interlace Polynomials on Graphs of Bounded Treewidth

Markus Bläser Christian Hoffmann 《Algorithmica》2011,61(1):3-35

We consider the multivariate interlace polynomial introduced by Courcelle (Electron. J. Comb. 15(1), 2008), which generalizes several interlace polynomials defined by Arratia, Bollobás, and Sorkin (J. Comb. Theory Ser. B 92(2):199–233, 2004) and by Aigner and van der Holst (Linear Algebra Appl., 2004). We present an algorithm to evaluate the multivariate interlace polynomial of a graph with n vertices given a tree decomposition of the graph of width k. The best previously known result (Courcelle, Electron. J. Comb. 15(1), 2008) employs a general logical framework and leads to an algorithm with running time f(k)⋅n, where f(k) is doubly exponential in k. Analyzing the GF(2)-rank of adjacency matrices in the context of tree decompositions, we give a faster and more direct algorithm. Our algorithm uses 2^{3k^2+O(k)}·n arithmetic operations and can be efficiently implemented in parallel.

17.

Fully Scalable Fault-Tolerant Simulations for BSP and CGM

Sung-Ryul Kim Kunsoo Park 《Journal of Parallel and Distributed Computing》2000,60(12):10

In this paper we consider general simulations of algorithms designed for fully operational BSP and CGM machines on machines with faulty processors. The BSP (or CGM) machine is a parallel multicomputer consisting of p processors for which a memory of n words is evenly distributed and each processor can send and receive at most h messages in a superstep. The faults are deterministic (i.e., worst-case distributions of faults are considered) and static (i.e., they do not change in the course of computation). We assume that a constant fraction of processors are faulty. We present two fault-tolerant simulation techniques for BSP and CGM: 1. A deterministic simulation that achieves O(1) slowdown for local computations and O((log_h p)²) slowdown for communications per superstep, provided that a preprocessing is done that requires O((log_h p)²) supersteps and linear (in h) computation per processor in each superstep. 2. A randomized simulation that achieves O(1) slowdown for local computations and O(log_h p) slowdown for communications per superstep with high probability, after the same (deterministic) preprocessing as above. Our results are fully scalable over all values of p from Θ(1) to Θ(n). Furthermore, our results imply that if pn for 0<<1 and h=Θ((n/p)^δ) for 0<δ1 (which hold in almost all practical BSP and CGM computations), algorithms can be made resilient to a constant fraction of processor faults without any asymptotic slowdown.

18.

An efficient signed digit montgomery multiplication for RSA

Daesung Lim Nam Su Chang Sung Yeon Ji Chang Han Kim Sangjin Lee Young-Ho Park 《Journal of Systems Architecture》2009,55(7-9):355-362


19.

Alternating optimization to solve penalized regression‐based clustering model

Mohammad Barati Mehrdad Jalali Yahya Forghani 《Expert Systems》2019,36(6)

Two previously proposed heuristic algorithms for solving the penalized regression‐based clustering model (PRClust) are (a) an algorithm that combines difference‐of‐convex programming with a coordinate‐wise descent algorithm (DC‐CD) and (b) an algorithm that combines DC with the alternating direction method of multipliers (DC‐ADMM). In this paper, a faster method is proposed for solving PRClust. DC‐CD uses p × n × (n − 1)/2 slack variables to solve PRClust, where n is the number of data points and p is the number of their features. In each iteration of DC‐CD, these slack variables and the cluster centres are updated using second‐order cone programming (SOCP). DC‐ADMM uses p × n × (n − 1) slack variables. In each iteration of DC‐ADMM, these slack variables and the cluster centres are updated using ADMM. In this paper, PRClust is reformulated into an equivalent model to be solved using alternating optimization. Our proposed algorithm needs only n × (n − 1)/2 slack variables, far fewer than DC‐CD and DC‐ADMM, and updates them analytically using a simple equation in each iteration. Our proposed algorithm updates only the cluster centres using an SOCP. Therefore, our proposed SOCP is much smaller than that of DC‐CD, which is used to update both cluster centres and slack variables. Experimental results on real datasets confirm that our proposed method is faster and much faster than DC‐ADMM and DC‐CD, respectively.

20.

Vapour sensing with conductive polymer nanocomposites (CPC): Polycarbonate-carbon nanotubes transducers with hierarchical structure processed by spray layer by layer

Jianbo Bijandra Mickaël Jean-François 《Sensors and actuators. B, Chemical》2009,140(2):451-460

The development of conductive polymer nanocomposite (CPC) sensors for volatile organic compound (VOC) detection has been carried out using a spray layer by layer (LbL) process. This technique was successfully used to hierarchically structure polycarbonate-multiwall carbon nanotube (PC-CNT) solutions into a double percolated architecture, as attested by atomic force microscopy (AFM) and optical microscopy (OM). The vapour sensing behaviour of PC-CNT was investigated as a function of CNT content, film thickness, vapour flow and vapour solubility parameter. The response ranking A_r(toluene) > A_r(methanol) > A_r(water) of PC-CNT was found to be coherent with the κ₁₂ Flory–Huggins interaction parameters, provided that signals are normalised by the number of analyte molecules. The signal shape was interpreted in the light of the Langmuir–Henry–Clustering (LHC) model and found to be proportional to vapour content.