Similar Articles
1.
Multiprocessor embedded systems integrate diverse dedicated processing units to handle high-performance applications such as multimedia and network processing. However, lock-based synchronization limits the efficiency of such heterogeneous concurrent systems. Hardware Transactional Memory (HTM) is a promising approach for creating an abstraction layer for multi-threaded programming. However, HTM performance is application-specific and is determined by the version and conflict management configurations. Most previous HTM implementations for embedded systems in the literature use fixed version management, which results in significant performance loss when transaction behaviour changes. In this paper, we propose an HTM targeted at embedded applications that can adapt its version management to application behaviour at runtime. It is prototyped and analysed on an Altera Cyclone IV platform. Random requests at different contention levels and with different transaction sizes are used to verify the performance of the proposed HTM. In our experiments, lazy version management obtains up to 12.82% speed-up over eager version management at high contention levels, whereas eager version management obtains up to 37.84% speed-up over lazy version management at low contention. The adaptive mechanism switches configuration at runtime based on application behaviour to maximize performance.
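The trade-off between the two version-management policies can be illustrated with a minimal software model. This is a sketch under stated assumptions, not the paper's hardware design: memory is modelled as a Python dict, conflict detection is omitted entirely, and `choose_versioning` with its 0.3 abort-rate threshold is a hypothetical stand-in for the paper's runtime adaptation mechanism.

```python
_MISSING = object()  # marks addresses that did not exist before the transaction


class EagerTx:
    """Eager versioning: update memory in place, keep old values in an undo log."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = {}

    def read(self, addr):
        return self.memory.get(addr)

    def write(self, addr, value):
        # Log the pre-transaction value only on the first write to addr.
        if addr not in self.undo_log:
            self.undo_log[addr] = self.memory.get(addr, _MISSING)
        self.memory[addr] = value

    def commit(self):
        self.undo_log.clear()  # cheap: memory already holds the new values

    def abort(self):
        # Expensive: roll every logged address back to its old value.
        for addr, old in self.undo_log.items():
            if old is _MISSING:
                del self.memory[addr]
            else:
                self.memory[addr] = old
        self.undo_log.clear()


class LazyTx:
    """Lazy versioning: buffer writes privately, publish them at commit."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}

    def read(self, addr):
        # A transaction must see its own buffered writes first.
        return self.write_buffer.get(addr, self.memory.get(addr))

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def commit(self):
        self.memory.update(self.write_buffer)  # expensive: flush the buffer
        self.write_buffer.clear()

    def abort(self):
        self.write_buffer.clear()  # cheap: just discard the buffer


def choose_versioning(aborts, commits, threshold=0.3):
    """Hypothetical policy: high abort rate (contention) -> lazy, else eager."""
    total = aborts + commits
    rate = aborts / total if total else 0.0
    return LazyTx if rate > threshold else EagerTx
```

Eager versioning makes commits cheap and aborts expensive, while lazy versioning does the opposite. Since aborts are frequent under high contention, this model is consistent with lazy versioning winning at high contention and eager versioning winning at low contention, as reported above.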

2.
3.
Hardware implementations of cryptosystems are susceptible to fault attacks. By analyzing side-channel information from an implementation, an attacker can retrieve secret information. Generally, in hardware implementations, results are validated only at the end of the computation. If faults are injected at the input side of the computation, every computation performed afterwards is wasted, and this situation can potentially leak the secret key through side-channel attacks. This work proposes a fault-attack-resistant implementation of an elliptic curve cryptosystem using a shared point validator unit, a zero-one detector, and a double coherence check based on a modified Montgomery Powering Ladder algorithm. The architecture is robust against fault attacks while remaining power- and area-efficient.
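A coherence check on the Montgomery powering ladder relies on an invariant: the two registers always differ by exactly one application of the base. The sketch below shows this for modular exponentiation, where the invariant is r1 = r0 * x mod n (on an elliptic curve the analogue is R1 - R0 = P). It is an illustrative assumption, not the paper's double coherence check, and the `fault_step` parameter is hypothetical, existing only to simulate a fault.

```python
def ladder_pow_checked(x, k, n, fault_step=None):
    """Compute x**k mod n with the Montgomery powering ladder.

    The ladder keeps the invariant r1 == r0 * x (mod n). It is re-checked
    before returning, so a corrupted register is detected instead of output.
    fault_step is a hypothetical hook that flips one bit of r0 after the
    given iteration, purely to demonstrate detection.
    """
    x %= n
    r0, r1 = 1 % n, x
    for i, bit in enumerate(bin(k)[2:]):   # exponent bits, MSB first
        if bit == "1":
            r0, r1 = (r0 * r1) % n, (r1 * r1) % n
        else:
            r0, r1 = (r0 * r0) % n, (r0 * r1) % n
        if i == fault_step:
            r0 ^= 1                        # simulated single-bit fault
    if r1 != (r0 * x) % n:                 # coherence check on the invariant
        raise RuntimeError("ladder invariant violated: possible fault")
    return r0
```

Every key bit costs one multiplication and one squaring regardless of its value, which is why the ladder is also popular as a simple-power-analysis countermeasure.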

4.
This paper addresses public key cryptosystems based on elliptic curves, aimed at high-performance digital signature schemes. Elliptic curve algorithms are characterized by the fact that they can work with considerably shorter keys than the RSA approach at the same level of security. A general and highly efficient method for mapping the most time-critical operations to a configurable co-processor is proposed. Using real-time measurements, the resulting performance values are compared with previously published state-of-the-art hardware implementations.

A generator-based approach is advocated for this purpose, which supports application-specific co-processor configurations in a flexible and straightforward way. Such a configurable CryptoProcessor has been integrated into a Java-based digital signature environment, considerably increasing its performance. The outlined approach uniquely combines the advantages of mapping functionality to either hardware or software, and it results in high-speed cryptosystems that are both portable and easy to update to meet future security requirements.


5.
Research on FPGA-Based Elliptic Curve Cryptosystems   Total citations: 2; self-citations: 2; citations by others: 0
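A comprehensive study of FPGA-based elliptic curve cryptosystem implementation is presented, and all algorithms for binary finite-field arithmetic and elliptic curve point operations are implemented on a Xilinx FPGA. Modular multiplication, modular inversion, point addition, point subtraction, point multiplication, the EC-ElGamal encryption/decryption scheme and bus command control were simulated, synthesized and verified at board level on the FPGA, and an encryption/decryption adapter card with PCI local bus transfer capability was designed. The study also proposes a new implementation scheme for bit-serial modular multiplication based on normal and canonical bases.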
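Bit-serial multiplication in GF(2^m) processes one bit of the multiplier per clock cycle. The sketch below models the common MSB-first polynomial-basis variant in software; it does not reproduce the paper's normal-basis or canonical-basis multipliers, and the function name and the integer encoding of field elements (bit i is the coefficient of z^i) are assumptions made for illustration.

```python
def gf2m_mul(a, b, poly, m):
    """MSB-first bit-serial multiplication in GF(2^m), polynomial basis.

    a, b: field elements as ints below 2**m (bit i = coefficient of z^i).
    poly: reduction polynomial including its z^m term (e.g. 0x11B for m = 8).
    """
    result = 0
    for i in range(m - 1, -1, -1):   # one multiplier bit per "clock cycle"
        result <<= 1                 # multiply the partial product by z
        if (result >> m) & 1:
            result ^= poly           # reduce: cancel the z^m term
        if (b >> i) & 1:
            result ^= a              # conditional add (XOR in characteristic 2)
    return result
```

An m-bit product takes m iterations, each a shift plus two conditional XORs, which maps directly onto a small shift-register datapath. With m = 8 and poly = 0x11B (the AES field), gf2m_mul(0x57, 0x83, 0x11B, 8) returns 0xC1, matching the worked example in FIPS 197.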

6.
A central geometric structure in applications such as robotic path planning and hidden-line elimination in computer graphics is the visibility graph. This paper presents a new parallel algorithm for constructing the reduced visibility graph in a convex polygonal environment. The computational complexity is $O(p^2 \log(n/p))$, where $p$ is the number of objects and $n$ is the total number of vertices. A key feature of the algorithm is that it maps easily to hardware. The algorithm has been simulated (and verified) in C. Results of the hardware implementation show that the design operates at high speed while requiring little space. In particular, the hardware implementation operates at approximately 53 MHz and accommodates the reduced visibility graph of an environment with 80 vertices in a single XCV3200E device.

7.
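Building on a pipelined array architecture for high-radix Montgomery modular multiplication with simplified quotient determination, a hardware architecture for elliptic curve scalar multiplication over prime fields is proposed and implemented. The architecture uses point addition and point doubling algorithms in modified Jacobian coordinates, together with the Montgomery modular inversion algorithm proposed by Kaliski. Experimental results show that the architecture performs better than related work.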
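Montgomery modular multiplication replaces division by the modulus with shifts by a power of two. The sketch below is the textbook whole-word REDC formulation with R = 2**k; a high-radix pipelined array like the one described above instead consumes the operand a few bits per stage, so treat this only as a functional reference model. It assumes an odd modulus n < R and inputs already reduced below n, and needs Python 3.8+ for `pow(n, -1, R)`.

```python
def montgomery_setup(n, k):
    """Return (R, n_prime) with R = 2**k and n * n_prime == -1 (mod R).

    Requires an odd modulus n with n < R.
    """
    R = 1 << k
    n_prime = (-pow(n, -1, R)) % R
    return R, n_prime


def mont_mul(a, b, n, k, n_prime):
    """Montgomery product a * b * R^-1 mod n (REDC), for 0 <= a, b < n."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # m = t * n' mod R, so t + m*n == 0 (mod R)
    u = (t + m * n) >> k                # exact division by R: just a shift
    return u - n if u >= n else u       # at most one final subtraction
```

Operands are first mapped into Montgomery form (x * R mod n), multiplied with `mont_mul` as often as needed, and mapped back with `mont_mul(c, 1, n, k, n_prime)`. The conversion cost is amortized over the many multiplications of a scalar multiplication.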

8.
This paper presents a concept for automated architecture synthesis of adaptive multiprocessors on chip, in particular for Field-Programmable Gate Array (FPGA) devices. Given a parallel program, the goal is to allocate processor resources and the corresponding communication network while simultaneously mapping the parallel application, so as to obtain an optimal application-specific architecture. This approach builds on a previously proposed design platform that automates system integration and FPGA synthesis for such architectures. As a result, the overall concept offers an automated design flow from application mapping to system and FPGA configuration. The automated synthesis is based on combinatorial optimization. Automation is possible because a solvable Integer Linear Programming (ILP) model that captures all the necessary design trade-off parameters of such systems has been found. Experimental results on the feasibility of the automated synthesis indicate that problems of the sizes encountered in the embedded domain can be readily solved. The results underscore the need for automated synthesis for design space exploration.

9.
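To meet the requirements of high-performance, low-power wireless communication, this paper starts from a Turbo decoder architecture based on reverse recalculation and linear estimation and, by changing the way its forward state metrics are stored, proposes a low-power decoder architecture with reduced memory capacity, together with an FPGA implementation. Results show that, compared with existing Turbo decoder architectures, the proposed design reduces memory capacity by 65% while achieving decoding performance close to that of the Log-MAP algorithm. Compared with a conventional decoder architecture at 25, 50, 75, 100 and 125 MHz, the dynamic power of the memory drops by about 50% in every case, and the total power is reduced by 4.97%, 8.78%, 11.93%, 14.18% and 14.65%, respectively.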
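The Log-MAP algorithm computes its forward (and backward) state metrics with the Jacobian logarithm, often written max*. The sketch below shows max* and one forward-metric update for a single trellis state; the trellis connectivity and branch metrics are placeholders, not the paper's decoder structure or its reverse-recalculation scheme.

```python
import math
from functools import reduce


def max_star(a, b):
    """Jacobian logarithm: ln(e^a + e^b) = max(a, b) + ln(1 + e^-|a-b|)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def forward_metric(prev_alphas, gammas):
    """alpha_k(s) = max* over incoming branches of alpha_{k-1}(s') + gamma_k(s', s).

    prev_alphas: forward metrics of the predecessor states of s
    gammas: branch metrics of the corresponding incoming transitions
    """
    return reduce(max_star, (a + g for a, g in zip(prev_alphas, gammas)))
```

A decoder typically has to keep such alpha values for every state across a decoding window until the backward pass consumes them, which helps explain why the storage of forward state metrics is the lever the abstract targets. Dropping the log1p correction term gives the cheaper Max-Log-MAP approximation.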

10.
Wang Meng, Huang Zhen, Lu Jianhua. Microcomputer Information, 2007, 23(26): 201-203
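The direction of arrival (DOA) of pulses is an important parameter for pulse signal sorting. Existing DOA-based pulse sorting relies on conventional serial clustering algorithms, which have poor real-time performance. For threshold-segmentation clustering, this paper designs a real-time clustering algorithm based on a parallel pipelined structure, so that clustering of a single DOA completes in a single cycle, and it ensures the stability and effectiveness of the algorithm by controlling cases where clusters split into too many groups. The paper also describes how the algorithm is implemented on an FPGA, reports its real-time performance on a Xilinx V2P device, and compares and analyzes its clustering performance.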
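A serial reference model of threshold-based DOA clustering helps clarify what a parallel pipeline would compute. In this sketch, whose assignment rule is an assumption rather than the paper's exact method, each pulse joins the first cluster whose centre lies within `threshold` degrees and otherwise opens a new cluster; centres are running means, angle wrap-around at 0/360 degrees is ignored, and the paper's control of excessive cluster splitting is not modelled.

```python
def threshold_cluster(doas, threshold):
    """Cluster DOA values (degrees) by a distance threshold to cluster centres.

    Returns (labels, centres): labels[i] is the cluster index of doas[i],
    centres holds the running-mean DOA of each cluster.
    """
    centres, counts, labels = [], [], []
    for doa in doas:
        for idx, centre in enumerate(centres):
            if abs(doa - centre) <= threshold:
                counts[idx] += 1
                centres[idx] += (doa - centre) / counts[idx]  # running mean
                labels.append(idx)
                break
        else:
            # No centre close enough: open a new cluster.
            centres.append(float(doa))
            counts.append(1)
            labels.append(len(centres) - 1)
    return labels, centres
```

In a parallel hardware version the inner loop could be replaced by one comparator per cluster evaluating all centres at once, followed by a priority encoder selecting the first match, which is one way a single DOA could be clustered in one cycle.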

11.
12.
A fast parallel architecture for elliptic curve scalar multiplication over binary fields is presented. The proposed architecture is implemented on a single-chip FPGA device using parallel strategies that trade area for timing performance. The results show that the proposed design computes an elliptic curve scalar multiplication over $GF(2^{191})$ in 63 μs.

13.
Eric Bruneton, Michel Riveill. Software, 2001, 31(13): 1237-1264
This article presents a middleware platform architecture whose goals, motivated by the needs of a real‐world application, are the following: separation of functional and non‐functional code in applications, composition of non‐functional properties, and modularity and extensibility of the middleware platform itself. This architecture is inspired by the Enterprise Java Beans platform, and uses a new object composition model to separate and compose the non‐functional properties. In order to evaluate this architecture, we have implemented the JavaPod platform which we have used to implement a prototype of the application that motivated our goals. The results of these experiments show that our goals can indeed be achieved with our architecture. Copyright © 2001 John Wiley & Sons, Ltd.

14.
Discrete relaxation techniques have proven useful in solving a wide range of problems in digital signal and digital image processing, artificial intelligence, operations research, and machine vision. Much work has been devoted to finding efficient hardware architectures. This paper shows that a conventional hardware design for a Discrete Relaxation Algorithm (DRA) suffers from $O(n^2 m^3)$ time complexity and $O(n^2 m^2)$ space complexity. By reformulating DRA as a parallel computational tree and using a multiple tree-root pipelining scheme, the time complexity is reduced to $O(nm)$, while the space complexity is reduced by a factor of 2. For certain relaxation processing, the space complexity can even be decreased to $O(nm)$. Furthermore, a technique for dynamically configuring an architectural wavefront is used, which leads to an $O(n)$-time, highly concurrent DRA3 architecture.
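Discrete relaxation repeatedly deletes candidate labels that have no compatible support among a neighbour's remaining labels, until nothing changes. The sketch below is a straightforward sequential formulation in the spirit of arc consistency, i.e. the kind of baseline whose cost the paper sets out to reduce; it is not the parallel tree or pipelined DRA3 architecture, and the data layout (dicts of label sets, a `compatible` callback) is an assumption.

```python
def discrete_relaxation(labels, neighbours, compatible):
    """Iteratively delete labels lacking support until a fixed point.

    labels: dict object -> iterable of candidate labels (not mutated)
    neighbours: dict object -> iterable of neighbouring objects
    compatible(i, a, j, b): True if label a on i is consistent with b on j
    """
    labels = {i: set(ls) for i, ls in labels.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for i in labels:
            for a in list(labels[i]):
                # Label a survives only if every neighbour still has some
                # label compatible with it.
                supported = all(
                    any(compatible(i, a, j, b) for b in labels[j])
                    for j in neighbours.get(i, ())
                )
                if not supported:
                    labels[i].discard(a)
                    changed = True
    return labels
```

For example, with three objects in a chain, A restricted to {1}, B and C allowed {1, 2}, and the constraint that neighbours must differ, relaxation leaves A = {1}, B = {2}, C = {1}. The nested passes over objects, labels, neighbours and their labels are what make the naive version expensive.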

15.
This paper describes the development of a flexible control system based on multiple T800 floating-point transputers. In Part 1 of this paper, entitled 'The application of transputers to distributed control', an overview of distributed command and control systems (DCCS) was presented and the suitability of the transputer for implementing such systems was discussed. Part 2 describes a transputer-based control module developed at the University of Paisley. The module supports several modes of operation: it can be configured either as a fixed controller whose coefficients are downloaded from a central control station, or as an adaptive controller using an explicit pole-placement or linear quadratic Gaussian (LQG) structure. A supervisor, or coordination level, is incorporated into the adaptive controller to monitor the various parameters produced within the controller and direct the system to maintain safe operation. This improves the applicability, robustness and integrity of the controller in real-time applications. Because software tasks can easily be distributed over different transputer architectures, the same software can be configured to run on between one and four transputers within the module. In this way the controller module can use different transputer configurations depending on cost/performance trade-offs. The use of the transputer also allows the controller to communicate easily through its serial links with other controllers, hosts and external devices, so the module can serve as a universal controller node that is easily incorporated into a large DCCS. Software has been developed to support this type of integrated environment, such that a central network interface can initialize, analyse and supervise a number of controller modules.

16.
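The speed of elliptic curve point multiplication determines the speed of an elliptic curve cryptography (ECC) implementation. Fast elliptic curve point multiplication is achieved using the Montgomery point multiplication algorithm, in which modular multiplication and modular squaring use fully parallel algorithms and modular inversion uses Fermat's little theorem, optimized in the implementation. Synthesis and implementation were completed targeting the XCV220T of the Xilinx Virtex-5 device family. Post-implementation timing simulation shows that the clock frequency reaches 40 MHz and a single point multiplication takes only 14.9 μs.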
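Montgomery point multiplication performs one point addition and one point doubling per key bit regardless of the bit's value, and a Fermat-based inversion turns inversion into a single exponentiation. The sketch below combines both on the toy curve y^2 = x^3 + 2x + 2 over F_17 in affine coordinates. These choices are assumptions for illustration: the abstract does not state the field, and practical designs use large fields and usually projective or x-only coordinates to avoid an inversion per step.

```python
# Toy curve y^2 = x^3 + 2x + 2 over F_17 (illustration only; real ECC uses
# fields of roughly 160 bits or more). None stands for the point at infinity.
P_MOD, A = 17, 2


def inv(a, p=P_MOD):
    """Inverse via Fermat's little theorem: a^(p-2) mod p, for prime p."""
    return pow(a, p - 2, p)


def add(P, Q, p=P_MOD, a=A):
    """Affine point addition, including doubling when P == Q."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                        # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * inv(2 * y1 % p, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * inv((x2 - x1) % p, p) % p        # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)


def ladder(k, P):
    """Montgomery ladder for kP: one addition and one doubling per key bit."""
    R0, R1 = None, P          # invariant: R1 - R0 == P
    for bit in bin(k)[2:]:
        if bit == "1":
            R0, R1 = add(R0, R1), add(R1, R1)
        else:
            R0, R1 = add(R0, R0), add(R0, R1)
    return R0
```

With G = (5, 1), which generates a group of order 19 on this curve, ladder(2, G) gives (6, 3) and ladder(19, G) gives None, the point at infinity.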

17.
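With the development of hybrid heterogeneous platforms, many different types of accelerator devices have appeared. This raises the questions of how to make full use of these different devices on a hybrid heterogeneous platform and how to deploy deep learning models across multiple computing devices, while training large and complex models is becoming ever more important. Data parallelism (DP) is the most widely used parallelization strategy, but as the number of devices in data-parallel training keeps increasing, the communication overhead between devices becomes a bottleneck. In addition, at each step, owing to device perf...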
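In data parallelism each device receives a shard of the batch and the per-device gradients are then combined, typically with an all-reduce; that combining step is the communication that grows into a bottleneck as devices are added. The toy sketch below, using a one-parameter linear model and hypothetical function names, shows that averaging equal-shard gradients reproduces the full-batch gradient. It does not model the heterogeneous-device scheduling the paper goes on to address.

```python
def grad_mse(w, xs, ys):
    """d/dw of mean((w*x - y)^2) for a one-parameter linear model."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_grad(w, xs, ys, num_devices):
    """Simulate data parallelism: shard the batch, compute local gradients,
    then average them as an all-reduce would."""
    if len(xs) % num_devices:
        raise ValueError("batch size must be divisible by num_devices")
    shard = len(xs) // num_devices
    local_grads = [
        grad_mse(w, xs[d * shard:(d + 1) * shard], ys[d * shard:(d + 1) * shard])
        for d in range(num_devices)
    ]
    # Stand-in for the all-reduce: every device would receive this same mean.
    return sum(local_grads) / num_devices
```

Equal shard sizes matter here: the mean of per-shard means equals the overall mean only when every shard has the same number of samples, which is one reason unequal device speeds complicate plain data parallelism.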

18.
E., S.B., B., I. Computers & Electrical Engineering, 2007, 33(5-6): 367-382
This paper describes the first differential power and electromagnetic analysis attacks performed on a hardware implementation of an elliptic curve cryptosystem. At the same time, we compare the metrics used in differential power and electromagnetic radiation attacks. We describe the use of the Pearson correlation coefficient, the distance-of-means test and the maximum likelihood test. For each metric, the number of measurements needed to clearly identify the correct guess of the key bit is taken as an indication of the metric's strength.
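The Pearson correlation metric compares measured leakage against the leakage predicted under each key guess, and the guess with the strongest correlation is taken as correct. The sketch assumes the predictions have already been computed from some leakage model (for example Hamming weight); `best_key_bit` is a hypothetical helper, not the paper's analysis code, and the distance-of-means and maximum-likelihood tests are not shown.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def best_key_bit(traces, predictions):
    """Return the guess whose predicted leakage best matches the traces.

    traces: one measured leakage sample per encryption
    predictions: dict guess -> predicted leakage for each encryption
    """
    return max(predictions, key=lambda g: abs(pearson(traces, predictions[g])))
```

The metric's "strength" in the sense used above can then be estimated by increasing the number of traces until the correct guess separates clearly from the wrong ones.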

19.
The growing importance of expert systems in real-time applications reveals the need to reduce response times. Since uniprocessor optimizations of production systems have been widely explored, only multiprocessor architectures appear able to provide further performance gains. Efficient exploitation of the inherent parallelism of production systems, however, requires suitable load-balancing algorithms that do not simultaneously increase organization or communication overhead. We present a novel parallel algorithm for PAMELA expert systems, based on dynamic distribution of data processing. The concept is supported by a transputer-based architecture with an advanced interconnection structure.

20.
Advances in interconnection network performance and interprocessor interaction mechanisms enable the construction of fine-grain parallel computers in which the nodes are physically small and have a small amount of memory. This class of machines has a much higher ratio of processor to memory area and hence provides greater processor throughput and memory bandwidth per unit cost relative to conventional memory-dominated machines. This paper describes the technology and architecture trends motivating fine-grain architecture and the enabling technologies of high-performance interconnection networks and low-overhead interaction mechanisms. We conclude with a discussion of our experiences with the J-Machine, a prototype fine-grain concurrent computer.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417