Similar literature
1.
The Network-on-Chip (NoC) paradigm brings networks inside chips. We use the routing capabilities of the NoC as a replacement for the Virtual Method Table (VMT) in Object-Oriented (OO) hardware/software co-designed systems, where some methods may be implemented as hardware modules. This eliminates the area and performance overhead of the VMT in OO co-designed embedded systems, where resources are limited and some functionality must be implemented in hardware to meet the system's performance goals. Our experimental results on real-world embedded applications show up to 32.15% lower area and up to 5.1% higher speed compared to a traditional implementation using a VMT.

2.
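Matrix multiplication underlies numerical analysis and graphics and image processing algorithms, and the design of general-purpose matrix multiplication accelerators has long been an active research topic in embedded system design. Because of its high computational complexity and low processing efficiency, however, matrix multiplication often becomes the speed bottleneck of an embedded system. To make matrix multiplication more usable in the embedded domain, this paper proposes a hardware/software co-designed acceleration architecture based on an MPSoC (MultiProcessor System-on-Chip). Within this architecture, a matrix blocking method oriented to hardware constraints is designed, yielding a general-purpose matrix multiplication accelerator system; in addition, task partitioning and load-balancing scheduling algorithms that exploit the multi-core MPSoC architecture are proposed, improving parallel efficiency and overall system speedup. Experimental results show that the proposed architecture and algorithms perform general matrix multiplication, and that the multi-core parallel scheduling algorithm realized through hardware/software co-design is significantly more computationally efficient than a traditional single-core design.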
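
The blocking idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual partitioning scheme, so the code below only assumes that the accelerator works on fixed-size square tiles; the function name `blocked_matmul` and the `block` parameter are hypothetical.

```python
def blocked_matmul(a, b, block=2):
    """Multiply a (n x k) by b (k x m) one tile at a time.

    `block` stands in for the largest tile edge a hardware unit could
    accept (an assumption, not a value taken from the paper).
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # Walk over tiles of the output and of the shared dimension.
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                # Inner loops touch only one block x block tile of each
                # operand; this is the unit of work an accelerator
                # (or one core of the MPSoC) would be handed.
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + block, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

Because each output tile `(i0, j0)` is computed independently of the others, output tiles are a natural unit for distributing work across cores, which is presumably where a load-balancing scheduler would come in.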

3.
A study of available hardware algorithms was made in order to design adaptive signal processors in VLSI. A suitable model based on synchrony, topology, and granularity has been chosen to investigate design figures of merit for each implementation. At present, redundant arithmetic is being contrasted with the alternatives, mainly because it allows carry-free operations and therefore a speedup. This paper focuses on models and primitive computational elements for the least-mean-square (LMS) algorithm embedded in conventional two's complement, bit-serial or distributed arithmetic, and redundant arithmetic processors.
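
For readers unfamiliar with the LMS algorithm named in this abstract, the following is a minimal floating-point sketch of the standard LMS update. It says nothing about the arithmetic styles the paper compares; the function name `lms_filter` and its parameters are illustrative assumptions.

```python
def lms_filter(x, d, num_taps, mu):
    """Adapt FIR weights so the filter output tracks the desired signal d.

    x: input samples, d: desired samples (same length),
    mu: step size (must be small enough for stability).
    Returns the final weights and the per-sample error sequence.
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Tap vector: most recent sample first, zeros before the start.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))   # filter output
        e = d[n] - y                               # estimation error
        # Standard LMS update: w <- w + mu * e * u
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]
        errors.append(e)
    return w, errors
```

Each iteration costs roughly `2 * num_taps` multiply-accumulate operations, which is why the choice of arithmetic for the multiplier and adder dominates a hardware design.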

4.
Security protocols such as IPSec, SSL and VPNs, used in many communication systems, employ various cryptographic algorithms to protect data from malicious attacks. Thanks to public-key cryptography, a public channel that is exposed to security risks can be used for secure communication in such protocols without the parties needing to agree on a shared key at the beginning of the communication. Public-key cryptosystems such as RSA, Rabin and ElGamal are used for security services such as key exchange and key distribution between communicating nodes, and in many authentication protocols. Such public-key cryptosystems usually depend on modular arithmetic operations, including modular multiplication and exponentiation. These computationally intensive operations are fundamental in many fields, including cryptography, number theory, and finite field arithmetic. This paper is devoted to the analysis of modular arithmetic operations and to improving the computation of modular multiplication and exponentiation from an FPGA-based hardware design perspective. Two well-known algorithms, Montgomery modular multiplication and the Karatsuba algorithm, are exploited together within our high-speed pipelined hardware architecture. Our proposed design presents an efficient solution for a range of applications where area and performance are both important. The proposed coprocessor is scalable, meaning that it supports different security levels at a cost in performance. We also build a system-on-chip design using Xilinx's latest Zynq-7000 family extensible processing platform to show how our proposed design improves the processing time of modular arithmetic operations for embedded systems.
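
As an illustration of the Montgomery modular multiplication this abstract refers to, here is a minimal software sketch of the textbook algorithm (Montgomery reduction with R = 2^bits). It is not the paper's pipelined architecture and omits the Karatsuba step; the function names are hypothetical, and the modular inverse via `pow(x, -1, m)` assumes Python 3.8 or later.

```python
def montgomery_setup(modulus, bits):
    """Precompute n' = -modulus^-1 mod R, with R = 2**bits."""
    if modulus % 2 == 0 or modulus >= (1 << bits):
        raise ValueError("modulus must be odd and smaller than 2**bits")
    r = 1 << bits
    return (-pow(modulus, -1, r)) % r


def montgomery_reduce(t, modulus, bits, n_prime):
    """Return t * R^-1 mod modulus for 0 <= t < modulus * R."""
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask      # m = (t mod R) * n' mod R
    u = (t + m * modulus) >> bits          # exact division by R
    return u - modulus if u >= modulus else u


def montgomery_modmul(a, b, modulus, bits):
    """Compute a * b mod modulus via the Montgomery domain."""
    n_prime = montgomery_setup(modulus, bits)
    a_bar = (a << bits) % modulus          # a * R mod N
    b_bar = (b << bits) % modulus          # b * R mod N
    prod_bar = montgomery_reduce(a_bar * b_bar, modulus, bits, n_prime)
    return montgomery_reduce(prod_bar, modulus, bits, n_prime)
```

The point of the technique is that `montgomery_reduce` needs only multiplications, masks and shifts, with no division by the modulus. The conversions into the Montgomery domain shown here do use `%`, so the approach pays off mainly when many multiplications are chained, as in modular exponentiation.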

5.
There has been increasing concern for the security of multimedia transactions over real-time embedded systems. Partial and selective encryption schemes have been proposed in the research literature, but these schemes significantly increase the computation cost, leading to tradeoffs in system latency, throughput, hardware requirements and power usage. In this paper, we propose a lightweight multimedia encryption strategy based on a modified discrete wavelet transform (DWT), which we refer to as the secure wavelet transform (SWT). The SWT provides joint multimedia encryption and compression through two modifications to traditional DWT implementations: (a) parameterized construction of the DWT and (b) subband re-orientation for the wavelet decomposition. The SWT has rational coefficients, which allow us to build a high-throughput hardware implementation in fixed-point arithmetic. We obtain a zero-overhead implementation on custom hardware. Furthermore, a look-up table based reconfigurable implementation allows us to allocate the encryption key to the hardware at run time. A direct implementation on a Xilinx Virtex FPGA gave a clock frequency of 60 MHz, while a reconfigurable multiplier based design gave an improved clock frequency of 114 MHz. The pipelined implementation of the SWT achieved a clock frequency of 240 MHz on a Xilinx Virtex-4 FPGA and met the timing constraint of 500 MHz on a standard cell realization using 45 nm CMOS technology.

6.
This paper presents a novel algorithm for field programmable gate array (FPGA) realization of vector quantizer (VQ) encoders using partial distance search (PDS). In most applications, the PDS is adopted as a software approach for attaining moderate codeword search acceleration. In this paper, a novel PDS algorithm well suited for hardware realization is proposed. The algorithm employs subspace search, bitplane reduction, and multiple-coefficient accumulation techniques for the effective reduction of the area complexity and computation latency. Concurrent encoding of different input vectors for further computation acceleration is also allowed by the employment of multiple-module PDS. The proposed implementation has been embedded in a softcore CPU for physical performance measurement. Experimental results show that the implementation provides a cost-effective solution to the FPGA realization of VQ encoding systems where both high throughput and high fidelity are desired.
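
The basic software form of partial distance search mentioned in this abstract can be sketched as follows. This is only the conventional PDS baseline, not the paper's hardware-oriented variant (subspace search, bitplane reduction and multiple-coefficient accumulation are not shown); the function name `pds_encode` is hypothetical.

```python
def pds_encode(vector, codebook):
    """Return (index, squared distance) of the nearest codeword.

    The squared distance to each codeword is accumulated one component
    at a time and abandoned as soon as it can no longer beat the best
    distance found so far.
    """
    best_index, best_dist = -1, float("inf")
    for index, codeword in enumerate(codebook):
        dist = 0.0
        for x, c in zip(vector, codeword):
            dist += (x - c) ** 2
            if dist >= best_dist:
                break          # early exit: this codeword cannot win
        else:
            # Loop finished without the early exit: new best codeword.
            best_index, best_dist = index, dist
    return best_index, best_dist
```

The result is identical to a full search; only the amount of arithmetic changes, and the saving depends on how early a good codeword is found.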

7.
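An algorithm for optimal strategies in a class of quantitative differential games and its convergence   Total citations: 3, self-citations: 0, citations by others: 3
Wu Hansheng (吴汉生). Acta Automatica Sinica (《自动化学报》), 1992, 18(2): 143-150
This paper uses the fixed-point principle to address the computation of optimal strategies in a class of quantitative differential games. An iterative method is first constructed, and its convergence is then analyzed using the fixed-point principle. The method given here can also be used for the decentralized computation of Nash strategies in a class of Nash differential games.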

8.
In this age, where new technological devices such as PDAs and mobile phones are becoming part of our daily lives, providing efficient implementations of suitable cryptographic algorithms for devices built on embedded systems is becoming increasingly important. This paper presents an efficient design of a high-performance hyperelliptic curve cryptosystem (HECC) for a field programmable gate array, which is well suited for embedded systems with limited resources such as memory, space and processing power. We investigate two architectures, one using a projective coordinate representation for hyperelliptic systems and the second using a mixed coordinate representation that eliminates the need for field inversions in the point arithmetic, which have been proven to be expensive in both time and space. In addition, both architectures are based on an explicit formula that allows one to compute the point arithmetic directly in the finite field, thereby eliminating a level of arithmetic. The operation time of the HECC system is also improved by considering simplifications of the hyperelliptic curve, which are accomplished through a simple transformation of variables. As a result, these implementations offer significantly faster operation time and smaller area consumption than other HECC hardware implementations done to date.

9.
This paper presents hardware designs, arithmetic algorithms, and numerical applications for variable-precision, interval arithmetic coprocessors. These coprocessors give the programmer the ability to set the initial precision of the computation, determine the accuracy of the results, and recompute inaccurate results with higher precision. Variable-precision, interval arithmetic algorithms are used to reduce the execution times of numerical applications. Three hardware designs with data paths of 16, 32, and 64 bits are examined. These designs are compared based on their estimated chip area, cycle time, and execution times for various numerical applications. Each coprocessor can be implemented on a single chip with a cycle time that is comparable to IEEE double-precision floating point coprocessors. For certain numerical applications, the coprocessors are two to four orders of magnitude faster than a conventional software package for variable-precision, interval arithmetic.

10.
Implementing intelligent and bio-inspired algorithms in industrial and real-world applications is arduous, time consuming and costly; in addition, many aspects of the system, from the high-level behavior of the algorithm to the energy consumption of the target system, must be considered simultaneously in the design process. Advances in hardware platforms such as DSPs, FPGAs and ASICs in recent years have made it increasingly possible to implement computationally complex intelligent systems; however, the design and testing costs of these systems are high. Reusability and extendibility of the developed models can decrease the total cost and time-to-market of an intelligent system. In this work, a model driven development approach is used to implement emotional learning, a bio-inspired algorithm, for embedded purposes. Recent studies show that emotion is a mechanism for fast decision making in humans and other animals, and can be regarded as an expert system. Mathematical models describing emotion in mammals have been developed from cognitive studies. Here the brain emotional learning based intelligent controller (BELBIC), which is based on the mammalian middle brain, is designed and implemented on an FPGA, and the resulting embedded emotional controller (E-BELBIC) is used to control a real laboratory overhead traveling crane in a model-free and embedded manner. Short time-to-market, easy testing and error handling, separation of concerns, and improved reusability and extendibility of the resulting models in similar applications are some benefits of the model driven development methodology.

11.
Di Lecce, V., and Guerriero, A., Spectral Estimation by AFT Computation, Digital Signal Processing 6 (1996) 213–223. At the beginning of this century Bruns developed a method for computing the coefficients of the Fourier series of a periodic function y(t) using the Möbius inversion formula. This idea for Fourier analysis was considered again by Wintner from an arithmetical point of view in 1945. In recent papers, many authors have shown that the arithmetic Fourier transform (AFT) computation is more convenient in signal processing, requiring a reduced computation load, than are fast Fourier transform and convolution algorithms. The data dependence in the AFT is not uniform (this algorithm requires nonequidistant inputs to produce equidistant spectral coefficients). To have a series of suitable values as AFT inputs, oversampling or interpolation is used. In these papers, bases on algorithms, evaluations of errors in the spectral coefficients computation using AFT, and the complexity of different hardware and software solutions for the AFT computation are proposed. The spectral coefficients computed via AFT and via discrete Fourier transform are compared in terms of accuracy. AFT computation proves to be an easy task but its software or hardware implementation is much more complex. Furthermore, there is not a complete evaluation of AFT in any of the papers. Our aim is to provide a complete evaluation of this algorithm.

12.
The development of classification techniques is a foundation for the evolution of machine learning, which has become a major part of current mainstream Artificial Intelligence research. However, the computational cost associated with these techniques limits their use in resource-constrained embedded platforms. As the classification task is often combined with other computationally expensive functions, efficient performance of the main modules is a fundamental requirement for achieving hard real-time speed for the whole system. Graph-based machine learning techniques offer a powerful framework for building classifiers. Optimum-Path Forest (OPF) is a graph-based classifier with the interesting ability to provide nonlinear class separation surfaces. This work proposes a SoC/FPGA based design and implementation of an architecture for embedded applications, presenting a hardware-converted algorithm for an OPF classifier. Comparison of the achieved results with a software implementation on an embedded processor shows accelerations of the OPF classification from 2.18 to 9 times, which makes real-time performance in embedded applications a reasonable expectation.

13.
As we enter the multi-core era, seeking methods to boost the performance of single-threaded applications remains critical. Achieving gains in processor performance by increasing the operating frequency has begun to meet more obstacles. However, significant performance improvements can be achieved by extending the capability of the processor with additional hardware support, which makes much more effective use of the available transistors. This paper presents a novel hardware support called DistTree to speed up processor performance. The DistTree hardware automates gather and scatter operations for applications with complex but predictable memory access patterns, such as the Fast Fourier Transform (FFT). With this hardware support integrated into a modern microprocessor (the Alpha architecture in our experiments), FFT performance more than doubles compared with the FFTW library, a state-of-the-art implementation. The DistTree hardware support enables the processor to spend the majority of its cycles executing the computations of an algorithm by reducing both the arithmetic and address computation overhead. Therefore, the performance of many single-threaded applications can be significantly increased.

14.
The paper addresses the omnipresent problem of bit counting. This problem is of particular importance for information systems where the choice of a rational access strategy may require repeated evaluation of the cardinalities of retrieved sets of data items. There are several different methods available to implement this procedure, which involve shifting, table look-up, exploiting the properties of fixed point arithmetic, and manipulations with bitwise logical operations. This paper presents a novel approach to the organization of bit counting based on the principle of frequency division (FD). The developed algorithm emulates a set of 32 binary counters using the bit-parallelism of computer word operations. The overflowing bits generated by these counters at a lower frequency are processed with the arithmetic-logic method, which is most efficient for sparse binary vectors. The suggested FD procedure is one of the fastest among the known, widely available procedures for bit counting. In future computers, with 64-bit words and larger, the gain in speed due to the FD technique will be higher, and the performance of this software could be comparable to that of specialized hardware. Copyright © 2000 John Wiley & Sons, Ltd.
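
To make the "manipulations with bitwise logical operations" family of methods concrete, the sketch below shows the widely known bit-parallel population count for a 32-bit word. It is a conventional baseline only and is not the frequency-division (FD) algorithm proposed in the paper, whose details the abstract does not give; the function name `popcount32` is hypothetical.

```python
def popcount32(x):
    """Count the set bits of a 32-bit word using bit-parallel sums."""
    x &= 0xFFFFFFFF                                   # keep 32 bits only
    x = x - ((x >> 1) & 0x55555555)                   # 2-bit field sums
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)    # 4-bit field sums
    x = (x + (x >> 4)) & 0x0F0F0F0F                   # 8-bit field sums
    # Multiplying by 0x01010101 adds the four byte sums into the top byte.
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24
```

Every step operates on all fields of the word at once, which is the same word-level parallelism the abstract says the FD counters exploit.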

15.
In this paper, we report a hardware/software (HW/SW) co-designed K-means clustering algorithm with high flexibility and high performance for machine learning, pattern recognition and multimedia applications. The contributions of this work fall into two areas. The first is the hardware architecture for nearest neighbor searching, which addresses the main computational cost of a K-means clustering algorithm. The second is the high flexibility for different applications, which comes from both the software and the hardware. Limited flexibility with respect to the number of training data samples, the dimensionality of each sample vector, the number of clusters, and the target application is one of the major shortcomings of dedicated hardware implementations of the K-means algorithm. In particular, the HW/SW K-means algorithm is extendable to embedded systems and mobile devices. We benchmark our multi-purpose K-means system on handwritten digit recognition, face recognition and image segmentation to demonstrate its excellent performance, high flexibility, fast clustering speed, short recognition time, good recognition rate and versatile functionality.

16.
This paper treats the evaluation of one of the elementary functions on short wordlength computers. The setting is a binary fixed point short wordlength (8–16 bits) machine where the intent is to suggest improvements in ROM- or microcode-based software which include the square root function as part of a more general mathematical software library or for special computation in real-time applications. This paper focuses on the evaluation of square roots and features a careful treatment of Newton's method with linear initialization. Comparisons with other popular algorithms are made based on storage requirements, speed, and accuracy, with some indication of the effect that special hardware features have on the performance of these routines.
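
Newton's method with a linear initial guess, as named in this abstract, can be sketched in integer arithmetic as follows. The paper's actual initialization coefficients are not given in the abstract, so the guess used here (the tangent line at the lower end of the normalized range, which equals one Newton step from a power of two) is an assumption, as are the function names and the Q-format wrapper.

```python
def isqrt_newton(n):
    """Floor of sqrt(n) for a non-negative integer, using only adds,
    shifts and integer division."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    # sqrt(n) lies in [2**shift, 2**(shift + 1)).
    shift = (n.bit_length() - 1) // 2
    # Linear initial guess (assumed, not the paper's): never below
    # floor(sqrt(n)), so the iteration decreases monotonically.
    y = ((1 << shift) + (n >> shift)) >> 1
    while True:
        y_next = (y + n // y) >> 1     # Newton step for y*y = n
        if y_next >= y:
            return y                   # no further decrease: converged
        y = y_next


def sqrt_fixed(raw, frac_bits):
    """Square root of an unsigned fixed-point value with `frac_bits`
    fractional bits, returned in the same format."""
    return isqrt_newton(raw << frac_bits)
```

For example, in an 8.8 fixed-point format the value 2.25 is stored as 576, and `sqrt_fixed(576, 8)` gives 384, which is 1.5 in the same format. A better initial guess mainly reduces the number of division steps, which is the costly operation on a short wordlength machine.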

17.
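With the emergence of large image datasets and the rapid development of computer hardware, GPUs in particular, the convolutional neural network (CNN) has become a successful algorithm in artificial intelligence and performs well on a variety of machine learning tasks. However, the computational complexity of CNNs is far higher than that of traditional algorithms, and the limited resources of embedded devices make efficient embedded computing a challenging problem. In this paper, we propose an efficient convolutional neural network for embedded devices, applied to power equipment detection, and evaluate it in terms of processing speed. The results show that the algorithm meets the requirements of real-time video processing on embedded devices.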

18.
Hardware designs need to obey constraints on resource utilization, minimum clock frequency, power consumption, computation precision and data range, all of which are affected by the data type representation. Floating-point and fixed-point representations are the most common data types for working with real numbers, and arithmetic hardware units for the fixed-point format can improve performance and reduce energy consumption compared to a floating-point solution. However, estimating the right bit lengths for fixed point is a time-consuming task, since it is a combinatorial optimization problem of minimizing the accumulated arithmetic computation error. This work proposes two evolutionary approaches to accelerate the process of converting algorithms from floating-point to fixed-point format. The first is based on a classic evolutionary algorithm and the second introduces a compact genetic algorithm, with theoretical evidence that it reaches near-optimal performance in finding a solution. To validate the proposed approaches, they are applied to three compute-intensive algorithms from the mobile robotics scenario, where the data error accumulated during execution is influenced by sensor noise and the characteristics of the navigation environment. The proposed compact genetic algorithm accelerates the conversion process by up to 10.2× against state-of-the-art methods while reaching similar bit precision and robustness.

19.
Wireless, battery-powered camera networks are of increasing interest for surveillance and monitoring applications. The computational power of these platforms is often limited in order to reduce energy consumption. In addition, many embedded processors do not have floating point support in hardware. Among the visual tasks that a visual sensor node may be required to perform, motion analysis is one of the most basic and relevant. Events of interest are usually characterized by the presence of moving objects or persons. Knowledge of the direction of motion and velocity of a moving body may be used to take actions such as sending an alarm or triggering other camera nodes in the network. We present a fast algorithm for identifying moving areas in an image. The algorithm is efficient and amenable to implementation in fixed point arithmetic. Once the moving blobs in an image have been precisely localized, the average velocity vector can be computed using a small number of floating point operations. Our procedure starts by determining an initial labeling of image blocks based on local differential analysis. Then, belief propagation is used to impose spatial coherence and to resolve the aperture effect inherent in textureless areas. A detailed analysis of the computational cost of the algorithm and of the provisions that must be taken in order to avoid overflow with 32-bit words is included.

20.
In this paper, an algorithm is proposed for identifying multivariable systems in state-space form from noisy data, which is suitable for implementation on dedicated microprocessor systems. The proposed algorithm uses the normalized stochastic approximation criterion which reduces the computational complexity and memory requirements. It is shown that the overall performance of the proposed stochastic approximation algorithm when using a dedicated microprocessor with fixed point arithmetic is superior to the extended least-squares method in terms of memory requirements, execution speed per iteration, and the estimation results.
