1.

Customizing floating-point units for FPGAs: Area-performance-standard trade-offs

Pedro Echeverría Marisa López-Vallejo 《Microprocessors and Microsystems》2011,35(6):535-546

The high integration density of current nanometer technologies allows the implementation of complex floating-point applications in a single FPGA. In this work the intrinsic complexity of floating-point operators is addressed targeting configurable devices and making design decisions providing the most suitable performance-standard compliance trade-offs. A set of floating-point libraries composed of adder/subtracter, multiplier, divider, square root, exponential, logarithm and power function is presented. Each library has been designed taking into account special characteristics of current FPGAs, and with this purpose we have adapted the IEEE floating-point standard (software-oriented) to a custom FPGA-oriented format. Extended experimental results validate the design decisions made and prove the usefulness of reducing the format complexity.

2.

An FPGA-oriented algorithm for evaluating exponential/logarithmic functions

牟胜梅 李兆刚 《计算机工程与应用》2011,47(33):59-61

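The CORDIC algorithm is widely used to evaluate many transcendental functions efficiently, but its generality prevents it from being optimized for any specific function. This paper proposes LnE, a unified iterative algorithm for evaluating exponential and logarithmic functions. Like CORDIC, each iteration needs only shifts, additions and simple comparisons, and convergence is linear. LnE has further advantages: it needs only two datapaths, x and y; every iteration performs an addition, so there is no need to choose between addition and subtraction according to the iteration coefficient d_i, which keeps control simple; no scale-factor compensation is required; and no iterations need to be repeated to guarantee convergence. As a result, both the iteration count and the per-iteration cost of LnE are lower than those of CORDIC, saving more than one third of the area relative to CORDIC.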
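As a rough illustration of the shift-and-add style of iteration mentioned in this abstract, the Python sketch below evaluates ln(x) by multiplicative normalization. It is a textbook scheme and not the paper's LnE algorithm, whose details are not given in this listing. The function name `shift_add_ln`, the default iteration count, and the use of floating point in place of fixed-point hardware are all assumptions made for illustration.

```python
import math


def shift_add_ln(x, iterations=40):
    """Approximate ln(x) for 1 <= x <= 2 with shift-and-add steps.

    Textbook multiplicative normalization (NOT the LnE algorithm):
    build a product p of factors (1 + 2**-k) that approaches x from
    below, and accumulate the matching constants ln(1 + 2**-k).
    """
    if not 1.0 <= x <= 2.0:
        raise ValueError("x must lie in [1, 2]")
    p = 1.0  # running product, never exceeds x
    y = 0.0  # running sum of logarithms, never exceeds ln(x)
    for k in range(1, iterations + 1):
        # p * (1 + 2**-k): one shift and one add in fixed-point hardware
        candidate = p + math.ldexp(p, -k)
        if candidate <= x:
            p = candidate
            # in hardware this constant would come from a small lookup table
            y += math.log1p(math.ldexp(1.0, -k))
    return y
```

Each step costs one shift, one addition and one comparison, and roughly one more result bit is fixed per iteration, which is the linear convergence the abstract refers to.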

3.

Real root isolation for exp-log-arctan functions

Adam Strzeboński 《Journal of Symbolic Computation》2012,47(3):282-314

We present a real root isolation procedure for univariate functions obtained by composition and rational operations from exp, log, arctan and real constants. The procedure was first introduced for exp-log functions in Strzeboński (2008). Here we extend the procedure to exp-log-arctan functions, describe computation with elementary constants in detail and discuss the complexity of the root isolation procedure for the general exp-log-arctan case as well as for the special case of sparse polynomials. We discuss implementation of the procedure and present empirical results.

4.

Sufficient and necessary conditions for discrete-time nonlinear switched systems with uniform local exponential stability

Junjie Lu Zhikun She 《International journal of systems science》2016,47(15):3561-3572

In this paper, we investigate sufficient and necessary conditions of uniform local exponential stability (ULES) for the discrete-time nonlinear switched system (DTNSS). We start with the definition of T-step common Lyapunov functions (CLFs), which is a relaxation of traditional CLFs. Then, for a time-varying DTNSS, by constructing such a T-step CLF, a necessary and sufficient condition for its ULES is provided. Afterwards, we strengthen it based on a T-step Lipschitz continuous CLF. In particular, when the system is time-invariant, by the smooth approximation theorem, the Lipschitz continuity condition of T-step CLFs can further be replaced by continuous differentiability; and when the system is time-invariant and homogeneous, by an extension of the Weierstrass approximation theorem, T-step continuously differentiable CLFs can even be strengthened to be T-step polynomial CLFs. Furthermore, three illustrative examples are used to explain our main contribution. In the end, an equivalence between time-varying DTNSSs and their corresponding linearisations is discussed.

5.

Design and implementation of a high-performance floating-point multiply-accumulate unit in M-DSP

车文博 刘衡竹 田甜 《计算机应用》2016,36(8):2213-2218

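To meet the performance, area and power requirements that the high-performance M-type digital signal processor (M-DSP) places on floating-point arithmetic, the overall M-DSP architecture and the characteristics of its floating-point instructions were analyzed, and a high-performance, low-power floating-point multiply-accumulate unit (FMAC) was designed and implemented. The unit separates the single- and double-precision datapaths, executes in a six-stage pipeline, and reuses key modules such as the multiplier and the alignment shifter. It supports double- and single-precision floating-point multiplication, multiply-accumulate, multiply-subtract, single-precision dot product and complex arithmetic. The design was fully verified and synthesized with Synopsys Design Compiler in a 45 nm process. Synthesis results show an operating frequency of up to 1 GHz and a cell area of 36856 μm²; compared with the multiply-accumulate unit in FT-XDSP, area is reduced by 12.95% and critical path length by 2.17%.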

6.

Some infinite integrals involving products of exponential and Bessel functions

《国际计算机数学杂志》2012,89(1-4):207-217

This paper is concerned with the evaluation of some infinite integrals involving products of exponential and Bessel functions. These integrals are transformed, through some identities, into expressions containing modified Bessel functions. In this way, the difficulties associated with the computation of infinite integrals with oscillating integrands are eliminated.

7.

Parameterized algorithms for feedback set problems and their duals in tournaments

Venkatesh Raman Saket Saurabh 《Theoretical computer science》2006

The parameterized feedback vertex (arc) set problem is to find whether there are k vertices (arcs) in a given graph whose removal makes the graph acyclic. The parameterized complexity of this problem in general directed graphs is a long-standing open problem. We investigate the problems on tournaments, a well-studied class of directed graphs. We consider both weighted and unweighted versions.
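To make the problem statement concrete, here is a minimal Python sketch of the unweighted feedback vertex set question on tournaments. It is not the algorithm of the paper. It relies on the standard fact that a tournament is acyclic exactly when it contains no directed triangle, and it branches on the three vertices of any triangle it finds, giving a simple search with at most 3^k branches. The adjacency-matrix encoding and the names `find_triangle` and `has_fvs` are assumptions for illustration.

```python
from itertools import combinations


def find_triangle(adj, alive):
    """Return a directed 3-cycle among the `alive` vertices, or None."""
    for a, b, c in combinations(sorted(alive), 3):
        if adj[a][b] and adj[b][c] and adj[c][a]:
            return (a, b, c)
        if adj[a][c] and adj[c][b] and adj[b][a]:
            return (a, c, b)
    return None


def has_fvs(adj, k, alive=None):
    """True if deleting at most k vertices makes the tournament acyclic.

    adj[u][v] is truthy when the arc u -> v is present.
    """
    if alive is None:
        alive = frozenset(range(len(adj)))
    triangle = find_triangle(adj, alive)
    if triangle is None:
        return True   # no directed triangle: the tournament is acyclic
    if k == 0:
        return False  # a cycle remains and the budget is spent
    # at least one vertex of every triangle must be deleted
    return any(has_fvs(adj, k - 1, alive - {v}) for v in triangle)
```

For the 3-cycle 0 -> 1 -> 2 -> 0, a budget of zero deletions is not enough, while a budget of one is.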

8.

Parameterized complexity and improved inapproximability for computing the largest j-simplex in a V-polytope

Ioannis Koutis 《Information Processing Letters》2006,100(1):8-13

We consider the problem of computing the squared volume of the largest j-simplex contained in an n-dimensional polytope presented by its vertices (a V-polytope). We show that the related decision problem is W[1]-complete, with respect to the parameter j. We also improve the constant inapproximability factor given in [A. Packer, Polynomial-time approximation of largest simplices in V-polytopes, Discrete Appl. Math. 134 (1-3) (2004) 213-237], by showing that there are constants μ<1, c>1 such that it is NP-hard to approximate within a factor of c^(μn) the volume of the largest ⌊μn⌋-simplex contained in an n-dimensional polytope with O(n) vertices.

9.

An Improved (and Practical) Parameterized Algorithm for the Individual Haplotyping Problem MFR with Mate-Pairs

Minzhu Xie Jianxin Wang 《Algorithmica》2008,52(2):250-266

The Individual Haplotyping MFR problem is a computational problem that, given a set of DNA sequence fragment data of an individual, induces the corresponding haplotypes by dropping the minimum number of fragments. Bafna, Istrail, Lancia, and Rizzi proposed an algorithm of time O(2^(2k) m²n + 2^(3k) m³) for the problem, where m is the number of fragments, n is the number of SNP sites, and k is the maximum number of holes in a fragment. When there are mate-pairs in the input data, the parameter k can be as large as 100, which would make the Bafna-Istrail-Lancia-Rizzi algorithm impracticable. The current paper introduces a new algorithm, PM-MFR, whose running time depends on k₁ and k₂, where k₁ is the maximum number of SNP sites that a fragment covers (k₁ is smaller than n), and k₂ is the maximum number of fragments that cover an SNP site (k₂ is usually about 10). Since the time complexity of the algorithm PM-MFR is not directly related to the parameter k, the algorithm solves the Individual Haplotyping MFR problem with mate-pairs more efficiently and is more practical in real biological applications. This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 60433020 and 60773111, the Program for New Century Excellent Talents in University No. NCET-05-0683, the Program for Changjiang Scholars and Innovative Research Team in University No. IRT0661, and the Scientific Research Fund of Hunan Provincial Education Department under Grant No. 06C526.

10.

LMI criteria on exponential stability of BAM neural networks with both time-varying delays and general activation functions

Huiwei Wang Chengjun Duan 《Mathematics and computers in simulation》2010,81(4):837-850

11.

Compact Keccak Hardware Architecture for Data Integrity and Authentication on FPGAs

《Information Security Journal: A Global Perspective》2013,22(5):231-242

Cryptographic hash functions play a crucial role in networking and communication security, including their use for data integrity and message authentication. The Keccak hash algorithm is one of the finalists in the next generation SHA-3 hash algorithm competition. It is based on the sponge construction, whose hardware performance is worth investigating. We developed an efficient hardware architecture for the Keccak hash algorithm on Field-Programmable Gate Array (FPGA). Due to the serialization exploited in the proposed architecture, the area needed for its implementation is reduced significantly, accompanied by a higher efficiency rate. In addition, low latency is attained so that higher operating frequencies can be reached. We use the coprocessor approach, which exploits the RAM blocks that exist in most FPGA platforms. For this coprocessor, a new datapath structure allowing parallel execution of multiple instructions is designed. Implementation results show that our Keccak coprocessor achieves high performance in a small area.

12.

Efficient algorithms for 2D area management and online task placement on runtime reconfigurable FPGAs

Zonghua Weichen Jiang Jin Xiuqiang Qingxu 《Microprocessors and Microsystems》2009,33(5-6):374-387

Partial Runtime Reconfigurable (PRTR) FPGAs allow HW tasks to be placed and removed dynamically at runtime. We make two contributions in this paper. First, we present an efficient algorithm for finding the complete set of Maximal Empty Rectangles on a 2D PRTR FPGA. We also present a HW implementation of the algorithm with negligible runtime overhead. Second, we present an efficient online deadline-constrained task placement algorithm for minimizing area fragmentation on the FPGA by using an area fragmentation metric that takes into account the probability distribution of sizes of future task arrivals as well as the time axis. The techniques presented in this paper are useful in an operating system for runtime reconfigurable FPGAs to manage the HW resources on the FPGA when HW tasks arrive and finish dynamically at runtime.

13.

Hankel singular value functions from Schmidt pairs for nonlinear input–output systems

W. Steven Gray Jacquelien M.A. Scherpen 《Systems & Control Letters》2005,54(2):135-144

This paper presents three results in singular value analysis of Hankel operators for nonlinear input–output systems. First, the notion of a Schmidt pair is defined for a nonlinear Hankel operator. This makes it possible to define a Hankel singular value function from a purely input–output point of view and without introducing a state space setting. However, if a state space realization is known to exist then a set of sufficient conditions is given for the existence of a Schmidt pair, and the state space provides a convenient representation of the corresponding singular value function. Finally, it is shown that in a specific coordinate frame it is possible to relate this new singular value function definition to the original state space notion used to describe nonlinear balanced realizations.

14.

Fast algorithms for computing polynomial representations of Boolean and pseudo-Boolean functions

李云强 孙怀波 王爱兰 《计算机工程与应用》2007,43(1):50-52

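Boolean and pseudo-Boolean functions are widely used in many fields, and polynomial representations help characterize some of their properties. Assuming the output is known for every input, this paper first gives a fast algorithm for computing the polynomial representation of a Boolean function. The algorithm uses only modulo-2 additions, requires few operations, and is simple, easy to program, accurate and fast. It extends readily to a fast algorithm for the polynomial representation of pseudo-Boolean functions by replacing modulo-2 addition with real addition. The pseudo-Boolean algorithm is then explained by comparison, and it is noted that every pseudo-Boolean function can be expressed in polynomial form. Finally, examples further verify the correctness of the algorithm.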
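A minimal Python sketch of this kind of computation is the butterfly-style Möbius transform, which turns a truth table into polynomial coefficients using only XOR (modulo-2 addition). This is the standard textbook transform and may differ in detail from the paper's algorithm. The real-valued variant below uses the usual Möbius inversion, which subtracts, so its sign convention is an assumption rather than the paper's formulation. The function names and the bit-index encoding (bit j of the table index is the value of x_j) are also assumptions.

```python
def _mobius(values, combine):
    """In-place style butterfly over a table whose length is a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("table length must be a power of two")
    coeffs = list(values)
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                # fold the lower half of each block into the upper half
                coeffs[i + step] = combine(coeffs[i + step], coeffs[i])
        step *= 2
    return coeffs


def anf_coefficients(truth_table):
    """Coefficients of the polynomial over GF(2); entry m belongs to the
    monomial made of the variables x_j whose bit j is set in m."""
    return _mobius(truth_table, lambda upper, lower: upper ^ lower)


def pseudo_boolean_coefficients(values):
    """Real multilinear polynomial coefficients via Möbius inversion."""
    return _mobius(values, lambda upper, lower: upper - lower)
```

For example, the truth table of OR on two variables, `[0, 1, 1, 1]`, yields the Boolean polynomial x0 + x1 + x0·x1 (mod 2), and the real polynomial x0 + x1 − x0·x1. A table of 2^n entries needs n·2^(n−1) combine operations.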

15.

Recursive functions of context free languages (II): Validity of CFPRF and CFRF definitions (cited 2 times: 0 self-citations, 2 by others)

董韫美 《中国科学F辑(英文版)》2002,45(2)

In this paper we prove that the function class CFRF and its proper subclass CFPRF are respectively the partial recursive functions and primitive recursive functions of context free languages (CFLs). We also discuss the relation between them and recursive functions defined on other domains. It is indicated that the functions of natural numbers and/or symbol strings (words) are functions of CFLs. Several frequently used primitive recursive functions on words are given, including logical connectives and conditional expressions. The powerful operators (bounded maximization and minimization operators) for constructing primitive recursive functions are also defined. Two important nontrivial algorithms, the characteristic function of an arbitrary CFL and the parse function of CFL sentences, are constructed. Based on them, the method for extending or restricting a function domain is described.

16.

Design and implementation of an FPGA-based azimuth compression processor for spaceborne SAR (cited 1 time: 0 self-citations, 1 by others)

郑晓双 禹卫东 李早社 《数据采集与处理》2006,21(2):154-158

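This paper presents the design and implementation of the azimuth compression processor in an FPGA-based real-time imaging processor for spaceborne synthetic aperture radar (SAR). The processor generates the matched-filter reference function in real time from input parameters, performs azimuth compression of the radar echoes in the frequency domain, and outputs a real-valued image. It communicates with the host controller over an ISA bus interface. The principle and function of azimuth compression are introduced, and the hardware development and FPGA design of the processor are described in detail. Test results show that the processor can perform azimuth compression of radar data under spaceborne conditions.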

17.

Hybrid Fourier and block-pulse functions for applications in the calculus of variations

《国际计算机数学杂志》2012,89(8-9):695-702

If we divide the interval [0, 1] into N sub-intervals, then hybrid Fourier and block-pulse functions on each sub-interval can approximate any function. This ability helps us to have more accurate approximations of piecewise continuous functions. Hence we obtain more accurate solutions to problems in the calculus of variations. In this article, we use a combination of Fourier and block-pulse functions on the interval [0, 1] to reduce a variational problem to the solution of algebraic equations. An illustrative example is included to demonstrate the validity and applicability of the technique.

18.

New delay-dependent exponential stability criteria for neural networks with discrete and distributed time-varying delays (cited 1 time: 0 self-citations, 1 by others)

Junkang Tian Shouming Zhong 《Neurocomputing》2011,74(17):3365-3375

In this paper, the problem of exponential stability criteria for neural networks with discrete and distributed time-varying delays is considered. By dividing the discrete delay interval into multiple segments and choosing a new class of Lyapunov functional which contains triple-integral terms, some new delay-dependent stability criteria are derived in terms of linear matrix inequalities. The obtained criteria are less conservative because a free-weighting matrices method and a convex optimization approach are considered. Finally, numerical examples are given to illustrate the effectiveness of the proposed method.

19.

A parametric-based performance evaluation and design trade-offs for interconnect architectures using FPGAs for networks-on-chip

Sani Abba Jeong-A Lee 《Microprocessors and Microsystems》2014

Network-on-Chip (NoC) interconnect fabrics are categorized according to trade-offs among latency, throughput, speed, and silicon area, and the correctness and performance of these fabrics in Field-Programmable Gate Array (FPGA) applications are assessed through experimentation and simulation. In this paper, we propose a consistent parametric method for evaluating the FPGA performance of three common on-chip interconnect architectures, namely the Mesh, Torus and Fat-tree architectures. We also investigate how NoC architectures are affected by interconnect and routing parameters, and demonstrate their flexibility and performance through FPGA synthesis and testing of 392 different NoC configurations. In this process, we found that the Flit Data Width (FDW) and Flit Buffer Depth (FBD) parameters have the heaviest impact on FPGA resources, and that these parameters, along with the number of Virtual Channels (VCs), significantly affect reassembly buffering and routing and logic requirements at NoC endpoints. Applying our evaluation technique to a detailed and flexible cycle-accurate simulation, we drive the three NoC architectures using benign (Nearest Neighbor and Uniform) and adversarial (Tornado and Random Permutation) traffic patterns with different numbers of VCs, producing a set of load–delay curves. The results show that by strategically tuning the router and interconnect parameters, the Fat-tree network produces the best utilization of FPGA resources in terms of silicon area, clock frequency, critical path delays, network cost, saturation throughput, and latency, whereas the Mesh and Torus networks showed comparatively high resource costs and poor performance under adversarial traffic patterns. From our findings it is clear that the Fat-tree network proved to be more efficient in terms of FPGA resource utilization and is compliant with the current Xilinx FPGA devices.
This approach will assist engineers and architects in making an early decision on the choice of the right interconnects and router parameters for large and complex NoCs. We demonstrate that our approach substantially improves performance under a large variety of experimentation and simulation, which confirms its suitability for real systems.

20.

Combined projection and kernel basis functions for classification in evolutionary neural networks

P.A. C. M. J.C. 《Neurocomputing》2009,72(13-15):2731

This paper proposes a hybrid neural network model using a possible combination of different transfer projection functions (sigmoidal unit, SU, product unit, PU) and kernel functions (radial basis function, RBF) in the hidden layer of a feed-forward neural network. An evolutionary algorithm is adapted to this model and applied for learning the architecture, weights and node typology. Three different combined basis function models are proposed with all the different pairs that can be obtained with SU, PU and RBF nodes: product–sigmoidal unit (PSU) neural networks, product–radial basis function (PRBF) neural networks, and sigmoidal–radial basis function (SRBF) neural networks; and these are compared to the corresponding pure models: product unit neural network (PUNN), multilayer perceptron (MLP) and the RBF neural network. The proposals are tested using ten benchmark classification problems from well-known machine learning problems. Combined functions using projection and kernel functions are found to be better than pure basis functions for the task of classification in several datasets.