Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
The aim of this work is to decrease the bit precision required in computations without affecting the precision of the answer, whether this is computed exactly or within some tolerance. By precision we mean the number of bits in the binary representation of the values involved in the computation, so a smaller precision requirement leads to a smaller complexity. We achieve this by combining the customary numerical technique of rounding the least significant bits with the algebraic technique of reduction modulo an integer, which we extend to reduction modulo a positive number. In particular, we show that, if the sum of several numbers has small magnitude relative to the magnitude of the summands, then the precision used in the computation of this sum can be decreased without affecting the precision of the answer. Furthermore, if the magnitude of the inner product of two vectors is small and if one of them is filled with “short” binary numbers, then again we may decrease the precision of the computation. The method is applied to the iterative improvement algorithm for a linear system of equations whose coefficients are represented by “short” binary numbers, as well as to the solution of PDEs by means of multigrid methods. Some results of numerical experiments are presented to demonstrate the power of the method.
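
The observation about sums of small magnitude can be illustrated with plain integers. The sketch below is not the authors' method; it only shows the underlying modular-arithmetic fact, assuming integer summands and a signed accumulator of a fixed width. The names `wrap_signed` and `small_sum` are hypothetical.

```python
def wrap_signed(x: int, bits: int) -> int:
    """Reduce x modulo 2**bits into the signed range [-2**(bits-1), 2**(bits-1))."""
    modulus = 1 << bits
    x %= modulus  # Python's % always returns a value in [0, modulus)
    return x - modulus if x >= modulus >> 1 else x


def small_sum(values, bits: int) -> int:
    """Sum integers while keeping only `bits` bits in every intermediate value.

    The result equals the true sum whenever the true sum fits in `bits`
    signed bits, even if the individual summands are far larger.
    """
    acc = 0
    for v in values:
        acc = wrap_signed(acc + wrap_signed(v, bits), bits)
    return acc


# Summands need about 21 bits, but their sum (-7) fits easily in 8 bits.
values = [1_000_003, -999_950, 70_000, -70_060]
print(small_sum(values, 8), sum(values))
```

Here every intermediate value stays within 8 bits, yet the final result matches the full-precision sum because the true sum lies inside the 8-bit signed range.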

2.
S.  A.  A. 《Journal of Systems Architecture》2008,54(10):957-966
Square root is an operation performed by the hardware in recent generations of processors. The hardware implementation of the square root operation is achieved by different means. One of the popular methods is the non-restoring algorithm. In this paper, the classical non-restoring array structure is improved in order to simplify the circuit. This reduction is done by eliminating a number of circuit elements without any loss in the precision of the square root or the remainder. For a 64-bit non-restoring circuit the area of the suggested circuit is about 44% smaller than that of a conventional non-restoring array circuit. Furthermore, in order to create an environment for modular design of the non-restoring square root circuit, a number of modules are suggested. Using these modules it is possible to construct any square root circuit with an arbitrary number of input bits. The suggested methodology results in an expandable design with reduced area. Analytical and simulation results show that the delay of the proposed circuit, for a 64-bit radicand, is 80% less than that of a conventional non-restoring array circuit.
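
For readers unfamiliar with the non-restoring method named in this abstract, a minimal software sketch of the textbook digit-by-digit algorithm follows. It models the arithmetic only, not the array circuit or the simplifications the paper proposes; the function name `nonrestoring_isqrt` is hypothetical.

```python
import math


def nonrestoring_isqrt(radicand: int, bits: int):
    """Return (root, remainder) with root**2 + remainder == radicand.

    `bits` is the (even) width of the radicand. Two radicand bits are
    consumed per step; a negative partial remainder is not restored but
    corrected by an addition in the following step.
    """
    if bits % 2 or radicand < 0 or radicand >> bits:
        raise ValueError("radicand must fit in an even number of bits")
    root, rem = 0, 0
    for i in range(bits // 2 - 1, -1, -1):
        pair = (radicand >> (2 * i)) & 0b11  # next two radicand bits
        if rem >= 0:
            rem = 4 * rem + pair - (4 * root + 1)  # subtract trial value
        else:
            rem = 4 * rem + pair + (4 * root + 3)  # add instead of restoring
        root = 2 * root + (1 if rem >= 0 else 0)   # new root bit
    if rem < 0:
        rem += 2 * root + 1  # single final correction of the remainder
    return root, rem


print(nonrestoring_isqrt(10, 4))  # (3, 1), since 3*3 + 1 == 10
print(nonrestoring_isqrt(10, 4)[0] == math.isqrt(10))
```

The only restoring-like operation is the final remainder correction; inside the loop each step is a single add or subtract, which is what makes the scheme attractive for array implementations.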

3.
4.
The number of precision bits for operations and data is limited in hardware implementations of backpropagation (BP). Reducing the rounding error caused by this limited precision is crucial in the implementation. The new learning algorithm is based on overestimation of significant error in order to alleviate underflow and omission of weight updating for correctly recognized patterns. While the conventional BP algorithm minimizes the squared error between output signals and supervising data, the new learning algorithm minimizes the weighted error function. In the learning simulation of multifont capital recognition, this algorithm converged recognition accuracy to 100% with only 8-b precision. In addition, the recognition accuracy for characters that did not appear in the training data reached 94.9%. This performance is equivalent to that of a conventional BP with 12-b precision. Moreover, it is found that the performance of the weighted error function is high even when only a small number of hidden neurons is used. Consequently, the algorithm reduces the required amount of weight memory.

5.
N-step incremental straight-line algorithms   Cited by: 7 (self-citations: 0; cited by others: 7)
This class of algorithms extends Bresenham's (1965) integer straight-line algorithm to generate more than one pixel per inner loop, thus reducing inner loop overhead. The quad-step algorithm is too large to justify its use in older hardware with limited memory space, but it can be viable in the context of modern memory and software sizes. Because the algorithm reduces both calculation overhead and the number of memory accesses for adjacent pixels, it can improve the performance of current systems that are limited in their processor speed and of future systems that might be limited in their memory speed. The algorithm gives results identical to those from Bresenham's single-step routine while drawing pixels in the expected direction from start to end point. Furthermore, as the gradual trend towards more bits per pixel continues, a processor supporting multi-word burst data instructions could make good use of this algorithm in speeding up line drawing into a 24-bits-per-pixel, 1-pixel-per-word color frame buffer. I chose to implement 4 steps per loop because it gave a useful performance improvement without exceeding the resources of the target processor, and it was small enough to hand-code. However, the techniques described can be used to construct a straight-line algorithm that generates more than 4 steps per loop. The relatively small average decision tree sizes indicate that algorithms of greater than 4 pixels per step might further improve line-drawing efficiency.
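
As a point of reference, the single-step baseline that the N-step variants extend can be sketched as follows. This is the commonly used integer error-accumulation form of Bresenham's algorithm, not the quad-step routine described in the abstract; the function name `bresenham` is an illustrative choice.

```python
def bresenham(x0: int, y0: int, x1: int, y1: int):
    """Return the pixels of a line from (x0, y0) to (x1, y1), one per loop pass.

    Only integer additions and comparisons are used. The pixels are produced
    in order from the start point to the end point.
    """
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    step_x = 1 if x0 < x1 else -1
    step_y = 1 if y0 < y1 else -1
    err = dx + dy  # combined error term for both axes
    points = []
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        doubled = 2 * err
        if doubled >= dy:  # error allows a step along x
            err += dy
            x0 += step_x
        if doubled <= dx:  # error allows a step along y
            err += dx
            y0 += step_y
    return points


print(bresenham(0, 0, 5, 2))
```

An N-step variant would, in effect, unroll several passes of this loop and select a whole pixel pattern per decision, which is where the savings in loop overhead and memory accesses come from.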

6.
Yang Zhiying, Zhu Hong, Song Jiantao. Journal of Software (《软件学报》), 2004, 15(5): 650-659
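Smoothed analysis of algorithmic complexity offers a more reasonable explanation for the paradox that many algorithms work well in practice yet have poor worst-case complexity. High-performance computers are widely used to solve large linear systems and to factor large matrices. The simplest and most easily implemented algorithm for solving a linear system is Gaussian elimination. Solving a linear system of $n$ equations in $n$ variables by Gaussian elimination takes $O(n^3)$ arithmetic operations. If the coefficients of the equations are represented with $m$ bits, then in the worst case $mn$ bits of machine precision are needed to run Gaussian elimination, because abnormally large intermediate entries may arise during elimination. Extensive numerical experiments show, however, that such high precision is rarely needed in practice. An abnormally large condition number and growth factor are the main causes of ill-conditioning of the matrix $A$ and hence of large errors in the solution. Let $\bar{A}$ be an arbitrary matrix and let $A$ be the random matrix obtained from $\bar{A}$ by a small Gaussian random perturbation with variance $\sigma^2 \le 1$. Sankar et al. carried out a smoothed analysis of the condition number and growth factor of $A$ and proved that $\Pr[K(A) \ge \alpha] \le \frac{3.64\,n\,(1 + 4\sqrt{\log \alpha})}{\alpha\sigma}$. On this basis they proved that the smoothed complexity of the number of machine bits needed for Gaussian elimination to output a solution with $m$ bits of precision is $m + 7\log_2 n + 3\log_2(1/\sigma) + \log_2\log_2 n + 7$. The proof of these results contains an error; correcting it yields $m + 7\log_2 n + 3\log_2(1/\sigma) + 4\sqrt{2 + \log_2 n + \log_2(1/\sigma)} + 7.367$. By constructing two inequalities, one on matrix norms and one on products of random variables, the smoothed analysis of the condition number is simplified to $\Pr[K(A) \ge \alpha] \le \frac{6\sqrt{2}\,n^2}{\alpha\sigma}$. This partially resolves the conjecture of Sankar et al. that $\Pr[K(A) \ge \alpha] \le O\!\left(\frac{n}{\alpha\sigma}\right)$, and it lowers the smoothed complexity of the number of machine bits needed for Gaussian elimination to output a solution with $m$ bits of precision to $m + 8\log_2 n + 3\log_2(1/\sigma) + 7$. Experimental results show that the resulting smoothed complexity is better.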
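
To get a feel for the size of the precision bounds quoted in this abstract, the sketch below simply evaluates them for given m, n and σ. It assumes the corrected bound reads m + 7 log2 n + 3 log2(1/σ) + 4·sqrt(2 + log2 n + log2(1/σ)) + 7.367, with the radical covering the whole sum; the extent of the radical is not unambiguous in the source. The function names are hypothetical.

```python
import math


def bits_corrected(m: float, n: int, sigma: float) -> float:
    """Corrected bound; ASSUMES the square root covers 2 + log2 n + log2(1/sigma)."""
    log_n = math.log2(n)
    log_inv_sigma = math.log2(1 / sigma)
    return (m + 7 * log_n + 3 * log_inv_sigma
            + 4 * math.sqrt(2 + log_n + log_inv_sigma) + 7.367)


def bits_improved(m: float, n: int, sigma: float) -> float:
    """Improved bound stated in the abstract: m + 8 log2 n + 3 log2(1/sigma) + 7."""
    return m + 8 * math.log2(n) + 3 * math.log2(1 / sigma) + 7


# Example: 32-bit target precision, n = 1024 unknowns, sigma = 2**-10.
m, n, sigma = 32, 1024, 2.0 ** -10
print(bits_improved(m, n, sigma))
print(bits_corrected(m, n, sigma))
```

For this example the improved bound gives 149 bits, far below the worst-case figure of mn = 32768 bits mentioned in the abstract.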

7.
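The Ramirez algorithm decomposes a system of process equations. Once the system has been decomposed, it can be solved faster and more reliably. This paper points out that different orderings of the selected output variables produce different platform variables during the decomposition, which in turn affects the choice of decision variables. When determining the decision variables, platform variables should be chosen first. The Ramirez algorithm is applied to decompose a catalytic cracking reactor-regenerator model; a suitable solution strategy is found, and the proposed view is verified.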

8.
Low density parity check (LDPC) codes exhibit near-capacity performance in terms of error correction. Large hardware costs, limited flexibility in terms of code length and code rate, and considerable power consumption limit the use of belief-propagation-based LDPC decoders in area- and energy-sensitive mobile environments. Serial bit flipping algorithms offer a trade-off between resource utilization and error correction performance at the expense of an increased number of decoding iterations required for convergence. Parallel weighted bit flipping decoding and its variants aim at reducing the number of decoding iterations and the decoding time by flipping the potentially erroneous bits in parallel. However, in most of the existing parallel decoding methods, the flipping threshold requires complex computations. In this paper, Hybrid Weighted Bit Flipping (HWBF) decoding is proposed to allow multiple bit flips in each decoding iteration. To compute the number of bits that can be flipped in parallel, a criterion for determining the relationship between the erroneous bits in the received code word is proposed. Using this relation, the proposed scheme can detect and correct a maximum of 3 erroneous hard-decision bits in an iteration. The simulation results show that, compared to existing serial bit flipping decoding methods, HWBF decoding reduces the number of iterations required for convergence by 45% and the decoding time by 40%. Compared to existing parallel bit flipping decoding methods, HWBF decoding achieves a similar bit error rate (BER) with the same number of iterations and lower computational complexity. Owing to the reduced number of decoding iterations, lower computational complexity and reduced decoding time, HWBF decoding can be useful in energy-sensitive mobile platforms.

9.
Virtualization facilitates the provision of flexible resources and improves energy efficiency through the consolidation of virtualized servers into a smaller number of physical servers. As an increasingly essential component of the emerging cloud computing model, virtualized environments bill their users based on processor time or the number of virtual machine instances. However, accounting based only on the depreciation of server hardware is not sufficient because the cooling and energy costs for data centers will exceed the purchase costs for hardware. This paper suggests a model for estimating the energy consumption of each virtual machine without dedicated measurement hardware. Our model estimates the energy consumption of a virtual machine based on in-processor events generated by the virtual machine. Based on this estimation model, we also propose a virtual machine scheduling algorithm that can provide computing resources according to the energy budget of each virtual machine. The suggested schemes are implemented in the Xen virtualization system, and an evaluation shows that the suggested schemes estimate and provide energy consumption with errors of less than 5% of the total energy consumption.

10.
The paper considers a sensor network whose sensors observe a common quantity and are affected by arbitrary additive bounded noises with a known upper bound. During the experiment, any sensor can communicate only a finite and given number of bits of information to the decision center. The contributions of the particular sensors, the rules of data encoding, decoding, and fusion, as well as the estimation scheme should be designed to achieve the best overall performance in estimation of the observed quantity by the decision center. An optimal algorithm is obtained that minimizes the maximal feasible error. It is shown that it considerably outperforms the algorithm proposed in recent papers in the area and examined only in the idealized case of noiseless sensors.

11.
Compact algorithms are Estimation of Distribution Algorithms which mimic the behavior of population-based algorithms by means of a probabilistic representation of the population of candidate solutions. These algorithms behave similarly to population-based algorithms but require much less memory. This feature is crucially important in some engineering applications, especially in robotics. A high-performance compact algorithm is the compact Differential Evolution (cDE) algorithm. This paper proposes a novel implementation of cDE, namely compact Differential Evolution light (cDElight), to address not only the memory-saving necessities but also real-time requirements. cDElight employs two novel algorithmic modifications that reduce the computational overhead without a performance loss with respect to cDE. Numerical results, carried out on a broad set of test problems, show that cDElight, despite its minimal hardware requirements, does not deteriorate the performance of cDE and thus is competitive with other memory-saving and population-based algorithms. An application in the field of mobile robotics highlights the usability and advantages of the proposed approach.

12.
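Because the flexible macroblock partitioning in H.264 makes the amount of header information vary sharply and hard to predict, and because the greatly reduced amount of transform-coefficient data makes header information a far larger share of the bitstream than in earlier standards, accurately estimating the amount of header information can effectively improve the performance of H.264 rate control algorithms. To estimate header information accurately, a method is proposed that uses a Kalman filter to estimate the total number of header bits at the frame level, and the method is applied to frame-level rate control. Experimental results show that the algorithm estimates header information accurately and, to some extent, improves overall picture quality at the same bit rate.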

13.
Digital signal processing algorithms often rely heavily on a large number of multiplications, which is both time and power consuming. However, there are many practical solutions to simplify multiplication, like truncated and logarithmic multipliers. These methods consume less time and power but introduce errors. Nevertheless, they can be used in situations where a shorter time delay is more important than accuracy. In digital signal processing, these conditions are often met, especially in video compression and tracking, where integer arithmetic gives satisfactory results. This paper presents a simple and efficient multiplier with the possibility to achieve an arbitrary accuracy through an iterative procedure, prior to achieving the exact result. The multiplier is based on the same form of number representation as Mitchell’s algorithm, but it uses different error correction circuits than those proposed by Mitchell. In such a way, the error correction can be done almost in parallel (actually this is achieved through pipelining) with the basic multiplication. The hardware solution involves adders and shifters, so it is not gate and power consuming. The error summary for operands ranging from 8 bits to 16 bits indicates a very low relative error percentage with only two iterations. For the hardware implementation assessment, the proposed multiplier is implemented on the Spartan 3 FPGA chip. For 16-bit operands, the time delay estimation indicates that a multiplier with two iterations can work at a clock frequency above 150 MHz, with the maximum relative error being less than 2%.
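
The abstract does not spell out the correction circuits, so the following is only one plausible software reading of an iterative multiplier built on Mitchell's number representation, in which each operand is written as a leading power of two plus a remainder. The function names `leading_one_split` and `iterative_log_multiply` are hypothetical, and the sketch says nothing about pipelining or hardware cost.

```python
def leading_one_split(n: int):
    """Return (k, r) with n == 2**k + r and 0 <= r < 2**k, for n > 0."""
    k = n.bit_length() - 1
    return k, n - (1 << k)


def iterative_log_multiply(a: int, b: int, corrections: int) -> int:
    """Approximate a*b for non-negative integers using only shifts and adds.

    With a = 2**k1 + r1 and b = 2**k2 + r2, the exact product is
    2**(k1+k2) + r1*2**k2 + r2*2**k1 + r1*r2. The first three terms need
    only shifts and adds. The residual r1*r2 is a smaller product of the
    same kind, so each correction step approximates it the same way.
    """
    product = 0
    for _ in range(corrections + 1):
        if a == 0 or b == 0:
            break  # residual is zero, so the accumulated product is exact
        k1, r1 = leading_one_split(a)
        k2, r2 = leading_one_split(b)
        product += (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
        a, b = r1, r2  # the remaining error term is r1 * r2
    return product


for steps in range(3):
    print(steps, iterative_log_multiply(200, 123, steps))
print(200 * 123)
```

Each step removes the leading one bit of both operands, so the approximation never overshoots and becomes exact after at most as many steps as the shorter operand has bits.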

14.
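To address the increased power consumption and reduced reliability of direct digital frequency synthesizer (DDS) chips caused by large storage overhead, a compact sine lookup table (ROM) with 16-bit precision for DDS is designed by combining an improved Sunderland algorithm with the QE-ROM technique. The sine lookup algorithm was simulated at the system level and implemented in a hardware description language (Verilog HDL), and the function and performance of the complete algorithm were then verified on an FPGA. A multi-channel, 16-bit-output digital-to-analog converter (DAC) board was built around the AD5360 chip, and, using the LM317 and LM337 step-down regulator chips, a power supply was built that converts 220 V mains power into the ±9 V and 3.75 V required by the DAC. Test results show that the designed sine lookup table algorithm reaches 16-bit precision while occupying only 8576 bits of storage. Compared with a conventional DDS sine waveform generator, the sine data optimization algorithm saves 99.2% of the resources and achieves a compression ratio of 122:1, effectively reducing DDS chip area and power consumption.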
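
The quoted savings can be checked with simple arithmetic. The sketch below assumes the conventional baseline is an uncompressed table with 2^16 phase entries of 16 bits each; the abstract does not state this baseline, although the reported figures are consistent with it. The variable names are illustrative.

```python
PHASE_BITS = 16        # ASSUMED phase address width of the uncompressed table
AMPLITUDE_BITS = 16    # output precision stated in the abstract
COMPRESSED_BITS = 8576 # storage reported in the abstract

# Uncompressed table: one 16-bit sample for each of the 2**16 phase values.
full_table_bits = (1 << PHASE_BITS) * AMPLITUDE_BITS

compression_ratio = full_table_bits / COMPRESSED_BITS
saving_percent = 100 * (1 - COMPRESSED_BITS / full_table_bits)

print(full_table_bits)
print(f"{compression_ratio:.1f}:1")
print(f"{saving_percent:.1f}%")
```

Under this assumption the full table needs 1,048,576 bits, which reproduces both the 122:1 compression ratio and the 99.2% saving reported in the abstract.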

15.
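The effectiveness of automatic text classification depends largely on the selection of attribute features. Traditional feature selection based on frequency-threshold filtering loses useful information and degrades classification accuracy, so an automatic text classification algorithm based on rough sets is proposed. The method discretizes the weighted feature attributes and builds a decision table; it then screens the condition attributes in the decision table according to dependency-based attribute significance; finally, it uses a heuristic algorithm based on conditional information entropy to reduce the text attribute features. Experimental results show that the method removes a large number of redundant feature attributes and improves the running efficiency of text classification without reducing classification accuracy.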

16.
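An efficient volume data compression algorithm and its application in seismic data processing   Cited by: 2 (self-citations: 0; cited by others: 2)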
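Direct volume rendering of large-scale volume data on programmable graphics hardware is often limited by the capacity of the graphics card, so data is swapped frequently between main memory and video memory, which becomes the rendering bottleneck. To address this, a vector quantization compression algorithm for large-scale volume data is proposed. The volume data is first partitioned into blocks, and each block is classified according to whether the average gradient value of its data is zero. Blocks with a nonzero gradient value are then represented with a three-level structure: for the second-highest and highest levels, an initial codebook is generated by a splitting method based on principal component analysis, and the LBG algorithm is used for codebook optimization and quantization; the lowest level and the blocks with zero gradient value are quantized with a fixed number of bits. Experimental results show that, while maintaining good image reconstruction quality, the algorithm achieves a compression ratio above 50:1 and faster decompression.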

17.
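To address nitrogen oxide (NOx) emissions from hybrid electric vehicles (HEVs), a diesel hybrid energy management strategy based on the CART decision tree algorithm is proposed. First, a classification algorithm combining decision trees and regression trees (Classification and Regression Trees, CART) is presented; working from categorical and variable features, it predicts the trend of individual cases from one or more predictor variables. Then, by controlling the torque split between the engine and the electric motor, an additional degree of freedom is introduced to adjust the optimization trade-off from the pure fuel-economy case to the pure NOx-limiting case. Finally, software-in-the-loop and hardware-in-the-loop simulation are used to understand system performance for a given powertrain configuration and to tune the proposed energy management strategy. Experimental results show the effect of NOx reduction on fuel consumption under the proposed strategy, and the potential to limit NOx emissions by selecting optimal operating points and limiting engine power. Compared with several other recent schemes of the same kind, the proposed scheme produces lower NOx emissions at equal fuel consumption, and it can reduce NOx significantly when fuel consumption decreases slightly.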

18.
The problem of sensor faults on an AC-drive system for an electric train is considered here. Intermittent disconnections of these sensors produce severe transient errors in the estimator for the control loop. This paper uses a bilinear model of the motors and model-based techniques to produce estimates of control variables that are tolerant to intermittent disconnections, without degrading performance. The paper shows how such a system can be verified in hardware, on a small test-rig with a DSP used to run the fault-tolerant algorithm.

19.
The shuffled frog leaping (SFL) optimization algorithm has been successful in solving a wide range of real-valued optimization problems. In this paper we present a discrete version of this algorithm (DSFL) and compare its performance with an SFL algorithm, a binary genetic algorithm (BGA), and a discrete particle swarm optimization (DPSO) algorithm on seven low-dimensional and five high-dimensional benchmark problems. The results demonstrate that the proposed DSFL outperforms the BGA and the DPSO in terms of both success rate and speed. On low-dimensional functions and for large values of tolerance the DSFL is slower than the SFL, but their success rates are equal. Part of this slowness could be attributed to the extra bits used for data coding. As the number of variables and the required precision of the answer increase, the DSFL performs very well in terms of both speed and success rate. For high-dimensional problems, for intrinsically discrete problems, and when the required precision of the answer is high, the DSFL is the most efficient method.

20.
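As integration density increases and operating voltage decreases, SRAM-based FPGA chips become more susceptible to single-event upsets (SEUs). A radiation-tolerant routing algorithm based on the general-purpose place-and-route tool VPR is proposed. It modifies the cost function of the relevant routing resource nodes to reduce bridging errors caused by SEUs, and its on-board test results are compared with those of VPR. Experimental results show that the routing algorithm improves the chip's fault tolerance by about 20% without adding hardware resources or introducing circuit redundancy.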

