Found 20 similar documents (search time: 0 ms)
1.
Use of low-precision logarithms can minimize power consumption and increase the speed of multiply-intensive signal-processing systems, such as FIR filters. Although straight table lookup is the most obvious way to compute the logarithm, Maenner claims to have discovered a technique that produces four extra bits at no cost. We analyze Maenner's technique and show that in fact the technique provides only one extra bit of precision. A related technique by Kmetz, which has never been analyzed before, is shown here to be more accurate than Maenner's. We compare these techniques to the more complex bipartite technique, and show that Kmetz's technique takes less memory for systems requiring fewer than ten bits of precision.
2.
Parameterized High Throughput Function Evaluation for FPGAs (cited by: 1; self-citations: 0; citations by others: 1)
This paper presents parameterized module-generators for pipelined function evaluation using lookup tables, adders, shifters, multipliers, and dividers. We discuss trade-offs involved between (1) full-lookup tables, (2) bipartite (lookup-add) units, (3) lookup-multiply units, (4) shift-and-add based CORDIC units, and (5) rational approximation. Our treatment mainly focuses on explaining method (3), and briefly covers the background of the other methods. For lookup-multiply units, we provide equations for estimating approximation errors and rounding errors which are used to parameterize the hardware units. The resources and performance of the resulting design can be estimated given the input parameters. A selection of the compared methods is implemented as part of the current PAM-Blox module generation environment. An example shows that the lookup-multiply unit produces competitive designs with data widths up to 20 bits when compared with shift-and-add based CORDIC units. Additionally, the lookup-multiply method or rational approximation can produce efficient designs for larger data widths when evaluating functions not supported by CORDIC.
3.
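In the design of floating-point dividers that use functional iteration or high-radix algorithms, obtaining a fairly accurate initial approximation of the divisor's reciprocal from a floating-point reciprocal lookup table at the start of the computation reduces the number of division iterations and shortens the latency. That is, the leading bits of the divisor serve as the table address, and the addressed location stores an initial reciprocal approximation that meets a given precision. The paper describes several methods for obtaining the reciprocal approximation in detail, including their algorithms, error bounds, and precision.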
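The table-seeded scheme described in this abstract can be illustrated with a minimal floating-point sketch. It is not the paper's method: the midpoint-based table construction, the function names `build_reciprocal_table` and `reciprocal`, and the choice of Newton-Raphson as the refinement step are all assumptions made for illustration.

```python
def build_reciprocal_table(index_bits):
    """Seeds for 1/d with d in [1, 2), addressed by the leading fraction bits.

    Assumption: each entry is the reciprocal of its interval's midpoint.
    """
    size = 1 << index_bits
    return [1.0 / (1.0 + (i + 0.5) / size) for i in range(size)]


def reciprocal(d, table, index_bits, iterations=2):
    """Approximate 1/d for a normalized divisor d in [1, 2)."""
    if not 1.0 <= d < 2.0:
        raise ValueError("divisor must be normalized to [1, 2)")
    # The leading bits of the fraction form the table address.
    index = int((d - 1.0) * (1 << index_bits))
    x = table[index]
    # Each Newton-Raphson step roughly squares the relative error.
    for _ in range(iterations):
        x = x * (2.0 - d * x)
    return x


table = build_reciprocal_table(4)
approx = reciprocal(1.37, table, 4)
```

With a 16-entry table the seed's relative error is at most about 1/32, so each extra address bit roughly halves the starting error, and a sufficiently accurate seed saves an iteration.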
4.
Increasing chip densities and transistor counts provide more room for designers to add functionality for important application
domains into future microprocessors. As a result of rapid growth in financial, commercial, and Internet-based applications,
hardware support for decimal floating-point arithmetic is now being considered by various computer manufacturers and specifications
for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754 Standard for Floating-Point Arithmetic
(IEEE P754). In this paper, we present an efficient arithmetic algorithm and hardware design for decimal floating-point division.
The design uses an efficient piecewise linear approximation, a modified Newton–Raphson iteration, a specialized rounding technique,
and a simplified decimal incrementer and decrementer. Synthesis results show that a 64-bit (16-digit) implementation of the
decimal divider, which is compliant with the current version of IEEE P754, has an estimated critical path delay of 0.69 ns
(around 13 FO4 inverter delays) when implemented using LSI Logic’s 0.11 micron Gflx-P standard cell library.
Michael J. Schulte
5.
6.
7.
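This paper proposes a new BP (backpropagation) network model and a corresponding algorithm. The model uses adaptive lookup-table units with nonlinear characteristics to simulate neuron synapses, giving network learning globally optimal convergence along with time-saving iterations and fast convergence.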
8.
Jing-Ming Guo 《电子科技学刊:英文版》2011,9(4):306-311
A halftone watermarking method of high quality, robustness, and capacity flexibility is presented in this paper. An objective halftone image quality evaluation method based on the human visual system obtained by a least-mean-square algorithm is also introduced. In the encoder, the kernels-alternated error diffusion (KAEDF) is applied. It is able to maintain the computational complexity at the same level as ordinary error diffusion. Compared with Hel-Or using ordered dithering, the proposed KAEDF yields a better image quality through using error diffusion. We also propose a weighted lookup table (WLUT) in the decoder instead of lookup table (LUT), as proposed by Pei and Guo, so as to achieve a higher decoded rate. As the experimental results demonstrate, this technique is able to guard against degradation due to tampering, cropping, rotation, and print-and-scan processes in error-diffused halftone images.
9.
Earl E. Swartzlander Jr. 《The Journal of VLSI Signal Processing》2007,49(1):177-183
The two’s complement fractional fixed-point number system is widely used to implement digital signal processing on VLSI chips.
It has a range of values from −1 to one least significant bit below +1. Either the multiplication of −1 • −1 or taking the
absolute value of −1 produces a result (+1) that cannot be represented. A new system, the negative two’s complement number
system, is described here that has a range of one least significant bit above −1 to +1 which eliminates the problem. This
paper presents the new number system and describes algorithms for the basic arithmetic operations.
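The range problem described in this abstract can be checked with a small sketch. It models only the two value ranges stated in the abstract, using plain integer codes where value = code / 2^(bits-1). It does not reproduce the paper's encoding or arithmetic algorithms, and the helper names `code_range`, `fits`, and `multiply` are hypothetical.

```python
def code_range(bits, negative_system=False):
    """Inclusive range of integer codes, where value = code / 2**(bits - 1)."""
    half = 1 << (bits - 1)
    if negative_system:
        # Range stated in the abstract: one LSB above -1 up to +1.
        return -half + 1, half
    # Conventional two's complement fraction: -1 up to one LSB below +1.
    return -half, half - 1


def fits(code, bits, negative_system=False):
    """True if the code is representable in the chosen system."""
    lo, hi = code_range(bits, negative_system)
    return lo <= code <= hi


def multiply(a, b, bits):
    """Product of two fractional codes, rescaled (floored) to the same format."""
    return (a * b) >> (bits - 1)


BITS = 8
minus_one = -(1 << (BITS - 1))                  # code for -1.0 (conventional system)
product = multiply(minus_one, minus_one, BITS)  # code for +1.0
overflow_conventional = not fits(product, BITS)
overflow_absolute = not fits(abs(minus_one), BITS)
```

In the conventional 8-bit system both (−1)·(−1) and |−1| give the code for +1, which falls outside the range. With the shifted range, −1 itself is no longer representable, so neither operation can leave the range.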
10.
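IP routing-table lookup is the main bottleneck in building high-performance routers. Based on the distribution characteristics of IP traffic and building on existing routing-table lookup techniques, a high-speed routing-table lookup algorithm based on traffic distribution is proposed.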
11.
12.
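Circuit structures are proposed for the boundary cell and internal cell of a systolic array that performs QR decomposition using Givens rotations. With only a modest increase in hardware cost, Newton-Raphson iteration and lookup-table techniques are applied to compute the reciprocal square root quickly. The synthesized circuit runs at a clock of up to 100 MHz, the data-processing period of both cells is 120 ns, and the array's data-processing speed is more than 40% higher than that reported in the related literature.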
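The combination of a lookup table and Newton-Raphson iteration for the reciprocal square root might look like the following floating-point sketch. The abstract gives no table size, input range, or iteration count, so the [1, 4) input range, the midpoint-based table, and the names `build_rsqrt_table` and `rsqrt` are assumptions for illustration only.

```python
def build_rsqrt_table(index_bits):
    """Seeds for 1/sqrt(a) with a in [1, 4), one entry per equal-width interval.

    Assumption: each entry is evaluated at its interval's midpoint.
    """
    size = 1 << index_bits
    return [(1.0 + 3.0 * (i + 0.5) / size) ** -0.5 for i in range(size)]


def rsqrt(a, table, index_bits, iterations=2):
    """Approximate 1/sqrt(a) for a in [1, 4)."""
    if not 1.0 <= a < 4.0:
        raise ValueError("input must be normalized to [1, 4)")
    size = 1 << index_bits
    # Clamp guards against rounding at the top of the range.
    index = min(int((a - 1.0) / 3.0 * size), size - 1)
    y = table[index]
    # Newton-Raphson step for f(y) = 1/y**2 - a.
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * a * y * y)
    return y


table = build_rsqrt_table(5)
approx = rsqrt(2.5, table, 5)
```

The iteration uses only multiplications and a subtraction, which is one reason the reciprocal square root is a convenient quantity to compute in hardware for Givens rotations.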
13.
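FPGA Implementation of a Root-Raised-Cosine Pulse-Shaping Filter (cited by: 2; self-citations: 1; citations by others: 2)
An FPGA architecture is proposed that uses circuit partitioning to implement the lookup-table method for the root-raised-cosine pulse-shaping filter at the transmitter of a communication system, saving ROM cells. The organization of the waveform data used to initialize the ROM is discussed, the architecture is implemented in VHDL, and timing simulation results in the ModelSim environment are given. Analysis of the simulation results shows that the design method is feasible. The design does not make the circuit more complex as the number of waveform samples grows, and the resulting shaping filter meets the needs of high-speed pulse-shaping applications.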
14.
15.
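This work addresses the error introduced by real-time data compression when the high-dynamic-range image sensor MT9M034 outputs images. The effect of this error on a CLAHE-based tone-mapping algorithm is analyzed, and an error-correction method and its FPGA implementation circuit are given. Experimental results show that the error-correction method reduces the effect of the error and improves the CLAHE tone-mapping result.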
16.
In this work we present an implementation of the exponential function in double precision, in a unit that supports IEEE floating-point arithmetic. Like existing proposals, the implementation is based on the use of a floating-point multiplier and additional hardware. We decompose the computation into three subexponentials. The first and third subexponentials are computed in a conventional way (table look-up and polynomial approximation). The second subexponential is computed based on a transformation of the slow radix-2 digit-recurrence algorithm into a fast computation by using the multiplier and additional hardware. We present a design process that permits the selection of the most convenient trade-off between hardware complexity and latency. We discuss the algorithm and the implementation, and perform a rough comparison with three proposed designs. Our estimations indicate that the implementation proposed in this work presents a better trade-off between hardware complexity and latency than the compared designs.
17.
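In 4G wireless communication systems, an automatic gain control (AGC) module is needed to extend the receiver's dynamic range while maintaining the system response time. A digital AGC algorithm based on detecting the average power of the 4G downlink synchronization code is proposed. Under the control of the synchronization signal, the algorithm uses a sliding window to extract, within a given time, the maximum of the average power of the correlation operation, then looks up the corresponding control word in a lookup table, and finally, after further computation, sends it to the digitally controlled attenuator. The algorithm was verified by simulation in MATLAB and then designed and implemented on an FPGA. Test results show that the algorithm functions correctly and that the system response time is short.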
18.
Elliptic curve cryptography (ECC) is recognized as a fast cryptography system and has many applications in security systems.
In this paper, a novel sharing scheme is proposed to significantly reduce the number of field multiplications and the usage
of lookup tables, providing high speed operations for both hardware and software realizations.
Brian King
19.