首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
To design a 32-bit logarithmic number system (LNS) processor, this paper presents two novel techniques: Digit-Partition (DP) to design log2(1.x) function and Iterative Difference by Linear Approximation (IDLA) to design 20.x function. The basic concept behind DP is that variablex can be divided into two parts in bit representation to be implemented. Thus, ROM or PLA table can be reduced to a reasonable size and this will make a high precision design allowable. The basic idea of IDLA is that the function 20.x can be obtained approximately through iterative linear approximations. By this method, only adder, shifter and a small PLA are required, unlike the previous designs which require ROM and multiplier. The experiment results reveal that the proposed design is more attractive than the previous researches in the LNS processor.This work was supported by the National Science Council under Grant NSC 84-2215-E002-020.  相似文献   

2.
Over the last few years, the Logarithmic Number System (LNS) has played a pivotal and decisive role in the field of Digital Signal Processing (DSP) and Image processing. Multiplication is a ubiquitous thirsty area to perform arithmetic operations in DSP applications and researchers have found that LNS is the possible solution for multiplication to be performed for a DSP application. In this paper, we propose a novel approach based on the Improved Operand Decomposition (IOD) to make an efficient logarithmic multiplier and subsequent achievement through scale realization. The Pipeline technique and the efficient correction circuit are used for error minimization at the cost of minimal hardware and delay. Reported and proposed multiplier is evaluated and compared in terms of Data Arrival Time (DAT), area, power, Area Delay Product (ADP), and EPS (Energy per Sample) at 90 nm CMOS technology by using Synopsys Design Compiler. Simulation results show that the proposed IOD method for logarithmic multiplication without the pipelining gives maximum of 35.39% less ADP and 11.15% less EPS for 32-bit architecture than of the reported logarithmic multiplier architecture. The proposed IOD based logarithmic multiplier with the pipelining gives a maximum of 20.17% less ADP for 8-bit architecture and 21.72% for 32-bit architecture than of the reported iterative pipelined architecture of logarithmic multiplier. Simulation results show that the optimized logarithmic converter gives 7.32%, and optimized antilogarithmic converter gives 41.59% less ADP respectively than of the reported logarithmic and antilogarithmic converter structures. The optimized antilogarithmic converter architecture gives a maximum of 43.94% less EPS than of the reported antilogarithmic converter structure.  相似文献   

3.
This paper presents a method for evaluating functions based on piecewise polynomial approximations (splines) with a hierarchical segmentation scheme targeting hardware implementation. The methodology provides significant reduction in table size compared to traditional uniform segmentation approaches. The use of hierarchies involving uniform splines and splines with size varying by powers of two is particularly well suited for the coverage of nonlinear regions. The segmentation step is automated and supports user-supplied precision requirements and approximation method. Bit-widths of the coefficients and arithmetic operators are optimized to minimize circuit area and enable a guarantee of 1 unit in the last place (ulp) accuracy at the output. A coefficient transformation technique is also described, which significantly reduces the dynamic ranges of the fixed-point polynomial coefficients. The hierarchical segmentation method is illustrated using a set of functions including $-(x/2) log_2{x}, cos^{-1}(x), sqrt{-ln(x)}$ , a high-degree rational function, $ln(1+x)$, and $1/(1+x)$. Various degree-1 and degree-2 approximation results for precisions between 8 to 24 bits are given. Hardware realizations are demonstrated on a Xilinx Virtex-4 field-programmable gate array (FPGA).   相似文献   

4.
Economical hardware often uses a FiXed-point Number System (FXNS), whose constant absolute precision is acceptable for many signal-processing algorithms. The almost-constant relative precision of the more expensive Floating-Point (FP) number system simplifies design, for example, by eliminating worries about FXNS overflow because the range of FP is much larger than FXNS for the same wordsize; however, primitive FP introduces another problem: underflow. The Signed Logarithmic Number System (SLNS) offers similar range and precision as FP with much better performance (in terms of power, speed and area) for multiplication, division, powers and roots. This paper proposes three variations of a new number system, respectively called the Denormal LNS (DLNS), Denormal Mitchell LNS (DMLNS) and Denormal Offset Mitchell LNS (DOMLNS), which are all hybrids of the properties of FXNS and SLNS. The inspiration for D(OM)LNS comes from the denormal (aka subnormal) numbers found in IEEE-754 (that provide better, gradual underflow) and the μ-law often used for speech encoding; the novel DLNS circuit here allows arithmetic to be performed directly on such encoded data. The proposed approach allows customizing the range in which gradual underflow occurs. Our first DLNS implementation leverages existing SLNS basic blocks. Synthesis shows the novel circuit primarily consists of traditional SLNS addition and subtraction tables, with additional datapaths that allow the novel arithmetic unit to act on conventional SLNS as well as DLNS and mixed data, for a worst-case area overhead of 26 %. Unlike SLNS, this DLNS implementation is still costly for general (non-constant) multiplication, division and roots. To overcome this difficulty, this paper proposes the other variations called Denormal Mitchell LNS (DMLNS) and Denormal Offset Mitchell LNS (DOMLNS), in which the well-known Mitchell’s method makes the cost of general multiplication, division and roots closer to that of SLNS. Taylor-series computations suggest subnormal values in DMLNS and DOMLNS also behave similarly to those in the IEEE-754 FP standard. Synthesis shows that DMLNS and DOMLNS respectively have average area overheads of 25 % and 17 % compared to an equivalent SLNS 5-operation unit.  相似文献   

5.
A logarithmic processor is proposed that uses external RAM for holding the table required for logarithmic subtraction. The proposed processor requires that the RAM be initialized before any computations occur. We give an algorithm to initialize the RAM using the limited arithmetic unit of the processor. The algorithm is ten times faster than a bit by bit computation of the logarithm and antilogarithm. Bounds are developed for comparing the error of this algorithm against the error of earlier algorithms. Simulation results show that this algorithm avoids catastrophic cancellation, and is as accurate as any previously known single precision algorith.  相似文献   

6.
A high-performance data execution unit suitable for computation-intensive digital signal processing systems is described. This unit uses the hybrid number system approach to speed up the basic arithmetic operations while remaining compatible with a standard IEEE 32-b floating-point format. However, all the arithmetic operations are performed in the 32 b logarithmic number system (LNS) domain. This chip is designed using a 3.4 V 0.8 μm CMOS technology with double-layer metallization. Conversion algorithms, chip architecture, design methodology, and major circuit components are discussed. A macrocell design methodology is adopted in order to achieve high-performance custom design circuits with the convenience of an automatic layout system. Computer simulations indicate that all the 32 b floating-point arithmetic operations (multiplication, division, squaring, and square root) can be executed in 10 ns. Extension of this unit into a 64 b double-precision floating-point system and multiply-accumulation applications are also presented  相似文献   

7.
对数数值系统的研究   总被引:1,自引:0,他引:1  
李炜  沈绪榜 《微电子学与计算机》2004,21(12):154-158,162
对数数值系统(LNS)能够简化部分算术运算,从而能够提高运算效率以及降低系统功耗。文章从各个方面对LNS进行了详细的分析与讨论,主要包括LNS下数的表示、精度、范围以及LNS下数的运算,并且讨论了LNS数与普通浮点数之间的相互转换。同时对减小用于加减法运算中的查找表的大小提出了几点建议,最后针对LNS的低功耗问题做了简要的讨论。  相似文献   

8.
A low-power and high-performance 4-way 32-bit stream processor core is developed for handheld low-power 3-D graphics systems. It contains a floating-point unified matrix, vector, and elementary function unit. By exploiting the logarithmic arithmetic and the proposed adaptive number conversion scheme, a 4-way arithmetic unit achieves a single-cycle throughput for all these operations except for the matrix-vector multiplication that takes 2 cycles per result, which were 4 cycles in conventional way. The processor featured by this functional unit and several proposed architectural schemes including embedded register index calculations, functional unit reconfiguration, and operand forwarding in logarithmic domain achieves 19.1% cycle count reduction for OpenGL transformation and lighting (TnL) operation from the latest work.   相似文献   

9.
提出了一种基于电流模式的算法型A/D转换器电路结构,分析说明了其基本原理和具体实现方法;构造了一个6位50 MSPS算法型A/D转换器,给出了在OrCAD/PSpice 10.5下的仿真结果,得出用电流模式电路设计高速A/D转换器有优势的结论.  相似文献   

10.
With the upcoming fourth generation wireless systems and convergence of multiple radio standards into a single terminal, building blocks are needed that can be configured for computing various algorithms used in different standards. Given these trends and requirements, in our previous paper we successfully proposed a circuit-sharing design for the DAB receiver to integrate the FFT (fast Fourier transform) and IMDCT (inverse modified discrete cosine transform) operations into the same functional circuit. The proposed technique reduces hardware overhead, enhances circuit efficiency and significantly reduces the cost of DAB receivers. To further improve the efficiency of our previous work, this investigation proposes another alternative design named the circuit-sharing pipeline design (CSPD) using a single processor with a pipeline scheme to combine two functions, FFT and IMDCT, into the same circuit. The proposed method reduces the required chip area and cost of DAB receivers. Analyzing the existing relationship among IMDCT, DCT (discrete cosine transform) and FFT, the IMDCT function can be replaced using a FFT function. Therefore, it is not necessary to design an extra circuit for IMDCT. The arithmetic unit in the FFT processor can be significantly reduced due to employing the pipeline scheme. Consequently, the circuit redundancies in the IMDCT and FFT functions can be easily eliminated to allow exploitation of the decreased chip area. Results of this study demonstrate that the proposed design can provide advantages such as low gate count and small memory size in the DAB (digital audio broadcasting) receiver.  相似文献   

11.
The authors discuss employing alternative number systems to reduce power dissipation in portable devices and high-performance systems. They focus on two alternative number systems that are quite different from the conventional linear number representations, namely the logarithmic number system (LNS) and the residue number system (RNS). Both have recently attracted the interest of researchers for their low-power properties. The authors address aspects of the conventional arithmetic representations, the impact of logarithmic arithmetic on power dissipation, and discuss the low-power aspects of residue arithmetic  相似文献   

12.
A low-power, area-efficient four-way 32-bit multifunction arithmetic unit has been developed for programmable shaders for handheld 3D graphics systems. It adopts the logarithmic number system (LNS) at the arithmetic core for the single-cycle throughput and the small-size low-power unification of various complicated arithmetic operations such as power, logarithm, trigonometric functions, vector-SIMD multiplication, division, square root and vector dot product. 24-region and 16-region piecewise linear logarithmic and antilogarithmic converters are proposed with 0.8% and 0.02% maximum conversion error, respectively. All the supported operations are implemented with less than 6.3% operation error and unified into a single arithmetic platform with maximum four-cycle latency and single-cycle throughput. A 93 K gate test chip is fabricated using one-poly five-metal 0.18-mum CMOS technology. It operates at 210 MHz with maximum power consumption of 15.3 mW at 1.8 V.  相似文献   

13.
This paper provides a personal perspective on developments in the implementation of two systolic fast Fourier transform processors over the last 25 years and identifies some of the lessons learned. This has been a period of tremendous advancements in integrated circuit technology that is demonstrated by the resulting processors. The first processor is the Modular Transform Processor that was developed at TRW in the 1982–1984 time frame using VLSI technology. It is a set of six large circuit boards that computes 4,096-point fast Fourier transforms using 22-bit floating-point arithmetic at sustained data rates of 40 MSPS. The second processor is a single ASIC chip systolic FFT processor developed by the Mayo Foundation in the 2001–2002 time frame that computes 4,096-point FFTs using 16-bit fixed-point arithmetic at sustained data rates of 200 MSPS. Some thoughts on the future directions of systolic FFT processor development are offered. Future systems will compute large transforms (e.g., 16 K-point to 1 M-point) at high data rates (e.g., 500 MSPS to 1 GSPS), will employ more precise arithmetic (e.g., 32-bit single precision IEEE Standard floating-point arithmetic), will consume very low power (e.g., on the order of one watt) and will be realized on a single chip.  相似文献   

14.
岳梦云  白冰 《电子学报》2000,48(10):2041-2046
本文设计了一种适用于电机矢量控制算法的数字信号处理系统的微架构定义,包括其指令集定义、存储器模型以及与主CPU的交互模式.该设计具有通过固定部分多操作数有效缩减指令编码长度提高代码密度以及后台执行多周期指令提高ALU并行效率的显著优点.文中给出了典型的FOC控制算法在DSP (Digital Signal Processor)指令集上实现的指令周期数,也给出了对应架构的电路实现情况,最终以ARM CORTEX-M0及几款主流DSP作为比较基线,通过实测实验数据证明了体系结构的高能效比,以较为有限的电路面积代价,极大提高了集成DSP的嵌入式系统的运行效率.  相似文献   

15.
The architecture and implementation of a programmable video signal processor dedicated as building block of a multiple instruction multiple data (MIMD)-based bus-connected multiprocessor system is presented. This system can either be constructed from several single processor chips, or it can be integrated on a large area integrated circuit containing several processors. The processor allows an efficient implementation of different video coding standards like H.261, H.263, MPEG-1 and MPEG-2. It consists of a RISC processor supplemented by a coprocessor for computation intensive convolution-like tasks, which provides a peak performance of more than 1 giga-arithmetic operations per second (GOPS). A large area integrated circuit integrating 9 processor elements (PE's) on an area of 16.6 cm2 has been designed. Due to yield considerations redundancy concepts have been implemented, that-even in the presence of production defects-result in working chips utilizing a lower number of PE's. Each PE has built-in self-test (BIST) capabilities, which allow for an independent test of itself under the control of its integrated fault-tolerant BIST controller. Defective PE's are switched off. Only the PE's passing the BIST are used for video processing tasks. Prototypes have been fabricated in a 0.8 μm complementary metal-oxide-semiconductor (CMOS) process structured by masks using wafer stepping with overlapping exposures. Employing redundancy, up to 6 PE's per chip were functional at 66 MHz, thus providing a peak arithmetic performance of up to 6 GOPS  相似文献   

16.
Many early vision tasks require only 6 to 8 b of precision. For these applications, a special-purpose analog circuit is often a smaller, faster, and lower power solution than a general-purpose digital processor, but the analog chips lack the programmability of digital image processors. This paper presents a programmable mixed-signal array processor which combines the programmability of a digital processor with the small area and low power of an analog circuit. Each processor cell in the array utilizes a digitally programmable analog arithmetic unit with an accuracy of 1.3%. The analog arithmetic unit utilizes a unique circuit that combines a cyclic switched-capacitor analog-to-digital converter (ADC) and digital-to-analog converter (DAC) to perform addition, subtraction, multiplication, and division, Each processor cell, fabricated in a 0.8-μm triple-metal CMOS process, operates at a speed of 0.8 MIPS, consumes 1.8 mW of power at 5 V, and uses 700 μm by 270 μm of silicon area. An array of these processor cells performed an edge detection algorithm and a subpixel resolution algorithm  相似文献   

17.
This paper presents floating point design and implementation of System on Chip (SoC) based Differential Evolution (DE) algorithm using Xilinx Virtex-5 Field Programmable Gate Array (FPGA). The hardware implementation is carried out to enhance the execution speed of the embedded applications. Intellectual Property (IP) of DE algorithm is developed and interfaced with the 32-bit PowerPC 440 processor using processor local bus (PLB) of Xilinx Virtex-5 FPGA. In the proposed architecture the algorithmic parameters of DE are scalable. The software and hardware implementation of the DE algorithm is carried out in PowerPC embedded processor and hardware IP respectively. The optimization of numerical benchmark functions and system identification in control systems are implemented to verify the proposed hardware SoC platform. The performance of the IP is measured in terms of acceleration gain of the DE algorithm. The optimization problems are solved by using floating point arithmetic in both embedded processor and hardware. The experimental result concludes that the hardware DE IP accelerates the execution speed approximately by 200 times compared to equivalent software implementation of DE algorithm on PowerPC 440 processor. Further, as a case study an Infinite Impulse Response (IIR) based system identification task on SoC using the developed hardware accelerator is implemented.  相似文献   

18.
This paper presents new techniques to implement direct digital frequency synthesizers (DDFSs) with optimized piecewise-polynomial approximation. DDFS performances with piecewise-polynomial approximation are first analyzed, providing theoretical upperbounds for the spurious-free dynamic range (SFDR), the maximum absolute error, and the signal-to-noise ratio. A novel approach to evaluate, with reduced computational effort, the near optimal fixed-point coefficients which maximize the SFDR is described. Several piecewise-linear and quadratic DDFS are implemented in the paper by using novel, single-summation-tree architectures. The tradeoff between ROM and arithmetic circuits complexity is discussed, pointing out that a sensible silicon area reduction can be achieved by increasing ROM size and reducing arithmetic circuitry. The use of fixed-width arithmetic can be combined with the single-summation-tree approach to further increase performances. It is shown that piecewise-quadratic DDFSs become effective against piecewise-linear designs for an SFDR higher than 100 dBc. Third-order DDFSs are expected to give advantages for an SFDR higher than 180 dBc. The DDFS circuits proposed in this paper compare favorably with previously proposed approaches.  相似文献   

19.
基于分布式算法的色彩空间转换的硬件实现   总被引:1,自引:0,他引:1  
张娜  郭炜 《信息技术》2007,31(3):26-28,122
用硬件实现色彩空间转换有多种方法.现比较了查表法、直接硬件运算法和分布式算法,并对分布式算法提出改进,进而使其硬件实现的面积减小约50%,速度进一步提高.详细介绍了改进的分布式算法及硬件实现方法,给出了相应的硬件面积、延时、画面的质量等结果.  相似文献   

20.
This paper illustrates the relationship between clock and data rates for a variety of generic processor architectures. A significant result for signal processing is that if memory and arithmetic chips operate at the same clock rate, processor performance is constrained by the rate at which memory may be accessed. The paper presents a case study of the implementation of a typical generic signal processor, the AOSP Macro Function Signal Processor using Very High Speed Integrated Circuit (VHSIC) technology. It is shown that VHSIC technology, with high speeds and high levels of integration is well suited to the implementation of this generic signal processor.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号