Similar Literature
20 similar articles found
1.
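Building on a fixed-point SoC (System-on-Chip) architecture that consists of a general-purpose RISC processor core and an attached fixed-point hardware accelerator, a novel word-length design method for fixed-point hardware accelerators based on statistical analysis is proposed. The method uses statistical parameters to solve mathematically for the minimum word length that meets a given signal-to-noise ratio requirement. It effectively reduces chip area, power consumption, and manufacturing cost, so that applications of high computational complexity can run on low-cost RISC-core SoCs that have no DSP coprocessor.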
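The abstract does not give its statistical model, so the following is only a minimal sketch of the general idea of deriving a word length from an SNR target. It assumes the textbook rule for a full-scale sine wave through an ideal uniform quantizer, SQNR ≈ 6.02·b + 1.76 dB, which is not necessarily the model used in the paper; the function name `min_word_length` is hypothetical.

```python
import math


def min_word_length(target_sqnr_db: float) -> int:
    """Smallest word length b (in bits) whose ideal SQNR meets the target.

    Assumption: full-scale sine input and an ideal uniform quantizer, for
    which SQNR is approximately 6.02 * b + 1.76 dB. This is a textbook
    rule of thumb, not the statistical method of the abstract above.
    """
    bits = (target_sqnr_db - 1.76) / 6.02
    return max(1, math.ceil(bits))


if __name__ == "__main__":
    for target in (40.0, 60.0, 90.0):
        print(f"target {target} dB -> {min_word_length(target)} bits")
```

Under this model each additional bit buys roughly 6 dB, which is why trimming even a few bits per operand can matter for accelerator area and power.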

2.
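FPGA Implementation of Optimal-Precision Fixed-Point Arithmetic   Total citations: 1 (self-citations: 0, citations by others: 1)
邵正芬, 《通信技术》 (Communications Technology), 2009, 42(7): 279-281
At present, most general-purpose FPGA chips support arithmetic only on integers and standard logic vectors. Integer arithmetic offers a small numeric range and low precision, and generally cannot meet the accuracy requirements of digital filters and digital controllers, which limits the use of FPGAs for high-speed numerical computation, numerical analysis, and signal processing. To make FPGAs better suited to digital signal processing, this paper studies how to implement optimal-precision fixed-point arithmetic in a hardware description language. It focuses on the representation and scaling of fixed-point numbers, the arithmetic rules that preserve optimal precision, and the VHDL implementation of wide optimal-precision fixed-point adders and multipliers, and then extends the approach to fixed-point subtractors and dividers.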
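The scaling rules mentioned in this abstract can be illustrated with a short sketch. It is written in Python rather than VHDL, and it assumes the usual Q-format convention in which a value is stored as an integer together with a count of fractional bits; all function names are hypothetical and the paper's actual rules may differ.

```python
def to_fixed(x: float, frac_bits: int) -> int:
    """Encode x as an integer with frac_bits fractional bits (round to nearest)."""
    return int(round(x * (1 << frac_bits)))


def to_float(q: int, frac_bits: int) -> float:
    """Decode a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_add(a: int, a_frac: int, b: int, b_frac: int):
    """Align both operands to the finer scale, then add; no bits are dropped."""
    frac = max(a_frac, b_frac)
    total = (a << (frac - a_frac)) + (b << (frac - b_frac))
    return total, frac


def fixed_mul(a: int, a_frac: int, b: int, b_frac: int):
    """Full-precision product: the fractional bit counts add."""
    return a * b, a_frac + b_frac


if __name__ == "__main__":
    a, a_frac = to_fixed(0.5, 15), 15   # 0.5 with 15 fractional bits
    b, b_frac = to_fixed(0.25, 7), 7    # 0.25 with 7 fractional bits
    s, s_frac = fixed_add(a, a_frac, b, b_frac)
    p, p_frac = fixed_mul(a, a_frac, b, b_frac)
    print(to_float(s, s_frac))  # 0.75
    print(to_float(p, p_frac))  # 0.125
```

Python integers never overflow, so the sketch only tracks the position of the binary point; a hardware adder or multiplier would also have to grow the word width (one extra bit for a sum, the sum of both widths for a product) to keep every bit.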

3.
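Research on an Infrared Image Processing Circuit Based on a High-Speed DSP   Total citations: 15 (self-citations: 5, citations by others: 10)
An image processing system based on the high-performance fixed-point digital signal processor (DSP) TMS320C6201 is introduced. The system mainly comprises a driver circuit module, A/D and D/A conversion modules, a digital signal processing module, and a video display module. Because it uses a high-performance, low-power processing chip, the whole system offers good real-time behavior, stable performance, high precision, small size, and low power consumption, and it has broad application prospects in infrared thermal imaging systems.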

4.
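An image processing system based on the high-performance fixed-point digital signal processor (DSP) TMS320C6201 is introduced. The system mainly comprises a driver circuit module, A/D and D/A conversion modules, a digital signal processing module, and a video display module. Because it uses a high-performance, low-power processing chip, the whole system offers good real-time behavior, stable performance, high precision, small size, and low power consumption, and it has broad application prospects in infrared thermal imaging systems.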

5.
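A signal processing system for a radar-wave life detector is designed. The system is built mainly around TI's high-performance floating-point DSP chip TMS320C6711B and AD's 16-bit A/D converter chip AD7707, and it offers small size, low power consumption, strong performance, fast processing, and real-time multi-window display. The hardware circuit structure and software design are given, together with the acquisition and processing flow for vital-sign signals and the overall algorithm flow for identifying and displaying the state and number of human subjects.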

6.
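Over the past decade, wireless base station designers have made great progress in reducing cost, power consumption, and board space. For these designers the goal of 3G base station development is very clear: ten times the bandwidth at one tenth of the cost. The processing power needed for baseband algorithms keeps growing as new wireless protocols appear. Conventional digital signal processors (DSPs) are not fast enough for baseband processing on their own, so hardware acceleration is needed to complement them. A typical architecture, where multi-channel processing is required, may consist of a set of DSPs together with hardware accelerator modules on the baseband card.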

7.
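Taking into account the small size and low power consumption required of hearing aids, this paper designs a development platform for real-time speech processing algorithms based on an embedded system. The hardware core of the platform consists mainly of a Cortex-A8 embedded processor and an FPGA chip, and it contains four key modules: an audio input module, an internal clock module, an FPGA control module, and a signal processing module. To improve processing efficiency, an FPGA-based multi-channel speech conversion module is designed. To verify the platform's performance, a hearing-aid speech enhancement algorithm based on Wiener filtering is designed and implemented, and subjective tests show good results.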

8.
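The performance of ADI's (Analog Devices) new-generation fixed/floating-point compatible digital signal processing chip ADSP-TS101S TigerSHARC is compared and analyzed against that of TI's (Texas Instruments) high-performance fixed-point digital signal processing chip TMS320C6416, and the functional structure, characteristics, and scope of application of the two are summarized.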

9.
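When selecting a digital signal processor, choosing an appropriate data format is one of the key design decisions, because floating-point and fixed-point data types need different kinds of supporting hardware. The data format a digital signal processor uses determines its ability to handle signals of differing precision, dynamic range, and signal-to-noise ratio. When deciding on the right format for representing data in an application, ease of use and time to market are equally important considerations.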

10.
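This paper introduces a new scheme that uses the TMS320C542 chip for digital signal processing in a 500 kV OMU (Optical Metering Unit), and describes in detail the hardware design of the fixed-point DSP system, the design of the peripheral chips, and the software algorithms of the OMU system. Finally, the overall performance of the complete OMU system and the test conclusions are given.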

11.
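The floating-point multiplier is a key arithmetic unit in systems such as high dynamic range (HDR) image processing and wireless communication. Compared with a fixed-point multiplier it offers a wider dynamic range at the price of higher complexity. Approximate computing, an emerging paradigm, can greatly reduce hardware resource and power overhead within a bounded loss of accuracy. This paper proposes a 16-bit half-precision approximate floating-point multiplier (App-Fp-Mul). For the mantissa multiplication module, based on the probability of a 1 appearing in the partial product array, it proposes an approximate 4-2 compressor that is insensitive to input order, together with an OR-gate compression method for the low-order bits, which effectively reduces the multiplier's resources and power with only a small loss of accuracy. Compared with an exact design, the proposed multiplier reduces area by 20% and power-delay product by 58% at a normalized mean error distance (NMED) of 0.0014. Compared with existing approximate designs, it achieves higher accuracy and a smaller power-delay product at the same approximate bit width. Finally, the proposed multiplier is applied to HDR image processing, where it reaches a peak signal-to-noise ratio of 83.16 dB and a structural similarity of 99.9989%, a significant improvement over existing mainstream schemes.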

12.
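苏丽, 《电子科技》 (Electronic Science and Technology), 2013, 26(5): 71-73
In both radar systems and communication systems, data take part in computation in floating-point form when signal processing methods are simulated, but as fixed-length binary words once the algorithm is ported to hardware. This paper describes how to use the Matlab fixed-point toolbox to convert data from floating point to fixed point and, through a design example, shows how fixed-point simulation is applied on an FPGA verification platform. Practice shows that fixed-point simulation is a prerequisite for FPGA implementation and can also be used to verify the correctness of results computed in the FPGA.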
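The floating-point to fixed-point conversion described here can be sketched in a few lines. The example below uses plain Python as a stand-in for the Matlab toolbox named in the abstract, and it assumes a signed two's-complement format with round-to-nearest and saturation; the function `quantize` and its parameters are hypothetical.

```python
def quantize(x: float, word_bits: int, frac_bits: int) -> float:
    """Map x onto a signed fixed-point grid and return the represented value.

    Assumptions: two's-complement range, round-to-nearest, and saturation
    (clamping) instead of wrap-around on overflow.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    code = max(lowest, min(highest, round(x * scale)))
    return code / scale


if __name__ == "__main__":
    samples = [0.1, -0.7, 0.333, 0.62]
    for word_bits in (8, 12, 16):
        frac_bits = word_bits - 1  # all samples lie inside [-1, 1)
        worst = max(abs(s - quantize(s, word_bits, frac_bits)) for s in samples)
        print(f"{word_bits}-bit word: worst-case error {worst:.2e}")
```

Running the same stimulus through such a quantized model and through the floating-point reference gives the bit-true values against which hardware output can be compared.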

13.
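A high-throughput, flexibly reconfigurable floating-point fast Fourier transform (FFT) processor can serve many applications, such as real-time imaging in advanced radar and high-precision scientific computing. Floating-point arithmetic is more complex than fixed-point arithmetic, which makes the conflict between throughput on the one hand and area and power on the other especially acute for floating-point FFTs. To reduce computational complexity, a large FFT is first decomposed into several cascaded radix-2^k sub-stages of smaller size, and optimized mixed-radix algorithms are proposed for 128/256/512/1024/2048-point FFTs. Combined with a new fused add-subtract and dot-product arithmetic unit that supports both a single-channel single-precision mode and a dual-channel half-precision mode, a high-throughput dual-mode floating-point variable-length FFT processor architecture is proposed for the first time, and it is designed and implemented in a 28 nm standard CMOS process. Experimental results show that the throughput and average output signal-to-quantization-noise ratio are 3.478 GSample/s and 135 dB in the single-channel single-precision mode, and 6.957 GSample/s and 60 dB in the dual-channel half-precision mode. The normalized throughput-to-area ratio is about 12 times higher than that of other existing floating-point FFT implementations.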

14.
The Extended Kalman Filter (EKF) computation is a core task for the simultaneous localization and mapping (SLAM) problem in autonomous mobile robots. The SLAM problem involves operations over high-dimensional data sets and requires high throughput and performance, given the real-time nature of the robotic control-decision algorithm that this task is part of. The lightweight and power-restricted computing environments in mobile robotics require customized processing systems such as Field-Programmable Gate Arrays (FPGAs). This work presents an arithmetic precision analysis and a hardware architecture implementation of the Faddeev algorithm, which calculates the Schur complement for EKF-SLAM, using a Systolic Array (SA). While it is widely believed that fixed-point implementations of arithmetic operations lead to area and performance benefits on FPGAs, the results in this article reveal that each Processing Element (PE) in the SA consumes 25% more logic and about 30% more register resources for the fixed-point 13.23 representation than if using the IEEE-754 single-precision floating-point format. In addition, for FPGA devices with hardware support for key components of floating-point computations, a single PE floating-point implementation can achieve a maximum frequency up to 50% higher than a corresponding fixed-point implementation for the same relative numeric errors.

15.
A new model for predicting truncation error variance in fixed-point filter implementations is introduced. The proposed model is shown to be more accurate than existing models, particularly for some direct hardware implementations. In addition, some comments are made on the applicability of existing error models.

16.
A 36 mm² graphics processor with a fixed-point programmable vertex shader is designed and implemented for portable two-dimensional (2-D) and three-dimensional (3-D) graphics applications. The graphics processor contains an ARM-10 compatible 32-bit RISC processor, a 128-bit programmable fixed-point single-instruction-multiple-data (SIMD) vertex shader, a low-power rendering engine, and a programmable frequency synthesizer (PFS). Different from conventional graphics hardware, the proposed graphics processor implements an ARM-10 co-processor architecture with dual operations so that user-programmable vertex shading is possible for advanced graphics algorithms and various streaming multimedia processing in mobile applications. The circuits and architecture of the graphics processor are optimized for fixed-point operations and achieve low power consumption with the help of instruction-level power management of the vertex shader and pixel-level clock gating of the rendering engine. The PFS with a fully balanced voltage-controlled oscillator (VCO) controls the clock frequency from 8 MHz to 271 MHz continuously and adaptively for low-power modes by software. The chip shows 50 Mvertices/s and 200 Mtexels/s peak graphics performance, dissipating 155 mW in a 0.18-µm 6-metal standard CMOS logic process.

17.
This paper presents a novel modified Coordinate Rotation Digital Computer (CORDIC) architecture that computes values of sine and cosine in a single cycle. The proposed method utilises an angle-recoding technique to design a modified CORDIC algorithm. Multiple iterations are merged in the modified algorithm using memory storage for the initial iterations and employing inverse recoding to generate constant multiplication factors for the remaining iterations. The scale factor of the algorithm remains constant, as these factors are independent of the intermediate directions of rotation. In addition, the architecture is mapped onto a single CORDIC computation element that requires only a single cycle to compute the result. These multiplications are implemented using dedicated hardware multipliers in Field Programmable Gate Arrays and customised fixed-point multiplication techniques for Application Specific Integrated Circuits. Implementation results show that the proposed IS-CORDIC architecture is 7.9 times more efficient than basic CORDIC and has a lower area-delay product than current state-of-the-art implementations.
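For context, the basic iterative CORDIC that this abstract uses as its baseline can be sketched as follows. This is the conventional rotation-mode algorithm, one micro-rotation per iteration, and not the single-cycle IS-CORDIC architecture the paper proposes. It uses floating-point arithmetic for readability, whereas hardware would use fixed-point shifts and adds; the function name and the iteration count are illustrative assumptions.

```python
import math


def cordic_sin_cos(theta: float, iterations: int = 24):
    """Rotation-mode CORDIC for |theta| <= pi/2; returns (sin, cos).

    Each step rotates the vector by +/- atan(2**-i), which in hardware
    needs only a shift and an add. The accumulated gain is constant for a
    fixed iteration count, so it is folded into the starting vector.
    """
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward z == 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(math.pi / 6)
    print(s, c)  # close to 0.5 and sqrt(3)/2
```

Because the rotation direction `d` is only known after the previous step, this baseline is inherently sequential, which is the dependency that merged-iteration schemes such as the one described above aim to remove.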

18.
The efficient hardware implementation of signal processing algorithms requires a rigid characterization of the interdependencies between system parameters and hardware costs. Pure software simulation of bit-true implementations of algorithms with high computational complexity is prohibitive because of the excessive runtime. Therefore, we present a field-programmable gate array (FPGA) based hybrid hardware-in-the-loop design space exploration (DSE) framework combining high-level tools (e.g. MATLAB, C++) with a System-on-Chip (SoC) template mapped on FPGA-based emulation systems. This combination significantly accelerates the design process and characterization of highly optimized hardware modules. Furthermore, the approach helps to quantify the interdependencies between system parameters and hardware costs. The achievable emulation speedup using bit-true hardware modules is key to enabling the optimization of complex signal processing systems using Monte Carlo approaches, which are infeasible for pure software simulation due to the large stimuli sets required. The framework supports a divide-and-conquer approach through a flexible partitioning of complex algorithms across the system resources on different layers of abstraction. This makes it easier to split the design process efficiently among different teams. The presented framework comprises a generic state-of-the-art SoC infrastructure template, a transparent communication layer including MATLAB and hardware interfaces, module wrappers, and DSE facilities. The hardware template is synthesizable for a variety of FPGA-based platforms. Implementation and DSE results are presented for two case studies from different application fields: synthetic aperture radar image processing and interference alignment in communication systems.

19.
Given the popularity of decimal arithmetic, hardware implementation of decimal operations has been a hot topic of research in recent decades. Besides the four basic operations, the square root can be implemented as an instruction directly in the hardware, which improves the performance of the decimal floating-point unit in processors. Hardware implementation of decimal square rooters is usually done using either functional or digit-recurrence algorithms. Functional algorithms, entailing a multiplication per iteration, seem inadequate for decimal square roots, given the high cost of decimal multipliers. On the other hand, digit-recurrence square root algorithms, particularly SRT algorithms (named after their creators, Sweeney, Robertson, and Tocher), are simple and well suited for decimal arithmetic. This paper, with the intention of reducing the latency of the decimal square root operation while maintaining a reasonable cost, proposes an SRT algorithm and the corresponding hardware architecture to compute the decimal square root. The proposed fixed-point square root design requires n+3 cycles to compute an n-digit root; the synthesis results show an area cost of about 31K NAND2 and a cycle time of 40 FO4. These results reveal a 14% speed advantage of the proposed decimal square root architecture over the fastest previous work (which uses a functional algorithm) with about a quarter of the area.
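The idea of a digit-recurrence square root, producing one decimal digit of the root per step, can be illustrated with the classic pencil-and-paper method. The sketch below is a simplification: it selects each digit by exact comparison (a restoring scheme with digits 0 to 9), whereas SRT uses a redundant digit set and estimates each digit from a few leading bits of the remainder. The function name is hypothetical.

```python
def decimal_sqrt_digits(value: int, frac_digits: int = 0) -> int:
    """Return floor(sqrt(value) * 10**frac_digits), one decimal digit per step.

    Simplified restoring recurrence, not SRT: for each pair of input
    digits, pick the largest digit d with (20*root + d) * d <= remainder.
    """
    digits = str(value)
    if len(digits) % 2:              # pad to an even number of digits
        digits = "0" + digits
    digits += "00" * frac_digits     # each extra pair yields one more root digit

    root, rem = 0, 0
    for i in range(0, len(digits), 2):
        rem = rem * 100 + int(digits[i:i + 2])   # bring down the next pair
        d = 9
        while (20 * root + d) * d > rem:         # exact digit selection
            d -= 1
        rem -= (20 * root + d) * d
        root = root * 10 + d
    return root


if __name__ == "__main__":
    print(decimal_sqrt_digits(2, 4))   # 14142, i.e. sqrt(2) ~ 1.4142
    print(decimal_sqrt_digits(144))    # 12
```

The inner `while` loop is the expensive part in hardware, since it needs up to nine full-width comparisons per digit; replacing it with a cheap estimate is the motivation for SRT-style digit selection.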

20.
This paper proposes a hardware–software (HW-SW) co-simulation framework that provides a unified system-level power estimation platform for efficiently analyzing both the total power consumption of the target SoC and the power profiles of its individual components. The proposed approach employs a trace-based technique that reflects the real-time behavior of the target SoC by applying various operation scenarios to its high-level model. The trace data, together with a corresponding look-up table (LUT), is utilized for the power analysis. The trace data is also used to reduce the number of input vectors required to analyze the power consumption of large hardware designs, through trade-offs between the signal probability in the trace results and its effect on the power consumption. The effect on power of cache misses occurring during software program execution is also considered in the proposed framework. The performance of the proposed approach was evaluated through a case study using the SoC design example of an IEEE 802.11a wireless LAN modem. The case study illustrated that, by providing fast and accurate power analysis results, the proposed approach can enable SoC designers to manage power consumption effectively through the reconstruction of the target SoC. The proposed framework maps all hardware IPs onto an FPGA. The trace-based approach obtains input vectors at the transactor of each IP and obtains the power consumption by indexing a LUT. This hardware-oriented technique reports the power estimation result faster than conventional techniques that do so at the software level.
