Similar literature
Found 20 similar documents; search took 31 ms.
1.
A design technique based on a combination of Common Sub-Expression Elimination and Bit-Slice (CSE-BitSlice) arithmetic for hardware and performance optimization of multiplier designs with variable operands is presented in this paper. The CSE-BitSlice technique can be extended to hardware optimization of multiplier circuits operating on vectors or matrices of variables. The CSE-BitSlice technique has been applied to the design and implementation of 12 × 12 and 42 × 42 bit real multipliers, a complex multiplier, a 6-tap FIR filter, and a 5-point DFT circuit. For comparison purposes, circuit implementations of the same arithmetic and DSP functions have been carried out using Radix-4 Booth and CSA algorithms. Simulation results based on implementations using the Xilinx FPGA 5VLX330FF1760-2 device show that the circuits based on the CSE-BitSlice technique require fewer logic resources and yield higher throughput than the CSA and Radix-4 Booth based circuits.

2.
The High Efficiency Video Coding (HEVC) transform algorithm for residual coding uses two-dimensional (2D) 4×4 transforms with higher precision than H.264's 4×4 transforms, which increases hardware complexity. In this paper, we present a shared architecture that can compute the 4×4 forward discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) of HEVC using a new mapping scheme in the video processor array structure. The architecture is implemented with only adders and shifters to achieve an area-efficient design. The proposed architecture is synthesized using ISE 14.7 and implemented on the BEE4 platform with the Virtex-6 FF1759 LX550T field programmable gate array (FPGA). The results show that the video processor array structure achieves a maximum operating frequency of 165.2 MHz. The architecture and its implementation are presented to demonstrate its programmability and high performance.
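As an illustration of how a 4-point HEVC transform can be computed with only adders and shifts, the following minimal sketch implements one 1-D forward pass with a generic even/odd butterfly. It is not the mapping scheme of the paper, which the abstract does not describe; the function names are hypothetical, and the intermediate scaling and rounding stages of the full 2D transform are omitted.

```python
# HEVC 4-point core transform matrix (integer approximation of the DCT).
HEVC_DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def mul83(v):
    # 83 = 64 + 16 + 2 + 1, so the product needs three shifts and three adds.
    return (v << 6) + (v << 4) + (v << 1) + v


def mul36(v):
    # 36 = 32 + 4, so the product needs two shifts and one add.
    return (v << 5) + (v << 2)


def dct4_shift_add(x):
    """1-D 4-point forward transform using only adds, subtracts and shifts."""
    x0, x1, x2, x3 = x
    e0, e1 = x0 + x3, x1 + x2  # even part
    o0, o1 = x0 - x3, x1 - x2  # odd part
    return [
        (e0 + e1) << 6,          # row 0: 64 * (x0 + x1 + x2 + x3)
        mul83(o0) + mul36(o1),   # row 1
        (e0 - e1) << 6,          # row 2
        mul36(o0) - mul83(o1),   # row 3
    ]


def dct4_reference(x):
    """Plain matrix-vector product, used only to check the shift-add version."""
    return [sum(c * v for c, v in zip(row, x)) for row in HEVC_DCT4]
```

Applying `dct4_shift_add` to every row and then to every column of a 4×4 residual block gives the unscaled 2D result; a hardware design would also insert the shift-and-round stages between the two passes.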

3.
The H.264/AVC video coding standard features diverse computational hot spots that need to be accelerated to cope with the significantly increased complexity compared to previous standards. In this paper, we propose an optimized application structure (i.e. the arrangement of functional components of an application determining the data flow properties) for the H.264 encoder which is suitable for application-specific and reconfigurable hardware platforms. Our proposed application structural optimization for the computational reduction of the Motion Compensated Interpolation is independent of the actual hardware platform that is used for execution. For a MIPS processor we achieve an average speedup of approximately 60× for Motion Compensated Interpolation. Our proposed application structure reduces the overhead for Reconfigurable Platforms by distributing the actual hardware requirements amongst the functional blocks. This increases the amount of available reconfigurable hardware per Special Instruction (within a functional block) which leads to a 2.84× performance improvement of the complete encoder when compared to a Benchmark Application with standard optimizations. We evaluate our application structure by means of four different hardware platforms.

4.
Machine-learning algorithms are employed in a wide variety of applications to extract useful information from data sets, and many are known to suffer from super-linear increases in computational time with increasing data size and number of signals being processed (data dimension). Certain principal machine-learning algorithms are commonly found embedded in larger detection, estimation, or classification operations. Three such principal algorithms are the Parzen window-based, non-parametric estimation of Probability Density Functions (PDFs), K-means clustering and correlation. Because they form an integral part of numerous machine-learning applications, fast and efficient execution of these algorithms is extremely desirable. FPGA-based reconfigurable computing (RC) has been successfully used to accelerate computationally intensive problems in a wide variety of scientific domains to achieve speedup over traditional software implementations. However, this potential benefit is quite often not fully realized because creating efficient FPGA designs is generally carried out in a laborious, case-specific manner requiring a great amount of redundant time and effort. In this paper, an approach using pattern-based decomposition for algorithm acceleration on FPGAs is proposed that offers significant increases in productivity via design reusability. Using this approach, we design, analyze, and implement a multi-dimensional PDF estimation algorithm using Gaussian kernels on FPGAs. First, the algorithm’s amenability to a hardware paradigm and expected speedups are predicted. After implementation, actual speedup and performance metrics are compared to the predictions, showing speedup on the order of 20× over a 3.2 GHz processor. Multi-core architectures are developed to further improve performance by scaling the design. Portability of the hardware design across multiple FPGA platforms is also analyzed. After implementing the PDF algorithm, the value of pattern-based decomposition to support reuse is demonstrated by rapid development of the K-means and correlation algorithms.
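For readers unfamiliar with Parzen-window estimation, the following software sketch shows the textbook multi-dimensional estimator with an isotropic Gaussian kernel. It is a plain reference formulation, not the FPGA design from the paper; the function name and the single bandwidth parameter `h` are assumptions made for illustration.

```python
import math


def parzen_gaussian_pdf(x, samples, h):
    """Estimate p(x) from d-dimensional samples.

    Uses an isotropic Gaussian kernel with bandwidth h (an assumed,
    user-chosen smoothing parameter shared by all dimensions).
    """
    d = len(x)
    # Normalization: number of samples times the Gaussian constant in d dimensions.
    norm = len(samples) * (h * math.sqrt(2.0 * math.pi)) ** d
    total = 0.0
    for s in samples:
        # Squared Euclidean distance between the query point and one sample.
        dist2 = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-dist2 / (2.0 * h * h))
    return total / norm
```

Each query point requires one kernel evaluation per sample, and every evaluation is independent of the others, which is the kind of data parallelism that makes the algorithm a candidate for hardware acceleration.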

5.
Wave pipelining is a design methodology that can increase the clock frequency of digital systems. Also known as maximum-rate pipelining, it has long been considered a technique for approaching the physical speed limit of a digital circuit. Unlike conventional pipelining, wave pipelining does not require internal clocked elements to increase throughput. The synchronization of internal computations is achieved by balancing inherent RC delays of combinational logic elements, thus allowing circuits to be pipelined at a very fine-grain level. In this article, we describe the design of a 16×16 wave-pipelined multiplier using a 1.0 μm CMOS process. The multiplier is designed using a conventional static CMOS technology. Simulation results show a speedup of about 7× over a non-pipelined implementation.

6.
This paper presents a method for functional verification of HDL models of digital circuits. The method is based on co-operation between a simulator and an emulator and combines the advantages of simulation-based and emulation-based verification to form a fast co-verification approach. The time-consuming part of the circuit is verified in the emulator, while the non-synthesizable part, together with the part of the circuit that needs intensive redesign during the early steps of the design phase, is verified in the simulator. To demonstrate the co-verification approach, a tool was developed that supports Verilog, VHDL, and mixed Verilog-VHDL models. Three benchmarks, including a simple 32-bit processor (DP32), a 16-bit arithmetic RISC processor, and a 256-point FFT unit, were used in the experiments. The experimental results show that the co-verification approach gives up to 15,000 times speedup for gate-level and up to 100 times speedup for RTL abstractions compared with simulation-based verification. Finally, an analytical study of the speedups of the co-verification approach is presented, which supports the experimental speedup results.

7.
A new resource-efficient FPGA-based hardware architecture for real-time edge detection using the Sobel operator is proposed for video surveillance applications. The Sobel operator is chosen because it counteracts the noise sensitivity of the simple gradient operator. An FPGA is chosen for the implementation because it is flexible enough to allow algorithmic changes at a later stage of system development and can deliver real-time performance that is hard to achieve with a general-purpose processor or digital signal processor, while limiting the extensive design work, time, and cost required for an application-specific integrated circuit. The proposed architecture uses a single processing element for both the horizontal and vertical gradient computations of the Sobel operator and uses approximately 38% fewer FPGA resources than a standard Sobel edge detection architecture, while maintaining real-time frame rates for high-definition video (1920 × 1080 image size). The complete system is implemented on a Xilinx ML510 (Virtex-5 FX130T) FPGA board.
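The idea of reusing one processing element for both gradients can be sketched in software as a single 3×3 multiply-accumulate routine that is called once with the horizontal kernel and once with its transpose. This is only an illustrative model, not the architecture of the paper; the function names, the `|gx| + |gy|` magnitude approximation, and the threshold parameter are assumptions.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
# The vertical kernel is the transpose of the horizontal one.
SOBEL_Y = [list(col) for col in zip(*SOBEL_X)]


def convolve3x3(window, kernel):
    """Shared 'processing element': one 3x3 multiply-accumulate."""
    return sum(window[r][c] * kernel[r][c] for r in range(3) for c in range(3))


def sobel_edges(image, threshold):
    """Return a binary edge map; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [row[x - 1:x + 2] for row in image[y - 1:y + 2]]
            gx = convolve3x3(window, SOBEL_X)  # horizontal gradient
            gy = convolve3x3(window, SOBEL_Y)  # vertical gradient, same routine
            # |gx| + |gy| avoids the square root of the exact magnitude
            # (an assumed simplification).
            if abs(gx) + abs(gy) > threshold:
                edges[y][x] = 1
    return edges
```

For example, `sobel_edges([[0, 0, 255, 255]] * 4, 100)` marks the interior pixels next to the vertical step, while a constant image produces an all-zero map.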

8.
9.
This study presents the design of a two-dimensional (2D) discrete cosine transform (DCT) hardware architecture dedicated to High Efficiency Video Coding (HEVC) on field programmable gate array (FPGA) platforms. The proposed methodology performs the 2D-DCT computation efficiently in a way that fits the internal components and characteristics of FPGA resources. A four-stage circuit architecture is developed to implement the proposed methodology. This architecture supports variable DCT sizes, including 4 × 4, 8 × 8, 16 × 16, and 32 × 32. The proposed architecture has been implemented in SystemVerilog and synthesized on various FPGA platforms. Compared with existing related work in the literature, the proposed architecture demonstrates significant advantages in hardware cost and performance. The proposed architecture is able to sustain 4K@30 fps ultra-high-definition (UHD) TV real-time encoding applications with a 31–64% reduction in hardware cost.

10.
In this paper, we propose a uniform quantization likelihood evaluation (UQLE) algorithm for particle filters (PFs). This algorithm simplifies the exact likelihood evaluation (ELE) algorithm, the most computationally demanding function in PFs, by using a uniform quantization scheme to generate approximated weights. Simulation results indicate that PFs using UQLE can achieve comparable or better accuracy than PFs using ELE. The software implementation of UQLE for the bearing-only tracking (BOT) model in fixed-point arithmetic with 32 quantized intervals achieves a 39.5× average speedup over the software implementation of ELE. An Application-specific Instruction-set Processor instruction was designed to accelerate the UQLE algorithm in a hardware implementation. The custom instruction implementation of UQLE for the BOT model with 32 intervals achieves a 23.0× average speedup over the software implementation on a general-purpose processor with 5% additional gates.

11.
王丽 《现代电子技术》2011,(23):20-22,26
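This paper describes the design and implementation of the signal-sorting hardware and software for a terminal-guidance radar signal processor based on a DSP and an FPGA. The system uses a single DSP as the main processor, together with an FPGA and other peripheral circuits, to sort and track radar signals. Thanks to the DSP+FPGA approach, the system not only achieves good real-time performance but also offers high integration, good reliability, ease of modification, and flexible use, giving it considerable practical and reference value.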

12.
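This paper introduces the principle of high-frequency electromagnetic-field scale-inhibition technology and, using a programmable FPGA chip and SOPC embedded technology, develops a high-frequency scale-inhibiting water treatment system with a touch screen. The work covers the generation of the high-frequency oscillation signal, the design of the high-power amplifier circuit, and a human-machine interface with a color touch screen and various control functions. The hardware and software designs of the system are given, achieving intelligent system design and real-time monitoring.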

13.
Malicious modification of hardware in untrusted fabrication facilities, referred to as a hardware Trojan, has emerged as a major security concern. Comprehensive detection of these Trojans during post-manufacturing test has been shown to be extremely difficult. Hence, it is important to develop design techniques that provide effective countermeasures against hardware Trojans by either preventing Trojan attacks or facilitating detection during test. Obfuscation is a technique that is conventionally employed to prevent piracy of software and hardware intellectual property (IP). In this work, we propose a novel application of key-based circuit structure and functionality obfuscation to achieve protection against hardware Trojans triggered by rare internal circuit conditions. The proposed obfuscation scheme is based on judicious modification of the state transition function, which creates two distinct functional modes: normal and obfuscated. A circuit transitions from the obfuscated to the normal mode only upon application of a specific input sequence, which defines the key. We show that it provides security against Trojan attacks in two ways: (1) it makes some inserted Trojans benign, i.e. they become effective only in the obfuscated mode; and (2) it prevents an adversary from exploiting the true rare events in a circuit to insert hard-to-detect Trojans. The proposed design methodology can thus achieve simultaneous protection from hardware Trojans and hardware IP piracy. Besides protecting ICs against Trojan attacks in the foundry, we show that it can also protect against malicious modifications by untrusted computer-aided design (CAD) tools in both SoC and FPGA design flows. Simulation results for a set of benchmark circuits show that the scheme is capable of achieving high levels of security against Trojan attacks at modest area, power and delay overhead.
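The two functional modes can be pictured with a toy software model: a small counter whose state transition function is modified so that it behaves correctly only after a key input sequence has been applied. This is a simplified illustration of the normal/obfuscated idea only; the class, its scrambled output rule, and the restart-on-mismatch key matching are all hypothetical and do not come from the paper.

```python
class ObfuscatedCounter:
    """Toy modulo-4 counter with an obfuscated mode and a normal mode."""

    def __init__(self, key):
        self.key = tuple(key)     # input sequence that unlocks normal mode
        self.progress = 0         # how many key symbols have matched so far
        self.normal_mode = False  # the circuit powers up obfuscated
        self.count = 0

    def step(self, value):
        if self.normal_mode:
            # Normal mode: the intended behavior, a modulo-4 counter.
            self.count = (self.count + 1) % 4
            return self.count
        # Obfuscated mode: track the key sequence; any mismatch restarts it.
        if value == self.key[self.progress]:
            self.progress += 1
            if self.progress == len(self.key):
                self.normal_mode = True
                self.count = 0
                return self.count
        else:
            self.progress = 0
        # Until unlocked, the output is unrelated to the real counter state.
        return (value * 3 + 1) % 4
```

After `ObfuscatedCounter([2, 0, 3])` receives the inputs 2, 0, 3 in that order it enters normal mode and counts 1, 2, 3, 0, ...; with any other input order it stays in the obfuscated mode.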

14.
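To address the problems that current PC-based algorithms cannot process images in real time and that algorithms on fixed hardware platforms are difficult to modify or upgrade, a reconfigurable SOPC-based image acquisition and processing system is designed. It performs on-chip real-time processing of image data and allows algorithms to be modified or upgraded without changing the hardware circuit structure. The system is built around two Xilinx FPGA chips: the hardware is implemented with the FPGAs, their MicroBlaze 32-bit soft-core processor, and the associated interface modules, while the software is developed with the ISE and EDK tools of the FPGA development environment. Thanks to SOPC and reconfigurable technology, the design is flexible, processes data quickly, and supports flexible algorithm upgrades.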

15.
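A computer fault diagnosis system is a powerful tool for diagnosing computer hardware faults. This system is based on the PCI bus transfer principle and uses an FPGA as its control core. Signal acquisition and processing circuits are built by programming the FPGA and, following the computer's own power-on self-test principle, hardware circuits designed in VHDL on the FPGA store, analyze, and extract the data. A 51-series microcontroller drives an LCD to display the computer hardware fault in Chinese characters. When hardware on the motherboard fails, the faulty component can be identified quickly, improving the efficiency of computer repair.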

16.
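Embedding a hard or soft processor core in an FPGA chip forms a system on a programmable chip (SoPC). For a specific processor architecture, the usual way to customize the operating system at source-code level and provide real-time services is to trim the Linux kernel and port it. This paper describes the process of porting the Linux kernel to the PowerPC hard core of the Xilinx Virtex-4 and verifies a prototype on the Virtex ML403 development board, to show the overall approach to porting an operating system kernel and the key steps at each stage.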

17.
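Traditional "Digital Circuits" courses mainly teach small- and medium-scale circuits and are seriously out of step with industry and technology trends. Building on FPGA design techniques and hardware description languages, this paper redesigns the course content progressively from easy to difficult, moving from digital circuits to complex digital systems, with a focus on developing students' ability to design digital circuits and systems with FPGAs. Alongside the reform of the theory teaching, the accompanying laboratory course is also taught on an FPGA platform. Classroom practice shows that the reformed course, "Digital Circuits and FPGA Design", meets the talent-training needs of application-oriented undergraduate institutions.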

18.
Reconfigurable hybrid processor systems provide a flexible platform for mapping data-parallel applications, while providing considerable speedup over software implementations. However, the overhead for reconfiguration presents a significant deterrent in mapping applications onto reconfigurable hardware. Partial runtime reconfiguration is one approach to reduce the reconfiguration overhead. In this paper, we present a methodology to map data-parallel tasks onto hardware that supports partial reconfiguration. The aim is to obtain the maximum possible speedup for a given reconfiguration time, bus speed, and computation speed. The proposed approach involves using multiple, identical but independent processing units in the reconfigurable hardware. Under nonzero reconfiguration overhead, we show that there exists an upper limit on the number of processing units that can be employed, beyond which further reduction in execution time is not possible. We obtain solutions for the minimum processing time, the corresponding load distribution, and the schedule for data transfer. To demonstrate the applicability of the analysis, we present the following: 1) various plots showing the variation of processing time with different parameters; 2) hardware simulations for two examples, viz., 1-D discrete wavelet transform and finite impulse response filter, targeted to Xilinx field-programmable gate arrays (FPGAs); and 3) experimental results for a hardware prototype implemented on an FPGA board.

19.
This paper describes the acceleration of an infrared automatic target recognition (IR ATR) application with a co-processor board that contains multiple field programmable gate array (FPGA) chips. Template and pixel level parallelism is exploited in an FPGA design for the bottleneck portion of the application. The implementation of this design achieved a speedup of 21 compared to running on the host processor. The paper then describes an FPGA resource manager (RM) developed to support concurrent applications sharing the FPGA board. With the RM, the system is dynamically reconfigurable. That is, while part of the co-processor board is busy computing, another part can be reconfigured for other purposes. The IR ATR application was ported to work with the RM and has been shown to adapt to the amount of reconfigurable hardware that is available at the time the application is executed.

20.
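This paper introduces the operating principle of a digital logic analyzer and analyzes the components of its hardware circuit and their functions. It discusses a circuit design scheme for the quantized display of multi-channel digital logic signals on an ordinary oscilloscope. Based on a study of the methods and characteristics of FPGA circuit design, a VHDL program for the hardware implementation is given and timing simulation is carried out, showing that FPGA technology has clear advantages over traditional methods in circuit design.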
