1.

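Altera Corporation 《电子产品世界》2008,(1):103-106

Introduction: The demands of today's application software far exceed what conventional processors can deliver. One solution is hardware acceleration, using dedicated coprocessors to boost processing performance. As a basis for coprocessor design, FPGAs offer clear advantages in price, performance, ease of use, and power consumption.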

2.

A Parametrizable High-Level Synthesis Library for Accelerating Neural Networks on FPGAs

Kalms Lester Rad Pedram Amini Ali Muhammad Iskander Arsany Göhringer Diana 《Journal of Signal Processing Systems》2021,93(5):513-529

In recent years, Convolutional Neural Networks (CNNs) have been incorporated in a large number of applications, including multimedia retrieval and image...

3.

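Accelerating Embedded System Design with Soft-Core CPUs Embedded in FPGAs

程光尧 《世界电子元器件》2005,(4):87-90

Embedded system developers currently face several challenges. First, they must find a processor that meets the application's requirements while keeping product cost down and delivering adequate system performance. Second, they must offer a reasonably long product life cycle so that customers need not worry about obsolescence or discontinuation.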

4.

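A Study of Several Power Iteration Algorithms That Avoid Eigenvalue Decomposition

敖金莲 吴长奇 刘欣彤 《无线电通信技术》2010,36(5):26-28

The eigenvalue decomposition of the covariance matrix in the MUSIC algorithm is computationally expensive and hard to implement in embedded systems. To address this, three power iteration algorithms are analyzed (ordinary power iteration, inverse power iteration, and eigenvalue-shifted power iteration) and their advantages and disadvantages are compared. Because the acoustic signals captured by a microphone array are wideband, the effect of the choice of array element spacing is also analyzed. Computer simulations verify that, with a suitable element spacing, eigenvalue-shifted power iteration can replace eigenvalue decomposition and reduce computational complexity.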
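
The eigenvalue-shifted power iteration named in this abstract can be illustrated with a minimal Python sketch. It shows one common form of the technique and is not taken from the paper. The assumptions are a real symmetric positive semidefinite matrix (MUSIC covariance matrices are complex Hermitian in general), a shift equal to the trace, a fixed iteration count, and hypothetical function names. Ordinary power iteration finds the largest eigenvalue. Applying it to `sigma*I - R` with `sigma` at least as large as the largest eigenvalue of `R` instead yields the smallest eigenvalue of `R`, whose eigenvector lies in the noise subspace that MUSIC needs.

```python
def mat_vec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def power_iteration(A, iters=200):
    """Ordinary power iteration: dominant eigenvalue and eigenvector of A."""
    # Arbitrary non-symmetric starting vector (an assumption of this sketch).
    v = normalize([1.0 / (i + 1) for i in range(len(A))])
    for _ in range(iters):
        v = normalize(mat_vec(A, v))
    Av = mat_vec(A, v)
    eigenvalue = sum(x * y for x, y in zip(v, Av))  # Rayleigh quotient
    return eigenvalue, v


def smallest_eigenpair(R, iters=200):
    """Shifted power iteration: smallest eigenvalue and eigenvector of R.

    Assumes R is real, symmetric, and positive semidefinite, so the trace
    is an upper bound on its largest eigenvalue.
    """
    n = len(R)
    sigma = sum(R[i][i] for i in range(n))
    # B = sigma*I - R has eigenvalues sigma - lambda_i, so its dominant
    # eigenvector is the eigenvector of R's smallest eigenvalue.
    B = [[(sigma if i == j else 0.0) - R[i][j] for j in range(n)]
         for i in range(n)]
    mu, v = power_iteration(B, iters)
    return sigma - mu, v


if __name__ == "__main__":
    R = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 1 and 3
    print(smallest_eigenpair(R))
```

No matrix inversion or full decomposition is needed, only repeated matrix-vector products, which is what makes this family of methods attractive on embedded targets.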

5.

Implementation of lattice gases using FPGAs

Paul Shaw Paul Cockshott Peter Barrie 《The Journal of VLSI Signal Processing》1996,12(1):51-66

Lattice gas models have been widely studied over the last decade due to their simplicity and scope for parallelism. Standard parallel computers based on the stored-program paradigm can run such models quickly but are expensive. We report here a new approach based on reconfigurable logic circuits. A circuit is constructed to realize the behaviour of the model. The suitability of this method is demonstrated by modelling sound propagation in a lattice gas. For this application it is shown that supercomputer performance can be achieved at a fraction of supercomputer cost.

6.

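A High-Speed Implementation Scheme for Wide Bit-Permutation Operations in Cryptographic Algorithms

戴紫彬 马超 李伟 南龙梅 《电子与信息学报》2017,39(9):2119-2126

General-purpose processors optimized for word-level operations handle the wide bit-permutation operations of cryptographic algorithms inefficiently. To address this, the paper proposes high-speed implementation schemes for 2N-2N and kN-kN (k > 2) wide bit-permutation operations. For the two operations these schemes involve, bit extraction and bit extraction with shift, it proposes the dedicated extension instructions BEX and BEX-ROT. Building on this, it studies efficient hardware architectures for the dedicated instructions and proposes RERS (Reconfigurable Extract and Rotation Shifter), a unified hardware architecture based on the Inverse Butterfly network, together with a corresponding reconfigurable routing algorithm, in order to maximize hardware sharing and reduce circuit area. Experimental results show that the proposed schemes cut the number of instructions a processor architecture needs to execute a wide bit-permutation operation by a factor of about 10, greatly improving processing efficiency. The hardware resource and delay overheads of the dedicated instructions are both low and do not affect the normal operating frequency of the original architecture.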

7.

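Implementing Video and Image Processing Designs with FPGAs

Brian J. Jentz 《今日电子》2008,(10)

Trends in video and image processing: Innovative technologies centered on video and image processing, such as HDTV and digital cinema, are advancing very rapidly, driven by leaps in image capture and display resolution, advanced compression methods, and video intelligence.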

8.

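A Study of QR-Decomposition-Based Detection Algorithms in MIMO-OFDM Systems

覃博 林云 《山西电子技术》2012,(5):53-54,69

The paper focuses on several QR-decomposition-based signal detection algorithms for MIMO-OFDM systems and analyzes the advantages and disadvantages of each. It points out that the signal detection order is the key to reducing error propagation. A QR detection algorithm based on an improved sorted Gram-Schmidt orthogonalization replaces matrix inversion with iterative computation, which remedies the shortcomings of the traditional algorithms, lowers the computational load, and strikes a good trade-off between complexity and performance. Finally, an algorithm that combines this method with the MMSE criterion is introduced.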

9.

A programmable image processing system using FPGAs

S. C. CHAN H. O. NGAI K. L. HO 《International Journal of Electronics》2013,100(4):725-730

Real-time image processing usually requires an enormous throughput rate and a huge number of operations. Parallel processing, in the form of specialized hardware or multiprocessing, is therefore indispensable. This paper describes a flexible programmable image processing system using the field programmable gate array (FPGA). The logic cell nature of currently available FPGAs is most suitable for performing real-time bit-level image processing operations using the bit-level systolic concept. Here, we propose a novel architecture, the programmable image processing system (PIPS), for the integration of this programmable hardware and digital signal processors (DSPs) to handle the bit-level as well as the arithmetic operations found in many image processing applications. The versatility of the system is demonstrated by the implementation of a 1-D median filter.

10.

Reconstruction and Decomposition Algorithms for Biorthogonal Multiwavelets (total citations: 17; self-citations: 0; citations by others: 17)

Micchelli Charles A. Xu Yuesheng 《Multidimensional Systems and Signal Processing》1997,8(1-2):31-69

We construct biorthogonal multiwavelets (abbreviated to wavelets) in a weighted Hilbert space L²(E,) where E is a compact subset in . A recursive formula for biorthogonal wavelet construction is presented. The construction of the initial wavelets is reformulated as the solution of a certain matrix completion problem. A general solution of the matrix completion problem is identified and expressed conveniently in terms of any given particular solution. Several particular solutions are proposed. Reconstruction and decomposition algorithms are developed for the biorthogonal wavelets. Special results for the univariate case E=[0,1] are obtained.

11.

Decomposition Algorithms for Solving NP-hard Problems on a Quantum Annealer

Pelofske Elijah Hahn Georg Djidjev Hristo 《Journal of Signal Processing Systems》2021,93(4):405-420

NP-hard problems such as the maximum clique or minimum vertex cover problems, two of Karp’s 21 NP-hard problems, have several applications in computational chemistry, biochemistry and computer network security. Adiabatic quantum annealers can search for the optimum value of such NP-hard optimization problems, given the problem can be embedded on their hardware. However, this is often not possible due to certain limitations of the hardware connectivity structure of the annealer. This paper studies a general framework for a decomposition algorithm for NP-hard graph problems aiming to identify an optimal set of vertices. Our generic algorithm allows us to recursively divide an instance until the generated subproblems can be embedded on the quantum annealer hardware and subsequently solved. The framework is applied to the maximum clique and minimum vertex cover problems, and we propose several pruning and reduction techniques to speed up the recursive decomposition. The performance of both algorithms is assessed in a detailed simulation study.

12.

Narrow-band FIR filtering with FPGAs using sigma-delta modulation encoding

Chris Dick Fred Harris 《The Journal of VLSI Signal Processing》1996,14(3):265-282

This paper addresses the problem of implementing narrow-band FIR filters using FPGAs. Rather than employing a conventional multiply-accumulate unit to compute the inner-product, an alternative method based on re-quantization of the input data stream using a sigma-delta modulator is presented. The re-quantization process preserves the dynamic range of the signal components contained in the bandwidth of the filter, while shifting the re-quantization noise to the spectral region to be rejected by the filter. The reduced bit length representation of the re-quantized input data samples removes the requirement for a full multiplier in the filter hardware. This makes the method very attractive for realization using FPGA technology. The filtering technique is described and implementation results using a Xilinx XC4010 FPGA are presented. A 200-tap filter implemented in a single FPGA achieves a computation rate of 415 MOPS and has a memory bandwidth of 1.66 Gbytes/s. An extension of the method using a quadrature re-quantizer and filter is also presented.
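
The idea of re-quantizing the input so that the filter needs no multiplier can be illustrated with a minimal Python sketch. It is not the authors' design. The first-order modulator, the 1-bit output, the moving-average coefficients, and all function names are assumptions chosen for brevity. Each re-quantized sample is +1 or -1, so every tap contributes either `+c` or `-c` and the inner product reduces to additions and subtractions.

```python
def sigma_delta_1bit(samples):
    """First-order sigma-delta modulator (assumed form).

    Maps samples in [-1, 1] to a stream of +1/-1 values. The integrator
    accumulates the quantization error, which pushes the re-quantization
    noise toward high frequencies.
    """
    integrator = 0.0
    bits = []
    for x in samples:
        y = 1 if integrator >= 0 else -1  # 1-bit quantizer
        integrator += x - y               # feed the error back
        bits.append(y)
    return bits


def fir_without_multiplier(bits, coeffs):
    """FIR filter over a +1/-1 stream using only add and subtract."""
    taps = len(coeffs)
    out = []
    for n in range(taps - 1, len(bits)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            # A +1 sample adds the coefficient, a -1 sample subtracts it.
            acc += c if bits[n - k] > 0 else -c
        out.append(acc)
    return out


if __name__ == "__main__":
    # Hypothetical low-pass filter: a 200-tap moving average.
    coeffs = [1.0 / 200] * 200
    signal = [0.5] * 1000  # DC input, well inside the passband
    filtered = fir_without_multiplier(sigma_delta_1bit(signal), coeffs)
    print(filtered[0], filtered[-1])
```

Because the low-pass filter rejects the band where the modulator places its noise, the filtered 1-bit stream tracks the in-band content of the original signal.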

13.

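A Fast Massively Parallel Finite Element Algorithm Based on Domain Decomposition

王卫杰 陈晓洁 周海京 《电子学报》2019,47(3):741-747

Domain decomposition is one of the finite element solution methods that has developed rapidly in recent years. Based on the finite element domain decomposition method and multigrid ideas, we study fast algorithms for adaptive solution and discrete frequency sweeps, and we implement a fast massively parallel finite element algorithm based on domain decomposition on a high-performance parallel computing framework developed in-house. The parallel scale extends to tens of thousands of CPU cores. We present the core architecture of the program and show how multigrid ideas are used to implement an effective coarse-grid correction technique, which enables repeated fast solution of the finite element linear system and accelerates adaptive solution and discrete frequency sweeps. Finally, the accuracy of the algorithm is verified and large-scale parallel tests are carried out.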

14.

Implementation of sphere decoder for MIMO-OFDM on FPGAs using high-level synthesis tools

Juanjo Noguera Stephen Neuendorffer Sven Van Haastregt Jesus Barba Kees Vissers Chris Dick 《Analog Integrated Circuits and Signal Processing》2011,69(2-3):119-129

15.

Digit-Serial Complex-Number Multipliers on FPGAs

T. Sansaloni J. Valls K.K. Parhi 《The Journal of VLSI Signal Processing》2003,33(1-2):105-115

This paper presents an optimized implementation on FPGA of digit-serial Complex-Number Multipliers (CMs) using Booth recoding techniques and tree adders based on Carry Save (CS) and Ripple Carry Adders (RCA). This kind of complex-number multiplier can be pipelined at the same level independent of the digit-size. Variable and fixed coefficient CMs have been considered. In the first case an efficient mapping of the modified Booth recoding and the partial product generation is presented which results in a logic depth reduction. The combination of 5:3 and 4:3 converters in the CS structure and the utilization of RCA trees lead to a minimum area requirement. In the case of fixed coefficient CMs, the partial product generator is based on look-up tables and multi-bit Booth recoding is used to reduce the area and increase the performance of the circuit. The study reveals that efficient mapping of the 5-bit Booth recoding to generate the partial products is the optimum multi-bit recoding when Xilinx FPGA devices are used.
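
For readers unfamiliar with Booth recoding, the following minimal Python sketch shows the textbook radix-4 (modified Booth) form on plain integers. It is an illustration only and makes several assumptions: it is not digit-serial, it handles real rather than complex operands, it uses a 3-bit window rather than the 5-bit recoding the abstract singles out, and the function names are hypothetical. Each overlapping window of three multiplier bits is recoded into a digit in {-2, -1, 0, 1, 2}, which halves the number of partial products.

```python
def booth_radix4_digits(multiplier, bits=8):
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Returns bits/2 digits, least significant first, each in {-2,...,2}.
    `bits` must be even and wide enough to hold `multiplier`.
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even")
    pattern = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the least significant bit
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # window (b1, b0, prev)
        prev = b1
    return digits


def booth_multiply(multiplicand, multiplier, bits=8):
    """Sum the partial products selected by the Booth digits."""
    product = 0
    for i, digit in enumerate(booth_radix4_digits(multiplier, bits)):
        # Each partial product is 0, +/-1x or +/-2x the multiplicand,
        # weighted by 4**i (a left shift by 2*i in hardware).
        product += digit * multiplicand * (1 << (2 * i))
    return product


if __name__ == "__main__":
    print(booth_radix4_digits(-3))
    print(booth_multiply(7, -3))
```

In hardware, the ±1x and ±2x multiples need only a shift and an optional negation, which is why the recoding reduces area as well as partial-product count.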

16.

Hardware realization of Krawtchouk transform using VHDL modeling and FPGAs

Botros N.M. Jian Yang Feinsilver P. Schott R. 《Industrial Electronics, IEEE Transactions on》2002,49(6):1306-1312

17.

Implementation of a Communications Channelizer using FPGAs and RNS Arithmetic

Uwe Meyer-Bäse Antonio García Fred Taylor 《The Journal of VLSI Signal Processing》2001,28(1-2):115-128

Field-programmable logic (FPL), often grouped under the popular name field-programmable gate arrays (FPGAs), is on the verge of revolutionizing sectors of the digital signal processing (DSP) industry as programmable DSP microprocessors did nearly two decades ago. Historically, FPGAs were considered to be only a rapid prototyping and low-volume production technology. FPGAs are now attempting to move into the mainstream DSP as their density and performance envelope steadily improve. While evidence now supports the claim that FPGAs can accelerate selected low-end DSP applications (e.g., FIR filter), the technology remains limited in its ability to realize high-end DSP solutions. This is due primarily to systemic weaknesses in FPGA-facilitated arithmetic processing. It will be shown that in such cases, the residue number system (RNS) can become an enabling technology for realizing embedded high-end FPGA-centric DSP solutions. This thesis is developed in the context of a demonstrated RNS/FPGA synergy and the application of the new technology to communication signal processing.
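
A short Python sketch shows why residue number system arithmetic suits FPGA fabric. It illustrates the general technique and not the channelizer described in the paper. The moduli `(7, 11, 13, 15)` and the function names are assumptions. An integer is represented by its residues modulo pairwise coprime moduli. Addition and multiplication then act on each small residue channel independently, with no carries between channels, and the Chinese Remainder Theorem recovers the result as long as it stays below the product of the moduli.

```python
from math import prod

# Hypothetical pairwise coprime moduli; dynamic range is 7*11*13*15 = 15015.
MODULI = (7, 11, 13, 15)


def to_rns(x, moduli=MODULI):
    """Convert a non-negative integer to its residue representation."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition: no carry propagates between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Channel-wise multiplication on small residues."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Reconstruct the integer with the Chinese Remainder Theorem."""
    total_range = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = total_range // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        total += r * partial * pow(partial, -1, m)
    return total % total_range


if __name__ == "__main__":
    # Multiply-accumulate, the core DSP operation, done entirely in RNS.
    acc = to_rns(0)
    for sample, coeff in [(12, 3), (25, 4), (9, 7)]:
        acc = rns_add(acc, rns_mul(to_rns(sample), to_rns(coeff)))
    print(from_rns(acc))  # 12*3 + 25*4 + 9*7
```

Each channel here fits in four bits, so a wide multiply-accumulate becomes several narrow, independent ones. The conversion into and out of RNS is the cost paid for that parallelism.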

18.

Implementation of a Communications Channelizer using FPGAs and RNS Arithmetic

Meyer-Bäse Uwe García Antonio Taylor Fred 《Journal of Signal Processing Systems》2001,29(1-2):115-128

The emergence of mobile Internet services has introduced a set of transmission and presentation standards that aim to address multimedia. At the same time, the limited resources of cellular networks and terminals set strict requirements to the bandwidth-limited transmission and presentation. We propose a new abstract model to adapt and represent multimedia in mobile environments to meet these restrictions. The model includes a layered mapping of semantic and physical entities and is combined under the taxonomy of “multimedia adaptation” to optimize an end-to-end mobile service. Our technique is media and presentation format independent and easily absorbs additions. We conclude with an example mobile service that utilizes the proposed techniques.

19.

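Study and Comparison of QR-Decomposition-Based V-BLAST Detection Algorithms

孙艳华 吴伟陵 《无线电工程》2006,36(12):26-29

Layered space-time codes are a class of space-time coding techniques with practical decoding complexity. Maximum-likelihood detection is the optimal receiver in the sense of minimizing the bit error rate, but its complexity is impractical. Building on the sorted QR decomposition detection algorithm based on Gram-Schmidt orthogonalization proposed by D. Wubben, two further QR decomposition detection algorithms that support sorting are proposed. They achieve the same performance as the Gram-Schmidt-based QR decomposition algorithm. Compared with the V-BLAST algorithm, they avoid repeated matrix inversions and reduce complexity at the cost of a small performance loss.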

20.

Accelerated image processing on FPGAs (total citations: 3; self-citations: 0; citations by others: 3)

Draper B.A. Beveridge J.R. Bohm A.P.W. Ross C. Chawathe M. 《IEEE transactions on image processing》2003,12(12):1543-1551

The Cameron project has developed a language called single assignment C (SA-C), and a compiler for mapping image-based applications written in SA-C to field programmable gate arrays (FPGAs). The paper tests this technology by implementing several applications in SA-C and compiling them to an Annapolis Microsystems (AMS) WildStar board with a Xilinx XV2000E FPGA. The performance of these applications on the FPGA is compared to the performance of the same applications written in assembly code or C for an 800 MHz Pentium III. (Although no comparison across processors is perfect, these chips were the first of their respective classes fabricated at 0.18 microns, and are therefore of comparable ages.) We find that applications written in SA-C and compiled to FPGAs are between 8 and 800 times faster than the equivalent program run on the Pentium III.