Similar literature
1.
We summarize our recent state-of-the-art programmable and reconfigurable detector and QR decomposition (QRD) implementations targeting 3G long term evolution (LTE) downlink and uplink requirements. The downlink transmission is based on orthogonal frequency division multiplexing, whereas the uplink transmission uses single-carrier frequency-division multiple access. The downlink implementations are based on the programmable transport triggered architecture (TTA), which provides a flexible and energy-efficient architecture template. In the TTA detector implementation, the LTE detection rate requirements for up to 20 MHz bandwidth and a 4 × 4 antenna system with 64-QAM are achieved by using 1–6 programmable cores in parallel. Each core runs at a 277 MHz clock frequency and consumes 55.5–64.0 mW depending on the detector configuration. The downlink detector is based on the selective spanning with fast enumeration algorithm. The uplink field-programmable gate array (FPGA) detector implementation targets a 4 × 4 antenna system with 64-QAM and achieves the detection rate requirement for 20 MHz bandwidth. The FPGA board used for the uplink implementation is a Xilinx Virtex-6, and the implementation has been carried out using the Xilinx Vivado high-level synthesis tool. Two different detector architectures are implemented. The first achieves the detection rate requirement with a single processing block running at 231 MHz, and the second with four blocks in parallel, each running at 247 MHz. The implemented detector is based on the K-best algorithm. A multiple-input multiple-output receiver requires QRD to produce valid inputs for the detector. In addition to the detector implementations, QRD is also implemented on both TTA and FPGA. The modified Gram–Schmidt algorithm is used in both QRD implementations.
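
The abstract above names the modified Gram–Schmidt algorithm as the basis of both QRD implementations. As a rough illustration only, the following is a minimal textbook sketch in plain Python, not the TTA or FPGA implementation described in the paper. The function name `mgs_qr`, the list-of-rows matrix layout, and the restriction to real, full-column-rank input are assumptions made for this example.

```python
import math

def mgs_qr(a):
    """Modified Gram-Schmidt QR of a real m x n matrix given as a list of rows.

    Hypothetical helper for illustration; assumes full column rank.
    Returns (q, r): q is a list of n orthonormal columns (each of length m),
    r is an n x n upper-triangular matrix, with A = Q R.
    """
    m, n = len(a), len(a[0])
    # Work column by column: v[j] starts as column j of A.
    v = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    q = [[0.0] * m for _ in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Normalise the current (already orthogonalised) column.
        r[j][j] = math.sqrt(sum(x * x for x in v[j]))
        q[j] = [x / r[j][j] for x in v[j]]
        # "Modified" step: immediately remove the q[j] component
        # from every remaining column, using the updated columns.
        for k in range(j + 1, n):
            r[j][k] = sum(qi * vi for qi, vi in zip(q[j], v[k]))
            v[k] = [vi - r[j][k] * qi for vi, qi in zip(v[k], q[j])]
    return q, r
```

Calling `mgs_qr([[3.0, 1.0], [4.0, 2.0], [0.0, 5.0]])` returns two orthonormal columns and a 2 × 2 upper-triangular factor whose product reproduces the input. In a MIMO receiver the input would be the complex channel matrix, which this real-valued sketch does not handle.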

2.
Deploying protocols is an expensive and time-consuming process today. One reason is the high cost of developing, testing, and installing protocol implementations. To reduce this difficulty, protocols are developed and executed within environments called protocol subsystems, and protocol software is often ported instead of being coded from scratch. Unfortunately, today a variety of protocol subsystems offer a plethora of features, functionality, and drawbacks; the differences among them often reduce the portability and reusability of protocol code, and therefore present barriers to the deployment of new protocols. In this paper, we consider differences in subsystems and their effect on the portability and reusability of protocols and protocol implementations. We then propose two different approaches, each optimized for a different situation, that allow protocol code implemented in one subsystem to be used without modification within other subsystems, and thus reduce the barriers to protocol deployment. We relate our experiences designing, implementing, and measuring the performance of each approach using, as a baseline, an AppleTalk protocol stack we have developed.

3.
A direct, unified approach for deriving fast multichannel QR decomposition (QRD) least squares (LS) adaptive algorithms is introduced. The starting point of the new methodology is the efficient update of the Cholesky factor of the input data correlation matrix. Using the new technique, two novel fast multichannel algorithms are developed. Both algorithms comprise scalar operations only and are based exclusively on numerically robust orthogonal Givens rotations. The first algorithm assumes channels of equal orders and processes them all simultaneously. It is highly modular and provides enhanced pipelinability, with no increase in computational complexity, when compared with other algorithms of the same category. The second multichannel algorithm deals with the general case of channels with different numbers of delay elements and processes each channel separately. A modification of the algorithm leads to a scheme that can be implemented on a very regular systolic architecture. Moreover, both schemes offer substantially reduced computational complexity compared not only with the first algorithm but also with previously derived multichannel fast QRD schemes. Experimental results in two specific application setups as well as simulations in a finite precision environment are also included.

4.
李春亭  陈泳恩 《电子学报》1999,27(10):140-142,144
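Conventional QR decomposition by Givens rotations involves square-root operations, which are costly in both time and area when implemented in digital circuits. Square-root-free QR decomposition algorithms have therefore been proposed. These algorithms avoid the square root, but when they are mapped onto a systolic array to implement the RLS algorithm, the processing cells cannot be pipelined internally. This paper proposes a new square-root-free QR decomposition that allows pipelining inside each cell and thus further increases speed.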
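
To make clear where the square root arises, here is a minimal sketch of the conventional Givens rotation that the abstract takes as its starting point. It is a textbook formulation written for illustration and does not reproduce the square-root-free algorithm the paper proposes. The names `givens` and `givens_qr_r` and the list-of-rows layout are assumptions.

```python
import math

def givens(a, b):
    """Return (c, s, r) such that [[c, s], [-s, c]] applied to (a, b) gives (r, 0)."""
    r = math.hypot(a, b)  # this is the square root that square-root-free variants avoid
    if r == 0.0:
        return 1.0, 0.0, 0.0
    return a / r, b / r, r

def givens_qr_r(a):
    """Upper-triangular factor R of a real m x n matrix (list of rows)."""
    r = [[float(x) for x in row] for row in a]
    m, n = len(r), len(r[0])
    for j in range(n):
        # Zero column j below the diagonal, working from the bottom row up.
        for i in range(m - 1, j, -1):
            c, s, _ = givens(r[i - 1][j], r[i][j])
            for k in range(n):
                top, bot = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * top + s * bot
                r[i][k] = -s * top + c * bot
    return r
```

Every rotation calls `math.hypot`, and the two divisions that follow depend on its result. In hardware this dependency is one reason the square root sits on the critical path of each cell.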

5.
New fast QR decomposition least squares adaptive algorithms (cited 1 time in total: 0 self-citations, 1 citation by others)
This paper presents two new, closely related adaptive algorithms for LS system identification. The starting point for the derivation of the algorithms is the inverse Cholesky factor of the data correlation matrix, obtained via a QR decomposition (QRD). Both algorithms are of O(p) computational complexity, with p being the order of the system. The first algorithm is a fixed order QRD scheme with enhanced parallelism. The second is an order recursive lattice type algorithm based exclusively on orthogonal Givens rotations, with lower complexity compared to previously derived ones. Both algorithms are derived following a new approach, which exploits efficient time and order updates of a specific state vector quantity.

6.
A novel application-specific instruction set processor (ASIP) for use in the construction of modern signal processing systems is presented. This is a flexible device that can be used in the construction of array processor systems for the real-time implementation of functions such as singular-value decomposition (SVD) and QR decomposition (QRD), as well as other important matrix computations. It uses a coordinate rotation digital computer (CORDIC) module to perform arithmetic operations, and several approaches are adopted to achieve high performance, including pipelining of the micro-rotations, the use of parallel instructions, and a dual-bus architecture. In addition, a novel method for scale factor correction is presented which only needs to be applied once at the end of the computation. This also reduces computation time and enhances performance. Methods are described which allow this processor to be used in reduced dimension (i.e., folded) array processor structures that allow tradeoffs between hardware and performance. The net result is a flexible matrix computational processing element (PE) whose functionality can be changed under program control for use in a wider range of scenarios than previous work. Details are presented of the results of a design study, which considers the application of this decomposition PE architecture in a combined SVD/QRD system and demonstrates that a combination of high performance and efficient silicon implementation is achievable.
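
For readers unfamiliar with CORDIC micro-rotations, the following is a minimal sketch of textbook vectoring-mode CORDIC, with the accumulated gain divided out once after the last iteration. It is not the scale factor correction method proposed in the paper, whose details the abstract does not give. The function name `cordic_vectoring`, the use of floating point in place of fixed-point shifts, and the requirement `x > 0` are assumptions.

```python
import math

def cordic_vectoring(x, y, iterations=32):
    """Rotate (x, y), with x > 0, onto the x-axis using shift-and-add style micro-rotations.

    Returns (magnitude, angle). Floats stand in for the fixed-point
    shifts a hardware CORDIC would use.
    """
    angle = 0.0
    gain = 1.0
    for i in range(iterations):
        d = -1.0 if y > 0 else 1.0   # direction that drives y toward zero
        t = 2.0 ** -i                # a right shift by i bits in hardware
        x, y = x - d * y * t, y + d * x * t
        angle -= d * math.atan(t)    # angle would come from a small lookup table
        gain *= math.sqrt(1.0 + t * t)
    # Single scale-factor correction after all micro-rotations.
    return x / gain, angle
```

With `cordic_vectoring(3.0, 4.0)` the result approximates magnitude 5 and angle `atan2(4, 3)`. The same vectoring step yields the rotation angle needed to zero a matrix element in a Givens-based QRD.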

7.
QR decomposition techniques are well known for their good numerical behavior and low complexity. Fast QRD recursive least squares adaptive algorithms benefit from these characteristics to offer robust and fast adaptive filters. This paper examines two different versions of the fast QR algorithm based on a priori backward prediction errors as well as two other corresponding versions of the fast QR algorithm based on a posteriori backward prediction errors. The main matrix equations are presented with different versions derived from two distinct deployments of a particular matrix equation. From this study, a new algorithm is derived. The discussed algorithms are compared, and differences in computational complexity and in finite-precision behavior are shown.

8.
Digital designs can be mapped to different implementations using diverse approaches, with varying cost criteria. Post-processing transforms, such as transistor sizing, can significantly improve circuit performance by optimizing critical paths to meet timing specifications. However, most transistor sizing tools have high execution times, and the possible delay gains due to sizing and the associated costs are not known prior to sizing. In this paper, we present two metrics for comparing different implementations, namely the minimum achievable delay and the cost of achieving a target delay, and show how these can be estimated without running a sizing tool. Using these fast and accurate performance estimators, a designer can determine the tradeoffs between multiple functionally identical implementations, and size only the selected implementation.

9.
The design of flexible elliptic curve cryptography processors (ECP) is considered in this paper. Novel word-level algorithms and implementations for the underlying GF(2^m) multiplication and squaring arithmetic, which enable improved flexibility versus performance tradeoffs, are presented and employed in the design of an efficient flexible ECP architecture; corresponding field-programmable gate-array (FPGA) prototyping results for two different processor word lengths are also included for evaluation.

10.
The least squares (LS) minimization problem constitutes the core of many real-time signal processing problems, such as adaptive filtering, system identification and adaptive beamforming. Recently, efficient implementations of the recursive least squares (RLS) algorithm and the constrained recursive least squares (CRLS) algorithm based on the numerically stable QR decomposition (QRD) have been of great interest. Several papers have proposed modifications to the rotation algorithm that circumvent the square root operations and minimize the number of divisions that are involved in the Givens rotation. It has also been shown that all the known square root free algorithms are instances of one parametric algorithm. Recently, a square root free and division free algorithm has also been proposed. In this paper, we propose a family of square root and division free algorithms and examine its relationship with the square root free parametric family. We choose a specific instance for each one of the two parametric algorithms and make a comparative study of the systolic structures based on these two instances, as well as the standard Givens rotation. We consider the architectures for both the optimal residual computation and the optimal weight vector extraction. The dynamic range of the newly proposed algorithm for QRD-RLS optimal residual computation and the wordlength lower bounds that guarantee no overflow are presented. The numerical stability of the algorithm is also considered. A number of obscure points relevant to the realization of the QRD-RLS and the QRD-CRLS algorithms are clarified. Some systolic structures that are described in this paper are very promising, since they require less computational complexity (in various aspects) than the structures known to date and they make the VLSI implementation easier.

11.
Multiple-input multiple-output (MIMO) techniques are often employed to increase capacity compared with single-antenna systems. However, the computational complexity of evaluating channel capacity or transmission rate (data rate) grows in proportion to the number of antennas employed at both ends of the wireless link. Recently, QR decomposition (QRD) based detection schemes have emerged as a low-complexity solution. After performing QRD on a full channel matrix, which yields a triangular matrix, we claim that the computation can be simplified in the following ways. First, to simplify the channel capacity calculation, we prove that the eigenvalues of the full channel matrix multiplication equal the eigenvalues of the triangular channel matrix multiplication. Second, to simplify the calculation of the optimal transmission rate constrained constellation, we propose a simple multiplication of the resulting triangular matrix and a transmitted signal vector. We also propose a modified mutual information calculation (MMIC) that achieves low complexity by dividing the calculation. Computer simulation and field-programmable gate array (FPGA) implementation results show that the proposed QRD-based schemes are capable of achieving conventional performance, but at a low-complexity level.
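
The eigenvalue claim in the abstract above can be sketched in one line, under the assumption that "channel matrix multiplication" refers to the Gram matrix $H^{H}H$ (the abstract does not say which product is meant) and that $H = QR$ with $Q$ having orthonormal columns:

```latex
H = QR,\quad Q^{H}Q = I
\;\Longrightarrow\;
H^{H}H = R^{H}Q^{H}QR = R^{H}R
\;\Longrightarrow\;
\lambda_i\!\left(H^{H}H\right) = \lambda_i\!\left(R^{H}R\right)\ \text{for all } i .
```

Any quantity that depends on $H$ only through the eigenvalues of $H^{H}H$ can therefore be evaluated from the triangular factor $R$ alone. One example is the standard equal-power capacity expression $\log_2\det\!\left(I + \tfrac{\rho}{N_t}H^{H}H\right)$, which is not taken from the abstract; $\rho$ and $N_t$ denote SNR and transmit antenna count and are introduced here for illustration.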

12.
Algorithmic engineering provides a rigorous framework for describing and manipulating the type of building blocks commonly used to define parallel algorithms and architectures for digital signal processing. So far, the concept has only been illustrated by means of relatively simple examples relating to the use of QR decomposition (QRD) by Givens rotations for the purposes of adaptive filtering and beamforming. Two more challenging examples are presented that illustrate the use of simple diagrammatic transformations to develop novel algorithms and architectures, and demonstrate the potential power of algorithmic engineering as a formal design technique. The first example constitutes the only known derivation of a modular processing architecture for generalised sidelobe cancellation based on QR decomposition. The second provides a simple derivation of the QRD-based lattice algorithm for multichannel least-squares linear prediction.

13.
Machine-learning algorithms are employed in a wide variety of applications to extract useful information from data sets, and many are known to suffer from super-linear increases in computational time with increasing data size and number of signals being processed (data dimension). Certain principal machine-learning algorithms are commonly found embedded in larger detection, estimation, or classification operations. Three such principal algorithms are the Parzen window-based, non-parametric estimation of Probability Density Functions (PDFs), K-means clustering and correlation. Because they form an integral part of numerous machine-learning applications, fast and efficient execution of these algorithms is extremely desirable. FPGA-based reconfigurable computing (RC) has been successfully used to accelerate computationally intensive problems in a wide variety of scientific domains to achieve speedup over traditional software implementations. However, this potential benefit is quite often not fully realized because creating efficient FPGA designs is generally carried out in a laborious, case-specific manner requiring a great amount of redundant time and effort. In this paper, an approach using pattern-based decomposition for algorithm acceleration on FPGAs is proposed that offers significant increases in productivity via design reusability. Using this approach, we design, analyze, and implement a multi-dimensional PDF estimation algorithm using Gaussian kernels on FPGAs. First, the algorithm’s amenability to a hardware paradigm and expected speedups are predicted. After implementation, actual speedup and performance metrics are compared to the predictions, showing speedup on the order of 20× over a 3.2 GHz processor. Multi-core architectures are developed to further improve performance by scaling the design. Portability of the hardware design across multiple FPGA platforms is also analyzed. After implementing the PDF algorithm, the value of pattern-based decomposition to support reuse is demonstrated by rapid development of the K-means and correlation algorithms.
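
As a software reference point for the multi-dimensional PDF estimation mentioned above, here is a minimal sketch of a Parzen-window estimate with a Gaussian kernel. It is a plain sequential formulation for illustration, not the FPGA design or the pattern-based decomposition from the paper. The function name `parzen_gaussian_pdf`, the isotropic kernel, and the single bandwidth parameter `h` are assumptions.

```python
import math

def parzen_gaussian_pdf(x, samples, h):
    """Parzen-window estimate of a d-dimensional PDF at point x.

    x       : sequence of d coordinates
    samples : sequence of d-dimensional sample points
    h       : kernel bandwidth (standard deviation of the isotropic Gaussian)
    """
    d = len(x)
    norm = (2.0 * math.pi * h * h) ** (-d / 2.0)  # Gaussian normalisation constant
    total = 0.0
    for s in samples:
        dist2 = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-dist2 / (2.0 * h * h))
    return norm * total / len(samples)
```

The cost of one evaluation grows with the number of samples times the dimension, and a full density map multiplies this by the number of evaluation points. That growth is the kind of workload the abstract describes accelerating.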

14.
The high-order Yule-Walker (HOYW) method of sinusoidal frequency estimation based on a singular value decomposition (SVD) is known to have excellent statistical performance. Here, we show that the SVD-based step of the HOYW method can be replaced by a computationally more convenient QR decomposition (QRD)-based step, without affecting the asymptotic properties of the frequency estimates.

15.
Most existing techniques for reconfigurable processors focus on the computation model. This paper focuses on increasing the granularity of configurable units without compromising flexibility. This is carried out by matching the granularity to the degree-of-freedom processing in most wireless systems. A design flow that accelerates the exploration of tradeoffs among various architectures for the configurable unit is discussed. A prototype processor is implemented using the Intel 0.13-μm CMOS standard cell library. The estimated energy efficiency is of the same order as that of dedicated hardware implementations.

16.
The authors present scalar implementations of multichannel and multiexperiment fast recursive least squares algorithms in transversal filter form, known as fast transversal filter (FTF) algorithms. By processing the different channels and/or experiments one at a time, the multichannel and/or multiexperiment algorithm decomposes into a set of intertwined single-channel single-experiment algorithms. For multichannel algorithms, the general case of possibly different filter orders in different channels is handled. Geometrically, this modular decomposition approach corresponds to a Gram-Schmidt orthogonalization of multiple error vectors. Algebraically, this technique corresponds to matrix triangularization of error covariance matrices and converts matrix operations into a regular set of scalar operations. Modular algorithm structures that are amenable to VLSI implementation on arrays of parallel processors naturally follow from the present approach. Numerically, the resulting algorithm benefits from the advantages of triangularization techniques in block processing.

17.
The differences in the initial ISDN switch implementations by different manufacturers are sufficiently profound that ISDN telephones designed for one switch may not be compatible with other manufacturers' switches. In addition, as new features are added to a given switch, older telephones may not be able to use them. The author examines two solutions to these problems: Bellcore-specified ISDN interfaces and programmable terminals. Bellcore-specified ISDN interfaces are desired by the Regional Bell Operating Companies and support considerable portability. Although conformance to Bellcore specifications will produce a high degree of terminal portability, it is unlikely ever to produce full portability and will not produce any portability in 1990. Programmable terminals are shown to offer interesting potential for portability and extensibility.

18.
A new adaptive filtering algorithm for time-series data based on the QRD inverse updates method of Pan and Plemmons (1989) is derived from first principles. In common with other fast algorithms for time-series adaptive filtering, this algorithm only requires O(p) operations for the solution of a pth-order problem. Unlike previous fast algorithms based on the QRD technique, the algorithm presented here explicitly produces the transversal filter weights. Furthermore, the derivation of the algorithm is achieved, quite simply, by means of signal-flow-graph manipulation. The relationship between this fast QRD inverse updates algorithm and the FTF algorithm is briefly discussed. The results of some preliminary computer simulations of the algorithm, using finite-precision floating-point arithmetic, are presented.

19.
胡冰新  董玮  于全 《信号处理》2006,22(1):53-56
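This paper proposes a recursive QRD-LS algorithm implemented with Householder transformations. The algorithm solves the LS problem by recursively computing the QR decomposition of a complex matrix with Householder transformations in place of Givens rotations, and achieves a higher processing speed than the Givens-rotation-based QRD-LS algorithm. In addition, the algorithm introduces complex-valued QR decomposition, which removes the restriction to real-valued signals. By defining a new data matrix, the algorithm can also process the coefficient matrix and the right-hand-side vector of the data-domain normal equations together, which improves computational efficiency. The performance of the algorithm is verified by simulating its application to smart antennas.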
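
The idea of handling the coefficient matrix and the right-hand side together can be illustrated with a batch (non-recursive) sketch: complex Householder reflections are applied to the augmented matrix [A | b], followed by back substitution. This is a textbook construction written for illustration, not the recursive algorithm of the paper. Treating [A | b] as the "new data matrix" is an assumption, as are the function name `householder_ls` and the requirement that A has full column rank.

```python
import cmath
import math

def householder_ls(a, b):
    """Least-squares solution w of A w ~ b for complex A (m x n, m >= n).

    Hypothetical helper; assumes A has full column rank.
    """
    m, n = len(a), len(a[0])
    # Augmented matrix [A | b]: one pass of reflections updates both parts.
    t = [[complex(v) for v in row] + [complex(bi)] for row, bi in zip(a, b)]
    for j in range(n):
        x = [t[i][j] for i in range(j, m)]
        norm_x = math.sqrt(sum(abs(e) ** 2 for e in x))
        # Phase choice avoids cancellation and works for complex entries.
        alpha = -cmath.exp(1j * cmath.phase(x[0])) * norm_x
        v = x[:]
        v[0] -= alpha
        norm_v2 = sum(abs(e) ** 2 for e in v)
        # Apply H = I - 2 v v^H / (v^H v) to rows j.. and columns j..n.
        for k in range(j, n + 1):
            dot = sum(vi.conjugate() * t[j + i][k] for i, vi in enumerate(v))
            f = 2.0 * dot / norm_v2
            for i, vi in enumerate(v):
                t[j + i][k] -= f * vi
    # Back substitution on the triangular system R w = c (first n rows).
    w = [0j] * n
    for i in range(n - 1, -1, -1):
        acc = t[i][n] - sum(t[i][k] * w[k] for k in range(i + 1, n))
        w[i] = acc / t[i][i]
    return w
```

Each reflection zeroes a whole column below the diagonal in one step, whereas a Givens rotation zeroes one element at a time. In a smart-antenna setting the rows of A would be array snapshots and b the reference signal, which is an interpretation and not a detail given in the abstract.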

20.
刘先锋  刘勤 《无线通信技术》2008,17(4):17-19,24
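Cognitive radio, as an intelligent radio technology, gives wireless communication systems the ability to sense the electromagnetic environment and effectively addresses the problems of spectrum utilization and management. The Software Communications Architecture (SCA) has been adopted by the Software Defined Radio Forum as the standard communication software architecture for embedded systems. SCA provides a software platform that supports portable, configurable, extensible, and reusable communication software and hardware. This paper first discusses the concept and functions of cognitive radio and then, drawing on the concept of open-source cognitive radio, proposes a new SCA-based architecture for a cognitive radio station and describes it in detail.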
