Similar Literature
20 similar documents retrieved
1.
This work focuses on the channel-coding stage of a terahertz (THz) communication system and uses multi-core CPU parallel computing to accelerate the Turbo encoding and decoding programs. The Turbo encoder and decoder are optimized and accelerated in four respects, including pre-allocating memory, parallelizing loops, and optimizing the encoder structure and the decoding formulas, thereby shortening the running time of the code. Experiments over different code lengths show that, for an input length of 10 000 bits, the parallel computation time is reduced by 56.6%.
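For illustration, here is a minimal sketch of the loop-parallelization idea in Python (not the paper's THz system code): independent Turbo code blocks are distributed over CPU cores with a process pool and the results are written into a pre-allocated buffer. The turbo_decode_block stub is hypothetical; a real decoder would run iterative MAP/SOVA passes.

```python
# Minimal sketch (not the paper's implementation): parallelising Turbo
# decoding over independent blocks with a process pool, plus a
# pre-allocated output buffer.  `turbo_decode_block` is a hypothetical stub.
import numpy as np
from multiprocessing import Pool

def turbo_decode_block(llrs):
    # Placeholder for a real Turbo decoder (e.g. iterative max-log-MAP).
    # Here we simply hard-decide on the channel LLRs.
    return (np.asarray(llrs) < 0).astype(np.uint8)

def decode_parallel(llr_blocks, workers=4):
    # Pre-allocate the result buffer once instead of growing lists in a loop.
    out = np.empty((len(llr_blocks), len(llr_blocks[0])), dtype=np.uint8)
    with Pool(processes=workers) as pool:
        for i, bits in enumerate(pool.map(turbo_decode_block, llr_blocks)):
            out[i] = bits
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blocks = [rng.normal(size=10_000) for _ in range(8)]  # 8 blocks of 10 000 LLRs
    decoded = decode_parallel(blocks)
    print(decoded.shape)
```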

2.
Embedded systems are characterized by the demand for small memory-footprint code. A popular architectural modification to improve code density in RISC embedded processors is to use a reduced bit-width instruction set. This approach reduces the length of the instructions to improve code size. However, with fewer registers addressable by the reduced instructions, these architectures suffer a slight performance degradation because more reduced instructions are required to execute a given task. On the other hand, 0-operand computers such as stack and queue machines access their source and destination operands implicitly, making instructions naturally short. Unlike the stack model, the queue model offers a highly parallel computation model. This paper proposes a novel alternative for reducing code size by using a queue-based reduced instruction set while retaining the high parallelism of programs. We introduce an efficient code generation algorithm for our reduced instruction set. Our algorithm successfully constrains the code to the reduced instruction set with the addition of only 4% extra code on average. We show that our proposed technique generates code about 16% more compact than MIPS16, 26% more compact than ARM/Thumb, and 50% more compact than MIPS32 code. Furthermore, we show that our compiler extracts about the same parallelism as fully optimized RISC code.
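As a toy illustration of the queue computation model (not the paper's reduced ISA or code-generation algorithm), the sketch below interprets 0-operand queue instructions: operands are dequeued from the front and results are enqueued at the back, and a level-order traversal of an expression yields a valid queue program.

```python
# Toy queue-machine interpreter (illustrative only, not the paper's ISA or
# code-generation algorithm).  Instructions name no registers: operands are
# taken from the front of the operand queue and results go to the back.
from collections import deque

def run_queue_program(program, env):
    q = deque()
    ops = {"add": lambda a, b: a + b,
           "sub": lambda a, b: a - b,
           "mul": lambda a, b: a * b}
    for instr in program:
        if instr[0] == "ld":             # load a named value
            q.append(env[instr[1]])
        else:                            # binary op: dequeue two, enqueue one
            a, b = q.popleft(), q.popleft()
            q.append(ops[instr[0]](a, b))
    return q.popleft()

# (a + b) * (c - d) emitted in level order: all loads first, then the
# operations level by level -- note how naturally parallel each level is.
program = [("ld", "a"), ("ld", "b"), ("ld", "c"), ("ld", "d"),
           ("add",), ("sub",), ("mul",)]
print(run_queue_program(program, {"a": 2, "b": 3, "c": 7, "d": 4}))  # (2+3)*(7-4) = 15
```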

3.
Vector coding for partial response channels
A linear technique for combining equalization and coset codes on partial response channels with additive white Gaussian noise is developed. The technique, vector coding, uses a set of transmit filters or 'vectors' to partition the channel into an independent set of parallel intersymbol interference (ISI)-free channels for any given finite (or infinite) block length. The optimal transmit vectors for such channel partitioning are shown to be the eigenvectors of the channel covariance matrix for the specified block length, and the gains of the individual channels are the corresponding eigenvalues. An optimal bit allocation and energy distribution are derived for the set of parallel channels, under an accurate extension of the continuous approximation for power in optimal multidimensional signal sets to constellations with unequal signal spacing in different dimensions. Examples are presented that demonstrate performance advantages with respect to zero-forcing decision feedback methods that use the same coset code on the same partial response channel. Only resampling the channel at an optimal rate and assuming no errors in the feedback path brings the performance of the decision feedback methods up to the level of the vector-coded system.
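A minimal numerical sketch of the channel-partitioning step, under an assumed example channel (1 - D) and block length; it is not the full coded system of the paper. The eigenvectors of H^T H serve as transmit vectors and the eigenvalues as subchannel gains.

```python
# Channel partitioning by eigen-decomposition (assumed example channel, not
# the paper's full coded system): transmitting along an eigenvector of H^T H
# produces an ISI-free scalar subchannel whose gain is the eigenvalue.
import numpy as np

N = 8
h = np.array([1.0, -1.0])             # example partial-response channel 1 - D
H = np.zeros((N + len(h) - 1, N))     # tall block convolution matrix
for k, tap in enumerate(h):
    H[np.arange(N) + k, np.arange(N)] += tap

gains, V = np.linalg.eigh(H.T @ H)    # eigenvalues = subchannel gains, columns of V = transmit vectors

s = np.zeros(N)
s[3] = 1.0                            # put one symbol on subchannel 3
y = H @ (V @ s)                       # transmit along that eigenvector (noise-free)
recovered = (V.T @ (H.T @ y)) / gains # receiver: project and normalise per subchannel
print(np.round(recovered, 6))         # ~ [0, 0, 0, 1, 0, 0, 0, 0]: subchannels are ISI-free
```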

4.
We present an efficient approach for the partitioning of algorithms implementing long convolutions. The dependence graph (DG) of a convolution algorithm is locally sequential, globally parallel (LSGP) partitioned into smaller, less complex convolution algorithms. The LSGP-partitioned DG is mapped onto a signal flow graph (SFG) in which each processor element (PE) performs a small convolution. The key is then to reduce the complexity of the SFG in two steps: 1. local reduction of complexity: the short Fast Fourier Transform (FFT) is used to perform the small convolution within the PE; and 2. global reduction of complexity: the short FFTs within the PEs are relocated to the global level, where redundant short FFT operations are eliminated. The remaining operation within the PEs is then a simple element-wise multiply-add. After a graph transform, the structure of the SFG kernel is recognized as a set of parallel small convolutions. If we use the short FFT to perform these short convolutions, we arrive at our final realization of the long convolution algorithm. The computational complexity of this realization is close to the optimum for convolutions, that is, O(N log N). Our approach thus achieves this O(N log N) complexity without having to implement large-size FFTs; we use small FFT blocks instead. The advantage is that small FFT transforms are commercially available and can even be implemented in single-chip VLSI architectures. Our final SFG is three-dimensional and can be mapped efficiently onto prototype architectures or dedicated VLSI processors. We demonstrate the procedure with a design example: the implementation of a prototype convolution architecture that we designed for a real-time radar imaging system.
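The sketch below is a generic overlap-add block convolution using small FFTs, not the paper's LSGP/SFG construction; it only illustrates how a long convolution decomposes into many independent short FFT-based convolutions that could be assigned to parallel PEs.

```python
# Generic overlap-add sketch (not the paper's LSGP/SFG construction): a long
# convolution is split into short blocks, each convolved with a small FFT,
# and the partial results are added back with the proper offsets.  Each block
# is independent, so the small convolutions map naturally onto parallel PEs.
import numpy as np

def overlap_add_conv(x, h, block=64):
    L = block
    nfft = 1
    while nfft < L + len(h) - 1:      # smallest power-of-two FFT size
        nfft *= 2
    Hf = np.fft.rfft(h, nfft)
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), L):
        seg = x[start:start + L]
        yseg = np.fft.irfft(np.fft.rfft(seg, nfft) * Hf, nfft)
        y[start:start + len(seg) + len(h) - 1] += yseg[:len(seg) + len(h) - 1]
    return y

rng = np.random.default_rng(1)
x, h = rng.normal(size=4096), rng.normal(size=33)
print(np.allclose(overlap_add_conv(x, h), np.convolve(x, h)))  # True
```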

5.
Fast and precise Fourier transforms
Many applications of fast Fourier transforms (FFTs), such as computer tomography, geophysical signal processing, high-resolution imaging radars, and prediction filters, require high-precision output. An error analysis reveals that the usual method of fixed-point computation of FFTs of vectors of length 2^l leads to an average loss of l/2 bits of precision. This phenomenon, often referred to as computational noise, causes major problems for arithmetic units with limited precision which are often used for real-time applications. Several researchers have noted that calculation of FFTs with algebraic integers avoids computational noise entirely. We combine a new algorithm for approximating complex numbers by cyclotomic integers with Chinese remaindering strategies to give an efficient algorithm to compute b-bit precision FFTs of length L. More precisely, we approximate complex numbers by cyclotomic integers in Z[e^(2πi/2^n)] whose coefficients, when expressed as polynomials in e^(2πi/2^n), are bounded in absolute value by some integer M. For fixed n our algorithm runs in time O(log(M)), and produces an approximation with worst case error of O(1/M^(2^(n-2)-1)). We prove that this algorithm has optimal worst case error by proving a corresponding lower bound on the worst case error of any approximation algorithm for this task. The main tool for designing the algorithms is the use of the cyclotomic units, a subgroup of finite index in the unit group of the cyclotomic field. First implementations of our algorithms indicate that they are fast enough to be used for the design of low-cost high-speed/high-precision FFT chips.
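The following small experiment (unrelated to the cyclotomic-integer algorithm itself) makes the computational-noise effect visible: a radix-2 FFT whose intermediate values are rounded to a given number of fractional bits is compared against a double-precision reference.

```python
# Small experiment illustrating computational noise in fixed-point FFTs:
# a radix-2 FFT with intermediate results rounded to `frac_bits` fractional
# bits, compared against numpy's double-precision FFT.
import numpy as np

def q(z, frac_bits):
    s = 2.0 ** frac_bits
    return (np.round(z.real * s) + 1j * np.round(z.imag * s)) / s

def fft_fixed(x, frac_bits):
    n = len(x)
    if n == 1:
        return x.copy()
    even = fft_fixed(x[0::2], frac_bits)
    odd = fft_fixed(x[1::2], frac_bits)
    tw = q(np.exp(-2j * np.pi * np.arange(n // 2) / n), frac_bits)
    t = q(tw * odd, frac_bits)
    return np.concatenate([q(even + t, frac_bits), q(even - t, frac_bits)])

rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, 2 ** 10) + 1j * rng.uniform(-0.5, 0.5, 2 ** 10)
for b in (10, 14, 18):
    err = np.abs(fft_fixed(x, b) - np.fft.fft(x)).max()
    print(f"{b} fractional bits -> max abs error {err:.2e}")
```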

6.
A method for the automatic generation of compact symmetric fast Fourier transforms (FFTs) from high-level specifications is presented. The generated code eliminates all redundant computations induced by the symmetries in the FFT data flow, minimizing storage requirements, input/output, and arithmetic operations, while preserving the attractive computational features of FFT algorithms. The code-generating method can be expressed as a simple and well-structured meta-algorithm whose inputs are the dimension and edge size of the data array and a matrix representation of the data symmetries.

7.
雷红轩  彭家寅  刘熠 《电子学报》2016,44(12):2932-2938
Program verification is a key technique for ensuring program correctness. Because the classical and quantum worlds differ fundamentally, classical verification techniques and tools cannot be applied directly to quantum systems. Since quantum programming languages are a new formal model for describing quantum systems, the verification of quantum programs becomes all the more urgent and necessary. This paper first discusses verification problems, such as the reachable sets and termination sets, of the bit-flip, phase-flip, depolarizing, amplitude-damping, and phase-damping channels commonly used in quantum communication, treated as special nondeterministic quantum programs run from the computational basis states. Next, these five quantum programs are combined pairwise into nondeterministic quantum programs; based on the similarities among their reachable sets, they are ultimately merged into three nondeterministic quantum programs, and the termination and divergence of these three programs when run from the computational basis states are studied in detail. The results show that all three nondeterministic quantum programs terminate when run from the computational basis state 0. When run from the computational basis state 1, the termination and divergence of the program composed of the bit-flip and depolarizing channels depend on the two parameters characterizing them; the termination and divergence of the program composed of the bit-flip and phase-flip channels depend only on the parameter characterizing the bit-flip channel; and the program composed of the amplitude-damping and phase-damping channels diverges, with a divergence condition that is independent of both channel parameters. These results can provide theoretical and technical support for verifying quantum communication protocols in quantum information security.
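As a numerical companion to the channel definitions (using their standard Kraus-operator forms, not the paper's verification formalism), the sketch below applies such channels to density matrices starting from the computational basis states.

```python
# Standard Kraus forms of the five channels (this is not the paper's
# program-verification formalism).  Each channel maps a density matrix rho
# to sum_k K_k rho K_k^dagger; iterating from |0><0| or |1><1| gives a feel
# for the reachable states discussed in the abstract.
import numpy as np

I = np.eye(2); X = np.array([[0, 1], [1, 0]]); Z = np.diag([1, -1]); Y = 1j * np.array([[0, -1], [1, 0]])

def bit_flip(p):      return [np.sqrt(1 - p) * I, np.sqrt(p) * X]
def phase_flip(p):    return [np.sqrt(1 - p) * I, np.sqrt(p) * Z]
def depolarizing(p):  return [np.sqrt(1 - p) * I] + [np.sqrt(p / 3) * M for M in (X, Y, Z)]
def amp_damping(g):   return [np.array([[1, 0], [0, np.sqrt(1 - g)]]),
                              np.array([[0, np.sqrt(g)], [0, 0]])]
def phase_damping(l): return [np.array([[1, 0], [0, np.sqrt(1 - l)]]),
                              np.array([[0, 0], [0, np.sqrt(l)]])]

def apply(kraus, rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

rho = np.diag([0.0, 1.0])            # start from basis state |1><1|
for _ in range(50):
    rho = apply(amp_damping(0.2), rho)
print(np.round(rho.real, 4))         # amplitude damping drives |1> towards |0>
```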

8.
A hybrid correlator architecture is described which combines the serial structure of an active correlator with the parallel structure of a matched-filter correlator. The mean PN code acquisition time of this hybrid serial-parallel correlator structure is analysed. Results are shown which compare the acquisition performance of the serial, parallel, and serial-parallel structures. The results are for a PN code length of 64 code chips and assume a Gaussian channel with the receiver detection threshold set to obtain a constant false alarm rate. An enhancement to the serial-parallel acquisition algorithm is also described which can improve the acquisition time performance by about 15% under typical operating conditions. Overall, the results demonstrate that the hybrid correlator can provide rapid code acquisition with limited receiver complexity.

9.
2^n modified prime codes are designed for all-optical code-division multiple access (CDMA) networks using very simple encoders and decoders. The proposed code is obtained from an original 2^n prime code of prime number P by padding P-1 zeros into each 'subsequence' of the codewords of the corresponding 2^n prime code. The cross-correlation constraint of the resulting 2^n modified prime code is equal to one, as opposed to two for a 2^n prime code. For a given bit error rate (BER), the proposed code can thus be used to support a larger number of active users in a fibre-optic CDMA network than a 2^n prime code. Moreover, using the former can also reduce the code length and weight compared with employing the latter to achieve the same BER.

10.
A new practical method for decoding low-density parity check (LDPC) codes is presented. The approach reformulates the parity check equations using nonlinear functions of a specific form defined over R^ρ, where ρ denotes the check node degree. By constraining the inputs of these functions to the closed convex subset [0,1]^ρ (the "box" set) of R^ρ, and by exploiting their form, a multimodal objective function that encodes the code constraints is formulated. The gradient projection algorithm is then used to search for a valid codeword that lies in the vicinity of the channel observation. The computational complexity of the new decoding technique is practically sub-linearly dependent on the code's length, while processing at each variable node can be performed in parallel, allowing very low decoding latencies. Simulation results show that convergence is achieved within 10 iterations, although some performance degradation relative to the belief propagation (BP) algorithm is observed.
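For orientation, here is a heavily simplified sketch of gradient-projection decoding on the box [0,1]^n. It uses an assumed smooth parity penalty, (1 - prod_i(1 - 2x_i))/2 per check, rather than the paper's specific nonlinear reformulation; the (7,4) Hamming code and all parameter values are illustrative only.

```python
# Hedged sketch of gradient-projection decoding on the box [0,1]^n.  The
# smooth parity penalty (1 - prod_i(1 - 2 x_i))/2 per check is an assumed
# stand-in, not the paper's reformulation of the parity check equations.
import numpy as np

H = np.array([[1, 1, 0, 1, 1, 0, 0],      # (7,4) Hamming code, for illustration
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def decode_gp(p, H, lam=2.0, step=0.05, iters=200):
    """p[i] = channel estimate of P(bit i = 1); returns a hard decision."""
    x = p.copy()
    checks = [np.flatnonzero(row) for row in H]
    for _ in range(iters):
        grad = 2.0 * (x - p)                      # stay near the channel observation
        for idx in checks:
            f = 1.0 - 2.0 * x[idx]                # factors (1 - 2 x_i)
            prod = np.prod(f)
            for j_pos, j in enumerate(idx):
                others = prod / f[j_pos] if f[j_pos] != 0 else np.prod(np.delete(f, j_pos))
                grad[j] += lam * others           # d/dx_j of lam * (1 - prod)/2
        x = np.clip(x - step * grad, 0.0, 1.0)    # projection onto the box
    return (x > 0.5).astype(int)

rng = np.random.default_rng(3)
codeword = np.zeros(7, dtype=int)                 # all-zeros codeword
y = 1 - 2.0 * codeword + 0.4 * rng.normal(size=7) # BPSK over AWGN
p = 1.0 / (1.0 + np.exp(2.0 * y / 0.4 ** 2))      # P(bit = 1 | y)
xhat = decode_gp(p, H)
print(xhat, "checks satisfied:", not np.any(H @ xhat % 2))
```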

11.
In embedded control applications, system cost and power/energy consumption are key considerations. In such applications, program memory forms a significant part of the chip area, so reducing code size reduces system cost significantly. A significant part of the total power is consumed in fetching instructions from program memory, so reducing instruction-fetch power has been a key target for reducing power consumption. To reduce cost and power consumption, embedded systems in these applications use application-specific processors that are fine-tuned to provide better solutions in terms of code density and power consumption. Further fine-tuning to suit each particular application in the targeted class can be achieved through reconfigurable architectures. In this paper, we propose a reconfiguration mechanism, called the Instruction Re-map Table, to re-map instructions to shorter code words. Using this mechanism, frequently used instructions can be compressed, which reduces code size and hence cost. Secondly, we use the same mechanism to target power reduction by encoding frequently used instruction sequences as Gray codes. Such encodings, along with instruction compression, reduce instruction-fetch power. We enhance the Texas Instruments DSP core TMS320C27x to incorporate this mechanism and evaluate the improvements in code size and instruction-fetch energy using real-life embedded control application programs as benchmarks. Our scheme reduces code size by over 10% and the energy consumed by over 40%. *A preliminary version of this paper appeared in the International Conference on Computer Aided Design (ICCAD-2001), San Jose, CA, November 2001.
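A toy model of the two mechanisms (not the TMS320C27x implementation): the most frequent instructions in a trace are assigned short indices into a re-map table, and the indices are Gray-coded so that successive fetches toggle few bus bits. The instruction mnemonics and table size are made up for the example.

```python
# Toy model of the two ideas (not the TMS320C27x implementation): (1) the most
# frequent instructions get short indices into a re-map table, and (2) the
# indices are Gray-coded so that successive fetches toggle few bus bits.
from collections import Counter

def gray(n):                      # binary-reflected Gray code
    return n ^ (n >> 1)

def build_remap_table(trace, table_size=8):
    most_common = [op for op, _ in Counter(trace).most_common(table_size)]
    return {op: gray(i) for i, op in enumerate(most_common)}

def encode(trace, table):
    # Frequent instructions shrink to a short table index; the rest stay long.
    return [("short", table[op]) if op in table else ("long", op) for op in trace]

trace = ["MOV", "ADD", "MOV", "LD", "MOV", "ADD", "BR", "MOV", "ADD", "MAC"]
table = build_remap_table(trace, table_size=4)
coded = encode(trace, table)
short = sum(1 for kind, _ in coded if kind == "short")
print(table)
print(f"{short}/{len(coded)} instructions compressed to table indices")
```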

12.
Recently, the power consumption of integrated circuits has been attracting increasing attention. Many techniques have been studied to improve the power efficiency of digital signal processing units such as fast Fourier transform (FFT) processors, which are widely employed both in traditional research fields, such as satellite communications, and in thriving consumer electronics, such as wireless communications. This paper presents solutions based on parallel architectures for high-throughput, power-efficient FFT cores. Different combinations of hybrid low-power techniques are exploited to reduce power consumption, such as multiplierless units which replace the complex multipliers in FFTs, low-power commutators based on an advanced interconnection, and parallel-pipelined architectures. A number of FFT cores are implemented and evaluated for their power/area performance. The results show that up to 38% and 55% power savings can be achieved by the proposed pipelined FFTs and parallel-pipelined FFTs respectively, compared to conventional pipelined FFT processor architectures.

13.
14.
This paper proposes a method for recovering the Turbo code interleaver under high bit-error conditions, applied to rate-1/3 parallel concatenated Turbo codes. Channel-code recognition is an important topic in non-cooperative signal processing, and recovering the Turbo interleaver is one of its difficult problems. Existing recognition methods handle the error-free case effectively, but in practical communications Turbo codes are often used on poor-quality channels, where the bit error rate is high and the code length is long, and these methods then fail. By exploiting the properties of parity-check vectors, each position of the interleaver can be separated and solved independently, so that recovering each position depends only on a few related positions. This avoids the error-accumulation effect and thus solves the recognition problem for high bit error rates and long code lengths, at low complexity. Simulation results show that a typical random interleaver of length up to 10 000 is correctly recovered from a received sequence with a 10% bit error rate.

15.
This paper presents an implementation of a fuzzy controller for DC-DC power converters using an inexpensive 8-bit microcontroller. An on-chip analog-to-digital (A/D) converter and PWM generator eliminate the external components needed to perform these functions. Implementation issues include the limited on-chip program memory of 2 kB, unsigned integer arithmetic, and computational delay. The duty cycle for the DC-DC power converter can only be updated every eight switching cycles because of the time required for the A/D conversion and the control calculations. However, it is demonstrated here that stable responses can be obtained for both buck and boost power converters under these conditions. Another important result is that the same microcontroller code, without any modifications, can control both power converters because their behavior can be described by the same set of linguistic rules. The contribution shows that a nonlinear controller such as fuzzy logic can be inexpensively implemented with microcontroller technology.
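A compact, integer-only fuzzy controller sketch in Python that mirrors the stated constraints (small rule base, limited integer arithmetic); it is illustrative and not the authors' 8-bit microcontroller code, and all membership breakpoints and rule outputs are assumed values.

```python
# Compact integer-only fuzzy controller sketch (illustrative; not the authors'
# 8-bit microcontroller code).  Error and change-of-error are fuzzified with
# triangular sets on a scaled integer universe, a small rule table picks duty
# corrections, and defuzzification is a weighted average in integer math.
def tri(x, a, b, c):
    """Triangular membership on integers, scaled to 0..255."""
    if x <= a or x >= c:
        return 0
    if x <= b:
        return (255 * (x - a)) // (b - a)
    return (255 * (c - x)) // (c - b)

SETS = {"NEG": (-256, -128, 0), "ZER": (-128, 0, 128), "POS": (0, 128, 256)}
# Rule table: (error set, delta-error set) -> duty-cycle correction (integer)
RULES = {("NEG", "NEG"): -20, ("NEG", "ZER"): -10, ("NEG", "POS"): 0,
         ("ZER", "NEG"): -5,  ("ZER", "ZER"): 0,   ("ZER", "POS"): 5,
         ("POS", "NEG"): 0,   ("POS", "ZER"): 10,  ("POS", "POS"): 20}

def fuzzy_step(error, derror):
    num = den = 0
    for (es, ds), out in RULES.items():
        w = min(tri(error, *SETS[es]), tri(derror, *SETS[ds]))  # rule firing strength
        num += w * out
        den += w
    return num // den if den else 0      # integer weighted average

duty = 128                               # 8-bit duty-cycle register
for error, derror in [(90, 10), (60, -30), (20, -40), (5, -15)]:
    duty = max(0, min(255, duty + fuzzy_step(error, derror)))
    print(duty)
```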

16.
This paper provides an overview of a program synthesis system for a class of quantum chemistry computations. These computations are expressible as a set of tensor contractions and arise in electronic structure modeling. The input to the system is a high-level specification of the computation, from which the system can synthesize high-performance parallel code tailored to the characteristics of the target architecture. Several components of the synthesis system are described, focusing on the performance optimization issues that they address.
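A tiny example of the kind of tensor contraction such a system takes as input, written with numpy.einsum; this is only the mathematical operation, not the synthesized parallel code, and the index pattern and array sizes are arbitrary.

```python
# Example of a two-index tensor contraction, R[i,j] = sum_{a,b} A[i,a,b] * B[a,b,j]
# (just numpy.einsum, not the generated parallel code).
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(16, 32, 32))
B = rng.normal(size=(32, 32, 16))

R = np.einsum("iab,abj->ij", A, B)        # contract over indices a and b

# The same result via an explicit reshape + matrix multiply, the kind of
# rewriting a code generator might choose for the target architecture.
R2 = A.reshape(16, 32 * 32) @ B.reshape(32 * 32, 16)
print(np.allclose(R, R2))                 # True
```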

17.
The future of computational electromagnetics is changing drastically with the new generation of computer chips, which are multi-core instead of single-core. Previously, advancements in chip technology meant an increase in clock speed, which was typically a benefit that computational code users could enjoy. This is no longer the case. In the new roadmaps of chip manufacturers, speed has been sacrificed for improved power consumption, and the direction is multi-core processors. The burden now falls on the software programmer to revamp existing codes and add new functionality to enable computational codes to run efficiently on this new generation of multi-core processors. In this paper, a new roadmap for computational code designers is provided, demonstrating how to navigate along with the chip designers through the multi-core advancements in chip design. A new parallel code, using the Method of Moments (MoM) with higher-order functions for expansion and testing, and executed on a range of computer platforms, illustrates this roadmap. The advantage of a higher-order basis over a subdomain basis is a reduction in the number of unknowns. This means that, with the same computer resources, a larger problem can be solved using a higher-order basis than using a subdomain basis. The matrix filling for MoM with a subdomain basis must be programmed with multiple loops over the edges of the patches to account for the interactions. However, higher-order basis functions, such as polynomials, can be calculated more efficiently with fewer integrations, at least for the serial code. In terms of parallel integral-equation solvers, the differences between these categories of basis functions must be understood and accommodated. If computational codes are not written properly for parallel operation, taking into account the central processing unit (CPU) architecture and operating system, the result will be an extremely inefficient code. The research presented here will show how to take th

18.
We introduce the concept of "parallel error correcting" codes, that is, error correcting codes for parallel channels. Here, a parallel channel is a set of channels such that an additive error over a finite field occurs in one of its members at time T if the same error occurs in all members at the same time. The set of codewords of a parallel error correcting code has to be a product set if the messages transmitted come from independent information sources. We present a simple construction of optimal parallel error correcting codes based on ordinary optimal error correcting codes, and a construction of optimal linear parallel codes for independent sources based on optimal ordinary linear error correcting codes. Decoding algorithms for these codes are provided as well.

19.
Bounds on information combining
When the same data sequence is transmitted over two independent channels, or when a data sequence is transmitted twice but independently over the same channel, the independent observations can be combined at the receiver side. From an information-theoretic point of view, the overall mutual information between the data sequence and the received sequences represents a combination of the mutual information of the two channels. This concept is termed information combining. A lower bound and an upper bound on the combined information are presented, and it is proved that these bounds are tight. Furthermore, this principle is extended to the computation of extrinsic information on single code bits for a repetition code and for a single parity-check code of length three, respectively. To illustrate the concept and the bounds on information combining, two applications are considered. First, bounds on the information processing characteristic (IPC) of a parallel concatenated code are derived from its extrinsic information transfer (EXIT) chart. Second, bounds on the EXIT chart for an outer repetition code and for an outer single parity-check code of a serially concatenated coding scheme are computed.
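To make "information combining" concrete, the sketch below computes, by direct enumeration, the individual and combined mutual informations when the same uniform bit is observed through two independent binary symmetric channels; the crossover probabilities are arbitrary example values (the paper's contribution is the bounds relating these quantities).

```python
# Numerical illustration of information combining: the same uniform bit X is
# observed through two independent BSCs with crossover probabilities p1, p2.
# We compute I(X;Y1), I(X;Y2) and the combined I(X;Y1,Y2) from the joint
# distribution (the paper's bounds relate these quantities).
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint probability table p_xy[x, y-flattened]."""
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / (px @ py)[mask])).sum())

def joint(p_list):
    """Joint P(x, y1, ..., yk) for independent BSCs, flattened over the y's."""
    table = np.zeros((2,) + (2,) * len(p_list))
    for x in (0, 1):
        for ys in np.ndindex(*(2,) * len(p_list)):
            pr = 0.5
            for y, p in zip(ys, p_list):
                pr *= p if y != x else 1 - p
            table[(x,) + ys] = pr
    return table.reshape(2, -1)

p1, p2 = 0.1, 0.2
I1 = mutual_information(joint([p1]))
I2 = mutual_information(joint([p2]))
I12 = mutual_information(joint([p1, p2]))
print(f"I(X;Y1)={I1:.4f}  I(X;Y2)={I2:.4f}  I(X;Y1,Y2)={I12:.4f} bits")
```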

20.
Performance problems in asynchronous massively parallel programs are often the result of unforeseen and complex asynchronous interactions between autonomous processing elements. Such performance problems are not inefficiencies in the source code, but gaps in the algorithm designer's understanding of a complex physical system. The analyst forms hypotheses about probable causes or possible improvements, and verifies these hypotheses by modifying the program and testing it again. These hypotheses can be formed by a variety of methods, ranging from simple and mostly fruitful techniques for suggesting possible source code improvements to the difficult, indirect, and possibly futile activity of visualizing execution. The author describes a visualization system for massively parallel execution data and shows how drawbacks in other analysis methods sometimes make visualization necessary despite its difficulty.
