Similar literature
 20 similar documents found (search time: 31 ms)
1.
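Chen Haiyan, Yang Chao, Liu Sheng, Liu Zhong. Acta Electronica Sinica (电子学报), 2016, 44(2): 241-246
As DSPs (Digital Signal Processors) with a SIMD (Single Instruction Multiple Data stream) structure integrate more and more processing units on chip, the flexibility and bandwidth efficiency of parallel memory access have a growing influence on real computing performance. This paper analyzes in detail the memory access problems that the parallel radix-2 FFT (Fast Fourier Transform) algorithm faces on a general SIMD DSP. Simple XOR logic on part of the address performs the SIMD parallel address translation and yields conflict-free SIMD parallel memory access for the FFT. Several vector memory access instructions with special shuffle modes are proposed, which completely eliminate the extra shuffle instructions that a radix-2 FFT otherwise needs on a SIMD structure. Finally, the approach is applied to the optimized design of the vector memory (VM) of YHFT-Matrix2, a 16-way SIMD digital signal processor. Test results show that the VM optimized with this SIMD parallel memory structure achieves fully pipelined, conflict-free parallel memory access for the FFT and 100% utilization of parallel memory access bandwidth at an 18% increase in hardware cost. Compared with the design before optimization, FFTs of different sizes obtain speedups of 1.32 to 2.66.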
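
The abstract does not give the exact address transformation, so the following is only a minimal sketch of the general idea of XOR-based bank selection. It assumes 16 memory banks (one per SIMD lane) and a hypothetical mapping, `bank_of`, that XORs the low 4 address bits with the next 4 bits; neither the function names nor the bit fields come from the paper. The sketch checks that 16 elements read at the power-of-two strides typical of radix-2 FFT stages land in 16 distinct banks, whereas plain low-order interleaving does not.

```python
NUM_BANKS = 16  # assumption: one memory bank per SIMD lane


def bank_of(addr):
    """Hypothetical XOR mapping: low 4 address bits XOR the next 4 bits."""
    return (addr & 0xF) ^ ((addr >> 4) & 0xF)


def xor_is_conflict_free(stride):
    """True if 16 lanes reading addresses 0, stride, 2*stride, ... hit 16 distinct banks."""
    banks = {bank_of(lane * stride) for lane in range(NUM_BANKS)}
    return len(banks) == NUM_BANKS


def plain_is_conflict_free(stride):
    """Same check for plain low-order interleaving (bank = addr mod 16)."""
    banks = {(lane * stride) % NUM_BANKS for lane in range(NUM_BANKS)}
    return len(banks) == NUM_BANKS


if __name__ == "__main__":
    for stride in (1, 2, 4, 8, 16):
        print(stride, xor_is_conflict_free(stride), plain_is_conflict_free(stride))
```

This toy mapping only covers strides up to 16 starting at address 0; a real design would have to choose the XORed bit fields to match the FFT sizes and access patterns it must support.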

2.
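This paper introduces a high-performance data acquisition card based on a 32-bit PCI bus. The card reaches a maximum sampling rate of 20 M and can stream sampled data continuously, without interruption, into host memory at high speed. Its high-speed sampling performance makes it particularly suitable for digital signal processing, fast Fourier transforms, digital filtering, and image processing.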

3.
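Guo Xiao, Zhang Yue, Chen Zengping, Li Tao. Journal of Signal Processing (信号处理), 2013, 29(11): 1488-1494
As the signal bandwidth used by wideband radar systems keeps increasing, real-time pulse compression requires FFTs with a very large number of points. This paper proposes a method for implementing such ultra-long FFTs. Using a two-dimensional FFT algorithm on a high-performance FPGA platform, the ultra-long FFT is converted into a cascade of two stages of short FFTs, and off-chip memory is used to overcome the limited on-chip storage. The implementation adopts a parallel processing structure, which significantly increases computing speed and allows a 4M-point FFT to be completed within 5 ms. Experimental results show that, on the corresponding processing platform, the proposed ultra-long FFT implementation meets the real-time requirements of the radar system and solves a key problem of real-time pulse compression in wideband radar.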
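
The decomposition of one long FFT into two stages of short FFTs can be illustrated with the standard Cooley-Tukey factorization N = N1 x N2. The sketch below is a plain-Python illustration under that assumption, not the FPGA design from the paper; the function names and the 3 x 4 split used in the check are hypothetical, and a naive DFT stands in for the short FFT cores.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, standing in for a short FFT core."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def two_stage_fft(x, n1, n2):
    """Length n1*n2 DFT computed as n2 DFTs of length n1, twiddles, then n1 DFTs of length n2."""
    n = n1 * n2
    assert len(x) == n
    # Stage 1: for each column c, transform the samples x[n2*r + c], r = 0..n1-1.
    stage1 = [dft([x[n2 * r + c] for r in range(n1)]) for c in range(n2)]
    # Twiddle factors W_N^(c*k1) between the two stages.
    for c in range(n2):
        for k1 in range(n1):
            stage1[c][k1] *= cmath.exp(-2j * cmath.pi * c * k1 / n)
    # Stage 2: for each k1, transform across the columns; output index is k1 + n1*k2.
    out = [0j] * n
    for k1 in range(n1):
        col = dft([stage1[c][k1] for c in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = col[k2]
    return out


if __name__ == "__main__":
    x = [complex(i, (i * i) % 5) for i in range(12)]
    err = max(abs(a - b) for a, b in zip(two_stage_fft(x, 3, 4), dft(x)))
    print("max error vs direct DFT:", err)
```

In a hardware setting, the intermediate matrix `stage1` is what would typically be held in off-chip memory between the two passes.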

4.
The performance of high-speed, high-quantum-efficiency GaAlAsSb avalanche photodetectors suitable for a 1.0-1.4 μm high-performance fiber-optical communication system is described. The incorporation of these APDs with state-of-the-art GaAs FET electronics can lead to hybrid integrated optical receivers with 10-20 times better sensitivity at a 100-MHz bandwidth than is available with germanium APDs.

5.
A dynamic scaling FFT processor for DVB-T applications (cited by: 1; self-citations: 0; citations by others: 1)
This paper presents an 8192-point FFT processor for DVB-T systems, in which a three-step radix-8 FFT algorithm, a new dynamic scaling approach, and a novel matrix prefetch buffer are exploited. About 64 Kbit of memory space can be saved in the 8K-point FFT by the proposed dynamic scaling approach. Moreover, with data scheduling and prefetch buffering, single-port memory can be adopted without degrading the throughput rate. A test chip for an 8K-mode DVB-T system has been designed and fabricated using a 0.18-μm single-poly six-metal CMOS process with a core area of 4.84 mm². Power dissipation is about 25.2 mW at 20 MHz.

6.
Scalable IP lookup for Internet routers (cited by: 2; self-citations: 0; citations by others: 2)
Internet protocol (IP) address lookup is a central processing function of Internet routers. While a wide range of solutions to this problem have been devised, very few simultaneously achieve high lookup rates, good update performance, high memory efficiency, and low hardware cost. High performance solutions using content addressable memory devices are a popular but high-cost solution, particularly when applied to large databases. We present an efficient hardware implementation of a previously unpublished IP address lookup architecture, invented by Eatherton and Dittia (see M.S. thesis, Washington Univ., St. Louis, MO, 1998). Our experimental implementation uses a single commodity synchronous random access memory chip and less than 10% of the logic resources of a commercial configurable logic device, operating at 100 MHz. With these quite modest resources, it can perform over 9 million lookups/s, while simultaneously processing thousands of updates/s, on databases with over 100000 entries. The lookup structure requires 6.3 bytes per address prefix: less than half that required by other methods. The architecture allows performance to be scaled up by using parallel fast IP lookup (FIPL) engines, which interleave accesses to a common memory interface. This architecture allows performance to scale up directly with available memory bandwidth. We describe the tree bitmap algorithm, our implementation of it in a dynamically extensible gigabit router being developed at Washington University in Saint Louis, and the results of performance experiments designed to assess its performance under realistic operating conditions.
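
For readers unfamiliar with the problem, IP address lookup means finding the longest stored prefix that matches a destination address. The sketch below uses a plain one-bit-per-level binary trie, which is a simple baseline and explicitly not the tree bitmap algorithm described in the abstract; the class name `BinaryTrie` and the example routes are hypothetical.

```python
import ipaddress


class BinaryTrie:
    """One-bit-per-level trie for longest-prefix match (baseline, not tree bitmap)."""

    def __init__(self):
        self.root = {}

    def insert(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        # Keep only the first prefixlen bits of the 32-bit network address.
        bits = format(int(net.network_address), "032b")[:net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, address):
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node = self.root
        best = node.get("hop")  # default route, if one was inserted
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            if "hop" in node:
                best = node["hop"]  # remember the longest match seen so far
        return best


if __name__ == "__main__":
    table = BinaryTrie()
    table.insert("0.0.0.0/0", "default")
    table.insert("10.0.0.0/8", "A")
    table.insert("10.1.0.0/16", "B")
    print(table.lookup("10.1.2.3"), table.lookup("10.2.0.1"), table.lookup("192.168.0.1"))
```

A trie like this needs up to 32 memory accesses per IPv4 lookup, which is the kind of cost that multibit schemes such as the one in the abstract aim to reduce.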

7.
In recent years, the booming bandwidth demands of dedicated mobile services have driven the rapid development of optical transport networks (OTNs). Through the innovative use of emerging coherent optical communication technology and the advancement of microelectronics technology, the new-generation 100Gb/s transport technology offers a high line rate and unprecedented resilience to optical transmission impairments. This paper overviews the bandwidth demands of China Mobile driven by the upcoming rollout of Time Division-Long Term Evolution (TD-LTE) and presents the 100Gb/s trials at China Mobile that were used to verify the performance of a 100Gb/s system. China Mobile’s considerations, which were based on the trial results, regarding the deployment of 100Gb/s transport systems are introduced, and the requirements of China Mobile for the evolution of 100Gb/s transport systems are summarized.

8.
Charge coupled device (CCD) memory technology offers potential economic advantages over semiconductor random-access memory technology. However, the limitations incurred by the serial nature of CCDs have previously restricted their application to computer mainframe memories. The 64 kbyte CCD memory system described in this paper demonstrates the feasibility of CCD memory technology for moderate size memory systems applicable to microcomputer systems. Design objectives included low cost, adequate performance, reliable operation, small size, and low power consumption as well as simple interfacing to standard microprocessors. A simple two-level organization employing a random access memory (RAM) to buffer the serial CCD memory was used to improve the memory system performance and to simplify the interfacing of microcomputers. It is anticipated that the memory system can be easily modified to use 64 kbit and larger CCD memory devices as these become available. Furthermore, the memory system control logic could be integrated on a single large-scale integration (LSI) chip, thereby facilitating the fabrication of relatively large and economical memory systems with a low component count.

9.
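BWDSP1042 (博微DSP1042) is a high-performance digital signal processor developed independently in China. At present, because of the limits on BWDSP hardware computing resources and memory access bandwidth, running time can still be reduced by tuning the structure of the fast Fourier transform (FFT) algorithm. Based on the high-performance multicore BWDSP1042 architecture and its instruction scheduling principles, the structure of the radix-2 FFT algorithm is optimized...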

10.
Low rate convolutional and turbo codes that output non-linear cyclic (NLC) codewords of length n = 2^m, m being a positive integer, are described. These codes have a very low coding rate, which makes them especially suitable for spread spectrum systems where they can be used for simultaneously achieving error correction and bandwidth expansion. Due to the cyclic properties and codeword length of the component codes, branch metrics can be efficiently computed using the fast Fourier transform (FFT), enabling simple implementation of the encoder and decoder. Among the possible NLC base codes, special attention is given to the Tomlinson, Cercas, Hughes (TCH) codes family due to their good autocorrelation properties. It is shown by simulation that the turbo code schemes studied usually perform better than traditional turbo codes (in this paper the universal mobile telecommunications system (UMTS) rate 1/3 turbo code was used as a reference). This improvement is accomplished at the cost of bandwidth expansion. One of the advantages of the presented solutions over other low rate codes is their ability to improve the synchronization process at the receiver due to the good autocorrelation properties of the available NLC codes (especially TCH codes). A comparison of performance between the UMTS uplink connection and an equivalent system using the proposed codes for a multiuser scenario in a multipath fading channel is presented, showing the possibility of capacity increase when using these codes. Copyright © 2006 John Wiley & Sons, Ltd.

11.
In this paper, we describe the implementation of MorphoSys, a reconfigurable processing system targeted at data-parallel and computation-intensive applications. The MorphoSys architecture consists of a reconfigurable component (an array of reconfigurable cells) combined with a RISC control processor and a high bandwidth memory interface. We briefly discuss the system-level model, array architecture, and control processor. Next, we present the detailed design implementation and the various aspects of physical layout of different sub-blocks of MorphoSys. The physical layout was constrained for 100 MHz operation, with low power consumption, and was implemented using 0.35 μm, four metal layer CMOS (3.3 Volts) technology. We provide simulation results for the MorphoSys architecture (based on VHDL model) for some typical data-parallel applications (video compression and automatic target recognition). The results indicate that the MorphoSys system can achieve significantly better performance for most of these applications in comparison with other systems and processors.

12.
FFT algorithms have memory access patterns that prevent many architectures from achieving high computational utilization, particularly when parallel processing is required to achieve the desired levels of performance. Starting with a highly efficient hybrid linear algebra/FFT core, we co-design the on-chip memory hierarchy, on-chip interconnect, and FFT algorithms for a multicore FFT processor. We show that it is possible to achieve excellent parallel scaling while maintaining power and area efficiency comparable to that of the single-core solution. The result is an architecture that can effectively use up to 16 hybrid cores for transform sizes that can be contained in on-chip SRAM. When configured with 12 MiB of on-chip SRAM, our technology evaluation shows that the proposed 16-core FFT accelerator should sustain 388 GFLOPS of nominal double-precision performance, with power and area efficiencies of 30 GFLOPS/W and 2.66 GFLOPS/mm², respectively.

13.
14.
This paper compares electrical chip-to-chip interconnects and optical interconnects from both the physical and the computing-performance points of view. Using transmission line theory, the constraints on transmission bandwidth are obtained. The calculation indicates that the ideal maximum electrical capacity density is much lower than the experimentally demonstrated optical one. In this calculation, all the parameters come from the International Technology Roadmap for Semiconductors (ITRS) and reports of world-class laboratories. Compared with bandwidth, application is a more important factor, which has great influence on the direction of technology development and on deployment in real scenarios. Therefore, we take fast Fourier transform (FFT) computation as an example to study whether and how optical interconnect technology influences computing performance. One of the most popular topologies, the mesh architecture, is evaluated in this paper. The results show that increasing bandwidth raises speedup and efficiency, especially for clusters with many processors working cooperatively. From the above analysis, it is shown that the unique features of optical interconnects make it possible to provide more bandwidth and thereby bring great advantage to computing performance.

15.
Advanced optical interconnection technology in switching equipment (cited by: 2; self-citations: 0; citations by others: 2)
Demands for increased interconnection density and higher bandwidth, coupled with stringent cost constraints of advanced wide bandwidth telecommunication switching equipment, are exhausting conventional electrical interconnection capabilities. The requirement for greater interconnection capabilities, spawned in part by the advances in integrated circuit technologies and the need for enhanced digital services, dictates that technology advancement must occur in traditional electronic packaging and/or interconnection techniques. The resolution of these technological needs is paramount for the successful competitive introduction of these systems. Presently, a “bottle-neck” occurs at the board-to-board level of the interconnection hierarchy. Therefore, an opportunity exists for the development of new optical interconnection techniques which can be incorporated into system designs beginning at this interconnection level and beyond. The strategic insertion of optical interconnection technology into these electronic processing systems not only meets projected performance requirements, but potentially offers them at a competitive cost. This paper describes some of the new optical strategies switching equipment designers are incorporating into today's products. These strategies range from optical data links to an implementation of a flexible optical backplane called OptiFlex.

16.
Speeding up fast Fourier transform (FFT) computations is critical for today's real-time systems targeting signal processing and telecommunication applications. Aiming at the performance improvement and the efficiency of FFT architectures, this paper presents an address generation technique which enables a radix-$b$ processor to access in parallel $b$ memory banks without conflicts during each stage's computations. Using $kb$ memory banks at each stage leads to increasing the speedup of the algorithm by a factor of $kb$. The address generation can be realized in each radix-$b$ stage by the use of lookup tables of size $O(kb^{2})$ bits. The proposed technique is cost efficient and leads to the design of FFT architectures of high speedup and high sustained throughput.
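
The abstract does not spell out its lookup-table scheme, so the sketch below shows a different, classic way to obtain conflict-free access for a radix-$b$ in-place FFT: store each address in the bank given by the sum of its base-$b$ digits modulo $b$. Because the $b$ operands of a butterfly differ only in one digit, their digit sums are distinct modulo $b$. All function names here are hypothetical, and the code is an illustration under that assumption rather than the paper's technique.

```python
def digits(addr, base, ndigits):
    """Base-`base` digits of addr, least significant first."""
    out = []
    for _ in range(ndigits):
        out.append(addr % base)
        addr //= base
    return out


def bank_of(addr, base, ndigits):
    """Digit-sum mapping: bank = (sum of base-b digits) mod b."""
    return sum(digits(addr, base, ndigits)) % base


def butterfly_addresses(group, stage, base):
    """The b addresses of one butterfly: digit `stage` runs over 0..b-1, the rest come from `group`."""
    low = group % (base ** stage)
    high = group // (base ** stage)
    return [high * base ** (stage + 1) + d * base ** stage + low for d in range(base)]


def conflict_free(base, ndigits):
    """Check every butterfly of every stage of a base**ndigits-point FFT."""
    for stage in range(ndigits):
        for group in range(base ** (ndigits - 1)):
            addrs = butterfly_addresses(group, stage, base)
            banks = {bank_of(a, base, ndigits) for a in addrs}
            if len(banks) != base:
                return False
    return True


if __name__ == "__main__":
    print(conflict_free(4, 3))  # 64-point radix-4 FFT
    print(conflict_free(2, 5))  # 32-point radix-2 FFT
```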

17.
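Analog predistorters offer wide bandwidth, a simple structure, low power consumption, and low latency, and so meet the large-bandwidth, low-power, and low-latency requirements of power amplifier linearization in fifth-generation (5G) and beyond-5G mobile communication systems. However, as mobile communication systems develop, signal bandwidth and modulation order keep increasing and the memory effect of the power amplifier becomes stronger, while conventional analog predistorters cannot compensate for this memory effect. To solve the memory effect compensation problem of analog predistortion circuits, this paper proposes a Schottky diode analog predistorter that compensates the memory effect with delay lines (SDD-APD). The predistorter uses microstrip lines of unequal length as delay lines to compensate the memory effect of the power amplifier. A class-AB power amplifier operating at 3.5 GHz was tested with a 5G New Radio (NR) signal of 100 MHz bandwidth. The results show that the analog predistorter can compensate the memory effect of the power amplifier and improve its nonlinearity by more than 10 dB.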

18.
An InAlAs-InGaAs-InP HBT CPW distributed amplifier (DA) with a 2-30 GHz 1-dB bandwidth has been demonstrated which benchmarks the widest bandwidth reported for an HBT DA. The DA combines a 100 GHz fmax and 60 GHz fT HBT technology with a cascode coplanar waveguide DA topology to achieve this record bandwidth. The cascode gain cell offers 5-7 dB more available gain (MAG) than a common-emitter, and is used to extend the amplifier's upper frequency performance. A coplanar waveguide design environment is used to simplify the modeling and fabrication, as well as to reduce the size of the amplifier. Novel active load terminations for extending the DA's lower frequency response were separately demonstrated. The active loads are capable of extending the lower bandwidth performance by two decades resulting in performance below 45 MHz. This work explores both design techniques and technology capability which can be applied to other distributively matched HBT circuits such as active baluns for mixers, active combiners/dividers, and low DC power-broadband amplifiers.

19.
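Cheng Jun. Modern Electronics Technique (现代电子技术), 2005, 28(21): 58-59, 62
With the development of integrated circuit technology, electronic design automation has gradually become an important design method and is widely used in digital circuits, digital signal processing systems, and many other fields. This paper presents a floating-point FFT designed in VHDL. The design uses the radix-2 algorithm and the single-precision 32-bit binary floating-point format, and the main controller is modeled as a state machine. The whole design uses the ISE 5.3 software from Xilinx and follows a structured design approach. The complete design passed simulation and verification in ModelSim, and code coverage of its more than twenty modules reached 100%. The results show that the FFT processor implemented in VHDL can quickly perform fast Fourier transforms on floating-point data, and the code coverage indicates that testing of the system is fairly complete. The system can be extended to 16-point and 32-point floating-point FFT operations.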

20.
Entropy Based Adaptive Flow Aggregation (cited by: 1; self-citations: 0; citations by others: 1)
Internet traffic flow measurement is vitally important for network management, accounting and performance studies. Cisco's NetFlow is a widely deployed flow measurement solution that uses a configurable static sampling rate to control processor and memory usage on the router and the amount of reporting flow records generated. But during flooding attacks the memory and network bandwidth consumed by flow records can increase beyond what is available. Currently available countermeasures have their own problems: 1) reject new flows when the cache is full - some legitimate new flows will not be counted; 2) export not-terminated flows to make room for new ones - this will exhaust the export bandwidth; and 3) adapt the sampling rate to traffic rate - this will reduce the overall accuracy of accounting, including legitimate flows. In this paper, we propose an entropy based adaptive flow aggregation algorithm. Relying on information-theoretic techniques, the algorithm efficiently identifies the clusters of attack flows in real time and aggregates those large number of short attack flows into a few metaflows. Compared to currently available solutions, our solution not only alleviates the problem in memory and export bandwidth, but also significantly improves the accuracy of legitimate flows. Finally, we evaluate our system using both synthetic trace file and real trace files from the Internet.
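
The abstract does not describe the aggregation algorithm itself, so the following sketch only illustrates the information-theoretic quantity such approaches build on: the Shannon entropy of a flow field. The flow tuples, field layout, and function name are hypothetical. In a flood with many spoofed sources aimed at one victim, the source field has high entropy while the destination field has entropy close to zero, which suggests aggregating on the destination.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical flow records as (source, destination) pairs.
    flows = [("10.0.0.%d" % i, "192.0.2.1") for i in range(8)]
    src_entropy = shannon_entropy(src for src, _ in flows)
    dst_entropy = shannon_entropy(dst for _, dst in flows)
    print("source entropy:", src_entropy, "destination entropy:", dst_entropy)
```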
