Similar Literature
 Found 20 similar documents (search time: 0 ms)
1.
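This paper introduces the operating principle and performance advantages of interleaved parallel technology, sets out its design requirements, and summarizes the interleaved control methods in common use today along with their respective strengths and weaknesses.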

2.
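Switching power supplies built on the conventional Boost power factor correction (PFC) circuit suffer significant switching losses, which raise cost and lower efficiency. Building on the conventional single-phase Boost converter, this paper adopts multi-channel interleaved parallel technology for the main circuit topology of active power factor correction. Taking a three-phase interleaved Boost converter as an example, the paper analyzes its operation and shows through simulation that the multiphase interleaved Boost PFC converter reduces the input current ripple and the input inductance and improves converter efficiency.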
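The ripple reduction claimed for interleaving can be illustrated with a minimal sketch. It assumes idealized conditions that the abstract does not state: identical phases, continuous conduction, and a triangular per-phase inductor-current ripple normalized to a peak-to-peak value of 1. The function names `phase_ripple` and `total_ripple_pp` are hypothetical.

```python
def phase_ripple(t, duty):
    """AC ripple of one phase at time t (fraction of the period, 0 <= t < 1).

    Idealized triangle: rises while the switch is on (t < duty) and
    falls afterwards, with a peak-to-peak amplitude of 1.
    """
    if t < duty:
        return t / duty - 0.5
    return 0.5 - (t - duty) / (1.0 - duty)


def total_ripple_pp(n_phases, duty, samples=3000):
    """Peak-to-peak ripple of the summed input current of n_phases phases,
    each shifted by 1/n_phases of the switching period."""
    totals = []
    for k in range(samples):
        t = k / samples
        totals.append(sum(phase_ripple((t + p / n_phases) % 1.0, duty)
                          for p in range(n_phases)))
    return max(totals) - min(totals)


if __name__ == "__main__":
    for duty in (1 / 3, 0.5):
        print(duty, total_ripple_pp(1, duty), total_ripple_pp(3, duty))
```

Under these assumptions the summed ripple of three phases is always smaller than that of a single phase, and it cancels entirely when the duty cycle is a multiple of 1/3.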

3.
This paper presents a technique for simulating processors based on the principle of compiled simulation. Unlike existing, commercially available instruction set simulators for DSPs, which are of interpretive character, the proposed technique performs instruction decoding and simulation scheduling at compile time. The technique offers up to three orders of magnitude faster simulation. The high speed allows the user to explore algorithms and hardware/software trade-offs before any hardware implementation. Moreover, the user can tailor the compiled simulation to trade speed for more accuracy. In this paper, the sources of the speedup and the limitations of the technique are analyzed and the realization of the simulation compiler is presented.

4.
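DSP architectures can be divided into fixed-point (FXP) and floating-point (FLP) types. Although an FXP DSP can perform only integer arithmetic, it is fast, uses few resources, and costs less than an FLP DSP. An FXP DSP that runs FLP algorithms can achieve higher precision and a wider dynamic range. Demand for FLP support on FXP DSP architectures keeps growing, mainly for the following reasons. First, algorithm code is usually written in C/C++ using floating-point numbers, and converting an FLP algorithm to FXP format is cumbersome, whereas porting the floating-point algorithm to the DSP platform takes less time, so FLP lowers development cost. In addition, common algorithms benefit from the wider numerical range that floating-point arithmetic provides. Finally, in some cases an FXP algorithm cannot reach the desired precision and dynamic range.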
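The conversion effort mentioned above can be illustrated with a small sketch of Q15 fixed-point arithmetic, a common 16-bit format in which an integer q represents the value q / 32768. The choice of Q15 and the helper names `to_q15`, `from_q15`, and `q15_mul` are assumptions for illustration and do not come from the abstract.

```python
Q15_ONE = 1 << 15          # scale factor: 32768 represents 1.0
Q15_MAX = Q15_ONE - 1      # largest representable value, just below 1.0
Q15_MIN = -Q15_ONE         # smallest representable value, exactly -1.0


def saturate(q):
    """Clamp an integer to the signed 16-bit Q15 range."""
    return max(Q15_MIN, min(Q15_MAX, q))


def to_q15(x):
    """Quantize a float in roughly [-1, 1) to a Q15 integer."""
    return saturate(int(round(x * Q15_ONE)))


def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_ONE


def q15_mul(a, b):
    """Multiply two Q15 integers with rounding and saturation.

    The raw product has 30 fractional bits, so it is shifted right
    by 15 after adding half an LSB for rounding.
    """
    return saturate((a * b + (1 << 14)) >> 15)


if __name__ == "__main__":
    product = q15_mul(to_q15(0.3), to_q15(0.7))
    print(from_q15(product))  # close to 0.21, limited by 16-bit resolution
```

Every multiplication needs explicit scaling, rounding, and overflow handling, which is the bookkeeping that a floating-point formulation hides from the programmer.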

5.
The Joint Collaborative Team on Video Coding is developing a new standard named High Efficiency Video Coding (HEVC) that aims to reduce the bitrate of H.264/AVC by another 50%. To meet the computational demands of the new standard, in particular for high resolutions and low power budgets, exploiting parallelism is no longer an option but a requirement. HEVC therefore includes several coding tools that allow each picture to be divided into several partitions that can be processed in parallel without degrading either quality or bitrate. In this paper we adapt one of these approaches, Wavefront Parallel Processing (WPP), and show how it can be implemented on multi- and many-core processors. Our approach, named Overlapped Wavefront (OWF), processes several partitions as well as several pictures in parallel. This has the advantage that the amount of (thread-level) parallelism stays constant during execution. In addition, performance and power results are provided for three platforms: a server Intel CPU with 8 cores, a laptop Intel CPU with 4 cores, and a TILE-Gx36 with 36 cores from Tilera. The results show that our parallel HEVC decoder achieves an average frame rate of 116 fps for 4k resolution on a standard multicore CPU. The results also demonstrate that exploiting more parallelism by increasing the number of cores can substantially improve energy efficiency, measured in joules per frame.
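The dependency pattern behind wavefront processing can be sketched as follows. The sketch assumes a simplified model that is not taken from the paper: every coding block takes one time step, and a block may start once its left neighbour and the block above and to the right are finished. It is not the OWF decoder itself, and the names `wavefront_schedule` and `parallelism_profile` are hypothetical.

```python
def wavefront_schedule(rows, cols):
    """Earliest start step of each block under wavefront dependencies.

    Block (r, c) waits for its left neighbour (r, c-1) and for the
    above-right block (r-1, c+1); in the last column it waits for the
    block directly above instead.
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = 0
            if c > 0:
                ready = max(ready, start[r][c - 1] + 1)
            if r > 0:
                ready = max(ready, start[r - 1][min(c + 1, cols - 1)] + 1)
            start[r][c] = ready
    return start


def parallelism_profile(start):
    """Number of blocks that can run concurrently at each time step."""
    last_step = max(max(row) for row in start)
    profile = [0] * (last_step + 1)
    for row in start:
        for step in row:
            profile[step] += 1
    return profile


if __name__ == "__main__":
    schedule = wavefront_schedule(4, 6)
    print(parallelism_profile(schedule))
```

Within a single picture the available parallelism ramps up at the start and down at the end, which is the inefficiency that overlapping consecutive pictures is meant to remove.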

6.
This paper introduces an efficient technique for parallel data addressing in FFT architectures that perform in-place computations. The novel addressing organization provides parallel load and store of the data involved in radix-r butterfly computations and leads to an efficient architecture when r is a power of 2. The addressing scheme is based on a permutation of the FFT data, which improves the address generating circuit and the butterfly processor control. Moreover, the proposed technique is suitable for mixed-radix applications, especially for radixes that are powers of 2, and for straightforward continuous-flow implementation. The paper presents the technique and the resulting FFT architecture and shows the advantages of the architecture compared to previously published results. Implementations of the in-place radix-8 FFT architectures on a Xilinx Virtex-7 VC707 FPGA, with input sizes of 64 and 512 complex points, validate the results.
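To make the idea of conflict-free parallel access concrete, the sketch below uses a classic digit-sum bank assignment rather than the permutation-based scheme of the paper. It relies on the fact that, in an in-place radix-r FFT, the r operands of one butterfly have addresses that differ in exactly one base-r digit. All function names are hypothetical.

```python
def base_digits(addr, radix, n_digits):
    """Base-`radix` digits of addr, least significant first."""
    return [(addr // radix ** i) % radix for i in range(n_digits)]


def bank_of(addr, radix, n_digits):
    """Bank index: sum of the address digits modulo the radix."""
    return sum(base_digits(addr, radix, n_digits)) % radix


def butterfly_groups(radix, n_digits, stage):
    """Yield the address groups of all butterflies of one stage.

    The addresses in a group differ only in digit number `stage`.
    """
    weight = radix ** stage
    for base in range(radix ** n_digits):
        if (base // weight) % radix == 0:
            yield [base + k * weight for k in range(radix)]


def conflict_free(radix, n_digits):
    """True if every butterfly of every stage touches each bank once."""
    for stage in range(n_digits):
        for group in butterfly_groups(radix, n_digits, stage):
            banks = {bank_of(a, radix, n_digits) for a in group}
            if len(banks) != radix:
                return False
    return True


if __name__ == "__main__":
    print(conflict_free(8, 2), conflict_free(8, 3))  # 64 and 512 points
```

Because the operands of a butterfly differ in a single digit, their digit sums take r distinct values modulo r, so all r operands can be loaded or stored in the same cycle.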

7.
Long Bose–Chaudhuri–Hocquenghem (BCH) codes are used as the outer error-correcting codes in the second-generation Digital Video Broadcasting standard from the European Telecommunications Standards Institute. These codes can achieve around 0.6 dB of additional coding gain over Reed–Solomon codes with similar code rate and codeword length in long-haul optical communication systems. BCH encoders are conventionally implemented by a linear feedback shift register architecture. High-speed applications of BCH codes require parallel implementation of the encoders. In addition, long BCH encoders suffer from the effect of large fanout. In this paper, three novel architectures are proposed to reduce the achievable minimum clock period for long BCH encoders after the fanout bottleneck has been eliminated. For an (8191, 7684) BCH code, compared to the original 32-parallel BCH encoder architecture without fanout bottleneck, the proposed architectures can achieve a speedup of over 100%.
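The conventional serial linear feedback shift register encoder that serves as the baseline here can be sketched for a small code. The example assumes the (15, 7) BCH code with generator polynomial g(x) = x^8 + x^7 + x^6 + x^4 + 1 in place of the long (8191, 7684) code, and it does not reproduce the parallel architectures proposed in the paper. The names `lfsr_encode` and `poly_mod` are hypothetical.

```python
GEN = 0b111010001   # g(x) = x^8 + x^7 + x^6 + x^4 + 1 for the (15, 7) BCH code
N_PARITY = 8        # degree of g(x), i.e. number of parity bits


def lfsr_encode(msg_bits, gen=GEN, n_parity=N_PARITY):
    """Systematic encoding: message bits followed by the parity bits.

    The register ends up holding the remainder of m(x) * x^n_parity
    divided by g(x); message bits are fed most significant first.
    """
    mask = (1 << n_parity) - 1
    reg = 0
    for bit in msg_bits:
        feedback = bit ^ ((reg >> (n_parity - 1)) & 1)
        reg = (reg << 1) & mask
        if feedback:
            reg ^= gen & mask   # taps of g(x) without its leading term
    parity = [(reg >> i) & 1 for i in range(n_parity - 1, -1, -1)]
    return list(msg_bits) + parity


def poly_mod(value, gen=GEN):
    """Remainder of a GF(2) polynomial (bits of an int) divided by gen."""
    while value.bit_length() >= gen.bit_length():
        value ^= gen << (value.bit_length() - gen.bit_length())
    return value


if __name__ == "__main__":
    codeword = lfsr_encode([1, 0, 1, 1, 0, 0, 1])
    as_int = int("".join(str(b) for b in codeword), 2)
    print(codeword, poly_mod(as_int) == 0)
```

Each message bit takes one clock cycle, and the feedback signal drives every tap of g(x) at once. For a generator polynomial of degree several hundred, this is the source of the fanout problem the abstract refers to.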

8.
Scalable Parallel Memory Architectures for Video Coding   Cited by: 1 (self-citations: 0, citations by others: 1)
Current video compression standards, which process frames macroblock by macroblock, employ several processing functions to achieve the compression. These functions access the data memory address space in different ways. For example, motion estimation and motion compensation frequently require data accesses that are not aligned to word boundaries. On the other hand, the Discrete Cosine Transform (DCT) and its inverse (IDCT) for an 8 × 8 block can be performed first on rows and then on columns, so a transposition is needed between these two stages. Among other things, a parallel memory architecture can provide a solution for these tasks. In our other paper, we briefly surveyed parallel memory architectures and proposed parallel memory architecture designs for different data path widths for video coding applications. In this paper, we construct video coding function examples that use the proposed parallel data memory efficiently. Furthermore, the performance and implementation cost of the parallel memory architecture are estimated and compared to more conventional memory architectures. The examples are given for different data bus widths (16, 32, 64, and 128 bits). We show that the parallel memory can keep the data path fully utilized in many video coding function implementations. This ensures high-speed operation and full utilization of the processing resources.
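The row-column decomposition with an intermediate transposition can be sketched in a few lines. The sketch assumes an orthonormal floating-point DCT-II, whereas video codecs use integer approximations, and it says nothing about the memory architecture itself. The function names are hypothetical.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(values)
    result = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, v in enumerate(values))
        result.append(scale * total)
    return result


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def dct_2d(block):
    """2-D DCT as 1-D DCTs on rows, a transposition, then 1-D DCTs again."""
    row_pass = [dct_1d(row) for row in block]
    column_pass = [dct_1d(row) for row in transpose(row_pass)]
    return transpose(column_pass)


if __name__ == "__main__":
    flat_block = [[1.0] * 8 for _ in range(8)]
    print(round(dct_2d(flat_block)[0][0], 6))  # energy collects in the DC term
```

The second pass reads the data column by column, which is the access pattern that a conventional word-wide memory serves poorly and a parallel memory can serve in one access.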

9.
10.
The stickers model is a model of DNA computation that is computationally complete and universal. Many NP-complete problems can be described by stickers programs that have polynomial runtime and are exponential in space. The stickers model can be viewed as a bit-vertically operating register machine. This makes it attractive for in silico implementation. This paper describes a stickers model for the maximum clique problem and its implementation by an FPGA architecture. The results show that the FPGA-based algorithm is comparable with existing software algorithms for moderate problem sizes. More generally, the stickers model seems to be a well-suited programming model for dedicated hardware.

11.
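Parallel Processing Methods for DSPs   Cited by: 3 (self-citations: 0, citations by others: 3)
The TMS320C6x from TI and the ADSP2106x from Analog Devices are digital signal processors (DSPs) in wide use in the industry today. This paper describes in detail how to build parallel DSP processing systems using the HPI interface and the McBSP interface of the TMS320C6x and the Link interface of the ADSP2106x, respectively. It also discusses the advantages and disadvantages of these methods.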

12.
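Zhang Sidong (张思栋), Huang Lu (黄鲁), Lin Beiyuan (林贝元). Microelectronics (《微电子学》), 2007, 37(5): 712-716
This paper proposes a design method, based on an optimized time-overlap technique, for a 10-bit, 300 MHz sampling rate, 4-channel parallel pipelined A/D converter. The method relaxes the requirements on the operational amplifier. Theoretical calculation and a design example demonstrate the marked effect of this low-power design method. An operational amplifier for the front end was designed in a CSM 0.35 μm CMOS process with a 3.3 V supply. Its gain is 106 dB, its unity-gain bandwidth is 402 MHz, and its settling time is 8.8 ns. With the optimized time-overlap technique, the amplifier meets the requirement of 4-channel parallel operation at a 300 MHz sampling rate while consuming only 8.57 mW, which greatly reduces the power consumption of the whole parallel pipelined A/D converter.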

13.
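To address EMI interference sources, a multi-output BUCK converter with interleaved control is designed. The operation of interleaved control is described in detail, analyzed in the time and frequency domains, compared with in-phase control, and verified by simulation in Saber. Finally, an experimental prototype with a 28 V input and four 18 V/1.5 A outputs was built. Test results show that interleaved control effectively reduces the amplitude of the first harmonic of the interference source and raises the fundamental frequency, providing an effective route to LOW-EMI converters.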

14.
Speeding up fast Fourier transform (FFT) computations is critical for today's real-time systems targeting signal processing and telecommunication applications. Aiming at the performance improvement and the efficiency of FFT architectures, this paper presents an address generation technique which enables a radix-$b$ processor to access in parallel $b$ memory banks without conflicts during each stage's computations. Using $kb$ memory banks at each stage leads to increasing the speedup of the algorithm by a factor of $kb$. The address generation can be realized in each radix-$b$ stage by the use of lookup tables of size $O(kb^{2})$ bits. The proposed technique is cost efficient and leads to the design of FFT architectures of high speedup and high sustained throughput.

15.
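As signal processing performance requirements keep rising, multicore DSP software development has become an important trend. Instruction-level parallelism, multicore parallel processing, and overlapping computation with data transfer are all ways to raise processing performance. In the multilevel memory hierarchy of a multicore DSP, the closer a memory is to the core, the smaller its capacity. Processing large volumes of data requires correspondingly large memory, so tasks cannot be assigned directly to the individual cores. To address this problem, the paper discusses parallel task allocation on an 8-core processor and, based on the multicore DSP architecture, uses a two-level ping-pong scheme to implement a large-point FFT. The design uses DMA to overlap processing with data transfer, which improves processing performance.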

16.
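This paper introduces the principle, technical requirements, and characteristics of parallel operation of inverter power supplies, outlines the current state and significance of inverter paralleling technology, analyzes how circulating current arises, summarizes and classifies the paralleling schemes currently in use, and points out the development trends of the technology. It briefly describes the features of the TMS320LF_2407A digital signal processor chip and explains the software and hardware structure and the operating principle of the system. Simulation results show that the system achieves fairly good control performance.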

17.
In this paper we consider the $l_1$-compressive sensing problem. We propose an algorithm specifically designed to take advantage of shared-memory, vectorized, parallel, and many-core microprocessors such as the Cell processor, new-generation Graphics Processing Units (GPUs), and standard vectorized multi-core processors (e.g. quad-core CPUs). Moreover, the algorithm is easy to implement. We also give evidence of the efficiency of our approach and compare the algorithm on the three platforms, thus exhibiting the pros and cons of each.

18.
In this paper we propose a technique to implement, in a parallel fashion, a turbo decoder based on an arbitrary permutation, and to expand its interleaver in order to produce a family of prunable S-random interleavers suitable for parallel implementations. We show that the spread properties of the obtained interleavers are almost optimal, and we show by simulation that they are very competitive in terms of error floor performance. A few details on the decoder architecture are also provided.

19.
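Taking the shunt active power filter as the object of study, the paper systematically examines its topology, the detection algorithm for the compensation components, and the control strategy, and on this basis presents the design of a DSP-based shunt active power filter. Simulation experiments verify the validity and practicality of the mathematical model, the detection algorithm, and the control strategy of the active power filter. The results show that the designed active filter has good harmonic compensation characteristics and adaptive compensation capability.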

20.
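To improve the real-time performance of video dehazing, a parallel dehazing method based on the dark channel model is proposed that exploits the hardware characteristics of the DM6437 DSP chip. Ping-pong DMA transfers optimize parallel data access and computational efficiency in dark channel processing. Single instruction, multiple data (SIMD) operations, array expansion, and lookup tables are combined to build parallel computation models for core steps such as ambient light map generation and pixel correction. In particular, an improved mean filtering algorithm has an average processing time of about 4 ms. The method achieves real-time dehazing at 11.4 ms/frame for video (resolution omitted in the source) while maintaining good dehazing quality.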
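The dark channel that this method builds on is commonly defined as the minimum over the colour channels followed by a minimum over a local window. The sketch below computes it in plain Python for a small image. The square window, the border clipping, and the function name `dark_channel` are assumptions for illustration, and none of the DSP optimizations described in the abstract are reproduced.

```python
def dark_channel(image, radius):
    """Dark channel of an RGB image.

    image: list of rows, each a list of (r, g, b) tuples in [0, 1].
    radius: half-width of the square minimum-filter window.
    """
    height, width = len(image), len(image[0])
    # Step 1: per-pixel minimum over the three colour channels.
    channel_min = [[min(pixel) for pixel in row] for row in image]
    # Step 2: minimum filter over a (2 * radius + 1)^2 window,
    # clipped at the image borders.
    dark = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            rows = range(max(0, y - radius), min(height, y + radius + 1))
            cols = range(max(0, x - radius), min(width, x + radius + 1))
            dark[y][x] = min(channel_min[j][i] for j in rows for i in cols)
    return dark


if __name__ == "__main__":
    hazy = [[(0.9, 0.8, 0.7)] * 3 for _ in range(3)]
    hazy[1][1] = (0.2, 0.5, 0.6)
    print(dark_channel(hazy, 1))
```

Each output pixel depends only on a small neighbourhood of the input, so rows or tiles can be processed independently, which is what makes the computation suitable for SIMD instructions and block-wise DMA transfers.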


Copyright©北京勤云科技发展有限公司  京ICP备09084417号