1.

High-Performance Radix-2, 3 and 5 Parallel 1-D Complex FFT Algorithms for Distributed-Memory Parallel Computers (total citations: 3; self-citations: 0; citations by others: 3)

Takahashi Daisuke Kanada Yasumasa 《The Journal of supercomputing》2000,15(2):207-228

In this paper, we propose high-performance radix-2, 3 and 5 parallel 1-D complex FFT algorithms for distributed-memory parallel computers. We use the four-step or six-step FFT algorithms to implement the radix-2, 3 and 5 parallel 1-D complex FFT algorithms. In our parallel FFT algorithms, since we use cyclic distribution, all-to-all communication takes place only once. Moreover, the input data and output data are both in natural order. We also show that the suitability of a parallel FFT algorithm is machine-dependent because of the differences in the architecture of the processor elements in distributed-memory parallel computers. Experimental results of 2^p3^q5^r point FFTs on two distributed-memory parallel computers, the HITACHI SR2201 and the IBM SP2, are reported. We achieved performance of about 130 GFLOPS on a 1024-PE HITACHI SR2201 and about 1.25 GFLOPS on a 32-PE IBM SP2.
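The four-step decomposition named in the abstract above can be sketched serially as follows. This is only an illustration of the index arithmetic for N = n1 × n2, not the authors' parallel implementation: the function names `dft` and `four_step_fft` are hypothetical, a direct O(n^2) DFT stands in for the radix-2, 3 and 5 kernels, and no data distribution or all-to-all communication is modeled.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT, used here as a stand-in for a small FFT kernel."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_fft(x, n1, n2):
    """DFT of length n1*n2 via the four-step decomposition (serial sketch).

    Input index:  j = j1 + j2*n1
    Output index: k = k2 + k1*n2
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Step 1: n1 independent n2-point transforms over the strided
    # subsequences x[j1], x[j1 + n1], x[j1 + 2*n1], ...  -> a[j1][k2]
    a = [dft(x[j1::n1]) for j1 in range(n1)]

    # Step 2: multiply by the twiddle factors exp(-2*pi*i*j1*k2/N).
    for j1 in range(n1):
        for k2 in range(n2):
            a[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)

    # Steps 3 and 4: n2 independent n1-point transforms over j1, with the
    # results written back in natural order (the transpose step).
    out = [0j] * n
    for k2 in range(n2):
        column = dft([a[j1][k2] for j1 in range(n1)])  # indexed by k1
        for k1 in range(n1):
            out[k2 + k1 * n2] = column[k1]
    return out
```

For example, `four_step_fft(x, 5, 6)` on a 30-point input should agree with `dft(x)` up to rounding error. In a distributed setting, the transforms in steps 1 and 3 are the parts that can run independently on different processors.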

2.

Algorithms for pipeline control

Jürgen Tappe 《Parallel Computing》1984,1(2):185-188

The control of a statically configured pipeline corresponds to certain paths in its state graph. Properties of this graph and algorithms for optimal paths are discussed.

3.

Algorithms and timing for identification of objects from 2-D images

W. Lin D. A. Fraser 《Concurrency and Computation》1991,3(4):325-331

4.

Algorithms for search trees on message passing architectures

Colbrook A. Brewer E.A. Dellarocas C.N. Weihl W.E. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(2):97-108

In this paper we describe a new algorithm for maintaining a balanced search tree on a message-passing MIMD architecture; the algorithm is particularly well suited for implementation on a small number of processors. We introduce a (2^B-2, 2^B) search tree that uses a bidirectional ring of O(log n) processors to store n entries. Update operations use a bottom-up node-splitting scheme, which performs significantly better than top-down search tree algorithms. The bottom-up algorithm requires many fewer messages and results in less blocking due to synchronization than top-down algorithms. Additionally, for a given cost ratio of computation to communication, the value of B may be varied to maximize performance. Implementations on a parallel-architecture simulator are described.

5.

Algorithms and architectures for a class of non-linear hybrid filters

《Computer Vision, Graphics, and Image Processing》1990,49(2):280-281

6.

Hybrid and 4-D FFT implementations of an open-source parallel FFT package OpenFFT

Truong Vinh Truong Duy Taisuke Ozaki 《The Journal of supercomputing》2016,72(2):391-416

7.

A novel conflict-free parallel memory access scheme for FFT constant geometry architectures

CuiMei Ma He Chen JiYang Yu Teng Long 《中国科学:信息科学(英文版)》2013,56(4):1-9

The challenges imposed by environmental issues, such as global warming and the energy crisis, are demanding more responsible energy usage, including in the optical networking field. In optical transmission networks, most of the electrical power is consumed by the optical-electrical-optical conversion in optical repeaters. Modern optical network control plane technologies allow idle optical repeaters to be put into a low-power sleep mode. Inspired by this, we propose a novel power-efficient routing and wavelength assignment (RWA) algorithm, called HTAPE. The HTAPE algorithm exploits the knowledge of the connection holding times to minimize the number of optical repeaters in the active mode, and hence reduce the total electricity consumption of the optical network. We test the new algorithm on the typical CERNET and USNET networks. Compared with traditional RWA algorithms without holding-time-awareness, it is observed that the HTAPE algorithm yields significant reductions in power consumption.

8.

A novel area-optimized 2-D IDCT processor

于宝东邹雪城《微处理机》2005,26(5):86-88

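This paper presents an 8×8 two-dimensional inverse discrete cosine transform (IDCT) processor based on the row-column decomposition algorithm. The input registers conventionally used to hold the input column vectors and the parallel-to-serial conversion registers are no longer needed, which reduces both chip area and processing latency. The one-dimensional discrete cosine transform inside the processor is implemented with lookup tables, and the ROM serving as the lookup table is also much smaller than the ROM of the conventional distributed arithmetic approach. The proposed 2-D IDCT processor not only offers optimized area, low latency, and high throughput, but also has a regular, fully pipelined structure, which makes it well suited to VLSI and FPGA implementation.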
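The row-column decomposition mentioned in the abstract above can be illustrated in software. The sketch below is an assumption-laden illustration, not the paper's hardware design: it uses the orthonormal DCT scaling, evaluates cosines directly instead of using ROM lookup tables, and the names `idct_1d`, `transpose`, and `idct_2d` are hypothetical.

```python
import math


def idct_1d(coeffs):
    """Orthonormal 1-D inverse DCT (DCT-III) of a coefficient vector."""
    n_pts = len(coeffs)
    out = []
    for n in range(n_pts):
        total = 0.0
        for k, c in enumerate(coeffs):
            # Scale factor: sqrt(1/N) for the DC term, sqrt(2/N) otherwise.
            scale = math.sqrt(1.0 / n_pts) if k == 0 else math.sqrt(2.0 / n_pts)
            total += scale * c * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
        out.append(total)
    return out


def transpose(block):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*block)]


def idct_2d(block):
    """2-D IDCT by row-column decomposition.

    A 1-D IDCT is applied to every row, the intermediate result is
    transposed, a 1-D IDCT is applied to every (former) column, and the
    result is transposed back.
    """
    after_rows = [idct_1d(row) for row in block]
    after_cols = [idct_1d(col) for col in transpose(after_rows)]
    return transpose(after_cols)
```

With this scaling, an 8×8 block whose only nonzero coefficient is a DC value of 8.0 reconstructs to a block of all ones. The separability is what lets a 2-D transform reuse a single 1-D datapath.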

9.

Performance Analysis of FFT Algorithms on Multiprocessor Systems

《IEEE transactions on pattern analysis and machine intelligence》1983,(4):512-521

A decimation-in-time radix-2 fast Fourier transform (FFT) algorithm is considered here for implementation in multiprocessors with a shared bus or a multistage interconnection network (MIN), and in mesh-connected computers. Results are derived for data allocation, interprocessor communication, approximate computation time, and speedup of an N-point FFT on any P available processing elements (PEs). Further generalization is obtained for a radix-r FFT algorithm. An N × N point two-dimensional discrete Fourier transform (DFT) implementation is also considered when one or more rows of the input data matrix are allocated to each PE.
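For readers unfamiliar with the algorithm the abstract above analyzes, a minimal serial sketch of a decimation-in-time radix-2 FFT follows. It is a textbook recursive formulation, not the multiprocessor mapping studied in the paper; the name `fft_radix2_dit` is hypothetical and the input length is assumed to be a power of two.

```python
import cmath


def fft_radix2_dit(x):
    """Recursive decimation-in-time radix-2 FFT (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    if n == 1:
        return [complex(x[0])]

    # Decimation in time: split into even-indexed and odd-indexed samples.
    even = fft_radix2_dit(x[0::2])
    odd = fft_radix2_dit(x[1::2])

    # Butterfly: combine the two half-length transforms.
    out = [0j] * n
    half = n // 2
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out
```

Each level of the recursion corresponds to one butterfly stage, so an N-point transform has log2(N) stages. Which stages pair data held on different processing elements depends on how the data are allocated.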

10.

Efficient 3-D model search and retrieval using generalized 3-D radon transforms (total citations: 4; self-citations: 0; citations by others: 4)

Daras P. Zarpalas D. Tzovaras D. Strintzis M.G. 《Multimedia, IEEE Transactions on》2006,8(1):101-114

11.

Projection-based geometrical feature extraction for computer vision: algorithms in pipeline architectures (total citations: 1; self-citations: 0; citations by others: 1)

Sanz JL Dinstein I 《IEEE transactions on pattern analysis and machine intelligence》1987,(1):160-168

In this correspondence, some image transforms and features such as projections along linear patterns, convex hull approximations, Hough transform for line detection, diameter, moments, and principal components will be considered. Specifically, we present algorithms for computing these features which are suitable for implementation in image analysis pipeline architectures. In particular, random access memories and other dedicated hardware components which may be found in the implementation of classical techniques are no longer needed in our algorithms. The effectiveness of our approach is demonstrated by running some of the new algorithms in conventional short pipelines for image analysis. In related papers, we have shown a pipeline architecture organization called PPPE (Parallel Pipeline Projection Engine), which unleashes the power of projection-based computer vision, image processing, and computer graphics. In the present correspondence, we deal with just a few of the many algorithms which can be supported in PPPE. These algorithms illustrate the use of the Radon transform as a tool for image analysis.

12.

A framework for rapid evaluation of heterogeneous 3-D NoC architectures

Efstathios Sotiriou-Xanthopoulos Dionysios Diamantopoulos Kostas Siozios George Economakos Dimitrios Soudris 《Microprocessors and Microsystems》2014

The scalability of communication infrastructure in modern Integrated Circuits (ICs) becomes a challenging issue, which might be a significant bottleneck if not carefully addressed. Towards this direction, the usage of Networks-on-Chip (NoC) is a preferred solution. In this work, we propose a software-supported framework for quantifying the efficiency of heterogeneous 3-D NoC architectures. In contrast to existing approaches for NoC design, the introduced heterogeneous architecture consists of a mixture of 2-D and 3-D routers, which reduces the delay and power consumption with a slight impact on packet hops. More specifically, the experimental results with a number of DSP applications show the effectiveness of the introduced methodology, as we achieve on average 25% higher maximum operation frequency and 39% lower power consumption compared to the uniform 3-D NoCs.

13.

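Optimization of the 1-D FFT algorithm on a many-core processor based on cooperative software and hardware support (total citations: 2; self-citations: 0; citations by others: 2)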

周永彬张军超张帅张浩《计算机学报》2008,31(11)

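With the growing demand for high-performance computing, on-chip many-core processors are becoming the direction of development for future processor architectures. The fast Fourier transform (FFT), an important application in high-performance computing, places high demands on both computing power and communication bandwidth. Implementing an efficient, scalable FFT algorithm on a many-core processor platform is therefore a challenge shared by algorithm designers and architecture designers. This paper optimizes and evaluates the 1-D FFT algorithm on the Godson-T many-core processor platform. While saving almost one third of the L2 cache storage overhead, optimization strategies such as hiding the matrix transpose and overlapping computation with communication give the optimized 1-D FFT algorithm a performance improvement of more than 3x. Experimental analysis of on-chip network congestion further shows that, for memory-bandwidth-bound applications such as FFT, increasing the L2 cache access bandwidth can relieve the pressure that bursty reads and writes place on the on-chip network and the L2 cache, further improving program performance and scalability.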

14.

Designing OP2 for GPU architectures

M.B. Giles G.R. Mudalige B. Spencer C. Bertolli I. Reguly 《Journal of Parallel and Distributed Computing》2013

OP2 is an “active” library framework for the solution of unstructured mesh applications. It aims to decouple the specification of a scientific application from its parallel implementation to achieve code longevity and near-optimal performance through re-targeting the back-end to different multi-core/many-core hardware. This paper presents the design of the current OP2 library for generating efficient code targeting contemporary GPU platforms. In this we focus on some of the software architecture design choices and low-level optimizations to maximize performance on NVIDIA’s Fermi architecture GPUs. The performance impact of these design choices is quantified on two NVIDIA GPUs (GTX560Ti, Tesla C2070) using the end-to-end performance of an industrial representative CFD application developed using the OP2 API. Results show that for each system, a number of key configuration parameters need to be set carefully in order to gain good performance. Utilizing a recently developed auto-tuning framework, we explore the effect of these parameters, their limitations and insights into optimizations for improved performance.

15.

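2-D torus/double-loop interconnected Petersen graph networks and their routing algorithms (total citations: 4; self-citations: 1; citations by others: 4)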

王雷林亚平陈治平文学《计算机学报》2004,27(9):1290-1296

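Based on the double-loop structure, a new method for extending the Petersen graph is proposed, and on this basis a 2-D double-loop interconnected Petersen graph network, DCP(k), is constructed. The properties of the 2-D torus interconnected Petersen graph network TCP(k) are analyzed, and conditions are given under which TCP(k) has a better diameter and groupability than the 2-D Torus interconnection network. DCP(k) and TCP(k) are proved to have good scalability and connectivity; moreover, for interconnection networks of 10×k nodes, both DCP(k) and TCP(k) have a smaller diameter and better groupability than RP(k) and the 2-D Torus interconnection network. Finally, unicast and broadcast routing algorithms are designed for DCP(k) and TCP(k), respectively, and their communication efficiency is proved to be markedly higher than that of the corresponding algorithms on RP(k), with DCP(k) outperforming TCP(k).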

16.

Analysis of parallel algorithms using pipeline architectures in computer vision applications

Amelia Fong Lochovsky 《Annals of Mathematics and Artificial Intelligence》1991,4(1-2):177-209

Recently a number of machine vision systems have been successfully implemented using pipeline architectures and various new algorithms have been proposed. In this paper we propose a method of analysis of both time complexity and space complexity for algorithms using conventional general purpose pipeline architectures. We illustrate our method by applying it to an algorithm schema for local window operations satisfying a property we define as decomposability. It is shown that the proposed algorithm schema and its analysis generalize previously published results. We further analyse algorithms implementing operators that are not decomposable. In particular the complexities of several median-type operations are compared and the implication on algorithm choice is discussed. We conclude with discussions on space-time trade-offs and implementation issues. This research was partially supported by a grant from the Natural Science and Engineering Research Council of Canada. Part of this work was done while the author was at the University of Guelph, Guelph, Ontario, Canada.

17.

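Pipeline leak location technology based on optical fiber sensing and the wavelet transform (total citations: 2; self-citations: 0; citations by others: 2)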

赵红杭利军李港《传感器与微系统》2009,28(9)

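A distributed optical-fiber pipeline leak detection and location technique based on the principle of the Sagnac fiber interferometer is studied. Using single-mode fiber as a distributed sensor, the technique can effectively detect leaks occurring along the pipeline and locate the leak point. The composition and working principle of the detection system are described, and its location method is analyzed. Processing the leak signal with wavelet threshold denoising effectively improves the identifiability of the null frequency. The results show that the detection technique has high measurement sensitivity and location accuracy.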
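Wavelet threshold denoising, as named in the abstract above, can be sketched in a few lines. The abstract does not state the wavelet, the number of decomposition levels, or the threshold rule, so everything below is an assumption chosen for brevity: a single-level Haar transform, soft thresholding of the detail coefficients, an even-length input, and hypothetical function names.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Single-level orthonormal Haar transform of an even-length signal."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward: rebuild the signal from both coefficient sets."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def soft_threshold(values, t):
    """Shrink each coefficient toward zero by t; zero those with magnitude <= t."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in values]


def denoise(signal, t):
    """Threshold only the detail coefficients, then reconstruct."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, t))
```

With a threshold of zero the signal is reconstructed unchanged. With a threshold larger than every detail coefficient, each pair of neighboring samples is replaced by its mean. A practical system would typically use several decomposition levels and a data-driven threshold.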

18.

A Galerkin Method for the Simulation of the Transient 2-D/2-D and 3-D/3-D Linear Boltzmann Equation

Matthias K. Gobbert Samuel G. Webster Timothy S. Cale 《Journal of scientific computing》2007,30(2):237-273

Many production steps used in the manufacturing of integrated circuits involve the deposition of material from the gas phase onto wafers. Models for these processes should account for gaseous transport in a range of flow regimes, from continuum flow to free molecular or Knudsen flow, and for chemical reactions at the wafer surface. We develop a kinetic transport and reaction model whose mathematical representation is a system of transient linear Boltzmann equations. In addition to time, a deterministic numerical solution of this system of kinetic equations requires the discretization of both position and velocity spaces, each two-dimensional for 2-D/2-D or each three-dimensional for 3-D/3-D simulations. Discretizing the velocity space by a spectral Galerkin method approximates each Boltzmann equation by a system of transient linear hyperbolic conservation laws. The classical choice of basis functions based on Hermite polynomials leads to dense coefficient matrices in this system. We use a collocation basis instead that directly yields diagonal coefficient matrices, allowing for more convenient simulations in higher dimensions. The systems of conservation laws are solved using the discontinuous Galerkin finite element method. First, we simulate chemical vapor deposition in both two and three dimensions in typical micron-scale features as an application example. Second, stability and convergence of the numerical method are demonstrated numerically in two and three dimensions. Third, we present parallel performance results which indicate that the implementation of the method possesses very good scalability on a distributed-memory cluster with a high-performance Myrinet interconnect.

19.

Plan-based boundary extraction and 3-D reconstruction for orthogonal 2-D echocardiography

《Pattern recognition》1987,20(2):155-162

This paper describes an automatic boundary extraction method for 2-D echocardiograms and 3-D reconstruction based on the extracted boundary lines. We use orthogonal 2-D echocardiography, which has two probes with different frequencies and can take two sectional cardiac ultrasound images simultaneously. When extracting the boundary lines, we use plans to reduce the processing time and to interpolate the boundary in areas with no echo.

20.

A Hybrid Hyperquadric Model for 2-D and 3-D Data Fitting

Issac Cohen Laurent D. Cohen 《Computer Vision and Image Understanding》1996,63(3):527-541
