Similar literature
 20 similar documents found (search time: 15 ms)
1.
The growing acceptance that high-order parallelism is needed to advance digital processing still leaves open the large question of which machine architectures are best for which class of problem.

To help answer this, we are investigating and comparing the use of both SIMD and MIMD architectures for programmable processing in real-time systems. A distributed array machine, Mil-DAP (derived from the original ICL DAP) has been developed and benchmarked on radar, image processing, and on terrain modelling problems. Multi-transputer arrays have been applied to an overlapping set of problems in image processing, FFT and terrain-based computation.

The results are compared and preliminary conclusions drawn.


2.
A digital filter and detection chip (FAD) has been developed by British Telecom in the UK for application as a tone detector in digital telephone exchanges. The chip has a flexible structure and has since found application in many other communication systems. This paper describes the chip and indicates how it can be used to perform a variety of different algorithms for these purposes.

3.
A low-cost, high-speed, general-purpose digital signal processing system was constructed using the TMS32010 digital signal processor. The system was designed with simplicity, compactness, flexibility and expandability in mind. A parallel processing architecture was adopted to achieve real-time performance. Four processors were used in the prototype system, but this can be expanded easily. Interprocessor data transfer and communications with the host computer are facilitated via a single common bus and a bank of shared memory. A one-dimensional digital FIR filter and a real-time FFT program were used to evaluate the performance of the system. In addition, a real-time spectrogram was implemented as an application example.
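
For readers unfamiliar with the benchmark, the one-dimensional FIR filter mentioned in this abstract computes each output sample as a weighted sum of recent input samples. The sketch below is a plain sequential version, not the TMS32010 implementation; the function name `fir_filter` and the example coefficients are assumptions made for illustration only.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * x[n - k].

    Samples before the start of the input are treated as zero
    (an assumption of this sketch, not something the abstract states).
    """
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(taps):
            if n - k >= 0:  # skip taps that reach before the first sample
                acc += coeff * samples[n - k]
        output.append(acc)
    return output


# Hypothetical 2-tap averaging filter applied to a short ramp.
smoothed = fir_filter([2.0, 4.0, 6.0], [0.5, 0.5])
```

Each output sample depends only on the input, never on other outputs, which is one reason an FIR filter is a convenient test case for a multi-processor system: blocks of output samples can be computed independently.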

4.
The use of multiprocessors for discrete event simulation is an active research area where work has focused on strategies for model execution with little regard for the underlying formalism in which models may be expressed. However, a formalism-based approach offers several advantages including the ability to migrate models from sequential to parallel platforms and the ability to calibrate simulation architectures to model structural properties. In this article, we extend the DEVS (discrete event system specification) formalism, originally developed for sequential simulation, to accommodate the full potential of parallel processing. The extension facilitates exploitation of both internal and external event parallelism manifested in hierarchical, modular DEVS models. After developing a mapping of the extended formalism to parallel architectures, we describe an implementation of the approach on a massively parallel architecture, the Connection Machine. Execution results are discussed for a class of models exhibiting high external and internal event parallelism, the so-called broadcast models. These verify the tenets of the underlying theory and demonstrate that significant reduction in execution time is possible compared to the same model executed in serial simulation.

5.
This paper describes an efficient implementation and evaluation of a parallel eigensolver for computing all eigenvalues of dense symmetric matrices. Our eigensolver uses a Householder tridiagonalization method, which has higher parallelism and performance than conventional methods when the problem size is relatively small, e.g., matrices of order about 10,000. This is very important for practical applications, where many diagonalizations of such matrices are required. The routine was evaluated on a 1024-processor HITACHI SR2201, giving speedups of about 2–5 times over the ScaLAPACK library on the same 1024 processors.
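
As background for this abstract, Householder tridiagonalization reduces a symmetric matrix to tridiagonal form by a sequence of orthogonal reflections, so the eigenvalues are unchanged. The sketch below is a minimal serial, textbook version in plain Python. It is not the parallel routine evaluated in the paper, and the name `householder_tridiagonalize` is an assumption of this example.

```python
import math


def householder_tridiagonalize(matrix):
    """Return a tridiagonal matrix similar to the symmetric input matrix.

    Serial textbook sketch: for each column k, build a reflection
    H = I - 2 v v^T that zeroes the entries below the subdiagonal,
    then form H A H so that symmetry and eigenvalues are preserved.
    """
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]  # work on a copy
    for k in range(n - 2):
        x = [a[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column already has the required zeros
        # Choose the sign that avoids cancellation in v[0].
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(t * t for t in v))
        v = [t / norm_v for t in v]
        m = len(v)
        # Apply the reflection from the left to rows k+1 .. n-1.
        for j in range(n):
            dot = sum(v[i] * a[k + 1 + i][j] for i in range(m))
            for i in range(m):
                a[k + 1 + i][j] -= 2.0 * v[i] * dot
        # Apply the reflection from the right to columns k+1 .. n-1.
        for i in range(n):
            dot = sum(a[i][k + 1 + j] * v[j] for j in range(m))
            for j in range(m):
                a[i][k + 1 + j] -= 2.0 * dot * v[j]
    return a


# Hypothetical 4x4 symmetric test matrix.
example = [[4, 1, -2, 2],
           [1, 2, 0, 1],
           [-2, 0, 3, -2],
           [2, 1, -2, -1]]
tridiagonal = householder_tridiagonalize(example)
```

The eigenvalues can then be computed from the tridiagonal matrix at much lower cost than from the dense one, which is why the reduction step tends to dominate the run time of a dense symmetric eigensolver.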

6.
We present a new method for predicting RNA secondary structure based on a genetic algorithm. The algorithm is designed to run on a massively parallel SIMD computer. Statistical analysis shows that the program performs well when compared to a dynamic programming algorithm used to solve the same problem. The program has also pointed out a long-standing simplification in the implementation of the original dynamic programming algorithm that sometimes causes it not to find the optimal secondary structure.

7.
This paper presents the implementation of two connected component labelling algorithms on the BLITZEN massively parallel processor that was developed recently for NASA. The topology of BLITZEN is a two-dimensional mesh that can be dynamically configured to also support diagonal data transfers. It is shown that an algorithm based on Levialdi's connected component shrinking process performs much better than a straightforward algorithm for connected component labelling.
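
To make the task in this abstract concrete, connected component labelling assigns the same label to every foreground pixel that belongs to one connected region. The sketch below is a sequential flood-fill version using 4-connectivity. It is neither of the two BLITZEN algorithms from the paper, and the name `label_components` and the choice of connectivity are assumptions of this example.

```python
from collections import deque


def label_components(image):
    """Label 4-connected foreground regions of a binary image.

    `image` is a list of rows containing 0 (background) or 1 (foreground).
    Returns (labels, count), where background pixels keep label 0 and
    regions are numbered 1..count in raster-scan order of discovery.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Hypothetical 3x3 image with two separate regions.
labels, count = label_components([[1, 1, 0],
                                  [0, 0, 0],
                                  [0, 1, 1]])
```

Adding the four diagonal offsets to the neighbour list would give 8-connectivity, which corresponds to the diagonal transfers that the abstract says the BLITZEN mesh can be configured to support.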

8.
Digital signal processing using special processors seems to have an important future. The TMS320 is one of the best known of such processors. Experiences of building and using TMS320 development tools (a simulator and a ROM emulator) in a university environment are described. Conclusions drawn from this work are presented and possible further avenues of exploration are indicated.

9.
The development and implementation of systems for the more complex real-time image processing and scene understanding tasks, such as robot vision and remote surveillance, call for faster computation than is possible using the traditional serial computer. The advent of VLSI has made feasible the consideration of more specialized processing architectures, designed to support these data rates, while keeping systems compact and relatively cheap. Two approaches are discussed: the use of a programmable processor array, and the customizing of image processing algorithms in silicon. This paper examines designs based upon each approach in the light of the techniques and constraints of VLSI. In particular we describe in some detail an example of a VLSI parallel array processor, the Grid (GEC rectangular image and data processor), and a number of special-purpose CMOS/SOS chips based on systolic design techniques.

10.
Test generation for combinational circuits is an important step in the VLSI design process. Unfortunately, the problem is highly computation-intensive, and for circuits encountered in practice, test generation time can often be enormous. In this paper, we present a parallel formulation of PODEM, a widely used backtrack search algorithm for this problem. It is known that the sequential PODEM algorithm consumes most of its execution time in generating tests for ‘hard-to-detect’ (HTD) faults and is often unable to detect them even after a large number of backtracks. Our parallel formulation overcomes these limitations by dividing the search space and searching it concurrently using multiple processes.

We present a number of experimental results and show that these match our theoretical results presented elsewhere. We show that the search efficiency of the parallel algorithm improves and even beats that of the sequential algorithm as the ‘hardness’ of a fault increases. We present speedup results and performance analyses of our formulation on a 128-processor Symult s2010 multicomputer. We also present preliminary results on a network of Sun workstations. Our results show that parallel search techniques provide good speedups as well as high fault coverage of the HTD faults in reasonable time when compared to the uniprocessor implementation. Our experimental validation of most of our theoretical results builds confidence in the following theoretical prediction: our parallel formulation of PODEM is highly scalable on a variety of commercially available, large MIMD parallel processors (in addition to the ones with which we experimented).


11.
Parallel Computing, 1997, 23(9): 1365-1377
A finite element fluid analysis code, which is based on the matrix-storage free formulation and the element-by-element computation strategy, is developed. The code has reduced memory requirements due to the matrix-storage free formulation. Simulations involving one million elements can be carried out with less than 208 Mbytes of memory. The code is implemented on the massively parallel computers, KSR1 and CRAY T3D. In the case of KSR1, high parallel efficiency is achieved, i.e. 95.9% with 16 CPUs. In the case of T3D, excellent scalability is achieved. Each time step of a 3D cavity flow problem with one million elements required 36.3, 18.7 and 9.8 s of CPU time by using 32, 64 and 128 processors, respectively.
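
The T3D scalability claim in this abstract can be checked with a little arithmetic on the reported per-step times. The sketch below assumes those times can be treated as elapsed time per step and measures speedup relative to the smallest run (32 processors), since no single-processor time is given. The function name `relative_scaling` is hypothetical.

```python
def relative_scaling(timings):
    """Speedup and parallel efficiency relative to the smallest run.

    `timings` maps processor count to time per step in seconds.
    Returns {processors: (speedup, efficiency)}, where efficiency is
    the speedup divided by the increase in processor count.
    """
    base_procs = min(timings)
    base_time = timings[base_procs]
    result = {}
    for procs in sorted(timings):
        speedup = base_time / timings[procs]
        efficiency = speedup / (procs / base_procs)
        result[procs] = (speedup, efficiency)
    return result


# Per-step times quoted in the abstract for the CRAY T3D.
t3d = relative_scaling({32: 36.3, 64: 18.7, 128: 9.8})
for procs, (speedup, efficiency) in t3d.items():
    print(f"{procs:4d} processors: speedup {speedup:.2f}, efficiency {efficiency:.1%}")
```

Under these assumptions, doubling from 32 to 64 processors retains about 97% efficiency and quadrupling to 128 retains about 93%, which is consistent with the abstract's description of the scalability as excellent.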

12.
This paper presents an investigation into the real-time performance of parallel architectures in signal-processing and control applications. Several algorithms of regular and irregular nature are implemented on a number of architectures. Hardware and software resources, and the capabilities of the architectures and characteristics of the algorithms are considered for suitable matching between the algorithms and the architectures. The partitioning and mapping of the algorithms on the architectures and inter-processor communication techniques are investigated. Finally, a comparison of the results of various implementations is made to establish the merits of the design and development of parallel architectures for real-time signal-processing and control applications.

13.
A large-scale structural optimization of an electronics package has been completed using a massively parallel structural dynamics code. The optimization goals were to maximize safety margins for stress and acceleration resulting from transient impulse loads, while remaining within strict mass limits. The optimization process utilized nongradient, gradient, and approximate optimization methods in succession to modify shell thickness and foam density values within the electronics package. This combination of optimization methods was successful in improving the performance from an infeasible design that violated response allowables by a factor of two to a completely feasible design with positive design margins, while remaining within the mass limits. In addition, a tradeoff curve of mass versus safety margin was developed to facilitate the design decision process. These studies employed the ASCI Red supercomputer and used multiple levels of parallelism on up to 2560 processors. In total, a series of calculations were performed on ASCI Red in five days, where an equivalent calculation on a single desktop computer would have taken greater than 12 years to complete. This paper conveys the approaches, results, and lessons learnt from this large-scale production design application.

14.
Implementation of GAMMA on a Massively Parallel Computer (cited 1 time: 0 self-citations, 1 citation by others)
The GAMMA paradigm was recently proposed by Banatre and Metayer to describe the systematic construction of parallel programs without introducing artificial sequentiality. This paper presents two synchronous execution models for GAMMA and discusses how to implement them on the MasPar MP-1, a massively data-parallel computer. The results show that the GAMMA paradigm can be implemented very naturally on data-parallel machines, and that a very high-level language such as GAMMA, in which parallelism is left implicit, is suitable for specifying massively parallel applications.

15.
16.
17.
Scientific applications represent a dominant sector of compute‐intensive applications. Using massively parallel processing systems increases the feasibility to automate such applications because of the cooperation among multiple processors to perform the designated task. This paper proposes a parallel hidden Markov model (HMM) algorithm for 3D magnetic resonance image brain segmentation using two approaches. In the first approach, a hierarchical/multilevel parallel technique is used to achieve high performance for the running algorithm. This approach can speed up the computation process up to 7.8× compared with a serial run. The second approach is orthogonal to the first and tries to help in obtaining a minimum error for 3D magnetic resonance image brain segmentation using multiple processes with different randomization paths for cooperative fast minimum error convergence. This approach achieves minimum error level for HMM training not achievable by the serial HMM training on a single node. Then both approaches are combined to achieve both high accuracy and high performance simultaneously. For 768 processing nodes of a Blue Gene system, the combined approach, which uses both methods cooperatively, can achieve high‐accuracy HMM parameters with 98% of the error level and 2.6× speedup compared with the pure accuracy‐oriented approach alone. Copyright © 2012 John Wiley & Sons, Ltd.

18.
Parallel Computing, 1997, 23(8): 1005-1019
This paper presents a block variant of the GMRES method for solving general unsymmetric linear systems. This algorithm generates a transformed Hessenberg matrix by solely using block matrix operations and block data communications. It is shown that this algorithm with block size s, denoted by BVGMRES(s, m), is theoretically equivalent to the GMRES(s, m) method. The numerical results demonstrate that this algorithm can be more efficient than the standard GMRES method on a cache based single CPU computer with optimized BLAS kernels. Furthermore, the gain in efficiency is more significant on MPPs due to both efficient block operations and efficient block data communications. Preliminary numerical results on some real-world problems also show that this algorithm may be stable up to some reasonable block size.

19.
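Objective: Geometric correction (also called geocoding) is an important step in the synthetic aperture radar (SAR) image processing chain; it has considerable computational complexity and requires a geometric positioning model. For spaceborne SAR images, this paper adopts the rational polynomial coefficient (RPC) positioning model and proposes a GPU-supported, massively parallel processing method for geometric correction. Method: The method exploits both the GPU's large computing resources and the fact that every pixel goes through the same processing steps during geometric correction. A large number of pixels are loaded onto the GPU at a time and one thread is assigned to each pixel; each thread performs the computationally expensive steps, such as rational function evaluation, projection transformation, and interpolation sampling. GPU parallel performance is improved by tuning the dimGrid and dimBlock parameters. The method handles large-format SAR images by processing them in tiles, and it works with a range of tile sizes. Results: Experiments show a computational speedup of 38 44. To analyze the characteristics of GPU parallel processing comprehensively and objectively, the overall speedup was also computed; several experiments analyze the factors that affect overall acceleration, and an optimization that improves I/O performance by reading and writing in large blocks is proposed. Conclusion: The method is simple in form and general: it applies to almost all spaceborne SAR images and to different image sizes, and it delivers a clear speedup.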
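
The one-thread-per-pixel mapping described in this abstract depends on choosing grid and block dimensions that cover each image tile. The sketch below shows only that index arithmetic, in Python rather than GPU code. The 16 x 16 block size, the tile size, and the names `launch_dims` and `pixel_for_thread` are assumptions of this example and are not taken from the paper.

```python
def launch_dims(tile_width, tile_height, block=(16, 16)):
    """Return (grid, block) so that grid * block covers the whole tile.

    Ceiling division rounds the grid up, so the last blocks in each
    direction may contain threads that fall outside the tile.
    """
    grid_x = (tile_width + block[0] - 1) // block[0]
    grid_y = (tile_height + block[1] - 1) // block[1]
    return (grid_x, grid_y), block


def pixel_for_thread(block_idx, thread_idx, block, tile_width, tile_height):
    """Map a (block, thread) index pair to a pixel, or None if out of range.

    This mirrors the bounds check a per-pixel kernel would need before
    doing any work for its pixel.
    """
    x = block_idx[0] * block[0] + thread_idx[0]
    y = block_idx[1] * block[1] + thread_idx[1]
    if x < tile_width and y < tile_height:
        return (x, y)
    return None


# Hypothetical 1000 x 500 pixel tile.
grid, block = launch_dims(1000, 500)
```

Because the tile dimensions are rarely exact multiples of the block size, the bounds check matters: without it, the surplus threads in the last row and column of blocks would read or write outside the tile.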

20.
s and t within a given planar figure F is considered. The approach contains basic methodology developed for any parallel or distributed system. The 2D scene or the edge of F are represented in the n Cartesian coordinate system (n-CCS). Several algorithms for the shortest path are given, each one to be applied in specified circumstances depending on the exact machine model or on additional information concerning geometrical properties of the figure. If these algorithms are implemented in a parallel depth search machine (PDSM), then the shortest path can be computed in time O(1). The maximum number of processors used is O(n). The given methodology can also be adapted for producing an approximate solution when the shortest path is approximated by polygonal lines.
