
1.

Evaluating parallel processors for real-time applications

J. B. G. Roberts J. G. Harp B. C. Merrifield K. J. Palmer P. Simpson J. S. Ward H. C. Webber 《Parallel Computing》1988,8(1-3):245-254

Increasing acceptance of the necessity for high-order parallelism in order to progress digital processing still leaves open the large question of what machine architectures are best for which class of problem.

To help answer this, we are investigating and comparing the use of both SIMD and MIMD architectures for programmable processing in real-time systems. A distributed array machine, Mil-DAP (derived from the original ICL DAP) has been developed and benchmarked on radar, image processing, and on terrain modelling problems. Multi-transputer arrays have been applied to an overlapping set of problems in image processing, FFT and terrain-based computation.

The results are compared and preliminary conclusions drawn.

2.

FAD-flexibility in digital signal processing

Paul Challener 《Microprocessors and Microsystems》1983,7(10):475-481

A digital filter and detection chip (FAD) has been developed by British Telecom in the UK for application as a tone detector in digital telephone exchanges. The chip has a flexible structure and has since found application in many other communication systems. This paper describes the chip and indicates how it can be used to perform a variety of different algorithms for these purposes.

3.

Realtime digital signal processing system using a parallel processing architecture

PC Ching SW Wu 《Microprocessors and Microsystems》1989,13(10):653-658

A low-cost, high-speed, general-purpose digital signal processing system was constructed using the TMS32010 digital signal processor. The system was designed with simplicity, compactness, flexibility and expandability in mind. A parallel processing architecture was adopted to achieve realtime performance. Four processors were used in the prototype system, but this can be expanded easily. Interprocessor data transfer and communications with the host computer are facilitated via a single common bus and a bank of shared memory. A one-dimensional digital FIR filter and a realtime FFT program were used to evaluate the performance of the system. In addition, a realtime spectrogram was implemented as an application example.
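For readers unfamiliar with the benchmark mentioned in this abstract, a one-dimensional FIR filter can be sketched as below. This is only a minimal illustration in Python, not the TMS32010 implementation from the paper; the function name `fir_filter` and the zero-padding of samples before the start of the input are assumptions made for the example.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * samples[n - k].

    Samples before the start of the input are treated as zero (an assumption
    of this sketch, not something stated in the abstract).
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out
```

Feeding a unit impulse through the filter returns the taps themselves, which is a quick sanity check on any FIR implementation.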

4.

Extending the DEVS formalism for massively parallel simulation

Yung-Hsin Wang Bernard P. Zeigler 《Discrete Event Dynamic Systems》1993,3(2-3):193-218

The use of multiprocessors for discrete event simulation is an active research area where work has focused on strategies for model execution with little regard for the underlying formalism in which models may be expressed. However, a formalism-based approach offers several advantages including the ability to migrate models from sequential to parallel platforms and the ability to calibrate simulation architectures to model structural properties. In this article, we extend the DEVS (discrete event system specification) formalism, originally developed for sequential simulation, to accommodate the full potential of parallel processing. The extension facilitates exploitation of both internal and external event parallelism manifested in hierarchical, modular DEVS models. After developing a mapping of the extended formalism to parallel architectures, we describe an implementation of the approach on a massively parallel architecture, the Connection Machine. Execution results are discussed for a class of models exhibiting high external and internal event parallelism, the so-called broadcast models. These verify the tenets of the underlying theory and demonstrate that significant reduction in execution time is possible compared to the same model executed in serial simulation.

5.

An efficient implementation of parallel eigenvalue computation for massively parallel processing (Cited by: 4; self-citations: 0; citations by others: 4)

Takahiro Katagiri Yasumasa Kanada 《Parallel Computing》2001,27(14):1831-1845

This paper describes an efficient implementation and evaluation of a parallel eigensolver for computing all eigenvalues of dense symmetric matrices. Our eigensolver uses a Householder tridiagonalization method, which has higher parallelism and performance than conventional methods when the problem size is relatively small, e.g., on the order of 10,000. This is very important for relevant practical applications, where many diagonalizations of such matrices are required. The routine was evaluated on the 1024-processor HITACHI SR2201, giving speedups of about 2–5 times compared to the ScaLAPACK library on the same 1024 processors.

6.

A massively parallel genetic algorithm for RNA secondary structure prediction (Cited by: 4; self-citations: 0; citations by others: 4)

Bruce A. Shapiro Joseph Navetta 《The Journal of supercomputing》1994,8(3):195-207

We present a new method for predicting RNA secondary structure based on a genetic algorithm. The algorithm is designed to run on a massively parallel SIMD computer. Statistical analysis shows that the program performs well when compared to a dynamic programming algorithm used to solve the same problem. The program has also pointed out a long-standing simplification in the implementation of the original dynamic programming algorithm that sometimes causes it not to find the optimal secondary structure.

7.

TMS320 digital signal processor development system

Jim Chance 《Microprocessors and Microsystems》1985,9(2):50-56

Digital signal processing using special processors seems to have an important future. The TMS320 is one of the best known of such processors. Experiences of building and using TMS320 development tools (a simulator and a ROM emulator) in a university environment are described. Conclusions drawn from this work are presented and possible further avenues of exploration are indicated.

8.

Connected component labelling on the BLITZEN massively parallel processor

Sotirios G Ziavras 《Image and vision computing》1993,11(10):665-668

This paper presents the implementation of two connected component labelling algorithms on the BLITZEN massively parallel processor that was developed recently for NASA. The topology of BLITZEN is a two-dimensional mesh that can be dynamically configured to also support diagonal data transfers. It is shown that an algorithm based on Levialdi's connected component shrinking process performs much better than a straightforward algorithm for connected component labelling.
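To make the task concrete, the sketch below labels connected components by iterative minimum-label propagation on a 4-connected grid, simulated sequentially in Python. It is an assumption about what a simple mesh-style baseline might look like; it is neither the paper's code nor Levialdi's shrinking process, and it ignores the diagonal transfers BLITZEN can support. The name `label_components` is hypothetical.

```python
def label_components(image):
    """Label 4-connected foreground regions of a binary image.

    Every foreground pixel starts with a unique label; in each round all
    pixels simultaneously adopt the smallest label among themselves and
    their foreground neighbours, until nothing changes. Background is 0.
    """
    rows = len(image)
    cols = len(image[0]) if image else 0
    labels = [[r * cols + c + 1 if image[r][c] else 0 for c in range(cols)]
              for r in range(rows)]
    changed = True
    while changed:
        changed = False
        new = [row[:] for row in labels]  # reads use the previous round only
        for r in range(rows):
            for c in range(cols):
                if not labels[r][c]:
                    continue
                best = labels[r][c]
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc]:
                        best = min(best, labels[rr][cc])
                if best != new[r][c]:
                    new[r][c] = best
                    changed = True
        labels = new
    return labels
```

The number of rounds grows with the longest path inside a component, which hints at why a cleverer scheme can do much better on a mesh machine.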

9.

Image processing with VLSI

AG Corry DK Arvind GLS Connolly RR Korya IN Parker 《Microprocessors and Microsystems》1983,7(10):482-486

The development and implementation of systems for the more complex realtime image processing and scene understanding tasks, such as robot vision and remote surveillance, calls for faster computation than that possible using the traditional serial computer. The advent of VLSI has made feasible the consideration of more specialized processing architectures, designed to support these datarates, while keeping systems compact and relatively cheap. Two approaches are discussed: the use of a programmable processor array, and the customizing of image processing algorithms in silicon. This paper examines designs based upon each approach in the light of the techniques and constraints of VLSI. In particular we describe in some detail an example of a VLSI parallel array processor, the Grid (GEC rectangular image and data processor), and a number of special-purpose CMOS/SOS chips based on systolic design techniques.

10.

Automatic test pattern generation on parallel processors

Sunil Arvindam Vipin Kumar V. Nageshwara Rao Vineet Singh 《Parallel Computing》1991,17(12):1323-1342

Test generation for combinational circuits is an important step in the VLSI design process. Unfortunately, the problem is highly computation-intensive and for circuits encountered in practice, test generation time can often be enormous. In this paper, we present a parallel formulation of the backtrack search algorithm called PODEM, which is a highly used algorithm for this problem. It is known that the sequential PODEM algorithm consumes most of its execution time in generating tests for ‘hard-to-detect’ (HTD) faults and is often unable to detect them even after a large number of backtracks. Our parallel formulation overcomes these limitations by dividing the search space and searching it concurrently using multiple processes.

We present a number of experimental results and show that these match our theoretical results presented elsewhere. We show that the search efficiency of the parallel algorithm improves and even beats that of the sequential algorithm as the ‘hardness’ of a fault increases. We present speedup results and performance analyses of our formulation on a 128 processor Symult s2010 multicomputer. We also present preliminary results on a network of Sun workstations. Our results show that parallel search techniques provide good speedups as well as high fault coverage of the HTD faults in reasonable time when compared to the uniprocessor implementation. Our experimental validation of most of our theoretical results builds confidence in the following theoretical prediction: our parallel formulation of PODEM is highly scalable on a variety of commercially-available, large MIMD parallel processors (in addition to the ones with which we experimented).

11.

Homogeneous and heterogeneous parallel architectures in real-time signal processing and control

M. O. Tokhi M. A. Hossain 《Control Engineering Practice》1995,3(12):1675-1686

This paper presents an investigation into the real-time performance of parallel architectures in signal-processing and control applications. Several algorithms of regular and irregular nature are implemented on a number of architectures. Hardware and software resources, and the capabilities of the architectures and characteristics of the algorithms are considered for suitable matching between the algorithms and the architectures. The partitioning and mapping of the algorithms on the architectures and inter-processor communication techniques are investigated. Finally, a comparison of the results of various implementations is made to establish the merits of the design and development of parallel architectures for real-time signal-processing and control applications.

12.

Multilevel parallel optimization using massively parallel structural dynamics

M.S. Eldred A.A. Giunta B.G. van Bloemen Waanders 《Structural and Multidisciplinary Optimization》2004,27(1-2):97-109

A large-scale structural optimization of an electronics package has been completed using a massively parallel structural dynamics code. The optimization goals were to maximize safety margins for stress and acceleration resulting from transient impulse loads, while remaining within strict mass limits. The optimization process utilized nongradient, gradient, and approximate optimization methods in succession to modify shell thickness and foam density values within the electronics package. This combination of optimization methods was successful in improving the performance from an infeasible design that violated response allowables by a factor of two to a completely feasible design with positive design margins, while remaining within the mass limits. In addition, a tradeoff curve of mass versus safety margin was developed to facilitate the design decision process. These studies employed the ASCI Red supercomputer and used multiple levels of parallelism on up to 2560 processors. In total, a series of calculations were performed on ASCI Red in five days, where an equivalent calculation on a single desktop computer would have taken greater than 12 years to complete. This paper conveys the approaches, results, and lessons learnt from this large-scale production design application.

13.

Representing matrices as quadtrees for parallel processors

David S. Wise 《Information Processing Letters》1985,20(4):195-199


14.

Fast massively parallel algorithms for shortest path within planar figures

Adam Kapralski 《The Visual computer》1996,12(10):484-502

s and t within a given planar figure F is considered. The approach contains basic methodology developed for any parallel or distributed system. The 2D scene or the edge of F are represented in the n Cartesian coordinate system (n-CCS). Several algorithms for the shortest path are given, each one to be applied in specified circumstances depending on the exact machine model or on additional information concerning geometrical properties of the figure. If these algorithms are implemented in a parallel depth search machine (PDSM), then the shortest path can be computed in time O(1). The maximum number of processors used is O(n). The given methodology can also be adapted for producing an approximate solution when the shortest path is approximated by polygonal lines.

15.

A massively parallel fault-tolerant architecture for time-critical computing

Ishfaq Ahmad 《The Journal of supercomputing》1995,9(1-2):135-162

Building large-scale parallel computer systems for time-critical applications is a challenging task since the designers of such systems need to consider a number of related factors such as proper support for fault tolerance, efficient task allocation and reallocation strategies, and scalability. In this paper we propose a massively parallel fault-tolerant architecture using hundreds or thousands of processors for critical applications with timing constraints. The proposed architecture is based on an interconnection network called the bisectional network. A bisectional network is isomorphic to a hypercube in that a binary hypercube network can be easily extended as a bisectional network by adding additional links. These additional links add to the network some rich topological properties such as node symmetry, small diameter, small internode distance, and partitionability. The important property of partitioning is exploited to propose a redundant task allocation and a task redistribution strategy under realtime constraints. The system is partitioned into symmetric regions (spheres) such that each sphere has a central control point. The central points, called fault control points (FCPs), are distributed throughout the entire system in an optimal fashion and provide two-level task redundancy and efficiently redistribute the loads of failed nodes. FCPs are assigned to the processing nodes such that each node is assigned two types of FCPs for storing two redundant copies of every task present at the node. Similarly, the number of nodes assigned to each FCP is the same. For a failure-repair system environment the performance of the proposed system has been evaluated and compared with a hypercube-based system. Simulation results indicate that the proposed system can yield improved performance in the presence of a high number of node failures.
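As background for the hypercube baseline mentioned in this abstract, the following Python sketch shows how adjacency works in a plain binary hypercube. It does not model the extra links of the bisectional network, which the abstract does not specify; the function names are hypothetical.

```python
def hypercube_neighbors(node, dimension):
    """Return the nodes adjacent to `node` in a binary hypercube.

    Nodes are labelled 0 .. 2**dimension - 1; two nodes are linked exactly
    when their labels differ in a single bit.
    """
    if not 0 <= node < 2 ** dimension:
        raise ValueError("node label out of range")
    # Flipping each of the `dimension` address bits in turn gives one neighbour.
    return [node ^ (1 << bit) for bit in range(dimension)]


def hamming_distance(a, b):
    """Minimum hop count between two hypercube nodes: the number of differing bits."""
    return bin(a ^ b).count("1")
```

In a plain hypercube of dimension d, the largest hop count is d (between a node and its bitwise complement); any added links can only leave inter-node distances the same or shorter.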

16.

Lower bounds on precedence-constrained scheduling for parallel processors

Ivan D. Baev Waleed M. Meleis Alexandre Eichenberger 《Information Processing Letters》2002,83(1):27-32

We consider two general precedence-constrained scheduling problems that have wide applicability in the areas of parallel processing, high performance compiling, and digital system synthesis. These problems are intractable so it is important to be able to compute tight bounds on their solutions. A tight lower bound on makespan scheduling can be obtained by replacing precedence constraints with release and due dates, giving a problem that can be efficiently solved. We demonstrate that recursively applying this approach yields a bound that is provably tighter than other known bounds, and experimentally shown to achieve the optimal value at least 90.3% of the time over a synthetic benchmark. We compute the best known lower bound on weighted completion time scheduling by applying the recent discovery of a new algorithm for solving a related scheduling problem. Experiments show that this bound significantly outperforms the linear programming-based bound. We have therefore demonstrated that combinatorial algorithms can be a valuable alternative to linear programming for computing tight bounds on large scheduling problems.

17.

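Analysis and circuit design of digital signal processing for Doppler signals using the MAX7000A

Zhang Yanchun (张砚春) Shi Jusheng (施聚生) 《微计算机信息》2000,16(6):68-69

Taking the continuous-wave Doppler radio fuze as its research background, this paper establishes a mathematical model of the target signal. To address the problems of current burst-height control methods, it proposes a new height-measurement method based on the Doppler pulse-width technique and, on that basis, adds a near-surface burst capability. It also describes an implementation using a programmable logic device, together with a computer simulation. Practice shows that implementing the digital processing of the Doppler signal on a single CPLD increases the speed of fuze signal processing and greatly improves system reliability.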

18.

Interconnection network analysis for a compliant massively parallel processor

D. Bryan Perdue Daniel Tabak 《Journal of Systems Architecture》1997,42(9-10):665-678

The paper analyzes and selects an appropriate interconnection network for a compliant multiprocessor. The multiprocessor is compliant to the tasks assigned to it in the sense that it can be reconfigured to provide a more efficient fit to the tasks to be executed. A number of possible candidate networks for the multiprocessor are considered: Omega, ADM, Hypercube and Torus. The potential applicability of these networks to the multiprocessor is analyzed from the points of view of partitionability, inter-PE delay, fault impact, and cost. After the individual analysis of the above points of consideration is completed, a weighted network factor is formed, and the optimal type of network is selected, under different performance criteria. The overall results point to the selection of the Torus or Hypercube network for most cases under consideration.

19.

Towards a single model of efficient computation in real parallel machines

Pilar de la Torre Clyde P Kruskal 《Future Generation Computer Systems》1992,8(4):395-408

We propose a model of parallel computation, the YPRAM, that allows general parallel algorithms to be designed for a wide class of parallel models. The basic model captures locality among processors, which is measured as a function of two parameters: latency and bandwidth.

We design YPRAM algorithms for solving several fundamental problems: parallel prefix, sorting, sorting numbers from a bounded range, and list ranking. We show that our model predicts, reasonably accurately, the actual known performances of several basic parallel models (PRAM, hypercube, mesh and tree) when solving these problems.
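Of the problems listed in this abstract, parallel prefix is the easiest to illustrate. The Python sketch below simulates a textbook PRAM-style inclusive scan (the Hillis-Steele doubling scheme) one round at a time. It is offered only as a generic example of the problem and is not the YPRAM algorithm from the paper; the name `inclusive_scan` is hypothetical.

```python
def inclusive_scan(values):
    """Inclusive prefix sum via doubling, simulated round by round.

    In each round, position i adds the value held `step` positions to its
    left, and `step` doubles. Each round builds a fresh list, so all reads
    see the previous round's data, as synchronous processors would.
    """
    data = list(values)
    step = 1
    while step < len(data):
        data = [data[i] + data[i - step] if i >= step else data[i]
                for i in range(len(data))]
        step *= 2
    return data
```

With n inputs the loop runs about log2(n) rounds, which is the source of the logarithmic parallel time on an idealized machine with one processor per element.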

20.

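A general-purpose FPGA-based real-time radar echo simulator (Cited by: 1; self-citations: 0; citations by others: 1)

Meng Qinghu (孟庆虎) Tao Qingchang (陶青长) Liang Zhiheng (梁志恒) Zhu Ning (朱宁) 《电子技术应用》2012,38(3):82-84,87

This paper presents a method for implementing an FPGA-based real-time radar echo simulator. The simulator uses the standard cPCI bus, takes an FPGA as its core computing unit, and is equipped with high-speed digital-to-analog and analog-to-digital conversion modules, enabling real-time online injection of simulated radar echo signals. It can simulate complex echoes for a variety of radar schemes and has good engineering application value.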