Found 20 similar documents (search time: 15 ms)
1.
Increasing acceptance of the necessity for high-order parallelism in order to progress digital processing still leaves open the large question of what machine architectures are best for which class of problem. To help answer this, we are investigating and comparing the use of both SIMD and MIMD architectures for programmable processing in real-time systems. A distributed array machine, Mil-DAP (derived from the original ICL DAP), has been developed and benchmarked on radar, image processing, and terrain modelling problems. Multi-transputer arrays have been applied to an overlapping set of problems in image processing, FFT and terrain-based computation. The results are compared and preliminary conclusions drawn.
2.
A digital filter and detection chip (FAD) has been developed by British Telecom in the UK for application as a tone detector in digital telephone exchanges. The chip has a flexible structure and has since found application in many other communication systems. This paper describes the chip and indicates how it can be used to perform a variety of different algorithms for these purposes.
3.
A low-cost, high-speed, general-purpose digital signal processing system was constructed using the TMS32010 digital signal processor. The system was designed with simplicity, compactness, flexibility and expandability in mind. A parallel processing architecture was adopted to achieve real-time performance. Four processors were used in the prototype system, but this can be expanded easily. Interprocessor data transfer and communications with the host computer are facilitated via a single common bus and a bank of shared memory. A one-dimensional digital FIR filter and a real-time FFT program were used to evaluate the performance of the system. In addition, a real-time spectrogram was implemented as an application example.
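The one-dimensional FIR filter used as a benchmark in the abstract above can be illustrated with a minimal direct-form sketch. This is not the TMS32010 implementation; the function name `fir_filter` and the example coefficients are assumptions chosen for illustration only.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum_k coeffs[k] * x[n - k].

    Samples before the start of the input (n - k < 0) are treated as zero.
    Hypothetical helper for illustration; not taken from the paper.
    """
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]
        output.append(acc)
    return output


# Two-tap moving average applied to a short test signal.
smoothed = fir_filter([4.0, 8.0, 4.0, 0.0], [0.5, 0.5])
print(smoothed)  # [2.0, 6.0, 6.0, 2.0]
```

Each output sample depends only on a fixed window of inputs, so a long signal can be split into blocks (with a small overlap of `len(coeffs) - 1` samples) and the blocks filtered on separate processors, which is one plausible way such a benchmark maps onto a multiprocessor system.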
4.
The use of multiprocessors for discrete event simulation is an active research area where work has focused on strategies for
model execution with little regard for the underlying formalism in which models may be expressed. However, a formalism-based
approach offers several advantages including the ability to migrate models from sequential to parallel platforms and the ability
to calibrate simulation architectures to model structural properties. In this article, we extend the DEVS (discrete event
system specification) formalism, originally developed for sequential simulation, to accommodate the full potential of parallel
processing. The extension facilitates exploitation of both internal and external event parallelism manifested in hierarchical,
modular DEVS models. After developing a mapping of the extended formalism to parallel architectures, we describe an implementation
of the approach on a massively parallel architecture, the Connection Machine. Execution results are discussed for a class
of models exhibiting high external and internal event parallelism, the so-called broadcast models. These verify the tenets
of the underlying theory and demonstrate that significant reduction in execution time is possible compared to the same model
executed in serial simulation.
5.
This paper describes an efficient implementation and evaluation of a parallel eigensolver for computing all eigenvalues of dense symmetric matrices. Our eigensolver uses a Householder tridiagonalization method, which has higher parallelism and performance than conventional methods when the problem size is relatively small, e.g., on the order of 10,000. This is very important for practical applications in which many diagonalizations of such matrices are required. The routine was evaluated on a 1024-processor HITACHI SR2201 and gave speedups of about 2–5 times over the ScaLAPACK library on the same 1024 processors.
6.
We present a new method for predicting RNA secondary structure based on a genetic algorithm. The algorithm is designed to run on a massively parallel SIMD computer. Statistical analysis shows that the program performs well when compared to a dynamic programming algorithm used to solve the same problem. The program has also pointed out a long-standing simplification in the implementation of the original dynamic programming algorithm that sometimes causes it not to find the optimal secondary structure.
7.
Digital signal processing using special processors seems to have an important future. The TMS320 is one of the best known of such processors. Experiences of building and using TMS320 development tools (a simulator and a ROM emulator) in a university environment are described. Conclusions drawn from this work are presented and possible further avenues of exploration are indicated.
8.
This paper presents the implementation of two connected component labelling algorithms on the BLITZEN massively parallel processor that was developed recently for NASA. The topology of BLITZEN is a two-dimensional mesh that can be dynamically configured to also support diagonal data transfers. It is shown that an algorithm based on Levialdi's connected component shrinking process performs much better than a straightforward algorithm for connected component labelling.
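The abstract above does not say what its "straightforward" labelling algorithm is. One common baseline on a mesh, assumed here purely for illustration, is synchronous minimum-label propagation: every foreground pixel starts with a unique label and repeatedly adopts the smallest label among itself and its four mesh neighbours. The sketch below simulates those lock-step updates sequentially; `label_components` is a hypothetical name, and this is neither the BLITZEN code nor Levialdi's shrinking algorithm.

```python
def label_components(image):
    """Label 4-connected foreground regions by min-label propagation.

    `image` is a list of rows of 0/1 values. Each sweep mimics one
    lock-step update of a SIMD mesh: all pixels read their neighbours'
    old labels, then all labels are replaced at once.
    """
    rows, cols = len(image), len(image[0])
    # Unique non-zero starting label per foreground pixel; 0 is background.
    labels = [[r * cols + c + 1 if image[r][c] else 0 for c in range(cols)]
              for r in range(rows)]
    changed = True
    while changed:
        changed = False
        updated = [row[:] for row in labels]
        for r in range(rows):
            for c in range(cols):
                if labels[r][c] == 0:
                    continue
                best = labels[r][c]
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc]:
                        best = min(best, labels[nr][nc])
                if best != labels[r][c]:
                    updated[r][c] = best
                    changed = True
        labels = updated
    return labels


grid = [[1, 1, 0],
        [0, 0, 0],
        [0, 1, 1]]
print(label_components(grid))  # [[1, 1, 0], [0, 0, 0], [0, 8, 8]]
```

The number of sweeps grows with the longest path inside a component, so long or winding regions converge slowly under this scheme.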
9.
The development and implementation of systems for the more complex real-time image processing and scene understanding tasks, such as robot vision and remote surveillance, calls for faster computation than is possible using the traditional serial computer. The advent of VLSI has made feasible the consideration of more specialized processing architectures, designed to support these data rates, while keeping systems compact and relatively cheap. Two approaches are discussed: the use of a programmable processor array, and the customizing of image processing algorithms in silicon. This paper examines designs based upon each approach in the light of the techniques and constraints of VLSI. In particular we describe in some detail an example of a VLSI parallel array processor, the Grid (GEC rectangular image and data processor), and a number of special-purpose CMOS/SOS chips based on systolic design techniques.
10.
Test generation for combinational circuits is an important step in the VLSI design process. Unfortunately, the problem is highly computation-intensive, and for circuits encountered in practice, test generation time can often be enormous. In this paper, we present a parallel formulation of the backtrack search algorithm called PODEM, which is a widely used algorithm for this problem. It is known that the sequential PODEM algorithm consumes most of its execution time in generating tests for ‘hard-to-detect’ (HTD) faults and is often unable to detect them even after a large number of backtracks. Our parallel formulation overcomes these limitations by dividing the search space and searching it concurrently using multiple processes. We present a number of experimental results and show that these match our theoretical results presented elsewhere. We show that the search efficiency of the parallel algorithm improves and even beats that of the sequential algorithm as the ‘hardness’ of a fault increases. We present speedup results and performance analyses of our formulation on a 128-processor Symult s2010 multicomputer. We also present preliminary results on a network of Sun workstations. Our results show that parallel search techniques provide good speedups as well as high fault coverage of the HTD faults in reasonable time when compared to the uniprocessor implementation. Our experimental validation of most of our theoretical results builds confidence in the following theoretical prediction: our parallel formulation of PODEM is highly scalable on a variety of commercially available, large MIMD parallel processors (in addition to the ones with which we experimented).
11.
This paper presents an investigation into the real-time performance of parallel architectures in signal-processing and control applications. Several algorithms of regular and irregular nature are implemented on a number of architectures. Hardware and software resources, and the capabilities of the architectures and characteristics of the algorithms are considered for suitable matching between the algorithms and the architectures. The partitioning and mapping of the algorithms on the architectures and inter-processor communication techniques are investigated. Finally, a comparison of the results of various implementations is made to establish the merits of the design and development of parallel architectures for real-time signal-processing and control applications.
12.
A large-scale structural optimization of an electronics package has been completed using a massively parallel structural dynamics code. The optimization goals were to maximize safety margins for stress and acceleration resulting from transient impulse loads, while remaining within strict mass limits. The optimization process utilized nongradient, gradient, and approximate optimization methods in succession to modify shell thickness and foam density values within the electronics package. This combination of optimization methods was successful in improving the performance from an infeasible design that violated response allowables by a factor of two to a completely feasible design with positive design margins, while remaining within the mass limits. In addition, a tradeoff curve of mass versus safety margin was developed to facilitate the design decision process. These studies employed the ASCI Red supercomputer and used multiple levels of parallelism on up to 2560 processors. In total, a series of calculations were performed on ASCI Red in five days, where an equivalent calculation on a single desktop computer would have taken greater than 12 years to complete. This paper conveys the approaches, results, and lessons learnt from this large-scale production design application.
14.
s and t within a given planar figure F is considered. The approach contains basic methodology developed for any parallel or distributed system. The 2D scene or
the edge of F are represented in the n Cartesian coordinate system (n-CCS). Several algorithms for the shortest path are given, each one to be applied in specified circumstances depending on
the exact machine model or on additional information concerning geometrical properties of the figure. If these algorithms
are implemented in a parallel depth search machine (PDSM), then the shortest path can be computed in time O(1). The maximum
number of processors used is O(n). The given methodology can also be adapted for producing an approximate solution when the shortest path is approximated
by polygonal lines.
15.
Building large-scale parallel computer systems for time-critical applications is a challenging task since the designers of such systems need to consider a number of related factors such as proper support for fault tolerance, efficient task allocation and reallocation strategies, and scalability. In this paper we propose a massively parallel fault-tolerant architecture using hundreds or thousands of processors for critical applications with timing constraints. The proposed architecture is based on an interconnection network called the bisectional network. A bisectional network is isomorphic to a hypercube in that a binary hypercube network can be easily extended as a bisectional network by adding additional links. These additional links add to the network some rich topological properties such as node symmetry, small diameter, small internode distance, and partitionability. The important property of partitioning is exploited to propose a redundant task allocation and a task redistribution strategy under realtime constraints. The system is partitioned into symmetric regions (spheres) such that each sphere has a central control point. The central points, called fault control points (FCPs), are distributed throughout the entire system in an optimal fashion and provide two-level task redundancy and efficiently redistribute the loads of failed nodes. FCPs are assigned to the processing nodes such that each node is assigned two types of FCPs for storing two redundant copies of every task present at the node. Similarly, the number of nodes assigned to each FCP is the same. For a failure-repair system environment the performance of the proposed system has been evaluated and compared with a hypercube-based system. Simulation results indicate that the proposed system can yield improved performance in the presence of a high number of node failures.
16.
We consider two general precedence-constrained scheduling problems that have wide applicability in the areas of parallel processing, high performance compiling, and digital system synthesis. These problems are intractable, so it is important to be able to compute tight bounds on their solutions. A tight lower bound on makespan scheduling can be obtained by replacing precedence constraints with release and due dates, giving a problem that can be efficiently solved. We demonstrate that recursively applying this approach yields a bound that is provably tighter than other known bounds, and is experimentally shown to achieve the optimal value at least 90.3% of the time over a synthetic benchmark. We compute the best known lower bound on weighted completion time scheduling by applying the recent discovery of a new algorithm for solving a related scheduling problem. Experiments show that this bound significantly outperforms the linear programming-based bound. We have therefore demonstrated that combinatorial algorithms can be a valuable alternative to linear programming for computing tight bounds on large scheduling problems.
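The idea of turning precedence constraints into per-task time windows can be illustrated with a much simpler bound than the recursive one the abstract describes. The sketch below, under assumed names and integer durations, derives a release date (longest chain of predecessors) and a tail (longest chain of successors) for each task, then takes the larger of the critical-path bound and the total-work bound. It does not solve the relaxed release/due-date problem and is not the authors' method.

```python
import math
from graphlib import TopologicalSorter  # Python 3.9+


def makespan_lower_bound(durations, predecessors, machines):
    """Simple lower bound on makespan for precedence-constrained tasks.

    durations:    dict task -> integer processing time (assumed integral)
    predecessors: dict task -> iterable of tasks that must finish first
    machines:     number of identical processors
    All names are hypothetical and chosen for this illustration.
    """
    graph = {t: tuple(predecessors.get(t, ())) for t in durations}
    order = list(TopologicalSorter(graph).static_order())

    # Release date: earliest time a task could start given its predecessors.
    release = {t: 0 for t in durations}
    for t in order:
        for p in graph[t]:
            release[t] = max(release[t], release[p] + durations[p])

    # Tail: minimum work that must still follow after the task completes.
    tail = {t: 0 for t in durations}
    for t in reversed(order):
        for p in graph[t]:
            tail[p] = max(tail[p], durations[t] + tail[t])

    critical_path = max(release[t] + durations[t] + tail[t] for t in durations)
    work_bound = math.ceil(sum(durations.values()) / machines)
    return max(critical_path, work_bound)


tasks = {"a": 3, "b": 2, "c": 4, "d": 1}
deps = {"c": ["a"], "d": ["b", "c"]}
print(makespan_lower_bound(tasks, deps, machines=2))  # 8
```

With two machines the chain a, c, d (length 8) dominates; with one machine the total work (10) dominates instead.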
17.
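Taking the continuous-wave Doppler radio fuze as the research background, this paper establishes a mathematical model of the target signal. To address current burst-height control methods and their problems, it proposes a new height-measurement method based on Doppler pulse-width techniques and, on this basis, adds a near-surface burst capability. An implementation using programmable logic devices and computer simulations are also presented. Practice shows that implementing the digital processing of the Doppler signal on a single CPLD increases the speed of fuze signal processing and greatly improves system reliability.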
18.
The paper analyzes and selects an appropriate interconnection network for a compliant multiprocessor. The multiprocessor is compliant to the tasks assigned to it in the sense that it can be reconfigured to provide a more efficient fit to the tasks to be executed. A number of candidate networks for the multiprocessor are considered: Omega, ADM, Hypercube and Torus. The potential applicability of these networks to the multiprocessor is analyzed from the points of view of partitionability, inter-PE delay, fault impact, and cost. After the individual analysis of the above points of consideration is completed, a weighted network factor is formed, and the optimal type of network is selected under different performance criteria. The overall results point to the selection of the Torus or Hypercube network for most cases under consideration.
19.
We propose a model of parallel computation, the YPRAM, that allows general parallel algorithms to be designed for a wide class of parallel models. The basic model captures locality among processors, which is measured as a function of two parameters: latency and bandwidth. We design YPRAM algorithms for solving several fundamental problems: parallel prefix, sorting, sorting numbers from a bounded range, and list ranking. We show that our model predicts, reasonably accurately, the actual known performances of several basic parallel models (PRAM, hypercube, mesh and tree) when solving these problems.
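Parallel prefix, the first problem listed in the abstract above, can be illustrated with the classic doubling scan often described for the PRAM. The sketch below simulates each parallel round sequentially; it is a textbook-style illustration with an assumed function name, not the YPRAM algorithm from the paper, and it ignores the latency and bandwidth parameters that the YPRAM model accounts for.

```python
import operator


def parallel_prefix(values, op=operator.add):
    """Inclusive prefix computation by repeated doubling.

    In round `step`, every position i >= step combines its value with the
    value at i - step. On a PRAM all positions update at once, so the whole
    scan takes about log2(n) rounds; here each round is one list rebuild.
    """
    result = list(values)
    step = 1
    while step < len(result):
        result = [
            result[i] if i < step else op(result[i - step], result[i])
            for i in range(len(result))
        ]
        step *= 2
    return result


print(parallel_prefix([1, 2, 3, 4, 5]))  # [1, 3, 6, 10, 15]
```

Any associative operator can be passed as `op`, for example `max` to compute running maxima.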
20.
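An implementation method for an FPGA-based real-time radar echo simulator is proposed. The simulator uses the standard cPCI bus, takes an FPGA as its core computing unit, and is equipped with high-speed digital-to-analog and analog-to-digital conversion modules, enabling real-time online injection simulation of radar echo signals. It can simulate complex echoes for multiple radar system types and has good engineering application value.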