首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
2.
Summary We prove that (1+6/2)n 2.22n is a time lower bound independent of indexing schemes for sorting n2 items on an n × n mesh-connected processor array. We distinguish between indexing schemes by showing that there exists an indexing scheme which is provably worse than the snake-like row-major indexing for sorting. We also derive lower bounds for various indexing schemes. All these results are obtained by using the chain argument which we provide in this paper.A preliminary version of this paper was presented on the 3rd International Workshop on Parallel Computation and VLSI Theory, Corfu Island, Greece (June–July, 1988). Lect. Notes Comput. Sci. 319, 434–443 (1988)Research supported in part by the University of Kentucky research initiation award  相似文献   

3.
Many reconfiguration schemes for fault-tolerant binary tree architectures have been proposed in the literature^[1-6].The VLSI layouts of most previous studies are based on the classical H-tree layout,resulting in low area utilization and likely an unnecessarily high maufacturing cost simply due to the waste of a significant portion of silicon area. In this paper,we present an area-efficient approach to the reconfigurable binary tree architecture.Area utilization and interconnection complexity of our design compare favorably with the other known approaches.In the reliability analysis,we take into account the fact that accepted chips(after fabrication)are with different degrees of redundancy initially,so as to obtain results which better reflect real situations.  相似文献   

4.
A compact parallel image processing system concept has been developed. The main features of this system is the use of off-axis paraboloidal mirror segments as collimating, Fourier transforming and image reconstructing elements, and the use of a GaAs laser diode as the coherent radiation source. Preliminary experiments to demonstrate the usefulness of this system have been performed.  相似文献   

5.
6.
A new technique is described for the registration of edge-detected images. While an extensive literature exists on the problem of image registration, few of the current approaches include a well-defined measure of the statistical confidence associated with the solution. Such a measure is essential for many autonomous applications, where registration solutions that are dubious (involving poorly focused images or terrain that is obscured by clouds) must be distinguished from those that are reliable (based on clear images of highly structured scenes). The technique developed herein utilizes straightforward edge pixel matching to determine the "best" among a class of candidate translations. A well-established statistical procedure, the McNemar test, is then applied to identify which other candidate solutions are not significantly worse than the best solution. This allows for the construction of confidence regions in the space of the registration parameters. The approach is validated through a simulation study and examples are provided of its application in numerous challenging scenarios. While the algorithm is limited to solving for two-dimensional translations, its use in validating solutions to higher-order (rigid body, affine) transformation problems is demonstrated  相似文献   

7.
An algorithm for connected component labeling of binary patterns using SIMD mesh connected computers is presented. The algorithm consists of three major steps: identifying exactly one point (seed point) within each connected component (region), assigning a unique label to each seed point, and expanding the labels to fill all pixels in the respective regions. Two approaches are given for identifying seed points. The first approach is based on shrinking and the second on the iterative replacement of equivalent labels with local minima or maxima. The shrinking algorithm reduces simply connected regions into single pixels, but multiply connected regions form rings around the holes contained in the regions. A parallel algorithm is developed to break each such ring at a single point. The broken rings are then reduced to single pixels by reshrinking. With iterations consisting of shrinking, breaking rings, if any, and reshrinking, each pattern (of any complexity) is reduced to isolated points within itself. In the second approach every region pixel in the image is initially given a unique label equal to its address in the image. Every 3 × 3 neighborhood in the image is then examined in parallel to replace the central label with the maximum (or minimum) of the labels assigned to the set of region pixels in the neighborhood. This is done iteratively until there is no further change. The seed points are then the locations where the pixel addresses match their converged labels. A parallel sorting method is used for assigning a consecutive set of numbers as labels to the seed points. Parallel expansion up to the boundaries of the original patterns then completes the connected component labeling. The computational complexities of the algorithm are discussed.  相似文献   

8.
The PIFL (Parallel Irregular Free-Lagrange) code solves two-dimensional hydrodynamics with the mesh vertices moving with the fluid, with no rezoning. The irregular mesh is made of triangles and each processor deals with one or more connected domains of fluid. After each time step the mesh is topologically restructured, mesh points may be created or destroyed, and there is a local load-balance. Every few steps there is a global load balance. The code runs on a hypercube under Cubix and is designed to run most efficiently in the limit of a large number of large-memory processors.  相似文献   

9.
We present efficient parallel matrix multiplication algorithms for linear arrays with reconfigurable pipelined bus systems (LARPBS). Such systems are able to support a large volume of parallel communication of various patterns in constant time. An LARPBS can also be reconfigured into many independent subsystems and, thus, is able to support parallel implementations of divide-and-conquer computations like Strassen's algorithm. The main contributions of the paper are as follows. We develop five matrix multiplication algorithms with varying degrees of parallelism on the LARPBS computing model; namely, MM1, MM 2, MM3, and compound algorithms C1(ϵ)and C2(δ). Algorithm C1(ϵ) has adjustable time complexity in sublinear level. Algorithm C2(δ) implies that it is feasible to achieve sublogarithmic time using σ(N3) processors for matrix multiplication on a realistic system. Algorithms MM3, C1(ϵ), and C2(δ) all have o(𝒩3) cost and, hence, are very processor efficient. Algorithms MM1, MM3, and C1(ϵ) are general-purpose matrix multiplication algorithms, where the array elements are in any ring. Algorithms MM2 and C2(δ) are applicable to array elements that are integers of bounded magnitude, or floating-point values of bounded precision and magnitude, or Boolean values. Extension of algorithms MM 2 and C2(δ) to unbounded integers and reals are also discussed  相似文献   

10.
Optical interconnections attract many engineers and scientists’ attention due to their potential for gigahertz transfer rates and concurrent access to the bus in a pipelined fashion. These unique characteristics of optical interconnections give us the opportunity to reconsider traditional algorithms designed for ideal parallel computing models, such as PRAMs. Since the PRAM model is far from practice, not all algorithms designed on this model can be implemented on a realistic parallel computing system. From this point of view, we study Cole’s pipelined merge sort [Cole R. Parallel merge sort. SIAM J Comput 1988;14:770–85] on the CREW PRAM and extend it in an innovative way to an optical interconnection model, the LARPBS (Linear Array with Reconfigurable Pipelined Bus System) model [Pan Y, Li K. Linear array with a reconfigurable pipelined bus system—concepts and applications. J Inform Sci 1998;106;237–58]. Although Cole’s algorithm is optimal, communication details have not been provided due to the fact that it is designed for a PRAM. We close this gap in our sorting algorithm on the LARPBS model and obtain an O(log N)-time optimal sorting algorithm using O(N) processors. This is a substantial improvement over the previous best sorting algorithm on the LARPBS model that runs in O(log N log log N) worst-case time using N processors [Datta A, Soundaralakshmi S, Owens R. Fast sorting algorithms on a linear array with a reconfigurable pipelined bus system. IEEE Trans Parallel Distribut Syst 2002;13(3):212–22]. Our solution allows efficiently assign and reuse processors. We also discover two new properties of Cole’s sorting algorithm that are presented as lemmas in this paper.  相似文献   

11.
The problems in computer vision range from edge detection and segmentation at the lowest level to the problem of cognition at the highest level. This correspondence describes the organization and operation of a semantic network array processor (SNAP) as applicable to high level computer vision problems. The architecture consists of an array of identical cells each containing a content addressable memory, microprogram control, and a communication unit. The applications discussed in this correspondence are the two general techniques, discrete relaxation and dynamic programming. While the discrete relaxation is discussed with reference to scene labeling and edge interpretation, the dynamic programming is tuned for stereo.  相似文献   

12.
Fast neural net simulation with a DSP processor array   总被引:1,自引:0,他引:1  
This paper describes the implementation of a fast neural net simulator on a novel parallel distributed-memory computer. A 60-processor system, named MUSIC (multiprocessor system with intelligent communication), is operational and runs the backpropagation algorithm at a speed of 330 million connection updates per second (continuous weight update) using 32-b floating-point precision. This is equal to 1.4 Gflops sustained performance. The complete system with 3.8 Gflops peak performance consumes less than 800 W of electrical power and fits into a 19-in rack. While reaching the speed of modern supercomputers, MUSIC still can be used as a personal desktop computer at a researcher's own disposal. In neural net simulation, this gives a computing performance to a single user which was unthinkable before. The system's real-time interfaces make it especially useful for embedded applications.  相似文献   

13.
Comparison of thinning algorithms on a parallel processor   总被引:1,自引:0,他引:1  
A number of parallel algorithms for thinning elongated shapes are contrasted and compared on a Clip 4 parallel processor. These algorithms all work in rectangular tessellation and they thin elongated objects to lines one pixel thick while retaining their connectivity. Existing algorithms for use on binary pictures are considered first and new algorithms are proposed which produce more satisfactory results, but are more expensive in terms of speed and space requirements. Two methods of extending these algorithms to grey pictures are then considered. In one method, binary algorithms are used but are directed by the grey pixels; in the other the binary algorithms are generalized to the grey case. Both methods result in arcs which are not wholly determined by the original boundary of the object but lie along darker ridges. The former is faster and produces results which are easier to interpret, but the results from the latter contain more information.  相似文献   

14.
A new type of high performance array processor system is presented in this paper.Unlikethe conventional host-peripheral array processor systems,this system is designed with afunctionally distributed approach.The design philosophy is described first.Then the hardwareorganizations of two concrete systems,namely:150-AP and GF-10/12,including thecommunication between processors are shown.Some attractive system performances for usersprograms are also given.  相似文献   

15.
In parallel adaptive mesh refinement (AMR) computations the problem size can vary significantly during a simulation. The goal here is to explore the performance implications of dynamically varying the number of processors proportional to the problem size during simulation. An emulator has been developed to assess the effects of this approach on parallel communication, parallel runtime and resource consumption. The computation and communication models used in the emulator are described in detail. Results using the emulator with different AMR strategies are described for a test case. Results show for the test case, varying the number of processors, on average, reduces the total parallel communications overhead from 16 to 19% and improves parallel runtime time from 4 to 8%. These results also show that on average resource utilization improves more than 37%.  相似文献   

16.
This paper outlines the early results of research into developing algorithms to permit the counting and tracking of moving vehicles in real world scenes using the CLIP4 parallel image processor.  相似文献   

17.
A new architecture is presented to support the general class of real-time large-vocabulary speaker-independent continuous speech recognizers incorporating language models. Many such recognizers require multiple high-performance central processing units (CPU's) as well as high interprocessor communication bandwidth. This array processor provides a peak CPU performance of 2.56 giga-floating point operations per second (GFLOPS) as well as a high-speed communication network. In order to efficiently utilize these resources, algorithms were devised for partitioning speech models for mapping into the array processor. Also, a novel scheme is presented for a functional partitioning of the speech recognizer computations. The recognizer is functionally partitioned into six stages, namely, the linear predictive coding (LPC) based feature extractor, mixture probability computer, (phone) state probability computer, word probability computer, phrase probability computer, and traceback computer. Each of these stages is further subdivided as many times as necessary to fit the individual processing elements (PE's). The functional stages are pipelined and synchronized with the frame rate of the incoming speech signal. This partitioning also allows a multistage stack decoder to be implemented for reduction of computation  相似文献   

18.
19.
多态并行处理器中的线程管理器设计   总被引:2,自引:2,他引:2  
基于多态并行处理器提出了一种硬件线程管理器,支持MIMD模式8个线程管理操作和SIMD模式SC控制器统一管理两种工作模式,实现了线程级并行计算;可以监测各个线程的工作情况以及近邻通信寄存器和路由器的状态;能够在通信时停止、切换、启动线程,记录每个线程的工作状态,同时避免了因数据阻塞带来的等待问题,能够最大程度地提高单个处理器的执行效率。  相似文献   

20.
一种高速并行FFT处理器的VLSI结构设计   总被引:8,自引:1,他引:8  
在OFDM系统的实现中,高速FFT处理器是关键。在分析了基4按时域抽取快速傅立叶变换(FFT)算法特点的基础上,研究了一种高性能FFT处理器的硬件结构。此结构能同时从四个并行存储器中读取蝶形运算所需的4个操作数,极大地提高了处理速度。此结构控制单元简单,便于模块化设计。经硬件验证,达到设计要求。在系统时钟为100MHz时,1024点18位复数FFT的计算时间为13滋s。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号