共查询到20条相似文献,搜索用时 15 毫秒
1.
2.
Summary We prove that (1+6/2)n 2.22n is a time lower bound independent of indexing schemes for sorting n2 items on an n × n mesh-connected processor array. We distinguish between indexing schemes by showing that there exists an indexing scheme which is provably worse than the snake-like row-major indexing for sorting. We also derive lower bounds for various indexing schemes. All these results are obtained by using the chain argument which we provide in this paper.A preliminary version of this paper was presented on the 3rd International Workshop on Parallel Computation and VLSI Theory, Corfu Island, Greece (June–July, 1988). Lect. Notes Comput. Sci. 319, 434–443 (1988)Research supported in part by the University of Kentucky research initiation award 相似文献
3.
Chung-Han CHEN 《计算机科学技术学报》1996,11(3):326-336
Many reconfiguration schemes for fault-tolerant binary tree architectures have been proposed in the literature^[1-6].The VLSI layouts of most previous studies are based on the classical H-tree layout,resulting in low area utilization and likely an unnecessarily high maufacturing cost simply due to the waste of a significant portion of silicon area. In this paper,we present an area-efficient approach to the reconfigurable binary tree architecture.Area utilization and interconnection complexity of our design compare favorably with the other known approaches.In the reliability analysis,we take into account the fact that accepted chips(after fabrication)are with different degrees of redundancy initially,so as to obtain results which better reflect real situations. 相似文献
4.
Akram S. Husain-Abidi 《Pattern recognition》1973,5(1):3-11
A compact parallel image processing system concept has been developed. The main features of this system is the use of off-axis paraboloidal mirror segments as collimating, Fourier transforming and image reconstructing elements, and the use of a GaAs laser diode as the coherent radiation source. Preliminary experiments to demonstrate the usefulness of this system have been performed. 相似文献
5.
6.
Simonson KM Drescher SM Tanner FR 《IEEE transactions on pattern analysis and machine intelligence》2007,29(1):112-125
A new technique is described for the registration of edge-detected images. While an extensive literature exists on the problem of image registration, few of the current approaches include a well-defined measure of the statistical confidence associated with the solution. Such a measure is essential for many autonomous applications, where registration solutions that are dubious (involving poorly focused images or terrain that is obscured by clouds) must be distinguished from those that are reliable (based on clear images of highly structured scenes). The technique developed herein utilizes straightforward edge pixel matching to determine the "best" among a class of candidate translations. A well-established statistical procedure, the McNemar test, is then applied to identify which other candidate solutions are not significantly worse than the best solution. This allows for the construction of confidence regions in the space of the registration parameters. The approach is validated through a simulation study and examples are provided of its application in numerous challenging scenarios. While the algorithm is limited to solving for two-dimensional translations, its use in validating solutions to higher-order (rigid body, affine) transformation problems is demonstrated 相似文献
7.
《Computer Vision, Graphics, and Image Processing》1989,45(2):133-149
An algorithm for connected component labeling of binary patterns using SIMD mesh connected computers is presented. The algorithm consists of three major steps: identifying exactly one point (seed point) within each connected component (region), assigning a unique label to each seed point, and expanding the labels to fill all pixels in the respective regions. Two approaches are given for identifying seed points. The first approach is based on shrinking and the second on the iterative replacement of equivalent labels with local minima or maxima. The shrinking algorithm reduces simply connected regions into single pixels, but multiply connected regions form rings around the holes contained in the regions. A parallel algorithm is developed to break each such ring at a single point. The broken rings are then reduced to single pixels by reshrinking. With iterations consisting of shrinking, breaking rings, if any, and reshrinking, each pattern (of any complexity) is reduced to isolated points within itself. In the second approach every region pixel in the image is initially given a unique label equal to its address in the image. Every 3 × 3 neighborhood in the image is then examined in parallel to replace the central label with the maximum (or minimum) of the labels assigned to the set of region pixels in the neighborhood. This is done iteratively until there is no further change. The seed points are then the locations where the pixel addresses match their converged labels. A parallel sorting method is used for assigning a consecutive set of numbers as labels to the seed points. Parallel expansion up to the boundaries of the original patterns then completes the connected component labeling. The computational complexities of the algorithm are discussed. 相似文献
8.
Roy Williams 《Parallel Computing》1988,7(3):439-443
The PIFL (Parallel Irregular Free-Lagrange) code solves two-dimensional hydrodynamics with the mesh vertices moving with the fluid, with no rezoning. The irregular mesh is made of triangles and each processor deals with one or more connected domains of fluid. After each time step the mesh is topologically restructured, mesh points may be created or destroyed, and there is a local load-balance. Every few steps there is a global load balance. The code runs on a hypercube under Cubix and is designed to run most efficiently in the limit of a large number of large-memory processors. 相似文献
9.
Keqin Li Yi Pan Si Qing Zheng 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(8):705-720
We present efficient parallel matrix multiplication algorithms for linear arrays with reconfigurable pipelined bus systems (LARPBS). Such systems are able to support a large volume of parallel communication of various patterns in constant time. An LARPBS can also be reconfigured into many independent subsystems and, thus, is able to support parallel implementations of divide-and-conquer computations like Strassen's algorithm. The main contributions of the paper are as follows. We develop five matrix multiplication algorithms with varying degrees of parallelism on the LARPBS computing model; namely, MM1, MM 2, MM3, and compound algorithms C1(ϵ)and C2(δ). Algorithm C1(ϵ) has adjustable time complexity in sublinear level. Algorithm C2(δ) implies that it is feasible to achieve sublogarithmic time using σ(N3) processors for matrix multiplication on a realistic system. Algorithms MM3, C1(ϵ), and C2(δ) all have o(𝒩3) cost and, hence, are very processor efficient. Algorithms MM1, MM3, and C1(ϵ) are general-purpose matrix multiplication algorithms, where the array elements are in any ring. Algorithms MM2 and C2(δ) are applicable to array elements that are integers of bounded magnitude, or floating-point values of bounded precision and magnitude, or Boolean values. Extension of algorithms MM 2 and C2(δ) to unbounded integers and reals are also discussed 相似文献
10.
Optical interconnections attract many engineers and scientists’ attention due to their potential for gigahertz transfer rates and concurrent access to the bus in a pipelined fashion. These unique characteristics of optical interconnections give us the opportunity to reconsider traditional algorithms designed for ideal parallel computing models, such as PRAMs. Since the PRAM model is far from practice, not all algorithms designed on this model can be implemented on a realistic parallel computing system. From this point of view, we study Cole’s pipelined merge sort [Cole R. Parallel merge sort. SIAM J Comput 1988;14:770–85] on the CREW PRAM and extend it in an innovative way to an optical interconnection model, the LARPBS (Linear Array with Reconfigurable Pipelined Bus System) model [Pan Y, Li K. Linear array with a reconfigurable pipelined bus system—concepts and applications. J Inform Sci 1998;106;237–58]. Although Cole’s algorithm is optimal, communication details have not been provided due to the fact that it is designed for a PRAM. We close this gap in our sorting algorithm on the LARPBS model and obtain an O(log N)-time optimal sorting algorithm using O(N) processors. This is a substantial improvement over the previous best sorting algorithm on the LARPBS model that runs in O(log N log log N) worst-case time using N processors [Datta A, Soundaralakshmi S, Owens R. Fast sorting algorithms on a linear array with a reconfigurable pipelined bus system. IEEE Trans Parallel Distribut Syst 2002;13(3):212–22]. Our solution allows efficiently assign and reuse processors. We also discover two new properties of Cole’s sorting algorithm that are presented as lemmas in this paper. 相似文献
11.
Dixit V Moldovan DI 《IEEE transactions on pattern analysis and machine intelligence》1987,(1):153-160
The problems in computer vision range from edge detection and segmentation at the lowest level to the problem of cognition at the highest level. This correspondence describes the organization and operation of a semantic network array processor (SNAP) as applicable to high level computer vision problems. The architecture consists of an array of identical cells each containing a content addressable memory, microprogram control, and a communication unit. The applications discussed in this correspondence are the two general techniques, discrete relaxation and dynamic programming. While the discrete relaxation is discussed with reference to scene labeling and edge interpretation, the dynamic programming is tuned for stereo. 相似文献
12.
Fast neural net simulation with a DSP processor array 总被引:1,自引:0,他引:1
This paper describes the implementation of a fast neural net simulator on a novel parallel distributed-memory computer. A 60-processor system, named MUSIC (multiprocessor system with intelligent communication), is operational and runs the backpropagation algorithm at a speed of 330 million connection updates per second (continuous weight update) using 32-b floating-point precision. This is equal to 1.4 Gflops sustained performance. The complete system with 3.8 Gflops peak performance consumes less than 800 W of electrical power and fits into a 19-in rack. While reaching the speed of modern supercomputers, MUSIC still can be used as a personal desktop computer at a researcher's own disposal. In neural net simulation, this gives a computing performance to a single user which was unthinkable before. The system's real-time interfaces make it especially useful for embedded applications. 相似文献
13.
Comparison of thinning algorithms on a parallel processor 总被引:1,自引:0,他引:1
CJ Hilditch 《Image and vision computing》1983,1(3):115-132
A number of parallel algorithms for thinning elongated shapes are contrasted and compared on a Clip 4 parallel processor. These algorithms all work in rectangular tessellation and they thin elongated objects to lines one pixel thick while retaining their connectivity. Existing algorithms for use on binary pictures are considered first and new algorithms are proposed which produce more satisfactory results, but are more expensive in terms of speed and space requirements. Two methods of extending these algorithms to grey pictures are then considered. In one method, binary algorithms are used but are directed by the grey pixels; in the other the binary algorithms are generalized to the grey case. Both methods result in arcs which are not wholly determined by the original boundary of the object but lie along darker ridges. The former is faster and produces results which are easier to interpret, but the results from the latter contain more information. 相似文献
14.
Xia Peisu Fang Xinwo Wang Yuxiang Yan Kaiming Zhang Tingjun Liu Yulan Zhao Chunying Sun Jizhong 《计算机科学技术学报》1987,2(3):163-173
A new type of high performance array processor system is presented in this paper.Unlikethe conventional host-peripheral array processor systems,this system is designed with afunctionally distributed approach.The design philosophy is described first.Then the hardwareorganizations of two concrete systems,namely:150-AP and GF-10/12,including thecommunication between processors are shown.Some attractive system performances for usersprograms are also given. 相似文献
15.
In parallel adaptive mesh refinement (AMR) computations the problem size can vary significantly during a simulation. The goal
here is to explore the performance implications of dynamically varying the number of processors proportional to the problem
size during simulation. An emulator has been developed to assess the effects of this approach on parallel communication, parallel
runtime and resource consumption. The computation and communication models used in the emulator are described in detail. Results
using the emulator with different AMR strategies are described for a test case. Results show for the test case, varying the
number of processors, on average, reduces the total parallel communications overhead from 16 to 19% and improves parallel
runtime time from 4 to 8%. These results also show that on average resource utilization improves more than 37%. 相似文献
16.
《Pattern recognition letters》1987,6(3):199-213
This paper outlines the early results of research into developing algorithms to permit the counting and tracking of moving vehicles in real world scenes using the CLIP4 parallel image processor. 相似文献
17.
A new architecture is presented to support the general class of real-time large-vocabulary speaker-independent continuous speech recognizers incorporating language models. Many such recognizers require multiple high-performance central processing units (CPU's) as well as high interprocessor communication bandwidth. This array processor provides a peak CPU performance of 2.56 giga-floating point operations per second (GFLOPS) as well as a high-speed communication network. In order to efficiently utilize these resources, algorithms were devised for partitioning speech models for mapping into the array processor. Also, a novel scheme is presented for a functional partitioning of the speech recognizer computations. The recognizer is functionally partitioned into six stages, namely, the linear predictive coding (LPC) based feature extractor, mixture probability computer, (phone) state probability computer, word probability computer, phrase probability computer, and traceback computer. Each of these stages is further subdivided as many times as necessary to fit the individual processing elements (PE's). The functional stages are pipelined and synchronized with the frame rate of the incoming speech signal. This partitioning also allows a multistage stack decoder to be implemented for reduction of computation 相似文献
18.
19.