共查询到20条相似文献,搜索用时 15 毫秒
1.
We present efficient parallel algorithms for using a pyramid computer to determine convexity properties of digitized black/white pictures and labeled figures. Algorithms are presented for deciding convexity, identifying extreme points of convex hulls, and using extreme points in a variety of fashions. For a pyramid computer with a base ofn simple processing elements arranged in ann 1/2 ×n 1/2 square, the running times of the algorithms range from Θ(logn) to find the extreme points of a convex figure in a digitized picture, to Θ(n 1/6) to find the diameter of a labeled figure, Θ(n 1/4 logn) to find the extreme points of every figure in a digitized picture, to Θ(n 1/2) to find the extreme points of every labeled set of processing elements. Our results show that the pyramid computer can be used to obtain efficient solutions to nontrivial problems in image analysis. We also show the sensitivity of efficient pyramid-computer algorithms to the rate at which essential data can be compressed. Finally, we show that a wide variety of techniques are needed to make full and efficient use of the pyramid architecture. 相似文献
2.
We show that a number of geometric problems can be solved on a n × n mesh-connected computer (MCC) inO(n) time, which is optimal to within a constant factor, since a nontrivial data movement on an MCC requires (n) time. The problems studied here include multipoint location, planar point location, trapezoidal decomposition, intersection detection, intersection of two convex polygons, Voronoi diagram, the largest empty circle, the smallest enclosing circle, etc. TheO(n) algorithms for all of the above problems are based on the classical divide-and-conquer problem-solving strategy.This work was supported in part by the National Science Foundation under Grant DCR 8420814. A preliminary version was presented in the 1987 FJCC, Dallas, TX. 相似文献
3.
当前随着计算要求的不断提高,并行计算发展方法未艾,网络并行计算更成为新的潮流,在并行处理中,图象处理是一个重要的应用领域。提出一种动态并行方法,以此实现Mandelbrot图形。通过对实验数据的分析,对于任务规模和并行计算的加速比进行了研究。 相似文献
4.
In this article we report on our experience in computing resultants of bivariate polynomials on Graphics Processing Units (GPU). Following the outline of Collins’ modular approach [6], our algorithm starts by mapping the input polynomials to a finite field for sufficiently many primes m. Next, the GPU algorithm evaluates the polynomials at a number of fixed points x∈Zm, and computes a set of univariate resultants for each modular image. Afterwards, the resultant is reconstructed using polynomial interpolation and Chinese remaindering. The GPU returns resultant coefficients in the form of Mixed Radix (MR) digits. Finally, large integer coefficients are recovered from the MR representation on the CPU. All computations performed by the algorithm (except for, partly, Chinese remaindering) are outsourced to the graphics processor thereby minimizing the amount of work to be done on the host machine. The main theoretical contribution of this work is the modification of Collins’ modular algorithm using the methods of matrix algebra to make an efficient realization on the GPU feasible. According to the benchmarks, our algorithm outperforms a CPU-based resultant algorithm from 64-bit Maple 14 by a factor of 100. 相似文献
5.
We present processor-time optimal parallel algorithms for several problems onn ×n digitized image arrays, on a mesh-connected array havingp processors and a memory of sizeO(n
2) words. The number of processorsp can vary over the range [1,n
3/2] while providing optimal speedup for these problems. The class of image problems considered here includes labeling the connected components of an image; computing the convex hull, the diameter, and a smallest enclosing box of each component; and computing all closest neighbors. Such problems arise in medium-level vision and require global operations on image pixels. To achieve optimal performance, several efficient data-movement and reduction techniques are developed for the proposed organization.This research was supported in part by the National Science Foundation under Grant IRI-8710836 and in part by DARPA under Contract F33615-87-C-1436 monitored by the Wright Patterson Airforce Base. 相似文献
6.
Nobuyuki Masuda Takashige Sugie Tomoyoshi Ito Shinjiro Tanaka Yu Hamada Shin-ichi Satake Tomoaki Kunugi Kazuho Sato 《Computer Physics Communications》2010,181(12):1986-1989
We have designed a PC cluster system with special purpose computer boards for visualization of fluid flow using digital holographic particle tracking velocimetry (DHPTV). In this board, there is a Field Programmable Gate Array (FPGA) chip in which is installed a pipeline for calculating the intensity of an object from a hologram by fast Fourier transform (FFT). This cluster system can create 1024 reconstructed images from a 1024×1024-grid hologram in 0.77 s. It is expected that this system will contribute to the analysis of fluid flow using DHPTV. 相似文献
7.
Otfried Schwarzkopf 《Algorithmica》1991,6(1):685-697
Distance transforms are an important computational tool for the processing of binary images. For ann ×n image, distance transforms can be computed in time
(n) on a mesh-connected computer and in polylogarithmic time on hypercube related structures. We investigate the possibilities of computing distance transforms in polylogarithmic time on the pyramid computer and the mesh of trees. For the pyramid, we obtain a polynomial lower bound using a result by Miller and Stout, so we turn our attention to the mesh of trees. We give a very simple
(logn) algorithm for the distance transform with respect to theL
1-metric, an
(log2
n) algorithm for the transform with respect to theL
-metric, and find that the Euclidean metric is much more difficult. Based on evidence from number theory, we conjecture the impossibility of computing the Euclidean distance transform in polylogarithmic time on a mesh of trees. Instead, we approximate the distance transform up to a given error. This works for anyL
k
-metric and takes time
(log3
n).This research was supported by the Deutsche Forschungsgemeinschaft under Grant Al 253/1-1, Schwerpunktprogramm Datenstrukturen und effiziente Algorithmen. 相似文献
8.
Dipen Moitra 《Algorithmica》1991,6(1):624-657
Given a black-and-white image, represented by an array of n × n binary-valued pixels, we wish to cover the black pixels with aminimal set of (possibly overlapping) maximal squares. It was recently shown that obtaining aminimum square cover for a polygonal binary image with holes is NP-hard. We derive an optimal parallel algorithm for theminimal square cover problem, which for any desired computation timeT in [logn,n] runs on an EREW-PRAM with (n/T) processors. The cornerstone of our algorithm is a novel data structure, the cover graph, which compactly represents the covering relationships between the maximal squares of the image. The size of the cover graph is linear in the number of pixels. This algorithm has applications to problems in VLSI mask generation, incremental update of raster displays, and image compression.The research reported here forms part of the author's doctoral dissertion, submitted to Cornell University in May 1989. This work was partially supported by NSF Grant DC1-86-02256, IBM Agreement 12060043, and ONR Contract N00014-83-K-0640. A preliminary version of this paper was presented at the 26th Annual Allerton Conference on Communications, Control, and Computing, Monticello, IL, September 28–30, 1988. 相似文献
9.
Frank Dehne Russ Miller Andrew Rau-Chaplin 《International journal of parallel programming》1991,20(6):475-486
In this paper, we present optimal parallel algorithms for optical clustering on a mesh-connected computer.Optical clustering is a clustering technique based on the principal of optical resolution, and is of particular interest in picture analysis. The algorithms we present are based on the application of parallel algorithms in computational geometry and graph theory. In particular, we show that given a setS ofN points in the Euclidean plane, the following problems can be solved in optimal
time on a mesh-connected computer of sizeN.
Research partially supported by the Natural Sciences and Engineering Research Council of Canada and the National Science Foundation under Grant IRI-9108288. 相似文献
1. | Determine the optical clusters ofS with respect to a given separation parameter. |
2. | Given an interval [a, b] representing the number of optical clusters desired in the clustering ofS, determine the range of the separation parameter that will result in such an optical clustering. |
10.
An algorithm to implement the Hough transform for the detection of a straight line on a pyramidal architecture is presented. The algorithm consists of two phases. The first phase, called block-projection, takes constant time. The second phase, called block-combination, is repeated logn times and takes a total ofO(n
1/2) time for the detection of all straight lines having a given slope on an n×n image; if there arep different slopes to be detected, then the total time becomesO(pn
1/2). 相似文献
11.
Summary Linear arrays are characterized by a small communication bandwidth and a large communication diameter rendering them unsuited to the implementation of global computations. This paper presents efficient data movement and partitioning techniques to overcome several shortcomings of linear arrays. These techniques are used to derive optimal parallel algorithms for several geometric problems onn×n images using a fixed-size linear array withp processors, where 1pn.O(n
2/p) time solutions are presented for labeling connected image regions, computing the convex hull of each region, and computing nearest neighbors. Consequently, a linear array withn processors can solve several image problems inO(n) time which is the same time taken by a two dimensional mesh-connected computer withn
2 processors. Limitations of linear arrays are analyzed by presenting a class of image problems which can be solved sequentially inO(n)
2
) time, but require (n
2) time on a linear array, irrespective of the number of processors used and the partitioning of the input image among the processors. An alternate communication-efficient fixed-size organization withp processors is proposed to solve such problems inO(n
2/p) time, for 1pn.
Hussein M. Alnuweiri received the B.S. and M.S. degrees in 1983 and 1984, respectively, both in electrical engineering from King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia, and received the Ph.D. degree also in electrical engineering in 1989 from the University of Southern California, Los Angeles. Currently he is an assistant professor in the electrical engineering department at University of British Columbia. His research interests include parallel architectures and algorithms, computational aspects of VLSI networks, complexity of parallel computations, and algorithmic aspects of image analysis, vision, and robot motion planning.
Viktor K. Prasanna (V. K. Prasanna Kumar) received his BS in Electronics Engineering from the Bangalore University, his MS from the School of Automation, Indian Institute of Science. He obtained his Ph.D. in Computer Science from Pennsylvania State University in 1983. Currently, he is an Associate Professor in the department of Electrical Engineering-Systems, University of Southern California, Los Angeles. His current research interests include Parallel Computation, Computer Architecture, VLSI Computations and Computational aspects of Image Processing and Vision. He is the editor of the book Parallel Architectures and Algorithms for Image understanding published by Academic Press. Professor Prasanna serves on number of international committees and panels and is a consultant for several industries. He is the program chair of the 1992 International Parallel Processing Symposium sponsored by IEEE Computer Society and is a subject area editor of Journal of Parallel and Distributed Computing.This research was supported in part by the National Science Foundation under grant IRI-8710836 and in part by DARPA under contract F 33615-87-C-1436 monitored by Wright Patterson Airforce Base. A preliminary version of this paper appears in the IEEE Conference on Computer Vision and Pattern Recognition, 1988. 相似文献
12.
Optical interconnections attract many engineers and scientists’ attention due to their potential for gigahertz transfer rates and concurrent access to the bus in a pipelined fashion. These unique characteristics of optical interconnections give us the opportunity to reconsider traditional algorithms designed for ideal parallel computing models, such as PRAMs. Since the PRAM model is far from practice, not all algorithms designed on this model can be implemented on a realistic parallel computing system. From this point of view, we study Cole’s pipelined merge sort [Cole R. Parallel merge sort. SIAM J Comput 1988;14:770–85] on the CREW PRAM and extend it in an innovative way to an optical interconnection model, the LARPBS (Linear Array with Reconfigurable Pipelined Bus System) model [Pan Y, Li K. Linear array with a reconfigurable pipelined bus system—concepts and applications. J Inform Sci 1998;106;237–58]. Although Cole’s algorithm is optimal, communication details have not been provided due to the fact that it is designed for a PRAM. We close this gap in our sorting algorithm on the LARPBS model and obtain an O(log N)-time optimal sorting algorithm using O(N) processors. This is a substantial improvement over the previous best sorting algorithm on the LARPBS model that runs in O(log N log log N) worst-case time using N processors [Datta A, Soundaralakshmi S, Owens R. Fast sorting algorithms on a linear array with a reconfigurable pipelined bus system. IEEE Trans Parallel Distribut Syst 2002;13(3):212–22]. Our solution allows efficiently assign and reuse processors. We also discover two new properties of Cole’s sorting algorithm that are presented as lemmas in this paper. 相似文献
13.
Massimo Cafaro Italo Epicoco Sandro Fiore Daniele Lezzi Silvia Mocavero Giovanni Aloisio 《Journal of Real-Time Image Processing》2009,4(3):219-227
In this paper, we describe the process of parallelizing an existing, production level, sequential Synthetic Aperture Radar
(SAR) processor based on the Range-Doppler algorithmic approach. We show how, taking into account the constraints imposed
by the software architecture and related software engineering costs, it is still possible with a moderate programming effort
to parallelize the software and present an message-passing interface (MPI) implementation whose speedup is about 8 on 9 processors,
achieving near real-time processing of raw SAR data even on a moderately aged parallel platform. Moreover, we discuss a hybrid
two-level parallelization approach that involves the use of both MPI and OpenMP. We also present GridStore, a novel data grid
service to manage raw, focused and post-processed SAR data in a grid environment. Indeed, another aim of this work is to show
how the processed data can be made available in a grid environment to a wide scientific community, through the adoption of
a data grid service providing both metadata and data management functionalities. In this way, along with near real-time processing
of SAR images, we provide a data grid-oriented system for data storing, publishing, management, etc.
相似文献
Giovanni AloisioEmail: |
14.
An algorithm and a computer routines library written in object programming language C++ are described, which allow removal of global intensity deformation in a grey-scale image using contrast control. During the process, a Gaussian pyramid representation of an input image is constructed through low-pass filtration and sampling of successive pyramid levels, where the input image constitutes the first (zero) level of the pyramid. In the second step a Laplacian pyramid is built by subtracting successive levels of the Gaussian pyramid. Then, all levels in the Laplacian pyramid are expanded to the original image size and added with weights, which are real numbers, to reconstruct the image. Proper choice of the weights allows global grey-level deformation removal. This technique can be applied for contrast enhancement and image archiving of airborne, scanned, photo-copied and optical camera-made images, where global distortions of intensity are frequently met. 相似文献
15.
Jayram Moorkanikara Nageswaran Andrew Felch Ashok Chandrasekhar Nikil Dutt Richard Granger Alex Nicolau Alex Veidenbaum 《International journal of parallel programming》2009,37(4):345-369
Even though computing systems have increased the number of transistors, the switching speed, and the number of processors,
most programs exhibit limited speedup due to the serial dependencies of existing algorithms. Analysis of intrinsically parallel
systems such as brain circuitry have led to the identification of novel architecture designs, and also new algorithms than
can exploit the features of modern multiprocessor systems. In this article we describe the details of a brain derived vision
(BDV) algorithm that is derived from the anatomical structure, and physiological operating principles of thalamo-cortical
brain circuits. We show that many characteristics of the BDV algorithm lend themselves to implementation on IBM CELL architecture,
and yield impressive speedups that equal or exceed the performance of specialized solutions such as FPGAs. Mapping this algorithm
to the IBM CELL is non-trivial, and we suggest various approaches to deal with parallelism, task granularity, communication,
and memory locality. We also show that a cluster of three PS3s (or more) containing IBM CELL processors provides a promising
platform for brain derived algorithms, exhibiting speedup of more than 140 × over a desktop PC implementation, and thus enabling
real-time object recognition for robotic systems. 相似文献
16.
Sparse QR factorization on a massively parallel computer 总被引:1,自引:0,他引:1
Steven G. Kratzer 《The Journal of supercomputing》1992,6(3-4):237-255
This paper shows that QR factorization of large, sparse matrices can be performed efficiently on massively parallel SIMD (single instruction stream/multiple data stream) computers such as the Connection Machine CM-2. The problem is cast as a dataflow graph, whose nodes are mapped to a virtual dataflow machine in such a way that only nearest-neighbor communication is required. This virtual machine is implemented by programming the CM-2 processors to support a restricted dataflow protocol. Execution results for several test matrices show that good performance can be obtained without relying on nested dissection techniques. 相似文献
17.
Kenneth L. Rice Tarek M. Taha Christopher N. Vutsinas 《The Journal of supercomputing》2009,47(1):21-43
This paper presents the implementation and scaling of a neocortex inspired cognitive model on a Cray XD1. Both software and
reconfigurable logic based FPGA implementations of the model are examined. This model belongs to a new class of biologically
inspired cognitive models. Large scale versions of these models have the potential for significantly stronger inference capabilities
than current conventional computing systems. These models have large amounts of parallelism and simple computations, thus
allowing highly efficient hardware implementations. As a result, hardware-acceleration of these models can produce significant
speedups over fully software implementations. Parallel software and hardware-accelerated implementations of such a model are
investigated for networks of varying complexity. A scaling analysis of these networks is presented and utilized to estimate
the throughput of both hardware-accelerated and software implementations of larger networks that utilize the full resources
of the Cray XD1. Our results indicate that hardware-acceleration can provide average throughput gains of 75 times over software-only
implementations of the networks we examined on this system.
相似文献
Christopher N. VutsinasEmail: |
18.
该文介绍了一个基于科学计算语言的遗传算法工具箱GATS.与现有的基于科学计算语言的遗传算法工具箱相比,GATS在功能上和使用上具有更多的优越性.GATS可支持四种基因编码方式;支持小生境、尺度变换等功能;支持自适应遗传算法、分层遗传算法;支持多目标优化;支持并行处理(Linux/Unix平台);以及更多的遗传算子等等.GATS采用“主体框架 + 可替换模块”的结构,便于用户加以扩展;带有Tcl/Tk编写的用户界面,使用简单;软件开放源码,特别适用于科研和教育领域.该文详细介绍了GATS的结构、功能和使用方法,并给出了多个应用示例. 相似文献
19.
Adigitized plane of sizeM is a rectangular M × M array of integer lattice points called pixels. A M × M mesh-of-processors in which each processorP
ij
represents pixel (i,j) is a natural architecture to store and manipulate images in ; such a parallel architecture is called asystolic screen. In this paper we consider a variety of computational-geometry problems on images in a digitized plane, and present optimal algorithms for solving these problems on a systolic screen. In particular, we presentO(M)-time algorithms for determining all contours of an image; constructing all rectilinear convex hulls of an image (peeling); solving the parallel and perspective visibility problem forn disjoint digitized images; and constructing the Voronoi diagram ofn planar objects represented by disjoint images, for a large class of object types (e.g., points, line segments, circles, ellipses, and polygons of constant size) and distance functions (e.g., allL
p
metrics). These algorithms implyO(M)-time solutions to a number of other geometric problems: e.g., rectangular visibility, separability, detection of pseudo-star-shapedness, and optical clustering. One of the proposed techniques also leads to a new parallel algorithm for determining all longest common subsequences of two words.Research supported by the Naural Sciences and Engineering Research Council of Canada. With the Editor-in-Chief's permission, this paper was sent to the referees in a form which kept them unaware of the fact that the Guest Editor is one of the co-authors. 相似文献
20.
We describe two hypercube algorithms to find the biconnected components of a dense connected undirected graph. One is a modified version of the Tarjan-Vishkin algorithm and the other is an adaptation of Read's sequential algorithm. The two hypercube algorithms were experimentally evaluated on an NCUBE/7 MIMD hypercube computer. The two algorithms have comparable performance, and efficiencies as high as 0.7 were observed on dense graphs.This research was supported in part by the National Science Foundation under grants DCR84-20935 and MIP 86-17374. 相似文献