共查询到20条相似文献,搜索用时 0 毫秒
1.
The artificial neural networks (ANNs) have been used successfully in applications such as pattern recognition, image processing, automation and control. Majority of today's applications use backpropagate feedforward ANN. In this paper, two methods of P pattern L layer ANN learning on n × n RMESH have been presented. One required memory space of O(nL) but conceptually is simpler to develop and the other uses pipelined approach which reduces the memory requirement to O(L). Both of these algorithms take O(PL) time and are optimal for RMESH architecture. 相似文献
2.
In this paper we present an O(log n) time parallel algorithm for arithmetic expression evaluation, on an n × n processor array with reconfigurable bus system, where n is the sum of the number of operators and constants in the expression. The basic technique involved here is leaves-cutting (rake operation), as in the case of PRAM model algorithms available in the literature for this problem. The input to our algorithm is assumed to be the binary tree associated with a given expression (also known as expression tree with n number of nodes). Our algorithm is faster compared to the previous best time for expression evaluation on mesh connected computers which is O(√n). 相似文献
3.
Kenneth L. Rice Tarek M. Taha Christopher N. Vutsinas 《The Journal of supercomputing》2009,47(1):21-43
This paper presents the implementation and scaling of a neocortex inspired cognitive model on a Cray XD1. Both software and
reconfigurable logic based FPGA implementations of the model are examined. This model belongs to a new class of biologically
inspired cognitive models. Large scale versions of these models have the potential for significantly stronger inference capabilities
than current conventional computing systems. These models have large amounts of parallelism and simple computations, thus
allowing highly efficient hardware implementations. As a result, hardware-acceleration of these models can produce significant
speedups over fully software implementations. Parallel software and hardware-accelerated implementations of such a model are
investigated for networks of varying complexity. A scaling analysis of these networks is presented and utilized to estimate
the throughput of both hardware-accelerated and software implementations of larger networks that utilize the full resources
of the Cray XD1. Our results indicate that hardware-acceleration can provide average throughput gains of 75 times over software-only
implementations of the networks we examined on this system.
相似文献
Christopher N. VutsinasEmail: |
4.
Several new number representations based on a residue number system are presented which use the smallest prime numbers as moduli and are suited for parallel computations on a reconfigurable mesh architecture. The bit model of linear reconfigurable mesh with exclusive write and unit-time delay for broadcasting on a subbus is assumed. It is shown how to convert in O(1) time any integer, ranging between 0 and n-1, from any commonly used representation to any new representation proposed in this paper (and vice versa) using an n/spl times/O(log/sup 2/ n/log log n) reconfigurable mesh. In particular, some of the previously known conversion techniques are improved. Moreover, as a byproduct, it is shown how to compute in O(1) time the Prefix Sums of n bits by a reconfigurable mesh having the above mentioned size, thus improving previously known results. Applications to the Prefix Sums of n h-bit integers and to Approximate String Matching with /spl alpha/ mismatches are also considered. The Summation and the Prefix Sums can be computed in O(1) time using O(h log N+log/sup 2/ N/log log N)/spl times/Nh and O(h/sup 2/+log/sup 2/ N/log(h+log N))/spl times/O(N(h+log N)) reconfigurable meshes, respectively. Moreover, it is shown for the first time how to find in O(1) time all the occurrences of a pattern of length m in a text of length n, allowing less than /spl alpha/ mismatches, using a reconfigurable mesh of size O(m log|/spl Sigma/|)/spl times/O (n(log|/spl Sigma/|+log/sup 2/ /spl alpha//log log /spl alpha/)), where the pattern and the text are strings over a finite alphabet /spl Sigma/ and /spl alpha/相似文献
5.
6.
Computing on rays: A parallel approach for surface mesh modeling from multi-material volumetric data
Charlie C.L. WangAuthor vitae 《Computers in Industry》2011,62(7):660-671
Ray representation (Ray-rep) of a solid has been studied and used in the solid modeling community for many years because of its compactness and simplicity. This paper presents a parallel approach for mesh surface modeling from multi-material volume data using an extended Ray-rep as an intermediate, where every homogeneous region is enclosed by a set of two-manifold surface meshes on the resultant model. The approach consists of three major algorithms: firstly, an algorithm is developed to convert the given multi-material volumetric data into a Ray-rep for heterogeneous solid; secondly, filtering algorithm is exploited to process the rays of heterogeneous solid in parallel; and lastly, the adaptive mesh surfaces are generated from the Ray-rep through a dual-contouring like algorithm. Here the intermediate surfaces between two constituent materials can be directly extracted without building the volumetric mesh, and the manifold topology is preserved on each surface patch. Furthermore, general offset surface can be easily computed in this paradigm by designing a special parallel operator for the rays. 相似文献
7.
Efficient and scalable quicksort on a linear array with a reconfigurable pipelined bus system 总被引:3,自引:0,他引:3
Based on the current fiber optic technology, a new computational model, called a linear array with a reconfigurable pipelined abus system (LARPBS), is proposed in this paper. A parallel quicksort algorithm is implemented on the model, and its time complexity is analyzed. For a set of N numbers, the quicksort algorithm reported in this paper runs in O(log2 N) average time on a linear array with a reconfigurable pipelined bus system of size N. If the number of processors available is reduced to P, where P < N, the algorithm runs in O((N/P) log2 N) average time and is still scalable. Besides proposing a new algorithm on the model, some basic data movement operations involved in the algorithm are discussed. We believe that these operations can be used to design other parallel algorithms on the same model. Future research in this area is also identified in this paper. 相似文献
8.
V. Singh V. Kumar G. Agha C. Tomlinson 《International journal of parallel programming》1991,20(2):95-131
We present two new parallel algorithms QSP1 and QSP2 based on sequential quicksort for sorting data on a mesh multicomputer, and analyze their scalability using the isoefficiency metric. We show that QSP2 matches the lower bound on the isoefficiency function for mesh multicomputers, while QSP1 is fairly close to optimal. Langet al.
(1) and Schnorret al.
(2) have developed parallel sorting algorithms for the mesh architecture that have either optimal (Schnorr) or close to optimal (Lang) run-time complexity for the one-element-perprocessor case. Both QSP1 and QSP2 have better scalability than the scaled-down variants of these algorithms (for the case in which there are more elements than processors). We also analyze a different variant of Lang's sort which is as scalable as QSP2. We briefly discuss another metric called resource consumption. According to this metric, both QSP1 and QSP2 are superior to variants of Lang's sort. 相似文献
9.
Voronoi diagram is one of the most fundamental and important geometric data structures. Voronoi diagram was historically defined for a set of points on the plane. The diagram partitions the plane into regions, one per site. The region of a site s consists of all points closer to s than to any other sites on the plane. Concepts of Voronoi diagram are often attributed to Voronoi (J. Reine Angew. Math. 133 (1907) 97) and Dirichlet (J. Reine Angew. Math. 40 (1850) 209). As a result of these early works, often the name Voronoi diagram and Dirichlet tessellation is used. Due to the importance of Voronoi diagrams, it is important that algorithms are devised to compute these structures in an efficient manner. Of course, this will create new opportunities for the applicability of these data structures. Towards this end, this paper presents new results for the computation of Voronoi diagrams for a set of n points, or n disjoint circles on the plane, on a mesh with multiple broadcasting (MMB) of size n×n. The algorithm runs in O(log2 n) time. 相似文献
10.
Reconfigurable robots can be defined as a group of robots that can have different geometries, thus obtaining different structures derived from the basic one, having different degrees of freedom and workspaces. Thanks to the optimum dexterity they offer, the user can accomplish a large variety of industrial tasks, using a structurally optimized robot leading towards better energy control and efficiency especially in case of batch size production lines where the task (for the robot) may vary periodically. Reconfigurable systems are a challenge for numerous scientists, due to the advantage of dealing with changes and uncertainties on the ever-changing manufacturing market. One of the main problems of reconfigurable robots is the proper structural geometry determination, so that the resulting structure is able to perform a variety of tasks. This paper presents the structural design of an innovative parallel robot with six degrees of freedom and its proposed configurations with five, four, three and two degrees of freedom. The kinematic analysis and the workspace representations of all the presented configurations of the parallel robot, called Recrob, are also presented. 相似文献
11.
The computation of Euclidean distance maps (EDM), also called Euclidean distance transform, is a basic operation in computer vision, pattern recognition, and robotics. Fast computation of the EDM is needed since most of the applications using the EDM require real-time computation. It is shown in L. Chen and H.Y.H. Chuang [Information Processing Letters, 51, pp. 25–29 (1994)] that a lower bound Ω(n2) is required for any sequential EDM algorithm due to the fact that in any EDM algorithm each of the n2 pixels has to be scanned at least once. Recently, many parallel EDM algorithms have been proposed to speedup its computation. Chen and Chuang proposed an algorithm for computing the EDM on an n×n mesh in O(n) time [L. Chen and H.Y.H. Chuang Parallel Computing, 21, pp. 841–852 (1995)]. Clearly, the VLSI complexities of both the sequential and the mesh algorithm described in L. Chen and H.Y.H. Chuang [Parallel Computing, 21, pp. 841–852 (1995)] are AT2=O(n4), where A is the VLSI layout area of the design and T is the computation time using area A when implemented in VLSI. In this paper, we propose a new and faster parallel algorithm for computing the EDM problem on the reconfigurable VLSI mesh model. For the same problem, our algorithm runs in O(1) time on a two-dimensional n2×n2 reconfigurable mesh. We show that the VLSI complexity of our algorithm is the same as those of the above sequential algorithm and the mesh algorithm, while it uses much less time. To our best knowledge, this is the first constant-time EDM algorithm on any parallel computational model. 相似文献
12.
Optical interconnections attract many engineers and scientists’ attention due to their potential for gigahertz transfer rates and concurrent access to the bus in a pipelined fashion. These unique characteristics of optical interconnections give us the opportunity to reconsider traditional algorithms designed for ideal parallel computing models, such as PRAMs. Since the PRAM model is far from practice, not all algorithms designed on this model can be implemented on a realistic parallel computing system. From this point of view, we study Cole’s pipelined merge sort [Cole R. Parallel merge sort. SIAM J Comput 1988;14:770–85] on the CREW PRAM and extend it in an innovative way to an optical interconnection model, the LARPBS (Linear Array with Reconfigurable Pipelined Bus System) model [Pan Y, Li K. Linear array with a reconfigurable pipelined bus system—concepts and applications. J Inform Sci 1998;106;237–58]. Although Cole’s algorithm is optimal, communication details have not been provided due to the fact that it is designed for a PRAM. We close this gap in our sorting algorithm on the LARPBS model and obtain an O(log N)-time optimal sorting algorithm using O(N) processors. This is a substantial improvement over the previous best sorting algorithm on the LARPBS model that runs in O(log N log log N) worst-case time using N processors [Datta A, Soundaralakshmi S, Owens R. Fast sorting algorithms on a linear array with a reconfigurable pipelined bus system. IEEE Trans Parallel Distribut Syst 2002;13(3):212–22]. Our solution allows efficiently assign and reuse processors. We also discover two new properties of Cole’s sorting algorithm that are presented as lemmas in this paper. 相似文献
13.
Andrea Pietracaprina 《Information Processing Letters》2005,96(1):24-29
We present randomized and deterministic algorithms for many-to-one packet routing on an n-node two-dimensional mesh under the store-and-forward model. We consider the general instance of many-to-one routing where each node is the source (resp., destination) of ? (resp., k) packets, for arbitrary values of ? and k. All our algorithms run in optimal time and use queues of only constant size at each node to store packets in transit. The randomized algorithms, however, are simpler to implement. Our result closes a gap in the literature, where time-optimal algorithms using constant-size queues were known only for the special cases ?=1 and ?=k. 相似文献
14.
Jerry L. Trahan Mingxian Jin Wittaya Chantamas Johnnie W. Baker 《Journal of Parallel and Distributed Computing》2010
The MASC (Multiple ASsociative Computing) model is a multi-SIMD model that uses control parallelism to coordinate the interaction of data parallel threads and supports associative SIMD computing on each of its threads. There have been a wide range of algorithms developed for this model. Research on using this model in real-time system applications and building a scalable MASC architecture is currently quite active. In this paper, we present simulations between MASC and reconfigurable bus-based models, e.g., various versions of the Reconfigurable Multiple Bus Machine (RMBM). Constant time simulations of the basic RMBM by MASC and vice versa are obtained. Simulations of the segmenting RMBM, fusing RMBM, and extended RMBM by MASC in non-constant time are also discussed. By taking advantage of previously established relationships between RMBM and two other popular parallel computational models, namely, the Reconfigurable Mesh (RM) and the Parallel Random Access Machine (PRAM), we extend our simulation results to further categorize the power of the MASC model in relation to RM and PRAM. 相似文献
15.
This paper presents the implementation of the coarse-grained reconfigurable architecture (CGRA) DART with on-line error detection intended for increasing fault-tolerance. Most parts of the data paths and of the local memory of DART are protected using residue code modulo 3, whereas only the logic unit is protected using duplication with comparison. These low-cost hardware techniques would allow to tolerate temporary faults (including so called soft errors caused by radiation), provided that some technique based on re-execution of the last operation is used. Synthesis results obtained for a 90 nm CMOS technology have confirmed significant hardware and power consumption savings of the proposed approach over commonly used duplication with comparison. Introducing one extra pipeline stage in the self-checking version of the basic arithmetic blocks has allowed to significantly reduce the delay overhead compared to our previous design. 相似文献
16.
Given a string of lengthn, this short paper first presents anO(1)-time parallel algorithm for finding all initial palindromes and periods of the string on ann×n reconfigurable mesh (RM). Then, under the same cost (= time × the number of processors =O(n
2)), we provide a partitionable strategy when the RM doesn’t offer sufficient processors; this overcomes the hardware limitation
and is very suitable for VLSI implementation.
Prof. Chung was supported in part by the National Science Council of R. O. C. under contracts NSC87-2213-E011-001 and NSC87-2213-E011-003. 相似文献
17.
In this paper we present efficient algorithms for packet routing on the reconfigurable linear array and the reconfigurable
two-dimensional mesh. We introduce algorithms that are efficient in the worst case and algorithms that are better on average.
The time bounds presented are better than those achievable on the conventional mesh and previously known algorithms. We present
two variants of the reconfigurable mesh. In the first model, M
r
, the processors are attached to a reconfigurable bus, the individual edge connections being bidirectional. In the second
model, M
mr
, the processors are attached to two unidirectional buses. In this paper we present lower bounds and nearly matching upper
bounds for packet routing on these two models. As a consequence, we solve two of the open problems mentioned in [9].
Received August 17, 1998; revised November 3, 1999. 相似文献
18.
K.-H. Hsu 《Computer Physics Communications》2006,174(12):948-960
A parallel electrostatic Poisson's equation solver coupled with parallel adaptive mesh refinement (PAMR) is developed in this paper. The three-dimensional Poisson's equation is discretized using the Galerkin finite element method using a tetrahedral mesh. The resulting matrix equation is then solved through the parallel conjugate gradient method using the non-overlapping subdomain-by-subdomain scheme. A PAMR module is coupled with this parallel Poisson's equation solver to adaptively refine the mesh where the variation of potentials is large. The parallel performance of the parallel Poisson's equation is studied by simulating the potential distribution of a CNT-based triode-type field emitter. Results with ∼100 000 nodes show that a parallel efficiency of 84.2% is achieved in 32 processors of a PC-cluster system. The field emission properties of a single CNT triode- and tetrode-type field emitter in a periodic cell are computed to demonstrate their potential application in field emission prediction. 相似文献
19.
Youngmann Kim Author Vitae Author Vitae Sungwoo Tak Author Vitae 《Journal of Systems and Software》2009,82(10):1588-1599
We present an issue of the dynamically reconfigurable hardware-software architecture which allows for partitioning networking functions on a SoC (System on Chip) platform. We address this issue as a partition problem of implementing network protocol functions into dynamically reconfigurable hardware and software modules. Such a partitioning technique can improve the co-design productivity of hardware and software modules. Practically, the proposed partitioning technique, which is called the ITC (Inter-Task Communication) technique incorporating the RT-IJC2 (Real-Time Inter-Job Communication Channel), makes it possible to resolve the issue of partitioning networking functions into hardware and software modules on the SoC platform. Additionally, the proposed partitioning technique can support the modularity and reuse of complex network protocol functions, enabling a higher level of abstraction of future network protocol specifications onto the SoC platform. Especially, the RT-IJC2 allows for more complex data transfers between hardware and software tasks as well as provides real-time data processing simultaneously for given application-specific real-time requirements. We conduct a variety of experiments to illustrate the application and efficiency of the proposed technique after implementing it on a commercial SoC platform based on the Altera’s Excalibur including the ARM922T core and up to 1 million gates of programmable logic. 相似文献
20.
Dimensional analysis reduces a complicated ten-parameter formula for the execution time of the Linpack benchmark to a simpler two-parameter formula. These two parameters are ratios of software forces and hardware forces that determine a self-similarity surface. Machines move along paths on this surface as the problem size and the number of processors change. Two machines scale the same way, they move along the same path, if they have the same hardware forces. To design efficient algorithms, the programmer must produce software forces large enough to overcome the hardware forces. Modern machines have larger hardware forces than older machines and are harder to program. 相似文献