首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
The artificial neural networks (ANNs) have been used successfully in applications such as pattern recognition, image processing, automation and control. Majority of today's applications use backpropagate feedforward ANN. In this paper, two methods of P pattern L layer ANN learning on n × n RMESH have been presented. One required memory space of O(nL) but conceptually is simpler to develop and the other uses pipelined approach which reduces the memory requirement to O(L). Both of these algorithms take O(PL) time and are optimal for RMESH architecture.  相似文献   

2.
In this paper we present an O(log n) time parallel algorithm for arithmetic expression evaluation, on an n × n processor array with reconfigurable bus system, where n is the sum of the number of operators and constants in the expression. The basic technique involved here is leaves-cutting (rake operation), as in the case of PRAM model algorithms available in the literature for this problem. The input to our algorithm is assumed to be the binary tree associated with a given expression (also known as expression tree with n number of nodes). Our algorithm is faster compared to the previous best time for expression evaluation on mesh connected computers which is O(√n).  相似文献   

3.
This paper presents the implementation and scaling of a neocortex inspired cognitive model on a Cray XD1. Both software and reconfigurable logic based FPGA implementations of the model are examined. This model belongs to a new class of biologically inspired cognitive models. Large scale versions of these models have the potential for significantly stronger inference capabilities than current conventional computing systems. These models have large amounts of parallelism and simple computations, thus allowing highly efficient hardware implementations. As a result, hardware-acceleration of these models can produce significant speedups over fully software implementations. Parallel software and hardware-accelerated implementations of such a model are investigated for networks of varying complexity. A scaling analysis of these networks is presented and utilized to estimate the throughput of both hardware-accelerated and software implementations of larger networks that utilize the full resources of the Cray XD1. Our results indicate that hardware-acceleration can provide average throughput gains of 75 times over software-only implementations of the networks we examined on this system.
Christopher N. VutsinasEmail:
  相似文献   

4.
Several new number representations based on a residue number system are presented which use the smallest prime numbers as moduli and are suited for parallel computations on a reconfigurable mesh architecture. The bit model of linear reconfigurable mesh with exclusive write and unit-time delay for broadcasting on a subbus is assumed. It is shown how to convert in O(1) time any integer, ranging between 0 and n-1, from any commonly used representation to any new representation proposed in this paper (and vice versa) using an n/spl times/O(log/sup 2/ n/log log n) reconfigurable mesh. In particular, some of the previously known conversion techniques are improved. Moreover, as a byproduct, it is shown how to compute in O(1) time the Prefix Sums of n bits by a reconfigurable mesh having the above mentioned size, thus improving previously known results. Applications to the Prefix Sums of n h-bit integers and to Approximate String Matching with /spl alpha/ mismatches are also considered. The Summation and the Prefix Sums can be computed in O(1) time using O(h log N+log/sup 2/ N/log log N)/spl times/Nh and O(h/sup 2/+log/sup 2/ N/log(h+log N))/spl times/O(N(h+log N)) reconfigurable meshes, respectively. Moreover, it is shown for the first time how to find in O(1) time all the occurrences of a pattern of length m in a text of length n, allowing less than /spl alpha/ mismatches, using a reconfigurable mesh of size O(m log|/spl Sigma/|)/spl times/O (n(log|/spl Sigma/|+log/sup 2/ /spl alpha//log log /spl alpha/)), where the pattern and the text are strings over a finite alphabet /spl Sigma/ and /spl alpha/相似文献   

5.
介绍几种典型可重构光总线并行计算模型及其高效的基本操作与并行算法,使人们更加了解光总线并行计算模型及优越性,为今后进一步研究光总线模型及其并行算法奠定基础。  相似文献   

6.
1 IntroductionLet G = (V, E) be a connected, undirected graph with a weight function W on the set Eof edges to the set of reals. A spanning tree is a subgraph T = (V, ET), ET G E, of C suchthat T is a tree. The weight W(T) of a spanning tree T is the sum of the weights of its edges.A spanning tree with the smallest possible'weight is called a minimum spanning tree (MST)of G. Computing an MST of a given weighted graph is an important problem that arisesin many applications. For this …  相似文献   

7.
Ray representation (Ray-rep) of a solid has been studied and used in the solid modeling community for many years because of its compactness and simplicity. This paper presents a parallel approach for mesh surface modeling from multi-material volume data using an extended Ray-rep as an intermediate, where every homogeneous region is enclosed by a set of two-manifold surface meshes on the resultant model. The approach consists of three major algorithms: firstly, an algorithm is developed to convert the given multi-material volumetric data into a Ray-rep for heterogeneous solid; secondly, filtering algorithm is exploited to process the rays of heterogeneous solid in parallel; and lastly, the adaptive mesh surfaces are generated from the Ray-rep through a dual-contouring like algorithm. Here the intermediate surfaces between two constituent materials can be directly extracted without building the volumetric mesh, and the manifold topology is preserved on each surface patch. Furthermore, general offset surface can be easily computed in this paradigm by designing a special parallel operator for the rays.  相似文献   

8.
We present two new parallel algorithms QSP1 and QSP2 based on sequential quicksort for sorting data on a mesh multicomputer, and analyze their scalability using the isoefficiency metric. We show that QSP2 matches the lower bound on the isoefficiency function for mesh multicomputers, while QSP1 is fairly close to optimal. Langet al. (1) and Schnorret al. (2) have developed parallel sorting algorithms for the mesh architecture that have either optimal (Schnorr) or close to optimal (Lang) run-time complexity for the one-element-perprocessor case. Both QSP1 and QSP2 have better scalability than the scaled-down variants of these algorithms (for the case in which there are more elements than processors). We also analyze a different variant of Lang's sort which is as scalable as QSP2. We briefly discuss another metric called resource consumption. According to this metric, both QSP1 and QSP2 are superior to variants of Lang's sort.  相似文献   

9.
Based on the current fiber optic technology, a new computational model, called a linear array with a reconfigurable pipelined abus system (LARPBS), is proposed in this paper. A parallel quicksort algorithm is implemented on the model, and its time complexity is analyzed. For a set of N numbers, the quicksort algorithm reported in this paper runs in O(log2 N) average time on a linear array with a reconfigurable pipelined bus system of size N. If the number of processors available is reduced to P, where P < N, the algorithm runs in O((N/P) log2 N) average time and is still scalable. Besides proposing a new algorithm on the model, some basic data movement operations involved in the algorithm are discussed. We believe that these operations can be used to design other parallel algorithms on the same model. Future research in this area is also identified in this paper.  相似文献   

10.
Voronoi diagram is one of the most fundamental and important geometric data structures. Voronoi diagram was historically defined for a set of points on the plane. The diagram partitions the plane into regions, one per site. The region of a site s consists of all points closer to s than to any other sites on the plane. Concepts of Voronoi diagram are often attributed to Voronoi (J. Reine Angew. Math. 133 (1907) 97) and Dirichlet (J. Reine Angew. Math. 40 (1850) 209). As a result of these early works, often the name Voronoi diagram and Dirichlet tessellation is used. Due to the importance of Voronoi diagrams, it is important that algorithms are devised to compute these structures in an efficient manner. Of course, this will create new opportunities for the applicability of these data structures. Towards this end, this paper presents new results for the computation of Voronoi diagrams for a set of n points, or n disjoint circles on the plane, on a mesh with multiple broadcasting (MMB) of size n×n. The algorithm runs in O(log2 n) time.  相似文献   

11.
Reconfigurable robots can be defined as a group of robots that can have different geometries, thus obtaining different structures derived from the basic one, having different degrees of freedom and workspaces. Thanks to the optimum dexterity they offer, the user can accomplish a large variety of industrial tasks, using a structurally optimized robot leading towards better energy control and efficiency especially in case of batch size production lines where the task (for the robot) may vary periodically. Reconfigurable systems are a challenge for numerous scientists, due to the advantage of dealing with changes and uncertainties on the ever-changing manufacturing market. One of the main problems of reconfigurable robots is the proper structural geometry determination, so that the resulting structure is able to perform a variety of tasks. This paper presents the structural design of an innovative parallel robot with six degrees of freedom and its proposed configurations with five, four, three and two degrees of freedom. The kinematic analysis and the workspace representations of all the presented configurations of the parallel robot, called Recrob, are also presented.  相似文献   

12.
Yi Pan  Keqin Li 《Information Sciences》1999,120(1-4):209-221
The computation of Euclidean distance maps (EDM), also called Euclidean distance transform, is a basic operation in computer vision, pattern recognition, and robotics. Fast computation of the EDM is needed since most of the applications using the EDM require real-time computation. It is shown in L. Chen and H.Y.H. Chuang [Information Processing Letters, 51, pp. 25–29 (1994)] that a lower bound Ω(n2) is required for any sequential EDM algorithm due to the fact that in any EDM algorithm each of the n2 pixels has to be scanned at least once. Recently, many parallel EDM algorithms have been proposed to speedup its computation. Chen and Chuang proposed an algorithm for computing the EDM on an n×n mesh in O(n) time [L. Chen and H.Y.H. Chuang Parallel Computing, 21, pp. 841–852 (1995)]. Clearly, the VLSI complexities of both the sequential and the mesh algorithm described in L. Chen and H.Y.H. Chuang [Parallel Computing, 21, pp. 841–852 (1995)] are AT2=O(n4), where A is the VLSI layout area of the design and T is the computation time using area A when implemented in VLSI. In this paper, we propose a new and faster parallel algorithm for computing the EDM problem on the reconfigurable VLSI mesh model. For the same problem, our algorithm runs in O(1) time on a two-dimensional n2×n2 reconfigurable mesh. We show that the VLSI complexity of our algorithm is the same as those of the above sequential algorithm and the mesh algorithm, while it uses much less time. To our best knowledge, this is the first constant-time EDM algorithm on any parallel computational model.  相似文献   

13.
Optical interconnections attract many engineers and scientists’ attention due to their potential for gigahertz transfer rates and concurrent access to the bus in a pipelined fashion. These unique characteristics of optical interconnections give us the opportunity to reconsider traditional algorithms designed for ideal parallel computing models, such as PRAMs. Since the PRAM model is far from practice, not all algorithms designed on this model can be implemented on a realistic parallel computing system. From this point of view, we study Cole’s pipelined merge sort [Cole R. Parallel merge sort. SIAM J Comput 1988;14:770–85] on the CREW PRAM and extend it in an innovative way to an optical interconnection model, the LARPBS (Linear Array with Reconfigurable Pipelined Bus System) model [Pan Y, Li K. Linear array with a reconfigurable pipelined bus system—concepts and applications. J Inform Sci 1998;106;237–58]. Although Cole’s algorithm is optimal, communication details have not been provided due to the fact that it is designed for a PRAM. We close this gap in our sorting algorithm on the LARPBS model and obtain an O(log N)-time optimal sorting algorithm using O(N) processors. This is a substantial improvement over the previous best sorting algorithm on the LARPBS model that runs in O(log N log log N) worst-case time using N processors [Datta A, Soundaralakshmi S, Owens R. Fast sorting algorithms on a linear array with a reconfigurable pipelined bus system. IEEE Trans Parallel Distribut Syst 2002;13(3):212–22]. Our solution allows efficiently assign and reuse processors. We also discover two new properties of Cole’s sorting algorithm that are presented as lemmas in this paper.  相似文献   

14.
We present randomized and deterministic algorithms for many-to-one packet routing on an n-node two-dimensional mesh under the store-and-forward model. We consider the general instance of many-to-one routing where each node is the source (resp., destination) of ? (resp., k) packets, for arbitrary values of ? and k. All our algorithms run in optimal time and use queues of only constant size at each node to store packets in transit. The randomized algorithms, however, are simpler to implement. Our result closes a gap in the literature, where time-optimal algorithms using constant-size queues were known only for the special cases ?=1 and ?=k.  相似文献   

15.
李环  周帅锋 《计算机应用》2009,29(6):1687-1689
传统的自由变形方法通过计算控制晶格来变形,不适用于精确变形的情况。针对此问题,提出一种基于多不动点约束的网格模型局部编辑算法,该算法通过多个不动点的合理配置设定多种复杂的约束条件,实现网格模型的局部编辑,进而精确的变形模型。实验表明算法计算代价低,可实现精确变形。  相似文献   

16.
并行计算技术是衡量一个国家科技水平的重要标志之一,PC机群计算机是最廉价的高性能计算机.本文构建了一个Beowulf-T机群系统,提出了在该系统上的加速比和效率计算公式.通过实例的测试说明该机群系统具有高可扩展性.  相似文献   

17.
The MASC (Multiple ASsociative Computing) model is a multi-SIMD model that uses control parallelism to coordinate the interaction of data parallel threads and supports associative SIMD computing on each of its threads. There have been a wide range of algorithms developed for this model. Research on using this model in real-time system applications and building a scalable MASC architecture is currently quite active. In this paper, we present simulations between MASC and reconfigurable bus-based models, e.g., various versions of the Reconfigurable Multiple Bus Machine (RMBM). Constant time simulations of the basic RMBM by MASC and vice versa are obtained. Simulations of the segmenting RMBM, fusing RMBM, and extended RMBM by MASC in non-constant time are also discussed. By taking advantage of previously established relationships between RMBM and two other popular parallel computational models, namely, the Reconfigurable Mesh (RM) and the Parallel Random Access Machine (PRAM), we extend our simulation results to further categorize the power of the MASC model in relation to RM and PRAM.  相似文献   

18.
This paper presents the implementation of the coarse-grained reconfigurable architecture (CGRA) DART with on-line error detection intended for increasing fault-tolerance. Most parts of the data paths and of the local memory of DART are protected using residue code modulo 3, whereas only the logic unit is protected using duplication with comparison. These low-cost hardware techniques would allow to tolerate temporary faults (including so called soft errors caused by radiation), provided that some technique based on re-execution of the last operation is used. Synthesis results obtained for a 90 nm CMOS technology have confirmed significant hardware and power consumption savings of the proposed approach over commonly used duplication with comparison. Introducing one extra pipeline stage in the self-checking version of the basic arithmetic blocks has allowed to significantly reduce the delay overhead compared to our previous design.  相似文献   

19.
Given a string of lengthn, this short paper first presents anO(1)-time parallel algorithm for finding all initial palindromes and periods of the string on ann×n reconfigurable mesh (RM). Then, under the same cost (= time × the number of processors =O(n 2)), we provide a partitionable strategy when the RM doesn’t offer sufficient processors; this overcomes the hardware limitation and is very suitable for VLSI implementation. Prof. Chung was supported in part by the National Science Council of R. O. C. under contracts NSC87-2213-E011-001 and NSC87-2213-E011-003.  相似文献   

20.
In this paper we present efficient algorithms for packet routing on the reconfigurable linear array and the reconfigurable two-dimensional mesh. We introduce algorithms that are efficient in the worst case and algorithms that are better on average. The time bounds presented are better than those achievable on the conventional mesh and previously known algorithms. We present two variants of the reconfigurable mesh. In the first model, M r , the processors are attached to a reconfigurable bus, the individual edge connections being bidirectional. In the second model, M mr , the processors are attached to two unidirectional buses. In this paper we present lower bounds and nearly matching upper bounds for packet routing on these two models. As a consequence, we solve two of the open problems mentioned in [9]. Received August 17, 1998; revised November 3, 1999.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号