首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 109 毫秒
1.
The distributed implementation of P systems on a cluster of processors has met with a bottleneck communications problem. When the number of membranes grows in the system, the network gets congested and the time taken to execute an evolution step degrades. In this article, we suggest a software architecture denominated “partially parallel evolution with partially parallel communication”, where some membranes are located in each processor, proxies are used to communicate with membranes located in different processors, and a policy of access control to the communications network is mandatory. With all this, we get a certain parallelism in the system and an acceptable functioning in communications. In addition to this, it establishes a series of equations that allows us to determine in the architecture the optimum number of processors needed, the time required to execute an evolution step, the number of membranes to be located in each processor, and the conditions to determine when it is best to use the distributed solution or the sequential one. This work was presented in part at the 12th International Symposium on Artificial Life and Robotics, Oita, Japan, January 25–27, 2007  相似文献   

2.
In the present paper we discuss a general approach to solve boundary value problems numerically in a parallel environment. The basic algorithm consists of two steps: the local step, where all the P available processors work in parallel, and the global step, where one processor solves a tridiagonal linear system of the order P. The main advantages of this approach are twofold: First, this suggested approach is very flexible, especially in the local step, and thus the algorithm can be used with any number of processors and with any of the SIMD or MIMD machines. Second, the communication complexity is very small and thus can be used as easily with shared memory machines. Several examples uing this strategy are discussed.  相似文献   

3.
针对网络并行环境的计算能力强而通信相对较慢的实际情况,给出了一种局域网上求解线性方程组的并行Gauss-Seidel迭代算法.该算法将线性方程组的系数矩阵及右端项按行分块,然后将分块的系数矩阵及右端项按卷帘方式存储在各处理机,每次迭代通过循环传送已求出的部分解分量以减少处理机间的通信开销,提高并行算法的效率.试验结果表明该算法具有较高的并行效率和加速比.  相似文献   

4.
This paper addresses a multiprocessor generalization of the preemptive open-shop scheduling problem. The set of processors is partitioned into two groups and the operations of the jobs may require either single processors in either group or simultaneously all processors from the same group. We consider two variants depending on whether preemptions are allowed at any fractional time points or only at integer time points. We reduce the former problem to solving a linear program in strongly polynomial time, while a restricted version of the second problem is solved by rounding techniques. Applications to course scheduling and hypergraph edge coloring are also discussed.  相似文献   

5.
§1.引言 对Boltzmann方程求解,采用连续截面、精确角分布的蒙特卡罗模拟(下简记为MC),可以获得理想的结果,然而MC方法计算耗时多是其相对其它方法的最大不足,并行计算和高加速比是克服这种不足的可行途径。  相似文献   

6.
The Connection Machine (CM) has been demonstrated to be an efficient and fast computational engine for the solution of many problems related to image processing. The high-level parallelism of the CM naturally fits to many large-scale data intensive applications. In this paper the implementation of parallel algorithms for the analysis of multidimensional images on the CM is presented. Different aspects in the analysis of multidimensional images are considered. In the field of artificial vision, the implementation of algorithms for the filtering of image sequences (both in space and time) and the estimation of the optical flow is described and some results in terms of accuracy and computation time are presented. The processing of three-dimensional images is investigated in the field of biomedical engineering. In this case the goal is the development of algorithms for the 3-D reconstruction of human body segments and their visualization. The parallel implementations exploit the fine grain parallelism allowed by the CM, processing each point of the data on a different processor. This mechanism is allowed by the possibility of dynamically reconfiguring the connectivity of the CM nodes and of defining a huge number of virtual processors. Moreover, as the CM processors operate on one-bit data, it is possible to tune the number of bits for each data point to match the accuracy required by the application.  相似文献   

7.
To support large numbers of model neurons, neuromorphic vision systems are increasingly adopting a distributed architecture, where different arrays of neurons are located on different chips or processors. Spike-based protocols are used to communicate activity between processors. The spike activity in the arrays depends on the input statistics as well as internal parameters such as time constants and gains. In this paper, we investigate strategies for automatically adapting these parameters to maintain a constant firing rate in response to changes in the input statistics. We find that under the constraint of maintaining a fixed firing rate, a strategy based upon updating the gain alone performs as well as an optimal strategy where both the gain and the time constant are allowed to vary. We discuss how to choose the time constant and propose an adaptive gain control mechanism whose operation is robust to changes in the input statistics. Our experimental results on a mobile robotic platform validate the analysis and efficacy of the proposed strategy.  相似文献   

8.
In this paper, the notion of multiprocessor automata is introduced. The multiprocessor automaton may be thought of as the simplest model of parallel computations: there are more than one processors reading information from the tape simultaneously, and the switching function for each processor depends on the inner states of all processors on the step of computations. We prove some results on the hierarchies of two-way and one-way multiprocessor automata languages.  相似文献   

9.
This paper discusses the development of an automatic mesh generation technique designed to operate effectively on multiple instruction multiple data (MIMD) parallel computers. The meshing approach is hierarchical, that is, model entities are meshed after their boundaries have been meshed. Focus is on the region meshing step. An octree is constructed to serve as a localization tool and for efficiency. The tree is also key to the efficient parallelization of the meshing process since it supports the distribution of load to processors. The parallel mesh generation procedure repartitions the domain to be meshed and applies on processor face removals until all face removals with local data have been performed. The portion of the domain to be meshed remaining is dynamically repartitioned at the octant level using an Inertial Recursive Bisection method and local face removals are reperformed. Migration of a terminal octant involves migration of the octant data and the octant's mesh faces and/or mesh regions. Results show relatively good speed-ups for parallel face removals on small numbers of processors. Once the three-dimensional mesh has been generated, mesh regions may be scattered across processors. Therefore, a final dynamic repartitioning step is applied at the region level to produce a partition ready for finite element analysis.  相似文献   

10.
宋彭涛  田斌  蒋烈辉  李继中  王九宇 《计算机工程》2010,36(21):280-282,285
提出一种基于ISS的多处理器嵌入式系统模拟方案。采用基于总线的互连方式,合理利用共享内存机制,解决不同处理器进程间的通信问题。提出全局时钟同步机制,实现对所有处理器单元的调度安排,使各处理器之间保持步调一致。分析表明,该方案能够实现对单个或多个同源或不同源目标代码的模拟与跟踪。  相似文献   

11.
Solution of independent sets of linear banded systems is a core part of implicit numerical algorithms. In this study we propose a novel pipelined Thomas algorithm with low parallelization penalty. We introduce two-step pipelined algorithms (PAs) formally and show that the idle processor time is invariant with respect to the order of backward and forward steps. Therefore, the parallelization efficiency of the PA cannot be improved directly. However, the processor idle time can be used if some lines have been computed by the time processors become idle. We develop the immediate backward pipelined Thomas algorithm (IB-PTA). The backward step is computed immediately after the forward step has been completed for the first portion of lines. The advantage of the IB-PTA over the basic PTA is the presence of solved lines, which are available for other computations, by the time processors become idle. Implementation of the IB-PTA is based on a proposed static processor schedule that switches between forward and backward computations and controls communication between processors. Computations are performed on the Cray T3E MIMD computer. Combination of the proposed IB-PTA with the “burn from two ends” algorithm shows low parallelization penalty.  相似文献   

12.
Fault-tolerant scheduling is an imperative step for large-scale computational Grid systems, as often geographically distributed nodes co-operate to execute a task. By and large, primary-backup approach is a common methodology used for fault tolerance wherein each task has a primary and a backup on two different processors. In this paper, we address the problem of how to schedule DAGs in Grids with communication delays so that service failures can be avoided in the presence of processors faults. The challenge is, that as tasks in a DAG have dependence on each other, a task must be scheduled to make sure that it will succeed when any of its predecessors fails due to a processor failure. We first propose a communication model and determine when communications between a backup and backups of its successors are necessary. Then we determine when a backup can start and its eligible processors so as to guarantee that every DAG can complete upon any processor failure. We develop two algorithms to schedule backups, which minimize response time and replication cost, respectively. We also develop a suboptimal algorithm which targets minimizing replication cost while not affecting response time. We conduct extensive simulation experiments to quantify the performance of the proposed algorithms.  相似文献   

13.
A constant-time algorithm for labeling the connected components of an N×N image on a reconfigurable network of N3 processors is presented. The main contribution of the algorithm is a novel constant-time technique for determining the minimum-labeled PE in each component. The number of processors used by the algorithm can be reduced to N/sup 2+(1/d/), for any 1⩽d⩽log N, if O(d) time is allowed  相似文献   

14.
《Parallel Computing》1988,7(2):249-251
We describe an O(log(n)) time with O(n) processors optimal algorithm for finding the maximal elements of a set. The model of parallel computation we consider is the CREW-PRAM, i.e. it is the synchronous shared memory model where concurrent reads are allowed but no two processors can simultaneously attempt to write in the same memory location (even if they are trying to write the same thing).  相似文献   

15.
Two techniques to perform an irregularly structured Gröbner basis computation (a basic method used in symbolic polynomial manipulation) on distributed memory machines are developed. The first technique, based on relaxation of dependencies present in the sequential computation, exploits coarse-grain parallelism. In this so-called relaxation approach, at every step, each processor reduces a local pair if available, communicates the result and status information from other processors, and updates the local set of pairs and basis. The basis is replicated on each processor while the set of pairs is distributed across the processors. The computation terminates when no pairs are left to be reduced on each processor. A relaxation algorithm based on this approach, along with its experimental results, are provided. The other technique, named quasi-barrier, is developed to enhance the performance of the relaxation algorithm. Using this technique, load balance and performance can be improved by synchronizing p processors when a fraction of the active tasks are completed. The performance enhancement is significant for large numbers of processors when the distribution of pair reduction times is close to exponential. The experimental results obtained on Intel Paragon and IBM SP2 demonstrate the effectiveness of these techniques.  相似文献   

16.
Two processors jointly provide a real-time service which can be completed by exactly one processor. Assuming each processor is allowed to announce only a one-bit information in a distributed way to decide which one should process the job, inevitably some of the jobs will get lost if only classical resources are used. In this paper, we proposed the distributed quantum entanglement sharing (DQES) model to share quantum entanglement with processors. Assisted with DQES model, not only the system dependability can be enhanced, but the faulty processor can also be identified. We also presented some possible applications such like database consistency, job scheduling, system dependability, and reliable communication protocols.  相似文献   

17.
This paper introduces the parallelization on a distributed memory multicomputer of two iterative methods for finding all the roots of a given polynomial. The parallel algorithms share the computation of the roots among the processors and perform a total exchange of the data at each step. Since the amount of communications is the main drawback of this approach, we study the effect of the network topology on the performance of the algorithms. Particularly, we show that among the different classical processors networks topologies (ring, 2d-torus or n-cube), the hypercube topology minimizes the communications. For each topology is computed the optimal number of processors. Experiments on the hypercube FPS T40 illustrate the results.  相似文献   

18.
Caprani  Ole  Madsen  Kaj  Stauning  Ole 《Reliable Computing》1997,3(3):269-275
In the search for regions that contain fixed points of a real function of several variables, tests based on interval calculations can be used to establish existence or non-existence of fixed points in regions that are examined in the course of the search. The search can e.g. be performed as a synchronous (sequential) interval iteration: In each iteration step all components of the iterate are calculated based on the previous iterate. In this case it is straight forward to base simple interval existence and non-existence tests on the calculations done in each step of the iteration. The search can also be performed as an asynchronous (parallel) iteration: Only a few components are changed in each step and this calculation is in general based on components from different previous iterates. For the asynchronous iteration it turns out that simple tests of existence and non-existence can be based on the component wise calculations done in the course of the iteration. These component wise tests are useful for parallel implementation of the search, since the tests can then be performed local to each processor and only when a test is successful does a processor communicate this result to other processors.  相似文献   

19.
A parallel sorting algorithm using cooperating heaps in a linear array of processors is presented. It can sort a sequence whose length is much larger than the number of processors. Because the output begins one step after all the items have been input, sorting n items requires 2n + 1 steps. Two independent modifications of the algorithm are possible; one tries to reduce the number of processors used, and the other can sort more items on the same array.  相似文献   

20.
TOUGH2 is a widely used reservoir simulator for solving subsurface flow related problems such as nuclear waste geologic isolation, environmental remediation of soil and groundwater contamination, and geothermal reservoir engineering. It solves a set of coupled mass and energy balance equations using a finite volume method. This contribution presents the design and analysis of a parallel version of TOUGH2. The parallel implementation first partitions the unstructured computational domain. For each time step, a set of coupled non-linear equations is solved with Newton iteration. In each Newton step, a Jacobian matrix is calculated and an ill-conditioned non-symmetric linear system is solved using a preconditioned iterative solver. Communication is required for convergence tests and data exchange across partitioning borders. Parallel performance results on Cray T3E-900 are presented for two real application problems arising in the Yucca Mountain nuclear waste site study. The execution time is reduced from 7504 seconds on two processors to 126 seconds on 128 processors for a 2D problem involving 52,752 equations. For a larger 3D problem with 293,928 equations the time decreases from 10,055 seconds on 16 processors to 329 seconds on 512 processors.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号