Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
2.
The paper describes the implementation of the Successive Overrelaxation (SOR) method on an asynchronous multiprocessor computer for solving large linear systems. The parallel algorithm is derived by dividing the serial SOR method into noninterfering tasks, which are then combined with an optimal schedule for a feasible number of processors. The important features of the algorithm are that it (i) achieves a speedup S_p of O(N/3) and an efficiency E_p of 2/3 using P = [N/2] processors, where N is the number of equations; (ii) contains a high level of inherent parallelism, while its convergence theory is the same as that of its sequential counterpart; and (iii) may be modified to use block methods in order to minimise the overhead due to communication and synchronisation of the processors.
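For orientation, the serial SOR iteration that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the textbook method only, not the paper's parallel schedule; the function name `sor_solve`, the relaxation factor `omega=1.25`, and the stopping rule are assumptions chosen for the example.

```python
def sor_solve(A, b, omega=1.25, tol=1e-10, max_iter=10_000):
    """Solve A x = b with serial Successive Overrelaxation (textbook form).

    A is a list of rows, b a list of right-hand-side values. The defaults
    for omega, tol and max_iter are illustrative assumptions.
    """
    n = len(b)
    x = [0.0] * n  # initial guess
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            # Uses already-updated x[j] for j < i and old x[j] for j > i.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel_value = (b[i] - sigma) / A[i][i]
            new_xi = (1.0 - omega) * x[i] + omega * gauss_seidel_value
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:
            return x
    raise RuntimeError("SOR did not converge within max_iter sweeps")
```

For example, calling `sor_solve` on the tridiagonal system with rows `[4, -1, 0]`, `[-1, 4, -1]`, `[0, -1, 4]` and right-hand side `[2, 4, 10]` converges to approximately `[1, 2, 3]`. The inner loop over `i` is the part whose data dependences a parallel version has to respect when splitting the sweep into tasks.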

3.
A scalable backplane topology which allows a practically unlimited number of modules with identical interfaces is presented. Short, buffered, point-to-point connections overcome clock skew problems. Synchronized, pipelined data transfer operations ensure high throughput and reasonably low latency times for fine-grain parallel algorithms. A simple bus interface logic without any special hardware configuration guarantees a cheap implementation with standard FPGAs. The measured performance of our FPGA-based prototype with a 32-bit-wide data bus shows a throughput of 160 Mbytes/s for each module, with 75 ns latency between modules.

4.
A new approach to data layout and load balancing is proposed for hierarchical multiprocessor systems running relational databases. A database multiprocessor model is described that enables simulation and examination of arbitrary hierarchical multiprocessor configurations in the context of on-line transaction processing applications. An important subclass of symmetrical multiprocessor hierarchies is considered, and a new data layout strategy based on partial mirroring is proposed for them. The disk space used to replicate the data is evaluated analytically. For symmetrical hierarchies having certain regularity, theorems estimating the effort required for replica formation are proved. An efficient load-balancing method based on the partial mirroring technique is proposed. The methods described are oriented toward clusters and Grid systems.

5.
Requirements for tools analyzing the performance of parallel programs with respect to parallel and sequential parts, overhead, and load balance, as well as available tools for programs parallelized with Cray Microtasking or Autotasking, are described.

6.
In this paper we report on an event-based stochastic architecture for the Adams/MacKay Bayesian Online Change Point Detection (BOCPD) algorithm [1]. In the stochastic computational structures, probabilities are represented natively as stochastic events, and computation is carried out directly with these probabilities rather than with probability density functions. A fully programmable BOCPD processor is synthesized in VHDL. The BOCPD algorithm is applied to foreground/background image segmentation with online learning. Running on a single Kintex 7 FPGA (Opal Kelly XEM7350-K410T), the architecture is capable of processing a 160 × 120 pixel image in real time at 10 frames per second.

7.
王萌  黄振  陆建华 《微计算机信息》2007,23(26):201-203
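The pulse direction of arrival (DOA) is an important parameter that can be exploited in pulse signal sorting. At present, DOA-based pulse sorting relies on traditional serial clustering algorithms, whose real-time performance is poor. For clustering by threshold segmentation, this paper designs a real-time clustering algorithm based on a parallel pipelined structure, so that the clustering of a single DOA can be completed within a single cycle. By controlling cases in which the data split into too many clusters, the stability and effectiveness of the algorithm are ensured. The paper also describes how the algorithm is implemented on an FPGA, reports its real-time performance on a Xilinx V2P chip, and gives a comparative analysis of its clustering performance.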
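To make the idea of threshold-based clustering of DOA values concrete, here is a minimal serial sketch. It is not the paper's parallel pipelined FPGA design; the function name `threshold_cluster`, the running-mean update of cluster centers, and the neglect of angle wraparound at 0/360 degrees are all assumptions made for illustration.

```python
def threshold_cluster(doas, threshold):
    """Assign each DOA (in degrees) to the nearest cluster center within
    `threshold`; otherwise open a new cluster.

    Serial illustration only. Angle wraparound is ignored (assumption).
    Returns (labels, centers).
    """
    centers = []  # running mean of each cluster
    counts = []   # number of pulses in each cluster
    labels = []   # cluster index assigned to each input pulse
    for doa in doas:
        best, best_dist = None, threshold
        for k, center in enumerate(centers):
            dist = abs(doa - center)
            if dist <= best_dist:
                best, best_dist = k, dist
        if best is None:
            # No center is close enough: start a new cluster.
            centers.append(doa)
            counts.append(1)
            labels.append(len(centers) - 1)
        else:
            # Update the running mean of the matched cluster.
            counts[best] += 1
            centers[best] += (doa - centers[best]) / counts[best]
            labels.append(best)
    return labels, centers
```

With `threshold_cluster([10.0, 10.5, 50.0, 9.8, 50.4], 2.0)` the pulses fall into two clusters with labels `[0, 0, 1, 0, 1]`. The inner loop over existing centers is the comparison that a hardware design could evaluate for all clusters at once, which is one plausible reading of how a single DOA can be clustered in one cycle.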

8.
Models of parallel computations are considered for a wide class of data processing programs. Properties of programs are investigated, and approaches to parallelizing sequential data processing programs and designing parallel programs are proposed. Computation optimization problems are formulated. Translated from Kibernetika, No. 4, pp. 1–8, 42, July–August, 1989.

9.
Due to advances in fiber optics and VLSI technology, interconnection networks that allow simultaneous broadcasts are becoming feasible. Distributed shared memory (DSM) implementations on such networks promise high performance even for small applications with small granularity. This paper, after summarizing the architecture of one such implementation called the Simultaneous Multiprocessor Optical Exchange Bus (SOME-Bus), presents simple algorithms for improving the performance of parallel programs running on the SOME-Bus multiprocessor implementing cache-coherent DSM. The algorithms are based on run-time data redistribution via a dynamic page migration protocol. They use memory access references together with the information of average channel utilization, average channel waiting time, number of messages in the channel queue or short-term average channel waiting time reported by each node and gathered by hardware monitors to make correct decisions related to the placement of shared data. Simulations with four parallel codes on a 64-processor SOME-Bus show that the algorithms yield significant performance improvements, such as reductions in execution times, number of remote memory accesses, average channel waiting times and average network latencies, and an increase in average channel utilization.

10.
Parallel loops account for the greatest amount of parallelism in numerical programs. Executing nested loops in parallel with low run-time overhead is thus very important for achieving high performance in parallel processing systems. However, in parallel processing systems with caches or local memories in memory hierarchies, "thrashing problems" may arise whenever data move back and forth between the caches or local memories of different processors. Previous techniques can only deal with the rather simple cases with one linear function in the perfectly nested loop. In this paper, we present a parallel program optimizing technique called hybrid loop interchange (HLI) for the cases with multiple linear functions and loop-carried data dependences in the nested loop. With HLI we can easily eliminate or reduce the thrashing phenomena without reducing the program parallelism.

11.
Conclusion. The use of linear-time automatic algorithms for sequential program parallelization substantially expands the scope of MCS. Parallel implementation is justified from time considerations even when the application program is executed only a few times without modification (only once when the macrooperation system is introduced), i.e., for frequently reprogrammed jobs. Only linear-time parallel-branch algorithms ensure that the execution frequency of the application program that justifies parallel implementation is independent of the number of statements in the program and allow the size (computational complexity) of the macrooperations treated as parallelization units to be varied within wide limits, both in the direction of greater aggregation and in the direction of refinement. At the same time, the linear-time parallel-program generation algorithms proposed in this paper produce programs of acceptable quality. Translated from Kibernetika i Sistemnyi Analiz, No. 5, pp. 170–179, September–October, 1995.

12.
Eclipse is a scalable architecture template for designing data-dependent stream-processing subsystems of media-processing SoCs. It combines application configuration flexibility with the efficiency of function-specific coprocessors that concurrently execute the tasks of one or more applications.

13.
J.R. Woodwark 《Displays》1984,5(2):97-103
A simple way of using many processors in picture generation is to allocate each one to a region of the display screen. A device is proposed which uses such an architecture to produce pictures directly from set-theoretic solid models. Alternative strategies for partitioning the screen area between processors are advanced and evaluated by means of a simulation.

14.
15.
The problem of preemptive scheduling in a real-time multiprocessor computing system with release time/deadline intervals is investigated. Approximate algorithms based on the generalization of a single-processor algorithm of relative priority are developed and compared to the exact maximum flow algorithm. An algorithm has been developed for the case where requests for the tasks occur periodically with given periods. An algorithm has also been developed for determining the values of processor performance for which there exists an admissible schedule for a given set of tasks with release time/deadline intervals.

16.
17.
The Journal of Supercomputing - The elliptic curve cryptosystem is a public-key cryptosystem that has received more attention in recent years due to its higher security with smaller key size when compared...

18.
Artificial intelligence methods appear to be particularly well suited for control design when only inexact prior knowledge about the system to be controlled is available. Design tasks that can be solved include learning control from scratch, improving partial control knowledge, and controller tuning. The paper illustrates these approaches in two case studies, both dealing with nonlinear unstable systems: inverted pendulum control and position control of a floating object. A comparison with classical model-based control design approaches is also provided.

19.
Lars Lundberg 《Software》1989,19(8):787-800
This paper describes the development of a parallel Ada system on an experimental MIMD multiprocessor. The system enables a single unmodified Ada program, with a number of tasks, to execute in parallel on different processors. Allocation and migration strategies are controlled by mechanisms in the run-time system, and are thus transparent to the Ada programmer. The parallel Ada system is based on a validated portable front-end compiler. Implementation issues related to the multiprocessor environment are pointed out, and solutions to these issues are suggested. The experimental multiprocessor environment, consisting of both hardware and software, is described. Applicable resource allocation strategies in, and feasible experiments with, the Ada system are discussed. The complete experimental system provides unique possibilities to experiment with, and monitor the effects of, design decisions at different levels in a multiprocessor environment.

20.
Software for safety-critical systems, such as avionic, medical, defense, and manufacturing systems, must be highly reliable since failures can have catastrophic consequences. While existing methods, such as formal techniques, testing, and fault-tolerant software, can significantly enhance software reliability, they have some limitations in achieving ultrahigh reliability requirements. Formal methods are not able to cope with specification faults, testing is not able to provide high assurance, and fault-tolerant software based on diverse designs is susceptible to common-mode failures. We present a new approach that starts with a decomposition of the system requirements into a conjunction of subtasks (goals and constraints). The system state space is then projected onto a restricted space that is specialized for a subtask. The control problem corresponding to each subtask is solved and validated in its restricted “view” of the system state space. To allow the programs for the individual subtasks to be easily composed together, the model for each subtask is relational rather than functional, i.e., it represents a set of control trajectories for each input rather than just one trajectory. The overall system is obtained by composing the models for the subtasks using well-defined set intersection and union operations. The relational approach has several significant advantages. With appropriate priority assignments, it provides strong guarantees that the safety-critical components are immune to defects in other components of the system. Also, the system reliability can be rigorously derived from the component reliabilities. This significantly reduces the validation effort since the number of states and transitions in the decomposition is a fraction of those in the overall system. 
The system can be composed from its components either statically or dynamically; the latter facilitates on-the-fly maintenance as well as incorporation of advanced adaptive and evolving control programs. The paper contains a detailed example to illustrate the relational approach. This revised version was published online in June 2006 with corrections to the Cover Date.
