共查询到20条相似文献,搜索用时 15 毫秒
1.
O. L. Bandman 《Programming and Computer Software》2001,27(4):170-182
In the paper, a systematic discussion is made of state of the art theory and applications of fine-grained (massive) parallelism, which is permanently being adopted in computational mathematics. All known models of fine-grained computations (cellular automaton, neural and cellular neural networks, statistical automata, etc.) are represented in terms of a unique formalism, the so-called parallel substitution algorithm (PSA), which made it possible, on the one hand, to highlight common properties of the models and, on the other hand, to demonstrate expressiveness capabilities of the PSA. Theoretical and experimental results of studies on applications of fine-grained algorithms to modeling of spatial dynamics of reaction–diffusion and molecular processes are presented. Promising prospects of their application are substantiated both for the creation of special processors designed for the implementation of the algorithms and for the implementation of the algorithms on multiprocessor systems. 相似文献
2.
L. D. Popov 《Automation and Remote Control》2007,68(5):901-911
Various schemes are considered of the parallel implementation of the branch and bound method, as applied to multiprocessor computing systems (clusters) with the distributed memory. In the language of informal automata, questions are set out of the organization of the exchange of data and signals within the cluster, which afford the asynchronous operation of its processors. Common ideas are illustrated by the example of the classical traveling salesman problem and data of numerical experiments performed on the multiprocessor computing system-100 (MCS-100) are given. 相似文献
3.
The Cydra 5 is a heterogeneous multiprocessor system that targets small work groups or departments of scientists and engineers. The two types of processors are functionally specialized for the different components of the work load found in a departmental setting. The Cydra 5 numeric processor, based on a directed-data-flow architecture, provides consistently high performance on a broader class of numerical computations. The interactive processors offload all nonnumeric work from the numeric processor, leaving it free to spend all its time on the numeric application. The I/O processors permit high-bandwidth I/O transitions with minimal involvement from the interactive or numeric processors. The system architecture and data-flow architecture are described. The numeric processor decisions and tradeoffs are examined, and the main memory system is discussed. Some reflections on the design issues are offered 相似文献
4.
《CVGIP: Image Understanding》1991,53(1):31-45
This paper proposes a new parallel architecture, which has the potential to support low-level image processing as well as intermediate and high-level vision analysis tasks efficiently. The integrated architecture consists of an SIMD mesh of processors enhanced with multiple broadcast buses, and MIMD multiprocessor with orthogonal access buses, and a two-dimensional shared memory array. Low-level image processing is performed on the mesh processor, while intermediate and high-level vision analysis is performed on the orthogonal multiprocessor. The interaction between the two levels is supported by a common shared memory. Concurrent computations and I/O are made possible by partitioning the memory into disjoint spaces so that each processor system can access a different memory space. To illustrate the power of such a two-level system, we present efficient parallel algorithms for a variety of problems from low-level image processing to high-level vision. Representative problems include matrix based computations, histogramming and key counting operations, image component labeling, pyramid computations, Hough transform, pattern clustering, and scene labeling. Through computational complexity analysis, we show that the integrated architecture meets the processing requirements of most image understanding tasks. 相似文献
5.
PAWEL SIWAK 《国际通用系统杂志》2013,42(1-3):181-129
Certain special computational models called the filter automata (FA) are presented. We use them in iterative, pipelined processing of discrete sequences. Such computations reveal characteristic moving, periodic objects, with soliton-like properties referred to as the filtrons. Filter automata play the role of a transmitting medium or a field, while filtrons behave like waves or particles We characterise FA by showing their distinguishing features, and fillrons—by introducing some parameters, such as edge numbers, number of FA cycles involved, distribution of internal idle pulses, etc. We show that filtrons may be generated in parallel by a multiprocessor ring net. This means that each filtron has its associated ring computation. If such a computation succeeds then its companion filtron exists, too. The link between filtrons and their ring computations is established simply by the filtron edge numbers. This gives a method of calculating a whole filtron if its edge number and supporting FA are known, and of looking for possible filter automata capable of supporting a prescribed filtron A number of FA computational phenomena, recently discovered, are also presented. Some of them call for physical experiments to verify to what extent the filter automata may model physical reality. We believe that such experiments could be performed in optical fibres using solitons of light. 相似文献
6.
Speedup and efficiency, two measures for performance of pipelined computers, are now used to evaluate performance of parallel algorithms for multiprocessor systems. However, these measures consider only the computation time and number of processors used and do not include the number of the communication links in the system. The author defines two new measures, cost effectiveness and time-cost effectiveness, for evaluating performance of a parallel algorithm for a multiprocessor system. From these two measures two characterization factors for multiprocessor systems are defined and used to analyze some well-known multiprocessor systems. It is found that for a given penalty function, every multiprocessor architecture has an optimal number of processors that produces maximum profit. If too many processors are used, the higher cost of the system reduces the profit obtained from the faster solution. On the other hand, if too few processors are used, the penalty paid for taking a longer time to obtain the solution reduces the profit 相似文献
7.
Algorithm-based fault tolerance has been proposed as a technique to detect incorrect computations in multiprocessor systems. In algorithm-based fault tolerance, processors produce data elements that are checked by concurrent error detection mechanisms. We investigate the efficacy of this approach for diagnosis of processor faults. Because checks are performed on data elements, the problem of location of data errors must first be solved. We propose a probabilistic model for the faults and errors in a multiprocessor system and use it to evaluate the probabilities of correct error location and fault diagnosis. We investigate the number of checks that are necessary to guarantee error location with high probability. We also give specific check assignments that accomplish this goal. We then consider the problem of fault diagnosis when the locations of erroneous data elements are known. Previous work on fault diagnosis required that the data sets produced by different processors be disjoint. We show, for the first time, that fault diagnosis is possible with high probability, even in systems where processors combine to produce individual data elements 相似文献
8.
Olga Bandman 《The Journal of supercomputing》2011,57(2):121-131
A cellular automata (CA) approach is proposed for simulating a fluid flow through porous materials with tortuous channels
at pore level. The approach aims to combine CA methods both for constructing computer representation of porous material morphology
and for simulating fluid flow through it. Morphology representation is obtained using CA whose evolution exhibits self-organization
and results in a stable configuration. The latter is then used for Lattice Gas CA application to simulate fluid flow through
a porous material specimen and compute its permeability properties. Special boundary conditions are introduced allowing for
different smoothness of solid pore walls surface. The model has been tested on a small 2D fragment in a PC and then implemented
to investigate a porous carbon electrode of a hydrogen fuel cell on 128 processors of a multiprocessor cluster. 相似文献
9.
A polynomial-time approximation scheme (PTAS) is derived for the partitioned EDF scheduling of implicit-deadline sporadic task systems upon unrelated multiprocessor platforms that are comprised of a constant number of distinct types of processors. This generalizes earlier results showing the existence of polynomial-time approximation schemes for the partitioned EDF scheduling of implicit-deadline sporadic task systems on (1) identical multiprocessor platforms, and (2) unrelated multiprocessor platforms containing a constant number of processors. 相似文献
10.
11.
Load balancing strategies for a parallel ray-tracing system based on constant subdivision 总被引:1,自引:1,他引:0
Hiroaki Kobayashi Satoshi Nishimura Hideyuki Kubota Tadao Nakamura Yoshiharu Shigei 《The Visual computer》1988,4(4):197-209
Static and dynamic load balancing strategies for a multiprocessor system for a ray tracing algorithm based on constant subdivision are presented. An object space is divided into regular cubes (subspaces), whose boundary planes are perpendicular to the coordinate axes, and these are allocated to the processors in the system. Here, load balancing among the processors is the most important problem. Firstly, in a category of static load balancing, strategies for mapping the subspaces into the processors are evaluated by simulation. Moreover, we propose a hierarchical multiprocessor system in order to realize dynamic load balancing with the static one. Its architecture can overcome the limitation of the static load balancing in a large scale multiprocessor system. 相似文献
12.
Scheduling with dynamic voltage/speed adjustment using slack reclamation in multiprocessor real-time systems 总被引:1,自引:0,他引:1
Zhu D. Melhem R. Childers B.R. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(7):686-700
The high power consumption of modern processors becomes a major concern because it leads to decreased mission duration (for battery-operated systems), increased heat dissipation, and decreased reliability. While many techniques have been proposed to reduce power consumption for uniprocessor systems, there has been considerably less work on multiprocessor systems. In this paper, based on the concept of slack sharing among processors, we propose two novel power-aware scheduling algorithms for task sets with and without precedence constraints executing on multiprocessor systems. These scheduling techniques reclaim the time unused by a task to reduce the execution speed of future tasks and, thus, reduce the total energy consumption of the system. We also study the effect of discrete voltage/speed levels on the energy savings for multiprocessor systems and propose a new scheme of slack reservation to incorporate voltage/speed adjustment overhead in the scheduling algorithms. Simulation and trace-based results indicate that our algorithms achieve substantial energy savings on systems with variable voltage processors. Moreover, processors with a few discrete voltage/speed levels obtain nearly the same energy savings as processors with continuous voltage/speed, and the effect of voltage/speed adjustment overhead on the energy savings is relatively small. 相似文献
13.
The Data Locality of Work Stealing 总被引:1,自引:0,他引:1
This paper studies the data locality of the work-stealing scheduling algorithm on hardware-controlled shared-memory machines,
where movement of data to and from the cache is solely controlled by the hardware. We present lower and upper bounds on the
number of cache misses when using work stealing, and introduce a locality-guided work-stealing algorithm and its experimental
validation.
{As a lower bound, we show that a work-stealing application that exhibits good data locality on a uniprocessor may exhibit
poor data locality on a multiprocessor. In particular, we show a family of multithreaded computations G
n
whose members perform Θ(n) operations (work) and incur a constant number of cache misses on a uniprocessor, while even on two processors the total
number of cache misses soars to Ω(n) . On the other hand, we show a tight upper bound on the number of cache misses that nested-parallel computations, a large,
important class of computations, incur due to multiprocessing. In particular, for nested-parallel computations, we show that
on P processors a multiprocessor execution incurs an expected
more misses than the uniprocessor execution. Here m is the execution time of an instruction incurring a cache miss, s is the steal time, C is the size of cache, and T
∈
fty is the number of nodes on the longest chain of dependencies. Based on this we give strong execution time bounds for nested-parallel
computations using work stealing.}
For the second part of our results, we present a locality-guided work-stealing algorithm that improves the data locality
of multithreaded computations by allowing a thread to have an affinity for a processor. Our initial experiments on iterative
data-parallel applications show that the algorithm matches the performance of static-partitioning under traditional work loads
but improves the performance up to 50% over static partitioning under multiprogrammed work loads. Furthermore, locality-guided
work stealing improves the performance of work stealing up to 80%. 相似文献
14.
Scheduling program tasks on processors is at the core of the efficient use of multiprocessor systems. Most task-scheduling problems are known to be NP-Hard and, thus, heuristics are the method of choice in all but the simplest cases. The utilization of acknowledged sets of benchmark-problem instances is essential for the correct comparison and analysis of heuristics. Yet, such sets are not available for several important classes of scheduling problems, including multiprocessor scheduling problem with communication delays (MSPCD) where one is interested in scheduling dependent tasks onto homogeneous multiprocessor systems, with processors connected in an arbitrary way, while explicitly accounting for the time required to transfer data between tasks allocated to different processors. We propose test-problem instances for the MSPCD that are representative in terms of number of processors, type of multiprocessor architecture, number of tasks to be scheduled, and task graph characteristics (task execution times, communication costs, and density of dependencies between tasks). Moreover, we define our task-graph generators in a way appropriate to ensure that the corresponding problem instances obey the theoretical principles recently proposed in the literature. 相似文献
15.
Cellular automata (CA) are able to produce a global behavior from local interactions between their units. They have been applied to the task scheduling problem in multiprocessor systems in a very distinguished way. As this problem is NP-Complete, heuristics and meta-heuristics are usually employed. However, these techniques must always start the scheduling process from scratch for each new parallel application given as input. On the other hand, the main advantage to use CA for scheduling is the discovery of rules while solving one application and their subsequent reuse in other instances. Recently studies related to CA-based scheduling have shown relevant approaches as the use of synchronous updating in CA evolution and good results in multiprocessor systems with two processors. However, some aspects, such as the low performance of CA-based schedulers in architectures with more than two processors and during the reuse of the discovered rules, need to be investigated. This paper presents two new models to improve CA-based scheduling to deal with such aspects. The first proposal refers to the employment of a construction heuristic to initialize CA evolution and the second one is a new neighborhood model able to capture the dependence and relations strength among the tasks in a very simple way. It was named pseudo-linear neighborhood. An extensive experimental evaluation was performed using graphs of parallel programs found in the literature and new ones randomly generated. Experimental analysis showed the combined application of both techniques makes the search for CA transition rules during learning stage more robust and leads to a significant gain when considering the reuse of them on real-world conditions. 相似文献
16.
In this second part of this two-part article, we present two examples of integrating heterogeneous processors and show the limitation of integrating processors without native support for cache coherence. Finally, we discuss the Verilog simulation results of applying our techniques to actual heterogeneous multiprocessor platforms. Experiments with actual heterogeneous multiprocessor platforms on a shared-bus measure the effectiveness of two cache coherence techniques. This integration approach, snoop-hit buffer, and the accompanying region-based cache coherence approach yield significant speedups compared to a pure software solution. 相似文献
17.
Summary General models of multiprocessor systems in which processors are functionally dedicated are described. In these models, processors are divided into different types. A task can be assigned only to a processor of certain types. Clearly, the model of multiprocessor systems with identical processors is a special case of our models. These models also include the job shop problem in which there is exactly one processor of each type. Worst case performance bounds of priority-driven schedules are obtained.This work was supported by the National Science Foundation under Grants NSFDCR 72-03740 and NSFMCS 73-03408 相似文献
18.
周书明 《计算技术与自动化》2008,27(4):122-127
多处理机系统随着系统规模的不断扩大,其组件的脆弱性随着增大。定位和替换故障处理器对于系统保持高可靠性能至关重要。故障诊断就是系统通过自身的测试来识别故障处理机的过程。基于比较的诊断分析是多计算机系统故障诊断分析的有效方法。证明煎饼网络(Pancake Network)在比较诊断模型下的诊断度。 相似文献
19.
20.
Peter J. Varman Balakrishna R. Iyer Donald J. Haderle Stephen M. Dunn 《Parallel Computing》1990,15(1-3):165-177
An efficient parallel algorithm for merging two sorted lists is presented. The algorithm is based on a novel partitioning algorithm that splits the two lists among the processors, in a way that ensures load balance during the merge. The partitioning algorithm can itself be efficiently parallelized, allowing the solution to scale with increased numbers of processors. A shared memory multiprocessor is assumed. The time complexity for partitioning and merging is O(N/p + log N), where p is the number of processors and N is the total number of elements in the two lists. Implementation results on a twenty node Sequent Symmetry multiprocessor are also presented. 相似文献