共查询到20条相似文献,搜索用时 0 毫秒
1.
We present lock-free and wait-free universal constructions for implementing large shared objects. Most previous universal constructions require processes to copy the entire object state, which is impractical for large objects. Previous attempts to address this problem require programmers to explicitly fragment large objects into smaller, more manageable pieces, paying particular attention to how such pieces are copied. In contrast, our constructions are designed to largely shield programmers from this fragmentation. Furthermore, for many objects, our constructions result in lower copying overhead than previous ones. Fragmentation is achieved in our constructions through the use of load-linked, store-conditional, and validate operations on a “large” multiword shared variable. Before presenting our constructions, we show how these operations can be efficiently implemented from similar one-word primitives 相似文献
2.
3.
4.
《国际计算机数学杂志》2012,89(3-4):373-390
The proposed ordering scheme is the fusion of Jess and Kees method and the Minimum degree ordering, that operates on a non-chordal graph. The method produces a fill preserving ordering for all the test problems selected from the Boeing-Harwell Sparse matrix collection. The extent of parallelism extracted is nearly the same as that obtained by using Liu's tree rotation heuristic. 相似文献
5.
Simultaneous multithreaded vector architectures combine the best of data-level and instruction-level parallelism and perform better than either approach could separately. Our design achieves performance equivalent to executing 15 to 26 scalar instructions/cycle for numerical applications 相似文献
6.
Holland's schema theorem (an inequality) may be viewed as an attempt to understand genetic search in terms of a coarse graining of the state space. Stephens and Waelbroeck developed that perspective, sharpening the schema theorem to an equality. Of particular interest is a "form invariance" of their equations; the form is unchanged by the degree of coarse graining. This paper establishes a similar form invariance for the more general model of Vose et al. and uses the attendant machinery as a springboard for an interpretation and discussion of implicit parallelism. 相似文献
7.
8.
9.
We discuss the advantages and limitations of the “nonprocedural” mode of expression, in the context of languages for programming, system specification, and database access. A language is introduced which has some relevance to all three of the above processes. The syntax of LEGOL is based on the relational algebra, but incorporates procedural control structures for use when the underlying application logic demands. Using this language, solutions can be suggested for some example stock-control problems which are difficult to handle in a purely non-procedural way. 相似文献
10.
Krishnaswamy R Kim CE 《IEEE transactions on pattern analysis and machine intelligence》1987,(2):316-321
The slope of digital line segments is defined and an algorithm to evaluate it is presented. Parallelism and perpendicularity of two digital line segments are also defined. Finally, rectangular digital regions are defined and characterized, and an algorithm that determines whether or not a given digital region is a digital rectangle is presented. 相似文献
11.
A new technique for estimating and understanding the speed improvement that can result from executing a program on a parallel computer is described. The technique requires no additional programming and minimal effort by a program's author. The analysis begins by tracing a sequential program. A parallelism analyzer uses information from the trace to simulate parallel execution of the program. In addition to predicting parallel performance, the parallelism analyzer measures many aspects of a program's dynamic behavior. Measurements of six substantial programs are presented. These results indicate that the three symbolic programs differ substantially from the numeric programs and, as a consequence, cannot be automatically parallelized with the same compilation techniques 相似文献
12.
This paper defines an abstract interpreter for logic programs based on a system of asynchronous, independent processors which communicate only by passing messages. Each logic program is automatically partitioned and its pieces distributed to available processors. This approach permits two distinct forms of parallelism. OR parallelism arises from evaluating nondeterministic choices simultaneously. AND parallelism arises when a computation involves independent, but necessary, subcomputations. Algorithms like quicksort, which follow a divide and conquer approach, usually exhibit this form of parallelism. These two forms of parallelism are conjointly achieved by the parallel interpreter. 相似文献
13.
Peter M Kogge 《Parallel Computing》1985,2(3):243-253
One of today's most popular computing folktheorems states that true parallel processing and conventional computing techniques are mutually incompatible. The term Von Neumann bottleneck summarizes what many feel are the basic stumbling blocks preventing the successful application of parallelism in day-to-day computing. This paper reviews an alternative approach, based on function-based computing, that to a large degree eliminates or avoids much of the Von Neumann bottleneck, and offers opportunities for the exploitation of parallelism in ways not even conceivable in classical computing.Topics covered include a review of the Von Neumann bottleneck and imperative languages, the mathematical foundation of functional computing, namely lambda calculus, how this foundation provides opportunities for parallelism, and characteristics of the design space for implementation of these concepts in real computing hardware. 相似文献
14.
Manjikian N. Abdelrahman T.S. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(2):193-209
Loop fusion improves data locality and reduces synchronization in data-parallel applications. However, loop fusion is not always legal. Even when legal, fusion may introduce loop-carried dependences which prevent parallelism. In addition, performance losses result from cache conflicts in fused loops. In this paper, we present new techniques to: (1) allow fusion of loop nests in the presence of fusion-preventing dependences, (2) maintain parallelism and allow the parallel execution of fused loops with minimal synchronization, and (3) eliminate cache conflicts in fused loops. We describe algorithms for implementing these techniques in compilers. The techniques are evaluated on a 56-processor KSR2 multiprocessor and on a 18-processor Convex SPP-1000 multiprocessor. The results demonstrate performance improvements for both kernels and complete applications. The results also indicate that careful evaluation of the profitability of fusion is necessary as more processors are used 相似文献
15.
Gabriel Ciobanu Linqiang Pan Gheorghe Păun Mario J. Pérez-Jiménez 《Theoretical computer science》2007
A current research topic in membrane computing is to find more realistic P systems from a biological point of view, and one target in this respect is to relax the condition of using the rules in a maximally parallel way. We contribute in this paper to this issue by considering the minimal parallelism of using the rules: if at least a rule from a set of rules associated with a membrane or a region can be used, then at least one rule from that membrane or region must be used, without any other restriction (e.g., more rules can be used, but we do not care how many). Weak as it might look, this minimal parallelism still leads to universality. We first prove this for the case of symport/antiport rules. The result is obtained both for generating and accepting P systems, in the latter case also for systems working deterministically. Then, we consider P systems with active membranes, and again the usual results are obtained: universality and the possibility to solve NP-complete problems in polynomial time (by trading space for time). 相似文献
16.
Subword parallelism with MAX-2 总被引:1,自引:0,他引:1
MAX-2 illustrates how a small set of instruction extensions can provide subword parallelism to accelerate media processing and other data-parallel programs. This article proposes that subword parallelism-parallel computation on lower precision data packed into a word-is an efficient and effective solution for accelerating media processing. As an example, it describes MAX-2, a very lean, RISC-like set of media acceleration primitives included in the 64-bit PA-RISC 2.0 architecture. Because MAX-2 strives to be a minimal set of instructions, the article discusses both instructions included and excluded. Several examples illustrate the use of MAX-2 instructions, which provide subword parallelism in a word-oriented general-purpose processor at essentially no incremental cost 相似文献
17.
Asimina Maniopoulou Erlend R.M. Davidson Ricardo Grau-Crespo Aron Walsh Ian J. Bush C. Richard A. Catlow Scott M. Woodley 《Computer Physics Communications》2012,183(8):1696-1701
For many years ab initio electronic structure calculations based upon density functional theory have been one of the main application areas in high performance computing (HPC). Typically, the Kohn–Sham equations are solved by minimisation of the total energy functional, using a plane wave basis set for valence electrons and pseudopotentials to obviate the representation of core states. One of the best known and widely used software for performing this type of calculation is the Vienna Ab initio Simulation Package, VASP, which currently offers a parallelisation strategy based on the distribution of bands and plane wave coefficients over the machine processors. We report here an improved parallelisation strategy that also distributes the k-point sampling workload over different processors, allowing much better scalability for massively parallel computers. As a result, some difficult problems requiring large k-point sampling become tractable in current computing facilities. We showcase three important applications: dielectric function of epitaxially strained indium oxide, solution energies of tetravalent dopants in metallic VO2, and hydrogen on graphene. 相似文献
18.
19.
G. I. Kustova 《Automatic Documentation and Mathematical Linguistics》2008,42(2):109-114
Adverbial constructions v gotovom vide, na vsyakii sluchai, etc. that are an intrinsic resource of the Russian language along with secondary prepositions of v vide and na sluchai type are discussed. The availability and linguistic features of these constructions shows grammaticalization occurring in the modern Russian language to cover not only separate units but also word combinations. 相似文献
20.
The paper reviewed the results bearing out the deep-seated relation between the parallel computations and learning procedures for the laminated neural networks one of whose formalizations is represented by the theory of committee constructions. Additionally, consideration was given to two combinatorial problems concerned with learning pattern recognition in the class of affine committees—the problem of verifying existence of a three-element affine separating committee and that of element-minimal affine separating committee. The first problem was shown to be N P-complete, whereas the second problem is N P-hard and does not belong to the Apx class. 相似文献