1.

Compiling programs for distributed-memory multiprocessors (total citations: 1; self-citations: 0; citations by others: 1)

David Callahan Ken Kennedy 《The Journal of supercomputing》1988,2(2):151-169

We describe a new approach to programming distributed-memory computers. Rather than having each node in the system explicitly programmed, we derive an efficient message-passing program from a sequential shared-memory program annotated with directions on how elements of shared arrays are distributed to processors. This article describes one possible input language for describing distributions and then details the compilation process and the optimization necessary to generate an efficient program. Research supported by Intel.

2.

Compiling quantum programs (total citations: 4; self-citations: 0; citations by others: 4)

Paolo Zuliani 《Acta Informatica》2005,41(7-8):435-474

In this paper we study a possible compiler for a high-level imperative programming language for quantum computation, the quantum Guarded-Command Language (qGCL). It is important because it liberates us from thinking of quantum algorithms at the data-flow level, in the same way as happened for standard computation a few decades ago. We make use of the normal-form approach to compiler design, introduced by Hoare, Jifeng and Sampaio. In this approach a source program is transformed, by means of algebraic manipulations, into a particular form which can be directly executed by a target machine. This entails the definition of a simple quantum hardware architecture, derived from Hoare et al.’s computing model. Our work provides a general framework for the construction of a compiler for qGCL, focusing mainly on the correctness of the design. Here we do not deal with other topics such as efficiency of compiled code, factorisation of unitary transformations and compilation of quantum data structures.

3.

Compiling lisp programs for parallel execution

James R. Larus 《LISP and Symbolic Computation》1991,4(1):29-99

Curare, the program restructurer described in this paper, automatically transforms a sequential Lisp program into an equivalent concurrent program that runs on a multiprocessor. Data dependences constrain the program's concurrent execution because, in general, two conflicting statements cannot execute in a different order without affecting the program's result. Not all dependences are essential to produce the program's result. Curare attempts to transform the program so it computes its result with fewer conflicts. An optimized program will execute with less synchronization and more concurrency. Curare then examines loops in a program to find those that are unconstrained or lightly constrained by dependences. By necessity, Curare treats recursive functions as loops and does not limit itself to explicit program loops. Recursive functions offer several advantages over explicit loops since they provide a convenient framework for inserting locks and handling the dynamic behavior of symbolic programs. Loops that are suitable for concurrent execution are changed to execute on a set of concurrent server processes. These servers execute single loop iterations and therefore need to be extremely inexpensive to invoke. Restructured programs execute significantly faster than the original sequential programs. This improvement is large enough to attract programmers to a multiprocessor, particularly since it requires little effort on their part. This research was funded by DARPA contract numbers N00039-85-C-0269 (SPUR) and N00039-84-C-0089 (XCS) and by an NSF Presidential Young Investigator award to Paul N. Hilfinger. Additional funding came from the California MICRO program (in conjunction with Texas Instruments, Xerox, Honeywell, and Phillips/Signetics).

4.

Compiling parallel programs by optimizing performance

Marina Chen Young-Il Choo Jingke Li 《The Journal of supercomputing》1988,2(2):171-207

This paper describes how Crystal, a language based on familiar mathematical notation and lambda calculus, addresses the issues of programmability and performance for parallel supercomputers. Some scientific programmers and theoreticians may ask, “What is new about Crystal?” or “How is it different from existing functional languages?” The answers lie in its model of parallel computation and a theory of parallel program optimization, and we examine this in the text to follow. We illustrate the power of our approach with benchmarks of compiled parallel code from Crystal source. The target machines are hypercube multiprocessors with distributed memory, on which it is considered difficult for functional programs to achieve high efficiency.

5.

Compiling and verifying SC-SystemJ programs for safety-critical reactive systems

《Computer Languages, Systems and Structures》2015

Most of today's embedded systems are very complex. These systems, controlled by computer programs, continuously interact with their physical environments through a network of sensory input and output devices. Consequently, the operations of such embedded systems are highly reactive and concurrent. Since embedded systems are deployed in many safety-critical applications, where failures can lead to catastrophic events, an approach that combines mathematical logic and formal verification is employed in order to ensure correct behavior of the control algorithm. This paper presents a What You Prove Is What You Execute (WYPIWYE) compilation strategy for a Globally Asynchronous Locally Synchronous (GALS) programming language called Safety-Critical SystemJ. SC-SystemJ is a safety-critical subset of the SystemJ language. A formal big-step transition semantics of SC-SystemJ is developed for compiling SC-SystemJ programs into propositional Linear Temporal Logic formulas. These LTL formulas are then converted into a network of Mealy automata using a novel and efficient compilation algorithm. The resultant Mealy automata have a straightforward syntactic translation into Promela code. The resultant Promela models can be used for verifying correctness properties via the SPIN model-checker. Finally, a single translation procedure compiles both Promela and C/Java code for execution, which satisfies the De-Bruijn index, i.e. this final translation step is simple enough that it can be manually verified.

6.

Compiling real-time programs with timing constraint refinement and structural code motion

Gerber R. Seongsoo Hong 《IEEE Transactions on Software Engineering》1995,21(5):389-404

We present a programming language called TCEL (Time-Constrained Event Language), whose semantics are based on time-constrained relationships between observable events. Such a semantics infers only those timing constraints necessary to achieve real-time correctness, without overconstraining the system. Moreover, an optimizing compiler can exploit this looser semantics to help tune the code, so that its worst-case execution time is consistent with its real-time requirements. In this paper we describe such a transformation system, which works in two phases. First, the TCEL source code is translated into an intermediate representation. Then an instruction-scheduling algorithm rearranges selected unobservable operations and synthesizes tasks guaranteed to respect the original event-based constraints.

7.

Compiling bottom-up and mixed derivations into top-down executable logic programs

Danny De Schreye Bern Martens Gunther Sablon Maurice Bruynooghe 《Journal of Automated Reasoning》1991,7(3):337-358

We present a technique for the compilation of bottom-up and mixed logic derivations into PROLOG programs. It is obtained as an extension of a program transformation technique called Compiling Control. We illustrate its applications in three different domains: solving numerical problems, integrity checking in deductive databases and theorem proving. The aim is to obtain efficient PROLOG programs for problems in which a non-top-down control is most appropriate. Work partly supported by ESPRIT BRA COMPULOG (project 3012). Supported by the Belgian I.W.O.N.L.-I.R.S.I.A. under contract number 5203. Author for correspondence. Supported by the Belgian National Fund for Scientific Research.

8.

Towards secure and communication-efficient broadcast encryption systems

Wen Tao Zhu 《Journal of Network and Computer Applications》2013,36(1):178-186

Many emerging network applications are based upon a group communication model where security is a critical design issue. We address the broadcast encryption problem of distributing to a group of network entities a confidential cryptographic key, which needs to be updated from session to session. The design goals of such a system essentially include not only security but also communication efficiency concerning the distribution of the session key. We show that there is a disconnect between the essence of broadcast encryption and a scheme proposed very recently. The observation also motivates us to seek secure and efficient broadcast encryption solutions. Three distinctive constructions, based on the bilinear map, the one-way hash function, and the RSA cryptosystem, respectively, are then presented to demonstrate reasonable tweaks and various tradeoffs when designing practical group-oriented communication systems. These constructions exhibit not only promising security but also impressive communication efficiency, and we also discuss the diverse networking scenarios to which they are applicable.

9.

A communication-efficient three-party password authenticated key exchange protocol (total citations: 1; self-citations: 0; citations by others: 1)

Ting-Yi Chang Wei-Pang Yang 《Information Sciences》2011,181(1):217-226

Three-party password authenticated key exchange (3PAKE) protocols allow two users (clients) to establish a session key through an authentication server over an insecure channel. Clients only share an easy-to-remember password with the trusted server. In the related literature, most schemes employ the server public keys to ensure the identities of both the servers and symmetric cryptosystems to encrypt the messages. This paper describes an efficient 3PAKE based on LHL-3PAKE proposed by Lee et al. Our 3PAKE requires neither the server public keys nor symmetric cryptosystems such as DES. The formal proof of security of our 3PAKE is based on the computational Diffie-Hellman assumption in the random oracle model along with a parallel version of the proposed 3PAKE. The comparisons have shown that our 3PAKE is more practical than other 3PAKEs.

10.

Compiling for the Cydra 5

James C. Dehnert Ross A. Towle 《The Journal of supercomputing》1993,7(1-2):181-227

The Cydra 5 is a VLIW minisupercomputer with hardware designed to accelerate a broad class of inner loops, presenting unique challenges to its compilers. We discuss the organization of its Fortran/77 compiler and several of the key approaches developed to fully exploit the hardware. These include the intermediate representation used; the preparation, overlapped scheduling, and register allocation of inner loops; the speculative execution model used to control global code motion; and the machine model and local instruction scheduling approach.

11.

rPIR: ramp secret sharing-based communication-efficient private information retrieval

Lichun Li Michael Militzer Anwitaman Datta 《International Journal of Information Security》2017,16(6):603-625

Even as data and analytics-driven applications are becoming increasingly popular, retrieving data from shared databases poses a threat to the privacy of their users. For example, investors/patients retrieve records about stocks/diseases they are interested in from a stock/medical database. Knowledge of such interest is sensitive information that the database server would have access to, unless some mitigating measures are deployed. Private information retrieval (PIR) is a promising security primitive to protect the privacy of users’ interests. PIR allows the retrieval of a data record from a database without letting the database server know which record is being retrieved. The privacy guarantees could either be information theoretic or computational. Alternatively, anonymizers, which hide the identities of data users, may be used to protect the privacy of users’ interests for some situations. In this paper, we study rPIR, a new family of information-theoretic PIR schemes using ramp secret sharing. We have designed four rPIR schemes, using three ramp secret sharing approaches, achieving answer communication costs close to the cost of non-private information retrieval. Evaluation shows that, for many practical settings, rPIR schemes can achieve lower communication costs and the same level of privacy compared with traditional information-theoretic PIR schemes and anonymizers. Efficacy of the proposed schemes is demonstrated for two very different scenarios (outsourced data sharing and P2P content delivery) with realistic analysis and experiments. In many situations of these two scenarios, rPIR’s advantage of low communication cost outweighs its disadvantages, which results in less expenditure and/or better quality of service compared with what may be achieved if traditional information-theoretic PIR and anonymizers are used.
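To make the idea of information-theoretic PIR concrete, here is a minimal sketch of the classic two-server XOR scheme. It is not one of the rPIR schemes from the paper and uses no ramp secret sharing; it assumes two non-colluding servers holding identical copies of a database of equal-length records. The function names `make_queries`, `answer`, and `reconstruct` are hypothetical and chosen only for this illustration.

```python
import secrets


def make_queries(n, index):
    """Client side: build one bit-vector query per server.

    q1 is uniformly random; q2 equals q1 with the bit at `index` flipped.
    Each vector on its own is uniformly distributed, so a single server
    learns nothing about `index` (assuming the servers do not collude).
    """
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2


def answer(database, query):
    """Server side: XOR together the records selected by the query bits."""
    acc = bytes(len(database[0]))  # all-zero record of the right length
    for record, bit in zip(database, query):
        if bit:
            acc = bytes(x ^ y for x, y in zip(acc, record))
    return acc


def reconstruct(a1, a2):
    """Client side: every record except the wanted one cancels out."""
    return bytes(x ^ y for x, y in zip(a1, a2))


# Toy database; records must all have the same length (an assumption of this sketch).
db = [b"AAPL", b"MSFT", b"GOOG", b"AMZN"]
q1, q2 = make_queries(len(db), 2)
print(reconstruct(answer(db, q1), answer(db, q2)))
```

In this toy scheme each server returns exactly one record's worth of data, while the query costs one bit per database record. The abstract's point about answer communication cost being close to that of non-private retrieval concerns this kind of trade-off, although the sketch does not reproduce how rPIR achieves it.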

12.

VLSI design for massively parallel signal processors

SY Kung Jurgen Annevelink 《Microprocessors and Microsystems》1983,7(10):461-468

13.

Semi-distributed load balancing for massively parallel multicomputer systems

Ahmad I. Ghafoor A. 《IEEE Transactions on Software Engineering》1991,17(10):987-1004

A semidistributed approach is given for load balancing in large parallel and distributed systems which is different from the conventional centralized and fully distributed approaches. The proposed strategy uses a two-level hierarchical control by partitioning the interconnection structure of a distributed or multiprocessor system into independent symmetric regions (spheres) centered at some control points. The central points, called schedulers, optimally schedule tasks within their spheres and maintain state information with low overhead. The authors consider interconnection structures belonging to a number of families of distance transitive graphs for evaluation, and, using their algebraic characteristics, show that identification of spheres and their scheduling points is in general an NP-complete problem. An efficient solution for this problem is presented by making exclusive use of a combinatorial structure known as the Hadamard matrix. The performance of the proposed strategy has been evaluated and compared with an efficient fully distributed strategy through an extensive simulation study. The proposed strategy yielded much better results.
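For readers unfamiliar with the combinatorial structure named in the abstract, the sketch below builds a Hadamard matrix with the standard Sylvester doubling construction and checks its defining property: distinct rows are orthogonal. How the paper uses such matrices to choose scheduler nodes is not reproduced here. The names `sylvester_hadamard` and `is_hadamard` are hypothetical.

```python
def sylvester_hadamard(k):
    """Return the 2**k x 2**k Sylvester Hadamard matrix with entries +1/-1.

    Doubling step: H_{2n} = [[H_n, H_n], [H_n, -H_n]], starting from H_1 = [[1]].
    """
    h = [[1]]
    for _ in range(k):
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


def is_hadamard(h):
    """Check H * H^T == n * I: each row has squared norm n, distinct rows are orthogonal."""
    n = len(h)
    return all(
        sum(a * b for a, b in zip(h[i], h[j])) == (n if i == j else 0)
        for i in range(n)
        for j in range(n)
    )


for row in sylvester_hadamard(2):
    print(row)
```

Because any two distinct rows agree in exactly half of their positions, the rows of a Hadamard matrix are evenly spread binary vectors when read as node addresses. This is one plausible reading of why the structure suits placing control points symmetrically, offered here as an interpretation rather than as the paper's argument.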

14.

Reusable coordinator modules for massively concurrent applications

F. Arbab C. L. Blom F. J. Burger C. T. H. Everaars 《Software》1998,28(7):703-735

Isolating computation and communication concerns into separate pure computation and pure coordination modules enhances modularity, understandability and reusability of parallel and/or distributed software. MANIFOLD is a pure coordination language that encourages this separation. We use real, concrete, running MANIFOLD programs to demonstrate the concept of pure coordination modules and the advantage of their reuse in applications of different natures. Performance results for the examples presented in this paper show that the overhead of using MANIFOLD to achieve this enhanced modularity and reusability is in practice small, compared to the more conventional paradigms for the design and programming of parallel and distributed software. © 1998 John Wiley & Sons, Ltd.

15.

Practical middleware for massively multiplayer online games

Tsun-Yu Hsiao Shyan-Ming Yuan 《Internet Computing, IEEE》2005,9(5):47-54

A massively multiplayer online game (MMOG) lets thousands of players interact simultaneously within a virtual world via the Internet. Middleware plays an important role in the development of next-generation MMOGs, which must be built on platforms that address not only the service aspect, but also code maintainability and development for programmers. The authors' compact, high-performance message-oriented middleware has a code-generation programming model that is designed to address many of these problems.

16.

A simple and communication-efficient Omega algorithm in the crash-recovery model

Cristian Martín Mikel Larrea 《Information Processing Letters》2010,110(3):83-87

This paper presents a new algorithm implementing the Omega failure detector in the crash-recovery model. Contrary to previously proposed algorithms, this algorithm does not rely on the use of stable storage and is communication-efficient, i.e., eventually only one process (the elected leader) keeps sending messages. The algorithm relies on a nondecreasing local clock associated with each process. Since stable storage is not used to keep the identity of the leader in order to read it upon recovery, unstable processes, i.e., those that crash and recover infinitely often, output a special ⊥ value upon recovery, and then agree with correct processes on the leader after receiving a first message from it.

17.

Compiling Techniques for Algebraic Expressions

Huskey H. D. 《Computer Journal》1961,4(1):10-19

18.

Compiling for distributed memory architectures

Rogers A. Pingali K. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(3):281-298

The lack of high-level languages and good compilers for parallel machines hinders their widespread acceptance and use. Programmers must address issues such as process decomposition, synchronization, and load balancing. We have developed a parallelizing compiler that, given a sequential program and a memory layout of its data, performs process decomposition while balancing parallelism against locality of reference. A process decomposition is obtained by specializing the program for each processor to the data that resides on that processor. If this analysis fails, the compiler falls back to a simple but inefficient scheme called run-time resolution. Each process's role in the computation is determined by examining the data required for execution at run-time. Thus, our approach to process decomposition is data-driven rather than program-driven. We discuss several message optimizations that address the issues of overhead and synchronization in message transmission. Accumulation reorganizes the computation of a commutative and associative operator to reduce message traffic. Pipelining sends a value as close to its computation as possible to increase parallelism. Vectorization of messages combines messages with the same source and the same destination to reduce overhead. Our results from experiments in parallelizing SIMPLE, a large hydrodynamics benchmark, for the Intel iPSC/2, show a speedup within 60% to 70% of handwritten code.
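The two ideas the abstract names, run-time resolution and message vectorization, can be illustrated with a small single-process simulation. This is a sketch under stated assumptions, not the compiler described in the paper: it assumes a block distribution of arrays, an owner-computes rule, and the toy loop `a[i] = b[i] + b[(i + shift) % n]`. The names `owner`, `run_time_resolution`, and `vectorize` are hypothetical.

```python
from collections import defaultdict


def owner(i, n, p):
    """Block distribution (assumed): element i of an n-element array lives on processor i // ceil(n / p)."""
    block = -(-n // p)  # ceiling division
    return i // block


def run_time_resolution(b, p, shift=2):
    """Simulate a[i] = b[i] + b[(i + shift) % n] on p processors.

    Every iteration checks ownership at run time: the owner of a[i] does the
    work, and any operand it does not hold locally is recorded as a message.
    Returns the gathered result and the (src, dst, index) message log.
    """
    n = len(b)
    local_b = [dict() for _ in range(p)]
    for i, value in enumerate(b):
        local_b[owner(i, n, p)][i] = value

    local_a = [dict() for _ in range(p)]
    messages = []
    for i in range(n):
        dst = owner(i, n, p)      # owner of a[i] performs the assignment
        j = (i + shift) % n
        src = owner(j, n, p)      # owner of the second operand
        if src != dst:
            messages.append((src, dst, j))  # one message per remote element
        local_a[dst][i] = local_b[dst][i] + local_b[src][j]

    gathered = [local_a[owner(i, n, p)][i] for i in range(n)]
    return gathered, messages


def vectorize(messages):
    """Combine messages that share a source and a destination into one batch."""
    batches = defaultdict(list)
    for src, dst, index in messages:
        batches[(src, dst)].append(index)
    return dict(batches)


a, msgs = run_time_resolution(list(range(8)), p=2)
print(a)
print(len(msgs), "element messages ->", len(vectorize(msgs)), "vectorized messages")
```

With eight elements on two processors, the element-wise scheme logs four messages, and grouping by source and destination reduces them to two. The saving grows with the number of remote elements per processor pair, which is the overhead reduction the abstract attributes to vectorization.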

19.

Compiling for power with ScalaPipe

《Journal of Systems Architecture》2013,59(8):615-625

20.

A massively parallel architecture for self-organizing feature maps

Porrmann M. Witkowski U. Ruckert U. 《Neural Networks, IEEE Transactions on》2003,14(5):1110-1121

A hardware accelerator for self-organizing feature maps is presented. We have developed a massively parallel architecture that, on the one hand, allows a resource-efficient implementation of small or medium-sized maps for embedded applications, requiring only small areas of silicon. On the other hand, large maps can be simulated with systems that consist of several integrated circuits that work in parallel. Apart from the learning and recall of self-organizing feature maps, the hardware accelerates data pre- and postprocessing. For the verification of our architectural concepts in a real-world environment, we have implemented an ASIC that is integrated into our heterogeneous multiprocessor system for neural applications. The performance of our system is analyzed for various simulation parameters. Additionally, the performance that can be achieved with future microelectronic technologies is estimated.