Similar Literature
20 similar documents retrieved.
1.
The paper presents a dataflow execution model, DIALOG, for logic programs which operates on an intermediate virtual machine. The virtual machine is granulated at clause argument level to exploit argument parallelism through unification. The model utilises a new variable binding scheme that eliminates dereference operations for accessing variables, and therefore supports OR-parallelism in the highly distributed dataflow environment. The model has been implemented in Occam. A conventional dataflow architecture in support of the model has been simulated as a testbed for the evaluation. The simulation indicates some encouraging results and suggests future improvements.

2.
Performance studies show that traditional semi-join processing methods are sometimes inefficient because of their storage and processing overhead. To remedy this problem, a new semi-join processing method, called one-shot semi-join execution, is proposed. This method allows parallel generation of all the semi-join projections, parallel transmission of all the semi-join projections, and parallel execution of all the semi-joins. The authors apply this method to optimize the response time for processing distributed queries. A response time model is established, which considers both data transmission time and local processing time. Based on this model, an efficient query processing algorithm is developed and analyzed.
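A minimal single-machine sketch of the one-shot idea, assuming relations are lists of dictionaries joined on a common key; the thread pool only stands in for the parallel generation and execution phases, and all names are illustrative rather than the paper's algorithm:

```python
# Hedged sketch: toy "one-shot" semi-join reduction on one machine.
from concurrent.futures import ThreadPoolExecutor

def project(relation, key):
    # Semi-join projection: the distinct join-key values of a relation.
    return {row[key] for row in relation}

def semi_join(relation, key, projection):
    # Keep only rows whose key appears in the shipped projection.
    return [row for row in relation if row[key] in projection]

def one_shot(relations, key):
    """Generate all projections in parallel, then reduce every relation by
    the intersection of the other relations' projections, also in parallel."""
    with ThreadPoolExecutor() as pool:
        projections = list(pool.map(lambda r: project(r, key), relations))
        reduced = list(pool.map(
            lambda i: semi_join(
                relations[i], key,
                set.intersection(*[p for j, p in enumerate(projections) if j != i])),
            range(len(relations))))
    return reduced

if __name__ == "__main__":
    r1 = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
    r2 = [{"k": 2, "b": "u"}, {"k": 3, "b": "v"}]
    print(one_shot([r1, r2], "k"))   # both relations reduced to the k == 2 rows
```

In the distributed setting the projections would be shipped between sites, which is where the data transmission term of the response time model comes in.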

3.
Parallel Computing, 1997, 23(10): 1405-1420
Performance prediction is necessary in order to deal with multi-dimensional performance effects on parallel systems. The compiler-generated analytical model developed in this paper accounts for the effects of cache behavior, CPU execution time and message passing overhead for real programs written in high level data-parallel languages. The performance prediction technique is shown to be effective in analyzing several non-trivial data-parallel applications as the problem size and number of processors vary. We leverage technology from the Maple symbolic manipulation system and the S-PLUS statistical package in order to present users with critical performance information necessary for performance debugging, architectural enhancement and procurement of parallel systems. The usability of these results is improved through specifying confidence intervals as well as predicted execution times for data-parallel applications.
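As a rough illustration of what such an analytical model looks like, the toy predictor below sums CPU time, cache-miss penalty, and message-passing overhead; the cost expression and every coefficient are assumptions for illustration, not the compiler-generated model of the paper:

```python
# Hedged sketch: toy analytical prediction of execution time versus processor count.
def predicted_time(n, p, t_flop=1e-9, miss_rate=0.02, t_miss=1e-7,
                   t_latency=1e-5, t_byte=1e-9, word=8):
    work = n * n / p                       # operations per processor (assumed n^2 total)
    cpu = work * t_flop                    # CPU execution time
    cache = work * miss_rate * t_miss      # cache-miss penalty
    comm = (p - 1) * (t_latency + (n / p) * word * t_byte)   # message-passing overhead
    return cpu + cache + comm

for p in (1, 4, 16, 64):
    print(p, predicted_time(n=4096, p=p))
```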

4.
Building on a study of file-storage infrastructure and its technical evolution, this paper proposes an efficiently scalable distributed storage mechanism that treats an enterprise's existing servers and storage devices as storage units and files as storage objects, so that storage units can be added or removed at any time with high efficiency and even zero waiting time. When an enterprise has many servers and storage devices, this mechanism yields a substantial improvement in storage performance and greatly reduces the file-management workload of administrators. Its main value is helping enterprises make full use of their existing servers and storage devices, reducing the large one-off investment required for equipment replacement.
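The abstract does not say how files are placed on storage units; one common way to get the elastic placement it describes (units added or removed at any time) is consistent hashing, sketched below purely as an illustrative assumption:

```python
# Hedged sketch: consistent hashing of files onto an elastic set of storage units.
import bisect, hashlib

class Ring:
    def __init__(self, units=(), replicas=64):
        self.replicas = replicas
        self._keys, self._units = [], {}
        for u in units:
            self.add(u)

    def _hash(self, s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add(self, unit):
        for i in range(self.replicas):
            h = self._hash(f"{unit}#{i}")
            bisect.insort(self._keys, h)
            self._units[h] = unit

    def remove(self, unit):
        for i in range(self.replicas):
            h = self._hash(f"{unit}#{i}")
            self._keys.remove(h)
            del self._units[h]

    def locate(self, filename):
        h = self._hash(filename)
        idx = bisect.bisect(self._keys, h) % len(self._keys)
        return self._units[self._keys[idx]]

ring = Ring(["server-1", "server-2", "server-3"])
print(ring.locate("report.docx"))
ring.add("server-4")              # scale out; most files keep their old placement
print(ring.locate("report.docx"))
```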

5.
Simulation has become an indispensable tool for researchers to explore systems without having recourse to real experiments. Depending on the characteristics of the modeled system, the methods used to represent it may vary. Multi-agent systems are often used to model and simulate complex systems. In any case, increasing the size and precision of the model increases the amount of computation, requiring parallel systems when the model becomes too large. In this paper, we focus on parallel platforms that support multi-agent simulations and their execution on high-performance resources such as parallel clusters. Our contribution is a survey of existing platforms and their evaluation in the context of high-performance computing. We present a qualitative analysis of several multi-agent platforms, their tests in high-performance computing execution environments, and the performance results for the only two platforms that fulfill the high-performance computing constraints.

6.
A parallel-execution model that can concurrently exploit AND and OR parallelism in logic programs is presented. This model employs a combination of techniques for executing logic programs in parallel, making tradeoffs among the number of processes, the degree of parallelism, and combination bandwidth. For interpreting a nondeterministic logic program, this model (1) performs frame inheritance for newly created goals, (2) creates data-dependency graphs (DDGs) that represent relationships among the goals, and (3) constructs appropriate process structures based on the DDGs. (1) The use of frame inheritance serves to increase modularity. In contrast to most previous parallel models that have a single large process structure, frame inheritance facilitates the dynamic construction of multiple independent process structures, and thus permits further manipulation of each process structure. (2) The dynamic determination of data dependency serves to reduce computational complexity. In comparison to models that exploit brute-force parallelism and models that have fixed execution sequences, this model can reduce the number of unification and/or merging steps substantially. In comparison to models that exploit only AND parallelism, this model can selectively exploit demand-driven computation, according to the binding of the query and optional annotations. (3) The construction of appropriate process structures serves to reduce communication complexity. Unlike other methods that map DDGs directly onto process structures, this model can significantly reduce the amount of data sent to a process and/or the number of communication channels connected to a process.
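For concreteness, here is a toy sketch of step (2): a data-dependency graph can be derived by linking each goal to the earlier goal that first mentions a variable they share. Treating the first textual occurrence as the producer is a simplifying assumption for illustration, not the paper's exact dependency rule:

```python
# Hedged sketch: deriving a DDG over the goals of a clause body from shared variables.
def build_ddg(goals):
    """goals: list of (name, set_of_variables) in textual order."""
    edges = []
    producer = {}                     # variable -> index of the goal that first mentions it
    for i, (_, vars_i) in enumerate(goals):
        for v in sorted(vars_i):
            if v in producer:
                edges.append((producer[v], i, v))   # goal i depends on v's producer
            else:
                producer[v] = i
    return edges

goals = [("father", {"X", "Y"}), ("mother", {"Y", "Z"}), ("likes", {"X", "Z"})]
print(build_ddg(goals))   # [(0, 1, 'Y'), (0, 2, 'X'), (1, 2, 'Z')]
```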

7.
Symbolic execution is widely used in many code analysis, testing, and verification tools. As symbolic execution exhaustively explores all feasible paths, it is quite time consuming. To handle the problem, researchers have parallelized existing symbolic execution tools (e.g., KLEE). In particular, Cloud9 is a widely used parallelized symbolic execution tool, and researchers have used it to analyze real code. However, researchers criticize that tools such as Cloud9 still cannot analyze large-scale code. In this paper, we conduct a field study on Cloud9, in which we use KLEE and Cloud9 to analyze benchmarks in C. Our results confirm the criticism. Based on the results, we identify three bottlenecks that hinder the performance of Cloud9: the communication time gap, the job transfer policy, and the cache management of solved constraints. To handle these problems, we tune the communication time gap with better parameters, modify the job transfer policy, and implement an approach for cache management of solved constraints. We conduct two evaluations on our benchmarks and a real application to understand our improvements. Our results show that our tuned Cloud9 reduces the execution time significantly, both on our benchmarks and on the real application. Furthermore, our evaluation results show that our tuning techniques improve the effectiveness on all the devices, and the improvement can be up to five times, depending on the tuning value of our approach and the behaviour of the program under test.
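To make the third bottleneck concrete, here is a minimal sketch of a solved-constraint cache keyed on a canonical form of the query; the LRU eviction policy and the key scheme are illustrative assumptions, not the cache management the paper implements for Cloud9:

```python
# Hedged sketch: a toy cache of solved constraint sets with LRU eviction.
from collections import OrderedDict

class ConstraintCache:
    def __init__(self, capacity=4096):
        self.capacity = capacity
        self._cache = OrderedDict()          # canonical constraint key -> solver result

    @staticmethod
    def _key(constraints):
        return tuple(sorted(constraints))    # order-insensitive canonical key

    def get(self, constraints):
        k = self._key(constraints)
        if k in self._cache:
            self._cache.move_to_end(k)       # refresh LRU position on a hit
            return self._cache[k]
        return None

    def put(self, constraints, result):
        k = self._key(constraints)
        self._cache[k] = result
        self._cache.move_to_end(k)
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry

cache = ConstraintCache()
cache.put(["x > 0", "y == x + 1"], {"x": 1, "y": 2})
print(cache.get(["y == x + 1", "x > 0"]))   # hit despite different constraint order
```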

8.
Parallel and distributed simulation is a powerful tool for developing complex agent-based simulations. Complex simulations require parallel and distributed high-performance computing solutions, because their sequential counterparts are not able to give answers in a feasible total execution time. Therefore, for the advance of computing science, it is important that High Performance Computing (HPC) techniques and solutions be proposed and studied. In the literature, we can find some agent-based modeling and simulation tools that use HPC. However, none of these tools is designed to let the HPC expert propose new techniques and solutions without great effort. In this paper, we introduce Care High Performance Simulation (HPS), a scientific instrument that enables researchers to: (1) develop techniques and solutions for high-performance distributed simulation of agent-based models; and (2) study, design and implement complex agent-based models that require HPC solutions. Care HPS was designed so that new agent-based models can be developed easily and quickly. It was also designed to extend and implement new solutions for the main issues of parallel and distributed simulation, such as synchronization, communication, load and computation balancing, and partitioning algorithms. We conducted experiments with the aim of showing the completeness and functionality of Care HPS. As a result, we show that Care HPS can be used as a scientific instrument for the advance of the agent-based parallel and distributed simulation field.

9.
Dynamical concurrent execution makes it possible to adapt programs for execution on computing environments with parallel architectures. In the paper, a formal model of dynamical concurrent execution of programs written in functional style is presented. The model is proven to possess a feature that guarantees the correctness of concurrent execution.

10.
Automated behavior analysis is a valuable technique in the development and maintenance of distributed systems. In this paper, we present a tractable dataflow analysis technique for the detection of unreachable states and actions in distributed systems. The technique follows an approximate approach described by Reif and Smolka, but delivers a more accurate result in assessing unreachable states and actions. The higher accuracy is achieved by the use of two concepts: action dependency and history sets. Although the technique does not exhaustively detect all possible errors, it detects nontrivial errors with a worst-case complexity quadratic in the system size. It can be automated and applied to systems with arbitrary loops and nondeterministic structures. The technique thus provides practical and tractable behavior analysis for preliminary designs of distributed systems. This makes it an ideal candidate for an interactive checker in software development tools. The technique is illustrated with case studies of a pump control system and an erroneous distributed program. Results from a prototype implementation are presented.

11.
Although directory-based cache-coherence protocols are the best choice when designing chip multiprocessors with tens of cores on chip, the memory overhead introduced by the directory structure may not scale gracefully with the number of cores. Many approaches aimed at improving the scalability of directories have been proposed. However, they do not bring perfect scalability and usually reduce the directory memory overhead by compressing coherence information, which in turn results in extra unnecessary coherence messages and, therefore, wasted energy and some performance degradation. In this work, we present a distributed directory organization based on duplicate tags for tiled CMP architectures whose size is independent of the number of tiles in the system, up to a certain number of tiles. We demonstrate that this number of tiles corresponds to the number of sets in the private caches. Additionally, we show that the area overhead of the proposed directory structure is 0.56% with respect to the on-chip data caches. Moreover, the proposed directory structure keeps the same information as a non-scalable full-map directory. Finally, we propose a mechanism that takes advantage of this directory organization to remove the network traffic caused by replacements. This mechanism reduces total traffic by 15% for a 16-core configuration compared to a traditional directory-based protocol.
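A rough software sketch of a duplicate-tag directory bank, assuming the home tile mirrors the tags held in each tile's private cache and recovers the sharer list by tag matching; the structure and parameters are illustrative, not the paper's exact design:

```python
# Hedged sketch: a duplicate-tag directory bank for a tiled CMP.
class DuplicateTagDirectory:
    def __init__(self, num_tiles, sets, ways):
        # tags[tile][set] mirrors the tags held in that tile's private cache set
        self.tags = [[[None] * ways for _ in range(sets)] for _ in range(num_tiles)]

    def on_fill(self, tile, set_idx, way, tag):
        self.tags[tile][set_idx][way] = tag        # mirror a private-cache fill

    def on_evict(self, tile, set_idx, way):
        self.tags[tile][set_idx][way] = None       # mirror a private-cache eviction

    def sharers(self, set_idx, tag):
        # A tile shares the block if the tag appears among its duplicated tags.
        return [t for t, per_tile in enumerate(self.tags) if tag in per_tile[set_idx]]

d = DuplicateTagDirectory(num_tiles=16, sets=128, ways=4)
d.on_fill(tile=3, set_idx=7, way=0, tag=0xABC)
d.on_fill(tile=9, set_idx=7, way=2, tag=0xABC)
print(d.sharers(set_idx=7, tag=0xABC))             # [3, 9]
```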

12.
Transactional memory for multicore processors has been a major research area over the past several years. Many transactional memory systems have been proposed to solve the synchronization problem of multicore processors. Hardware transactional memory is one of the critical methods for speeding up communication in multicore environments. In this paper, we give a review of current hardware transactional memory systems for multicore processors. We take a top-down approach to characterizing and classifying various hardware transactional design issues and present a taxonomy of hardware transactional memory systems built around five fundamental design issues: version management, conflict detection, contention management, virtualization and nesting. Finally, we discuss an active research challenge: the relationship between transactional memory and input/output operations and system calls.
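As a software analogue of two of these axes (lazy version management and write-set based conflict detection), the following toy sketch buffers writes per transaction and flags a conflict when one transaction's write set intersects another's read or write set; it only illustrates the design vocabulary, not any hardware mechanism:

```python
# Hedged sketch: read/write-set conflict detection with lazily buffered writes.
class Transaction:
    def __init__(self, name):
        self.name, self.reads, self.writes, self.buffer = name, set(), set(), {}

    def read(self, addr, memory):
        self.reads.add(addr)
        return self.buffer.get(addr, memory.get(addr))

    def write(self, addr, value):
        self.writes.add(addr)
        self.buffer[addr] = value          # lazy versioning: buffer until commit

    def conflicts_with(self, other):
        return bool(self.writes & (other.reads | other.writes))

    def commit(self, memory):
        memory.update(self.buffer)         # publish buffered writes at commit time

memory = {"x": 0, "y": 0}
t1, t2 = Transaction("t1"), Transaction("t2")
t1.write("x", 1)
t2.read("x", memory)
print(t1.conflicts_with(t2))               # True: t1 wrote what t2 read
```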

13.
In this paper, we focus on the compiling implementation of the parallel logic language PARLOG and the functional language ML on distributed-memory multiprocessors. Under the graph rewriting framework, a Heterogeneous Parallel Graph Rewriting Execution Model (HPGREM) is presented first. Then, based on HPGREM, a parallel abstract machine PAM/TGR is described. Furthermore, several optimizing compilation schemes for executing declarative programs on a transputer array are proposed. The performance statistics on the transputer array demonstrate the effectiveness of our model, parallel abstract machine, optimizing compilation strategies and compiler.

14.
This work proposes an execution model for massively parallel systems aimed at ensuring that communications are overlapped by computations. The model is named SCAC: Synchronous Communication, Asynchronous Computation. This weakly coupled model separates the execution of communication phases from that of computation phases in order to facilitate their overlapping, thus hiding the data transfer time. To allow the simultaneous execution of these phases, we propose an approach based on three levels: two globally-centralized/locally-distributed hierarchical control levels and a parallel computation level. A generic and parametric implementation of the SCAC model was performed to fit different applications. This implementation allows the designer to choose the system components (from pre-designed ones) and to set its parameters in order to build the adequate SCAC configuration for the target application. An analytical estimation is proposed to predict the execution time of an application running in SCAC mode, in order to facilitate parallel program design and SCAC architecture configuration. The SCAC model was validated by simulation, synthesis and implementation on an FPGA platform with different examples of parallel computing applications. The comparison of the results obtained by the SCAC model with other models has shown its effectiveness in terms of flexibility and speed-up.
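A toy version of such an analytical estimate, assuming that within each phase communication and computation overlap completely so the phase costs their maximum; this formula is an illustrative assumption, not the paper's estimation model:

```python
# Hedged sketch: comparing fully overlapped phases against sequential execution.
def scac_time(phases, startup=0.0):
    """phases: list of (compute_time, transfer_time) pairs, one per phase."""
    return startup + sum(max(comp, comm) for comp, comm in phases)

def sequential_time(phases, startup=0.0):
    return startup + sum(comp + comm for comp, comm in phases)

phases = [(4.0, 2.5), (3.0, 3.5), (5.0, 1.0)]
print(sequential_time(phases), scac_time(phases))   # 19.0 vs 12.5
```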

15.
A High-Performance Streaming Parallel Encryption Algorithm
As the number of network users keeps growing and security requirements increase, protecting user data streams with AES encryption has become widely used. From the server's perspective, the data streams produced by large numbers of users are high-rate and highly bursty, while traditional serial encryption is inefficient and can lead to service failures or poor quality of service. Therefore, on top of the now-common CPU+GPU heterogeneous environment, parallel AES encryption is organized in a pipelined fashion to improve encryption performance, and a sliding window is used for burst-traffic control so as to provide a high-quality stream encryption service. Experimental results show that the proposed streaming parallel AES encryption algorithm for heterogeneous environments meets the demands of streaming encryption for high-rate, bursty user data streams, improving encryption throughput and effectively controlling traffic.
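A minimal host-side sketch of the pipelining and sliding-window ideas: a bounded queue throttles a bursty producer while an encryption stage drains it concurrently. The encrypt_block() placeholder stands in for the GPU AES kernel, and all names and sizes are illustrative assumptions rather than the paper's implementation:

```python
# Hedged sketch: a two-stage pipeline with a bounded queue as the sliding window.
import queue, threading

WINDOW = 8                                   # max blocks in flight (sliding window)

def encrypt_block(block: bytes) -> bytes:
    return bytes(b ^ 0x5A for b in block)    # placeholder only, not real AES

def producer(blocks, q):
    for blk in blocks:
        q.put(blk)                           # blocks here when the window is full
    q.put(None)                              # end-of-stream marker

def encryptor(q, out):
    while (blk := q.get()) is not None:
        out.append(encrypt_block(blk))

window = queue.Queue(maxsize=WINDOW)
out = []
stream = [bytes([i]) * 16 for i in range(100)]   # a bursty stream of 16-byte blocks
t1 = threading.Thread(target=producer, args=(stream, window))
t2 = threading.Thread(target=encryptor, args=(window, out))
t1.start(); t2.start(); t1.join(); t2.join()
print(len(out), "blocks encrypted")
```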

16.
This paper presents the tuple channel model (TCM), a new coordination model for parallel and distributed programming. Our proposal is based on the use of tuple channels (TCs) to model the communication and synchronization of different activities. TCs are multi-point channels that allow complex data structures to be communicated among multiple producers and consumers. This communication model allows incremental and backward communication to be expressed, providing an elegant way of achieving implicit and direct communication and reactive control. TCs can be dynamically interconnected through the use of user-defined connectors, providing great flexibility for the definition of complex and dynamic interaction protocols. TCM also provides a simple service management mechanism, by means of which open systems can be implemented in an appropriate way. The suitability, expressiveness and programming techniques of the model are presented by means of some illustrative examples. In addition, some implementation details of the developed prototypes are sketched and preliminary results demonstrating the efficiency of the proposal are shown.
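A toy rendering of the two core notions, a multi-point tuple channel and a user-defined connector that forwards (and optionally transforms) tuples between channels; the class and function names are illustrative assumptions, not the model's actual API:

```python
# Hedged sketch: tuple channels connected by a user-defined connector.
import queue, threading

class TupleChannel:
    def __init__(self):
        self._q = queue.Queue()

    def put(self, *tup):
        self._q.put(tup)

    def get(self):
        return self._q.get()

def connector(src, dst, transform=lambda t: t):
    # Forwards tuples from src to dst, applying a user-supplied transformation.
    while True:
        tup = src.get()
        if tup == ("eos",):
            dst.put("eos")
            break
        dst.put(*transform(tup))

raw, squared = TupleChannel(), TupleChannel()
threading.Thread(target=connector, args=(raw, squared, lambda t: (t[0] ** 2,)),
                 daemon=True).start()
for i in range(3):
    raw.put(i)
raw.put("eos")
while (t := squared.get()) != ("eos",):
    print(t)                                  # (0,), (1,), (4,)
```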

17.
Parallel computing and distributed computing have traditionally evolved as two separate research disciplines. Parallel computing has addressed problems of communication-intensive computation on tightly-coupled processors while distributed computing has been concerned with coordination, availability, timeliness, etc., of more loosely coupled computations. Current trends, such as parallel computing on networks of conventional processors and Internet computing, suggest the advantages of unifying these two disciplines. Actors provide a flexible model of computation which supports both parallel and distributed computing. One may evaluate the utility of a programming paradigm in terms of four criteria: expressiveness, portability, efficiency, and performance predictability. We discuss how the Actor model and programming methods based on it support these goals. In particular, we provide an overview of the state of the art in Actor languages and their implementation. Finally, we place this work in the context of recent developments in middleware, the Java language, and agents.

18.
Performance Evaluation, 2006, 63(4-5): 265-277
Performance prediction for parallel applications running in heterogeneous clusters is difficult to accomplish due to the unpredictable resource contention patterns that can be found in such environments. Typically, components of a parallel application will contend for the use of resources among themselves and with entities external to the application, such as other processes running on the computers of the cluster. The performance modeling approach should be able to represent these sources of contention and to produce an estimate of the execution time, preferably in polynomial time. This paper presents a polynomial-time static performance prediction approach in which the prediction takes the form of an interval of values instead of a single value. The extra information given by an interval of values represents the variability of the underlying environment more accurately, as indicated by the practical examples presented.
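A minimal sketch of how execution-time intervals might compose, assuming sequential stages add bound-wise and concurrent stages combine by taking the maximum of each bound; these composition rules are an illustrative reading of the idea, not the paper's model:

```python
# Hedged sketch: composing execution-time intervals instead of point estimates.
def seq(*intervals):
    # Sequential composition: lower and upper bounds add up.
    return (sum(lo for lo, _ in intervals), sum(hi for _, hi in intervals))

def par(*intervals):
    # Concurrent composition: each bound is the maximum over the branches.
    return (max(lo for lo, _ in intervals), max(hi for _, hi in intervals))

compute = (10.0, 14.0)       # contention makes per-node compute time vary
exchange = (2.0, 6.0)        # network contention widens the communication bound
print(seq(par(compute, compute), exchange))   # (12.0, 20.0)
```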

19.
In this paper we propose an efficient and scalable storage and lookup model for provenance logs. The proposed system exploits the loosely coupled structure of the provenance logs by separating metadata from the generating process, so as to manage large datasets with good scalability. In addition, the system utilizes a trie-based lookup table to greatly reduce the provenance data lookup time. Performance results on thousands of graph logs show that our prototype implementation can effectively handle logs without any resource over-utilization, thus leading to good scalability.
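A minimal sketch of a trie-based lookup table keyed on path-like provenance identifiers; the key structure and record payloads are illustrative assumptions, not the paper's log format:

```python
# Hedged sketch: a trie mapping provenance identifiers to log-record locations.
class TrieNode:
    __slots__ = ("children", "record")
    def __init__(self):
        self.children, self.record = {}, None

class ProvenanceIndex:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, record):
        node = self.root
        for part in key.split("/"):
            node = node.children.setdefault(part, TrieNode())
        node.record = record

    def lookup(self, key):
        node = self.root
        for part in key.split("/"):
            node = node.children.get(part)
            if node is None:
                return None
        return node.record

idx = ProvenanceIndex()
idx.insert("run42/task7/output.csv", {"log": "graphs/log-000123.json"})
print(idx.lookup("run42/task7/output.csv"))
```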

20.
Using web services to expose applications over the Internet is now a widely accepted practice. Currently, there are several ongoing efforts that provide ways to effectively compose web services distributed across different organizations. One of the problems underlying the deployment of such composite services on the web, however, is service co-allocation, which arises when a composite service needs to ensure that all the required component services are available for execution at the same time. Motivated by this, this paper presents a new decentralized protocol, named the web service co-allocation protocol (WSCP), which can facilitate fast execution of composite web services. The proposed framework is an enhancement of the well-known two-phase commit protocol through the incorporation of a tentative hold phase, as well as the employment of a new high-performance backoff protocol developed to better address the dynamics of the service co-allocation problem. The simulation results show that the proposed approach yields significant improvements over existing protocols.
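A toy sketch of the co-allocation flow, assuming tentative (non-exclusive) holds are placed on every component service before a commit round, with randomized exponential backoff between attempts; the interfaces, hold semantics, and backoff policy are illustrative assumptions rather than the WSCP specification:

```python
# Hedged sketch: co-allocating services with tentative holds and randomized backoff.
import random, time

class Service:
    def __init__(self, name, available=True):
        self.name, self.available, self.held = name, available, False

    def tentative_hold(self):
        # Non-blocking, non-exclusive reservation that may later be withdrawn.
        self.held = self.available
        return self.held

    def commit(self):
        if self.held and self.available:
            self.available = False
            return True
        return False

    def release(self):
        self.held = False

def co_allocate(services, retries=5, base_delay=0.01):
    for attempt in range(retries):
        if all(s.tentative_hold() for s in services):
            if all(s.commit() for s in services):
                return True                  # every component service is secured
        for s in services:
            s.release()                      # withdraw holds and try again later
        time.sleep(base_delay * (2 ** attempt) * random.random())  # randomized backoff
    return False

print(co_allocate([Service("flights"), Service("hotel"), Service("payment")]))
```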
