1.

袁援陈松乔陈建二《计算机科学》2003,30(10):131-132

Both active replication and passive replication in group communication systems have limitations when used to implement fault-tolerant distributed systems. In this paper, we present a replication technique called Semi-Active Replication (SAR), which combines the advantages of the two former techniques. By taking the event log as the synchronization object, employing reliable multicast as the means of synchronization between copies, and using the substitution method of passive replication, this new method implements fault control for the processors in a group communication system while assuring consistency between replicas and decreasing system overhead. It is the best answer for large distributed systems. Finally, we apply the method to distributed applications based on the EJB specification. It is shown that SAR provides a good solution for fault-tolerant distributed systems in which the communication model between clients and servers is point-to-point.

2.

A formal semantics for debugging synchronous message passing-based concurrent programs

LI He LUO Jie LI Wei 《中国科学:信息科学(英文版)》2014,(12):194-211

In this paper, we propose a semantic framework to debug synchronous message passing-based concurrent programs, which are increasingly useful as parallel computing and distributed systems become more and more pervasive. We first design a concurrent programming language model to uniformly represent existing concurrent programming languages. Compared to sequential programming languages, this model contains communication statements, i.e., sending and receiving statements, and a concurrent structure to represent communication and concurrency. We then propose a debugging process consisting of a tracing and a locating procedure. The tracing procedure re-executes a program with a failed test case and uses specially designed data structures to collect useful execution information for locating bugs. We provide for the tracing procedure a structural operational semantics to represent synchronous communication and concurrency. The locating procedure backward locates the ill-designed statement by using information obtained in the tracing procedure, generates a fix equation, and tries to fix the bug by solving the fix equation. We also propose a structural operational semantics for the locating procedure. We supply two examples to test our proposed operational semantics.

3.

Task Scheduling Model in Parallel and Distributed Computing (total citations: 3; self-citations: 0; citations by others: 3)

陈华平黄刘生陈国良《计算机科学》1999,26(6):33-36

In this paper, we first describe the concept of task scheduling in Parallel and Distributed Computing (PDC), then illustrate the task scheduling model in PDC and the way of calculating the execution cost and communication cost, and lastly discuss an approach to estimating the communication contention overhead.

4.

Improvements to the Control Techniques of Sequential Inference Machines——from Instructions to Hardware Organization

Xing Hancheng Li Chunlin Xing Dongsheng 《计算机科学技术学报》1991,6(1):66-73

Nondeterminism of PROLOG execution requires that a block of control information, or choice point, be stored for each procedure call when there are other candidate clauses to be used. When the currently selected clause fails, the bindings made by the clause must be undone, the stored choice point is reactivated, and then another of the candidate clauses is chosen to run on it. Storing and reactivating choice points and undoing bindings account for the large overhead required to control PROLOG execution, which is quite different from conventional programs. This paper focuses on the techniques used in the Sequential PROLOG Engine (SPE) to reduce the overhead of control operations. The control instructions of SPE store no more choice points than necessary. Its architecture analyses the potential parallelism in the control operations and exploits a fraction of it, for cost-effectiveness reasons. The results of executing two sample programs on SPE, in the form of hand timings, are presented and favor the approach.

5.

Timing-sequence testing of parallel programs

LING Yu LI Shu ZHANG Hui HAN Chengde 《计算机科学技术学报》2000,15(1):84-95

Testing of parallel programs involves two parts: testing of control flow within the processes and testing of timing-sequence. This paper focuses on the latter, particularly on the timing-sequence of message-passing paradigms. First, the coarse-grained SYN-sequence model is built to describe the execution of distributed programs; all of the topics discussed in this paper are based on it. The most direct way to test a program is to run it. A fault-free parallel program should have both correct computing results and a proper SYN-sequence. In order to analyze the validity of an observed SYN-sequence, this paper presents the formal specification (Backus Normal Form) of the valid SYN-sequence. So far there has been little work on testing coverage for distributed programs. Calculating the number of valid SYN-sequences is the key to the coverage problem, but that number is extremely large and it is very hard to obtain the combination law among SYN-events. To resolve this problem, this paper proposes an efficient testing strategy, atomic SYN-event testing, which first linearizes the SYN-sequence (making it consist only of serial atomic SYN-events) and then tests each atomic SYN-event independently. In particular, this paper provides the formula for calculating the number of valid SYN-sequences for tree-topology atomic SYN-events (broadcast and combine). Furthermore, the number of valid SYN-sequences also, to some degree, mirrors the testability of parallel programs. Taking the tree-topology atomic SYN-event as an example, this paper demonstrates its testability and communication speed under different numbers of branches in order to achieve a more satisfactory tradeoff between testability and communication efficiency.

6.

A Fast Parallel Frequency-Domain Watermarking Algorithm

Soha S. Zaghloul Amira Al-Othman 《通讯和计算机》2014,(4):388-394

The aim of this research is to develop a faster watermarking algorithm based on the frequency domain. A sequential algorithm is picked and implemented, and parallelism is exploited in order to achieve a shorter execution time. Both DWT (discrete wavelet transform) and DCT (discrete cosine transform) are applied. Frequency domain watermarking techniques are known to be more robust. In addition, the algorithm falls under the blind category, which implies a higher degree of security. A quad-core Intel Core i7-3630QM processor is used; the CPU (central processing unit) runs at 2.4 GHz with 6 GB of RAM. MATLAB R2012a is used under the Microsoft Windows 7 operating system. Two main lines of experiments are conducted, namely the association of hosts to watermarks and the measurement of program speedup. Speedup is measured for both embedding and extraction operations on both dual-core and quad-core configurations. Results reveal a speedup of more than 200% compared to the sequential algorithm.

7.

Automatic recovery from resource exhaustion exceptions by collecting leaked resources

Zi-ying Dai Xiao-guang Mao Li-qian Chen Yan Lei 《浙江大学学报:C卷英文版》2014,15(8):622-635

8.

Fault Tolerance and Recovery for Group Communication Services in Distributed Networks

王跃华周忠吴威《计算机科学技术学报》2012,27(2):298-312

Group communication services (GCSs) are becoming increasingly important as a wide field of promising applications has emerged to serve millions of users distributed across the world. However, it is challenging to make the service fault tolerant and scalable enough to fulfill the voluminous demand of users in a distributed network (DN). While many reliable group communication protocols have been dedicated to addressing this challenge so as to accommodate changes in the network, they are often costly or require complicated strategies to handle the service interruptions caused by node departures or link failures, which hinders the practicability of the service. In this paper, we present two schemes to address these challenges. The first is a location-aware replication scheme called NS, which places replicas in a dispersed fashion so that the services on nodes gain immunity to failures with different patterns (e.g., network partition and single point failure) while keeping replication overhead low. The second is a novel failure recovery scheme that exploits the independence between service recovery and structure recovery in the time domain to achieve quick failure recovery. Our simulation results indicate that the two proposed schemes outperform the existing schemes and simple alternative schemes in service success rate, recovery latency, and communication cost.

9.

Integrating Parallelizing Compilation Technologies for SMP Clusters

Xiao-Bing Feng Li Chen Yi-Ran Wang Xiao-Mi An Lin Ma Chun-Lei Sang Zhao-Qing Zhang 《计算机科学技术学报》2005,20(1):0-0

In this paper, a source-to-source parallelizing compiler system, AutoPar, is presented. The system transforms FORTRAN programs into multi-level hybrid MPI/OpenMP parallel programs. Integrated parallel optimizing technologies are utilized extensively to derive an effective program decomposition in the whole program scope. Other features, such as synchronization optimization and communication optimization, improve the performance scalability of the generated parallel programs at both the intra-node and inter-node levels. The system makes great effort to boost automation of parallelization. Profiling feedback is used in performance estimation, which is the basis of automatic program decomposition. Performance results for eight benchmarks in NPB1.0 from NAS on an SMP cluster are given, and the speedup is desirable. It is noticeable that in the experiment, at most one data distribution directive and a reduction directive are inserted by the user in BT/SP/LU. The compiler is based on ORC, the Open Research Compiler. ORC is a powerful compiler infrastructure, with such features as robustness, flexibility and efficiency. The strong analysis capability and well-defined infrastructure of ORC make the system implementation quite fast.

10.

A note on the quadratic assignment problem

Wajeb Gharib Gharibi Omar Saeed Al-Mushayt 《通讯和计算机》2009,6(11):1-7,29

The quadratic assignment problem (QAP) is one of the most interesting and challenging combinatorial optimization problems in existence and one of the most difficult problems in the NP-hard class. Many real-life problems in areas such as facilities location, parallel and distributed computing, combinatorial data analysis, and combinatorial optimization, which have many applications in computer engineering and industry, can be formulated as a QAP. In this paper, the authors give a short note on the QAP, presenting their rounding approach together with a survey of other algorithms that are used to solve this important problem.

11.

Compiler to interpreter: experiences with a distributed programming language

Robert M. Gebala Carole M. McNamee Ronald A. Olsson 《Software》2001,31(9):893-909

One interpretive approach for handling concurrency is to provide an interpreter instance for each executing language-level process. Such an approach has mainly been applied to concurrent implementations of logic and functional languages. This paper describes the use of this approach in constructing an interpreter for an imperative, distributed programming language from an existing compiler and run-time support system (RTS). Primary design goals were to exploit the existing compiler to the extent possible as well as to have minimal impact on the RTS used to support concurrency. We have been successful in meeting these goals. Additionally, performance results show our interpreter's execution times compare favorably to the times required for compilation, linkage, and execution of small programs or programs with a significant number of calls to the RTS; on such programs, our interpreter's performance also compares favorably to that of the standard Java implementation. However, for larger programs and programs with fewer calls to the underlying RTS, the conventional compiler-based implementation outperforms the interpreter implementation. For many distributed programs in which network costs dominate, the performances of the two implementations differ little. Copyright © 2001 John Wiley & Sons, Ltd.

12.

Update transport: a new technique for update synchronization in replicated database systems

Singhal M. 《IEEE Transactions on Software Engineering》1990,16(12):1325-1336

A fully distributed approach to update synchronization is presented where each site completely executes every update. This approach has several features: higher resiliency to different kinds of failures, higher parallelism, improved response to user requests, and low communication overhead. A fully distributed algorithm for concurrency control, obtained by rehashing a previously published semidistributed algorithm into the fully distributed model of update execution, is presented. A performance model of replicated database systems is presented and used to study the performance of the proposed algorithm and its semidistributed version. The results of the performance study reveal that the proposed approach can substantially improve performance at the cost of moderate input/output overhead.

13.

Testing and debugging distributed programs using global predicates

Venkatesan S. Dathan B. 《IEEE Transactions on Software Engineering》1995,21(2):163-177

Testing and debugging programs are more involved in distributed systems than in uniprocessor systems because of the presence of the communication medium and the inherent concurrency. Past research has established that predicate testing is an approach that can alleviate some of the problems in this area. However, checking whether a general predicate is true in a particular distributed execution appears to be a computationally hard problem. This paper considers a class of predicates called conjunctive form predicates (CFP) that is quite useful in distributed program development, yet can be tested efficiently. We develop fully distributed algorithms to test CFPs, prove that these algorithms are correct, and analyze their message complexity. The analysis shows that our techniques incur a fairly low overhead on the distributed system.
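
To make the idea of testing a conjunctive predicate concrete, here is a minimal sketch. It is not the fully distributed algorithms of the paper above; it is a simpler centralized check, assuming each process reports, in local order, the vector-clock timestamps of the states in which its own local predicate held. The names `happened_before` and `find_consistent_cut` and the use of vector clocks are assumptions made for illustration. The conjunction holds in some consistent global state if one candidate per process can be chosen so that no chosen state happened before another.

```python
from typing import List, Optional, Sequence, Tuple

Clock = Tuple[int, ...]  # assumed: one vector-clock entry per process


def happened_before(a: Clock, b: Clock) -> bool:
    """True if the state stamped `a` causally precedes the state stamped `b`."""
    return a != b and all(x <= y for x, y in zip(a, b))


def find_consistent_cut(candidates: Sequence[List[Clock]]) -> Optional[List[Clock]]:
    """Pick one candidate state per process so that all picks are pairwise concurrent.

    candidates[i] lists, in local order, the states of process i where its
    local predicate was true. Returns the chosen states, or None if no
    consistent combination exists.
    """
    heads = [0] * len(candidates)
    while all(h < len(queue) for h, queue in zip(heads, candidates)):
        current = [queue[h] for h, queue in zip(heads, candidates)]
        advanced = False
        for i, a in enumerate(current):
            # A state that precedes another process's current candidate also
            # precedes every later candidate of that process, so discard it.
            if any(happened_before(a, b) for j, b in enumerate(current) if j != i):
                heads[i] += 1
                advanced = True
        if not advanced:
            return current  # no pair is causally ordered: a consistent cut
    return None


# Process 0 has two candidate states, process 1 has one.
cut = find_consistent_cut([[(1, 0), (3, 0)], [(2, 1)]])
no_cut = find_consistent_cut([[(1, 0)], [(2, 1)]])
```

In the first call the state `(1, 0)` precedes `(2, 1)` and is discarded, after which `(3, 0)` and `(2, 1)` are concurrent; in the second call process 0 runs out of candidates, so no consistent cut exists.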

14.

High-performance message striping over reliable transport protocols

Nader Mohamed Jameela Al-Jaroodi Hong Jiang David Swanson 《The Journal of supercomputing》2006,38(3):261-278

This paper introduces a high-performance middleware-level message striping approach to increase communication bandwidth for data transfer in heterogeneous clusters equipped with multiple networks. In this scheme, concurrency is used for the striping process. The proposed striping approach is designed to work at the middleware level, between the distributed applications and reliable transport protocols such as TCP. The middleware-level striping approach provides a flexible, scalable, and hardware-, network-, and operating system-independent communication bandwidth solution. In addition, techniques to enhance the performance of this approach over multiple networks are introduced. The proposed techniques, which minimize synchronization contention and eliminate the striping sequence header, rely on the features of a reliable transport protocol such as TCP to reduce some of the concurrent striping overhead. The techniques have been implemented and evaluated on a real cluster with multiple networks, and the results show significant performance gains for data transfer over existing approaches.
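
The following is a minimal sketch of header-free striping, not the implementation evaluated in the paper above. It assumes that each channel delivers its chunks reliably and in order (as a TCP connection would), which is what lets the receiver reassemble the message by reading the channels round-robin without any per-chunk sequence header. Channels are modeled as in-memory lists, the concurrency of the real scheme is omitted, and the names `stripe`, `unstripe`, and `chunk_size` are hypothetical.

```python
from typing import List


def stripe(message: bytes, n_channels: int, chunk_size: int) -> List[List[bytes]]:
    """Split `message` into chunks and deal them round-robin onto the channels."""
    channels: List[List[bytes]] = [[] for _ in range(n_channels)]
    for index, start in enumerate(range(0, len(message), chunk_size)):
        channels[index % n_channels].append(message[start:start + chunk_size])
    return channels


def unstripe(channels: List[List[bytes]]) -> bytes:
    """Rebuild the message by reading one chunk per channel in the same order.

    No sequence header is needed: in-order delivery on every channel plus the
    fixed round-robin order determines where each chunk belongs.
    """
    parts: List[bytes] = []
    positions = [0] * len(channels)
    turn = 0
    while positions[turn] < len(channels[turn]):
        parts.append(channels[turn][positions[turn]])
        positions[turn] += 1
        turn = (turn + 1) % len(channels)
    return b"".join(parts)


striped = stripe(b"abcdefg", n_channels=2, chunk_size=2)
restored = unstripe(striped)
```

Here `striped` holds the chunks `ab` and `ef` on the first channel and `cd` and `g` on the second, and `restored` equals the original message.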

15.

Accelerating sequential programs on commodity multi-core processors

Yuanming Zhang Gang Xiao Takanobu Baba 《Journal of Parallel and Distributed Computing》2014

A recently proposed pipelined multithreading (PMT) technique exhibits wide applicability in parallelizing general sequential programs on multi-core processors. However, significant inter-core communication overhead limits PMT performance and prevents its commercial utilization. A simple and effective clustered pipelined multithreading (CPMT) approach is presented to accelerate sequential programs on commodity multi-core processors. This CPMT technique adopts a clustered communication mechanism that can yield very low average communication overhead by eliminating false sharing as well as reducing communication operation and transit delays in the software-only approach. A single-producer/single-consumer concurrent lock-free clusteredQueue algorithm based on a two-level queue structure is also proposed. The accuracy of CPMT is theoretically demonstrated. The performances of the algorithm and CPMT are evaluated on a commodity AMD Phenom four-core processor. Given an appropriate parameter, the enqueue and dequeue times of the algorithm are 20.8 and 23 cycles, respectively. The speedup of CPMT ranges from 13.1% to 119.8% for typical loops extracted from the SPEC CPU 2000 benchmark suite.

16.

Visualization of Message Passing Parallel Programs with the TOPSYS Parallel Programming Environment

《Journal of Parallel and Distributed Computing》1993,18(2):118-128

Parallel programming is orders of magnitude more complex than writing sequential programs. This is particularly true for programming distributed memory multiprocessor architectures based on message passing programming models. Apart from understanding the sequential parts of the parallel program, new degrees of freedom lead to additional problems. Understanding the synchronization and communication behavior of parallel programs is the most critical issue in programming distributed memory multiprocessors. The paper describes methods and tools for visualization and animation of the dynamic execution of parallel programs. Based on an evaluation and classification of existing visualization environments, the visualization and animation tool VISTOP (VISualization TOol for Parallel Systems) is presented as part of the integrated tool environment TOPSYS (TOols for Parallel SYStems) for programming distributed memory multiprocessors. VISTOP supports the interactive on-line visualization of message passing programs based on various views; in particular, a process graph based concurrency view for detecting synchronization and communication bugs.

17.

Parallel execution of Prolog with granularity control

Lourdes Araujo Jose J. Ruz 《Future Generation Computer Systems》1998,13(6):421-441

This paper presents a system for parallel execution of Prolog supporting both independent conjunctive and disjunctive parallelism. The system is intended for distributed memory architectures and is composed of a set of workers with a hierarchically structured scheduler. The execution model has been designed in such a way that each worker's environment does not contain references to terms in other environments, thus reducing communication overhead. In order to guarantee that exploiting parallelism improves performance, granularity control has been introduced for each kind of parallelism. For conjunctive parallelism, PDP applies a control based on the estimation provided by CASLOG. The features of the system allow this control to be introduced without adding overhead. For disjunctive parallelism, PDP controls granularity by applying a heuristic-based method, which can be adapted to other parallel Prolog systems. Different scheduling policies have also been tested. The system has been implemented on a transputer network, and performance results show that it provides a high speedup for coarse grain parallel programs.

18.

A Distributed Real-Time Transaction Scheduling Algorithm

刘云生覃飙杨进才《小型微型计算机系统》2003,24(6):962-965

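Traditional optimistic concurrency control strategies rely on some unnecessary transaction restarts to guarantee data consistency. Transaction restarts can greatly increase system load and intensify contention for resources and data, and in a distributed environment the complexity of the system and the higher communication overhead aggravate the problem. To address this, the paper proposes a new optimistic concurrency control strategy that avoids unnecessary transaction restarts by dynamically adjusting the serialization order of transactions. For implementing this strategy in a distributed real-time environment, and considering the real-time requirements of distributed transactions, the paper proposes separating the write phase from the critical section and uses an ordered locking strategy to guarantee the correctness of distributed transaction execution. Finally, a correctness proof of the implementation method is given.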

19.

Parallel computing optimization in the Apollo domain network

Pekergin M.F. 《IEEE Transactions on Software Engineering》1992,18(4):296-303

The performance of parallel computing in a network of Apollo workstations where the processes use the remote procedure call (RPC) mechanism for communication is addressed. The speedup in such systems cannot be accurately estimated without taking into account the relatively large communication overheads. Moreover, the speedup decreases as parallelism increases beyond a certain limit. To estimate the speedup and determine the optimum degree of parallelism, the author characterizes the parallelization and the communication overheads in the system considered. Then, parallel applications are modeled and their execution times are expressed for the general case of nonidentical tasks and workstations. The general case study allows the structural constraints of the applications to be taken into account by permitting their partitioning into heterogeneous tasks. A simple expression of the optimum degree of parallelism is obtained for identical tasks where the inherent constraints are neglected. The fact that the theoretical maximum speedup is bounded by half of the optimum degree of parallelism shows the importance of this measure.
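
As a rough illustration of the trade-off in the abstract above, the sketch below uses a simple cost model that is an assumption for illustration and is not taken from the paper: total work `work` is divided evenly over `n` workstations, and communication overhead grows linearly as `overhead * n`. Under that assumption the execution time is T(n) = work/n + overhead·n, the optimum degree of parallelism is n* = sqrt(work/overhead), and the speedup at n* works out to n*/2, which is consistent with the bound quoted in the abstract. All function and parameter names are hypothetical.

```python
import math


def exec_time(work: float, overhead: float, n: int) -> float:
    """Assumed model: evenly divided work plus overhead that is linear in n."""
    return work / n + overhead * n


def speedup(work: float, overhead: float, n: int) -> float:
    """Speedup relative to a sequential run that pays no communication cost."""
    return work / exec_time(work, overhead, n)


def optimum_parallelism(work: float, overhead: float) -> float:
    """Minimizer of work/n + overhead*n, found by setting the derivative to zero."""
    return math.sqrt(work / overhead)


n_star = optimum_parallelism(400.0, 1.0)        # 20.0
best = speedup(400.0, 1.0, int(n_star))         # 10.0, i.e. n_star / 2
beyond = speedup(400.0, 1.0, 40)                # 8.0: more workers, less speedup
```

With these numbers, doubling the number of workstations from 20 to 40 lowers the speedup from 10 to 8, which mirrors the observation that speedup falls once parallelism exceeds a certain limit.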

20.

A study of achievable speedup in distributed simulation via NULL messages

Kumar D. Harous S. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(3):347-354

The results of an experimental study on distributed simulation of three open queuing networks are reported. The distributed simulation scheme considered is a simple variation of the scheme given by K.M. Chandy and J. Misra (1979) using NULL messages. A new approach is used to study the relationship between the overhead and performance of a distributed simulator, and the approach is illustrated by studying these three example networks. Two measures of ideal speedup of distributed simulation over sequential simulation are defined and measured. These values of ideal speedup are much less than simply the number of processors, and hence provide a more realistic value for the ideal speedup.