1.

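张辰 孟凡荣 《计算机工程与设计》2004,25(12):2245-2246,2260

Clusters have matured as a low-cost, high-performance platform for parallel or distributed computing, and software packages that provide many key services (efficient communication, load balancing, and so on) have appeared. However, effective support for the scalability and high availability of cluster file systems is still lacking. To address this, the paper introduces a cluster file system model designed for scalability and high availability, and compares it with the Cluster File System (CFS) from the standpoint of availability.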

2.

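DCC: a node-based distributed cooperative cache management algorithm. Total citations: 2; self-citations: 2; citations by others: 0

郑晓薇 郑纬民 沈美明 《计算机研究与发展》1999,36(9):1057-1061

Workstation cluster systems need a high-performance parallel file system to meet the demands of high-speed input/output data processing, and the key to improving a parallel file system is cooperative caching. This paper proposes DCC, a node-based distributed cooperative cache management algorithm. The algorithm combines the advantages of the manager-based and local-information-based algorithms. It introduces a master-block information station located on each node machine, and tracks global information by maintaining the information held in these stations. The algorithm adopts an aggressive local-information maintenance strategy, which improves the accuracy of master-block location decisions. Compared with the GMIS algorithm, Hi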

3.

Supporting Cost-Effective Fault Tolerance in Distributed Message-Passing Applications with File Operations. Total citations: 1; self-citations: 0; citations by others: 1

Ouyang Jinsong Maheshwari Piyush 《The Journal of supercomputing》1999,14(3):207-232

In this paper we present an approach to reliable distributed computing, which incorporates fault tolerance into applications at low cost, in terms of both run-time performance and programming effort required to construct reliable application software. In our model fault tolerance is based on distributed consistent checkpointing and rollback-recovery integrated with a user-level reliable transmission protocol. By employing novel techniques and algorithms, our approach is distinguished from other consistent checkpointing schemes by the following features: first, minimum communication overhead for constructing a consistent distributed checkpoint and catching messages in transit during checkpointing; second, tolerance to message losses due to site failures or unreliable non-FIFO networks; and third, efficient checkpointing and recovery of persistent state, i.e., user files. Based on the model, a software library prototype called Libra has been implemented for supporting fault tolerance in distributed message-passing applications with file operations. The library provides an easy-to-use programming interface including message-passing and file I/O primitives, which hides the complexity of both fault-tolerant network communications and checkpointing and recovering user files from the application level. Experience with a number of long-running distributed applications shows that Libra can provide fault tolerance in a cost-effective manner.

4.

Applications of machine learning approach on multi-queue message scheduling

Mu-Song Chen Hao-Wei Yen 《Expert systems with applications》2011,38(4):3323-3335

Due to limited resource contentions and deadline constraints, messages on the controller area network (CAN) are competing for service from the common resources. This problem can be resolved by assigning priorities to different message classes to satisfy time-critical applications. Actually, because of the fluctuation of network traffic or an inefficient use of resources, these static or dynamic priority policies may not guarantee flexibility for different kinds of messages in real-time scheduling. Consequently, message transmissions that cannot comply with the timing requirements or deadlines may deteriorate system performance significantly. In this paper, we have proposed a controller-plant model, where the plant is analogous to a message queue pool (MQP) and the message scheduling controller (MSC) is responsible for dispatching resources for queued messages according to the feedback information from the MQP. The message scheduling controller, which is realized by the radial basis function (RBF) network, is designed with a machine learning algorithm to compensate for the variations in plant dynamics. The MSC with the novel hybrid learning schemes can ensure a low and stable message waiting time variance (or a uniform distribution of waiting time) and lower transmission failures. A significant emphasis of the MSC is the variable structure of the RBF model to accommodate complex scheduling situations. Simulation experiments have shown that several variants of the MSC significantly improve overall system performance over the static scheduling strategies and the dynamic earliest-deadline-first (EDF) algorithms under a wide range of workload characteristics and execution environments.

5.

A comparison of JIAJIA and MPI on PC clusters. Total citations: 3; self-citations: 2; citations by others: 3

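胡明昌 史岗 胡伟武 唐志敏 张福新 《软件学报》2003,14(7):1187-1194

This paper compares JIAJIA and MPI (Message Passing Interface), which represent the shared-memory and message-passing programming models respectively. MPI transfers data explicitly, which makes programming complex; JIAJIA maintains data consistency in its lower layers and additionally provides simple message-passing functions, which makes programming easy and flexible. JIAJIA incurs higher overhead when allocating shared memory, and its initialization takes longer than MPI's. The paper proposes an approximate empirical formula relating parallel speedup to the number of processes, and from it derives the conclusion that the performance gap between JIAJIA and MPI widens as the number of processes grows. Test results show that for most applications the parallel performance gap between the JIAJIA and MPI versions does not exceed 10%. For applications with very little communication, the gap is small; for applications with inherently heavy communication, the gap depends mainly on the actual communication volume generated at run time.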

6.

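A secure routing algorithm for aeronautical ad hoc networks based on network coding

罗长远 庞松超 《计算机应用研究》2017,34(4)

To address the low routing reliability and poor security of aeronautical ad hoc networks, this paper proposes NC-SRP, a secure routing algorithm based on network coding. The algorithm uses geographic location information to determine cooperative coding clusters and then builds a multipath transmission network, which preserves the anonymity of the source and destination nodes. Each message is encoded, then split and forwarded together with its coding vector. Nodes within a cooperative cluster re-encode and multicast the message, and re-encode the accumulated coding vector before forwarding it in dispersed form, so that message security is ensured without any keys. Theoretical analysis and simulation experiments show that NC-SRP improves message security while exploiting the advantages of network coding to improve routing performance.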

7.

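Design of a message-memory network interface for network parallel computing systems. Total citations: 4; self-citations: 0; citations by others: 4

武剑锋 李三立 戈弋 《计算机学报》2000,23(2):195-201

Through a qualitative analysis of typical parallel applications, this paper proposes and defines the message-passing independence factor R, i.e., the proportion of the overall message passing accounted for by transfers of data in the heap. Trace statistics are then collected for a set of typical parallel applications in a real NPC (network parallel computing) environment, confirming the analysis that R is close to 1. Based on this qualitative analysis and the quantitative statistical results, and taking advances in memory technology into account, a message memory is introduced on the network interface in the NPC, so that each node in the NPC can directly access the message memory of other nodes. The paper concludes that, in an NPC whose network interface is equipped with a message memory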

8.

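Fault-tolerant MPI implementation techniques using a failover strategy

周恩强 卢宇彤 《计算机工程》2004,30(17):89-91

Targeting communication faults in cluster systems, this paper studies how to use failover at the message-passing layer to achieve transparent fault tolerance in the communication subsystem. It also describes an implementation of fault-tolerant MPI (R-MPI) based on the high-performance communication interface NICHAL. Test data show that the implementation makes effective use of TRDMA features to realize a fault-tolerant communication protocol.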

9.

Communication styles for parallel systems

Gross T. Hinrichs S. O'Hallaron D.R. Stricker T. Hasegawa A. 《Computer》1994,27(12):34-44

Distributed-memory parallel systems rely on explicit message exchange for communication, but the communication operations they support can differ in many aspects. One key difference is the way messages are generated or consumed. With systolic communication, a message is transmitted as it is generated. For example, the result computed by the multiplier is sent directly to the communication subsystem for transmission to another node. With memory communication, the complete message is generated and stored in memory, and then transmitted to its destination. Since sender and receiver nodes are individually controlled, they can use different communication styles. One example of memory communication is message passing: both the sender and receiver buffer the message in memory. These two communication styles place different demands on processor design. This article illustrates each style's effect on processor resources for some key application kernels. We are targeting the iWarp system because it supports both communication styles. Two parallel-program generators, one for each communication style, automatically map the sample programs

10.

A real-time primary-backup replication service

Hengming Zou Jahanian F. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(6):533-548

This paper presents a real-time primary-backup replication scheme to support fault-tolerant data access in a real-time environment. The main features of the system are fast response to client requests, bounded inconsistency between primary and backup, temporal consistency guarantee for replicated data, and quick recovery from failures. The paper defines external and interobject temporal consistency, the notion of phase variance, and builds a computation model that ensures such consistencies for replicated data deterministically where the underlying communication mechanism provides deterministic message delivery semantics and probabilistically where no such support is available. It also presents an optimization of the system and an analysis of the failover process which includes failover consistency and failure recovery time. An implementation of the proposed scheme is built within the x-kernel architecture on the MK 7.2 microkernel from the Open Group. The results of a detailed performance evaluation of this implementation are also discussed.

11.

Software-based rerouting for fault-tolerant pipelined communication. Total citations: 1; self-citations: 0; citations by others: 1

Young-Joo Suh Dao B.V. Yalamanchili S. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(3):193-211

This paper presents a software-based approach to fault-tolerant routing in networks using wormhole or virtual cut-through switching. When a message encounters a faulty output link, it is removed from the network by the local router and delivered to the messaging layer of the local node's operating system. The message passing software can reroute this message, possibly along nonminimal paths. Alternatively, the message may be addressed to an intermediate node, which will forward the message to the destination. A message may encounter multiple faults and pass through multiple intermediate nodes. The proposed techniques are applicable to both obliviously and adaptively routed networks. The techniques are specifically targeted toward commercial multiprocessors where the mean time to repair (MTTR) is much smaller than the mean time between router failures (MTBF), i.e., it is sufficient to tolerate a maximum of three failures. This paper presents requirements for buffer management, deadlock freedom, and livelock freedom. Simulation results are presented to evaluate the degradation in latency and throughput as a function of the number and distribution of faults. There are several advantages of such an approach. Router designs are minimally impacted, and thus remain compact and fast. Only messages that encounter faulty components are affected, while the machine is ensured of continued operation until the faulty components can be replaced. The technique leverages existing network technology, and the concepts are portable across evolving switch and router designs. Therefore, we feel that the technique is a good candidate for incorporation into the next generation of multiprocessor networks.

12.

Efficient communication using message prediction for clusters of multiprocessors

Ahmad Afsahi Nikitas J. Dimopoulos 《Concurrency and Computation》2002,14(10):859-883

With the increasing uniprocessor and symmetric multiprocessor computational power available today, interprocessor communication has become an important factor that limits the performance of clusters of workstations/multiprocessors. Many factors including communication hardware overhead, communication software overhead, and the user environment overhead (multithreading, multiuser) affect the performance of the communication subsystems in such systems. A significant portion of the software communication overhead belongs to a number of message copying operations. Ideally, it is desirable to have a true zero‐copy protocol where the message is moved directly from the send buffer in its user space to the receive buffer in the destination without any intermediate buffering. However, due to the fact that message‐passing applications at the send side do not know the final receive buffer addresses, early arrival messages have to be buffered at a temporary area. In this paper, we show that there is a message reception communication locality in message‐passing applications. We have utilized this communication locality and devised different message predictors at the receiver sides of communications. In essence, these message predictors can be efficiently used to drain the network and cache the incoming messages even if the corresponding receive calls have not yet been posted. The performance of these predictors, in terms of hit ratio, on some parallel applications are quite promising and suggest that prediction has the potential to eliminate most of the remaining message copies. We also show that the proposed predictors do not have sensitivity to the starting message reception call, and that they perform better than (or at least equal to) our previously proposed predictors. Copyright © 2002 John Wiley & Sons, Ltd.

13.

Fault-aware Communication Mapping for NoCs with Guaranteed Latency

Sorin Manolache Petru Eles Zebo Peng 《International journal of parallel programming》2007,35(2):125-156

As feature sizes shrink, transient failures of on-chip network links become a critical problem. At the same time, many applications require guarantees on both message arrival probability and response time. We address the problem of transient link failures by means of temporally and spatially redundant transmission of messages, such that designer-imposed message arrival probabilities are guaranteed. Response time minimisation is achieved by a heuristic that statically assigns multiple copies of each message to network links, intelligently combining temporal and spatial redundancy. Concerns regarding energy consumption are addressed in two ways. First, we reduce the total amount of transmitted messages, and, second, we minimise the application response time such that the resulting time slack can be exploited for energy savings through voltage reduction. The advantages of the proposed approach are guaranteed message arrival probability and guaranteed worst case application response time.

14.

A study of e‐mail patterns

Sam Shah Brian D. Noble 《Software》2007,37(14):1515-1538

Although electronic mail is an increasingly important service, there are few empirical studies of e‐mail traffic. We have observed over 2.85 million messages passing through our departmental servers over the course of seven months, and derived distributions that approximate several important e‐mail parameters including message sizes, message senders and receivers and the burstiness of message deliveries. Our work is unique in that we also analyse message payloads: attachment content types, e‐mail redundancy, and the use of e‐mail as a sharing mechanism. These data can be used in developing e‐mail workloads for mail system engineering or benchmarking. To this end, we provide an improved version of Postmark, a small‐file Internet benchmark, that better approximates mail server characteristics. Copyright © 2007 John Wiley & Sons, Ltd.

15.

Dealing with network partitions in structured overlay networks

Tallat M. Shafaat Ali Ghodsi Seif Haridi 《Peer-to-Peer Networking and Applications》2009,2(4):334-347

Structured overlay networks form a major class of peer-to-peer systems, which are touted for their abilities to scale, tolerate failures, and self-manage. Any long-lived Internet-scale distributed system is destined to face network partitions. Although the problem of network partitions and mergers is highly related to fault-tolerance and self-management in large-scale systems, it has hardly been studied in the context of structured peer-to-peer systems. These systems have mainly been studied under churn (frequent joins/failures), which as a side effect solves the problem of network partitions, as it is similar to massive node failures. Yet, the crucial aspect of network mergers has been ignored. In fact, it has been claimed that ring-based structured overlay networks, which constitute the majority of the structured overlays, are intrinsically ill-suited for merging rings. In this paper, we present an algorithm for merging multiple similar ring-based overlays when the underlying network merges. We examine the solution in dynamic conditions, showing how our solution is resilient to churn during the merger, something widely believed to be difficult or impossible. We evaluate the algorithm for various scenarios and show that even when falsely detecting a merger, the algorithm quickly terminates and does not clutter the network with many messages. The algorithm is flexible as the tradeoff between message complexity and time complexity can be adjusted by a parameter.

16.

Building a reliable and high-performance content-based publish/subscribe system

Yaxiong Zhao Jie Wu 《Journal of Parallel and Distributed Computing》2013

17.

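THNPSC-1: a network parallel supercomputing system. Total citations: 2; self-citations: 0; citations by others: 2

李三立 都志辉 马群生 王小鸽 《计算机学报》2001,24(6):627-632

Network parallel computing (also called cluster computing) is an important way to achieve high-performance computing. This paper introduces THNPSC-1, a network parallel supercomputing system developed at Tsinghua University. The system is built from Pentium III SMP compute nodes. Two high-speed networks provide the interconnect: one is THNet, a self-developed crossbar switch network with dynamic arbitration and routing, and the other is 100 Mbps Ethernet. THSwitch, the crossbar switch in THNet, is built from ALTERA FPGA chips with 150,000 gates; THNet also includes THNIA, a network adapter with a DMA engine. Each THNet port provides a data transfer rate of 1.056 Gbps, and the aggregate bandwidth reaches 8.4 Gbps. Using techniques such as fixed user buffers and extended active messages, THNet performs message passing at the user level, bypasses operating system calls, and achieves zero-copy message passing. Ping-pong test results show that one-way message-passing latency can be reduced to 8 μs. The THNet software includes the THNIA driver and a function library that supports user-level communication. The paper also briefly compares related work and describes how the system has been applied.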

18.

Adaptive scheduling for integrated traffic on WDM optical networks

Maode Ma Xiaohong Huang 《Computer Networks》2004,44(6):581

One of the important issues in the design of future generation of high-speed networks is to provide differentiated service to different types of traffic with various time constraints. In this paper, we study the problem of providing real-time service to either hard or soft real-time messages and normal transmission service to variable-length messages without time constraints in WDM optical networks. We propose an adaptive scheduling algorithm for scheduling message transmissions in order to improve the network performance when both real-time and non real-time messages are transmitted in one topology. We have analyzed the complexity of the algorithm to show its feasibility. We have conducted extensive discrete-event simulations to evaluate the performance of the proposed algorithm. The study suggests that when scheduling message transmission in WDM networks differentiated services should be considered in order to meet time constraints of real-time messages while non real-time messages are being served so that the overall performance of the network could be improved.

19.

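Design of a reliable RDMA transport protocol based on dynamic connections

刘路 张磊 曹继军 戴艺 《计算机工程与科学》2012,34(8):184-190

Future 100P/E-scale high-performance computer systems place higher demands on network transmission reliability, performance balance, and scalability. The RDMA transport model proposed in this paper achieves reliable end-to-end data transmission by configuring a small amount of resources and using connections dynamically. Compared with traditional reliable communication protocols such as InfiniBand, the scheme has the following advantages: (1) it supports automatic rerouting, bypassing faulty network regions to guarantee reliable message delivery; (2) it supports out-of-order packet arrival and multipath transmission between source and destination, and provides a message flow-control mechanism, which balances overall network performance well, reduces network hot spots, and relieves network congestion; (3) the reliability data structures are implemented in the communication interface hardware, so no main memory is consumed to establish connections for communication, which gives very high system scalability. Preliminary test results show that, with the optimizations applied, the protocol does not increase the transmission latency of messages smaller than 4 KB.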

20.

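A location-transparent reliable message-passing mechanism for MA. Total citations: 1; self-citations: 0; citations by others: 1

杨娟 李建国 《计算机应用》2004,24(3):25-26,30

Communication in mobile agent systems is mostly implemented with RMI plus a message-sending mechanism. Improving on the three existing mainstream message-sending mechanisms, this paper proposes a new message-forwarding strategy, the Resource Distributed Model (RDM). RDM provides an addressing strategy that resembles a combination of the Home agent and path-forwarding approaches, so that messages can reach their targets. The RDM-based Mobile Service (MS) forwards messages quickly after fast addressing. MS reduces the amount of messages held in the message-buffering component, and multiple MS instances can perform addressing simultaneously to increase message-sending speed.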