1.

A tree-based particle swarm optimization for multicast routing

Hua Wang Xiangxu Meng Shuai Li Hong Xu 《Computer Networks》2010,54(15):2775-2786

QoS multicast routing is a non-linear combinatorial optimization problem. It tries to find a multicast routing tree with minimal cost that can satisfy constraints such as bandwidth, delay, and delay jitter. This problem is NP-complete. The usual approach to such problems is to search first for paths from the source node to each destination node and then integrate these paths into a multicast tree. Such a method, however, is slow and complex. To overcome these shortcomings, we propose a new method for tree-based optimization. Our algorithm optimizes the multicast tree directly, unlike conventional solutions that find paths and then integrate them into a multicast tree. Our algorithm also applies particle swarm optimization to control the optimization orientation of the tree shape. Simulation results show that our algorithm performs well in terms of search ability, convergence speed, and adaptability to scale.

2.

An effective delay-constrained multicast routing algorithm

黄勇《电脑与信息技术》2002,10(6):16-18

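In multimedia communication networks, multicast poses new requirements: besides minimizing the cost of multicast communication, every destination node must receive the information within a fixed delay. In this paper, we propose an edge selection function for solving the delay-constrained multicast problem. Our experimental results show that this function yields approximate solutions to the multicast routing problem that satisfy the delay constraint at low cost.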

3.

Moving address translation closer to memory in distributed shared-memory multiprocessors

Qiu X. Dubois M. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(7):612-623

To support a global virtual memory space, an architecture must translate virtual addresses dynamically. In current processors, the translation is done in a TLB (translation lookaside buffer), before or in parallel with the first-level cache access. As processor technology improves at a rapid pace and the working sets of new applications grow insatiably, the latency and bandwidth demands on the TLB are difficult to meet, especially in multiprocessor systems, which run larger applications and are plagued by the TLB consistency problem. We describe and compare five options for virtual address translation in the context of distributed shared memory (DSM) multiprocessors, including CC-NUMAs (cache-coherent non-uniform memory access architectures) and COMAs (cache only memory access architectures). In CC-NUMAs, moving the TLB to shared memory is a bad idea because page placement, migration, and replication are all constrained by the virtual page address, which greatly affects processor node access locality. In the context of COMAs, the allocation of pages to processor nodes is not as critical because memory blocks can dynamically migrate and replicate freely among nodes. As the address translation is done deeper in the memory hierarchy, the frequency of translations drops because of the filtering effect. We also observe that the TLB is very effective when it is merged with the shared-memory, because of the sharing and prefetching effects and because there is no need to maintain TLB consistency. Even if the effectiveness of the TLB merged with the shared memory is very high, we also show that the TLB can be removed in a system with address translation done in memory because the frequency of translations is very low.

4.

Synchronization algorithms for shared-memory multiprocessors (total citations: 2; self-citations: 0; citations by others: 2)

Graunke G. Thakkar S. 《Computer》1990,23(6):60-69

A performance evaluation of the Symmetry multiprocessor system revealed that the synchronization mechanism did not perform well for highly contested locks, like those found in certain parallel applications. Several software synchronization mechanisms were developed and evaluated, using a hardware monitor, on the Symmetry multiprocessor system; the mechanisms were designed to reduce contention for the lock. The mechanisms remain valuable even when changes are made to the hardware synchronization mechanism to improve support for highly contested locks. The Symmetry architecture is described, and a number of lock algorithms and their use of hardware resources are examined. The performance of each lock is observed from the perspective of both the program itself and the total system performance.

5.

A distributed multicast routing protocol for real-time multicast applications

《Computer Networks》1999,31(1-2):101-110

Multicast routing establishes a tree that is rooted at the source node and contains all the multicast destinations. A delay-bounded routing tree is a tree in which the accumulated delay from the source node to any destination along the tree does not exceed a pre-specified bound. This paper presents a distributed routing protocol which constructs delay-bounded routing trees for real-time multicast connections. A constructed routing tree has a near-optimal network cost under the delay bound constraint. The proposed algorithm is fully distributed, efficient in terms of the number of messages required, and flexible in handling multicast membership changes. A large number of simulations show that the network cost of the routing trees generated by our method is better than that of the other major existing algorithms.
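
The delay-bound condition defined in this abstract can be checked mechanically. The following is a minimal sketch, not the protocol from the paper: the tree representation (a dict of child lists with per-link delays) and the names is_delay_bounded and example_tree are assumptions made for illustration.

```python
from collections import deque


def is_delay_bounded(tree, source, destinations, bound):
    """Check the delay-bound condition on a rooted multicast tree.

    tree maps each node to a list of (child, link_delay) pairs; this
    representation is hypothetical and chosen only for the example.
    Returns True when every destination is reachable from the source
    and its accumulated delay along the tree is at most `bound`.
    """
    delay = {source: 0.0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child, link_delay in tree.get(node, []):
            # Accumulated delay is the parent's delay plus the link delay.
            delay[child] = delay[node] + link_delay
            queue.append(child)
    return all(d in delay and delay[d] <= bound for d in destinations)


# Hypothetical tree: s -> a -> d1 and s -> b -> d2.
example_tree = {
    "s": [("a", 2.0), ("b", 5.0)],
    "a": [("d1", 3.0)],
    "b": [("d2", 4.0)],
}
```

In this example the accumulated delays are 5.0 for d1 and 9.0 for d2, so the tree satisfies a bound of 9.0 but not a bound of 8.0.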

6.

An architecture for high-performance scalable shared-memory multiprocessors exploiting on-chip integration

Acacio M.E. Gonzalez J. Garcia J.M. Duato J. 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(8):755-768

Recent technology improvements allow multiprocessor designers to put some key components inside the processor chip, such as the memory controller, the coherence hardware, and the network interface/router. In this paper, we exploit such integration scale, presenting a novel node architecture aimed at reducing the long L2 miss latencies and the memory overhead of using directories that characterize cc-NUMA machines and limit their scalability. Our proposal replaces the traditional directory with a novel three-level directory architecture and adds a small shared data cache to each of the nodes of a multiprocessor system. Due to their small size, the first-level directory and the shared data cache are integrated into the processor chip in every node, which enhances performance by saving accesses to the slower main memory. Scalability is guaranteed by having the second and third-level directories out of the processor chip and using compressed data structures. A taxonomy of the L2 misses, according to the actions performed by the directory to satisfy them, is also presented. Using execution-driven simulations, we show that significant latency reductions can be obtained by using the proposed node architecture, which translates into reductions of more than 30 percent in several cases in the application execution time.

7.

An efficient algorithm for group multicast routing with bandwidth reservation (total citations: 1; self-citations: 0; citations by others: 1)

C. P. Low N. Wang 《Computer Communications》2000,23(18):1740-1746

Multicasting refers to the transmission of data from a source node to multiple destination nodes in a network. Group multicasting is a generalisation of multicasting whereby every member of a group is allowed to multicast messages to other members that belong to the same group. The routing problem in this case involves the construction of a set of low cost multicast trees with bandwidth requirements, one for each member of the group, for multicasting messages to other members of the group. In this paper, we propose a new heuristic algorithm to generate a set of low cost multicast trees with bandwidth requirements. Simulation results show that our proposed algorithm performed better in terms of cost and in terms of utilisation of bandwidth as compared to an existing algorithm that was proposed by Jia and Wang [3].

8.

An efficient distributed algorithm for generating and updating multicast trees

Luca Gatani Giuseppe Lo Re Salvatore Gaglio 《Parallel Computing》2006,32(11-12):777

As group applications are becoming widespread, efficient network utilization becomes a growing concern. Multicast transmission represents a necessary lower network service for the wide diffusion of new multimedia network applications. Multicast transmission may use network resources more efficiently than multiple point-to-point messages; however, creating optimal multicast trees (Steiner Tree Problem in networks) is prohibitively expensive. This paper proposes a distributed algorithm for the heuristic solution of the Steiner Tree Problem, allowing the construction of effective distribution trees using a coordination protocol among the network nodes. Furthermore, we propose a novel distributed technique for dynamically updating the multicast tree. The proposed approach has been implemented and extensively tested both in simulation and on experimental networks. Performance evaluation indicates that the distributed algorithm performs as well as the centralized version, providing good levels of convergence time and communication complexity.
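
To make the idea of a heuristic Steiner tree concrete, here is a minimal sketch of a classical centralized shortest-path heuristic: grow the tree from the source by repeatedly attaching the nearest remaining destination along a shortest path. This is not the distributed algorithm proposed in the paper. The function names, the adjacency-dict graph format, and the example graph are all assumptions, and link costs are assumed to be non-negative.

```python
import heapq


def multicast_tree(graph, source, terminals):
    """Build a multicast tree with a shortest-path heuristic.

    graph maps node -> {neighbor: cost}, with every undirected link
    listed in both directions (hypothetical format for this sketch).
    Returns the tree as a set of frozenset({u, v}) edges.
    """
    in_tree = {source}
    edges = set()
    remaining = set(terminals) - in_tree
    while remaining:
        # Dijkstra started simultaneously from every node already in the tree.
        dist = {n: 0.0 for n in in_tree}
        prev = {}
        heap = [(0.0, n) for n in in_tree]
        heapq.heapify(heap)
        target = None
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue  # stale heap entry
            if u in remaining:
                target = u  # nearest destination not yet in the tree
                break
            for v, w in graph[u].items():
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
        if target is None:
            raise ValueError("a destination is unreachable from the tree")
        # Attach the shortest path from the tree to the chosen destination.
        node = target
        while node not in in_tree:
            parent = prev[node]
            edges.add(frozenset((node, parent)))
            in_tree.add(node)
            node = parent
        remaining -= in_tree
    return edges


def tree_cost(graph, edges):
    """Sum the link costs of the given tree edges."""
    return sum(graph[u][v] for u, v in (tuple(e) for e in edges))


# Hypothetical network: relay node "a" is cheap to reach from everyone.
graph = {
    "s": {"a": 1.0, "t1": 3.0, "t2": 3.0},
    "a": {"s": 1.0, "t1": 1.0, "t2": 1.0},
    "t1": {"s": 3.0, "a": 1.0},
    "t2": {"s": 3.0, "a": 1.0},
}
edges = multicast_tree(graph, "s", ["t1", "t2"])
```

In this example the tree uses the links s-a, a-t1, and a-t2 for a total cost of 3.0, whereas sending two separate point-to-point messages along the shortest paths s-a-t1 and s-a-t2 would traverse links with a combined cost of 4.0.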

9.

An efficient fault-tolerant multicast routing protocol with core-based tree techniques

Weijia Jia Wei Zhao Dong Xuan Gaochao Xu 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(10):984-1000

In this paper, we design and analyze an efficient fault-tolerant multicast routing protocol. Reliable multicast communication is critical for the success of many Internet applications. Multicast routing protocols with core-based tree techniques (CBT) have been widely used because of their scalability and simplicity. We enhance the CBT protocol with fault tolerance capability and improve its efficiency and effectiveness. With our strategy, when a faulty component is detected, some pre-defined backup path(s) is (are) used to bypass the faulty component and enable the multicast communication to continue. Our protocol only requires that routers near the faulty component be reconfigured, thus reducing the runtime overhead without compromising much of the performance. Our approach is in contrast to other approaches that often require relatively large tree reformation when faults occur. These global methods are usually costly and complicated in their attempt to achieve theoretically optimal performance. Our performance evaluation shows that our new protocol performs nearly as well as the best possible global method while utilizing much less runtime overhead and implementation cost.

10.

A dynamically optimized distributed multicast routing algorithm

李岩龙王华童永安《计算机工程与设计》2010,31(5)

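To find a delay-constrained, minimum-cost multicast forwarding tree in a real network environment and thereby better support multicast communication, a distributed multicast routing algorithm that can be optimized dynamically is proposed. The algorithm applies ant colony ideas to the multicast routing problem described above. Because ants of different generations can communicate indirectly through pheromones, and pheromone is a medium that reflects changes in the environment, the algorithm can adjust promptly as the network environment changes. Simulation experiments on a realistic network topology show that, as the ants evolve generation by generation, the algorithm finds a multicast tree that satisfies the delay constraint with a cost as small as possible.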

11.

A distributed QoS-Aware multicast routing protocol (total citations: 7; self-citations: 0; citations by others: 7)

Li Layuan Li Chunlin 《Acta Informatica》2003,40(3):211-233

This paper discusses the multicast routing problem with QoS constraints, and describes a network model that is suitable for studying such routing problems. The paper mainly presents a distributed QoS-aware multicast routing protocol (QMRP). The QMRP can operate on top of the unicast routing protocol. It only requires the local state information of the link (or the node), but does not require any global network state to be maintained. The QMRP can significantly reduce the overhead for constructing a multicast tree with QoS constraints. In QMRP, a multicast group member can join or leave the multicast session dynamically, which supports dynamic membership. The protocol can search multiple feasible tree branches, and select the optimal or near-optimal branch for connecting the new receiver to the multicast tree if it exists. In this paper, the proof of correctness and complexity analysis of the QMRP are given, and the performance measures of the protocol are evaluated using simulation. The study shows that QMRP provides a viable approach to multicast routing with QoS constraints and dynamic membership support. Received: 3 April 2003. Published online: 2 September 2003.

12.

Fast synchronization on shared-memory multiprocessors: An architectural approach

《Journal of Parallel and Distributed Computing》2005,65(10):1158-1170

Synchronization is a crucial operation in many parallel applications. Conventional synchronization mechanisms are failing to keep up with the increasing demand for efficient synchronization operations as systems grow larger and network latency increases. The contributions of this paper are threefold. First, we revisit some representative synchronization algorithms in light of recent architecture innovations and provide an example of how the simplifying assumptions made by typical analytical models of synchronization mechanisms can lead to significant performance estimate errors. Second, we present an architectural innovation called active memory that enables very fast atomic operations in a shared-memory multiprocessor. Third, we use execution-driven simulation to quantitatively compare the performance of a variety of synchronization mechanisms based on both existing hardware techniques and active memory operations. To the best of our knowledge, synchronization based on active memory outperforms all existing spinlock and non-hardwired barrier implementations by a large margin.

13.

A compiler optimization algorithm for shared-memory multiprocessors

McKinley K.S. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(8):769-787

This paper presents a new compiler optimization algorithm that parallelizes applications for symmetric, shared-memory multiprocessors. The algorithm considers data locality, parallelism, and the granularity of parallelism. It uses dependence analysis and a simple cache model to drive its optimizations. It also optimizes across procedures by using interprocedural analysis and transformations. We validate the algorithm by hand-applying it to sequential versions of parallel Fortran programs operating over dense matrices. The programs initially were hand-coded to target a variety of parallel machines using loop parallelism. We ignore the user's parallel loop directives, and use known and implemented dependence and interprocedural analysis to find parallelism. We then apply our new optimization algorithm to the resulting program. We compare the original parallel program to the hand-optimized program, and show that our algorithm improves three programs, matches four programs, and degrades one program in our test suite on a shared-memory, bus-based parallel machine with local caches. This experiment suggests existing dependence and interprocedural array analysis can automatically detect user parallelism, and demonstrates that user-parallelized codes often benefit from our compiler optimizations, providing evidence that we need both parallel algorithms and compiler optimizations to effectively utilize parallel machines.

14.

Parallel Execution of Prolog on Shared-Memory Multiprocessors

Gao Yaoqing Wang Dingxing Zheng Weimin Shen Meiming Huang Zhiyi Hu Shouren Giotto Levi 《计算机科学技术学报》1993,8(4):43-50

Logic programs offer many opportunities for the exploitation of parallelism. But the parallel execution of a task incurs various overheads. This paper focuses on the issues relevant to parallelizing Prolog on shared-memory multiprocessors efficiently.

15.

A location-based distributed multicast routing algorithm for sensor networks

吕珊李军义《计算机工程与设计》2009,30(4)

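This paper proposes a location-based clustering idea for sensor networks and, based on it, presents a location-based distributed multicast routing algorithm for sensor networks. The algorithm first forms clusters in a distributed manner using the location information of neighboring nodes. Each cluster head then uses an ant algorithm to find, in a distributed manner, an actual path to the group of destination nodes with the smallest total hop count. Finally, each cluster head collects the sensing information within its cluster, aggregates it, and sends the aggregated data to each destination node along the best path found. Theoretical analysis and simulation results show that the algorithm saves energy effectively and has good routing performance.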

16.

Token coherence: a new framework for shared-memory multiprocessors

Martin M.M.K. Hill M.D. Wood D.A. 《Micro, IEEE》2003,23(6):108-116

Commercial workload and technology trends are pushing existing shared-memory multiprocessor coherence protocols in divergent directions. Token coherence provides a framework for new coherence protocols that can reconcile these opposing trends. The token coherence framework directly enforces the coherence invariant by counting tokens (requiring all of a block's tokens to write and at least one token to read). This token-counting approach enables more obviously correct protocols that do not rely on request ordering and can operate with alternative policies that seek to improve the performance of future multiprocessors.
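
The token-counting rule stated in this abstract (all of a block's tokens to write, at least one token to read) can be illustrated with a small model. This is a sketch under stated assumptions, not the protocol from the article: the class TokenBlock, its methods, and the node names are hypothetical, and real protocols also handle message races and starvation, which this model ignores.

```python
class TokenBlock:
    """Tracks how many tokens of one memory block each node holds."""

    def __init__(self, total_tokens, home):
        # Assumption for this sketch: all tokens start at a home node.
        self.total = total_tokens
        self.held = {home: total_tokens}

    def transfer(self, src, dst, count):
        """Move tokens between nodes; tokens are never created or destroyed."""
        if count <= 0 or self.held.get(src, 0) < count:
            raise ValueError("source does not hold that many tokens")
        self.held[src] -= count
        self.held[dst] = self.held.get(dst, 0) + count

    def can_read(self, node):
        # Reading requires at least one token.
        return self.held.get(node, 0) >= 1

    def can_write(self, node):
        # Writing requires every token of the block.
        return self.held.get(node, 0) == self.total


block = TokenBlock(total_tokens=4, home="memory")
block.transfer("memory", "p0", 1)   # p0 may now read but not write
reader_ok = block.can_read("p0") and not block.can_write("p0")
block.transfer("memory", "p0", 3)   # p0 now holds all four tokens
writer_ok = block.can_write("p0") and not block.can_read("memory")
```

Because the total number of tokens is fixed, a node that holds all of them knows that no other node can hold even one, so a writer can never coexist with a reader.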

17.

An efficient multipath routing for distributed computing systems with data replication

D. J. Chen P. Y. Chang 《Information Sciences》1999,120(1-4):143-157

In distributed computing environments, executing a program often requires access to remote data files. An efficient data routing scheme is thus important for time-critical applications. To ensure a prior desired communication quality, we present a connection-oriented routing scheme, the multipath routing, which allows multiple routes to be established between the source and the destination. Based on the multipath routing scheme, the problem of finding a collection of routing paths for an application to minimize its data transmission time is addressed. Such a problem becomes a complex combinatorial one when the application accesses multiple replicated data sources. Since finding an optimal solution is computationally infeasible in practice, we propose a heuristic method to get a sub-optimal solution.

18.

A parallel copying garbage collection scheme for shared-memory multiprocessors

Khayri A. M. Ali 《New Generation Computing》1996,14(1):53-77

Earlier work on parallel copying garbage collection is based on parallelization of sequential schemes with breadth-first traversing of live data. Recently it has been demonstrated that sequential copying schemes with depth-first traversing of live data are more flexible and more efficient than the corresponding ones with breadth-first traversal. A clear advantage of the former is that they work with no extra space overheads on non-contiguous memory blocks, which allows more flexible implementation. This paper shows how to parallelize an efficient depth-first copying scheme while retaining its high efficiency and flexibility. Research interests: His research interests include techniques for parallel and distributed implementation of functional, logic, concurrent constraints and object-oriented programming systems. He is also interested in performance analysis and garbage collection. His current interest is efficient techniques for distributed implementation of the concurrent programming language Oz.

19.

Compile-time optimization of near-neighbor communication for scalable shared-memory multiprocessors

David E. Hudak Santosh G. Abraham 《Journal of Parallel and Distributed Computing》1992,15(4)

Scalable shared-memory multiprocessor systems are typically NUMA (nonuniform memory access) machines, where the exploitation of the memory hierarchy is critical to achieving high performance. Iterative data parallel loops with near-neighbor communication account for many important numerical applications. In such loops, the communication of partial results stresses the memory system performance. In this paper, we develop data placement schemes that minimize communication time where the near-neighbor interaction is determined by a stencil. Under a given loop partition, our compile-time algorithm partitions global data into four classes for each processor, with each class requiring specific consistency maintenance requirements. The ADAPT (Automatic Data Allocation and Partitioning Tool) system was implemented to automatically partition parallel code segments for the BBN TC2000, a scalable shared-memory multiprocessor. ADAPT caches global arrays and maintains data consistency in software through instructions that flush data from private caches. Restructuring of a fluid flow code segment by ADAPT improved performance by a factor of more than 3 on the BBN TC2000. Features in current generation pipelined processors with multiple functional units permit the overlap of memory accesses with computation. Our experiments on the BBN TC2000 show that the degree of overlap is limited by architectural parameters, such as the number of CPU registers.

20.

Design of a high-speed optical interconnect for scalable shared-memory multiprocessors

Avinash Karanth Kodi Louri A. 《Micro, IEEE》2005,25(1):41-49

Large-scale distributed shared-memory multiprocessors (DSMs) provide a shared address space by physically distributing the memory among different processors. A fundamental DSM communication problem that significantly affects scalability is an increase in remote memory latency as the number of system nodes increases. Remote memory latency, caused by accessing a memory location in a processor other than the one originating the request, includes both communication latency and remote memory access latency over I/O and memory buses. The proposed architecture reduces remote memory access latency by increasing connectivity and maximizing channel availability for remote communication. It also provides efficient and fast unicast, multicast, and broadcast capabilities, using a combination of aggressively designed multiplexing techniques. Simulations show that this architecture provides excellent interconnect support for a highly scalable, high-bandwidth, low-latency network.