
1.

Self-tuning management of update-intensive multidimensional data in clusters of workstations

Vassil Kriakov George Kollios Alex Delis 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(3):739-764

Contemporary applications continuously modify large volumes of multidimensional data that must be accessed efficiently and, more importantly, must be updated in a timely manner. Single-server storage approaches are insufficient when managing such volumes of data, while the high frequency of data modification renders classical indexing methods inefficient. To address these two problems we introduce a distributed storage manager for multidimensional data based on a Cluster-of-Workstations. The manager addresses the above challenges through a set of mechanisms that, through selective on-line data reorganization, collectively maintain a balanced load across a cluster of workstations. With the help of a highly efficient self-tuning mechanism, based on a new data structure called stat-index, as well as a query aggregation and clustering algorithm, our storage manager attains short query response times even in the presence of massive modifications and highly skewed access patterns. Furthermore, we provide a data migration cost model used to determine the best data redistribution strategy. Through extensive experimentation with our prototype, we establish that our storage manager can sustain significant update rates with minimal overhead.

2.

Bandwidth Efficient All-to-All Broadcast on Switched Clusters

Ahmad Faraj Pitch Patarasuk Xin Yuan 《International journal of parallel programming》2008,36(4):426-453

Clusters of workstations employ flexible topologies: regular, irregular, and hierarchical topologies have been used in such systems. The flexibility poses challenges for developing efficient collective communication algorithms since the network topology can potentially have a strong impact on the communication performance. In this paper, we consider the all-to-all broadcast operation on clusters with cut-through and store-and-forward switches. We show that near-optimal all-to-all broadcast on a cluster with any topology can be achieved by only using the links in a spanning tree of the topology when the message size is sufficiently large. The result implies that increasing network connectivity beyond the minimum tree connectivity does not improve the performance of the all-to-all broadcast operation when the most efficient topology-specific algorithm is used. All-to-all broadcast algorithms that achieve near-optimal performance are developed for clusters with cut-through and clusters with store-and-forward switches. We evaluate the algorithms through experiments and simulations. The empirical results confirm our theoretical finding.

3.

Implementation of a parallel randomized pattern matching algorithm on a cluster of workstations

薛淞文申卫昌剡公孝乔龙《计算机工程与应用》2010,46(21):129-131

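The randomized pattern matching algorithm is improved and, based on the characteristics of inter-task communication in the MPICH parallel programming environment, an improved MPICH-based parallel randomized pattern matching algorithm is designed. The text string is divided into overlapping substrings according to the number of processes running on the COW (cluster of workstations), and each process performs pattern matching on one text substring. Experimental results show that the improved parallel algorithm effectively speeds up pattern matching and improves the resource utilization of the cluster of workstations.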
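
The overlapping partition described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the per-chunk search uses Python's built-in `str.find` rather than the paper's randomized matcher, and the chunks are processed in a plain loop instead of by MPICH processes. Each chunk is extended by `len(pattern) - 1` characters so that a match straddling a chunk boundary is still seen by exactly one chunk.

```python
def overlapping_chunks(text, pattern_len, num_procs):
    """Split text into at most num_procs chunks that overlap by pattern_len - 1.

    Returns a list of (global_start_offset, chunk_text) pairs.
    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(text)
    base = -(-n // num_procs)  # ceiling division: nominal chunk length
    chunks = []
    for rank in range(num_procs):
        start = rank * base
        if start >= n:
            break
        # Extend the chunk so matches that begin near its end are complete.
        end = min(n, start + base + pattern_len - 1)
        chunks.append((start, text[start:end]))
    return chunks


def find_all(text, pattern, num_procs):
    """Return the sorted start positions of every occurrence of pattern."""
    hits = []
    for start, chunk in overlapping_chunks(text, len(pattern), num_procs):
        # In a real cluster run, each iteration would be one process's work.
        pos = chunk.find(pattern)
        while pos != -1:
            hits.append(start + pos)
            pos = chunk.find(pattern, pos + 1)
    return sorted(hits)
```

With this overlap, a match that starts inside a chunk's nominal range always ends inside the extended chunk, and it cannot also appear in the preceding chunk, so no de-duplication step is needed.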

4.

Techniques for pipelined broadcast on ethernet switched clusters

Pitch Patarasuk Xin Yuan Ahmad Faraj 《Journal of Parallel and Distributed Computing》2008

By splitting a large broadcast message into segments and broadcasting the segments in a pipelined fashion, pipelined broadcast can achieve high performance in many systems. In this paper, we investigate techniques for efficient pipelined broadcast on clusters connected by multiple Ethernet switches. Specifically, we develop algorithms for computing various contention-free broadcast trees that are suitable for pipelined broadcast on Ethernet switched clusters, extend the parametrized LogP model for predicting appropriate segment sizes for pipelined broadcast, show that the segment sizes computed based on the model yield high performance, and evaluate various pipelined broadcast schemes through experimentation on Ethernet switched clusters with various topologies. The results demonstrate that our techniques are practical and efficient for contemporary Fast Ethernet and Gigabit Ethernet clusters.

5.

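An algorithm for determining the optimal number of clusters using FCM Total citations: 2, self-citations: 0, citations by others: 2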

张姣玲《计算机工程与应用》2008,44(22):65-67

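In the algorithm that uses FCM to determine the optimal number of clusters, the cluster centers must be reinitialized every time FCM is applied, and FCM is sensitive to the initial centers, which makes the algorithm very unstable. This paper improves the algorithm by proposing a merging function that makes the cluster centers for (c−1) clusters depend on the cluster centers for c clusters. Simulation experiments show that the new algorithm is stable and runs noticeably faster than the old one.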

6.

High performance computing on networks of workstations through the exploitation of function parallelism

Yung-Lin Liu Hau-Yang Cheng Chung-Ta King 《Journal of Systems Architecture》1999,45(15):1307-1321

Networks of workstations (NOW) have become a widely accepted form of high-performance parallel computing. As in conventional multicomputers, parallel programs running on such a platform are often written in an SPMD form to exploit data parallelism. Each workstation in a NOW is treated similarly to a processing element in a multicomputer system. However, workstations are far more powerful and flexible than the processing elements in conventional multicomputers. In this paper, we discuss how workstations in a NOW can be used to exploit more parallelism in an SPMD program, especially those induced from concurrent activities.

7.

Coupling hundreds of workstations for parallel molecular sequence analysis

Volker Strumpen 《Software》1995,25(3):291-304

We present a highly scalable approach to distributed parallel computing on workstations in the Internet which provides significant speed-up to molecular biology sequence analysis. Recent developments show that smaller numbers of workstations connected via a local area network can be used efficiently for parallel computing. This work emphasizes scalability with respect to the number of workstations employed. We show that a massively parallel approach using several hundred workstations, dispersed over all continents, can successfully be applied for solving problems with low requirements on communication bandwidth. We calculated the optimal local alignment scores between a single genetic sequence and all sequences of a genetic sequence database using the ssearch code that is well known among molecular biologists. In a heterogeneous network with more than 800 workstations this job terminated after several minutes, in contrast to the several days it would have taken on a single machine.

8.

A family of optimal termination detection algorithms

Neeraj Mittal S. Venkatesan Sathya Peri 《Distributed Computing》2007,20(2):141-162

An important problem in distributed systems is to detect termination of a distributed computation. A computation is said to have terminated when all processes have become passive and all channels have become empty. In this paper, we present a suite of algorithms for detecting termination of a non-diffusing computation for an arbitrary communication topology under a variety of conditions. All our termination detection algorithms have optimal message complexity. Furthermore, they have optimal detection latency when message processing time is ignored. A preliminary version of the paper first appeared in the 18th Symposium on Distributed Computing (DISC), 2004 [27].

9.

Auto-CFD-NOW: A pre-compiler for effectively parallelizing CFD applications on networks of workstations

Li Xiao Xiaodong Zhang Zhengqian Kuang Baiming Feng Jichang Kang 《The Journal of supercomputing》2006,38(2):189-217

Computational Fluid Dynamics (CFD) applications are highly demanding for parallel computing. Many such applications have been shifted from expensive MPP boxes to cost-effective Networks of Workstations (NOW). Auto-CFD-NOW is a pre-compiler that transforms Fortran CFD sequential programs to efficient message-passing parallel programs running on NOW. Our work makes the following three unique contributions. First, this pre-compiler is highly automatic, requiring a minimum number of user directives for parallelization. Second, we have applied a dependency analysis technique for the CFD applications, called analysis after partitioning. We propose a mirror-image decomposition technique to parallelize self-dependent field loops that are hard to parallelize by existing methods. Finally, traditional optimizations of communication focus on eliminating redundant synchronizations. We have developed an optimization scheme which combines all the non-redundant synchronizations in CFD programs to further reduce the communication overhead. Auto-CFD-NOW has been implemented on networks of workstations and has been successfully used for automatically parallelizing structured CFD application programs. Our experiments show its effectiveness and scalability for parallelizing large CFD applications. This work is supported in part by the China National Aerospace Science Foundation, and by the U.S. National Science Foundation under grants CCR-9812187, CCR-0098055, CCF-0325760, CCF 0514078, and CNS 0549006.

10.

Efficient scheduling of MPI applications on networks of workstations

M.A.R. Dantas E.J. Zaluska 《Future Generation Computer Systems》1998,13(6):489-499

The availability of a large number of workstations connected through a network can represent an attractive option for high-performance computing for many applications. The message-passing interface (MPI) software environment is an effort from many organisations to define a de facto message-passing standard. In other words, the original specification was not designed as a comprehensive parallel programming environment, and some researchers agree that the standard should be kept as simple and clean as possible. Nevertheless, a software environment such as MPI should have some scheduling mechanism for the effective submission of parallel applications on networks of workstations. This paper presents an alternative lightweight approach called Selective-MPI (S-MPI), which was designed to enhance the efficiency of the scheduling of applications on an MPI implementation environment.

11.

Heavy traffic optimal resource allocation algorithms for cloud computing clusters

《Performance Evaluation》2014

Cloud computing is emerging as an important platform for business, personal and mobile computing applications. In this paper, we study a stochastic model of cloud computing, where jobs arrive according to a stochastic process and request resources like CPU, memory and storage space. We consider a model where the resource allocation problem can be separated into a routing or load balancing problem and a scheduling problem. We study the join-the-shortest-queue routing and power-of-two-choices routing algorithms with the MaxWeight scheduling algorithm. It was known that these algorithms are throughput optimal. In this paper, we show that these algorithms are queue length optimal in the heavy traffic limit.
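
The two routing rules named in this abstract can be sketched in a few lines. This is a minimal illustration that assumes queue lengths are plain integers in a list and that at least two servers exist; the function names are hypothetical, and the MaxWeight scheduling step and the paper's stochastic model are not represented.

```python
import random


def join_shortest_queue(queue_lengths):
    """Route to the server with the fewest queued jobs (ties: lowest index)."""
    return min(range(len(queue_lengths)), key=queue_lengths.__getitem__)


def power_of_two_choices(queue_lengths, rng=random):
    """Sample two distinct servers uniformly and route to the less loaded one."""
    a, b = rng.sample(range(len(queue_lengths)), 2)
    return a if queue_lengths[a] <= queue_lengths[b] else b
```

Join-the-shortest-queue inspects every queue for each arrival, whereas power-of-two-choices inspects only two, which is the usual reason for preferring the latter when there are many servers.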

12.

Computing the maximum speedup of parallel programs on cluster systems Total citations: 1, self-citations: 0, citations by others: 1

韩天舒胡铭曾《计算机工程与设计》1999,20(2):1-5

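Speedup is one of the key metrics of a parallel program. In most parallel systems, for a fixed data size, the speedup of a program increases as node workstations are added. However, the node workstations of most cluster systems share the physical transmission medium, so the speedup of many parallel programs decreases as nodes are added once the number of nodes exceeds a certain value. By analyzing the execution time of parallel programs on cluster systems, this paper discusses, for a fixed data size, the maximum speedup and the shortest computation time a program can achieve, as well as the number of nodes at which they are obtained.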
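
As a rough illustration of why a shared medium caps the speedup, consider a toy model that is an assumption made here and not the model derived in the paper: the sequential work W divides evenly over p nodes, while communication on the shared medium costs c per node and is serialized. W and c are hypothetical constants.

```latex
\[
T(p) = \frac{W}{p} + c\,p,
\qquad
\frac{dT}{dp} = -\frac{W}{p^{2}} + c = 0
\;\Longrightarrow\;
p^{*} = \sqrt{\frac{W}{c}}
\]
\[
T(p^{*}) = 2\sqrt{W c},
\qquad
S_{\max} = \frac{W}{T(p^{*})} = \frac{1}{2}\sqrt{\frac{W}{c}} = \frac{p^{*}}{2}
\]
```

Under this assumed model, adding nodes beyond p* increases the execution time, which matches the qualitative behavior the abstract describes.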

13.

Parallel strategies for the local biological sequence alignment in a cluster of workstations

Azzedine Boukerche Alba Cristina Magalhaes Alves de Melo Mauricio Ayala-Rincón Maria Emilia Machado Telles Walter 《Journal of Parallel and Distributed Computing》2007

Recently, many organisms have had their DNA entirely sequenced. This reality presents the need for comparing long DNA sequences, which is a challenging task due to its high demands for computational power and memory. Sequence comparison is a basic operation in DNA sequencing projects, and most sequence comparison methods currently in use are based on heuristics, which are faster but offer no guarantees of producing the best alignments possible. To address this problem, Smith and Waterman proposed an algorithm that obtains the best local alignments, but at the expense of very high computing power and huge memory requirements. In this article, we present and evaluate our experiments involving three strategies to run the Smith–Waterman algorithm in a cluster of workstations using a Distributed Shared Memory System. Our results on an eight-machine cluster presented very good speed-up and indicate that impressive improvements can be achieved depending on the strategy used. In addition, we present a number of theoretical remarks concerning how to reduce the amount of memory used.
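
For readers unfamiliar with the Smith–Waterman recurrence, a minimal sequential sketch of the local alignment score is given below. The scoring values (match +2, mismatch −1, linear gap −1) are assumptions chosen for illustration, and the function name is hypothetical; the paper's parallel strategies and its Distributed Shared Memory setup are not represented. Keeping only two rows of the dynamic programming matrix is one common way to cut memory when only the score is needed, and is not necessarily the technique the authors discuss.

```python
def smith_waterman_score(s, t, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of s and t (linear gap penalty).

    Uses two rows of the DP matrix, so memory is O(len(t)).
    Scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(t) + 1)
    best = 0
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Recovering the alignment itself, rather than just its score, requires traceback information that this two-row version discards.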

14.

Design and implementation of a Web-based remote cluster monitoring system Total citations: 2, self-citations: 0, citations by others: 2

童端董小社李纪云吴维刚《计算机工程与应用》2003,39(35):100-102

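Building cluster systems from commodity components gives them a high performance-to-price ratio but also makes them harder to keep available and to manage, so cluster monitoring becomes particularly important. Drawing on the Linux cluster system at the National High Performance Computing Center (Xi'an), this paper presents an architectural framework and implementation strategy for a Web-based cluster monitoring system, and describes in detail the implementation of the modules for data acquisition, information collection and storage, and status visualization. The Web-based implementation strategy makes the system platform-independent and capable of remote monitoring.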

15.

VP4: a cluster-based performance visualization tool for PVM parallel programs

李小洲《计算机应用与软件》2002,19(5):4-6,28

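This paper introduces VP4, a general-purpose performance visualization tool for PVM parallel programs. Tailored to the characteristics of workstation clusters, it uses a multi-level performance data collection method and an event-based collection strategy, so that all performance data can be collected and aggregated while keeping the intrusion effect as small as possible. After processing the aggregated performance data, VP4 uses graphics and animation to generate a variety of easy-to-use performance visualization views. Experiments show that the tool can effectively help users find performance bottlenecks and assists them in developing high-performance parallel programs.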

16.

Partial disassembly line balancing problem for U-shaped disassembly lines with a fixed number of workstations

吴秀丽张兴宇《控制理论与应用》2024,41(6):1079-1088

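To improve the disassembly efficiency of U-shaped disassembly lines with a fixed number of workstations and to reduce the potential threat that hazardous components pose to operators, this paper addresses the disassembly requirements of high-value and hazardous parts by formulating the partial disassembly line balancing problem for U-shaped lines with a fixed number of workstations. An optimization model is established with the objectives of minimizing the cycle time, the number of high-risk workstations, and the load balance index, and an improved variable neighborhood search algorithm is designed to solve it. In the encoding process, a selection strategy based on part release positions is proposed to reduce the influence of the disassembly order of predecessor parts on the encoding; a minimum-deviation bisection method is proposed to effectively reduce the number of decoding iterations; and a bottleneck-squeezing local search strategy is proposed to optimize the cycle time and load balance indicators. Comparison with other algorithms shows that the improved variable neighborhood search algorithm is superior and can solve the problem efficiently.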

17.

Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms

Teng Ma George Bosilca Aurelien Bouteiller Jack J. Dongarra 《Journal of Parallel and Distributed Computing》2013

Multicore clusters, which have become the most prominent form of High Performance Computing (HPC) systems, challenge the performance of MPI applications with non-uniform memory accesses and shared cache hierarchies. Recent advances in MPI collective communications have alleviated the performance issue exposed by deep memory hierarchies by carefully considering the mapping between the collective topology and the hardware topologies, as well as the use of single-copy kernel-assisted mechanisms. However, in distributed environments, a single-level approach cannot encompass the extreme variations not only in bandwidth and latency capabilities, but also in the capability to support duplex communications or operate multiple concurrent copies. This calls for a collaborative approach between multiple layers of collective algorithms, dedicated to extracting the maximum degree of parallelism from the collective algorithm by consolidating the intra- and inter-node communications.

18.

Models and algorithms for coscheduling compute-intensive tasks on a network of workstations

Mikhail J. Atallah Christina Lock Black Dan C. Marinescu Howard Jay Siegel Thomas L. Casavant 《Journal of Parallel and Distributed Computing》1992,16(4)

The problem of using the idle cycles of a number of high performance workstations, interconnected by a high speed network, for solving computationally intensive tasks is discussed. The classes of distributed applications examined require some form of synchronization among the subtasks, hence the need for coscheduling to guarantee that subtasks start at the same time and execute at the same pace on a group of workstations. A model of the system is presented that allows the definition of an objective function to be maximized. Then a quadratic time and linear space algorithm is derived for computing the optimal coschedule, for the given model and class of applications addressed.

19.

Fast algorithms for computing tree LCS

Shay Mozes Dekel Tsur Oren Weimann Michal Ziv-Ukelson 《Theoretical computer science》2009

20.

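A parallel decomposition-coordination optimization algorithm for large-scale chemical process systems Total citations: 2, self-citations: 0, citations by others: 2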

张帆《计算机仿真》2004,21(6):74-77

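To address the lack of computing power in the optimization of large-scale chemical process systems, this paper studies a decomposition-coordination algorithm suited to solving large systems. Building on a decomposed SQP computation, an unconstrained optimization algorithm performs the coordination, and parallel techniques are used to improve solution efficiency. A simulation computing environment was built from a single machine and a cluster system, and a heat exchanger system was solved as a practical case. The results show that the algorithm is effective and can be applied more widely to the optimization of large-scale process systems.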