期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Adaptive data parallel computing on workstation clusters

《Journal of Parallel and Distributed Computing》2004,64(11):1241-1255

Many important parallel applications are data parallel, and may be efficiently implemented on a workstation cluster by allocating each workstation a contiguous partition of the data domain. Implementation on non-dedicated clusters, however, is complicated by the possibility of changes in workstation availability. For example, a personal workstation may be reclaimed by its primary user for interactive use. In such situations, a node must be removed from the collection of workstations forming the “virtual parallel machine” allocated to the application, and data redistributed accordingly. Conversely, workstations may become available to join the virtual parallel machine.This paper identifies fundamental characteristics of efficient policies for data redistribution following addition/removal of workstations from the cluster. The following conclusions are obtained based on mathematical analysis and simulations: (a) allocating data to a new node from the center of the data domain substantially reduces data migration costs compared to allocation from the edge; (b) addition in groups is beneficial compared to repeated single additions; and (c) even a large number of incremental adjustments of the data domain partitions, owing to successive additions/removals of nodes, do not appear to substantially degrade partition quality compared to that obtained by partitioning from scratch. We believe that these observations can be fruitfully incorporated in the design of workstation cluster support systems for data parallel computing. 相似文献

2.

A parallel heuristic for the Vehicle Routing Problem with Simultaneous Pickup and Delivery

A. Subramanian L.M.A. Drummond C. Bentes L.S. Ochi R. Farias 《Computers & Operations Research》2010,37(11):1899-1911

This paper presents a parallel approach for solving the Vehicle Routing Problem with Simultaneous Pickup and Delivery (VRPSPD). The parallel algorithm is embedded with a multi-start heuristic which consists of a variable neighborhood descent procedure, with a random neighborhood ordering (RVND), integrated in an iterated local search (ILS) framework. The experiments were performed in a cluster with a multi-core architecture using up to 256 cores. The results obtained on the benchmark problems, available in the literature, show that the proposed algorithm not only improved several of the known solutions, but also presented a very satisfying scalability. 相似文献

3.

Parallel program analysis on workstation clusters: Memory utilisation and load balancing

Roberto Togneri 《Concurrency and Computation》1998,10(8):631-653

Using a workstation cluster for parallel program development requires consideration of various factors to optimise the mapping of the algorithm to the characteristics of the environment. In this paper we present a new analysis and verification of well-known ideas in parallel programming research of specific importance to both the use and design of workstation cluster computing systems. We define a new performance measure related to memory resource utilisation and show how redundant memory usage can lead to poor memory utilisation of the cluster. We also present analytical and experimental evidence that the pool-of-tasks paradigm can lead to significantly improved speedup over series–parallel algorithms, especially when considering equivalent computational and communication requirements. The effect of load balancing on the series–parallel and pool-of-tasks algorithms is examined, and our analysis and experimental results confirm not only that the pool-of-tasks algorithms are more robust to load imbalances but that the effect of the imbalance is mitigated when more workstations are used. © 1998 John Wiley & Sons, Ltd. 相似文献

4.

Parallel Computing on an Ethernet Cluster of Workstations: Opportunities and Constraints 总被引：1，自引：0，他引：1

Hamdi Mounir Pan Yi Hamidzadeh B. Lim F. M. 《The Journal of supercomputing》1999,13(2):111-132

Parallel computing on clusters of workstations is receiving much attention from the research community. Unfortunately, many aspects of parallel computing over this parallel computing engine is not very well understood. Some of these issues include the workstation architectures, the network protocols, the communication-to-computation ratio, the load balancing strategies, and the data partitioning schemes. The aim of this paper is to assess the strengths and limitations of a cluster of workstations by capturing the effects of the above issues. This has been achieved by evaluating the performance of this computing environment in the execution of a parallel ray tracing application through analytical modeling and extensive experimentation. We were successful in illustrating the effect of major factors on the performance and scalability of a cluster of workstations connected by an Ethernet network. Moreover, our analytical model was accurate enough to agree closely with the experimental results. Thus, we feel that such an investigation would be helpful in understanding the strengths and weaknesses of an Ethernet cluster of workstation in the execution of parallel applications. 相似文献

5.

基于群机系统的并行程序的最大加速比计算 总被引：1，自引：0，他引：1

韩天舒胡铭曾《计算机工程与设计》1999,20(2):1-5

加速比是并行程序的重要指标之一。在大多数并行系统中,在数据规模确定的情况下,程序的加速比随节点工作站的增加而增加,但是大多数群机系统的节点工作站是共享物理传输介质的,这使得许多并行程序的加速比在节点机数目超过某一个值之后会随着节,点机的增加而减少。本文通过对群机系统上并行程序执行时间的分析,论述了在数据规模确定的情况下,程序能够获得的最大加速比和最短的计算时间,以及获得这个加速比和计算时间的节点机个数。相似文献

6.

A fault-tolerant parallel heuristic for assignment problems

E. -G. Talbi Z. Hafidi D. Kebbal J. -M. Geib 《Future Generation Computer Systems》1998,14(5-6):425-438

This paper presents a new approach for parallel heuristic algorithms based on adaptive parallelism. Adaptive parallelism was used to dynamically adjust the parallelism degree of the application with respect to the system load. This approach demonstrates that high-performance computing using a hundred of heterogeneous workstations combined with massively parallel machines is feasible to solve large optimization problems with respect to the personal character of workstations. The fault-tolerant algorithm allows a minimal loss of computation in case of failures. The proposed algorithm exploits the properties of this class of applications in order to reduce the complexity of the algorithm in terms of the checkpoint files size and the control messages exchanged. The parallel heuristic algorithm combines different search strategies: simulated annealing and tabu search. Encouraging results have been obtained in solving the quadratic assignment problem. We have improved the best known solutions for some large real-world problems. 相似文献

7.

基于工作站机群的PVM系统的序列比对 总被引：1，自引：0，他引：1

刘寿强潘春华桂兵祥吕国斌墙芳躅《计算机工程》2002,28(5):89-90,96

序列比对是分子生物学研究领域的一个重要的工具。在DNA数据量急剧增加的今天，高效的序列比对算法在研究新发现的次序中显得非常重要。通过Smith和Waterman法用PVM系统在工作站机群上已完成了分布式序列比对法。也同样在Inter iPSC/860高效性能并行计算机上获得了成功。这个分布式Smith-Waterman算法在Internet GRAIL和GENQUEST上充当搜索工具。该文论述了此算法的实现和性能指标。相似文献

8.

Parallel algorithm design for workstation clusters

J. N. Magee S. C. Cheung 《Software》1991,21(3):235-250

Clusters of workstations connected by local area networks are in common use in many organizations. The combined processing power of these clusters is rarely exploited owing to the lack of suitable parallel algorithms. The paper describes a parallel programming paradigm called supervisor-worker, suitable for the workstation environment, which can be used to speed up the execution of a large class of existing sequential programs. Simple formulae are developed to predict the speed-up of a parallel algorithm developed in this way. The predictions depend on two easily-determined parameters of the sequential program and the characteristic communication cost of the workstation cluster. Consequently, it is possible to estimate the benefits of the parallel program before proceeding with detailed implementation. As an example, the parallel version of a travelling salesman program is developed and the measured speed-up compared with the predicted speed-up. 相似文献

9.

Limitations of Cycle Stealing for Parallel Processing on a Network of Homogeneous Workstations

Scott T. Leutenegger Xian-He Sun 《Journal of Parallel and Distributed Computing》1997,43(2):733

The low cost and availability of clusters of workstations have lead researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. In this paper we address the feasibility and limitation of such a nondedicated parallel processing environment assuming workstation processes have priority over parallel tasks. We develop a simple analytical model to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. It forms a foundation for task partitioning and scheduling in a nondedicated network environment. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. We propose that task ratio is a useful metric for determining how a parallel applications should be partitioned and scheduled in order to make efficient use of a nondedicated distributed system. 相似文献

10.

A distributed parallel genetic algorithm for solving optimal growth models

Paul M. Beaumont Patrick T. Bradshaw 《Computational Economics》1995,8(3):159-179

We use a distributed parallel genetic algorithm (DPGA) to fund numerical solutions to a single state deterministic optimal growth model for both the infinite and finite horizon cases. To evaluate the DPGA we consider a version of the Taylor-Uhlig problem for which the analytical solutions are known. The first-order conditions for the infinite horizon case lead to a nonlinear integral equation whose solution we approximate using a Chebyshev polynomial series expansion. The DPGA is used to search the parameter space for the optimal fit for this function. For the finite horizon case the DPGA searches the state space for a sequence of states which maximize the agent's discounted utility over the finite horizon. The DPGA runs on a cluster of up to fifty workstations linked via PVM. The topology of the function to be optimized is mapped onto each node on the cluster and the nodes essentially complete with one another for the optimal solution. We demonstrate that the DPGA has several useful features. For instance, the DPGA solves the exact Euler equation over the full range of the state variable and it does not require an accurate initial guess. The DPGA is easily generalized to multiple state problems. 相似文献

11.

Parallel Application Scheduling on Networks of Workstations

Stergios V. Anastasiadis Kenneth C. Sevcik 《Journal of Parallel and Distributed Computing》1997,43(2):364

Parallel applications can be executed using the idle computing capacity of workstation clusters. However, it remains unclear how to schedule the processors among different applications most effectively. Processor scheduling algorithms that were successful for shared-memory machines have proven to be inadequate for distributed memory environments due to the high costs of remote memory accesses and redistributing data. We investigate how knowledge of system load and application characteristics can be used in scheduling decisions. We propose a new algorithm based on adaptive equipartitioning, which, by properly exploiting both the information types above, performs better than other nonpreemptive scheduling rules, and nearly as well as idealized versions of preemptive rules (with free preemption). We conclude that the new algorithm is suitable for use in scheduling parallel applications on networks of workstations. 相似文献

12.

A distributed processing algorithm for solving integer programs using a cluster of workstations

《Parallel Computing》1997,23(6):733-753

The sequential branch and bound algorithm is the most established method for solving mixed integer and discrete programming problems. It is based on the tree search of the possible subproblems of the original problem. There are two goals in carrying out a tree search, namely, (i) finding a good and ultimately the best integer solution, and (ii) to prove that the best solution has been found or no integer feasible solution exists. We call these the stage 1 and stage 2 of tree search. In general it is extremely difficult to choose the ideal search strategy in stage 1 or stage 2 for a given integer programming (IP) problem. On the other hand by investigating a number of different strategies (and hence different search trees) a good solution can be reached quickly in respect of many practical IP problems. Starting from this observation a parallel branch and bound algorithm has been designed which exploits this two stage approach. In the first stage we investigate a number of alternative search trees (forest search) in the hope of finding a good solution quickly. This we call the multiple heuristic search (MHS). In this approach the best integer solution is broadcast to other processors involved in MHS tree development. In the second stage we reorganise the search to investigate branches of a chosen tree in parallel. This two stage algorithm has been implemented on a cluster of SUN workstations using the Parallel Virtual Machine (PVM) harness [12]. The results of our investigation for a range of well known test problems taken from the MIPLIB set and others from the literature are reported here. 相似文献

13.

A parallel adaptive tabu search approach

E.G. Talbi Z. Hafidi J-M. Geib 《Parallel Computing》1998,24(14):2003-2019

This paper presents a new approach for parallel tabu search based on adaptive parallelism. Adaptive parallelism was used to dynamically adjust the parallelism degree of the application with respect to the system load. Adaptive parallelism demonstrates that high-performance computing using a hundred of heterogeneous workstations combined with massively parallel machines is feasible to solve large optimization problems. The parallel tabu search algorithm includes different tabu list sizes and new intensification/diversification mechanisms. Encouraging results have been obtained in solving the quadratic assignment problem. We have improved the best known solutions for some large real-world problems. 相似文献

14.

An efficient approach for type II robotic assembly line balancing problems

Jie Gao Linyan Sun Lihua Wang Mitsuo Gen 《Computers & Industrial Engineering》2009,56(3):1065-1080

In the past decades, robots have been extensively applied in assembly systems as called robotic assembly lines. When changes in the production process of a product take place, the line needs to be reconfigured in order to improve its productivity. This study presents a type II robotic assembly line balancing (rALB-II) problem, in which the assembly tasks have to be assigned to workstations, and each workstation needs to select one of the available robots to process the assigned tasks with the objective of minimum cycle time. An innovative genetic algorithm (GA) hybridized with local search is proposed for the problem. The genetic algorithm uses a partial representation technique, where only part of the decision information about a candidate solution is expressed in the chromosome and the rest is computed via a heuristic method. Based on different neighborhood structures, five local search procedures are developed to enhance the search ability of GA. The coordination between these procedures is well considered in order to escape from local optima and to reduce computation time. The performance of the hybrid genetic algorithm (hGA) is tested on 32 rALB-II problems and the obtained results are compared with those by other methods. 相似文献

15.

Scheduling algorithms for a semiconductor probing facility

June-Young Bang Yeong-Dae Kim 《Computers & Operations Research》2011

This paper focuses on a scheduling problem in a semiconductor wafer probing facility. In the probing facility, wafer lots with distinct ready times are processed on a series of workstations, each with identical parallel machines. We develop a heuristic algorithm for the problem with the objective of minimizing total tardiness of orders. The algorithm employs a bottleneck-focused scheduling method, in which a schedule at the bottleneck workstation is constructed first and then schedules for other workstations are constructed based on the schedule at the bottleneck. For scheduling wafer lots at the bottleneck workstation, we consider prospective tardiness of the lots as well as sequence-dependent setup times required between different types of wafer lots. We also present a rolling horizon method for implementation of the scheduling method on a dynamic situation. The suggested methods are evaluated through a series of computational experiments and results show that the methods work better than existing heuristic methods. 相似文献

16.

A sensitivity analysis method for driving the Artificial Bee Colony algorithm's search process

《Applied Soft Computing》2016

In this paper, we improve D. Karaboga's Artificial Bee Colony (ABC) optimization algorithm, by using the sensitivity analysis method described by Morris. Many improvements of the ABC algorithm have been made, with effective results. In this paper, we propose a new approach of random selection in neighborhood search. As the algorithm is running, we apply a sensitivity analysis method, Morris’ OAT (One-At-Time) method, to orientate the random choice selection of a dimension to shift. Morris’ method detects which dimensions have a high influence on the objective function result and promotes the search following these dimensions. The result of this analysis drives the ABC algorithm towards significant dimensions of the search space to improve the discovery of the global optimum. We also demonstrate that this method is fruitful for more recent improvements of ABC algorithm, such as GABC, MeABC and qABC. 相似文献

17.

Automatic test pattern generation on parallel processors

Sunil Arvindam Vipin Kumar V. Nageshwara Rao Vineet Singh 《Parallel Computing》1991,17(12):1323-1342

Test generation for combinational circuits is an important step in the VLSI design process. Unfortunately, the problem is highly computation-intensive and for circuits encountered in practice, test generation time can often be enormous. In this paper, we present a parallel formulation of the backtrack search algorithm called PODEM, which is a highly used algorithm for this problem. It is known that the sequential PODEM algorithm consumes most of its execution time in generating tests for ‘hard-to-detect’ (HTD) faults and is often unable to detect them even after a large number of backtracks. Our parallel formulation overcomes these limitations by dividing the search space and searching it concurrently using multiple processes.

We present a number of experimental results and show that these match our theoretical results presented elsewhere. We show that the search efficiency of the parallel algorithm improves and even beats that of the sequential algorithm as the ‘hardness’ of a fault increases. We present speedup results and performance analyses of our formulation on a 128 processor Symult s2010 multicomputer. We also present preliminary results on a network of Sun workstations. Our results show that parallel search techniques provides good speedups as well as high fault coverage of the HTD faults in reasonable time when compared to the uniprocessor implementation. Our experimental validation of most of our theoretical results builds confidence in the following theoretical prediction: our parallel formulation of PODEM is highly scalable on a variety of commercially-available, large MIMD parallel processors (in additions to the ones with which we experimented). 相似文献

18.

Prophet: automated scheduling of SPMD programs in workstation networks

Jon B. Weissman 《Concurrency and Computation》1999,11(6):301-321

Obtaining efficient execution of parallel programs in workstation networks is a difficult problem for the user. Unlike dedicated parallel computer resources, network resources are shared, heterogeneous, vary in availability, and offer communication performance that is still an order of magnitude slower than parallel computer interconnection networks. Prophet, a system that automatically schedules data parallel SPMD programs in workstation networks for the user, has been developed. Prophet uses application and resource information to select the appropriate type and number of workstations, divide the application into component tasks and data across these workstations, and assign tasks to workstations. This system has been integrated into the Mentat parallel processing system developed at the University of Virginia. A suite of scientific Mentat applications has been scheduled using Prophet on a heterogeneous workstation network. The results are promising and demonstrate that scheduling SPMD applications can be automated with good performance. Copyright © 1999 John Wiley & Sons, Ltd. 相似文献

19.

On how Pachycondyla apicalis ants suggest a new search algorithm

N. Reference to Monmarch G. Reference to Venturini M. Reference to Slimane 《Future Generation Computer Systems》2000,16(8):937-946

In this paper we present a new optimization algorithm based on a model of the foraging behavior of a population of primitive ants (Pachycondyla apicalis). These ants are characterized by a relatively simple but efficient strategy for prey search in which individuals hunt alone and try to cover a given area around their nest. The ant colony search behavior consists of a set of parallel local searches on hunting sites with a sensitivity to successful sites. Also, their nest is periodically moved. Accordingly, the proposed algorithm performs parallel random searches in the neighborhood of points called hunting sites. Hunting sites are created in the neighborhood of a point called nest. At constant intervals of time the nest is moved, which corresponds to a restart operator which re-initializes the parallel searches. We have applied this algorithm, called API, to numerical optimization problems with encouraging results. 相似文献

20.

工作站网络上协作任务的调度 总被引：9，自引：0，他引：9

齐红鞠九滨《软件学报》1998,9(1):14-17

在利用工作站群集系统进行的协作模式并行计算中，任务调度在很大程度上决定并行计算的性能．本文给出了一个任务调度的模型和算法，它考虑了协作模式并行计算中任务间的同步时间、通信时间、数据加载及结果收集时间．根据这个调度模型，可以选择一组并行执行时间最短的工作站，从而获得较好的并行计算性能．相似文献