期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

J. M. Jansen F. W. Sijstermans 《Future Generation Computer Systems》1989,4(4):271-279

In this paper a parallel algorithm for branch-and-bound problems is sketched. It is designed to run on MIMD machines and exploits coarse grain parallelism. Due to the irregular and unpredictable behavior of branch-and-bound algorithms, it is hard to obtain a good load-balance. Another design issue is the minimization of the communication overhead. Our algorithm overcomes these problems by a dynamic load-balancing strategy and a dynamic way to decide when communication is really useful. After first sketching a general parallel algorithm for branch-and-bound problems, we concentrate on a particular instance of a branch-and-bound problem: the so-called knapsack problem. The performance of the algorithm for this problem is measured with a simulator for a multiprocessor system. 相似文献

2.

Depth-m search in branch-and-bound algorithms

Toshihide Ibaraki 《International journal of parallel programming》1978,7(4):315-343

A new search strategy, called depth-m search, is proposed for branch-and-bound algorithms, wherem is a parameter to be set by the user. In particular, depth-1 search is equivalent to the conventional depth-first search, and depth- search is equivalent to the general heuristic search (including best-bound search as a special case). It is confirmed by computational experiment that the performance of depth-m search continuously changes from that, of depth-first search to that of heuristic search, whenm is changed from 1 to . The exact upper bound on the size of the required memory space is derived and shown to be bounded byO(n^m), wheren is the problem size. Some methods for controllingm during computation are also proposed and compared, to carry out the entire computation within a given memory space bound. 相似文献

3.

Determination of optimal recognition algorithms in the two-level system

Jerzy Jzefczyk 《Pattern recognition letters》1986,4(6):413-420

In this paper the problem of pattern recognition in the two-level system is investigated. The application of linear decision functions to the determination of the optimal recognition algorithms is presented. The results are obtained in a numerical way using a random method of optimization. 相似文献

4.

Theoretical comparisons of search strategies in branch-and-bound algorithms 总被引：1，自引：0，他引：1

Toshihide Ibaraki 《International journal of parallel programming》1976,5(4):315-344

Four known search strategies used in branch-and-bound algorithms-heuristic search, depth-first search, best-bound search, and breadth-first search-are theoretically compared from the viewpoint of the performance of the resulting algorithms. Heuristic search includes the other three as special cases. Since heuristic search is determined by a heuristic functionh, we first investigate how the performance of the resulting algorithms depends onh. In particular, we show that heuristic search is stable in the sense that a slight change inh causes only a slight change in its performance. The best and the worst heurstic functions are clarified, and also discussed is how the heuristic functionh should be modified to obtain a branch-and-bound algorithm with an improved performance. Finally, properties and limitations of depth-first search, best-bound search, and breadth-first search viewed as special cases of heuristic search are considered. In particular, it is shown that the stability observed for heuristic search no longer holds for depth-first search. 相似文献

5.

Efficient parallel branch-and-bound algorithm for constructing minimum ultrametric trees

Kun-Ming Yu Jiayi Zhou Chun-Yuan Lin Chuan Yi Tang 《Journal of Parallel and Distributed Computing》2009

The construction of evolutionary trees is important for computational biology, especially for the development of biological taxonomies. The ultrametric tree (UT) is a commonly used model for evolutionary trees assuming that the rate of evolution is constant (molecular clock hypothesis). However, the construction of minimum ultrametric trees (MUTs, principle of minimum evolution) has been shown to be NP-hard even from a metric distance matrix. The branch-and-bound algorithm is generally used to solve a wide variety of NP-hard problems. In previous work, a sequential branch-and-bound algorithm for constructing MUTs (BBU) was presented and the experimental results showed that it is useful for MUT construction. Hence, in this study, an efficient parallel branch-and-bound algorithm (PBBU) for constructing MUTs or near-MUTs from a metric distance matrix was designed. A random data set as well as some practical data sets of Human + Chimpanzee Mitochondrial and Bacteriophage T7 DNAs were used to test the PBBU. The experimental results show that the PBBU found an optimal solution for 36 species on 16 PCs within a reasonable time. To the best of our knowledge, no algorithm has been found to solve this problem even for 25 species. Moreover, the PBBU achieved satisfying speed-up ratios for most of the test cases. 相似文献

6.

基子遗传算法的自适应Linux内存管理系统研究

薛胜军张小华《通讯和计算机》2007,4(4):41-44

本文针对Linux内存管理系统的重要算法与相应参数在不同系统环境下对整个系统和其它应用程序性能的影响，提出了在内存管理系统裹增加一个基于遗传算法的自适应内存管理系统参数调整功能。当操作系统的状态发生重大改变时，使用遗传算法将它们转换为相应的参数调整策略，然后在适当的时候对系统参数进行调整，从而达到改善系统性能的目标。相似文献

7.

Implementation of two-level algorithms using fixed-point arithmetic

AMIN M. NASSAR MAGDI S. MAHMOUD 《International journal of systems science》2013,44(9):1279-1292

The effect of using fixed-point arithmetic in the digital implementation of two-level control algorithms is examined in this paper. Analytical expressions are developed to predict the change in the expected minimum cost and associated matrices. It is shown that there is a favourable match between the analytical predictions and averaged simulation experiments. The use of finite-precision machines increases the expected theoretical minimum cost and makes the two-level algorithm become slow and thus require excessive iterations to converge. 相似文献

8.

Stochastic control of paging in a two-level computer memory

O.I. Aven Ya.A. Kogan 《Automatica》1975,11(3):309-313

A paging memory system of a computer is formulated as a Markovian decision process on an infinite time period. A compact expression for a performance index is derived and a computational procedure is given which makes it possible to determine the optimal performance index and optimal paging algorithm. Sufficient conditions are proposed for a given set of pages to be nonreplacable for an optimal algorithm. Numerical results are obtained for comparing paging algorithms for moderate size programs. 相似文献

9.

Two branch-and-bound algorithms for the robust parallel machine scheduling problem

Mohammad Ranjbar Morteza Davari 《Computers & Operations Research》2012,39(7):1652-1660

Uncertainty is an inevitable element in many practical production planning and scheduling environments. When a due date is predetermined for performing a set of jobs for a customer, production managers are often concerned with establishing a schedule with the highest possible confidence of meeting the due date. In this paper, we study the problem of scheduling a given number of jobs on a specified number of identical parallel machines when the processing time of each job is stochastic. Our goal is to find a robust schedule that maximizes the customer service level, which is the probability of the makespan not exceeding the due date. We develop two branch-and-bound algorithms for finding an optimal solution; the two algorithms differ mainly in their branching scheme. We generate a set of benchmark instances and compare the performance of the algorithms based on this dataset. 相似文献

10.

On the hybridization of memetic algorithms with branch-and-bound techniques.

José E Gallardo Carlos Cotta Antonio J Fernández 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2007,37(1):77-83

Branch-and-bound (BnB) and memetic algorithms represent two very different approaches for tackling combinatorial optimization problems. However, these approaches are compatible. In this correspondence, a hybrid model that combines these two techniques is considered. To be precise, it is based on the interleaved execution of both approaches. Since the requirements of time and memory in BnB techniques are generally conflicting, a truncated exact search, namely, beam search, has opted to be carried out. Therefore, the resulting hybrid algorithm has a heuristic nature. The multidimensional 0-1 knapsack problem and the shortest common supersequence problem have been chosen as benchmarks. As will be shown, the hybrid algorithm can produce better results in both problems at the same computational cost, especially for large problem instances. 相似文献

11.

Designing linear algebra algorithms on the IBM 3090 vector multiprocessor with a hierarchical memory system

G. Radicati Y. Robert P. Sguazzero 《Calcolo》1988,25(1-2):153-167

The IBM 3090 is a vector multiprocessor with a hierarchical memory system. We show with two examples (the LU and Householder factorizations) that the complex memory system and the vector hardware can be used efficiently by recasting the basic algorithms in terms of high-level matrix-matrix modules. 相似文献

12.

Efficient PRAM simulation on a distributed memory machine

R. M. Karp M. Luby F. Meyer auf der Heide 《Algorithmica》1996,16(4-5):517-542

We present algorithms for the randomized simulation of a shared memory machine (PRAM) on a Distributed Memory Machine (DMM). In a PRAM, memory conflicts occur only through concurrent access to the same cell, whereas the memory of a DMM is divided into modules, one for each processor, and concurrent accesses to the same module create a conflict. Thedelay of a simulation is the time needed to simulate a parallel memory access of the PRAM. Any general simulation of anm processor PRAM on ann processor DMM will necessarily have delay at leastm/n. A randomized simulation is calledtime-processor optimal if the delay isO(m/n) with high probability. Using a novel simulation scheme based on hashing we obtain a time-processor optimal simulation with delayO(log log(n) log*(n)). The best previous simulations use a simpler scheme based on hashing and have much larger delay: (log(n)/log log(n)) for the simulation of an n processor PRAM on ann processor DMM, and (log(n)) in the case where the simulation is time-processor optimal.Our simulations use several (two or three) hash functions to distribute the shared memory among the memory modules of the PRAM. The stochastic processes modeling the behavior of our algorithms and their analyses based on powerful classes of universal hash functions may be of independent interest.Research partially supported by NSF/DARPA Grant CCR-9005448. Work was done while at the University of California at Berkeley and the International Computer Science Institute, Berkeley, CA.Research partially supported by National Science Foundation Operating Grant CCR-9016468, National Science Foundation Operating Grant CCR-9304722, United States-Israel Binational Science Foundation Grant No. 89-00312, United States-Israel Binational Science Foundation Grant No. 92-00226, and ESPRIT BR Grant EC-US 030.Part of work was done during a visit at the International Computer Science Institute at Berkeley; supported in part by DFG-Forschergruppe Effiziente Nutzung massiv paralleler Systeme, Teilprojekt 4, and by the Esprit Basic Research Action Nr. 7141 (ALCOM II). 相似文献

13.

Efficient algorithms for system diagnosis with both processor andcomparator faults

Chen Y. Bucken W. Echtle K. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(4):371-381

For the comparison-based self-diagnosis of multiprocessor systems, an extended model that considers both processor and comparator faults is presented. It is shown that in this model the system diagnosability is t⩽Zδ/2Z, where δ is the minimum vertex degree of the system graph. However, if the number of faulty comparators is assumed not to exceed the number of faulty processors, the diagnosability of the model reaches t⩽δ. An optimal O(|E|) algorithm, where E is the set of comparators, is given for identifying all faulty processors and comparators, provided that the total number of faulty components does not exceed the system diagnosability, and an O(|E|)² algorithm for the case t⩽δ is also presented. These efficient algorithms determine the faulty processors by calculating each processor's weight, which is mainly defined by the number of adjacent relative tests stating `agreement'. After sorting the processors according to their weights, the algorithms determine all faulty components by separating the sorted processor list 相似文献

14.

Efficient preflow push algorithms

R. Cerulli M. Gentili A. Iossa 《Computers & Operations Research》2008

Algorithms for the maximum flow problem can be grouped into two categories: augmenting path algorithms [Ford LR, Fulkerson DR. Flows in networks. Princeton University Press: Princeton, NJ: 1962], and preflow push algorithms [Goldberg AV, Tarjan RE. A new approach to the maximum flow problem. In: Proceedings of the 18th annual ACM symposium on theory of computing, 1986; p. 136–46]. Preflow push algorithms are characterized by a drawback known as ping pong effect. In this paper we present a technique that allows to avoid such an effect and can be considered as an approach combining the augmenting path and preflow push methods. An extended experimentation shows the effectiveness of the proposed approach. 相似文献

15.

Efficient mapping and implementation of matrix algorithms on a hypercube

Vladimir Cherkassky Ross Smith 《The Journal of supercomputing》1988,2(1):7-27

It is well known that parallelism by itself does not lead to higher speeds. This study shows how to put parallelism to best use, that is, how to find an optimal balance between communication and computation overheads for two parallel matrix algorithms. The problem graph for matrix algorithms analyzed in this paper is a two-dimensional grid (toroidal mesh) which is mapped onto a hypercube topology. To perform matrix operations on a hypercube, a matrix is partitioned into several submatrices which are stored and manipulated in the nodes. We seek to find an optimal matrix partitioning to minimize overall execution time. The NCUBE parallel machine is used for experimental performance evaluation. For matrix multiplication, we derive an exact analytical model to determine the optimal partitioning size and perform its experimental verification on the NCUBE parallel processor. For a parallel Gaussian elimination known as the balanced algorithm, we present performance measurements and an approximate analytical model for performance evaluation. Our analyses show that the optimal submatrix size is typically small and does not depend on the original matrix size. 相似文献

16.

Efficient adaptive collect algorithms

Yehuda Afek Yaron De Levie 《Distributed Computing》2007,20(3):221-238

Deterministic collect algorithms are presented that are adaptive to total contention and are efficient with respect to both the number of registers used and the step complexity. One of them has optimal O(k) step and O(n) space complexities, but assumes that processes’ identifiers are in O(n), where n is the total number of processes in the system and k is the total contention. The step complexity of an unrestricted name space variant of this algorithm remains O(k), but its space complexity increases to O(n ²). 相似文献

17.

Efficient algorithms for computing two nearest-neighbor problems on a rap

Tzong-Wann Kao Shi-Jinn Horng 《Pattern recognition》1994,27(12):1707-1716

This paper makes an improvement of computing two nearest-neighbor problems of images on a reconfigurable array of processors (RAP) by increasing the bus width between processors. Based on a base-n system, a constant time algorithm is first presented for computing the maximum/minimum of N log N-bit unsigned integers on a RAP using N processors each with N^1/c-bit bus width, where c is a constant and c ≥ 1. Then, two basic operations such as image component labeling and border following are also derived from it. Finally, these algorithms are used to design two constant time algorithms for the nearest neighbor black pixel and the nearest neighbor component problems on an N^1/2 × N^1/2 image using N^1/2 × N^1/2 processors each with N^1/c-bit bus width, where c is a constant and c ≥ 1. Another contribution of this paper is that the execution time of the proposed algorithms is tunable by the bus width. 相似文献

18.

Efficient algorithms for a class of partitioning problems

Ashraf Iqbal M. Bokhari S.H. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(2):170-175

We address the problem of optimally partitioning the modules of chain- or tree-like tasks over chain-structured or host-satellite multiple computer systems. This important class of problems includes many signal processing and industrial control applications. Prior research has resulted in a succession of faster exact and approximate algorithms for these problems. We describe polynomial exact and approximate algorithms for this class that are better than any of the previously reported algorithms. Our approach is based on a preprocessing step that condenses the given chain or tree structured task into a monotonic chain or tree. The partitioning of this monotonic task can then be carried out using fast search techniques 相似文献

19.

Localization of edges in graph models of two-level algorithms

N. A. Likhoded P. I. Sobolevskii A. A. Tiunchik 《Cybernetics and Systems Analysis》1999,35(6):895-902

Multidimensional computational models of two-level algorithms are introduced and investigated. Transformations of graph models of the algorithms are developed, which allow one to obtain modified models without global edges. The modified graph models can be transformed by the well-known transformation and mapping procedures into one-, two-, and three-dimensional array processors without global interconnections. Translated from Kibernetika i Sistemnyi Analiz, No. 6, pp. 63–71, November–December, 1999. 相似文献

20.

Efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers

Chun-Yuan Lin Yeh-Ching Chung Jen-Shiuh Liu 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(7):625-639

Array operations are useful in a large number of important scientific codes, such as molecular dynamics, finite element methods, climate modeling, atmosphere and ocean sciences, etc. In our previous work, we have proposed a scheme of extended Karnaugh map representation (EKMR) for multidimensional array representation. We have shown that sequential multidimensional array operation algorithms based on the EKMR scheme have better performance than those based on the traditional matrix representation (TMR) scheme. Since parallel multidimensional array operations have been an extensively investigated problem, we present efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers. In a data parallel programming paradigm, in general, we distribute array elements to processors based on various distribution schemes, do local computation in each processor, and collect computation results from each processor. Based on the row, column, and 2D mesh distribution schemes, we design data parallel algorithms for matrix-matrix addition and matrix-matrix multiplication array operations in both TMR and EKMR schemes for multidimensional arrays. We also design data parallel algorithms for six Fortran 90 array intrinsic functions: All, Maxval, Merge, Pack, Sum, and Cshift. We compare the time of the data distribution, the local computation, and the result collection phases of these array operations based on the TMR and the EKMR schemes. The experimental results show that algorithms based on the EKMR scheme outperform those based on the TMR scheme for all test cases. 相似文献