期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Random Data Accesses on a Coarse-Grained Parallel Machine I. One-to-One Mappings

《Journal of Parallel and Distributed Computing》1997,44(1):14-23

This paper describes deterministic communication-efficient algorithms for performing dynamic permutations on a coarse-grained parallel machine. Our analysis shows that the general permutation operation can be completed inCμn/p(+ lower order terms) time and is optimal and scalable providedn⪢p³+p²τ/μ (nis the size of the permutation or the number of elements distributed across thepprocessors, τ is the start-up overhead, and 1/μ is the data transfer rate).Cis a small constant typically between 2 and 3 for write permutations, slightly higher for read permutations. Modifications to exploit locality of access are presented. Special classes of permutations that are optimal for smaller sizes are also described. A companion paper [22] deals with the problem of random data accesses with hot spots. 相似文献

2.

Fast Generation of Random Permutations Via Networks Simulation

A. Czumaj P. Kanarek M. Kutylowski K. Lorys 《Algorithmica》1998,21(1):2-20

We consider the problem of generating random permutations with uniform distribution. That is, we require that for an arbitrary permutation π of n elements, with probability 1/n! the machine halts with the i th output cell containing π(i) , for 1 ≤ i ≤ n . We study this problem on two models of parallel computations: the CREW PRAM and the EREW PRAM. The main result of the paper is an algorithm for generating random permutations that runs in O(log log n) time and uses O(n ^1+o(1) ) processors on the CREW PRAM. This is the first o(log n) -time CREW PRAM algorithm for this problem. On the EREW PRAM we present a simple algorithm that generates a random permutation in time O(log n) using n processors and O(n) space. This algorithm outperforms each of the previously known algorithms for the exclusive write PRAMs. The common and novel feature of both our algorithms is first to design a suitable random switching network generating a permutation and then to simulate this network on the PRAM model in a fast way. Received November 1996; revised March 1997. 相似文献

3.

Minimizing total weighted flow time under uncertainty using dominance and a stability box

Yu.N. Sotskov T.-C. Lai 《Computers & Operations Research》2012,39(6):1271-1289

We consider an uncertain single-machine scheduling problem, in which the processing time of a job can take any real value on a given closed interval. The criterion is to minimize the total weighted flow time of the n jobs, where there is a weight associated with a job. We calculate a number of minimal dominant sets of the job permutations (a minimal dominant set contains at least one optimal permutation for each possible scenario). We introduce a new stability measure of a job permutation (a stability box) and derive an exact formula for the stability box, which runs in O(n log n) time. We investigate properties of a stability box. These properties allow us to develop an O(n²)-algorithm for constructing a permutation with the largest volume of a stability box. If several permutations have the largest volume of a stability box, the developed algorithm selects one of them due to a simple heuristic. The efficiency of the constructed permutation is demonstrated on a large set of randomly generated instances with 10≤n≤1000. 相似文献

4.

O(n) routing in rearrangeable networks

《Journal of Systems Architecture》2000,46(6):529-542

In (2n−1)-stage rearrangeable networks, the routing time for any arbitrary permutation is Ω(n²) compared to its propagation delay O(n) only. Here, we attempt to identify the sets of permutations, which are routable in O(n) time in these networks. We define four classes of self-routable permutations for Benes network. An O(n) algorithm is presented here, that identifies if any permutation P belongs to one of the proposed self-routable classes, and if yes, it also generates the necessary control vectors for routing P. Therefore, the identification, as well as the switch setting, both problems are resolved in O(n) time by this algorithm. It covers all the permutations that are self-routable by anyone of the proposed techniques. Some interesting relationships are also explored among these four classes of permutations, by applying the concept of ‘group-transformations’ [N. Das, B.B. Bhattacharya, J. Dattagupta, Hierarchical classification of permutation classes in multistage interconnection networks, IEEE Trans. Comput. (1993) 665–677] on these permutations. The concepts developed here for Benes network, can easily be extended to a class of (2n−1)-stage networks, which are topologically equivalent to Benes network. As a result, the set of permutations routable in a (2n−1)-stage rearrangeable network, in a time comparable to its propagation delay has been extended to a large extent. 相似文献

5.

An Optimal Shortest Path Parallel Algorithm for Permutation Graphs

《Journal of Parallel and Distributed Computing》1995,24(1):94-99

We present an optimal parallel algorithm for the single-source shortest path problem for permutation graphs. The algorithm runs in O(log n) time using O(n/log n) processors on an EREW PRAM. As an application, we show that a minimum connected dominating set in a permutation graph can be found in O(log n) time using O(n/log n) processors. 相似文献

6.

Parallel permutation generation on linear array

《国际计算机数学杂志》2012,89(3-4):113-121

Given n items, a parallel algorithm for generating all the n! permutations is presented. The computational model used is a linear array which consists of n identical processing elements with a simple structure. One permutation is produced at each other time step. The elapsed time to produce a permutation is independent of the integer n. The basic idea used is the iterative method and the exchange of two consecutive components in an existing permutation. The design procedures of this algorithm are considered in detail. The ranking and unranking functions of the required permutations are also discussed. 相似文献

7.

Optimal routing algorithms for mesh-connected processor arrays

Sanguthevar Rajasekaran Thanasis Tsantilas 《Algorithmica》1992,8(1-6):21-38

We show that there is a randomizedoblivious algorithm for routing any (partial) permutation on ann ×n grid in 2n +O(logn) parallel communication steps. The queues will not grow larger than Θ(logn/log logn) with high probability. We then modify this to obtain a (nonoblivious) algorithm with the same running time such that the size of the queues is bounded by a constant with high probability. For permutations withlocality, where each packet has to travel a distance at mostL, a generalization of the algorithm routes in time proportional toL with high probability. Finally, we identify a class of meshlike networks that have optimal or near-optimal diameter. These meshes have the potential of being adapted to run existing sorting and routing algorithms with corresponding reduction in their running times. 相似文献

8.

Minimizing total weighted completion time with uncertain data: A stability approach

Yu. N. Sotskov N. G. Egorova F. Werner 《Automation and Remote Control》2010,71(10):2038-2057

A single-machine scheduling problem is investigated provided that the input data are uncertain: The processing time of a job can take any real value from the given segment. The criterion is to minimize the total weighted completion time for the n jobs. As a solution concept to such a scheduling problem with an uncertain input data, it is reasonable to consider a minimal dominant set of job permutations containing an optimal permutation for each possible realization of the job processing times. To find an optimal or approximate permutation to be realized, we look for a permutation with the largest stability box being a subset of the stability region. We develop a branch-and-bound algorithm to construct a permutation with the largest volume of a stability box. If several permutations have the same volume of a stability box, we select one of them due to one of two simple heuristics. The efficiency of the constructed permutations (how close they are to a factually optimal permutation) and the efficiency of the developed software (average CPU-time used for an instance) are demonstrated on a wide set of randomly generated instances with 5 ≤ n ≤ 100. 相似文献

9.

Towards understanding optimal MIMD queueless routing of arbitrary permutations on hypercubes

Jean-Pierre Jung Ibrahima Sakho 《The Journal of supercomputing》2012,61(3):371-393

This paper addresses the problem of the offline routing of arbitrary permutations on hypercubes under the MIMD queueless communication constraints which imposes that only one message can be located in each node of the hypercube right through the routing. According to e-cube routing, this kind of communication may require in the worst cases at least n exchanges steps on an n-dimensional hypercube. It has been conjectured that in the general case n steps suffice. The conjecture has been proved either by enumeration or by program for the particular cases of n??3. We revisit the problem through the k-partitioning paradigm based on maximum matching of bipartite graphs concept to take advantage of the recursive structure of the hypercube topology. The paradigm consists in looking for each message a transition node distant of at most say k<n hops from its current node such that all the messages can be routed to their transition nodes in k hops then finally routed to their final destination nodes in n?k hops on two disjoints hypercubes. In others words, the paradigm consists to determine an upstream permutation routable in k steps and which leads to two independent downstream permutations routable in n?k steps on two disjoint hypercubes. With this purpose, we give a characterization of the non-1-partitionable permutations from which the proof of the conjecture comes straightforwardly for n??2 and the non-1-partitionable permutations can be built whatever n may be. For n=3, we thus confirm the existence of exactly three classes of non-1-partitionable permutations and prove that there are two classes of upstream permutations to avoid when 2-partitioning any non-1-partitionable permutation. The process to avoid such upstream permutations is presented, and leads to a formal proof of the conjecture for n=3. For n>3, experiences gained in routing permutations on 4D-hypercubes allow conjecturing that in these cases, too, any non-1-partitionable permutation is 2-partitionable. Indeed, the analysis carried out for the 3D-hypercubes is repeatable to identify, certainly more laboriously given their combinatory, the upstream permutations to avoid when 2-partitioning a non-1-partitionable permutation on a 4D-hypercube. The proof that resulting downstream permutations on a 3D-hypercube can be routed in 2 steps is a consequence of the fact that for n=2 and 3 any permutation on a nD-hypercube can be routed in n steps. 相似文献

10.

Fault-Tolerant Permutation Routing in Hypercubes

《Journal of Parallel and Distributed Computing》1993,17(4):277-281

This paper introduces a deterministic mechanism for fault-tolerant permutation routing on an n-cube with less than n processor and/or link faults, using O(n) steps and constant queue size. The basic approach is to modify an existing route of a given permutation to avoid the faulty processors and/or faulty links, yet only incurring a constant factor slowdown in communication by ensuring that, at each step of the routing, every message either stays where it is or is sent to a nonfaulty neighbor, each link has at most one message traversing it in each direction, and the routing uses constant queue size. The existing route used in this paper is Benes routes. However, our method can be applied to any routing method where, in each step, all messages use links across the same dimension. The same approach can also be applied to online routing based on Batcher′s bitonic sorting to avoid faults. 相似文献

11.

Sorting Unsigned Permutations by Weighted Reversals,Transpositions, and Transreversals

下载免费PDF全文

Xiao-Wen Lou Da-Ming Zhu 《计算机科学技术学报》2010,25(4):853-863

Reversals, transpositions and transreversals are common events in genome rearrangement. The genome rearrangement sorting problem is to transform one genome into another using the minimum number of given rearrangement operations. An integer permutation is used to represent a genome in many cases. It can be divided into disjoint strips with each strip denoting a block of consecutive integers. A singleton is a strip of one integer. And the genome rearrangement problem turns into the problem of sorting a permutation into the identity permutation equivalently. Hannenhalli and Pevzner designed a polynomial time algorithm for the unsigned reversal sorting problem on those permutations with O(log n) singletons. In this paper, first we describe one case in which Hannenhalli and Pevzner’s algorithm may fail and propose a corrected approach. In addition, we propose a (1+ε)-approximation algorithm for sorting unsigned permutations with O(log n) singletons by reversals of weight 1 and transpositions/transreversals of weight 2. 相似文献

12.

Routing Permutations on Hypercube Machines with Half-Duplex Links

《Journal of Parallel and Distributed Computing》1994,20(1):14-19

Algorithms are presented for realizing permutations on a less restrictive hypercube model called the S-MIMD (synchronous MIMD), which allows at most one data transfer on a given communication link at a given time instant, and where data movements are not restricted to a single dimension at a given time. First, an optimal algorithm for bit-permute permutations is developed from a very simple realization of the shuffle on a 3-cube; this algorithm needs 2⌊n/2⌋ routing steps on an n-dimensional hypercube. The technique is then extended to an optimal algorithm for bit-permute-complement permutations, one that needs n routing steps. Also, algorithms are sketched for routing permutations in the classes Ω and Ω⁻¹ in 3⌈n/2⌉ routing steps, yielding an off-line algorithm for routing arbitrary permutations in 3n steps. 相似文献

13.

Average-case analysis of QuickSort and Binary Insertion Tree height using incompressibility

Brendan Lucier Tao Jiang Ming Li 《Information Processing Letters》2007,103(2):45-51

We study the Kolmogorov complexity of a Binary Insertion Tree, and present a succinct encoding scheme for Binary Insertion Trees produced from incompressible permutations. Based on the encoding scheme, we obtain a simple incompressibility argument that yields an asymptotic analysis of the average height of a Binary Insertion Tree. This argument further implies that the QuickSort algorithm sorts a permutation of n elements in Θ(nlgn) comparisons on average. 相似文献

14.

Efficient Parallel Computations on the Reduced Mesh of Trees Organization

《Journal of Parallel and Distributed Computing》1994,20(2):121-135

Optimal and near optimal parallel algorithms for several fundamental problems are proposed for a parallel organization consistmg of n processors, each having access to a row and a column of an n × n array of memory modules. Parallel computations are implemented on such an organization by decomposing them into alternating orthogonal processing phases. A number of efficient data movement techniques are developed for the proposed organization which lead to optimal or near optimal solutions to several communication-intensive problems such as sorting, performing permutations, list ranking (data dependent parallel prefix), and problems on graphs represented by an unsorted list of n² edges. It is also shown that the proposed organization is capable of simulating any fixed-degree network on n² processors with O(n) loss in time, which is optimal. Finally, an enhanced organization having p processors, 1 ≤ p ≤ n², and O(n²) memory locations is presented, and is shown to provide optimal speedups for adjacency-matrix based graph problems for any number of processors in the range [1, n^3/2]. 相似文献

15.

Renaming and Dispersing: Techniques for Fast Load Balancing

《Journal of Parallel and Distributed Computing》1994,23(2):149-157

Consider the following situation: n processors of a PRAM are given n independent tasks. Each task can be executed in constant time by a single processor. The distribution of tasks among the processors is unknown; each processor has information only about its set of tasks. The batch execution problem is to reschedule the tasks so that the quickest execution of all tasks is achieved. This problem captures some basic cooperation obstacles of the PRAM model since, without rescheduling overhead, all the tasks can be completed in O(1) time. The solution presented in this paper is an O(lg lg n) time load balancing algorithm which achieves, with overwhelming probability, an almost flat distribution, i.e., O(1) tasks for each processor. We introduce two novel techniques which are of an independent interest: renaming-a randomized scheme for an approximate compaction, and dispersing-a gather-scatter paradigm for distribution smoothing. The model of computation used in the CRCW PRAM. The concurrent-write submodel is ROBUST; i.e., if two or more processors write into the same cell at the same time then no prediction can be made about the cell contents. 相似文献

16.

Whole mirror duplication-random loss model and pattern avoiding permutations

Jean-Luc Baril Rémi Vernay 《Information Processing Letters》2010,110(11):474-480

In this paper we study the problem of the whole mirror duplication-random loss model in terms of pattern avoiding permutations. We prove that the class of permutations obtained with this model after a given number p of duplications of the identity is the class of permutations avoiding the alternating permutations of length p²+1. We also compute the number of duplications necessary and sufficient to obtain any permutation of length n. We provide two efficient algorithms to reconstitute a possible scenario of whole mirror duplications from identity to any permutation of length n. One of them uses the well-known binary reflected Gray code (Gray, 1953) [10]. Other relative models are also considered. 相似文献

17.

An improved parallel algorithm for integer GCD

Benny Chor Oded Goldreich 《Algorithmica》1990,5(1-4):1-10

We present a simple parallel algorithm for computing the greatest common divisor (gcd) of twon-bit integers in the Common version of the CRCW model of computation. The run-time of the algorithm in terms of bit operations isO(n/logn), usingn ^1+? processors, where ? is any positive constant. This improves on the algorithm of Kannan, Miller, and Rudolph, the only sublinear algorithm known previously, both in run time and in number of processors; they requireO(n log logn/logn),n ² log² n, respectively, in the same CRCW model. We give an alternative implementation of our algorithm in the CREW model. Its run-time isO(n log logn/logn), usingn ^1+? processors. Both implementations can be modified to yield the extended gcd, within the same complexity bounds. 相似文献

18.

Subcomplete generalizations of graph isomorphism

Christoph M. Hoffmann 《Journal of Computer and System Sciences》1982,25(3):332-359

We present eight group-theoretic problems in NP one of which is a reformulation of graph isomorphism. We give technical evidence that none of the problems is NP-complete, and give polynomial time reductions among the problems. There is a good possibility that seven of these problems are harder than graph isomorphism (relative to polynomial time reduction), so that they might be examples of natural problems of intermediate difficulty, situated properly between the class of NP-complete problems and the class P of problems decidable in deterministic polynomial time. Because of strong structural similarity, two of the apparently harder problems can be interpreted as generalized isomorphism and generalized automorphism, respectively. Whether these problems ultimately prove to be harder than graph isomorphism seems to depend, in part, on the open problem whether every permutation group of degree n arises as the automorphism group of a combinatorial structure of size polynomial in n. Finally, we give an O(n² · k) algorithm for constructing the centralizer of a permutation group of degree n presented by a generating set of k permutations. Note that we may assume that k is O(n · log n), without loss of generality. This problem is a special case of one of the harder group-theoretic problems. From the method of constructing the centralizer, we recover results about the group-theoretic structure of the centralizer. Furthermore, applying our algorithm for intersecting with a normalizing permutation group, we show how to find the center of a permutation group of degree n in O(n⁶) steps, having constructed the centralizer of the group first. 相似文献

19.

Efficient Parallel Algorithms for Permutation Graphs

《Journal of Parallel and Distributed Computing》1995,26(1):116-124

In this paper, we present optimal O(log n) time, O(n/log n) processor EREW PRAM parallel algorithms for finding the connected components, cut vertices, and bridges of a permutation graph. We also present an O(log n) time, O(n) processor, CREW PRAM model parallel algorithm for finding a Breadth First Search (BFS) spanning tree of a permutation graph rooted at vertex 1 and use the same to derive an efficient parallel algorithm for the All Pairs Shortest Path problem on permutation graphs. 相似文献

20.

Linear structure of bipartite permutation graphs and the longest path problem

Ryuhei Uehara Gabriel Valiente 《Information Processing Letters》2007,103(2):71-77

The class of bipartite permutation graphs is the intersection of two well known graph classes: bipartite graphs and permutation graphs. A complete bipartite decomposition of a bipartite permutation graph is proposed in this note. The decomposition gives a linear structure of bipartite permutation graphs, and it can be obtained in O(n) time, where n is the number of vertices. As an application of the decomposition, we show an O(n) time and space algorithm for finding a longest path in a bipartite permutation graph. 相似文献