1.

Branch‐and‐bound approach for optima localization in scheduling multiprocessor jobs

Alexander Kononov Polina Kononova Alexander Gordeev 《International Transactions in Operational Research》2020,27(1):381-393

We consider multiprocessor task scheduling problems with dedicated processors. We determine the tight optima localization intervals for different subproblems of the basic problem. Based on the ideas of a computer‐aided technique developed by Sevastianov and Tchernykh for shop scheduling problems, we elaborate a similar method for the multiprocessor task scheduling problem. Our method allows us to find an upper bound for the length of the optimal schedule in terms of a natural lower bound. As a byproduct of our results, a family of linear‐time approximation algorithms with a constant ratio performance guarantee is designed for the NP‐hard subproblems of the basic problem, and new polynomially solvable classes of problems are found.

2.

Distributed task scheduling algorithms on linear networks (total citations: 1; self-citations: 0; citations by others: 1)

孙广中 陈国良 陈辰 许舸 张铮 《计算机研究与发展》2003,40(10):1476-1481

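For an existing theoretical model of distributed computing (unit-length tasks are generated independently by the processors, there is no global control, and communication between processors takes time), we study the problem of scheduling tasks efficiently on linear networks. By balancing task processing time against communication time, we give a distributed algorithm with an approximation ratio of 5.88; the algorithm needs no global information and its processing strategy is simple. We also study lower bounds on the approximation ratio and prove that no algorithm for this problem has an approximation ratio smaller than 1.16.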

3.

On the worst-case ratio of a compound multiprocessor scheduling algorithm

《Information Processing Letters》1987,25(6):389-396

The basic problem of nonpreemptive scheduling of independent tasks on identical processors is studied. The well-known heuristics LPT and Multifit are combined into an algorithm, Mix, which has a better worst-case ratio than each of its components. Exact ratios for the case of two and three processors are given.
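
As an illustration of one of the two component heuristics named above, here is a minimal sketch of LPT (Longest Processing Time first): tasks are sorted by decreasing processing time and each is placed on the currently least-loaded processor. This is only the textbook LPT rule, not the Mix algorithm of the paper; the function name `lpt_schedule` and the sample instance are assumptions made for illustration.

```python
import heapq

def lpt_schedule(times, m):
    """Textbook LPT: return (per-processor loads, makespan) for m identical processors."""
    # Min-heap of (current load, processor index); the least-loaded processor is on top.
    heap = [(0, p) for p in range(m)]
    heapq.heapify(heap)
    loads = [0] * m
    # Consider tasks in order of decreasing processing time.
    for t in sorted(times, reverse=True):
        load, p = heapq.heappop(heap)
        loads[p] = load + t
        heapq.heappush(heap, (loads[p], p))
    return loads, max(loads)

# Hypothetical instance: seven tasks on three processors.
loads, makespan = lpt_schedule([5, 5, 4, 4, 3, 3, 3], 3)
print(loads, makespan)
```

On this instance LPT produces a makespan of 11, while an optimal schedule has makespan 9 (5+4, 5+4, 3+3+3), which matches Graham's classical worst-case bound 4/3 − 1/(3m) for m = 3.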

4.

An Approximation Algorithm and Dynamic Programming for Reduction in Heterogeneous Environments

Pangfeng Liu May-Chen Kuo Da-Wei Wang 《Algorithmica》2009,53(3):425-453

A network of workstations (NOW) is a cost-effective alternative to massively parallel supercomputers. As commercially available off-the-shelf processors become cheaper and faster, it is now possible to build a cluster that provides high computing power within a limited budget. However, a cluster may consist of different types of processors and this heterogeneity complicates the design of efficient collective communication protocols. For example, it is a very hard combinatorial problem to find an optimal reduction schedule for such heterogeneous clusters. Nevertheless, we show that a simple technique called slowest-node-first (SNF) is very effective in designing efficient reduction protocols for heterogeneous clusters. First, we show that SNF is actually a 2-approximation algorithm, which means that an SNF schedule length is always within twice the optimal schedule length, no matter what kind of cluster is given. In addition, we show that SNF does give the optimal reduction time when the cluster consists of two types of processors, when the ratio of communication speed between them is at least two. When the communication speed ratio is less than two, we develop a dynamic programming technique to find the optimal schedule. Our dynamic programming utilizes the monotone property of the objective function, and can significantly reduce the amount of computation time. Finally, combined with an approximation algorithm for broadcast 2004, we propose an all-reduction algorithm which sends the reduction answer to all processors, with approximation ratio 3.5. We conduct three groups of experiments. First, we show that SNF performs better than the built-in MPI_Reduce in a test cluster. Second, we observe a factor of 93 times saving in computation time to find the optimal schedule, when compared with a naive dynamic programming implementation. Third, we apply the theoretical results to a branch-and-bound search and show that they can reduce the search time of the optimal reduction schedule by a factor of 500, when the cluster has three kinds of processors.

5.

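In-depth approximation analysis of the set cover greedy algorithm for the test set problem (total citations: 1; self-citations: 0; citations by others: 1)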

崔鹏 刘红静 《软件学报》2006,17(7):1494-1500

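The test set problem is an NP-hard problem with a wide range of applications. The set cover greedy algorithm is a commonly used approximation algorithm for the test set problem, and whether its approximation ratio of 2 ln n + 1, obtained from the set cover problem, can be improved is an open question. A generalization of the set cover greedy algorithm is used to solve the redundant test set problem arising in bioinformatics. By analyzing the distribution of the number of times item pairs are distinguished, we use the derandomization method to prove that the approximation ratio of the set cover greedy algorithm for the test set problem can be 1.5 ln n + 0.5 ln ln n + 2, which narrows the gap in the analysis of this algorithm's approximation ratio. In addition, we give a tight lower bound of (2 − o(1)) ln n − Θ(1) on the approximation ratio of the set cover greedy algorithm for the weighted redundant test set problem with redundancy n − 1.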
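
To make the greedy rule concrete, here is a minimal sketch of the set cover greedy algorithm applied to the (unweighted) test set problem: each test is a subset of the items, a test distinguishes a pair of items when exactly one of them belongs to it, and the greedy step repeatedly picks the test that distinguishes the most still-undistinguished pairs. The function names and the small instance are assumptions for illustration, and the sketch does not cover the weighted or redundant variants discussed in the abstract.

```python
from itertools import combinations

def distinguishes(test, a, b):
    """A test distinguishes items a and b if exactly one of them is in the test."""
    return (a in test) != (b in test)

def greedy_test_set(n_items, tests):
    """Return indices of tests chosen greedily until every item pair is distinguished."""
    undistinguished = set(combinations(range(n_items), 2))
    chosen = []
    while undistinguished:
        # Count, for every test, how many remaining pairs it would distinguish.
        gains = [sum(distinguishes(t, a, b) for a, b in undistinguished) for t in tests]
        best = max(range(len(tests)), key=gains.__getitem__)
        if gains[best] == 0:
            raise ValueError("remaining pairs cannot be distinguished by any test")
        chosen.append(best)
        # Keep only the pairs the chosen test fails to distinguish.
        undistinguished = {(a, b) for a, b in undistinguished
                           if not distinguishes(tests[best], a, b)}
    return chosen

# Hypothetical instance: 4 items (0..3) and three candidate tests.
tests = [{0, 1}, {0, 2}, {0}]
print(greedy_test_set(4, tests))
```

Viewing each item pair as an element and each test as the set of pairs it distinguishes is what turns this into a set cover instance, which is where the 2 ln n + 1 ratio quoted above comes from.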

6.

Approximation algorithms in partitioning real-time tasks with replications

Jian Lin Albert M. K. Cheng Gokhan Gercek 《International Journal of Parallel, Emergent and Distributed Systems》2018,33(2):211-232

Today is an era where multiprocessor technology plays a major role in the design of modern computer architecture. While multiprocessor systems offer extra computing power, they also open a new range of opportunities to improve fault-robustness. This paper focuses on a problem of achieving fault-tolerance using replications in real-time, multiprocessor systems. In the problem, multiple replicas, or copies, of a computing task are executed on distinct processors to resist potential processor failures and computing faults. Two greedy approximation heuristics, named Worst Fit Increasing K-Replication and First Fit Increasing K-Replication, are studied to maximise the number of real-time tasks assigned on a system with identical processors, with respect to the tasks’ replication and timing requirements. Worst case performance is analysed by using an approximation ratio between the algorithms and an optimal solution. We mathematically prove that the ratios of both algorithms are infinitely close to 2. Simulations are performed on a large set of test cases to bring to light the average performance of the algorithms in practice. The results show that both heuristic algorithms provide simple but fast and effective solutions to the problem.

7.

Computational forces in the SAGE benchmark

Robert W. Numrich 《Journal of Parallel and Distributed Computing》2009

Dimensional analysis applied to a complicated timing formula for the SAGE benchmark yields new insight into the limits to scalability. A single surface, defined by two curvilinear coordinates, describes the parallel efficiency of the benchmark. Each machine, as a function of the number of processors, follows its own path on the surface determined by dimensionless ratios of hardware forces to software forces. Two machines with the same ratios follow the same path and are self-similar, even though the numerical value of each individual force may be different. For this benchmark, latency effects are unimportant relative to bandwidth effects because of the slab decomposition used to distribute the problem across processors. To a good first-order approximation, a single force ratio describes the efficiency as a function of the number of processors. A simpler model, with a single dimensionless exponent, describes the first-order behavior of the computational power as a function of the number of processors.

8.

On linear programming relaxation approximation for minimum test set

崔鹏 刘红静 《计算机科学》2005,32(10):157-159

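The best known approximation ratio for minimum test set is currently 2 ln n + o(1), achieved by the greedy algorithm. Whether this ratio can be improved is an open question. This paper discusses the power of approximation-ratio proof methods based on linear programming relaxation for minimum test set. We prove that the integrality gap of minimum test set is at least 0.72 ln n, and that the coefficient of the integrality gap of minimum test set can be as large as that of minimum set cover. In addition, we show that the approximation ratio of the greedy algorithm for weighted minimum test set cannot be improved by more than a constant using the dual fitting method.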

9.

Approximation algorithms for general parallel task scheduling

Oh-Heum Kwon Kyung-Yong Chwa 《Information Processing Letters》2002,81(3):143-150

A general parallel task scheduling problem is considered. A task can be processed in parallel on one of several alternative subsets of processors. The processing time of the task depends on the subset of processors assigned to the task. We first show the hardness of approximating the problem for both preemptive and nonpreemptive cases in the general setting. Next we focus on linear array network of m processors. We give an approximation algorithm of ratio O(log m) for nonpreemptive scheduling, and another algorithm of ratio 2 for preemptive scheduling. Finally, we give a nonpreemptive scheduling algorithm of ratio O(log² m) for m×m two-dimensional meshes.

10.

Performance modelling of three parallel sorting algorithms on a pipelined transputer network

V. Lakshmi Narasimhan J. Armstrong 《Concurrency and Computation》1996,8(5):335-355

The implementation of three parallel sorting algorithms, namely binary sort, odd-even transposition sort and bitonic sort, on a network of transputers is analysed in the paper. The variation in the performance of these algorithms as the number of processors and sort size are changed is investigated. Experimental results show that when up to eight transputers are used, connected as a linear pipeline configuration, all three algorithms can achieve reasonable speedup ratios. The bitonic sort, binary sort and odd-even transposition sort achieve speedup ratios of 5, 4.4 and 4, respectively, when eight processors are used to sort 100,000 integers. Analytical models are derived which can be used to predict the performance of the three algorithms when a linear pipeline configuration is used. The predicted performance of the algorithms is compared with the experimental performance in order to validate the model. When the models are used to predict the performance using 16 transputers, it is found that the speedup does not significantly improve compared to the performance achieved with eight transputers. This shows that interprocessor communication has a significant effect on the algorithmic performance when a larger number of processors are used. The conclusions reinforce the fact that the binary and bitonic sorting algorithms are not well-suited to a linear pipeline configuration and that they may perform better if a different topology were used, for example a mesh or a cube connection scheme. Further, the analytical technique used for performance modelling as elaborated in the paper can be employed profitably for other multiprocessor systems as well.
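
For readers unfamiliar with one of the three algorithms named above, the following is a sequential simulation of odd-even transposition sort. It is a minimal sketch, not the transputer implementation analysed in the paper: each pass of the outer loop stands for one parallel phase in which disjoint neighbouring pairs are compared and swapped, which is why the algorithm maps naturally onto a linear array where each processor only talks to its neighbours. The function name is an assumption for illustration.

```python
def odd_even_transposition_sort(values):
    """Sort a sequence with n alternating even/odd compare-exchange phases."""
    a = list(values)
    n = len(a)
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...; odd phases compare (1,2), (3,4), ...
        start = phase % 2
        # All pairs within one phase are disjoint, so on a linear array of
        # processors they could be compared and exchanged simultaneously.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([5, 1, 4, 2, 3]))
```

With one element per processor the sort needs n phases, each involving only nearest-neighbour communication.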

11.

An efficient implementation of parallel eigenvalue computation for massively parallel processing (total citations: 4; self-citations: 0; citations by others: 4)

Takahiro Katagiri Yasumasa Kanada 《Parallel Computing》2001,27(14):1831-1845

This paper describes an efficient implementation and evaluation of a parallel eigensolver for computing all eigenvalues of dense symmetric matrices. Our eigensolver uses a Householder tridiagonalization method, which has higher parallelism and performance than conventional methods when problem size is relatively small, e.g., the order of 10,000. This is very important for relevant practical applications, where many diagonalizations for such matrices are required very often. The routine was evaluated on a 1024-processor HITACHI SR2201, giving speedup ratios of about 2–5 times as compared to the ScaLAPACK library on the same 1024 processors.

12.

On Hierarchical Diameter-Clustering and the Supplier Problem

Aparna Das Claire Kenyon-Mathieu 《Theory of Computing Systems》2009,45(3):497-511

Given a data set in a metric space, we study the problem of hierarchical clustering to minimize the maximum cluster diameter, and the hierarchical k-supplier problem with customers arriving online. We prove that two previously known algorithms for hierarchical clustering, one (offline) due to Dasgupta and Long and the other (online) due to Charikar, Chekuri, Feder and Motwani, output essentially the same result when points are considered in the same order. We show that the analyses of both algorithms are tight and exhibit a new lower bound for hierarchical clustering. Finally we present the first constant factor approximation algorithm for the online hierarchical k-supplier problem.

13.

Energy efficient scheduling of real-time tasks on multi-core processors with voltage islands

《Future Generation Computer Systems》2016

This paper studies energy efficient scheduling of periodic real-time tasks on multi-core processors with voltage islands, in which cores are partitioned into multiple blocks (termed voltage islands) and each block has its own power source to supply voltage. Cores in the same block always operate at the same voltage level, but can be adjusted by using Dynamic Voltage and Frequency Scaling (DVFS). We propose a Voltage Island Largest Capacity First (VILCF) algorithm for energy efficient scheduling of periodic real-time tasks on multi-core processors. It achieves better energy efficiency by fully utilizing the remaining capacity of an island before turning on more islands or increasing the voltage level of the current active islands. We provide detailed theoretical analysis of the approximation ratio of the proposed VILCF algorithm in terms of energy efficiency. In addition, our experimental results show that VILCF significantly outperforms the existing algorithms when there are multiple cores in a voltage island.

14.

Computational completeness of complete, star-like, and linear hybrid networks of evolutionary processors with a small number of processors

Artiom Alhazov Rudolf Freund Vladimir Rogozhin Yurii Rogozhin 《Natural computing》2016,15(1):51-68

A hybrid network of evolutionary processors (HNEP) is a graph where each node is associated with a special rewriting system called an evolutionary processor, an input filter, and an output filter. Each evolutionary processor is given a finite set of one type of point mutations (insertion, deletion or a substitution of a symbol) which can be applied to certain positions in a string. An HNEP rewrites the strings in the nodes and then re-distributes them according to a filter-based communication protocol; the filters are defined by certain variants of random-context conditions. HNEPs can be considered both as language generating devices (GHNEPs) and language accepting devices (AHNEPs); most previous approaches treated the accepting and generating cases separately. For both cases, in this paper we show that five nodes are sufficient to accept (AHNEPs) or generate (GHNEPs) any recursively enumerable language by showing the more general result that any partial recursive relation can be computed by an HNEP with (at most) five nodes with the underlying graph structure for the communication between the evolutionary processors being the complete or the linear graph with five nodes, whereas with a star-like communication graph we need six nodes. If the final results are defined by only taking the terminal strings out of the designated output node, then for these extended HNEPs we can prove that only four nodes are needed in all cases (for computing any partial recursive relation as well as for generating and accepting any recursively enumerable language), and the underlying communication structure can be a complete or a linear graph, but now even a star-like graph, too.

15.

Bin packing problem with start-up space constraints (total citations: 1; self-citations: 0; citations by others: 1)

顾晓东 许胤龙 陈国良 黄刘生 《软件学报》2002,13(3):390-397

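We propose a constrained bin packing problem with start-up space (start-up bin packing problem, SBPP for short), in which placing items of different types into the same bin requires a start-up space. The problem has wide applications in job assignment, task scheduling, and everyday packing. We give a linear-time offline algorithm, C-NF, for SBPP whose worst-case asymptotic performance ratio is 2, independent of the size of the start-up space. The average performance of the algorithm is analyzed experimentally. We also analyze the online properties of SBPP, showing that many classical online bin packing algorithms have no definite worst-case asymptotic performance ratio when applied to SBPP, and give an online algorithm that does have a definite worst-case asymptotic performance ratio.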

16.

The worst-case analysis of the Garey–Johnson algorithm

Claire Hanen Yakov Zinder 《Journal of Scheduling》2009,12(4):389-400

The Garey–Johnson algorithm is a well-known polynomial-time algorithm constructing an optimal schedule for the maximum lateness problem with unit execution time tasks, two parallel identical processors, precedence constraints and release times. The paper is concerned with the worst-case analysis of a generalization of the Garey–Johnson algorithm to the case of an arbitrary number of processors. In contrast to other algorithms for the maximum lateness problem, the tight performance guarantee for an even number of processors differs from the tight performance guarantee for an odd number of processors.

17.

A local scheduling algorithm for parallel sweep computation

刘杰 陈豆豆 迟利华 徐涵 蒋杰 胡庆丰 《计算机工程与科学》2009,31(Z1)

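To address the scalability problem of priority scheduling algorithms, this paper designs and implements a local depth-first sweep algorithm (PDFHDS). When computing the initial priorities and the final priorities, the algorithm traverses each node only once, and in that single traversal it visits only the node's direct predecessors. This avoids the situation in the PDFDS algorithm, where all predecessors of a node must be visited every time that node's priority is modified, and so reduces part of the computational overhead. Message passing is one-way: only information about grid points that have multi-level external successors is sent to the preceding neighbor processor, while information about grid points that have only one level of external successors is not sent, which saves communication overhead. The experimental data show that, although its performance falls short of the DFHDS algorithm when the number of processors is small, with many processors the performance of the PDFDS algorithm can exceed that of DFHDS by 50% or even more.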

18.

Approximation algorithm for constructing data aggregation trees for wireless sensor networks

Deying LI Qinghua ZHU Jiannong CAO 《Frontiers of Computer Science》2009,3(4):524

This paper considers the problem of constructing data aggregation trees in wireless sensor networks (WSNs) for a group of sensor nodes to send collected information to a single sink node. The data aggregation tree contains the sink node, all the source nodes, and some other non-source nodes. Our goal of constructing such a data aggregation tree is to minimize the number of non-source nodes to be included in the tree so as to save energy. We prove that the data aggregation tree problem is NP-hard and then propose an approximation algorithm with a performance ratio of four and a greedy algorithm. We also give a distributed version of the approximation algorithm. Extensive simulations are performed to study the performance of the proposed algorithms. The results show that the proposed algorithms can find a tree that is a good approximation to the optimal tree and have a high degree of scalability.

19.

Partitioned EDF scheduling on a few types of unrelated multiprocessors (total citations: 1; self-citations: 1; citations by others: 0)

Andreas Wiese Vincenzo Bonifaci Sanjoy Baruah 《Real-Time Systems》2013,49(2):219-238

A polynomial-time approximation scheme (PTAS) is derived for the partitioned EDF scheduling of implicit-deadline sporadic task systems upon unrelated multiprocessor platforms that are comprised of a constant number of distinct types of processors. This generalizes earlier results showing the existence of polynomial-time approximation schemes for the partitioned EDF scheduling of implicit-deadline sporadic task systems on (1) identical multiprocessor platforms, and (2) unrelated multiprocessor platforms containing a constant number of processors.

20.

Parallel implementations of Brunotte’s algorithm

Antal Tátrai 《Journal of Parallel and Distributed Computing》2011,71(4):565-572

In this paper the author makes a comprehensive comparison of different parallelizations of a sequential number theoretic algorithm having large memory requirements. Brunotte’s algorithm is one of the currently known best methods for the decision of the canonical number system (or more generally shift radix system) property. Still, it can be very space-consuming in some cases. Pushing the algorithm to its limits may hopefully shed light on mathematical patterns that would otherwise not be discernible. The algorithm contains many n-dimensional vector operations and set operations like insert, find, clear, etc. The parallel algorithms encounter two different kinds of concurrency problems. First, they need computationally intensive arithmetic vector operations; second, the set implementations require a huge amount of memory and general purpose processors. The algorithms described in this article are basically designed for two platforms. The first platform is a generic symmetric multiprocessing (SMP) architecture without any vector processor extension; the second is the Cell Broadband Engine. The SMP platforms have several general purpose processors, in contrast with the Cell Broadband Engine, where the processors have Synergistic vector processors.