共查询到20条相似文献,搜索用时 0 毫秒
1.
GPU已被广泛应用于当前的高性能计算系统中,但其性能却受到程序运行时不同控制流方向的严重制约。这一问题通常通过动态Warp重组技术来解决,即将一个或多个Warp内沿相同控制流执行的线程组合在一起,构成一个新的Warp。但是,这类方法普遍存在一些不必要的重组,引入了较大的额外性能开销。分析了线程重组的性能开销,并提出了一种称作"部分重组"的性能优化方法。这种方法在保证重组效率的前提下,避免了对包含活跃线程数量较多的Warp的重组,从而有效减少了线程重组引入的性能开销。测试结果表明,部分重组能够在保证重组效率的前提下带来较为明显的性能提升。 相似文献
2.
针对堆栈处理器特殊架构,为提高实时性,引入多任务堆栈技术,采用Forth自生成器技术,提出一种基于堆栈处理器的抢占式与时间片轮转调度方法,实现了在Forth堆栈处理器中实时多任务的运行,弥补了Forth堆栈处理器在实时多任务操作系统方面的的不足.实验表明,与当前基于寄存器处理器的嵌入式Forth实时系统相比,本文方法在最大关中断时间、任务上下文切换时间和任务响应时间三项重要的实时任务性能指标方面,实时性能有明显提升,从而保证了Forth系统应用的高效性和安全性,满足人们对Forth堆栈处理器实时多任务操作系统方面的应用需求. 相似文献
3.
In this paper we address the problem of staff scheduling at check-in counters with time varying demand. The main objective is to minimize a cost function based on the assigned shifts for a given workforce subject to flexible labor regulations and flexible assignments of lunch breaks. To solve the problem we developed a branch and price algorithm that uses master variable branching. However, since convergence of the column generation subroutine was really slow, we integrated stabilization techniques to speed up the algorithm. We introduced a new dynamic parameter updating procedure for the stabilized column generation. Our computational results show the superior behavior of stabilized column generation compared to the non-stabilized version. Since slow convergence might occur at each node in the search tree and consequently reductions are realized at each node investigated. Furthermore, we perform an in-depth investigation of the updating parameters and give useful insights to choose them. Finally, we tackle realistic problem instances with up to 65 service workers and show the efficiency of the algorithm. 相似文献
4.
We focus on the problem of scheduling n independent jobs on m identical parallel machines with the objective of minimizing total tardiness of the jobs considering a job splitting property. In this problem, it is assumed that a job can be split into sub-jobs and these sub-jobs can be processed independently on parallel machines. We develop several dominance properties and lower bounds for the problem, and suggest a branch and bound algorithm using them. Computational experiments are performed on randomly generated test problems and results show that the suggested algorithm solves problems of moderate sizes in a reasonable amount of computation time. 相似文献
5.
在GPU中,一个warp内的所有线程在锁步中执行相同的指令。某些线程的内存请求可以得到快速处理,而其余请求会经历较长时间。在最慢的请求完成之前,warp不能执行下一条指令,导致内存发散。对GPU中warp间的异构性进行了研究,实现并优化了一种基于inter warp异构性的缓存管理机制和内存调度策略,以减少内存发散和缓存排队延迟的负面影响。根据缓存命中率将warp分类,以驱动后面的3个组件:(1)基于warp类型的缓存旁路技术组件,使低缓存利用率的warp进入旁路,不访问L2缓存;(2)基于warp类型的缓存插入/提升策略组件,防止来自高缓存利用率warp的数据被过早清除;(3)基于warp类型的内存控制器组件,优先处理从高缓存利用率的warp接收到的请求,并优先处理来自相同warp的请求。基于warp间异构性的缓存管理和内存调度机制在8种不同的GPGPU应用中,与基准GPU相比,平均加速18.0%。 相似文献
6.
In this paper, we consider a single batch machine scheduling problem with incompatible job families and dynamic job arrivals. The objective is to minimize the total completion time. This problem is known to be strongly NP-hard. We present several dominance properties and two types of lower bounds, which are incorporated to construct a basic branch and bound algorithm. Furthermore, according to the characteristics of dynamic job arrivals, a decomposed branch and bound algorithm is proposed to improve the efficiency. The proposed algorithms are tested on a large set of randomly generated problem instances. 相似文献
7.
Dennis Weyland Roberto Montemanni Luca Maria Gambardella 《Journal of Parallel and Distributed Computing》2013
In this work we propose a general metaheuristic framework for solving stochastic combinatorial optimization problems based on general-purpose computing on graphics processing units (GPGPU). This framework is applied to the probabilistic traveling salesman problem with deadlines (PTSPD) as a case study. Computational studies reveal significant improvements over state-of-the-art methods for the PTSPD. Additionally, our results reveal the huge potential of the proposed framework and sampling-based methods for stochastic combinatorial optimization problems. 相似文献
8.
Radhouan Bouabda Bassem Jarboui Mansour Eddaly Abdelwaheb Rebaï 《Computers & Operations Research》2011
This paper addresses the permutation flowline manufacturing cell with sequence dependent family setup times problem with the objective to minimize the makespan criterion. We develop a cooperative approach including a genetic algorithm and a branch and bound procedure. The latter is probabilistically integrated in the genetic algorithm in order to enhance the current solution. Moreover, the application of the branch and bound algorithm is based upon the decomposition of the problem into subproblems. The performance of the proposed method is tested by numerical experiments on a large number of representative problems. 相似文献
9.
As cloud computing evolves, it is becoming more and more apparent that the future of this industry lies in interconnected cloud systems where resources will be provided by multiple “Cloud” providers instead of just one. In this way, the hosts of services that are cloud-based will have access to even larger resource pools while at the same time increasing their scalability and availability by diversifying both their computing resources and the geographical locations where those resources operate from. Furthermore the increased competition between the cloud providers in conjunction with the commoditization of hardware has already led to large decreases in the cost of cloud computing and this trend is bound to continue in the future. Scientific focus in cloud computing is also headed this way with more studies on the efficient allocation of resources and effective distribution of computing tasks between those resources. This study evaluates the use of meta-heuristic optimization algorithms in the scheduling of bag-of-tasks applications in a heterogeneous cloud of clouds. The study of both local and globally arriving jobs has been considered along with the introduction of sporadically arriving critical jobs. Simulation results show that the use of these meta-heuristics can provide significant benefits in costs and performance. 相似文献
10.
城市单元格划分算法应用于出租车调度方法时无法解决山川、河流、大型建筑等天然屏障对距离计算的负面影响,更无法确定网格之间的稳定性。针对此问题,提出了一种面向复杂地理环境的多层次网格划分算法,采用历史数据拟合出两点之间行驶距离的精确值,替代了传统路径计算方法,为距离计算问题提供了新的解决思路,多层次划分更使网格的稳定性得以保证。再结合线性规划方法,辅以时序图和散点图等时空特征识别技术,以高效益和负载均衡为目标,对空载出租车进行实时调度。实验结果表明,该方法提升了整体效益,缩短了乘客打车等待时间,达到了经济效益和社会效益的双提升,具有普适性和广泛的应用前景。 相似文献
11.
George KousiourisAuthor Vitae Tommaso CucinottaAuthor Vitae 《Journal of Systems and Software》2011,84(8):1270-1291
The aim of this paper is to study and predict the effect of a number of critical parameters on the performance of virtual machines (VMs). These parameters include allocation percentages, real-time scheduling decisions and co-placement of VMs when these are deployed concurrently on the same physical node, as dictated by the server consolidation trend and the recent advances in the Cloud computing systems. Different combinations of VM workload types are investigated in relation to the aforementioned factors in order to find the optimal allocation strategies. What is more, different levels of memory sharing are applied, based on the coupling of VMs to cores on a multi-core architecture. For all the aforementioned cases, the effect on the score of specific benchmarks running inside the VMs is measured. Finally, a black box method based on genetically optimized artificial neural networks is inserted in order to investigate the degradation prediction ability a priori of the execution and is compared to the linear regression method. 相似文献
12.
Tabu search (TS) algorithms are among the most effective approaches for solving the job shop scheduling problem (JSP) which is one of the most difficult NP-complete problems. However, neighborhood structures and move evaluation strategies play the central role in the effectiveness and efficiency of the tabu search for the JSP. In this paper, a new enhanced neighborhood structure is proposed and applied to solving the job shop scheduling problem by TS approach. Using this new neighborhood structure combined with the appropriate move evaluation strategy and parameters, we tested the TS approach on a set of standard benchmark instances and found a large number of better upper bounds among the unsolved instances. The computational results show that for the rectangular problem our approach dominates all others in terms of both solution quality and performance. 相似文献
13.
We study an online weighted interval scheduling problem on a single machine, where all intervals have unit length and the objective is to maximize the total weight of all completed intervals. We investigate how the function of finite lookahead improves the competitivities of deterministic online heuristics, under both preemptive and non-preemptive models. The lookahead model studied in this paper is that an online heuristic is said to have a lookahead ability of LD if at any time point it is able to foresee all the intervals to be released within the next LD units of time. We investigate both competitive online heuristics and lower bounds on the competitive ratio, with lookahead 0≤LD≤1 under the preemptive model, and lookahead 0≤LD≤2 under the non-preemptive model. A method to transform a preemptive lookahead online algorithm to a non-preemptive online algorithm with enhanced lookahead ability is also given. Computational tests are performed to compare the practical competitivities of the online heuristics with different lookahead abilities. 相似文献
14.
15.
雷必成 《计算机工程与应用》2009,45(28):20-23
为了研究网络调度对能控性与能观性的影响,对输入、输出通道都存在通信约束的网络控制系统的调度进行建模;利用通信序列表示通道的调度情况,最终将模型转化为离散切换系统;在静态调度下,利用循环不变子空间理论及其相关的性质,给出了网络控制系统能控性、能观性成立的充要条件,并给出了原有控制系统与经网络静态调度后的系统之间能控性、能观性的关系;最后用仿真实例说明了所提结论的正确性。 相似文献
16.
This paper presents a new approach to model a mixed-integer mathematical programming for a single machine scheduling problem with deteriorating and learning effects. Neglecting the effect of performing a job on its own processing time is mentioned as a deficiency in the literature of industrial scheduling. At first, self-influential factors are defined and then equations are presented to utilize these new factors. GAMS IDE/CPLEX software is used to validate the represented model by solving small to medium-sized sample problems, in which the associate results are compared with a hybrid meta-heuristic algorithm based on Ant Colony Optimization (ACO), Genetic Algorithm (GA) and Imperialist Competitive Algorithm (ICA). The vitality of considering and modeling the effect of learning and deteriorating made by a job, on its own processing time is tested for several sample problems. Then a significant improvement is observed by comparing the results acquired by neglecting the self-influential factors with the results based on considering them. Finally, the computational results are analyzed and the conclusion is given. 相似文献
17.
Pawe Rociszewski Pawe Czarnul Rafa Lewandowski Marcel Schally‐Kacprzak 《Concurrency and Computation》2016,28(9):2586-2607
The paper presents a new open‐source framework called KernelHive for multilevel parallelization of computations among various clusters, cluster nodes, and finally, among both CPUs and GPUs for a particular application. An application is modeled as an acyclic directed graph with a possibility to run nodes in parallel and automatic expansion of nodes (called node unrolling) depending on the number of computation units available. A methodology is proposed for parallelization and mapping of an application to the environment that includes selection of devices using a chosen optimizer, selection of best grid configurations for compute devices, optimization of data partitioning and the execution. One of possibly many scheduling algorithms can be selected considering execution time, power consumption, and so on. An easy‐to‐use GUI is provided for modeling and monitoring with a repository of ready‐to‐use constructs and computational kernels. The methodology, execution times, and scalability have been demonstrated for a distributed and parallel password‐breaking example run in a heterogeneous environment with a cluster and servers with different numbers of nodes and both CPUs and GPUs. Additionally, performance of the framework has been compared with an MPI + OpenCL implementation using a parallel geospatial interpolation application employing up to 40 cluster nodes and 320 cores. Copyright © 2015 John Wiley & Sons, Ltd. 相似文献
18.
Online scheduling of two job types on a set of multipurpose machines with unit processing times 总被引:1,自引:0,他引:1
We study a problem of scheduling a set of n jobs with unit processing times on a set of m multipurpose machines in which the objective is to minimize the makespan. It is assumed that there are two different job types, where each job type can be processed on a unique subset of machines. We provide an optimal offline algorithm to solve the problem in constant time and an online algorithm with a competitive ratio that equals the lower bound. We show that the worst competitive ratio is obtained for an inclusive job-machine structure in which the first job type can be processed on any of the m machines while the second job type can be processed only on a subset of m/2 machines. Moreover, we show that our online algorithm is 1-competitive if the machines are not flexible, i.e., each machine can process only a single job type. 相似文献
19.
Deshi Ye 《Information Processing Letters》2003,85(4):171-177
Speranza and Tuza [Ann. Oper. Res. 86 (1999) 494-506] studied the on-line problem of scheduling jobs on m identical machines with extendable working time. In this problem, each machine is assumed to have an identical regular working time, which can be extended if necessary. The working time of a machine is the larger one between its regular working time and the total processing time of jobs assigned to it. The objective is to minimize the total working time of machines. They presented an on-line algorithm Hx, with a competitive ratio at most 1.228 for any number of machines by choosing an appropriate parameter x. In this paper we consider a small number of machines. The best choices of x are given for m=2,3,4 and the tight bounds, 7/6, 11/9 and 19/16, respectively, are proved. Among them, the algorithm for m=2 is best possible. We then derive a new algorithm for m=3 with a competitive ratio 7/6. 相似文献
20.
Job shop scheduling problem (JSP) which is widespread in the real-world production system is one of the most general and important problems in various scheduling problems. Nowadays, the effective method for JSP is a hot topic in research area of manufacturing system. JSP is a typical NP-hard combinatorial optimization problem and has a broad engineering application background. Due to the large and complicated solution space and process constraints, JSP is very difficult to find an optimal solution within a reasonable time even for small instances. In this paper, a hybrid particle swarm optimization algorithm (PSO) based on variable neighborhood search (VNS) has been proposed to solve this problem. In order to overcome the blind selection of neighborhood structures during the hybrid algorithm design, a new neighborhood structure evaluation method based on logistic model has been developed to guide the neighborhood structures selection. This method is utilized to evaluate the performance of different neighborhood structures. Then the neighborhood structures which have good performance are selected as the main neighborhood structures in VNS. Finally, a set of benchmark instances have been conducted to evaluate the performance of proposed hybrid algorithm and the comparisons among some other state-of-art reported algorithms are also presented. The experimental results show that the proposed hybrid algorithm has achieved good improvement on the optimization of JSP, which also verifies the effectiveness and efficiency of the proposed neighborhood structure evaluation method. 相似文献