Similar Literature
20 similar articles found
1.
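Through job log analysis and assessment experiments, this paper analyzes the running stability of parallel jobs on a supercomputer. The log analysis shows that the stability of parallel jobs declines as job execution time and the number of CPUs used increase; when the computational load of a parallel job reaches the order of 10^5 CPU-hours, more than 20% of jobs are terminated by system failures. The assessment experiments show that parallel jobs using thousands of CPUs are easily disrupted by a variety of factors and can rarely run continuously for more than 24 hours. Finally, several suggestions are offered on improving supercomputer stability, on system management and use, and on parallel program development.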

2.
Cooperation among multi-domain massively parallel processor systems in a computing grid environment provides new opportunities for multisite job scheduling. At the same time, co-allocation, heterogeneity, network adaptability, and scalability pose challenges for the international design of multisite job scheduling models and algorithms. This paper presents a multisite job scheduling schema by introducing a multisite job scheduling model and a performance model for the grid environment. It introduces two multisite cooperative job scheduling models and algorithms built around optimal and greedy-heuristic resource selection strategies. Compared with the single-site and multisite cooperative scheduling models and algorithms introduced by Sabin, Yahyapour, and others, the validity and advantages of the scheduling model and the performance model presented here are demonstrated.

3.
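This paper discusses how to make effective use of idle workstations, on a platform of high-performance heterogeneous workstations connected by a high-speed LAN, to solve the compute-intensive task of matrix multiplication. To obtain good parallel performance, it presents a model and an algorithm for task scheduling among heterogeneous workstation clusters. The algorithm accounts for the communication time between cooperating tasks, the data loading time, the result collection time, and the task computation time on each heterogeneous workstation. With this model, the most suitable subset of all available workstations can be found so as to achieve the shortest execution time.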

4.
Distributed Downloads of Bulk, Replicated Grid Data   Total citations: 2, self-citations: 0, citations by others: 2
Data-sharing scientific communities use storage systems as distributed data stores by replicating content. In such highly replicated environments, a particular dataset can reside at multiple locations and can thus be downloaded from any one of them. Since datasets of interest are large, improving download speeds either by server selection or by co-allocation can offer substantial benefits. In this paper, we present an architecture for co-allocating Grid data transfers across multiple connections, enabling the parallel download of datasets from multiple servers. We have developed several co-allocation strategies, comprising simple brute-force, predictive, and dynamic load balancing techniques, as a means both to exploit rate differences among the various client–server links and to address dynamic rate fluctuations. We evaluate our approaches using the GridFTP data movement protocol in a wide-area testbed and present our results.
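As a rough illustration of the co-allocation idea described above, the sketch below splits one download across several replica servers in proportion to a predicted transfer rate for each. This is not the paper's implementation: the function name `split_by_rate`, the server labels, and the rule of handing the rounding remainder to the fastest server are all assumptions made for the example.

```python
def split_by_rate(total_bytes, predicted_rates):
    """Divide a download among servers in proportion to predicted rates.

    predicted_rates maps a (hypothetical) server name to its predicted
    transfer rate in any consistent unit. Returns bytes per server.
    """
    total_rate = sum(predicted_rates.values())
    if total_rate <= 0:
        raise ValueError("at least one server needs a positive rate")

    # Proportional share, rounded down to whole bytes.
    shares = {
        server: int(total_bytes * rate / total_rate)
        for server, rate in predicted_rates.items()
    }

    # Assumption: any bytes lost to rounding go to the fastest server.
    fastest = max(predicted_rates, key=predicted_rates.get)
    shares[fastest] += total_bytes - sum(shares.values())
    return shares
```

A dynamic load balancing variant would instead hand out small fixed-size blocks and let each server request the next block when it finishes, which adapts to rate fluctuations without needing a prediction.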

5.
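An Energy-Efficient Scheduling Optimization Method for Parallel Tasks on Homogeneous Cluster Systems   Total citations: 1, self-citations: 0, citations by others: 1
The design of energy-efficient scheduling algorithms is a research hotspot in high-performance computing. Duplication-based scheduling algorithms can reduce the waiting delay of successor tasks and shorten the overall schedule length, but they consume more energy. To address this, the authors propose a heuristic processor-merging optimization method, PRO. The method searches for idle time slots on processors according to each task's earliest start time and earliest finish time, and reassigns tasks from lightly loaded processors to other processors, thereby reducing the number of processors used and lowering total system energy consumption. Experimental results show that, compared with the existing duplication-based scheduling algorithms TDS, EAD, and PEBD, the optimized algorithm markedly reduces the number of processors used and the total system energy consumption without increasing the schedule length, thus achieving a better balance between performance and energy consumption.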

6.
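Optimization of a Class of Flow Shop Scheduling Problems with Identical Jobs   Total citations: 1, self-citations: 0, citations by others: 1
The flow shop scheduling problem is a typical NP-complete problem with a strong engineering background. When it involves identical jobs, the interdependence of batching and sequencing makes it even harder to solve. This paper combines the probabilistic jumping property of simulated annealing with the parallel search structure of genetic algorithms and proposes a hybrid optimization strategy for a class of flow shop scheduling problems with identical jobs. The algorithm not only shrinks the search space dynamically to improve search efficiency, but also, on top of a best-solution-preserving strategy, uses reheating to strengthen its ability to escape local minima. Its effectiveness and speed are verified by simulation.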

7.
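Data placement on cluster systems is an important problem in parallel databases. Existing work mostly concentrates on placement based on a single relation and cannot effectively support complex multi-join query processing. This paper proposes a method for distributing multiple relations as a whole and gives algorithms for distribution-attribute selection and processor allocation. Experimental results show that the algorithms perform well and help improve the efficiency of parallel multi-join queries.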

8.
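Job scheduling on massively parallel computers directly affects how much of their computing power is realized, so the corresponding research is of great significance. By studying existing job scheduling evaluation systems from abroad, this paper establishes an evaluation system for job scheduling policies that better reflects the characteristics of parallel jobs, and on that basis designs and implements a simulated job scheduling environment. The environment uses an event-driven mode of operation and supports scheduling policies including FCFS, largest job first, smallest job first, longest job first, shortest job first, and GANG. Simulation results show that the GANG policy outperforms all of the tested space-sharing policies; among the space-sharing policies, shortest job first and largest job first perform relatively well.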
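To make the event-driven simulation idea concrete, here is a minimal sketch of a space-sharing simulator that compares FCFS with shortest job first. It is not the environment described in the abstract: the job tuple layout, the function name `simulate`, the absence of backfilling (the head of the queue blocks everything behind it), and the use of mean response time as the only metric are assumptions chosen to keep the example small.

```python
import heapq

def simulate(jobs, total_cpus, policy="fcfs"):
    """Return mean response time for jobs given as (arrival, cpus, runtime).

    policy is "fcfs" (arrival order) or "sjf" (shortest runtime first).
    Assumption: no backfilling; only the queue head may be dispatched.
    """
    if any(cpus > total_cpus for _, cpus, _ in jobs):
        raise ValueError("a job requests more CPUs than the machine has")

    pending = sorted(jobs)      # future arrivals, ordered by arrival time
    queue = []                  # jobs that have arrived and are waiting
    running = []                # min-heap of (finish_time, cpus)
    free = total_cpus
    responses = []
    i = 0

    while i < len(pending) or queue or running:
        # Advance the clock to the next event: an arrival or a completion.
        next_arrival = pending[i][0] if i < len(pending) else float("inf")
        next_finish = running[0][0] if running else float("inf")
        now = min(next_arrival, next_finish)

        if next_finish <= next_arrival:
            _, cpus = heapq.heappop(running)   # a job completes
            free += cpus
        else:
            queue.append(pending[i])           # a job arrives
            i += 1

        if policy == "sjf":
            queue.sort(key=lambda job: job[2])  # stable sort by runtime

        # Start waiting jobs while the queue head fits in the free CPUs.
        while queue and queue[0][1] <= free:
            arrival, cpus, runtime = queue.pop(0)
            free -= cpus
            heapq.heappush(running, (now + runtime, cpus))
            responses.append(now + runtime - arrival)

    return sum(responses) / len(responses)
```

For example, with three full-machine jobs `[(0, 4, 10), (1, 4, 5), (2, 4, 1)]` on 4 CPUs, shortest job first runs the 1-unit job before the 5-unit job once the first job finishes, which lowers the mean response time relative to FCFS.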

9.
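Research on a Parallel Forecasting Algorithm for Multidimensional Stream Sequences   Total citations: 1, self-citations: 0, citations by others: 1
A parallel algorithm, MSSF-VQ (Multiple Sequential Stream Forecast algorithm based on Vector Quantization), is proposed to forecast future trends of multidimensional sequence streams. The algorithm uses a vector space to represent the computational model of sequence streams and applies quantization to discretize continuous sequence streams. Algorithms for constructing and searching a vector probability tree of sequence streams are then proposed, and finally the steps of the algorithm are described. Experiments on real stream sequences show that MSSF-VQ forecasts with high accuracy and speed, uses little space for online processing, and scales well.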

10.
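[Objective] To address practical challenges posed by ultra-large-scale computing systems, such as monitoring data storms, the stability and flexibility of job scheduling, and network complexity and efficiency, this paper shares experience and solutions from recent real-world practice. [Application background] As computing systems gradually move from petascale to exascale, the number of nodes can exceed 10,000. The network topology must be chosen at the very start of system design, and day-to-day use of the system depends on efficient scheduling and timely monitoring. [Methods] This paper adopts a distributed monitoring architecture based on dynamic load balancing, a distributed alerting architecture based on caching, source-code and configuration optimizations of SLURM, and comparative simulation of nd-Torus network topologies, which together largely meet practical operational needs. [Results] The data show that for a computing system of about 10,000 nodes, the real-time alert database table can generally be kept under one million rows. The optimized SLURM scheduling system meets the system's production scheduling needs. On the network side, because of its small network diameter and short average communication distance, the 6D-Torus network shows some improvement over Fat-Tree and 3D-Torus networks in performance and in NIC and cable usage, with a saturation throughput above 40%. [Conclusions] The distributed monitoring and alerting architectures effectively solve the monitoring data storm problem. After optimization, SLURM can provide job scheduling for ultra-large-scale computing systems. In terms of the number of cables and switches used, 6D-Torus is more economical than a traditional Fat-Tree network and outperforms 3D-Torus, making it better suited to ultra-large-scale computing systems.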
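The claim that a 6D-Torus has a smaller diameter and shorter average communication distance than a 3D-Torus can be checked with a few lines of arithmetic. The sketch below computes both quantities for a torus of arbitrary shape. The shapes used in the usage note (16x16x16 and 4x4x4x4x4x4, both 4096 nodes) are assumptions for illustration only; the abstract does not state the dimensions that were simulated.

```python
def ring_distances(k):
    """Shortest hop counts from node 0 to every node on a ring of size k."""
    return [min(d, k - d) for d in range(k)]

def torus_diameter(dims):
    """Largest shortest-path hop count in a torus with the given shape.

    Each dimension is a ring, and the farthest node on a ring of size k
    is k // 2 hops away, so the per-dimension maxima simply add up.
    """
    return sum(k // 2 for k in dims)

def torus_mean_distance(dims):
    """Mean hop count from one node to a uniformly chosen node.

    The source node itself is counted as a destination (distance 0).
    Hop counts add across dimensions, so the per-ring means add too.
    """
    return sum(sum(ring_distances(k)) / k for k in dims)
```

Under the assumed shapes, `torus_diameter((16, 16, 16))` is 24 while `torus_diameter((4,) * 6)` is 12, and the mean distances are 12.0 and 6.0 respectively, so for the same node count the higher-dimensional torus halves both figures.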

11.
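Because wireless sensor networks are very large in scale and tightly coupled with their environment, physical experiments are inevitably limited, so simulation modeling has become an important research tool. Multi-resolution modeling, a key technology for distributed simulation of complex systems, is well suited to the simulation needs of wireless sensor networks. This paper briefly reviews the theory and techniques of multi-resolution modeling, analyzes its significance for wireless sensor network simulation research, proposes a new sensor network simulation framework based on multi-resolution modeling, and finally demonstrates the feasibility of the new framework through simulation experiments.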

12.
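To meet the simulation needs of multiple underwater robot systems, a LAN-based simulation framework for underwater acoustic network communication protocols is designed on the basis of models of the underwater acoustic channel and the underwater acoustic modem. The framework provides a shared virtual underwater acoustic channel for a distributed, interactive network of underwater robots and simulates the operation of communication protocols on multiple underwater robot nodes. Finally, simulation results for the ALOHA protocol under a typical network topology are presented.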

13.
    
In recent years, networks of workstations/PCs (so-called NOWs) have become appealing vehicles for cost-effective parallel computing. Due to the commodity nature of workstations and networking equipment, LAN environments are gradually becoming heterogeneous. The diverse sources of heterogeneity in NOW systems pose a challenge for the design of efficient communication algorithms for this class of systems. In this paper, we propose efficient algorithms for multiple multicast on heterogeneous NOW systems, focusing on heterogeneity in the processing speeds of workstations/PCs. Multiple multicast is an important operation in many scientific and industrial applications. Multicast on heterogeneous systems has not been investigated until recently. Our work distinguishes itself from others in two respects: (1) In contrast to the blocking communication model used in prior work, we model communication in a heterogeneous cluster more accurately with a non-blocking communication model, and design multicast algorithms that take full advantage of non-blocking communication. (2) While prior work focuses on the single multicast problem, we propose efficient algorithms for general multiple multicast (of which single multicast is a special case) on heterogeneous NOW systems. To our knowledge, our work is the earliest effort to address multiple multicast for heterogeneous NOW systems. These algorithms are evaluated using a network simulator for heterogeneous NOW systems. Our experimental results on a system of up to 64 nodes show that some of the algorithms outperform others in many cases. The best algorithm achieves a completion time within 2.5 times the lower bound.

14.
The quality of an approximate solution for combinatorial optimization problems with a single objective can be evaluated relatively easily. However, this becomes more difficult when there are multiple objectives. One potential approach to solving multiple criteria combinatorial optimization problems, when at least one of the single objective problems is NP-complete, is to use an a posteriori method that approximates the efficient frontier. A common difficulty in this type of approach, however, is evaluating the quality of approximate solutions, since sets of multiple solutions must be evaluated and compared. This necessitates a comparison measure that is robust and accurate. Furthermore, a robust measure plays an important role in metaheuristic optimization for tuning various parameters of evolutionary algorithms, simulated annealing, etc., which are frequently employed for multiple criteria combinatorial optimization problems. In this paper, the performance of a new measure, which we call Integrated Convex Preference (ICP), is compared to that of other measures appearing in the literature through numerical experiments. Specifically, we use two a posteriori solution techniques based on genetic algorithms for a bi-criteria parallel machine scheduling problem and evaluate their performance (in terms of solution quality) using different measures. Experimental results show that the ICP measure evaluates the solution quality of approximations robustly (i.e., similar to visual comparison results), while other measures can misjudge the solution quality. We note that the ICP measure can also be applied to other, non-scheduling multiple objective combinatorial optimization problems.

15.
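Multi-Feature, Multi-Classifier Design for Handwritten Character Recognition   Total citations: 4, self-citations: 0, citations by others: 4
Feature selection and classifier design are key to designing a character recognition system. For recognizing a mixed character set of handwritten Chinese characters and Arabic numerals, this paper proposes a design method that selects different character features according to different classification requirements and performs recognition with multiple neural network classifiers. Experimental results show that the method is effective for recognizing mixed handwritten character sets.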

16.
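This paper studies techniques for applying virtual multiple network interface cards (NICs) and provides a virtual multi-NIC technique that is generally usable across different platforms in real-world engineering. A sample program is given to explain the implementation of virtual multi-NIC in detail.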

17.
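To address the scheduling and cooperation of ferries in traditional message ferrying routing, a cross-region multi-ferry routing protocol is designed. The network is divided into several horizontal and vertical regions, and each region has one ferry that polls the nodes. Data is delivered by a single ferry or through the cooperation of one horizontal-region ferry and one vertical-region ferry. The expected delay of the proposed protocol is analyzed theoretically, and the protocol is improved in terms of both delay and fault tolerance. Simulation results show that cross-region ferry routing, while balancing network load and end-to-end delay, provides fault tolerance against the failure of a single ferry and is a reasonable and effective multi-ferry scheduling method.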

18.
A Data Grid integrates geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems. However, due to the special issues and goals of the Grid, traditional approaches are no longer effective in this environment. Therefore, it is necessary to propose methods specialized for this kind of parallel and distributed system. Another solution is to use a data replication strategy to create multiple copies of files and store them in convenient locations to shorten file access times. To utilize these two concepts, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drive at the data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when free space is not enough for the new replica. First, it deletes the files with the minimum transfer time. Second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica, and the file transfer time. The simulation results show that our proposed algorithm performs better than other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.

19.
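As informatization in China continues to deepen, information technology is increasingly applied across fields and industries, bringing great convenience to people's work, life, and study. Information technology is now also widely used in the various operations of hospitals in China. Taking hospital informatization as its main platform, this article focuses on the integration of multiple networks in Chinese hospitals at the present time.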

20.
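LUNF: A Cluster Job Scheduling Strategy Based on Node Failure Characteristics   Total citations: 1, self-citations: 0, citations by others: 1
Good scalability allows the required computing power to be reached by enlarging a cluster system, but as the number of nodes grows, the impact of node failures on cluster performance has become an unavoidable problem in using large-scale clusters. Cluster job scheduling, an important part of cluster operating system software, is responsible for efficient resource management and sensible job scheduling; functionally, a cluster job scheduling system can be divided into a job selection policy and a node allocation policy. Drawing on the characteristics of node failures in cluster systems, a longest uptime node first (LUNF) node allocation policy is proposed. Simulation results show that, compared with random node allocation, LUNF reduces average job response time and average job slowdown by about 10%.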
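The node allocation rule named in the abstract, longest uptime node first, can be sketched in a few lines. This is only an illustration of the selection step: the function name `allocate_lunf`, the dictionary of free nodes, and the choice to return `None` when too few nodes are free are assumptions, and the abstract does not describe how the original scheduler tracks uptime or breaks ties.

```python
def allocate_lunf(free_node_uptimes, nodes_needed):
    """Pick nodes for a job, preferring those that have been up longest.

    free_node_uptimes maps a (hypothetical) node name to the time since
    its last restart, for currently free nodes only. Returns a list of
    chosen node names, or None if not enough nodes are free.
    """
    if nodes_needed > len(free_node_uptimes):
        return None  # assumption: the job simply waits in this case

    # Sort free nodes by uptime, longest first, and take what is needed.
    ranked = sorted(
        free_node_uptimes,
        key=lambda node: free_node_uptimes[node],
        reverse=True,
    )
    return ranked[:nodes_needed]
```

For instance, `allocate_lunf({"n1": 5, "n2": 100, "n3": 42}, 2)` selects `n2` and `n3`, leaving the recently restarted `n1` idle. A random allocation baseline, as used for comparison in the abstract, would replace the sort with a random sample of the free nodes.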
