Similar literature
 Found 20 similar documents; search took 15 ms
1.
The runtime of an evolutionary algorithm can be reduced by increasing the number of parallel evaluations. However, more parallel evaluations can also waste computational effort, since there is a greater probability of creating solutions that do not contribute to convergence towards the global optimum. A trade-off therefore arises between runtime and computational effort at different levels of parallelization. When the computational effort is translated into cost, the trade-off can be restated as runtime versus cost. This trade-off is particularly relevant for cloud computing environments, where the computing resources can be matched exactly to the level of parallelization of the algorithm and the cost is proportional to the runtime and the number of instances used. This paper empirically investigates the trade-off for two evolutionary algorithms, NSGA-II and differential evolution (DE), applied to a multi-objective discrete-event simulation (DES) problem. Both generational and steady-state asynchronous versions of both algorithms are included. The approach is to perform parameter tuning on a simplified version of the DES model. A subset of the best configurations from each tuning experiment is then evaluated on a cloud computing platform. The results indicate that, for the included DES problem, the steady-state asynchronous version of each algorithm provides a better runtime versus cost trade-off than the generational version, and that DE outperforms NSGA-II.
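
The runtime versus cost trade-off described in this abstract can be illustrated with a minimal sketch. It assumes only what the abstract states, namely that cost is proportional to the runtime and to the number of instances. The `Run` type, the price constant, and the sample measurements are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    instances: int        # level of parallelization (hypothetical value)
    runtime_hours: float  # wall-clock runtime (hypothetical value)


def cost(run: Run, price_per_instance_hour: float = 1.0) -> float:
    """Cost proportional to runtime and to the number of instances."""
    return price_per_instance_hour * run.instances * run.runtime_hours


def pareto_front(runs: List[Run]) -> List[Run]:
    """Keep the runs that no other run beats on both runtime and cost."""
    front = []
    for r in runs:
        dominated = any(
            o.runtime_hours <= r.runtime_hours
            and cost(o) <= cost(r)
            and (o.runtime_hours < r.runtime_hours or cost(o) < cost(r))
            for o in runs
        )
        if not dominated:
            front.append(r)
    return front


# Invented measurements: runtime stops improving beyond 8 instances,
# so the 16-instance run only adds cost.
runs = [Run(1, 10.0), Run(2, 5.5), Run(4, 3.5), Run(8, 3.4), Run(16, 3.6)]
front = pareto_front(runs)
```

With these invented numbers, the 16-instance run is both slower and more expensive than the 8-instance run, so it drops off the front; the remaining runs each trade runtime against cost.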

2.
Hammurabi is a software application designed to help UNIX system administrators better evaluate the consequences of the actions they carry out to maintain and evolve the system. In this paper, we describe the implementation of Hammurabi's inference engine, a multiactor Prolog-like engine. The actors can, on the one hand, spy on events occurring anywhere in the machine (inside the engine and inside the UNIX OS) and, on the other hand, forbid or freeze some events. This ability is used as a synchronization mechanism. In the first part of this paper, we describe the data representation. In the second part, we describe the implementation of the spying mechanism.

3.
Virtualization is a key technology for enabling cloud computing. The driver-domain-based model for network virtualization offers isolation and a high level of flexibility. However, it suffers from poor performance and lacks scalability. In this paper, we evaluate the networking performance of virtual machines within Xen. The I/O channel transferring packets between the driver domain and the virtual machines is shown to be the bottleneck. To overcome this limitation, we propose a packet-aggregation-based mechanism to transfer packets from the driver domain to the virtual machines. Packet aggregation, combined with an efficient core allocation, allows virtual machine throughput to scale up by 700% while minimizing both memory and CPU consumption. In addition, the impact of aggregation on packet delay and jitter remains acceptable. Hence, the proposed I/O virtualization model meets the needs of infrastructure providers offering cloud computing services.

4.
5.
The abundance of parallel and distributed computing platforms, such as MPP, SMP, and Beowulf clusters, to name just a few, has added many more possibilities and challenges to high performance computing (HPC), parallel I/O, mass data storage, scalable architectures, and large-scale simulations, which traditionally belonged to the realm of custom-tailored parallel systems. The intent of this special issue is to discuss problems and solutions, to identify new issues, and to help shape future research directions in these areas. From these perspectives, this special issue addresses the problems encountered at the hardware, architectural, and application levels, while providing conceptual as well as empirical treatments of current issues in high performance computing and of the I/O architectures and systems utilized therein.

6.
The I/O performance of applications in multiple-disk systems can be improved by overlapping disk accesses. This requires the use of appropriate prefetching and buffer management algorithms that ensure the most useful blocks are accessed and retained in the buffer. In this paper, we answer several fundamental questions on prefetching and buffer management for distributed-buffer parallel I/O systems. First, we derive and prove the optimality of an algorithm, P-min, that minimizes the number of parallel I/Os. Second, we analyze P-con, an algorithm that always matches its replacement decisions with those of the well-known demand-paged MIN algorithm. We show that P-con can become fully sequential in the worst case. Third, we investigate the behavior of on-line algorithms for multiple-disk prefetching and buffer management. We define and analyze P-lru, a parallel version of the traditional LRU buffer management algorithm. Unexpectedly, we find that the competitive ratio of P-lru is independent of the number of disks. Finally, we present the practical performance of these algorithms on randomly generated reference strings. These results confirm the conclusions derived from the analysis on worst-case inputs.

7.
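Parallel I/O of sampled data limits the efficiency of some parallel applications. This paper designs and implements an aggregated parallel I/O method for sampled data. The method deploys a sampled-data cache on the client side, merges the data into output processes, and then stores it to a file. To guarantee storage consistency of the sampled data during long runs of a parallel program, the method monitors the application's running state within the JASMIN framework and flushes or restores the data whenever the parallel program performs load balancing or restarts. During I/O, HDF5 chunked I/O is additionally used to improve the read and write efficiency of column-stored data. Tests show that the new method not only scales well but also improves the parallel I/O efficiency of sampled data by more than 7.5 times in parallel applications with complex features such as load balancing and restart.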

8.
Ring, torus and hypercube architectures/algorithms for parallel computing   Total citations: 1, self-citations: 0, citations by others: 1
This paper provides a survey of both architectural and algorithmic aspects of solving problems using parallel processors with ring, torus and hypercube interconnection.

9.
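Read-write consistency algorithms are widely deployed in distributed storage systems to guarantee the correctness of data that is read and written. However, such algorithms usually need a complex communication protocol to keep reads and writes correct across multiple nodes, which brings considerable network transmission overhead and read-write latency. Because the implementation mechanisms of the various read-write consistency algorithms differ substantially, a given algorithm often has to be deployed in a specific storage scenario in order to execute read and write operations efficiently and to guarantee the quality of service of the applications running on top of it. In practice, therefore, developers often need to choose a read-write consistency algorithm according to the storage scenario so as to reduce the system overhead of read and write operations. To clarify which scenarios each algorithm suits, this paper introduces the read-write consistency problems that exist in distributed storage systems and surveys the implementation mechanisms of current read-write consistency algorithms. It summarizes the mainstream algorithms under two storage mechanisms, replication and erasure coding, and compares them in terms of implementation mechanism, network overhead, and data storage overhead. On this basis, and with reference to two classic scenarios (distributed storage systems within a single data center and inter-cloud storage systems spanning data centers), it summarizes the points developers should pay attention to when deploying read-write consistency algorithms in real storage systems, analyzes the problems that urgently need solving and possible ways to improve read and write performance, and looks ahead to future directions for read-write consistency algorithms.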

10.
A class of multiplicative algorithms for computing D-optimal designs for regression models on a finite design space is discussed, and a monotonicity result for the sequence of determinants obtained by the iterations is proved. As a consequence, the convergence of the sequence of designs to the D-optimal design is established. The class of algorithms is indexed by a real parameter and contains two previously considered algorithms as special cases. Numerical results are provided to demonstrate the efficiency of the proposed methods. Finally, several extensions to other optimality criteria are discussed.
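
To make the idea of a multiplicative algorithm concrete, the following is a minimal sketch of the classical update w_i ← w_i · d(x_i, w) / m, where d(x, w) = f(x)ᵀ M(w)⁻¹ f(x) and m is the number of model parameters. The abstract does not spell out its parameterized class, so treating this update as a member of that class is an assumption. The straight-line model f(x) = (1, x), the three-point design space, and all function names are illustrative choices, not taken from the paper.

```python
def information_matrix(points, weights):
    """M(w) = sum_i w_i f(x_i) f(x_i)^T for f(x) = (1, x), as (m00, m01, m11)."""
    m00 = sum(weights)
    m01 = sum(w * x for x, w in zip(points, weights))
    m11 = sum(w * x * x for x, w in zip(points, weights))
    return m00, m01, m11


def det(m):
    """Determinant of the symmetric 2x2 matrix (m00, m01, m11)."""
    m00, m01, m11 = m
    return m00 * m11 - m01 * m01


def variance_function(x, m):
    """d(x, w) = f(x)^T M(w)^{-1} f(x), using the closed-form 2x2 inverse."""
    m00, m01, m11 = m
    return (m11 - 2.0 * m01 * x + m00 * x * x) / det(m)


def multiplicative_step(points, weights, n_params=2):
    """One classical multiplicative update: w_i <- w_i * d(x_i, w) / n_params."""
    m = information_matrix(points, weights)
    return [w * variance_function(x, m) / n_params
            for x, w in zip(points, weights)]


# Illustrative design space and uniform starting design (assumptions).
points = [-1.0, 0.0, 1.0]
weights = [1.0 / 3.0] * 3
dets = [det(information_matrix(points, weights))]
for _ in range(50):
    weights = multiplicative_step(points, weights)
    dets.append(det(information_matrix(points, weights)))
```

For this toy model the weights move towards equal mass on the two end points, and the recorded determinants never decrease, which is the kind of monotonic behaviour the abstract refers to.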

11.
Journal of Scheduling - Observations show that some HPC applications periodically alternate between (i) operations (computations, local data accesses) executed on the compute nodes, and (ii) I/O...

12.
In this paper, the standard (four-block) H∞ control problem for systems with a single delay in the feedback loop is studied. A simple procedure is proposed for reducing the problem to an equivalent one-block problem with a particularly simple structure. The one-block problem is then solved by the J-spectral factorization approach, resulting in the so-called dead-time compensator (DTC) form of the controller. The advantages of the proposed procedure are its simplicity, its intuitively clear derivation of the DTC form of the H∞ controller, and its extensibility to the multiple-delay case.

13.
Traditional distributed filesystem technologies designed for local and campus area networks do not adapt well to wide area Grid computing environments. To address this problem, we have designed the Chirp distributed filesystem from the ground up to meet the needs of Grid computing. Chirp is easily deployed without special privileges and provides strong and flexible security mechanisms, tunable consistency semantics, and clustering to increase capacity and throughput. We demonstrate that many of these features also provide order-of-magnitude performance increases over wide area networks. We describe three applications in bioinformatics, biometrics, and gamma ray physics that each employ Chirp to attack large-scale, data-intensive problems.

14.
孙继荣  李志蜀  殷锋  王莉  李奇 《计算机应用》2006,26(9):2232-2234
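Building on the idea of using I/O relationships to reduce and optimize test suites, the method first reduces the I/O relationships themselves and then performs a correlation analysis that partitions them into several mutually independent related groups. Each related group is then processed separately: combinatorial coverage is applied only to the input variables involved in each output, and the correlation among the elements of the group is used to concatenate the results horizontally through their common elements. Finally, the results of all related groups are concatenated horizontally. The results show that the improved method can produce the smallest test suite.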

15.
This article describes Monte Carlo methods and algorithms for the Boltzmann equation in rarefied gas problems with large-scale flow areas. We consider imitation, or continuous-time, Monte Carlo methods in which the interaction frequencies of particle pairs depend on the difference between the particles' coordinates. We examine how to reduce the computational cost of the algorithms by exploiting the specific features of the problem. First, algorithms for an approximate method are constructed, analyzed, and implemented. This method is obtained by splitting (over groups of particles) the operator in the system of master equations. Second, we investigate the fictitious collisions technique, in which an upper bound on the number of interacting pairs is specified. The plane Poiseuille flow problem (in a field of external forces), the heat transfer problem, and the temperature discontinuity propagation problem are solved numerically using the developed algorithms. Asymptotic estimates of the computational costs are confirmed by data from the computations, and the comparative properties of the latter are established. The suggested algorithms of the method with splitting allow parallelization of a certain type.

16.
17.
Three algorithms for computing the diameter of a finite planar set are proposed. Although all three algorithms have O(n^2) worst-case running time, an expected-complexity analysis shows that, under reasonable probabilistic assumptions, all three have linear expected running time. Experimental results indicate that two of these algorithms perform very well for some distributions and are competitive with an existing method. Finally, we exhibit situations where these exact algorithms outperform a published approximate algorithm. Research of the first author was supported by grant NSERC A 2422. Research of the second author was supported by grants NSERC A 9293, FCAC EQ-1678 and a Killam Senior Research Fellowship awarded by the Canada Council.
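
For reference, the diameter of a finite planar set is the largest distance between any two of its points. The sketch below is the plain brute-force baseline that checks every pair and therefore always takes quadratic time. It is not one of the three algorithms from the paper, whose details the abstract does not give. The function name is illustrative.

```python
from itertools import combinations
from math import dist
from typing import Sequence, Tuple

Point = Tuple[float, float]


def diameter(points: Sequence[Point]) -> float:
    """Largest Euclidean distance between any two points (brute force)."""
    if len(points) < 2:
        return 0.0  # convention chosen here for empty or single-point sets
    # Examine all n*(n-1)/2 unordered pairs.
    return max(dist(p, q) for p, q in combinations(points, 2))


example = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
d = diameter(example)
```

A baseline like this is mainly useful for checking faster methods against known-correct answers on small inputs.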

18.
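Three-dimensional electromagnetic particle simulation is an advanced numerical method for studying many microscopic physical phenomena in space. Although the program was parallelized with hybrid MPI and OpenMP programming, the synchronization required by blocking communication and the data transfers of centralized I/O through a network file system reduced its efficiency. This paper introduces non-blocking communication: the part of the computation whose results must be communicated is performed first, the non-blocking communication then proceeds while the remaining computation continues, and all data are received at the end. Computation and communication thus overlap, which reduces the time spent waiting for communication. In the distributed storage system, every node simultaneously reads and writes its own data in a separate local file, which greatly reduces the time spent on parallel data I/O. The improvement becomes more pronounced as the data volume and the number of CPUs increase, thereby improving program performance.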

19.
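In recent years, PCS7 has been widely used in DCS systems for new dry-process cement production lines. PCS7 connects to process signals in two ways: centralized I/O expansion and distributed I/O expansion. Both are currently in use on new dry-process cement production lines; comparatively, distributed I/O expansion has the greater advantages on such lines.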

20.
This paper presents a new distributed disk-array architecture for achieving high I/O performance in scalable cluster computing. In a serverless cluster of computers, all distributed local disks can be integrated as a distributed-software redundant array of independent disks (ds-RAID) with a single I/O space. We report the new RAID-x design and its benchmark performance results. The advantage of RAID-x comes mainly from its orthogonal striping and mirroring (OSM) architecture. The bandwidth is enhanced by distributed striping across local and remote disks, while the reliability comes from orthogonal mirroring on local disks in the background. Our RAID-x design is experimentally compared with RAID-5, RAID-10, and chained-declustering RAID through benchmarking on a research Linux cluster at USC. Andrew and Bonnie benchmark results are reported for all four disk-array architectures. Cooperative disk drivers and Linux extensions are developed to enable not only the single I/O space but also shared virtual memory and a global file hierarchy. We reveal the effects of traffic rate and stripe unit size on I/O performance. Through scalability and overhead analysis, we find that RAID-x is strong in three areas: 1) improved aggregate I/O bandwidth, especially for parallel writes, 2) orthogonal mirroring with low software overhead, and 3) enhanced scalability in cluster I/O processing. The architectural strengths and weaknesses of all four ds-RAID architectures are evaluated comparatively. The optimal choice among them depends on the parallel read/write performance desired, the level of fault tolerance required, and the cost-effectiveness in specific I/O processing applications.
