Similar Literature
1.
A Cloud Computing Model Based on MPI   Cited by: 11 (self: 4, others: 7)
Based on the characteristics of the Message Passing Interface (MPI), this paper proposes methods for applying cloud computing in the MPI domain, including an MPI cloud computing algorithm design model, the cloud computing principle, the core computation pattern, and the processing flow, and describes the distributed and parallel characteristics of cloud computing. Theoretical analysis shows that the algorithm is effective and feasible, outperforms traditional parallel techniques, and can provide a new approach to distributing and parallelizing algorithms.
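To make the described pattern concrete, below is a minimal, hypothetical sketch (not the paper's actual model) of the master-worker MPI computation style such a design builds on: each rank works on its own block of the input and the master combines the partial results with MPI_Reduce.

```c
/* Hypothetical sketch of a master-worker MPI computation pattern.
 * Each rank processes its slice of the index range; rank 0 combines
 * the results. Compile: mpicc pattern.c ; run: mpirun -np 4 ./a.out */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Block decomposition: each rank owns a contiguous index range. */
    long lo = (long)N * rank / size;
    long hi = (long)N * (rank + 1) / size;

    double local = 0.0, global = 0.0;
    for (long i = lo; i < hi; i++)
        local += 1.0 / (1.0 + (double)i);   /* stand-in for real work */

    /* Combine the partial results on the master (rank 0). */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("result = %f\n", global);

    MPI_Finalize();
    return 0;
}
```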

2.
This paper simulates European option pricing with the Monte Carlo method, and designs and implements a parallel algorithm using the portable message passing standard MPI on a distributed-memory cluster system. The algorithm effectively handles the enormous computational load found in financial computing, greatly improves computational efficiency, shortens computation time, and achieves good performance.
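As a hedged illustration of the approach described (the paper's parameters and variance-reduction choices are unknown; all values below are made up), the following sketch prices a European call by Monte Carlo under MPI: each rank simulates an equal share of the paths, and the discounted mean payoff is assembled on rank 0.

```c
/* Hedged sketch of an MPI Monte Carlo pricer for a European call
 * (all parameters are illustrative, not the paper's). Each rank
 * simulates an equal share of the price paths; partial payoff sums
 * are combined with MPI_Reduce and discounted on rank 0. */
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Standard normal sample via the Box-Muller transform. A production
 * code would use a proper parallel random number generator. */
static double randn(void) {
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
}

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    srand(1234u + (unsigned)rank);       /* decorrelate the streams */

    const double S0 = 100.0, K = 105.0, r = 0.05, sigma = 0.2, T = 1.0;
    const long n_total = 10000000, n_local = n_total / size;

    double sum = 0.0;
    for (long i = 0; i < n_local; i++) {
        /* Terminal price under geometric Brownian motion. */
        double ST = S0 * exp((r - 0.5 * sigma * sigma) * T
                             + sigma * sqrt(T) * randn());
        sum += ST > K ? ST - K : 0.0;    /* call payoff max(ST-K, 0) */
    }

    double total = 0.0;
    MPI_Reduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)                       /* discounted mean payoff */
        printf("price = %.4f\n", exp(-r * T) * total / (double)(n_local * size));

    MPI_Finalize();
    return 0;
}
```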

3.
Recently, High Performance Computing (HPC) platforms have been employed to realize many computationally demanding applications in signal and image processing. These applications require real-time performance constraints to be met. These constraints include latency as well as throughput. In order to meet these performance requirements, efficient parallel algorithms are needed. These algorithms must be engineered to exploit the computational characteristics of such applications. In this paper we present a methodology for mapping a class of adaptive signal processing applications onto HPC platforms such that the throughput performance is optimized. We first define a new task model using the salient computational characteristics of a class of adaptive signal processing applications. Based on this task model, we propose a new execution model. In the earlier linear pipelined execution model, the task mapping choices were restricted. The new model permits flexible task mapping choices, leading to improved throughput performance compared with the previous model. Using the new model, a three-step task mapping methodology is developed. It consists of (1) a data remapping step, (2) a coarse resource allocation step, and (3) a fine performance tuning step. The methodology is demonstrated by designing parallel algorithms for modern radar and sonar signal processing applications. These are implemented on IBM SP2 and Cray T3E, state-of-the-art HPC platforms, to show the effectiveness of our approach. Experimental results show significant performance improvement over those obtained by previous approaches. Our code is written using C and the Message Passing Interface (MPI). Thus, it is portable across various HPC platforms.

4.
Advances in computer technologies have enabled corporations to accumulate data at an unprecedented speed. Large-scale business data might contain billions of observations and thousands of features, which easily brings their scale to the level of terabytes. Most traditional feature selection algorithms are designed and implemented for a centralized computing architecture. Their usability significantly deteriorates when data size exceeds tens of gigabytes. High-performance distributed computing frameworks and protocols, such as the Message Passing Interface (MPI) and MapReduce, have been proposed to facilitate software development on grid infrastructures, enabling analysts to process large-scale problems efficiently. This paper presents a novel large-scale feature selection algorithm that is based on variance analysis. The algorithm selects features by evaluating their abilities to explain data variance. It supports both supervised and unsupervised feature selection and can be readily implemented in most distributed computing environments. The algorithm was implemented as a SAS High-Performance Analytics procedure, which can read data in distributed form and perform parallel feature selection in both symmetric multiprocessing mode (SMP) and massively parallel processing mode (MPP). Experimental results demonstrated the superior performance of the proposed method for large-scale feature selection.
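A hedged sketch of the variance-analysis idea in a distributed setting (this is not the SAS High-Performance Analytics implementation; function and variable names are hypothetical): each rank holds a horizontal slice of the data, per-feature first and second moments are combined with MPI_Allreduce, and features can then be ranked by variance.

```c
/* Hypothetical sketch of distributed variance-based feature scoring
 * (not the SAS HPA procedure). Each rank holds a horizontal slice of
 * the data matrix; per-feature moments are merged with MPI_Allreduce
 * and features can then be ranked by the resulting variances. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* x: n_local rows by n_feat columns, row-major. variance: out, n_feat. */
void score_features(const double *x, long n_local, int n_feat,
                    double *variance) {
    double *sum   = calloc(n_feat, sizeof(double));
    double *sumsq = calloc(n_feat, sizeof(double));
    long n_global = 0;

    for (long i = 0; i < n_local; i++)
        for (int j = 0; j < n_feat; j++) {
            double v = x[i * n_feat + j];
            sum[j]   += v;
            sumsq[j] += v * v;
        }

    /* Merge local first and second moments across all ranks. */
    MPI_Allreduce(MPI_IN_PLACE, sum,   n_feat, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(MPI_IN_PLACE, sumsq, n_feat, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(&n_local, &n_global, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);

    for (int j = 0; j < n_feat; j++) {  /* Var = E[v^2] - (E[v])^2 */
        double mean = sum[j] / n_global;
        variance[j] = sumsq[j] / n_global - mean * mean;
    }
    free(sum);
    free(sumsq);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    double x[6] = {1, 2, 0, 4, 5, 0};   /* 3 rows x 2 features per rank */
    double var[2];
    score_features(x, 3, 2, var);
    printf("variances: %.3f %.3f\n", var[0], var[1]);
    MPI_Finalize();
    return 0;
}
```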

5.
As supercomputers scale to 1000 PFlop/s over the next decade, investigating the performance of parallel applications at scale on future architectures and the performance impact of different architecture choices for high-performance computing (HPC) hardware/software co-design is crucial. This paper summarizes recent efforts in designing and implementing a novel HPC hardware/software co-design toolkit. The presented Extreme-scale Simulator (xSim) permits running an HPC application in a controlled environment with millions of concurrent execution threads while observing its performance in a simulated extreme-scale HPC system using architectural models and virtual timing. This paper demonstrates the capabilities and usefulness of the xSim performance investigation toolkit, such as its scalability to 2^27 (about 134 million) simulated Message Passing Interface (MPI) ranks on 960 real processor cores, the capability to evaluate the performance of different MPI collective communication algorithms, and the ability to evaluate the performance of a basic Monte Carlo application with different architectural parameters.

6.
郭东亮  张立臣 《微机发展》2004,14(10):31-33,36
The Real-Time Message Passing Interface (MPI/RT) extends the Message Passing Interface (MPI) of high performance computing into the real-time domain, supporting real-time communication and the development of real-time systems. This paper introduces the background of MPI/RT, its basic design principles, and its key techniques and concepts. It describes how MPI/RT supports three types of communication (two-sided, one-sided, and zero-sided) and three real-time paradigms (time-driven, event-driven, and priority-driven), as well as combinations of these paradigms. It discusses how MPI/RT specifies and satisfies the QoS requirements of high-performance real-time systems. Finally, it outlines a general process for developing high-performance real-time systems with MPI/RT.

7.
Yang Jian, Xiang Zhen, Mou Lisha, Liu Shumu 《Multimedia Tools and Applications》2020,79(47-48):35353-35367

The virtualized resource allocation (mapping) algorithm is the core issue of network virtualization technology. Universal and excellent resource allocation algorithms not only provide efficient and reliable network resource sharing for systems and users, but also simplify resource scheduling and management, improve the utilization of underlying resources, balance network load, and optimize network performance. Targeting wireless sensor network applications, this paper proposes a wireless sensor network architecture based on cloud computing. WSN hardware resources are mapped into cloud computing resources through virtualization technology, and a resource allocation strategy for this architecture is proposed. Experiments evaluate the performance of the resource allocation strategy. The proposed heuristic algorithm is distributed: whereas centralized algorithms have high complexity, distributed algorithms can work on the problem in parallel, reducing the time required to reach a good solution with limited communication traffic.


8.
Cloud Computing has evolved to become an enabler for delivering access to large scale distributed applications running on managed network-connected computing systems. This makes possible hosting Distributed Enterprise Information Systems (dEISs) in cloud environments, while enforcing strict performance and quality of service requirements, defined using Service Level Agreements (SLAs). SLAs define the performance boundaries of distributed applications, and are enforced by a cloud management system (CMS) dynamically allocating the available computing resources to the cloud services. We present two novel VM-scaling algorithms focused on dEIS systems, which optimally detect the most appropriate scaling conditions using performance models of distributed applications derived from constant-workload benchmarks, together with SLA-specified performance constraints. We simulate the VM-scaling algorithms in a cloud simulator and compare against trace-based performance models of dEISs. We compare a total of three SLA-based VM-scaling algorithms (one using prediction mechanisms) based on a real-world application scenario involving a large variable number of users. Our results show that it is beneficial to use autoregressive predictive SLA-driven scaling algorithms in cloud management systems for guaranteeing performance invariants of distributed cloud applications, as opposed to using only reactive SLA-based VM-scaling algorithms.
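As a toy illustration of the reactive-versus-predictive contrast drawn above (nothing here reflects the authors' actual models; the thresholds and names are invented), the sketch below compares a reactive scaler that acts on the last observed load with a predictive one that extrapolates via a least-squares AR(1) fit before testing the SLA threshold.

```c
/* Hypothetical sketch contrasting reactive vs. AR(1)-predictive VM
 * scaling. load[] is a sliding window of observed utilization in
 * [0,1]; the SLA threshold and VM step size are illustrative. */
#include <stdio.h>

#define SLA_THRESHOLD 0.8   /* utilization bound from the SLA */

/* Reactive policy: scale when the *current* load violates the SLA. */
int reactive_scale(double current_load, int vms) {
    return current_load > SLA_THRESHOLD ? vms + 1 : vms;
}

/* Predictive policy: fit AR(1) by least squares on the window, then
 * scale if the *forecast* load would violate the SLA. */
int predictive_scale(const double *load, int n, int vms) {
    double sxy = 0.0, sxx = 0.0;
    for (int t = 1; t < n; t++) {        /* regress load[t] on load[t-1] */
        sxy += load[t - 1] * load[t];
        sxx += load[t - 1] * load[t - 1];
    }
    double phi = sxx > 0.0 ? sxy / sxx : 0.0;
    double forecast = phi * load[n - 1]; /* one-step-ahead prediction */
    return forecast > SLA_THRESHOLD ? vms + 1 : vms;
}

int main(void) {
    double window[] = {0.50, 0.58, 0.66, 0.74, 0.79};
    printf("reactive: %d VMs, predictive: %d VMs\n",
           reactive_scale(window[4], 4),
           predictive_scale(window, 5, 4));
    return 0;
}
```

On this sample window the reactive policy holds at 4 VMs (0.79 is still under the threshold) while the predictive policy scales to 5, which is the intuition behind preferring predictive scaling for guaranteeing performance invariants.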

9.
Using an asymmetric design that combines master-slave control with message passing communication, a multi-core distributed operating system based on a network-on-chip (NoC) is designed. In this system, the master node aggregates global resource information through a resource pool and dispatches tasks via runtime task scheduling. Slave nodes report resource information in an asynchronous statistics mode and use virtual memory techniques to create, load, and execute the child processes of parallel applications. Test results show that the system can effectively support the scheduling, loading, and execution of parallel programs based on the Message Passing Interface.

10.
In this paper, a new hybrid parallelisable low order algorithm, developed by the authors for multibody dynamics analysis, is implemented numerically on a distributed memory parallel computing system. The presented implementation can currently accommodate the general spatial motion of chain systems, but key issues for its extension to general tree and closed loop systems are discussed. Explicit algebraic constraints are used to increase coarse grain parallelism, and to study the influence of the dimension of system constraint load equations on the computational efficiency of the algorithm for real parallel implementation using the Message Passing Interface (MPI). The equation formulation parallelism and linear system solution strategies which are used to reduce communication overhead are addressed. Numerical results indicate that the algorithm is scalable, that significant speed-up can be obtained, and that a quasi-logarithmic relation exists between the time needed for a function call and the number of processors used. This result agrees well with theoretical performance predictions. Numerical comparisons with results obtained from independently developed analysis codes have validated the correctness of the new hybrid parallelisable low order algorithm, and demonstrated certain computational advantages.

11.
Implementation of a Parallel LDL^T Factorization Algorithm for Symmetric Positive Definite Matrices   Cited by: 1 (self: 0, others: 1)
Based on networked clusters, a new parallel environment, and the Message Passing Interface (MPI), two parallel square-root-free Cholesky factorization algorithms are presented. The algorithms adopt a row-wrapped (row-cyclic) storage scheme and an eager-send strategy, which reduce load imbalance, increase the overlap of computation and communication, and cut communication time. Both theoretical analysis and numerical experiments show that the algorithms achieve high parallel speedup and efficiency.
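For reference, a hedged serial sketch of the square-root-free Cholesky (LDL^T) factorization that the abstract parallelizes; the row-wrapped distribution and eager-send strategy are omitted, and the in-place storage convention (L strictly below the diagonal, D on it) is an assumption of this sketch.

```c
/* Hedged serial sketch of square-root-free Cholesky: A = L D L^T with
 * unit lower-triangular L and diagonal D, computed in place (L stored
 * strictly below the diagonal, D on it). The paper's parallel
 * row-cyclic distribution is not shown. */
#include <stdio.h>

#define N 4

void ldlt(double a[N][N]) {
    for (int j = 0; j < N; j++) {
        /* d_j = a_jj - sum_{k<j} l_jk^2 * d_k */
        double d = a[j][j];
        for (int k = 0; k < j; k++)
            d -= a[j][k] * a[j][k] * a[k][k];
        a[j][j] = d;

        /* l_ij = (a_ij - sum_{k<j} l_ik * d_k * l_jk) / d_j, i > j */
        for (int i = j + 1; i < N; i++) {
            double s = a[i][j];
            for (int k = 0; k < j; k++)
                s -= a[i][k] * a[k][k] * a[j][k];
            a[i][j] = s / d;
        }
    }
}

int main(void) {
    double a[N][N] = {{4,2,2,2},{2,5,3,3},{2,3,6,4},{2,3,4,7}};
    ldlt(a);
    for (int i = 0; i < N; i++)
        printf("d[%d] = %g\n", i, a[i][i]);   /* all pivots positive: SPD */
    return 0;
}
```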

12.
An MPI-based parallelization of a storm surge numerical model is implemented. Based on the numerical characteristics of the model, a new method for solving tridiagonal systems in parallel is proposed; compared with traditional algorithms, it is simpler to program and achieves higher parallel efficiency. Load balancing is the first problem to solve when optimizing parallel program performance: by using the number of wet (water) grid points as the criterion for task decomposition, good load balance is achieved, with a clear performance improvement over decompositions that do not distinguish water points from land points. With 8 CPUs, the speedup reaches 7.0 on an SMP platform and 6.5 on a cluster.
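A hypothetical sketch of the wet-point load-balancing idea (the names and the row-wise decomposition are assumptions, not the paper's code): grid rows are split among ranks so that each rank receives roughly the same number of wet points rather than the same number of rows.

```c
/* Hypothetical sketch of wet-point-balanced domain decomposition:
 * split the grid rows among ranks so each gets roughly the same
 * number of *wet* points, not the same number of rows.
 * wet[i] = number of wet points in row i (land rows contribute 0). */
#include <stdio.h>

/* Fill start[0..size] with block boundaries: rank r owns rows
 * [start[r], start[r+1]), balanced by cumulative wet-point count. */
void partition_by_wet_points(const long *wet, int nrows,
                             int size, int *start) {
    long total = 0;
    for (int i = 0; i < nrows; i++) total += wet[i];

    int row = 0;
    long acc = 0;
    start[0] = 0;
    for (int r = 1; r < size; r++) {
        long target = total * r / size;   /* ideal cumulative load */
        while (row < nrows && acc < target)
            acc += wet[row++];
        start[r] = row;
    }
    start[size] = nrows;
}

int main(void) {
    long wet[] = {0, 3, 8, 9, 9, 7, 2, 0};  /* land rows have 0 wet points */
    int start[4 + 1];
    partition_by_wet_points(wet, 8, 4, start);
    for (int r = 0; r < 4; r++)
        printf("rank %d: rows [%d, %d)\n", r, start[r], start[r + 1]);
    return 0;
}
```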

13.
Spherical harmonic transforms (SHT) are at the heart of many scientific and practical applications ranging from climate modelling to cosmological observations. In many of these areas, new cutting-edge science goals have recently been proposed, requiring simulations and analyses of experimental or observational data at very high resolutions and of unprecedented volumes. Both these aspects pose a formidable challenge for the currently existing implementations of the transforms. This paper describes parallel algorithms for computing SHT with two variants of intra-node parallelism appropriate for novel supercomputer architectures: multi-core processors and Graphics Processing Units (GPU). It also discusses their performance, alone and embedded within a top-level, Message Passing Interface-based parallelisation layer ported from the S2HAT library, in terms of accuracy, overall efficiency, and scalability. We show that our inverse SHT run on GeForce 400 Series GPUs equipped with the latest Compute Unified Device Architecture generation (Fermi) outperforms the state-of-the-art implementation for a multi-core processor executed on a current Intel Core i7-2600K. Furthermore, we show that a Message Passing Interface/Compute Unified Device Architecture version of the inverse transform run on a cluster of 128 Nvidia Tesla S1070 units is as much as 3 times faster than the hybrid Message Passing Interface/OpenMP version executed on the same number of quad-core Intel Nehalem processors, for problem sizes motivated by our target applications. The performance of the direct transforms is, however, found to be at best comparable in these cases. We discuss in detail the algorithmic solutions devised for the major steps involved in calculating the transforms, emphasising those with a major impact on overall performance, and elucidate the sources of the dichotomy between the direct and the inverse operations.

14.
Massive Multiple-Input Multiple-Output (M-MIMO) is considered one of the standard techniques for improving the performance of Fifth Generation (5G) radio. 5G signal detection with low propagation delay and high throughput at minimum computational complexity is a serious concern in the deployment of 5G. The evolution of 5G promises a high quality of service (QoS), a high data rate, low latency, and spectral efficiency, enabling several applications that will improve services in every sector. Existing detection techniques cannot be utilised in 5G and beyond 5G due to the high complexity of their implementation. In this article, Approximate Message Passing (AMP) is implemented and compared with the existing Minimum Mean Square Error (MMSE) and Message Passing Detector (MPD) algorithms. The outcomes of the work show that Bit Error Rate (BER) performance is improved with minimal complexity.
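To ground the comparison, here is a toy real-valued MMSE detector for a 2x2 system (actual 5G detectors operate on large complex-valued systems; this only illustrates the MMSE principle the abstract benchmarks against): x_hat = (H^T H + sigma^2 I)^{-1} H^T y, with the 2x2 inverse written out in closed form.

```c
/* Toy real-valued MMSE detector for a 2x2 MIMO system y = Hx + n:
 * x_hat = (H^T H + sigma^2 I)^{-1} H^T y, using the closed-form 2x2
 * inverse. Illustrative only; not a practical large-scale detector. */
#include <stdio.h>

int main(void) {
    double H[2][2] = {{0.9, 0.3}, {0.2, 1.1}};   /* channel matrix */
    double x[2] = {1.0, -1.0};                   /* transmitted symbols */
    double sigma2 = 0.01;                        /* noise variance */
    double y[2] = {                              /* received (noise-free here) */
        H[0][0] * x[0] + H[0][1] * x[1],
        H[1][0] * x[0] + H[1][1] * x[1],
    };

    /* A = H^T H + sigma^2 I */
    double a = H[0][0]*H[0][0] + H[1][0]*H[1][0] + sigma2;
    double b = H[0][0]*H[0][1] + H[1][0]*H[1][1];
    double c = b;
    double d = H[0][1]*H[0][1] + H[1][1]*H[1][1] + sigma2;

    /* z = H^T y */
    double z0 = H[0][0]*y[0] + H[1][0]*y[1];
    double z1 = H[0][1]*y[0] + H[1][1]*y[1];

    /* x_hat = A^{-1} z via the closed-form 2x2 inverse */
    double det = a*d - b*c;
    double xh0 = ( d*z0 - b*z1) / det;
    double xh1 = (-c*z0 + a*z1) / det;
    printf("x_hat = (%.3f, %.3f)\n", xh0, xh1);  /* close to (1, -1) */
    return 0;
}
```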

15.
Cloud computing allows the execution and deployment of different types of applications, such as interactive databases or web-based services, which require distinctive types of resources. These applications lease cloud resources for a considerably long period and usually occupy various resources to maintain a high quality of service (QoS). On the other hand, general big data batch processing workloads are less QoS-sensitive and require massively parallel cloud resources for short periods. Despite the elasticity of cloud computing, the fine-scale characteristics of cloud-based applications may cause temporarily low resource utilization in cloud computing systems, while process-intensive, highly utilized workloads suffer from performance issues. Therefore, utilization-efficient scheduling of heterogeneous workloads is a challenging issue for cloud owners. In this paper, addressing the impact of workload heterogeneity on low utilization of cloud computing systems, a joint resource allocation scheme for cloud applications and processing jobs is presented to enhance cloud utilization. The main idea is to schedule processing jobs and cloud applications jointly in a preemptive way. However, utilization-efficient resource allocation requires exact modeling of workloads, so a novel methodology to model processing jobs and other cloud applications is first proposed. Such jobs are modeled as a collection of parallel and sequential tasks in a Markovian process, which enables analysis and calculation of the resources required to serve the tasks efficiently. The next step uses the proposed model to develop a preemptive scheduling algorithm for the processing jobs in order to improve resource utilization and its associated costs in the cloud computing system. Accordingly, a preemption-based resource allocation architecture is proposed to utilize the idle reserved resources for processing jobs effectively and efficiently in cloud paradigms. Performance metrics such as service time for the processing jobs are then investigated. The accuracy of the proposed analytical model and scheduling analysis is verified through simulations and experimental results, which also shed light on the achievable QoS level for preemptively allocated processing jobs.

16.
In the past, people focused on cluster computing and grid computing; now this focus has shifted to cloud computing. Irrespective of what techniques are used, there are always storage requirements. The challenge in this area is the huge amount of data to be stored and its complexity. People now use many cloud applications; as a result, service providers must serve increasingly more people, causing more and more connections involving substantially more data. These problems could be solved in the past, but in the age of cloud computing they have become more complex. This paper focuses on cloud computing infrastructure, and especially data services. The goal of this paper is to implement a high-performance, load-balancing, and replicable system that provides data storage for private cloud users through a virtualization system. This system extends and enhances the functionality of the Hadoop distributed system. The proposed approach also implements a resource monitor of machine status factors, such as CPU, memory, and network usage, to help optimize the virtualization system and the data storage system. To prove and extend the usability of this design, a synchronization app running on Android was also developed based on our distributed data storage.

17.

As an alternative to traditional computing architectures, cloud computing is now growing rapidly, although it is generally based on models such as cluster computing. Supercomputers are becoming more and more powerful, helping scientists gain a deeper understanding of the world. At the same time, clusters of commodity servers have become mainstream in the IT industry, powering not only large Internet services but also a growing number of data-intensive scientific applications, such as MPI-based deep learning applications. To reduce energy costs, more and more effort is being devoted to improving the energy consumption of HPC systems. Because I/O accesses account for a large portion of the execution time of data-intensive applications, it is critical to design energy-aware parallel I/O functions to address the challenges of HPC energy efficiency. As the de facto standard for designing parallel applications in cluster environments, the Message Passing Interface has been widely used in high performance computing; therefore, obtaining the energy consumption of MPI applications is critical for improving the energy efficiency of HPC systems. In this work we first present our energy measurement tool, a software framework that eases energy data collection in cluster environments, and then present an approach that optimises the energy efficiency of parallel I/O operations. The energy scheduling algorithm is evaluated on a cluster.
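A hypothetical sketch of how such a measurement tool can hook MPI I/O using the standard PMPI profiling interface (the authors' framework is not described in detail here; read_energy_joules() is a placeholder, not a real API): the wrapper times the real PMPI_File_write call and samples an energy counter at the same two points.

```c
/* Hypothetical sketch of energy/time measurement via the standard
 * PMPI profiling interface: intercept MPI_File_write, time the real
 * call, and sample an energy counter around it. read_energy_joules()
 * is a placeholder, not a real API. Link this ahead of the MPI
 * library so the wrapper shadows the library symbol. */
#include <mpi.h>
#include <stdio.h>

static double read_energy_joules(void) {
    /* Placeholder: e.g., parse /sys/class/powercap (RAPL) on Linux. */
    return 0.0;
}

int MPI_File_write(MPI_File fh, const void *buf, int count,
                   MPI_Datatype datatype, MPI_Status *status) {
    double t0 = MPI_Wtime();
    double e0 = read_energy_joules();

    int rc = PMPI_File_write(fh, buf, count, datatype, status); /* real call */

    double joules = read_energy_joules() - e0;
    double secs = MPI_Wtime() - t0;
    fprintf(stderr, "MPI_File_write: %.6f s, ~%.3f J\n", secs, joules);
    return rc;
}
```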

18.
This paper proposes a framework for a WWW-based virtual parallel programming environment, and studies the key techniques for realizing it, such as the message passing interface, a Java-based graphical user interface, and the visualization subsystem. The environment has important applications in areas such as the research and development of large complex engineering projects and distance education.

19.
Overlapping computation with communication is a key technique to conceal the effect of communication latency on the performance of parallel applications. Message Passing Interface (MPI) is a widely used message passing standard for high performance computing. One of the most important factors in achieving a good level of overlap is the MPI ability to make progress on outstanding communication operations. In this paper, we propose a novel speculative MPI Rendezvous protocol that uses RDMA Read and RDMA Write to effectively improve communication progress and consequently the overlap ability. Performance results based on a modified MPICH2 implementation over 10-Gigabit iWARP Ethernet reveal a significant (80–100%) improvement in receiver side overlap and progress ability. We have also observed up to 30% improvement in application wait time for some NPB applications as well as the RADIX application. For applications that do not benefit from this protocol, an adaptation mechanism is used to stop the speculation to effectively reduce the protocol overhead.
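A minimal sketch of the overlap pattern this protocol improves (not the authors' Rendezvous implementation): post a non-blocking send/receive for a message large enough to take the Rendezvous path, interleave useful computation, and call MPI_Test periodically so the MPI library can progress the transfer before the final completion check.

```c
/* Minimal sketch of computation/communication overlap with MPI
 * non-blocking calls; the periodic MPI_Test gives the library a
 * chance to progress an outstanding Rendezvous transfer. */
#include <mpi.h>
#include <stdio.h>

#define N (1 << 20)   /* large enough to trigger the Rendezvous path */

int main(int argc, char **argv) {
    static double buf[N];
    int rank, flag = 0;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (int i = 0; i < N; i++) buf[i] = (double)i;
        MPI_Isend(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
    } else if (rank == 1) {
        MPI_Irecv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
    } else {               /* extra ranks just participate in shutdown */
        MPI_Finalize();
        return 0;
    }

    double work = 0.0;
    while (!flag) {                      /* overlap: compute, then poll */
        for (int i = 0; i < 10000; i++)
            work += 1e-9 * i;            /* stand-in for useful computation */
        MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
    }
    if (rank == 1)
        printf("received; overlapped work = %f\n", work);

    MPI_Finalize();
    return 0;
}
```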
