Similar Literature
1.
A Cloud Computing Model Based on MPI   Cited by: 11 (self: 4, others: 7)
Based on the characteristics of the Message Passing Interface (MPI), this paper proposes methods for applying cloud computing in the MPI domain, including an MPI cloud computing algorithm design model, the cloud computing principle, the core computation pattern, and the processing flow, and describes the distributed and parallel characteristics of cloud computing. Theoretical analysis shows that the algorithm is effective and feasible, outperforms traditional parallel techniques, and can provide a new approach to distributing and parallelizing algorithms.
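To make the described pattern concrete, below is a minimal, hypothetical sketch (not the paper's actual model) of the master-worker MPI computation style such a design builds on: each rank works on its own block of the input and the master combines the partial results with MPI_Reduce.

```c
/* Hypothetical sketch of a master-worker MPI computation pattern.
 * Each rank processes its slice of the index range; rank 0 combines
 * the results. Compile: mpicc pattern.c ; run: mpirun -np 4 ./a.out */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Block decomposition: each rank owns a contiguous index range. */
    long lo = (long)N * rank / size;
    long hi = (long)N * (rank + 1) / size;

    double local = 0.0, global = 0.0;
    for (long i = lo; i < hi; i++)
        local += 1.0 / (1.0 + (double)i);   /* stand-in for real work */

    /* Combine the partial results on the master (rank 0). */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("result = %f\n", global);

    MPI_Finalize();
    return 0;
}
```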

2.
This paper simulates European option pricing with the Monte Carlo method, and designs and implements a parallel algorithm using the portable message passing standard MPI on a distributed-memory cluster system. The algorithm effectively handles the enormous computational load found in financial computing, greatly improves computational efficiency, shortens computation time, and achieves good performance.
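As a hedged illustration of the approach described (the paper's parameters and variance-reduction choices are unknown; all values below are made up), the following sketch prices a European call by Monte Carlo under MPI: each rank simulates an equal share of the paths, and the discounted mean payoff is assembled on rank 0.

```c
/* Hedged sketch of an MPI Monte Carlo pricer for a European call
 * (all parameters are illustrative, not the paper's). Each rank
 * simulates an equal share of the price paths; partial payoff sums
 * are combined with MPI_Reduce and discounted on rank 0. */
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Standard normal sample via the Box-Muller transform. A production
 * code would use a proper parallel random number generator. */
static double randn(void) {
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
}

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    srand(1234u + (unsigned)rank);       /* decorrelate the streams */

    const double S0 = 100.0, K = 105.0, r = 0.05, sigma = 0.2, T = 1.0;
    const long n_total = 10000000, n_local = n_total / size;

    double sum = 0.0;
    for (long i = 0; i < n_local; i++) {
        /* Terminal price under geometric Brownian motion. */
        double ST = S0 * exp((r - 0.5 * sigma * sigma) * T
                             + sigma * sqrt(T) * randn());
        sum += ST > K ? ST - K : 0.0;    /* call payoff max(ST-K, 0) */
    }

    double total = 0.0;
    MPI_Reduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)                       /* discounted mean payoff */
        printf("price = %.4f\n", exp(-r * T) * total / (double)(n_local * size));

    MPI_Finalize();
    return 0;
}
```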

3.
Recently, High Performance Computing (HPC) platforms have been employed to realize many computationally demanding applications in signal and image processing. These applications require real-time performance constraints to be met. These constraints include latency as well as throughput. In order to meet these performance requirements, efficient parallel algorithms are needed. These algorithms must be engineered to exploit the computational characteristics of such applications. In this paper we present a methodology for mapping a class of adaptive signal processing applications onto HPC platforms such that the throughput performance is optimized. We first define a new task model using the salient computational characteristics of a class of adaptive signal processing applications. Based on this task model, we propose a new execution model. In the earlier linear pipelined execution model, the task mapping choices were restricted. The new model permits flexible task mapping choices, leading to improved throughput performance compared with the previous model. Using the new model, a three-step task mapping methodology is developed. It consists of (1) a data remapping step, (2) a coarse resource allocation step, and (3) a fine performance tuning step. The methodology is demonstrated by designing parallel algorithms for modern radar and sonar signal processing applications. These are implemented on IBM SP2 and Cray T3E, state-of-the-art HPC platforms, to show the effectiveness of our approach. Experimental results show significant performance improvement over those obtained by previous approaches. Our code is written using C and the Message Passing Interface (MPI). Thus, it is portable across various HPC platforms.

4.
Advances in computer technologies have enabled corporations to accumulate data at an unprecedented speed. Large-scale business data might contain billions of observations and thousands of features, which easily brings their scale to the level of terabytes. Most traditional feature selection algorithms are designed and implemented for a centralized computing architecture. Their usability significantly deteriorates when data size exceeds tens of gigabytes. High-performance distributed computing frameworks and protocols, such as the Message Passing Interface (MPI) and MapReduce, have been proposed to facilitate software development on grid infrastructures, enabling analysts to process large-scale problems efficiently. This paper presents a novel large-scale feature selection algorithm that is based on variance analysis. The algorithm selects features by evaluating their abilities to explain data variance. It supports both supervised and unsupervised feature selection and can be readily implemented in most distributed computing environments. The algorithm was implemented as a SAS High-Performance Analytics procedure, which can read data in distributed form and perform parallel feature selection in both symmetric multiprocessing mode (SMP) and massively parallel processing mode (MPP). Experimental results demonstrated the superior performance of the proposed method for large-scale feature selection.
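A hedged sketch of the variance-analysis idea in a distributed setting (this is not the SAS High-Performance Analytics implementation; function and variable names are hypothetical): each rank holds a horizontal slice of the data, per-feature first and second moments are combined with MPI_Allreduce, and features can then be ranked by variance.

```c
/* Hypothetical sketch of distributed variance-based feature scoring
 * (not the SAS HPA procedure). Each rank holds a horizontal slice of
 * the data matrix; per-feature moments are merged with MPI_Allreduce
 * and features can then be ranked by the resulting variances. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* x: n_local rows by n_feat columns, row-major. variance: out, n_feat. */
void score_features(const double *x, long n_local, int n_feat,
                    double *variance) {
    double *sum   = calloc(n_feat, sizeof(double));
    double *sumsq = calloc(n_feat, sizeof(double));
    long n_global = 0;

    for (long i = 0; i < n_local; i++)
        for (int j = 0; j < n_feat; j++) {
            double v = x[i * n_feat + j];
            sum[j]   += v;
            sumsq[j] += v * v;
        }

    /* Merge local first and second moments across all ranks. */
    MPI_Allreduce(MPI_IN_PLACE, sum,   n_feat, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(MPI_IN_PLACE, sumsq, n_feat, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(&n_local, &n_global, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);

    for (int j = 0; j < n_feat; j++) {  /* Var = E[v^2] - (E[v])^2 */
        double mean = sum[j] / n_global;
        variance[j] = sumsq[j] / n_global - mean * mean;
    }
    free(sum);
    free(sumsq);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    double x[6] = {1, 2, 0, 4, 5, 0};   /* 3 rows x 2 features per rank */
    double var[2];
    score_features(x, 3, 2, var);
    printf("variances: %.3f %.3f\n", var[0], var[1]);
    MPI_Finalize();
    return 0;
}
```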

5.
As supercomputers scale to 1000 PFlop/s over the next decade, investigating the performance of parallel applications at scale on future architectures and the performance impact of different architecture choices for high-performance computing (HPC) hardware/software co-design is crucial. This paper summarizes recent efforts in designing and implementing a novel HPC hardware/software co-design toolkit. The presented Extreme-scale Simulator (xSim) permits running an HPC application in a controlled environment with millions of concurrent execution threads while observing its performance in a simulated extreme-scale HPC system using architectural models and virtual timing. This paper demonstrates the capabilities and usefulness of the xSim performance investigation toolkit, such as its scalability to 2^27 (about 134 million) simulated Message Passing Interface (MPI) ranks on 960 real processor cores, the capability to evaluate the performance of different MPI collective communication algorithms, and the ability to evaluate the performance of a basic Monte Carlo application with different architectural parameters.

6.
郭东亮  张立臣 《微机发展》2004,14(10):31-33,36
The Real-Time Message Passing Interface (MPI/RT) extends the Message Passing Interface (MPI) of high performance computing into the real-time domain, supporting real-time communication and the development of real-time systems. This paper introduces the background of MPI/RT, its basic design principles, and its key techniques and concepts. It describes how MPI/RT supports three types of communication (two-sided, one-sided, and zero-sided) and three real-time paradigms (time-driven, event-driven, and priority-driven), as well as combinations of these paradigms. It discusses how MPI/RT specifies and satisfies the QoS requirements of high-performance real-time systems. Finally, it outlines a general process for developing high-performance real-time systems with MPI/RT.

7.
Yang Jian, Xiang Zhen, Mou Lisha, Liu Shumu 《Multimedia Tools and Applications》2020,79(47-48):35353-35367

The virtualized resource allocation (mapping) algorithm is the core issue of network virtualization technology. Universal and excellent resource allocation algorithms not only provide efficient and reliable network resource sharing for systems and users, but also simplify resource scheduling and management, improve the utilization of underlying resources, balance network load, and optimize network performance. Targeting wireless sensor network applications, this paper proposes a wireless sensor network architecture based on cloud computing. WSN hardware resources are mapped into cloud computing resources through virtualization technology, and a resource allocation strategy for this architecture is proposed. Experiments evaluate the performance of the resource allocation strategy. The proposed heuristic algorithm is distributed: whereas centralized algorithms have high complexity, distributed algorithms can work on the problem in parallel, reducing the time required to reach a good solution with limited communication traffic.


8.
Cloud Computing has evolved to become an enabler for delivering access to large scale distributed applications running on managed network-connected computing systems. This makes possible hosting Distributed Enterprise Information Systems (dEISs) in cloud environments, while enforcing strict performance and quality of service requirements, defined using Service Level Agreements (SLAs). SLAs define the performance boundaries of distributed applications, and are enforced by a cloud management system (CMS) dynamically allocating the available computing resources to the cloud services. We present two novel VM-scaling algorithms focused on dEIS systems, which optimally detect the most appropriate scaling conditions using performance models of distributed applications derived from constant-workload benchmarks, together with SLA-specified performance constraints. We simulate the VM-scaling algorithms in a cloud simulator and compare against trace-based performance models of dEISs. We compare a total of three SLA-based VM-scaling algorithms (one using prediction mechanisms) based on a real-world application scenario involving a large variable number of users. Our results show that it is beneficial to use autoregressive predictive SLA-driven scaling algorithms in cloud management systems for guaranteeing performance invariants of distributed cloud applications, as opposed to using only reactive SLA-based VM-scaling algorithms.
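As a toy illustration of the reactive-versus-predictive contrast drawn above (nothing here reflects the authors' actual models; the thresholds and names are invented), the sketch below compares a reactive scaler that acts on the last observed load with a predictive one that extrapolates via a least-squares AR(1) fit before testing the SLA threshold.

```c
/* Hypothetical sketch contrasting reactive vs. AR(1)-predictive VM
 * scaling. load[] is a sliding window of observed utilization in
 * [0,1]; the SLA threshold and VM step size are illustrative. */
#include <stdio.h>

#define SLA_THRESHOLD 0.8   /* utilization bound from the SLA */

/* Reactive policy: scale when the *current* load violates the SLA. */
int reactive_scale(double current_load, int vms) {
    return current_load > SLA_THRESHOLD ? vms + 1 : vms;
}

/* Predictive policy: fit AR(1) by least squares on the window, then
 * scale if the *forecast* load would violate the SLA. */
int predictive_scale(const double *load, int n, int vms) {
    double sxy = 0.0, sxx = 0.0;
    for (int t = 1; t < n; t++) {        /* regress load[t] on load[t-1] */
        sxy += load[t - 1] * load[t];
        sxx += load[t - 1] * load[t - 1];
    }
    double phi = sxx > 0.0 ? sxy / sxx : 0.0;
    double forecast = phi * load[n - 1]; /* one-step-ahead prediction */
    return forecast > SLA_THRESHOLD ? vms + 1 : vms;
}

int main(void) {
    double window[] = {0.50, 0.58, 0.66, 0.74, 0.79};
    printf("reactive: %d VMs, predictive: %d VMs\n",
           reactive_scale(window[4], 4),
           predictive_scale(window, 5, 4));
    return 0;
}
```

On this sample window the reactive policy holds at 4 VMs (0.79 is still under the threshold) while the predictive policy scales to 5, which is the intuition behind preferring predictive scaling for guaranteeing performance invariants.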

9.
Using an asymmetric design that combines master-slave control with message passing communication, a multi-core distributed operating system based on a network-on-chip (NoC) is designed. In this system, the master node aggregates global resource information through a resource pool and dispatches tasks via runtime task scheduling. Slave nodes report resource information in an asynchronous statistics mode and use virtual memory techniques to create, load, and execute the child processes of parallel applications. Test results show that the system can effectively support the scheduling, loading, and execution of parallel programs based on the Message Passing Interface.

10.
In this paper, a new hybrid parallelisable low order algorithm, developed by the authors for multibody dynamics analysis, is implemented numerically on a distributed memory parallel computing system. The presented implementation can currently accommodate the general spatial motion of chain systems, but key issues for its extension to general tree and closed loop systems are discussed. Explicit algebraic constraints are used to increase coarse grain parallelism, and to study the influence of the dimension of system constraint load equations on the computational efficiency of the algorithm for real parallel implementation using the Message Passing Interface (MPI). The equation formulation parallelism and linear system solution strategies which are used to reduce communication overhead are addressed. Numerical results indicate that the algorithm is scalable, that significant speed-up can be obtained, and that a quasi-logarithmic relation exists between the time needed for a function call and the number of processors used. This result agrees well with theoretical performance predictions. Numerical comparisons with results obtained from independently developed analysis codes have validated the correctness of the new hybrid parallelisable low order algorithm, and demonstrated certain computational advantages.

11.
Implementation of a Parallel LDL^T Factorization Algorithm for Symmetric Positive Definite Matrices   Cited by: 1 (self: 0, others: 1)
Based on networked clusters, a new parallel environment, and the Message Passing Interface (MPI), two parallel square-root-free Cholesky factorization algorithms are presented. The algorithms adopt a row-wrapped (row-cyclic) storage scheme and an eager-send strategy, which reduce load imbalance, increase the overlap of computation and communication, and cut communication time. Both theoretical analysis and numerical experiments show that the algorithms achieve high parallel speedup and efficiency.
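For reference, a hedged serial sketch of the square-root-free Cholesky (LDL^T) factorization that the abstract parallelizes; the row-wrapped distribution and eager-send strategy are omitted, and the in-place storage convention (L strictly below the diagonal, D on it) is an assumption of this sketch.

```c
/* Hedged serial sketch of square-root-free Cholesky: A = L D L^T with
 * unit lower-triangular L and diagonal D, computed in place (L stored
 * strictly below the diagonal, D on it). The paper's parallel
 * row-cyclic distribution is not shown. */
#include <stdio.h>

#define N 4

void ldlt(double a[N][N]) {
    for (int j = 0; j < N; j++) {
        /* d_j = a_jj - sum_{k<j} l_jk^2 * d_k */
        double d = a[j][j];
        for (int k = 0; k < j; k++)
            d -= a[j][k] * a[j][k] * a[k][k];
        a[j][j] = d;

        /* l_ij = (a_ij - sum_{k<j} l_ik * d_k * l_jk) / d_j, i > j */
        for (int i = j + 1; i < N; i++) {
            double s = a[i][j];
            for (int k = 0; k < j; k++)
                s -= a[i][k] * a[k][k] * a[j][k];
            a[i][j] = s / d;
        }
    }
}

int main(void) {
    double a[N][N] = {{4,2,2,2},{2,5,3,3},{2,3,6,4},{2,3,4,7}};
    ldlt(a);
    for (int i = 0; i < N; i++)
        printf("d[%d] = %g\n", i, a[i][i]);   /* all pivots positive: SPD */
    return 0;
}
```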

12.
An MPI-based parallelization of a storm surge numerical model is implemented. Based on the numerical characteristics of the model, a new method for solving tridiagonal systems in parallel is proposed; compared with traditional algorithms, it is simpler to program and achieves higher parallel efficiency. Load balancing is the first problem to solve when optimizing parallel program performance: by using the number of wet (water) grid points as the criterion for task decomposition, good load balance is achieved, with a clear performance improvement over decompositions that do not distinguish water points from land points. With 8 CPUs, the speedup reaches 7.0 on an SMP platform and 6.5 on a cluster.
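A hypothetical sketch of the wet-point load-balancing idea (the names and the row-wise decomposition are assumptions, not the paper's code): grid rows are split among ranks so that each rank receives roughly the same number of wet points rather than the same number of rows.

```c
/* Hypothetical sketch of wet-point-balanced domain decomposition:
 * split the grid rows among ranks so each gets roughly the same
 * number of *wet* points, not the same number of rows.
 * wet[i] = number of wet points in row i (land rows contribute 0). */
#include <stdio.h>

/* Fill start[0..size] with block boundaries: rank r owns rows
 * [start[r], start[r+1]), balanced by cumulative wet-point count. */
void partition_by_wet_points(const long *wet, int nrows,
                             int size, int *start) {
    long total = 0;
    for (int i = 0; i < nrows; i++) total += wet[i];

    int row = 0;
    long acc = 0;
    start[0] = 0;
    for (int r = 1; r < size; r++) {
        long target = total * r / size;   /* ideal cumulative load */
        while (row < nrows && acc < target)
            acc += wet[row++];
        start[r] = row;
    }
    start[size] = nrows;
}

int main(void) {
    long wet[] = {0, 3, 8, 9, 9, 7, 2, 0};  /* land rows have 0 wet points */
    int start[4 + 1];
    partition_by_wet_points(wet, 8, 4, start);
    for (int r = 0; r < 4; r++)
        printf("rank %d: rows [%d, %d)\n", r, start[r], start[r + 1]);
    return 0;
}
```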

13.
Spherical harmonic transforms (SHT) are at the heart of many scientific and practical applications ranging from climate modelling to cosmological observations. In many of these areas, new cutting-edge science goals have recently been proposed, requiring simulations and analyses of experimental or observational data at very high resolutions and of unprecedented volumes. Both these aspects pose a formidable challenge for the currently existing implementations of the transforms. This paper describes parallel algorithms for computing SHT with two variants of intra-node parallelism appropriate for novel supercomputer architectures: multi-core processors and Graphics Processing Units (GPU). It also discusses their performance, alone and embedded within a top-level, Message Passing Interface-based parallelisation layer ported from the S2HAT library, in terms of accuracy, overall efficiency, and scalability. We show that our inverse SHT run on GeForce 400 Series GPUs equipped with the latest Compute Unified Device Architecture generation (Fermi) outperforms the state-of-the-art implementation for a multi-core processor executed on a current Intel Core i7-2600K. Furthermore, we show that a Message Passing Interface/Compute Unified Device Architecture version of the inverse transform run on a cluster of 128 Nvidia Tesla S1070 units is as much as 3 times faster than the hybrid Message Passing Interface/OpenMP version executed on the same number of quad-core Intel Nehalem processors, for problem sizes motivated by our target applications. The performance of the direct transforms is, however, found to be at best comparable in these cases. We discuss in detail the algorithmic solutions devised for the major steps involved in calculating the transforms, emphasising those with a major impact on overall performance, and elucidate the sources of the dichotomy between the direct and the inverse operations.

14.
Massive Multiple-Input Multiple-Output (M-MIMO) is considered one of the standard techniques for improving the performance of Fifth Generation (5G) radio. 5G signal detection with low propagation delay and high throughput at minimum computational complexity is a serious concern in the deployment of 5G. The evolution of 5G promises a high quality of service (QoS), a high data rate, low latency, and spectral efficiency, enabling several applications that will improve services in every sector. Existing detection techniques cannot be utilised in 5G and beyond 5G due to the high complexity of their implementation. In this article, Approximate Message Passing (AMP) is implemented and compared with the existing Minimum Mean Square Error (MMSE) and Message Passing Detector (MPD) algorithms. The outcomes of the work show that Bit Error Rate (BER) performance is improved with minimal complexity.
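To ground the comparison, here is a toy real-valued MMSE detector for a 2x2 system (actual 5G detectors operate on large complex-valued systems; this only illustrates the MMSE principle the abstract benchmarks against): x_hat = (H^T H + sigma^2 I)^{-1} H^T y, with the 2x2 inverse written out in closed form.

```c
/* Toy real-valued MMSE detector for a 2x2 MIMO system y = Hx + n:
 * x_hat = (H^T H + sigma^2 I)^{-1} H^T y, using the closed-form 2x2
 * inverse. Illustrative only; not a practical large-scale detector. */
#include <stdio.h>

int main(void) {
    double H[2][2] = {{0.9, 0.3}, {0.2, 1.1}};   /* channel matrix */
    double x[2] = {1.0, -1.0};                   /* transmitted symbols */
    double sigma2 = 0.01;                        /* noise variance */
    double y[2] = {                              /* received (noise-free here) */
        H[0][0] * x[0] + H[0][1] * x[1],
        H[1][0] * x[0] + H[1][1] * x[1],
    };

    /* A = H^T H + sigma^2 I */
    double a = H[0][0]*H[0][0] + H[1][0]*H[1][0] + sigma2;
    double b = H[0][0]*H[0][1] + H[1][0]*H[1][1];
    double c = b;
    double d = H[0][1]*H[0][1] + H[1][1]*H[1][1] + sigma2;

    /* z = H^T y */
    double z0 = H[0][0]*y[0] + H[1][0]*y[1];
    double z1 = H[0][1]*y[0] + H[1][1]*y[1];

    /* x_hat = A^{-1} z via the closed-form 2x2 inverse */
    double det = a*d - b*c;
    double xh0 = ( d*z0 - b*z1) / det;
    double xh1 = (-c*z0 + a*z1) / det;
    printf("x_hat = (%.3f, %.3f)\n", xh0, xh1);  /* close to (1, -1) */
    return 0;
}
```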

15.
Cloud computing allows the execution and deployment of different types of applications, such as interactive databases or web-based services, which require distinctive types of resources. These applications lease cloud resources for a considerably long period and usually occupy various resources to maintain a high quality of service (QoS). On the other hand, general big data batch processing workloads are less QoS-sensitive and require massively parallel cloud resources for short periods. Despite the elasticity of cloud computing, the fine-scale characteristics of cloud-based applications may cause temporarily low resource utilization in cloud computing systems, while process-intensive, highly utilized workloads suffer from performance issues. Therefore, utilization-efficient scheduling of heterogeneous workloads is a challenging issue for cloud owners. In this paper, addressing the impact of workload heterogeneity on low utilization of cloud computing systems, a joint resource allocation scheme for cloud applications and processing jobs is presented to enhance cloud utilization. The main idea is to schedule processing jobs and cloud applications jointly in a preemptive way. However, utilization-efficient resource allocation requires exact modeling of workloads, so a novel methodology to model processing jobs and other cloud applications is first proposed. Such jobs are modeled as a collection of parallel and sequential tasks in a Markovian process, which enables analysis and calculation of the resources required to serve the tasks efficiently. The next step uses the proposed model to develop a preemptive scheduling algorithm for the processing jobs in order to improve resource utilization and its associated costs in the cloud computing system. Accordingly, a preemption-based resource allocation architecture is proposed to utilize the idle reserved resources for processing jobs effectively and efficiently in cloud paradigms. Performance metrics such as service time for the processing jobs are then investigated. The accuracy of the proposed analytical model and scheduling analysis is verified through simulations and experimental results, which also shed light on the achievable QoS level for preemptively allocated processing jobs.

16.
In the past, people focused on cluster computing and grid computing; now this focus has shifted to cloud computing. Irrespective of what techniques are used, there are always storage requirements. The challenge in this area is the huge amount of data to be stored and its complexity. People now use many cloud applications; as a result, service providers must serve increasingly more people, causing more and more connections involving substantially more data. These problems could be solved in the past, but in the age of cloud computing they have become more complex. This paper focuses on cloud computing infrastructure, and especially data services. The goal of this paper is to implement a high-performance, load-balancing, and replicable system that provides data storage for private cloud users through a virtualization system. This system extends and enhances the functionality of the Hadoop distributed system. The proposed approach also implements a resource monitor of machine status factors, such as CPU, memory, and network usage, to help optimize the virtualization system and the data storage system. To prove and extend the usability of this design, a synchronization app running on Android was also developed based on our distributed data storage.

17.

As an alternative to traditional computing architectures, cloud computing is now growing rapidly, although it is generally based on models such as cluster computing. Supercomputers are becoming more and more powerful, helping scientists gain a deeper understanding of the world. At the same time, clusters of commodity servers have become mainstream in the IT industry, powering not only large Internet services but also a growing number of data-intensive scientific applications, such as MPI-based deep learning applications. To reduce energy costs, more and more effort is being devoted to improving the energy consumption of HPC systems. Because I/O accesses account for a large portion of the execution time of data-intensive applications, it is critical to design energy-aware parallel I/O functions to address the challenges of HPC energy efficiency. As the de facto standard for designing parallel applications in cluster environments, the Message Passing Interface has been widely used in high performance computing; therefore, obtaining the energy consumption of MPI applications is critical for improving the energy efficiency of HPC systems. In this work we first present our energy measurement tool, a software framework that eases energy data collection in cluster environments, and then present an approach that optimises the energy efficiency of parallel I/O operations. The energy scheduling algorithm is evaluated on a cluster.
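A hypothetical sketch of how such a measurement tool can hook MPI I/O using the standard PMPI profiling interface (the authors' framework is not described in detail here; read_energy_joules() is a placeholder, not a real API): the wrapper times the real PMPI_File_write call and samples an energy counter at the same two points.

```c
/* Hypothetical sketch of energy/time measurement via the standard
 * PMPI profiling interface: intercept MPI_File_write, time the real
 * call, and sample an energy counter around it. read_energy_joules()
 * is a placeholder, not a real API. Link this ahead of the MPI
 * library so the wrapper shadows the library symbol. */
#include <mpi.h>
#include <stdio.h>

static double read_energy_joules(void) {
    /* Placeholder: e.g., parse /sys/class/powercap (RAPL) on Linux. */
    return 0.0;
}

int MPI_File_write(MPI_File fh, const void *buf, int count,
                   MPI_Datatype datatype, MPI_Status *status) {
    double t0 = MPI_Wtime();
    double e0 = read_energy_joules();

    int rc = PMPI_File_write(fh, buf, count, datatype, status); /* real call */

    double joules = read_energy_joules() - e0;
    double secs = MPI_Wtime() - t0;
    fprintf(stderr, "MPI_File_write: %.6f s, ~%.3f J\n", secs, joules);
    return rc;
}
```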

18.
This paper proposes a framework for a WWW-based virtual parallel programming environment, and studies the key techniques for realizing it, such as the message passing interface, a Java-based graphical user interface, and the visualization subsystem. The environment has important applications in areas such as the research and development of large complex engineering projects and distance education.

19.
Overlapping computation with communication is a key technique to conceal the effect of communication latency on the performance of parallel applications. Message Passing Interface (MPI) is a widely used message passing standard for high performance computing. One of the most important factors in achieving a good level of overlap is the MPI ability to make progress on outstanding communication operations. In this paper, we propose a novel speculative MPI Rendezvous protocol that uses RDMA Read and RDMA Write to effectively improve communication progress and consequently the overlap ability. Performance results based on a modified MPICH2 implementation over 10-Gigabit iWARP Ethernet reveal a significant (80–100%) improvement in receiver side overlap and progress ability. We have also observed up to 30% improvement in application wait time for some NPB applications as well as the RADIX application. For applications that do not benefit from this protocol, an adaptation mechanism is used to stop the speculation to effectively reduce the protocol overhead.
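A minimal sketch of the overlap pattern this protocol improves (not the authors' Rendezvous implementation): post a non-blocking send/receive for a message large enough to take the Rendezvous path, interleave useful computation, and call MPI_Test periodically so the MPI library can progress the transfer before the final completion check.

```c
/* Minimal sketch of computation/communication overlap with MPI
 * non-blocking calls; the periodic MPI_Test gives the library a
 * chance to progress an outstanding Rendezvous transfer. */
#include <mpi.h>
#include <stdio.h>

#define N (1 << 20)   /* large enough to trigger the Rendezvous path */

int main(int argc, char **argv) {
    static double buf[N];
    int rank, flag = 0;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (int i = 0; i < N; i++) buf[i] = (double)i;
        MPI_Isend(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
    } else if (rank == 1) {
        MPI_Irecv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
    } else {               /* extra ranks just participate in shutdown */
        MPI_Finalize();
        return 0;
    }

    double work = 0.0;
    while (!flag) {                      /* overlap: compute, then poll */
        for (int i = 0; i < 10000; i++)
            work += 1e-9 * i;            /* stand-in for useful computation */
        MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
    }
    if (rank == 1)
        printf("received; overlapped work = %f\n", work);

    MPI_Finalize();
    return 0;
}
```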
