20 similar documents found.
1.
RPC is a core component of Internet back-end distributed systems: it lowers the cost of developing and operating Internet applications and improves availability and scalability, yet currently popular RPC frameworks cannot fully satisfy the needs of Internet applications. This paper analyzes the requirements of RPC systems in Internet application environments and, based on those requirements, proposes RPCI, an Internet-oriented RPC system. RPCI adopts a three-tier architecture that separates the long-connection servers into an independent tier, supporting stateless application-server designs and flexible request-routing policies, which makes system scaling, upgrades, and operations easier. RPCI was implemented on top of thrift with performance optimizations; experimental results show that RPCI performs well, improving performance by more than 50% over the widely used open-source thrift.
2.
The RPC (Remote Procedure Call) protocol exists in multiple versions, falling into two categories: kernel-space RPC and user-space RPC. Developers choose an appropriate RPC version according to their design requirements, and in many cases communication must cross the kernel-space/user-space boundary. User-space RPC is less mature than kernel-space RPC, lacking mechanisms such as multithreading and RDMA (Remote Direct Memory Access), and needs optimization to improve performance. To address these needs, this paper analyzes the execution mechanism of user-space TI-RPC (Transport Independent Remote Procedure Call) and proposes a layered multithreading optimization: the low-level TI-RPC interfaces are used to restructure RPC port creation and service startup, and a thread-pool mechanism is added so that TI-RPC provides multithreaded concurrent service at the RPC layer. Comparative performance tests show that the internal multithreading optimization raises network utilization to 93% of full network bandwidth.
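A minimal Python sketch of the thread-pool dispatch idea described in the abstract above. The actual optimization is implemented in C against the TI-RPC interfaces; the function names, port number, and echo-style handler here are illustrative assumptions rather than TI-RPC API.

# Illustrative thread-pool dispatch, analogous to the RPC-layer optimization
# described above (names and framing are hypothetical, not TI-RPC API).
import socket
from concurrent.futures import ThreadPoolExecutor

def handle_request(conn: socket.socket) -> None:
    """Serve one RPC-style request: read the payload, echo a reply, close."""
    with conn:
        data = conn.recv(4096)          # request payload
        conn.sendall(b"reply:" + data)  # stand-in for a marshalled RPC result

def serve(host: str = "127.0.0.1", port: int = 9090, workers: int = 16) -> None:
    # A fixed pool of worker threads replaces serial, one-at-a-time dispatch,
    # so slow handlers no longer block the accept loop.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv, \
         ThreadPoolExecutor(max_workers=workers) as pool:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(128)
        while True:
            conn, _addr = srv.accept()
            pool.submit(handle_request, conn)

if __name__ == "__main__":
    serve()

Submitting each accepted connection to a fixed worker pool keeps the accept loop free, which is the effect the layered multithreading optimization aims for at the RPC layer.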
3.
PRESTO is a programming system for writing object-oriented parallel programs in a multiprocessor environment. PRESTO provides the programmer with a set of pre-defined object types that simplify the construction of parallel programs. Examples of PRESTO objects are threads, which provide fine-grained control over a program's execution, and synchronization objects, which allow simultaneously executing threads to co-ordinate their activities. The goals of PRESTO are to provide a programming environment that makes it easy to express concurrent algorithms, to do so efficiently, and to do so in a manner that invites extensions and modifications. The first two goals, which are the focus of this paper, allow a programmer to use parallelism in a way that is naturally suited to the problem at hand, rather than being constrained by the limitations of a particular underlying kernel or hardware architecture. The third goal is touched upon but not emphasized in this paper. PRESTO is written in C++; it currently runs on the Sequent shared-memory multiprocessor on top of the Dynix operating system. In this paper we describe the system model, its applicability to parallel programming, experiences with the initial implementation, and some early performance measurements.
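The abstract names two kinds of PRESTO objects, threads and synchronization objects. The following Python analogue only illustrates that programming model; PRESTO itself is a C++ library, so none of these names correspond to its actual interface.

# Python analogue of the object types the PRESTO abstract describes
# (threads plus a synchronization object); an illustration of the model only.
import threading

class Counter:
    """A shared object whose updates are coordinated by a lock object."""
    def __init__(self) -> None:
        self._lock = threading.Lock()   # synchronization object
        self.value = 0

    def increment(self, times: int) -> None:
        for _ in range(times):
            with self._lock:            # concurrently executing threads coordinate here
                self.value += 1

if __name__ == "__main__":
    counter = Counter()
    # Thread objects give explicit control over concurrent execution.
    workers = [threading.Thread(target=counter.increment, args=(10_000,))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)   # 40000: the lock prevents lost updates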
4.
There are substantial benefits to be gained from building computing systems from a number of processors working in parallel. One of the frequently-stated advantages of parallel and distributed systems is that they may be scaled to the needs of the user. This paper discusses some of the problems associated with designing a general-purpose operating system for a scalable parallel computing engine and then describes the solutions adopted in our experimental parallel operating system. We explain why a parallel computing engine composed of a collection of processors communicating through point-to-point links provides a suitable vehicle in which to realize the advantages of scaling. We then introduce a parallel-processing abstraction which can be used as the basis of an operating system for such a computing engine. We consider how this abstraction can be implemented and retain the ability to scale. As a concrete example of the ideas presented here we describe our own experimental scalable parallel operating-system project, concentrating on the Wisdom nucleus and the Sage file system. Finally, after introducing related work, we describe some of the lessons learnt from our own project.
5.
Francisco Heron de Carvalho Junior, Cenez Araújo de Rezende. Journal of Parallel and Distributed Computing, 2013
Component-oriented programming has been applied to address the requirements of large-scale applications from computational sciences and engineering that present high performance computing (HPC) requirements. However, parallelism continues to be a challenging requirement in the design of CBHPC (Component-Based High Performance Computing) platforms. This paper presents strong evidence about the efficacy and the efficiency of HPE (Hash Programming Environment), a CBHPC platform that provides full support for parallel programming, on the development, deployment and execution of numerical simulation code onto cluster computing platforms.
6.
A finite element code with a polycrystal plasticity model for simulating deformation processing of metals has been developed for parallel computers using High Performance Fortran (HPF). The conversion of the code from an original implementation on the Connection Machine systems using CM Fortran is described. The sections of the code requiring minimal inter-processor communication are easily parallelized, by changing only the syntax for specifying data layout. However, the solver routine based on the conjugate gradient method required additional modifications, which are discussed in detail. The performance of the code on a massively parallel distributed-memory Intel PARAGON supercomputer is evaluated through timing statistics.
7.
A parallel micro evolutionary algorithm for heterogeneous computing and grid scheduling
This work presents a novel parallel micro evolutionary algorithm for scheduling tasks in distributed heterogeneous computing and grid environments. The scheduling problem in heterogeneous environments is NP-hard, so a significant effort has been made in order to develop an efficient method to provide good schedules in reduced execution times. The parallel micro evolutionary algorithm is implemented using MALLBA, a general-purpose library for combinatorial optimization. Efficient numerical results are reported in the experimental analysis performed on both well-known problem instances and large instances that model medium-sized grid environments. The comparative study of traditional methods and evolutionary algorithms shows that the parallel micro evolutionary algorithm achieves a high problem solving efficacy, outperforming previous results already reported in the related literature, and also showing a good scalability behavior when facing high dimension problem instances.
8.
The use of a network of shared, heterogeneous workstations each harboring a reconfigurable computing (RC) system offers high performance users an inexpensive platform for a wide range of computationally demanding problems. However, effectively using the full potential of these systems can be challenging without the knowledge of the system's performance characteristics. While some performance models exist for shared, heterogeneous workstations, none thus far account for the addition of RC systems. Our analytic performance model includes the effects of the reconfigurable device, application load imbalance, background user load, basic message passing communication, and processor heterogeneity. The methodology proves to be accurate in characterizing these effects for applications running on shared, homogeneous, and heterogeneous HPRC resources. The model error in all cases was found to be less than 5% for application runtimes greater than 30 s, and less than 15% for runtimes less than 30 s.
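As a rough illustration of what such an analytic model combines, the sketch below estimates a parallel runtime from per-node work, effective node speed, background load, and a communication term. The functional form and every parameter are assumptions for illustration only; the paper's actual model and coefficients are not reproduced here.

# A generic illustration of the kind of analytic runtime model the abstract
# describes; every term and name below is an assumption, not the paper's model.

def predicted_runtime(work_per_node: list[float], node_speeds: list[float],
                      background_load: list[float], comm_time: float) -> float:
    """Estimated parallel runtime: slowest node's compute time plus communication.

    work_per_node[i]   -- operations assigned to node i (captures load imbalance)
    node_speeds[i]     -- effective ops/s of node i (heterogeneity, RC speedup)
    background_load[i] -- average number of competing background processes on node i
    comm_time          -- estimated message-passing time for the run
    """
    compute_times = [
        w / s * (1.0 + b)               # background users slow the node down
        for w, s, b in zip(work_per_node, node_speeds, background_load)
    ]
    return max(compute_times) + comm_time   # run ends when the slowest node finishes

if __name__ == "__main__":
    t = predicted_runtime([1e9, 1e9, 2e9], [1e8, 2e8, 4e8], [0.0, 0.5, 0.0], 2.0)
    print(t)   # 12.0 s: node 0 dominates with 10 s of compute, plus 2 s of communication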
9.
Combining grid and high-performance computing, this paper presents GPCS, a parallel computing system built on a grid-based high-performance computing platform, and mainly describes its architecture, functionality, design, and implementation. Using general-purpose networks as its foundation and grid middleware as a bridge, the platform achieves interconnection, sharing, and coordinated operation among diverse high-performance computing resources.
10.
Jack B. Dennis. International Journal of Parallel Programming, 1994, 22(1): 47-77
It is widely believed that superscalar and superpipelined extensions of RISC style architecture will dominate future processor design, and that needs of parallel computing will have little effect on processor architecture. This belief ignores the issues of memory latency and synchronization, and fails to recognize the opportunity to support a general semantic model for parallel computing. Efforts to extend the shared-memory model using standard microprocessors have led to systems that implement no satisfactory model of computing, and present the programmer with a difficult interface on which to build parallel computing applications. A more satisfactory model for parallel computing may be obtained on the basis of functional programming concepts and the principles of modular software construction. We recommend that designs for computers be built on such a general semantic model of parallel computation. Multithreading concepts and dataflow principles can frame the architecture of these new machines.
11.
A comparative workload-based methodology for performance evaluation of parallel computers
A practical methodology for evaluating and comparing the performance of distributed memory Multiple Instruction Multiple Data (MIMD) systems is presented. The methodology determines machine parameters and program parameters separately, and predicts the performance of a given workload on the machines under consideration. Machine parameters are measured using benchmarks that consist of parallel algorithm structures. The methodology takes a workload-based approach in which a mix of application programs constitutes the workload. Performance of different systems is compared, under the given workload, using the ratio of their speeds. In order to validate the methodology, an example workload has been constructed and the time estimates have been compared with the actual runs, yielding good predicted values. Variations in the workload are analysed in terms of increase in problem sizes and changes in the frequency of particular algorithm groups. Utilization and scalability are used to compare the systems when the number of processors is increased. It has been shown that performance of parallel computers is sensitive to the changes in the workload and therefore any evaluation and comparison must consider a given user workload. Performance improvement that can be obtained by increasing the size of a distributed memory MIMD system depends on the characteristics of the workload as well as the parameters that characterize the communication speed of the parallel system.
12.
This paper describes InfiniteDB, a computer-cluster-based parallel database management system (DBMS) developed by the authors. InfiniteDB aims to efficiently support data-intensive computing in response to the rapid growth in database sizes and the need for high-performance analysis of massive databases. It can be executed efficiently in computing systems composed of thousands of computers, such as cloud computing systems. It supports intra-query, inter-query, intra-operation, inter-operation, and pipelined parallelism. It provides effective strategies for managing massive databases, including multiple data declustering methods, declustering-aware algorithms for relational and other database operations, and an adaptive query optimization method. It also provides parallel data warehousing and data mining functions, a coordinator-wrapper mechanism to support the integration of heterogeneous information resources on the Internet, and fault-tolerant and resilient infrastructures. It has been used in many applications and has proved quite effective for data-intensive computing.
13.
Journal of Computer and System Sciences, 2016, 82(2): 174-190
We address scheduling independent and precedence constrained parallel tasks on multiple homogeneous processors in a data center with dynamically variable voltage and speed as combinatorial optimization problems. We consider the problem of minimizing schedule length with energy consumption constraint and the problem of minimizing energy consumption with schedule length constraint on multiple processors. Our approach is to use level-by-level scheduling algorithms to deal with precedence constraints. We use a simple system partitioning and processor allocation scheme, which always schedules as many parallel tasks as possible for simultaneous execution. We use two heuristic algorithms for scheduling independent parallel tasks in the same level, i.e., SIMPLE and GREEDY. We adopt a two-level energy/time/power allocation scheme, namely, optimal energy/time allocation among levels of tasks and equal power supply to tasks in the same level. Our approach results in significant performance improvement compared with previous algorithms in scheduling independent and precedence constrained parallel tasks.
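A small Python sketch of the level-by-level decomposition mentioned above: tasks are grouped by precedence depth so that each level contains only independent tasks, which can then be scheduled together. The SIMPLE and GREEDY heuristics and the energy/time/power allocation are omitted, and all names are illustrative rather than taken from the paper.

# Sketch of level-by-level decomposition of a precedence DAG; the scheduling
# heuristics and power allocation described in the abstract are not reproduced.
from collections import defaultdict

def levelize(tasks: dict[str, list[str]]) -> list[list[str]]:
    """tasks maps task -> list of predecessor tasks; returns tasks grouped by level."""
    depth: dict[str, int] = {}

    def level_of(t: str) -> int:
        if t not in depth:
            preds = tasks[t]
            depth[t] = 0 if not preds else 1 + max(level_of(p) for p in preds)
        return depth[t]

    groups: dict[int, list[str]] = defaultdict(list)
    for t in tasks:
        groups[level_of(t)].append(t)
    # Tasks within one level have no precedence edges between them,
    # so each level can be handed to an independent-task scheduler.
    return [groups[k] for k in sorted(groups)]

if __name__ == "__main__":
    dag = {"a": [], "b": [], "c": ["a"], "d": ["a", "b"], "e": ["c", "d"]}
    print(levelize(dag))   # [['a', 'b'], ['c', 'd'], ['e']]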
14.
The effectiveness of loop self-scheduling schemes has been shown on traditional multiprocessors in the past and on computing clusters in recent years. However, parallel loop scheduling has not been widely applied to computing grids, which are characterized by heterogeneous resources and dynamic environments. In this paper, a performance-based approach that takes these two characteristics into consideration is proposed to schedule parallel loop iterations in grid environments. Furthermore, we use a parameter, SWR, to estimate the proportion of the workload that can be scheduled statically, thus alleviating the effect of irregular workloads. Experimental results on a grid testbed show that the proposed approach can reduce the completion time for applications with regular or irregular workloads. Consequently, we claim that parallel loop scheduling can benefit applications in grid environments.
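A minimal sketch of how an SWR-style split might partition loop iterations: a fraction of the workload is assigned statically in proportion to assumed node speeds, and the remainder is left for dynamic self-scheduling in fixed-size chunks. The chunking rule, the speed estimates, and the function name are assumptions, not the paper's scheme.

# Illustrative static/dynamic split of loop iterations following the SWR idea
# in the abstract above; parameters and policy are assumptions for illustration.

def partition_iterations(n_iters: int, node_speeds: list[float], swr: float,
                         chunk: int = 64):
    static_total = int(n_iters * swr)
    speed_sum = sum(node_speeds)

    # Static part: proportional to each node's relative speed.
    static_share = [int(static_total * s / speed_sum) for s in node_speeds]
    static_share[0] += static_total - sum(static_share)   # absorb rounding

    # Dynamic part: remaining iterations served as fixed-size chunks on demand.
    dynamic_chunks = []
    start = static_total
    while start < n_iters:
        end = min(start + chunk, n_iters)
        dynamic_chunks.append((start, end))
        start = end
    return static_share, dynamic_chunks

if __name__ == "__main__":
    shares, chunks = partition_iterations(10_000, [1.0, 2.0, 1.5], swr=0.7)
    print(shares)          # static iterations per node: [1556, 3111, 2333]
    print(len(chunks))     # number of on-demand chunks covering the remaining 30%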
15.
16.
VENUS: A General-Purpose Parallel Performance Visualization Environment
This paper presents VENUS, a general-purpose performance visualization environment for parallel programs. Based on an analysis of the shortcomings of current parallel performance visualization tools, VENUS adopts a visualization approach built on an extensible multi-layer performance-view model and improves PVM's tracing mechanism to support a direct mapping between performance visualization analysis and program source code. Experiments show that VENUS effectively helps locate performance bottlenecks in parallel programs.
17.
Steve C. Chiu. The Journal of Supercomputing, 2008, 46(2): 105-107
The abundance of parallel and distributed computing platforms, such as MPP, SMP, and Beowulf clusters, to name just a few, has added many more possibilities and challenges to high performance computing (HPC), parallel I/O, mass data storage, scalable architectures, and large-scale simulations, which traditionally belong to the realm of custom-tailored parallel systems. The intent of this special issue is to discuss problems and solutions, to identify new issues, and to help shape future research directions in these areas. From these perspectives, this special issue addresses the problems encountered at the hardware, architectural, and application levels, while providing conceptual as well as empirical treatments of the current issues in high performance computing, and the I/O architectures and systems utilized therein.
18.
Model-driven monitoring support for the multi-view performance analysis of parallel embedded applications
J. García, J. Entrialgo, F. J. Suárez, D. F. García. Performance Evaluation, 2000, 39(1-4): 81-98
This paper describes an approach to carry out performance analysis of parallel embedded applications. The approach is based on measurement, but in addition, the idea of driving the measurement process (application instrumentation and monitoring) by a behavioral model is introduced. Using this model, highly comprehensible performance information can be collected. The whole approach is based on this behavioral model, one instrumentation method and two tools, one for monitoring and the other for visualization and analysis. Each of these is briefly described, and the steps to carry out performance analysis using them are clearly defined. They are explained by means of a case study. Finally, one method to evaluate the intrusiveness of the monitoring approach is proposed, and the intrusiveness results for the case study are presented.
19.
Abdel-Elah. Performance Evaluation, 2005, 60(1-4): 223-236
In this paper we investigate the performance of distributed heuristic search methods based on a well-known heuristic search algorithm, the iterative deepening A* (IDA*). The contribution of this paper includes proposing and assessing a distributed algorithm for IDA*. The assessment is based on space, time and solution quality that are quantified in terms of several performance parameters such as generated search space and real execution time among others. The experiments are conducted on a cluster computer system consisting of 16 hosts built around a general-purpose network. The objective of this research is to investigate the feasibility of cluster computing as an alternative for hosting applications requiring intensive graph search. The results reveal that cluster computing improves on the performance of IDA* at a reasonable cost.
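For reference, the sketch below is a compact sequential IDA* in Python; the paper's contribution is a distributed variant running across cluster hosts, which this example does not attempt to reproduce. The toy graph, the zero heuristic, and the function names are illustrative assumptions.

# Compact sequential IDA* for reference; the distributed version assessed in
# the paper is not reproduced here.

def ida_star(start, goal, neighbors, h):
    """neighbors(n) -> iterable of (next_node, edge_cost); h(n) -> admissible estimate."""
    def search(path, g, bound):
        node = path[-1]
        f = g + h(node)
        if f > bound:
            return f                     # exceeded bound: report candidate for next bound
        if node == goal:
            return True
        minimum = float("inf")
        for nxt, cost in neighbors(node):
            if nxt in path:              # avoid cycles on the current path
                continue
            path.append(nxt)
            result = search(path, g + cost, bound)
            if result is True:
                return True
            minimum = min(minimum, result)
            path.pop()
        return minimum

    bound = h(start)
    path = [start]
    while True:
        result = search(path, 0, bound)
        if result is True:
            return path                  # path from start to goal
        if result == float("inf"):
            return None                  # no solution
        bound = result                   # deepen to the next f-bound

if __name__ == "__main__":
    graph = {"A": [("B", 1), ("C", 3)], "B": [("D", 1)], "C": [("D", 1)], "D": []}
    print(ida_star("A", "D", lambda n: graph[n], lambda n: 0))   # ['A', 'B', 'D']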