Similar Documents
20 similar documents found.
1.
2.
I/O- and data-intensive workloads, exemplified by the Grand Challenge problems, multimedia applications, cosmology simulations, climate modeling, and large collaborative visualizations, demand innovative approaches to alleviating I/O performance bottlenecks in both bandwidth and data access. The advent of low-cost hardware platforms, such as Beowulf clusters, has opened up numerous possibilities in mass data storage, scalable architectures, and large-scale simulation. The objective of this Special Issue is to discuss problems and solutions, to identify new issues, and to help shape future research and development directions in these areas. Accordingly, the Special Issue addresses problems encountered at the hardware, middleware, and application levels, providing conceptual as well as empirical treatments.

3.
A nearest-neighbor mesh connection combined with a global broadcasting/control bus characterizes the architecture of the processor array PAX, which was built for, and now runs, many typical scientific applications. Both these inter-processor connections and the machine's MIMD structure proved effective in particle transport problems, which require asynchronous operation.

The paper describes the architectural basis of two recent versions of the PAX computer and their hardware and software systems, and demonstrates the effectiveness of the PAX-style architecture through the implementation of scientific applications.
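As a hedged illustration of the nearest-neighbor mesh communication pattern described above, the sketch below uses mpi4py's Cartesian topology support; the 2x2 periodic grid and the payload are assumptions for illustration, not PAX code.

    from mpi4py import MPI
    import numpy as np

    # Illustrative 2D nearest-neighbor ("mesh") exchange, the communication
    # pattern PAX-style processor arrays rely on. Run with: mpirun -n 4 ...
    # The 2x2 periodic grid and the payload are assumptions, not PAX code.
    comm = MPI.COMM_WORLD
    cart = comm.Create_cart([2, 2], periods=[True, True])

    above, below = cart.Shift(0, 1)   # (source, dest) along dimension 0
    left, right = cart.Shift(1, 1)    # (source, dest) along dimension 1

    local = np.full(4, cart.Get_rank(), dtype="d")   # boundary strip to share
    recv = np.empty_like(local)
    # Deadlock-free exchange with each of the four mesh neighbors.
    for source, dest in ((above, below), (below, above),
                         (left, right), (right, left)):
        cart.Sendrecv(local, dest=dest, recvbuf=recv, source=source)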


4.
Keywords: parallel system (problem), computation history, concurrency matrix, scheduling discipline.

5.
In this paper, we study I/O server placement for optimizing parallel I/O performance on switch-based clusters, which typically adopt irregular network topologies to allow the construction of scalable systems with incremental expansion capability. Finding an optimal solution to this problem is computationally intractable. We quantified the number of messages traveling through each network link with a workload function and developed three heuristic algorithms that find good solutions based on its values. The maximum-workload-based heuristic chooses the locations of I/O nodes so as to minimize the maximum value of the workload function. The distance-based heuristic aims to minimize the average distance between compute nodes and I/O nodes, which is equivalent to minimizing the average workload on the network links. The load-balance-based heuristic balances the workload on the links through a recursive traversal of the network's routing tree. Our simulation results demonstrate the performance advantage of these algorithms over a number of algorithms commonly used in existing parallel systems. In particular, the load-balance-based algorithm is superior to the others in most cases, improving parallel I/O throughput by 10 to 95%.
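As a hedged illustration of the distance-based heuristic's core idea, the sketch below greedily places I/O servers so as to minimize the average hop distance from compute nodes. The toy topology, hop-count metric, and greedy strategy are assumptions for illustration, not the paper's algorithm.

    from collections import deque

    def hops(adj, src):
        """BFS hop counts from src over an irregular switch topology."""
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return dist

    def place_io_servers(adj, compute_nodes, k):
        """Greedily pick the node that most reduces the average distance
        from compute nodes to their nearest I/O server. Illustrative only."""
        chosen = []
        candidates = set(adj) - set(compute_nodes)
        dists = {n: hops(adj, n) for n in candidates}
        while len(chosen) < k and candidates:
            def total_dist(extra):
                picks = chosen + [extra]
                return sum(min(dists[p][c] for p in picks) for c in compute_nodes)
            best = min(candidates, key=total_dist)
            chosen.append(best)
            candidates.remove(best)
        return chosen

    # Toy irregular topology as adjacency lists keyed by node id.
    adj = {0: [1], 1: [0, 2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
    print(place_io_servers(adj, compute_nodes=[0, 2, 3], k=1))   # -> [1]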

6.
Over the past two decades (1974-94), advances in semiconductor and integrated-circuit technology have fuelled the drive toward faster, ever more efficient computational machines. Today, the most powerful supercomputers can compute at billions of floating-point operations per second (gigaflops). This increase in capability is intensifying the demand for even more powerful machines, and the largest supercomputers are expected to break the teraflops barrier in the coming years. The following areas are discussed: the nature of I/O in massively parallel processing; operating and file systems; runtime systems and compilers; and networking technology. The recurrent themes in the parallel I/O problem are the great variety of access patterns and the sensitivity of current I/O systems to those patterns. Access patterns are expected to become even more variable, so single resource-management approaches will likely not suffice. Providing an I/O infrastructure that supports these requirements will necessitate research in operating systems (parallel file systems, runtime systems, and drivers), language interfaces to high-performance storage systems, high-speed networking, graphics and visualization systems, and new hardware technology for I/O and storage systems.

7.
Although parallel processing has been a focal point of computer architecture research for many years, fundamental questions and trade-offs remain puzzling, not necessarily because of complexity but because of the multitude of possible answers (e.g., shared vs. distributed memory, centralized vs. distributed control, vector vs. scalar). This paper addresses one such issue: heterogeneous vs. homogeneous parallel machine organizations. Using simple performance and cost models, we argue that multiprocessors based on a fast global control unit, capable of fast execution of serial code and of managing an ensemble of slower processors, offer a performance/cost ratio significantly better than any comparable homogeneous multiprocessor with distributed control. Although the issue of "deliverable" performance remains open, it appears that such systems can achieve faster execution and higher program speedups at a much lower cost.
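The following toy Amdahl-style model illustrates the kind of performance/cost argument being made; every parameter (serial fraction, speeds, costs) is an assumption for illustration, not the paper's model.

    # Toy Amdahl-style comparison of heterogeneous vs. homogeneous designs.
    # All numbers are assumptions for illustration, not the paper's model.

    def homo_time(serial_frac, n_procs, speed):
        """Unit workload on a homogeneous machine: serial part on one
        processor, parallel part spread across all of them."""
        return serial_frac / speed + (1 - serial_frac) / (n_procs * speed)

    def het_time(serial_frac, n_slow, fast_speed, slow_speed):
        """Heterogeneous machine: serial code on the fast global control
        unit, parallel code on the ensemble of slower processors."""
        return serial_frac / fast_speed + (1 - serial_frac) / (n_slow * slow_speed)

    f = 0.05                               # assumed serial fraction
    t_homo = homo_time(f, 16, 1.0)         # 16 unit-speed procs, cost ~16
    t_het = het_time(f, 16, 4.0, 0.5)      # 1 fast (cost 4) + 16 slow (cost 8)
    print(f"homogeneous:   t={t_homo:.4f}  perf/cost={1 / t_homo / 16:.3f}")
    print(f"heterogeneous: t={t_het:.4f}  perf/cost={1 / t_het / 12:.3f}")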

8.
In this note, a novel iterative learning control scheme for a class of Hamiltonian control systems is proposed, which is applicable to electromechanical systems. The proposed method has the following distinguishing features: it requires neither precise knowledge of the target system's model nor the time derivatives of the output signals. Despite this lack of information, the tracking error decreases monotonically in the L2 sense, and perfect tracking is achieved when the method is applied to mechanical systems. The self-adjointness-related properties of Hamiltonian systems proven in this note play the key role in this learning control, and these properties are also useful for general optimal control. Furthermore, experiments on a robot manipulator demonstrate the effectiveness of the proposed method.
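A minimal runnable sketch of the iterative-learning-control idea: repeat a trial, measure the tracking error, and feed it back into the next trial's input. The first-order plant, the phase-lead proportional update, and the gain are assumptions for illustration, not the note's Hamiltonian-based scheme.

    import numpy as np

    # Toy ILC: u_{k+1}[i] = u_k[i] + gamma * e_k[i+1] on the plant y' = -y + u.
    # Plant, update law, and gain are illustrative assumptions; the note's
    # method instead exploits the self-adjointness of Hamiltonian systems.
    dt, T = 0.01, 2.0
    t = np.arange(0.0, T, dt)
    y_ref = np.sin(np.pi * t)          # desired output trajectory
    u = np.zeros_like(t)               # initial input guess
    gamma = 30.0                       # assumed learning gain

    def simulate(u):
        """Forward-Euler simulation of y' = -y + u with y(0) = 0."""
        y = np.zeros_like(u)
        for i in range(1, len(u)):
            y[i] = y[i - 1] + dt * (-y[i - 1] + u[i - 1])
        return y

    for trial in range(30):
        e = y_ref - simulate(u)        # tracking error of this trial
        u[:-1] += gamma * e[1:]        # learn from the next-step error

    e = y_ref - simulate(u)
    print("L2 tracking error after learning:", np.sqrt(dt * np.sum(e ** 2)))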

9.
I/O abstraction is offered as a new high-level approach to interprocess communication. Functional components of a distributed system are written as encapsulated modules that act upon local data structures, some of which may be published for external use. Relationships among modules are specified by logical connections among their published data structures; whenever a module updates published data, I/O takes place implicitly according to the configuration of those connections. The Programmers' Playground, a software library and runtime system supporting I/O abstraction, is described. Design goals include the separation of communication from computation, dynamic reconfiguration of the communication structure, and the uniform treatment of discrete and continuous data types. Support for end-user configuration of distributed multimedia applications is the motivation for the work.
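A tiny, hypothetical sketch of the publish-and-connect style described above; the names Published, connect, and update are illustrative, not the Programmers' Playground API.

    # Hypothetical sketch of I/O abstraction: modules act on local data,
    # publish some of it, and updates propagate along logical connections.
    class Published:
        def __init__(self, value=None):
            self.value = value
            self._subscribers = []

        def connect(self, callback):
            """Create a logical connection to another module's input."""
            self._subscribers.append(callback)

        def update(self, value):
            """Updating published data performs I/O implicitly."""
            self.value = value
            for notify in self._subscribers:
                notify(value)

    class Producer:
        def __init__(self):
            self.reading = Published()   # local data, published for others

    class Consumer:
        def on_reading(self, value):
            print("received:", value)

    producer, consumer = Producer(), Consumer()
    producer.reading.connect(consumer.on_reading)   # configure the connection
    producer.reading.update(42)                     # implicit communication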

10.
The Journal of Supercomputing - Scientific application codes are often long-running, time- and energy-consuming parallel codes, and the tuning of these methods towards the characteristics of a...

11.
12.
Before software systems are shipped, they are tuned to optimize their field performance. This process, called performance tuning, finds the best settings for a set of tunable, or changeable, parameters such as buffer space, disk file allocation, main memory partition, I/O priority, and the process scheduling quantum. Examples of performance measures to be optimized are query or transaction loss, throughput rate, and response time. Improperly tuned systems can create field problems even if the product contains no software faults, so it is important that software systems be tuned for optimal performance before they are delivered. Optimal performance tuning is quite complex, however, because of exponentially many alternatives, unknown functional relationships between parameters and performance measures, stochastically fluctuating system performance, and expensive empirical experiments. For these reasons, tuning is typically practiced as an art and depends heavily on the intuition of experts. In this paper, we examine a method for tuning that is repeatable and produces consistently superior results across many different applications. The method, based upon Robust Experimental Design, has revolutionized design optimization in hardware systems. It consists of conducting a few carefully chosen experiments and using the associated analysis techniques to extract the maximum possible information for performance optimization. Specifically, we give some background on statistical experimental design and demonstrate it on an actual software system providing network database services, which had experienced occasional query losses. Focusing on nine carefully chosen parameters, 12 experiments were conducted, far fewer, and consequently far less costly in time and effort, than would be required to collect the same amount of information by traditional methods. The selection of the experiments took into account ideas from accelerated life testing and from Robust Experimental Design. Based on the analysis of the resulting data, new settings for the parameters of the software system were implemented. All tests done with the new settings have shown that the query loss problem has been completely controlled.
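A small sketch of the designed-experiment idea: run an orthogonal fraction of the parameter combinations and estimate each parameter's main effect. The L4(2^3) array and the toy response function are assumptions for illustration, not the paper's experiment.

    import numpy as np

    # Robust-Experimental-Design-style tuning sketch: an orthogonal array of
    # settings, a response measurement per run, and main-effect estimates.
    L4 = np.array([    # rows: experiments; columns: parameters at levels -1/+1
        [-1, -1, -1],
        [-1, +1, +1],
        [+1, -1, +1],
        [+1, +1, -1],
    ])

    def run_experiment(setting):
        """Stand-in for measuring the system, e.g. the query-loss rate."""
        buffer_sz, quantum, io_prio = setting
        return 5.0 - 1.2 * buffer_sz + 0.4 * quantum - 0.1 * io_prio

    y = np.array([run_experiment(row) for row in L4])
    # Main effect of parameter j: mean response at +1 minus mean at -1.
    effects = np.array([y[L4[:, j] == +1].mean() - y[L4[:, j] == -1].mean()
                        for j in range(L4.shape[1])])
    print("main effects (buffer, quantum, io_prio):", effects)
    # Choose, per parameter, the level that lowers the loss measure.
    print("recommended levels:", np.where(effects < 0, +1, -1))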

13.
The abundance of parallel and distributed computing platforms, such as MPPs, SMPs, and Beowulf clusters, to name just a few, has added many possibilities and challenges to high-performance computing (HPC), parallel I/O, mass data storage, scalable architectures, and large-scale simulations, which traditionally belonged to the realm of custom-tailored parallel systems. The intent of this special issue is to discuss problems and solutions, to identify new issues, and to help shape future research directions in these areas. From these perspectives, this special issue addresses problems encountered at the hardware, architectural, and application levels, providing conceptual as well as empirical treatments of current issues in high-performance computing and the I/O architectures and systems it employs.

14.
15.
In this work, we evaluate the benefits of using Grids with multiple batch systems to improve the performance of multi-component and parameter-sweep parallel applications by reducing queue waiting times. Using job traces with different loads, job distributions, and queue waiting times corresponding to three queuing policies (FCFS, conservative backfilling, and EASY backfilling), we conducted a large number of experiments with simulators of two important classes of applications. The first simulator models the Community Climate System Model (CCSM), a prominent multi-component application, and the second models parameter-sweep applications. We compare the performance of the applications when executed on multiple batch systems and on a single batch system across different system and application configurations, and show that there is a large number of configurations for which execution on multiple batch systems outperforms execution on a single system.
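A toy simulation of the underlying intuition: submitting to several independent batch queues and taking the shortest backlog reduces expected waiting time. The workload distributions are assumptions for illustration, not the paper's traces or policies.

    import random

    # Why multiple batch systems can cut queue waits: a job component goes to
    # whichever system's FCFS backlog is shortest. Distributions are assumed.
    random.seed(1)

    def backlog():
        """Assumed backlog (hours of queued work) of one batch system."""
        return sum(random.expovariate(1.0) for _ in range(random.randint(0, 8)))

    trials = 10000
    single = multi = 0.0
    for _ in range(trials):
        waits = [backlog() for _ in range(3)]   # three independent systems
        single += waits[0]                      # forced onto one system
        multi += min(waits)                     # free to pick the shortest queue
    print(f"mean wait, single system: {single / trials:.2f} h")
    print(f"mean wait, best of three: {multi / trials:.2f} h")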

16.
Distributing the workload across all available processing units (PUs) of a high-performance heterogeneous platform (e.g., PCs combining CPUs and GPUs) is a challenging task, since the execution cost of a task on distinct PUs is non-deterministic and affected by parameters not known a priori. This paper presents Sm@rtConfig, a context-aware runtime and tuning system that balances reducing the execution time of engineering applications against the cost of scheduling tasks on CPU-GPU platforms. Using Model-Driven Engineering and Aspect-Oriented Software Development, a high-level specification and implementation of Sm@rtConfig has been created, aiming at improving modularization and reuse across applications. As a case study, the simulation subsystem of a CFD application was developed using the proposed approach. The system's tasks were designed considering only their functional concerns, whereas scheduling and other non-functional concerns are handled by Sm@rtConfig aspects, improving task modularity. Although Sm@rtConfig supports multiple PUs, in this case study the tasks were scheduled on a platform composed of one CPU and one GPU. Experimental results show an overall performance gain of 21.77% compared to statically assigning all tasks to the GPU.
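A hedged sketch of the scheduling idea, assuming measured per-PU task costs and a greedy earliest-finish-time rule; the task names and costs are illustrative assumptions, not Sm@rtConfig's implementation.

    # Context-aware CPU/GPU assignment sketch: estimate each task's cost per
    # processing unit (in a real system, learned from past runs) and greedily
    # assign it to the unit with the earliest predicted finish time.
    measured_cost = {                                  # seconds, assumed
        "assemble_mesh": {"cpu": 1.0, "gpu": 3.0},     # branchy, CPU-friendly
        "solve_pressure": {"cpu": 9.0, "gpu": 1.5},    # dense, GPU-friendly
        "advect": {"cpu": 6.0, "gpu": 1.0},
    }

    finish = {"cpu": 0.0, "gpu": 0.0}   # predicted availability of each PU
    schedule = {}
    for task, costs in measured_cost.items():
        pu = min(finish, key=lambda p: finish[p] + costs[p])
        finish[pu] += costs[pu]
        schedule[task] = pu
    print(schedule, finish)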

17.
This paper presents a concept for automated architecture synthesis for adaptive multiprocessors on chip, in particular for Field-Programmable Gate Array (FPGA) devices. Given a parallel program, the intent is to simultaneously allocate processor resources and the corresponding communication network while mapping the parallel application, yielding an optimal application-specific architecture. The approach builds on a previously proposed design platform that automates system integration and FPGA synthesis for such architectures, so the overall concept offers an automated design flow from application mapping to system and FPGA configuration. The automated synthesis is based on combinatorial optimization; automation is possible because a solvable Integer Linear Programming (ILP) model has been found that captures all necessary design trade-off parameters of such systems. Experimental results studying the feasibility of the automated synthesis indicate that problem sizes encountered in the embedded domain can be readily solved, and they underscore the need for automated synthesis in design space exploration.
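A tiny ILP in the spirit of the approach, written with the PuLP solver library: assign the tasks of a parallel program to soft-core processors while minimizing the FPGA area instantiated. The tasks, capacities, and areas are assumptions for illustration, not the paper's model.

    from pulp import LpBinary, LpMinimize, LpProblem, LpVariable, lpSum

    tasks = {"t1": 3, "t2": 2, "t3": 4, "t4": 1}    # task -> load (assumed)
    procs = {"p1": 5, "p2": 5, "p3": 5}             # processor -> capacity
    area = {"p1": 10, "p2": 10, "p3": 10}           # processor -> area if used

    x = LpVariable.dicts("map", (list(tasks), list(procs)), cat=LpBinary)
    u = LpVariable.dicts("use", list(procs), cat=LpBinary)

    prob = LpProblem("mpsoc_synthesis", LpMinimize)
    prob += lpSum(area[p] * u[p] for p in procs)        # minimize total area
    for t in tasks:                                     # map each task exactly once
        prob += lpSum(x[t][p] for p in procs) == 1
    for p in procs:                                     # capacity only if instantiated
        prob += lpSum(tasks[t] * x[t][p] for t in tasks) <= procs[p] * u[p]

    prob.solve()
    print({t: next(p for p in procs if x[t][p].value() > 0.5) for t in tasks})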

18.
We introduce Lemon, an MPI parallel I/O library that provides efficient parallel I/O of both binary data and metadata on massively parallel architectures. Motivated by the demands of the lattice quantum chromodynamics community, the data are stored in the SciDAC Lattice QCD Interchange Message Encapsulation (LIME) format, which allows large blocks of binary data and the corresponding metadata to be stored in the same file. Although designed for LQCD needs, this format may be useful for any application with this type of data profile. The design, implementation, and application of Lemon are described, and we conclude by presenting its excellent scaling properties on state-of-the-art high-performance computers.
Program summary:
Program title: Lemon
Catalogue identifier: AELP_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AELP_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU General Public License version 3
No. of lines in distributed program, including test data, etc.: 32 860
No. of bytes in distributed program, including test data, etc.: 223 762
Distribution format: tar.gz
Programming language: MPI and C
Computer: any that supports MPI I/O
Operating system: any
Has the code been vectorised or parallelised?: yes; includes MPI directives
RAM: depends on the input used
Classification: 11.5
External routines: MPI
Nature of problem: distributed file I/O with metadata
Solution method: MPI parallel I/O-based implementation of the LIME format
Running time: varies with file and architecture size, on the order of seconds
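A hedged sketch (using mpi4py) of the collective MPI-I/O pattern a library like Lemon builds on: every rank writes its block of binary data to one shared file at a rank-dependent offset. This is generic MPI-IO, not Lemon's API or the LIME layout.

    from mpi4py import MPI
    import numpy as np

    # Collective parallel write of per-rank binary blocks to a single file.
    # Run with, e.g.: mpirun -n 4 python write_blocks.py
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    block = np.full(1024, rank, dtype=np.float64)   # this rank's payload
    fh = MPI.File.Open(comm, "lattice.bin",
                       MPI.MODE_CREATE | MPI.MODE_WRONLY)
    offset = rank * block.nbytes                    # contiguous per-rank blocks
    fh.Write_at_all(offset, block)                  # collective write
    fh.Close()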

19.
A single-board computer system and a self-designed interface card are used, along with a monitor program in ROM, to interface a frequency spectrum analyser to a remote computer system via an IEEE-488 interface. Various settings on the analyser can be controlled, as can the collection of data. The monitor can also act as a terminal on a remote computer system and transfer spectra between the microsystem and the remote host.
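As a modern analogue of this kind of IEEE-488 instrument control, the sketch below uses the PyVISA library; the GPIB address and the command strings are assumptions, since real analysers differ, and this is not the paper's monitor program.

    import pyvisa  # modern GPIB/IEEE-488 access; an analogy, not the paper's setup

    # Hedged sketch: set up a spectrum analyser and fetch a trace over GPIB.
    rm = pyvisa.ResourceManager()
    inst = rm.open_resource("GPIB0::18::INSTR")     # assumed bus address 18

    print(inst.query("*IDN?"))                      # identify the instrument
    inst.write("FREQ:CENT 100MHZ")                  # assumed: centre frequency
    inst.write("BAND 30KHZ")                        # assumed: resolution bandwidth
    trace = inst.query("TRAC? TRACE1")              # assumed: fetch the spectrum
    print(trace.split(",")[:5])                     # first few trace points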

20.