
1.

Zeng Lifang, Guo Kerong 《Computer Engineering and Applications》1999,35(4):14-17

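Parallel programming generally follows one of two models: data parallelism or message passing. Of the two, message passing is the more widely used. An automatic parallelization tool that targets message-passing FORTRAN (MPF) can greatly reduce the programming burden on users and has considerable practical value. Iteration partitioning and locality analysis are key components of automatic parallelization. This paper describes the iteration partitioning, array-access locality analysis, and communication optimization analysis in FAX, an automatic parallelization tool that converts serial FORTRAN programs into MPF.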
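The abstract names iteration partitioning and array-access locality analysis without giving FAX's algorithms. As a minimal sketch only, assuming a one-dimensional loop whose iterations are block-partitioned and an array distributed the same way (owner-computes), the idea is: split the iteration space, then find which array elements referenced as `A[i + k]` fall outside the local block and so need communication. The function names below are hypothetical.

```python
from typing import List, Sequence, Tuple

def block_partition(n: int, p: int) -> List[Tuple[int, int]]:
    """Split loop iterations 0..n-1 into p contiguous half-open blocks [lo, hi).
    The first n % p blocks receive one extra iteration."""
    base, extra = divmod(n, p)
    bounds, lo = [], 0
    for rank in range(p):
        hi = lo + base + (1 if rank < extra else 0)
        bounds.append((lo, hi))
        lo = hi
    return bounds

def nonlocal_reads(bounds: List[Tuple[int, int]], rank: int,
                   offsets: Sequence[int], n: int) -> List[int]:
    """Indices of array A that process `rank` reads through A[i + k] (k in offsets)
    but does not own, assuming A is block-distributed exactly like the loop.
    Each returned index implies a receive before the loop can run locally."""
    lo, hi = bounds[rank]
    needed = set()
    for i in range(lo, hi):
        for k in offsets:
            j = i + k
            # Skip out-of-range subscripts and elements this process already owns.
            if 0 <= j < n and not (lo <= j < hi):
                needed.add(j)
    return sorted(needed)
```

For example, `block_partition(10, 3)` gives `[(0, 4), (4, 7), (7, 10)]`, and process 1 executing a stencil that reads `A[i-1]` and `A[i+1]` needs only `A[3]` and `A[7]` from its neighbours.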

2.

Faust: an integrated environment for parallel programming

Guarna V.A. Jr. Gannon D. Jablonowski D. Malony A.D. Gaur Y. 《Software, IEEE》1989,6(4):20-27


3.

FP-VLSI Automatic Synthesis System


Sun Yongqiang, Hu Zhenjiang, Yuan Xin 《Journal of Software》1994,5(1):26-32

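FP-VLSI is an integrated VLSI automatic design tool that carries out the complete automatic synthesis process: from parallel algorithms to systolic algorithms, then to systolic structures, then to logic structures, and finally to CMOS layout. The FP-VLSI system adopts systolic arrays as its VLSI architecture and uses FP/B, a language with good algebraic properties, as the description language at every level; synthesis and optimization are performed through program transformation. The system supports a formal approach to VLSI design and can guarantee the correctness of the design results.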

4.

Design and Implementation of the PPCDS Integrated Development Environment

Shi Tao, Lu Linsheng, Rao Ruonan, Cai Tao 《Computer Engineering and Applications》2005,41(5):106-109

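PPCDS (Parallel Program Conceptual Design System) is a parallel programming environment that integrates a high-level data-parallel modeling language, parallelism recognition methods, automatic construction of parallel programs, and human-computer interaction techniques. It simplifies parallel programming, effectively shortens the parallel program development cycle, and improves parallel computing efficiency. The PPCDS integrated development environment is an important component of PPCDS; this paper briefly introduces its design and implementation.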

5.

Program Reusability through Program Transformation

Boyle James M. Muralidharan Monagur N. 《IEEE transactions on pattern analysis and machine intelligence》1984,(5):574-588

How can a program written in pure applicative LISP be reused in a Fortran environment? One answer is to transform it automatically from LISP into Fortran. In this paper we discuss a practical application of this technique, one that yields an efficient Fortran program. We view this process as an example of abstract programming, in which the LISP program constitutes an abstract specification for the Fortran version. The idea of strategy (a strategy for getting from LISP to Fortran) is basic to designing and applying the transformations. One strategic insight is that the task is easier if the LISP program is first converted to "recursive" Fortran, and the recursive Fortran program is then converted to nonrecursive standard Fortran. Another strategic insight is that much of the task can be accomplished by converting the program from one canonical form to another. Developing a strategy also involves making various implementation decisions. One advantage of the program transformation methodology is that it exposes such decisions for examination and review; another is that it makes optimizations easy to detect and implement. Once a strategy has been discovered, it can be implemented by means of rewrite-rule transformations using the TAMPR program transformation system. The transformational approach to program reuse based on this strategy has a measure of elegance. It is also practical: the resulting Fortran program is 25 percent faster than its compiled LISP counterpart, even without extensive optimization.
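The second strategic step (recursive form to nonrecursive form) can be illustrated by hand. This is not TAMPR and not the authors' rewrite rules; it is a hedged Python sketch of the general idea, assuming a LISP-style S-expression modeled as nested tuples of integers. The recursive version mirrors the applicative definition; the nonrecursive version replaces the call stack with an explicit stack, as a language without recursion (such as FORTRAN 77) would require.

```python
from typing import Tuple, Union

# Illustrative model of an S-expression: an integer atom or a tuple of S-expressions.
SExpr = Union[int, Tuple["SExpr", ...]]

def leaf_sum_recursive(e: SExpr) -> int:
    """Recursive form, mirroring an applicative LISP definition."""
    if isinstance(e, int):
        return e
    return sum(leaf_sum_recursive(x) for x in e)

def leaf_sum_iterative(e: SExpr) -> int:
    """Nonrecursive form: pending subexpressions live on an explicit stack."""
    total = 0
    stack = [e]
    while stack:
        node = stack.pop()
        if isinstance(node, int):
            total += node
        else:
            stack.extend(node)  # defer children instead of recursing into them
    return total
```

Visiting order differs between the two versions, which is harmless here only because addition is associative and commutative; a general recursion-elimination transformation must also save and restore the continuation (the point to resume after each call).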

6.

A flexible and easy to use molecular biology workbench efficiently developed in Tcl/Tk

Hans‐Peter Pohle Bernd Drescher 《Software》2000,30(12):1433-1445

We describe the design and implementation of a workbench for molecular biology that allows the easy integration of analysis tools. The software is implemented in Tcl/Tk using the [incr Tcl] extension that provides object‐oriented programming. The program is called tkGDE and consists of four main parts. The sequence editor allows the user to perform basic editing operations on biomolecule sequences. The graphical annotation editor gives the user a graphical overview of all annotated features of a sequence. The output manager retains information on the results produced by the analysis tools. The bundle control allows several tools to run automatically, passing data from one tool to the next. Tools are integrated into the system by describing their properties in a configuration file, which drastically reduces the time needed for integration. We present results proving that Tcl/Tk has been misjudged to be slow and unsuited for large projects. To achieve sufficient performance we exploited special features of Tcl/Tk, namely idle tasks and the capabilities built into the Tk canvas widget. The system consists of more than 34000 lines of [incr Tcl] code in 182 classes. The whole development process took about one person‐year. Copyright © 2000 John Wiley & Sons, Ltd.
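The "bundle control" idea (run several tools in sequence, passing each tool's output to the next, with tools registered declaratively) can be sketched in a few lines. This is a hypothetical Python illustration, not tkGDE's [incr Tcl] code: the registry below stands in for the configuration file, and the tool names are invented.

```python
from typing import Callable, Dict, List

# Hypothetical tool registry standing in for tkGDE's configuration file.
# Real tools would be external programs; here they are plain functions on sequences.
Tool = Callable[[str], str]

TOOLS: Dict[str, Tool] = {
    "upper": str.upper,
    "reverse": lambda s: s[::-1],
    "complement": lambda s: s.translate(str.maketrans("ACGT", "TGCA")),
}

def run_bundle(steps: List[str], sequence: str,
               tools: Dict[str, Tool] = TOOLS) -> List[str]:
    """Run the named tools in order, feeding each one the previous tool's output.
    Returns every intermediate result, a minimal stand-in for an output manager."""
    results: List[str] = []
    data = sequence
    for name in steps:
        data = tools[name](data)
        results.append(data)
    return results
```

For instance, `run_bundle(["upper", "reverse", "complement"], "acgtt")` returns `["ACGTT", "TTGCA", "AACGT"]`, the last entry being the reverse complement. Adding a tool means adding a registry entry, which is the kind of integration cost the abstract says a configuration file keeps low.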

7.

A Brief Analysis of the Visual Parallel Programming Environment P-GRADE

Wang Enzhu, Liu Xiaoping 《Computer Application Technology》2005,(64):31-38

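Parallel computing has had a major impact on many fields, from computational simulation in science and engineering to data mining and transaction processing in commercial applications. However, parallel software currently lags behind the development of parallel algorithms and of high-performance computers themselves, which hinders the adoption and popularization of parallel computing. This paper introduces P-GRADE, a convenient, easy-to-use visual integrated development environment for parallel programs, and analyzes in detail the tools that support its parallel program design component: the GRAPNEL language and the graphical editor GRED.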

8.

Research and Implementation of a Visual Parallel Programming Platform

Hu Zhigang, Zou Henghua, Zhong Jue 《Computer Engineering》2001,27(12):9-11

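With the aim of improving the parallel programming environment available to users, we developed a network-based visual parallel programming platform. The platform represents a parallel program as a graph in which nodes denote tasks and arcs denote data dependencies between tasks. Users need only describe a parallel problem visually as a graph; task scheduling and inter-task communication are handled automatically by the system, which greatly simplifies parallel programming.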
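One simple way a system could schedule such a task graph is to group tasks into "waves" with Kahn's topological-sort algorithm: every task in a wave has all its input data available, so tasks in the same wave may run in parallel. The paper does not state its scheduling policy; this Python sketch is an assumed, illustrative policy with hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def schedule_levels(tasks: List[str], deps: List[Tuple[str, str]]) -> List[List[str]]:
    """Group tasks into waves. An arc (u, v) means v consumes data produced by u,
    so v may start only after u finishes; tasks within one wave are independent.
    Raises ValueError if the dependency graph contains a cycle."""
    indeg = {t: 0 for t in tasks}
    succ: Dict[str, List[str]] = defaultdict(list)
    for u, v in deps:
        succ[u].append(v)
        indeg[v] += 1
    wave = sorted(t for t in tasks if indeg[t] == 0)
    levels: List[List[str]] = []
    done = 0
    while wave:
        levels.append(wave)
        done += len(wave)
        nxt = []
        for u in wave:
            for v in succ[u]:
                indeg[v] -= 1          # one more input of v is now available
                if indeg[v] == 0:
                    nxt.append(v)
        wave = sorted(nxt)
    if done != len(tasks):
        raise ValueError("dependency graph contains a cycle")
    return levels
```

For a diamond-shaped graph (A feeds B and C, which both feed D) the result is `[["A"], ["B", "C"], ["D"]]`: B and C can run concurrently once A's data has been communicated to them.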

9.

The distributed programming language SR: Mechanisms, design and implementation

Gregory R. Andrews 《Software》1982,12(8):719-753

SR is a new language for programming software containing many processes that execute in parallel. The language allows an entire software system that controls a potentially large collection of processors to be programmed as an integrated set of software modules. The key language mechanisms are resources, operations and input statements. The language supports separate compilation, type abstraction, and dynamic communication links; it also contains novel treatments of arrays and procedures. This paper gives an overview of the language mechanisms, discusses some of the major design decisions and describes one implementation.

10.

An integrated CAD environment for low-power design

Landman P. Mehra R. Rabaey J.M. 《Design & Test of Computers, IEEE》1996,13(2):72-82

This CAD environment supports a high-level approach to power reduction, emphasizing optimizations at the algorithm and architecture levels of abstraction. An integrated set of analysis and optimization tools spans the design hierarchy, allowing the designer to make a systematic, top-down exploration and refinement of solutions in the area-time-power design space. In a case study (a low-power implementation of a digital bandpass filter), the CAD environment and tools yield more than an order of magnitude savings in power.
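Why the algorithm and architecture levels matter can be seen from the standard first-order model of CMOS dynamic power; the abstract does not give this model, so it is offered here only as background. With switching activity $\alpha$, switched capacitance $C_L$, supply voltage $V_{dd}$ and clock frequency $f$, and assuming an idealized $N$-way parallel datapath (capacitance grows about $N$-fold, each copy runs at $f/N$, which permits a lower supply $V'_{dd}$; multiplexing and routing overhead ignored):

```latex
P_{\mathrm{dyn}} = \alpha \, C_L \, V_{dd}^{2} \, f

P_{\mathrm{par}} \approx \alpha \, (N C_L) \, {V'_{dd}}^{2} \, \frac{f}{N}
  = \left( \frac{V'_{dd}}{V_{dd}} \right)^{2} P_{\mathrm{dyn}}
```

Throughput is preserved while power falls with the square of the voltage ratio; halving the supply voltage would cut dynamic power to roughly one quarter before overhead. How far the voltage can actually be lowered depends on the process technology.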

11.

Finite state machine-based optimization of data parallel regular domain problems applied in low-level image processing

Seinstra F.J. Koelma D. Bagdanov A.D. 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(10):865-877

A popular approach to providing nonexperts in parallel computing with an easy-to-use programming model is to design a software library consisting of a set of preparallelized routines, and hide the intricacies of parallelization behind the library's API. However, for regular domain problems (such as simple matrix manipulations or low-level image processing applications, in which all elements in a regular subset of a dense data field are accessed in turn), speedup obtained with many such library-based parallelization tools is often suboptimal. This is because interoperation optimization (i.e., time optimization of communication steps across library calls) is generally not incorporated in the library implementations. We present a simple, efficient, finite state machine-based approach for communication minimization of library-based data parallel regular domain problems. In the approach, referred to as lazy parallelization, a sequential program is parallelized automatically at runtime by inserting communication primitives and memory management operations whenever necessary. Apart from being simple and cheap, lazy parallelization guarantees to generate legal, correct, and efficient parallel programs at all times. The effectiveness of the approach is demonstrated by analyzing the performance characteristics of two typical regular domain problems obtained from the field of low-level image processing. Experimental results show significant performance improvements over nonoptimized parallel applications. Moreover, obtained communication behavior is found to be optimal with respect to the abstraction level of message passing programs.
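The core of lazy parallelization, as described, is to track each data structure's distribution state and insert communication only when a library call needs a different state. The following Python sketch is a deliberately reduced, hypothetical version of that idea: it has just two states, invented primitive names (`scatter`, `gather_broadcast`), and assumes each call leaves its data in the state it required. The paper's actual state machine is richer.

```python
from typing import Dict, List, Tuple

# Hypothetical distribution states of one data structure (e.g. an image).
REPLICATED, DISTRIBUTED = "replicated", "distributed"

# (current state, state required by the next call) -> communication to insert.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    (REPLICATED, DISTRIBUTED): "scatter",
    (DISTRIBUTED, REPLICATED): "gather_broadcast",
}

def lazy_plan(calls: List[Tuple[str, str]], start: str = REPLICATED) -> List[str]:
    """Given library calls as (name, required input state), return the execution
    trace with communication inserted only where the state actually changes.
    Simplification: each call leaves the data in the state it required."""
    state = start
    trace: List[str] = []
    for name, required in calls:
        if state != required:
            trace.append(TRANSITIONS[(state, required)])
            state = required
        trace.append(name)
    return trace
```

For the sequence blur (distributed), threshold (distributed), print_histogram (replicated), the plan is `scatter, blur, threshold, gather_broadcast, print_histogram`. A naive library that scatters and gathers around every call would perform three of each instead of one.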

12.

Visualization of Message Passing Parallel Programs with the TOPSYS Parallel Programming Environment

《Journal of Parallel and Distributed Computing》1993,18(2):118-128

Parallel programming is orders of magnitude more complex than writing sequential programs. This is particularly true for programming distributed memory multiprocessor architectures based on message passing programming models. Apart from understanding the sequential parts of the parallel program, new degrees of freedom lead to additional problems. Understanding the synchronization and communication behavior of parallel programs is the most critical issue in programming distributed memory multiprocessors. The paper describes methods and tools for visualization and animation of the dynamic execution of parallel programs. Based on an evaluation and classification of existing visualization environments, the visualization and animation tool VISTOP (VISualization TOol for Parallel Systems) is presented as part of the integrated tool environment TOPSYS (TOols for Parallel SYStems) for programming distributed memory multiprocessors. VISTOP supports the interactive on-line visualization of message passing programs based on various views; in particular, a process graph based concurrency view for detecting synchronization and communication bugs.

13.

A compiler optimization algorithm for shared-memory multiprocessors

McKinley K.S. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(8):769-787

This paper presents a new compiler optimization algorithm that parallelizes applications for symmetric, shared-memory multiprocessors. The algorithm considers data locality, parallelism, and the granularity of parallelism. It uses dependence analysis and a simple cache model to drive its optimizations. It also optimizes across procedures by using interprocedural analysis and transformations. We validate the algorithm by hand-applying it to sequential versions of parallel Fortran programs operating over dense matrices. The programs initially were hand-coded to target a variety of parallel machines using loop parallelism. We ignore the user's parallel loop directives, and use known and implemented dependence and interprocedural analysis to find parallelism. We then apply our new optimization algorithm to the resulting program. We compare the original parallel program to the hand-optimized program, and show that our algorithm improves three programs, matches four programs, and degrades one program in our test suite on a shared-memory, bus-based parallel machine with local caches. This experiment suggests existing dependence and interprocedural array analysis can automatically detect user parallelism, and demonstrates that user parallelized codes often benefit from our compiler optimizations, providing evidence that we need both parallel algorithms and compiler optimizations to effectively utilize parallel machines.
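The abstract relies on dependence analysis without detailing it. To make the term concrete, here is the classic GCD dependence test, a standard textbook technique substituted for illustration; it is not McKinley's algorithm. For references `A[a1*i + c1]` and `A[a2*j + c2]` in the same loop, a dependence requires an integer solution of `a1*i - a2*j = c2 - c1`, which exists only if `gcd(a1, a2)` divides `c2 - c1`.

```python
from math import gcd

def gcd_test_may_depend(a1: int, c1: int, a2: int, c2: int) -> bool:
    """GCD test for references A[a1*i + c1] and A[a2*j + c2].
    False: the references are proven independent (no integer solution exists).
    True: a dependence is possible; loop bounds are ignored, so this is conservative."""
    g = gcd(a1, a2)
    if g == 0:                      # both subscripts are constants
        return c1 == c2
    return (c2 - c1) % g == 0
```

For example, a loop that writes `A[2*i]` and reads `A[2*i + 1]` gives `gcd_test_may_depend(2, 0, 2, 1) == False`, so the iterations can run in parallel, while `A[i]` and `A[i - 1]` give `True`, signalling a possible loop-carried dependence that a stronger test or the compiler's other analyses must resolve.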

14.

Embedded Parallelization Approach for Optimization in Aerodynamic Design

S. Peigin B. Epstein 《The Journal of supercomputing》2004,29(3):243-263

A new efficient parallelization strategy for optimization of aerodynamic shapes is proposed. The optimization method employs a full Navier-Stokes solver for accurate estimation of the objective function. As such, it requires huge computational resources, which makes efficient parallelization crucial for successful promotion of the method to an engineering environment. The algorithm is based on a multilevel embedded parallelization approach, which includes (1) parallelization of the multiblock full Navier-Stokes solver with parallel CFD evaluation of the objective function, (2) parallelization of the optimization process with parallel optimal search on multiple search domains and, finally, (3) parallel grid generation. Applications (implemented on a 144-processor distributed memory cluster) include various transonic profile optimizations in the presence of nonlinear constraints. The results demonstrate that the approach combines high accuracy of optimization with high parallel efficiency. The proposed multilevel parallelization, which efficiently makes use of the computational power supplied by multiprocessor systems, leads to a significant computational time-saving and allows application of the method to practical aerodynamic design in the aircraft industry.

15.

Integration of topology and shape optimization for design of structural components

Poh-Soong Tang Kuang-Hua Chang 《Structural and Multidisciplinary Optimization》2001,22(1):65-82

This paper presents an integrated approach that supports the topology optimization and CAD-based shape optimization. The main contribution of the paper is using the geometric reconstruction technique that is mathematically sound and error bounded for creating solid models of the topologically optimized structures with smooth geometric boundary. This geometric reconstruction method extends the integration to 3-D applications. In addition, commercial Computer-Aided Design (CAD), finite element analysis (FEA), optimization, and application software tools are incorporated to support the integrated optimization process. The integration is carried out by first converting the geometry of the topologically optimized structure into smooth and parametric B-spline curves and surfaces. The B-spline curves and surfaces are then imported into a parametric CAD environment to build solid models of the structure. The control point movements of the B-spline curves or surfaces are defined as design variables for shape optimization, in which CAD-based design velocity field computations, design sensitivity analysis (DSA), and nonlinear programming are performed. Both 2-D plane stress and 3-D solid examples are presented to demonstrate the proposed approach. Received January 27, 2000. Communicated by J. Sobieski.
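Because the control points of the B-spline curves serve as design variables, it helps to see how a curve point depends on them. The sketch below evaluates a 2-D B-spline curve with the standard Cox-de Boor recursion; it does not reproduce the paper's error-bounded geometric reconstruction, and the function names are illustrative. Moving a control point `P_i` shifts the curve only where its basis function `N_{i,k}` is nonzero, which is why control-point movements make convenient shape design variables.

```python
from typing import List, Sequence, Tuple

def bspline_basis(i: int, k: int, t: float, knots: Sequence[float]) -> float:
    """Cox-de Boor recursion: value of the i-th B-spline basis function of degree k at t.
    Spans are half-open [knots[i], knots[i+1]); terms with a zero denominator are 0."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k] - knots[i]
    if d1 > 0:
        left = (t - knots[i]) / d1 * bspline_basis(i, k - 1, t, knots)
    d2 = knots[i + k + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + k + 1] - t) / d2 * bspline_basis(i + 1, k - 1, t, knots)
    return left + right

def bspline_point(t: float, ctrl: List[Tuple[float, float]], k: int,
                  knots: Sequence[float]) -> Tuple[float, float]:
    """Curve point C(t) = sum_i N_{i,k}(t) * P_i; needs len(knots) == len(ctrl) + k + 1."""
    x = y = 0.0
    for i, (px, py) in enumerate(ctrl):
        b = bspline_basis(i, k, t, knots)
        x += b * px
        y += b * py
    return x, y
```

With degree 2, clamped knots `[0, 0, 0, 1, 1, 1]` and control points `(0, 0), (1, 2), (2, 0)`, the curve at `t = 0.5` is `(1.0, 1.0)`. Because of the half-open spans, this simple version returns zeros exactly at the last knot; production code special-cases that endpoint.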

16.

Source-to-Source Translation in Compilers with Type Recovery

Mi Wei, Li Yuxiang, Chen Li, Feng Xiaobing, Zhang Zhaoqing 《Journal of Computer Research and Development》2010,47(7)

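Source-to-source translation is an important way of making a compiler's analyses and optimizations retargetable. It is widely used to support parallel language extensions and various architecture-independent optimizations, and it can help programmers debug correctness or performance problems. In the multicore/manycore era, program analysis and optimization increasingly involve the user, so this platform-independent, user-friendly form of code generation is becoming ever more popular. Adding source-to-source support to a simple compiler is easy, but few compilers that implement complex program analyses and aggressive optimizations provide robust source-to-source translation. Changes to program structure made by optimizations are the primary cause of the difficulty. Drawing on a large number of failure cases, this paper analyzes the difficulties that optimizations create for source-to-source translation, proposes a set of translation techniques based on type recovery, and implements the approach in the Open64 compiler. Tests with the supertest and SPEC2000 suites show that the approach greatly improves the robustness of source-to-source translation. Because the implementation is integrated into the source-to-source translator and is independent of the compiler's analysis and optimization modules, the approach can easily be ported to other compilers.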

17.

Milepost GCC: Machine Learning Enabled Self-tuning Compiler

Grigori Fursin Yuriy Kashnikov Abdul Wahid Memon Zbigniew Chamski Olivier Temam Mircea Namolaru Elad Yom-Tov Bilha Mendelson Ayal Zaks Eric Courtois Francois Bodin Phil Barnard Elton Ashton Edwin Bonilla John Thomson Christopher K. I. Williams Michael O'Boyle 《International journal of parallel programming》2011,39(3):296-327

Tuning compiler optimizations for rapidly evolving hardware makes porting and extending an optimizing compiler for each new platform extremely challenging. Iterative optimization is a popular approach to adapting programs to a new architecture automatically using feedback-directed compilation. However, the large number of evaluations required for each program has prevented iterative compilation from widespread take-up in production compilers. Machine learning has been proposed to tune optimizations across programs systematically but is currently limited to a few transformations, long training phases and critically lacks publicly released, stable tools. Our approach is to develop a modular, extensible, self-tuning optimization infrastructure to automatically learn the best optimizations across multiple programs and architectures based on the correlation between program features, run-time behavior and optimizations. In this paper we describe Milepost GCC, the first publicly-available open-source machine learning-based compiler. It consists of an Interactive Compilation Interface (ICI) and plugins to extract program features and exchange optimization data with the cTuning.org open public repository. It automatically adapts the internal optimization heuristic at function-level granularity to improve execution time, code size and compilation time of a new program on a given architecture. Part of the MILEPOST technology together with low-level ICI-inspired plugin framework is now included in the mainline GCC. We developed machine learning plugins based on probabilistic and transductive approaches to predict good combinations of optimizations. Our preliminary experimental results show that it is possible to automatically reduce the execution time of individual MiBench programs, some by more than a factor of 2, while also improving compilation time and code size. 
On average we are able to reduce the execution time of the MiBench benchmark suite by 11% for the ARC reconfigurable processor. We also present a realistic multi-objective optimization scenario for Berkeley DB library using Milepost GCC and improve execution time by approximately 17%, while reducing compilation time and code size by 12% and 7% respectively on Intel Xeon processor.
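The central idea, predicting good optimizations for a new program from the optimizations that worked for programs with similar features, can be illustrated with a 1-nearest-neighbour rule. Milepost GCC itself uses probabilistic and transductive models, so this Python sketch is a plainly simpler stand-in; the feature vectors and flag sets are hypothetical, and features are assumed to be normalized already.

```python
import math
from typing import List, Sequence, Tuple

def predict_flags(features: Sequence[float],
                  training: List[Tuple[Sequence[float], List[str]]]) -> List[str]:
    """1-nearest-neighbour stand-in: return the best-known optimization flags of the
    training program whose (already normalized) feature vector is closest in
    Euclidean distance to the new program's features."""
    if not training:
        raise ValueError("no training programs")
    best_flags: List[str] = []
    best_dist = math.inf
    for vec, flags in training:
        d = math.dist(features, vec)   # Python 3.8+
        if d < best_dist:
            best_dist, best_flags = d, flags
    return best_flags
```

With hypothetical training data `[([0.1, 0.9], ["-O2", "-funroll-loops"]), ([0.8, 0.2], ["-O3"])]`, a new program with features `[0.75, 0.3]` is assigned `["-O3"]`. The real system also has to extract the features (via plugins) and collect the training data (iterative compilation results shared through cTuning.org), which this sketch takes as given.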

18.

The M³ multiprocessor laboratory

Burkhart H. Eigenmann R. Kindlimann H. Moser M. Scholian H. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(5):507-519

An integrated programming environment for the M³ multiprocessor is discussed. Three tools support the software development cycle of a parallel program, including the programming, configuration, and debugging/performance measurement phases. Programmer support for performance analysis has been a primary motivation for the system. The sources of performance loss are identified and the ways in which this information is gathered and analyzed are described. As a case study, a fast maze router algorithm is used to show the functionality of the different tools. The M³ environment is compared with other state-of-the-art projects.

19.

The implementation of the ecosystem module of a coastal environmental model: Port Phillip Bay, Australia

《Environmental Modelling & Software》2000,15(4):357-372

When implemented as a computer program, an ecosystem model is only a part of a larger programming environment. This programming environment includes other programs, non-model program components, program design rules, data files, and associated analytical tools. These components should be divided to allow programmers to focus on their areas of expertise, but must then be rejoined in such a way as to minimise debugging and execution overheads. We describe this larger programming environment as it surrounds a model of the ecosystem of Port Phillip Bay, Australia. The ecosystem model requires a transport model to allow spatial modelling; this transport model uses currents from a computationally intensive hydrodynamic model. Implementation of the ecosystem model also requires non-model code, such as routines to initialise parameters or the integration method. Their design determines program reliability and performance. A modular structure allows different parts of the model to be independently modified; this makes for efficient programming. We describe formal design rules used to enhance readability and information content of the model's parameter names. To execute, the model must access data files and a record of the run must be kept; a Unix shell program serves both these functions. The data files may require software tools for generation or manipulation. Output from the model also requires post-processing for visualisation and analysis. The model is thus only a part of a network of software, whose development must be coordinated to ensure reliability and efficiency.

20.

A technique to automatically determine Ad-hoc communication patterns at runtime

《Parallel Computing》2017

Current High Performance Computing (HPC) systems are typically built as interconnected clusters of shared-memory multicore computers. Several techniques to automatically generate parallel programs from high-level parallel languages or sequential codes have been proposed. To properly exploit the scalability of HPC clusters, these techniques should take into account the combination of data communication across distributed memory, and the exploitation of shared-memory models. In this paper, we present a new communication calculation technique to be applied across different SPMD (Single Program Multiple Data) code blocks, containing several uniform data access expressions. We have implemented this technique in Trasgo, a programming model and compilation framework that transforms parallel programs from a high-level parallel specification that deals with parallelism in a unified, abstract, and portable way. The proposed technique computes at runtime exact coarse-grained communications for distributed message-passing processes. Applying this technique at runtime has the advantage of being independent of compile-time decisions, such as the tile size chosen for each process. Our approach allows the automatic generation of pre-compiled multi-level parallel routines, libraries, or programs that can adapt their communication, synchronization, and optimization structures to the target system, even when computing nodes have different capabilities. Our experimental results show that, despite our runtime calculation, our approach can automatically produce efficient programs compared with MPI reference codes, and with codes generated with auto-parallelizing compilers.
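Computing exact communications at runtime, once the tile sizes are known, comes down to intersecting the index range each process needs with the ranges other processes own. The Python sketch below shows that idea for the simplest case only: a one-dimensional array, block ownership with per-process tile sizes chosen at runtime, and a single uniform access `A[i + shift]`. Trasgo's technique handles multiple SPMD blocks and more general expressions; the function names here are hypothetical.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open index range [lo, hi)

def owned_blocks(n: int, tiles: List[int]) -> List[Range]:
    """Block ownership from per-process tile sizes decided at runtime (must sum to n)."""
    if sum(tiles) != n:
        raise ValueError("tile sizes must cover the array exactly")
    ranges, lo = [], 0
    for size in tiles:
        ranges.append((lo, lo + size))
        lo += size
    return ranges

def exact_transfers(owned: List[Range], shift: int, n: int) -> Dict[Tuple[int, int], Range]:
    """For the uniform access A[i + shift] executed by the owner of each i, return
    {(src, dst): [lo, hi)}: exactly the elements src must send to dst."""
    transfers: Dict[Tuple[int, int], Range] = {}
    for dst, (lo, hi) in enumerate(owned):
        need_lo, need_hi = max(lo + shift, 0), min(hi + shift, n)  # clip to array
        for src, (olo, ohi) in enumerate(owned):
            if src == dst:
                continue
            a, b = max(need_lo, olo), min(need_hi, ohi)           # range intersection
            if a < b:
                transfers[(src, dst)] = (a, b)
    return transfers
```

With `n = 10` and uneven tiles `[5, 3, 2]` (for example, because nodes have different capabilities), the access `A[i + 1]` requires only two messages: process 1 sends `A[5]` to process 0, and process 2 sends `A[8]` to process 1. Changing the tiles at runtime changes the transfers without recompilation.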