
1.

GridFOR: A Domain Specific Language for Parallel Grid-Based Applications

Ye Wang Zhiyuan Li 《International journal of parallel programming》2016,44(3):427-448

To ease the programming burden and make parallel programs more maintainable, computational scientists and engineers can currently use software libraries, templates, or general-purpose language extensions to compose their application programs. Unfortunately, these existing options have considerable limitations in compatibility, expressive power, and delivered performance. To address these issues, we design a domain-specific language, GridFOR, for computational problems defined over regular geometric grids. The language allows the programmer to first implement an algorithm on simple data structures, as commonly illustrated in textbooks or papers. The programmer then specifies transformations that extend the algorithm to the complex data structures required by the target applications. We build a compiler that automatically translates a GridFOR program into a parallel Fortran version with Message Passing Interface (MPI) calls. Several optimization techniques are implemented in the compiler to improve program speed.
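
The split between a textbook algorithm on a simple grid and its extension to a distributed grid can be illustrated with a minimal Python sketch. It assumes a 1-D grid, a three-point averaging (Jacobi-style) stencil, and a block decomposition with one ghost cell per side; the function names are hypothetical, the MPI halo exchange is only simulated by copying neighbour values, and none of this is GridFOR syntax or its generated Fortran.

```python
def jacobi_step(u):
    """Textbook version: one three-point averaging sweep; both end points stay fixed."""
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def partition(u, nprocs):
    """Block-decompose the grid; each block would be owned by one MPI rank."""
    size = -(-len(u) // nprocs)  # ceiling division
    return [u[i:i + size] for i in range(0, len(u), size)]


def distributed_step(blocks):
    """Same sweep on decomposed data: pad each block with ghost cells copied from
    its neighbours (a simulated halo exchange), then update only the owned cells."""
    new_blocks = []
    for rank, own in enumerate(blocks):
        left = blocks[rank - 1][-1] if rank > 0 else None
        right = blocks[rank + 1][0] if rank < len(blocks) - 1 else None
        padded = [left] + own + [right]
        out = []
        for i in range(1, len(padded) - 1):
            lo, mid, hi = padded[i - 1], padded[i], padded[i + 1]
            # None marks the global boundary, where the value is held fixed.
            out.append(mid if lo is None or hi is None else (lo + mid + hi) / 3.0)
        new_blocks.append(out)
    return new_blocks


grid = [0.0, 3.0, 6.0, 9.0, 12.0, 0.0, 5.0]
merged = [x for block in distributed_step(partition(grid, 3)) for x in block]
print(merged == jacobi_step(grid))  # prints True
```

The point of the sketch is that the stencil arithmetic is unchanged; only the data layout and the boundary (halo) handling differ, which is the kind of extension the abstract describes being specified separately from the algorithm.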

2.

Application-dependent dynamic monitoring of distributed and parallel systems

Ogle D.M. Schwan K. Snodgrass R. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(7):762-778

Achieving high performance for parallel or distributed programs often requires substantial amounts of information about the programs themselves, about the systems on which they execute, and about specific program runs. This paper presents a monitoring system that collects and analyzes application-dependent monitoring information and makes it available to the programmer and to the executing program. The system may be used for off-line program analysis, for on-line debugging, and for making on-line, dynamic changes to parallel or distributed programs to enhance their performance. The authors use a high-level, uniform data model to represent program information and monitoring data. They show how this model may be used to specify program views and attributes for monitoring, and demonstrate how such specifications can be translated into efficient, program-specific monitoring code that uses alternative mechanisms for the distributed analysis and collection to be performed for the specified views. The model's utility has been demonstrated on a wide variety of parallel machines.

3.

Visual programming support for graph‐oriented parallel/distributed processing

Fan Chan Jiannong Cao Alvin T. S. Chan Kang Zhang 《Software》2005,35(15):1409-1439

GOP is a graph‐oriented programming model which aims at providing high‐level abstractions for configuring and programming cooperative parallel processes. With GOP, the programmer can configure the logical structure of a parallel/distributed program by constructing a logical graph to represent the communication and synchronization between the local programs in a distributed processing environment. This paper describes a visual programming environment, called VisualGOP, for the design, coding, and execution of GOP programs. VisualGOP applies visual techniques to provide the programmer with automated and intelligent assistance throughout the program design and construction process. It provides a graphical interface with support for interactive graph drawing and editing, visual programming functions and automation facilities for program mapping and execution. VisualGOP is a generic programming environment independent of programming languages and platforms. GOP programs constructed under VisualGOP can run in heterogeneous parallel/distributed systems. Copyright © 2005 John Wiley & Sons, Ltd.

4.

An Optimal Algorithm for Processing Distributed Star Queries

《IEEE transactions on pattern analysis and machine intelligence》1985,(10):1097-1107

The problem of optimal query processing in distributed database systems has been shown to be NP-hard. However, for a special type of query, called star queries, we have developed a polynomial-time optimal algorithm. Semijoin tactics are applied for query processing. An execution graph is introduced to represent the semijoin programs associated with the distributed processing of the queries. We then identify optimality properties of semijoin programs for star queries and use these properties to derive the optimal semijoin program. We show that the optimal semijoin program can be found among serial semijoin strategies, defined as serial semijoin programs that include each semijoin associated with the query exactly once. By making certain assumptions on file sizes and semijoin selectivities, we can obtain the optimal semijoin program from these strategies in polynomial time. Our assumption on selectivities is consistent in the sense that we consider the selectivity of a semijoin based on the current database state, i.e., we take into account the reduction effects of all prior semijoins.
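
To make the semijoin tactic concrete, here is a minimal Python sketch. A semijoin R ⋉ S on attribute A keeps only the tuples of R whose A value occurs in S, so in a distributed setting only the projection of S on A has to be shipped. The relation names, data, and program order below are hypothetical, and the paper's cost model and search for the optimal program are not reproduced; the sketch only shows a serial semijoin program in which each semijoin of a star query appears once and each step sees the reductions made by earlier steps.

```python
def semijoin(r, s, attr):
    """r semijoin s on attr: the tuples of r whose attr value also occurs in s.
    Only the projection of s on attr would have to be shipped between sites."""
    keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in keys]


def run_program(db, program):
    """Apply a serial semijoin program; each step sees the reductions of earlier steps."""
    db = dict(db)
    for target, source, attr in program:
        db[target] = semijoin(db[target], db[source], attr)
    return db


# Hypothetical star query: "orders" is the centre, joined to two satellite relations.
db = {
    "orders": [{"cust": 1, "item": "a"}, {"cust": 2, "item": "b"}, {"cust": 3, "item": "a"}],
    "customers": [{"cust": 1}, {"cust": 3}, {"cust": 4}],
    "items": [{"item": "a"}, {"item": "c"}],
}
# Both semijoin directions of both join edges, each exactly once.
program = [
    ("orders", "customers", "cust"),
    ("orders", "items", "item"),
    ("customers", "orders", "cust"),
    ("items", "orders", "item"),
]
reduced = run_program(db, program)
print({name: len(rows) for name, rows in reduced.items()})  # {'orders': 2, 'customers': 2, 'items': 1}
```

Different orderings of the same four semijoins can leave different amounts of data to ship, which is why the choice among serial strategies matters.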

5.

A transformation approach to derive efficient parallel implementations

Rauber T. Runger G. 《IEEE transactions on pattern analysis and machine intelligence》2000,26(4):315-339

The construction of efficient parallel programs usually requires expert knowledge in the application area and deep insight into the architecture of a specific parallel machine. Often, the resulting performance is not portable, i.e., a program that is efficient on one machine is not necessarily efficient on another machine with a different architecture. Transformation systems provide a more flexible solution. They start with a specification of the application problem and allow the generation of efficient programs for different parallel machines. The programmer has to give an exact specification of the algorithm expressing its inherent degree of parallelism and is relieved of the low-level details of the architecture. We propose such a transformation system, with an emphasis on exploiting data parallelism combined with a hierarchically organized structure of task parallelism. Starting with a specification of the maximum degree of task and data parallelism, the transformations generate a specification of a parallel program for a specific parallel machine. The transformations are based on a cost model and are applied in a predefined order, fixing the most important design decisions, such as the scheduling of independent multitask activations, data distributions, pipelining of tasks, and assignment of processors to task activations. We demonstrate the usefulness of the approach with examples from scientific computing.

6.

Supervised-Learning-Guided Selection of Function-Level Compiler Optimization Parameters

Hui Liu Rongcai Zhao Qi Wang 《Computer Engineering & Science》2018,40(6):957-968

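Machine-learning-based iterative compilation can effectively predict the best combination of optimization parameters for a new program when it is iteratively compiled. During model training, existing methods suffer from low search efficiency over optimization parameter combinations, inappropriate program feature representations, and low prediction accuracy. Machine-learning-based iterative compilation is therefore an active research topic in the field, with open challenges in learning algorithm selection, optimization parameter search, and program feature representation. Based on supervised learning, we propose a method for predicting program optimization parameters. The method first searches the optimization parameter space with a constrained multi-objective particle swarm algorithm to find the best optimization parameters for sample functions; it then extracts function features using a program feature representation that combines dynamic and static information; finally, it builds a supervised learning model from samples consisting of function features and optimization parameters and uses it to predict the optimization parameters of new programs. Statistical models were built with k-nearest neighbors and softmax regression, and experimental results show that the new method achieves good prediction performance on the NPB benchmark suite and on large scientific computing programs.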
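
The prediction step can be sketched with the k-nearest-neighbors option the abstract mentions: each training sample pairs a function's feature vector with the best parameter setting found by the search, and a new function receives the setting most common among its k nearest samples. This is a minimal Python sketch under stated assumptions: Euclidean distance, a plain majority vote, two-dimensional hypothetical feature vectors, and GCC-style flag strings used only as class labels. The particle swarm search and the dynamic/static feature extraction are not shown.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict an optimization-parameter setting for `query` by majority vote
    among the k training functions whose feature vectors are closest (Euclidean)."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(params for _, params in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (feature vector, best flags found by the search) samples.
train = [
    ((0.9, 0.1), "-O3 -funroll-loops"),
    ((0.8, 0.2), "-O3 -funroll-loops"),
    ((0.85, 0.15), "-O3 -funroll-loops"),
    ((0.1, 0.9), "-O2 -fno-inline"),
    ((0.2, 0.8), "-O2 -fno-inline"),
]
print(knn_predict(train, (0.75, 0.25)))  # -O3 -funroll-loops
```

In practice the features would be normalized before measuring distance, since otherwise features with large ranges dominate the vote.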

7.

Initial Results for Glacial Variable Analysis

Tito Autrey Michael Wolfe 《International journal of parallel programming》1998,26(1):43-64

Runtime code generation that uses the values of one or more variables to generate specialized code is called value-specific optimization. Typically, value-specific optimization focuses on variables that are modified much less frequently than they are referenced; we call these glacial variables. In current systems that use runtime code generation, glacial variables are identified by programmer directives. We describe glacial variable analysis, the first data-flow analysis for automatically identifying glacial variables. We introduce the term staging analysis to describe analyses that divide a program into stages or use the stage structure of a program. Glacial variable analysis is an interprocedural staging analysis that identifies the relative modification and reference frequencies of each variable and expression. We then present several experiments that characterize a set of benchmark programs with respect to their stage structure, and we show how often value-specific optimization might be applied. Finally, we explain how staging analysis relates to runtime code generation, briefly describe glacial variable analysis, and present some initial results.
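
Value-specific optimization itself (not the glacial variable analysis) can be sketched in a few lines of Python: code is generated at runtime for the current value of a rarely modified variable and reused until that variable changes. The example specializes a power function on a hypothetical glacial exponent by generating unrolled source text and compiling it with `exec`; the names `make_power` and `Specializer` are illustrative only, and in the paper the glacial variables would be found by a static interprocedural analysis rather than checked at each call.

```python
def make_power(n):
    """Runtime code generation: build and compile a function computing x**n
    with the multiplication chain unrolled for this particular n."""
    body = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"def power(x):\n    return {body}\n"
    namespace = {}
    exec(source, namespace)
    return namespace["power"]


class Specializer:
    """Regenerate the specialized code only when the glacial variable changes."""

    def __init__(self):
        self.value = None
        self.code = None
        self.regenerations = 0

    def get(self, n):
        if n != self.value:  # rare: the glacial variable was modified
            self.value, self.code = n, make_power(n)
            self.regenerations += 1
        return self.code     # common: reuse the specialized code


s = Specializer()
print([s.get(3)(x) for x in range(5)], s.regenerations)  # [0, 1, 8, 27, 64] 1
```

The payoff depends on exactly the ratio the analysis estimates: specialization is worthwhile only if the variable is referenced many times between modifications.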

8.

Iteration-Order-Based Locality Analysis and Optimization for Stream Programs

Tao Tang Xuejun Yang Yisong Lin 《Journal of Computer Research and Development》2012,49(6):1363-1375

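The stream programming model is a parallel programming model that has been widely studied in recent years. It has been applied successfully to stream architectures built around software-managed stream memories, such as the stream register file. However, studies have also shown that the stream programming model is equally suitable for architectures with hardware-managed coherent caches. GPGPU, currently the most important application context for the stream programming model, has also gradually introduced general-purpose data caches, so exploiting the cache locality of stream programs has become key to improving their performance on such architectures. Because of the special execution model of stream programs, reuse is converted into locality differently than in traditional serial programs, so traditional locality analysis methods cannot be applied directly to stream programs. Based on an in-depth analysis of how reuse is converted into locality, we propose the concept of "iteration order" to describe how this conversion differs between stream programs and serial programs. Combining it with the execution characteristics of stream programs, we extend traditional locality analysis theory to the parallel setting and present an iteration-order-based locality analysis method. In addition, building on this locality analysis model, we propose two cache locality optimizations for stream programs. Validation on the GPGPU-Sim simulation platform shows that the quantitative locality analysis of stream programs is effective and that the proposed optimizations effectively improve the cache locality and performance of stream programs.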
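
A common building block of traditional locality analysis is reuse distance: the number of distinct addresses touched between two accesses to the same address. In a fully associative LRU cache of capacity C, a reuse hits exactly when its distance is less than C. The Python sketch below is not the paper's iteration-order model; it only illustrates the underlying observation that the same accesses executed in a different order turn the same reuse into very different locality. The traces (two kernels each reading elements x0 to x3, run either kernel by kernel or interleaved) are hypothetical.

```python
def reuse_distances(trace):
    """For each access, the number of distinct addresses touched since the previous
    access to the same address (None for a first touch), via an LRU stack."""
    stack = []  # most recently used address is last
    dists = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            dists.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            dists.append(None)
        stack.append(addr)
    return dists


def lru_hits(trace, capacity):
    """Hits in a fully associative LRU cache: a reuse hits iff its distance < capacity."""
    return sum(1 for d in reuse_distances(trace) if d is not None and d < capacity)


elements = ["x0", "x1", "x2", "x3"]
kernel_by_kernel = elements * 2                       # kernel 1 over all elements, then kernel 2
interleaved = [e for e in elements for _ in range(2)]  # both kernels per element
print(lru_hits(kernel_by_kernel, 2), lru_hits(interleaved, 2))  # 0 4
```

Both traces contain the same four reuses; only the execution order decides whether they fit in a two-line cache.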

9.

An Evaluation Model for Predicting Parallel Program Efficiency

Changsheng Chen Yongqiang Sun Jifeng He 《Journal of Software》2000,11(11):1485-1491

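Performance analysis of parallel programs, and efficiency analysis in particular, often requires running the program first and then, based on the experimental results, optimizing the parallel algorithm, changing the data distribution strategy, or even choosing a different parallel algorithm. Building on the general-purpose BSP (bulk-synchronous parallel) model of parallel computation, we propose an effective evaluation model of parallel program efficiency that allows programmers to analyze and evaluate program efficiency already at the design and analysis stage and to optimize the program accordingly. Experimental results show that the model's predictions are accurate.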
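
The abstract does not give the model's formulas, so the following Python sketch uses only the standard BSP cost model it builds on. A superstep with maximum local work w and maximum communication volume h (an h-relation) costs w + h·g + l, where g is the per-word communication cost and l the barrier latency. Predicted efficiency is then sequential time divided by p times the predicted parallel time. The superstep values and machine parameters below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Superstep:
    w: float  # largest local computation time on any processor
    h: int    # largest number of words any processor sends or receives


def bsp_time(supersteps, g, l):
    """Standard BSP cost: sum over supersteps of w + h*g + l
    (g: time per word communicated, l: barrier synchronization latency)."""
    return sum(s.w + s.h * g + l for s in supersteps)


def predicted_efficiency(t_seq, p, supersteps, g, l):
    """Efficiency = sequential time / (p * predicted parallel time)."""
    return t_seq / (p * bsp_time(supersteps, g, l))


steps = [Superstep(w=125, h=10), Superstep(w=125, h=10)]
print(bsp_time(steps, g=2, l=20))                             # 330
print(round(predicted_efficiency(1000, 4, steps, 2, 20), 3))  # 0.758
```

Because g and l can be measured once per machine, such a prediction needs only the algorithm's per-superstep w and h, which is what makes design-time efficiency estimates possible.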

10.

Integrating program analyses with programmer productivity tools

Daniel von Dincklage Amer Diwan 《Software》2011,41(7):817-840

Because software continues to grow in size and complexity, programmers increasingly rely on productivity tools to understand, debug, and modify their programs. These tools typically use program analyses to produce information for the programmer. This is problematic because it assumes that the programmer and the program analyses all use the same vocabulary. If they do not, the results of the analyses will be meaningless to the programmer. For example, ‘v124 may be NULL’ does not mean much to the programmer, but ‘myStack may be NULL’ is meaningful. Often, the programmer and the analyses prefer different vocabularies: the programmer prefers the program's source code, whereas an analysis prefers a simplified representation. Unfortunately, writing an analysis that works on the source code is difficult because the analysis must deal with the idiosyncrasies of the source language (e.g. nested classes). In comparison, writing an analysis on SSA form is easy, but the output of the analysis is not meaningful to the programmer; it must somehow be translated into something the programmer understands. We present a system, RTalk, that makes it easy to support both the programmer's and the analysis' needs. RTalk generates a translator between the programmer's and the analysis' vocabularies, so both the programmer and the analysis can use the vocabulary most natural to them. We demonstrate the effectiveness of RTalk by describing program understanding and program optimization tools that we have already built using RTalk. Copyright © 2011 John Wiley & Sons, Ltd.
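
The vocabulary problem in the ‘v124 may be NULL’ example can be illustrated with a hand-written translation table. This Python sketch is only an illustration of the problem, not RTalk: RTalk generates such translators, while here the mapping from SSA-style names to source names is a hypothetical dictionary supplied by hand.

```python
import re


def translate(message, ssa_to_source):
    """Replace SSA-level names such as v124 in an analysis message with the
    source-level names they stand for; unknown names are left unchanged."""
    return re.sub(r"\bv\d+\b",
                  lambda m: ssa_to_source.get(m.group(0), m.group(0)),
                  message)


names = {"v124": "myStack", "v7": "count"}   # hypothetical SSA-to-source mapping
print(translate("v124 may be NULL", names))  # myStack may be NULL
```

A fixed table like this breaks as soon as the analysis representation changes, which is the maintenance burden a generated translator is meant to remove.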

11.

DIALOG: A scheme for the quick and effective production of interactive applications software

B. Negus M. J. Hunt J. A. Prentice 《Software》1981,11(3):205-224

DIALOG is a collection of routines, including a main ‘driver’ program, which is used by an applications programmer as the user interface to interactive applications programs. The routines handle command analysis, data input and editing, as well as processing standard commands such as HELP. DIALOG offers, with no extra effort from the applications programmer, not only a simple interface for first-time users which gives complete instruction in using the program, but also a ‘command driven’ interface for more experienced users. DIALOG permits the quick and effective production of interactive applications software by programmers with no previous experience of writing such programs. User reaction to the programs so far produced and offered as part of a university computing service has been extremely favourable.

12.

A parallel programming environment supporting multiple data-parallel modules

Bradley K. Seevers Michael J. Quinn Philip J. Hatcher 《International journal of parallel programming》1992,21(5):363-386

We describe a system that allows programmers to exploit both control and data parallelism through multiple intercommunicating data-parallel modules. This programming environment extends C-style stream I/O to include intermodule communication channels. The programmer writes each module as a separate data-parallel program, then develops a channel linker specification describing how to connect the modules. A channel linker we have developed loads the separate modules onto the parallel machine and binds the communication channels together as specified. We present performance data demonstrating that a mixed control- and data-parallel solution can yield better performance than a strictly data-parallel solution. The system currently runs on the Intel iWarp multicomputer.

13.

Tool support for planning the restructuring of data abstractions in large systems

Griswold W.G. Chen M.I. Bowdidge R.W. Cabaniss J.L. Nguyen V.B. Morgenthaler J.D. 《IEEE transactions on pattern analysis and machine intelligence》1998,24(7):534-558

Restructuring software to improve its design can lower software maintenance costs. One problem encountered during restructuring is formulating the new design. A meaning-preserving program restructuring tool with a manipulable star diagram visualization can help a programmer redesign a program based on abstract data types. However, the transformational support required for meaning-preserving restructuring is costly to provide. Also, programmers encounter comprehension and recall difficulties in complex restructuring tasks. Consequently, transformations were replaced with visual and organizational aids that help a programmer plan and carry out a complex restructuring. For example, a star diagram manipulation called trimming was added; it mimics the way basic restructuring transformations affect the star diagram display, allowing a programmer to plan a restructuring without depending on restructuring transformations. With the ability to annotate trimmed star diagram components, plans can be recorded and later recalled. Programmer-controlled elision was added to help remove clutter from star diagram views. We implemented a star diagram planning tool for C programs, measured its elision capabilities, and performed a programmer study. We found that elision is effective in controlling star diagram size, and the study revealed that each programming team successfully planned its restructuring in rather different, unanticipated ways. These experiments resulted in important improvements to the tool's software design and user interface.

14.

A cloud-based intelligent TV program recommendation system

Jui-Hung Chang Chin-Feng Lai Ming-Shi Wang Tin-Yu Wu 《Computers & Electrical Engineering》2013

In recent years, cloud computing technology has matured significantly, as has the development of digital TV services. This has led to increased demand for higher-quality TV services. In this paper, cloud computing technology is used to build a recommendation system for digital TV programs, and the Hadoop Fair Scheduler is used to improve processing performance. Viewing-history data for TV programs are collected through an electronic program guide and then processed with K-means clustering, term frequency/inverse document frequency (TF-IDF), and k-nearest neighbor algorithms to obtain clusters of audience groups and to find popular TV programs for each cluster. The proposed system can process massive amounts of user data in real time and can easily be scaled up.
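
The abstract does not say exactly how TF-IDF is applied, so the following Python sketch rests on an assumption: each audience cluster is treated as a "document" whose "terms" are the titles its members watched, so titles watched everywhere score low and titles characteristic of one cluster score high. It uses tf = count / total views and idf = ln(N / df); the clusters and titles are hypothetical, and the K-means and k-NN stages and the Hadoop deployment are not shown.

```python
import math
from collections import Counter


def tf_idf(clusters):
    """clusters: cluster id -> list of watched program titles (one entry per view).
    Returns cluster id -> {title: tf * idf}, with tf = count / views in the cluster
    and idf = ln(number of clusters / number of clusters that watched the title)."""
    n = len(clusters)
    df = Counter()
    for views in clusters.values():
        df.update(set(views))
    scores = {}
    for cid, views in clusters.items():
        counts = Counter(views)
        scores[cid] = {t: (c / len(views)) * math.log(n / df[t]) for t, c in counts.items()}
    return scores


def top_programs(clusters, k=1):
    """The k highest-scoring titles for each cluster."""
    return {cid: sorted(s, key=s.get, reverse=True)[:k] for cid, s in tf_idf(clusters).items()}


clusters = {
    "A": ["news", "news", "drama", "drama", "sports"],
    "B": ["news", "cartoon", "cartoon"],
}
print(top_programs(clusters))  # {'A': ['drama'], 'B': ['cartoon']}
```

Note that "news" is the most watched title overall yet scores zero, because every cluster watches it; if raw popularity is wanted as well, it has to be combined with the TF-IDF score.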

15.

StreamTMC: Stream compilation for tiled multi-core architectures

Haitao Wei Mingkang Qin Weiwei Zhang Junqing Yu Dongrui Fan Guang R. Gao 《Journal of Parallel and Distributed Computing》2013

Tiled multi-core architectures have become an important kind of multi-core design because of their good scalability and low power consumption. Stream programming has been productively applied to a number of important application domains and provides an attractive way to exploit parallelism. However, the architectural characteristics of tiled multi-cores, namely large numbers of cores, a memory hierarchy, and exposed communication between tiles, pose a performance challenge for stream programs running on them. In this paper, we present StreamTMC, an efficient stream compilation framework that optimizes the execution of stream applications on tiled multi-cores. The framework consists of three optimization phases. First, a software pipelining schedule is constructed to exploit parallelism. Second, an efficient hybrid SPM/cache buffer allocation algorithm and a data copy elimination mechanism are proposed to improve the efficiency of data access. Last, a communication-aware mapping is proposed to reduce network communication and synchronization overhead. We implement the StreamTMC compiler on Godson-T, a 64-core tiled architecture, and conduct an experimental study to verify its effectiveness. The experimental results indicate that StreamTMC achieves an average performance improvement of 58% over the unoptimized programs.

16.

Criticality: static profiling for real-time programs

Florian Brandner Stefan Hepp Alexander Jordan 《Real-Time Systems》2014,50(3):377-410

With the increasing performance demands on real-time systems, it becomes more and more important to provide feedback to programmers and software development tools on the performance-relevant code parts of a real-time program. So far, this information has been limited to an estimate of the worst-case execution time (WCET) and its associated worst-case execution path (WCEP). However, both the WCET and the WCEP provide only partial information: only code parts on one of the WCEPs are indicated to the programmer, and no information is provided for any other code part. To give a comprehensive view covering the entire code base, tools in the spirit of program profiling are required. This work proposes an efficient approach to compute worst-case timing information for all code parts of a program using a complementary metric, called criticality. Every statement of a program is assigned a criticality value expressing how critical the code is with respect to the global WCET. This indicates how close the worst execution path passing through a specific program part is to the global WCEP. We formally define the criticality metric and investigate some of its properties with respect to dominance in control-flow graphs. Exploiting some of these properties, we propose an algorithm that reduces the overhead of computing the metric for complete programs. We also investigate ways to efficiently find only those code parts whose criticality is above a given threshold. Experiments using well-established real-time benchmark programs show an interesting distribution of criticality values, revealing considerable amounts of highly critical as well as uncritical code. The metric thus provides ideal information to programmers and software development tools to optimize the worst-case execution time of these programs.
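
As a rough illustration, criticality can be read as the length of the longest entry-to-exit path through a block, divided by the WCET (the longest path overall). Blocks on a WCEP then score 1.0 and other blocks score less. The Python sketch below computes this for a small acyclic control-flow graph using two longest-path passes in topological order. It assumes a single entry and exit, no loops, and fixed per-block costs, and it does not reproduce the paper's dominance-based algorithm or a real WCET analysis. The diamond-shaped CFG is hypothetical.

```python
from graphlib import TopologicalSorter


def criticality(succ, cost, exit_node):
    """succ: block -> successor blocks of an acyclic, single-entry CFG;
    cost: block -> worst-case execution time of the block.
    Returns block -> (longest entry-to-exit path through the block) / WCET."""
    pred = {v: [] for v in cost}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    order = list(TopologicalSorter(pred).static_order())  # predecessors come first
    # Longest path from the entry up to and including v.
    upto = {}
    for v in order:
        upto[v] = max((upto[u] for u in pred[v]), default=0) + cost[v]
    # Longest path from just after v to the exit.
    after = {}
    for v in reversed(order):
        after[v] = max((cost[s] + after[s] for s in succ.get(v, [])), default=0)
    wcet = upto[exit_node]
    return {v: (upto[v] + after[v]) / wcet for v in cost}


succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cost = {"A": 1, "B": 5, "C": 2, "D": 1}
crit = criticality(succ, cost, "D")
print({v: round(c, 3) for v, c in crit.items()})  # {'A': 1.0, 'B': 1.0, 'C': 0.571, 'D': 1.0}
```

Block C is not on the WCEP, so a WCET report alone says nothing about it; its criticality of about 0.57 shows how far it would have to grow before it mattered.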

17.

Streaming Implementation of High-Definition H.264 Transform Coding

Huayou Su Nan Wu Mei Wen Ju Ren Wei Wu Chunyuan Zhang 《Computer Engineering & Science》2011,33(8):148

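As the new-generation video coding standard, H.264 offers very good performance but has high computational complexity. Storm is a high-efficiency stream processor aimed at media applications and signal processing, with good prospects in media processing. To meet the computational demands of H.264, this paper presents a streaming implementation of high-definition H.264 (1080p) transform coding on the Storm-SP16 G160 stream processor. Guided by the data-flow characteristics of the different algorithms, and walking through the concrete streaming process, the paper describes streaming techniques such as parallel granularity selection, data-stream organization, and normalization in detail. Experimental results show that the streaming implementation of the encoder performs very well, and that accelerating the whole program at this coding efficiency would meet real-time requirements. The work provides a program acceleration method that differs from hardware acceleration and offers a useful reference for mapping other media applications onto stream processors.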
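
For reference, the arithmetic at the heart of H.264 transform coding is the 4x4 forward integer core transform W = C_f · X · C_f^T, whose post-scaling is folded into quantization. The Python sketch below implements that textbook formula with plain lists. It is not the paper's Storm stream kernel, and the paper's granularity and data-layout choices are not reproduced. Because each 4x4 residual block is transformed independently, blocks are a natural unit of data parallelism for a streaming mapping.

```python
# H.264 4x4 forward core transform matrix C_f.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    """4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """W = Cf * X * Cf^T for one 4x4 residual block; scaling is left to quantization."""
    return matmul(matmul(CF, block), transpose(CF))


flat = [[1] * 4 for _ in range(4)]
w = forward_core_transform(flat)
print(w[0], w[1])  # [16, 0, 0, 0] [0, 0, 0, 0]
```

A flat block concentrates all its energy in the DC coefficient, which is the behaviour the transform is designed to produce for smooth image regions.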

18.

A Data-Stream Processing Algorithm for Non-Interference Testing

Lili Wang Huihua Jin Jiong Zhang Lihong Shang 《Application Research of Computers》2007,24(1):127-130

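Non-interference testing (NIT) of embedded software [1] is a white-box testing method that does not instrument the software under test. NIT analyzes the data stream obtained by capturing processor bus data while the software under test runs, and uses it to test and evaluate that software [1]. The key problem in NIT is how to analyze the processor bus data stream in real time to recover the instruction sequence that was actually executed. To this end, we propose a general real-time data-stream analysis algorithm, the sliding-window analysis algorithm, and discuss its correctness, complexity, and engineering implementation.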

19.

TreadMarks: shared memory computing on networks of workstations

Amza C. Cox A.L. Dwarkadas S. Keleher P. Honghui Lu Rajamony R. Weimin Yu Zwaenepoel W. 《Computer》1996,29(2):18-28

Shared memory facilitates the transition from sequential to parallel processing. Since most data structures can be retained, simply adding synchronization achieves correct, efficient programs for many applications. We discuss our experience with parallel computing on networks of workstations using the TreadMarks distributed shared memory (DSM) system. DSM allows processes to assume a globally shared virtual memory even though they execute on nodes that do not physically share memory. We illustrate a DSM system consisting of N networked workstations, each with its own memory. The DSM software provides the abstraction of a globally shared memory, in which each processor can access any data item without the programmer having to worry about where the data is or how to obtain its value.

20.

Research on the Encoding Properties of GIOP's CDR and Their Application

Fei Xue Qingwei Du 《Journal of Computer Applications》2003,23(12):48-51

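We first describe the CDR boundary alignment rules mathematically and discuss TypeCode encoding issues in several open-source Java ORB systems. We then state and prove two mathematical properties exhibited by CDR-encoded byte streams when they are copied, and use these properties to propose an efficient, improved method for copying CDR-encoded byte streams. Performance tests of the method before and after the improvement, and without boundary alignment, show that the improved scheme works well.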
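
The CDR alignment rule can be sketched in Python: a primitive of n bytes (1, 2, 4, or 8) starts at an offset that is a multiple of n, and padding is inserted as needed. The sketch assumes alignment is measured from the start of the stream and fixes little-endian byte order for simplicity, whereas real CDR carries a byte-order flag. It illustrates one property that follows directly from the rule: an encoded stream can be copied verbatim to a new start offset without re-encoding if that offset is a multiple of 8. This is not claimed to be either of the two properties proved in the paper.

```python
import struct

# IDL primitive -> struct format; "<" fixes little-endian with no native padding.
FORMATS = {"octet": "<B", "short": "<h", "long": "<i", "double": "<d"}


def padding(offset, size):
    """Padding bytes needed so a primitive of `size` bytes starts on a multiple of `size`."""
    return (-offset) % size


def encode(fields):
    """Encode (kind, value) pairs CDR-style; return the bytes and each field's (offset, size)."""
    buf = bytearray()
    layout = []
    for kind, value in fields:
        fmt = FORMATS[kind]
        size = struct.calcsize(fmt)
        buf += bytes(padding(len(buf), size))  # zero padding bytes
        layout.append((len(buf), size))
        buf += struct.pack(fmt, value)
    return bytes(buf), layout


def still_aligned(layout, shift):
    """Would every field remain aligned if the stream were copied to start at `shift`?"""
    return all((offset + shift) % size == 0 for offset, size in layout)


data, layout = encode([("octet", 1), ("long", 7), ("double", 2.5)])
print(len(data), layout)                                   # 16 [(0, 1), (4, 4), (8, 8)]
print(still_aligned(layout, 8), still_aligned(layout, 4))  # True False
```

When the target offset is not suitably aligned, the stream cannot simply be copied; the padding must be recomputed, which is the costly case a smarter copying method tries to avoid.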