期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

高红玉赵学锋王占华《计算机工程与应用》2012,48(13):89-93

连通支配集（CDS）在无线网络设计中有着广泛应用,现有多数连通支配集算法每次处理一个节点。提出了一个同时处理多个节点的贪心算法（GCDS）,依次选取最小度数节点以及该节点两跳内的一至两个节点为处理节点,当删除处理节点后剩余点不连通时减少处理的节点数,进而把节点分为支配点和受支配点;最终所有支配点构成一个近似最小连通支配集。在模拟无线传感器网络的单位圆盘图上的仿真结果表明,GCDS算法具有较低的时间复杂度,所得到的连通支配集大小优于已有算法。相似文献

2.

Improving performance through deep value profiling and specialization with code transformation

Minhaj Ahmad Khan 《Computer Languages, Systems and Structures》2011,37(4):193-203

Specialization of code is used to improve the performance of the applications. However, specialization based on ineffective profiles deteriorates the performance. Existing value profiling algorithms are not yet able to address the issue of code size explosion incurred due to specialization of code. This problem can be mitigated by capturing data through profiling that would be useful for specialization of code with minimum code size.In this article, we present an approach to optimize code through value profiling and specialization with code transformation. The values of the parameters selected through an analysis of code are captured in the intervals which are automatically adapted to dynamic behavior of the application. The code is then specialized based on value profiles. The specialized code contains optimizations and may be converted back to the generalized code through a transformation. This approach facilitates the code to obtain optimizations through specialization with minimum size, and no runtime overhead.Using this approach, the experiments performed on Itanium-II (IA-64) architecture with icc compiler v 9.0 show a significant improvement in the performance of the SPEC 2000 benchmarks. 相似文献

3.

Energy minimization with loop fusion and multi-functional-unit scheduling for multidimensional DSP

Meikang Qiu Edwin H.-M. Sha Meilin Liu Man Lin Shaoxiong Hua Laurence T. Yang 《Journal of Parallel and Distributed Computing》2008

Energy saving is becoming one of the major design issues in processor architectures with multiple functional units (FUs). Nested loops are usually the most critical part in multimedia and high-performance DSP systems. There is a tradeoff between power saving and performance, such as timing constraint and code size requirement, of nested loops. This paper studies how to minimize the total energy while satisfying performance requirement for applications with multidimensional nested loops. An algorithm, energy minimization with loop fusion and FU schedule (EMLFS), is proposed. We first use retiming and partition to fuse nested loops. Then we use novel FU scheduling algorithms to maximize energy saving without sacrificing performance. The experimental results show that the average improvement on energy saving is significant by using our EMLFS algorithm. 相似文献

4.

High-Level Parallel Ant Colony Optimization with Algorithmic Skeletons

de Melo Menezes Breno A. Herrmann Nina Kuchen Herbert Buarque de Lima Neto Fernando 《International journal of parallel programming》2021,49(6):776-801

Parallel implementations of swarm intelligence algorithms such as the ant colony optimization (ACO) have been widely used to shorten the execution time when solving complex optimization problems. When aiming for a GPU environment, developing efficient parallel versions of such algorithms using CUDA can be a difficult and error-prone task even for experienced programmers. To overcome this issue, the parallel programming model of Algorithmic Skeletons simplifies parallel programs by abstracting from low-level features. This is realized by defining common programming patterns (e.g. map, fold and zip) that later on will be converted to efficient parallel code. In this paper, we show how algorithmic skeletons formulated in the domain specific language Musket can cope with the development of a parallel implementation of ACO and how that compares to a low-level implementation. Our experimental results show that Musket suits the development of ACO. Besides making it easier for the programmer to deal with the parallelization aspects, Musket generates high performance code with similar execution times when compared to low-level implementations.

相似文献

5.

Optimal Orthogonal Tiling of 2-D Iterations

Rumen Andonov Sanjay Rajopadhye 《Journal of Parallel and Distributed Computing》1997,45(2):1324

Iteration space tiling is a common strategy used by parallelizing compilers and in performance tuning of parallel codes. We address the problem of determining the tile size that minimizes the total execution time. We restrict our attention to uniform dependency computations with two-dimensional, parallelogram-shaped iteration domain which can be tiled with lines parallel to the domain boundaries. The target architecture is a linear array (or a ring). Our model is developed in two steps. We first abstract each tile by two simple parameters, namely tile periodP_tand intertile latencyL_t. We formulate and partially resolve the corresponding optimization problem independent of the machine and program. Next, we refine the model with realistic machine and program parameters, yielding a discrete nonlinear optimization problem. We solve this analytically, yielding a closed form solution, which can be used by a compiler before code generation. 相似文献

6.

Refactoring techniques for aggressive object inlining in Java applications

Yosi Ben?Asher Tomer Gal Gadi Haber Marcel Zalmanovici 《Automated Software Engineering》2012,19(1):97-136

Object Inlining (OI) is a known optimization in object oriented programming in which referenced objects of class B are inlined into their referencing objects of class A by making all fields and methods of class B part of class A. The optimization saves all the new operations of B type objects from class A and at the same time replaces all indirect accesses, from A to fields of B, by direct accesses. To the best of our knowledge, in-spite of the significant performance potential of the OI optimization, reported performance measurements were relatively moderate. This is because an aggressive OI optimization requires complex analysis and code transformations to overcome problems like multiple references to the inlinable object, object references that escape their object scope, etc. 相似文献

7.

Simulation tools to study a distributed shared memory for clusters of symmetric multiprocessors

《Future Generation Computer Systems》2006,22(1-2):57-66

Distributed shared memory (DSM) systems have become popular as a means of utilizing clusters of computers for solving large applications. We have developed a high-performance DSM, Strings. In addition, to improve the performance of our DSM, a memory hierarchy simulator has been developed that allows us to compare various techniques very quickly and with much less effort. This paper describes our simulator, DSMSim. We show that the simulator's performance closely matches the real system and demonstrate potential performance gains of up to 60% after adding optimization features to the simulator. The simulator also accepts the same code as the software distributed shared memory. 相似文献

8.

Translation and Run-Time Validation of Loop Transformations

Lenore?Zuck Email author Amir?Pnueli Benjamin?Goldberg Clark?Barrett Yi?Fang Ying?Hu 《Formal Methods in System Design》2005,27(3):335-360

This paper presents new approaches to the validation of loop optimizations that compilers use to obtain the highest performance from modern architectures. Rather than verify the compiler, the approach of translation validationperforms a validation check after every run of the compiler, producing a formal proof that the produced target code is a correct implementation of the source code. As part of an active and ongoing research project on translation validation, we have previously described approaches for validating optimizations that preserve the loop structure of the code and have presented a simulation-based general technique for validating such optimizations. In this paper, for more aggressive optimizations that alter the loop structure of the code—such as distribution, fusion, tiling, and interchange—we present a set of permutation ruleswhich establish that the transformed code satisfies all the implied data dependencies necessary for the validity of the considered transformation. We describe the extensions to our tool voc-64 which are required to validate these structure-modifying optimizations. This paper also discusses preliminary work on run-time validation of speculative loop optimizations. This involves using run-time tests to ensure the correctness of loop optimizations whose correctness cannot be guaranteed at compile time. Unlike compiler validation, run-time validation must not only determine when an optimization has generated incorrect code, but also recover from the optimization without aborting the program or producing an incorrect result. This technique has been applied to several loop optimizations, including loop interchange and loop tiling, and appears to be quite promising. This research was supported in part by NSF grant CCR-0098299, ONR grant N00014-99-1-0131, and the John von Neumann Minerva Center for Verification of Reactive Systems. 相似文献

9.

Translation and Run-Time Validation of Optimized Code

Lenore Zuck Amir Pnueli Yi Fang Benjamin Goldberg Ying Hu 《Electronic Notes in Theoretical Computer Science》2002,70(4):179-200

The paper presents approaches to the validation of optimizing compilers. The emphasis is on aggressive and architecture-targeted optimizations which try to obtain the highest performance from modern architectures, in particular EPIC-like micro-processors. Rather than verify the compiler, the approach of translation validation performs a validation check after every run of the compiler, producing a formal proof that the produced target code is a correct implementation of the source code.First we survey the standard approach to validation of optimizations which preserve the loop structure of the code (though they may move code in and out of loops and radically modify individual statements), present a simulation-based general technique for validating such optimizations, and describe a tool, VOC-64, which implements these technique. For more aggressive optimizations which, typically, alter the loop structure of the code, such as loop distribution and fusion, loop tiling, and loop interchanges, we present a set of permutation rules which establish that the transformed code satisfies all the implied data dependencies necessary for the validity of the considered transformation. We describe the necessary extensions to the VOC-64 in order to validate these structure-modifying optimizations.Finally, the paper discusses preliminary work on run-time validation of speculative loop optimizations, that involves using run-time tests to ensure the correctness of loop optimizations which neither the compiler nor compiler-validation techniques can guarantee the correctness of. Unlike compiler validation, run-time validation has not only the task of determining when an optimization has generated incorrect code, but also has the task of recovering from the optimization without aborting the program or producing an incorrect result. This technique has been applied to several loop optimizations, including loop interchange, loop tiling, and software pipelining and appears to be quite promising. 相似文献

10.

基于网络编码的传感网重编程协议能耗模型

张艳张羽李君伟夏先进李士宁《软件学报》2013,24(S1):125-133

在无线传感器网络中,当需要进行软件更新、修复软件Bug 时,重编程协议能够将新程序镜像分发到多跳网络中的全部节点.近来,基于网络编码的重编程协议被用于解决在高损环境下的有效代码分发问题,但对这些协议性能的分析仍有待深入.提出一种基于时间和网络拓扑的能耗分析模型,在模型中综合考虑了页面流水和节点通信距离对协议性能的影响.该模型分析结果与基于网络编码的重编程协议Rateless Deluge 的仿真结果达到了较好的一致性(平均单节点能耗相对误差在0.23%左右),验证了能耗分析模型的有效性.分析结果揭示了基于网络编码重编程过程的网络能耗与网络密度、网络大小和分发镜像页面大小的关系:网络能耗随着网络密度的减小呈上升趋势(能耗与网络密度经拟合呈二次函数关系);网络能耗与网络大小基本呈线性正相关关系(对于n×n 的网格型网络,n从3~10,随着n 的增大,网络能耗的平均增长率为50%),并且随着网络的增大,单节点能耗有所增加(平均增长率为6.9%);页面大小增加时,网络能耗呈下降趋势(平均下降率为11.2%). 相似文献