首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 140 毫秒
1.
连通支配集(CDS)在无线网络设计中有着广泛应用,现有多数连通支配集算法每次处理一个节点。提出了一个同时处理多个节点的贪心算法(GCDS),依次选取最小度数节点以及该节点两跳内的一至两个节点为处理节点,当删除处理节点后剩余点不连通时减少处理的节点数,进而把节点分为支配点和受支配点;最终所有支配点构成一个近似最小连通支配集。在模拟无线传感器网络的单位圆盘图上的仿真结果表明,GCDS算法具有较低的时间复杂度,所得到的连通支配集大小优于已有算法。  相似文献   

2.
Specialization of code is used to improve the performance of the applications. However, specialization based on ineffective profiles deteriorates the performance. Existing value profiling algorithms are not yet able to address the issue of code size explosion incurred due to specialization of code. This problem can be mitigated by capturing data through profiling that would be useful for specialization of code with minimum code size.In this article, we present an approach to optimize code through value profiling and specialization with code transformation. The values of the parameters selected through an analysis of code are captured in the intervals which are automatically adapted to dynamic behavior of the application. The code is then specialized based on value profiles. The specialized code contains optimizations and may be converted back to the generalized code through a transformation. This approach facilitates the code to obtain optimizations through specialization with minimum size, and no runtime overhead.Using this approach, the experiments performed on Itanium-II (IA-64) architecture with icc compiler v 9.0 show a significant improvement in the performance of the SPEC 2000 benchmarks.  相似文献   

3.
Energy saving is becoming one of the major design issues in processor architectures with multiple functional units (FUs). Nested loops are usually the most critical part in multimedia and high-performance DSP systems. There is a tradeoff between power saving and performance, such as timing constraint and code size requirement, of nested loops. This paper studies how to minimize the total energy while satisfying performance requirement for applications with multidimensional nested loops. An algorithm, energy minimization with loop fusion and FU schedule (EMLFS), is proposed. We first use retiming and partition to fuse nested loops. Then we use novel FU scheduling algorithms to maximize energy saving without sacrificing performance. The experimental results show that the average improvement on energy saving is significant by using our EMLFS algorithm.  相似文献   

4.

Parallel implementations of swarm intelligence algorithms such as the ant colony optimization (ACO) have been widely used to shorten the execution time when solving complex optimization problems. When aiming for a GPU environment, developing efficient parallel versions of such algorithms using CUDA can be a difficult and error-prone task even for experienced programmers. To overcome this issue, the parallel programming model of Algorithmic Skeletons simplifies parallel programs by abstracting from low-level features. This is realized by defining common programming patterns (e.g. map, fold and zip) that later on will be converted to efficient parallel code. In this paper, we show how algorithmic skeletons formulated in the domain specific language Musket can cope with the development of a parallel implementation of ACO and how that compares to a low-level implementation. Our experimental results show that Musket suits the development of ACO. Besides making it easier for the programmer to deal with the parallelization aspects, Musket generates high performance code with similar execution times when compared to low-level implementations.

  相似文献   

5.
Iteration space tiling is a common strategy used by parallelizing compilers and in performance tuning of parallel codes. We address the problem of determining the tile size that minimizes the total execution time. We restrict our attention to uniform dependency computations with two-dimensional, parallelogram-shaped iteration domain which can be tiled with lines parallel to the domain boundaries. The target architecture is a linear array (or a ring). Our model is developed in two steps. We first abstract each tile by two simple parameters, namely tile periodPtand intertile latencyLt. We formulate and partially resolve the corresponding optimization problem independent of the machine and program. Next, we refine the model with realistic machine and program parameters, yielding a discrete nonlinear optimization problem. We solve this analytically, yielding a closed form solution, which can be used by a compiler before code generation.  相似文献   

6.
Object Inlining (OI) is a known optimization in object oriented programming in which referenced objects of class B are inlined into their referencing objects of class A by making all fields and methods of class B part of class A. The optimization saves all the new operations of B type objects from class A and at the same time replaces all indirect accesses, from A to fields of B, by direct accesses. To the best of our knowledge, in-spite of the significant performance potential of the OI optimization, reported performance measurements were relatively moderate. This is because an aggressive OI optimization requires complex analysis and code transformations to overcome problems like multiple references to the inlinable object, object references that escape their object scope, etc.  相似文献   

7.
Distributed shared memory (DSM) systems have become popular as a means of utilizing clusters of computers for solving large applications. We have developed a high-performance DSM, Strings. In addition, to improve the performance of our DSM, a memory hierarchy simulator has been developed that allows us to compare various techniques very quickly and with much less effort. This paper describes our simulator, DSMSim. We show that the simulator's performance closely matches the real system and demonstrate potential performance gains of up to 60% after adding optimization features to the simulator. The simulator also accepts the same code as the software distributed shared memory.  相似文献   

8.
This paper presents new approaches to the validation of loop optimizations that compilers use to obtain the highest performance from modern architectures. Rather than verify the compiler, the approach of translation validationperforms a validation check after every run of the compiler, producing a formal proof that the produced target code is a correct implementation of the source code. As part of an active and ongoing research project on translation validation, we have previously described approaches for validating optimizations that preserve the loop structure of the code and have presented a simulation-based general technique for validating such optimizations. In this paper, for more aggressive optimizations that alter the loop structure of the code—such as distribution, fusion, tiling, and interchange—we present a set of permutation ruleswhich establish that the transformed code satisfies all the implied data dependencies necessary for the validity of the considered transformation. We describe the extensions to our tool voc-64 which are required to validate these structure-modifying optimizations. This paper also discusses preliminary work on run-time validation of speculative loop optimizations. This involves using run-time tests to ensure the correctness of loop optimizations whose correctness cannot be guaranteed at compile time. Unlike compiler validation, run-time validation must not only determine when an optimization has generated incorrect code, but also recover from the optimization without aborting the program or producing an incorrect result. This technique has been applied to several loop optimizations, including loop interchange and loop tiling, and appears to be quite promising. This research was supported in part by NSF grant CCR-0098299, ONR grant N00014-99-1-0131, and the John von Neumann Minerva Center for Verification of Reactive Systems.  相似文献   

9.
The paper presents approaches to the validation of optimizing compilers. The emphasis is on aggressive and architecture-targeted optimizations which try to obtain the highest performance from modern architectures, in particular EPIC-like micro-processors. Rather than verify the compiler, the approach of translation validation performs a validation check after every run of the compiler, producing a formal proof that the produced target code is a correct implementation of the source code.First we survey the standard approach to validation of optimizations which preserve the loop structure of the code (though they may move code in and out of loops and radically modify individual statements), present a simulation-based general technique for validating such optimizations, and describe a tool, VOC-64, which implements these technique. For more aggressive optimizations which, typically, alter the loop structure of the code, such as loop distribution and fusion, loop tiling, and loop interchanges, we present a set of permutation rules which establish that the transformed code satisfies all the implied data dependencies necessary for the validity of the considered transformation. We describe the necessary extensions to the VOC-64 in order to validate these structure-modifying optimizations.Finally, the paper discusses preliminary work on run-time validation of speculative loop optimizations, that involves using run-time tests to ensure the correctness of loop optimizations which neither the compiler nor compiler-validation techniques can guarantee the correctness of. Unlike compiler validation, run-time validation has not only the task of determining when an optimization has generated incorrect code, but also has the task of recovering from the optimization without aborting the program or producing an incorrect result. This technique has been applied to several loop optimizations, including loop interchange, loop tiling, and software pipelining and appears to be quite promising.  相似文献   

10.
张艳  张羽  李君伟  夏先进  李士宁 《软件学报》2013,24(S1):125-133
在无线传感器网络中,当需要进行软件更新、修复软件Bug 时,重编程协议能够将新程序镜像分发到多跳网络中的全部节点.近来,基于网络编码的重编程协议被用于解决在高损环境下的有效代码分发问题,但对这些协议性能的分析仍有待深入.提出一种基于时间和网络拓扑的能耗分析模型,在模型中综合考虑了页面流水和节点通信距离对协议性能的影响.该模型分析结果与基于网络编码的重编程协议Rateless Deluge 的仿真结果达到了较好的一致性(平均单节点能耗相对误差在0.23%左右),验证了能耗分析模型的有效性.分析结果揭示了基于网络编码重编程过程的网络能耗与网络密度、网络大小和分发镜像页面大小的关系:网络能耗随着网络密度的减小呈上升趋势(能耗与网络密度经拟合呈二次函数关系);网络能耗与网络大小基本呈线性正相关关系(对于n×n 的网格型网络,n从3~10,随着n 的增大,网络能耗的平均增长率为50%),并且随着网络的增大,单节点能耗有所增加(平均增长率为6.9%);页面大小增加时,网络能耗呈下降趋势(平均下降率为11.2%).  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号