期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Experiences with Cooperating Register Allocation and Instruction Scheduling

Cindy Norris Lori L. Pollock 《International journal of parallel programming》1998,26(3):241-283

Compile-time reordering of low level instructions is successful in achieving large increases in performance of programs on fine grain parallel machines. However, because of the interdependences between instruction scheduling and register allocation, a lack of cooperation between the scheduler and register allocator can result in generating code that contains excess register spills and/or a lower degree of parallelism than actually achievable. This paper describes a strategy for providing cooperation between register allocation and both global and local instruction scheduling. We experimentally compare this strategy with other cooperative and uncooperative scenarios. 相似文献

2.

基于Trace合并和寄存器分配的Dalvik优化

余超君李春强尚云海张培勇《计算机工程》2014,(10)

Dalvik虚拟机作为Android系统上运行所有应用程序的基础,其性能瓶颈一直制约着Android系统的用户体验。通过研究Android系统中的Dalvik架构,分析其解释器和JIT模块的工作原理,发现热Trace选择过程中短Trace编译损耗大以及即时编译过程中寄存器分配不合理的情况。结合Java虚拟机技术和编译器技术,在现有热Trace选择和寄存器分配机制的基础上,提出基于Trace合并和寄存器分配的优化算法,在国产高性能嵌入式CPU CSKY体系下移植Dalvik虚拟机并实现了上述优化算法。通过实验证明优化后Dalvik执行Java程序的性能提高了近10%。相似文献

3.

Sorting and Counting Networks of Arbitrary Width and Small Depth

C. Busch M. Herlihy 《Theory of Computing Systems》2002,35(2):99-128

We present the first construction for sorting and counting networks of arbitrary width that requires both small depth and small constant factors in the depth expression. Let w be the product w = p ₀ ⋅ p ₁ ⋅s p _n-1 , whose factors are not necessarily prime. We present a novel network construction of width w and depth O(n ² ) = O(log ² w) , using comparators (or balancers) of width less than or equal to max(p _i ) . This construction is practical in the sense that the asymptotic notation does not hide any large constants. An interesting aspect of this construction is that it establishes a family of sorting and counting networks of width w , one for each distinct factorization of w . A factorization in which max(p _i ) is large and n is small yields a network that trades small depth for large comparators (or balancers), and a factorization where max(p _i ) is small and n is large makes the opposite tradeoff. Received June 18, 2001. Online publication October 30, 2001. 相似文献

4.

Efficient Register Mapping and Allocation in LaTTe, an Open-Source Java Just-in-Time Compiler

Byung-Sun Yang Junpyo Lee SeungIl Lee Seongbae Park Yoo C. Chung Suhyun Kim Kemal Ebcioglu Erik Altman Soo-Mook Moon 《Parallel and Distributed Systems, IEEE Transactions on》2007,18(1):57-69

Java just-in-time (JIT) compilers improve the performance of a Java virtual machine (JVM) by translating Java bytecode into native machine code on demand. One important problem in Java JIT compilation is how to map stack entries and local variables to registers efficiently and quickly, since register-based computations are much faster than memory-based ones, while JIT compilation overhead is part of the whole running time. This paper introduces LaTTe, an open-source Java JIT compiler that performs fast generation of efficiently register-mapped RISC code. LaTTe first maps "all" local variables and stack entries into pseudoregisters, followed by real register allocation which also coalesces copies corresponding to pushes and pops between local variables and stack entries aggressively. Our experimental results indicate that LaTTe's sophisticated register mapping and allocation really pay off, achieving twice the performance of a naive JIT compiler that maps all local variables and stack entries to memory. It is also shown that LaTTe makes a reasonable trade-off between quality and speed of register mapping and allocation for the bytecode. We expect these results will also be beneficial to parallel and distributed Java computing: 1) by enhancing single-thread Java performance; and 2) by significantly reducing the number of memory accesses which the rest of the system must properly order to maintain coherence and keep threads synchronized 相似文献

5.

ZigBee地址分配算法及树路由算法改进

尹甲别红霞《软件》2013,(1):129-132

ZigBee使用的分布式地址分配算法(DAAM)为节点分配地址时没有考虑网络拓扑结构的变化。这就造成了地址空间的严重浪费,使得节点入网成功率降低。同时基于DAAM机制的树路由算法没有考虑节点的负载,负载不均衡将导致网络分割的提前到来。本文提出一种改进的分布式地址分配算法和基于它的负载均衡的树路由算法。改进的地址分配算法通过获取邻居节点的地址空间从而提高节点入网成功率。改进的树路由算法可以均衡节点能耗,延长网络寿命。相似文献

6.

Optimal and Near-optimal Global Register Allocation Using 0–1 Integer Programming

DAVID W. GOODWIN KENT D. WILKEN 《Software》1996,26(8):929-965

This paper presents a fundamentally new approach to global register allocation that optimally allocates registers and optimally places spill code, significantly decreasing spill code overhead compared with the traditional graph-coloring approach. The Optimal Register Allocation (ORA) approach formulates global register allocation as a 0–1 integer programming problem, incorporating all aspects of register allocation within a unified framework, including copy elimination, live range splitting, rematerialization, callee and caller register spilling, special instruction-operand requirements, and paired registers. A prototype ORA allocator is built into the Gnu C Compiler (GCC). For the SPEC92 integer benchmarks, the ORA allocator actually produces a net decrease of more than 100 million cycles across the entire benchmark set, because the dynamic copies the ORA allocator removes exceed the dynamic loads and stores that are inserted. In contrast, the GCC allocator and a Chaitin-style graph-coloring allocator each cause a net increase of more than 1 billion cycles. Because global register allocation is NP-complete, optimal register allocation has been considered intractable. However, the run-time complexity of the ORA approach is shown experimentally to be O(n³). A profile-guided hybrid allocation approach is proposed that uses the ORA allocator for the performance critical regions in the performance critical functions, while using a graph-coloring allocator for the non-critical functions and regions. An ORA-GCC hybrid allocator takes an average of 4.6 seconds per function to produce an allocation that is within 1% of optimal for 97% of the SPEC92 integer benchmark functions, showing that the hybrid allocator is practical as an advanced optimization for performance-critical codes. 相似文献

7.

小蜂窝云中功率与负载的联合优化分配算法

《计算机工程》2017,(8):49-55

移动终端资源有限及本地服务基站资源不足会引起移动终端体验质量降低、卸载任务时延长的问题。为此,提出一种新的联合优化分配算法。基于小蜂窝信道质量和剩余可用计算资源建立小蜂窝云(SCC),按照信道质量和剩余可用计算资源分配负载(卸载任务)到SCC,并采用启发式算法求解发送功率的次优解。仿真结果表明,该算法在小蜂窝云计算场景中能提高无线与计算资源的利用率,同时提升用户的体验质量。相似文献

8.

Swapping a Failing Edge of a Single Source Shortest Paths Tree Is Good and Fast

Nardelli Proietti Widmayer 《Algorithmica》2003,35(1):56-74

Abstract. Let G=(V,E) be a 2-edge connected, undirected and nonnegatively weighted graph, and let S(r) be a single source shortest paths tree (SPT) of G rooted at r ∈ V . Whenever an edge e in S(r) fails, we are interested in reconnecting the nodes now disconnected from the root by means of a single edge e' crossing the cut created by the removal of e . Such an edge e' is named a swap edge for e . Let S _e/e' (r) be the swap tree (no longer an SPT, in general) obtained by swapping e with e' , and let S _e be the set of all possible swap trees with respect to e . Let F be a function defined over S _e that expresses some feature of a swap tree, such as the average length of a path from the root r to all the nodes below edge e , or the maximum length, or one of many others. A best swap edge for e with respect to F is a swap edge f such that F(S _e/f (r)) is minimum. In this paper we present efficient algorithms for the problem of finding a best swap edge, for each edge e of S(r) , with respect to several objectives. Our work is motivated by a scenario in which individual connections in a communication network suffer transient failures. As a consequence of an edge failure, the shortest paths to all the nodes below the failed edge might completely change, and it might be desirable to avoid an expensive switch to a new SPT, because the failure is only temporary. As an aside, what we get is not even far from a new SPT: our analysis shows that the trees obtained from the swapping have features very similar to those of the corresponding SPTs rebuilt from scratch. 相似文献

9.

基于小程序的在线少儿英语学习平台的研究与设计

《软件》2019,(5):249-252

随着信息技术的发展,基于互联网背景下的"在线学习"作为一种教学平台的发展显得尤为迅速,它已成为现代教育教学非常重要的一部分。在线学习不但是成人学习的一种方式,它已经进入到少儿早教学习中。本文研究了少儿在线英语学习小程序的设计,以及后台管理程序的实现。完成了外教语音的在线播放、对应课文的查看、跟读、课程购买等前端用户功能的小程序,以及使用php+mysql实现的英文课本的上架、微信会员用户的管理、音频上传、课程解锁等后台管理功能。相似文献