1.

Reuse-Driven Tiling for Improving Data Locality

Jingling Xue Chua-Huang Huang 《International journal of parallel programming》1998,26(6):671-696

This paper applies unimodular transformations and tiling to improve the data locality of a loop nest. Because of data dependences and reuse information, not all dimensions of the iteration space will or can be tiled. By using cones to represent data dependences and vector spaces to quantify data reuse in the program, a reuse-driven transformational approach is presented. It aims to maximize the amount of data reuse carried in the tiled dimensions of the iteration space while keeping the number of tiled dimensions to a minimum (to reduce loop control overhead). In the special case of a single fully permutable loop nest, an algorithm is presented that tiles the program optimally so that all data reuse is carried in the tiled dimensions. In the general case of multiple fully permutable loop nests, data dependences can prevent all data reuse from being carried in the tiled dimensions. For this case, an algorithm is presented that localizes data reuse in the tiled dimensions so that the localized reuse space has the largest dimensionality possible.
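As a rough illustration of tiling only some dimensions of an iteration space, the sketch below tiles the `j` and `k` loops of a matrix multiplication and leaves `i` untiled. It is not the paper's reuse-driven algorithm: the kernel, the function names `matmul_naive` and `matmul_tiled`, and the `tile` parameter are assumptions chosen for illustration.

```python
def matmul_naive(a, b, n):
    """Reference triple loop: c[i][j] += a[i][k] * b[k][j]."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile=2):
    """Same computation with only the j and k dimensions tiled.

    The jj/kk loops walk over tile origins. For each tile, the block
    b[kk:kk+tile][jj:jj+tile] is reused across every i iteration, so
    that reuse is carried inside the tiled dimensions.
    """
    c = [[0] * n for _ in range(n)]
    for jj in range(0, n, tile):
        for kk in range(0, n, tile):
            for i in range(n):
                for j in range(jj, min(jj + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        c[i][j] += a[i][k] * b[k][j]
    return c
```

Both functions perform the same additions per output element in the same `k` order, so integer inputs give identical results; only the traversal order across elements changes.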

2.

Restructuring Computations for Temporal Data Cache Locality

Pingali Venkata K. McKee Sally A. Hsieh Wilson C. Carter John B. 《International journal of parallel programming》2003,31(4):305-338

Data access costs contribute significantly to the execution time of applications with complex data structures. As the latency of memory accesses becomes high relative to processor cycle times, application performance is increasingly limited by memory performance. In some situations it is useful to trade increased computation costs for reduced memory costs. The contributions of this paper are three-fold: we provide a detailed analysis of the memory performance of seven memory-intensive benchmarks; we describe Computation Regrouping, a source-level approach to improving the performance of memory-bound applications by increasing temporal locality to eliminate cache and TLB misses; and we demonstrate significant performance improvement by applying Computation Regrouping to our suite of seven benchmarks. Using Computation Regrouping, we observe a geometric mean speedup of 1.90, with individual speedups ranging from 1.26 to 3.03. Most of this improvement comes from eliminating memory stall time.

3.

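A Locality-Preserving Data Domain Description One-Class Classifier

郑建炜 蒋一波 王万良 《计算机科学》(Computer Science) 2011,38(11):208-212

Because they ignore the structural information of the data, existing domain-description one-class classifiers often obtain a suboptimal supporting surface. Building on the support vector data description (SVDD) algorithm, this paper introduces a data affinity factor in a simple form to preserve the local characteristics of the samples, and proposes a locality-preserving data domain description classifier (LPDD). The influence of clustered data is strengthened while that of sparsely scattered data is weakened, which guides the classification supporting surface toward the high-density regions of the data and improves performance. In addition, to suit large-sample applications, the sequential minimal optimization algorithm is used to adjust the model parameters. Experiments show that the proposed algorithm outperforms SVDD in both training speed and classification performance.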

4.

The Data Locality of Work Stealing (total citations: 1; self-citations: 0; citations by others: 1)

Umut A. Acar Guy E. Blelloch Robert D. Blumofe 《Theory of Computing Systems》2002,35(3):321-347

This paper studies the data locality of the work-stealing scheduling algorithm on hardware-controlled shared-memory machines, where movement of data to and from the cache is solely controlled by the hardware. We present lower and upper bounds on the number of cache misses when using work stealing, and introduce a locality-guided work-stealing algorithm and its experimental validation. As a lower bound, we show that a work-stealing application that exhibits good data locality on a uniprocessor may exhibit poor data locality on a multiprocessor. In particular, we show a family of multithreaded computations $G_n$ whose members perform $\Theta(n)$ operations (work) and incur a constant number of cache misses on a uniprocessor, while even on two processors the total number of cache misses soars to $\Omega(n)$. On the other hand, we show a tight upper bound on the number of cache misses that nested-parallel computations, a large, important class of computations, incur due to multiprocessing. In particular, for nested-parallel computations, we show that on $P$ processors a multiprocessor execution incurs an expected $O(C \lceil m/s \rceil P T_\infty)$ more misses than the uniprocessor execution. Here $m$ is the execution time of an instruction incurring a cache miss, $s$ is the steal time, $C$ is the size of the cache, and $T_\infty$ is the number of nodes on the longest chain of dependencies. Based on this we give strong execution time bounds for nested-parallel computations using work stealing. For the second part of our results, we present a locality-guided work-stealing algorithm that improves the data locality of multithreaded computations by allowing a thread to have an affinity for a processor. Our initial experiments on iterative data-parallel applications show that the algorithm matches the performance of static partitioning under traditional workloads but improves performance by up to 50% over static partitioning under multiprogrammed workloads. Furthermore, locality-guided work stealing improves the performance of work stealing by up to 80%.
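For readers unfamiliar with work stealing itself, here is a minimal sequential sketch of the usual per-worker deque discipline: the owner pushes and pops at one end, and an idle worker steals from the other end. It does not model caches, affinity, or the locality-guided variant; the `Worker` class and its method names are hypothetical.

```python
from collections import deque


class Worker:
    """One processor with its own double-ended task queue."""

    def __init__(self, name):
        self.name = name
        self.tasks = deque()

    def push(self, task):
        # The owner adds new work at the bottom (right end).
        self.tasks.append(task)

    def pop_local(self):
        # The owner takes the most recently pushed task (LIFO order).
        return self.tasks.pop() if self.tasks else None

    def steal_from(self, victim):
        # A thief takes the oldest task from the top (left end) of the
        # victim's deque, or None if the victim has no work.
        return victim.tasks.popleft() if victim.tasks else None
```

A stolen task runs on a different processor from the one that created it, which is the source of the extra cache misses that the bounds above account for.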

5.

Refactoring

Ralf Reißing 《Informatik-Spektrum》1999,22(3):210-211

No abstract

6.

Precise Data Locality Optimization of Nested Loops

Vincent Loechner Benoît Meister Philippe Clauss 《The Journal of supercomputing》2002,21(1):37-76

Improving the usage of the memory hierarchy is a significant way to enhance application performance and reduce power consumption in embedded processor applications. In this paper, a temporal and spatial locality optimization framework for nested loops is proposed, driven by parameterized cost functions. The considered loops can be imperfectly nested. New data layouts are propagated through the connected references and through the loop nests as constraints for optimizing the next connected reference in the same nest or in the other ones. Unlike many existing methods, special attention is paid to TLB (Translation Lookaside Buffer) effectiveness, since TLB misses can take from tens to hundreds of processor cycles. Our approach only considers active data, that is, array elements that are actually accessed by a loop, in order to prevent useless memory loads and take advantage of storage compression and temporal locality. Moreover, the same data transformation is not necessarily applied to a whole array. Depending on the referenced data subsets, the transformation can result in different data layouts for the same array. This can significantly improve performance, since a priori incompatible references can be optimized simultaneously. Finally, the process considers not only the innermost loop level but all levels. Hence, large strides when control returns to the enclosing loop are avoided in several cases, and better optimization is provided when the innermost loop has a small index range.

7.

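High-Dimensional Dynamic Data Indexing Based on Scalable LSH

胡海苗 姜帆 《软件学报》(Journal of Software) 2015,26(S2):228-238

This paper proposes a scalable locality-sensitive hashing index (SLSH) to address the loss of indexing efficiency in high-dimensional dynamic data indexing that arises when the size and distribution of the data set cannot be determined in advance. SLSH is built on E2LSH and inherits its fast indexing of high-dimensional data and its ability to index points in Euclidean space directly. To give the hash index a dynamic ability to discriminate similarity, SLSH modifies the hash family of E2LSH and adaptively adjusts the hash parameters through a constraint on hash-bucket capacity. As a result, SLSH can produce a robust partition even for data spaces whose distribution density changes dynamically.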
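The following is a minimal sketch of the kind of Euclidean hash that E2LSH-style schemes are built from, in the commonly cited form h(v) = floor((a·v + b) / w), with `a` drawn from a Gaussian and `b` uniform in [0, w]. It does not implement SLSH's adaptive adjustment of hash parameters under bucket-capacity constraints; `make_hash`, `build_index`, and the bucket width `w` are illustrative assumptions.

```python
import math
import random


def make_hash(dim, w, rng):
    """Return one hash function h(v) = floor((a . v + b) / w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random projection
    b = rng.uniform(0.0, w)                        # random offset

    def h(v):
        dot = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((dot + b) / w)

    return h


def build_index(points, hashes):
    """Group point indices by the tuple of their hash values."""
    buckets = {}
    for idx, p in enumerate(points):
        key = tuple(h(p) for h in hashes)
        buckets.setdefault(key, []).append(idx)
    return buckets
```

A smaller `w` gives finer buckets and a larger `w` gives coarser ones, which is the sort of parameter an adaptive scheme could tune when buckets grow too full.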

8.

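A Data Fusion Method for Improving Locality and Reducing False Sharing (total citations: 6; self-citations: 0; citations by others: 6)

曾丽芳 杨学军 夏军 陈娟 《计算机学报》(Chinese Journal of Computers) 2004,27(1):32-41

Some applications cannot gain performance from optimizations that rearrange elements within a single array. To address this, the paper extends inter-array data regrouping optimizations and focuses on the properties of optimizations that fuse the data of several arrays in a particular way to improve locality and reduce false sharing. For several typical array-association patterns, it proposes corresponding data fusion methods and establishes a set of rough performance-cost rules that guide the compiler in selectively fusing arrays to improve the global optimization of the program. Based on test results from several platforms, the paper also analyzes the performance portability of data fusion across architectures and adds architectural features to the performance-cost rules so that the optimization applies to different architectures. The results show that data fusion is very effective at improving the performance of certain applications, especially on software DSM architectures.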

9.

A Compute Unified System Architecture for Graphics Clusters Incorporating Data Locality

Müller Christoph Frey Steffen Strengert Magnus Dachsbacher Carsten Ertl Thomas 《IEEE transactions on visualization and computer graphics》2009,15(4):605-617

We present a development environment for distributed GPU computing targeted for multi-GPU systems, as well as graphics clusters. Our system is based on CUDA and logically extends its parallel programming model for graphics processors to higher levels of parallelism, namely, the PCI bus and network interconnects. While the extended API mimics the full function set of current graphics hardware (including the concept of global memory) on all distribution layers, the underlying communication mechanisms are handled transparently for the application developer. To allow for high scalability, in particular for network-interconnected environments, we introduce an automatic GPU-accelerated scheduling mechanism that is aware of data locality. This way, the overall amount of transmitted data can be heavily reduced, which leads to better GPU utilization and faster execution. We evaluate the performance and scalability of our system for bus and especially network-level parallelism on typical multi-GPU systems and graphics clusters.

10.

Improving Data Locality for Efficient In‐Core Path Tracing

J. Bikker 《Computer Graphics Forum》2012,31(6):1936-1947

In this paper, we investigate the efficiency of ray queries on the CPU in the context of path tracing, where ray distributions are mostly random. We show that existing schemes that exploit data locality to improve ray tracing efficiency fail to do so beyond the first diffuse bounce, and analyze the cause for this. We then present an alternative scheme inspired by the work of Pharr et al. in which we improve data locality by using a data‐centric breadth‐first approach. We show that our scheme improves on state‐of‐the‐art performance for ray distributions in a path tracer.

11.

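Research on Workflow Reconstruction Techniques (total citations: 1; self-citations: 0; citations by others: 1)

田珂 朱清新 向培素 《计算机科学》(Computer Science) 2005,32(8):87-90

Combining advanced workflow technology with traditional enterprise management information systems is increasingly an important means of advancing enterprise informatization. Current workflow systems are model-driven, and defining a complete model is complex and time-consuming; moreover, the actual business process inevitably differs from the process model. This paper introduces the concepts of workflow nets and workflow logs, and proposes an algorithm that reconstructs the business process model from the information contained in the logs. The algorithm can also handle noise in the logs and effectively measure the difference between the process model and the actual business process.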
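One common first step when reconstructing a process model from logs is to count which activity directly follows which, and to discard rarely observed pairs as likely noise. The sketch below shows only that step. It is not the paper's algorithm, and the function name `directly_follows` and the `min_count` threshold are assumptions.

```python
from collections import Counter


def directly_follows(log, min_count=2):
    """Return the (a, b) pairs where b directly follows a in the log.

    `log` is a list of traces, each a list of activity names. Pairs
    observed fewer than `min_count` times are treated as noise.
    """
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


# Example: the third trace deviates from the usual A -> B -> C order.
example_log = [["A", "B", "C"], ["A", "B", "C"], ["A", "C", "B"]]
relations = directly_follows(example_log)
```

Here `relations` keeps only the pairs seen at least twice, so the one-off ordering in the third trace does not affect the recovered structure.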

12.

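An RIA-Oriented System Refactoring Model (total citations: 2; self-citations: 0; citations by others: 2)

孙勇 林菲 《计算机工程》(Computer Engineering) 2009,35(24):282-284

Using refactoring techniques from software engineering, this paper proposes a system transformation model based on abstract data views. The system is refactored by combining abstract data state diagrams with classic refactoring strategies, and a mapping method smoothly converts a Web application into a rich Internet application (RIA). Experimental results show that the model improves the quality and usability of the user interface and reduces the workload of design and development.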

13.

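A Dynamic Pool Allocation Technique for Improving Heap Data Locality (total citations: 1; self-citations: 0; citations by others: 1)

王振江 武成岗 张兆庆 《计算机学报》(Chinese Journal of Computers) 2011,34(4):665-675

Dynamic memory allocation is widely used in modern programs. General-purpose memory allocators usually concentrate on runtime overhead and memory utilization, and do little to exploit the relationships among the objects they allocate. This paper presents a low-overhead dynamic optimization technique called dynamic pool allocation. At run time it builds a storage shape graph, uses it to discover affinity among dynamically allocated objects, and gathers objects with affinity into one memory region (called a memory pool), improving their data layout. ...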

14.

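An Automatic Bytecode Refactoring Framework for Java Lock Mechanisms

张杨 张冬雯 仇晶 《计算机科学》(Computer Science) 2015,42(11):84-89, 117

The Java language provides several lock mechanisms, including synchronized locks, reentrant locks, and read-write locks. In parallel programming, different data structures usually achieve different performance with each of these mechanisms. To convert automatically between lock mechanisms, and thereby help programmers understand program performance, this paper proposes an automatic bytecode refactoring framework for Java lock mechanisms and implements a bytecode refactoring tool, Lock2Lock, based on it. Lock2Lock performs static analysis of bytecode on top of the Quad intermediate representation, checks the analysis results for consistency, and carries out the bytecode refactoring with Javassist. Lock2Lock was tested on three programs: a red-black tree, a producer-consumer program, and SPECjbb2005. The results show that Lock2Lock can successfully refactor synchronized locks into reentrant locks or read-write locks.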

15.

Exploiting Data Structure Locality in the Dataflow Model

《Journal of Parallel and Distributed Computing》1995,27(2):183-200

Although the dataflow model has been shown to allow the exploitation of parallelism at all levels, research of the past decade has revealed several fundamental problems. Synchronization at the instruction level, token matching, coloring, and re-labeling operations have a negative impact on performance by significantly increasing the number of non-compute "overhead" cycles. Recently, many novel hybrid von-Neumann data driven machines have been proposed to alleviate some of these problems. The major objective has been to reduce or eliminate unnecessary synchronization costs through simplified operand matching schemes and increased task granularity. Moreover, the results from recent studies quantifying locality suggest sufficient spatial and temporal locality is present in dataflow execution to merit its exploitation. In this paper we present a data structure for exploiting locality in a data driven environment: the vector cell. A vector cell consists of a number of fixed length chunks of data elements. Each chunk is tagged with a presence bit, providing intra-chunk strictness and inter-chunk non-strictness to data structure access. We describe the semantics of the model, processor architecture and instruction set as well as a Sisal to dataflow vectorizing compiler back-end. The vector cell model is evaluated by comparing its performance to those of both a classical fine-grain dataflow processor employing I-structures and a conventional pipelined vector processor. Results indicate that the model is surprisingly resilient to long memory and communication latencies and is able to dynamically exploit the underlying parallelism across multiple processing elements at run time.
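The chunk-and-presence-bit idea can be pictured with a small data structure sketch. This is only one reading of the description above, not the paper's design: the `VectorCell` class, its methods, and the choice to return `None` for a read of an absent chunk (where a real dataflow machine would presumably defer the read) are all assumptions.

```python
class VectorCell:
    """A vector stored as fixed-length chunks, one presence bit each."""

    def __init__(self, length, chunk_size):
        self.length = length
        self.chunk_size = chunk_size
        n_chunks = (length + chunk_size - 1) // chunk_size
        self.chunks = [None] * n_chunks
        self.present = [False] * n_chunks  # one presence bit per chunk

    def write_chunk(self, chunk_index, values):
        # Strict within a chunk: the whole chunk is supplied at once.
        start = chunk_index * self.chunk_size
        expected = min(self.chunk_size, self.length - start)
        if len(values) != expected:
            raise ValueError("a chunk must be written in full")
        self.chunks[chunk_index] = list(values)
        self.present[chunk_index] = True

    def read(self, i):
        # Non-strict across chunks: element i is readable as soon as
        # its own chunk is present, whatever the state of the others.
        c, offset = divmod(i, self.chunk_size)
        if not self.present[c]:
            return None  # assumption: stands in for a deferred read
        return self.chunks[c][offset]
```

Reading an element from a chunk that has arrived succeeds even while other chunks are still missing, which is the inter-chunk non-strictness the abstract refers to.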

16.

A Hierarchical Program Representation for Refactoring

Niels Van Eetvelde Dirk Janssens 《Electronic Notes in Theoretical Computer Science》2003,82(7):91

Currently there is a lot of interest in graph representations of software systems, as they provide a natural and flexible means to describe complex structures. The various visual sublanguages of the UML are perhaps the most obvious example of this. In [11] a graph representation of object-oriented programs was presented that enables one to describe refactoring operations (behaviour-preserving changes in the structure of a program) in a formal, concise way by graph rewriting productions. In general, however, a refactoring makes changes to a small part of a program, so the graph representation should only contain the information needed to carry out that refactoring. All other details are redundant and make the graph unnecessarily large for good visualization. A possible solution consists in using a hierarchical representation. Such a representation of object-oriented programs is presented in this paper. It is based on node-rewriting graph productions: each refinement step corresponds to a production. The construction is illustrated by applying it to a small Java simulation of a Local Area Network.

17.

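Refactoring Techniques Based on Design Patterns (total citations: 2; self-citations: 0; citations by others: 2)

卜亚军 于万明 《计算机应用研究》(Application Research of Computers) 2003,20(12):42-43,47

Design patterns capture design experience gained in programming. They can be used when implementing an application, and their implementation can be approached from a refactoring point of view. Test-first unit testing keeps the refactoring safe. The paper introduces these topics and gives an example that illustrates the feasibility and effectiveness of the approach.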

18.

Refactoring pattern matching

Meng Wang Jeremy Gibbons Kazutaka Matsuda Zhenjiang Hu 《Science of Computer Programming》2013

Defining functions by pattern matching over the arguments is advantageous for understanding and reasoning, but it tends to expose the implementation of a datatype. Significant effort has been invested in tackling this loss of modularity; however, decoupling patterns from concrete representations while maintaining soundness of reasoning has been a challenge. Inspired by the development of invertible programming, we propose an approach to program refactoring based on a right-invertible language rinv, in which every function has a right (or pre-) inverse. We show how this new design is able to permit a smooth incremental transition from programs with algebraic datatypes and pattern matching, to ones with proper encapsulation, while maintaining simple and sound reasoning.

19.

Cost-Directed Refactoring for Parallel Erlang Programs

Christopher Brown Marco Danelutto Kevin Hammond Peter Kilpatrick Archibald Elliott 《International journal of parallel programming》2014,42(4):564-582

This paper presents a new programming methodology for introducing and tuning parallelism in Erlang programs, using source-level code refactoring from sequential source programs to parallel programs written using our skeleton library, Skel. High-level cost models allow us to predict with reasonable accuracy the parallel performance of the refactored program, enabling programmers to make informed decisions about which refactorings to apply. Using our approach, we demonstrate easily obtainable, significant and scalable speedups of up to 21 on a 24-core machine over the sequential code.

20.

Refactoring Tools: Fitness for Purpose (total citations: 1; self-citations: 0; citations by others: 1)

Murphy-Hill Emerson Black Andrew P. 《Software, IEEE》2008,25(5):38-44

Refactoring tools can improve the speed and accuracy with which developers create and maintain software, but only if they are used. In practice, tools are not used as much as they could be; this seems to be because sometimes they do not align with the refactoring tactic preferred by most programmers, a tactic the authors call floss refactoring. They propose five principles that characterize successful floss-refactoring tools. These principles can help programmers choose the most appropriate refactoring tools and also help toolsmiths design tools that fit the programmer's purpose.