Similar literature
 Found 20 similar documents (search time: 46 ms)
1.
The performance gap between CPU and memory continues to widen. Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across heterogeneous hardware architectures, the choice of memory layout for data structures is ideally decoupled from the rest of the program. This can be accomplished via a zero-runtime-overhead abstraction layer, underneath which memory layouts can be freely exchanged. We present the low-level abstraction of memory access (LLAMA), a C++ library that provides such a data structure abstraction layer, with example implementations for multidimensional arrays of nested, structured data. LLAMA provides fully C++-compliant methods for defining and switching custom memory layouts for user-defined data types. The library is extensible with third-party allocators. Using two realistic examples, we show that the LLAMA-generated array-of-structs and struct-of-arrays layouts produce identical code with the same performance characteristics as manually written data structures. Integrations into the SPEC CPU® lbm benchmark and the particle-in-cell simulation PIConGPU demonstrate LLAMA's abilities in real-world applications. LLAMA's layout-aware copy routines can significantly speed up the transfer and reshuffling of data between layouts compared with naive element-wise copying. LLAMA provides a novel tool for the development of high-performance C++ applications in a heterogeneous environment.
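
To make the idea of exchanging layouts underneath an abstraction layer concrete, the following is a minimal sketch in plain C++. It does not use LLAMA and does not reflect LLAMA's API; the names `Particle`, `AoSStorage`, `SoAStorage`, `advance` and `run_example` are hypothetical and exist only for this illustration. The kernel is written once against an accessor interface and works unchanged with either an array-of-structs or a struct-of-arrays storage class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record with two fields (illustration only, not a LLAMA type).
struct Particle {
    float pos;
    float vel;
};

// Array-of-structs storage: whole records are contiguous in memory.
class AoSStorage {
public:
    explicit AoSStorage(std::size_t n) : data_(n) {}
    float& pos(std::size_t i) { assert(i < data_.size()); return data_[i].pos; }
    float& vel(std::size_t i) { assert(i < data_.size()); return data_[i].vel; }
    std::size_t size() const { return data_.size(); }
private:
    std::vector<Particle> data_;
};

// Struct-of-arrays storage: each field is contiguous in memory.
class SoAStorage {
public:
    explicit SoAStorage(std::size_t n) : pos_(n), vel_(n) {}
    float& pos(std::size_t i) { assert(i < pos_.size()); return pos_[i]; }
    float& vel(std::size_t i) { assert(i < vel_.size()); return vel_[i]; }
    std::size_t size() const { return pos_.size(); }
private:
    std::vector<float> pos_;
    std::vector<float> vel_;
};

// Kernel written once against the accessor interface; the layout is a
// template parameter, so switching it requires no change to this code.
template <typename Storage>
void advance(Storage& s, float dt) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        s.pos(i) += s.vel(i) * dt;
    }
}

// Helper: fill n elements, advance them by dt, return the position of element i.
template <typename Storage>
float run_example(std::size_t n, std::size_t i, float dt) {
    Storage s(n);
    for (std::size_t k = 0; k < n; ++k) {
        s.pos(k) = static_cast<float>(k);
        s.vel(k) = 2.0f;
    }
    advance(s, dt);
    return s.pos(i);
}
```

Calling `run_example<AoSStorage>(4, 3, 0.5f)` and `run_example<SoAStorage>(4, 3, 0.5f)` runs the same kernel on both layouts and gives the same value. Because the accessors are small inline functions, a compiler can typically reduce them to direct memory accesses, which is the property a zero-overhead abstraction layer relies on.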

2.
Finding the best data layout has been an ultimate goal of memory optimization. Even with a data access profile, heuristic algorithms are needed to reorganize the data layout for better locality. The best layout could be found by running the given application with all possible data layouts and selecting the best-performing one. This approach, however, can incur too much overhead, particularly when the number of possible layouts is large. In this paper, we present a composition-based cache simulation for structure reorganization. Instead of running all possible layouts, we simulate only the primary subsets of layouts and compose the cache misses for all layouts by summing the cache misses of their component subsets. Our experiment with the composition-based cache simulation shows that the differences in cache misses are within 10% of the full cache simulation for 4-way and 8-way set-associative caches. In addition to the cache miss estimation, our heuristic algorithm takes into account the extra instruction overhead incurred by structure reorganization. Our experiment with several structure-intensive benchmarks shows a 37% reduction in L1D read misses and a 28% reduction in L2 read misses. As a result, execution times are also reduced by 19% on average.
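
The composition step described above (estimating the misses of a whole layout by summing the misses of its component subsets) can be sketched as follows. This is only an illustration under stated assumptions: a layout is modelled as a partition of a structure's fields into groups, each group is encoded as a bitmask, and the miss counts in `example_misses` are made-up numbers rather than results from the paper. The names `FieldSet`, `Layout`, `estimate_misses` and `best_layout` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <vector>

using FieldSet = std::uint32_t;        // bit i set => field i belongs to the group
using Layout = std::vector<FieldSet>;  // a partition of the fields into groups
using MissTable = std::map<FieldSet, std::uint64_t>;

// Estimated misses of a layout: the sum of the misses measured once per group.
// Throws std::out_of_range if a group was never simulated.
std::uint64_t estimate_misses(const Layout& layout, const MissTable& group_misses) {
    std::uint64_t total = 0;
    for (FieldSet group : layout) {
        total += group_misses.at(group);
    }
    return total;
}

// Index of the candidate layout with the fewest estimated misses.
std::size_t best_layout(const std::vector<Layout>& candidates,
                        const MissTable& group_misses) {
    assert(!candidates.empty());
    std::size_t best = 0;
    std::uint64_t best_misses = std::numeric_limits<std::uint64_t>::max();
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const std::uint64_t m = estimate_misses(candidates[i], group_misses);
        if (m < best_misses) {
            best_misses = m;
            best = i;
        }
    }
    return best;
}

// Made-up miss counts for a three-field structure (a = 1, b = 2, c = 4).
MissTable example_misses() {
    return {
        {1, 100},  // {a}
        {2, 400},  // {b}
        {4, 50},   // {c}
        {3, 420},  // {a, b}
        {5, 120},  // {a, c}
        {6, 430},  // {b, c}
        {7, 600},  // {a, b, c}
    };
}
```

With this table, the layout `{3, 4}` (fields a and b together, c on its own) is estimated at 420 + 50 = 470 misses, lower than both the unsplit structure `{7}` and the fully split layout `{1, 2, 4}`. Only the per-group entries need to be simulated; every layout built from them is evaluated by addition.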

3.
4.
Modern systems for the analysis of image‐based biomedical data, such as functional magnetic resonance imaging (fMRI), require fast computational techniques and rapid, robust development. Object‐oriented programming languages such as Java and C++ provide the foundations for the development of complex data analysis applications. This case study explores the advantages and disadvantages of using these two programming environments for scientific computation as typified in the analysis of fMRI datasets. C++ is well suited for computational and memory optimization, while Java is more compliant with the object‐oriented paradigm, supports cross‐platform development and has a rich set of application programming interface (API) classes. The same data model and algorithms were implemented in C++ and Java, and a user interface was developed with the Java API. Comparisons were made with respect to computational performance and ease of development. Benchmarks show that C++ generally outperforms Java, while Java is easier to use, leading to more robust code and shorter development times. However, with the advent of newer just‐in‐time compilers, Java performance is at times comparable to C++. The latest Java virtual machine technology is closing the gap, and eventually Java should be a good compromise between efficient algorithm performance and effective application development. Copyright © 2004 John Wiley & Sons, Ltd.

5.
Scientific data is mostly multi-valued, e.g., coordinates, velocities, moments or feature components, and it comes in large quantities. The data layout of such containers has an enormous impact on the achieved performance. However, layout optimization is very time-consuming and error-prone because container access syntax in standard programming languages is not sufficiently abstract: changing the data layout of a container necessitates syntax changes in all parts of the code where the container is used. Object-oriented languages make it possible to solve this problem by hiding the data layout behind a class interface, but the additional coding effort is enormous in comparison to a simple structure. A clever coding pattern, previously presented by the author, significantly reduces the code overhead. However, it relies heavily on advanced features of C++, a language that is not supported on most accelerators. This paper develops a concise macro-based solution that requires only support for structures and unions and can therefore be utilized in OpenCL, a widely supported programming language for parallel processors. This enables the development of high-performance code without an a priori commitment to a certain layout and includes the possibility of optimizing it subsequently. This feature is used to identify the best data layouts for different processing patterns of multi-valued containers on a multi-GPU system.
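
A macro-based layout switch of the kind described above might look like the following sketch. It is not the macro set from the paper: `POINT_COUNT`, `LAYOUT_SOA`, `LAYOUT_AT`, `Points`, `fill`, `dot` and `get_x` are hypothetical names, and the sketch uses only structures (no unions). It is written as C++ for consistency with the other examples here, but the macros themselves rely only on plain structure access. The access syntax `LAYOUT_AT(container, field, index)` stays the same whichever layout is selected at compile time.

```cpp
#include <cassert>
#include <cstddef>

#define POINT_COUNT 4

// Select the layout at compile time; define LAYOUT_SOA to switch.
#ifdef LAYOUT_SOA
// Struct of arrays: one array per field.
struct Points {
    float x[POINT_COUNT];
    float y[POINT_COUNT];
};
#define LAYOUT_AT(c, field, i) ((c).field[(i)])
#else
// Array of structs: one record per element.
struct Point {
    float x;
    float y;
};
struct Points {
    struct Point p[POINT_COUNT];
};
#define LAYOUT_AT(c, field, i) ((c).p[(i)].field)
#endif

// All code below uses only LAYOUT_AT and never mentions the layout.
void fill(Points& pts) {
    for (std::size_t i = 0; i < POINT_COUNT; ++i) {
        LAYOUT_AT(pts, x, i) = static_cast<float>(i);
        LAYOUT_AT(pts, y, i) = 2.0f;
    }
}

// Sum of x[i] * y[i] over all elements.
float dot(const Points& pts) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < POINT_COUNT; ++i) {
        sum += LAYOUT_AT(pts, x, i) * LAYOUT_AT(pts, y, i);
    }
    return sum;
}

// Bounds-checked read of the x field of element i.
float get_x(const Points& pts, std::size_t i) {
    assert(i < POINT_COUNT);
    return LAYOUT_AT(pts, x, i);
}
```

After `fill`, `dot` returns 0·2 + 1·2 + 2·2 + 3·2 = 12 in both configurations. Compiling with and without `-DLAYOUT_SOA` changes the memory layout while `fill`, `dot` and `get_x` remain untouched, which is what allows the layout to be chosen late and tuned per device.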

6.
Legacy systems are often written in programming languages that support arbitrary variable overlays. When migrating to modern languages, the data model must adhere to strict structuring rules, such as those associated with an object-oriented data model, supporting classes, class attributes and inter-class relationships. In this paper, we deal with the problem of automatically transforming a data model which lacks structure and relies on the explicit layout of variables in memory as defined by programmers. We introduce an abstract syntax and a set of abstract rewrite rules to describe the proposed approach in a language-neutral formalism. Then, we instantiate the approach for the proprietary programming language that was used to develop a large legacy system we are migrating to Java.

7.
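Yuan Wei, Sun Yongqiang. Journal of Software (软件学报), 1998, 9(1): 47-52
Object-oriented parallel programming offers an abstraction of communication and computation similar to that of the shared-memory model, which makes it well suited to the development of large parallel software systems. However, the implementation efficiency of distributed objects based on remote object invocation has long been an obstacle to the wide adoption of object-oriented methods in distributed and parallel programming. This paper introduces the Dual-Object model, the object-oriented parallel programming model used on the MANNA parallel machine. By introducing semantically motivated descriptions of data-consistency properties, the model alleviates the efficiency problem to some extent. The paper then uses programming examples to discuss in detail parallel programming in an extended C++ based on the Dual-Object model, and reports some measured results.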

8.
9.
This paper presents a graph‐oriented framework, called WebGOP, for architecture modeling and programming of Web‐based distributed applications. WebGOP is based on the graph‐oriented programming (GOP) model, under which the components of a distributed program are configured as a logical graph and implemented using a set of operations defined over the graph. WebGOP reshapes GOP with a reflective object‐oriented design, which provides powerful architectural support in the World Wide Web environment. In WebGOP, the architecture graph is reified as an explicit object which itself is distributed over the network, providing a graph‐oriented context for the execution of distributed applications. The programmer can specialize the type of graph to represent a particular architecture style tailored for an application. WebGOP also has built‐in support for flexible and dynamic architectures, including both planned and unplanned dynamic reconfiguration of distributed applications. We describe the WebGOP framework, a prototypical implementation of the framework on top of SOAP, and a performance evaluation of the prototype. The prototype demonstrated the feasibility of our approach. Results of the performance evaluation showed that the overhead introduced by WebGOP over SOAP is reasonable and acceptable. Copyright © 2003 John Wiley & Sons, Ltd.

10.
This paper presents a procedural approach to generate furniture arrangements for large virtual indoor scenes. The interiors of buildings in 3D city scenes are often omitted. Our solution creates rich furniture arrangements for all rooms of complex buildings and even for entire cities. The key idea is to only furnish the rooms in the vicinity of the viewer while the user explores a building in real time. In order to compute the object layout we introduce an agent‐based solution and demonstrate the flexibility and effectiveness of the agent approach. Furthermore, we describe advanced features of the system, like procedural furniture geometry, persistent room layouts, and styles for high‐level control.

11.
In this paper we propose a set‐oriented rule‐based method definition language for object‐oriented databases. Most existing object‐oriented database systems use a general‐purpose imperative object‐oriented programming language as the method definition language. Because methods are written in a general‐purpose imperative language, it is difficult to analyze their properties and to optimize them. Optimization is important when dealing with a large number of objects, as in databases. We therefore believe that the use of an ad hoc, set‐oriented language can offer some advantages, at least at the specification level. In particular, such a language can offer an appropriate framework to reason about method properties. In this paper, besides defining a set‐oriented rule‐based language for method definition, we formally define its semantics, addressing the problems of inconsistency and non‐determinism in set‐oriented updates. Moreover, we characterize some relevant properties of methods, such as conflicts among method specifications in sibling classes and behavioral refinement in subclasses. Copyright © 2003 John Wiley & Sons, Ltd.

12.
In many biomedical research laboratories, data analysis and visualization algorithms are typically prototyped using an interpreted programming language. If performance becomes an issue, they are ported to C and integrated with interpreted systems, without fully utilizing object‐oriented software development. This paper presents an overview of Scopira, an open source C++ framework suitable for biomedical data analysis and visualization. Scopira provides high‐performance end‐to‐end application development features in the form of an extensible C++ library. This library provides general programming utilities, numerical matrices and algorithms, parallelization facilities, and graphical user interface elements. Copyright © 2008 John Wiley & Sons, Ltd.

13.
The Internet of Things (IoT) has gained wide popularity in both academic and industrial contexts. Unlike traditional embedded devices with specialized firmware, modern IoT devices accommodate general‐purpose operating systems, allowing developers to run more sophisticated applications written in high‐level languages like JavaScript. Because IoT devices are subject to resource constraints like available battery power, we need to dynamically migrate a running process between different devices to prevent losing state. However, it is challenging to apply migration techniques using memory snapshots across the heterogeneous pool of IoT devices. We present ThingsMigrate, a middleware providing platform‐independent migration of JavaScript processes across IoT devices. Prior to execution, ThingsMigrate instruments the source code of a given program to expose its internal state. At run time, the transformed program produces, on demand, a JSON snapshot of its current state, from which new code is generated to resume execution. Thus, ThingsMigrate enables process migration entirely in the application space without any modifications to the underlying virtual machine (VM), providing VM‐independence. We present three versions of ThingsMigrate, each building on the previous one to optimize for run‐time latency and memory consumption. We report on the experience of building each successive version and discuss the insights gained and the learning outcomes. We evaluated ThingsMigrate against standard benchmarks, over two IoT platforms and a cloud‐like environment. We show that it can migrate even highly CPU‐intensive applications, with an average run‐time latency overhead of 33% and a memory overhead of 78%. ThingsMigrate supports multiple subsequent migrations without introducing additional overhead with each subsequent migration.

14.
Constraints enable flexible graph layout by combining the ease of automatic layout with customizations for a particular domain. However, constraint‐based layout often requires many individual constraints defined over specific nodes and node pairs. In addition to the effort of writing and maintaining a large number of similar constraints, such constraints are specific to the particular graph and thus cannot generalize to other graphs in the same domain. To facilitate the specification of customized and generalizable constraint layouts, we contribute SetCoLa: a domain‐specific language for specifying high‐level constraints relative to properties of the backing data. Users identify node sets based on data or graph properties and apply high‐level constraints within each set. Applying constraints to node sets rather than individual nodes reduces specification effort and facilitates reapplication of customized layouts across distinct graphs. We demonstrate the conciseness, generalizability, and expressiveness of SetCoLa on a series of real‐world examples from ecological networks, biological systems, and social networks.

15.
In this paper, a data-driven generative method is applied to generate a synthetic space-allocation probability layout. The generated layout could be helpful in the early stage of an architectural design. For this task, a specific training dataset is generated and used to train a cGAN model. The training dataset consists of 300 existing apartment layouts, which are coloured according to a low-feature representation. The cGAN model is trained with this dataset, and the trained model is evaluated on the quality of its generated layouts with respect to five pre-defined topological and geometrical benchmarks.

16.
Hans de Bruin. Software, 2000, 30(8): 849-894
A small, object‐oriented language is introduced: BCOOPL (Basic Concurrent Object‐Oriented Programming Language). This language is specifically targeted at supporting component‐oriented programming. The main design goal of BCOOPL was to provide a small but powerful set of language features that supports the construction of high‐quality components through well‐established software engineering practices, which include the separation of interfaces and implementations, weakly coupled objects, and abstraction. A number of design patterns based on these principles are built into the language. In particular, the observer, the mediator and the bridge are supported directly. This provides a strong foundation on which higher‐level component specification languages can be built. BCOOPL has a long research history. Its roots can be traced back to path expressions and to the concurrent object‐oriented programming languages Procol and Talktalk. As a result, BCOOPL integrates only essential language features that blend well and have proven their value in practice. Copyright © 2000 John Wiley & Sons, Ltd.

17.
Even though distributed virtual shared-memory (DVSM) systems have been the subject of intensive research, their architectures have not been widely used in current high-performance computing markets. The reason is that previously introduced DVSM systems use conventional interconnection technologies like Ethernet, which incur high execution overhead due to process interruption during data communication for memory consistency. In this paper, we present a DVSM architecture based on a next-generation interconnection technique, the InfiniBand Architecture (IBA). Because the IBA supports shared-memory programming semantics by means of remote direct-memory access (RDMA) and atomic operations in hardware, we can minimize the communication overhead for memory consistency on the DVSM system. To characterize multithreaded applications on our IBA-based DVSM system, we examined benchmarks for two different shared-memory programming models, SPMD and OpenMP. We show that our DVSM, which uses the full features of the IBA, improves performance significantly over the IPoIB-based DVSM system in all benchmarks, and is also comparable to a bus-based shared-memory multiprocessor system in some benchmarks.

18.
We introduce a new approach for defining continuous non‐oriented gradient fields from discrete inputs, a fundamental stage for a variety of computer graphics applications such as surface or curve reconstruction, and image stylization. Our approach builds on a moving least squares formalism that computes higher‐order local approximations of non‐oriented input gradients. In particular, we show that our novel isotropic linear approximation outperforms its lower‐order alternative: surface or image structures are much better preserved, and instabilities are significantly reduced. Thanks to its ease of implementation (on both CPU and GPU) and small performance overhead, we believe our approach will find widespread use in graphics applications, as demonstrated by the variety of our results.

19.
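Many universities teach the course "Object-Oriented Programming" using C++. Using examples, this paper discusses two kinds of preparation that should be completed before learning object-oriented technology, and examines both in detail from several angles.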

20.
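Gu Zhiqi, Yu Songyu. Computer Engineering (计算机工程), 2004, 30(22): 67-68, 163
Using an object-oriented approach, this paper proposes a regular rule for mapping pseudo-C code to C++ in order to implement the DSM-CC data download protocol in digital television transmitting and receiving systems. The rule exploits the object-oriented features of C++ together with the strong similarity among the semantic definitions of the DSM-CC protocol stack, and makes full use of object-oriented concepts such as polymorphism and inheritance. Based on this rule, a top-down C++ implementation of DSM-CC is given. The implementation is compatible with both the data carousel (DC) and the object carousel (OC), is highly extensible, and can be shared by the transmitter (encoder) and the receiver (decoder).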
