Similar Literature
A total of 20 similar documents were found.
1.
This paper describes MATISSE, a compiler able to translate a MATLAB subset to C, targeting embedded systems. MATISSE uses LARA, an aspect-oriented programming language, to specify additional information and transformations for the input MATLAB code, for example, insertion of code for the initialization of variables and specification of the types and shapes of variables. The compiler is being developed with flexibility and multitarget, multitoolchain support in mind, allowing the generation of several C implementations from the same reference MATLAB code. In this paper, we also present a number of techniques employed in MATLAB-to-C compilation, such as element-wise mapping operations, matrix views, weak types, and intrinsics. We validate these techniques using MATISSE and a set of representative benchmarks. More specifically, we evaluate the compiler with a set of 31 benchmarks on an embedded system board and a desktop computer. The results show speedups of up to 1.8× when employing information provided by LARA aspects, compared with C code generated without additional user information. Compared with the execution time of the original code running on MATLAB, the generated C code achieved a geometric mean speedup of 13×. Copyright © 2016 John Wiley & Sons, Ltd.
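As a hedged illustration of the element-wise mapping technique mentioned in the abstract above: once the types and shapes of the operands are known (in MATISSE they are supplied through LARA aspects), an element-wise MATLAB expression such as C = A .* B + 1 can be lowered to a plain C loop. The function below is our own sketch of that lowering, not MATISSE's actual output.

#include <stddef.h>

/* Sketch: one way a compiler could lower the MATLAB expression
   C = A .* B + 1 after the shapes of A and B are known to match
   (rows x cols, stored contiguously). */
void elementwise_mul_add(const double *A, const double *B, double *C,
                         size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows * cols; i++) {
        C[i] = A[i] * B[i] + 1.0;   /* .* maps to a per-element product */
    }
}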

2.
The modern trend toward heterogeneous many-core architectures has led to high architectural diversity in both high-performance and high-end embedded systems. To effectively exploit the computational resources of such a wide range of architectures, programming languages and APIs such as OpenCL have become increasingly popular. Although OpenCL provides functional code portability and the ability to fine-tune the application to the target hardware, providing performance portability is still an open problem. Thus, many research works have investigated the optimization of specific combinations of application and target platform. In this paper, we aim at leveraging the experience obtained in implementing algorithms from the cryptography domain to provide a set of guidelines for performance portability across modern many-core heterogeneous architectures, and to establish a base on which domain-specific languages and compiler transformations could be built in the near future. We study algorithmic choices and the effect of compiler transformations on three representative applications in the chosen domain on a set of seven target platforms. To estimate how well an application fits an architecture, we define a metric of computational intensity for both the architecture and the application implementation. Besides being useful for comparing different implementations or algorithmic choices and their fitness for a specific architecture, this metric can also guide the compiler during the code optimization process. Copyright © 2014 John Wiley & Sons, Ltd.
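The abstract does not reproduce the formula, but computational-intensity metrics of this kind are commonly defined, roofline-style, as a ratio of arithmetic work to data traffic, evaluated once for the implementation and once for the platform. The following is a sketch of such a definition under that assumption, not necessarily the paper's exact metric:

\[
I_{\mathrm{app}} = \frac{\text{operations executed by the implementation}}{\text{bytes moved to and from memory}},
\qquad
I_{\mathrm{arch}} = \frac{\text{peak compute throughput (ops/s)}}{\text{peak memory bandwidth (bytes/s)}}
\]

Comparing $I_{\mathrm{app}}$ with $I_{\mathrm{arch}}$ indicates whether an implementation is compute-bound or bandwidth-bound on a given platform, which is the kind of fitness estimate the authors describe, and the same comparison can steer a compiler's optimization choices.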

3.
4.
Compile-time metaprograms are programs executed during the compilation of a source file, usually with the aim of updating its source code. Even though metaprograms are essentially programs, they are typically treated as exceptional cases that do not share common practices and development tools. In this direction, we identify a set of primary requirements related to language implementation, metaprogramming features, software engineering support, and programming environments, and we elaborate on addressing these requirements in the implementation of a metaprogramming language. In particular, we introduce the notion of integrated compile-time metaprograms: coherent programs assembled from specific metacode fragments present in the source code. We show the expressiveness of this programming model and illustrate its advantages through various metaprogram scenarios. Additionally, we present an integrated tool chain supporting full-scale build features and compile-time metaprogram debugging. Copyright © 2013 John Wiley & Sons, Ltd.

5.
The arrival of multicore systems, along with the speed-up potential available in graphics processing units, has given us unprecedented low-cost computing power. These systems address some of the known architectural problems, but at the expense of considerably increased programming complexity. Heterogeneity, at both the architectural and programming levels, poses a great challenge to programmers. Many proposals have been put forth to facilitate the job of programmers. Leaving aside proposals based on the development of new programming languages, because of the effort this represents for the user (effort to learn and to reuse code), the remaining proposals are based on transforming sequential code into parallel code, or on transforming parallel code designed for one architecture into parallel code designed for another. A different approach relies on the use of skeletons: the programmer has available a set of parallel skeletons that form the basis for developing parallel code while writing sequential code. In this context, we propose a methodology for developing an automatic source-to-source transformation in a specific domain. This methodology is instantiated in a framework aimed at solving dynamic programming problems. Using this framework, the final user (a physician, mathematician, biologist, etc.) can express her problem as an equation in LaTeX, and the system will automatically generate the optimal parallel code for homogeneous or heterogeneous architectures. This approach allows for great portability toward these new emerging architectures and for great productivity, as evidenced by the computational results. Copyright © 2012 John Wiley & Sons, Ltd.
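As a purely illustrative example (none is given in the abstract), the kind of equation such a user might write in LaTeX is a standard dynamic-programming recurrence, for instance the 0/1 knapsack:

\[
f(i, c) =
\begin{cases}
0 & \text{if } i = 0, \\
f(i-1, c) & \text{if } w_i > c, \\
\max\bigl(f(i-1, c),\; f(i-1, c - w_i) + p_i\bigr) & \text{otherwise,}
\end{cases}
\]

where $w_i$ and $p_i$ are the weight and profit of item $i$ and $c$ is the remaining capacity. A framework of the kind described above would accept such a recurrence and generate the corresponding parallel code; the specific recurrence shown here is our own choice, not the paper's.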

6.
7.
Many‐core hardware is targeted specifically at obtaining high performance, but reaching high performance is often challenging because hardware‐specific details have to be taken into account. Although there are many programming systems that try to alleviate many‐core programming, some providing a high‐level language, others providing a low‐level language for control, none of these systems have a clear and systematic methodology as a foundation. In this article, we propose stepwise‐refinement for performance: a novel, clear, and structured methodology for obtaining high performance on many‐cores. We present a system that supports this methodology, offers multiple levels of abstraction to provide programmers a trade‐off between high‐level and low‐level programming, and provides programmers detailed performance feedback. We evaluate our methodology with several widely varying compute kernels on two different many‐core architectures: a Graphical Processing Unit (GPU) and the Xeon Phi. We show that our methodology gives insight in the performance, and that in almost all cases, we gain a substantial performance improvement using our methodology. Copyright © 2015 John Wiley & Sons, Ltd.  相似文献   
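The article's system provides its own abstraction levels, which the abstract does not detail; the following is only a generic C sketch of what one refinement step can look like: a naive matrix-multiply loop nest refined into a blocked (tiled) version that makes explicit the hardware detail (cache capacity) being exploited. The sizes and the choice of transformation are our own assumptions.

#define N    1024
#define TILE 32    /* assumed tile size; tuning it is itself a refinement step */

/* Step 0: naive, straightforward formulation of C = A * B. */
void matmul_naive(const float A[N][N], const float B[N][N], float C[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            float acc = 0.0f;
            for (int k = 0; k < N; k++)
                acc += A[i][k] * B[k][j];
            C[i][j] = acc;
        }
}

/* Step 1: one possible refinement: iterate over TILE x TILE blocks so the
   working set fits in cache; the result is identical, only the schedule changes. */
void matmul_tiled(const float A[N][N], const float B[N][N], float C[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = 0.0f;

    for (int ii = 0; ii < N; ii += TILE)
        for (int kk = 0; kk < N; kk += TILE)
            for (int jj = 0; jj < N; jj += TILE)
                for (int i = ii; i < ii + TILE; i++)
                    for (int k = kk; k < kk + TILE; k++)
                        for (int j = jj; j < jj + TILE; j++)
                            C[i][j] += A[i][k] * B[k][j];
}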

8.
We present a novel approach to combined textual and visual programming by allowing visual, interactive objects to be embedded within textual source code and segments of source code to be further embedded within those objects. We retain the strengths of text-based source code while enabling visual programming where it is beneficial. Additionally, embedded objects and code provide a simple object-oriented approach to adding a visual form of LISP-style macros to a language. The ability to freely combine source code and visual, interactive objects with one another allows for the construction of interactive programming tools and experimentation with novel programming language extensions. Our visual programming system is supported by a type coercion-based presentation protocol that displays normal Java and Python objects in a visual, interactive form. We have implemented our system within a prototype interactive programming environment called 'The Larch Environment'. Copyright © 2013 John Wiley & Sons, Ltd.

9.
As the cost of processor hardware declines, multiprocessor architectures become increasingly cost-effective and represent an important area for future research. In order to exploit the full potential of multiprocessors, however, it is necessary to understand how to design software that can make effective use of the available parallelism. This paper considers the impact of multiprocessor architecture on the design of high-level programming languages and, in particular, evaluates the language Ada in the light of the special requirements of real-time multiprocessor systems. We conclude that Ada, as currently designed, does not meet the needs of real-time embedded systems.

10.
We study issues in verifying compilers for modern imperative and object-oriented languages. We take the view that it is not the compiler but the code generated by it which must be correct. It is this subtle difference that allows standard compiler architecture, construction methods, and tools to be reused in a verifying compiler. Program checking is the main technique for avoiding the cumbersome task of verifying most parts of a compiler and the tools by which they are generated. Program checking remaps the result of a compiler phase to its origin, the input of this phase, in a provably correct manner. We then only have to compare the actual input to its regenerated form, a basically syntactic process. The correctness proof of the generation of the result is replaced by the correctness proof of the remapping process; the latter turns out to be far easier than proving the generating process correct. The only part of a compiler where program checking does not seem to work is the transformation step that replaces source language constructs and their semantics, given, e.g., by an attributed syntax tree, by an intermediate representation, e.g., in SSA form, which expresses the same program but in terms of the target machine. This transformation phase must be proven directly using Hoare logic and/or theorem provers. However, we can show that, given the features of today's programming languages and hardware architectures, this transformation is to a large extent universal: it can be reused for any pair of source and target language. To achieve this goal, we investigate annotating the syntax tree as well as the intermediate representation with constraints exhibiting specific properties of the source language. Such annotations are necessary during code optimization anyway.
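A minimal sketch of the program-checking idea on a deliberately trivial "phase" (constant folding of arithmetic expressions), assuming nothing from the paper beyond the general principle: instead of proving the folding pass correct once and for all, each of its results is checked against the input it was produced from, so only the small checker has to be trusted.

#include <stdio.h>

/* Toy expression: either a constant or the sum of two sub-expressions. */
typedef struct Expr {
    int is_const;
    long value;                    /* valid when is_const != 0 */
    const struct Expr *lhs, *rhs;  /* valid when is_const == 0 */
} Expr;

static long eval(const Expr *e)
{
    return e->is_const ? e->value : eval(e->lhs) + eval(e->rhs);
}

/* The "compiler phase" being checked: fold an addition of two constants. */
static Expr fold(const Expr *e)
{
    if (!e->is_const && e->lhs->is_const && e->rhs->is_const) {
        Expr folded = { 1, e->lhs->value + e->rhs->value, NULL, NULL };
        return folded;
    }
    return *e;
}

/* Program checking: rather than verifying fold(), check each of its results
   against the input expression it came from. */
static int check(const Expr *input, const Expr *output)
{
    return eval(input) == eval(output);
}

int main(void)
{
    Expr a = { 1, 2, NULL, NULL }, b = { 1, 40, NULL, NULL };
    Expr sum = { 0, 0, &a, &b };
    Expr folded = fold(&sum);
    printf("folded to %ld, checker says %s\n",
           folded.value, check(&sum, &folded) ? "ok" : "REJECT");
    return 0;
}

In the paper's setting the comparison is performed after remapping the phase's output back to the input representation; the evaluation-based comparison used here is only the simplest stand-in for that step.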

11.
In a variety of emerging networked computing system domains over the years, there have been bursts of activity on new medium access control (MAC) protocols, as new communication transceiver technologies with greater data-movement performance or lower power dissipation have been introduced. To enable implementations flexible to evolving standards and improving application-domain insight, such MAC protocols are typically first implemented in software and interface between applications or system software, typically executing on an embedded processor or microcontroller, and the evolving radio transceiver hardware. Many challenges exist in implementing MAC protocols across evolving or competing transceiver hardware implementations and processor architectures. Some of these challenges are peculiar to the requirements of MAC protocols, and others are a result of the plethora of system and processor architectures in the embedded systems domain. This article studies the challenges facing software implementations of MAC protocols running on embedded microcontrollers and interfacing with radio transceiver hardware. Experience with an implementation of the IEEE 802.15.4 MAC across three hardware platforms with different processor, system, and systems software architectures is presented, focusing on the implementation approach and interfaces. Pitfalls are pointed out, and guidelines are provided for ensuring that new MAC implementations are easily portable across processor architectures and transceiver hardware. Copyright © 2010 John Wiley & Sons, Ltd.
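One common way to achieve the portability the article argues for is to confine every transceiver-specific and board-specific detail behind a small interface that the MAC state machine calls; a minimal C sketch follows, in which the structure and names are our own illustrative assumptions, not the article's API.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical radio abstraction: everything the MAC layer needs from the
   transceiver, so the protocol logic never touches device registers directly. */
typedef struct {
    int  (*init)(void);
    int  (*set_channel)(uint8_t channel);
    int  (*transmit)(const uint8_t *frame, size_t len);
    int  (*receive)(uint8_t *buf, size_t maxlen);   /* returns bytes read or < 0 */
    void (*enter_low_power)(void);
} radio_ops_t;

/* Portable MAC logic: written once, compiled against any radio_ops_t
   implementation (one per transceiver/board combination). */
int mac_send_frame(const radio_ops_t *radio, uint8_t channel,
                   const uint8_t *frame, size_t len)
{
    if (radio->set_channel(channel) != 0)
        return -1;
    return radio->transmit(frame, len);
}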

12.
13.
Two current trends in real-time and embedded systems are multiprocessor architectures and partitioning technology, which enables several isolated applications with different criticality levels to share the same computer. This paper presents a real-time platform for multiprocessor and partitioned systems in which communication requirements are also considered. The paper describes the adaptation of MaRTE OS (a monoprocessor real-time operating system) to the XtratuM hypervisor for the multiprocessor Intel x86 architecture. This adaptation makes two contributions that ease the development of future mixed-criticality applications: first, it integrates hypervisor technology and fully partitioned scheduling in a multiprocessor environment; second, it provides the basis for interconnecting partitioned and non-partitioned applications via a homogeneous communication subsystem. Copyright © 2016 John Wiley & Sons, Ltd.

14.
Diminishing returns from increased clock frequencies and instruction-level parallelism have forced computer architects to adopt architectures that exploit wider parallelism through multiple processor cores. While emerging many-core architectures have progressed at a remarkable rate, concerns arise regarding the performance and productivity of the numerous parallel-programming tools for application development. Development of parallel applications on many-core processors often requires developers to familiarize themselves with the unique characteristics of a target platform while attempting to maximize performance and maintain the correctness of their applications. The family of partitioned global address space (PGAS) programming models comprises the current state of the art in balancing performance and programmability. One such PGAS approach is SHMEM, a lightweight, shared-memory programming library that has demonstrated high performance and productivity potential for parallel-computing systems with distributed-memory architectures. In this paper, we present the research, design, and analysis of a new SHMEM infrastructure specifically crafted for low-level PGAS on modern and emerging many-core processors featuring dozens of cores and more. Our approach (with a new library known as TSHMEM) is investigated and evaluated atop two generations of Tilera architectures, which are among the most sophisticated and scalable many-core processors to date, and is intended to enable similar libraries atop other emerging architectures. In developing TSHMEM, we explore design decisions and their impact on parallel performance for the Tilera TILE-Gx and TILEPro many-core architectures, and then evaluate the designs and algorithms within TSHMEM through microbenchmarking and application studies with other communication libraries. Our results with the barrier primitives provided by the Tilera libraries show dissimilar performance between the TILE-Gx and TILEPro; TSHMEM's barrier design therefore takes an alternative approach and leverages the on-chip mesh network to provide consistent low-latency performance. In addition, our experiments with TSHMEM show that naive collective algorithms consistently outperformed linear distributed collective algorithms when executed in an SMP-centric environment. Leveraging these insights in the design of TSHMEM, our approach outperforms the OpenSHMEM reference implementation, achieves similar or better performance than OpenMP and OSHMPI atop MPICH, and supports similar libraries in delivering high-performance parallel computing to emerging many-core systems. Copyright © 2015 John Wiley & Sons, Ltd.
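TSHMEM's own interface is not reproduced in the abstract; the SHMEM programming style it implements can, however, be sketched with standard OpenSHMEM calls: symmetric allocation, a one-sided put into a neighbour's memory, and barriers. A minimal sketch, assuming an OpenSHMEM-compliant library:

#include <stdio.h>
#include <shmem.h>

int main(void)
{
    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Symmetric allocation: the same remotely accessible buffer on every PE. */
    long *inbox = (long *) shmem_malloc(sizeof(long));
    *inbox = -1;
    shmem_barrier_all();

    /* One-sided put: deposit our rank into the next PE's inbox. */
    long msg = me;
    shmem_long_put(inbox, &msg, 1, (me + 1) % npes);

    shmem_barrier_all();
    printf("PE %d received %ld\n", me, *inbox);

    shmem_free(inbox);
    shmem_finalize();
    return 0;
}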

15.
A homogeneous system structure is proposed that enhances modularity and flexibility and facilitates the (further) development of large-scale software systems in a major-industry environment. In our opinion, it is an effective means of countering the inherent increase in software entropy when (further) developing existing large-scale software systems, thereby substantially cutting production costs. It can be applied generally to both new and existing systems, whether application programs or operating systems, promotes the parallel use of different programming paradigms and various implementation languages, and offers the option of either redesigning parts or introducing additional parts in stages based on a more modern technology. The proposed system structure is compared and contrasted with other architectures such as CORBA, and it is shown that it may be regarded as an embellishment of the CORBA architecture for the internal structuring or restructuring of possibly distributed software systems. So far it has been used in four releases of the BS2000/OSD operating system with very positive results. Dependencies between the various entities, which we call 'subsystems', both during the (further) development process and during dynamic execution, are minimized and well regulated. The subsystems may be loaded on demand during the session by a system authority. Every interface in the system is classified according to its permitted scope of use. For interfaces between subsystems, a uniform and standardized technique is introduced that uses the same format for all implementation languages at both the source- and binary-code level. This technique goes beyond the provisions of other architectures and actually achieves considerable rationalization. Copyright © 1999 John Wiley & Sons, Ltd.

16.
Many tools designed to help programmers view and manipulate source code exploit the formal structure of the programming language. Language-based tools use information derived via linguistic analysis to offer services that are impractical for purely text-based tools. In order to be effective, however, language-based tools must be designed to account properly for the documentary structure of source code: a structure that is largely orthogonal to the linguistic structure but no less important. Documentary structure includes, in addition to the language text, all extra-lingual information added by programmers for the sole purpose of aiding the human reader: comments, white space, and the choice of names. Largely ignored in the research literature, documentary structure occupies a central role in the practice of programming. An examination of the documentary structure of programs leads to a better understanding of the requirements for tool architectures.

17.
Although constraint programming has attracted much attention in logic programming, the importance of integrating constraints with imperative programming is now widely acknowledged. In particular, in artificial intelligence domains, the benefits of merging constraint-based programming with object-oriented paradigms appear even more attractive because of the lack of 'pure' AI languages supporting structured representations. This work presents an extension of the Java language towards finite domain constraint programming. This extension has been made possible by a high-level approach to low-level resource management: the sleeper mechanism. As practical results, this paper shows how Java programmers can develop meaningful applications in which finite domain constraints are used extensively, as in the design of visual, interactive user-interface environments in a client-server architecture. Copyright © 1999 John Wiley & Sons, Ltd.

18.
Because multicore CPUs have become the standard with all major hardware manufacturers, it becomes increasingly important for programming languages to provide programming abstractions that can be mapped effectively onto parallel architectures. Stream processing is a programming paradigm where computations are expressed as independent actors that communicate via FIFO data channels. The coarse-grained parallelism exposed in stream programs facilitates such an efficient mapping of actors onto the underlying multicore hardware. We propose a stream-parallel programming abstraction that extends object-oriented languages with stream-programming facilities. StreamPI consists of a class hierarchy for actor specification together with a language-independent runtime system that supports the execution of stream programs on multicore architectures. We show that the language-specific part of StreamPI, i.e., the class hierarchy, can be implemented as a library-level programming language extension. A library-level extension has the advantage that an existing programming language implementation need not be touched. Legacy code can be mixed with a stream-parallel application, and the use of sequential legacy code with actors is supported. Unlike previous approaches, StreamPI allows dynamic creation and subsequent execution of stream programs. StreamPI actors are typed. Type safety is achieved through type checks at stream graph creation time. We have implemented StreamPI's language-independent runtime system and language interfaces for Ada 2005 and C++ for Intel multicore architectures. We have evaluated StreamPI for up to 16 cores on a two-CPU 8-core Intel Xeon X7560 server, and we provide a performance comparison with StreamIt (Gordon et al., International Conference on Architectural Support for Programming Languages and Operating Systems, 2006), which is the de facto standard for stream-parallel programming. Although our approach provides greater programming flexibility than StreamIt, the performance of StreamPI compares favorably to the static compilation model of StreamIt.
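StreamPI itself is a class hierarchy for Ada 2005 and C++, which the abstract does not show; purely as a sketch of the underlying stream-processing paradigm, the C fragment below models one actor reading from an input FIFO channel and writing to an output FIFO, with no state shared between actors. The names and the sequential driver are our own assumptions.

#include <stdio.h>

#define CAP 16

/* Single-producer/single-consumer FIFO channel between two actors. */
typedef struct {
    int buf[CAP];
    int head, tail;           /* head: next read, tail: next write */
} channel_t;

static int chan_push(channel_t *c, int v)
{
    if ((c->tail + 1) % CAP == c->head) return 0;   /* full */
    c->buf[c->tail] = v;
    c->tail = (c->tail + 1) % CAP;
    return 1;
}

static int chan_pop(channel_t *c, int *v)
{
    if (c->head == c->tail) return 0;               /* empty */
    *v = c->buf[c->head];
    c->head = (c->head + 1) % CAP;
    return 1;
}

/* An actor: consumes items from its input channel and produces items on its
   output channel; it owns no state shared with other actors. */
static void square_actor(channel_t *in, channel_t *out)
{
    int v;
    while (chan_pop(in, &v))
        chan_push(out, v * v);
}

int main(void)
{
    channel_t source = {0}, sink = {0};
    for (int i = 1; i <= 5; i++) chan_push(&source, i);
    square_actor(&source, &sink);   /* run sequentially here; a runtime would map actors to cores */
    for (int v; chan_pop(&sink, &v); ) printf("%d ", v);
    printf("\n");
    return 0;
}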

19.
20.
In this paper, we deal with building parallel programs from sequential application code and generic components that provide specific functionality for parallelization, such as load balancing or fault tolerance. We describe an architectural approach employing aspect-oriented programming to assemble arbitrary object-oriented components. Several non-trivial crosscutting concerns arising from parallelization are addressed in the light of different applications, which are representative of the most common types of parallelism. In particular, we demonstrate how aspect-oriented techniques allow us to leave all existing code untouched. We evaluate and compare our approach with its counterparts in conventional object-oriented programming. Copyright © 2008 John Wiley & Sons, Ltd.
