首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
In this paper, we study several issues related to the medium grain dataflow model of execution. We present bottom-up compilation of medium grainclusters from a fine grain dataflow graph. We compare thebasic block and thedependence sets algorithms that partition dataflow graphs into clusters. For an extensive set of benchmarks we assess the average number of instructions in a cluster and the reduction in matching operations compared with fine grain dataflow execution. We study the performance of medium grain dataflow when several architectural parameters, such as the number of processors, matching cost, and network latency, are varied. The results indicate that medium grain execution offers a good speedup over the fine grain model, that it is scalable, and tolerates network latency and high matching costs well. Medium grain execution can benefit from a higher output bandwidth of a processor and fainally, a simple superscalar processor with an issue rate of two is sufficient to exploit the internal parallelism of a cluster. This work is supported in part by NSF Grants CCR-9010240 and MIP-9113268.  相似文献   

2.
《Parallel Computing》2014,40(3-4):1-33
There has been a renewed interest in dataflow computing models in recent years of technology scaling. Potentiality of exploiting huge parallelism, with the expense of low power, simpler circuit, less silicon area, is the main characteristic of a dataflow model. Growing trends in housing large number of functional units in a single chip, making use of local clocks, reducing energy consumptions, avoiding global wires are the main reasons behind the resurgence of dataflow models. To program a dataflow machine, new architectures suggest imperative languages rather than functional type dataflow languages or parallel languages because this is the right way to make the new architectures popular among the general community. Although for several decades scientists have been working on how imperative languages can be used in dataflow models efficiently, there is no systematic review on those works. Existing reviews on dataflow paradigm mainly focus on the architectures. Although few papers review programming languages of dataflow architectures, their discussions are limited to only dataflow languages and visual programming languages which are fundamentally different from imperative languages. In this paper, we conduct a systematic review on those works that attempt to provide a way to use imperative languages in any type of dataflow architectures. Our survey of compilers and related architectures cover the aspects like translation mechanisms of program construct, their optimization techniques, memory ordering methods, program allocation and scheduling and special architectural features. We also present some of our observations and future research directions obtained by exploring the literature.  相似文献   

3.
相关路径静态分析中协同式逆向推理方法   总被引:1,自引:1,他引:0  
郭曦  王盼 《软件学报》2015,26(1):1-13
相关路径生成,是程序动态分析中的一种重要方法.通过对目标执行路径的获取和分析来生成与其相关的近邻执行路径,在程序行为特征分析、编译优化和调试等研究方向有重要的作用.现有的方法主要通过改变路径节点序列来生成近邻的路径集合,由于缺乏关键节点的路径引导信息,导致生成大量冗余或者无效的路径集合.提出采用协同式逆向分析的近邻路径生成方法,针对目标路径的后置条件,采用逆向符号分析方法产生程序各个基本块的前置条件作为执行路径的引导信息.同时,通过调整距离因子k的取值,可以有针对性地生成与目标路径的编辑距离不超过k的近邻路径集合.实验结果表明:与现有方法相比,该方法在准确性和效率方面有明显的优势.  相似文献   

4.
This paper presents an extended architecture and a scheduling algorithm for a dataflow computer aimed at real-time processing. From the real-time processing point of view, current dataflow computers have several problems which stem from their hardware mechanisms for scheduling instructions based on data synchronization. This mechanism extracts as many eligible instructions as possible for execution of a program, then executes them in parallel. Hence, the computation in a dataflow computer is generally difficult to interrupt and schedule using software. To realize a controllable dataflow computation, two basic mechanisms are introduced for serializing concurrent processes and interrupting the execution of a process. A parallel and distributed algorithm for the scheduler is presented, with these two mechanisms, which controls and decides state transitions and execution order of the processes based on priority and execution depth, while still maintaining the number of the running state processes at a preferred value. To gear the scheduler algorithm to meet one of the requirements for real-time processing, such as time-constrained computing, a data-parallel algorithm for selection of the user-process with the current highest priority in O (x log x n) time is proposed, where n is the number of priority levels.  相似文献   

5.
The dataflow program graph execution model, or dataflow for short, is an alternative to the stored-program (von Neumann) execution model. Because it relies on a graph representation of programs, the strengths of the dataflow model are very much the complements of those of the stored-program one. In the last thirty or so years since it was proposed, the dataflow model of computation has been used and developed in very many areas of computing research: from programming languages to processor design, and from signal processing to reconfigurable computing. This paper is a review of the current state-of-the-art in the applications of the dataflow model of computation. It focuses on three areas: multithreaded computing, signal processing and reconfigurable computing.  相似文献   

6.
The software model checker Blast   总被引:2,自引:0,他引:2  
Blast is an automatic verification tool for checking temporal safety properties of C programs. Given a C program and a temporal safety property, Blast either statically proves that the program satisfies the safety property, or provides an execution path that exhibits a violation of the property (or, since the problem is undecidable, does not terminate). Blast constructs, explores, and refines abstractions of the program state space based on lazy predicate abstraction and interpolation-based predicate discovery. This paper gives an introduction to Blast and demonstrates, through two case studies, how it can be applied to program verification and test-case generation. In the first case study, we use Blast to statically prove memory safety for C programs. We use CCured, a type-based memory-safety analyzer, to annotate a program with run-time assertions that check for safe memory operations. Then, we use Blast to remove as many of the run-time checks as possible (by proving that these checks never fail), and to generate execution scenarios that violate the assertions for the remaining run-time checks. In our second case study, we use Blast to automatically generate test suites that guarantee full coverage with respect to a given predicate. Given a C program and a target predicate p, Blast determines the program locations q for which there exists a program execution that reaches q with p true, and automatically generates a set of test vectors that cause such executions. Our experiments show that Blast can provide automated, precise, and scalable analysis for C programs.  相似文献   

7.
We explore how formal methods and tools of the verification trade could be used for malware detection and analysis. In particular, we propose a new approach to learning and generalizing from observed malware behaviors based on tree automata inference. Our approach infers k-testable tree automata from system call dataflow dependency graphs. We show how inferred automata can be used for malware recognition and classification.  相似文献   

8.
While the dataflow execution model can potentially uncover all forms and levels of parallelism in a program, in its traditional fine grain form it does not exploit any form of locality. Recent evidence indicates that the exploitation of locality in dataflow programs could have a dramatic impact on performance. The current trend in the design of dataflow processors suggests a synthesis of traditional nonstrict fine grain instruction execution and strict coarse grain execution in order to exploit locality. While an increase in instruction granularity favors the exploitation of locality within a single execution thread, the resulting grain size may increase latency among execution threads. We define fine grain intrathread locality as a dynamic measure of instruction level locality and quantify it using a set of numeric and nonnumeric benchmarks. The results point to a very large degree of intrathread locality and a remarkable uniformity and consistency of the distribution of thread locality across a wide variety of benchmarks. As the execution is moved to a coarser granularity it can result in an increase of the input latency of operands that would have a detrimental effect on performance. We evaluate the resulting latency incurred through the partitioning of fine grain instructions into coarser grain threads. We define the concept of a cluster of fine grain instructions to quantify coarse grain input and output latencies. The results of our experiments offer compelling evidence that a coarse grain execution outperforms a fine grain grain one on a significant number of numeric codes. These results suggest that the effects of increased instruction granularity on latency is minimal for a high percentage of the measured codes, and in large part is offset by available intrathread locality. Furthermore, simulation results indicate that strict or nonstrict data structure access does not change the basic cluster characteristics.  相似文献   

9.
The execution performance of an information gathering plan can suffer significantly due to remote I/O latencies. A streaming dataflow model of execution addresses the problem to some extent, exploiting all natural opportunities for parallel execution, as allowed by the data dependencies in a plan. Unfortunately, plans that integrate information from multiple sources often use the results of one operation as the basis for forming queries to a subsequent operation. Such cases require sequential execution, an inefficiency that can erase prior gains made through techniques like streaming dataflow. To address this problem, we present a technique called speculative plan execution, an out-of-order method that capitalizes on knowledge gained from prior executions as a means for overcoming remaining data dependencies between plan operators. Our approach inserts additional plan operators that generate and confirm speculative results, while preserving the safety and fairness of overall execution. To increase the utility of speculative execution, we propose a method of value prediction that combines caching with the more effective and space-efficient techniques of classification and transduction. We present experimental results that demonstrate how the performance of information gathering plans can benefit from speculative execution and how its overall utility can be increased through our hybrid method of value prediction.  相似文献   

10.
Constituent tasks of modern day Embedded Streaming Applications (ESAs), such as engine control systems, multimedia and software defined radios often exhibit execution behaviors that do not conform to conventional task models. ESAs consist of iterative, pipelined sequences of tasks that are conditioned by intra- and inter-iteration dependencies, and often have strict throughput and latency requirements. We model ESAs as dataflow graphs, where actors represent computational units, and directed edges represent communication channels between actors. Due to practical constraints like cost-effectiveness, power consumption and chip-area, multiple ESAs are run on a shared (multi-processor) platform. Thus rigorous timing analysis is required to verify whether individual ESAs meet their respective timing requirements.We look at response modeling, a compositional timing analysis approach wherein the local worst-case influence of runtime scheduling is represented within the constructs provided in dataflow. These local representations (called response models) can be composed together to construct a global understanding of an ESA’s worst-case execution which is then used to verify whether its real-time requirements are met. This paper proposes a generic response modeling technique for runtime scheduling of ESAs. We focus on preemptive Fixed Priority Scheduling (FPS) but also demonstrate that we can apply our technique to a wide range of runtime schedulers. In our experiments, we present academic and industrial case-studies that highlight the effectiveness of our approach in the timing analysis of ESAs with unconventional execution behavior.  相似文献   

11.
This paper describes a software environment that allows the programming and configuration of a heterogeneous multiprocessor architecture consisting of specialized hardware and a general purpose trasputer net. Run-time-behavior estimation is done at a compile time by using directed dataflow graphs to represent the program and a database containing all the relevant information about functions. The system offers the implementations of functions running on different hardware platforms, static configuration and scheduling, and integrated performance measurement. We show how programmers can handle such a system without any deeper knowledge of the underlying specialized hardware.  相似文献   

12.
在高性能计算领域,数据流是一类重要的计算结构,也在很多实际场景表现出很好的性能和适用性。在数据流计算模式中,程序是以数据流图来表示的,数据流计算中一个关键的问题是如何将数据流图映射到多个执行单元上。通过分析现有数据流结构的指令映射方法及其不足,提出了基于数据流结构的新型指令映射优化方法。主要是根据多地址共享数据包的特性对指令映射方法进行优化,延迟多地址共享数据路由包的拆分,减少网络拥堵。  相似文献   

13.
Dataflow programs are widely used. Each program is a directed graph where nodes are computations and edges indicate the flow of data. In prior work, we reverse-engineered legacy dataflow programs by deriving their optimized implementations from a simple specification graph using graph transformations called refinements and optimizations. In MDE speak, our derivations were PIM-to-PSM mappings. In this paper, we show how extensions complement refinements, optimizations, and PIM-to-PSM derivations to make the process of reverse engineering complex legacy dataflow programs tractable. We explain how optional functionality in transformations can be encoded, thereby enabling us to encode product lines of transformations as well as product lines of dataflow programs. We describe the implementation of extensions in the \(\mathtt{ReFlO}\) tool and present two non-trivial case studies as evidence of our work’s generality.  相似文献   

14.
两倍缓冲是有效机制隐藏在在薄片上和离开薄片记忆之间的数据转移的潜伏。在 dataflow 建筑学,因为 dataflow 加速器的重复充满并且排干,然而,交换二在许多瓦减少的执行期间缓冲性能。在这个工作,我们为 dataflow 建筑学建议连续双的缓冲机制。没有停止通过在 dataflow 建筑学优化控制逻辑处理元素的执行,建议不停的机制把瓦分到处理元素数组。而且,我们建议一个工作流节目与连续双的缓冲机制合作。在控制逻辑上并且在工作流节目上的优化以后,充满并且排干数组需要越过属于一样的 dataflow 图的所有瓦的执行被做仅仅一次。试验性的结果证明没有优化,为 dataflow 建筑学的建议双缓冲机制在那上完成 16.2% 平均效率改进。  相似文献   

15.
ChARM is a simulation tool for tuning ARM-based embedded systems that include cache memories. ChARM provides a parametric, trace-driven simulation for tuning system configuration. A designer can observe performance while varying the timing, the architectural features, and the management policies of the system components. Designers can therefore evaluate the execution time of the program, the time spent in memory accesses, miss ratio, code miss ratio, and data miss ratio, and the number of burst-read operations. They can also evaluate the number of write operations for write-through cache models and burst-write operations for copy-back cache models. finally, ChARM's program locality analysis illustrates the sequentiality, temporality, and loops of a program in easy-to-read three dimensional graphs. These graphs, together with the graphs showing the distribution of the replacement conflicts in cache, help designers understand how a program works and how it stresses the memory hierarchy  相似文献   

16.
We present a method for profiling programs that are written using domain-specific languages. Instead of reporting execution in terms of implementation details as in most existing profilers, our method operates at the level of the problem domain. Program execution generates a stream of events that summarises the execution in terms of domain concepts and operations. The events enable us to construct a hierarchical model of the execution. A flexible reporting system summarises the execution along developer-chosen model dimensions. The result is a flexible way for a developer to explore the execution of their program without requiring any knowledge of the domain-specific language implementation.These ideas are embodied in a new profiling library called dsprofile that is independent of the problem domain so it has no specific knowledge of the data and operations that are being profiled. We illustrate the utility of dsprofile by using it to profile programs that are written using our Kiama language processing library. Specifically, we instrument Kiama's attribute grammar and term rewriting domain-specific languages to use dsprofile to generate events that report on attribute evaluation and rewrite rule application. Examples of typical language processing tasks show how domain-specific profiling can help to diagnose problems in Kiama-based programs without the developer needing to know anything about how Kiama is implemented.  相似文献   

17.
This paper presents an unusually simple approach to dynamic dataflow execution, called the Explicit Token Store (ETS) architecture, and its current realization in Monsoon. The essence of dynamic dataflow execution is captured by a simple transition on state bits associated with storage local to a processor. Low-level storage management is performed by the compiler in assigning nodes to slots in an activation frame, rather than dynamically in hardware. The processor is simple, highly pipelined, and quite general. There is exactly one instruction executed for each action on the dataflow graph. Thus, the machine-oriented ETS model provides new insight into the real cost of direct execution of dataflow graphs.  相似文献   

18.
Software architecture contains, in addition to its structural part, interaction patterns that can be regarded as part of the architectural solution of the system. The interaction patterns define architecturally significant behavior of the software system. In this paper we propose a visual modeling language, behavioral profiles, for specifying architecturally significant behavioral rules for an application. The language is built on the Unified Modeling Language (UML), which is a visual language widely used in software development. We show how behavioral profiles can be used to support software designers in creating behavioral models that conform to some predefined rules and for ensuring that an application behaves correctly with respect to the rules given in the profiles. A tool called Bebop was built to support software engineers in behavioral profile‐based design and analysis of program behavior. To evaluate the approach and the tools in different application domains, they are utilized in three cases. The size of the applications used in the cases varies from small to quite large software systems, and from academic to industrial ones. The examples demonstrate how the approach presented can be used in practice for different steps in a software engineering process. The examples cover specializing an application framework and monitoring the program execution in run‐time. In addition, they show how behavioral profiles can be used to support guided program comprehension and to validate program execution by analyzing execution traces. Copyright © 2010 John Wiley & Sons, Ltd.  相似文献   

19.
20.
This paper presents a new dataflow graph model, where only data tokens are allowed to flow. First we introduce a High-Level Dataflow System (HLDS) to describe a formal dataflow graph model, then we present a homogeneous HLDS (hHLDS) that formally describes our proposal. In this proposal the dataflow graph is obtained by employing only actors with homogeneous I/O conditions, that is, each actor, which executes an elemental operation, is characterised by having one output and two input arcs. Even though no control tokens are allowed, i.e. no T-gate, merge, and switch actors are present in this model, it is always possible to obtain dataflow graphs, which represent any programming structure and whose behaviour is well-behaved. As homogeneous I/O conditions are a severe restriction to represent the flow of a computation and the token flow in such dataflow graphs is completely asynchronous, proof is given to guarantee their determinacy. The main advantage of this representation is that it maps directly to hardware through a one-to-one correspondence between actors of the model and Functional Units of a dataflow machine.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号