Similar Literature
20 similar documents retrieved (search time: 31 ms).
1.
In this paper, we introduce an analytical technique based on queueing networks and Petri nets for the performance analysis of dataflow computations executed on the Manchester machine. The technique is also applicable to the analysis of parallel computations on multiprocessors. We characterize the parallelism in dataflow computations through four parameters: the minimum parallelism, the maximum parallelism, the average parallelism, and the variance in parallelism. Through detailed investigation of our analytical models, we observe that the average parallelism is a good characterization of dataflow computations only as long as the variance in parallelism is small. Significant differences in performance measures result when the variance in parallelism is comparable to or higher than the average parallelism.
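
As a rough illustration of the four-parameter characterization (a minimal sketch, not the paper's queueing/Petri-net models), the snippet below computes the minimum, maximum, average, and variance of a hypothetical parallelism profile:

```python
# Minimal sketch: the four-parameter characterization of a parallelism
# profile (number of concurrently enabled operations per time step).
# The profile values below are illustrative, not taken from the paper.

def characterize_parallelism(profile):
    """Return (min, max, average, variance) of a parallelism profile."""
    n = len(profile)
    p_min = min(profile)
    p_max = max(profile)
    p_avg = sum(profile) / n
    p_var = sum((p - p_avg) ** 2 for p in profile) / n
    return p_min, p_max, p_avg, p_var

if __name__ == "__main__":
    profile = [1, 4, 16, 32, 16, 8, 2, 1]          # hypothetical execution trace
    print(characterize_parallelism(profile))       # (1, 32, 10.0, 102.75)
```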

2.
Issues affecting the performance of dataflow computers at the machine and language levels are explored. It is suggested that performance is dictated by the nature and the means of identification, distribution, and control of workload in the hardware system. Dataflow is an asynchronous concurrent notation based on fine-grain message passing in graphical programs. Dataflow machines comprise multiple processing elements and structure store modules connected via a packet-based switching network. Workload takes the form of fine-grain data packets which trigger instruction-level activity in the various components of the hardware architecture. Workload is identified by a compiler for a high-level, single-assignment language and is distributed across the hardware components dynamically at run time. The amount of work at any instant can be controlled by a parallelism "throttle". The paper studies the performance of one example of a dataflow computer, the Manchester Dataflow Machine (MDFM).
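
As a hedged illustration of the throttling idea only (not the MDFM's actual mechanism), the sketch below caps the number of work packets that may be in flight at once; all names and the packet format are hypothetical:

```python
# Hedged sketch of a parallelism "throttle": a counter that limits how many
# work packets may be active at once. This illustrates the concept, not the
# MDFM's throttle implementation.
import threading

class Throttle:
    def __init__(self, max_active):
        self._slots = threading.Semaphore(max_active)

    def submit(self, packet, process):
        self._slots.acquire()            # block if too many packets in flight
        try:
            return process(packet)
        finally:
            self._slots.release()        # free a slot for the next packet

throttle = Throttle(max_active=4)        # hypothetical limit
result = throttle.submit({"op": "add", "args": (1, 2)},
                         lambda p: sum(p["args"]))
print(result)                            # 3
```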

3.
A 3D iteration space visualizer (ISV) is presented to analyze the parallelism in loops and to find loop transformations which enhance that parallelism. Using automatic program instrumentation, the iteration space dependency graph (ISDG) is constructed, which shows the exact data dependencies of arbitrarily nested loops. Various graphical operations, such as rotation, zooming, clipping, coloring, and filtering, permit a detailed examination of the dependence relations. Furthermore, an animated dataflow execution shows the maximal parallelism, and parallel loops are indicated automatically by an embedded data dependence analysis. In addition, the user may discover and indicate additional parallelism, for which a suitable unimodular loop transformation is calculated and verified. The ISV has been applied to parallelizing algorithmic kernel programs, a computational fluid dynamics (CFD) simulation code, the detection of statement-level parallelism, and loop variable privatization. These applications show that the visualizer is a versatile and easy-to-use tool for the high-performance application programmer.
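
For intuition about what calculating and verifying a unimodular loop transformation involves (a minimal sketch under simplifying assumptions, not the ISV's implementation), the following snippet transforms the dependence distance vectors of a small loop nest, checks legality, and reports which loop levels still carry a dependence:

```python
# Hedged sketch of the dependence analysis such a tool performs: apply a
# unimodular matrix to the dependence distance vectors of a loop nest,
# check legality, and report which loop levels carry dependences.
import numpy as np

def carried_levels(distances):
    """Levels (0-based) at which some dependence is carried."""
    levels = set()
    for d in distances:
        for k, dk in enumerate(d):
            if dk != 0:
                levels.add(k)    # carried at the outermost nonzero level
                break
    return levels

def is_legal(distances):
    """Every distance vector must remain lexicographically positive."""
    return all(next((x > 0 for x in d if x != 0), True) for d in distances)

def transform(distances, U):
    return [tuple(int(x) for x in U @ np.array(d)) for d in distances]

# Illustrative wavefront stencil: a[i][j] depends on a[i-1][j] and a[i][j-1].
dists = [(1, 0), (0, 1)]
U = np.array([[1, 1], [1, 0]])   # skew + interchange, |det| = 1 (unimodular)
new = transform(dists, U)        # -> [(1, 1), (1, 0)]
print(is_legal(new))             # True: transformation is legal
print(carried_levels(new))       # {0}: the inner loop carries no dependence
```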

4.
Real-time computer vision systems often make use of dedicated image processing hardware to perform the pixel-oriented operations typical of early vision. This type of hardware is notoriously difficult to program, limiting the types of experiments that can be performed and posing a serious obstacle to research progress. This paper describes a pair of programming tools that we have developed to simplify the task of building real-time early vision systems using special-purpose hardware. The system allows users to describe computations as coarse-grained dataflow graphs constructed with an interactive graphical tool. At initialization time it compiles these graphs into efficient executable programs for the underlying hardware. The system has been implemented on a popular commercial pipelined image processor. We describe the computational model that the system supports, the facilities it provides for building real-time vision applications, and the algorithms used to generate effective execution schedules for the target machine.

5.
6.
Distributed Memory Multicomputers (DMMs), such as the IBM SP-2, the Intel Paragon, and the Thinking Machines CM-5, offer significant advantages over shared memory multiprocessors in terms of cost and scalability. Unfortunately, exploiting all the available computational power in these machines requires a tremendous programming effort from users, which creates a need for sophisticated compiler and run-time support for distributed memory machines. In this paper, we explore a new compiler optimization for regular scientific applications: the simultaneous exploitation of task and data parallelism. Our optimization is implemented as part of the PARADIGM HPF compiler framework we have developed. The intuitive idea behind the optimization is the use of task parallelism to control the degree of data parallelism of individual tasks. The reason this improves performance is that data parallelism yields diminishing returns as the number of processors is increased. By controlling the number of processors used for each data parallel task in an application and by executing these tasks concurrently, we make program execution more efficient and, therefore, faster.
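
A small numeric illustration of the diminishing-returns argument (a sketch using an Amdahl-style cost model with hypothetical parameters, not the PARADIGM cost model):

```python
# Hedged illustration: with an Amdahl-style serial fraction f, data
# parallelism gives diminishing returns, so running two independent tasks
# concurrently on P/2 processors each can beat running them back to back
# on all P processors.

def task_time(work, f, procs):
    """Amdahl-style time for one data-parallel task on `procs` processors."""
    return f * work + (1.0 - f) * work / procs

P, work, f = 16, 1.0, 0.10          # hypothetical machine and task parameters

pure_data = 2 * task_time(work, f, P)        # tasks run one after another on all P
task_and_data = task_time(work, f, P // 2)   # both tasks run concurrently on P/2

print("pure data parallelism :", round(pure_data, 4))       # 0.3125
print("task + data parallelism:", round(task_and_data, 4))  # 0.2125
```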

7.
Graph computing has become a mainstream application in big-data processing, and dedicated hardware acceleration can significantly improve its performance and energy efficiency. Writing and verifying hardware code, however, is notoriously time-consuming. Although general-purpose high-level synthesis (HLS) systems allow users to generate hardware structures automatically from high-level language (e.g., C) features, they still lack effective support for the parallelism and memory-access behavior of irregular algorithms such as graph computing, leading to unsatisfactory synthesis results and low efficiency. This paper proposes an efficient HLS method for graph computing. Taking into account the nested loops, random memory accesses, data conflicts, and power-law distributions characteristic of graph algorithms, it adopts a dataflow architecture to build efficient parallel pipelines and to keep the processing elements load-balanced. Through the provided programming primitives, the method transforms general graph algorithms into a modular dataflow intermediate representation, which is then mapped onto parameterized hardware templates. An implementation on a Xilinx Virtex UltraScale+ XCVU9P verifies the correctness of the method, and experiments with different types of graph algorithms on multiple datasets show performance improvements of 7.9x to 30.6x over the widely used Spatial HLS system.

8.
Lee, B.; Hurson, A. R. Computer, 1994, 27(8): 27-39.
Contrary to initial expectations, implementing dataflow computers has presented a monumental challenge. Now, however, multithreading offers a viable alternative for building hybrid architectures that exploit parallelism. The eventual success of dataflow computers will depend on their programmability. Traditionally, they have been programmed in languages such as Id and SISAL (Streams and Iterations in a Single Assignment Language) that use functional semantics. These languages reveal high levels of concurrency and translate onto dataflow machines and conventional parallel machines via the Threaded Abstract Machine (TAM). However, because their syntax and semantics differ from imperative counterparts such as Fortran and C, they have been slow to gain acceptance in the programming community. An alternative is to explore the use of established imperative languages to program dataflow machines. The difficulty, however, lies in analyzing data dependencies and extracting parallelism from source code that contains side effects. More research is therefore needed to develop compilers for conventional languages that can produce parallel code comparable to that of parallel functional languages.

9.
This paper presents an interactive Java software platform which enables users to easily create advanced robotic applications together with Computer Vision processing. This novel tool is composed of two layers: (1) Easy Java Simulations (EJS), an open-source tool which provides support for creating applications with a full 2D/3D interactive graphical interface, and (2) EjsRL, a high-level Java library specifically designed for EJS which provides a complete functional framework for the modeling and simulation of arbitrary serial-link manipulators, Computer Vision algorithms, and remote operation. The combination of both components forms a software architecture that brings a large number of functionalities together in a single platform for developing complex simulations in the Robotics and Computer Vision fields. In addition, the paper shows its successful application to virtual and remote laboratories, web-based resources that enhance the accessibility of experimental setups for education and research.

10.
The paper presents a dataflow execution model, DIALOG, for logic programs which operates on an intermediate virtual machine. The virtual machine is granulated at clause argument level to exploit argument parallelism through unification. The model utilises a new variable binding scheme that eliminates dereference operations for accessing variables, and therefore supports OR-parallelism in the highly distributed dataflow environment. The model has been implemented in Occam. A conventional dataflow architecture in support of the model has been simulated as a testbed for the evaluation. The simulation indicates some encouraging results and suggests future improvements.

11.
Although dataflow computers have many attractive features, skepticism exists concerning their efficiency in handling arrays (vectors) in high performance scientific computation. This paper outlines an efficient implementation scheme for arrays in applicative languages (such as VAL and SISAL) based on the principles of dataflow software pipelining. It illustrates how the fine-grain parallelism of the dataflow approach can effectively handle the large amounts of data structured in applicative array operations. This is done through dataflow software pipelining between pairs of code blocks which act as producer and consumer of array values. To make effective use of the pipelined code mapping scheme, a compiler needs information concerning the overall program structure as well as the structure of each code block. An applicative language provides a basis for such analysis.

The program transformation techniques described here are developed primarily for the computationally intensive part of a scientific numerical program, which is usually formed by one or a few clusters of acyclic connected code blocks. Each code block defines an array value from several input arrays. We outline how mapping decisions of arrays can be based on a global analysis of attributes of the code blocks. We emphasize the role of overall program structure and the strategy of global optimization of the machine code structure. The structure of a proposed dataflow compiler based on the scheme described in this paper is outlined.
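
As a hedged, language-level sketch of the producer-consumer pipelining idea (Python generators stand in for the fine-grain dataflow pipeline; this is not the paper's mapping scheme):

```python
# Hedged sketch of dataflow software pipelining between two code blocks:
# the producer emits array elements as they are computed and the consumer
# starts working on them immediately, instead of waiting for the whole
# array to be built.

def producer(n):
    """Code block that defines an array value, one element at a time."""
    for i in range(n):
        yield i * i                  # element flows downstream as soon as it is ready

def consumer(elements):
    """Downstream code block consuming elements as they arrive."""
    total = 0
    for x in elements:
        total += x                   # overlaps with the producer's computation
    return total

print(consumer(producer(8)))         # 140, computed in pipelined fashion
```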


12.
To effectively investigate the mechanical performance of microstructure-based layered composites, an object-oriented software package with an interactive graphical user interface has been developed. This software, named PCLab, is able to analyze microstructure evolution and mechanical performance by both Monte Carlo (MC) simulation and the finite element method (FEM). The software integrates preprocessors, solvers, and postprocessors. Several examples are tested to explore the functionality of the software package. They show that PCLab, with its user-friendly graphical interface, provides an efficient tool for faster material analysis, design, and application, as well as a flexible, robust platform for future extension to multi-physics materials research.

13.
Dataflow models are free of side effects and have no notion of state or sequencing. Because these representations place a partial, as opposed to a total, ordering on the execution of their component operations, the concurrent aspects of computation are clearly revealed. The correspondence between dataflow graphs and purely functional programs allows computations to be expressed in a high-level functional language and subsequently transformed into a dataflow graph. This paper describes the use of dataflow models as an alternative control strategy for engineering analysis programs and contrasts them with traditional imperative approaches. The characteristics of functional languages are also described, as is their inherent parallelism, which may be realized by compilation into dataflow graphs. The application of functional languages to finite element programming is presented; this allows the alternating assembly and solution of system equations found in frontal solvers. Issues such as the incremental update of arrays and the simulation of state are also addressed.
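
To make the incremental-array-update issue concrete (a minimal sketch with illustrative data, not the paper's finite element code):

```python
# Hedged sketch of the "incremental array update" issue in a purely
# functional setting: an update returns a new array value instead of
# mutating in place. This naive version copies the whole array; functional
# and dataflow implementations aim to avoid that copy when the old version
# is no longer referenced.

def update(array, index, value):
    """Purely functional update: returns a new array, leaves the old intact."""
    new = list(array)
    new[index] = value
    return tuple(new)

stiffness_row = (4.0, -1.0, 0.0, -1.0)       # illustrative assembly data
assembled = update(stiffness_row, 2, -1.0)

print(stiffness_row)   # (4.0, -1.0, 0.0, -1.0): original value unchanged
print(assembled)       # (4.0, -1.0, -1.0, -1.0): new version of the array
```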

14.
15.
A programming aid for hypercube architectures
A program development tool that automatically performs scheduling and synchronization insertion for hypercube systems is presented in this paper. We use a programming methodology in which a program is written as a set of procedures called from the main program. This program is converted into a macro dataflow graph, which the tool uses to generate executable code for hypercube machines. Programs restructured by our tool outperformed manually developed code while also increasing program development productivity.
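
As a hedged sketch of the scheduling step such a tool automates (a greedy list scheduler over an illustrative macro dataflow graph, ignoring communication costs; not the tool's actual algorithm):

```python
# Hedged sketch: list-schedule a macro dataflow graph of procedures onto a
# small number of processors, assigning each task to the earliest-free
# processor once its predecessors are done. Costs and the graph are made up.
from collections import deque

tasks = {"A": 4, "B": 3, "C": 2, "D": 5}          # procedure -> execution cost
deps  = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
n_procs = 2

# Topological order of the macro dataflow graph (Kahn's algorithm).
indeg = {t: len(deps[t]) for t in tasks}
succs = {t: [u for u in tasks if t in deps[u]] for t in tasks}
ready = deque(t for t in tasks if indeg[t] == 0)
order = []
while ready:
    t = ready.popleft()
    order.append(t)
    for u in succs[t]:
        indeg[u] -= 1
        if indeg[u] == 0:
            ready.append(u)

# Greedy list scheduling: a task starts when its predecessors have finished
# and a processor is free.
proc_free = [0] * n_procs
finish = {}
for t in order:
    p = min(range(n_procs), key=lambda i: proc_free[i])
    start = max([proc_free[p]] + [finish[d] for d in deps[t]])
    finish[t] = start + tasks[t]
    proc_free[p] = finish[t]

print(finish)   # {'A': 4, 'B': 7, 'C': 6, 'D': 12}  (makespan 12)
```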

16.
In this paper, we study several issues related to the medium grain dataflow model of execution. We present bottom-up compilation of medium grain clusters from a fine grain dataflow graph. We compare the basic block and the dependence sets algorithms that partition dataflow graphs into clusters. For an extensive set of benchmarks we assess the average number of instructions in a cluster and the reduction in matching operations compared with fine grain dataflow execution. We study the performance of medium grain dataflow when several architectural parameters, such as the number of processors, matching cost, and network latency, are varied. The results indicate that medium grain execution offers a good speedup over the fine grain model, that it is scalable, and that it tolerates network latency and high matching costs well. Medium grain execution can benefit from a higher processor output bandwidth and, finally, a simple superscalar processor with an issue rate of two is sufficient to exploit the internal parallelism of a cluster. This work is supported in part by NSF Grants CCR-9010240 and MIP-9113268.
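
For intuition about cluster partitioning (a simplified chain-merging stand-in for a basic-block style algorithm, applied to an illustrative graph; not the paper's implementation):

```python
# Hedged sketch: partition a fine-grain dataflow graph into clusters by
# merging linear chains of nodes, then report the average cluster size.
# The graph is a small illustrative DAG: node -> list of successors.
from collections import defaultdict

graph = {"a": ["b"], "b": ["c"], "c": ["f"],
         "d": ["e"], "e": ["f"], "f": []}

preds = defaultdict(list)
for u, succs in graph.items():
    for v in succs:
        preds[v].append(u)

def chain_clusters(graph, preds):
    clusters, seen = [], set()
    for node in graph:
        # start a cluster only at nodes that do not continue an existing chain
        if len(preds[node]) == 1 and len(graph[preds[node][0]]) == 1:
            continue
        cluster, cur = [], node
        while cur not in seen:
            seen.add(cur)
            cluster.append(cur)
            succs = graph[cur]
            if len(succs) == 1 and len(preds[succs[0]]) == 1:
                cur = succs[0]       # extend the chain
            else:
                break
        clusters.append(cluster)
    return clusters

clusters = chain_clusters(graph, preds)
print(clusters)                                  # [['a', 'b', 'c'], ['d', 'e'], ['f']]
print(sum(map(len, clusters)) / len(clusters))   # 2.0 instructions per cluster
```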

17.
To address the insufficient use of interaction information and the unsatisfactory performance of existing recommendation methods, this work makes full use of the indirect interactions between users and items: a reachability matrix is used to express the indirect user-item interaction relations, and a new recommendation method is built by tightly integrating the reachability matrix with a factorization machine. Experiments on the Amazon-Book, Last-FM, and Yelp2018 datasets show that the proposed method outperforms both traditional factorization-machine-based recommenders and recent neural factorization-machine models in recommendation quality, and that it has a clear advantage in time efficiency over recommendation methods based on knowledge graph attention networks. Compared with other recommendation methods, it also offers better interpretability.
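
As a hedged sketch of the reachability-matrix idea (Warshall's transitive closure over an illustrative user-item bipartite graph; the integration with a factorization machine is not shown):

```python
# Hedged sketch: from a direct user-item interaction matrix, build the
# adjacency matrix of the bipartite graph and compute its transitive
# closure with Warshall's algorithm, so that indirect user-item relations
# become explicit. The data below are illustrative.

def warshall(adj):
    """Transitive closure of a boolean adjacency matrix (list of lists)."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach

# 2 users, 3 items; R[u][i] = 1 if user u interacted with item i.
R = [[1, 1, 0],
     [0, 1, 1]]
n_users, n_items = len(R), len(R[0])
n = n_users + n_items

# Bipartite adjacency: users are nodes 0..1, items are nodes 2..4.
adj = [[0] * n for _ in range(n)]
for u in range(n_users):
    for i in range(n_items):
        adj[u][n_users + i] = adj[n_users + i][u] = R[u][i]

reach = warshall(adj)
# User 0 never interacted with item 2, but reaches it via item 1 and user 1.
print(reach[0][n_users + 2])   # 1
```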

18.
Unified Modeling Language (UML) activity diagrams can model the flow of stateful business objects among activities, implicitly specifying the life cycles of those objects. The actual object life cycles are typically expressed in UML state machines. The implicit life cycles in UML activity diagrams need to be discovered in order to derive the actual object life cycles or to check consistency with an existing life cycle. This paper presents an automated approach for synthesizing a UML state machine modeling the life cycle of an object that occurs in different states in a UML activity diagram. The generated state machines can contain parallelism, loops, and cross-synchronization. The approach makes explicit the life cycles that have been modeled implicitly in activity diagrams. The synthesis approach has been implemented using a graph transformation tool and has been applied in several case studies.

19.
Basic graphs (基本图) are a formal method for describing concurrent interacting systems, characterized by concise, clear, and intuitive descriptions. In a basic graph, processes and the interactions between them can be represented either graphically (where each graph corresponds to a process and reductions between graphs correspond to interactions between processes) or by a term algebra. This paper makes some modifications to the graphical representation of basic graphs and, on that basis, gives conversion algorithms between the two representations; it also implements a basic-graph specification tool that supports creation, editing, and conversion.

20.
The construction of efficient parallel programs usually requires expert knowledge in the application area and a deep insight into the architecture of a specific parallel machine. Often, the resulting performance is not portable, i.e., a program that is efficient on one machine is not necessarily efficient on another machine with a different architecture. Transformation systems provide a more flexible solution. They start with a specification of the application problem and allow the generation of efficient programs for different parallel machines. The programmer has to give an exact specification of the algorithm, expressing the inherent degree of parallelism, and is relieved of the low-level details of the architecture. We propose such a transformation system with an emphasis on the exploitation of data parallelism combined with a hierarchically organized structure of task parallelism. Starting with a specification of the maximum degree of task and data parallelism, the transformations generate a specification of a parallel program for a specific parallel machine. The transformations are based on a cost model and are applied in a predefined order, fixing the most important design decisions such as the scheduling of independent multitask activations, data distributions, pipelining of tasks, and the assignment of processors to task activations. We demonstrate the usefulness of the approach with examples from scientific computing.
