期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Transparent Speculative Parallelization of Discrete Event Simulation Applications Using Global Variables

Alessandro?Pellegrini Sebastiano?Peluso Francesco?Quaglia Email author Roberto?Vitali 《International journal of parallel programming》2016,44(6):1200-1247

Parallelizing (compute-intensive) discrete event simulation (DES) applications is a classical approach for speeding up their execution and for making very large/complex simulation models tractable. This has been historically achieved via parallel DES (PDES) techniques, which are based on partitioning the simulation model into distinct simulation objects (somehow resembling objects in classical object-oriented programming), whose states are disjoint, which are executed concurrently and rely on explicit event-exchange (or event-scheduling) primitives as the means to support mutual dependencies and notification of their state updates. With this approach, the application developer is necessarily forced to reason about state separation across the objects, thus being not allowed to rely on shared information, such as global variables, within the application code. This implicitly leads to the shift of the user-exposed programming model to one where sequential-style global variable accesses within the application code are not allowed. In this article we remove this limitation by providing support for managing global variables in the context of DES code developed in ANSI-C, which gets automatically parallelized. Particularly, we focus on speculative (also termed optimistic) PDES systems that run on top of multi-core machines, where simulation objects can concurrently process their events with no guarantee of causal consistency and actual violations of causality rules are recovered through rollback/recovery schemes. In compliance with the nature of speculative processing, in our proposal global variables are transparently mapped to multi-versions, so as to avoid any form of safety predicate verification upon their updates. Consistency is ensured via the introduction of a new rollback/recovery scheme based on detecting global variables’ reads on non-correct versions. At the same time, efficiency in the execution is guaranteed by managing multi-version variables’ lists via non-blocking algorithms. Furthermore, the whole approach is fully transparent, being it based on automatized instrumentation of the application software (particularly ELF objects). Hence the programmer is exposed to the classical (and easy to code) sequential-style programming scheme while accessing any global variable. An experimental assessment of our proposal, based on a suite of case study applications, run on top of an off-the-shelf Linux machine equipped with 32 CPU-cores and 64 GB of RAM, is also presented. 相似文献

2.

Speculative Parallelization of Sequential Loops on Multicores

Chen Tian Min Feng Vijay Nagarajan Rajiv Gupta 《International journal of parallel programming》2009,37(5):508-535

The advent of multicores presents a promising opportunity for speeding up the execution of sequential programs through their parallelization. In this paper we present a novel solution for efficiently supporting software-based speculative parallelization of sequential loops on multicore processors. The execution model we employ is based upon state separation, an approach for separately maintaining the speculative state of parallel threads and non-speculative state of the computation. If speculation is successful, the results produced by parallel threads in speculative state are committed by copying them into the computation’s non-speculative state. If misspeculation is detected, no costly state recovery mechanisms are needed as the speculative state can be simply discarded. Techniques are proposed to reduce the cost of data copying between non-speculative and speculative state and efficiently carrying out misspeculation detection. We apply the above approach to speculative parallelization of loops in several sequential programs which results in significant speedups on a Dell PowerEdge 1900 server with two Intel Xeon quad-core processors. 相似文献

3.

基于进程投机并行的运行时系统设计与优化

刘雷李晶陈莉冯晓兵《计算机工程》2014,(3):99-102,112

投机并行化是解决遗留串行代码并行化的重要技术,但以往投机并行化运行时系统面临着诸多的性能问题,如任务分配不均衡、通信频繁、冲突代价高,以及进程启动,结柬频繁而导致开销过高等。为此,提出一种基于进程实现的投机并行化运行时系统。采用隐式单程序多数据的并行任务划分和执行模式。通过实现重甩进程的投机任务调度策略和委托正确性检查技术,降低投机进程启动/结束和通信的开销,提高投机进程的利用率,同时利用守护进程与投机进程协同执行的方式,确保在投机进程出现异常情况时程序也能正确执行。实验结果表明,该基于进程实现的投机运行时系统比同类型系统的性能提高231%。相似文献

4.

New Data Structures to Handle Speculative Parallelization at Runtime

Alvaro Estebanez Diego R. Llanos Arturo Gonzalez-Escribano 《International journal of parallel programming》2016,44(3):407-426

相似文献

5.

支持推测并行化的事务存储硬件模拟系统

李颀安虹李功明邓博斌《小型微型计算机系统》2013,34(5)

多核处理器通过增加处理器核数提高计算能力,虽然可以通过同时运行多道程序的方式利用处理器资源,但是多核处理器真正的成功取决于解决并行应用开发中的难题.为此,处理器体系结构和编程模型的协同开发是必须的.而随着核数的增多,传统上使用的软件模拟器因为软件的串行性而性能越来越差,无法支持这种软硬件协同开发.FPGA天生的并行性使它在模拟多核处理器时具有较高的模拟性能和高度的可扩放性,成为处理器体系结构研究的理想工具.本文介绍了基于FPGA的多核模拟系统,RAMP-Pink.该系统基于HASim实现,同时支持事务存储和线程级推测,用于对事务存储和线程级推测的软硬件协同开发.该模拟系统可配置不同的FPGA开发平台,也可以以软件模拟方式运行. 相似文献

6.

Variational Curve Skeletons Using Gradient Vector Flow

Hassouna M.S. Farag A.A. 《IEEE transactions on pattern analysis and machine intelligence》2009,31(12):2257-2274

Representing a 3D shape by a set of 1D curves that are locally symmetric with respect to its boundary (i.e., curve skeletons) is of importance in several machine intelligence tasks. This paper presents a fast, automatic, and robust variational framework for computing continuous, subvoxel accurate curve skeletons from volumetric objects. A reference point inside the object is considered a point source that transmits two wave fronts of different energies. The first front (beta-front) converts the object into a graph, from which the object salient topological nodes are determined. Curve skeletons are tracked from these nodes along the cost field constructed by the second front (alpha-front) until the point source is reached. The accuracy and robustness of the proposed work are validated against competing techniques as well as a database of 3D objects. Unlike other state-of-the-art techniques, the proposed framework is highly robust because it avoids locating and classifying skeletal junction nodes, employs a new energy that does not form medial surfaces, and finally extracts curve skeletons that correspond to the most prominent parts of the shape and hence are less sensitive to noise. 相似文献

7.

Polyhedral Objects Metamorphosis Using Convex Decomposition and Morphology

Wen-Yu?Liu Email author Hua?Li Fei?Wang Guang-Xi?Zhu 《Virtual Reality》2003,6(4):196-204

A new technique is presented for computing continuous shape transformations between polyhedral objects. The polyhedron shape transformations can be divided into polyhedron metamorphosis and bi-directional local rigid body rotation transformation. By decomposing two objects into sets of individual convex sub-objects respectively, and establishing the matching between two subsets, the approach can solve the metamorphosis problem of two non-homotopic objects (including concave objects and holey objects). Compared with other methods, this metamorphosis algorithm can be executed automatically for arbitrary polyhedrons and no need user interaction. The user has the ability to choose an automatic matching or to select interactively pairs of corresponding matching convex subsets to obtain special effects. Experiments show that this method can generate natural, high-fidelity, eye-pleasing metamorphosis results with simple computation. 相似文献

8.

Polyhedral Finite Elements Using Harmonic Basis Functions

Sebastian Martin Peter Kaufmann Mario Botsch Martin Wicke Markus Gross 《Computer Graphics Forum》2008,27(5):1521-1529

Finite element simulations in computer graphics are typically based on tetrahedral or hexahedral elements, which enables simple and efficient implementations, but in turn requires complicated remeshing in case of topological changes or adaptive refinement. We propose a flexible finite element method for arbitrary polyhedral elements, thereby effectively avoiding the need for remeshing. Our polyhedral finite elements are based on harmonic basis functions, which satisfy all necessary conditions for FEM simulations and seamlessly generalize both linear tetrahedral and trilinear hexahedral elements. We discretize harmonic basis functions using the method of fundamental solutions, which enables their flexible computation and efficient evaluation. The versatility of our approach is demonstrated on cutting and adaptive refinement within a simulation framework for corotated linear elasticity. 相似文献

9.

Interpolating Polyhedral Models Using Intrinsic Shape Parameters

YUE MAN SUN WENPING WANG FRANCIS Y. L. CHIN 《Computer Animation and Virtual Worlds》1997,8(2):81-96

Metamorphosis, or morphing, is the gradual transformation of one shape into another. It generally consists of two subproblems: the correspondence problem and the interpolation problem. This paper presents a solution to the interpolation problem of transforming one polyhedral model into another. It is an extension of the intrinsic shape interpolation scheme (T. W. Sederberg, P. Gao, G. Wang and H. Mu, ‘2-D shape blending: an intrinsic solution to the vertex path problem, SIGGRAPH '93, pp. 15–18.) for 2D polygons. Rather than considering a polyhedron as a set of independent points or faces, our solution treats a polyhedron as a graph representing the interrelations between faces. Intrinsic shape parameters, such as dihedral angles and edge lengths that interrelate the vertices and faces in the two graphs, are used for interpolation. This approach produces more satisfactory results than the linear or cubic curve paths would, and is translation and rotation invariant. © 1997 by John Wiley & Sons, Ltd. 相似文献

10.

Matching Polyhedral Terrains Using Overlays of Envelopes 总被引：2，自引：0，他引：2

Vladlen Koltun Carola Wenk 《Algorithmica》2005,41(3):159-183

For a collection $\F$ of $d$-variate piecewise linear functions of overall combinatorial complexity $n$, the lower envelope $\E(\F)$ of $\F$ is the pointwise minimum of these functions. The minimization diagram $\M(\F)$ is the subdivision of $\reals^d$ obtained by vertically (i.e., in direction $x_{d+1}$) projecting $\E(\F)$. The overlay $\O(\F,\G)$ of two such subdivisions $\M(\F)$ and $\M(\G)$ is their superposition. We extend and improve the analysis of de Berg et al. \cite{bgh-vdt3s-96} by showing that the combinatorial complexity of $\O(\F,\G)$ is $\Omega(n^d \alpha^{2}(n))$ and $O(n^{d+\eps})$ for any $\eps>0$ when $d \ge 2$, and $O(n^2 \alpha(n) \log n)$ when $d=2$. We also describe an algorithm that constructs $\O(\F,\G)$ in this time. We apply these results to obtain efficient general solutions to the problem of matching two polyhedral terrains in higher dimensions under translation. That is, given two piecewise linear terrains of combinatorial complexity $n$ in $\reals^{d+1}$, we wish to find a translation of the first terrain that minimizes its distance to the second, according to some distance measure. For the perpendicular distance measure, which we adopt from functional analysis since it is natural for measuring the similarity of terrains, we present a matching algorithm that runs in time $O(n^{2d+\eps})$ for any $\eps>0$. Sharper running time bounds are shown for $d \le 2$. For the directed and undirected \Hd\ distance measures, we present a matching algorithm that runs in time $O(n^{d^2+d+\eps})$ for any $\eps>0$. 相似文献

11.

Engineering Energy Efficient Visual Sensor Network Applications Using Skeletons

Stefano Chessa Susanna Pelagatti Nicoletta Triolo 《International journal of parallel programming》2014,42(4):663-680

Visual sensor networks (VSNs) perform complex scene analysis algorithms that require significant computations and communications. Under this respect, the use of skeletons contributes to reduce the complexity of VSN programming and may ensure an easier and better optimization of the code. In this context, we propose INS, a stencil based skeleton targeted for wireless/visual sensor networks (W/VSNs) and give a preliminary analysis of its benefits using tracking as a case study. INS abstracts a distributed approximation schema in which the estimation of a given metric is organized in a sequence of steps. Each step includes collecting estimates from some neighbor nodes and local computation of a new approximation. In particular, INS takes inspiration from some stencil based skeletons proposed for parallel computation and merges it with the classical event driven model typical of sensor programming. As a result, the execution of each step is triggered by the detection of a relevant event in the environment. Tracking consists in periodically predicting position and velocity of one or more mobile targets. We discuss how INS can be instantiated to a distributed version of Kalman filtering. As energy efficiency is central in W/VSNs, we derive analytic models for energy dissipation of the INS skeleton depending on different concepts of neighborhood for the data exchanged at each step. Then, these models are used to guide the deployment of our tracking application on a real scenario. 相似文献

12.

利用连续两阶段在线剖析优化多线程推测执行 总被引：1，自引：0，他引：1

刘圆安虹汪芳王莉王耀彬《小型微型计算机系统》2009,30(3)

针对当前推测多线程优化中使用的离线剖析受到训练输入集限制的问题,提出一种根据在线剖析结果自动变换推测多线程程序的动态优化方法.该方法在程序运行时执行剖析和优化工作,不需要单独的剖析过程以及通用的训练输入集.该方法也适用于那些运行时行为特征呈阶段性变化的程序.实验表明,在指导事务划分和选择并行循环方面,动态优化方法能够达到和静态优化方法相似的效果,完全可以在离线剖析失效时被使用. 相似文献

13.

Semantic-Aware Automatic Parallelization of Modern Applications Using High-Level Abstractions

Chunhua Liao Daniel J. Quinlan Jeremiah J. Willcock Thomas Panas 《International journal of parallel programming》2010,38(5-6):361-378

Automatic introduction of OpenMP for sequential applications has attracted significant attention recently because of the proliferation of multicore processors and the simplicity of using OpenMP to express parallelism for shared-memory systems. However, most previous research has only focused on C and Fortran applications operating on primitive data types. Modern applications using high-level abstractions, such as C++ STL containers and complex user-defined class types, are largely ignored due to the lack of research compilers that are readily able to recognize high-level object-oriented abstractions and leverage their associated semantics. In this paper, we use a source-to-source compiler infrastructure, ROSE, to explore compiler techniques to recognize high-level abstractions and to exploit their semantics for automatic parallelization. Several representative parallelization candidate kernels are used to study semantic-aware parallelization strategies for high-level abstractions, combined with extended compiler analyses. Preliminary results have shown that semantics of abstractions can help extend the applicability of automatic parallelization to modern applications and expose more opportunities to take advantage of multicore processors. 相似文献

14.

A Structural Approach for Modelling Performance of Systems Using Skeletons

Gagarine Yaikhom Murray Cole Stephen Gilmore Jane Hillston 《Electronic Notes in Theoretical Computer Science》2007,190(3):167

In this paper, we discuss a structural approach to automatic performance modelling of skeleton based applications. This uses a synthesis of performance evaluation process algebra (pepa) and a pattern-oriented hierarchical expression scheme. Such approaches are important in parallel and distributed systems where the performance models must be updated regularly based on the current state of the resources. 相似文献

15.

Sweeping体的拓扑结构与精确B—Rep反算策略

段卫垠王启富《计算机辅助设计与图形学学报》1992,4(4):36-43

本文根据五种造型法则:平行扫法则、回转扫法则、箱体法则、异形体法则和曲面立体法则的离散化原理,提出了Sweeping体的拓扑结构的生成方法。讨论了离散化多面体边界的几何属性码和CSG-索引方法,以及基于这种几何属性码和CSG-索引的精确B-Rep反算策略。相似文献

16.

基于事务性执行的投机并行多线程软件模拟

姚震郑启龙陈国良杨晓奇《小型微型计算机系统》2008,29(3):437-443

基于事务性执行的投机并行多线程是一种适合未来多核微处理器架构的新型并行程序设计和编译技术.但在此基础上的并行程序执行过程更为复杂,程序执行过程的模拟成为关键问题之一.本文提出利用二进制代码级动态插桩技术对投机并行多线程程序进行功能性模拟,设计并实现了完整的软件平台,可精确地模拟和监控并行程序的线程级投机执行过程,检测访存冲突,从而实现投机并行多线程的语义.该软件平台同时可以作为进一步研究投机多线程并行程序真实执行过程的基础,并有效支持投机并行多线程编译器的设计和分析. 相似文献

17.

Dynamic Output Feedback Robust MPC Using General Polyhedral State Bounds for the Polytopic Uncertain System With Bounded Disturbance

下载免费PDF全文

Baocang Ding Chenbo Gao Xubin Ping 《Asian journal of control》2016,18(2):699-708

This paper considers the dynamic output feedback robust model predictive control (MPC) for a system with both polytopic model parametric uncertainty and bounded disturbance. For this topic, the techniques for handling the unknown true state are crucial, and the strict guarantee of the input/output/state constraints favors replacing the true state by its bound in the optimization problems. The previous utilized polyhedral bounds, constructed by virtue of the error signals which are some linear combinations of the true state, the estimated state and the output, are generalized, where a bias item is utilized. Based on this unified bounding approach, new techniques for handling the unknown true state are given for both the main and the auxiliary optimization problems. As before, the main optimization problem calculates the control law parameters conditionally, and the auxiliary optimization problem determines the time to refresh these parameters. By applying the proposed method, the augmented state of the closed‐loop system is guaranteed to converge to the neighborhood of the equilibrium point. A numerical example is given to illustrate the effectiveness of the new method. 相似文献

18.

Performance Evaluation of Dynamic Speculative Multithreading with the Cascadia Architecture 总被引：1，自引：0，他引：1

Zier David A. Lee Ben 《Parallel and Distributed Systems, IEEE Transactions on》2010,21(1):47-59

Thread-level parallelism (TLP) has been extensively studied in order to overcome the limitations of exploiting instruction-level parallelism (ILP) on high-performance superscalar processors. One promising method of exploiting TLP is Dynamic Speculative Multithreading (D-SpMT), which extracts multiple threads from a sequential program without compiler support or instruction set extensions. This paper introduces Cascadia, a D-SpMT multicore architecture that provides multigrain thread-level support and is used to evaluate the performance of several benchmarks. Cascadia applies a unique sustainable IPC (sIPC) metric on a comprehensive loop tree to select the best performing nested loop level to multithread. This paper also discusses the relationships that loops have on one another, in particular, how loop nesting levels can be extended through procedures. In addition, a detailed study is provided on the effects that thread granularity and interthread dependencies have on the entire system. 相似文献

19.

生物序列拼装欧拉路径算法的Gamma描述及其并行化研究 总被引：1，自引：0，他引：1

廖文昭童维勤蔡立志《小型微型计算机系统》2004,25(4):707-711

序列拼装是生物基因测序的一个重要环节,也是生物信息学重要的研究内容．[2]中将Eulerian路径的方法应用于序列拼接,较好地解决传统序列拼装软件中存在的repeat问题,从而提高序列拼装的精度,但对于该方法的研究目前还只有串行化的实现,拼装速度不够理想．在本文中,我们采用了并行化Gamma模型形式化地描述了用于序列拼装的Eulerian方法,并给出了Gamma程序的并行化实现方案．相似文献

20.

Motorcycle Graphs and Straight Skeletons

Siu-Wing Cheng Antoine Vigneron 《Algorithmica》2007,47(2):159-182

We present a new algorithm to compute motorcycle graphs. It runs in time when n is the number of motorcycles. We give a new characterization of the straight skeleton of a nondegenerate polygon. For a polygon with n vertices and h holes, we show that it yields a randomized algorithm that reduces the straight skeleton computation to a motorcycle graph computation in expected time. Combining these results, we can compute the straight skeleton of a nondegenerate polygon with h holes and with n vertices, among which r are reflex vertices, in expected time. In particular, we cancompute the straight skeleton of a nondegenerate polygon with n vertices in expected time. 相似文献