首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
It is widely acknowledged that improving parallel I/O performance is critical for widespread adoption of high performance computing. In this paper, we show that communication in out-of-core distributed memory problems may require both interprocessor communication and file I/O. Thus, in order to improve I/O performance, it is necessary to minimize the I/O costs associated with a communication step. We present three methods for performing communication in out-of-core distributed memory problems. The first method, called thegeneralized collective communicationmethod, follows a loosely synchronous model; computation and communication phases are clearly separated, and communication requires permutation of data in files. The second method, called thereceiver-driven in-core communication, communicates only the in-core data. The third method, called theowner-driven in-core communication, goes even one step further and tries to identify the potential future use of data (by the recipients) while it is in the senders memory. We provide performance results for two out-of-core applications: the two-dimensional FFT code, and the two-dimensional elliptic Jacobi solver.  相似文献   

2.
This paper presents a bounded model checking tool called Hydlogic{\texttt{Hydlogic}} for hybrid systems. It translates a reachability problem of a nonlinear hybrid system into a predicate logic formula involving arithmetic constraints and checks the satisfiability of the formula based on a satisfiability modulo theories method. We tightly integrate (i) an incremental SAT solver to enumerate the possible sets of constraints and (ii) an interval-based solver for hybrid constraint systems (HCSs) to solve the constraints described in the formulas. The HCS solver verifies the occurrence of a discrete change by using a set of boxes to enclose continuous states that may cause the discrete change. We utilize the existence property of a unique solution in the boxes computed by the HCS solver as (i) a proof of the reachability of a model and (ii) a guide in the over-approximation refinement procedure. Our Hydlogic{\texttt{Hydlogic}} implementation successfully handled several examples including those with nonlinear constraints.  相似文献   

3.
We consider direct methods based on Gaussian elimination for solving sparse sets of linear equations. Among conventional approaches, band and frontal methods are obviously vectorizable and general sparse methods equally do not vectorize easily since they involve indirect addressing in inner loops. We illustrate these effects with actual times from runs on the Cray at Harwell. To avoid indirect addressing we have been developing code that uses a “multi-frontal” technique. This moves the reals within storage in such a way that all operations are performed on full matrices although the pivotal strategy is minimum degree. We describe how the in-core and out-of-core versions perform on theCRAY-1.  相似文献   

4.
Linear equality and inequality constraints arise naturally in specifying many aspects of user interfaces, such as requiring that one window be to the left of another, requiring that a pane occupy the leftmost 1/3 of a window, or preferring that an object be contained within a rectangle if possible. For interactive use, we need to solve similar constraint satisfaction problems repeatedly for each screen refresh, with each successive problem differing from the previous one only in the position of an input device and the previous state of the system. We present an algorithm for solving such systems of constraints using projection. The solution is compiled into very efficient, constraint-free code, which is parameterized by the new inputs. Producing straight-line, constraint-free code of this sort is important in a number of applications: for example, to provide predictable performance in real-time systems, to allow companies to ship products without including a runtime constraint solver, to compile Java applets that can be downloaded and run remotely (again without having to include a runtime solver), or for applications where runtime efficiency is particularly important. Even for less time-critical user interface applications, the smooth performance of the resulting code is more pleasing than that of code produced using other current techniques.  相似文献   

5.

This paper presents methods used to perform discrete adjoint gradient evaluations for linear stress and vibration analysis. The methods are implemented within the framework of a discrete adjoint structural solver being developed for multidisciplinary adjoint optimizations of turbomachinery components. The code is differentiated using the algorithmic differentiation tool CoDiPack in tandem with manual treatment of the iterative solvers. Stress analysis leads to a linear system of equations that is typically solved by an iterative solver (e.g. GMRES). To ensure accuracy, the adjoint problem is formulated as a new linear system of equations to be solved. Vibration analysis results in a generalized eigenvalue problem that is also typically solved by an interative solver. The adjoint problem takes out the generalized eigenvalue solve and replaces it by one outer product per eigenfrequency, leading to significantly cheap eigenfrequency gradients for vibration analysis.

  相似文献   

6.
Averbuch  A.  Epstein  B.  Ioffe  L.  Yavneh  I. 《The Journal of supercomputing》2000,17(2):123-142
We present an efficient parallelization strategy for speeding up the computation of a high-accuracy 3-dimensional serial Navier-Stokes solver that treats turbulent transonic high-Reynolds flows. The code solves the full compressible Navier-Stokes equations and is applicable to realistic large size aerodynamic configurations and as such requires huge computational resources in terms of computer memory and execution time. The solver can resolve the flow properly on relatively coarse grids. Since the serial code contains a complex infrastructure typical for industrial code (which ensures its flexibility and applicability to complex configurations), then the parallelization task is not straightforward. We get scalable implementation on massively parallel machines by maintaining efficiency at a fixed value by simultaneously increasing the number of processors and the size of the problem.The 3-D Navier-Stokes solver was implemented on three MIMD message-passing multiprocessors (a 64-processors IBM SP2, a 20-processors MOSIX, and a 64-processors Origin 2000). The same code written with PVM and MPI software packages was executed on all the above distinct computational platforms. The examples in the paper demonstrate that we can achieve efficiency of about 60% for as many as 64 processors on Origin 2000 on a full-size 3-D aerodynamic problem which is solved on realistic computational grids.  相似文献   

7.
Constraint Handling Rules (CHRs) are a high-level rule-based programming language commonly used to define constraint solvers. We present a method for automatic implication checking between constraints of CHR solvers. Supporting implication is important for implementing extensible solvers and reification, and for building hierarchical CHR constraint solvers. Our method does not copy the entire constraint store, but performs the check in place using a trailing mechanism. The necessary code enhancements can be done by automatic program transformation based on the rules of the solver. We extend our method to work for hierarchically organized modular CHR solvers. We show the soundness of our method and its completeness for a restricted class of canonical solver as well as for specific existing non-canonical CHR solvers. We evaluate our trailing method experimentally by comparing with the copy approach: runtime is almost halved.  相似文献   

8.
This paper extends the procedures and ideas presented in an earlier paper [1] to the out-of-core solution of sets of equations of almost unlimited size. Block partitioning of the coefficient matrix and load vectors is considered and an illustrative example presented. The data transfer procedures, for input to or output from low speed storage, which are essential for an equation solver of this type, are explained in detail. Based on these procedures, a listing of a large capacity general purpose equation solver is presented. Example problems are solved to demonstrate the computational efficiency of the equation solver.  相似文献   

9.
Solving nonlinear constraints over real numbers is a complex problem. Hence constraint logic programming languages like CLPR or Prolog III solve only linear constraints and delay nonlinear constraints until they become linear. This efficient implementation method has the disadvantage that sometimes computed answers are unsatisfiable or infinite loops occur due to the unsatisfiability of delayed nonlinear constraint These problems could be solved by using a more powerful constraint solver which can deal with nonlinear constraints like in RISC-CLP(Real). Since such powerful constraint solvers are not very efficient, we propose a compromise between these two extremes. We characterize a class of CLPR programs for which all delayed nonlinear constraints become linear at run time. Programs belonging to this class can be safely executed with the efficient CLPR method while the remaining programs need a more powerful constraint solver. This paper is an extended and revised version of Ref. 12). The research described in this paper was made during the author’s stay at the Max-planck-Institut für Informatik in Saarbrücken, Germany. It was supported in part by the German Ministry for Research and Technology (BMFT) under grant ITS 9103 and by the ESPRIT Basic Research Working Group 6028 (Construction of Computational Logics). The responsibility for the contents of this publication lies with the author.  相似文献   

10.
11.
任胜兵  吴斌  张健威  王志健 《计算机应用》2016,36(10):2806-2810
针对程序中因存在路径条数过多或复杂循环路径而导致路径验证时的路径搜索空间过大,直接影响验证的效率和准确率的问题,提出一种基于可满足性模理论(SMT)求解器的程序路径验证方法。首先利用决策树的方法对复杂循环路径提取不变式,构造无循环控制流图(NLCFG);然后通过基本路径法对控制流图(CFG)进行遍历,提取基本路径信息;最后利用SMT求解器作为约束求解器,将路径验证问题转化为约束求解问题来进行处理。与同样基于SMT求解器的路径验证工具CBMC和FSoft-SMT相比,该方法在对测试集程序的验证时间上比CBMC降低了25%以上,比FSoft-SMT降低了15%以上;在验证精度上,该方法有明显的提升。实验结果表明,方法可以有效解决路径搜索空间过大的问题,同时提高路径验证的效率和准确率。  相似文献   

12.
The richness and expressive power of geometric constraints causes unintended ambiguities and inconsistencies during their solution or realization. For example, geometric constraint problems may turn out to be overconstrained requiring the user to delete one or more of the input constraints, and the solutions must then be dynamically updated. Without proper guidance by the constraint solver, the user must have profound insight into the mathematical nature of constraint systems and understand the internals of the solver algorithm. But a general user is most likely unfamiliar with those problems, so that the required interaction with the constraint solver may well be beyond the user's ability. In this paper, we present strategies and techniques to empower the user to deal effectively with the overconstraint problem while not requiring him or her to become an expert in the mathematics of constraint solving.We formulate this problem as a series of formal requirements that gel with other essentials of constraint solvers. We then give algorithmic solutions that are both general and efficient (running time typically linear in the number of relevant constraints).  相似文献   

13.
针对传统的自动测试图形向量生成采用逐个求解单一故障模型导致生成测试向量数据量巨大的缺点, 提出一种基于布尔满足性(boolean satisfiability, SAT)的多目标故障测试向量动态压缩方法, 同时论证多目标故障测试生成问题为布尔满足性问题。该方法将具有鲁棒性的SAT算法嵌入经典的动态压缩流程中, 首先利用经典动态压缩算法求解最小测试向量检测大部分失效故障, 然后采用SAT求解器对未测出的多故障电路进行同一求解和附加约束求解方式, 最终得到故障覆盖率高的测试向量和同一测试最大故障列表。实验数据表明, 在相同电路模型情况下, 此方法求得的测试向量相比经典动态压缩减少高达70%。  相似文献   

14.
Piecewise polynomial constraint systems are common in numerous problems in computational geometry, such as constraint programming, modeling, and kinematics. We propose a framework that is capable of decomposing, and efficiently solving a wide variety of complex piecewise polynomial constraint systems, that include both zero constraints and inequality constraints, with zero-dimensional or univariate solution spaces. Our framework combines a subdivision-based polynomial solver with a decomposition algorithm in order to handle large and complex systems. We demonstrate the capabilities of our framework on several types of problems and show its performance improvement over a state-of-the-art solver.  相似文献   

15.
In this paper, we present a framework for synthesizing I/O efficient out-of-core programs for block recursive algorithms, such as the fast Fourier transform (FFT) and block matrix transposition algorithms. Our framework uses an algebraic representation which is based on tensor products and other matrix operations. The programs are optimized for the striped Vitter and Shriver's two-level memory model in which data can be distributed using various cyclic(B) distributions in contrast to the normally used physical track distribution cyclic(Bd ), where Bd is the physical disk block size. We first introduce tensor bases to capture the semantics of block-cyclic data distributions of out-of-core data and also data access patterns to out-of-core data. We then present program generation techniques for tensor products and matrix transposition. We accurately represent the number of parallel I/O operations required for the synthesized programs for tensor products and matrix transposition as a function of tensor bases and data distributions. We introduce an algorithm to determine the data distribution which optimizes the performance of the synthesized programs. Further, we formalize the procedure of synthesizing efficient out-of-core programs for tensor product formulas with various block-cyclic distributions as a dynamic programming problem. We demonstrate the effectiveness of our approach through several examples. We show that the choice of an appropriate data distribution can reduce the number of passes to access out-of-core data by as large as eight times for a tensor product and the dynamic programming approach can largely reduce the number of passes to access out-of-core data for the overall tensor product formulas  相似文献   

16.
This paper describes a method for combining “off-the-shelf” SAT and constraint solvers for building an efficient Satisfiability Modulo Theories (SMT) solver for a wide range of theories. Our method follows the abstraction/refinement approach to simplify the implementation of custom SMT solvers. The expected performance penalty by not using an interweaved combination of SAT and theory solvers is reduced by generalising a Boolean solution of an SMT problem first via assigning don’t care to as many variables as possible. We then use the generalised solution to determine a thereby smaller constraint set to be handed over to the constraint solver for a background theory. We show that for many benchmarks and real-world problems, this optimisation results in considerably smaller and less complex constraint problems. The presented approach is particularly useful for assembling a practically viable SMT solver quickly, when neither a suitable SMT solver nor a corresponding incremental theory solver is available. We have implemented our approach in the ABsolver framework and applied the resulting solver successfully to an industrial case-study: the verification problems arising in verifying an electronic car steering control system impose non-linear arithmetic constraints, which do not fall into the domain of any other available solver.  相似文献   

17.
We present a sensitivity-based nonlinear model predictive control (NMPC) algorithm and demonstrate it on a case study with an economic cost function. In contrast to existing sensitivity-based approaches that make strong assumptions on the underlying optimization problem (e.g. the linear independence constraint qualification implying unique multiplier), our method is designed to handle problems satisfying a weaker constraint qualification, namely the Mangasarian-Fromovitz constraint qualification (MFCQ). Our nonlinear programming (NLP) sensitivity update consists of three steps. The first step is a corrector step in which a system of linear equations is solved. Then a predictor step is computed by a quadratic program (QP). Finally, a linear program (LP) is solved to select the multipliers that give the correct sensitivity information. A path-following scheme containing these steps is embedded in the advanced-step NMPC (asNMPC) framework. We demonstrate our method on a large-scale case example consisting of a reactor and distillation process. We show that LICQ does not hold and the path-following method is able to accurately approximate the ideal solutions generated by an NLP solver.  相似文献   

18.
R.  T.  T.   《Future Generation Computer Systems》2005,21(8):1345-1355
FastOpt’s new automatic differentiation tool TAF is applied to the two-dimensional Navier–Stokes solver NSC2KE. For a configuration that simulates the Euler flow around an NACA airfoil, TAF has generated the tangent linear and adjoint models as well as the second derivative (Hessian) code. Owing to TAF’s capability of generating efficient adjoints of iterative solvers, the derivative code has a high performance: running both the solver and its adjoint requires 3.4 times as long as running the solver only. Further examples of highly efficient tangent linear, adjoint, and Hessian codes for large and complex three-dimensional Fortran 77-90 climate models are listed. These examples suggest that the performance of the NSC2KE adjoint may well be generalised to more complex three-dimensional CFD codes. We also sketch how TAF can improve the adjoint’s performance by exploiting self-adjointness, which is a common feature of CFD codes.  相似文献   

19.
傅立国  姚远  丁锐 《计算机应用》2014,34(4):1014-1018
不规则计算在大规模并行应用中广泛存在。在面向分布存储结构的自动并行化过程中,较难在编译时为不规则循环生成并行代码。并行代码中的通信代码对程序运行结果的正确性以及加速效果有着严重的影响。通过分析程序的数组重分布图,使用部分冗余的通信方式来维持不规则数组访问的生产者消费者关系,可以在编译时为一类常见的不规则循环自动生成有效的通信代码。该方法使用计算分解和数组引用的访问表达式求解不规则数组在各处理器的本地定义集作为通信的数据集,分析针对此类不规则循环划分的通信策略,继而生成相应的通信代码。实验测试的结果取得了预期的加速效果,验证了方法的有效性。  相似文献   

20.
在一组相同处理器上调度带有通信延迟的任务图以实现其最短的执行时间,这在并行计算的调度理论和实践中具有重要的意义。针对具有通信延迟的任务图调度问题,提出一种基于可满足性模理论(SMT)的改进SMT方法。首先,将处理器映射约束和任务执行顺序等约束条件进行编码,将任务图调度问题转化为SMT问题;然后,调用SMT求解器对可行解空间进行搜索,以确定问题最优解。在约束编码阶段,使用整型变量表示任务和处理器的映射关系,从而降低处理器约束编码的复杂程度;在求解器调用阶段,通过添加独立任务的约束条件减小求解器的搜索空间,进一步提升最优解的查找效率。实验结果表明,与原始SMT方法相比,改进SMT方法在20 s和1 min超时实验中的平均求解时间分别减少了65.9%与53.8%,并且在处理器数量较多时取得了更大的效率优势。改进的SMT方法可以有效求解带通信延迟的任务图调度问题,尤其适用于处理器数量较多的调度场景。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号