期刊界 All Journals: search journals worldwide and disseminate academic results (professional journal search, academic search)

1.

Adaptive execution techniques of parallel programs for multiprocessors

Jaejin Lee, Jung-Ho Park, Honggyu Kim, Changhee Jung, Daeseob Lim, SangYong Han 《Journal of Parallel and Distributed Computing》2010

In simultaneous multithreading (SMT) multiprocessors, using all the available threads (logical processors) to run a parallel loop is not always beneficial because of interference between threads and parallel execution overhead. To maximize the performance of a parallel loop on an SMT multiprocessor, it is important to find an appropriate number of threads for executing it. This article presents adaptive execution techniques that find a proper execution mode for each parallel loop in a conventional loop-level parallel program on SMT multiprocessors. A compiler preprocessor generates code that, based on dynamic feedback, automatically determines at run time the optimal number of threads for each parallel loop in the application. We evaluate our technique using a set of standard numerical applications run on a real SMT multiprocessor machine with 8 hardware contexts. Our approach is general enough to work well with other SMT multiprocessor or multicore systems.

2.

Design and implementation of a Z3-based automated proof tactic for Coq

Zhang Hengruo, Fu Ming 《Journal of Software》2017,28(4):819-826

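Formal verification is considered an effective means of building highly trustworthy software systems. One approach verifies the functional correctness of system software by writing proof scripts by hand in a theorem prover; it is highly expressive and can prove complex systems, but its degree of automation is low and its verification cost is relatively high. Another approach uses a program verifier that takes specification-annotated source code, generates verification conditions, and hands them to a constraint solver for automatic solving; it is highly automated, but it has difficulty verifying the correctness of all the functionality of complex system software. Combining the strengths of both approaches, this paper implements smt4coq, an automated proof tactic in the theorem prover Coq that calls the constraint solver Z3 from within Coq to automatically prove mathematical propositions about 32-bit machine integers. This raises the degree of automation in verification and reduces the user's effort in verifying programs by hand.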

3.

Tuning Compiler Optimizations for Simultaneous Multithreading

Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay S. Parekh, Dean M. Tullsen 《International Journal of Parallel Programming》1999,27(6):477-503

Simultaneous Multithreading (SMT) is a processor architectural technique that promises to significantly improve the utilization and performance of modern wide-issue superscalar processors. An SMT processor is capable of issuing multiple instructions from multiple threads to a processor's functional units each cycle. Unlike shared-memory multiprocessors, SMT provides and benefits from fine-grained sharing of processor and memory system resources; unlike current uniprocessors, SMT exposes and benefits from inter-thread instruction-level parallelism when hiding long-latency operations. Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine, particularly for parallel processors. For example, when targeting shared-memory multiprocessors, parallel programs are compiled to minimize sharing in order to decrease high-cost inter-processor communication. Therefore, optimizations that are appropriate for these conventional machines may be inappropriate for SMT, which can benefit from fine-grained resource sharing within the processor. This paper reexamines several compiler optimizations in the context of simultaneous multithreading. We revisit three optimizations in this light: loop-iteration scheduling, software speculative execution, and loop tiling. Our results show that all three optimizations should be applied differently in the context of SMT architectures: loop iterations should be distributed among threads with a cyclic, rather than a blocked, algorithm; non-loop programs should not be software-speculated; and compilers no longer need to be concerned about precisely sizing tiles to match cache sizes. By following these new guidelines, compilers can generate code that improves the performance of programs executing on SMT machines.
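To make the cyclic-versus-blocked distinction concrete, this sketch shows how each policy assigns loop iterations to threads. The function names are hypothetical. With cyclic assignment, threads running concurrently on one SMT processor roughly work on neighboring iterations at the same time, which is the kind of fine-grained sharing the abstract says SMT benefits from.

```python
from typing import Dict, List


def blocked_schedule(n_iters: int, n_threads: int) -> Dict[int, List[int]]:
    """Each thread receives one contiguous chunk of iterations."""
    chunk = -(-n_iters // n_threads)  # ceiling division
    return {t: list(range(t * chunk, min((t + 1) * chunk, n_iters)))
            for t in range(n_threads)}


def cyclic_schedule(n_iters: int, n_threads: int) -> Dict[int, List[int]]:
    """Thread t receives iterations t, t + n_threads, t + 2 * n_threads, ..."""
    return {t: list(range(t, n_iters, n_threads)) for t in range(n_threads)}


print(blocked_schedule(8, 2))  # {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
print(cyclic_schedule(8, 2))   # {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}
```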

4.

Speeding Up Cycle Based Logic Simulation Using Graphics Processing Units

Alper Sen, Baris Aksanli, Murat Bozkurt 《International Journal of Parallel Programming》2011,39(5):639-661

Verification has grown to dominate the cost of electronic system design, consuming about 60% of design effort. Among several verification techniques, logic simulation remains the major one. Speeding up logic simulation results in great savings and shorter time-to-market. We parallelize logic simulation using Graphics Processing Units (GPUs). In the past, GPUs were special-purpose application accelerators suitable only for conventional graphics applications. New generations of GPU architecture provide easier programmability and increased generality while maintaining the tremendous memory bandwidth and computational power of traditional GPUs. We develop a parallel cycle-based logic simulation algorithm that uses And-Inverter Graphs (AIGs) as design representations. AIGs have proven to be an effective representation for various design automation applications, and we obtain similar benefits for speeding up logic simulation. We develop two clustering algorithms that partition the gates in the designs into independent blocks. Our algorithms exploit the massively parallel GPU architecture, featuring thousands of concurrent threads, fast memory, and memory coalescing, for optimization. We demonstrate speedups of up to 5x and 21x on several benchmarks using our simulation system with the first and second clustering algorithms, respectively. Our work ultimately results in a significant reduction in the overall design cycle.
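The abstract does not give its kernel, but the basic idea of simulating an And-Inverter Graph can be sketched with bit-parallel words: each bit position carries one input pattern, so one AND per gate evaluates many patterns at once, loosely analogous to GPU threads evaluating gates in parallel. This covers a single combinational evaluation (one cycle, no latches); the literal encoding and helper names are assumptions, and the paper's clustering algorithms are not shown.

```python
from typing import Dict, List, Tuple

# Literal encoding (AIGER-style convention, assumed here): literal = 2 * node + c,
# where c = 1 means the edge is complemented. Node 0 is constant FALSE,
# nodes 1..n_inputs are primary inputs, and AND nodes follow.
AndGate = Tuple[int, int, int]  # (output node, input literal a, input literal b)


def lit_value(values: Dict[int, int], lit: int, mask: int) -> int:
    v = values[lit >> 1]
    return v ^ mask if lit & 1 else v


def simulate_aig(n_inputs: int, ands: List[AndGate],
                 input_words: List[int], width: int = 64) -> Dict[int, int]:
    """Evaluates `width` input patterns at once; bit k of each word is pattern k.

    `ands` must be topologically ordered so every fan-in is computed first.
    """
    assert len(input_words) == n_inputs
    mask = (1 << width) - 1
    values = {0: 0}
    for node, word in enumerate(input_words, start=1):
        values[node] = word & mask
    for node, a, b in ands:
        values[node] = lit_value(values, a, mask) & lit_value(values, b, mask)
    return values


# XOR of inputs x (node 1) and y (node 2) as an AIG with three AND nodes:
# n3 = x & ~y, n4 = ~x & y, n5 = ~n3 & ~n4, output = ~n5 (literal 11).
xor_ands = [(3, 2, 5), (4, 3, 4), (5, 7, 9)]
vals = simulate_aig(2, xor_ands, [0b1100, 0b1010], width=4)
print(bin(lit_value(vals, 11, 0b1111)))  # 0b110
```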

5.

Accelerating data gravitation-based classification using GPU

Peng Lizhi, Zhang Haibo, Hassan Houcine, Chen Yuehui, Yang Bo 《The Journal of Supercomputing》2019,75(6):2930-2949

The data gravitation-based classification (DGC) model, a new classification model inspired by a physical law, has been demonstrated to be effective for both standard and imbalanced tasks. However, because of the large amount of gravitational computation in its feature weighting process, DGC suffers from high computational complexity, especially for large data sets. In this paper, we address the problem of speeding up gravitational computation using a graphics processing unit (GPU). We design a GPU parallel algorithm, GPU–DGC, to accelerate the feature weighting process of the DGC model. GPU–DGC distributes the gravitational computation across parallel GPU threads so that gravitation is computed simultaneously. We use 25 open classification data sets to evaluate the parallel performance of our algorithm. The relationship between the speedup ratio and the number of GPU threads is examined and discussed based on the empirical studies. The experimental results show the effectiveness of GPU–DGC, with a maximum speedup of 87 over serial DGC. The empirical studies also reveal its sensitivity to the number of GPU threads.
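A heavily simplified sketch of the gravitation computation: each training point is treated as a unit-mass particle, and a query is assigned to the class exerting the largest total inverse-square pull under a feature-weighted distance. The unit masses, `eps`, and function names are assumptions rather than the DGC model's exact definitions. Every (query, training point) term is independent, which is what makes distributing the computation over GPU threads natural.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def class_gravitation(x: Sequence[float], train_X: List[Sequence[float]],
                      train_y: List[str], weights: Sequence[float],
                      eps: float = 1e-12) -> Dict[str, float]:
    """Sums a unit-mass, inverse-square 'gravitation' from every training point."""
    force: Dict[str, float] = defaultdict(float)
    for xj, yj in zip(train_X, train_y):
        # Feature-weighted squared distance; weights come from the weighting step.
        d2 = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, xj))
        force[yj] += 1.0 / (d2 + eps)  # eps avoids division by zero
    return dict(force)


def classify(x: Sequence[float], train_X: List[Sequence[float]],
             train_y: List[str], weights: Sequence[float]) -> str:
    force = class_gravitation(x, train_X, train_y, weights)
    return max(force, key=lambda c: force[c])  # strongest pull wins


train_X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = ["a", "a", "b", "b"]
print(classify((0.5, 0.5), train_X, train_y, weights=(1.0, 1.0)))  # a
```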

6.

A bit too precise? Verification of quantized digital filters

Arlen Cox, Sriram Sankaranarayanan, Bor-Yuh Evan Chang 《International Journal on Software Tools for Technology Transfer (STTT)》2014,16(2):175-190

7.

SMT proof checking using a logical framework

Aaron Stump, Duckki Oe, Andrew Reynolds, Liana Hadarean, Cesare Tinelli 《Formal Methods in System Design》2013,42(1):91-118

Producing and checking proofs from SMT solvers is currently the most feasible method for achieving high confidence in the correctness of solver results. The diversity of solvers and relative complexity of SMT over, say, SAT means that flexibility, as well as performance, is a critical characteristic of a proof-checking solution for SMT. This paper describes such a solution, based on a Logical Framework with Side Conditions (LFSC). We describe the framework and show how it can be applied for flexible proof production and checking for two different SMT solvers, clsat and cvc3. We also report empirical results showing good performance relative to solver execution time.

8.

Implementation of a parallel spatial interpolation algorithm on graphics processing units

Zhao Yanwei, Cheng Zhenlin, Dong Hui, Fang Jinyun 《Journal of Image and Graphics》2012,17(4):575-581

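Spatial interpolation is a computationally complex and time-consuming operation in the spatial analysis of geographic information systems (GIS), and therefore cannot meet real-time requirements. As the floating-point capability of graphics processing units (GPUs) has greatly increased, general-purpose GPU computing has become a research focus for handling complex computations in the GIS field, offering a good opportunity to make some traditionally inefficient algorithms run in real time. Exploiting the GPU's strength in parallel computing, we map the inverse distance weighting (IDW) interpolation algorithm onto the Compute Unified Device Architecture (CUDA) parallel programming architecture. First, a two-level index is built on the GPU to partition the computation into appropriate levels; then a multithreaded blocking strategy is used to perform the parallel interpolation. Finally, experiments show that the interpolation error relative to the CPU method can be kept on the order of 10^-6, and that with a large interpolation radius and a large amount of interpolation data, the algorithm achieves a speedup of more than 40 times. These results demonstrate the correctness and efficiency of the method.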
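Inverse distance weighting estimates the value at a point as z = Σ w_i z_i / Σ w_i, with w_i = 1 / d_i^p over the samples within the search radius. The sequential sketch below uses power p = 2 by default and hypothetical function names. Every grid cell is computed independently, which is the property a CUDA version can map onto one thread per cell; the paper's two-level index and blocking strategy are not reproduced.

```python
import math
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, y, observed value)


def idw(px: float, py: float, samples: Sequence[Sample],
        power: float = 2.0, radius: float = math.inf) -> Optional[float]:
    """Inverse distance weighted estimate at (px, py); None if no sample is in range."""
    num = den = 0.0
    for sx, sy, sz in samples:
        d = math.hypot(px - sx, py - sy)
        if d == 0.0:
            return sz  # exact hit: return the sample value itself
        if d <= radius:
            w = 1.0 / d ** power
            num += w * sz
            den += w
    return num / den if den > 0.0 else None


def idw_grid(xs: Sequence[float], ys: Sequence[float], samples: Sequence[Sample],
             power: float = 2.0, radius: float = math.inf) -> List[List[Optional[float]]]:
    # Each cell is independent, so this double loop is the part a GPU parallelizes.
    return [[idw(x, y, samples, power, radius) for x in xs] for y in ys]


samples = [(0.0, 0.0, 0.0), (2.0, 0.0, 10.0)]
print(idw_grid([0.0, 1.0, 2.0], [0.0], samples))  # [[0.0, 5.0, 10.0]]
```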

9.

Don’t care in SMT: building flexible yet efficient abstraction/refinement solvers

Andreas Bauer, Martin Leucker, Christian Schallhart, Michael Tautschnig 《International Journal on Software Tools for Technology Transfer (STTT)》2010,12(1):23-37

This paper describes a method for combining “off-the-shelf” SAT and constraint solvers to build an efficient Satisfiability Modulo Theories (SMT) solver for a wide range of theories. Our method follows the abstraction/refinement approach to simplify the implementation of custom SMT solvers. The expected performance penalty of not using a tightly interwoven combination of SAT and theory solvers is reduced by first generalising a Boolean solution of an SMT problem, assigning don’t care to as many variables as possible. We then use the generalised solution to determine a smaller constraint set to be handed over to the constraint solver for a background theory. We show that for many benchmarks and real-world problems, this optimisation results in considerably smaller and less complex constraint problems. The presented approach is particularly useful for assembling a practically viable SMT solver quickly when neither a suitable SMT solver nor a corresponding incremental theory solver is available. We have implemented our approach in the ABsolver framework and applied the resulting solver successfully to an industrial case study: the verification problems arising in verifying an electronic car steering control system impose non-linear arithmetic constraints, which do not fall into the domain of any other available solver.
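The don’t-care generalisation step can be sketched as a greedy pass over a satisfying Boolean assignment of a CNF abstraction: a variable is dropped if every clause is still satisfied by some literal whose variable remains assigned. The DIMACS-style clause encoding, the variable order, and the function names are assumptions; in an SMT setting, only the theory atoms that remain assigned would be handed to the theory solver.

```python
from typing import Dict, List, Set

Clause = List[int]  # DIMACS-style literals: v means "v is true", -v "v is false"


def clause_holds(clause: Clause, assignment: Dict[int, bool], dont_care: Set[int]) -> bool:
    """True if some literal is satisfied by a variable that is still assigned."""
    return any(abs(lit) not in dont_care and assignment[abs(lit)] == (lit > 0)
               for lit in clause)


def generalize(cnf: List[Clause], assignment: Dict[int, bool]) -> Dict[int, bool]:
    """Greedily marks variables as don't care while the CNF stays satisfied."""
    dont_care: Set[int] = set()
    for var in sorted(assignment):  # variable order is a heuristic choice
        dont_care.add(var)
        if not all(clause_holds(c, assignment, dont_care) for c in cnf):
            dont_care.discard(var)  # var is needed after all
    return {v: b for v, b in assignment.items() if v not in dont_care}


cnf = [[1, 2], [-1, 3]]
model = {1: True, 2: True, 3: True}
print(generalize(cnf, model))  # {2: True, 3: True}
```

Here variable 1 can be dropped because clause `[1, 2]` is already satisfied by 2 and clause `[-1, 3]` by 3; a different variable order could yield a different, equally valid partial assignment.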

10.

Research on theory combination techniques for SMT solvers (total citations: 2; self-citations: 0; citations by others: 2)

Li Jing, Liu Wanwei 《Computer Engineering & Science》2011,33(10):111-119

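A satisfiability modulo theories (SMT) solver is a program used in computer science to decide the satisfiability of first-order logic formulas, and it serves as the verification engine of many formal methods. Theory solvers implement the SMT solving procedure for individual background theories, yet practical problems are often posed over several theories at once. This paper therefore focuses on decision methods for theory combination, surveys the current state of SMT solvers, and analyzes the key theory combination techniques of several mainstream SMT solvers. Through comparative experiments, it evaluates...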

11.

Building a push-button RESOLVE verifier: Progress and challenges

Murali Sitaraman, Bruce Adcock, Jeremy Avigad, Derek Bronish, Paolo Bucci, David Frazier, Harvey M. Friedman, Heather Harton, Wayne Heym, Jason Kirschenbaum, Joan Krone, Hampton Smith, Bruce W. Weide 《Formal Aspects of Computing》2011,23(5):607-626

A central objective of the verifying compiler grand challenge is to develop a push-button verifier that generates proofs of correctness in a syntax-driven fashion, similar to the way an ordinary compiler generates machine code. The software developer's role is then to provide suitable specifications and annotated code, but otherwise to have no direct involvement in the verification step. However, the general mathematical developments and results upon which software correctness is based may be established through a separate formal proof process in which proofs might be mechanically checked, but not necessarily automatically generated. While many ideas that could conceivably form the basis for software verification have been known “in principle” for decades, and several tools to support an aspect of verification have been devised, practical fully automated verification of full software behavior remains a grand challenge. This paper explains how RESOLVE takes a step towards addressing this challenge by integrating foundational and practical elements of software engineering, programming languages, and mathematical logic into a coherent framework. Current versions of the RESOLVE verifier generate verification conditions (VCs) for the correctness of component-based software in a modular fashion, one component at a time. The VCs are currently verified using automated capabilities of the Isabelle proof assistant, the SMT solver Z3, a minimalist rewrite prover, and some specialized decision procedures. Initial experiments with the tools and further analytic considerations show both the progress that has been made and the challenges that remain.

12.

Approximate counting in SMT and value estimation for probabilistic programs

Chistikov Dmitry, Dimitrova Rayna, Majumdar Rupak 《Acta Informatica》2017,54(8):729-764

#SMT, or model counting for logical theories, is a well-known hard problem that generalizes such tasks as counting the number of satisfying assignments to a Boolean formula and computing the volume of a polytope. In the realm of satisfiability modulo theories (SMT) there is a growing need for model counting solvers, coming from several application domains (quantitative information flow, static analysis of probabilistic programs). In this paper, we show a reduction from an approximate version of #SMT to SMT. We focus on the theories of integer arithmetic and linear real arithmetic. We propose model counting algorithms that provide approximate solutions with formal bounds on the approximation error. They run in polynomial time and make a polynomial number of queries to the SMT solver for the underlying theory, exploiting “for free” the sophisticated heuristics implemented within modern SMT solvers. We have implemented the algorithms and used them to solve the value problem for a model of loop-free probabilistic programs with nondeterminism.
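For intuition, the sketch below shows a generic hashing-based approximate counter in the style of random XOR-constraint methods; it is not the paper's specific reduction. Random parity constraints are added until at most a threshold number of models survive, and the survivor count is scaled by 2 to the number of constraints. A brute-force enumerator over bit-vectors stands in for the SMT oracle, and all names, the threshold, and the trial count are illustrative assumptions.

```python
import random
from itertools import product
from statistics import median
from typing import Callable, List, Sequence, Tuple

Xor = Tuple[List[int], int]  # (variable indices, required parity)


def count_up_to(pred: Callable[[Sequence[int]], bool], n_bits: int,
                xors: List[Xor], limit: int) -> int:
    """Brute-force stand-in for an SMT oracle: counts models of `pred` that also
    satisfy every XOR constraint, stopping early once `limit` is exceeded."""
    count = 0
    for bits in product((0, 1), repeat=n_bits):
        if pred(bits) and all(sum(bits[i] for i in vs) % 2 == parity
                              for vs, parity in xors):
            count += 1
            if count > limit:
                break
    return count


def approx_count(pred: Callable[[Sequence[int]], bool], n_bits: int,
                 threshold: int = 8, trials: int = 9, seed: int = 0) -> float:
    """Adds random XORs until at most `threshold` models survive, then scales
    the survivors by 2 ** (number of XORs); returns the median over trials."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        xors: List[Xor] = []
        while True:
            c = count_up_to(pred, n_bits, xors, threshold)
            if c <= threshold:
                estimates.append(c * 2 ** len(xors))
                break
            vs = [i for i in range(n_bits) if rng.random() < 0.5]
            xors.append((vs, rng.randint(0, 1)))
    return median(estimates)  # the median damps unlucky hash draws


print(approx_count(lambda b: sum(b) == 0, 6))  # exactly 1: no XORs needed
```

Each oracle call only needs to distinguish "at most `threshold` models" from "more", which is why a bounded number of satisfiability queries suffices in this family of methods.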

13.

Mothers of Pipelines

Sava Krstić, Robert B. Jones, John O'Leary 《Electronic Notes in Theoretical Computer Science》2007,174(8):7

We present a method for pipeline verification using SMT solvers. It is based on a non-deterministic “mother pipeline” machine (MOP) that abstracts the instruction set architecture (ISA). The MOP vs. ISA correctness theorem splits naturally into a large number of simple subgoals. This theorem reduces proving the correctness of a given pipelined implementation of the ISA to verifying that each of its transitions can be modeled as a sequence of MOP state transitions.

14.

Improved usability and performance of SMT solvers for debugging specifications (total citations: 1; self-citations: 0; citations by others: 1)

David R. Cok 《International Journal on Software Tools for Technology Transfer (STTT)》2010,12(6):467-481

It is now common to construct an extended static checker or software verification system using an SMT theorem prover as the underlying logical verifier. SMT provers have improved significantly in performance over the last several years. However, their usability as a component of software checking and verification systems still has gaps. This paper describes investigations in two areas: the reporting of counterexample information and the testing of vacuity, both of which are important to realistic use of such tools for typical software development. The use of solvers in verification is more effective if the solvers support minimal unsatisfiable cores and incremental construction, evolution and querying of satisfying assignments; current solvers only partially support these capabilities.
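Minimal unsatisfiable cores, which the abstract mentions, can be computed with the classic deletion-based method: try removing each clause and keep the removal whenever the remainder stays unsatisfiable. This sketch uses a brute-force satisfiability check as a stand-in for a real solver, and the function names are hypothetical.

```python
from itertools import product
from typing import List

Clause = List[int]  # DIMACS-style literals over variables 1..n_vars


def is_sat(clauses: List[Clause], n_vars: int) -> bool:
    """Brute-force satisfiability check (stand-in for a real SAT/SMT solver)."""
    for bits in product((False, True), repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses):
            return True
    return False


def deletion_mus(clauses: List[Clause], n_vars: int) -> List[Clause]:
    """Returns a minimal unsatisfiable subset of an unsatisfiable clause list."""
    if is_sat(clauses, n_vars):
        raise ValueError("input is satisfiable, so it has no unsatisfiable core")
    core = list(clauses)
    i = 0
    while i < len(core):
        trial = core[:i] + core[i + 1:]
        if not is_sat(trial, n_vars):
            core = trial  # clause i is not needed for unsatisfiability
        else:
            i += 1  # clause i is necessary; keep it
    return core


print(deletion_mus([[1], [-1], [2], [1, 2]], n_vars=2))  # [[1], [-1]]
```

Minimality follows because a clause kept at some step stays necessary: removing further clauses only shrinks the set, and any subset of a satisfiable clause set is satisfiable.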

15.

Compositional verification of sequential programs with procedures (total citations: 1; self-citations: 0; citations by others: 1)

Dilian Gurov, Marieke Huisman, Christoph Sprenger 《Information and Computation》2008,206(7):840-868

We present a method for algorithmic, compositional verification of control-flow-based safety properties of sequential programs with procedures. The application of the method involves three steps: (1) decomposing the desired global property into local properties of the components, (2) proving the correctness of the property decomposition by using a maximal model construction, and (3) verifying that the component implementations obey their local specifications. We consider safety properties of both the structure and the behaviour of program control flow. Our compositional verification method builds on a technique proposed by Grumberg and Long that uses maximal models to reduce compositional verification of finite-state parallel processes to standard model checking. We present a novel maximal model construction for the fragment of the modal μ-calculus with boxes and greatest fixed points only, and adapt it to control-flow graphs modelling components described in a sequential procedural language. We extend our verification method to programs with private procedures by defining an abstraction, presented as an inlining transformation. All algorithms have been implemented in a tool set automating all required verification steps. We validate our approach on an electronic purse case study.

16.

Model generation for quantified formulas with application to test data generation

Christoph D. Gladisch 《International Journal on Software Tools for Technology Transfer (STTT)》2012,14(4):439-459

We present a new model generation approach and technique for solving first-order logic (FOL) formulas with quantifiers in unbounded domains. Model generation is important, e.g., for test data generation based on test data constraints and for counterexample generation in formal verification. In such scenarios, quantified FOL formulas, stemming e.g. from formal specifications, have to be solved. Satisfiability modulo theories (SMT) solvers are considered the state-of-the-art technique for generating models of FOL formulas. Handling quantified formulas in combinations of theories is, however, sometimes a problem. Our approach addresses this problem and can solve formulas that could not be solved before using SMT solvers. We present the model generation algorithm and show how to convert a representation of a model into a test preamble for state initialization with test data. A prototype of this algorithm is implemented in the formal verification and test generation tool KeY.

17.

A GPU-Accelerated In-Memory Metadata Management Scheme for Large-Scale Parallel File Systems

Zhi-Guang Chen, Yu-Bo Liu, Yong-Feng Wang, Yu-Tong Lu 《Journal of Computer Science and Technology》2021,36(1):44-55

Driven by the increasing requirements of high-performance computing applications, supercomputers tend to contain more and more computing nodes. Applications running on such a large-scale computing system are likely to spawn millions of parallel processes, which usually generate bursts of I/O requests, posing a great challenge to the metadata management of the underlying parallel file systems. The traditional way to overcome this challenge is to adopt multiple metadata servers in a scale-out manner, which inevitably runs into serious network and consistency problems. This work instead seeks to enhance metadata performance in a scale-up manner. Specifically, we propose to improve the performance of each individual metadata server by employing a GPU to handle metadata requests in parallel. Our proposal introduces a novel metadata server architecture that uses the CPU to interact with file system clients while offloading metadata computing tasks to the GPU. To take full advantage of the parallelism available in the GPU, we redesign the in-memory data structure for the file system namespace. The new data structure closely fits the memory architecture of the GPU, and thus helps exploit the large number of parallel threads within the GPU to serve bursty metadata requests concurrently. We implement a prototype based on BeeGFS and conduct extensive experiments to evaluate our proposal; the results demonstrate that our GPU-based solution outperforms the CPU-based scheme by more than 50% under typical metadata operations. The advantage grows further in highly concurrent scenarios, e.g., high-performance computing systems supporting millions of parallel threads.

18.

Evaluations of OpenCL-written tsunami simulation on FPGA and comparison with GPU implementation

Fumiya Kono, Naohito Nakasato, Kensaku Hayashi, Alexander Vazhenin, Stanislav Sedukhin 《The Journal of Supercomputing》2018,74(6):2747-2775

When a tsunami occurs at sea, predicting its arrival time is critical for evacuating people from coastal areas. Many tsunami-related problems must be solved to reduce the impact of this serious disaster. Numerical modeling of tsunami wave propagation is a computationally intensive problem whose calculations need to be accelerated by parallel processing. The method of splitting tsunami (MOST) is one of the well-known numerical solvers for tsunami modeling. We have developed a tsunami propagation code based on the MOST algorithm and implemented different parallel optimizations for GPU and FPGA. In our most recent earlier study, the best performance was obtained with an OpenCL kernel implementing the tsunami simulation on an AMD Radeon 280X GPU. This paper targets the design and evaluation of an FPGA implementation using OpenCL. After several kernel modifications, the performance of the FPGA design generated automatically by the Altera offline compiler comes close to the GPU results.

19.

Massively parallel lattice–Boltzmann codes on large GPU clusters

《Parallel Computing》2016

This paper describes a massively parallel code for a state-of-the-art thermal lattice–Boltzmann method. Our code has been carefully optimized for performance on one GPU and for good scaling behavior up to a large number of GPUs. Versions of this code have already been used for large-scale studies of convective turbulence. GPUs are becoming increasingly popular in HPC applications, as they are able to deliver higher performance than traditional processors. Writing efficient programs for large clusters is not an easy task, as codes must adapt to increasingly parallel architectures, and the overheads of node-to-node communication must be properly handled. We describe the structure of our code, discussing several key design choices that were guided by theoretical models of performance and experimental benchmarks. We present an extensive set of performance measurements and identify the corresponding main bottlenecks; finally, we compare the results of our GPU code with those measured on other currently available high-performance processors. Our results are a production-grade code able to deliver a sustained performance of several tens of Tflops, as well as a design and optimization methodology that can be used to develop other high-performance applications for computational physics.
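The abstract does not show its kernels, but lattice-Boltzmann codes are built around alternating a local collision step with a streaming step to neighboring sites. The sketch below is a minimal one-dimensional D1Q3 diffusion scheme with BGK collision and periodic boundaries, far simpler than the paper's thermal model; names and parameters are assumptions. By the standard Chapman-Enskog analysis, it models diffusion with coefficient (tau - 1/2)/3 in lattice units.

```python
from typing import List

# D1Q3 lattice: rest population, +1 (moves right), -1 (moves left).
WEIGHTS = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)


def lbm_diffusion(rho0: List[float], tau: float, steps: int) -> List[float]:
    """Evolves a density field with collide-and-stream on a periodic 1D lattice."""
    n = len(rho0)
    # Start every population at its equilibrium f_i = w_i * rho.
    f = [[w * r for r in rho0] for w in WEIGHTS]
    for _ in range(steps):
        # Collision (BGK): relax each population toward its local equilibrium.
        for x in range(n):
            rho = f[0][x] + f[1][x] + f[2][x]
            for i, w in enumerate(WEIGHTS):
                f[i][x] -= (f[i][x] - w * rho) / tau
        # Streaming with periodic boundaries.
        f[1] = [f[1][(x - 1) % n] for x in range(n)]  # shift right
        f[2] = [f[2][(x + 1) % n] for x in range(n)]  # shift left
    return [f[0][x] + f[1][x] + f[2][x] for x in range(n)]


rho0 = [0.0] * 21
rho0[10] = 1.0  # unit pulse in the middle
rho = lbm_diffusion(rho0, tau=0.8, steps=30)
print(round(sum(rho), 12))  # 1.0 (mass is conserved)
```

Collision touches only one site and streaming only nearest neighbors, which is why such codes map well onto GPU threads and need only halo exchanges between nodes.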

20.

Modular verification of software components in C (total citations: 2; self-citations: 0; citations by others: 2)

《IEEE Transactions on Software Engineering》2004,30(6):388-402

We present a new methodology for automatic verification of C programs against finite state machine specifications. Our approach is compositional, naturally enabling us to decompose the verification of large software systems into subproblems of manageable complexity. The decomposition reflects the modularity in the software design. We use weak simulation as the notion of conformance between the program and its specification. Following the counterexample guided abstraction refinement (CEGAR) paradigm, our tool MAGIC first extracts a finite model from C source code using predicate abstraction and theorem proving. Subsequently, weak simulation is checked via a reduction to Boolean satisfiability. MAGIC has been interfaced with several publicly available theorem provers and SAT solvers. We report experimental results with procedures from the Linux kernel, the OpenSSL toolkit, and several industrial strength benchmarks.
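MAGIC checks weak simulation through a reduction to Boolean satisfiability. The sketch below instead computes the largest weak simulation directly, as a greatest fixpoint over small explicit labeled transition systems, to show what the conformance notion requires: every implementation move must be matched by a specification move that may be padded with internal `tau` steps. The dictionary representation, the `tau` label, and the function names are assumptions.

```python
from typing import Dict, Set, Tuple

TAU = "tau"  # internal (invisible) action
LTS = Dict[str, Set[Tuple[str, str]]]  # state -> {(action, successor)}


def tau_closure(lts: LTS, s: str) -> Set[str]:
    """All states reachable from s using only tau steps (including s)."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for a, v in lts.get(u, set()):
            if a == TAU and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def weak_successors(lts: LTS, s: str, action: str) -> Set[str]:
    """States reachable by tau* action tau* (just tau* when action is tau)."""
    before = tau_closure(lts, s)
    if action == TAU:
        return before
    mid = {v for u in before for a, v in lts.get(u, set()) if a == action}
    return {w for v in mid for w in tau_closure(lts, v)}


def states_of(lts: LTS, init: str) -> Set[str]:
    return {init} | set(lts) | {v for succ in lts.values() for _, v in succ}


def weakly_simulated(impl: LTS, impl_init: str, spec: LTS, spec_init: str) -> bool:
    """Greatest-fixpoint computation of the largest weak simulation relation."""
    rel = {(p, q) for p in states_of(impl, impl_init)
           for q in states_of(spec, spec_init)}
    changed = True
    while changed:
        changed = False
        for p, q in list(rel):
            for a, p2 in impl.get(p, set()):
                # Every impl move must be matched by a weak spec move into rel.
                if not any((p2, q2) in rel for q2 in weak_successors(spec, q, a)):
                    rel.discard((p, q))
                    changed = True
                    break
    return (impl_init, spec_init) in rel


spec = {"s0": {("lock", "s1")}, "s1": {("unlock", "s0")}}
good = {"p0": {(TAU, "p1")}, "p1": {("lock", "p2")}, "p2": {("unlock", "p0")}}
bad = {"q0": {("unlock", "q1")}}
print(weakly_simulated(good, "p0", spec, "s0"))  # True
print(weakly_simulated(bad, "q0", spec, "s0"))   # False
```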