17 results found (search time: 0 ms)
1.
Christos D. Antonopoulos, Filip Blagojevic, Andrey N. Chernikov, Nikos P. Chrisochoides, Dimitrios S. Nikolopoulos 《Journal of Parallel and Distributed Computing》2009
This article focuses on the optimization of PCDM, a parallel, two-dimensional (2D) Delaunay mesh generation application, and its interaction with parallel architectures based on simultaneous multithreading (SMT) processors. We first present the step-by-step effect of a series of optimizations on performance. These optimizations improve the performance of PCDM by up to a factor of six. They target issues that very often limit the performance of scientific computing codes. We then evaluate the interaction of PCDM with a real SMT-based SMP system, using both high-level metrics, such as execution time, and low-level information from hardware performance counters.
2.
The management of replicated data in distributed database systems is a classic problem of great practical importance. Quorum consensus, combined with eager replication, is one of the popular methods for managing replicated data. In this paper we investigate the problems of delay-optimal quorum consensus. Firstly, we show that the problem of minimizing the total delay (or mean delay) restricted to a ring can be solved in constant time, in contrast to the existing approximation results. Secondly, we show that the problem of minimizing the total delay (or mean delay) is NP-hard. Thirdly, we present an approximation algorithm with approximation ratio 2; this algorithm guarantees exact solutions for some specific network topologies, such as trees and meshes. Finally, we present an improvement on the existing algorithm for minimizing the maximal delay; this reduces the time complexity from $O(n^3 \log n)$ to $O(n^3)$, where $n$ is the number of nodes.
3.
Repetitive model refactoring strategy for the design space exploration of intensive signal processing applications (total citations: 1, self-citations: 0, citations by others: 1)
Calin Glitia, Pierre Boulet, Eric Lenormand, Michel Barreteau 《Journal of Systems Architecture》2011,57(9):815-829
The efficient design of computation-intensive multidimensional signal processing applications requires dealing with three kinds of constraints: those implied by the data dependencies, the non-functional requirements (real-time, power consumption), and the resource availability of the execution platform. The Modeling and Analysis of Real-time and Embedded systems (MARTE) UML profile, through its repetitive structure modeling (RSM) package, is well suited to modeling the inherent parallelism within these applications, a compact representation of parallel execution platforms, and the distributive mapping of one onto the other. The execution of such a specification respects the whole set of constraints defined on it, while the quality of the scheduling is directly linked to the quality of the mapping of the multidimensional structures (data arrays or parallel loop nests) into time and space. We propose here a strategy for using a refactoring tool dedicated to this kind of application, which makes it possible to find good trade-offs in the usage of storage and computation resources and in the exploitation of parallelism (both task and data parallelism). This strategy is illustrated on an industrial radar application.
4.
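Taking high-dimensional data compression and recovery as background, this paper first describes in detail the development from Shannon sampling theory to sparse representation and compressed sensing theory, and then to low-rank matrix problems, which motivates the importance of low-rank matrix approximation and optimization. It then surveys existing methods in detail from three angles: low-rank matrix minimization, low-rank matrix factorization, and the optimization and application of low-rank matrices. Finally, it offers suggestions regarding the shortcomings of current research and directions for future work.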
5.
《Calphad》2018
The binary BaO-CaO and BaO-SiO2 systems have been critically evaluated based upon available phase equilibrium and thermodynamic data and optimized model parameters have been obtained giving the Gibbs energies of all phases as functions of temperature and composition. The liquid solution has been modeled with the Modified Quasichemical Model (MQM) to account for the short-range ordering. The results have been combined with those of previous optimizations of the CaO-SiO2 system to optimize the BaO-CaO-SiO2 system.
6.
Recently, a number of classification techniques have been introduced. However, processing large datasets in a reasonable time has become a major challenge. This has made the classification task more complex and computationally expensive, hence the need for solutions that overcome these constraints, such as field programmable gate arrays (FPGAs). In this paper, we give an overview of the various classification techniques. Then, we present the existing FPGA-based implementations of these classification methods. After that, we investigate the challenges encountered and the optimization strategies. Finally, we highlight the hardware accelerator architectures and hardware design tools suggested to improve the FPGA implementation of classification methods.
7.
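GPU-based optimization of sparse matrix-vector multiplication (total citations: 1, self-citations: 0, citations by others: 1)
Sparse matrix operations have difficulty exploiting the computing power of graphics processing units (GPUs). To address this, the paper studies a series of parallel optimization methods on the GPU's Compute Unified Device Architecture (CUDA), covering thread mapping, data reuse, and related aspects, and uses them to build a parallel sparse matrix-vector multiplication algorithm for the compressed row storage (CSR) format. The optimizations are: (1) exploiting the implicit synchronization of threads within a warp, so that each half-warp computes one element of the result vector; (2) aligning data reads to achieve coalesced memory access; (3) placing the input vector in texture memory for data reuse; (4) allocating page-locked memory to speed up data transfer; (5) using shared memory to speed up data access. Experimental analysis shows that each of the proposed techniques contributes to the speedup. Compared with the CSR-vector algorithms in the existing CUDPP and SpMV library, the algorithm achieves higher memory bandwidth and floating-point throughput; overall, it is more than 3 times faster than the serial CPU version.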
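The thread mapping in item (1) of the abstract above can be illustrated with a minimal sequential sketch in Python. This is not the authors' CUDA kernel: the function names, the lane count of 16 (half of a 32-thread warp), and the sample matrix are all assumptions made for illustration. It only shows how a group of lanes can split one CSR row into strided slices and then combine the partial sums with a tree reduction.

```python
HALF_WARP = 16  # assumed lane count: half of a 32-thread warp


def spmv_csr(row_ptr, col_idx, values, x):
    """Plain reference y = A x for A stored in CSR form."""
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y


def spmv_csr_half_warp(row_ptr, col_idx, values, x, lanes=HALF_WARP):
    """Sequential emulation of a 'one half-warp per row' mapping.

    `lanes` must be a power of two so the tree reduction below is valid.
    """
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        # Each lane accumulates a strided slice of the row's nonzeros.
        partial = [0.0] * lanes
        for lane in range(lanes):
            for k in range(start + lane, end, lanes):
                partial[lane] += values[k] * x[col_idx[k]]
        # Tree reduction across lanes (done in shared memory on a GPU).
        stride = lanes // 2
        while stride > 0:
            for lane in range(stride):
                partial[lane] += partial[lane + stride]
            stride //= 2
        y.append(partial[0])
    return y


# Hypothetical 3x3 example: [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
```

On a GPU the lanes would run concurrently and consecutive lanes would read consecutive entries of `values` and `col_idx`, which is what makes the accesses in item (2) coalesced; the emulation here only reproduces the arithmetic, not the memory behavior.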
8.
Xuemin Lin 《Algorithmica》2003,38(2):397-413
The management of replicated data in distributed database systems is a classic problem of great practical importance. Quorum consensus, combined with eager replication, is one of the popular methods for managing replicated data. In this paper we investigate the problems of delay-optimal quorum consensus. Firstly, we show that the problem of minimizing the total delay (or mean delay) restricted to a ring can be solved in constant time, in contrast to the existing approximation results. Secondly, we show that the problem of minimizing the total delay (or mean delay) is NP-hard. Thirdly, we present an approximation algorithm with approximation ratio 2; this algorithm guarantees exact solutions for some specific network topologies, such as trees and meshes. Finally, we present an improvement on the existing algorithm for minimizing the maximal delay; this reduces the time complexity from $O(n^3 \log n)$ to $O(n^3)$, where $n$ is the number of nodes.
9.
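Optimizing compilation techniques play an increasingly important role in research on modern processors. Starting from the structure of modern compilers, this article gives an overview of the optimization techniques that modern compilers commonly adopt and proposes an effective implementation strategy for an optimizing compiler.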
10.
Dmitry Tsarkov, Ian Horrocks, Peter F. Patel-Schneider 《Journal of Automated Reasoning》2007,39(3):277-316