Similar literature
 Found 20 similar documents; search took 15 ms
1.
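GPU-based high-performance sparse matrix-vector multiplication and CG solver optimization   Total citations: 1, self-citations: 1, citations by others: 0
Numerical methods such as the finite element and finite difference methods often produce global matrices that are banded and sparse. For such banded sparse matrices, the authors propose and implement bDIA, an efficient storage format and algorithm for sparse matrix-vector multiplication (SpMV). Tests on nVidia GTX280-series GPUs show that, compared with the five common sparse matrix storage formats and algorithms supported by CUSP, the bDIA format and its SpMV algorithm more than double floating-point efficiency in both single and double precision, and exceed the upper limits for SpMV on this GPU series of 4% single-precision and 22.2% double-precision floating-point efficiency. Applied to the conjugate gradient (CG) and biconjugate gradient stabilized (BiCGStab) solvers, bDIA yields a speedup of about 1.5x over the DIA format.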
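The abstract does not describe the internals of bDIA, so the following is only a minimal sketch of SpMV for the plain DIA (diagonal) format that bDIA is compared against. The function name `dia_spmv` and the row-aligned layout (`data[d][i]` holds `A[i][i + k]`) are assumptions made for illustration; other libraries may align diagonals differently.

```python
def dia_spmv(offsets, data, x):
    """Compute y = A @ x for a square matrix A stored in DIA format.

    offsets[d] is the diagonal offset k (0 = main, k > 0 above, k < 0 below).
    data[d][i] holds A[i][i + k]; slots that fall outside the matrix are padding.
    This row-aligned layout is an assumption of this sketch.
    """
    n = len(x)
    y = [0.0] * n
    for k, diag in zip(offsets, data):
        # Only rows i with 0 <= i + k < n touch a real matrix entry.
        row_start = max(0, -k)
        row_end = min(n, n - k)
        for i in range(row_start, row_end):
            y[i] += diag[i] * x[i + k]
    return y


# Tridiagonal 4x4 example: 2 on the main diagonal, -1 on both neighbors.
offsets = [-1, 0, 1]
data = [
    [0.0, -1.0, -1.0, -1.0],  # sub-diagonal, first slot is padding
    [2.0, 2.0, 2.0, 2.0],     # main diagonal
    [-1.0, -1.0, -1.0, 0.0],  # super-diagonal, last slot is padding
]
y = dia_spmv(offsets, data, [1.0, 2.0, 3.0, 4.0])
```

Because each diagonal is stored contiguously, no column indices need to be read, which is why diagonal formats suit banded matrices; on a GPU the inner loop over rows would typically be mapped to threads.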

2.
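The restarted generalized minimal residual method (GMRES) is one of the most commonly used algorithms for solving large-scale linear systems, with advantages such as fast convergence and good stability. This paper implements GMRES in parallel on the GPU using CUDA. For sparse matrix-vector multiplication in particular, combining coalesced memory access with a shared-memory strategy greatly improves efficiency. On large data sets, runs on a GeForce GTX 260 achieve an average speedup of more than 40x over an Intel Core 2 Quad CPU Q9400@2.66GHz and more than 20x over an Intel Core i7 CPU 920@2.67 GHz.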

3.
Accesses Per Cycle (APC), Concurrent Average Memory Access Time (C-AMAT), and Layered Performance Matching (LPM) are three memory performance models that consider both data locality and memory access concurrency. The APC model measures the throughput of a memory architecture and therefore reflects the quality of service (QoS) of a memory system. The C-AMAT model provides a recursive expression for the memory access delay and can therefore be used to identify potential bottlenecks in a memory hierarchy. The LPM method transforms a global memory system optimization into localized optimizations at each memory layer by matching the data access demands of the applications with the underlying memory system design. These three models have been proposed separately through prior efforts. This paper reexamines the three models under one coherent mathematical framework. More specifically, we present a new memory-centric view of data accesses. We divide the memory cycles at each memory layer into four distinct categories and use them to recursively define the memory access latency and concurrency along the memory hierarchy. This new perspective offers new insights with a clear formulation of memory performance that considers both locality and concurrency. Consequently, the performance model can be easily understood and applied in engineering practice. As such, the memory-centric approach helps establish a unified mathematical foundation for model-driven performance analysis and optimization of contemporary and future memory systems.
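For orientation, the relations below sketch how C-AMAT is usually written in the earlier literature on these models, next to the classical AMAT. The abstract itself gives no formulas, so the symbols are assumptions of this note: $H$ is the hit time, $MR$ the miss rate, $AMP$ the average miss penalty, $C_H$ the hit concurrency, $C_M$ the pure miss concurrency, $pMR$ the pure miss rate, and $pAMP$ the pure average miss penalty. The memory-centric formulation that this paper proposes may differ.

```latex
\begin{align}
  \mathrm{AMAT}     &= H + MR \times AMP \\
  \text{C-AMAT}     &= \frac{H}{C_H} + pMR \times \frac{pAMP}{C_M} \\
  \mathrm{APC}      &= \frac{1}{\text{C-AMAT}}
\end{align}
```

With $C_H = C_M = 1$ and every miss counted as a pure miss, the second line reduces to the first, which is the sense in which C-AMAT extends AMAT with concurrency.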

4.
This article presents a parallel self-verified solver for dense linear systems of equations. This kind of solver is commonly used in many real applications that deal with large matrices. Nevertheless, two key problems limit the use of linear system solvers in a wider range of real applications: solution correctness and high computational cost. To address the first, verified computing is an interesting choice. An algorithm that uses this concept can find a highly accurate and automatically verified result, providing more reliability. However, the performance of these algorithms quickly becomes a drawback. To improve performance, parallel computing techniques were employed. Two main parts of this method were parallelized: the computation of the approximate inverse of matrix A and the preconditioning step. The results obtained show that these optimizations significantly increase overall performance.

5.
During the last three decades, evolutionary algorithms (EAs) have shown superiority in solving complex optimization problems, especially those with multiple objectives and non-differentiable landscapes. However, due to the stochastic search strategies, the performance of most EAs deteriorates drastically when handling a large number of decision variables. To tackle the curse of dimensionality, this work proposes an efficient EA for solving super-large-scale multi-objective optimization problems with spa...

6.
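Spark Streaming is a mainstream open-source distributed stream analytics framework, and its performance optimization is a current research focus. Tuning configuration parameters for a given business scenario is an important factor in improving its performance. Spark Streaming has more than 200 configurable parameters, so tuning demands considerable experience, and an untuned configuration degrades the execution performance of streaming jobs. To address this, the paper proposes DQN-SSPO, a Spark Streaming parameter optimization method based on deep reinforcement learning. It recasts parameter configuration as the problem of obtaining the maximum return when training a deep reinforcement learning model, and proposes a weighted state-space transition method to raise the probability that training obtains high reward feedback. Experiments on three typical stream analytics tasks show that, after parameter optimization, streaming jobs on Spark Streaming reduce total scheduling time by 27.93% on average and total processing time by 42% on average.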

7.
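陶卿, 高乾坤, 姜纪远, 储德军. 《软件学报》 (Journal of Software), 2013, 24(11): 2498-2507
Machine learning faces the serious challenge of ever-growing data scale, and how to handle large-scale and even very-large-scale data is a key scientific problem that statistical learning urgently needs to solve. Training sets in large-scale machine learning problems are often redundant and sparse, and the regularization term and loss function in machine learning optimization problems also carry special structural meaning. Black-box batch methods that directly use the gradient of the entire objective function not only struggle with large-scale problems but also fail to meet machine learning's requirements on structure. Coordinate optimization, online optimization, and stochastic optimization methods, which have developed rapidly driven by the characteristics of machine learning itself, have become effective means of solving large-scale problems. For L1-regularized problems, this paper surveys some research progress on these large-scale algorithms.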
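As one illustration of coordinate optimization for an L1-regularized problem, the sketch below applies cyclic coordinate descent with soft-thresholding to the lasso objective (1/2)·||y − Xw||² + λ·||w||₁. It is not taken from the surveyed paper; the function names `soft_threshold` and `lasso_cd` and the choice of objective are assumptions made for illustration.

```python
def soft_threshold(rho, lam):
    """Proximal operator of lam * |w|: shrink rho toward zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for 0.5 * ||y - X w||^2 + lam * ||w||_1.

    X is a list of rows, y a list of targets. Columns of X must be nonzero.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the residual that ignores w[j].
            rho = 0.0
            z = 0.0
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            # Exact one-dimensional minimizer along coordinate j.
            w[j] = soft_threshold(rho, lam) / z
    return w


# With an identity design the solution is the soft-thresholded target.
w = lasso_cd([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

Each update solves a one-dimensional problem exactly and sets small coefficients to exactly zero, which is how such methods exploit the structure of the L1 term instead of treating the objective as a black box.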

8.
Norman Ramsey. Software, 1996, 26(4): 467-487
This paper presents a simple equation solver. The solver finds solutions for sets of linear equations extended with several nonlinear operators, including integer division and modulus, sign extension, and bit slicing. The solver uses a new technique called balancing, which can eliminate some nonlinear operators from a set of equations before applying Gaussian elimination. The solver's principal advantages are its simplicity and its ability to handle some nonlinear operators, including nonlinear functions of more than one variable. The solver is part of an application generator that provides encoding and decoding of machine instructions based on equational specifications. The solver is presented not as pseudocode but as a literate program, which guarantees that the code shown in the paper is the same code that is actually used. Using real code exposes more detail than using pseudocode, but literate-programming techniques help manage the detail. The detail should benefit readers who want to implement their own solvers based on the techniques presented here.

9.
We explore time-based solvers for linear standing-wave problems, especially the oscillatory Helmholtz equation. Here, we show how to accelerate the convergence properties of timestepping. We introduce a new time-based solver that we call phase-adjusted time-averaging (PATA), which we couple to timestepping to form the PATA-TS solver. Numerical experiments indicate that the PATA-TS solver is faster than the PATA solver and timestepping by a factor of 1.2 and 1.5 or more, respectively. We also explain why the PATA-TS solver is robust, efficient, and easy to program for a variety of practical applications.

10.
In this paper, a new parallel controller is developed for continuous-time linear systems. The main contribution of the method is a new parallel control law in which both the state and the control are considered as inputs. The structure of the parallel control is provided, and the relationship between the parallel control and traditional feedback controls is presented. The properties of the parallel control law are analyzed for the cases where the systems are controllable and incompletely controllable, and parallel controller design algorithms are given for both cases. Finally, numerical simulations are carried out to demonstrate the effectiveness and applicability of the present method.

11.
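A case study of an evolutionary cellular automata algorithm for function optimization   Total citations: 7, self-citations: 1, citations by others: 6
BUMP is a problem of very high dimensionality, with very many peaks and strong nonlinearity, and it is widely used to compare the performance of evolutionary algorithms. Its best solution, however, is unknown. A genetic algorithm based on cellular automata reported the best BUMP solution published so far. This paper designs a new algorithm based on evolutionary cellular automata (ECAA) and obtains better results. It discusses in detail how each operator in the algorithm is designed and the role it plays, and analyzes important properties of the algorithm such as its extreme parallelism and inherent local search.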

12.
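With the surge in industrial computing demand, computational fluid dynamics (CFD) places increasing emphasis on computational efficiency. Building on an in-house Navier-Stokes solver, the authors introduce a multigrid convergence-acceleration algorithm and combine it with the NVIDIA GPU computing platform, accelerating CFD from both the numerical-method side and the high-performance-computing side. Test results on numerical acceleration cases show that the multigrid-based GPU solver achieves a speedup of more than 45x in double precision over the CPU version of the code.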

13.
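A global optimization method for solving bilevel linear optimization problems using penalty functions   Total citations: 3, self-citations: 0, citations by others: 3
曹东. 《控制与决策》 (Control and Decision), 1995, 10(4): 327-331
Using the penalty function principle, the bilevel linear optimization problem is transformed into a nonlinear optimization problem whose objective function contains a penalty term. When the penalty coefficient exceeds a certain value, the penalty term is exact, and the global optimum of the nonlinear problem can be found with a progressive outer approximation algorithm.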

14.
Data access costs contribute significantly to the execution time of applications with complex data structures. As the latency of memory accesses becomes high relative to processor cycle times, application performance is increasingly limited by memory performance. In some situations it is useful to trade increased computation costs for reduced memory costs. The contributions of this paper are three-fold: we provide a detailed analysis of the memory performance of seven memory-intensive benchmarks; we describe Computation Regrouping, a source-level approach to improving the performance of memory-bound applications by increasing temporal locality to eliminate cache and TLB misses; and we demonstrate significant performance improvement by applying Computation Regrouping to our suite of seven benchmarks. Using Computation Regrouping, we observe a geometric mean speedup of 1.90, with individual speedups ranging from 1.26 to 3.03. Most of this improvement comes from eliminating memory stall time.

15.
Due to the inherent non-uniformity in the memory system, programmers and users of non-uniform memory access (NUMA) machines have to take special care of the memory performance of their applications. This paper discusses a variety of potential improvements with respect to cache misses, cache invalidations, and inter-node communication. This study is based on the simulation tool SIMT, which models the memory hierarchy in detail and is capable of providing complete, accurate information about all dynamic memory references. This information can be used to analyze the memory access behavior of applications and thereby forms the basis for any optimization with respect to memory accesses.

16.
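A survey of Spark performance optimization techniques   Total citations: 2, self-citations: 0, citations by others: 2
In recent years, with the arrival of the big data era, big data processing platforms have developed rapidly, producing excellent platforms such as Hadoop, Spark, and Storm, among which Spark is the most prominent. As Spark is widely adopted at home and abroad, many of its performance problems remain unsolved. Because Spark's underlying execution mechanism is extremely complex, users find it hard to locate performance bottlenecks, let alone optimize further. To address these problems, this paper summarizes and analyzes current Spark optimization techniques from five aspects: development-principle optimization, memory optimization, configuration-parameter optimization, scheduling optimization, and Shuffle-process optimization. Finally, it summarizes the new core problems in current Spark optimization techniques and proposes the main directions for future research.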

17.
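Particle swarm optimization (PSO) is a bio-inspired global optimization algorithm that performs its search with the help of memory and feedback mechanisms. The algorithm was inspired by the foraging behavior of birds; its basic idea comes from studying and simulating a simplified social model of birds, in which each individual makes full use of its own intelligence and that of the swarm, continually adjusting and learning until a satisfactory solution is found. The algorithm is commonly used for nonlinear problems, combinatorial optimization problems, and the like. Because it is easy to understand, easy to implement, has few control parameters, and converges quickly, it attracted wide attention as soon as it was proposed and has gradually become a new research focus. PSO also has shortcomings, however, such as limited search accuracy, premature convergence, and a tendency to become trapped in local extrema. In addition, the algorithm may oscillate in the late stage of the search, which slows convergence. This paper therefore studies the oscillation that appears in the late iterations of PSO and proposes an improved PSO in which the flight time decreases monotonically. The new algorithm improves search capability and reduces particle oscillation during the search.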
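The memory and feedback mechanism described above can be sketched as a basic global-best PSO. This is only a minimal sketch of the standard algorithm: the function name `pso_minimize`, the parameter values, and the `sphere` test function are assumptions for illustration. The position update uses a fixed unit time step, and the monotonically decreasing flight time proposed in the abstract is not reproduced because the abstract does not specify it.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=20, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Memory: each particle remembers its own best position so far.
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Feedback: pull toward the personal and the swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Unit time step, clamped to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

A variable flight time would replace the implicit factor of 1 in the position update with a step size that shrinks over the iterations, which is one plausible way to damp late-stage oscillation.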

18.
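A neural computation model for solving linearly constrained quadratic optimization problems   Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes a neural model for solving linearly constrained quadratic optimization problems, studies the stability and convergence of the neural network, gives a circuit block diagram, and demonstrates the feasibility of the network through numerical examples.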

19.
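A survey of optimization methods for computer network quality of service   Total citations: 30, self-citations: 5, citations by others: 30
Optimization methods provide strong theoretical support for designing better mechanisms to guarantee quality of service in computer networks. Compared with traditional heuristic network design, optimization methods can find the optimal solution to a problem in theory, and thus fundamentally overcome the inability of heuristic methods to prove how good a scheme is. Mechanism design and performance evaluation based on optimization methods have therefore become a frontier research area in network quality of service. A large body of research focuses on re-establishing, from the perspective of optimization theory...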

20.
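Genetic-algorithm-based optimization of PID control for boiler water temperature   Total citations: 1, self-citations: 0, citations by others: 1
Compared with traditional tuning methods, genetic algorithms offer better robustness and optimality. For the PID parameter tuning and optimization problem of a single-loop dynamic boiler water temperature control system, a genetic algorithm is used to optimize the parameters. Simulation results show that this optimization algorithm speeds up convergence, effectively improves the global stability of the system, and enhances the robustness of the PID controller.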
