首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
稀疏矩阵向量乘(SpMV)在线性系统的求解问题中具有重要意义,是科学计算和工程实践中的核心问题之一,其性能高度依赖于稀疏矩阵的非零分布。稀疏对角矩阵是一类特殊的稀疏矩阵,其非零元素按照对角线的形式密集排列。针对稀疏对角矩阵,在GPU平台上提出的多种存储格式虽然使SpMV性能有所提升,但仍存在零填充和负载不平衡的问题。针对上述问题,提出了一种DRM存储格式,利用基于固定阈值的矩阵划分策略和基于迭代归并的矩阵重构策略,实现了少量零填充和块间负载平衡。实验结果表明,在NVIDIA? Tesla? V100平台上,相比于DIA、HDC、HDIA和DIA-Adaptive格式,在时间性能方面,该存储格式分别取得了20.76,1.94,1.13和2.26倍加速;在浮点计算性能方面,分别提高了1.54,5.28,1.13和1.94倍。  相似文献   

2.
稀疏矩阵向量乘(SpMV)是科学计算中常用的内核之一,其运行速率跟非零元分布相关.针对对角线稀疏矩阵,提出了压缩行片段对角(compressed row segment diagonal,CRSD)存储格式.它利用“对角线格式”有效描述矩阵的对角线分布,区别于以往通用的计算方法,CRSD通过对给定应用的对角线稀疏矩阵采样再进行特定的优化.并且在软件安装阶段,通过自适应的方法选取适合具体运行平台的最优SpMV实现.在CPU端进行多线程并行化实现时,自适应调优过程中收集的信息还被用于线程间任务划分,以实现负载平衡.同时完成CRSD存储格式在GPU端的实现,并根据GPU端计算与访存的特点进行优化.实验结果表明:在Intel和AMD的多核平台使用相同线程数的情况下,与DIA相比,使用CRSD的加速比可以达到2.37X(平均1.7X);与CSR相比,可以达到4.6X(平均2.1X).  相似文献   

3.
稀疏矩阵存储格式中的稀疏矩阵向量乘(SpMV)计算效率低下,且分块行列(BRC)存储格式的计算结果缺少再现性和确定性。为此,提出一种改进的BRCP存储格式。采用不同的二维分块策略,根据矩阵各行非零元素分布的统计特性自适应调节分块参数,提高SpMV在GPU平台上的并行性,并设计基于快速分段求和算法的GPU内核函数,保证计算结果的确定性及其在不同GPU平台上的再现性。实验结果表明,BRCP存储格式具有较高的计算效率,相比BRC存储格式可减少并行环境中的SpMV计算误差,并提高PageRank排序的准确率。  相似文献   

4.
稀疏矩阵与向量相乘SpMV是求解稀疏线性系统中的一个重要问题,但是由于非零元素的稀疏性,计算密度较低,造成计算效率不高。针对稀疏矩阵存在的一些不规则性,利用混合存储格式来进行SpMV计算,能够提高对稀疏矩阵的压缩效率,并扩大其适应范围。HYB是一种广泛使用的混合压缩格式,其性能较为稳定。而随着GPU并行计算得到普遍应用以及CPU日趋多核化,因此利用GPU和多核CPU构建异构并行计算系统得到了普遍的认可。针对稀疏矩阵的HYB存储格式中的ELL和COO存储特征,把两部分数据分别分割到CPU和GPU进行协同并行计算,既能充分利用CPU和GPU的计算资源,又能够发挥CPU和GPU的计算特性,从而提高了计算资源的利用效能。在分析CPU+GPU异构计算模式的特征的基础上,对混合格式的数据分割和共享方面进行优化,能够较好地发挥在异构计算环境的优势,提高计算性能。  相似文献   

5.
SpMV的自动性能优化实现技术及其应用研究   总被引:1,自引:0,他引:1  
在科学计算中,稀疏矩阵向量乘(SpMV)是一个十分重要且经常被大量调用的计算内核.由于SpMV一般实现算法的浮点计算和存储访问次数比率非常低,且其存储访问模式极为不规则,其实际运行性能往往很低.通过采用寄存器分块算法和启发式分块大小选择算法,将稀疏矩阵分成小的稠密分块,重用保存在寄存器中向量x元素,可以提高该计算内核的性能.剖析和总结了OSKI软件包所采用的若干关键优化技术,并进行了实际应用性能测试.测试表明,在实际应用这些优化技术的过程中,应用程序对SpMV的调用次数要达到上百次的量级,才能抵消由于应用这些性能优化技术所带来的额外时间开销,取得性能加速效果.在Pentium 4和AMD Athlon平台上,测试了10个矩阵,其平均加速比分别达到了1.69和1.48.  相似文献   

6.
刘丽  陈长波 《计算机应用》2023,(12):3856-3867
稀疏-稠密矩阵乘法(SpMM)广泛应用于科学计算和深度学习等领域,提高它的效率具有重要意义。针对具有带状特征的一类稀疏矩阵,提出一种新的存储格式BRCV(Banded Row Column Value)以及基于此格式的SpMM算法和高效图形处理单元(GPU)实现。由于每个稀疏带可以包含多个稀疏块,所提格式可看成块稀疏矩阵格式的推广。相较于常用的CSR(Compressed Sparse Row)格式,BRCV格式通过避免稀疏带中列下标的冗余存储显著降低存储复杂度;同时,基于BRCV格式的SpMM的GPU实现通过同时复用稀疏和稠密矩阵的行更高效地利用GPU的共享内存,提升SpMM算法的计算效率。在两种不同GPU平台上针对随机生成的带状稀疏矩阵的实验结果显示,BRCV的性能不仅优于cuBLAS(CUDA Basic Linear Algebra Subroutines),也优于基于CSR和块稀疏两种不同格式的cuSPARSE。其中,相较于基于CSR格式的cuSPARSE,BRCV的最高加速比分别为6.20和4.77。此外,将新的实现应用于图神经网络(GNN)中的SpMM算子的加速。在实际应...  相似文献   

7.
稀疏矩阵向量乘(SpMV)采取压缩行存储格式的算法性能非常差,而寄存器分块算法可以使得数据尽量在靠近处理器的存储层次中访问而提高性能.利用RAM(h)模型进行分析和比较不同算法形式的存储访问复杂度,可以比较两种算法的优劣.通过RAM(h)分析SpMV两种实现形式的存储访问复杂度,同时在奔腾四平台上,测试了7个稀疏矩阵的SpMV性能,并统计了这两种算法中L1,L2,和TLB的缺失率,实验结果与模型分析的数据一致.  相似文献   

8.
稀疏矩阵向量乘法(sparse matrix vector multiplication,SpMV)是科学和工程领域中重要的核心子程序之一,也是稀疏基本线性代数子程序(basic linear algebra subprograms,BLAS)库的重要函数.目前很多SpMV的优化工作在不同程度上获得了性能提升,但大多数优化工作针对特定存储格式或一类具有特定特征的稀疏矩阵缺乏通用性.因此高性能的SpMV实现并没有广泛地应用于实际应用和数值解法器中.另外,稀疏矩阵具有众多存储格式,不同存储格式的SpMV存在较大性能差异.根据以上现象,提出一个SpMV的自动调优器(SpMV auto-tuner,SMAT).对于一个给定的稀疏矩阵,SMAT结合矩阵特征选择并返回其最优的存储格式.应用程序通过调用SMAT来得到合适的存储格式,从而获得性能提升,同时随着SMAT中存储格式的扩展,更多的SpMV优化工作可以将性能优势在实际应用中发挥作用.使用佛罗里达大学的2 366个稀疏矩阵作为测试集,在Intel上SMAT分别获得9.11GFLOPS(单精度)和2.44GFLOPS(双精度)的最高浮点性能,在AMD平台上获得了3.36GFLOPS(单精度)和1.52GFLOPS(双精度)的最高浮点性能.相比Intel的核心数学函数库(math kernel library,MKL)数学库,SMAT平均获得1.4~1.5倍的性能提升.  相似文献   

9.
稀疏矩阵向量乘是很多科学计算问题中的核心问题。本文针对稀疏对角矩阵,在DIA存储格式的基础上,设计了一种新型压缩存储格式CDIA,结合CUDA编程模型的特点,在计算线程上进行了细粒度的任务分配,同时为满足CUDA对存储器的合并访问要求,将压缩矩阵做了相应的转置处理,设计了细粒度算法与程序,并根据稀疏矩阵向量乘特点,做了相应的程序优化。实验数据显示,这种存储格式能够很好地发挥CUDA在数据处理方面的优势,在测试数据中,最高获得了单精度39.6Gflop/s和双精度19.6Gflop/s的浮点计算性能,性能在Nathan Bell和Michael Garland的基础上分别提高了7.6%和17.4%。  相似文献   

10.
稀疏矩阵向量乘(SpMV)是求解稀疏线性方程组的计算核心,被广泛应用在经济学模型、信号处理等科学计算和工程应用中,对于SpMV及其调优技术的研究有助于提升解决相关领域问题的运算效率。传统SpMV自动调优方法基于硬件平台的体系结构参数设置来提升SpMV性能,但巨大的参数设置量导致搜索空间变大且自动调优耗时大幅增加。采用深度学习技术,基于卷积神经网络,构建由双通道稀疏矩阵特征融合以及稀疏矩阵特征与体系结构特征融合组成的SpMV运算性能预测模型,实现快速自动调优。为提高SpMV运算时间的预测精度,选取特征数据并利用箱形图统计SpMV时间信息,同时在佛罗里达稀疏矩阵数据集上进行实验设计与验证,结果表明,该模型的SpMV运算时间预测准确率达到80%以上,并且具有较强的泛化能力。  相似文献   

11.
Abstract This paper describes an approach to the design of interactive multimedia materials being developed in a European Community project. The developmental process is seen as a dialogue between technologists and teachers. This dialogue is often problematic because of the differences in training, experience and culture between them. Conditions needed for fruitful dialogue are described and the generic model for learning design used in the project is explained.  相似文献   

12.
European Community policy and the market   总被引:1,自引:0,他引:1  
Abstract This paper starts with some reflections on the policy considerations and priorities which are shaping European Commission (EC) research programmes. Then it attempts to position the current projects which seek to capitalise on information and communications technologies for learning in relation to these priorities and the apparent realities of the marketplace. It concludes that while there are grounds to be optimistic about the contribution EC programmes can make to the efficiency and standard of education and training, they are still too technology driven.  相似文献   

13.
融合集成方法已经广泛应用在模式识别领域,然而一些基分类器实时性能稳定性较差,导致多分类器融合性能差,针对上述问题本文提出了一种新的基于多分类器的子融合集成分类器系统。该方法考虑在度量层融合层次之上通过对各类基多分类器进行动态选择,票数最多的类别作为融合系统中对特征向量识别的类别,构成一种新的自适应子融合集成分类器方法。实验表明,该方法比传统的分类器以及分类融合方法识别准确率明显更高,具有更好的鲁棒性。  相似文献   

14.
Development of software intensive systems (systems) in practice involves a series of self-contained phases for the lifecycle of a system. Semantic and temporal gaps, which occur among phases and among developer disciplines within and across phases, hinder the ongoing development of a system because of the interdependencies among phases and among disciplines. Such gaps are magnified among systems that are developed at different times by different development teams, which may limit reuse of artifacts of systems development and interoperability among the systems. This article discusses such gaps and a systems development process for avoiding them.  相似文献   

15.
This paper presents control charts models and the necessary simulation software for the location of economic values of the control parameters. The simulation program is written in FORTRAN, requires only 10K of main storage, and can run on most mini and micro computers. Two models are presented - one describes the process when it is operating at full capacity and the other when the process is operating under capacity. The models allow the product quality to deteriorate to a further level before an existing out-of-control state is detected, and they can also be used in situations where no prior knowledge exists of the out-of-control causes and the resulting proportion defectives.  相似文献   

16.
Going through a few examples of robot artists who are recognized worldwide, we try to analyze the deepest meaning of what is called “robot art” and the related art field definition. We also try to highlight its well-marked borders, such as kinetic sculptures, kinetic art, cyber art, and cyberpunk. A brief excursion into the importance of the context, the message, and its semiotics is also provided, case by case, together with a few hints on the history of this discipline in the light of an artistic perspective. Therefore, the aim of this article is to try to summarize the main characteristics that might classify robot art as a unique and innovative discipline, and to track down some of the principles by which a robotic artifact can or cannot be considered an art piece in terms of social, cultural, and strictly artistic interest. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008  相似文献   

17.
Although there are many arguments that logic is an appropriate tool for artificial intelligence, there has been a perceived problem with the monotonicity of classical logic. This paper elaborates on the idea that reasoning should be viewed as theory formation where logic tells us the consequences of our assumptions. The two activities of predicting what is expected to be true and explaining observations are considered in a simple theory formation framework. Properties of each activity are discussed, along with a number of proposals as to what should be predicted or accepted as reasonable explanations. An architecture is proposed to combine explanation and prediction into one coherent framework. Algorithms used to implement the system as well as examples from a running implementation are given.  相似文献   

18.
This paper provides the author's personal views and perspectives on software process improvement. Starting with his first work on technology assessment in IBM over 20 years ago, Watts Humphrey describes the process improvement work he has been directly involved in. This includes the development of the early process assessment methods, the original design of the CMM, and the introduction of the Personal Software Process (PSP)SM and Team Software Process (TSP){SM}. In addition to describing the original motivation for this work, the author also reviews many of the problems he and his associates encountered and why they solved them the way they did. He also comments on the outstanding issues and likely directions for future work. Finally, this work has built on the experiences and contributions of many people. Mr. Humphrey only describes work that he was personally involved in and he names many of the key contributors. However, so many people have been involved in this work that a full list of the important participants would be impractical.  相似文献   

19.
基于复小波噪声方差显著修正的SAR图像去噪   总被引:4,自引:1,他引:3  
提出了一种基于复小波域统计建模与噪声方差估计显著性修正相结合的合成孔径雷达(Synthetic Aperture Radar,SAR)图像斑点噪声滤波方法。该方法首先通过对数变换将乘性噪声模型转化为加性噪声模型,然后对变换后的图像进行双树复小波变换(Dualtree Complex Wavelet Transform,DCWT),并对复数小波系数的统计分布进行建模。在此先验分布的基础上,通过运用贝叶斯估计方法从含噪系数中恢复原始系数,达到滤除噪声的目的。实验结果表明该方法在去除噪声的同时保留了图像的细节信息,取得了很好的降噪效果。  相似文献   

20.
Abstract  This paper considers some results of a study designed to investigate the kinds of mathematical activity undertaken by children (aged between 8 and 11) as they learned to program in LOGO. A model of learning modes is proposed, which attempts to describe the ways in which children used and acquired understanding of the programming/mathematical concepts involved. The remainder of the paper is concerned with discussing the validity and limitations of the model, and its implications for further research and curriculum development.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号