Similar literature
 Found 20 similar documents; search took 62 ms
1.
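Unstructured-mesh computation on the domestic heterogeneous many-core platform Sunway TaihuLight (神威·太湖之光) is characterized by sparse storage, irregular memory access, and data dependencies, all of which severely limit the performance of the many-core processor. To address sparse storage and irregular memory access, an N-order diagonal coloring algorithm is proposed that balances computation between the master core and the slave cores and uses the slave cores to turn global memory accesses into LDM accesses. To address the race conditions caused by data dependencies, adaptive and dependency-free task partitioning methods are adopted so that parallel computation avoids data conflicts. To optimize for both the processor architecture and unstructured-mesh computation, the master core and the slave cores run asynchronously in parallel and are used in differentiated roles so that hardware resources are fully utilized. In addition, the register communication mechanism provided by the processor is dropped, which reduces the synchronization overhead of the slave-core array and eases porting to the next-generation Sunway platform. Asynchronous overlap of computation and memory access is also used to hide memory latency. Experiments with the SpMV, Integration, and calcLudsFcc operators show that, compared with the master-core implementation, the combined acceleration algorithm achieves an average speedup of 10x across different problem sizes and a maximum speedup of 24x. The N-order diagonal coloring algorithm is more than 5.8x faster than the non-colored blocking algorithm, which effectively improves data locality and computational parallelism. The algorithm also accelerates operators with dependency-induced conflicts well, which confirms the effectiveness of the adaptive and dependency-free task partitioning methods.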

2.
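Palabos is a computational fluid dynamics package based on the lattice Boltzmann method (LBM). Because of its strong computational capability, it is widely used in areas of computational fluid dynamics such as porous media, free interfaces, particle motion, and blood flow. Broad user demand makes it urgent to port, optimize, and parallelize Palabos on Sunway supercomputers to serve the energy and chemical industries. This work presents a heterogeneous parallel design of Palabos on the new-generation Sunway supercomputer (SW26010pro). Because the data structures and modular programming style of Palabos are poorly suited to Sunway many-core programming, the design applies optimizations such as direct addressing, field flags to handle the branches caused by polymorphism, and data slicing. Combined with features of the new-generation Sunway supercomputer, namely shared memory and register communication, these optimizations achieve a many-core speedup of 2x to 6x. The work also achieves million-core parallel computation of a two-phase flow algorithm in Palabos for multiscale simulation of complex chemical processes on the new-generation Sunway supercomputer; with a 64,000-core run as the baseline, the parallel efficiency at one million cores exceeds 40%.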

3.
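张浩, 花嵘. 《计算机应用研究》 (Application Research of Computers), 2020, 37(7): 2022-2026
As the embedding dimension increases, the computational cost of the permutation entropy (PE) algorithm grows multiplicatively, which places higher demands on timeliness. Targeting Sunway TaihuLight (神威·太湖之光), the first heterogeneous many-core supercomputer in the world with performance above 100 PFLOPS, a method for porting and parallelizing the permutation entropy algorithm is proposed. Across core groups, the phase-space matrix is partitioned with MPI; within a core group, each partition is processed in parallel with OpenACC. The implementation is then tuned to the structure of the SW26010 many-core processor by reducing the number of master-slave core communications and eliminating atomic operations, so that the algorithm is ported and accelerated successfully. Finally, the method is tested on dam oscillation data. The results show that the method exploits the acceleration capability of the SW26010 well: a single core group achieves a speedup of up to 7.18x over the master-core version. A strong-scalability analysis on the Sunway TaihuLight cluster shows a performance improvement of up to 85.6x with 128 core groups.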
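For readers unfamiliar with permutation entropy, the following is a minimal serial sketch in Python of the quantity the abstract parallelizes. It is not the authors' MPI/OpenACC code. The function name `permutation_entropy`, the parameters `m` (embedding dimension) and `tau` (delay), and the use of natural logarithms are assumptions made for illustration.

```python
import math
from collections import Counter


def permutation_entropy(series, m=3, tau=1):
    """Shannon entropy (in nats) of the ordinal patterns of a time series.

    m is the embedding dimension and tau the delay; both names are
    illustrative assumptions, not taken from the paper.
    """
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and tau")

    patterns = Counter()
    # Each iteration handles one row of the phase-space matrix and is
    # independent of the others, which is what makes the loop parallelizable.
    for i in range(n_vectors):
        window = [series[i + j * tau] for j in range(m)]
        # The ordinal pattern is the index order that sorts the window.
        pattern = tuple(sorted(range(m), key=lambda k: window[k]))
        patterns[pattern] += 1

    return -sum((c / n_vectors) * math.log(c / n_vectors)
                for c in patterns.values())
```

The number of possible patterns is m!, which is why the cost rises quickly with the embedding dimension.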

4.
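Big-integer arithmetic is widely used in public-key cryptography, in high-precision floating-point arithmetic for large-scale scientific computing, and in constructing large eigenvalues. Most of its algorithms, however, are expensive in both space and time. This is especially true of big-integer multiplication, one of the core operations: once the data reach a certain size, the very long serial computation time becomes a major bottleneck for applications. With the rapid development of multi-core and many-core chips in recent years, exploiting the parallelism inherent in an algorithm to harness the computing power of parallel processors has become a research trend. Based on a general-purpose multi-core parallel platform, this paper studies the parallelization of the Comba and Karatsuba fast algorithms for big-integer multiplication and proposes efficient multi-core parallel algorithms. The implementation uses multi-level parallelism with OpenMP and SIMD, which yields a large performance gain. In tests comparing the optimized parallel algorithms with the original serial ones, the 8-thread parallel Comba and Karatsuba algorithms achieve speedups of 5.85x and 6.14x, respectively, over their serial counterparts.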
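As a reminder of where the parallelism in Karatsuba multiplication comes from, here is a minimal serial sketch in Python. It is not the paper's OpenMP+SIMD implementation. The name `karatsuba` and the `cutoff_bits` threshold are assumptions for illustration, and the base case simply falls back to Python's built-in multiplication.

```python
def karatsuba(x, y, cutoff_bits=64):
    """Multiply non-negative integers with Karatsuba's three-product recursion."""
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y  # base case: operands are small enough

    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    # The three sub-products do not depend on each other, so a parallel
    # version could compute them in separate threads.
    z_hi = karatsuba(x_hi, y_hi, cutoff_bits)
    z_lo = karatsuba(x_lo, y_lo, cutoff_bits)
    z_mid = karatsuba(x_hi + x_lo, y_hi + y_lo, cutoff_bits) - z_hi - z_lo

    return (z_hi << (2 * half)) + (z_mid << half) + z_lo
```

Replacing four half-size products with three is what lowers the cost below the schoolbook quadratic bound.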

5.
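Sunway TaihuLight (神威·太湖之光), built from Sunway many-core processors, is currently the highest-performing supercomputer in China and can serve as a hardware platform for solving large-scale NSGA-II problems. Based on the characteristics of the hardware architecture, an enhanced hybrid parallel NSGA-II combining the island model and the master-slave model is designed. On top of the master-slave model, register communication between slave cores is used to share the local data storage of the slave cores within a core group. The workflow is optimized so that more modules of the algorithm run in parallel on the slave cores. DMA transfers, vectorization, double buffering, and storage optimization significantly improve the speedup. Experiments show that the optimized parallel NSGA-II achieves good speedup and scalability on Sunway many-core processors.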

6.
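When the k-means algorithm is applied to large data sets, its computation time grows multiplicatively with the size of the data set. To improve performance, a parallel k-means algorithm based on the CUDA (Compute Unified Device Architecture) programming model, called GS_k-means, is designed. The k-means algorithm is first analyzed for parallelization. Before the distance computation, a global selection step checks whether the cluster to which each data point belongs has changed, which reduces redundant computation. During the distance computation, general matrix multiplication is used for acceleration. When the cluster centers are updated, all data are sorted and grouped by cluster label and the data within each group are simply summed, which reduces atomic memory operations and improves overall performance. Experiments on the KDDCUP99 data set show that, while preserving the accuracy of the results, the improved algorithm runs faster, with a speedup 5x higher than that of the classic GPUMiner algorithm.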
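Two of the steps in this abstract can be sketched on the CPU in plain Python: distances expressed through a matrix product, and a center update that sorts by label instead of using atomic adds. This is an illustrative sketch only, not the GS_k-means CUDA code, and it omits the global selection step. All function names are assumptions.

```python
from itertools import groupby


def squared_distances(points, centers):
    """Use ||p - c||^2 = ||p||^2 - 2 p.c + ||c||^2.

    The p.c terms together form a points-by-centers matrix product,
    which is the part a GPU version could hand to a GEMM routine.
    """
    p_norm = [sum(v * v for v in p) for p in points]
    c_norm = [sum(v * v for v in c) for c in centers]
    return [[p_norm[i] - 2 * sum(a * b for a, b in zip(p, c)) + c_norm[j]
             for j, c in enumerate(centers)]
            for i, p in enumerate(points)]


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    dist = squared_distances(points, centers)
    return [min(range(len(centers)), key=row.__getitem__) for row in dist]


def update_centers(points, labels, centers):
    """Sort point indices by label, then average each contiguous group.

    Clusters that received no points keep their previous center.
    """
    new_centers = [list(c) for c in centers]
    order = sorted(range(len(points)), key=lambda i: labels[i])
    for label, group in groupby(order, key=lambda i: labels[i]):
        members = [points[i] for i in group]
        dim = len(members[0])
        new_centers[label] = [sum(p[d] for p in members) / len(members)
                              for d in range(dim)]
    return new_centers
```

Because each group is contiguous after sorting, every center can be reduced by one worker without concurrent writes to the same accumulator.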

7.
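An improved artificial bee colony algorithm based on Spark   Total citations: 1, self-citations: 0, citations by others: 1
To address the low efficiency of the artificial bee colony (ABC) algorithm on combinatorial optimization problems, an improved parallel ABC algorithm based on the Spark cloud computing framework is proposed. First, the colony is divided into sub-colonies and built as a resilient distributed dataset, and the sub-colonies exchange their best individuals through the broadcast mechanism. Then, a series of transformation operators parallelizes the bees' search for solutions. Finally, the roulette-wheel probability computation is replaced with a gravitational mass computation to reduce the amount of computation. The feasibility of the algorithm is demonstrated on the traveling salesman problem (TSP). Experimental results show that the proposed algorithm achieves a maximum speedup of 3.24 over the standard ABC algorithm and converges about 10% faster than the unimproved parallel ABC algorithm. Its advantage is more pronounced on complex problems.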

8.
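To meet the high-performance computing requirements of massive data matching in fields such as text retrieval and computational biology, a bit-parallel approximate string matching algorithm based on the Compute Unified Device Architecture (CUDA) is proposed. Taking into account the highly parallel architecture and memory bandwidth characteristics of the graphics processing unit (GPU), the data layout is optimized to accelerate the parallelized dynamic-programming matrix algorithm (BPM), and the acceleration is benchmarked. Experimental results show that GPU acceleration gives the BPM algorithm a speedup of about 20x.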
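Assuming that BPM here refers to Myers' bit-vector algorithm for approximate string matching (the abstract does not say so explicitly), a serial Python sketch of the bit-parallel column update looks as follows. It is not the paper's CUDA implementation, and the function name `myers_search` is an assumption. Python integers are unbounded, so a mask emulates an m-bit machine word.

```python
def myers_search(pattern, text, k):
    """Return the 0-based end positions in text where some substring
    matches pattern with edit distance at most k."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1      # keeps every vector m bits wide
    top = 1 << (m - 1)       # bit of the last pattern row

    # peq[ch] has bit i set when pattern[i] == ch.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv, score = mask, 0, m   # vertical +1 / -1 deltas and D[m][j]
    hits = []
    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & top:
            score += 1
        elif mh & top:
            score -= 1
        # Shift without setting bit 0: row 0 of the search matrix is all
        # zeros, so a match may start at any text position.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            hits.append(j)
    return hits
```

Each text character updates a whole dynamic-programming column with a handful of word operations, which is what makes the method attractive for wide, data-parallel hardware.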

9.
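王鑫, 张铭. 《计算机应用研究》 (Application Research of Computers), 2023, 40(6): 1745-1749
Ordinary convolution structures have high computational complexity and large amounts of computation and parameters. To address this, a parallel grouped convolution algorithm is proposed for the domestic SW26010P many-core processor. The core idea is to use a dedicated data layout and map the work onto multiple cores for parallel computation. Experimental results show that, compared with a single-core serial algorithm, the parallel grouped convolution algorithm achieves a maximum speedup of 79.5 and a maximum effective throughput of 186.7 MFLOPS. After data-parallel optimization with SIMD instructions, the algorithm achieves a maximum speedup of 10.2 over its unoptimized parallel version.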
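To make the term grouped convolution concrete, the sketch below computes a one-dimensional grouped convolution (cross-correlation, no padding, stride 1) in plain Python. It is illustrative only and says nothing about the paper's data layout or core mapping. The function name and argument shapes are assumptions.

```python
def grouped_conv1d(x, weights, groups):
    """x: [C_in][L]; weights: [C_out][C_in // groups][K].

    Returns [C_out][L - K + 1]. Each output channel reads only the
    input channels of its own group.
    """
    c_in, c_out = len(x), len(weights)
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    in_per_group = c_in // groups
    out_per_group = c_out // groups
    k = len(weights[0][0])
    out_len = len(x[0]) - k + 1

    out = []
    # Output channels are independent, so each group (or channel)
    # could be assigned to a different core.
    for oc in range(c_out):
        g = oc // out_per_group
        row = []
        for t in range(out_len):
            acc = 0.0
            for ic in range(in_per_group):
                channel = x[g * in_per_group + ic]
                for d in range(k):
                    acc += weights[oc][ic][d] * channel[t + d]
            row.append(acc)
        out.append(row)
    return out
```

With `groups` groups, each output channel uses `C_in / groups` input channels, so both the multiply-accumulate count and the parameter count drop by a factor of `groups` relative to an ordinary convolution.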

10.
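kNN is a classic algorithm frequently used in machine learning and data mining programs. As the amount of data grows, its execution time rises sharply. To make effective use of GPUs and other compute units in modern computers and reduce the computation time of kNN, a parallel kNN algorithm based on OpenCL is proposed. The algorithm parallelizes the two bottlenecks, distance computation and sorting: the distance computation stage uses a fine-grained parallelization strategy and an optimized thread model, and the sorting stage uses a bitonic sort with an optimized memory model. With the UCI data set letter as the test set, the kNN algorithm was run on an E8400 and on a GTS450; the GPU-accelerated parallel kNN algorithm is 40.79x faster than the CPU version.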
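The bitonic sort named in the abstract follows a fixed compare-and-swap network, which is why it suits GPUs. The sketch below is a serial Python rendering of that network, not the paper's OpenCL kernel. The function name is an assumption, and the input length is assumed to be a power of two.

```python
def bitonic_sort(values):
    """Return a sorted copy of values using the bitonic sorting network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")

    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:        # distance between compared elements
            # All iterations of this loop are independent; on a GPU each
            # index i would be handled by its own work-item.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

For kNN, the values being sorted would be the distances from a query to the training points, after which the first k entries are the nearest neighbors.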

11.
Abstract: This paper describes an approach to the design of interactive multimedia materials being developed in a European Community project. The developmental process is seen as a dialogue between technologists and teachers. This dialogue is often problematic because of the differences in training, experience and culture between them. Conditions needed for fruitful dialogue are described and the generic model for learning design used in the project is explained.

12.
European Community policy and the market   Total citations: 1, self-citations: 0, citations by others: 1
Abstract: This paper starts with some reflections on the policy considerations and priorities which are shaping European Commission (EC) research programmes. Then it attempts to position the current projects which seek to capitalise on information and communications technologies for learning in relation to these priorities and the apparent realities of the marketplace. It concludes that while there are grounds to be optimistic about the contribution EC programmes can make to the efficiency and standard of education and training, they are still too technology driven.

13.
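Fusion ensemble methods are widely used in pattern recognition. However, the real-time performance of some base classifiers is unstable, which degrades the performance of multi-classifier fusion. To address this problem, a new sub-fusion ensemble classifier system based on multiple classifiers is proposed. Working above the measurement-level fusion layer, the method dynamically selects among the base classifiers and takes the class with the most votes as the class the fusion system assigns to a feature vector, forming a new adaptive sub-fusion ensemble classifier method. Experiments show that the method achieves clearly higher recognition accuracy than conventional classifiers and classifier fusion methods and is more robust.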
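The voting step described here can be sketched in a few lines of Python. The sketch assumes the dynamic selection has already produced a boolean mask over the base classifiers; how that mask is computed is the paper's contribution and is not reproduced. The function name `fuse_by_vote` and the `selected` parameter are hypothetical.

```python
from collections import Counter


def fuse_by_vote(predictions, selected=None):
    """Return the label with the most votes among the selected classifiers.

    predictions holds one label per base classifier; selected is an
    optional list of booleans marking which classifiers may vote.
    """
    if selected is None:
        selected = [True] * len(predictions)
    votes = Counter(p for p, keep in zip(predictions, selected) if keep)
    if not votes:
        raise ValueError("no base classifier selected")
    # On a tie, the label that was seen first wins.
    return votes.most_common(1)[0][0]
```

Excluding an unstable classifier through the mask can change the fused decision, which is the effect the dynamic selection relies on.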

14.
Development of software intensive systems (systems) in practice involves a series of self-contained phases for the lifecycle of a system. Semantic and temporal gaps, which occur among phases and among developer disciplines within and across phases, hinder the ongoing development of a system because of the interdependencies among phases and among disciplines. Such gaps are magnified among systems that are developed at different times by different development teams, which may limit reuse of artifacts of systems development and interoperability among the systems. This article discusses such gaps and a systems development process for avoiding them.

15.
This paper presents control chart models and the necessary simulation software for locating economic values of the control parameters. The simulation program is written in FORTRAN, requires only 10K of main storage, and can run on most minicomputers and microcomputers. Two models are presented: one describes the process when it is operating at full capacity and the other when the process is operating under capacity. The models allow the product quality to deteriorate to a further level before an existing out-of-control state is detected, and they can also be used in situations where no prior knowledge exists of the out-of-control causes and the resulting proportion of defectives.

16.
Going through a few examples of robot artists who are recognized worldwide, we try to analyze the deepest meaning of what is called “robot art” and the related art field definition. We also try to highlight its well-marked borders, such as kinetic sculptures, kinetic art, cyber art, and cyberpunk. A brief excursion into the importance of the context, the message, and its semiotics is also provided, case by case, together with a few hints on the history of this discipline in the light of an artistic perspective. Therefore, the aim of this article is to try to summarize the main characteristics that might classify robot art as a unique and innovative discipline, and to track down some of the principles by which a robotic artifact can or cannot be considered an art piece in terms of social, cultural, and strictly artistic interest. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.

17.
Although there are many arguments that logic is an appropriate tool for artificial intelligence, there has been a perceived problem with the monotonicity of classical logic. This paper elaborates on the idea that reasoning should be viewed as theory formation where logic tells us the consequences of our assumptions. The two activities of predicting what is expected to be true and explaining observations are considered in a simple theory formation framework. Properties of each activity are discussed, along with a number of proposals as to what should be predicted or accepted as reasonable explanations. An architecture is proposed to combine explanation and prediction into one coherent framework. Algorithms used to implement the system as well as examples from a running implementation are given.

18.
This paper provides the author's personal views and perspectives on software process improvement. Starting with his first work on technology assessment in IBM over 20 years ago, Watts Humphrey describes the process improvement work he has been directly involved in. This includes the development of the early process assessment methods, the original design of the CMM, and the introduction of the Personal Software Process (PSP)℠ and Team Software Process (TSP)℠. In addition to describing the original motivation for this work, the author also reviews many of the problems he and his associates encountered and why they solved them the way they did. He also comments on the outstanding issues and likely directions for future work. Finally, this work has built on the experiences and contributions of many people. Mr. Humphrey only describes work that he was personally involved in and he names many of the key contributors. However, so many people have been involved in this work that a full list of the important participants would be impractical.

19.
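SAR image denoising based on significance correction of the complex-wavelet noise variance   Total citations: 4, self-citations: 1, citations by others: 3
A speckle-noise filtering method for synthetic aperture radar (SAR) images is proposed that combines statistical modeling in the complex wavelet domain with a significance-based correction of the noise variance estimate. The method first applies a logarithmic transform to convert the multiplicative noise model into an additive one. It then applies the dual-tree complex wavelet transform (DCWT) to the transformed image and models the statistical distribution of the complex wavelet coefficients. Based on this prior distribution, Bayesian estimation is used to recover the original coefficients from the noisy ones, which removes the noise. Experimental results show that the method preserves image detail while removing noise and achieves good denoising performance.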
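The logarithmic step in this abstract can be written out explicitly. The symbols below are assumptions for illustration and are not taken from the paper: I is the observed intensity, R the noise-free reflectivity, and N the speckle, with all values positive.

```latex
I(x,y) = R(x,y)\,N(x,y)
\quad\Longrightarrow\quad
\ln I(x,y) = \ln R(x,y) + \ln N(x,y)
```

After the transform, the speckle term appears as an additive component, so estimators designed for additive noise can be applied to the wavelet coefficients of ln I.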

20.
Abstract: This paper considers some results of a study designed to investigate the kinds of mathematical activity undertaken by children (aged between 8 and 11) as they learned to program in LOGO. A model of learning modes is proposed, which attempts to describe the ways in which children used and acquired understanding of the programming/mathematical concepts involved. The remainder of the paper is concerned with discussing the validity and limitations of the model, and its implications for further research and curriculum development.
