首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
方民权  张卫民  高畅  方建滨 《软件学报》2015,26(S2):247-256
高光谱遥感影像降维最大噪声分数变换(maximum noise fraction rotation,简称MNF rotation)方法运算量大,耗时长.基于多核CPU与众核MIC(many integrated cores)平台,研究MNF算法的并行方案和性能优化.通过热点分析,针对滤波、协方差矩阵运算和MNF变换等热点,提出相应并行方案和多种优化策略,量化分析优化效果,设计MKL(math kernel library)库函数实现方案并测评其性能;设计并实现基于多核CPU的C-MNF和基于CPU/MIC的M-MNF并行算法.实验结果显示,C-MNF算法在多核CPU取得的加速比为58.9~106.4,而基于CPU/MIC异构系统的M-MNF算法性能最好,加速比最高可达137倍.  相似文献   

2.
针对真实感渲染光线追踪流程中光线和场景求交计算量大、渲染速度慢的问题,提出一种基于Intel集成众核架构的并行光线追踪加速方法.在场景预处理阶段,首先构建四分支场景加速结构,以适应于MIC的硬件架构.在光线追踪阶段,首先通过CPU主核控制光线追踪整体流程,该主核采用多线程调度优化策略,调度MIC从核进行光线和场景树的求交操作,实现CPU和MIC的异步数据传输,充分利用主从核的计算能力;在MIC从核的光线和场景树求交过程中提出一种并行求交算法,充分利用MIC宽SIMD处理单元,实现光线和场景树4个结点并行求交的向量化操作,以加速求交过程.实验结果表明,与CPU原生模式相比,文中方法在光线求交阶段可达到2~4倍的加速效果,整体光线追踪流程渲染速度亦得到显著提升.  相似文献   

3.
基于MIC集群平台的GMRES算法并行加速   总被引:1,自引:0,他引:1  
王明清  李明  张清  张广勇  吴韶华 《计算机科学》2017,44(4):197-201, 240
广义极小残量法(GMRES)是最常用的求解非对称大规模稀疏线性方程组的方法之一,其收敛速度快且稳定性良好。Intel Xeon Phi众核协处理器(MIC)具有计算能力强、易编程、易移植等特点。采用MPI+OpenMP+offload混合编程模型将GMRES算法移植到MIC集群平台上。采用进程间集合通信异步隐藏、数据传输优化、向量化以及线程亲和性优化等多种手段,大幅提升了GMRES算法的求解效率。最后将并行算法应用到“局部径向基函数求解高维偏微分方程”问题的求解中。测试表明,CPU节点集群上开启32个进程,并行效率高达71.74%,4块MIC卡的最高加速性能可达单颗CPU的7倍。  相似文献   

4.
徐启迪  刘争红  郑霖 《计算机应用》2022,42(12):3841-3846
随着通信技术的发展,通信终端逐渐采用软件的方式来兼容多种通信制式和协议。针对以计算机中央处理器(CPU)作为运算单元的传统软件无线电架构,无法满足高速无线通信系统如多进多出(MIMO)等宽带数据的吞吐率要求问题,提出了一种基于图形处理器(GPU)的低密度奇偶校验(LDPC)码译码器的加速方法。首先,根据GPU并行加速异构计算在GNU Radio 4G/5G物理层信号处理模块中的加速表现的理论分析,采用了并行效率更高的分层归一化最小和(LNMS)算法;其次,通过使用全局同步策略、合理分配GPU内存空间以及流并行机制等方法减少了译码器的译码时延,同时配合GPU多线程并行技术对LDPC码的译码流程进行了并行优化;最后,在软件无线电平台上对提出的GPU加速译码器进行了实现与验证,并分析了该并行译码器的误码率性能和加速性能的瓶颈。实验结果表明,与传统的CPU串行码处理方式相比,CPU+GPU异构平台对LDPC码的译码速率可提升至原来的200倍左右,译码器的吞吐量可以达到1 Gb/s以上,特别是在大规模数据的情况下对传统译码器的译码性有着较大的提升。  相似文献   

5.
在大数据背景下,以K-Means为代表的聚类分析对于数据分析和挖掘十分重要。海量高维数据的处理给K-Means算法带来了性能方面的强烈需求。最新提出的众核体系结构MIC(many integrated core)能够为算法加速提供众核间线程级和核内指令级并行,使其成为K-Means算法加速的很好选择。在分析K-Means基本算法特点的基础上,分析了K-Means算法的瓶颈,提出了可利用数据并行的K-Means向量化算法,优化了向量化算法的数据布局方案。最后,基于CPU/MIC的异构架构实现了向量化K-Means算法,并且探索了MIC在非传统HPC(high performance computing)应用领域的优化策略。测试结果表明,K-Means向量化算法具有良好的计算性能和扩展性。  相似文献   

6.
针对SKINNY加密算法在中央处理器(CPU)下实现效率偏低的问题,提出一种基于图形处理器(GPU)的快速实现方法。首先,结合SKINNY算法的结构特征提出优化方案,将5个分步操作优化整合为1个整体运算;然后,分析该算法的电子密码本(ECB)模式和计数器(CTR)模式的特性,并给出并行粒度、内存分配等并行设计方案。实验结果表明,与传统的CPU实现方法下的SKINNY算法相比,基于计算统一设备架构(CUDA)实现的SKINNY算法的效率和吞吐量得到很大提升。具体来说,当处理的数据达到16 MB及以上时,在所提实现方法下,SKINNY算法的ECB模式的加速效率提升峰值为99.85%,加速比峰值为671,CTR模式的加速效率提升峰值为99.87%,加速比峰值为765;而与已有AES-256(ECB)和SKINNY_ECB并行算法比较,新提出的SKINNY-256(ECB)并行算法的吞吐量分别是它们的吞吐量的1.29倍和2.55倍。  相似文献   

7.
有限差分算法是一种基于偏微分方程的数值离散方法,被广泛应用于弹性波传播问题的数值模拟中。该算法访存跨度大、计算密度高、CPU利用率低,这在实际应用中成为了性能瓶颈。针对上述问题,在详析3D有限差分算法(3DFD)的基础上,基于Intel MIC架构,采用三步递进法对其进行优化:首先,通过分支消除、循环展开、不变量外提等基本优化法削减计算强度并为向量化扫除障碍;然后,通过分析数据依赖及循环分块,使用向量指令集改写核心算法等并行优化法,充分利用MIC协处理器多线程、长向量的机制;最后,在异构众核平台(CPU+MIC:Many Integra-ted Cores)下通过数据传输最小化、负载均衡等异构协同优化法实现CPU和MIC的并行计算。实验验证,与原有算法相比,优化后的算法在异构平台上获得了50~120倍的加速。  相似文献   

8.
现代高能物理研究需要使用高能量的粒子加速器,加速器束流动力学模拟软件具有重要的实用意义. 介绍了一个3维基于MIC的异构直线加速器并行束流动力学模拟软件NEWBEAM-MIC的开发进展. 目的是使用最新的超级异构计算机提高束流动力学模拟软件的性能,更好地完成加速器的设计和优化工作. 这个软件模拟了DTL和SOLENOID加速器装置中粒子的运动过程. NEWBEAM-MIC是在NEWBEAM-CPU软件基础上,将粒子推进部分分配到MIC卡上运行,从而利用MIC多线程的优势使计算加速的. 通过实际测试,这个软件在天河二号上使用100 CPUs和100 MICs可以模拟109个粒子,其中DTL场力计算、SOLENOID场力计算和粒子推进三个部分均可以比仅使用100 CPUs的NEWBEAM软件有100倍以上的加速效果. 再考虑MIC卡上的多线程,对同样规模的粒子,使用100 CPUs 和 100 MICs,当MIC线程数开到最大(224)时,NEWBEAM-MIC可以比单线程串行计算方式加速10000倍以上. 这表明本文开发的基于MIC的异构软件可以很好地加速原有的CPU软件,发挥现有MIC异构超级计算机的潜在性能.  相似文献   

9.
在多核中央处理器(CPU)—图形处理器(GPU)异构并行体系结构上,采用OpenMP和计算统一设备架构(CUDA)编程实现了基于AMBER力场的蛋白质分子动力学模拟程序。通过合理地将程序划分为CPU单线程、CPU多线程和GPU多线程执行部分,高效地利用了计算机的处理能力。性能测试结果表明,相对于优化后的CPU串行计算,多核CPU-GPU异构并行计算模型有强大的性能优势,特别是将占整个程序执行时间90%的作用力的计算移植到GPU上执行,获得了最高可达12倍的计算加速比。  相似文献   

10.
BLAS (Basic Linear Algebra Subprograms)是一个以向量和矩阵为操作对象的基础函数库.该库中函数分为3个级别,各个级别分别提供了向量-向量(1级)、向量-矩阵(2级)、矩阵-矩阵(3级)之间的基本运算.本文研究如何在申威众核处理器上BLAS-1、2级函数的并行实现,并充分利用平台特性对它们进行深度的性能调优,归纳总结程序在申威平台上的并行实现与优化技巧.申威26010 CPU采用了异构众核架构,众多计算核心提供的大规模并行处理能力,使单块芯片具有3 TFLOPS的双精度浮点计算性能.实验结果显示BLAS-1、2级函数相对于GotoBLAS参考实现版的平均加速比分别高达11.x和6.x,对于每一优化手段,均有明显的性能加速.  相似文献   

11.
Abstract This paper describes an approach to the design of interactive multimedia materials being developed in a European Community project. The developmental process is seen as a dialogue between technologists and teachers. This dialogue is often problematic because of the differences in training, experience and culture between them. Conditions needed for fruitful dialogue are described and the generic model for learning design used in the project is explained.  相似文献   

12.
European Community policy and the market   总被引:1,自引:0,他引:1  
Abstract This paper starts with some reflections on the policy considerations and priorities which are shaping European Commission (EC) research programmes. Then it attempts to position the current projects which seek to capitalise on information and communications technologies for learning in relation to these priorities and the apparent realities of the marketplace. It concludes that while there are grounds to be optimistic about the contribution EC programmes can make to the efficiency and standard of education and training, they are still too technology driven.  相似文献   

13.
融合集成方法已经广泛应用在模式识别领域,然而一些基分类器实时性能稳定性较差,导致多分类器融合性能差,针对上述问题本文提出了一种新的基于多分类器的子融合集成分类器系统。该方法考虑在度量层融合层次之上通过对各类基多分类器进行动态选择,票数最多的类别作为融合系统中对特征向量识别的类别,构成一种新的自适应子融合集成分类器方法。实验表明,该方法比传统的分类器以及分类融合方法识别准确率明显更高,具有更好的鲁棒性。  相似文献   

14.
Development of software intensive systems (systems) in practice involves a series of self-contained phases for the lifecycle of a system. Semantic and temporal gaps, which occur among phases and among developer disciplines within and across phases, hinder the ongoing development of a system because of the interdependencies among phases and among disciplines. Such gaps are magnified among systems that are developed at different times by different development teams, which may limit reuse of artifacts of systems development and interoperability among the systems. This article discusses such gaps and a systems development process for avoiding them.  相似文献   

15.
This paper presents control charts models and the necessary simulation software for the location of economic values of the control parameters. The simulation program is written in FORTRAN, requires only 10K of main storage, and can run on most mini and micro computers. Two models are presented - one describes the process when it is operating at full capacity and the other when the process is operating under capacity. The models allow the product quality to deteriorate to a further level before an existing out-of-control state is detected, and they can also be used in situations where no prior knowledge exists of the out-of-control causes and the resulting proportion defectives.  相似文献   

16.
Going through a few examples of robot artists who are recognized worldwide, we try to analyze the deepest meaning of what is called “robot art” and the related art field definition. We also try to highlight its well-marked borders, such as kinetic sculptures, kinetic art, cyber art, and cyberpunk. A brief excursion into the importance of the context, the message, and its semiotics is also provided, case by case, together with a few hints on the history of this discipline in the light of an artistic perspective. Therefore, the aim of this article is to try to summarize the main characteristics that might classify robot art as a unique and innovative discipline, and to track down some of the principles by which a robotic artifact can or cannot be considered an art piece in terms of social, cultural, and strictly artistic interest. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008  相似文献   

17.
Although there are many arguments that logic is an appropriate tool for artificial intelligence, there has been a perceived problem with the monotonicity of classical logic. This paper elaborates on the idea that reasoning should be viewed as theory formation where logic tells us the consequences of our assumptions. The two activities of predicting what is expected to be true and explaining observations are considered in a simple theory formation framework. Properties of each activity are discussed, along with a number of proposals as to what should be predicted or accepted as reasonable explanations. An architecture is proposed to combine explanation and prediction into one coherent framework. Algorithms used to implement the system as well as examples from a running implementation are given.  相似文献   

18.
This paper provides the author's personal views and perspectives on software process improvement. Starting with his first work on technology assessment in IBM over 20 years ago, Watts Humphrey describes the process improvement work he has been directly involved in. This includes the development of the early process assessment methods, the original design of the CMM, and the introduction of the Personal Software Process (PSP)SM and Team Software Process (TSP){SM}. In addition to describing the original motivation for this work, the author also reviews many of the problems he and his associates encountered and why they solved them the way they did. He also comments on the outstanding issues and likely directions for future work. Finally, this work has built on the experiences and contributions of many people. Mr. Humphrey only describes work that he was personally involved in and he names many of the key contributors. However, so many people have been involved in this work that a full list of the important participants would be impractical.  相似文献   

19.
基于复小波噪声方差显著修正的SAR图像去噪   总被引:4,自引:1,他引:3  
提出了一种基于复小波域统计建模与噪声方差估计显著性修正相结合的合成孔径雷达(Synthetic Aperture Radar,SAR)图像斑点噪声滤波方法。该方法首先通过对数变换将乘性噪声模型转化为加性噪声模型,然后对变换后的图像进行双树复小波变换(Dualtree Complex Wavelet Transform,DCWT),并对复数小波系数的统计分布进行建模。在此先验分布的基础上,通过运用贝叶斯估计方法从含噪系数中恢复原始系数,达到滤除噪声的目的。实验结果表明该方法在去除噪声的同时保留了图像的细节信息,取得了很好的降噪效果。  相似文献   

20.
Abstract  This paper considers some results of a study designed to investigate the kinds of mathematical activity undertaken by children (aged between 8 and 11) as they learned to program in LOGO. A model of learning modes is proposed, which attempts to describe the ways in which children used and acquired understanding of the programming/mathematical concepts involved. The remainder of the paper is concerned with discussing the validity and limitations of the model, and its implications for further research and curriculum development.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号