1.

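寿华好 何苹 缪永伟 《计算机工程与应用》2010,46(1):150-153

Automatic differentiation is a technique for computing derivatives and partial derivatives of multivariate functions. Given program code that evaluates a smooth multivariate function, automatic differentiation makes it straightforward to compute the corresponding derivatives and partial derivatives accurately. This paper combines automatic differentiation with the Taylor method in a subdivision algorithm for plotting implicit curves in computer graphics, and compares the result with the implicit curve plotting method that does not use automatic differentiation, demonstrating the advantages of automatic differentiation for plotting implicit curves.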
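As a rough illustration of the idea in this abstract, forward-mode automatic differentiation can be sketched with dual numbers. The `Dual` class, the `gradient` helper, and the unit-circle function `circle` are illustrative assumptions, not taken from the paper, which combines automatic differentiation with a Taylor method inside a subdivision algorithm.

```python
class Dual:
    """A value paired with its derivative: value + deriv * eps, where eps**2 == 0."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants (derivative 0).
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def circle(x, y):
    # Hypothetical implicit curve f(x, y) = x^2 + y^2 - 1 = 0.
    return x * x + y * y - 1.0


def gradient(f, x, y):
    # Seed one input direction at a time to obtain each partial derivative.
    df_dx = f(Dual(x, 1.0), Dual(y, 0.0)).deriv
    df_dy = f(Dual(x, 0.0), Dual(y, 1.0)).deriv
    return df_dx, df_dy


fx, fy = gradient(circle, 0.6, 0.8)
print(fx, fy)
```

The derivatives come out of the same arithmetic that evaluates the function, so there is no step size to tune. An implicit-curve plotter could use such gradients to bound the function over a cell before deciding whether to subdivide it.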

2.

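GPU-Based Parallel Cholesky Decomposition and Its Application

沈雁 戴瑜兴 《计算机工程》2019,45(2):284-289

In the clMAGMA library, built on the OpenCL parallel computing framework, the Cholesky decomposition algorithm uses a large-block parallel scheme, which cannot make full use of the GPU's high-speed local memory and incurs multiple GPU-CPU data transfers during computation. To address this, the paper proposes a small-block parallel scheme that fully exploits the GPU's high-speed local memory and reuses the inverses of matrix sub-blocks to achieve efficient Cholesky decomposition of symmetric positive definite matrices. The scheme can also be applied to decomposing the large positive definite matrices that arise in bundle adjustment for 3D vision. Experimental results show that the method's Cholesky decomposition is more than 50% faster than clMAGMA and, for bundle adjustment problems, about 38 times faster than the Eigen library used in Ceres Solver.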

3.

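Altera Announces OpenCL Program for FPGAs

《单片机与嵌入式系统应用》2012,12(1):23-23

Altera has announced a development program for the Open Computing Language (OpenCL) standard on FPGAs and SoC FPGAs. OpenCL is an open, C-based standard for parallel programming. Altera's OpenCL program combines the parallelism of FPGAs with the OpenCL standard to deliver powerful system acceleration.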

4.

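Research on Binocular Visual Odometry Based on Bundle Adjustment

罗杨宇 刘宏林 《控制与决策》2016,31(11):1936-1944

Robot self-localization is a prerequisite for autonomous navigation and other intelligent behaviors, and a binocular visual odometry method for mobile robots based on bundle adjustment can achieve it effectively. First, point pattern matching is used to establish feature correspondences between adjacent images, and a stereo vision algorithm yields the 3D correspondences of the matched point pairs. Then the relative motion parameters of the camera are computed and refined with a segmented bundle adjustment optimization algorithm. The proposed binocular visual odometry avoids the effects of wheel radius variation, wheel spin, and slippage on odometry accuracy, and achieves high relative localization accuracy.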

5.

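A Comparative Study of Numerical Methods for Sensitivity Analysis (total citations: 1; self-citations: 0; citations by others: 1)

蒋占四 吴义忠 蒋慧 《计算机与数字工程》2009,37(5)

To improve the accuracy and efficiency of sensitivity computation in optimization methods, this paper compares the principles and implementations of three numerical methods: quasi-gradient, complex-step differentiation, and automatic differentiation. Complex-step differentiation and automatic differentiation are implemented with operator overloading and templates, and the forward and reverse modes of automatic differentiation are combined to compute the Hessian matrix. The study shows that the quasi-gradient method effectively reduces the number of function evaluations but is inaccurate for highly nonlinear functions; complex-step differentiation is simple and yields gradients accurate to machine precision, but it requires complex arithmetic and is computationally expensive; automatic differentiation computes exact higher-order derivatives alongside the function evaluation at moderate cost.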

6.

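Implementation and Optimization of OpenCL Kernels in TensorFlow

陈锐 孙羽菲 程大果 郭强 陈禹乔 石昌青 隋轶丞 张宇哲 张玉志 《计算机学报》2022,(11):2456-2474

Heterogeneous computing is now widely used in artificial intelligence, where parallel accelerators, chiefly GPGPUs, work together with CPUs to carry out large-scale parallel computation more efficiently. Building, training, and running inference with deep learning models depends on machine learning frameworks, but today's mainstream frameworks essentially support only the CUDA heterogeneous programming model. Because CUDA is proprietary and closed, these frameworks depend heavily on NVIDIA GPGPUs, and hardware accelerators from many other vendors, Chinese-made accelerators in particular, cannot realize their full potential in deep learning. Replacing the proprietary CUDA programming model with OpenCL, an open and unified heterogeneous programming standard, is an effective way to break this technical barrier. This paper proposes a scheme for converting CUDA kernels in TensorFlow to OpenCL kernels, and summarizes the basic conversion rules, solutions to typical difficulties, and key techniques for optimizing OpenCL kernel performance. It presents the first implementation of 135 OpenCL kernels for TensorFlow 2.2. A series of tests verifies that the 135 converted kernels run correctly on a variety of accelerators that support the OpenCL standard; after optimization, nearly 80% of the OpenCL kernels match the computing performance of their CUDA counterparts on an NVIDIA Tesla V100S. The test results validate the proposed CUDA-to-OpenCL kernel...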

7.

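Sequential Real-Time Adjustment of UAV Imagery (total citations: 1; self-citations: 0; citations by others: 1)

于英 张永生 薛武 卢学良 《遥感信息》2015,(1):22-25

Unmanned aerial vehicles (UAVs) have become the first choice for emergency surveying and mapping after sudden incidents, but conventional POS-assisted bundle adjustment remains a "delayed" post-processing method. This paper applies sequential adjustment to POS-assisted bundle adjustment, greatly reducing the computational cost of the adjustment. Experiments on real data show that the adjustment accuracy of the proposed method is better than 40 cm and that real-time adjustment is achieved.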

8.

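An Automatic Differentiation Transformation System and Its Applications

程强 王斌 马再忠 《数值计算与计算机应用》2003,24(4):276-284

§1. Introduction. The computation of derivatives has progressed through several stages: divided differences, symbolic differentiation, hand-written code, and automatic differentiation. Compared with the other approaches, automatic differentiation offers concise code, high accuracy, and low labor cost. The implementation of automatic differentiation...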

9.

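Implementing OpenCL's Dynamic Execution Mode with Static Compilation Support

文延华 何王全 尉红梅 《计算机应用与软件》2014,(10)

OpenCL's dynamic execution mode requires the underlying platform to support dynamic generation, compilation, and loading of device files. For platforms that lack these capabilities, support must be provided in software. By using function renaming to ensure that same-named functions are identified correctly, together with a predo strategy based on the dynamic execution flow, OpenCL's dynamic execution mode can be implemented in a static compilation environment.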

10.

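An Advanced Marking Scheme Algorithm Based on Multi-Dimensional Pseudo-Random Sequences

唐燕 闾国年 张红 《计算机应用》2016,36(11):3093-3097

The Advanced Marking Scheme (AMS) is an effective algorithm for IP traceback of distributed denial-of-service (DDoS) attacks. However, because it uses hash functions to compress edge addresses, AMS suffers from high complexity, poor confidentiality, and a high false-positive rate. To improve traceback efficiency, an AMS algorithm based on multi-dimensional pseudo-random sequences is designed. On the router side, an edge-sampling matrix implemented entirely in hardware replaces the original hash functions to compress and encode IP addresses. On the victim side, the compressed edge-address codes are combined with the edge-weight computation to output the attack path graph. In simulation experiments, the proposed algorithm performs on par with the original algorithm while effectively reducing misjudgments and quickly identifying forged paths. The experimental results show that the proposed algorithm offers strong confidentiality, fast computation, and strong resistance to attack.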

11.

Efficient and accurate derivatives for a software process chain in airfoil shape optimization

C.H. Bischof H.M. Bücker B. Lang A. Rasch E. Slusanschi 《Future Generation Computer Systems》2005,21(8):1421-1344

When using a Newton-based numerical algorithm to optimize the shape of an airfoil with respect to certain design parameters, a crucial ingredient is the derivative of the objective function with respect to the design parameters. In large-scale aerodynamics, this objective function is an output of a computational fluid dynamics program written in a high-level programming language such as Fortran or C. Numerical differentiation is commonly used to approximate derivatives but is subject to truncation and subtractive cancellation errors. For a particular two-dimensional airfoil, we instead apply automatic differentiation to compute accurate derivatives of the lift and drag coefficients with respect to geometric shape parameters. In automatic differentiation, a given program is transformed into another program capable of computing the original function together with its derivatives. In the problem at hand, the objective function consists of a sequence of programs: a MATLAB program followed by two Fortran 77 programs. It is shown how automatic differentiation is applied to a sequence of programs while keeping the computational complexity within reasonable limits. The derivatives computed by automatic differentiation are compared with approximations based on divided differences.
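The truncation and cancellation errors of divided differences mentioned in this abstract are easy to observe on a toy function. The sketch below is an illustration only: the choice of `math.exp`, the evaluation point, and the three step sizes are assumptions and have no connection to the airfoil code in the paper.

```python
import math


def forward_difference(f, x, h):
    # Divided difference (f(x + h) - f(x)) / h.
    return (f(x + h) - f(x)) / h


x0 = 1.0
exact = math.exp(x0)  # d/dx exp(x) = exp(x)

# Large h: truncation error dominates. Tiny h: subtractive cancellation dominates.
errors = {h: abs(forward_difference(math.exp, x0, h) - exact)
          for h in (1e-1, 1e-8, 1e-15)}
for h, err in errors.items():
    print(f"h = {h:g}: absolute error = {err:.3e}")
```

The intermediate step size gives the smallest error of the three, so the accuracy of a divided difference depends on a step size that has to be tuned per problem. Automatic differentiation avoids that choice altogether.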

12.

Enabling OpenCL support for GPGPU in Kernel‐based Virtual Machine

Tsan‐Rong Tien Yi‐Ping You 《Software》2014,44(5):483-510

The importance of heterogeneous multicore programming is increasing, and Open Computing Language (OpenCL) is an open industrial standard for parallel programming that provides a uniform programming model for programmers to write efficient, portable code for heterogeneous computing devices. However, OpenCL is not supported in the system virtualization environments that are often used to improve resource utilization. In this paper, we propose an OpenCL virtualization framework based on Kernel‐based Virtual Machine with API remoting to enable multiplexing of multiple guest virtual machines (guest VMs) over the underlying OpenCL resources. The framework comprises three major components: (i) an OpenCL library implementation in guest VMs for packing/unpacking OpenCL requests/responses; (ii) a virtual device, called virtio‐CL, that is responsible for the communication between guest VMs and the hypervisor (also called the VM monitor); and (iii) a thread, called CL thread, that is used for the OpenCL API invocation. Although the overhead of the proposed virtualization framework is directly affected by the amount of data to be transferred between the OpenCL host and devices because of the primitive nature of API remoting, experiments demonstrated that our virtualization framework has a small virtualization overhead (mean of 6.8%) for six common device‐intensive OpenCL programs and performs well when the number of guest VMs involved in the system increases. These results indirectly imply that the framework allows for effective resource utilization of OpenCL devices. Copyright © 2012 John Wiley & Sons, Ltd.

13.

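A Parallel Transitive Closure Algorithm for Heterogeneous Architectures

肖汉 郭宝云 李彩林 周清雷 《计算机工程》2021,47(8):131-139

Traditional methods for computing the transitive closure of a graph are computationally heavy and slow. To speed up transitive closure computation on large data sets, this paper combines the algorithm's compute-intensive nature with the features of the Open Computing Language (OpenCL) framework and proposes an OpenCL-based parallel transitive closure algorithm built on local-memory-optimized parallel sub-matrix multiplication and blocked parallel matrix multiplication. The local-memory-optimized sub-matrix multiplication streamlines the computation steps, improves memory utilization on the graphics processing unit (GPU), and reduces data access latency. The blocked parallel matrix multiplication handles large matrix products and improves utilization of the GPU's compute cores. Experimental results show that, on an NVIDIA GeForce GTX 1070 platform under OpenCL, the algorithm achieves speedups of 593.14×, 208.62×, and 1.05× over a serial CPU algorithm, an OpenMP-based parallel algorithm, and a CUDA-based parallel algorithm, respectively.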
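To make the link between transitive closure and matrix multiplication concrete, here is a minimal serial sketch using repeated squaring of a Boolean adjacency matrix. It is an assumption-laden illustration: the function names and the example graph are hypothetical, and the code does not reproduce the paper's OpenCL kernels, local-memory optimization, or blocking scheme.

```python
def bool_matmul(a, b):
    # Boolean matrix product: entry (i, j) is true if some k links i -> k -> j.
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(adj):
    n = len(adj)
    edges = [[bool(adj[i][j]) for j in range(n)] for i in range(n)]
    # reach = A OR I covers paths of length <= 1; each squaring doubles that bound.
    reach = [[edges[i][j] or i == j for j in range(n)] for i in range(n)]
    length = 1
    while length < n - 1:
        reach = bool_matmul(reach, reach)
        length *= 2
    # One extra product with A drops the length-0 paths added by the identity.
    return bool_matmul(edges, reach)


# Hypothetical graph: 0 -> 1 -> 2, with vertex 3 isolated.
adj = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
closure = transitive_closure(adj)
print(closure[0][2], closure[2][0])
```

Almost all of the work sits in `bool_matmul`, which is why a fast parallel matrix product translates directly into a fast closure computation.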

14.

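A Structured Parallel Programming Framework for Heterogeneous Computing (total citations: 1; self-citations: 0; citations by others: 1)

李安民 计卫星 廖心怡 高建花 谈兆年 王一拙 石峰 《计算机工程与科学》2019,41(3):424-432

With the arrival of the artificial intelligence era, heterogeneous computing plays an increasingly important role in deep learning, scientific computing, and other fields. One bottleneck in applying heterogeneous computing systems is the lack of an efficient software development framework. Existing frameworks such as OpenCL and CUDA, which support GPUs, DSPs, and FPGAs, are based on C/C++ and traditional parallel programming methods; as a result, development efficiency is low, programs are hard to reason about and debug, and cooperation and scheduling among computing devices are difficult to handle flexibly. This paper proposes a scripting-language-based structured parallel programming framework for heterogeneous computing platforms. It provides structured parallel programming interfaces, supports mapping computing tasks onto heterogeneous devices, and facilitates reasoning about and verifying parallel programs. A structured scheduling algorithm based on a genetic algorithm is designed and implemented to fully exploit the computing power of heterogeneous systems, and the framework improves software development efficiency on such systems. Experimental results show that the framework achieves speedups of 1.5× to 2.5× over a single processor on a CPU+GPU platform.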

15.

Parallel implementation and optimization of high definition video real-time dehazing

Huailiang Tan Xiaofei He Zijian Wang Gaoming Liu 《Multimedia Tools and Applications》2017,76(22):23413-23434

In some warning applications, such as aircraft taking-off and landing, ship sailing, and traffic guidance in foggy weather, the high definition (HD) and rapid dehazing of images and videos is increasingly necessary. Existing technologies for the dehazing of videos or images have not completely exploited the parallel computing capacity of modern multi-core CPUs and GPUs, which leads to long dehazing times or low video frame rates that cannot meet the real-time requirement. In this paper, we propose a parallel implementation and optimization method for the real-time dehazing of high definition videos based on a single image haze removal algorithm. Our optimization takes full advantage of the modern CPU+GPU architecture, which increases the parallelism of the algorithm, and greatly reduces the computational complexity and the execution time. The optimized OpenCL parallel implementation is integrated into FFmpeg as an independent module. The experimental results show that for a single image, the performance of the optimized OpenCL algorithm is improved approximately 500% compared with the existing algorithm, and approximately 153% over the basic OpenCL algorithm. The 1080p (1920 × 1080) high definition hazy video can also be processed at a real-time rate (more than 41 frames per second).

16.

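A Parallel Face Contour Extraction Algorithm for Multi-Core CPUs and GPUs Based on the Chan-Vese Model

王丽娜 史晓华 《计算机应用》2014,34(11):3121-3125

To address the heavy computation and slow segmentation of the Chan-Vese model in face contour extraction, a parallel algorithm accelerated on graphics processing units (GPUs) and multi-core CPUs is proposed using the Open Computing Language (OpenCL) parallel programming model. The algorithm first restructures the model's framework to eliminate its data dependencies, then parallelizes the algorithm with OpenCL and applies corresponding optimizations. Experimental results show that, compared with the single-threaded algorithm, it achieves high speedups on an NVIDIA GTX660 and an AMD FX-8530.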

17.

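OpenCL Runtime Task Scheduling on HXDSP

顾经纬 宁成明 郑启龙 《计算机系统应用》2022,31(11):130-138

OpenCL is an open-source, free heterogeneous computing framework that is widely adopted by processors of many architectures. HXDSP is a Chinese-made high-performance DSP chip developed independently by the 38th Research Institute of China Electronics Technology Group Corporation. To address the difficulty of scheduling on the HXDSP heterogeneous computing platform and the underutilization of its hardware, this paper studies an OpenCL runtime task scheduling system. It designs a method for automatically extracting the task graph at OpenCL runtime and, combining HXDSP hardware characteristics with those of the OpenCL execution model, improves the classic static scheduling algorithm HEFT to obtain HDGEFT, a heterogeneous dual-granularity earliest-finish-time-first scheduling algorithm. Experiments on the HXDSP heterogeneous computing platform verify the algorithm, and the results show that the specially designed scheduling algorithm has a clear advantage in execution efficiency.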
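For orientation, the task-prioritization phase of classic HEFT, the algorithm this abstract says it builds on, can be sketched as follows. This is a sketch of textbook HEFT only, not of the paper's HDGEFT variant. The task graph, costs, and names (`upward_ranks`, `avg_cost`, `successors`, `avg_comm`) are hypothetical.

```python
from functools import lru_cache


def upward_ranks(avg_cost, successors, avg_comm):
    """rank_u(t) = avg_cost[t] + max over successors s of (avg_comm[(t, s)] + rank_u(s))."""

    @lru_cache(maxsize=None)
    def rank(task):
        # Exit tasks have no successors, so their rank is just their own cost.
        tail = max((avg_comm[(task, s)] + rank(s) for s in successors.get(task, ())),
                   default=0.0)
        return avg_cost[task] + tail

    return {task: rank(task) for task in avg_cost}


# Hypothetical diamond-shaped task graph: a -> {b, c} -> d.
avg_cost = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0}
successors = {"a": ("b", "c"), "b": ("d",), "c": ("d",)}
avg_comm = {("a", "b"): 1.0, ("a", "c"): 4.0, ("b", "d"): 2.0, ("c", "d"): 1.0}

ranks = upward_ranks(avg_cost, successors, avg_comm)
# HEFT considers tasks in order of decreasing upward rank.
priority = sorted(ranks, key=ranks.get, reverse=True)
print(priority)
```

In full HEFT, each task in this order is then assigned to the processor that gives it the earliest finish time. That assignment phase is omitted here.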

18.

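Improvement and Optimization of a Radar Extrapolation Algorithm Based on OpenCL

王兴 苗春生 王秀君 樊仲欣 《计算机与现代化》2014,(8):81-86

Extrapolation from radar data is one of the key methods in nowcasting. As China's national weather radar network grows and observation data become finer-grained, extrapolation forecasts based on regional or even nationwide radar mosaics take a long time per run and can even lag behind the six-minute observation cycle. To address the high computational complexity and poor real-time performance of traditional extrapolation algorithms, OpenCL is used to build a GPU-based heterogeneous computing model that parallelizes the extrapolation algorithm. The bottlenecks affecting performance are then analyzed step by step, and the optimization process is described through successive improvements and comparisons of test data. Techniques such as optimizing the mapping between memory and threads, using local memory as a cache, and hiding CPU execution time not only significantly improve this algorithm's efficiency but also offer a reference for optimizing other OpenCL-based heterogeneous computations. With two generations of AMD GPU architecture, Graphics Core Next and Northern Islands, as the test platforms and Intel CPU parallel computing as the reference, the tests show that the improved algorithm delivers a 15- to 22-fold gain in computing performance at equal hardware power consumption.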

19.

A GPGPU based program to solve the TDSE in intense laser fields through the finite difference approach

Cathal Ó Broin L.A.A. Nikolopoulos 《Computer Physics Communications》2014

We present a General-purpose computing on graphics processing units (GPGPU) based computational program and framework for the electronic dynamics of atomic systems under intense laser fields. We present our results using the case of hydrogen; however, the code is trivially extensible to tackle problems within the single-active electron (SAE) approximation. Building on our previous work, we introduce CLTDSE, the first available GPGPU based implementation of the Taylor, Runge–Kutta and Lanczos based methods created with strong field ab-initio simulations specifically in mind. The code makes use of finite difference methods and the OpenCL framework for GPU acceleration. The specific example system used is the classic test system, hydrogen. After introducing the standard theory, and specific quantities which are calculated, the code, including installation and usage, is discussed in-depth. This is followed by some examples and a short benchmark between an 8 hardware thread (i.e. logical core) Intel Xeon CPU and an AMD 6970 GPU, where the parallel algorithm runs 10 times faster on the GPU than the CPU.

20.

Parallel computing of 3D smoking simulation based on OpenCL heterogeneous platform

Zhiyong Yuan Weixin Si Xiangyun Liao Zhaoliang Duan Yihua Ding Jianhui Zhao 《The Journal of supercomputing》2012,61(1):84-102

Open Computing Language (OpenCL) is an open royalty-free standard for general purpose parallel programming across Central Processing Units (CPUs), Graphic Processing Units (GPUs) and other processors. This paper introduces OpenCL to implement real-time smoking simulation in a virtual surgery training simulation system. First, Computational Fluid Dynamics (CFD) is adopted to construct the real-time smoking simulation model based on the Navier-Stokes (N-S) equations of an incompressible fluid under the condition of normal temperature and pressure. Then we propose a parallel computing technique based on OpenCL to accomplish the parallel computing of the smoking simulation model on CPU and GPU, respectively. Finally, we render the smoke in real time by using a three-dimensional (3D) texture volume rendering method. Experimental results show that the parallel computing technique we have proposed achieves a satisfactory effect on image quality and rendering rate both on CPU and GPU.