A uniform approach for programming distributed heterogeneous computing systems

《Journal of Parallel and Distributed Computing》2014,74(12):3228-3239

Large-scale compute clusters of heterogeneous nodes equipped with multi-core CPUs and GPUs are getting increasingly popular in the scientific community. However, such systems require a combination of different programming paradigms, making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple interface, which is a transparent abstraction of the underlying distributed architecture, offering advanced features such as inter-context and inter-node device synchronization. It provides a runtime system which tracks dependency information enforced by event synchronization to dynamically build a DAG of commands, on which we automatically apply two optimizations: collective communication pattern detection and device-host-device copy removal. We assess libWater’s performance in three compute clusters available from the Vienna Scientific Cluster, the Barcelona Supercomputing Center and the University of Innsbruck, demonstrating improved performance and scaling with different test applications and configurations.
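Device-host-device copy removal can be illustrated with a small peephole rewrite. libWater applies it to a runtime DAG of OpenCL commands; the sketch below assumes instead a flat, hypothetical command list (`Read`, `Write`, `HostUse`, `Copy` are illustrative types, not libWater's API) and only fuses a Read/Write pair that are adjacent and whose staging host region is used by nothing else.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Read:
    """Device-to-host transfer: buffer `buf` on device `dev` -> host region `host`."""
    dev: str
    buf: str
    host: str


@dataclass(frozen=True)
class Write:
    """Host-to-device transfer: host region `host` -> buffer `buf` on device `dev`."""
    host: str
    dev: str
    buf: str


@dataclass(frozen=True)
class HostUse:
    """Host-side code that reads a host region, so the region must really exist."""
    host: str


@dataclass(frozen=True)
class Copy:
    """Direct device-to-device transfer that never touches host memory."""
    src_dev: str
    src_buf: str
    dst_dev: str
    dst_buf: str


Command = Union[Read, Write, HostUse, Copy]


def host_regions(cmd: Command) -> List[str]:
    """Host regions referenced by a command (Copy references none)."""
    return [cmd.host] if isinstance(cmd, (Read, Write, HostUse)) else []


def remove_dhd_copies(cmds: List[Command]) -> List[Command]:
    """Fuse an adjacent Read/Write pair that stages data through one host region
    into a single Copy, but only if no other command references that region."""
    refs = Counter(h for c in cmds for h in host_regions(c))
    out: List[Command] = []
    i = 0
    while i < len(cmds):
        cur = cmds[i]
        nxt = cmds[i + 1] if i + 1 < len(cmds) else None
        if (isinstance(cur, Read) and isinstance(nxt, Write)
                and cur.host == nxt.host and refs[cur.host] == 2):
            out.append(Copy(cur.dev, cur.buf, nxt.dev, nxt.buf))
            i += 2  # both original transfers are replaced by one Copy
        else:
            out.append(cur)
            i += 1
    return out
```

For example, `Read("gpu0", "a", "tmp")` followed by `Write("tmp", "gpu1", "b")` becomes `Copy("gpu0", "a", "gpu1", "b")`, while a pair whose host region is also read by host code is left untouched. A DAG-based runtime can apply the same idea without the adjacency restriction, since it knows the true dependencies.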

OpenCL-Based MD5 Cracking Algorithm

Weng Jie, Wu Qiang, Yang Canqun 《Computer Engineering》2011,37(4):119-121

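On a GPU-based heterogeneous platform, a cracking algorithm is implemented with the Open Computing Language (OpenCL) and optimized by generating candidate passwords in rounds, using the graphics rendering pipeline to accelerate memory access, and cracking multiple passwords in parallel. Experiments were run on a platform consisting of an Intel quad-core Q8230 CPU (2.3 GHz) and one NVIDIA GT200. The results show that, on the same CPU platform, the algorithm achieves a cracking speed 17 times that of the password-cracking tool John the Ripper.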
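The abstract does not detail the kernel, so the sketch below assumes a plain brute-force search over a fixed alphabet (`CHARSET`, an assumption) and a fixed password length. Each flat index is decoded into a candidate independently, which is what lets every GPU work-item derive its own candidate from its global id; each round covers one batch of indices, and every digest is checked against all target hashes at once (the "multiple passwords in parallel" idea). The rendering-pipeline memory trick is not modeled.

```python
import hashlib
from typing import Dict, Iterable

# Assumed search space: lowercase letters and digits, fixed password length.
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789"


def index_to_candidate(idx: int, length: int, charset: str = CHARSET) -> str:
    """Decode a flat index into a password (base-len(charset) digits, least
    significant first). Indices are independent, so on a GPU each work-item
    could decode its own global id without sharing state."""
    base = len(charset)
    chars = []
    for _ in range(length):
        idx, digit = divmod(idx, base)
        chars.append(charset[digit])
    return "".join(chars)


def crack_round(targets: Iterable[str], start: int, count: int,
                length: int) -> Dict[str, str]:
    """One round: hash candidates start..start+count-1 and compare each digest
    against all target digests at once."""
    wanted = set(targets)
    found = {}
    for idx in range(start, start + count):
        cand = index_to_candidate(idx, length)
        digest = hashlib.md5(cand.encode()).hexdigest()
        if digest in wanted:
            found[digest] = cand
    return found


def crack(targets: Iterable[str], length: int, batch: int = 4096) -> Dict[str, str]:
    """Sweep the whole key space in fixed-size rounds, stopping early once
    every target digest has been recovered."""
    remaining = set(targets)
    total = len(CHARSET) ** length
    found: Dict[str, str] = {}
    for start in range(0, total, batch):
        hits = crack_round(remaining, start, min(batch, total - start), length)
        found.update(hits)
        remaining.difference_update(hits)
        if not remaining:
            break
    return found
```

Because candidates are generated from indices rather than stored, a round needs no candidate list in memory; the batch size is the knob that trades launch overhead against how quickly the search can stop early.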

Parallel SIFT Feature Detection Algorithm for Mobile Smart Terminals

Gan Wei, Zhang Suwen, Lei Zhen, Li Yifan 《Computer Science》2016,43(Z6):165-167

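Feature detection and matching are an important component of computer vision applications such as image matching, object recognition and video tracking. Owing to its scale and rotation invariance, the SIFT algorithm is widely used in image registration. Because the traditional SIFT algorithm is inefficient, an efficient method for mobile smart terminals is proposed. SIFT is implemented with the OpenCL framework on the Android platform, and its parallel implementation on the mobile GPU is optimized by redistributing the computational tasks. Experimental results show that the mobile SIFT implementation makes full use of the GPU's parallel computing capability, greatly improves the execution efficiency of SIFT, and achieves efficient feature detection.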

Research on OpenCL-Based Optimization of the Integral Image Algorithm (times cited: 1; self-citations: 0; citations by others: 1)

Jia Haipeng, Zhang Yunquan, Xu Jianliang 《Computer Science》2013,40(2):1-7

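The integral image algorithm is widely used in fast feature detection, so accelerating it on GPUs is of considerable practical importance. However, because GPU hardware architectures are complex and differ from one another, optimizing the integral image algorithm on a GPU, and then achieving performance portability across GPU platforms, is very difficult. Based on an analysis of the underlying hardware architectures of different GPU platforms, this paper examines how different optimization methods affect performance on different GPU hardware platforms from several perspectives, including off-chip memory bandwidth utilization, compute resource utilization and data locality, and on that basis implements an OpenCL-based integral image algorithm. Experimental results show that the optimized algorithm achieves speedups of 11.26× and 12.38× on AMD and NVIDIA GPUs respectively, and that the optimized GPU kernel outperforms the corresponding function in the NVIDIA NPP library by 55.01% and 65.17% respectively. These results verify the effectiveness and performance portability of the proposed optimization methods.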
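The abstract does not spell out the kernel decomposition. A common way to compute an integral image in parallel, assumed here, is two passes of independent prefix scans: one along every row, then one along every column (expressed below as transpose, row scan, transpose; on a GPU this would more likely be a column-wise scan kernel). The `box_sum` helper shows why integral images make feature detection fast: any rectangle sum costs at most four lookups.

```python
from itertools import accumulate
from typing import List

Matrix = List[List[int]]


def scan_rows(m: Matrix) -> Matrix:
    """Inclusive prefix sum of every row; rows are independent, so on a GPU
    each row (or tile of rows) could be scanned by its own work-group."""
    return [list(accumulate(row)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def integral_image(img: Matrix) -> Matrix:
    """ii[y][x] = sum of img[0..y][0..x]: a row scan followed by a column scan."""
    return transpose(scan_rows(transpose(scan_rows(img))))


def box_sum(ii: Matrix, r0: int, c0: int, r1: int, c1: int) -> int:
    """Sum over the inclusive rectangle rows r0..r1, cols c0..c1."""
    total = ii[r1][c1]
    if r0 > 0:
        total -= ii[r0 - 1][c1]
    if c0 > 0:
        total -= ii[r1][c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1][c0 - 1]
    return total
```

For `img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, `integral_image(img)` is `[[1, 3, 6], [5, 12, 21], [12, 27, 45]]`, and `box_sum(ii, 1, 1, 2, 2)` gives 28 (5 + 6 + 8 + 9).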

Parallelized Annealed Particle Filtering

Bian Yatao, Zhao Xu, Song Jian, Liu Yuncai 《Journal of Jilin University (Engineering and Technology Edition)》2013,(Z1):239-243

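This paper proposes a parallelized annealed particle filter (P-APF) based on heterogeneous computing and uses the OpenCL framework to implement real-time markerless motion tracking. The annealed particle filtering process is decomposed into subtasks of appropriate granularity. According to its degree of parallelism, each computational task is assigned to either the standard or the auxiliary processor, so as to make full use of the heterogeneous computing capability of the OpenCL framework. A task-latency hiding strategy is proposed to further reduce time consumption. In experiments on different human-motion datasets, P-APF achieves real-time processing without reducing tracking accuracy. Time consumption remains essentially constant as the number of particles or views increases, and the average speedup is 106×.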
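The abstract describes how the filter is split into tasks but not the filter itself. The sketch below assumes the standard annealed particle filter formulation in 1-D for brevity: in each annealing layer, weights are w_i ∝ exp(-β E(x_i)) with β increasing across layers, particles are resampled by weight, then diffused with noise that shrinks across layers. The energy function, `betas` and `sigmas` are hypothetical. The comments mark which steps are per-particle (data-parallel) and which need all particles at once, which is one plausible basis for assigning tasks to different processors.

```python
import math
import random
from typing import Callable, List, Sequence


def weights_for(energies: List[float], beta: float) -> List[float]:
    """Annealed weights w_i ∝ exp(-beta * E_i); subtracting the minimum energy
    avoids overflow and does not change the normalized weights."""
    e_min = min(energies)
    return [math.exp(-beta * (e - e_min)) for e in energies]


def annealed_particle_filter(energy: Callable[[float], float],
                             particles: List[float],
                             betas: Sequence[float],
                             sigmas: Sequence[float],
                             rng: random.Random) -> float:
    """Run the annealing layers for one frame and return the weighted-mean estimate."""
    n = len(particles)
    for beta, sigma in zip(betas, sigmas):
        # Per-particle evaluation: fully data-parallel, a natural GPU task.
        w = weights_for([energy(x) for x in particles], beta)
        # Resampling: needs all weights at once, a global (less parallel) step.
        particles = rng.choices(particles, weights=w, k=n)
        # Diffusion: independent Gaussian noise per particle; sigma shrinks per layer.
        particles = [x + rng.gauss(0.0, sigma) for x in particles]
    w = weights_for([energy(x) for x in particles], betas[-1])
    return sum(wi * x for wi, x in zip(w, particles)) / sum(w)
```

With an energy such as (x - 3)², increasing `betas` and shrinking `sigmas`, the particle cloud contracts toward x = 3 layer by layer; a real tracker would use a pose vector and an image-based energy instead.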

Research on OpenCL-Based Optimization of Image Blurring Algorithms

Zhang Ying, Zhang Yunquan, Long Guoping 《Computer Science》2012,39(3):260-264

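Modern GPUs generally provide dedicated hardware (such as texture units, rasterization units and various on-chip caches) to accelerate the processing and display of 2D images, and the corresponding programming models (CUDA, OpenCL) define specific programming interfaces (CUDA texture memory, OpenCL image objects) so that image applications can exploit this hardware support. Taking the optimization of a typical image blurring algorithm on an AMD GPU as an example, this paper explores where OpenCL image objects are applicable for optimizing image algorithms, and in particular analyzes their advantages and disadvantages relative to the more general approach of optimizing with global memory plus on-chip local memory. Experimental results show that image objects bring a clear performance improvement only when the image has four channels and the amount of data that must be cached during computation is small; in all other cases, global memory plus local memory achieves good performance. The optimized algorithm achieves speedups of 200–1000× over a carefully implemented CPU version and 1.3–5× over the corresponding functions in the NVIDIA NPP library.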
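The "global memory plus local memory" strategy can be modeled on the CPU. In this sketch each tile first stages its own pixels plus a halo of `r` pixels into a small buffer (the role OpenCL `__local` memory plays in a work-group) and then reads all neighbours from that staged copy instead of from the full image. The box filter, radius, tile size and clamp-to-edge border are assumptions; the paper's actual kernels are not reproduced here.

```python
from typing import List

Image = List[List[float]]


def clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(v, hi))


def blur_naive(img: Image, r: int) -> Image:
    """Box blur of radius r reading every neighbour straight from the full
    image ('global memory'), with clamp-to-edge borders."""
    h, w = len(img), len(img[0])
    n = (2 * r + 1) ** 2
    return [[sum(img[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)]
                 for dy in range(-r, r + 1) for dx in range(-r, r + 1)) / n
             for x in range(w)]
            for y in range(h)]


def blur_tiled(img: Image, r: int, tile: int = 4) -> Image:
    """Same blur, but each tile stages (tile + 2r) x (tile + 2r) pixels once
    and then reads neighbours only from that staged 'local' buffer."""
    h, w = len(img), len(img[0])
    n = (2 * r + 1) ** 2
    out = [[0.0] * w for _ in range(h)]
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            # Stage the tile plus its halo (one load per pixel per tile).
            local = [[img[clamp(ty + ly - r, 0, h - 1)][clamp(tx + lx - r, 0, w - 1)]
                      for lx in range(tile + 2 * r)]
                     for ly in range(tile + 2 * r)]
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    ly, lx = y - ty + r, x - tx + r  # position of (y, x) in local
                    out[y][x] = sum(local[ly + dy][lx + dx]
                                    for dy in range(-r, r + 1)
                                    for dx in range(-r, r + 1)) / n
    return out
```

The staged buffer holds (tile + 2r)² pixels, so larger radii or multi-channel pixels increase local memory pressure per work-group, which is consistent with the abstract's finding that the best choice depends on how much data must be cached.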

An investigation of the performance portability of OpenCL

S.J. Pennycook, S.D. Hammond, S.A. Wright, J.A. Herdman, I. Miller, S.A. Jarvis 《Journal of Parallel and Distributed Computing》2013

This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level benchmark from the NAS Parallel Benchmark Suite. An account of the design decisions addressed during the development of this code is presented, demonstrating the importance of memory arrangement and work-item/work-group distribution strategies when applications are deployed on different device types. The resulting platform-agnostic, single source application is benchmarked on a number of different architectures, and is shown to be 1.3–1.5× slower than native FORTRAN 77 or CUDA implementations on a single node and 1.3–3.1× slower on multiple nodes. We also explore the potential performance gains of OpenCL’s device fissioning capability, demonstrating up to a 3× speed-up over our original OpenCL implementation.

Optimized transmission line matrix model implementation for graphics processing units computing in built-up environment

Gwenaël Guillaume, Nicolas Fortin 《Journal of Building Performance Simulation》2014,7(6):445-456

The increase in computational capabilities has made time-domain methods applicable to long-range sound propagation modelling. However, such approaches remain very demanding in terms of computational resources. Most current computers are supplied with a powerful device which is still little exploited: the graphics processing unit (GPU). The paper describes an implementation of a transmission line matrix model which allows parallel calculations on heterogeneous systems. A voxelization algorithm used to generate the computational domain is presented. A splitting process is also described which makes simulations of huge domains feasible by accurately dividing the computational domain into subdomains. Each subdomain is enlarged by introducing extra cells containing data from neighbouring subdomains, in order to run several computational iterations on a graphics device without data exchange with the system memory. The influence of the ghost layer depth and the speed-up of computation obtained with the GPU are then illustrated in a realistic built-up area.
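The ghost-layer idea can be checked on a toy problem. This 1-D sketch assumes a generic 3-point stencil as a hypothetical stand-in for the TLM update (the real model couples nodes in 3-D) and treats cells outside the domain as zero. Each subdomain is enlarged by `ghost` cells copied from its neighbours; every local step corrupts one more ghost cell from the outside in, so a ghost depth of g allows g iterations before any exchange is needed.

```python
from typing import List


def step(u: List[float]) -> List[float]:
    """One explicit update of a generic 3-point stencil; cells beyond either end
    of the array are read as 0.0 (hypothetical stand-in for the TLM update)."""
    n = len(u)
    return [0.5 * u[i]
            + 0.25 * ((u[i - 1] if i > 0 else 0.0) + (u[i + 1] if i < n - 1 else 0.0))
            for i in range(n)]


def run_global(u: List[float], iters: int) -> List[float]:
    """Reference: advance the whole, undivided domain."""
    for _ in range(iters):
        u = step(u)
    return u


def run_split(u: List[float], parts: int, ghost: int, iters: int) -> List[float]:
    """Split the domain into `parts` subdomains, enlarge each by `ghost` cells of
    neighbour data on each side, and advance each one independently for `iters`
    steps with no exchange. iters <= ghost keeps the owned cells exact."""
    if iters > ghost:
        raise ValueError("need iters <= ghost for exact owned cells")
    n = len(u)
    result = [0.0] * n
    for p in range(parts):
        s, e = p * n // parts, (p + 1) * n // parts    # owned cells [s, e)
        lo, hi = max(0, s - ghost), min(n, e + ghost)  # owned + ghost cells
        local = u[lo:hi]
        for _ in range(iters):
            local = step(local)
        result[s:e] = local[s - lo:e - lo]
    return result
```

A deeper ghost layer means fewer exchanges with system memory but more redundant computation on the duplicated cells, which is the trade-off the paper's ghost-depth study examines.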

An application-centric evaluation of OpenCL on multi-core CPUs

Jie Shen, Jianbin Fang, Henk Sips, Ana Lucia Varbanescu 《Parallel Computing》2013

Although designed as a cross-platform parallel programming model, OpenCL remains mainly used for GPU programming. Nevertheless, a large number of applications are parallelized, implemented, and eventually optimized in OpenCL. Thus, in this paper, we focus on the potential that these parallel applications have to exploit the performance of multi-core CPUs. Specifically, we analyze how to systematically reuse and adapt OpenCL code from GPUs to CPUs. We claim that this work is a necessary step for enabling inter-platform performance portability in OpenCL.

dOpenCL: Towards uniform programming of distributed heterogeneous multi-/many-core systems

Philipp Kegel, Michel Steuwer, Sergei Gorlatch 《Journal of Parallel and Distributed Computing》2013

Modern computer systems are becoming increasingly distributed and heterogeneous, comprising multi-core CPUs, GPUs, and other accelerators. Current programming approaches for such systems usually require the application developer to use a combination of several programming models (e.g., MPI with OpenCL or CUDA) in order to exploit the system’s full performance potential. In this paper, we present dOpenCL (distributed OpenCL), a uniform approach to programming distributed heterogeneous systems with accelerators. dOpenCL allows the user to run unmodified existing OpenCL applications in a heterogeneous distributed environment. We describe the challenges of implementing the OpenCL programming model for distributed systems, as well as its extension for running multiple applications concurrently. Using several example applications, we compare the performance of dOpenCL with MPI + OpenCL and standard OpenCL implementations.
