


A uniform approach for programming distributed heterogeneous computing systems

《Journal of Parallel and Distributed Computing》2014,74(12):3228-3239

Large-scale compute clusters of heterogeneous nodes equipped with multi-core CPUs and GPUs are getting increasingly popular in the scientific community. However, such systems require a combination of different programming paradigms, making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple interface, which is a transparent abstraction of the underlying distributed architecture, offering advanced features such as inter-context and inter-node device synchronization. It provides a runtime system which tracks dependency information enforced by event synchronization to dynamically build a DAG of commands, on which we automatically apply two optimizations: collective communication pattern detection and device-host-device copy removal. We assess libWater's performance in three compute clusters available from the Vienna Scientific Cluster, the Barcelona Supercomputing Center and the University of Innsbruck, demonstrating improved performance and scaling with different test applications and configurations.

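An OpenCL-Based MD5 Cracking Algorithm

Weng Jie  Wu Qiang  Yang Canqun 《Computer Engineering (计算机工程)》2011,37(4):119-121

On a GPU-based heterogeneous platform, the cracking algorithm is implemented in the Open Computing Language (OpenCL) and optimized by generating attack passwords in rounds, accelerating memory access through the graphics rendering pipeline, and processing multiple passwords in parallel. Experiments were run on a platform consisting of an Intel quad-core Q8230 CPU (2.3 GHz) and one NVIDIA GT200 GPU. The results show that the algorithm reaches a cracking speed 17 times higher than that of the cracking software John the Ripper on the same CPU platform.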
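The abstract above mentions generating attack passwords in rounds and testing many passwords in parallel. A minimal CPU-only sketch of that idea follows. It is not the authors' OpenCL code: the function names, the batch size, and the reading of "rounds" as fixed-size candidate batches are all assumptions made for illustration.

```python
import hashlib
from itertools import islice, product


def candidates(alphabet, length):
    """Yield every string of the given length over the alphabet."""
    for chars in product(alphabet, repeat=length):
        yield "".join(chars)


def crack_md5(target_hex, alphabet, max_length, batch_size=1024):
    """Search candidates round by round; return the match or None.

    Hypothetical helper: each batch stands in for one round of
    candidates that a GPU would hash in parallel.
    """
    for length in range(1, max_length + 1):
        gen = candidates(alphabet, length)
        while True:
            batch = list(islice(gen, batch_size))
            if not batch:
                break
            # On a GPU, each candidate in the batch would be hashed
            # by its own work-item; here the loop is sequential.
            for pw in batch:
                digest = hashlib.md5(pw.encode("ascii")).hexdigest()
                if digest == target_hex:
                    return pw
    return None
```

Because the candidates within a batch are independent of one another, this kind of search maps naturally onto one work-item per candidate.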

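A Parallel SIFT Feature Detection Algorithm for Mobile Smart Devices

Gan Wei  Zhang Suwen  Lei Zhen  Li Yifan 《Computer Science (计算机科学)》2016,43(Z6):165-167

Feature detection and matching are an important component of computer vision applications such as image matching, object recognition, and video tracking. The SIFT algorithm is widely used in image registration because of its scale and rotation invariance. Because the traditional SIFT algorithm is inefficient, an efficient implementation for mobile smart devices is proposed. SIFT is implemented on the Android platform with the OpenCL framework, and its parallel implementation on the mobile GPU is optimized by redistributing the computational tasks. Experimental results show that the mobile SIFT implementation makes full use of the GPU's parallel computing capability, greatly improves the execution efficiency of SIFT, and achieves efficient feature detection.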

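Parallelized Annealed Particle Filter

Bian Yatao  Zhao Xu  Song Jian  Liu Yuncai 《Journal of Jilin University (Engineering and Technology Edition) (吉林大学学报(工学版))》2013,(Z1):239-243

This paper proposes a parallelized annealed particle filter (P-APF) based on heterogeneous computing, and uses the OpenCL framework to implement real-time markerless motion tracking. The annealed particle filtering process is decomposed into subtasks of appropriate granularity. According to its degree of parallelism, each computing task is assigned to either the standard or the auxiliary processor, so as to make full use of the heterogeneous computing capability of the OpenCL framework. A task latency hiding strategy is proposed to further reduce time consumption. In experiments on different human motion datasets, P-APF achieves real-time processing without reducing tracking accuracy. The time consumption remains essentially constant as the number of particles or views increases, and the average speedup is 106.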

An investigation of the performance portability of OpenCL

S.J. Pennycook S.D. Hammond S.A. Wright J.A. Herdman I. Miller S.A. Jarvis 《Journal of Parallel and Distributed Computing》2013

This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level benchmark from the NAS Parallel Benchmark Suite. An account of the design decisions addressed during the development of this code is presented, demonstrating the importance of memory arrangement and work-item/work-group distribution strategies when applications are deployed on different device types. The resulting platform-agnostic, single source application is benchmarked on a number of different architectures, and is shown to be 1.3–1.5× slower than native FORTRAN 77 or CUDA implementations on a single node and 1.3–3.1× slower on multiple nodes. We also explore the potential performance gains of OpenCL's device fissioning capability, demonstrating up to a 3× speed-up over our original OpenCL implementation.

Optimized transmission line matrix model implementation for graphics processing units computing in built-up environment

Gwenaël Guillaume Nicolas Fortin 《Journal of Building Performance Simulation》2014,7(6):445-456

The increase in computational capabilities has made time-domain methods applicable to long-range sound propagation modelling. However, such approaches remain very demanding in terms of computational resources. Most current computers are supplied with a powerful device which is still little exploited: the graphics processing unit (GPU). The paper describes an implementation of a transmission line matrix model which allows parallel calculations on heterogeneous systems. A voxelization algorithm used to generate the computational domain is presented. A splitting process is also described, which makes simulations of very large domains feasible by accurately dividing the computational domain into subdomains. Each subdomain is enlarged with extra cells containing data from neighbouring subdomains, so that several computational iterations can run on a graphics device without data exchange with the system memory. The influence of the ghost layer depth and the computation speed-up obtained with the GPU are then illustrated for a realistic built-up area.
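The ghost-layer idea in the abstract above can be sketched in one dimension. The example below is not the transmission line matrix model: a simple 3-point averaging stencil stands in for it, and the function names and the equal-size split are assumptions made for illustration. It shows that a subdomain padded with `depth` ghost cells can advance `depth` iterations on its own and still match the result computed on the whole domain.

```python
def step(u):
    """One 3-point averaging update; both end cells are held fixed.

    Assumes len(u) >= 2.
    """
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def run_global(u, iterations):
    """Reference: iterate on the whole domain at once."""
    for _ in range(iterations):
        u = step(u)
    return u


def run_split(u, parts, depth):
    """Advance `depth` iterations per subdomain using ghost layers.

    Each subdomain [a, b) is padded with up to `depth` neighbour cells
    on each side, iterated locally with no exchange, then trimmed.
    """
    n = len(u)
    size = (n + parts - 1) // parts
    out = []
    for a in range(0, n, size):
        b = min(a + size, n)
        lo, hi = max(0, a - depth), min(n, b + depth)
        local = u[lo:hi]
        for _ in range(depth):
            local = step(local)
        # Errors from the artificial edges travel one cell per iteration,
        # so after `depth` iterations they have not reached [a, b).
        out.extend(local[a - lo:b - lo])
    return out
```

A deeper ghost layer allows more iterations between exchanges but adds redundant computation in the overlapping cells, which is the trade-off the abstract refers to as the influence of the ghost layer depth.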

An application-centric evaluation of OpenCL on multi-core CPUs

Jie Shen Jianbin Fang Henk Sips Ana Lucia Varbanescu 《Parallel Computing》2013

Although designed as a cross-platform parallel programming model, OpenCL remains mainly used for GPU programming. Nevertheless, a large number of applications are parallelized, implemented, and eventually optimized in OpenCL. Thus, in this paper, we focus on the potential that these parallel applications have to exploit the performance of multi-core CPUs. Specifically, we analyze the method to systematically reuse and adapt the OpenCL code from GPUs to CPUs. We claim that this work is a necessary step for enabling inter-platform performance portability in OpenCL.

dOpenCL: Towards uniform programming of distributed heterogeneous multi-/many-core systems

Philipp Kegel Michel Steuwer Sergei Gorlatch 《Journal of Parallel and Distributed Computing》2013

Modern computer systems are becoming increasingly distributed and heterogeneous, comprising multi-core CPUs, GPUs, and other accelerators. Current programming approaches for such systems usually require the application developer to use a combination of several programming models (e.g., MPI with OpenCL or CUDA) in order to exploit the system's full performance potential. In this paper, we present dOpenCL (distributed OpenCL), a uniform approach to programming distributed heterogeneous systems with accelerators. dOpenCL allows the user to run unmodified existing OpenCL applications in a heterogeneous distributed environment. We describe the challenges of implementing the OpenCL programming model for distributed systems, as well as its extension for running multiple applications concurrently. Using several example applications, we compare the performance of dOpenCL with MPI + OpenCL and standard OpenCL implementations.

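An OpenCL-Based NDVI Algorithm

Xiong Ying  Luo Qiong 《Computer CD Software and Applications (计算机光盘软件与应用)》2013,(18):99-100

With the continuing development of computer technology, software keeps growing in scale. A single remote sensing image can exceed several gigabytes, and processing it can sometimes take several hours. For systems handling such large data volumes, doubling the speedup can therefore cut the running time by several hours, a substantial gain that is well worth pursuing. Taking the NDVI algorithm as an example, this paper introduces the NDVI algorithm, the applications and properties of NDVI, and OpenCL.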
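For readers unfamiliar with NDVI, the standard definition is (NIR - Red) / (NIR + Red), computed independently for each pixel from the near-infrared and red bands. The sketch below uses that standard formula, not code from the paper. The function name, the flat-list band layout, and the choice to return 0.0 where the denominator is zero are assumptions made for illustration.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).

    `nir` and `red` are equal-length sequences of band values.
    Pixels where NIR + Red == 0 yield 0.0 (an assumed convention).
    """
    result = []
    for n, r in zip(nir, red):
        denom = n + r
        # Each pixel is independent, so in OpenCL this loop body
        # would become the work done by a single work-item.
        result.append((n - r) / denom if denom != 0 else 0.0)
    return result
```

Since no pixel depends on any other, the computation is easy to parallelize, which is why it suits a large remote sensing image processed on a GPU.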


Vector data flow analysis for SIMD optimizations on OpenCL programs

Yu-Te Lin  Jenq-Kuen Lee 《Concurrency and Computation》2016,28(5):1629-1654

Multi-core systems equipped with micro processing units and accelerators such as digital signal processors (DSPs) and graphics processing units (GPUs) have become a major trend in processor design in recent years in attempts to meet ever-increasing application performance requirements. Open Computing Language (OpenCL) is one of the programming languages that include new extensions proposed to exploit the computing power of these kinds of processors. Among the newly extended language features, the single-instruction multiple-data (SIMD) linguistics and vector types are added to OpenCL to exploit hardware features of the accelerators. The addition makes it necessary to consider how traditional compiler data flow analysis can be adopted to meet the optimization requirements of vector linguistics. In this paper, we propose a calculus framework to support the data flow analysis of vector constructs for OpenCL programs that compilers can use to perform SIMD optimizations. We model OpenCL vector operations as data access functions in the style of mathematical functions. We then show that the data flow analysis for OpenCL vector linguistics can be performed based on the data access functions. Based on the information gathered from data flow analysis, we illustrate a set of SIMD optimizations on OpenCL programs. The experimental results incorporating our calculus and our proposed compiler optimizations show that the proposed SIMD optimizations can provide average performance improvements of 22% on x86 CPUs and 4% on AMD GPUs. Of the 15 selected benchmarks, 11 are improved on x86 CPUs and six are improved on AMD GPUs. The proposed framework has the potential to be used to construct other SIMD optimizations on OpenCL programs. Copyright © 2015 John Wiley & Sons, Ltd.
