20 similar documents found; search took 31 ms
1.
Building a visual hull model from multiple two-dimensional images provides an effective way of understanding the three-dimensional geometries inherent in the images. In this paper, we present a GPU accelerated algorithm for volumetric visual hull reconstruction that aims to harness the full compute power of the many-core processor. From a set of binary silhouette images with respective camera parameters, our parallel algorithm directly outputs the triangular mesh of the resulting visual hull in the indexed face set format for a compact mesh representation. Unlike previous approaches, the presented method extracts a smooth silhouette contour on the fly from each binary image, which markedly reduces the bumpy artifacts on the visual hull surface due to a simple binary in/out classification. In addition, it applies several optimization techniques that allow an efficient CUDA implementation. We also demonstrate that the compact mesh construction scheme can easily be modified for also producing a time- and space-efficient GPU implementation of the marching cubes algorithm.
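As an illustration (not taken from the paper), the simple binary in/out classification that the abstract improves upon can be sketched as follows; the camera matrices and silhouettes here are toy placeholders, with each camera given as a 3x4 projection matrix and each silhouette as a binary row-major image:

```python
def in_visual_hull(point, cameras, silhouettes):
    """Binary in/out test: a 3D point lies inside the visual hull iff its
    projection falls on the foreground of every silhouette image.
    (The paper replaces this hard test with smooth silhouette contours to
    avoid bumpy surface artifacts.)"""
    x, y, z = point
    for P, sil in zip(cameras, silhouettes):
        # Homogeneous projection: [u, v, w]^T = P * [x, y, z, 1]^T
        u = P[0][0] * x + P[0][1] * y + P[0][2] * z + P[0][3]
        v = P[1][0] * x + P[1][1] * y + P[1][2] * z + P[1][3]
        w = P[2][0] * x + P[2][1] * y + P[2][2] * z + P[2][3]
        px, py = int(u / w), int(v / w)
        if not (0 <= py < len(sil) and 0 <= px < len(sil[0]) and sil[py][px]):
            return False                    # outside one silhouette: carved away
    return True

# Two toy "cameras": one projects onto the xy plane, one onto the xz plane.
P_xy = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
P_xz = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
full = [[1] * 4 for _ in range(4)]          # all-foreground silhouette
holed = [[1] * 4 for _ in range(4)]
holed[1][1] = 0                             # background pixel at (1, 1)
```

In the GPU version, one thread would run this test per voxel corner, which is exactly the kind of independent, branch-light work that maps well onto CUDA.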
2.
3.
In this work we describe parallel algorithms for solving nonlinear systems using CUDA (Compute Unified Device Architecture) on a GPU (Graphics Processing Unit). The proposed algorithms are based on the Fletcher–Reeves version of the nonlinear conjugate gradient method and on a polynomial-type preconditioner built from block two-stage methods. Several parallelization strategies and different storage formats for sparse matrices are discussed. The reported numerical experiments analyze the behavior of these algorithms in a fine-grain parallel environment compared with a thread-based environment.
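For reference, the sequential core of the Fletcher–Reeves method the abstract builds on can be sketched as below; the objective, its gradient and the exact line search are toy examples chosen here for illustration, not taken from the paper:

```python
def fletcher_reeves(grad, x0, line_search, max_iter=100, tol=1e-12):
    """Nonlinear conjugate gradient, Fletcher-Reeves variant (sequential sketch).
    'grad' maps a point to its gradient; 'line_search' returns a step length."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                        # initial direction: steepest descent
    for _ in range(max_iter):
        if sum(gi * gi for gi in g) < tol:       # gradient small enough: done
            break
        alpha = line_search(x, d)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x)
        # Fletcher-Reeves update: beta = ||g_new||^2 / ||g||^2
        beta = sum(gi * gi for gi in g_new) / sum(gi * gi for gi in g)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x

# Demo: minimize f(x, y) = (x - 1)^2 + 2*(y + 2)^2.
grad = lambda x: [2 * (x[0] - 1), 4 * (x[1] + 2)]

def exact_ls(x, d):
    # Exact line search, valid for this diagonal quadratic only.
    g = grad(x)
    return -sum(gi * di for gi, di in zip(g, d)) / (2 * d[0] * d[0] + 4 * d[1] * d[1])

minimum = fletcher_reeves(grad, [0.0, 0.0], exact_ls)
```

The vector operations inside the loop (axpy updates, dot products, and the sparse matrix–vector products hidden in `grad` for a real system) are the pieces the paper maps onto the GPU.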
4.
We propose a GPU-based parallel elevation interpolation algorithm that accelerates the rendering of massive numbers of discrete points on 3D terrain. The elevation data of the 3D terrain mesh is organized into an elevation texture as the basis for rendering the discrete points, and a GPU shader written in GLSL dynamically controls the graphics rendering pipeline to realize view-dependent parallel elevation interpolation. Experimental results show that, compared with a traditional in-memory interpolation algorithm, the proposed GPU-based parallel elevation interpolation algorithm raises the number of discrete points that can be rendered on 3D terrain from the order of millions to the order of tens of millions.
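The per-point lookup into such a height texture is typically a bilinear interpolation; a minimal CPU-side sketch of that operation (the GLSL shader evaluates the same expression per rendered point) might look like this, with `height_texture` a toy 2x2 grid invented for the example:

```python
def bilinear_elevation(grid, x, y):
    """Bilinearly interpolate an elevation from a regular height grid at
    fractional coordinates (x, y); grid is row-major, grid[y][x]."""
    x0, y0 = int(x), int(y)
    dx, dy = x - x0, y - y0
    h00, h10 = grid[y0][x0], grid[y0][x0 + 1]
    h01, h11 = grid[y0 + 1][x0], grid[y0 + 1][x0 + 1]
    # Weighted average of the four surrounding texels.
    return (h00 * (1 - dx) * (1 - dy) + h10 * dx * (1 - dy)
            + h01 * (1 - dx) * dy + h11 * dx * dy)

height_texture = [[0.0, 1.0],
                  [2.0, 3.0]]
h = bilinear_elevation(height_texture, 0.5, 0.5)
```

Since each point's interpolation is independent, the workload parallelizes trivially across shader invocations, which is what yields the order-of-magnitude gain the abstract reports.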
5.
To improve the scalability of collaborative filtering and speed up its execution, we propose a GPU (graphics processing unit) based parallel collaborative filtering algorithm for high-speed parallel processing. The GPU follows a single-instruction, multiple-data execution model suited to computations with little branching logic and very large data volumes, which is exactly the profile of collaborative filtering. The algorithm was implemented with CUDA (compute unified device architecture). Experiments show that, running on a low- to mid-range GPU, the algorithm achieves a speedup of more than 40x over a collaborative filtering implementation on a high-end quad-core CPU, markedly improving scalability while also performing well in terms of accuracy.
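A minimal sequential sketch of the kind of user-based collaborative filtering this describes is given below; the rating matrix is a toy example, and treating unrated items as zeros inside the cosine similarity is a simplification for illustration, not necessarily the paper's formulation:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity of two rating vectors (0 = unrated, kept as-is here)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def predict(ratings, user, item):
    """User-based CF: similarity-weighted average of other users' ratings for
    the item. Every (user, item) prediction is independent, which is what
    makes the workload fit the GPU's SIMD execution model."""
    num = den = 0.0
    for other, row in enumerate(ratings):
        if other == user or row[item] == 0:
            continue
        s = cosine_sim(ratings[user], row)
        num += s * row[item]
        den += abs(s)
    return num / den if den else 0.0

ratings = [[5, 3, 0],      # the active user has not rated item 2
           [5, 3, 4],
           [1, 1, 2]]
p = predict(ratings, 0, 2)
```

On the GPU, the pairwise similarity computations and the weighted sums become large, regular reductions with almost no control flow, matching the "weak logic, huge data" profile the abstract describes.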
6.
7.
We introduce a GPU-based parallel vertex substitution (pVS) algorithm for the p-median problem using the CUDA architecture by NVIDIA. pVS is developed based on the best profit search algorithm, an implementation of vertex substitution (VS), that is shown to produce reliable solutions for p-median problems. In our approach, each candidate solution in the entire search space is allocated to a separate thread, rather than dividing the search space into parallel subsets. This strategy maximizes the usage of GPU parallel architecture and results in a significant speedup and robust solution quality. Computationally, pVS reduces the worst-case complexity from sequential VS's O(p · n²) to O(p · (n − p)) on each thread by parallelizing computational tasks in the GPU implementation. We tested the performance of pVS on two sets of numerous test cases (including 40 network instances from OR-lib) and compared the results against a CPU-based sequential VS implementation. Our results show that pVS achieved a speed gain ranging from 10 to 57 times over the traditional VS in all test network instances.
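A sequential best-profit vertex substitution sketch, under toy data invented for this example (four points on a line), might look like the following; pVS evaluates all the candidate swaps of the inner double loop in parallel, one per GPU thread:

```python
def pmedian_cost(dist, medians):
    """Total distance from every vertex to its nearest median."""
    return sum(min(dist[i][m] for m in medians) for i in range(len(dist)))

def vertex_substitution(dist, p):
    """Best-profit vertex substitution: repeatedly apply the single swap
    (one median out, one non-median in) with the largest cost reduction,
    stopping at a local optimum."""
    n = len(dist)
    medians = set(range(p))                 # arbitrary starting solution
    while True:
        base = pmedian_cost(dist, medians)
        best_profit, best_swap = 0.0, None
        for out in medians:
            for into in set(range(n)) - medians:
                cand = (medians - {out}) | {into}
                profit = base - pmedian_cost(dist, cand)
                if profit > best_profit:
                    best_profit, best_swap = profit, (out, into)
        if best_swap is None:               # no improving swap remains
            return medians, base
        medians = (medians - {best_swap[0]}) | {best_swap[1]}

# Demo: 4 vertices on a line at positions 0, 1, 10, 11; p = 2.
pos = [0, 1, 10, 11]
dist = [[abs(a - b) for b in pos] for a in pos]
medians, cost = vertex_substitution(dist, 2)
```

The one-thread-per-candidate-swap mapping is what reduces the per-thread work from the full O(p · n²) sweep to evaluating a single substitution.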
8.
9.
We propose a decision tree construction method for mining rules from outlier data. Based on a new definition of average denseness and an in-depth analysis of how outlier data arise, we put forward the new observation that outlier data often have higher denseness than normal sample data, and point out that outlier data are, in essence, also imbalanced data. On this basis, we propose a new algorithm for automatically labeling outlier data and, building on this algorithm together with part of the functionality of the C4.5 algorithm, a fuzzy decision tree construction method based on automatic outlier labeling. Simulation results show that the method mines outlier rules efficiently, handles imbalanced data, optimizes the structure of the decision tree, and extracts rules with higher confidence, giving it practical value.
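For context, a conventional k-nearest-neighbour density proxy is sketched below; note this is a generic illustration of density-based labeling under which outliers look sparse, and is NOT the paper's "average denseness" definition (which is not reproduced in the abstract and leads to its opposite observation):

```python
import math

def knn_mean_dist(points, k=2):
    """For each point, the mean distance to its k nearest neighbours: a
    generic density proxy in which a large value marks a sparse region."""
    scores = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(d[:k]) / k)       # mean of the k smallest distances
    return scores

# Toy data: a tight cluster of four points plus one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_mean_dist(points)
```

An automatic labeling scheme can then threshold such a score to mark candidate outliers before the (fuzzy) decision tree is grown on the relabeled, imbalanced data.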
10.
Yue-Li Wang, Hon-Chan Chen, Wei-Kai Liu. IEEE Transactions on Parallel and Distributed Systems, 1997, 8(12):1236-1240
A tree T is labeled when its n vertices are distinguished from one another by names such as v1, v2, …, vn. Two labeled trees are considered distinct if they have different vertex labels, even though they might be isomorphic. According to Cayley's tree formula, there are n^(n-2) labeled trees on n vertices. Prüfer proved this formula in a simple way by demonstrating that there exists a mapping between labeled trees and number sequences. From his proof, we can derive a naive sequential algorithm that converts a labeled tree to a number sequence and vice versa; however, it is hard to parallelize. In this paper, we propose an O(log n) time parallel algorithm for constructing a labeled tree using O(n) processors and O(n log n) space on the EREW PRAM computational model.
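The naive sequential conversion that the paper parallelizes can be sketched directly from Prüfer's proof: repeatedly delete the smallest-labeled leaf and record its neighbour (and, to decode, replay the degrees). The adjacency-dict representation below is just one convenient choice for the sketch:

```python
import heapq

def tree_to_prufer(adj):
    """Encode a labeled tree {vertex: set(neighbours)} as its Prufer sequence
    of length n - 2."""
    adj = {v: set(ns) for v, ns in adj.items()}     # work on a copy
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(len(adj) - 2):
        leaf = heapq.heappop(leaves)                # smallest-labeled leaf
        nb = adj[leaf].pop()
        adj[nb].discard(leaf)
        seq.append(nb)
        if len(adj[nb]) == 1:                       # neighbour became a leaf
            heapq.heappush(leaves, nb)
    return seq

def prufer_to_tree(seq):
    """Decode a Prufer sequence back into the tree's edge list."""
    n = len(seq) + 2
    degree = {v: 1 for v in range(1, n + 1)}
    for v in seq:
        degree[v] += 1
    leaves = [v for v in degree if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

# Tree on 4 vertices: vertex 2 is adjacent to 1, 3 and 4.
tree = {1: {2}, 2: {1, 3, 4}, 3: {2}, 4: {2}}
seq = tree_to_prufer(tree)
edges = prufer_to_tree(seq)
```

The leaf-removal loop is inherently sequential (each step depends on the previous deletion), which is exactly why the paper's EREW PRAM construction takes a different route.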
11.
In the past few decades, much success has been achieved in the use of artificial neural networks for classification, recognition, approximation and control. The flexible neural tree (FNT) is a special kind of artificial neural network whose most distinctive feature is its flexible tree structure, which makes it possible to obtain near-optimal network structures using tree-structure optimization algorithms. However, the modeling efficiency of FNT has always been a problem due to its two-stage optimization. This paper designs a parallel evolving algorithm for FNT (PE-FNT) that uses the PIPE algorithm to optimize tree structures and the PSO algorithm to optimize parameters; the evaluation of both the tree-structure populations and the parameter populations is parallelized. As an implementation of the PE-FNT algorithm, two parallel programs were developed using MPI. One small data set, two medium data sets and three large data sets were used for performance evaluation. Experimental results show that PE-FNT is an effective parallel FNT algorithm, especially for large data sets.
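To make the parameter-optimization stage concrete, here is a sketch of the standard PSO velocity/position update that PE-FNT applies to FNT parameters; the 1-D particles and quadratic objective are toy choices for illustration, not the paper's setup:

```python
import random

def pso_step(positions, velocities, pbest, gbest, rng,
             w=0.7, c1=1.5, c2=1.5):
    """One update of standard particle swarm optimization on 1-D particles:
    inertia (w) plus attraction towards each particle's personal best (c1)
    and the swarm's global best (c2)."""
    for i in range(len(positions)):
        r1, r2 = rng.random(), rng.random()
        velocities[i] = (w * velocities[i]
                         + c1 * r1 * (pbest[i] - positions[i])
                         + c2 * r2 * (gbest - positions[i]))
        positions[i] += velocities[i]

# Demo: minimize f(x) = x^2 with a swarm of 20 particles.
rng = random.Random(0)
f = lambda x: x * x
pos = [rng.uniform(-5, 5) for _ in range(20)]
vel = [0.0] * 20
pbest = pos[:]
gbest = min(pos, key=f)
for _ in range(200):
    pso_step(pos, vel, pbest, gbest, rng)
    for i, x in enumerate(pos):
        if f(x) < f(pbest[i]):
            pbest[i] = x
    gbest = min(pbest, key=f)
```

The expensive part in PE-FNT is evaluating the fitness of every particle (a full FNT model) on the data, and it is that evaluation loop that the MPI programs distribute.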
12.
General-purpose graphics processing unit (GPGPU) computing with CUDA has been used effectively in scientific applications, where huge accelerations have been achieved. However, while today's GPGPU can reduce the execution time of parallel code many times over, it does so at the expense of significant power and energy consumption. In this paper, we propose a ubiquitous parallel computing approach for constructing a decision tree on the GPU. In our approach, we exploit the parallelism of the well-known ID3 decision tree learning algorithm at two levels: the outer level of building the tree node by node, and the inner level of sorting data records within a single node. Our approach thus not only accelerates decision tree construction via GPU computing but also takes care of the GPU's power and energy consumption. Experimental results show that our approach outperforms a purely GPU-based implementation and a CPU-based sequential implementation several times over.
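The per-node work that ID3 parallelizes is the evaluation of the information gain of every candidate attribute; a minimal sketch of that criterion (with a toy data set invented for the example) is:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label multiset, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """ID3 split criterion: entropy reduction from partitioning the records
    by the value of attribute index 'attr'. ID3 splits each node on the
    attribute with the largest gain; these per-attribute evaluations are
    independent tasks a GPU can run in parallel."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy data: attribute 0 predicts the class perfectly, attribute 1 not at all.
rows = [('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y')]
labels = [0, 0, 1, 1]
```

The outer level of the paper's scheme builds nodes like this one by one, while the inner level parallelizes the grouping/sorting of records within a node.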
13.
Robust optimization is a popular method for tackling uncertain optimization problems. However, traditional robust optimization finds only a single solution per run, which is not flexible enough for decision-makers to select a satisfying solution according to their preferences. Besides, traditional robust optimization often takes a large number of Monte Carlo simulations to obtain a numeric solution, which is quite time-consuming. To address these problems, this paper proposes a parallel double-level multiobjective evolutionary algorithm (PDL-MOEA). In PDL-MOEA, a single-objective uncertain optimization problem is translated into a bi-objective one by treating the expectation and the variance as two objectives, so that the algorithm can provide decision-makers with a group of solutions of different stabilities. Further, a parallel evolutionary mechanism based on the message passing interface (MPI) is proposed to parallelize the algorithm. The parallel mechanism adopts a double-level design: a global level and a sub-problem level. The global level acts as a master, maintaining the global population information; at the sub-problem level, the optimization problem is decomposed into a set of sub-problems that can be solved in parallel, reducing computation time. Experimental results show that PDL-MOEA generally outperforms several state-of-the-art serial/parallel MOEAs in terms of accuracy, efficiency, and scalability.
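The expectation/variance reformulation can be sketched as follows; the Gaussian input noise, sample count, and identity objective are assumptions made for this illustration, not details from the paper:

```python
import random
import statistics

def robust_objectives(f, x, noise_std=0.1, n_samples=2000, seed=0):
    """Monte Carlo estimate of (expectation, variance) of f under Gaussian
    input noise -- the two objectives of the bi-objective reformulation.
    A low-variance solution is more stable; a low-expectation one is fitter.
    These per-candidate estimates are the costly, embarrassingly parallel
    evaluations that the sub-problem level distributes."""
    rng = random.Random(seed)
    values = [f(x + rng.gauss(0.0, noise_std)) for _ in range(n_samples)]
    return statistics.mean(values), statistics.variance(values)

mean, var = robust_objectives(lambda x: x, 2.0)
```

Returning both statistics per candidate is what lets the MOEA present decision-makers with a whole expectation-variance trade-off front instead of a single robust solution.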
14.
Diego José Bodas-Sagi, Pablo Fernández-Blanco, José Ignacio Hidalgo, Francisco José Soltero-Domingo. Natural Computing, 2013, 12(2):195-207
This paper deals with the optimization of the parameters of technical indicators for stock market investment. Price prediction is a problem of great complexity, and technical indicators are commonly used to predict market trends; the main difficulty in using them lies in deciding on a set of parameter values. We propose the use of multi-objective evolutionary algorithms (MOEAs) to obtain the best parameter values for a collection of indicators that help in the buying and selling of shares. The experimental results indicate that our MOEA offers a solution to the problem, improving on the results obtained with technical indicators using standard parameters. To reduce execution time, the runs must be parallelized; the parallelization results show that distributing the indicator workload across multiple processors improves performance. This parallelization was performed by taking advantage of idle time in a corporate technology infrastructure: a small parallel grid configured from the student labs of a university computer science college.
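As a concrete example of the kind of parameterized indicator being tuned (the specific indicators used in the paper are not listed in the abstract), a simple moving average looks like this, where the window length is exactly the sort of parameter a MOEA would search over:

```python
def sma(prices, window):
    """Simple moving average over a sliding window -- a basic technical
    indicator. Returns one value per fully covered window position."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

signal = sma([1, 2, 3, 4, 5], 3)
```

Evaluating one indicator parameterization over a price history is independent of every other parameterization, which is why distributing these evaluations across idle lab machines pays off.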
15.
In this paper we propose a simple GPU-based approach for discrete incremental approximation of the 3D Voronoi diagram. By constructing region maps on the GPU, nearest-site, space-clustering, and shortest-distance queries can be answered quickly by looking up the region map. In addition, we propose another representation of the 3D Voronoi diagram for visualization.
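The region-map idea can be sketched in 2D with a brute-force CPU loop (the GPU version computes one cell per thread, and the paper works in 3D); the sites and grid size below are toy values for the example:

```python
def region_map(sites, width, height):
    """Discrete Voronoi region map: label every grid cell with the index of
    its nearest site. A nearest-site query then becomes a single lookup."""
    def nearest(x, y):
        return min(range(len(sites)),
                   key=lambda k: (sites[k][0] - x) ** 2 + (sites[k][1] - y) ** 2)
    return [[nearest(x, y) for x in range(width)] for y in range(height)]

rmap = region_map([(0, 0), (3, 3)], 4, 4)   # rmap[y][x] = index of nearest site
```

Precomputing the map trades memory for query time: clustering a point set or answering shortest-distance queries reduces to indexing into the labeled grid.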
16.
Monte Carlo tree search (MCTS) is a widely used reinforcement learning algorithm, but the exponential growth of the dynamic space during game play limits its learning efficiency. We optimize MCTS through parallelization and propose a parallel MCTS algorithm based on win-rate estimate propagation. The improved parallel game-search framework consists of one master process and multiple child processes: the child processes perform exploration, while the master process makes decisions from the win-rate estimates the child processes report. Experiments on the multi-agent game platform Pommerman show that, compared with traditional MCTS, the parallel MCTS algorithm effectively improves resource utilization, game win rate, and decision efficiency.
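For reference, the selection rule at the heart of standard MCTS is UCT, sketched here in its sequential form (in the parallel variant described above, the master would apply such a rule to the win-rate estimates reported by the child processes):

```python
import math

def uct_select(parent_visits, children):
    """UCT selection: return the index of the child maximising
    win-rate + exploration bonus. 'children' is a list of (wins, visits)."""
    def score(stats):
        wins, visits = stats
        if visits == 0:
            return float('inf')      # always try unvisited children first
        return (wins / visits
                + math.sqrt(2.0 * math.log(parent_visits) / visits))
    return max(range(len(children)), key=lambda i: score(children[i]))
```

The exploration term shrinks as a child accumulates visits, so the search gradually concentrates simulations on the moves with the best observed win rates.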
17.
18.
19.
We propose three new GPU-parallel simulated annealing algorithms with adaptive neighborhoods: a GPU-parallel genetic simulated annealing algorithm, a parallel multiple-Markov-chain annealing algorithm, and a block-based GPU-parallel simulated annealing algorithm. Performance is further improved on the GPU side through coalesced memory access, avoidance of bank conflicts, and reduction techniques. Experiments on 11 typical benchmark functions show that the three GPU-parallel annealing algorithms achieve better accuracy and faster convergence than the nonu-SA algorithm.
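The serial baseline these three variants parallelize is plain simulated annealing; a minimal single-chain sketch with a geometric cooling schedule (the step size, schedule, and quadratic test function are assumptions for this illustration) is:

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995,
                        iters=4000, seed=1):
    """Single-chain simulated annealing on a 1-D objective: propose a random
    neighbour, accept downhill moves always and uphill moves with the
    Metropolis probability exp(-delta / t), then cool the temperature."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(iters):
        y = x + rng.uniform(-step, step)          # neighbour move
        fy = f(y)
        if fy < fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling                              # geometric cooling
    return best, fbest

best, fbest = simulated_annealing(lambda x: (x - 3.0) ** 2, 0.0)
```

The multiple-Markov-chain variant runs many such chains at once (one per GPU thread or block), which is where the coalesced-access and reduction optimizations mentioned above come into play.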
20.
Room impulse response (RIR) simulation based on the image-source method is widely used in room acoustics research. Computing the RIR requires digitizing the sound propagation delay into discrete samples, and carefully accounting for the digitization error greatly increases the already massive computational load of the image-source method. Many real-time audio applications therefore simply round the propagation delay to the nearest sample. This approximation, however, degrades the phase precision required by applications such as microphone arrays, especially when the sampling frequency is low. In this paper, we study a more precise image-source model that reduces the digitization error by introducing a Hanning-windowed ideal low-pass filter. We analyze its parallel calculation procedure and propose using a Graphics Processing Unit (GPU) to accelerate the calculation. The calculation procedure is divided into many parallel threads arranged according to the GPU architecture and its optimization criteria. We evaluate the calculation speeds of different RIRs using a general 5-core CPU, an ordinary GPU (GTX750) and an advanced GPU (K20C). The results show that, for similarly precise RIRs, the speedup ratios of the GTX750 and K20C over the general CPU reach 20 and 120, respectively.
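A sketch of placing one image-source contribution at a fractional sample delay with a Hanning-windowed sinc kernel follows; the kernel length and the amplitude handling are illustrative assumptions, not the paper's exact parameters:

```python
import math

def add_image_source(rir, delay, amplitude, half_width=40):
    """Add one image-source arrival at a fractional sample 'delay' using a
    Hanning-windowed sinc (ideal low-pass) kernel of 2*half_width+1 taps,
    instead of rounding the delay to the nearest sample."""
    n0 = int(round(delay))
    for n in range(max(0, n0 - half_width), min(len(rir), n0 + half_width + 1)):
        t = n - delay                      # offset from the true arrival time
        if abs(t) >= half_width:
            continue                       # outside the window support
        window = 0.5 * (1.0 + math.cos(math.pi * t / half_width))   # Hanning
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        rir[n] += amplitude * window * sinc

rir = [0.0] * 100
add_image_source(rir, 10.0, 0.5)   # an integer delay reduces to a single spike
```

Since each image source writes an independent windowed-sinc segment, the per-source (and per-tap) work parallelizes naturally across GPU threads, which is the structure the paper exploits.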