Similar Documents
20 similar documents found (search time: 15 ms)
1.
To address the difficulty of debugging parallel programs caused by their non-deterministic execution on multi-core processors, a fast hardware-based deterministic replay method called "Time Cutter" (时间切割者) is proposed. The method uses a parallelism-oriented recording mechanism to distinguish the memory-access instruction blocks that executed in parallel in the original run from those that did not, and during replay it avoids serialising the blocks that originally ran in parallel, so the performance overhead of replay stays small. Simulation results on the multi-core simulator Sim-Godson show that replay is fast, with a performance overhead of only about 2%. The method also requires only simple hardware support and is expected to be applied in the development of domestic Chinese multi-core processors.

2.
We describe a novel algorithm for two-dimensional phase unwrapping. The technique combines the principles of agglomerative clustering and use of heuristics to construct a discontinuous quality-guided path. Unlike other quality-guided algorithms, which establish the path at the start of the unwrapping process, our technique constructs the path as the unwrapping process evolves. This makes the technique less prone to error propagation, although it presents higher execution times than other existing algorithms. The algorithm reacts satisfactorily to random noise and breaks in the phase distribution. A variation of the algorithm is also presented that considerably reduces the execution time without affecting the results significantly.
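The quality-guided idea can be sketched with a conventional (non-adaptive) flood-fill unwrapper: always grow the unwrapped region from the highest-quality frontier pixel, adding the multiple of 2π that minimises the local jump. This is a minimal illustration of the baseline the paper improves on, not the authors' clustering-based variant.

```python
import heapq
import math

def quality_guided_unwrap(phase, quality):
    """Unwrap a 2D wrapped-phase map, always expanding from the
    highest-quality unwrapped pixel (simplified quality-guided path)."""
    rows, cols = len(phase), len(phase[0])
    unwrapped = [[None] * cols for _ in range(rows)]
    # seed at the highest-quality pixel
    sr, sc = max(((r, c) for r in range(rows) for c in range(cols)),
                 key=lambda rc: quality[rc[0]][rc[1]])
    unwrapped[sr][sc] = phase[sr][sc]
    heap = [(-quality[sr][sc], sr, sc)]
    while heap:
        _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and unwrapped[nr][nc] is None:
                # add the multiple of 2*pi that minimises the jump
                d = phase[nr][nc] - phase[r][c]
                d -= 2 * math.pi * round(d / (2 * math.pi))
                unwrapped[nr][nc] = unwrapped[r][c] + d
                heapq.heappush(heap, (-quality[nr][nc], nr, nc))
    return unwrapped
```

On a noise-free phase ramp this recovers the unwrapped phase exactly (up to the seed's offset); the paper's contribution lies in building the guiding path adaptively so errors propagate less.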

3.
A Petri Net model for the evaluation of reliability for the execution of a computer program in a distributed processing system (DPS) is developed. The execution of a program in a DPS may require access to several files residing at different sites and communication paths between several node pairs. The dynamic behaviour of the system under consideration is represented in the form of token movements within the model. Then, by using the reachability, firing, and marking concepts of Petri Nets, an algorithm is developed to study the two important reliability measures, namely, distributed computer program reliability (DCPR) and distributed processing system reliability (DPSR). The proposed algorithm is efficient in the sense that it directly generates all possible sets of path identifiers for the accessibility of a file which resides at more than one place and is required for the execution of a computer program residing at some other place without evaluating minimal S-T-paths for all S-T-connections. The developed algorithm has been implemented on a minicomputer.

4.
To efficiently execute a finite element program on a hypercube, we need to map nodes of the corresponding finite element graph to processors of a hypercube such that each processor has approximately the same amount of computational load and the communication among processors is minimized. If the number of nodes of a finite element graph will not be increased during the execution of a program, the mapping only needs to be performed once. However, if a finite element graph is solution-adaptive, that is, the number of nodes will be increased discretely due to the refinement of some finite elements during the execution of a program, a run-time load balancing algorithm has to be performed many times in order to balance the computational load of processors while keeping the communication cost as low as possible. In this paper, we propose a parallel iterative load balancing algorithm (ILB) to deal with the load-imbalance problem of a solution-adaptive finite element program. The proposed algorithm has three properties. First, the algorithm is simple and easy to implement. Second, the execution of the algorithm is fast. Third, it guarantees that the computational load will be balanced after the execution of the algorithm. We have implemented the proposed algorithm along with two parallel mapping algorithms, parallel orthogonal recursive bisection (ORB) [19] and parallel recursive mincut bipartitioning (MC) [8], on a 16-node NCUBE-2. Three criteria, the execution time of load balancing algorithms, the computation time of an application program under different load balancing algorithms, and the total execution time of an application program (under several refinement phases), are used for performance evaluation. Experimental results show that (1) the execution time of ILB is very short compared to those of MC and ORB; (2) the mappings produced by ILB are better than those of ORB and MC; and (3) the speedups produced by ILB are better than those of ORB and MC.
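Run-time load balancing of this kind can be sketched with a generic diffusion scheme (an illustrative assumption, not the paper's ILB): each processor repeatedly exchanges a fraction of its load difference with its neighbours until the loads even out, while the total load is conserved.

```python
def diffuse_load(loads, neighbors, rounds=100):
    """Diffusion-style load balancing on a processor graph.
    `loads[i]` is processor i's load; `neighbors[i]` lists the
    processors directly connected to i (symmetric adjacency)."""
    loads = list(loads)
    for _ in range(rounds):
        flows = [0.0] * len(loads)
        for i, ns in neighbors.items():
            for j in ns:
                if j > i:  # handle each edge once
                    # damping keeps the iteration stable on any degree
                    f = (loads[i] - loads[j]) / (
                        max(len(neighbors[i]), len(neighbors[j])) + 1)
                    flows[i] -= f
                    flows[j] += f
        loads = [l + f for l, f in zip(loads, flows)]
    return loads
```

On a 4-processor chain with all load initially on one end, the loads converge toward the uniform average while their sum stays constant.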

5.
Satellite cloud-derived wind inversion is a large-scale, compute-intensive task, and the efficiency bottleneck of the serial inversion algorithm is very hard to break through. We propose a parallel acceleration scheme for the cloud-derived wind inversion algorithm based on MPI cluster parallelism. Following a divide-and-conquer strategy, wind-vector inversion tasks are assigned to the computing units according to a given policy, and each unit executes its assigned tasks in parallel, removing the efficiency bottleneck of long inversion times caused by accumulated serial execution. Within the MPI-based scheme, an algorithm based on performance prediction is proposed to effectively balance the load across the cluster. Comparative analysis of experimental data under this parallel framework shows that the scheme accelerates the cloud-derived wind inversion algorithm appreciably: the MPI-based parallel algorithm reaches a speedup of 14.96, meeting the expected estimate. This paper also proposes an efficiency-optimisation algorithm for cloud-derived wind inversion; with minimal loss of wind-vector accuracy, the optimised algorithm runs up to 13 times faster.
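The divide-and-conquer assignment can be sketched as a static partition of inversion tasks weighted by a predicted per-worker speed. The proportional-weights strategy below is an illustrative assumption, not the paper's exact performance-prediction model.

```python
def partition_rows(n_rows, speeds):
    """Split n_rows tasks into contiguous ranges, one per worker,
    sized proportionally to each worker's predicted speed."""
    total = sum(speeds)
    counts = [int(n_rows * s / total) for s in speeds]
    # hand out the rounding remainder one task at a time
    i = 0
    while sum(counts) < n_rows:
        counts[i % len(counts)] += 1
        i += 1
    starts = [sum(counts[:k]) for k in range(len(counts))]
    return [(st, st + c) for st, c in zip(starts, counts)]
```

Each MPI rank would then invert only its `(start, end)` range, so a worker predicted to be twice as fast receives roughly twice as many tasks.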

6.
Motivated by the high yield variability in the semiconductor industry, where the quality of the end products is uncertain and each unit is graded into one of several quality levels according to performance before being shipped, we consider a dynamic multi-period yield management problem of a two-stage make-to-stock system faced by a semiconductor manufacturing firm. In the first stage, the firm invests in raw material before any actual demand is known, and produces multiple types of products with random yield rates because of the randomness in the process. In the second stage, products are classified into different classes by quality and allocated over a number of sequential periods. Demand is also random and can be classified into multiple classes corresponding to product levels; demands can be upgraded when one type of product has been depleted. This paper presents a multi-period, multi-product, downward substitution model to determine the optimal production input and the allocation of the different products to satisfy demands. The production and allocation problem is modelled as a stochastic dynamic program whose objective is to maximise the profit of the firm. We show that the simple one-step upgrade substitution policy is optimal and that the objective function is concave in the production input. An iterative algorithm is designed to find the optimal production input, and numerical experiments illustrate its effectiveness.
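The one-step upgrade substitution policy can be illustrated with a greedy allocation sketch (grade 0 is the best quality; the function and its arguments are hypothetical names, not the paper's notation): each demand class is served from its own grade first, and any shortfall is filled from the next-better grade only.

```python
def allocate_one_step_upgrade(stock, demand):
    """Serve demand class i from grade-i stock, then upgrade the
    shortfall from grade i-1 (one grade better). Returns the units
    served per class and the leftover stock per grade."""
    stock = list(stock)
    served = [0] * len(demand)
    for i, d in enumerate(demand):
        use = min(stock[i], d)
        stock[i] -= use
        served[i] = use
        short = d - use
        if short and i > 0:  # one-step upgrade only
            up = min(stock[i - 1], short)
            stock[i - 1] -= up
            served[i] += up
    return served, stock
```

With stock `[5, 2]` and demand `[3, 4]`, class 1's shortfall of 2 is upgraded from the 2 leftover grade-0 units, so all demand is met.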

7.
The Imaging Science Journal (《成像科学杂志》), 2013, 61(6): 348-362
SOM-based image quantisation requires a considerable amount of processing time even during the pixel mapping stage. Basically, a full search algorithm is employed to find a codeword, within a codebook, whose distance to the queried pixel is minimum. In this paper, we present a novel approach to accelerate the pixel mapping stage by utilisation of the spatial redundancy of pixels in the image and the inherent topological preservation nature of the resulting codebook. The experimental results confirm that the proposed approach outperforms ordinary solutions and is comparable to state-of-the-art solutions in terms of execution time. In addition, as the proposed approach does not require codebook sorting and a complex data structure with variable sizes, this simplifies its implementation and makes it feasible for hardware realisation.
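One way to exploit spatial redundancy together with topological preservation, sketched here as an assumption about the general idea rather than the paper's exact algorithm: seed each pixel's codeword search at the previous pixel's winner and hill-climb over the SOM grid. This is exact only when the map is well ordered; a real implementation would fall back to a wider search when needed.

```python
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def map_pixels(pixels, codebook, grid_neighbors):
    """Approximate nearest-codeword mapping: start at the previous
    pixel's winner, move to a grid neighbour while it is closer.
    `grid_neighbors[i]` lists codewords adjacent to i on the SOM grid."""
    winners = []
    cur = 0
    for p in pixels:
        improved = True
        while improved:
            improved = False
            for n in grid_neighbors[cur]:
                if dist2(p, codebook[n]) < dist2(p, codebook[cur]):
                    cur = n
                    improved = True
        winners.append(cur)
    return winners
```

Because neighbouring pixels tend to map to nearby codewords, most searches terminate after examining only a handful of grid neighbours instead of the whole codebook.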

8.
This paper presents a novel technique for improving the runtime of metaheuristic search optimisations. The technique is applied to a new practical problem, the several-release problem (SRP), which characterises modern industry. Many modern products are replaced by their next version due to incessant R&D activity, resulting in a short marketable life; examples abound in the automotive industry, electronic devices, and software products. These intermediate releases enable organisations to maximise their value for a given investment, and the challenge faced by industry is to decide which features to include in which version. The paper proves that SRP is NP-hard and thus cannot be solved practically using analytical approaches. A near-optimal, simple technique for determining the feature content of all version releases over the planning horizon is presented. The innovative approach adopts techniques from the clustering domain to enhance the optimisation: clustering enables skipping significant amounts of unattractive zones of the search space. Verification and validation of the proposed technique are presented. The paper compares different heuristics and shows that embedding the suggested clustering into general methods yields significantly shorter runtime and improves solution quality. The enhancement technique can be applied to other combinatorial problems and metaheuristics.

9.
Magnetic resonance imaging (MRI) brain image segmentation is essential at the preliminary stage in neuroscience research and computer-aided diagnosis. However, the presence of noise and intensity inhomogeneity in MRI brain images leads to improper segmentation. Fuzzy entropy clustering (FEC) is often used to deal with noisy data. One major disadvantage of the FEC algorithm is that it does not consider local spatial information. In this article, we have proposed an improved fuzzy entropy clustering (IFEC) algorithm by introducing a new fuzzy factor, which incorporates both local spatial and gray-level information. The IFEC algorithm is insensitive to noise, preserves image detail during clustering, and is free of parameter selection. The efficacy of the IFEC algorithm is demonstrated by comparing it quantitatively with state-of-the-art segmentation approaches in terms of the similarity index on publicly available real and simulated MRI brain images.
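For context, the membership update shared by fuzzy clustering methods of this family can be sketched as follows. This is the classical fuzzy c-means update, shown only as the common baseline; it does not include FEC's entropy term or IFEC's spatial/gray-level factor.

```python
def fcm_memberships(points, centers, m=2.0):
    """Classical fuzzy c-means membership update: membership of a
    point in class i falls off with its relative distance to centre i.
    `m` is the usual fuzzifier (m > 1)."""
    u = []
    for p in points:
        d = [sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5
             for c in centers]
        if 0.0 in d:
            # a point sitting on a centre belongs to it entirely
            row = [1.0 if di == 0.0 else 0.0 for di in d]
        else:
            row = [1.0 / sum((d[i] / d[j]) ** (2 / (m - 1))
                             for j in range(len(centers)))
                   for i in range(len(centers))]
        u.append(row)
    return u
```

A point equidistant from two centres gets membership 0.5 in each; IFEC's contribution is to bias such memberships using the labels and intensities of neighbouring pixels.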

10.
In this paper, we present a meta-heuristic algorithm for the resource-constrained project scheduling problem with discounted cash flows. We assume fixed payments associated with the execution of project activities and develop a heuristic optimisation procedure to maximise the net present value of a project subject to the precedence and renewable resource constraints. We investigate the use of a bi-directional generation scheme and a recursive forward/backward improvement method from literature and embed them in a meta-heuristic scatter search framework. We generate a large dataset of project instances under a controlled design and report detailed computational results. The solutions and project instances can be downloaded from a website in order to facilitate comparison with future research attempts.
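The objective can be illustrated with a toy forward pass (precedence constraints only, ignoring resources) plus the discounted value of fixed payments at activity finish times. Discrete per-period discounting is an assumption here; the paper's exact discounting convention may differ.

```python
def forward_pass(durations, preds):
    """Earliest finish times under precedence constraints.
    `durations` maps activity -> duration, iterated in topological
    order; `preds` maps activity -> list of predecessors."""
    finish = {}
    for a in durations:
        start = max((finish[p] for p in preds.get(a, [])), default=0)
        finish[a] = start + durations[a]
    return finish

def project_npv(cash_flows, finish_times, rate):
    """NPV of a schedule: each fixed payment arrives at its
    activity's finish time and is discounted at `rate` per period."""
    return sum(c / (1.0 + rate) ** t
               for c, t in zip(cash_flows, finish_times))
```

A scatter-search metaheuristic would explore activity orderings, score each candidate schedule with such an NPV evaluation, and keep the best; delaying a negative cash flow or advancing a positive one raises the NPV.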

11.
Neutrosophic C-means clustering segmentation does not consider the membership distribution of each sample point over the different classes. Herein, an image-segmentation algorithm based on wavelet preprocessing and data-driven neutrosophic fuzzy clustering is proposed. When the maximum membership value of a sample point is far greater than the other membership values, the centre of the class with the maximum membership value is taken as the centre of the fuzzy class; otherwise, the average of the centres of the two classes with the highest and second-highest membership values is used as the centre of the fuzzy class. In the preprocessing stage, wavelet technology is used to remove noise from the processed image, and an improved Bayesian algorithm is employed to calculate the filter threshold. Experimental results on synthetic and natural images show that the proposed method is more accurate and effective than existing methods.

12.
For the safety and the control of a nuclear power plant it is necessary to simulate the constituent processes on a computer system. The three-dimensional multigroup neutron diffusion equations are commonly used to describe the nuclear fission in the reactor core. They form a complicated system of coupled parabolic partial differential equations (PDEs) whose solution can involve very intensive computing. In this paper this system of PDEs is discretized using a special cell-centred mixed finite volume method (NEM-M0) in space, and a method that combines Crank–Nicolson and the BDF(2) method in time. The linear equation systems which arise are solved with multi-grid as well as with preconditioned BiCGStab. The kernel of both solution methods is an effective Block-SOR method that makes use of the particular structure of the linear equation systems. The parallelization strategy is based on a grid partitioning that distributes the data and the work homogeneously on the processors. Finally, the program was tested for three typical reactor simulation problems on grids with differing coarseness. The speedup achieved by parallelizing multi-grid and preconditioned BiCGStab was outstanding for all examples; even superlinear in some cases. Moreover, the parallel execution times were better than the parallel execution times of other established reactor simulation codes. Copyright © 2000 John Wiley & Sons, Ltd.

13.
Liao Bo. Industrial Engineering (《工业工程》), 2011, 14(1): 53-57
To address the low search efficiency of traditional scheduling algorithms, the scheduling function of the MES is extracted as a standalone module, and a clustering-based particle swarm optimisation algorithm is proposed in which clustering is used to refine the particle swarm's search space. Simulation results demonstrate the effectiveness of the algorithm.
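A plain particle swarm optimiser, the baseline that a clustering enhancement of the search space would refine. The parameter values are common defaults, not taken from the paper.

```python
import random

def pso_min(f, dim, bounds, n=20, iters=200, seed=0):
    """Minimise f over a box with standard global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [list(p) for p in pos]
    pbest_f = [f(p) for p in pos]
    g = min(range(n), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            fi = f(pos[i])
            if fi < pbest_f[i]:
                pbest_f[i], pbest[i] = fi, list(pos[i])
                if fi < gbest_f:
                    gbest_f, gbest = fi, list(pos[i])
    return gbest, gbest_f
```

For a scheduling application, `f` would evaluate the makespan (or another objective) of the schedule a particle encodes; the clustering idea restricts where particles are allowed to sample.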

14.
Gutmann B, Weber H. Applied Optics, 1999, 38(26): 5577-5593
The branch-cut method is a powerful tool for correct unwrapping of phase maps in optical metrology. However, this method encounters the problem of the correct setting of the cuts, which belongs to the class of nondeterministic-polynomial-time-complete problems. Simulated annealing is an algorithm used to solve problems of this kind in a polynomial-time execution. However, the algorithm still requires an enormous calculation time if the number of discontinuity sources and thus the number of branch cuts is high. We illustrate the motivation for the use of this algorithm and show how the running time can be severely reduced by use of reverse simulated annealing, starting from the nearest-neighbor solution to find a proper initial configuration, and by clustering of discontinuity sources.
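A toy version of annealed branch-cut placement: pair each positive residue with a negative residue so that the total (Manhattan) cut length is small, using annealed swaps of the pairing. The paper additionally seeds from a nearest-neighbour solution and clusters discontinuity sources; this sketch omits both, and assumes equal numbers of positive and negative residues.

```python
import math
import random

def cut_length(pairing, pos, neg):
    """Total Manhattan length of the cuts implied by a pairing."""
    return sum(abs(pos[i][0] - neg[j][0]) + abs(pos[i][1] - neg[j][1])
               for i, j in enumerate(pairing))

def anneal_cuts(pos, neg, t0=5.0, cooling=0.995, steps=4000, seed=1):
    """Simulated annealing over pairings: propose a swap of two
    assignments, accept improvements always and worsenings with a
    temperature-dependent probability."""
    rng = random.Random(seed)
    pairing = list(range(len(neg)))
    best, best_len = list(pairing), cut_length(pairing, pos, neg)
    cur, t = best_len, t0
    for _ in range(steps):
        i, j = rng.sample(range(len(pairing)), 2)
        pairing[i], pairing[j] = pairing[j], pairing[i]
        new = cut_length(pairing, pos, neg)
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cur = new
            if cur < best_len:
                best_len, best = cur, list(pairing)
        else:
            pairing[i], pairing[j] = pairing[j], pairing[i]  # undo
        t *= cooling
    return best, best_len
```

With two residue dipoles whose naive pairing crosses the whole field, the annealer quickly finds the short local cuts instead.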

15.
A parallel implementation of the contact algorithm discussed in Part I of this paper has been developed for a non-linear dynamic explicit finite element program to analyse the response of three-dimensional shell structures. The parallel contact algorithm takes advantage of the fact that in general only some parts of the structure will actually be in contact at any given time. Special interprocessor communication routines and a method which enables individual processors to dynamically build local contact domains during execution have been developed. The performance of the parallel contact algorithm has been studied by executing the program on 128 processors of a distributed-memory multiprocessor computer.

16.
In this article, we examine the use of several segmentation algorithms for medical image classification, detecting the cancer region from magnetic resonance (MR) images at an early stage. This is accomplished in three stages. In the first stage, four region-based segmentation techniques are used: the K-means clustering algorithm, the expectation–maximization algorithm, the particle swarm optimization algorithm, and the fuzzy c-means algorithm. In the second stage, 18 texture features are extracted using the gray-level co-occurrence matrix (GLCM). In the third stage, classification is performed with a multi-class support vector machine (SVM) classifier. Finally, the performance of the SVM classifier is analyzed for the four segmentation algorithms on a group of 200 patients (32 glioma, 32 meningioma, 44 metastasis, 8 astrocytoma, 72 normal). The experimental results indicate that EM is an efficient segmentation method with 100% accuracy. In the SVM, the quadratic and RBF (σ = 0.5) kernels provide the highest classification accuracy of all SVM kernel methods. © 2016 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 26, 196–208, 2016
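The GLCM step can be sketched for a single offset, together with two of the classic Haralick features (contrast and energy). This shows the standard construction only; which 18 features the paper extracts is not specified here.

```python
def glcm(img, dx, dy, levels):
    """Normalised grey-level co-occurrence matrix for one offset
    (dx, dy); img holds integer grey levels in [0, levels)."""
    m = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[img[r][c]][img[r2][c2]] += 1
    total = sum(sum(row) for row in m)  # assumes at least one pair
    return [[v / total for v in row] for row in m]

def contrast(p):
    """Haralick contrast: weighted squared grey-level difference."""
    return sum(p[i][j] * (i - j) ** 2
               for i in range(len(p)) for j in range(len(p)))

def energy(p):
    """Haralick energy (angular second moment)."""
    return sum(v * v for row in p for v in row)
```

Feature vectors built from several offsets and angles would then be fed to the SVM classifier.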

17.
Video moving-target localization based on the subtractive clustering algorithm
To meet the need for locating moving targets in video, this paper presents a new video moving-target localization method based on the subtractive clustering algorithm. The principle of subtractive clustering is analysed, and the derivation of its formulas, the implementation steps of target localization, and a flow chart are given. The localization performance of the method on different types of video moving targets is studied and compared in detail with a region-growing-based localization method. Experimental data illustrate the localization process, processing time, and noise robustness of the method. The results show that the method is suitable for video sequences whose binary images contain large noise speckles or poor spatial connectivity.

18.
Wang Fang, Wang Jingru, Zhang Qiheng. Opto-Electronic Engineering (《光电工程》), 2004, 31(Z1): 22-25
Using the code-generation facility of MATLAB/Simulink, a real-time target-tracking simulation library was developed for an image-processing platform built around the C6201 DSP, establishing a real-time target-tracking simulation system. In this environment an algorithm model can be quickly turned into an executable DSP program and run directly. The environment bridges algorithm research and DSP program development, allowing algorithms to be verified in real time quickly and greatly reducing the time spent porting them to the DSP.

19.
In this paper, a parallel Newton-Raphson algorithm with domain decomposition is developed to solve fully coupled heat, water and gas flow in deformable porous media. The model makes use of the modified effective stress concept together with the capillary pressure relationship. Phase change and latent heat transfer are also taken into account. The chosen macroscopic field variables are displacement, capillary pressure, gas pressure and temperature. The parallel program is developed on a cluster of workstations. The PVM (Parallel Virtual Machine) system is used to handle communications among networked workstations. An implementation of this parallel method on workstations is discussed, the speedup and efficiency of this method being demonstrated by numerical examples.

20.
To overcome the brief blocking that existing multi-version concurrency control (MVCC) incurs during concurrent data access, and to achieve fully concurrent reads and writes, a copy-on-write multi-version concurrent B+tree (BCMVBT) index structure is proposed. By copying, BCMVBT separates the operating spaces of readers and writers so that read and write transactions execute fully concurrently at any moment, avoids the high CPU cost of compare-and-swap (CAS) operations, and achieves full concurrency under a single-writer/multi-reader workload. To address the complex range-query operations of the existing multi-version B+tree (MVBT), a lock-free BCMVBT range-query algorithm and a reclamation mechanism are also proposed, enabling lock-free concurrent insertion, query, update, and reclamation of the index. Compared with transactional MVBT, BCMVBT reduces time consumption by 50% under concurrent read-write workloads, and experiments further show that BCMVBT has a greater advantage for large transactions.
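The copy-on-write idea behind such an index can be sketched with a two-level tree (a root listing sorted leaves) instead of a full B+tree: an insert copies only the touched leaf and the root, so a reader holding the old root keeps a consistent snapshot and never blocks the writer. This is a structural sketch of the principle, not BCMVBT itself.

```python
def cow_insert(root, key, leaf_cap=4):
    """Copy-on-write insert into a two-level tree: `root` is a list
    of non-empty sorted leaves (lists of keys). Returns a NEW root;
    the old root and all untouched leaves remain valid and shared."""
    # find the leaf whose key range covers `key`
    i = 0
    while i + 1 < len(root) and root[i + 1][0] <= key:
        i += 1
    new_leaf = sorted(root[i] + [key])          # copy only this leaf
    new_root = root[:i] + [new_leaf] + root[i + 1:]  # and the root
    if len(new_leaf) > leaf_cap:                # split an over-full leaf
        mid = len(new_leaf) // 2
        new_root[i:i + 1] = [new_leaf[:mid], new_leaf[mid:]]
    return new_root
```

Each insert produces a new version; readers that captured the old root continue traversing it unchanged, which is exactly why no read-write locking (or CAS retry loop) is needed on the read path.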
