Similar Documents
 A total of 20 similar documents were found (search time: 156 ms).
1.
The internationalization of software talent is a major trend in the development of the software industry. This paper summarizes the exploration and practice of international talent cultivation at the School of Software of the University of Electronic Science and Technology of China (UESTC), and puts forward a training mode that combines "awareness, capacity, and practice".

2.
A fuzzy approach to performing diagnosis of fuzzy discrete event systems (FDESs) has been proposed by constructing diagnosers, which can more effectively cope with the vagueness and fuzziness arising in failure diagnosis of fuzzy systems. However, the complexity of constructing such diagnosers is exponential in the state space and the number of fuzzy events of the system. In this paper, we present an algorithm for verifying the diagnosability of FDESs based on the construction of a nondeterministic automaton called an F-verifier instead of a diagnoser. Both the construction of F-verifiers and the verification of diagnosability of FDESs can be carried out with polynomial-time complexity.

3.
This paper studies approaches to fuzzy comprehensive evaluation of an information system. Redundant or insignificant attributes in the fuzzy comprehensive evaluation data sets are removed on the basis of rough set theory in order to reduce the knowledge representation of the system. The significance of the condition attributes is used to set up the weight distribution of the fuzzy evaluation, so that the undesirable influence of subjectively defined weights is eliminated. The precision of the comprehensive evaluation of the reduced system is achieved by an approach of fuzzy comprehensive estimation on rough sets. The feasibility of the proposed fuzzy comprehensive estimation is illustrated with examples of rubber planting.

4.
Enterprise architecture is a subject of growing importance for small and medium enterprises in the manufacturing sector of industry in Mexico. The global competitiveness of the markets has driven the adoption of methodologies that support the strategic alignment of processes with the goals and strategic objectives of firms. The components of the business architecture, such as mission, vision, strategic objectives, products, organizational structure, business processes, clients, and geographic region, were collected from the firm of the case study for the design of the architecture. As a result of the practical application, an implementation model was created and four strategic objectives were established to improve productivity and competitiveness. This paper reports the architecture of a medium-size manufacturing company as part of a research project on analyzing, designing, and implementing the business architecture of an enterprise using ontologies to represent its core elements; the study clearly shows the importance of strategic planning for the analysis and the detection of the main faults that hinder the achievement of goals and objectives.

5.
This paper provides a fast algorithm for Gröbner bases of homogeneous ideals of F[x, y] over a finite field F. We show that only the S-polynomials of neighbor pairs of a strictly ordered finite homogeneous generating set are needed in the computation of a Gröbner basis of the homogeneous ideal. This dramatically reduces the number of unnecessary S-polynomials that are processed. We also show that the computational complexity of the new algorithm is O(N^2), where N is the maximum degree of the input generating polynomials. The new algorithm can be used to solve a problem of blind recognition of convolutional codes, which is a new generalization of the important problem of synthesis of a linear recurring sequence.
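For reference (the abstract does not restate the definition), the S-polynomial that the algorithm forms only for neighbor pairs is the standard construction from Gröbner basis theory:

```latex
S(f, g) \;=\; \frac{\operatorname{lcm}\big(\mathrm{LM}(f), \mathrm{LM}(g)\big)}{\mathrm{LT}(f)}\, f
        \;-\; \frac{\operatorname{lcm}\big(\mathrm{LM}(f), \mathrm{LM}(g)\big)}{\mathrm{LT}(g)}\, g
```

where LM and LT denote the leading monomial and leading term under the chosen monomial order; restricting attention to neighbor pairs of the ordered generating set is what cuts down the number of S-polynomials that have to be processed.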

6.
Based on the necessity of computer-based three-dimensional modeling in teaching reform, this paper summarizes the practice of reforming the teaching of engineering drawing in our institute. The reform begins with three-dimensional modeling, using the computer instead of the drawing board. Research on teaching practice has been carried out covering the goals of the reform, the selection of teaching content, the arrangement of class hours, and the teaching methods, and very good results have been achieved in the teaching of engineering drawing.

7.
The education of software talent is the basis of the development of China's software industry. This article describes the software training goals for undergraduate students in the School of Software Engineering of Chongqing University and puts forward a project-driven education program for software talent. Through a series of integrated courses on project practice and training, the program aims to cultivate software talent with "compound, application-oriented, international" characteristics to meet the needs of the development of China's software industry.

8.
Software agents are nowadays used for the development and implementation of intelligent decision support systems. To implement intelligence in a system, from a few to several dozen software agents are used, and the resulting system becomes a multi-agent system. For the development of such systems a number of methodologies, i.e., sequences of consecutive steps of analysis, design, and implementation, have been proposed. Our analysis of these methodologies showed that, as a rule, they are limited in the scope of the problems they address (confined to the requirements of a specific applied task or to the possibilities of technical implementation) or in their level of detail. The variety of methodologies reflects the fact that the requirements and viewpoints come from specialists in related fields, such as software engineers and artificial intelligence engineers. Advances in hardware and software have made it possible to implement mobile multi-agent systems; however, there is no single unified design methodology for mobile multi-agent systems, and existing systems are underdeveloped and few in number. In this article we describe the design of an intelligent real-time multi-agent decision support information system for investment management, adapting and combining several methodologies, where the choice between communicating and mobile agents is a question of technical implementation rather than of methodology. We present two ways of implementing the system on the JADE platform: the first using communicating agents, and the second using mobile agents.

9.
It is suggested that mathematics for engineering and science students be taught as an exploration of mathematics-related classes. The similarity with classes and objects of object-oriented programming is demonstrated. In the framework of the suggested approach, each relatively self-contained unit of the mathematics curriculum is assigned a data type and is considered a class. In such a setting, a theorem proof may be viewed as an assignment of values to object properties. The approach augments the role of recognition of mathematical objects, their properties, and their methods (operations), and diminishes the value of a comprehensive study of rigorous proofs. It emphasizes the importance of developing mathematical intuition and combines conceptual and operational approaches to teaching and learning mathematics. A prospective implementation assumes the use of computer-based systems of formal proof.

10.
An algorithm to compute maximal contractions for Horn clauses
In the theory of belief revision, the computation of all maximal subsets (maximal contractions) of a formula set with respect to a set of facts is one of the key problems. In this paper, we try to solve this problem by studying an algorithm to compute all maximal contractions for Horn clauses. First, we point out and prove the conversion relationship between the minimal inconsistent subsets of the union of the formula set and the set of facts, and the maximal contractions of the formula set with respect to th...

11.
We present an efficient implementation of 7-point and 27-point stencils on high-end Nvidia GPUs. A new method of reading the data from the global memory to the shared memory of thread blocks is developed. The method avoids conditional statements and requires only two coalesced instructions to load the tile data with the halo (ghost zone). Additional optimizations include storing only one XY tile of data at a time in the shared memory to lower shared memory requirements, common subexpression elimination to reduce the number of instructions, and software prefetching to overlap arithmetic and memory instructions and enhance latency hiding. The efficiency of our implementation is analyzed using a simple stencil memory footprint model that takes into account the actual halo overhead due to the minimum memory transaction size on the GPUs. Through experiments we demonstrate that in our implementation the memory overhead due to the halos is largely eliminated by good reuse of the halo data in the memory caches, and that our method of reading the data is close to optimal in terms of memory bandwidth usage. Detailed performance analysis for single precision stencil computations, and performance results for single and double precision arithmetic on two Tesla cards are presented. Our stencil implementations are more efficient than any other implementation described in the literature to date. On Tesla C2050 with single and double precision arithmetic our 7-point stencil achieves an average throughput of 12.3 and 6.5 Gpts/s, respectively (98 GFLOP/s and 52 GFLOP/s, respectively). The symmetric 27-point stencil sustains a throughput of 10.9 and 5.8 Gpts/s, respectively.
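As a rough illustration of the kind of cooperative tile-plus-halo load the abstract refers to, the sketch below loads one padded XY tile of a 3D grid into shared memory in at most two batched passes per thread block and then applies a 7-point update. The tile dimensions, the clamping at the domain edge, and the stencil coefficients are all illustrative and do not reproduce the paper's exact indexing scheme.

```cuda
// Sketch of a cooperative tile-plus-halo load into shared memory (not the
// paper's exact scheme): launch with blockDim = (TX, TY) and
// gridDim = (ceil(nx/TX), ceil(ny/TY), nz - 2).
#include <cuda_runtime.h>

#define TX 30              // interior tile width
#define TY 6               // interior tile height
#define SX (TX + 2)        // padded tile width  (with halo)
#define SY (TY + 2)        // padded tile height (with halo)

__global__ void stencil7_xy_tile(const float* __restrict__ in,
                                 float* __restrict__ out,
                                 int nx, int ny)
{
    __shared__ float tile[SY][SX];   // only one XY tile resident at a time

    const int tid      = threadIdx.y * blockDim.x + threadIdx.x;
    const int nthreads = blockDim.x * blockDim.y;
    const int gx0 = blockIdx.x * TX - 1;     // global origin of the padded tile
    const int gy0 = blockIdx.y * TY - 1;
    const int z   = blockIdx.z + 1;          // skip the z boundary planes

    // Two batched passes suffice whenever 2 * nthreads >= SX * SY.
    for (int s = tid; s < SX * SY; s += nthreads) {
        int lx = s % SX, ly = s / SX;
        int gx = min(max(gx0 + lx, 0), nx - 1);   // clamp instead of branching out
        int gy = min(max(gy0 + ly, 0), ny - 1);
        tile[ly][lx] = in[((size_t)z * ny + gy) * nx + gx];
    }
    __syncthreads();

    // Each thread updates one interior point; z neighbours come from global memory.
    const int lx = threadIdx.x + 1, ly = threadIdx.y + 1;
    const int gx = gx0 + lx, gy = gy0 + ly;
    if (gx > 0 && gx < nx - 1 && gy > 0 && gy < ny - 1) {
        size_t c = ((size_t)z * ny + gy) * nx + gx;
        out[c] = 0.25f  *  tile[ly][lx]                       // illustrative weights
               + 0.125f * (tile[ly][lx - 1] + tile[ly][lx + 1]
                         + tile[ly - 1][lx] + tile[ly + 1][lx]
                         + in[c - (size_t)nx * ny] + in[c + (size_t)nx * ny]);
    }
}
```

Keeping only the current XY plane in shared memory mirrors the shared-memory-saving choice mentioned in the abstract; the paper's loading scheme goes further and avoids conditionals entirely, whereas the guarded loop above is a simplification.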

12.
Graphics processor units (GPUs), originally designed for graphics rendering, have emerged as massively-parallel "co-processors" to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can accelerate compute-intensive simulation science applications substantially. In this study, we describe the implementation of an incompressible flow Navier–Stokes solver for multi-GPU workstation platforms. A shared-memory parallel code with identical numerical methods is also developed for multi-core CPUs to provide a fair comparison between CPUs and GPUs. Specifically, we adopt NVIDIA’s Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations on a single GPU. Pthreads are then used to enable communication across multiple GPUs on a workstation. We use separate CUDA kernels to implement the projection algorithm to solve the incompressible fluid flow equations. Kernels are implemented on different memory spaces on the GPU depending on their arithmetic intensity. The memory hierarchy specific implementation produces significantly faster performance. We present a systematic analysis of speedup and scaling using two generations of NVIDIA GPU architectures and provide a comparison of single and double precision computational performance on the GPU. Using a quad-GPU platform for single precision computations, we observe two orders of magnitude speedup relative to a serial CPU implementation. Our results demonstrate that multi-GPU workstations can serve as a cost-effective small-footprint parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially.
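A minimal sketch of the one-host-thread-per-GPU pattern mentioned in the abstract is shown below: each Pthread binds itself to one device with cudaSetDevice, and a simple Jacobi sweep of the pressure-Poisson equation stands in for the projection-step kernels. The kernel, the GpuTask structure, and the elided slab allocation and ghost-cell exchange are placeholders, not the paper's solver.

```cuda
// Sketch of the Pthreads + CUDA multi-GPU pattern; names and kernel are illustrative.
#include <cuda_runtime.h>
#include <pthread.h>
#include <cstdio>

__global__ void jacobi_pressure(const float* __restrict__ rhs,
                                const float* __restrict__ p_old,
                                float* __restrict__ p_new,
                                int nx, int ny, int nz)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int k = blockIdx.z * blockDim.z + threadIdx.z;
    if (i < 1 || j < 1 || k < 1 || i >= nx - 1 || j >= ny - 1 || k >= nz - 1) return;
    size_t c = ((size_t)k * ny + j) * nx + i;
    // One Jacobi sweep of the pressure-Poisson equation used in the projection step.
    p_new[c] = (p_old[c - 1] + p_old[c + 1] + p_old[c - nx] + p_old[c + nx]
              + p_old[c - (size_t)nx * ny] + p_old[c + (size_t)nx * ny] - rhs[c]) / 6.0f;
}

struct GpuTask { int device; /* slab pointers, sizes, neighbour info ... */ };

void* worker(void* arg)
{
    GpuTask* t = static_cast<GpuTask*>(arg);
    cudaSetDevice(t->device);          // bind this host thread to one GPU
    // ... allocate this GPU's slab, copy data in, then loop:
    //     launch jacobi_pressure, exchange ghost layers with neighbours, repeat ...
    printf("host thread bound to GPU %d\n", t->device);
    return nullptr;
}

int main()
{
    int ngpu = 0;
    cudaGetDeviceCount(&ngpu);
    if (ngpu > 8) ngpu = 8;
    pthread_t threads[8];
    GpuTask   tasks[8];
    for (int d = 0; d < ngpu; ++d) {
        tasks[d].device = d;
        pthread_create(&threads[d], nullptr, worker, &tasks[d]);
    }
    for (int d = 0; d < ngpu; ++d) pthread_join(threads[d], nullptr);
    return 0;
}
```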

13.
Scalable shared-memory multiprocessor systems are typically NUMA (nonuniform memory access) machines, where the exploitation of the memory hierarchy is critical to achieving high performance. Iterative data parallel loops with near-neighbor communication account for many important numerical applications. In such loops, the communication of partial results stresses the memory system performance. In this paper, we develop data placement schemes that minimize communication time where the near-neighbor interaction is determined by a stencil. Under a given loop partition, our compile-time algorithm partitions global data into four classes for each processor, with each class requiring specific consistency maintenance requirements. The ADAPT (Automatic Data Allocation and Partitioning Tool) system was implemented to automatically partition parallel code segments for the BBN TC2000, a scalable shared-memory multiprocessor. ADAPT caches global arrays and maintains data consistency in software through instructions that flush data from private caches. Restructuring of a fluid flow code segment by ADAPT improved performance by a factor of more than 3 on the BBN TC2000. Features in current generation pipelined processors with multiple functional units permit the overlap of memory accesses with computation. Our experiments on the BBN TC2000 show that the degree of overlap is limited by architectural parameters, such as the number of CPU registers.

14.
A CPU-GPU hybrid approach for the unsymmetric multifrontal method
The multifrontal method is an efficient direct method for solving large-scale sparse unsymmetric linear systems. The method transforms a large sparse matrix factorization process into a sequence of factorizations involving smaller dense frontal matrices. Some of these dense operations can be accelerated by using a graphics processing unit (GPU). We analyze the unsymmetric multifrontal method from both an algorithmic and an implementational perspective to see how a GPU, in particular the NVIDIA Tesla C2070, can be used to accelerate the computations. Our main accelerating strategies include (i) performing BLAS on both CPU and GPU, (ii) improving the communication efficiency between the CPU and GPU by using page-locked memory, zero-copy memory, and asynchronous memory copy, and (iii) a modified algorithm that reuses the memory between different GPU tasks and sets thresholds to determine whether certain tasks should be performed on the GPU. The proposed acceleration strategies are implemented by modifying UMFPACK, which is an unsymmetric multifrontal linear system solver. Numerical results show that the CPU-GPU hybrid approach can accelerate the unsymmetric multifrontal solver, especially for computationally expensive problems.
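The sketch below illustrates the offload pattern the abstract describes: page-locked host buffers, asynchronous copies on a stream, and a size threshold deciding whether a dense frontal GEMM runs on the CPU or on the GPU via cuBLAS. The threshold value, the naive cpu_sgemm fallback, and the buffer-reuse convention are assumptions for illustration, not UMFPACK's actual logic; the host buffers are assumed to have been allocated with cudaHostAlloc.

```cuda
// Sketch of threshold-based CPU/GPU offload of a dense frontal GEMM.
#include <cuda_runtime.h>
#include <cublas_v2.h>

static const int GPU_GEMM_THRESHOLD = 256;   // assumed cut-off on the front size

// Naive column-major host GEMM standing in for a vendor BLAS call.
static void cpu_sgemm(int m, int n, int k, const float* A, const float* B, float* C)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            float s = 0.0f;
            for (int l = 0; l < k; ++l) s += A[i + l * m] * B[l + j * k];
            C[i + j * m] = s;
        }
}

void frontal_gemm(cublasHandle_t handle, cudaStream_t stream,
                  int m, int n, int k,
                  float* hA, float* hB, float* hC,   // page-locked host buffers
                  float* dA, float* dB, float* dC)   // reusable device buffers
{
    if (m < GPU_GEMM_THRESHOLD || n < GPU_GEMM_THRESHOLD) {
        cpu_sgemm(m, n, k, hA, hB, hC);              // small front: stay on the CPU
        return;
    }
    const float alpha = 1.0f, beta = 0.0f;
    // Asynchronous copies from pinned memory can overlap with other CPU-side work.
    cudaMemcpyAsync(dA, hA, sizeof(float) * m * k, cudaMemcpyHostToDevice, stream);
    cudaMemcpyAsync(dB, hB, sizeof(float) * k * n, cudaMemcpyHostToDevice, stream);
    cublasSetStream(handle, stream);
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                &alpha, dA, m, dB, k, &beta, dC, m);
    cudaMemcpyAsync(hC, dC, sizeof(float) * m * n, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);                   // result needed for assembly
}
```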

15.
The most commonly used approach for solving reaction–diffusion systems relies upon stencil computations. Although stencil computations feature low compute intensity, they place high demands on memory bandwidth. Fortunately, GPU computing allows for the heavy reliance of stencil computations on neighboring data points to be exploited to significantly increase simulation speeds by reducing these memory bandwidth demands. A review of previously published work shows that a wide variety of efforts have been made to optimize NVIDIA CUDA-based stencil computations. However, a critical aspect contributing to algorithm performance is commonly glossed over: the halo region loading technique utilized in conjunction with a given spatial blocking technique. This paper presents an in-depth examination of this aspect and the associated single iteration performance impacts when using symmetric, nearest neighbor 19-point stencils. This is accomplished by closely examining how the simulated space is partitioned into thread blocks and the balance between memory accesses, divergence, and computing threads. The resulting optimization strategy for accelerating 3-dimensional reaction–diffusion simulations offers up to 2.45 times speedup for single-precision floating point numbers in reference to GPU-based speedups found within the previously published work that this paper directly extends. In reference to our multithreaded CPU-based implementation, the resulting optimization strategy offers up to 8.69 times speedup for single-precision floating point numbers.
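To make the design space concrete, the sketch below shows one common halo-loading variant of the kind such comparisons cover: the thread block is sized to the padded tile, so every thread issues exactly one load, but the outer ring of threads only fetches halo data and then idles during the compute phase. The block size and the simple Laplacian-style update are illustrative and are not the paper's 19-point reaction-diffusion kernel.

```cuda
// Sketch of the "halo ring of threads" loading variant for one XY plane.
#include <cuda_runtime.h>

#define BX 16   // padded tile width  == blockDim.x
#define BY 16   // padded tile height == blockDim.y

__global__ void stencil_halo_threads(const float* __restrict__ in,
                                     float* __restrict__ out,
                                     int nx, int ny, int z)
{
    __shared__ float tile[BY][BX];

    // Global coordinates, offset so the block's outer ring maps onto the halo
    // of the (BX-2) x (BY-2) interior tile.
    int gx = blockIdx.x * (BX - 2) + threadIdx.x - 1;
    int gy = blockIdx.y * (BY - 2) + threadIdx.y - 1;
    gx = min(max(gx, 0), nx - 1);
    gy = min(max(gy, 0), ny - 1);

    // Every thread, halo or interior, performs exactly one load.
    tile[threadIdx.y][threadIdx.x] = in[((size_t)z * ny + gy) * nx + gx];
    __syncthreads();

    // Only interior threads compute; the halo ring has done its job.
    bool interior = threadIdx.x > 0 && threadIdx.x < BX - 1 &&
                    threadIdx.y > 0 && threadIdx.y < BY - 1 &&
                    gx > 0 && gx < nx - 1 && gy > 0 && gy < ny - 1;
    if (interior) {
        out[((size_t)z * ny + gy) * nx + gx] =
            0.5f   * tile[threadIdx.y][threadIdx.x] +          // illustrative weights
            0.125f * (tile[threadIdx.y][threadIdx.x - 1] +
                      tile[threadIdx.y][threadIdx.x + 1] +
                      tile[threadIdx.y - 1][threadIdx.x] +
                      tile[threadIdx.y + 1][threadIdx.x]);
    }
}
```

The trade-off this variant exposes is exactly the balance the abstract mentions: with a 16 x 16 block only the inner 14 x 14 threads compute, so the share of computing threads is sacrificed in exchange for a branch-light, single-load fetch phase.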

16.
We present a multigrid approach for simulating elastic deformable objects in real time on recent NVIDIA GPU architectures. To accurately simulate large deformations we consider the co-rotated strain formulation. Our method is based on a finite element discretization of the deformable object using hexahedra. It draws upon recent work on multigrid schemes for the efficient numerical solution of partial differential equations on such discretizations. Due to the regular shape of the numerical stencil induced by the hexahedral regime, and since we use matrix-free formulations of all multigrid steps, computations and data layout can be restructured to avoid execution divergence of parallel running threads and to enable coalescing of memory accesses into single memory transactions. This makes it possible to effectively exploit the GPU’s parallel processing units and high memory bandwidth via the CUDA parallel programming API. We demonstrate performance gains of up to a factor of 27 and 4 compared to a highly optimized CPU implementation on a single CPU core and 8 CPU cores, respectively. For hexahedral models consisting of as many as 269,000 elements our approach achieves physics-based simulation at 11 time steps per second.

17.
To address the long runtime and low efficiency of the serial CPU algorithm for stereo-vision-based measurement of the flapping angle of helicopter rotor blades, a fast parallel algorithm based on the CUDA (Compute Unified Device Architecture) is proposed, exploiting the parallel computing capability of the graphics processing unit (GPU). First, the three most time-consuming parts of the algorithm (image denoising, threshold segmentation, and connected-component labeling) are parallelized. Then, a multi-level parallel strategy is adopted to distribute the large amount of dense computation over different GPU processing units for parallel execution, with shared memory and registers used to accelerate data access. Finally, repeated measurement experiments show that the method is significantly more efficient than the serial CPU approach and meets the requirements for fast measurement of the rotor blade flapping angle.
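As an illustration of the per-pixel parallelization applied to the threshold-segmentation stage, a minimal CUDA kernel of the kind implied by the abstract is sketched below; the image layout, block size, and threshold value are assumptions, not the paper's implementation.

```cuda
// Sketch of a per-pixel binary threshold-segmentation kernel.
#include <cuda_runtime.h>

__global__ void threshold_kernel(const unsigned char* __restrict__ src,
                                 unsigned char* __restrict__ dst,
                                 int width, int height, unsigned char thresh)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int idx = y * width + x;
    dst[idx] = (src[idx] > thresh) ? 255 : 0;   // binary segmentation
}

// Typical launch: one thread per pixel, e.g. 16x16 blocks.
// dim3 block(16, 16);
// dim3 grid((width + 15) / 16, (height + 15) / 16);
// threshold_kernel<<<grid, block>>>(d_src, d_dst, width, height, 128);
```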

18.
Many problems in geophysical and atmospheric modelling require the fast solution of elliptic partial differential equations (PDEs) in “flat” three dimensional geometries. In particular, an anisotropic elliptic PDE for the pressure correction has to be solved at every time step in the dynamical core of many numerical weather prediction (NWP) models, and equations of a very similar structure arise in global ocean models, subsurface flow simulations and gas and oil reservoir modelling. The elliptic solve is often the bottleneck of the forecast, and to meet operational requirements an algorithmically optimal method has to be used and implemented efficiently. Graphics Processing Units (GPUs) have been shown to be highly efficient (both in terms of absolute performance and power consumption) for a wide range of applications in scientific computing, and recently iterative solvers have been parallelised on these architectures. In this article we describe the GPU implementation and optimisation of a Preconditioned Conjugate Gradient (PCG) algorithm for the solution of a three dimensional anisotropic elliptic PDE for the pressure correction in NWP. Our implementation exploits the strong vertical anisotropy of the elliptic operator in the construction of a suitable preconditioner. As the algorithm is memory bound, performance can be improved significantly by reducing the amount of global memory access. We achieve this by using a matrix-free implementation which does not require explicit storage of the matrix and instead recalculates the local stencil. Global memory access can also be reduced by rewriting the PCG algorithm using loop fusion and we show that this further reduces the runtime on the GPU. We demonstrate the performance of our matrix-free GPU code by comparing it both to a sequential CPU implementation and to a matrix-explicit GPU code which uses existing CUDA libraries. The absolute performance of the algorithm for different problem sizes is quantified in terms of floating point throughput and global memory bandwidth.
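The loop-fusion idea can be sketched as follows: the two AXPY updates of a PCG iteration and the reduction needed for the next residual norm are fused into one kernel, so each vector is touched once per iteration instead of in three separate passes. The kernel below is a simple illustration of that technique; the variable names, the fixed block size, and the atomicAdd-based reduction are assumptions rather than the paper's actual implementation.

```cuda
// Sketch of fused PCG vector updates plus the <r, r> reduction.
#include <cuda_runtime.h>

__global__ void pcg_fused_update(float* __restrict__ x,
                                 float* __restrict__ r,
                                 const float* __restrict__ p,
                                 const float* __restrict__ q,   // q = A*p (matrix-free)
                                 float alpha, float* rr_new, int n)
{
    __shared__ float partial[256];
    int i   = blockIdx.x * blockDim.x + threadIdx.x;
    float c = 0.0f;
    if (i < n) {
        x[i] += alpha * p[i];        // solution update
        float ri = r[i] - alpha * q[i];
        r[i] = ri;                   // residual update in the same pass
        c = ri * ri;                 // contribution to <r, r>
    }
    partial[threadIdx.x] = c;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {   // block-level tree reduction
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) atomicAdd(rr_new, partial[0]);
}

// Launch with blockDim.x == 256 and *rr_new zeroed beforehand; the fused kernel
// replaces two AXPY kernels and a separate dot-product kernel.
```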

19.
Modern computer systems can use different types of hardware acceleration to achieve massive performance improvements. Some accelerators, like FPGAs and dedicated GPUs (dGPUs), need optimized data structures for the best performance and often use dedicated memory. In contrast, APUs, which combine a CPU and an integrated GPU (iGPU), support shared memory and allow the iGPU to work together with the CPU on pointer-based data structures. First, we develop a dGPU approach to accelerate queries in libcuckoo and robin-map, and we also look at accelerating insert, update, and erase operations in the original libcuckoo using OneAPI on an APU. We evaluate the dGPU approach against the CPU variants and against our dGPU approach adapted for the CPU, and also in a hybrid context, using longer keys on the CPU and shorter keys on the dGPU. In comparison with the original libcuckoo algorithm, our dGPU approach achieves a speed-up of 2.1, and in comparison with the robin-map a speed-up of 1.5. For hybrid workloads, our approach is efficient if long keys are processed on the CPU and short keys are processed on the dGPU. By processing a mixture of 20% long keys on the CPU and 80% short keys on the dGPU, our hybrid approach has a 40% higher throughput than the CPU-only approach. In addition, we develop a hybrid APU approach for insert, update, and erase operations in the original libcuckoo structure, focusing on shared memory with iGPU-accelerated look-ups of the positions for insert, update, and erase operations.
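A rough sketch of a GPU-side batched look-up of the kind described, with one thread per query key probing the two candidate buckets of a cuckoo-style table, is given below; the flat single-slot table layout, the hash functions, and the sentinel value are illustrative assumptions and do not reflect libcuckoo's internal structure or the OneAPI/APU variants.

```cuda
// Sketch of a batched, two-choice (cuckoo-style) look-up kernel.
// Assumes the table is not being modified concurrently.
#include <cuda_runtime.h>
#include <cstdint>

#define EMPTY_KEY 0xFFFFFFFFu

struct Slot { uint32_t key; uint32_t value; };

__device__ uint32_t hash1(uint32_t k) { return k * 2654435761u; }
__device__ uint32_t hash2(uint32_t k) { return (k ^ 0x9E3779B9u) * 40503u; }

__global__ void lookup_batch(const Slot* __restrict__ table, uint32_t capacity,
                             const uint32_t* __restrict__ keys,
                             uint32_t* __restrict__ values,   // EMPTY_KEY if not found
                             int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    uint32_t k = keys[i];
    uint32_t result = EMPTY_KEY;

    // In cuckoo hashing a key can live in exactly one of two buckets,
    // so each query needs at most two probes.
    Slot s1 = table[hash1(k) % capacity];
    if (s1.key == k) {
        result = s1.value;
    } else {
        Slot s2 = table[hash2(k) % capacity];
        if (s2.key == k) result = s2.value;
    }
    values[i] = result;
}
```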

20.
Parallel Computing, 2014, 40(8): 425-447
EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model developed for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The dynamic core of EULAG includes the multidimensional positive definite advection transport algorithm (MPDATA) and an elliptic solver. In this work we investigate aspects of an optimal parallel version of the 2D MPDATA algorithm on modern hybrid architectures with GPU accelerators, where computations are distributed across both GPU and CPU components. Using the hybrid OpenMP–OpenCL model of parallel programming opens the way to harness the power of CPU–GPU platforms in a portable way. In order to better utilize features of such computing platforms, comprehensive adaptations of MPDATA computations to hybrid architectures are proposed. These adaptations are based on efficient strategies for memory and computing resource management, which allow us to ease memory and communication bounds, and better exploit the theoretical floating point efficiency of CPU–GPU platforms. The main contributions of the paper are:
  • method for the decomposition of the 2D MPDATA algorithm as a tool to adapt MPDATA computations to hybrid architectures with GPU accelerators by minimizing communication and synchronization between CPU and GPU components at the cost of additional computations;
  • method for the adaptation of 2D MPDATA computations to multicore CPU platforms, based on space and temporal blocking techniques;
  • method for the adaptation of the 2D MPDATA algorithm to GPU architectures, based on a hierarchical decomposition strategy across data and computation domains, with support provided by the developed GPU task scheduler allowing for the flexible management of available resources;
  • approach to the parametric optimization of 2D MPDATA computations on GPUs using the autotuning technique, which allows us to provide a portable implementation methodology across a variety of GPUs.
Hybrid platforms tested in this study contain different numbers of CPUs and GPUs – from solutions consisting of a single CPU and a single GPU to the most elaborate configuration containing two CPUs and two GPUs. Processors of different vendors are employed in these systems – both Intel and AMD CPUs, as well as GPUs from NVIDIA and AMD. For all the grid sizes and for all the tested platforms, the hybrid version with computations spread across CPU and GPU components allows us to achieve the highest performance. In particular, for the largest MPDATA grids used in our experiments, the speedups of the hybrid versions over GPU and CPU versions vary from 1.30 to 1.69, and from 1.95 to 2.25, respectively.
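The paper's implementation uses OpenMP together with OpenCL; purely as an illustration of the CPU/GPU domain split it describes, the sketch below expresses the same idea in CUDA plus OpenMP, assigning a share of the grid rows to the GPU and the remainder to the CPU, with a trivial first-order upwind update standing in for the real MPDATA operator. The split ratio, the separate host/device buffers, and the boundary-exchange step are placeholders.

```cuda
// Illustrative CUDA + OpenMP row-split of a 2D advection step (not MPDATA itself).
#include <cuda_runtime.h>
#include <omp.h>

__host__ __device__ inline float upwind(float c, float left, float cour)
{
    return c - cour * (c - left);   // placeholder advection update, not MPDATA
}

__global__ void advect_rows_gpu(const float* __restrict__ in, float* __restrict__ out,
                                int nx, int row_begin, int row_end, float cour)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = row_begin + blockIdx.y;
    if (i < 1 || i >= nx || j >= row_end) return;
    out[j * nx + i] = upwind(in[j * nx + i], in[j * nx + i - 1], cour);
}

// Rows [0, split) are advanced on the GPU, rows [split, ny) on the CPU; the
// asynchronous kernel launch overlaps with the OpenMP loop on the host.
void advect_step_hybrid(const float* h_in, float* h_out,    // host copies
                        const float* d_in, float* d_out,    // device copies
                        int nx, int ny, float cour, float gpu_share)
{
    int split = (int)(ny * gpu_share);

    if (split > 0) {
        dim3 block(256);
        dim3 grid((nx + 255) / 256, split);
        advect_rows_gpu<<<grid, block>>>(d_in, d_out, nx, 0, split, cour);
    }

    #pragma omp parallel for
    for (int j = split; j < ny; ++j)
        for (int i = 1; i < nx; ++i)
            h_out[j * nx + i] = upwind(h_in[j * nx + i], h_in[j * nx + i - 1], cour);

    cudaDeviceSynchronize();
    // The rows adjacent to 'split' would then be exchanged between host and
    // device before the next time step (the communication the paper minimizes).
}
```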
