Similar literature
1.
Amarasinghe S.K., Goh W.L., He S. 《Electronics Letters》1998,34(20):1924-1925
A new tree decomposition strategy, EYBWC (enhanced young brothers wait concept), is proposed for two-player control game tree searching on massively parallel machines. Based on YBWC (young brothers wait concept), one of the best existing tree decomposition strategies, the new strategy notably alleviates processor starvation in a massively parallel search by providing many more nodes for processors to search in parallel. An experiment on the MasPar MP-2 SIMD (single instruction multiple data) machine with 1024 processors shows that EYBWC is far more efficient than YBWC.

2.
Genetic algorithms are on the rise in electromagnetics as design tools and problem solvers because of their versatility and ability to optimize in complex multimodal search spaces. This paper describes the basic genetic algorithm and recounts its history in the electromagnetics literature. Also, the application of advanced genetic operators to the field of electromagnetics is described, and design results are presented for a number of different applications.

3.
A modular massively parallel computing (Modular-MPC) philosophy for image-related processing is discussed in this paper. The approach is based on application-specific configurations of generic fine-grain single instruction stream, multiple data stream (SIMD) massively parallel computing modules to achieve high performance and maximal flexibility and programmability while remaining cost-effective. The need for a software architecture to allow programming such systems is highlighted and the implementations on current Modular-MPC systems are described. The experience with ASTRA, a Modular-MPC testbed system, in image-related application development and system performance has led to a technology road-map using VLSI, MCM, and monolithic WSI technologies aiming at 'future proofing' of the Modular-MPC concept and at systems achieving tera (10^12) operations per second performance. This experience and progress are discussed together with the implementation of three image-related processing applications.

4.
One of the most important steps in the development of any numerical code is the validation of the implementation by comparison of the results obtained for a set of test cases to the exact solution. In the context of codes developed for high-frequency electromagnetics, this usually means comparing computed results to analytical solutions. Obtaining these analytical solutions can be a nontrivial problem, although fortunately it need only be implemented once, and can then be used repeatedly to validate any new code. This paper concentrates on finding the analytical solution to eigenvalue problems for a range of standard geometries, as well as the near-field solution for plane-wave scattering from a PEC sphere. The solutions are implemented using the Python programming language and the SciPy library of scientific functions.
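
As a rough illustration of the kind of closed-form reference value such a validation relies on, the sketch below evaluates the textbook cutoff frequencies of an air-filled rectangular waveguide. The choice of geometry, the WR-90 dimensions, and the function name `cutoff_frequency` are assumptions made for this example; they are not taken from the paper, and only the Python standard library is used here rather than SciPy.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def cutoff_frequency(m, n, a, b):
    """Cutoff frequency in Hz of mode (m, n) in an air-filled a x b rectangular waveguide.

    Textbook result: f_c = (c / 2) * sqrt((m / a)^2 + (n / b)^2).
    """
    if m == 0 and n == 0:
        raise ValueError("m = n = 0 does not correspond to a waveguide mode")
    return 0.5 * C0 * math.sqrt((m / a) ** 2 + (n / b) ** 2)


# Assumed example geometry: WR-90 waveguide, 22.86 mm x 10.16 mm.
A, B = 22.86e-3, 10.16e-3
fc_te10 = cutoff_frequency(1, 0, A, B)
print(f"TE10 cutoff: {fc_te10 / 1e9:.3f} GHz")
```

A numerical eigenvalue solver applied to the same cross-section should reproduce this value (about 6.56 GHz for the dominant mode) to within its discretization error.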

5.
An efficient two-dimensional finite-difference time-domain (2-D FDTD) method combined with a time signal prediction technique has been proposed for computing the frequency-dependent parameters of on-chip interconnects in high-speed integrated circuits (ICs). A graded mesh algorithm and a lossy absorbing boundary condition are proposed and adopted in the 2-D FDTD analysis to reduce the number of spatial grid points in the simulation region. Introducing the time signal prediction technique to predict the future signal in the time domain, or to extract the parameters of uniform transmission lines in the frequency domain, reduces the computation time drastically. With these, the substrate and conductor losses are both included in one analysis. This algorithm leads to a significant reduction in CPU time and storage requirements compared with the conventional FDTD method. The simulation results are in good agreement with the results obtained by other methods and with measurements.

6.
We present a framework for solving logical topology design (LTD) problems in a constrained amount of computation time. Our framework uses a search space dimensionality (SSD) reduction technique that exploits a tradeoff between computation time and solution quality. We have demonstrated that our framework offers improved solution quality in comparison to an existing SSD reduction technique reported in the literature.

7.
This paper describes the Sarnoff Engine, a 1.6 TeraFLOP, real-time, high definition, video and image processing computer. It is a second generation, scalable, linear array, multiuser, MIMD architecture focused on the applications of real-time high definition video and image processing (data compression encoders and decoders), 3D/4D data visualization, and neural network development.

8.
Derived from a proposed universal mathematical expression, this paper investigates a novel algorithm for parallel Cyclic Redundancy Check (CRC) computation. It is an iterative algorithm that updates the check-bit sequence step by step and suits various argument selections of CRC computation. The proposed algorithm is well suited to hardware implementation. Simulation and performance analysis suggest that it can speed up the computation efficiently compared with conventional algorithms. The algorithm is implemented in hardware at rates as high as 21 Gbps, which suggests its usefulness for high-speed CRC computation in settings such as Asynchronous Transfer Mode (ATM) networks and 10G Ethernet.
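
The abstract does not spell out the proposed update rule, so the sketch below only illustrates the general idea of consuming several input bits per step instead of one. It compares a bit-serial CRC-32 with the standard table-driven variant that processes one byte per iteration. The choice of the reflected CRC-32 polynomial and the function names are assumptions for this example, not the paper's algorithm.

```python
POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed for this example)


def crc32_bitwise(data: bytes) -> int:
    """Reference implementation: one input bit per inner iteration."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def make_table() -> list:
    """Precompute the effect of eight serial steps for every possible byte value."""
    table = []
    for value in range(256):
        c = value
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = make_table()


def crc32_bytewise(data: bytes) -> int:
    """Eight bits per iteration: one table lookup replaces eight shift/XOR steps."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


message = b"123456789"
print(hex(crc32_bitwise(message)), hex(crc32_bytewise(message)))
```

Both functions return the same check value for any input. Hardware designs typically widen the same idea to 32 or more bits per clock cycle, replacing the lookup table with XOR logic.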

9.
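The rapid development of high-performance supercomputers is changing some traditional notions in computational electromagnetics. In particular, the advent of the IBM BlueGene/L supercomputer has dramatically changed the size of the problems that computational electromagnetics can solve and the time needed to solve them. An IBM BlueGene/L supercomputer can include up to 65,536 processors and 32 TB of memory. More importantly, owing to its special architecture, the parallel efficiency of a finite-difference time-domain (FDTD) code remains around 90% on 4000 processors. Tests show that, with comparable single-CPU speed, a workload that takes 52 days on a Pentium 4 computer needs only about 10 min on a BlueGene/L with 4000 processors. Although an ordinary PC cluster can also solve relatively large problems quickly compared with a single processor, its efficiency drops rapidly once the number of processors exceeds a few tens, whether it uses Gigabit Ethernet, Foundry, Myrinet, or InfiniBand. To verify the correctness of the parallel FDTD code, we use it on the IBM BlueGene/L supercomputer to simulate a parabolic antenna feed consisting of a 144-element dual-polarized Vivaldi array.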

10.
A parallel processing architecture based on multiple channel optical communication is described and compared with existing interconnection strategies for parallel computers. The proposed multiple channel architecture (MCA) provides a large number of independent, selectable channels (or virtual buses) using a single optical fiber. Arbitrary interconnection patterns and machine partitions can be emulated via appropriate channel assignments. Hierarchies of parallel architectures and simultaneous execution of parallel tasks are also possible. The authors describe previous attempts in processor, memory, and input/output device interconnection, a basic overview of the proposed architecture, various channel allocation strategies that can be utilized by the MCA, and a summary of advantages of the MCA compared with traditional interconnection techniques.

11.
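Parallel computing gives the finite-difference time-domain (FDTD) method the computing power and memory resources needed to simulate electrically large and structurally complex problems. For multi-core PC cluster systems, this paper proposes a high-performance parallel FDTD algorithm that uses Windows Socket (WinSock) for efficient inter-process communication and multithreading to make full use of multi-core processors. Tests on a cluster show that, taking 10 processors (30 cores) as an example, the algorithm achieves a speedup of 16.0 and a parallel efficiency of 53.3%. This outperforms the conventional parallel FDTD algorithms that use the Message Passing Interface (MPI) alone or MPI combined with OpenMP, which under the same test conditions achieve speedups of only 13.7 and 12.2 and parallel efficiencies of 45.8% and 40.7%, respectively.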
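
The efficiency figures quoted in this abstract follow from the usual definition, parallel efficiency = speedup / number of cores. The short sketch below recomputes them from the reported speedups on 30 cores; the function name and the dictionary labels are illustrative choices, not names from the paper.

```python
def parallel_efficiency(speedup: float, n_cores: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / n_cores


N_CORES = 30  # 10 processors with 30 cores in total, as stated in the abstract

# Speedups as reported in the abstract (rounded to one decimal place there).
reported_speedup = {
    "WinSock + multithreading": 16.0,
    "MPI only": 13.7,
    "MPI + OpenMP": 12.2,
}

for name, speedup in reported_speedup.items():
    efficiency = 100 * parallel_efficiency(speedup, N_CORES)
    print(f"{name}: {efficiency:.1f}%")
```

Recomputing from the rounded speedups gives 53.3%, 45.7%, and 40.7%. The small gap from the reported 45.8% for MPI alone is presumably a rounding effect in the published speedup.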

12.
13.
Communication efficiency is one of the keys to the broad success of parallel computation, as one can see by looking at the successes of parallel computation, which are currently limited to applications that have small communication requirements, or applications that use a small number of processors. In order to use fine grain parallel computation for a broader range of applications, efficient algorithms to execute the underlying interprocessor communications have to be developed. In this paper we survey several generic static and dynamic communication problems that are important for parallel computation, and present some general methodologies for addressing these problems. Our objective is to obtain a collection of communication algorithms to execute certain prototype communication tasks that arise often in applications. These algorithms can be called as communication primitives by the programmer or the compiler of a multiprocessor computer, in the same way that subroutines implementing standard functions are called from a library of functions in a conventional computer. We discuss both algorithms to execute static (deterministic) primitive communication tasks, as well as schemes that are appropriate for dynamic (stochastic) environments. Our emphasis is on algorithms that apply to many similar problems and can be used in various network topologies. This revised version was published online in June 2006 with corrections to the Cover Date.

14.
A new interconnection network for massively parallel computing is introduced. This network is called an optical multi-mesh hypercube (OMMH) network. The OMMH integrates positive features of both hypercube (small diameter, high connectivity, symmetry, simple control and routing, fault tolerance, etc.) and mesh (constant node degree and scalability) topologies and at the same time circumvents their limitations (e.g., the lack of scalability of hypercubes, and the large diameter of meshes). The OMMH can maintain a constant node degree regardless of the increase in the network size. In addition, the flexibility of the OMMH network makes it well suited for optical implementations. This paper presents the OMMH topology, analyzes its architectural properties and potentials for massively parallel computing, and compares it to the hypercube. Moreover, it also presents a three-dimensional optical design methodology based on free-space optics. The proposed optical implementation has totally space-invariant connection patterns at every node, which enables the OMMH to be highly amenable to optical implementation using simple and efficient large space-bandwidth product space-invariant optical elements.
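
The trade-off between the two base topologies named in this abstract can be made concrete with a few lines of arithmetic. The sketch below covers only the plain binary hypercube and the plain 2-D mesh without wraparound links; it does not model the OMMH itself, whose construction is not given here, and the function names are illustrative.

```python
def hypercube_neighbors(node: int, dim: int) -> list:
    """Neighbors of a node in a dim-dimensional binary hypercube: flip one address bit."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hypercube_metrics(dim: int) -> dict:
    """Node degree and diameter both equal the dimension, so they grow with log2(nodes)."""
    return {"nodes": 2 ** dim, "degree": dim, "diameter": dim}


def mesh_metrics(side: int) -> dict:
    """side x side 2-D mesh without wraparound: degree stays at most 4, diameter grows."""
    return {"nodes": side * side, "degree": 4, "diameter": 2 * (side - 1)}


print(hypercube_neighbors(0b000, 3))   # the three nodes adjacent to node 0 in a 3-cube
print(hypercube_metrics(10))           # 1024 nodes
print(mesh_metrics(32))                # 1024 nodes
```

For 1024 nodes the hypercube needs 10 links per node but has diameter 10, while the mesh needs at most 4 links per node but has diameter 62. This is the pair of limitations the abstract says the OMMH is designed to avoid.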

15.
Due to the presence of the natural magnetic field, the ionosphere surrounding the earth is a gyrotropic medium. This paper presents a finite-difference time-domain scheme that can deal with such an anisotropic medium, allowing the propagation of VLF-LF radiowaves to be computed in the waveguiding structure composed of the earth surface and the ionosphere. The numerical scheme is described in detail, with a special emphasis on the problem of the numerical stability.

16.
A formulation of the finite-difference time-domain (FDTD) equations as a system of linear equations in matrix form is used to develop a novel parallel algorithm that solves the variables of a 2-D FDTD simulation using less memory than required by the common FDTD algorithm, at the cost of some increase in the number of operations. To speed up the simulation of urban channels for personal wireless communications, the geometry of those channels and its relation to a numerical Green's function are used to increase the speed of this algorithm. Simulations are carried out in order to propose an FDTD-based model of urban microcells and their delay profiles, and some comparisons against models based on the uniform theory of diffraction are discussed. Finally, the propagation of high-bit-rate time division multiple access and code division multiple access signals in those channels is simulated, and the results allow us to evaluate the effect of multipath interference in high-speed wireless communications.

17.
Kurosh Madani 《Annals of Telecommunications》1993,48(11-12):537-545
The increase in integration density and in complexity of modern integrated circuits and systems has revealed the necessity of considering the testability problem at the design level of circuits. One of the most active research areas in circuit design over the past decade has been the implementation of neural networks as electronic VLSI chips. In particular, the implementation of artificial neural networks (ANN) as CMOS integrated circuits shows several attractive features. Recent studies point out that classification is their most successful application field, and thus large networks will be required. Unfortunately, very few papers analyse the testability of electronic implementations of artificial neural networks. A large number of artificial neural network models deal with binary output neurones. This paper presents and discusses a pseudo-analogue technique, based on global current measurement, for testing digital-output electronic neural networks. Two approaches are presented and their limitations are discussed. Simulation results and a method validation test circuit are presented.

18.
This paper presents a novel and fast scheme for signal denoising in the wavelet domain. It exploits the time scale structure of the wavelet coefficients by modeling them as superposition of simple atoms, whose spreading in the time scale plane formally is the solution of a couple of differential equations. In this paper, we will show how the numerical solution of such equations can be avoided leading to a speed up of the scale linking computation. This result is achieved through a suitable projection space of the wavelet local extrema, requiring just least squares and filtering operations. Intensive experimental results show the competitive performances of the proposed approach in terms of signal to noise ratio (SNR), visual quality and computing time.

19.
With the increasing number of processor cores available in modern computing architectures, task or data parallelism is required to maximally exploit the available hardware and achieve optimal processing speed. Current state-of-the-art data-parallel processing methods for decoding image and video bitstreams are limited in parallelism by dependencies introduced by the coding tools and the number of synchronization points introduced by these dependencies, only allowing task or coarse-grain data parallelism. In particular, entropy decoding and data prediction are bottleneck coding tools for parallel image and video decoding. We propose a new data-parallel processing scheme for block-based intra sample and coefficient prediction that allows fine-grain parallelism and is suitable for integration in current and future state-of-the-art image and video codecs. Our prediction scheme enables maximum concurrency, independent of slice or tile configuration, while minimizing synchronization points. This paper describes our data-parallel processing scheme for one- and two-dimensional prediction and investigates its application to block-based image and video codecs using JPEG XR and H.264/AVC Intra as a starting point. We show how our scheme enables faster decoding than the state-of-the-art wavefront method with speedup factors of up to 21.5 and 7.9 for JPEG XR and H.264/AVC Intra coding tools respectively. Using the H.264/AVC Intra coding tool, we discuss the requirements of the algorithm and the impact on decoded image quality when these requirements are not met. Finally, we discuss the impact on coding rate in order to allow for optimal parallel intra decoding.

20.
A new two-dimensional (2-D) finite-difference time domain (FDTD) method applied to scattering by infinite objects with oblique incidence is proposed. 2-D Maxwell's equations, differential equations, and perfectly matched layer (PML) absorbing boundary conditions (ABC) are derived. The incident wave, computed by the 1-D FDTD method, is set on the connecting boundary. The accuracy and the efficiency of the proposed method have been verified by comparing the results of the split-field periodic FDTD method, the sine-cosine method, and the transmission line theory method with the proposed method.
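
For readers unfamiliar with the 1-D FDTD building block mentioned in this abstract, the sketch below shows the basic leapfrog update of the electric and magnetic fields on a staggered grid in free space. It is a generic textbook scheme in normalized units, not the oblique-incidence formulation of the paper. The grid size, the Gaussian source parameters, the perfectly conducting end walls, and the function name `fdtd_1d` are all assumptions made for this example.

```python
import math


def fdtd_1d(n_cells=200, n_steps=100, source_cell=50, courant=1.0):
    """Leapfrog 1-D FDTD in free space, normalized units (hy scaled by the wave impedance).

    ez lives on integer grid points, hy on the half points between them.
    The end cells of ez are never updated, so both ends act as perfect conductors.
    """
    ez = [0.0] * n_cells
    hy = [0.0] * (n_cells - 1)
    for step in range(n_steps):
        # Update the magnetic field from the spatial difference of ez.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # Update the electric field from the spatial difference of hy.
        for k in range(1, n_cells - 1):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Soft source: add a Gaussian pulse in time at one cell (assumed parameters).
        ez[source_cell] += math.exp(-((step - 30) / 10.0) ** 2)
    return ez


ez = fdtd_1d()
peak_cell = max(range(len(ez)), key=lambda k: abs(ez[k]))
print(f"largest |ez| = {abs(ez[peak_cell]):.3f} at cell {peak_cell}")
```

Because the field can advance at most one cell per time step, cells more than `n_steps` away from the source stay exactly zero. In a total-field/scattered-field setup, an auxiliary 1-D grid of this kind is commonly used to supply the incident field on the connecting boundary.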
