
1.

Performance analysis of SSE and AVX instructions in multi-core CPUs and GPU computing on FDTD scheme for solid and fluid vibration problems

Jorge Francés Sergio Bleda Andrés Márquez Cristian Neipp Sergi Gallego Beatriz Otero Augusto Beléndez 《The Journal of supercomputing》2014,70(2):514-526

In this work, a unified treatment of solid and fluid vibration problems is developed by means of the Finite-Difference Time-Domain (FDTD) method. The proposed scheme takes advantage of a scaling factor in the velocity fields that improves the performance of the method and the vibration analysis in heterogeneous media. Moreover, the scheme has been extended to simulate both propagation in porous media and lossy solid materials. To reproduce the interaction of fluids and solids accurately in FDTD, both the time and spatial steps must be reduced compared with the setup used in acoustic FDTD problems. This implies the use of bigger grids and hence more time and memory resources. To reduce simulation time, the FDTD code has been adapted to exploit the resources available in modern parallel architectures. On CPUs, the implicit use of the Advanced Vector Extensions (AVX) on multi-core processors has been considered. In addition, the computation has been distributed across the available cores by means of OpenMP directives. Graphics Processing Units have also been considered, and the degree of improvement achieved with this parallel architecture has been compared with the highly tuned CPU scheme by means of the relative speedup. The parallel versions were up to 3 (AVX and OpenMP) and 40 (CUDA) times faster than the best sequential CPU version, which also uses OpenMP with auto-vectorization techniques but does not implicitly include vector instructions. Results obtained with both parallel approaches demonstrate that massively parallel programming techniques are mandatory in solid-vibration problems with FDTD.
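
The kind of update loop that such work vectorizes and parallelizes can be sketched as follows. This is a minimal illustration only, assuming a one-dimensional acoustic medium in normalised units with rigid walls; it is not the authors' solid/fluid scheme and omits their velocity scaling factor. The function name `fdtd_1d` and all parameter values are hypothetical.

```python
def fdtd_1d(n_cells=200, n_steps=300, courant=0.5):
    """Leapfrog FDTD for 1D acoustics on a staggered grid.

    Pressure p lives at cell centres, particle velocity v at cell faces.
    Units are normalised (rho = c = dx = 1), which is an assumption made
    for illustration only.
    """
    rho, c, dx = 1.0, 1.0, 1.0
    dt = courant * dx / c              # Courant number below 1 keeps the 1D scheme stable
    p = [0.0] * n_cells
    v = [0.0] * (n_cells + 1)          # v[0] and v[-1] are never updated: rigid walls
    p[n_cells // 2] = 1.0              # initial pressure pulse in the middle

    for _ in range(n_steps):
        # Velocity update from the pressure gradient (interior faces only).
        for i in range(1, n_cells):
            v[i] -= dt / (rho * dx) * (p[i] - p[i - 1])
        # Pressure update from the velocity divergence.
        for i in range(n_cells):
            p[i] -= rho * c * c * dt / dx * (v[i + 1] - v[i])
    return p, v
```

Each inner loop touches every grid point independently of the others, which is why loops of this shape are natural candidates for SIMD instructions, OpenMP threads, or one GPU thread per cell.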

2.

High performance data clustering: a comparative analysis of performance for GPU, RASC, MPI, and OpenMP implementations

Luobin Yang Steve C. Chiu Wei-Keng Liao Michael A. Thomas 《The Journal of supercomputing》2014,70(1):284-300

Compared to Beowulf clusters and shared-memory machines, GPUs and FPGAs are emerging alternative architectures that provide massive parallelism and great computational capabilities. These architectures can be used to run compute-intensive algorithms that analyze ever-growing datasets and provide scalability. In this paper, we present four implementations of the K-means data clustering algorithm for different high performance computing platforms: a CUDA implementation for GPUs, a Mitrion C implementation for FPGAs, an MPI implementation for Beowulf compute clusters, and an OpenMP implementation for shared-memory machines. Comparative analyses of the cost of each platform, the programming difficulty of each platform, and the performance of each implementation are presented.
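
For orientation, the K-means algorithm that all four implementations share can be sketched sequentially as below. This is plain Lloyd's iteration in Python, not any of the paper's CUDA, Mitrion C, MPI, or OpenMP codes; the function name `kmeans` and its `init` parameter are illustrative assumptions.

```python
import random


def kmeans(points, k, iters=100, init=None, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    `init` optionally fixes the starting centroids; otherwise k distinct
    points are sampled at random (an illustrative choice).
    """
    centroids = list(init) if init is not None else random.Random(seed).sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:   # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, clusters
```

The assignment step is independent for every point, which is the part that maps onto GPU threads, FPGA pipelines, MPI ranks, or OpenMP threads; the update step then needs a reduction across workers.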

3.

Chart parsing in Prolog

Neil K. Simpkins Peter Hancox 《New Generation Computing》1990,8(2):113-138

4.

Parallelizing Complex Streaming Applications on Distributed Scratchpad Memory Multicore Architecture

Shin-Kai Chen Cheng-Yu Hung Ching-Chih Chen Chih-Wei Liu 《International journal of parallel programming》2014,42(6):875-899

Multicore processors can provide sufficient computing power and flexibility for complex streaming applications, such as high-definition video processing. To reduce hardware complexity and power consumption, a distributed scratchpad memory architecture is considered instead of a cache memory architecture. However, the distributed design poses new programming challenges. It is difficult to exploit all available capabilities and achieve maximal throughput because of the combined complexity of inter-processor communication, synchronization, and workload balancing. In this study, we developed an efficient design flow for parallelizing multimedia applications on a distributed scratchpad memory multicore architecture. An application is first partitioned into streaming components and then mapped onto multicore processors. Various hardware-dependent factors and application-specific characteristics are involved in generating efficient task partitions and allocating resources appropriately. To test and verify the proposed design flow, three popular multimedia applications were implemented: a full-HD motion JPEG decoder, an object detector, and a full-HD H.264/AVC decoder. For demonstration purposes, the Sony PlayStation® 3 (PS3) was selected as the target platform. Simulation results show that, on the PS3, the full-HD motion JPEG decoder built with the proposed design flow can decode about 108.9 frames per second (fps) in the 1080p format. The object detection application can perform real-time object detection at 2.84 fps at \(1280 \times 960\) resolution, 11.75 fps at \(640 \times 480\) resolution, and 62.52 fps at \(320 \times 240\) resolution. The full-HD H.264/AVC decoder application can achieve nearly 50 fps.

5.

Parallel relaxed and extrapolated algorithms for computing PageRank

Josep Arnal Héctor Migallón Violeta Migallón Juan A. Palomino José Penadés 《The Journal of supercomputing》2014,70(2):637-648

In this paper, parallel Relaxed and Extrapolated algorithms based on the Power method for accelerating the PageRank computation are presented. Different parallel implementations of the Power method and the proposed variants are analyzed using different data distribution strategies. The reported experiments show the behavior and effectiveness of the designed algorithms for realistic test data using either OpenMP, MPI, or a hybrid OpenMP/MPI approach to exploit the benefits of shared memory inside the nodes of current SMP supercomputers.
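
As background, the Power method for PageRank can be sketched sequentially as below. The `omega` parameter blends the previous and new iterates as one simple form of relaxation; this is an assumption made for illustration and may differ from the relaxed and extrapolated schemes the authors propose. The function name and the dictionary-based graph format are also hypothetical.

```python
def pagerank(links, alpha=0.85, omega=1.0, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank with an optional relaxation factor.

    links : dict mapping each node to a list of nodes it links to
    alpha : damping factor
    omega : relaxation weight (1.0 gives the plain Power method);
            this blending is an illustrative assumption
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1.0 - alpha) / n + alpha * dangling / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            for v in outs:
                new[v] += alpha * rank[u] / len(outs)
        # Relaxation: weighted average of old and new iterate.
        nxt = {u: (1.0 - omega) * rank[u] + omega * new[u] for u in nodes}
        delta = sum(abs(nxt[u] - rank[u]) for u in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank
```

In a parallel version, the rows (or columns) of the link matrix would be distributed across threads or processes, with a reduction to assemble the new rank vector at each iteration.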

6.

Exploring aspects of cell intelligence with artificial reaction networks

Claire E. Gerrard John McCall George M. Coghill Christopher Macleod 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2014,18(10):1899-1912

The Artificial Reaction Network (ARN) is a connectionist representation inspired by Cell Signalling Networks and belongs to the branch of A-Life known as Artificial Chemistry. Its purpose is to represent chemical circuitry and to explore the computational properties responsible for generating the emergent high-level behaviour associated with cells. In this paper, the computational mechanisms involved in pattern recognition and spatio-temporal pattern generation are examined in robotic control tasks. The results show that the ARN has application in limbed robotic control and has computational functionality in common with Artificial Neural Networks. Like spiking neural models, the ARN can combine pattern recognition and complex temporal control functionality in a single network; however, it offers increased flexibility. Furthermore, the results illustrate parallels between emergent neural and cell intelligence.

7.

A heuristic method for solving integer-valued decompositional multiindex problems

L. G. Afraimovich 《Automation and Remote Control》2014,75(8):1357-1368

We consider NP-hard integer-valued multiindex problems of transportation type. We distinguish a subclass of polynomially solvable multiindex problems, namely multiindex problems with decomposition structure. We construct a general scheme for a heuristic method to solve a number of similar NP-hard decompositional multiindex problems. For one version of implementation for this scheme, we estimate its deviation from the optimum. We illustrate our results with the example of designing a class schedule.

8.

A Scalable Farm Skeleton for Hybrid Parallel and Distributed Programming

Steffen Ernsting Herbert Kuchen 《International journal of parallel programming》2014,42(6):968-987

Multi-core processors and clusters of multi-core processors are ubiquitous. They provide scalable performance, yet they introduce complex and low-level programming models for shared and distributed memory programming. Thus, fully exploiting the potential of shared and distributed memory parallelization can be a tedious and error-prone task: programmers must take care of low-level threading and communication (e.g. message passing) details. To assist programmers in developing performant and reliable parallel applications, Algorithmic Skeletons have been proposed. They encapsulate well-defined, frequently recurring parallel and distributed programming patterns, thus shielding programmers from low-level aspects of parallel and distributed programming. In this paper, we take on the design and implementation of the well-known Farm skeleton. To address the hybrid architecture of multi-core clusters, we present a two-tier implementation built on top of MPI and OpenMP. On the basis of three benchmark applications (a simple ray tracer, an interacting particles system, and an application for calculating the Mandelbrot set), we illustrate the advantages of both skeletal programming in general and this two-tier approach in particular.
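
The idea of a Farm skeleton can be illustrated with a small single-tier sketch: a pool of workers applies the same function to independent tasks, and the caller never handles threads directly. The version below uses Python's standard `concurrent.futures` thread pool rather than the paper's two-tier MPI/OpenMP implementation; `farm` and `mandelbrot_row` are hypothetical names, and the image bounds are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def farm(worker, tasks, n_workers=4):
    """Apply `worker` to every task using a pool of workers.

    Results come back in task order; the caller sees no threading details.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, tasks))


def mandelbrot_row(task):
    """Escape-time counts for one image row of the Mandelbrot set."""
    y, width, height, max_iter = task
    row = []
    for x in range(width):
        # Map the pixel to the region [-2, 1] x [-1, 1] of the complex plane.
        c = complex(-2.0 + 3.0 * x / width, -1.0 + 2.0 * y / height)
        z = 0j
        n = 0
        while n < max_iter and abs(z) <= 2.0:
            z = z * z + c
            n += 1
        row.append(n)
    return row
```

A call such as `farm(mandelbrot_row, [(y, 80, 40, 100) for y in range(40)])` computes the image row by row. Because the rows are independent, the same `farm` interface could in principle sit on top of processes or cluster nodes without changing the caller.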

9.

Fabrication of gold nanoparticle pattern using imprinted hydrogen silsesquioxane pattern for surface-enhanced Raman scattering

Yuji Kang Takao Fukuoka Ryo Takahashi Yuichi Utsumi Yuichi Haruyama Shinji Matsui 《Microsystem Technologies》2014,20(10-11):1993-2000

This is the first report of surface-enhanced Raman scattering (SERS) substrate fabrication using a combination of imprinted hydrogen silsesquioxane (HSQ: \(\mathrm{HSiO_{3/2}}\)) patterns and self-assembly of gold nanoparticles (AuNPs). To assemble the AuNPs inside the imprinted HSQ pattern, it is important to understand the interactions among AuNPs and those between AuNPs and HSQ. The authors investigated the effects of HSQ surface charges on the self-assembly of AuNPs. It was found that the negatively charged AuNPs were successfully assembled according to the geometry of the negatively charged HSQ pattern. In addition, a comparison with a substrate fabricated using an organic polymer showed that the SERS substrate fabricated from HSQ, an inorganic polymer, was suitable for organic chemical analysis.

10.

Logical diagnosis of LDL programs

Oded Shmueli Shalom Tsur 《New Generation Computing》1991,9(3-4):277-303

The debuggers of Ref. 11) and most of their derivatives are of the meta-interpreter type. The computation of the debugger tracks the computation of the program to be diagnosed at the level of procedure calls. This is adequate if the intuitive understanding of the programmer is in terms of procedure calls, as is indeed the case in Prolog. In LDL, however, while the semantics of the language are described in a bottom-up, fixpoint model of computation,⁸⁾ the actual execution of a program is a complex sequence of low-level procedure calls determined (and optimized) by the compiler. Consequently, a trace of these procedure calls is of little use to the programmer. Further, one cannot “execute” an LDL program as if it were a Prolog program; the program may simply not terminate in its Prolog reading, and several LDL constructs have no obvious Prolog counterparts. We identify the origin of a fault in the LDL program by a top-down, query/subquery approach. The basic debugger, implemented in Prolog, is a shell program between the programmer and the LDL program: it poses queries and uses the results to drive the interaction with the user. It closely resembles the one presented in Ref. 11). The core of a more sophisticated debugger is presented as well. Several concepts are introduced in order to quantify debugging abilities. One is that of a generated interpretation, in which the structureless intended interpretation of Ref. 11) is augmented with causality. Another is the (idealized) concept of a reliable oracle. We show that given an incorrect program and a reliable oracle which uses a generated interpretation, a cause for the fault will be found in finitely many steps. This result carries over to the more sophisticated debugger.

11.

A logic programming language based on the Andorra model

Seif Haridi 《New Generation Computing》1990,7(2-3):109-125

The Andorra model is a parallel execution model of logic programs which exploits the dependent and-parallelism and or-parallelism inherent in logic programming. We present a flat subset of a language based on the Andorra model, henceforth called Andorra Prolog, that is intended to subsume both Prolog and the committed choice languages. Flat Andorra, in addition to don’t know and don’t care nondeterminism, supports control of or-parallel split, synchronisation on variables, and selection of clauses. We show the operational semantics of the language and its applicability in the domain of committed choice languages. As an example of the expressiveness of the language, we describe a method for communication between objects by time-stamped messages, which is suitable for expressing distributed discrete event simulation applications. This method depends critically on the ability to express don’t know nondeterminism and thus cannot easily be expressed in a committed choice language.

12.

SAM meets MEMS: reliable fabrication of stable Au-patterns embedded in PDMS using dry peel-off process

Ikjoo Byun Anthony W. Coleman Beomjoon Kim 《Microsystem Technologies》2014,20(10-11):1783-1789

This paper describes a reliable method for fabrication of stable gold patterns embedded in polydimethylsiloxane (PDMS) using a direct peel-off process. Two different surface modifications with self-assembled monolayers were carried out for easy and reliable transfer of Au micro-patterns to the PDMS: (1) perfluorodecyltrichlorosilane on a Si substrate for easy release of the Au patterns from the Si substrate, and (2) (3-mercaptopropyl)trimethoxysilane on the Au patterns to promote the adhesion between the Au patterns and PDMS. Au features as small as 2 μm, in the shape of lines and dots, were successfully transferred from the Si substrate to the PDMS over a 3-inch wafer. Transfer of Au patterns to PDMS using the dry peel-off process did not cause the PDMS contamination typically seen in wet chemical methods. Finally, the stability of the Au patterns embedded in PDMS was confirmed by the Scotch-tape adhesion test.

13.

JackHare: a framework for SQL to NoSQL translation using MapReduce

Wu-Chun Chung Hung-Pin Lin Shih-Chang Chen Mon-Fong Jiang Yeh-Ching Chung 《Automated Software Engineering》2014,21(4):489-508

As data exploration has increased rapidly in recent years, datastores and data processing are receiving more and more attention for extracting important information. Finding a scalable solution for processing large-scale data is a critical issue in both relational database systems and the emerging NoSQL databases. With the inherent scalability and fault tolerance of Hadoop, MapReduce is attractive for processing massive data in parallel. Most previous research focuses on developing SQL or SQL-like query translators on top of the Hadoop distributed file system. However, it can be difficult to update data frequently in such a file system. Therefore, we need a flexible datastore such as HBase, not only to place the data over a scale-out storage system but also to manipulate changeable data in a transparent way. However, the HBase interface is not friendly enough for most users. A GUI composed of a SQL client application and a database connection to HBase will ease the learning curve. In this paper, we propose the JackHare framework, with a SQL query compiler, a JDBC driver, and a systematic method that uses the MapReduce framework for processing unstructured data in HBase. After importing the JDBC driver into a SQL client GUI, we can exploit HBase as the underlying datastore to execute ANSI-SQL queries. Experimental results show that our approaches perform well in terms of efficiency and scalability.

14.

The Aurora or-parallel Prolog system

Ewing Lusk Ralph Butler Terrence Disz Robert Olson Ross Overbeek Rick Stevens David H. D. Warren Alan Calderwood Péter Szeredi Seif Haridi Per Brand Mats Carlsson Andrzej Ciepielewski Bogumil Hausman 《New Generation Computing》1990,7(2-3):243-271

Aurora is a prototype or-parallel implementation of the full Prolog language for shared-memory multiprocessors, developed as part of an informal research collaboration known as the “Gigalips Project”. It currently runs on Sequent and Encore machines. It has been constructed by adapting Sicstus Prolog, a fast, portable, sequential Prolog system. The techniques for constructing a portable multiprocessor version follow those pioneered in a predecessor system, ANL-WAM. The SRI model was adopted as the means to extend the Sicstus Prolog engine for or-parallel operation. We describe the design and main implementation features of the current Aurora system, and present some experimental results. For a range of benchmarks, Aurora on a 20-processor Sequent Symmetry is 4 to 7 times faster than Quintus Prolog on a Sun 3/75. Good performance is also reported on some large-scale Prolog applications.

15.

High performance Prolog on a RISC

Andrew Taylor 《New Generation Computing》1991,9(3-4):221-232

This paper presents some benchmark timings from an optimising Prolog compiler using global analysis for a RISC workstation, the MIPS R2030. These results are extremely promising. For example, the infamous naive reverse benchmark runs at 2 mega LIPS. We compare these timings with those for other Prolog implementations running on the same workstation and with published timings for the KCM, a recent piece of special purpose Prolog hardware. The comparison suggests that global analysis is a fruitful source of information for an optimising Prolog compiler and that the performance of special purpose Prolog hardware can be at least matched by the code from a compiler using such information. We include some analysis of the sources of the improvement global analysis yields. An overview of the compiler is given and some implementation issues are discussed. This paper is an extended version of Ref. 15).
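
To put the 2 mega LIPS figure in context, the number of logical inferences in a naive reverse run can be counted directly. The sketch below assumes the conventional two-clause `nrev/2` and `append/3` definitions and the customary 30-element input list; the abstract states neither, so both are assumptions. The function names are hypothetical.

```python
def nrev_inferences(n):
    """Logical inferences needed to naive-reverse an n-element list.

    Assumes the textbook definition: nrev([], []) is one inference, and
    nrev([H|T], R) costs one inference plus the reversal of T plus an
    append/3 call on an (n-1)-element list, which takes n inferences.
    """
    if n == 0:
        return 1
    return 1 + nrev_inferences(n - 1) + n


def seconds_per_run(n, lips):
    """Time for one naive reverse run at a given LIPS rate."""
    return nrev_inferences(n) / lips
```

Under these assumptions, a 30-element list costs `nrev_inferences(30)`, which is 496 inferences, so a system running at 2 million LIPS completes one reversal in about 248 microseconds.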

16.

Replication of sub-100 nm structures using h- and s-PDMS composite stamps

Christoph Huelsen Juergen Probst Bernd Loechel 《Microsystem Technologies》2014,20(10-11):2001-2004

Soft-UV-NIL was used as a replication technique to replicate sub-100 nm structures. The aim of this work is the stamp production and the replication of structures with dimensions smaller than 100 nm in a simple manner. Composite stamps composed of two layers, a thin hard PDMS layer supported by a thick soft PDMS (s-PDMS) layer, are compared to common s-PDMS stamps with regard to resolution, using a Siemens star (star burst pattern) as the test structure. The master is fabricated by electron beam lithography in a 140 nm thick PMMA resist layer. The stamp is molded directly from the structured resist, without any additional anti-sticking treatment. Therefore, the resist thickness determines the aspect ratio, which is 1.5 at the resolution limit. The replication is done in a UV-curing cycloaliphatic epoxy material. The employed test structure provides good comparability, shows the resolution limit at a glance, and integrates a smooth transition from micro- to nanostructures. It is therefore a suitable structure for characterizing UV-NIL.

17.

Syringe-assisted point-of-care micropumping utilizing the gas permeability of polydimethylsiloxane

Linfeng Xu Hun Lee Kwang W. Oh 《Microfluidics and nanofluidics》2014,17(4):745-750

By utilizing the high gas permeability of polydimethylsiloxane (PDMS), a simple syringe-assisted pumping method was introduced. A dead-end microfluidic channel was partially surrounded by an embedded microchamber, with a thin PDMS wall separating the dead-end channel from the embedded microchamber. A syringe was connected to the microchamber port by a short tube, and the syringe plunger was manually pulled out to generate low pressure inside the microchamber. When sample liquid was loaded in the inlet port, air trapped in the dead-end channel diffused into the surrounding microchamber through the PDMS wall, creating an instantaneous pumping of the liquid inside the dead-end channel. By only pulling the syringe manually, a constant low flow with a rate ranging from 0.089 to 4 nl/s was realized as a function of two key parameters: the PDMS wall thickness and the overlap area between the dead-end channel and the surrounding microchamber. This method enabled point-of-care pumping without pre-evacuating the PDMS devices in a bulky vacuum chamber.

18.

Compiling OR-parallelism into AND-parallelism

Michael Codish Ehud Shapiro 《New Generation Computing》1987,5(1):45-61

This paper suggests a general method for compiling OR-parallelism into AND-parallelism. An interpreter for an AND/OR-parallel language written in the AND-parallel subset of the language induces a source-to-source transformation from the full language into the AND-parallel subset. This transformation can be identified and implemented as a special purpose compiler or applied using a general purpose partial evaluator. The method is demonstrated to compile a variant of Concurrent Prolog into an AND-parallel subset of the language called Flat Concurrent Prolog (FCP). It is also shown applicable to the compilation of OR-parallel Prolog to FCP. The transformation identified is simple and efficient. The performance of the method is discussed in the context of programming examples. These compare well with conventionally compiled Prolog programs.

19.

Visually continuous quartics and quintics

D. Lasser 《Computing》1990,45(2):119-129

We present a Bézier representation of visually continuous quartics and quintics. Explicit formulas are given for the conversion between the Bézier representation and a Hermite-like representation defined by the continuity conditions. Positivity conditions which ensure properties such as the convex hull and variation diminishing properties are given.
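
As a reminder of what a Bézier representation provides, the sketch below evaluates a Bézier curve of any degree (a quartic has five control points, a quintic six) with de Casteljau's algorithm. It does not implement the paper's visual continuity conditions or its Hermite-like conversion; the function name and the sample control points are hypothetical.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bézier curve at parameter t in [0, 1].

    control_points : sequence of equal-length coordinate tuples
                     (five points for a quartic, six for a quintic)
    """
    pts = [tuple(float(c) for c in p) for p in control_points]
    # Repeated linear interpolation between neighbouring points.
    while len(pts) > 1:
        pts = [
            tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]
```

Because every step is a convex combination for t in [0, 1], the evaluated point always lies in the convex hull of the control points, which is the convex hull property mentioned in the abstract. For example, with the quartic control polygon `[(0, 0), (1, 2), (2, 3), (3, 2), (4, 0)]`, the curve starts at `(0, 0)`, ends at `(4, 0)`, and passes through `(2.0, 2.125)` at t = 0.5.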

20.

Orthogonal and fine lithographic structures attained from the next generation proton beam writing facility

Y. Yao P. Santhana Raman J. A. van Kan 《Microsystem Technologies》2014,20(10-11):2065-2069

A second generation proton beam writing (PBW) system has been built at the Centre for Ion Beam Applications at the National University of Singapore for fabrication of high aspect ratio 3D nano lithographic structures. System improvements and a few lithographic structures obtained with this facility are presented in this paper. Through accurate alignment of the magnetic quadrupole lenses and the electrostatic scanning system, orthogonal beam scanning has been achieved. The earlier constraint of a limited beam scan area has been overcome by adopting a combination of beam and stage scanning as well as stitching. With these improvements, the smallest Ni structure to date, 65 nm in width, has been fabricated using nickel electroplating on a proton beam written PMMA sample in the second generation PBW facility. Using this improved PBW facility, we have also demonstrated the fabrication of fine lithographic patterns with 19 nm line width and 60 nm spacing in 100 nm thick negative high resolution hydrogen silsesquioxane resist. Possible future system improvements leading to finer resolution are discussed briefly.