In molecular dynamics simulation, an important step is building a neighbor list for each particle, which requires computing the distance for every particle pair in the simulation space. This distance calculation incurs costly floating-point operations. In this paper, we propose a novel algorithm, called Fast Neighbor List, which builds the neighbor lists mainly with bitwise operations. First, we design a data layout that uses a single integer value to represent the three-dimensional coordinates of a particle. Then, a series of bitwise operations and two subtraction operations are used to judge whether the distance between a pair of particles is within the cutoff radius. We demonstrate that our algorithm handles periodic boundaries seamlessly. We also use single instruction multiple data (SIMD) instructions to further improve performance. We implement our algorithm on Intel Xeon E5-2670, ARM v8, and Sunway many-core processors. Compared with the traditional method, our algorithm achieves on average a 1.79x speedup on the Intel Xeon E5-2670 processor, a 3.43x speedup on the ARM v8 processor, and a 4.03x speedup on the Sunway many-core processor. With SIMD instructions, our algorithm achieves on average a 2.64x speedup and a 14.43x speedup on the Intel Xeon E5-2670 and ARM v8 processors, respectively.
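The abstract does not give the actual bit layout or the exact cutoff test, so the following is only a minimal sketch of the general idea, under assumptions that are not from the paper: each axis is quantized into 2^BITS cells, the three cell indices are packed into one integer, and a mask makes differences wrap across the periodic boundary. The names `BITS`, `BOX`, `pack_position`, and `maybe_neighbors` are hypothetical. The test shown is a coarse integer prefilter (cells adjacent on every axis), which is a necessary condition for two particles to lie within a cutoff no larger than one cell width; it is not the paper's distance test.

```python
# Hypothetical layout, not the paper's: 4 bits per axis, 16 cells per axis.
BITS = 4
MASK = (1 << BITS) - 1   # 0b1111, also the largest cell index on an axis
BOX = 16.0               # periodic box length, so each cell is 1.0 wide


def pack_position(pos):
    """Pack the (x, y, z) cell indices of a position into one integer."""
    key = 0
    for axis, x in enumerate(pos):
        cell = int(x / BOX * (1 << BITS)) & MASK   # quantize, wrap into range
        key |= cell << (axis * BITS)
    return key


def maybe_neighbors(key_a, key_b):
    """Return True if the two packed cells are adjacent on every axis.

    Masking the difference makes it wrap modulo 16, so cell 15 and
    cell 0 count as adjacent (periodic boundary). Assumes the cutoff
    is at most one cell width.
    """
    for axis in range(3):
        shift = axis * BITS
        a = (key_a >> shift) & MASK
        b = (key_b >> shift) & MASK
        d = (a - b) & MASK            # wrapped difference in 0..15
        if d > 1 and d != MASK:       # neither 0, +1, nor -1 (mod 16)
            return False
    return True
```

In this sketch, pairs that pass `maybe_neighbors` would still need an exact distance check; the integer test only discards pairs that cannot be within the cutoff.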
Researchers at Oak Ridge National Laboratory have developed an application code for calculating the electronic properties and energetics of disordered materials. The same source code has been compiled and run on workstations, Crays, and the Intel iPSC/860. This electronic structures code is capable of running at over 2 gigaflops on both an 8-processor CRAY Y-MP and a 128-processor Intel iPSC/860. Using this new KKR-CPA code, we executed density-of-state computations of a perovskite superconductor at a rate of 2527 megaflops on the Intel iPSC/860. This corresponds to a price/performance rate of 842 megaflops per $1 million based on the list price of this computer. Similar but smaller computations done on a network of ten IBM RS/6000 workstations executed at a price/performance rate of 1.3 gigaflops per $1 million. This research was supported by the Applied Mathematical Sciences Research Program, Office of Energy Research, and the Division of Materials Sciences, U.S. Department of Energy, under contract DE-AC05-84OR21400 with Martin Marietta Energy Systems, Inc.
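The price/performance figures above can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted in the abstract; the implied list price is a derived value, not one stated in the source, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
ipsc_mflops = 2527.0              # sustained rate on the Intel iPSC/860
ipsc_mflops_per_musd = 842.0      # megaflops per $1 million (iPSC/860)
rs6000_mflops_per_musd = 1300.0   # 1.3 gigaflops per $1 million (RS/6000 network)

# Derived values (not stated in the source).
implied_price_musd = ipsc_mflops / ipsc_mflops_per_musd
workstation_advantage = rs6000_mflops_per_musd / ipsc_mflops_per_musd

print(f"implied iPSC/860 list price: about ${implied_price_musd:.2f} million")
print(f"workstation price/performance advantage: about {workstation_advantage:.2f}x")
```

On these figures the workstation network delivers roughly one and a half times the price/performance of the iPSC/860, although on smaller computations.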
This paper investigates the context for the development of one of the earliest microprocessors, the Intel 4004. It considers the contributions made by Intel employees, most notably Marcian E. “Ted” Hoff, Jr. and Federico Faggin, and the contributions of other people who are not generally known, most notably Tadashi Sasaki and Masatoshi Shima. The paper presents a case study of how corporate and national cultures affect technological development and of the many aspects of invention, including conceptualization, logical design, engineering, fabrication, capitalization, and marketing.
Demand for mobile video applications is growing today in wireless handheld platforms. Optimizing instruction set architectures
and employing SIMD techniques is a logical approach towards attaining higher performance in mobile multimedia applications.
Intel® Wireless MMX™ technology has been designed to accelerate mobile multimedia and applications processing in a power efficient
manner. This paper provides an overview of Intel® Wireless MMX™ technology, a 64-bit Single Instruction Multiple Data (SIMD)
coprocessor for the Intel® XScale® microarchitecture, and the key features of the architecture that specifically enhance
multi-media performance. Tools and techniques for optimization are also described.
Nigel C. Paver has 13 years of experience with the ARM architecture. In the Intel PCA Components group in Austin, Texas, he is responsible
for the architecture and implementation of multimedia coprocessors for the Intel XScale micro-architecture. He is also involved
in product architecture and definition of Intel PCA processors. Before Intel, Nigel was one of the lead designers of the early
AMULET asynchronous ARM microprocessors at the University of Manchester. He was also vice president in a startup company which
used asynchronous design techniques to produce a low-power asynchronous DSP core. Nigel holds a Master of Science degree and
Ph.D. in computer science from the University of Manchester and a Bachelor of Science degree in electronics from UMIST.
Moinul Khan is a multimedia product architect in the Intel Corporation PCA Components group. He is responsible for PCA graphics and security
architecture. His research interests are virtual prototyping, signal processing algorithms and architecture and communications
networking. Before joining Intel he was a technology specialist and founding member of a startup at ATDC, Georgia. He worked
on his doctoral research at Georgia Center for Advanced Telecommunications Technology at Georgia Institute of Technology.
He received his B.Tech from the Indian Institute of Technology and his MSEE from Georgia Tech. He also worked as a research member
for Canadian Institute for Telecommunications Research and Bell Communications Laboratories.
Bradley C. Aldrich joined Intel in 1997 where he is currently an architect within the PCA Components Group. His current work includes the development
of coprocessor instruction support in addition to image capture and display technologies for XScale based application processors.
He was previously a member of the Intel/Analog Devices joint development architecture team responsible for video enhancements
for the Micro Signal Architecture. Prior to that he was a video system architect in Intel's Digital Imaging and Video Division
working on CMOS sensors, still cameras, and tethered PC based video peripherals. He has also worked as a device engineer for
Motorola and as a test engineer for Tektronix. He received a BSEE in 1988 and MSEE in 1994 from the University of Texas at
San Antonio.
With the advent of 3G mobile telephony, the demands placed on the memory and processing capabilities of SIM cards have increased dramatically. In order to cope with these demands, and to provide significantly enhanced capabilities, almost all the main players have now launched 128K SIM cards. This is a short news story only. Visit www.compseconline.com for the latest computer security news.
The HPC Challenge (HPCC) Benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of the processor, memory subsystem, and interconnect fabric of five leading supercomputers: SGI Altix BX2, Cray X1, Cray Opteron Cluster, Dell Xeon Cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC Benchmarks is run on each of these systems. Additionally, we present Intel MPI Benchmark results to study the performance of 11 MPI communication functions on these systems.
We present an overview of the latest developments in the detection of metamorphic and virtualization-based malware using an
algebraic specification of the Intel 64 assembly programming language. After giving an overview of related work, we describe
the development of a specification of a subset of the Intel 64 instruction set in Maude, an advanced formal algebraic specification
tool. We develop the technique of metamorphic malware detection based on equivalence-in-context so that it is applicable to
imperative programming languages in general, and we give two detailed examples of how this might be used in a practical setting
to detect metamorphic malware. We discuss the application of these techniques within anti-virus software, and give a proof-of-concept
system for defeating detection counter-measures used by virtualization-based malware, which is based on our Maude specification
of Intel 64. Finally, we compare formal and informal approaches to malware detection, and give some directions for future
research.
This paper concerns an Intel Xeon Phi implementation of the explicit fourth-order Runge–Kutta method (RK4) for very sparse matrices with very short rows. Such matrices arise in Markovian modeling of computer and telecommunication networks. In this work, an implementation based on Intel Math Kernel Library (Intel MKL) routines and the authors’ own implementation, both using the CSR storage scheme and running on Intel Xeon Phi, were investigated. The implementation based on Intel MKL uses the high-performance BLAS and Sparse BLAS routines. In our application we focus on OpenMP-style programming. We implement the SpMV operation and vector addition using basic optimization techniques and vectorization. We evaluate our approach in native and offload modes for various numbers of cores and thread allocation affinities. Both implementations (based on Intel MKL and made by the authors) were compared with respect to time, speedup, and performance. The numerical experiments on Intel Xeon Phi show that the performance of the authors’ implementation is very promising: it gives a gain of up to two times compared to the multithreaded implementation (based on Intel MKL) running on a CPU (Intel Xeon processor), and up to three times compared with the application that uses Intel MKL on Intel Xeon Phi.
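The building blocks named above (CSR storage, SpMV, vector addition, and an RK4 step) can be illustrated with a small serial sketch. This is not the authors' implementation and uses neither OpenMP, vectorization, nor Intel MKL; it assumes the system being integrated has the linear form dy/dt = A y, and the function names `spmv_csr`, `axpy`, and `rk4_step` are hypothetical.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A x with A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def axpy(a, x, y):
    """Vector addition a*x + y, returned as a new list."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def rk4_step(csr, y, h):
    """One explicit RK4 step of size h for dy/dt = A y (A given as CSR)."""
    values, col_idx, row_ptr = csr
    k1 = spmv_csr(values, col_idx, row_ptr, y)
    k2 = spmv_csr(values, col_idx, row_ptr, axpy(h / 2, k1, y))
    k3 = spmv_csr(values, col_idx, row_ptr, axpy(h / 2, k2, y))
    k4 = spmv_csr(values, col_idx, row_ptr, axpy(h, k3, y))
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


# Illustrative 2-state example: A = [[-1, 2], [1, -2]] (columns sum to zero).
csr = ([-1.0, 2.0, 1.0, -2.0], [0, 1, 0, 1], [0, 2, 4])
pi = [1.0, 0.0]
for _ in range(1000):          # integrate to t = 10 with h = 0.01
    pi = rk4_step(csr, pi, 0.01)
```

In this toy example the vector `pi` approaches the stationary distribution (2/3, 1/3). Each RK4 step costs four SpMV operations plus a handful of vector additions, which is why those two kernels are the ones the abstract singles out for optimization.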