1.
The reverse mode of automatic differentiation executes the adjoint statements induced by each statement in the original program in the reverse order of the original program flow. This program flow reversal commonly requires storage of information on the control flow of the original program. In addition, intermediate values of variables that are overwritten have to be recorded, as these values may later be needed to compute the partial derivatives of the corresponding statement. The stored information will be accessed in reverse order of being written. This runs contrary to many assumptions made in standard implementations of file systems, operating systems, and input/output (I/O) libraries. A common buffering strategy aimed at speeding up future read requests is to employ read‐ahead. This strategy is useful for accesses in forward direction but is considered to be harmful to the performance of the reverse mode. To increase the performance of the reverse mode, it is also advantageous to interleave computations with the data storage and retrieval operations, which can be achieved using multithreading. To this end, we design and implement a novel software called reverse‐mode I/O stream (RIOS) that is adapted to these particular requirements of the reverse mode. We show the advantages of RIOS in two empirical case studies, an artificially constructed example of a typical I/O pattern in the reverse mode and a real‐world example arising from fluid mechanics, which is studied in Fortran90 and in Matlab where the reverse mode is generated via the automatic differentiation tools Tapenade and ADiMat, respectively. Copyright © 2014 John Wiley & Sons, Ltd.
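The write-forward, read-backward access pattern described in this abstract can be illustrated with a small in-memory sketch. This is not RIOS: the `ReverseTape` class and the `forward_and_reverse` function are hypothetical names introduced only for illustration, and the "program" being differentiated is simply a repeated `sin`, chosen as an assumption because each step overwrites its input.

```python
import math


class ReverseTape:
    """In-memory LIFO store: pushed during the forward sweep, popped in reverse."""

    def __init__(self):
        self._values = []

    def push(self, value):
        self._values.append(value)

    def pop(self):
        return self._values.pop()


def forward_and_reverse(x, steps):
    """Return y = sin(sin(...sin(x))) and dy/dx from a hand-written reverse sweep."""
    tape = ReverseTape()
    v = x
    for _ in range(steps):
        tape.push(v)        # v is about to be overwritten, so record it
        v = math.sin(v)
    y = v

    adjoint = 1.0           # seed: dy/dy
    for _ in range(steps):
        v_prev = tape.pop()             # values come back in reverse order
        adjoint *= math.cos(v_prev)     # d sin(v)/dv = cos(v)
    return y, adjoint
```

In a file-backed setting the same pattern would turn into sequential writes followed by reads that walk backwards through the file, which is the case where read-ahead buffering does not help.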
2.
An efficient numerical method for computing permanental polynomials of graphs is proposed. It adapts multi-entry expansion of FFT, and is parallel in nature. It is applied to fullerene-type graphs, and works for C56, while the largest fullerene computed before is C40. Extensive numerical computations show that the algorithm is fast and stable.
3.
《Computers & chemistry》1986,10(2):153-161
We present a package of FORTRAN modules to perform general and efficient I/O operations in external sort, bin sort and related environments. A partition of a user logical record of length IRLU into NREC records of length IRL is carried out when IRLU exceeds the maximum permissible record length for a given computer and file organization. Efficient random direct access to a file requires that IRL be a multiple of the smallest addressable unit on a disk. Overcoming implied DO lists in I/O statements becomes significant in tight memory environments. Corresponding gains in execution times are discussed.
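The record partition mentioned above amounts to a ceiling division: NREC is the smallest integer with NREC × IRL ≥ IRLU. The sketch below shows that arithmetic in Python rather than FORTRAN. The function name `split_record` and the choice to pad the last physical record with zero bytes are assumptions made for illustration, not details taken from the package.

```python
def split_record(record, irl):
    """Split a logical record (bytes) into fixed-length physical records.

    irl is the physical record length. The last record is padded with
    zero bytes (an assumption) so that every record has length irl.
    """
    if irl <= 0:
        raise ValueError("irl must be positive")
    irlu = len(record)
    nrec = -(-irlu // irl)                        # ceiling division
    padded = record.ljust(nrec * irl, b"\x00")    # pad up to nrec * irl bytes
    return [padded[i * irl:(i + 1) * irl] for i in range(nrec)]
```

For example, a 10-byte logical record with `irl = 4` yields three physical records, the last of which carries two padding bytes.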
4.
Steve C. Chiu 《The Journal of supercomputing》2008,46(2):105-107
The abundance of parallel and distributed computing platforms, such as MPP, SMP, and the Beowulf clusters, to name just a
few, has added many more possibilities and challenges to high performance computing (HPC), parallel I/O, mass data storage,
scalable architectures, and large-scale simulations, which traditionally belong to the realm of custom-tailored parallel systems.
The intent of this special issue is to discuss problems and solutions, to identify new issues, and to help shape future research
directions in these areas. From these perspectives, this special issue addresses the problems encountered at the hardware,
architectural, and application levels, while providing conceptual as well as empirical treatments to the current issues in
high performance computing, and the I/O architectures and systems utilized therein.
5.
Virtualization is a key technology to enable cloud computing. The driver-domain-based model for network virtualization offers isolation and a high degree of flexibility. However, it suffers from poor performance and lacks scalability. In this paper, we evaluate the networking performance of virtual machines within Xen. The I/O channel transferring packets between the driver domain and the virtual machines is shown to be the bottleneck. To overcome this limitation, we propose a packet-aggregation-based mechanism to transfer packets from the driver domain to the virtual machines. Packet aggregation, combined with an efficient core allocation, allows virtual machine throughput to scale up by 700%, while minimizing both memory and CPU consumption. In addition, the impact of aggregation on packet delay and jitter remains acceptable. Hence, the proposed I/O virtualization model meets the needs of infrastructure providers offering cloud computing services.
6.
7.
Hakan Ferhatosmanoglu Aravind Ramachandran Divyakant Agrawal Amr El Abbadi 《Information Systems》2007
In this paper, we propose data space mapping techniques for storage and retrieval in multi-dimensional databases on multi-disk architectures. We identify the important factors for an efficient multi-disk searching of multi-dimensional data and develop secondary storage organization and retrieval techniques that directly address these factors. We especially focus on high dimensional data, where none of the current approaches are effective. In contrast to the current declustering techniques, storage techniques in this paper consider both inter- and intra-disk organization of the data. The data space is first partitioned into buckets, then the buckets are declustered to multiple disks while they are clustered in each disk. The queries are executed through bucket identification techniques that locate the pages. One of the partitioning techniques we discuss is especially practical for high dimensional data, and our disk and page allocation techniques are optimal with respect to number of I/O accesses and seek times. We provide experimental results that support our claims on two real high dimensional datasets.
8.
《Information Systems》2002,27(1):41-74
Most algorithms for association rule mining are variants of the basic Apriori algorithm (Agrawal and Srikant, Fast algorithms for mining association rules in databases, in: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB’94), Santiago, Chile, 1994, pp. 487–499). One characteristic of these Apriori-based algorithms is that candidate itemsets are generated in rounds, with the size of the itemsets incremented by one per round. The number of database scans required by Apriori-based algorithms thus depends on the size of the biggest frequent itemsets. In this paper, we devise a more general candidate set generation algorithm, LGen, which generates candidate itemsets of multiple sizes during each database scan. We present an algorithm FindLarge which uses LGen to find frequent itemsets. We show that, given a reasonable set of suggested frequent itemsets, FindLarge can significantly reduce the number of I/O passes required. In the best cases, only two passes are sufficient to discover all the frequent itemsets irrespective of the size of the biggest ones. Two I/O-saving algorithms, namely DIC and Pincer-Search, are compared with FindLarge in a series of experiments. We discuss the conditions under which FindLarge significantly outperforms the others in terms of I/O efficiency.
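To make the "one database scan per itemset size" behaviour concrete, here is a minimal sketch of the basic level-wise Apriori scheme that the abstract uses as its baseline. It is not LGen or FindLarge. The function name `apriori`, the in-memory list of transactions, and the `scans` counter are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Basic level-wise Apriori.

    Returns a dict mapping each frequent itemset to its support count,
    and the number of passes made over the transaction list.
    """
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    scans = 0
    k = 1
    while candidates:
        scans += 1                      # one pass over the data per itemset size
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c for c, n in counts.items() if n >= min_support}
        frequent.update({c: counts[c] for c in level})
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans
```

Because candidates of size k are only formed after the frequent itemsets of size k-1 are known, the number of passes grows with the size of the largest candidate itemset, which is the cost that multi-size candidate generation aims to reduce.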
9.
An efficient I/O subsystem enables cost-effective network processing. To improve high-speed data transfer, the I/O subsystem sends data directly into the processing core's register file. An implementation of this subsystem in a single-chip network processor, the Pro³, can sustain advanced inspection firewall processing of 2.5-Gbps TCP traffic.
10.
Emerging non-volatile memory technologies, especially flash-based solid state drives (SSDs), have increasingly been adopted in the storage stack. They provide numerous advantages over traditional mechanically rotating hard disk drives (HDDs) and tend to replace them. Because HDDs have long been the primary building blocks of storage systems, however, much of the system software has been designed specifically for HDDs and may not be optimal for non-volatile memory media. Therefore, to fully exploit the superior raw performance of these media, the existing upper-layer software has to be re-evaluated or re-designed. To this end, in this paper, we propose PASS, an optimized I/O scheduler at the Linux block layer that accommodates the shift of underlying storage devices toward flash-based SSDs. PASS takes the rich internal parallelism of SSDs into account when dispatching requests to the device driver in order to achieve high performance. Specifically, it partitions the logical storage space into fixed-size regions (preferably the component package sizes) as scheduling units. These scheduling units are serviced in a round-robin manner, and on each turn the chosen unit issues only a batch of either read or write requests, to suppress excessive mutual interference. Additionally, the requests are sorted by address while waiting in the dispatching queues to exploit the high sequential performance of SSDs. Experimental results with a variety of workloads show that PASS outperforms the four off-the-shelf Linux I/O schedulers by 3% to 41%, while also significantly improving device lifetime by reducing internal write amplification.
11.
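Parallel I/O of sampled data limits the running efficiency of some parallel applications. We design and implement an aggregated parallel I/O method for sampled data. The method deploys a sampled-data cache on the client side, then merges the data onto output processes, and finally stores it to file. To keep the stored sampled data consistent over long runs of a parallel program, the method monitors the application's run state within the JASMIN framework and flushes or restores data when the program undergoes load balancing or a restart. During I/O, it further uses HDF5 chunked I/O to improve read and write efficiency for column-stored data. Tests show that the new method not only scales well but also improves the parallel I/O efficiency of sampled data by a factor of more than 7.5 in parallel applications with complex features such as load balancing and restart.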
12.
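The Web environment currently contains a large number of Web services and Web service requests. Semantics-based Web service matching can improve the accuracy of Web service discovery, but its complex semantic computation makes the system slow to respond. First, the semantic Web service process is analyzed, establishing that most of the semantic computation is concentrated in the input/output (I/O) matching stage. Then, building on a study of existing I/O matching algorithms and an analysis of the main factors that affect semantic similarity, an optimized I/O matching method for semantic Web services with an efficient index is presented; it comprises the construction of the efficient index and a heuristic filtering mechanism based on hashing with quadratic-probing rehashing. Finally, an example demonstrates that the method is practical. By filtering out irrelevant Web services, the method reduces the amount of semantic computation and improves system response speed, which in turn gives a better user experience.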
13.
Computing clusters (CC) consisting of several connected machines could provide a high-performance, multiuser, timesharing environment for executing parallel and sequential jobs. In order to achieve good performance in such an environment, it is necessary to assign processes to machines in a manner that ensures efficient allocation of resources among the jobs. The paper presents opportunity cost algorithms for online assignment of jobs to machines in a CC. These algorithms are designed to improve the overall CPU utilization of the cluster and to reduce the I/O and the interprocess communication (IPC) overhead. Our approach is based on known theoretical results on competitive algorithms. The main contribution of the paper is how to adapt this theory into working algorithms that can assign jobs to machines in a manner that guarantees near-optimal utilization of the CPU resource for jobs that perform I/O and IPC operations. The developed algorithms are easy to implement. We tested the algorithms by means of simulations and executions in a real system and show that they outperform existing methods for process allocation that are based on ad hoc heuristics.
14.
15.
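When automation equipment is upgraded, replacing the I/O connections is usually regarded as the most basic kind of upgrade. An I/O system inside the equipment can reduce the inconvenience that replacing I/O connections causes between the manufacturing process and the control system. Moreover, in some cases the I/O system can even serve as the control system.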
16.
17.
Consider a probabilistic graph G in which the edges of E(G) are perfectly reliable, but the vertices of V(G) may fail with some known probabilities. Given a subset K of V(G), the K-terminal residual reliability of G is the probability that all operational vertices in K are connected to each other. This problem can be considered to be a generalization of two well-known reliability problems – the K-terminal reliability problem and the residual connectedness reliability problem.
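The definition above can be checked on small graphs by brute-force enumeration of all vertex failure states. The sketch below does exactly that and is exponential in the number of vertices, so it is only a way to make the definition concrete, not an algorithm from the paper. The function name, the `p_up` mapping of per-vertex operating probabilities, independent failures, and the convention that a state with no operational vertex in K counts as a failure are all assumptions.

```python
from itertools import product


def k_terminal_residual_reliability(vertices, edges, K, p_up):
    """Probability that all operational vertices of K are mutually connected.

    Edges never fail; vertex v operates independently with probability p_up[v].
    Assumption: a state with no operational vertex in K counts as a failure.
    """
    vertices = list(vertices)
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)

    total = 0.0
    for states in product([True, False], repeat=len(vertices)):
        up = {v for v, s in zip(vertices, states) if s}
        prob = 1.0
        for v, s in zip(vertices, states):
            prob *= p_up[v] if s else 1.0 - p_up[v]

        targets = up & set(K)
        if not targets:
            continue
        # Depth-first search restricted to operational vertices.
        start = next(iter(targets))
        seen = {start}
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w in up and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if targets <= seen:
            total += prob
    return total
```

On the path a-b-c with K = {a, c} and every vertex operating with probability 0.5, five of the eight equally likely states succeed under these conventions, giving 0.625.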
18.
Seung Woo Son Saba Sehrish Wei-keng Liao Ron Oldfield Alok Choudhary 《The Journal of supercomputing》2017,73(5):2069-2097
In petascale systems with a million CPU cores, scalable and consistent I/O performance is becoming increasingly difficult to sustain, mainly because of I/O variability. The I/O variability is caused by concurrently running processes/jobs competing for I/O or by a RAID rebuild when a disk drive fails. We present a mechanism that stripes across a selected subset of I/O nodes with the lightest workload at runtime to achieve the highest I/O bandwidth available in the system. In this paper, we propose a probing mechanism to enable application-level dynamic file striping to mitigate I/O variability. We implement the proposed mechanism in a high-level I/O library that enables memory-to-file data layout transformation and allows transparent file partitioning using subfiling. Subfiling is a technique that partitions data into a set of smaller files and manages access to them, so that users see the data as a single, normal file. We demonstrate that our bandwidth probing mechanism can successfully identify temporarily slower I/O nodes without noticeable runtime overhead. Experimental results on NERSC’s systems also show that our approach isolates I/O variability effectively on shared systems and improves overall collective I/O performance with less variation.
19.
Rotator graphs, a set of directed permutation graphs, are proposed as an alternative to star and pancake graphs. Rotator graphs are defined in a way similar to the recently proposed Faber-Moore graphs. They have smaller diameter, n-1 in a graph with n factorial vertices, than either the star or pancake graphs or the k-ary n-cubes. A simple optimal routing algorithm is presented for rotator graphs. The n-rotator graphs are defined as a subset of all rotator graphs. The distribution of distances of vertices in the n-rotator graphs is presented, and the average distance between vertices is found. The n-rotator graphs are shown to be optimally fault tolerant and maximally one-step fault diagnosable. The n-rotator graphs are shown to be Hamiltonian, and an algorithm for finding a Hamiltonian circuit in the graphs is given.
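The diameter claim can be checked numerically for small n with a breadth-first search. The abstract does not spell out the edge definition, so the sketch below assumes the usual one for the n-rotator graph: vertices are the permutations of n symbols, and each vertex has a directed edge to the permutation obtained by rotating its first i symbols left by one position, for i = 2..n. The function names are hypothetical. The search starts from a single vertex only, which relies on the further assumption that the graph looks the same from every vertex.

```python
from collections import deque


def rotator_neighbors(perm):
    """Out-neighbours: rotate the first i symbols left by one, for i = 2..n."""
    n = len(perm)
    return [perm[1:i] + perm[:1] + perm[i:] for i in range(2, n + 1)]


def bfs_distances(start):
    """Directed shortest-path distances from start to every reachable vertex."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in rotator_neighbors(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def rotator_diameter(n):
    """Return (largest distance from the identity, number of vertices reached)."""
    identity = tuple(range(1, n + 1))
    dist = bfs_distances(identity)
    return max(dist.values()), len(dist)
```

Under these assumptions, `rotator_diameter(4)` reaches all 24 permutations within 3 steps, in line with the n-1 figure quoted in the abstract.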
20.
U Kang Hanghang Tong Jimeng Sun Ching-Yung Lin Christos Faloutsos 《The VLDB Journal The International Journal on Very Large Data Bases》2012,21(5):637-650
Graphs appear in numerous applications including cyber security, the Internet, social networks, protein networks, recommendation systems, citation networks, and many more. Graphs with millions or even billions of nodes and edges are commonplace. How can such large graphs be stored efficiently? What are the core operations and queries on those graphs? How can graph queries be answered quickly? We propose Gbase, an efficient analysis platform for large graphs. The key novelties lie in (1) our storage and compression scheme for parallel, distributed settings and (2) the carefully chosen graph operations and their efficient implementations. We designed and implemented an instance of Gbase using Mapreduce/Hadoop. Gbase provides a parallel indexing mechanism for graph operations that both saves storage space and accelerates query responses. We run numerous experiments on real and synthetic graphs, spanning billions of nodes and edges, and we show that our proposed Gbase is indeed fast, scalable, and nimble, with significant savings in space and time.