Similar Documents
20 similar documents retrieved.
1.
Considering the current price gap between hard disk and flash memory SSD storage, for applications dealing with large-scale data it is economically more sensible to use flash memory drives to supplement disk drives rather than to replace them. This paper presents FaCE, a new low-overhead caching strategy that uses flash memory as an extension of the RAM buffer of database systems. FaCE aims at improving transaction throughput as well as shortening the recovery time from a system failure. To achieve these goals, we propose two novel algorithms for flash cache management, namely multi-version FIFO replacement and group second chance; together they optimize flash writes and reduce disk accesses. In addition, FaCE takes advantage of the nonvolatility of flash memory to fully support database recovery by extending the scope of the persistent database to include the data pages stored in the flash cache. We have implemented FaCE in the PostgreSQL open-source database server and demonstrated its effectiveness on TPC-C benchmarks in comparison with existing caching methods such as Lazy Cleaning and Linux Bcache.
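As a rough illustration of how a FIFO flash cache with a group-based second chance policy can behave, the Python sketch below evicts pages in batches and re-enqueues recently referenced ones. It is a minimal sketch under assumed simplifications (an in-memory dictionary stands in for the flash device; the class name, capacity, and group size are hypothetical), not FaCE's actual algorithms.

```python
from collections import deque

class FlashCacheSketch:
    """Illustrative flash cache: newest writes overwrite the cached copy,
    admission is FIFO, and eviction uses a group-based second chance policy.
    A hypothetical simplification, not FaCE's implementation."""

    def __init__(self, capacity, group_size=4):
        self.capacity = capacity        # max pages held in the flash cache
        self.group_size = group_size    # pages examined per eviction round
        self.fifo = deque()             # page ids in arrival order (oldest left)
        self.pages = {}                 # page_id -> {"data": ..., "ref": bool}

    def access(self, page_id, data=None):
        """Read or write a cached page; on a miss, admit it (evicting first if full)."""
        if page_id in self.pages:
            self.pages[page_id]["ref"] = True           # recently used
            if data is not None:
                self.pages[page_id]["data"] = data      # newest version wins
            return self.pages[page_id]["data"]
        while len(self.fifo) >= self.capacity:          # make room in groups
            self._evict_group()
        self.fifo.append(page_id)
        self.pages[page_id] = {"data": data, "ref": False}
        return data

    def _evict_group(self):
        """Group second chance: pop a batch from the FIFO head; referenced pages
        are re-enqueued with their bit cleared (their second chance), the rest
        become victims that can be written back to disk in one sequential batch."""
        victims = []
        for _ in range(min(self.group_size, len(self.fifo))):
            page_id = self.fifo.popleft()
            if self.pages[page_id]["ref"]:
                self.pages[page_id]["ref"] = False
                self.fifo.append(page_id)               # second chance
            else:
                victims.append((page_id, self.pages.pop(page_id)["data"]))
        return victims
```

Evicting in groups matters on flash because writing the victims back as one sequential batch is much cheaper than scattering individual page writes, which is in the spirit of the flash write optimization the abstract mentions.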

2.
Analytical workloads in data warehouses often include heavy joins where queries involve multiple fact tables in addition to the typical star patterns, dimensional grouping, and selections. In this paper we propose a new processing and storage framework called bitwise dimensional co-clustering (BDCC) that avoids replication and thus keeps updates fast, yet is able to accelerate all these foreign key joins, efficiently support grouping, and push down most dimensional selections. The core idea of BDCC is to cluster each table on a mix of dimensions, each possibly derived from attributes imported over an incoming foreign key, thereby creating foreign-key-connected tables with partially shared clusterings. These are later used to accelerate any join between two tables that have some dimension in common, and additionally permit pushing down and propagating selections (reducing I/O) and accelerating aggregation and ordering operations. Besides the general framework, we describe an algorithm to derive such a physically co-clustered database automatically and describe query processing and query optimization techniques that can easily be fitted into existing relational engines. We present an experimental evaluation on the TPC-H benchmark in the Vectorwise system, showing that co-clustering can significantly enhance its already high performance and at the same time significantly reduce the memory consumption of the system.
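A tiny sketch of the bit-interleaving idea behind such co-clustering: each row gets a clustering key formed by interleaving bits of its dimension-derived bucket numbers, so rows that agree on the leading bits of several dimensions end up stored close together. The Morton-style key below is an illustrative simplification in the spirit of bitwise interleaving; BDCC's actual key construction and dimension derivation are more general.

```python
def interleave_bits(dim_buckets, bits_per_dim=8):
    """Build a clustering key by round-robin interleaving the bits of each
    dimension's bucket number (a Morton/Z-order style key). Rows whose bucket
    numbers share leading bits in every dimension share a key prefix, so a
    selection on any one dimension maps to a limited set of key ranges."""
    key = 0
    for bit in range(bits_per_dim - 1, -1, -1):     # high-order bits first
        for bucket in dim_buckets:
            key = (key << 1) | ((bucket >> bit) & 1)
    return key

# Hypothetical example: a fact row bucketed by date (bucket 5) and region (bucket 3).
print(bin(interleave_bits([5, 3], bits_per_dim=4)))   # prints 0b100111 (decimal 39)
```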

3.
Kunkel, S.; Armstrong, B.; Vitale, P. Micro, IEEE, 1999, 19(3): 56-64
Major performance enhancements in large commercial systems are best achieved when advances in hardware technology are matched with advances in software technology. This article connects recent AS/400 hardware advances with the corresponding approaches used to tune the system performance for large online transaction processing (OLTP) workloads. We particularly emphasize those tuning efforts that affect the memory system. OLTP workloads are large and complex, stressing many parts of both the software and hardware. These workloads quickly expose software bottlenecks caused by contention on software locks. They also have large working sets, populated with hard-to-predict access patterns that make cache miss rates high. This causes the processor to spend a significant part of its execution time waiting for memory accesses. In multiprocessor systems, compilers alone have minimal effect on cycles spent in storage latency. Other optimizations are needed to affect this portion of the execution time, and many of those require direct involvement of the system software.

4.
The transactional approach to contention management guarantees atomicity by aborting transactions that may violate consistency. A major challenge in this approach is to schedule transactions in a manner that reduces the total time to perform all transactions (the makespan), since transactions are often aborted and restarted. The performance of a transactional scheduler can be evaluated by the ratio between its makespan and the makespan of an optimal, clairvoyant scheduler that knows the list of resource accesses that will be performed by each transaction, as well as its release time and duration.
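Stated as a formula, this evaluation measure is the competitive ratio familiar from online scheduling; the notation below is generic and not taken verbatim from the paper:

\[
\mathrm{CR}(\mathcal{A}) \;=\; \sup_{\sigma}\,
\frac{\mathrm{makespan}_{\mathcal{A}}(\sigma)}{\mathrm{makespan}_{\mathrm{OPT}}(\sigma)},
\]

where \(\sigma\) ranges over workloads (sets of transactions with their resource accesses, release times, and durations), \(\mathcal{A}\) is the online transactional scheduler, and \(\mathrm{OPT}\) is the optimal clairvoyant scheduler.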

5.
Transactional memory is an alternative to locks for handling concurrency in multi-threaded environments. Instead of providing critical regions that only one thread can enter at a time, transactional memory records sufficient information to detect and correct conflicts if they occur. This paper surveys the range of options for implementing software transactional memory in Scala. Where possible, we provide references to implementations that instantiate each technique. As part of this survey, we document for the first time several techniques developed in the implementation of Manchester University Transactions for Scala. We order the implementation techniques on a scale moving from the least to the most invasive in terms of modifications to the compilation and runtime environment. This shows that, while the less invasive options are easier to implement and more common, they are more verbose and invasive in the code that uses them, often requiring changes to the syntax and program structure throughout the code.

6.
We present approaches to the generation of synthetic workloads for benchmarking multiplayer online gaming infrastructures. Existing techniques, such as mobility or traffic models, are often either too simple to be representative for this purpose or too specific for a particular network structure. Desirable properties of a workload are reproducibility, representativeness, and scalability to any number of players. We analyze different mobility models and AI-based workload generators. Real gaming sessions with human players using the prototype game Planet PI4 serve as a reference workload. Novel metrics are used to measure the similarity between real and synthetic traces with respect to neighborhood characteristics. We found that, although more complicated to handle, AI players reproduce real workload characteristics more accurately than mobility models.
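To make the notion of a mobility-model workload generator concrete, the following Python sketch produces synthetic player traces with the classic random-waypoint model, one of the simple baselines such studies typically compare against. The map size, speed range, and tick count are hypothetical parameters; the sketch does not model Planet PI4, the paper's AI players, or its similarity metrics.

```python
import random

def random_waypoint(num_players, world_size, speed_range, steps, seed=0):
    """Generate 2-D player position traces with the random-waypoint model:
    each player walks toward a random target at a random speed and picks a
    new target on arrival. Returns a list of per-tick position snapshots."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, world_size), rng.uniform(0, world_size))
           for _ in range(num_players)]
    target = [(rng.uniform(0, world_size), rng.uniform(0, world_size))
              for _ in range(num_players)]
    speed = [rng.uniform(*speed_range) for _ in range(num_players)]
    trace = []
    for _ in range(steps):
        snapshot = []
        for i, ((x, y), (tx, ty)) in enumerate(zip(pos, target)):
            dx, dy = tx - x, ty - y
            dist = (dx * dx + dy * dy) ** 0.5
            if dist < speed[i]:                      # arrived: pick a new waypoint
                pos[i] = (tx, ty)
                target[i] = (rng.uniform(0, world_size),
                             rng.uniform(0, world_size))
                speed[i] = rng.uniform(*speed_range)
            else:                                    # move one step toward target
                pos[i] = (x + speed[i] * dx / dist, y + speed[i] * dy / dist)
            snapshot.append(pos[i])
        trace.append(list(snapshot))
    return trace

# Example: 50 players on a 1000x1000 map, speeds of 1-5 units per tick, 600 ticks.
traces = random_waypoint(50, 1000.0, (1.0, 5.0), 600)
```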

7.
To evaluate the performance of database applications and database management systems (DBMSs), we usually execute workloads of queries on generated databases of different sizes and then benchmark various measures such as response time and throughput. This paper introduces MyBenchmark, a parallel data generation tool that takes a set of queries as input and generates database instances. Users of MyBenchmark can control the characteristics of the generated data as well as the characteristics of the resulting workload. Applications of MyBenchmark include DBMS testing, database application testing, and application-driven benchmarking. In this paper, we present the architecture and the implementation algorithms of MyBenchmark. Experimental results show that MyBenchmark is able to generate workload-aware databases for a variety of workloads, including query workloads extracted from the TPC-C, TPC-E, TPC-H, and TPC-W benchmarks.

8.
We consider the problem of implementing transactional memory in large-scale distributed networked systems. We present Spiral, a novel distributed directory-based protocol for transactional memory, and analyze it theoretically and evaluate it experimentally to establish the performance boundaries of this approach from a worst-case perspective. Spiral is designed for the data-flow distributed implementation of software transactional memory, which supports three basic operations: publish, allowing a shared object to be inserted in the directory so that other nodes can find it; lookup, providing a read-only copy of the object to the requesting node; and move, allowing the requesting node to write the object locally after the node obtains it. The protocol runs on a hierarchical directory construction based on sparse covers, where clusters at each level are ordered to avoid race conditions while serving concurrent requests. Given a shared object, the protocol maintains a directory path pointing to the object. The basic idea is to use “spiral” paths that grow outward to search for the directory path of the object in a bottom-up fashion. For general networks, this protocol guarantees an \(\mathcal{O}(\log ^2 n\cdot \log D)\) approximation in sequential and one-shot concurrent executions of a finite set of move requests, where \(n\) is the number of nodes and \(D\) is the diameter of the network. It also guarantees a poly-log approximation for any single lookup request. Our bounds are deterministic and hold in the worst case. Moreover, this protocol requires only polylogarithmic bits of memory per node. Experimental evaluations on real networks also confirm our theoretical findings. To the best of our knowledge, this is the first deterministic consistency protocol for distributed transactional memory that achieves a poly-log approximation in general networks.

9.
Ergonomics, 2012, 55(9): 1013-1031
A series of psychophysical lifting studies was conducted to establish maximum acceptable weights of lift (MAWL) for three supply items commonly handled in underground coal mines (rock dust bags, ventilation stopping blocks, and crib blocks). Each study utilized 12 subjects, all of whom had considerable experience working in underground coal mines. Effects of lifting in four postures (standing, stooping under a 1.5 m ceiling, stooping under a 1.2 m ceiling, and kneeling) were investigated together with four lifting conditions (combinations of lifting symmetry and lifting height). The frequency of lifting was set at four per min, and the task duration was 15 min. Posture significantly affected the MAWL for the rock dust bag (standing MAWL was 7% greater than restricted postures and kneeling MAWL was 6.4% less than stooped); however, posture interacted with lifting conditions for both of the other materials. Physiological costs were found to be significantly greater in the stooped postures compared with kneeling for all materials. Other contrasts (standing versus restricted postures, stooping under a 1.5 m ceiling versus stooping under a 1.2 m ceiling) did not exhibit significantly different levels of energy expenditure. Energy expenditure was significantly affected by vertical lifting height; however, the plane of lifting had little influence on metabolic cost. Recommended acceptable workloads for the three materials are 20.0 kg for the rock dust bag, 16.5 kg for the ventilation stopping block, and 14.7 kg for the crib block. These results suggest that miners are often required to lift supplies that are substantially heavier than psychophysically acceptable lifting limits.

10.
Transactional Memory is a concurrent programming API in which concurrent threads synchronize via transactions (instead of locks). Although this model has mostly been studied in the context of multiprocessors, it has attractive features for distributed systems as well. In this paper, we consider the problem of implementing transactional memory in a network of nodes where communication costs form a metric. The heart of our design is a new cache-coherence protocol, called the Ballistic protocol, for tracking and moving up-to-date copies of cached objects. For constant-doubling metrics, a broad class encompassing both Euclidean spaces and growth-restricted networks, this protocol has stretch logarithmic in the diameter of the network. Supported by NSF grant 0410042 and by grants from Intel Corporation and Sun Microsystems.

11.
The power consumption of modern high-performance computing (HPC) systems that are built using power-hungry commodity servers is one of the major hurdles for achieving Exascale computation. Several efforts have been made by the HPC community to encourage the use of low-powered system-on-chip (SoC) embedded processors in large-scale HPC systems. These initiatives have successfully demonstrated the use of ARM SoCs in HPC systems, but there is still a need to analyze the viability of these systems for HPC platforms before a case can be made for Exascale computation. The major shortcomings of current ARM-HPC evaluations include a lack of detailed insights about performance levels on distributed multicore systems and performance levels for benchmarking in large-scale applications running on HPC. In this paper, we present a comprehensive evaluation of results that covers major aspects of server and HPC benchmarking for ARM-based SoCs. For the experiments, we built an unconventional cluster of ARM Cortex-A9s, referred to as Weiser, and ran single-node benchmarks (STREAM, Sysbench, and PARSEC) and multi-node scientific benchmarks (High-Performance Linpack (HPL), the NASA Advanced Supercomputing (NAS) Parallel Benchmark, and Gadget-2) in order to provide a baseline for the performance limitations of the system. Based on the experimental results, we claim that the performance of ARM SoCs depends heavily on the memory bandwidth, network latency, application class, workload type, and support for compiler optimizations. During server-based benchmarking, we observed that when performing memory-intensive benchmarks for database transactions, x86 performed 12% better for multithreaded query processing. However, ARM performed four times better in performance-to-power ratio for a single core and 2.6 times better on four cores. We noticed that emulated double-precision floating point in Java resulted in three to four times slower performance compared with C for CPU-bound benchmarks. Even though Intel x86 performed slightly better in computation-oriented applications, ARM showed better scalability in I/O-bound applications for shared memory benchmarks. We incorporated support for ARM in the MPJ-Express runtime and performed a comparative analysis of two widely used message passing libraries. We obtained similar results for network bandwidth, large-scale application scaling, floating-point performance, and energy efficiency for clusters in message passing evaluations (NPB and Gadget-2 with MPJ-Express and MPICH). Our findings can be used to evaluate the energy efficiency of ARM-based clusters for server workloads and scientific workloads and to provide a guideline for building energy-efficient HPC clusters. Copyright © 2015 John Wiley & Sons, Ltd.

12.
In-Memory Databases (IMDBs), such as SAP HANA, enable new levels of database performance by removing the disk bottleneck and by compressing data in memory. This improved performance means that reports and analytic queries can now be processed on demand. Therefore, the goal is now to provide near real-time responses to compute- and data-intensive analytic queries. To facilitate this, much work has investigated the use of acceleration technologies within the database context. While current research into the application of these technologies has yielded positive results, it has tended to focus on single database tasks or on isolated single-user requests. This paper uses SHEPARD, a framework for managing accelerated tasks across shared heterogeneous resources, to introduce acceleration into an IMDB. Results show how, using SHEPARD, multiple simultaneous user queries all receive speed-up by using a shared pool of accelerators. Results also show that offloading analytic tasks onto accelerators can have indirect benefits for other database workloads by reducing contention for CPU resources.

13.
Transactional data collection and sharing currently face the challenge of how to prevent information leakage and protect data from privacy breaches while maintaining high-quality data utility. Data anonymization methods such as perturbation, generalization, and suppression have been proposed for privacy protection. However, many of these methods incur excessive information loss and cannot satisfy multipurpose utility requirements. In this paper, we propose a multidimensional generalization method to provide multipurpose optimization when anonymizing transactional data, in order to offer better data utility for different applications. Our methodology uses bipartite graphs, generalizing attributes, grouping items, and perturbing outliers. Experiments on real-life datasets are performed and show that our solution considerably improves data utility compared to existing algorithms.
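As a toy illustration of generalization on transactional data (one of the building blocks named above), the Python sketch below lifts infrequent items to their parents in a taxonomy until each remaining item is shared by at least k transactions. The greedy bottom-up strategy, the threshold k, and the example taxonomy are illustrative assumptions; the paper's bipartite-graph method is more sophisticated.

```python
from collections import Counter

def generalize_items(transactions, taxonomy, k):
    """Greedy bottom-up generalization sketch: any item occurring in fewer than
    k transactions is replaced by its parent in the taxonomy, repeatedly, until
    every (generalized) item is k-frequent or has no parent left. Illustrative
    only -- not the paper's algorithm."""
    current = [set(t) for t in transactions]
    while True:
        counts = Counter(item for t in current for item in t)
        rare = {item for item, c in counts.items()
                if c < k and item in taxonomy}        # items that can still be lifted
        if not rare:
            return current
        current = [{taxonomy[i] if i in rare else i for i in t}
                   for t in current]

# Hypothetical example: "lemonade" is rare, so it rolls up to "soft_drink",
# which is still rare and rolls up again to "beverage".
taxonomy = {"cola": "soft_drink", "lemonade": "soft_drink",
            "soft_drink": "beverage", "beer": "beverage"}
data = [{"cola", "beer"}, {"lemonade"}, {"cola"}, {"beer"}]
print(generalize_items(data, taxonomy, k=2))
```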

14.
Synopsis structures and approximate query answering have become increasingly important in DSS/OLAP applications with stringent response time requirements. Range queries are an important class of problems in this domain, have a wide variety of applications, and have been studied in the context of histograms. However, wavelets have been shown to be quite useful in several scenarios, and in fact their multi-resolution structure makes them especially appealing for hierarchical domains. Furthermore, the fact that the Haar wavelet basis has a linear-time algorithm for the computation of coefficients has made the Haar basis one of the important and widely used synopsis structures. Very recently, optimal algorithms were proposed for the wavelet synopsis construction problem for equality/point queries. In this paper we investigate the problem of optimum Haar wavelet synopsis construction for range queries with workloads. We provide optimum algorithms as well as approximation heuristics and demonstrate the effectiveness of these algorithms with our extensive experimental evaluation using synthetic and real-life data sets. Research was supported in part by the Alfred P. Sloan Research Fellowship and NSF awards CCF-0430376 and CCF-0644119. Research was supported by the Ministry of Information and Communication, Korea, under the College Information Technology Research Center Support Program, grant number IITA-2006-C1090-0603-0031.
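The linear-time Haar decomposition mentioned above is easy to state in code. The sketch below computes the (unnormalized) Haar coefficients of a power-of-two-length vector and then keeps only the largest-magnitude coefficients as a naive synopsis; the paper's contribution is choosing coefficients to minimize workload-weighted range-query error, which this toy thresholding does not attempt.

```python
def haar_decompose(data):
    """Full linear-time (unnormalized) Haar wavelet decomposition of a vector
    whose length is a power of two. Returns [overall average] followed by the
    detail coefficients from coarsest to finest level."""
    coeffs = []
    current = list(data)
    while len(current) > 1:
        averages, details = [], []
        for i in range(0, len(current), 2):
            averages.append((current[i] + current[i + 1]) / 2.0)
            details.append((current[i] - current[i + 1]) / 2.0)
        coeffs = details + coeffs           # coarser details go to the front
        current = averages
    return current + coeffs

def naive_synopsis(coeffs, budget):
    """Keep only the `budget` largest-magnitude coefficients, zeroing the rest:
    the simplest possible synopsis, ignoring any query workload."""
    keep = set(sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]),
                      reverse=True)[:budget])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

# Example: an 8-value frequency vector compressed to 3 retained coefficients.
c = haar_decompose([2, 2, 0, 2, 3, 5, 4, 4])
print(naive_synopsis(c, budget=3))
```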

15.
Gallagher, S.; Hamrick, C. A. Ergonomics, 1992, 35(9): 1013-1031
A series of psychophysical lifting studies was conducted to establish maximum acceptable weights of lift (MAWL) for three supply items commonly handled in underground coal mines (rock dust bags, ventilation stopping blocks, and crib blocks). Each study utilized 12 subjects, all of whom had considerable experience working in underground coal mines. Effects of lifting in four postures (standing, stooping under a 1.5 m ceiling, stooping under a 1.2 m ceiling, and kneeling) were investigated together with four lifting conditions (combinations of lifting symmetry and lifting height). The frequency of lifting was set at four per min, and the task duration was 15 min. Posture significantly affected the MAWL for the rock dust bag (standing MAWL was 7% greater than restricted postures and kneeling MAWL was 6.4% less than stooped); however, posture interacted with lifting conditions for both of the other materials. Physiological costs were found to be significantly greater in the stooped postures compared with kneeling for all materials. Other contrasts (standing versus restricted postures, stooping under a 1.5 m ceiling versus stooping under a 1.2 m ceiling) did not exhibit significantly different levels of energy expenditure. Energy expenditure was significantly affected by vertical lifting height; however, the plane of lifting had little influence on metabolic cost. Recommended acceptable workloads for the three materials are 20.0 kg for the rock dust bag, 16.5 kg for the ventilation stopping block, and 14.7 kg for the crib block. These results suggest that miners are often required to lift supplies that are substantially heavier than psychophysically acceptable lifting limits.

16.
Software transactional memory
As we learn from the literature, flexibility in choosing synchronization operations greatly simplifies the task of designing highly concurrent programs. Unfortunately, existing hardware is inflexible and is at best on the level of a LoadLinked/StoreConditional operation on a single word. Building on the hardware-based transactional synchronization methodology of Herlihy and Moss, we offer software transactional memory (STM), a novel software method for supporting flexible transactional programming of synchronization operations. STM is non-blocking, and can be implemented on existing machines using only a LoadLinked/StoreConditional operation. We use STM to provide a general highly concurrent method for translating sequential object implementations to non-blocking ones based on implementing a k-word compare&swap STM-transaction. Empirical evidence collected on simulated multiprocessor architectures shows that our method always outperforms the non-blocking translation methods in the style of Barnes, and outperforms Herlihy’s translation method for sufficiently large numbers of processors. The key to the efficiency of our software-transactional approach is that, unlike Barnes-style methods, it is not based on a costly “recursive helping” policy. Received: January 1996 / Revised: June 1996 / Accepted: August 1996
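To make the k-word compare&swap transaction concrete, here is a deliberately simplified Python sketch that acquires per-word locks in a global order, validates the expected values, and installs the new ones atomically. It is a blocking stand-in under assumed simplifications (per-word locks instead of ownership records and helping), not the paper's non-blocking LoadLinked/StoreConditional-based construction.

```python
import threading

class STMWord:
    """A shared word protected by its own lock -- a stand-in for the per-word
    ownership records of the paper's non-blocking construction."""
    def __init__(self, value=0):
        self.value = value
        self.lock = threading.Lock()

def k_compare_and_swap(entries):
    """Atomically: if every word holds its expected value, install all new
    values and return True; otherwise change nothing and return False.
    `entries` is a list of (STMWord, expected, new) triples. Locks are taken
    in a fixed global order (object id) to avoid deadlock."""
    ordered = sorted(entries, key=lambda e: id(e[0]))
    acquired = []
    try:
        for word, _, _ in ordered:
            word.lock.acquire()
            acquired.append(word)
        if all(word.value == expected for word, expected, _ in ordered):
            for word, _, new in ordered:
                word.value = new
            return True
        return False
    finally:
        for word in reversed(acquired):
            word.lock.release()

# Example: atomically move 10 units between two shared counters.
a, b = STMWord(100), STMWord(0)
assert k_compare_and_swap([(a, 100, 90), (b, 0, 10)])
```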

17.
Throughput computing is based on chip multithreading (CMT) processor design technology. In CMT technology, performance is defined by maximizing the amount of work accomplished per unit of time or other relevant resource, rather than by minimizing the time needed to complete a given task or set of tasks. By CMT standards, the best processor accomplishes the most work per second of time, per watt of expended power, per square millimeter of die area, and so on (that is, it operates most efficiently). The processor described is a member of Sun's first generation of CMT processors designed to efficiently execute network-facing workloads. Network-facing systems primarily service network clients and are often grouped together under the label "Web servers". The processor's dual-thread execution capability, compact die size, and minimal power consumption combine to produce high throughput performance per watt, per transistor, and per square millimeter of die area. Given the short design cycle Sun needed to create the processor, the result is a compelling early proof of the value of throughput computing.

18.
Software transactional memory is a new programming technique introduced to simplify parallel programming. To reduce the frequency of transaction conflicts in software transactional memory systems and thereby improve overall system performance, this paper proposes a new contention management strategy based on dynamic control and queue scheduling. The concept of contention intensity and the overall system framework are defined, and on that basis a method is given for dynamically adjusting the contention intensity using runtime feedback information. The design of transaction serialization and the issues to be addressed in its implementation are also presented; by serializing transactions with a high conflict probability, recurrences of the same conflicts are avoided. Experiments on commonly used benchmark data structures validate the model and algorithms, and the results demonstrate their correctness and effectiveness.
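The following Python sketch illustrates the general shape of such a strategy: an exponentially decayed abort ratio stands in for the contention intensity, and once it crosses a threshold, conflicting transactions are serialized so the same conflict cannot immediately recur. The threshold, the decay factor, and the use of a single lock (approximating a serialization queue) are illustrative assumptions, not the paper's design.

```python
import threading

class ConflictError(Exception):
    """Raised by a transaction body when it detects a conflict and must retry."""

class QueueContentionManager:
    """Sketch: runtime feedback (commits vs. aborts) drives a contention
    intensity estimate; above a threshold, retried transactions run one at a
    time through a serialization lock."""

    def __init__(self, threshold=0.5, decay=0.9):
        self.intensity = 0.0            # decayed fraction of recent runs that aborted
        self.threshold = threshold
        self.decay = decay
        self.serial_lock = threading.Lock()

    def _record(self, aborted):
        """Runtime feedback: update the contention intensity estimate."""
        self.intensity = (self.decay * self.intensity +
                          (1.0 - self.decay) * (1.0 if aborted else 0.0))

    def run_transaction(self, txn_body):
        """Run `txn_body` (a callable that raises ConflictError on conflict),
        retrying until it commits; under high contention, retries serialize."""
        while True:
            serialize = self.intensity > self.threshold
            if serialize:
                self.serial_lock.acquire()
            try:
                result = txn_body()
                self._record(aborted=False)
                return result
            except ConflictError:
                self._record(aborted=True)   # conflict: raise intensity, retry
            finally:
                if serialize:
                    self.serial_lock.release()
```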

19.
In a number of real-time applications, such as target tracking, precise workloads are unknown a priori and may vary dynamically, for example with the changing number of targets to track. It is important to manage the CPU utilization, via feedback control, to avoid severe overload or underutilization even in the presence of dynamic workloads. However, it is challenging to model a real-time system for feedback control, as computer systems cannot be modeled via the laws of physics. In this paper, we present a novel closed-loop approach to utilization control based on formal fuzzy logic control theory, which is very effective at supporting the desired performance in a nonlinear dynamic system without requiring a system model. We mathematically prove the stability of the fuzzy closed-loop system. Further, in a real-time kernel, we implement and evaluate our fuzzy logic utilization controller as well as two existing utilization controllers based on linear and model predictive control theory for an extensive set of workloads. Our approach supports the specified average utilization set-point, while showing the best transient performance in terms of utilization control among the tested approaches.
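As a minimal sketch of how a model-free fuzzy utilization controller can be structured, the Python fragment below fuzzifies the utilization error into three linguistic terms, applies three rules, and defuzzifies by a weighted average into a relative workload adjustment. The membership shapes, rule outputs, and 0.7 set-point are illustrative assumptions, not the controller analyzed in the paper.

```python
def triangle(x, left, center, right):
    """Triangular membership function used to fuzzify the utilization error."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

def fuzzy_utilization_controller(measured, setpoint=0.7):
    """Fuzzify the utilization error, apply three rules, and defuzzify
    (weighted average) into a relative change of admitted workload."""
    error = setpoint - measured                 # > 0: underutilized, < 0: overloaded
    # Fuzzification: degrees of membership in three linguistic terms.
    neg  = triangle(error, -1.0, -0.5, 0.0)     # "overloaded"
    zero = triangle(error, -0.2,  0.0, 0.2)     # "about right"
    pos  = triangle(error,  0.0,  0.5, 1.0)     # "underutilized"
    # Rule outputs: relative change to the admitted task rate.
    outputs = {-0.3: neg, 0.0: zero, 0.3: pos}
    total = sum(outputs.values())
    if total == 0.0:
        return 0.0
    # Defuzzification by weighted average of the rule outputs.
    return sum(change * weight for change, weight in outputs.items()) / total

# Example: CPU at 90% with a 70% set-point -> negative adjustment (shed load).
print(fuzzy_utilization_controller(0.9))
```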

20.
Although architectural simulators model microarchitectures at a high abstraction level, the increasing complexity of both the microarchitectures themselves and the applications that run on them makes simulator use extremely time-consuming. Simulators must execute huge numbers of instructions to create a workload representative of real applications, creating an unreasonably long simulation time and stretching the time to market. Using reduced input sets instead of reference input sets helps to solve this problem. The authors have developed a methodology that reliably quantifies program behavior similarity to verify whether reduced input sets result in program behavior similar to the reference inputs.
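One generic way to quantify program behavior similarity between a reduced and a reference input is to compare normalized execution profiles (for example, basic-block or instruction-mix frequency vectors). The distance function below is a common, simple choice and is only an illustration; it is not necessarily the metric the authors use.

```python
def profile_distance(profile_a, profile_b):
    """Normalized Manhattan (total variation) distance between two execution
    profiles, each a dict mapping a program feature (basic block, instruction
    class, ...) to an occurrence count. Returns 0.0 for identical relative
    behavior and 1.0 for maximally different behavior."""
    total_a = float(sum(profile_a.values())) or 1.0
    total_b = float(sum(profile_b.values())) or 1.0
    features = set(profile_a) | set(profile_b)
    return 0.5 * sum(abs(profile_a.get(f, 0) / total_a -
                         profile_b.get(f, 0) / total_b)
                     for f in features)

# Hypothetical example: reference vs. reduced-input basic-block counts.
ref     = {"bb1": 900, "bb2": 80, "bb3": 20}
reduced = {"bb1": 85,  "bb2": 10, "bb3": 5}
print(profile_distance(ref, reduced))   # small value -> similar behavior
```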
