Similar Articles
20 similar articles found (search time: 15 ms)
1.
The importance of reporting is ever increasing in today's fast-paced market environments, and the availability of up-to-date information for reporting has become indispensable. Current reporting systems are separated from online transaction processing (OLTP) systems, with periodic updates pushed in. A pre-defined, aggregated subset of the OLTP data, however, does not provide the flexibility, detail, and timeliness needed for today's operational reporting. As technology advances, this separation has to be re-evaluated, and means to study and evaluate new trends in data storage management must be provided. This article proposes a benchmark for combined OLTP and operational reporting, providing the means to evaluate the performance of enterprise data management systems under mixed workloads of OLTP transactions and operational reporting queries. Such systems offer up-to-date information and the flexibility of the entire data set for reporting. We describe how the benchmark provokes the conflicts that are the reason for separating the two workloads onto different systems. In this article, we introduce the concepts, logical data schema, transactions, and queries of the benchmark, which are entirely based on the original data sets and real workloads of existing, globally operating enterprises.
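To make the conflict concrete, here is a minimal sketch, in Python with SQLite, of the kind of mixed workload such a benchmark drives: short OLTP write transactions and an operational-reporting aggregate hitting the same data set with no ETL delay. The one-table schema and the queries are simplified stand-ins, not the benchmark's actual enterprise schema.

```python
import sqlite3, threading, time, random

# Toy mixed workload: OLTP inserts and an operational-reporting
# aggregate run concurrently against the same data set.

def setup():
    db = sqlite3.connect(":memory:", check_same_thread=False)
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    return db

def oltp_writer(db, lock, n_txn, stats):
    for _ in range(n_txn):
        with lock:                      # single shared connection: serialize access
            db.execute("INSERT INTO orders (amount) VALUES (?)",
                       (random.uniform(1, 500),))
            db.commit()
        stats["txns"] += 1

def reporting_reader(db, lock, n_queries, stats):
    for _ in range(n_queries):
        with lock:
            # Reporting sees the freshest committed state -- no ETL delay.
            row = db.execute("SELECT COUNT(*), AVG(amount) FROM orders").fetchone()
        stats["reports"].append(row)

db, lock = setup(), threading.Lock()
stats = {"txns": 0, "reports": []}
t1 = threading.Thread(target=oltp_writer, args=(db, lock, 1000, stats))
t2 = threading.Thread(target=reporting_reader, args=(db, lock, 50, stats))
start = time.perf_counter()
t1.start(); t2.start(); t1.join(); t2.join()
print(f"{stats['txns']} txns and {len(stats['reports'])} reports "
      f"in {time.perf_counter() - start:.2f}s")
```

Even this toy version exhibits the interference between short writers and scan-heavy readers that the benchmark is designed to provoke.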

2.
Modern hardware is abundantly parallel and increasingly heterogeneous. The numerous processing cores have non-uniform access latencies to main memory and processor caches, which causes variability in communication costs. Unfortunately, database systems mostly assume that all processing cores are the same and that microarchitectural differences are not significant enough to appear in critical database execution paths. As we demonstrate in this paper, however, non-uniform core topology does appear in the critical path, and conventional database architectures achieve suboptimal and, even worse, unpredictable performance. We perform a detailed performance analysis of OLTP deployments in servers with multiple cores per CPU (multicore) and multiple CPUs per server (multisocket). We compare different database deployment strategies, varying the number and size of independent database instances running on a single server, from a single shared-everything instance to fine-grained shared-nothing configurations. We quantify the impact of non-uniform hardware on various deployments by (a) examining how efficiently each deployment uses the available hardware resources and (b) measuring the impact of distributed transactions and skewed requests on different workloads. We show that no strategy is optimal for all cases and that the best choice depends on the combination of hardware topology and workload characteristics. Finally, we argue that transaction processing systems must be aware of the hardware topology in order to achieve predictably high performance.
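As an illustration of topology-aware deployment, the sketch below pins one database-like worker process per socket, in the spirit of the fine-grained shared-nothing configurations discussed above. The core numbering is an assumption (eight cores per socket, two sockets); a real deployment would read the topology from the OS or hwloc, and `os.sched_setaffinity` is Linux-only.

```python
import os
from multiprocessing import Process

# Illustrative shared-nothing deployment: one worker per socket, pinned
# to that socket's cores. The core ranges below are assumptions; real
# topologies should be discovered, not hard-coded.
SOCKETS = {0: range(0, 8), 1: range(8, 16)}

def instance(socket_id, cores):
    os.sched_setaffinity(0, cores)     # Linux-only: pin this process
    # ... run one database instance over its own data partition ...
    print(f"instance {socket_id} pinned to cores {sorted(cores)}")

if __name__ == "__main__":
    procs = [Process(target=instance, args=(sid, cores))
             for sid, cores in SOCKETS.items()]
    for p in procs: p.start()
    for p in procs: p.join()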

3.
A prototype dataflow computer system has been constructed by a research team at the University of Manchester. The hardware has been operational since October 1981, but has been steadily enhanced since that time. Store capacities and I/O bandwidth are approaching the state where realistically large application programs can be used to evaluate system performance. During the period of hardware enhancement, there have been parallel advances in the development of dataflow system software in the form of assemblers, compilers, debugging systems and sundry software tools. The software is also approaching readiness for application to large-scale benchmark programs. The Manchester system implements a tagged-token dataflow model of computation. This model imposes a tag-field penalty on data values in order to maximise the asynchrony of instruction execution. It is important to know to what extent this overhead is necessary, and how to minimise it whilst maintaining acceptable asynchrony. In comparison with more conventional architectures, it is important that useful measures of cost and performance be developed. With emphasis on these issues, the paper describes the structure of the Manchester dataflow hardware and software, and outlines the system performance results so far obtained. Whilst this work is far from complete, it suggests new avenues for hardware and software development which are being followed at Manchester and elsewhere.
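A minimal sketch of the tag-matching step at the heart of the tagged-token model: a two-input instruction fires only when both operand tokens carrying the same tag have arrived, which is what lets iterations proceed out of order. The tuple tags and the `fire` callback are simplifications; the real machine packs activation and iteration fields into a hardware tag.

```python
# Tagged-token matching store: tokens pair up by (destination, tag).
from collections import defaultdict

matching_store = defaultdict(dict)   # (dest_instr, tag) -> {port: value}

def arrive(dest, tag, port, value, fire):
    waiting = matching_store[(dest, tag)]
    waiting[port] = value
    if len(waiting) == 2:            # partner token present: fire
        del matching_store[(dest, tag)]
        fire(dest, tag, waiting[0], waiting[1])

def fire(dest, tag, left, right):
    print(f"{dest} fires for tag {tag}: {left} + {right} = {left + right}")

# Tokens from two loop iterations arrive interleaved; each pairs by tag.
arrive("add1", ("iter", 0), 0, 10, fire)
arrive("add1", ("iter", 1), 0, 30, fire)
arrive("add1", ("iter", 1), 1, 4, fire)   # fires for iteration 1 first
arrive("add1", ("iter", 0), 1, 2, fire)   # then iteration 0
```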

4.
The shift to multi-core and multi-socket hardware brings new challenges to database systems, as software parallelism now determines performance. Even though database systems traditionally accommodate simultaneous requests, a multitude of synchronization barriers serialize execution. Write-ahead logging is a fundamental, omnipresent component of ARIES-style concurrency and recovery, and one of the most important yet-to-be-addressed potential bottlenecks, especially in OLTP workloads that make frequent small changes to data. In this paper, we identify four logging-related impediments to database system scalability. Each issue challenges a different level of the software architecture: (a) the high volume of small I/O requests may saturate the disk, (b) transactions hold locks while waiting for the log flush, (c) extensive context switching overwhelms the OS scheduler with threads executing log I/Os, and (d) contention appears as transactions serialize accesses to in-memory log data structures. We demonstrate these problems and address them with techniques that, when combined, comprise a holistic, scalable approach to logging. Our solution achieves a 20-69% speedup over a modern database system when running log-intensive workloads, such as the TPC-B and TATP benchmarks, on a single-socket multiprocessor server. Moreover, it achieves log insert throughput of over 2.2 GB/s for small log records on the single-socket server, roughly 20 times higher than the traditional approach of accessing the log through a single mutex. Furthermore, we investigate techniques for scaling the performance of logging to multi-socket servers. We present a set of optimizations that partly ameliorate the latency penalty that comes with multi-socket hardware, and then investigate the feasibility of applying a distributed log buffer design at the socket level.
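One idea behind reducing contention on the log buffer, sketched below under assumptions: hold the central lock only long enough to claim buffer space, then perform the (much longer) copy into the claimed region outside the lock, so that copies from many threads proceed in parallel. This illustrates the general technique rather than the paper's exact design, which additionally consolidates waiting threads and must track copy completion before flushing.

```python
import threading

class LogBuffer:
    def __init__(self, size):
        self.buf = bytearray(size)
        self.tail = 0
        self.lock = threading.Lock()

    def insert(self, record: bytes):
        with self.lock:               # short critical section: claim space only
            offset = self.tail
            self.tail += len(record)
        # The long part -- the copy -- runs outside the lock, in parallel
        # with other threads' copies into their own claimed regions.
        self.buf[offset:offset + len(record)] = record
        return offset

log = LogBuffer(1 << 20)
threads = [threading.Thread(
               target=lambda i=i: [log.insert(b"rec%04d" % i) for _ in range(100)])
           for i in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print("log tail at", log.tail)
```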

5.
In conventional architectures, the central processing unit (CPU) spends a significant amount of execution time allocating and de-allocating memory. Efforts to improve memory management functions using custom allocators have led to only small improvements in performance. In this work, we test the feasibility of decoupling memory management functions from the main processing element and moving them to separate memory management hardware. Such memory management hardware can reside on the same die as the CPU, in a memory controller, or embedded within a DRAM chip. Using SimpleScalar, we simulated our architecture and investigated the execution performance of various benchmarks selected from SPECInt2000, Olden and other memory-intensive application suites.

The hardware allocator reduced the execution time of applications by as much as 50%. In fact, the decoupled hardware yields a performance improvement even when we assume that both the hardware and software memory allocators require the same number of cycles. We attribute much of this improvement to better cache behavior, since decoupling memory management functions reduces the cache pollution caused by dynamic memory management software. We anticipate that even higher levels of performance can be achieved through innovative hardware and software optimizations. We do not show any specific implementation for the memory management hardware; this paper only investigates the potential performance gains that a hardware allocator can provide.


6.
Hardware transactional memory (HTM) can greatly improve the throughput of multi-core in-memory transaction processing. However, to keep slow persistence devices from limiting transaction throughput, existing systems commit transactions in batches, which gives transaction commits very high latency. The emergence of low-latency non-volatile memory (NVM) opens up the possibility of reducing HTM-based in-…

7.
8.
The type of workload on a database management system (DBMS) is a key consideration in tuning the system. Allocations for resources such as main memory can be very different depending on whether the workload type is Online Transaction Processing (OLTP) or Decision Support System (DSS). A DBMS also typically experiences changes in the type of workload it handles during its normal processing cycle. Database administrators must therefore recognize significant shifts in workload type that demand reconfiguring the system in order to maintain acceptable levels of performance. We envision intelligent, autonomic DBMSs that have the capability to manage their own performance by automatically recognizing the workload type and then reconfiguring their resources accordingly. In this paper, we present an approach to automatically identifying a DBMS workload as either OLTP or DSS. Using data mining techniques, we build a classification model based on the most significant workload characteristics that differentiate OLTP from DSS and then use the model to identify any change in the workload type. We construct and compare classifiers built from two different sets of workloads, namely the TPC-C and TPC-H benchmarks and the Browsing and Ordering profiles from the TPC-W benchmark. We demonstrate the feasibility and success of these classifiers with TPC-generated workloads and with industry-supplied workloads.
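In the spirit of the approach described above, the sketch below trains a decision tree on workload snapshots described by a few coarse features and uses it to label a new snapshot as OLTP or DSS. The feature names and training values are invented for the illustration; the paper derives its features from real DBMS traces.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# features: [queries/sec, avg rows touched per query, write fraction]
X = [
    [900,     12, 0.45],   # many short, write-heavy transactions -> OLTP
    [1200,     8, 0.50],
    [2,   500000, 0.00],   # few huge read-only scans -> DSS
    [5,   120000, 0.01],
]
y = ["OLTP", "OLTP", "DSS", "DSS"]

clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(clf, feature_names=["qps", "rows_per_query", "write_frac"]))
print(clf.predict([[700, 30, 0.40]]))   # -> ['OLTP']
```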

9.
Design space exploration of a software speculative parallelization scheme
With speculative parallelization, code sections that cannot be fully analyzed by the compiler are optimistically executed in parallel. Hardware schemes are fast but expensive and require modifications to the processors and/or memory system. Software schemes require no changes to the hardware of existing shared-memory systems, but can suffer from significant overheads involved with speculative execution. In fact, the performance of software schemes is highly dependent on application characteristics, the design and implementation of the scheme, and the system configuration and size. This paper explores the design space of a recently proposed software speculative parallelization scheme. In the process, we gain insight into the most beneficial features of software schemes for speculative parallelization, as well as the most influential application characteristics. For instance, experimental results show that, contrary to intuition, checking for data dependence violations on every speculative store, as opposed to at commit time, leads to little performance degradation in the worst case and to significantly better performance with large configurations. Also, scheduling policies based on windows can perform very close to fully dynamic policies with a fraction of the memory overhead. Finally, experimental results show consistent speedups in the execution of loops that cannot be parallelized at compile time, both with and without RAW data dependences, on 4 to 32 processors.
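The eager-checking policy mentioned above can be sketched as follows: every speculative store immediately checks whether a logically later task has already read the stored location (a would-be RAW violation), instead of deferring all checks to commit time. This is a simplified single-threaded model; a real software scheme keeps per-location access vectors updated with atomic operations.

```python
highest_reader = {}    # address -> highest-numbered task that has read it

def spec_load(task, addr, memory):
    highest_reader[addr] = max(highest_reader.get(addr, -1), task)
    return memory.get(addr, 0)

def spec_store(task, addr, value, memory):
    memory[addr] = value
    # Eager check on every store: did a logically-later task already
    # consume a stale value for this location?
    if highest_reader.get(addr, -1) > task:
        raise RuntimeError(
            f"violation: task {highest_reader[addr]} read {addr!r} "
            f"before task {task} stored it; squash and restart")

mem = {"a": 1}
spec_load(2, "a", mem)            # later task 2 reads 'a' speculatively
try:
    spec_store(1, "a", 99, mem)   # earlier task 1 then writes 'a'
except RuntimeError as err:
    print(err)
```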

10.
The design of embedded systems radically differs from pure software design in that it must take into account not only functional, but also extra-functional specifications regarding the use of execution platform resources such as processing time, memory, and energy. Meeting extra-functional specifications is essential for the design of embedded systems. It requires predictability of the impact of design choices on the overall behavior of the designed system. It also implies a deep understanding of the interaction between application software and the underlying execution platform. We currently lack approaches for modeling mixed hardware-software systems, and there are no established rigorous techniques for deriving global models of a given system from models of its application software and its execution platform. However, many researchers and industry practitioners are working in this area and proposing solutions. The Rigorous Embedded Design (Red) workshop, which took place at EUROSYS11, provided a unique opportunity to discuss several new methodologies for the rigorous design of embedded systems. Through a series of invited talks, the workshop appraised some of the challenges and emerging approaches in the area. A series of design flows was presented, and the workshop discussions focused on performance analysis, correctness (high confidence and security), code generation, and modeling aspects (including timed scheduling and software/hardware interactions). These concepts were illustrated with examples from the aeronautic, automotive, and robotic domains. The aim of this introductory paper is to briefly present the challenges for embedded system design surveyed by Red.

11.
As processor performance increases and memory cost decreases, system intelligence continues to move away from the CPU and into peripherals. Storage system designers use this trend toward excess computing power to perform more complex processing and optimizations inside storage devices. To date, such optimizations have taken place at relatively low levels of the storage protocol. Trends in storage density, mechanics, and electronics eliminate the hardware bottleneck and put pressure on interconnects and hosts to move data more efficiently. We propose using an active disk storage device that combines on-drive processing and memory with software downloadability to allow disks to execute application-level functions directly at the device. Moving portions of an application's processing to a storage device significantly reduces data traffic and leverages the parallelism already present in large systems, dramatically reducing the execution time for many basic data mining tasks.
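A toy model of the active-disk argument: ship a filter to the "drive" and return only matching records, rather than shipping every record to the host. The `Drive` class and record format are inventions for the sketch; the point is the difference in data moved.

```python
class Drive:
    def __init__(self, records):
        self.records = records            # what lives on the platters

    def scan(self):                       # conventional path: ship everything
        return list(self.records)

    def scan_with_filter(self, predicate):  # active path: filter on-drive
        return [r for r in self.records if predicate(r)]

drive = Drive([{"id": i, "temp": 20 + (i * 7) % 90} for i in range(100_000)])

host_side = [r for r in drive.scan() if r["temp"] > 100]      # moves 100k records
on_drive  = drive.scan_with_filter(lambda r: r["temp"] > 100) # moves ~10% of them
assert host_side == on_drive
print(f"host filter moved {len(drive.records)} records; "
      f"on-drive filter moved {len(on_drive)}")
```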

12.
Distributed execution of simulation models comes into play when memory limitations of a single computational resource prohibit their execution. In addition, parallel execution of a model on a distributed platform, by harnessing multiple computational cores, can potentially reduce the execution time of a simulation. However, such gains can be voided by the overhead that time synchronization protocols for parallel and distributed simulation induce. This overhead is determined by the protocol used, the characteristics of the simulation model, and the architectural and performance characteristics of the hardware platform. Recently, Infrastructure-as-a-Service (IaaS) offerings in the cloud computing domain have introduced the flexibility of acquiring access to virtualized hardware platforms on a pay-as-you-go basis. At present, however, it is unclear to what extent these offerings are suited for the distributed execution of discrete-event simulations, and how the characteristics of different resource types impact the performance of distributed simulation under different time synchronization protocols. Likewise, it is unclear which types of resources are most cost-efficient for this kind of workload. To our knowledge, this paper is the first to investigate these aspects through an assessment of the performance and cost efficiency of different conservative time synchronization protocols on a range of cloud resource types currently available on Amazon EC2. Our analysis shows that performance levels comparable to those realized on clusters of commodity hardware are attainable, and that the relative performance of different synchronization protocols is retained on high-end IaaS resources. In terms of cost efficiency, we find that IaaS products tailored to traditional cluster workloads do not necessarily constitute the optimal choice, and we assess the impact of different packing configurations for logical processes in this regard.
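For readers unfamiliar with conservative synchronization, the sketch below shows the null-message idea in miniature for two logical processes: each LP may only process events up to the timestamp its neighbor has promised, and null messages carry that promise (current clock plus lookahead) to keep both sides advancing. Topology, lookahead, and the event cutoff are all illustrative.

```python
import heapq

LOOKAHEAD = 5.0

class LP:
    def __init__(self, name):
        self.name, self.clock = name, 0.0
        self.events = []           # local future-event list (min-heap)
        self.neighbor_time = 0.0   # latest timestamp the neighbor promised

    def step(self, neighbor):
        # Conservative rule: only process events up to the neighbor's promise.
        while self.events and self.events[0][0] <= self.neighbor_time:
            t, payload = heapq.heappop(self.events)
            self.clock = t
            if t + LOOKAHEAD < 20:   # illustrative cutoff so the run quiesces
                heapq.heappush(neighbor.events,
                               (t + LOOKAHEAD, f"from {self.name}"))
        # Null message: promise to send nothing earlier than clock + lookahead.
        neighbor.neighbor_time = max(neighbor.neighbor_time,
                                     self.clock + LOOKAHEAD)

a, b = LP("A"), LP("B")
heapq.heappush(a.events, (1.0, "init"))
for _ in range(6):
    a.step(b)
    b.step(a)
print(f"A clock={a.clock}, B clock={b.clock}")   # both advanced, no deadlock
```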

13.
RPM enables rapid prototyping of different multiprocessor architectures. It uses hardware emulation for reliable design verification and performance evaluation. The major objective of the RPM project is to develop a common, configurable hardware platform that accurately emulates different MIMD systems with up to eight execution processors. Because emulation is orders of magnitude faster than simulation, an emulator can run problems with large data sets that are more representative of the workloads for which the target machine is designed. Because an emulation is closer to the target implementation than an abstracted simulation, it can accomplish more reliable performance evaluation and design verification. Finally, an emulator is a real computer with its own I/O; the code running on the emulator is not instrumented. As a result, the emulator looks exactly like the target machine to the programmer and can run several different workloads, including code from production compilers, operating systems, databases, and software utilities.

14.
IEEE Micro, 2004, 24(6): 74-82
Memory latency dominates the performance of many applications on modern processors, despite advances in caches and prefetching techniques. Numerous prefetching techniques, both in hardware and software, try to alleviate the memory bottleneck. One such technique, known as helper threading, improves single-thread performance on a simultaneous multithreaded (SMT) architecture, which shares processor resources, including caches, among logical threads. It uses otherwise idle hardware thread contexts to execute speculative threads on behalf of the main thread. Helper threading accelerates a program by exploiting a processor's multithreading capability to run assist threads. Based on the helper threading usage model, virtual multithreading (VMT), a form of switch-on-event user-level multithreading, can improve performance for real-world workloads with a wall-clock speedup of 5.0 to 38.5 percent.

15.
In this paper, we claim that memory migration is a useful mechanism for improving the execution of parallel applications in dynamic execution environments, but that its performance depends on related system components such as the processor scheduler. To show this, we evaluate the automatic memory migration mechanism provided by IRIX on Origin systems under different dynamic processor allocation policies when executing multiprogrammed OpenMP parallel workloads. We focus the evaluation on the effects of the page migration mechanism on the CPU time consumed by each application, the processor allocation received, and the speedup. Results demonstrate that, if the processor scheduler is memory conscious, that is, if it keeps the system as stable as possible, the automatic memory page migration mechanism provided by IRIX improves the CPU time consumed by OpenMP applications.
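A sketch of a counter-driven migration policy of the general kind evaluated here: per-page access counters are kept per node, and a page migrates once a remote node's accesses exceed the home node's by a threshold. The counter handling and threshold value are illustrative, not IRIX's actual parameters.

```python
from collections import defaultdict

THRESHOLD = 32

class Page:
    def __init__(self, home):
        self.home = home
        self.counters = defaultdict(int)   # node id -> access count

    def access(self, node):
        self.counters[node] += 1
        hottest = max(self.counters, key=self.counters.get)
        if (hottest != self.home and
                self.counters[hottest] - self.counters[self.home] > THRESHOLD):
            print(f"migrating page: node {self.home} -> node {hottest}")
            self.home = hottest
            self.counters.clear()          # restart the competition

page = Page(home=0)
for _ in range(40):                        # node 1 dominates the accesses
    page.access(1)
print("page now homed on node", page.home)
```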

16.
The key problem in embedded Internet technology is how to implement the TCP/IP communication protocol suite on an MCU. Most existing research on embedded network protocols focuses on protocol implementation and software-level optimization of the implementation; some work analyzes and experiments with TCP/IP storage management and design from the memory management perspective, but there is almost no comparative study, horizontal or vertical, at the system architecture level from the viewpoint of hardware/software co-processing. Starting from hardware execution speed and the internal structure of the software system, this paper implements and compares the different techniques used to realize embedded TCP/IP in different applications, and finally proposes an optimized scheme for an embedded TCP/IP protocol stack in industrial control systems.

17.
Wireless communication equipment is deployed in large quantities and many varieties, and its long power-on times call for periodic maintenance. To address the large number of performance-indicator test items and the heavy workload of maintenance personnel during repair, a portable radio comprehensive test system was developed. This paper first presents the overall design of the system; it then describes the implementation of its hardware and software platforms, including the configuration and selection of test-resource hardware, the design of the test-execution software platform, and the automatic test scheme for communication equipment and its implementation; finally, it analyzes the system's experimental results. Application results show that the system can measure the main transceiver parameters of wireless communication equipment with a simple operating procedure, greatly reducing the professional expertise required of operators during performance testing. The test result data are compiled, processed, analyzed, and queried by computer, improving test efficiency.

18.
Database management systems (DBMSs) have long served as the data source for financial, educational, web, and other applications. Users connect to a DBMS to update existing records and retrieve reports by executing workloads consisting of complex queries. To achieve a sufficient level of performance, these workloads must be managed. Rapid data growth, expanding functionality, and changing behavior make database workloads ever more complex and tricky. Every DBMS experiences complex workloads that are difficult for humans to manage: human experts take a long time to manage database workloads efficiently, and in some cases the task becomes impossible and performance degrades. This problem presents database practitioners, vendors, and researchers with new challenges. To achieve a satisfactory level of performance, either the database administrator (DBA) or the DBMS itself must have knowledge of workload shifts. Efficient execution and resource allocation depend on the workload type, which may be either On-Line Transaction Processing (OLTP) or Decision Support System (DSS). This research introduces a way to manage workloads in DBMSs on the basis of workload type. The main goal is to manage the workload through characterization, scheduling, and idleness-detection modules: workload characterization using case-based reasoning, fuzzy-logic-based scheduling, and detection of CPU idleness. Results are validated through experiments on real-time and benchmark workloads that demonstrate the approach's effectiveness and efficiency.
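As a flavor of the fuzzy-logic scheduling component, the sketch below fuzzifies a query's estimated cost and the current system load, applies a four-rule base, and defuzzifies to a priority by weighted average. The membership functions and rules are invented for the illustration, not taken from the paper.

```python
# Triangular-ish membership functions over a normalized [0, 1] range.
def mf_low(x):  return max(0.0, min(1.0, (0.5 - x) / 0.5))
def mf_high(x): return max(0.0, min(1.0, (x - 0.5) / 0.5))

def schedule_priority(cost, load):
    """cost, load in [0, 1]; returns priority in [0, 1] (higher runs first)."""
    rules = [
        (min(mf_low(cost),  mf_low(load)),  0.9),  # cheap query, idle system
        (min(mf_low(cost),  mf_high(load)), 0.6),  # cheap query, busy system
        (min(mf_high(cost), mf_low(load)),  0.5),  # heavy query, idle system
        (min(mf_high(cost), mf_high(load)), 0.1),  # heavy query, busy system
    ]
    num = sum(strength * out for strength, out in rules)
    den = sum(strength for strength, _ in rules)
    return num / den if den else 0.0

for cost, load in [(0.1, 0.2), (0.1, 0.9), (0.8, 0.9)]:
    print(f"cost={cost} load={load} -> priority={schedule_priority(cost, load):.2f}")
```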

19.
The performance and energy consumption of high-performance computing (HPC) interconnection networks matter greatly for the supercomputer as a whole, and building an HPC interconnection network simulation platform is very important for research on HPC software and hardware technologies. To effectively evaluate the performance and energy consumption of HPC interconnection networks, this article designs and implements a detailed, clock-driven HPC interconnection network simulation platform called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of a detailed and flexible cycle-accurate network simulator. It also offers a large set of configurable network parameters in terms of topology and routing, and supports routers' on/off states. We compare the simulated execution time with the real execution time of a Tianhe-2 subsystem; the mean error is only 2.7%. In addition, we simulate network behavior with different network structures and low-power modes. The results are consistent with the theoretical analyses.
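The clock-driven style such a simulator embodies can be illustrated in a few lines: every simulated cycle, each router moves at most one flit from its input queue toward its output link, so latency emerges from queueing rather than from an analytic formula. The two-router pipeline and one-flit-per-cycle rule are simplifications for the sketch.

```python
from collections import deque

class Router:
    def __init__(self, name):
        self.name, self.in_q, self.out_link = name, deque(), None

    def tick(self, cycle):
        if self.in_q:
            flit = self.in_q.popleft()
            if self.out_link is not None:
                self.out_link.in_q.append(flit)   # forward one flit per cycle
            else:
                print(f"cycle {cycle}: {flit} delivered at {self.name}")

r0, r1 = Router("r0"), Router("r1")
r0.out_link = r1
r0.in_q.extend(["flit-a", "flit-b", "flit-c"])

cycle = 0
while r0.in_q or r1.in_q:
    cycle += 1
    # tick downstream first so a flit takes one full cycle per hop
    r1.tick(cycle)
    r0.tick(cycle)
print(f"drained in {cycle} cycles")
```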

20.
Lee, E.A. Computer, 2005, 38(7): 85-87
Despite considerable progress in software and hardware techniques, many recent computing advances do more harm than good when embedded computing systems absolutely must meet tight timing constraints. For example, while synchronous digital logic delivers precise timing determinacy, advances in computer architecture and software have made it difficult or impossible to estimate or predict software's execution time. Moreover, networking techniques introduce variability and stochastic behavior, while operating systems rely on best-effort techniques. Worse, programming language semantics do not handle time well, so developers can only specify timing requirements indirectly. Thus, achieving precise timeliness in a networked embedded system, an absolutely essential goal, requires sweeping changes. For embedded computing to realize its full potential, we must reinvent computer science. Resource limitations have influenced embedded software's evolution, and embedded software differs from other software in more fundamental ways.
